linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00

Author	SHA1	Message	Date
Chuck Lever	806d65b617	NFSD: Add cb_lost tracepoint Provide more clarity about when the callback channel is in trouble. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:04 -04:00
Chuck Lever	e8f80c5545	NFSD: Add tracepoints for EXCHANGEID edge cases Some of the most common cases are traced. Enough infrastructure is now in place that more can be added later, as needed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:03 -04:00
Chuck Lever	237f91c85a	NFSD: Add tracepoints for SETCLIENTID edge cases Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:03 -04:00
Chuck Lever	2958d2ee71	NFSD: Add a couple more nfsd_clid_expired call sites Improve observation of NFSv4 lease expiry. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:03 -04:00
Chuck Lever	c41a9b7a90	NFSD: Add nfsd_clid_destroyed tracepoint Record client-requested termination of client IDs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:03 -04:00
Chuck Lever	cee8aa0742	NFSD: Add nfsd_clid_reclaim_complete tracepoint Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:03 -04:00
Chuck Lever	7e3b32ace6	NFSD: Add nfsd_clid_confirmed tracepoint This replaces a dprintk call site in order to get greater visibility on when client IDs are confirmed or re-used. Simple example: nfsd-995 [000] 126.622975: nfsd_compound: xid=0x3a34e2b1 opcnt=1 nfsd-995 [000] 126.623005: nfsd_cb_args: addr=192.168.2.51:45901 client 60958e3b:9213ef0e prog=1073741824 ident=1 nfsd-995 [000] 126.623007: nfsd_compound_status: op=1/1 OP_SETCLIENTID status=0 nfsd-996 [001] 126.623142: nfsd_compound: xid=0x3b34e2b1 opcnt=1 >>>> nfsd-996 [001] 126.623146: nfsd_clid_confirmed: client 60958e3b:9213ef0e nfsd-996 [001] 126.623148: nfsd_cb_probe: addr=192.168.2.51:45901 client 60958e3b:9213ef0e state=UNKNOWN nfsd-996 [001] 126.623154: nfsd_compound_status: op=1/1 OP_SETCLIENTID_CONFIRM status=0 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:03 -04:00
Chuck Lever	744ea54c86	NFSD: Add nfsd_clid_verf_mismatch tracepoint Record when a client presents a different boot verifier than the one we know about. Typically this is a sign the client has rebooted, but sometimes it signals a conflicting client ID, which the client's administrator will need to address. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:03 -04:00
Chuck Lever	27787733ef	NFSD: Add nfsd_clid_cred_mismatch tracepoint Record when a client tries to establish a lease record but uses an unexpected credential. This is often a sign of a configuration problem. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:02 -04:00
Chuck Lever	a948b1142c	NFSD: Fix TP_printk() format specifier in nfsd_clid_class Since commit `9a6944fee6` ("tracing: Add a verifier to check string pointers for trace events"), which was merged in v5.13-rc1, TP_printk() no longer tacitly supports the "%.*s" format specifier. These are low value tracepoints, so just remove them. Reported-by: David Wysochanski <dwysocha@redhat.com> Fixes: `dd5e3fbc1f` ("NFSD: Add tracepoints to the NFSD state management code") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:02 -04:00
Gustavo A. R. Silva	76c50eb70d	nfsd: Fix fall-through warnings for Clang In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple warnings by explicitly adding a couple of break statements instead of just letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-04-20 16:55:07 -04:00
J. Bruce Fields	aba2072f45	nfsd: grant read delegations to clients holding writes It's OK to grant a read delegation to a client that holds a write, as long as it's the only client holding the write. We originally tried to do this in commit `94415b06eb` ("nfsd4: a client's own opens needn't prevent delegations"), which had to be reverted in commit `6ee65a7730` ("Revert "nfsd4: a client's own opens needn't prevent delegations""). Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-04-19 16:41:36 -04:00
J. Bruce Fields	ebd9d2c2f5	nfsd: reshuffle some code No change in behavior, I'm just moving some code around to avoid forward references in a following patch. (To do someday: figure out how to split up nfs4state.c. It's big and disorganized.) Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-04-19 16:41:36 -04:00
J. Bruce Fields	a0ce48375a	nfsd: track filehandle aliasing in nfs4_files It's unusual but possible for multiple filehandles to point to the same file. In that case, we may end up with multiple nfs4_files referencing the same inode. For delegation purposes it will turn out to be useful to flag those cases. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-04-19 16:41:36 -04:00
J. Bruce Fields	f9b60e2209	nfsd: hash nfs4_files by inode number The nfs4_file structure is per-filehandle, not per-inode, because the spec requires open and other state to be per filehandle. But it will turn out to be convenient for nfs4_files associated with the same inode to be hashed to the same bucket, so let's hash on the inode instead of the filehandle. Filehandle aliasing is rare, so that shouldn't have much performance impact. (If you have a ton of exported filesystems, though, and all of them have a root with inode number 2, could that get you an overlong hash chain? Perhaps this (and the v4 open file cache) should be hashed on the inode pointer instead.) Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-04-19 16:41:33 -04:00
J. Bruce Fields	217fd6f625	nfsd: ensure new clients break delegations If nfsd already has an open file that it plans to use for IO from another, it may not need to do another vfs open, but it still may need to break any delegations in case the existing opens are for another client. Symptoms are that we may incorrectly fail to break a delegation on a write open from a different client, when the delegation-holding client already has a write open. Fixes: `28df3d1539` ("nfsd: clients don't need to break their own delegations") Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-04-16 15:08:37 -04:00
Jiapeng Chong	363f8dd5ee	nfsd: remove unused function Fix the following clang warning: fs/nfsd/nfs4state.c:6276:1: warning: unused function 'end_offset' [-Wunused-function]. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-04-15 09:59:51 -04:00
NeilBrown	472d155a06	nfsd: report client confirmation status in "info" file mountd can now monitor clients appearing and disappearing in /proc/fs/nfsd/clients, and will log these events, in liu of the logging of mount/unmount events for NFSv3. Currently it cannot distinguish between unconfirmed clients (which might be transient and totally uninteresting) and confirmed clients. So add a "status: " line which reports either "confirmed" or "unconfirmed", and use fsnotify to report that the info file has been modified. This requires a bit of infrastructure to keep the dentry for the "info" file. There is no need to take a counted reference as the dentry must remain around until the client is removed. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:04 -04:00
Trond Myklebust	c6c7f2a84d	nfsd: Ensure knfsd shuts down when the "nfsd" pseudofs is unmounted In order to ensure that knfsd threads don't linger once the nfsd pseudofs is unmounted (e.g. when the container is killed) we let nfsd_umount() shut down those threads and wait for them to exit. This also should ensure that we don't need to do a kernel mount of the pseudofs, since the thread lifetime is now limited by the lifetime of the filesystem. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:03 -04:00
J. Bruce Fields	7f7e7a4006	nfsd: helper for laundromat expiry calculations We do this same logic repeatedly, and it's easy to get the sense of the comparison wrong. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:03 -04:00
Chuck Lever	bddfdbcddb	NFSD: Extract the svcxdr_init_encode() helper NFSD initializes an encode xdr_stream only after the RPC layer has already inserted the RPC Reply header. Thus it behaves differently than xdr_init_encode does, which assumes the passed-in xdr_buf is entirely devoid of content. nfs4proc.c has this server-side stream initialization helper, but it is visible only to the NFSv4 code. Move this helper to a place that can be accessed by NFSv2 and NFSv3 server XDR functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:51 -04:00
J. Bruce Fields	6ee65a7730	Revert "nfsd4: a client's own opens needn't prevent delegations" This reverts commit `94415b06eb`. That commit claimed to allow a client to get a read delegation when it was the only writer. Actually it allowed a client to get a read delegation when any client has a write open! The main problem is that it's depending on nfs4_clnt_odstate structures that are actually only maintained for pnfs exports. This causes clients to miss writes performed by other clients, even when there have been intervening closes and opens, violating close-to-open cache consistency. We can do this a different way, but first we should just revert this. I've added pynfs 4.1 test DELEG19 to test for this, as I should have done originally! Cc: stable@vger.kernel.org Reported-by: Timo Rothenpieler <timo@rothenpieler.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-09 10:37:34 -05:00
J. Bruce Fields	4aa5e00203	Revert "nfsd4: remove check_conflicting_opens warning" This reverts commit `50747dd5e4` "nfsd4: remove check_conflicting_opens warning", as a prerequisite for reverting `94415b06eb`, which has a serious bug. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-09 10:37:34 -05:00
J. Bruce Fields	bfdd89f232	nfsd: don't abort copies early The typical result of the backwards comparison here is that the source server in a server-to-server copy will return BAD_STATEID within a few seconds of the copy starting, instead of giving the copy a full lease period, so the copy_file_range() call will end up unnecessarily returning a short read. Fixes: `624322f1ad` "NFSD add COPY_NOTIFY operation" Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-06 16:41:48 -05:00
J. Bruce Fields	ec59659b49	nfsd: cstate->session->se_client -> cstate->clp I'm not sure why we're writing this out the hard way in so many places. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-28 10:55:37 -05:00
J. Bruce Fields	1722b04624	nfsd: simplify nfsd4_check_open_reclaim The set_client() was already taken care of by process_open1(). The comments here are mostly redundant with the code. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-28 10:55:37 -05:00
J. Bruce Fields	f71475ba8c	nfsd: remove unused set_client argument Every caller is setting this argument to false, so we don't need it. Also cut this comment a bit and remove an unnecessary warning. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-28 10:55:37 -05:00
J. Bruce Fields	47fdb22dac	nfsd: find_cpntf_state cleanup I think this unusual use of struct compound_state could cause confusion. It's not that much more complicated just to open-code this stateid lookup. The only change in behavior should be a different error return in the case the copy is using a source stateid that is a revoked delegation, but I doubt that matters. Signed-off-by: J. Bruce Fields <bfields@redhat.com> [ cel: squashed in fix reported by Coverity ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:29 -05:00
J. Bruce Fields	7950b5316e	nfsd: refactor set_client This'll be useful elsewhere. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:29 -05:00
J. Bruce Fields	460d27091a	nfsd: rename lookup_clientid->set_client I think this is a better name, and I'm going to reuse elsewhere the code that does the lookup itself. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:29 -05:00
J. Bruce Fields	b4587eb2cf	nfsd: simplify nfsd_renew You can take the single-exit thing too far, I think. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:29 -05:00
J. Bruce Fields	a9d53a75cf	nfsd: simplify process_lock Similarly, this STALE_CLIENTID check is already handled by: nfs4_preprocess_confirmed_seqid_op()-> nfs4_preprocess_seqid_op()-> nfsd4_lookup_stateid()-> set_client()-> STALE_CLIENTID() (This may cause it to return a different error in some cases where there are multiple things wrong; pynfs test SEQ10 regressed on this commit because of that, but I think that's the test's fault, and I've fixed it separately.) Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:29 -05:00
J. Bruce Fields	33311873ad	nfsd4: simplify process_lookup1 This STALE_CLIENTID check is redundant with the one in lookup_clientid(). There's a difference in behavior is in case of memory allocation failure, which I think isn't a big deal. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:29 -05:00
Dai Ngo	ca9364dde5	NFSD: Fix 5 seconds delay when doing inter server copy Since commit `b4868b44c5` ("NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE"), every inter server copy operation suffers 5 seconds delay regardless of the size of the copy. The delay is from nfs_set_open_stateid_locked when the check by nfs_stateid_is_sequential fails because the seqid in both nfs4_state and nfs4_stateid are 0. Fix by modifying nfs4_init_cp_state to return the stateid with seqid 1 instead of 0. This is also to conform with section 4.8 of RFC 7862. Here is the relevant paragraph from section 4.8 of RFC 7862: A copy offload stateid's seqid MUST NOT be zero. In the context of a copy offload operation, it is inappropriate to indicate "the most recent copy offload operation" using a stateid with a seqid of zero (see Section 8.2.2 of [RFC5661]). It is inappropriate because the stateid refers to internal state in the server and there may be several asynchronous COPY operations being performed in parallel on the same file by the server. Therefore, a copy offload stateid with a seqid of zero MUST be considered invalid. Fixes: `ce0887ac96` ("NFSD add nfs4 inter ssc to nfsd4_copy") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:38:34 -05:00
Chuck Lever	523ec6ed6f	NFSD: Add a helper to decode state_protect4_a Refactor for clarity. Also, remove a stale comment. Commit `ed94164398` ("nfsd: implement machine credential support for some operations") added support for SP4_MACH_CRED, so state_protect_a is no longer completely ignored. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 14:46:41 -05:00
Tom Rix	c1488428a8	nfsd: remove unneeded break Because every path through nfs4_find_file()'s switch does an explicit return, the break is not needed. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-10-16 15:15:04 -04:00
J. Bruce Fields	13956160fc	nfsd: rq_lease_breaker cleanup Since only the v4 code cares about it, maybe it's better to leave rq_lease_breaker out of the common dispatch code? Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-09-25 18:02:03 -04:00
J. Bruce Fields	50747dd5e4	nfsd4: remove check_conflicting_opens warning There are actually rare races where this is possible (e.g. if a new open intervenes between the read of i_writecount and the fi_fds). Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-09-25 18:02:02 -04:00
Hou Tao	3caf91757c	nfsd: rename delegation related tracepoints to make them less confusing Now when a read delegation is given, two delegation related traces will be printed: nfsd_deleg_open: client 5f45b854:e6058001 stateid 00000030:00000001 nfsd_deleg_none: client 5f45b854:e6058001 stateid 0000002f:00000001 Although the intention is to let developers know two stateid are returned, the traces are confusing about whether or not a read delegation is handled out. So renaming trace_nfsd_deleg_none() to trace_nfsd_open() and trace_nfsd_deleg_open() to trace_nfsd_deleg_read() to make the intension clearer. The patched traces will be: nfsd_deleg_read: client 5f48a967:b55b21cd stateid 00000003:00000001 nfsd_open: client 5f48a967:b55b21cd stateid 00000002:00000001 Suggested-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-09-25 18:01:27 -04:00
J. Bruce Fields	12ed22f3c3	nfsd: give up callbacks on revoked delegations The delegation is no longer returnable, so I don't think there's much point retrying the recall. (I think it's worth asking why we even need separate CLOSED_DELEG and REVOKED_DELEG states. But treating them the same would currently cause nfsd4_free_stateid to call list_del_init(&dp->dl_recall_lru) on a delegation that the laundromat had unhashed but not revoked, incorrectly removing it from the laundromat's reaplist or a client's dl_recall_lru.) Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-09-25 18:01:26 -04:00
J. Bruce Fields	e56dc9e294	nfsd: remove fault injection code It was an interesting idea but nobody seems to be using it, it's buggy at this point, and nfs4state.c is already complicated enough without it. The new nfsd/clients/ code provides some of the same functionality, and could probably do more if desired. This feature has been deprecated since `9d60d93198` ("Deprecate nfsd fault injection"). Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-09-25 18:01:26 -04:00
Linus Torvalds	2ac69819ba	Fixes: - Eliminate an oops introduced in v5.8 - Remove a duplicate #include added by nfsd-5.9 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAl8/4gUACgkQM2qzM29m f5cxKBAAp7UjD3YNlLhSviowuOfYpWNjyk1cEQ6hWFA9oVeSfZfU3/axW8uYTHPm QZ6ams6gjorP4CXwVkFGFHpTRg4CfVN9g5lKxrcjvELqNllWBhE9UupRgbX3+XBE qselRI22M64o2tfDE+tPrDB8w8PwHmqrHwRydXfgiFlHk7nt6xD7NitaJBnPlYPM 21OBl6mrjLwtRwvX9n5wpy/+bfOTHbGV5VNez0fAfKXggNmRdt/UNROC4doLg4M0 2khAV3vgx49FRpCPL6SZPcBYd6zfrYOcj3iSf6wpxS5nTb2MifXFqz1MvKRTj863 gzvSmh7vuf0+EaOAXuLjCD9dURZpuG/k0vJGijOgaSt0+vNQHjIgZ1XRFHQtQCp4 zPJ/Qyk5k7uajHzcBFuNPUFAkOovH6LRoOzpqGvXhwaxrMPWti0LyyVKidVJrt/d EtOKQR+HCN0zAwjadXSPK8Nw1PjMzplkF7TaxXvF2LdO/4vpEZZNoz+if59gRcFY 65h2++7y+0MCX8l83uUZfs+jQU2aR1w5a0DjVzi86xzJtyhr6gEyTj3Z6L9HIHwW dnSpUmoiaCoN0eqxvEBjw0VEPqB806CuiUER0Jdd8k7mPk04fsQ/9+UsYyliSLEG N56LFSWLXLHsySa2WkuB/ghzT2/Q0vFoZKXW0KNSD7W4C5XMxi4= =czB3 -----END PGP SIGNATURE----- Merge tag 'nfsd-5.9-1' of git://git.linux-nfs.org/projects/cel/cel-2.6 Pull nfs server fixes from Chuck Lever: - Eliminate an oops introduced in v5.8 - Remove a duplicate #include added by nfsd-5.9 * tag 'nfsd-5.9-1' of git://git.linux-nfs.org/projects/cel/cel-2.6: SUNRPC: remove duplicate include nfsd: fix oops on mixed NFSv4/NFSv3 client access	2020-08-25 18:01:36 -07:00
Gustavo A. R. Silva	df561f6688	treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>	2020-08-23 17:36:59 -05:00
J. Bruce Fields	34b09af4f5	nfsd: fix oops on mixed NFSv4/NFSv3 client access If an NFSv2/v3 client breaks an NFSv4 client's delegation, it will hit a NULL dereference in nfsd_breaker_owns_lease(). Easily reproduceable with for example mount -overs=4.2 server:/export /mnt/ sleep 1h </mnt/file & mount -overs=3 server:/export /mnt2/ touch /mnt2/file Reported-by: Robert Dinse <nanook@eskimo.com> Fixes: `28df3d1539` ("nfsd: clients don't need to break their own delegations") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=208807 Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-08-16 16:51:18 -04:00
Linus Torvalds	7a6b60441f	Highlights: - Support for user extended attributes on NFS (RFC 8276) - Further reduce unnecessary NFSv4 delegation recalls Notable fixes: - Fix recent krb5p regression - Address a few resource leaks and a rare NULL dereference Other: - De-duplicate RPC/RDMA error handling and other utility functions - Replace storage and display of kernel memory addresses by tracepoints -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAl8oBt0ACgkQM2qzM29m f5dTFQ/9H72E6gr1onsia0/Py0CO8F9qzLgmUBl1vVYAh2/vPqUL1ypxrC5OYrAy TOqESTsJvmGluCFc/77XUTD7NvJY3znIWim49okwDiyee4Y14ZfRhhCxyyA6Z94E FjJQb5TbF1Mti4X3dN8Gn7O1Y/BfTjDAAXnXGlTA1xoLcxM5idWIj+G8x0bPmeDb 2fTbgsoETu6MpS2/L6mraXVh3d5ESOJH+73YvpBl0AhYPzlNASJZMLtHtd+A/JbO IPkMP/7UA5DuJtWGeuQ4I4D5bQNpNWMfN6zhwtih4IV5bkRC7vyAOLG1R7w9+Ufq 58cxPiorMcsg1cHnXG0Z6WVtbMEdWTP/FzmJdE5RC7DEJhmmSUG/R0OmgDcsDZET GovPARho01yp80GwTjCIctDHRRFRL4pdPfr8PjVHetSnx9+zoRUT+D70Zeg/KSy2 99gmCxqSY9BZeHoiVPEX/HbhXrkuDjUSshwl98OAzOFmv6kbwtLntgFbWlBdE6dB mqOxBb73zEoZ5P9GA2l2ShU3GbzMzDebHBb9EyomXHZrLejoXeUNA28VJ+8vPP5S IVHnEwOkdJrNe/7cH4jd/B0NR6f8Da/F9kmkLiG2GNPMqQ8bnVhxTUtZkcAE+fd4 f34qLxsoht70wSSfISjBs7hP5KxEM1lOAf0w0RpycPUKJNV1FB0= =OEpF -----END PGP SIGNATURE----- Merge tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6 Pull NFS server updates from Chuck Lever: "Highlights: - Support for user extended attributes on NFS (RFC 8276) - Further reduce unnecessary NFSv4 delegation recalls Notable fixes: - Fix recent krb5p regression - Address a few resource leaks and a rare NULL dereference Other: - De-duplicate RPC/RDMA error handling and other utility functions - Replace storage and display of kernel memory addresses by tracepoints" * tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6: (38 commits) svcrdma: CM event handler clean up svcrdma: Remove transport reference counting svcrdma: Fix another Receive buffer leak SUNRPC: Refresh the show_rqstp_flags() macro nfsd: netns.h: delete a duplicated word SUNRPC: Fix ("SUNRPC: Add "@len" parameter to gss_unwrap()") nfsd: avoid a NULL dereference in __cld_pipe_upcall() nfsd4: a client's own opens needn't prevent delegations nfsd: Use seq_putc() in two functions svcrdma: Display chunk completion ID when posting a rw_ctxt svcrdma: Record send_ctxt completion ID in trace_svcrdma_post_send() svcrdma: Introduce Send completion IDs svcrdma: Record Receive completion ID in svc_rdma_decode_rqst svcrdma: Introduce Receive completion IDs svcrdma: Introduce infrastructure to support completion IDs svcrdma: Add common XDR encoders for RDMA and Read segments svcrdma: Add common XDR decoders for RDMA and Read segments SUNRPC: Add helpers for decoding list discriminators symbolically svcrdma: Remove declarations for functions long removed svcrdma: Clean up trace_svcrdma_send_failed() tracepoint ...	2020-08-09 13:58:04 -07:00
J. Bruce Fields	9affa43581	nfsd4: fix NULL dereference in nfsd/clients display code We hold the cl_lock here, and that's enough to keep stateid's from going away, but it's not enough to prevent the files they point to from going away. Take fi_lock and a reference and check for NULL, as we do in other code. Reported-by: NeilBrown <neilb@suse.de> Fixes: `78599c42ae` ("nfsd4: add file to display list of client's opens") Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-07-22 16:47:14 -04:00
J. Bruce Fields	94415b06eb	nfsd4: a client's own opens needn't prevent delegations We recently fixed lease breaking so that a client's actions won't break its own delegations. But we still have an unnecessary self-conflict when granting delegations: a client's own write opens will prevent us from handing out a read delegation even when no other client has the file open for write. Fix that by turning off the checks for conflicting opens under vfs_setlease, and instead performing those checks in the nfsd code. We don't depend much on locks here: instead we acquire the delegation, then check for conflicts, and drop the delegation again if we find any. The check beforehand is an optimization of sorts, just to avoid acquiring the delegation unnecessarily. There's a race where the first check could cause us to deny the delegation when we could have granted it. But, that's OK, delegation grants are optional (and probably not even a good idea in that case). Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-07-13 17:28:46 -04:00
J. Bruce Fields	681370f4b0	nfsd4: fix nfsdfs reference count loop We don't drop the reference on the nfsdfs filesystem with mntput(nn->nfsd_mnt) until nfsd_exit_net(), but that won't be called until the nfsd module's unloaded, and we can't unload the module as long as there's a reference on nfsdfs. So this prevents module unloading. Fixes: `2c830dd720` ("nfsd: persist nfsd filesystem across mounts") Reported-and-Tested-by: Luo Xiaogang <lxgrxd@163.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-06-29 14:48:02 -04:00
J. Bruce Fields	6670ee2ef2	Merge branch 'nfsd-5.8' of git://linux-nfs.org/~cel/cel-2.6 into for-5.8-incoming Highlights of this series: * Remove serialization of sending RPC/RDMA Replies * Convert the TCP socket send path to use xdr_buf::bvecs (pre-requisite for RPC-on-TLS) * Fix svcrdma backchannel sendto return code * Convert a number of dprintk call sites to use tracepoints * Fix the "suggest braces around empty body in an 'else' statement" warning	2020-05-21 10:58:15 -04:00
Chuck Lever	45f56da822	NFSD: Squash an annoying compiler warning Clean up: Fix gcc empty-body warning when -Wextra is used. ../fs/nfsd/nfs4state.c:3898:3: warning: suggest braces around empty body in an ‘else’ statement [-Wempty-body] Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-20 17:30:44 -04:00
Chuck Lever	1eace0d1e9	NFSD: Add tracepoints for monitoring NFSD callbacks Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-20 17:30:44 -04:00
Chuck Lever	dd5e3fbc1f	NFSD: Add tracepoints to the NFSD state management code Capture obvious events and replace dprintk() call sites. Introduce infrastructure so that adding more tracepoints in this code later is simplified. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-20 17:30:44 -04:00
J. Bruce Fields	28df3d1539	nfsd: clients don't need to break their own delegations We currently revoke read delegations on any write open or any operation that modifies file data or metadata (including rename, link, and unlink). But if the delegation in question is the only read delegation and is held by the client performing the operation, that's not really necessary. It's not always possible to prevent this in the NFSv4.0 case, because there's not always a way to determine which client an NFSv4.0 delegation came from. (In theory we could try to guess this from the transport layer, e.g., by assuming all traffic on a given TCP connection comes from the same client. But that's not really correct.) In the NFSv4.1 case the session layer always tells us the client. This patch should remove such self-conflicts in all cases where we can reliably determine the client from the compound. To do that we need to track "who" is performing a given (possibly lease-breaking) file operation. We're doing that by storing the information in the svc_rqst and using kthread_data() to map the current task back to a svc_rqst. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-05-08 21:23:10 -04:00
J. Bruce Fields	c2d715a1af	nfsd: handle repeated BIND_CONN_TO_SESSION If the client attempts BIND_CONN_TO_SESSION on an already bound connection, it should be either a no-op or an error. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-05-06 15:59:23 -04:00
Achilles Gaikwad	580da465a0	nfsd4: add filename to states output Add filename to states output for ease of debugging. Signed-off-by: Achilles Gaikwad <agaikwad@redhat.com> Signed-off-by: Kenneth Dsouza <kdsouza@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-05-06 15:59:23 -04:00
J. Bruce Fields	ee590d2597	nfsd4: stid display should preserve on-the-wire byte order When we decode the stateid we byte-swap si_generation. But for simplicity's sake and ease of comparison with network traces, it's better to display the whole thing in network order. Reported-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-05-06 15:59:23 -04:00
J. Bruce Fields	ace7ade4f5	nfsd4: common stateid-printing code There's a problem with how I'm formatting stateids. Before I fix it, I'd like to move the stateid formatting into a common helper. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-05-06 15:59:23 -04:00
Vasily Averin	e1e8399eee	nfsd: memory corruption in nfsd4_lock() New struct nfsd4_blocked_lock allocated in find_or_allocate_block() does not initialized nbl_list and nbl_lru. If conflock allocation fails rollback can call list_del_init() access uninitialized fields and corrupt memory. v2: just initialize nbl_list and nbl_lru right after nbl allocation. Fixes: `76d348fadf` ("nfsd: have nfsd4_lock use blocking locks for v4.1+ lock") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-04-13 10:28:21 -04:00
J. Bruce Fields	663e36f076	nfsd4: kill warnings on testing stateids with mismatched clientids It's normal for a client to test a stateid from a previous instance, e.g. after a network partition. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-03-19 10:51:42 -04:00
Petr Vorel	6cbfad5f20	nfsd: remove read permission bit for ctl sysctl It's meant to be write-only. Fixes: `89c905becc` ("nfsd: allow forced expiration of NFSv4 clients") Signed-off-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-03-16 12:04:34 -04:00
Trond Myklebust	a451b12311	nfsd: Don't add locks to closed or closing open stateids In NFSv4, the lock stateids are tied to the lockowner, and the open stateid, so that the action of closing the file also results in either an automatic loss of the locks, or an error of the form NFS4ERR_LOCKS_HELD. In practice this means we must not add new locks to the open stateid after the close process has been invoked. In fact doing so, can result in the following panic: kernel BUG at lib/list_debug.c:51! invalid opcode: 0000 [#1] SMP NOPTI CPU: 2 PID: 1085 Comm: nfsd Not tainted 5.6.0-rc3+ #2 Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.14410784.B64.1908150010 08/15/2019 RIP: 0010:__list_del_entry_valid.cold+0x31/0x55 Code: 1a 3d 9b e8 74 10 c2 ff 0f 0b 48 c7 c7 f0 1a 3d 9b e8 66 10 c2 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 b0 1a 3d 9b e8 52 10 c2 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 78 1a 3d 9b e8 3e 10 c2 ff 0f 0b RSP: 0018:ffffb296c1d47d90 EFLAGS: 00010246 RAX: 0000000000000054 RBX: ffff8ba032456ec8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff8ba039e99cc8 RDI: ffff8ba039e99cc8 RBP: ffff8ba032456e60 R08: 0000000000000781 R09: 0000000000000003 R10: 0000000000000000 R11: 0000000000000001 R12: ffff8ba009a4abe0 R13: ffff8ba032456e8c R14: 0000000000000000 R15: ffff8ba00adb01d8 FS: 0000000000000000(0000) GS:ffff8ba039e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fb213f0b008 CR3: 00000001347de006 CR4: 00000000003606e0 Call Trace: release_lock_stateid+0x2b/0x80 [nfsd] nfsd4_free_stateid+0x1e9/0x210 [nfsd] nfsd4_proc_compound+0x414/0x700 [nfsd] ? nfs4svc_decode_compoundargs+0x407/0x4c0 [nfsd] nfsd_dispatch+0xc1/0x200 [nfsd] svc_process_common+0x476/0x6f0 [sunrpc] ? svc_sock_secure_port+0x12/0x30 [sunrpc] ? svc_recv+0x313/0x9c0 [sunrpc] ? nfsd_svc+0x2d0/0x2d0 [nfsd] svc_process+0xd4/0x110 [sunrpc] nfsd+0xe3/0x140 [nfsd] kthread+0xf9/0x130 ? nfsd_destroy+0x50/0x50 [nfsd] ? kthread_park+0x90/0x90 ret_from_fork+0x1f/0x40 The fix is to ensure that lock creation tests for whether or not the open stateid is unhashed, and to fail if that is the case. Fixes: `659aefb68e` ("nfsd: Ensure we don't recognise lock stateids after freeing them") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-03-16 12:04:33 -04:00
Madhuparna Bhowmik	36a8049181	fs: nfsd: nfs4state.c: Use built-in RCU list checking list_for_each_entry_rcu() has built-in RCU and lock checking. Pass cond argument to list_for_each_entry_rcu() to silence false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled by default. Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-03-16 12:04:31 -04:00
Arnd Bergmann	9104ae494e	nfsd: use ktime_get_real_seconds() in nfs4_verifier gen_confirm() generates a unique identifier based on the current time. This overflows in year 2038, but that is harmless since it generally does not lead to duplicates, as long as the time has been initialized by a real-time clock or NTP. Using ktime_get_boottime_seconds() or ktime_get_seconds() would avoid the overflow, but it would be more likely to result in non-unique numbers. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-12-19 22:07:17 -05:00
Arnd Bergmann	20b7d86f29	nfsd: use boottime for lease expiry calculation A couple of time_t variables are only used to track the state of the lease time and its expiration. The code correctly uses the 'time_after()' macro to make this work on 32-bit architectures even beyond year 2038, but the get_seconds() function and the time_t type itself are deprecated as they behave inconsistently between 32-bit and 64-bit architectures and often lead to code that is not y2038 safe. As a minor issue, using get_seconds() leads to problems with concurrent settimeofday() or clock_settime() calls, in the worst case timeout never triggering after the time has been set backwards. Change nfsd to use time64_t and ktime_get_boottime_seconds() here. This is clearly excessive, as boottime by itself means we never go beyond 32 bits, but it does mean we handle this correctly and consistently without having to worry about corner cases and should be no more expensive than the previous implementation on 64-bit architectures. The max_cb_time() function gets changed in order to avoid an expensive 64-bit division operation, but as the lease time is at most one hour, there is no change in behavior. Also do the same for server-to-server copy expiration time. Signed-off-by: Arnd Bergmann <arnd@arndb.de> [bfields@redhat.com: fix up copy expiration] Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-12-19 22:07:17 -05:00
Arnd Bergmann	9594497f2c	nfsd: fix jiffies/time_t mixup in LRU list The nfsd4_blocked_lock->nbl_time timestamp is recorded in jiffies, but then compared to a CLOCK_REALTIME timestamp later on, which makes no sense. For consistency with the other timestamps, change this to use a time_t. This is a change in behavior, which may cause regressions, but the current code is not sensible. On a system with CONFIG_HZ=1000, the 'time_after((unsigned long)nbl->nbl_time, (unsigned long)cutoff))' check is false for roughly the first 18 days of uptime and then true for the next 49 days. Fixes: `7919d0a27f` ("nfsd: add a LRU list for blocked locks") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-12-19 17:46:08 -05:00
Arnd Bergmann	2a1aa48929	nfsd: pass a 64-bit guardtime to nfsd_setattr() Guardtime handling in nfs3 differs between 32-bit and 64-bit architectures, and uses the deprecated time_t type. Change it to using time64_t, which behaves the same way on 64-bit and 32-bit architectures, treating the number as an unsigned 32-bit entity with a range of year 1970 to 2106 consistently, and avoiding the y2038 overflow. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-12-19 17:46:08 -05:00
Arnd Bergmann	9cc7680149	nfsd: make 'boot_time' 64-bit wide The local boot time variable gets truncated to time_t at the moment, which can lead to slightly odd behavior on 32-bit architectures. Use ktime_get_real_seconds() instead of get_seconds() to always get a 64-bit result, and keep it that way wherever possible. It still gets truncated in a few places: - When assigning to cl_clientid.cl_boot, this is already documented and is only used as a unique identifier. - In clients_still_reclaiming(), the truncation is to 'unsigned long' in order to use the 'time_before() helper. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-12-19 17:46:08 -05:00
Arnd Bergmann	e29f470396	nfsd: print 64-bit timestamps in client_info_show The nii_time field gets truncated to 'time_t' on 32-bit architectures before printing. Remove the use of 'struct timespec' to product the correct output beyond 2038. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-12-19 17:46:08 -05:00
Arnd Bergmann	b3f255ef6b	nfsd: use ktime_get_seconds() for timestamps The delegation logic in nfsd uses the somewhat inefficient seconds_since_boot() function to record time intervals. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-12-19 17:46:08 -05:00
zhengbin	fc5fc5d7cc	nfsd4: Remove unneeded semicolon Fixes coccicheck warning: fs/nfsd/nfs4state.c:3376:2-3: Unneeded semicolon Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-12-19 17:42:44 -05:00
Dan Carpenter	5277a79e2d	nfsd: unlock on error in manage_cpntf_state() We are holding the "nn->s2s_cp_lock" so we can't return directly without unlocking first. Fixes: f3dee17721a0 ("NFSD check stateids against copy stateids") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-12-09 11:44:07 -05:00
Olga Kornievskaia	ce0887ac96	NFSD add nfs4 inter ssc to nfsd4_copy Given a universal address, mount the source server from the destination server. Use an internal mount. Call the NFS client nfs42_ssc_open to obtain the NFS struct file suitable for nfsd_copy_range. Ability to do "inter" server-to-server depends on the an nfsd kernel parameter "inter_copy_offload_enable". Signed-off-by: Olga Kornievskaia <kolga@netapp.com>	2019-12-09 11:44:07 -05:00
Olga Kornievskaia	51100d2b87	NFSD generalize nfsd4_compound_state flag names Allow for sid_flag field non-stateid use. Signed-off-by: Andy Adamson <andros@netapp.com>	2019-12-09 11:42:14 -05:00
Olga Kornievskaia	b734220425	NFSD check stateids against copy stateids Incoming stateid (used by a READ) could be a saved copy stateid. Using the provided stateid, look it up in the list of copy_notify stateids. If found, use the parent's stateid and parent's clid to look up the parent's stid to do the appropriate checks. Update the copy notify timestamp (cpntf_time) with current time this making it 'active' so that laundromat thread will not delete copy notify state. Signed-off-by: Olga Kornievskaia <kolga@netapp.com>	2019-12-09 11:42:14 -05:00
Olga Kornievskaia	624322f1ad	NFSD add COPY_NOTIFY operation Introducing the COPY_NOTIFY operation. Create a new unique stateid that will keep track of the copy state and the upcoming READs that will use that stateid. Each associated parent stateid has a list of copy notify stateids. A copy notify structure makes a copy of the parent stateid and a clientid and will use it to look up the parent stateid during the READ request (suggested by Trond Myklebust <trond.myklebust@hammerspace.com>). At nfs4_put_stid() time, we walk the list of the associated copy notify stateids and delete them. Laundromat thread will traverse globally stored copy notify stateid in idr and notice if any haven't been referenced in the lease period, if so, it'll remove them. Return single netaddr to advertise to the copy. Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Andy Adamson <andros@netapp.com>	2019-12-09 11:42:14 -05:00
Scott Mayhew	6e73e92b15	nfsd4: fix up replay_matches_cache() When running an nfs stress test, I see quite a few cached replies that don't match up with the actual request. The first comment in replay_matches_cache() makes sense, but the code doesn't seem to match... fix it. This isn't exactly a bugfix, as the server isn't required to catch every case of a false retry. So, we may as well do this, but if this is fixing a problem then that suggests there's a client bug. Fixes: `53da6a53e1` ("nfsd4: catch some false session retries") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-10-09 16:39:31 -04:00
J. Bruce Fields	c4b77edb3f	nfsd: "\%s" should be "%s" Randy says: > sparse complains about these, as does gcc when used with --pedantic. > sparse says: > > ../fs/nfsd/nfs4state.c:2385:23: warning: unknown escape sequence: '\%' > ../fs/nfsd/nfs4state.c:2385:23: warning: unknown escape sequence: '\%' > ../fs/nfsd/nfs4state.c:2388:23: warning: unknown escape sequence: '\%' > ../fs/nfsd/nfs4state.c:2388:23: warning: unknown escape sequence: '\%' I'm not sure how this crept in. Fix it. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-10-08 16:01:33 -04:00
NeilBrown	2030ca560c	nfsd: degraded slot-count more gracefully as allocation nears exhaustion. This original code in nfsd4_get_drc_mem() would hand out 30 slots (approximately NFSD_MAX_MEM_PER_SESSION bytes at slightly over 2K per slot) to each requesting client until it ran out of space, then it would possibly give one last client a reduced allocation, then fail the allocation. Since commit `de766e5704` ("nfsd: give out fewer session slots as limit approaches") the last 90 slots to be given to about 12 clients with quickly reducing slot counts (better than just 3 clients). This still seems unnecessarily hasty. A subsequent patch allows over-allocation so every client gets at least one slot, but that might be a bit restrictive. The requested number of nfsd threads is the best guide we have to the expected number of clients, so use that - if it is at least 8. 256 threads on a 256Meg machine - which is a lot for a tiny machine - would result in nfsd_drc_max_mem being 2Meg, so 8K (3 slots) would be available for the first client, and over 200 clients would get more than 1 slot. So I don't think this change will be too debilitating on poorly configured machines, though it does mean that a sensible configuration is a little more important. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-09-20 12:31:51 -04:00
NeilBrown	7f49fd5d7a	nfsd: handle drc over-allocation gracefully. Currently, if there are more clients than allowed for by the space allocation in set_max_drc(), we fail a SESSION_CREATE request with NFS4ERR_DELAY. This means that the client retries indefinitely, which isn't a user-friendly response. The RFC requires NFS4ERR_NOSPC, but that would at best result in a clean failure on the client, which is not much more friendly. The current space allocation is a best-guess and doesn't provide any guarantees, we could still run out of space when trying to allocate drc space. So fail more gracefully - always give out at least one slot. If all clients used all the space in all slots, we might start getting memory pressure, but that is possible anyway. So ensure 'num' is always at least 1, and remove the test for it being zero. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-09-20 12:30:02 -04:00
Scott Mayhew	6ee95d1c89	nfsd: add support for upcall version 2 Version 2 upcalls will allow the nfsd to include a hash of the kerberos principal string in the Cld_Create upcall. If a principal is present in the svc_cred, then the hash will be included in the Cld_Create upcall. We attempt to use the svc_cred.cr_raw_principal (which is returned by gssproxy) first, and then fall back to using the svc_cred.cr_principal (which is returned by both gssproxy and rpc.svcgssd). Upon a subsequent restart, the hash will be returned in the Cld_Gracestart downcall and stored in the reclaim_str_hashtbl so it can be used when handling reclaim opens. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-09-10 09:26:33 -04:00
Jeff Layton	6b556ca287	nfsd: have nfsd_test_lock use the nfsd_file cache Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-08-19 11:09:09 -04:00
Jeff Layton	5c4583b2b7	nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache Have nfs4_preprocess_stateid_op pass back a nfsd_file instead of a filp. Since we now presume that the struct file will be persistent in most cases, we can stop fiddling with the raparms in the read code. This also means that we don't really care about the rd_tmp_file field anymore. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-08-19 11:09:09 -04:00
Jeff Layton	eb82dd3937	nfsd: convert fi_deleg_file and ls_file fields to nfsd_file Have them keep an nfsd_file reference instead of a struct file. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-08-19 11:09:09 -04:00
Jeff Layton	fd4f83fd7d	nfsd: convert nfs4_file->fi_fds array to use nfsd_files Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-08-19 11:09:09 -04:00
YueHaibing	297e57a24f	nfsd: Make two functions static Fix sparse warnings: fs/nfsd/nfs4state.c:1908:6: warning: symbol 'drop_client' was not declared. Should it be static? fs/nfsd/nfs4state.c:2518:6: warning: symbol 'force_expire_client' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-07-09 19:36:33 -04:00
J. Bruce Fields	791234448d	nfsd: decode implementation id Decode the implementation ID and display in nfsd/clients/#/info. It may be help identify the client. It won't be used otherwise. (When this went into the protocol, I thought the implementation ID would be a slippery slope towards implementation-specific workarounds as with the http user-agent. But I guess I was wrong, the risk seems pretty low now.) Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-07-03 20:54:03 -04:00
J. Bruce Fields	6f4859b8a7	nfsd: create xdr_netobj_dup helper Move some repeated code to a common helper. No change in behavior. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-07-03 17:52:51 -04:00
J. Bruce Fields	89c905becc	nfsd: allow forced expiration of NFSv4 clients NFSv4 clients are automatically expired and all their locks removed if they don't contact the server for a certain amount of time (the lease period, 90 seconds by default). There can still be situations where that's not enough, so allow userspace to force expiry by writing "expire\n" to the new nfsd/client/#/ctl file. (The generic "ctl" name is because I expect we may want to allow other operations on clients in the future.) The write will not return until the client is expired and all of its locks and other state removed. The fault injection code also provides a way of expiring clients, but it fails if there are any in-progress RPC's referencing the client. Also, its method of selecting a client to expire is a little more primitive--it uses an IP address, which can't always uniquely specify an NFSv4 client. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-07-03 17:52:50 -04:00
J. Bruce Fields	a204f25e37	nfsd: create get_nfsdfs_clp helper Factor our some common code. No change in behavior. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-07-03 17:52:50 -04:00
J. Bruce Fields	0c4b62b042	nfsd4: show layout stateids These are also minimal for now, I'm not sure what information would be useful. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-07-03 17:52:50 -04:00
J. Bruce Fields	16d36e0999	nfsd: show lock and deleg stateids These entries are pretty minimal for now. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-07-03 17:52:50 -04:00
J. Bruce Fields	78599c42ae	nfsd4: add file to display list of client's opens Add a nfsd/clients/#/opens file to list some information about all the opens held by the given client, including open modes, device numbers, inode numbers, and open owners. Open owners are totally opaque but seem to sometimes have some useful ascii strings included, so passing through printable ascii characters and escaping the rest seems useful while still being machine-readable. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-07-03 17:52:50 -04:00
J. Bruce Fields	169319f13c	nfsd: add more information to client info file Add ip address, full client-provided identifier, and minor version. There's much more that could possibly be useful but this is a start. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-07-03 17:52:50 -04:00
J. Bruce Fields	3bade247fc	nfsd: copy client's address including port number to cl_addr rpc_copy_addr() copies only the IP address and misses any port numbers. It seems potentially useful to keep the port number around too. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-07-03 17:52:50 -04:00
J. Bruce Fields	97ad4031e2	nfsd4: add a client info file Add a new nfsd/clients/#/info file with some basic information about each NFSv4 client. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-07-03 17:52:50 -04:00
J. Bruce Fields	bf5ed3e3bb	nfsd: make client/ directory names small ints We want clientid's on the wire to be randomized for reasons explained in `ebd7c72c63` "nfsd: randomize SETCLIENTID reply to help distinguish servers". But I'd rather have mostly small integers for the clients/ directory. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-07-03 17:52:50 -04:00
J. Bruce Fields	e8a79fb14f	nfsd: add nfsd/clients directory I plan to expose some information about nfsv4 clients here. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-07-03 17:52:49 -04:00
J. Bruce Fields	59f8e91b75	nfsd4: use reference count to free client Keep a second reference count which is what is really used to decide when to free the client's memory. Next I'm going to add an nfsd/clients/ directory with a subdirectory for each NFSv4 client. File objects under nfsd/clients/ will hold these references. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-07-03 17:52:49 -04:00
J. Bruce Fields	14ed14cc7c	nfsd: rename cl_refcount Rename this to a more descriptive name: it counts the number of in-progress rpc's referencing this client. Next I'm going to add a second refcount with a slightly different use. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-07-03 17:52:49 -04:00
Paul Menzel	3b2d4dcf71	nfsd: Fix overflow causing non-working mounts on 1 TB machines Since commit 10a68cdf10 (nfsd: fix performance-limiting session calculation) (Linux 5.1-rc1 and 4.19.31), shares from NFS servers with 1 TB of memory cannot be mounted anymore. The mount just hangs on the client. The gist of commit 10a68cdf10 is the change below. -avail = clamp_t(int, avail, slotsize, avail/3); +avail = clamp_t(int, avail, slotsize, total_avail/3); Here are the macros. #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <) #define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi) `total_avail` is 8,434,659,328 on the 1 TB machine. `clamp_t()` casts the values to `int`, which for 32-bit integers can only hold values −2,147,483,648 (−2^31) through 2,147,483,647 (2^31 − 1). `avail` (in the function signature) is just 65536, so that no overflow was happening. Before the commit the assignment would result in 21845, and `num = 4`. When using `total_avail`, it is causing the assignment to be 18446744072226137429 (printed as %lu), and `num` is then 4164608182. My next guess is, that `nfsd_drc_mem_used` is then exceeded, and the server thinks there is no memory available any more for this client. Updating the arguments of `clamp_t()` and `min_t()` to `unsigned long` fixes the issue. Now, `avail = 65536` (before commit 10a68cdf10 `avail = 21845`), but `num = 4` remains the same. Fixes: `c54f24e338` (nfsd: fix performance-limiting session calculation) Cc: stable@vger.kernel.org Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-07-03 17:51:31 -04:00
Linus Torvalds	700a800a94	This pull consists mostly of nfsd container work: Scott Mayhew revived an old api that communicates with a userspace daemon to manage some on-disk state that's used to track clients across server reboots. We've been using a usermode_helper upcall for that, but it's tough to run those with the right namespaces, so a daemon is much friendlier to container use cases. Trond fixed nfsd's handling of user credentials in user namespaces. He also contributed patches that allow containers to support different sets of NFS protocol versions. The only remaining container bug I'm aware of is that the NFS reply cache is shared between all containers. If anyone's aware of other gaps in our container support, let me know. The rest of this is miscellaneous bugfixes. -----BEGIN PGP SIGNATURE----- iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAlzcWNcVHGJmaWVsZHNA ZmllbGRzZXMub3JnAAoJECebzXlCjuG+DUEP/0WD3jKNAHFV3M5YQPAI9fz/iCND Db/A4oWP5qa6JmwmHe61il29QeGqkeFr/NPexgzM3Xw2E39d7RBXBeWyVDuqb0wr 6SCXjXibTsuAHg11nR8Xf0P5Vej3rfGbG6up5lLCIDTEZxVpWoaBJnM8+3bewuCj XbeiDW54oiMbmDjon3MXqVAIF/z7LjorecJ+Yw5+0Jy7KZ6num9Kt8+fi7qkEfFd i5Bp9KWgzlTbJUJV4EX3ZKN3zlGkfOvjoo2kP3PODPVMB34W8jSLKkRSA1tDWYZg 43WhBt5OODDlV6zpxSJXehYKIB4Ae469+RRaIL4F+ORRK+AzR0C/GTuOwJiG+P3J n95DX5WzX74nPOGQJgAvq4JNpZci85jM3jEK1TR2M7KiBDG5Zg+FTsPYVxx5Sgah Akl/pjLtHQPSdBbFGHn5TsXU+gqWNiKsKa9663tjxLb8ldmJun6JoQGkAEF9UJUn dzv0UxyHeHAblhSynY+WsUR+Xep9JDo/p5LyFK4if9Sd62KeA1uF/MFhAqpKZF81 mrgRCqW4sD8aVTBNZI06pZzmcZx4TRr2o+Oj5KAXf6Yk6TJRSGfnQscoMMBsTLkw VK1rBQ/71TpjLHGZZZEx1YJrkVZAMmw2ty4DtK2f9jeKO13bWmUpc6UATzVufHKA C1rUZXJ5YioDbYDy =TUdw -----END PGP SIGNATURE----- Merge tag 'nfsd-5.2' of git://linux-nfs.org/~bfields/linux Pull nfsd updates from Bruce Fields: "This consists mostly of nfsd container work: Scott Mayhew revived an old api that communicates with a userspace daemon to manage some on-disk state that's used to track clients across server reboots. We've been using a usermode_helper upcall for that, but it's tough to run those with the right namespaces, so a daemon is much friendlier to container use cases. Trond fixed nfsd's handling of user credentials in user namespaces. He also contributed patches that allow containers to support different sets of NFS protocol versions. The only remaining container bug I'm aware of is that the NFS reply cache is shared between all containers. If anyone's aware of other gaps in our container support, let me know. The rest of this is miscellaneous bugfixes" * tag 'nfsd-5.2' of git://linux-nfs.org/~bfields/linux: (23 commits) nfsd: update callback done processing locks: move checks from locks_free_lock() to locks_release_private() nfsd: fh_drop_write in nfsd_unlink nfsd: allow fh_want_write to be called twice nfsd: knfsd must use the container user namespace SUNRPC: rsi_parse() should use the current user namespace SUNRPC: Fix the server AUTH_UNIX userspace mappings lockd: Pass the user cred from knfsd when starting the lockd server SUNRPC: Temporary sockets should inherit the cred from their parent SUNRPC: Cache the process user cred in the RPC server listener nfsd: Allow containers to set supported nfs versions nfsd: Add custom rpcbind callbacks for knfsd SUNRPC: Allow further customisation of RPC program registration SUNRPC: Clean up generic dispatcher code SUNRPC: Add a callback to initialise server requests SUNRPC/nfs: Fix return value for nfs4_callback_compound() nfsd: handle legacy client tracking records sent by nfsdcld nfsd: re-order client tracking method selection nfsd: keep a tally of RECLAIM_COMPLETE operations when using nfsdcld nfsd: un-deprecate nfsdcld ...	2019-05-15 18:21:43 -07:00
Linus Torvalds	b4b52b881c	Wimplicit-fallthrough patches for 5.2-rc1 Hi Linus, This is my very first pull-request. I've been working full-time as a kernel developer for more than two years now. During this time I've been fixing bugs reported by Coverity all over the tree and, as part of my work, I'm also contributing to the KSPP. My work in the kernel community has been supervised by Greg KH and Kees Cook. OK. So, after the quick introduction above, please, pull the following patches that mark switch cases where we are expecting to fall through. These patches are part of the ongoing efforts to enable -Wimplicit-fallthrough. They have been ignored for a long time (most of them more than 3 months, even after pinging multiple times), which is the reason why I've created this tree. Most of them have been baking in linux-next for a whole development cycle. And with Stephen Rothwell's help, we've had linux-next nag-emails going out for newly introduced code that triggers -Wimplicit-fallthrough to avoid gaining more of these cases while we work to remove the ones that are already present. I'm happy to let you know that we are getting close to completing this work. Currently, there are only 32 of 2311 of these cases left to be addressed in linux-next. I'm auditing every case; I take a look into the code and analyze it in order to determine if I'm dealing with an actual bug or a false positive, as explained here: https://lore.kernel.org/lkml/c2fad584-1705-a5f2-d63c-824e9b96cf50@embeddedor.com/ While working on this, I've found and fixed the following missing break/return bugs, some of them introduced more than 5 years ago: `84242b82d8` `7850b51b6c` `5e420fe635` `09186e5034` `b5be853181` `7264235ee7` `cc5034a5d2` `479826cc86` `5340f23df8` `df997abeeb` `2f10d82373` `307b00c5e6` `5d25ff7a54` `a7ed5b3e7d` `c24bfa8f21` `ad0eaee619` `9ba8376ce1` `dc586a60a1` `a8e9b186f1` `4e57562b48` `60747828ea` `c5b974bee9` `cc44ba9116` `2c930e3d0a` Once this work is finish, we'll be able to universally enable "-Wimplicit-fallthrough" to avoid any of these kinds of bugs from entering the kernel again. Thanks Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAlzQR2IACgkQRwW0y0cG 2zEJbQ//X930OcBtT/9DRW4XL1Jeq0Mjssz/GLX2Vpup5CwwcTROG65no80Zezf/ yQRWnUjGX0OBv/fmUK32/nTxI/7k7NkmIXJHe0HiEF069GEENB7FT6tfDzIPjU8M qQkB8NsSUWJs3IH6BVynb/9MGE1VpGBDbYk7CBZRtRJT1RMM+3kQPucgiZMgUBPo Yd9zKwn4i/8tcOCli++EUdQ29ukMoY2R3qpK4LftdX9sXLKZBWNwQbiCwSkjnvJK I6FDiA7RaWH2wWGlL7BpN5RrvAXp3z8QN/JZnivIGt4ijtAyxFUL/9KOEgQpBQN2 6TBRhfTQFM73NCyzLgGLNzvd8awem1rKGSBBUvevaPbgesgM+Of65wmmTQRhFNCt A7+e286X1GiK3aNcjUKrByKWm7x590EWmDzmpmICxNPdt5DHQ6EkmvBdNjnxCMrO aGA24l78tBN09qN45LR7wtHYuuyR0Jt9bCmeQZmz7+x3ICDHi/+Gw7XPN/eM9+T6 lZbbINiYUyZVxOqwzkYDCsdv9+kUvu3e4rPs20NERWRpV8FEvBIyMjXAg6NAMTue K+ikkyMBxCvyw+NMimHJwtD7ho4FkLPcoeXb2ZGJTRHixiZAEtF1RaQ7dA05Q/SL gbSc0DgLZeHlLBT+BSWC2Z8SDnoIhQFXW49OmuACwCUC68NHKps= =k30z -----END PGP SIGNATURE----- Merge tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull Wimplicit-fallthrough updates from Gustavo A. R. Silva: "Mark switch cases where we are expecting to fall through. This is part of the ongoing efforts to enable -Wimplicit-fallthrough. Most of them have been baking in linux-next for a whole development cycle. And with Stephen Rothwell's help, we've had linux-next nag-emails going out for newly introduced code that triggers -Wimplicit-fallthrough to avoid gaining more of these cases while we work to remove the ones that are already present. We are getting close to completing this work. Currently, there are only 32 of 2311 of these cases left to be addressed in linux-next. I'm auditing every case; I take a look into the code and analyze it in order to determine if I'm dealing with an actual bug or a false positive, as explained here: https://lore.kernel.org/lkml/c2fad584-1705-a5f2-d63c-824e9b96cf50@embeddedor.com/ While working on this, I've found and fixed the several missing break/return bugs, some of them introduced more than 5 years ago. Once this work is finished, we'll be able to universally enable "-Wimplicit-fallthrough" to avoid any of these kinds of bugs from entering the kernel again" * tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits) memstick: mark expected switch fall-throughs drm/nouveau/nvkm: mark expected switch fall-throughs NFC: st21nfca: Fix fall-through warnings NFC: pn533: mark expected switch fall-throughs block: Mark expected switch fall-throughs ASN.1: mark expected switch fall-through lib/cmdline.c: mark expected switch fall-throughs lib: zstd: Mark expected switch fall-throughs scsi: sym53c8xx_2: sym_nvram: Mark expected switch fall-through scsi: sym53c8xx_2: sym_hipd: mark expected switch fall-throughs scsi: ppa: mark expected switch fall-through scsi: osst: mark expected switch fall-throughs scsi: lpfc: lpfc_scsi: Mark expected switch fall-throughs scsi: lpfc: lpfc_nvme: Mark expected switch fall-through scsi: lpfc: lpfc_nportdisc: Mark expected switch fall-through scsi: lpfc: lpfc_hbadisc: Mark expected switch fall-throughs scsi: lpfc: lpfc_els: Mark expected switch fall-throughs scsi: lpfc: lpfc_ct: Mark expected switch fall-throughs scsi: imm: mark expected switch fall-throughs scsi: csiostor: csio_wr: mark expected switch fall-through ...	2019-05-07 12:48:10 -07:00
Scott Mayhew	1c73b9d24f	nfsd: update callback done processing Instead of having the convention where individual nfsd4_callback_ops->done operations return -1 to indicate the callback path is down, move the check to nfsd4_cb_done. Only mark the callback path down on transport-level errors, not NFS-level errors. The existing logic causes the server to set SEQ4_STATUS_CB_PATH_DOWN just because the client returned an error to a CB_RECALL for a delegation that the client had already done a FREE_STATEID for. But clearly that error doesn't mean that there's anything wrong with the backchannel. Additionally, handle NFS4ERR_DELAY in nfsd4_cb_recall_done. The client returns NFS4ERR_DELAY if it is already in the process of returning the delegation. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-05-03 11:01:38 -04:00
Scott Mayhew	362063a595	nfsd: keep a tally of RECLAIM_COMPLETE operations when using nfsdcld When using nfsdcld for NFSv4 client tracking, track the number of RECLAIM_COMPLETE operations we receive from "known" clients to help in deciding if we can lift the grace period early (or whether we need to start a v4 grace period at all). Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-04-24 09:46:34 -04:00
Scott Mayhew	6b1891052a	nfsd: make nfs4_client_reclaim use an xdr_netobj instead of a fixed char array This will allow the reclaim_str_hashtbl to store either the recovery directory names used by the legacy client tracking code or the full client strings used by the nfsdcld client tracking code. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-04-24 09:46:34 -04:00
Jeff Layton	f456458e4d	nfsd: wake blocked file lock waiters before sending callback When a blocked NFS lock is "awoken" we send a callback to the server and then wake any hosts waiting on it. If a client attempts to get a lock and then drops off the net, we could end up waiting for a long time until we end up waking locks blocked on that request. So, wake any other waiting lock requests before sending the callback. Do this by calling locks_delete_block in a new "prepare" phase for CB_NOTIFY_LOCK callbacks. URL: https://bugzilla.kernel.org/show_bug.cgi?id=203363 Fixes: `16306a61d3` ("fs/locks: always delete_block after waiting.") Reported-by: Slawomir Pryczek <slawek1211@gmail.com> Cc: Neil Brown <neilb@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-04-22 15:38:41 -04:00
Jeff Layton	6aaafc43a4	nfsd: wake waiters blocked on file_lock before deleting it After a blocked nfsd file_lock request is deleted, knfsd will send a callback to the client and then free the request. Commit `16306a61d3` ("fs/locks: always delete_block after waiting.") changed it such that locks_delete_block is always called on a request after it is awoken, but that patch missed fixing up blocked nfsd request handling. Call locks_delete_block on the block to wake up any locks still blocked on the nfsd lock request before freeing it. Some of its callers already do this however, so just remove those calls. URL: https://bugzilla.kernel.org/show_bug.cgi?id=203363 Fixes: `16306a61d3` ("fs/locks: always delete_block after waiting.") Reported-by: Slawomir Pryczek <slawek1211@gmail.com> Cc: Neil Brown <neilb@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-04-22 15:31:54 -04:00
Gustavo A. R. Silva	0a4c92657f	fs: mark expected switch fall-throughs In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warnings: fs/affs/affs.h:124:38: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/configfs/dir.c:1692:11: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/configfs/dir.c:1694:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ceph/file.c:249:3: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext4/hash.c:233:15: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext4/hash.c:246:15: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext2/inode.c:1237:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext2/inode.c:1244:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext4/indirect.c:1182:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext4/indirect.c:1188:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext4/indirect.c:1432:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext4/indirect.c:1440:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/f2fs/node.c:618:8: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/f2fs/node.c:620:8: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/btrfs/ref-verify.c:522:15: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/gfs2/bmap.c:711:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/gfs2/bmap.c:722:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/jffs2/fs.c:339:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/nfsd/nfs4proc.c:429:12: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ufs/util.h:62:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ufs/util.h:43:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/fcntl.c:770:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/seq_file.c:319:10: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/libfs.c:148:11: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/libfs.c:150:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/signalfd.c:178:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/locks.c:1473:16: warning: this statement may fall through [-Wimplicit-fallthrough=] Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enabling -Wimplicit-fallthrough. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>	2019-04-08 18:21:02 -05:00
J. Bruce Fields	c54f24e338	nfsd: fix performance-limiting session calculation We're unintentionally limiting the number of slots per nfsv4.1 session to 10. Often more than 10 simultaneous RPCs are needed for the best performance. This calculation was meant to prevent any one client from using up more than a third of the limit we set for total memory use across all clients and sessions. Instead, it's limiting the client to a third of the maximum for a single session. Fix this. Reported-by: Chris Tracy <ctracy@engr.scu.edu> Cc: stable@vger.kernel.org Fixes: `de766e5704` "nfsd: give out fewer session slots as limit approaches" Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-02-21 10:47:00 -05:00
Linus Torvalds	e45428a436	Thanks to Vasily Averin for fixing a use-after-free in the containerized NFSv4.2 client, and cleaning up some convoluted backchannel server code in the process. Otherwise, miscellaneous smaller bugfixes and cleanup. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJcLR3nAAoJECebzXlCjuG+oyAQALrPSTH9Qg2AwP2eGm+AevUj u/VFmimImIO9dYuT02t4w42w4qMIQ0/Y7R0UjT3DxG5Oixy/zA+ZaNXCCEKwSMIX abGF4YalUISbDc6n0Z8J14/T33wDGslhy3IQ9Jz5aBCDCocbWlzXvFlmrowbb3ak vtB0Fc3Xo6Z/Pu2GzNzlqR+f69IAmwQGJrRrAEp3JUWSIBKiSWBXTujDuVBqJNYj ySLzbzyAc7qJfI76K635XziULR2ueM3y5JbPX7kTZ0l3OJ6Yc0PtOj16sIv5o0XK DBYPrtvw3ZbxQE/bXqtJV9Zn6MG5ODGxKszG1zT1J3dzotc9l/LgmcAY8xVSaO+H QNMdU9QuwmyUG20A9rMoo/XfUb5KZBHzH7HIYOmkfBidcaygwIInIKoIDtzimm4X OlYq3TL/3QDY6rgTCZv6n2KEnwiIDpc5+TvFhXRWclMOJMcMSHJfKFvqERSv9V3o 90qrCebPA0K8Dnc0HMxcBXZ+0TqZ2QeXp/wfIjibCXqMwlg+BZhmbeA0ngZ7x7qf 2F33E9bfVJjL+VI5FcVYQf43bOTWZgD6ZKGk4T7keYl0CPH+9P70bfhl4KKy9dqc GwYooy/y5FPb2CvJn/EETeILRJ9OyIHUrw7HBkpz9N8n9z+V6Qbp9yW7LKgaMphW 1T+GpHZhQjwuBPuJhDK0 =dRLp -----END PGP SIGNATURE----- Merge tag 'nfsd-4.21' of git://linux-nfs.org/~bfields/linux Pull nfsd updates from Bruce Fields: "Thanks to Vasily Averin for fixing a use-after-free in the containerized NFSv4.2 client, and cleaning up some convoluted backchannel server code in the process. Otherwise, miscellaneous smaller bugfixes and cleanup" * tag 'nfsd-4.21' of git://linux-nfs.org/~bfields/linux: (25 commits) nfs: fixed broken compilation in nfs_callback_up_net() nfs: minor typo in nfs4_callback_up_net() sunrpc: fix debug message in svc_create_xprt() sunrpc: make visible processing error in bc_svc_process() sunrpc: remove unused xpo_prep_reply_hdr callback sunrpc: remove svc_rdma_bc_class sunrpc: remove svc_tcp_bc_class sunrpc: remove unused bc_up operation from rpc_xprt_ops sunrpc: replace svc_serv->sv_bc_xprt by boolean flag sunrpc: use-after-free in svc_process_common() sunrpc: use SVC_NET() in svcauth_gss_* functions nfsd: drop useless LIST_HEAD lockd: Show pid of lockd for remote locks NFSD remove OP_CACHEME from 4.2 op_flags nfsd: Return EPERM, not EACCES, in some SETATTR cases sunrpc: fix cache_head leak due to queued request nfsd: clean up indentation, increase indentation in switch statement svcrdma: Optimize the logic that selects the R_key to invalidate nfsd: fix a warning in __cld_pipe_upcall() nfsd4: fix crash on writing v4_end_grace before nfsd startup ...	2019-01-02 16:21:50 -08:00
NeilBrown	cb03f94ffb	fs/locks: merge posix_unblock_lock() and locks_delete_block() posix_unblock_lock() is not specific to posix locks, and behaves nearly identically to locks_delete_block() - the former returning a status while the later doesn't. So discard posix_unblock_lock() and use locks_delete_block() instead, after giving that function an appropriate return value. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>	2018-12-07 06:50:56 -05:00
Colin Ian King	f50c9d797d	nfsd: clean up indentation, increase indentation in switch statement Trivial fix to clean up indentation, add in missing tabs. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2018-11-28 18:36:03 -05:00
J. Bruce Fields	d8836f7724	nfsd4: remove unused nfs4_check_olstateid parameter Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2018-11-27 16:24:01 -05:00
Andrew Elble	bd8d725078	nfsd: correctly decrement odstate refcount in error path alloc_init_deleg() both allocates an nfs4_delegation, and bumps the refcount on odstate. So after this point, we need to put_clnt_odstate() and nfs4_put_stid() to not leave the odstate refcount inappropriately bumped. Signed-off-by: Andrew Elble <aweits@rit.edu> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2018-10-29 16:58:04 -04:00
Olga Kornievskaia	e0639dc580	NFSD introduce async copy feature Upon receiving a request for async copy, create a new kthread. If we get asynchronous request, make sure to copy the needed arguments/state from the stack before starting the copy. Then start the thread and reply back to the client indicating copy is asynchronous. nfsd_copy_file_range() will copy in a loop over the total number of bytes is needed to copy. In case a failure happens in the middle, we ignore the error and return how much we copied so far. Once done creating a workitem for the callback workqueue and send CB_OFFLOAD with the results. The lifetime of the copy stateid is bound to the vfs copy. This way we don't need to keep the nfsd_net structure for the callback. We could keep it around longer so that an OFFLOAD_STATUS that came late would still get results, but clients should be able to deal without that. We handle OFFLOAD_CANCEL by sending a signal to the copy thread and calling kthread_stop. A client should cancel any ongoing copies before calling DESTROY_CLIENT; if not, we return a CLIENT_BUSY error. If the client is destroyed for some other reason (lease expiration, or server shutdown), we must clean up any ongoing copies ourselves. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> [colin.king@canonical.com: fix leak in error case] [bfields@fieldses.org: remove signalling, merge patches] Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2018-09-25 20:34:54 -04:00
Chuck Lever	a26dd64f54	nfsd: Remove callback_cred Clean up: The global callback_cred is no longer used, so it can be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2018-08-22 18:32:07 -04:00
Chuck Lever	9abdda5dda	sunrpc: Extract target name into svc_cred NFSv4.0 callback needs to know the GSS target name the client used when it established its lease. That information is available from the GSS context created by gssproxy. Make it available in each svc_cred. Note this will also give us access to the real target service principal name (which is typically "nfs", but spec does not require that). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2018-08-22 18:32:07 -04:00
Amir Goldstein	64bed6cbe3	nfsd: fix leaked file lock with nfs exported overlayfs nfsd and lockd call vfs_lock_file() to lock/unlock the inode returned by locks_inode(file). Many places in nfsd/lockd code use the inode returned by file_inode(file) for lock manipulation. With Overlayfs, file_inode() (the underlying inode) is not the same object as locks_inode() (the overlay inode). This can result in "Leaked POSIX lock" messages and eventually to a kernel crash as reported by Eddie Horng: https://marc.info/?l=linux-unionfs&m=153086643202072&w=2 Fix all the call sites in nfsd/lockd that should use locks_inode(). This is a correctness bug that manifested when overlayfs gained NFS export support in v4.16. Reported-by: Eddie Horng <eddiehorng.tw@gmail.com> Tested-by: Eddie Horng <eddiehorng.tw@gmail.com> Cc: Jeff Layton <jlayton@kernel.org> Fixes: `8383f17488` ("ovl: wire up NFS export operations") Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2018-08-09 16:11:21 -04:00
J. Bruce Fields	4a269efb6f	nfsd: update obselete comment referencing the BKL It's inode->i_lock that's now taken in setlease and break_lease, instead of the big kernel lock. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2018-06-17 10:42:56 -04:00
J. Bruce Fields	ca0552f464	nfsd4: cleanup sessionid in nfsd4_destroy_session The name of this variable doesn't fit the type. And we only ever use one field of it. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2018-06-17 10:42:51 -04:00
J. Bruce Fields	665d507276	nfsd4: less confusing nfsd4_compound_in_session Make the function prototype match the name a little better. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2018-06-17 10:42:27 -04:00
J. Bruce Fields	03f318ca65	nfsd4: extend reclaim period for reclaiming clients If the client is only renewing state a little sooner than once a lease period, then it might not discover the server has restarted till close to the end of the grace period, and might run out of time to do the actual reclaim. Extend the grace period by a second each time we notice there are clients still trying to reclaim, up to a limit of another whole lease period. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2018-06-17 10:20:47 -04:00
Linus Torvalds	b08fc5277a	- Error path bug fix for overflow tests (Dan) - Additional struct_size() conversions (Matthew, Kees) - Explicitly reported overflow fixes (Silvio, Kees) - Add missing kvcalloc() function (Kees) - Treewide conversions of allocators to use either 2-factor argument variant when available, or array_size() and array3_size() as needed (Kees) -----BEGIN PGP SIGNATURE----- Comment: Kees Cook <kees@outflux.net> iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlsgVtMWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJhsJEACLYe2EbwLFJz7emOT1KUGK5R1b oVxJog0893WyMqgk9XBlA2lvTBRBYzR3tzsadfYo87L3VOBzazUv0YZaweJb65sF bAvxW3nY06brhKKwTRed1PrMa1iG9R63WISnNAuZAq7+79mN6YgW4G6YSAEF9lW7 oPJoPw93YxcI8JcG+dA8BC9w7pJFKooZH4gvLUSUNl5XKr8Ru5YnWcV8F+8M4vZI EJtXFmdlmxAledUPxTSCIojO8m/tNOjYTreBJt9K1DXKY6UcgAdhk75TRLEsp38P fPvMigYQpBDnYz2pi9ourTgvZLkffK1OBZ46PPt8BgUZVf70D6CBg10vK47KO6N2 zreloxkMTrz5XohyjfNjYFRkyyuwV2sSVrRJqF4dpyJ4NJQRjvyywxIP4Myifwlb ONipCM1EjvQjaEUbdcqKgvlooMdhcyxfshqJWjHzXB6BL22uPzq5jHXXugz8/ol8 tOSM2FuJ2sBLQso+szhisxtMd11PihzIZK9BfxEG3du+/hlI+2XgN7hnmlXuA2k3 BUW6BSDhab41HNd6pp50bDJnL0uKPWyFC6hqSNZw+GOIb46jfFcQqnCB3VZGCwj3 LH53Be1XlUrttc/NrtkvVhm4bdxtfsp4F7nsPFNDuHvYNkalAVoC3An0BzOibtkh AtfvEeaPHaOyD8/h2Q== =zUUp -----END PGP SIGNATURE----- Merge tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull more overflow updates from Kees Cook: "The rest of the overflow changes for v4.18-rc1. This includes the explicit overflow fixes from Silvio, further struct_size() conversions from Matthew, and a bug fix from Dan. But the bulk of it is the treewide conversions to use either the 2-factor argument allocators (e.g. kmalloc(a * b, ...) into kmalloc_array(a, b, ...) or the array_size() macros (e.g. vmalloc(a * b) into vmalloc(array_size(a, b)). Coccinelle was fighting me on several fronts, so I've done a bunch of manual whitespace updates in the patches as well. Summary: - Error path bug fix for overflow tests (Dan) - Additional struct_size() conversions (Matthew, Kees) - Explicitly reported overflow fixes (Silvio, Kees) - Add missing kvcalloc() function (Kees) - Treewide conversions of allocators to use either 2-factor argument variant when available, or array_size() and array3_size() as needed (Kees)" * tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits) treewide: Use array_size in f2fs_kvzalloc() treewide: Use array_size() in f2fs_kzalloc() treewide: Use array_size() in f2fs_kmalloc() treewide: Use array_size() in sock_kmalloc() treewide: Use array_size() in kvzalloc_node() treewide: Use array_size() in vzalloc_node() treewide: Use array_size() in vzalloc() treewide: Use array_size() in vmalloc() treewide: devm_kzalloc() -> devm_kcalloc() treewide: devm_kmalloc() -> devm_kmalloc_array() treewide: kvzalloc() -> kvcalloc() treewide: kvmalloc() -> kvmalloc_array() treewide: kzalloc_node() -> kcalloc_node() treewide: kzalloc() -> kcalloc() treewide: kmalloc() -> kmalloc_array() mm: Introduce kvcalloc() video: uvesafb: Fix integer overflow in allocation UBIFS: Fix potential integer overflow in allocation leds: Use struct_size() in allocation Convert intel uncore to struct_size ...	2018-06-12 18:28:00 -07:00
Kees Cook	6da2ec5605	treewide: kmalloc() -> kmalloc_array() The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) \| kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) \| kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) \| kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) \| kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) \| kmalloc( - sizeof(u8) * COUNT + COUNT , ...) \| kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) \| kmalloc( - sizeof(char) * COUNT + COUNT , ...) \| kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) \| kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) \| kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) \| kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) \| kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) \| kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) \| kmalloc(sizeof(TYPE) * C2, ...) \| kmalloc(C1 * C2 * C3, ...) \| kmalloc(C1 * C2, ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) \| - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) \| - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>	2018-06-12 16:19:22 -07:00
Andrew Elble	692ad280bf	nfsd: fix error handling in nfs4_set_delegation() I noticed a memory corruption crash in nfsd in 4.17-rc1. This patch corrects the issue. Fix to return error if the delegation couldn't be hashed or there was a recall in progress. Use the existing error path instead of destroy_delegation() for readability. Signed-off-by: Andrew Elble <aweits@rit.edu> Fixes: `353601e7d3` ("nfsd: create a separate lease for each delegation") Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2018-06-08 16:42:29 -04:00
Fengguang Wu	51d87bc2bf	nfsd: fix boolreturn.cocci warnings fs/nfsd/nfs4state.c:926:8-9: WARNING: return of 0/1 in function 'nfs4_delegation_exists' with return type bool fs/nfsd/nfs4state.c:2955:9-10: WARNING: return of 0/1 in function 'nfsd4_compound_in_session' with return type bool Return statements in functions returning bool should use true/false instead of 1/0. Generated by: scripts/coccinelle/misc/boolreturn.cocci Fixes: `68b18f5294` ("nfsd: make nfs4_get_existing_delegation less confusing") Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> [bfields: also fix -EAGAIN] Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2018-04-03 15:08:08 -04:00
J. Bruce Fields	353601e7d3	nfsd: create a separate lease for each delegation Currently we only take one vfs-level delegation (lease) for each file, no matter how many clients hold delegations on that file. Let's instead keep a one-to-one mapping between NFSv4 delegations and VFS delegations. This turns out to be simpler. There is still a many-to-one mapping of NFS opens to NFS files, and the delegations on one file are all associated with one struct file. The VFS can still distinguish between these delegations since we're setting fl_owner to the struct nfs4_delegation now, not to the shared file. I'm replacing at least one complicated function wholesale, which I don't like to do, but I haven't figured out how to do this more incrementally. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>	2018-03-20 17:51:14 -04:00
J. Bruce Fields	86d29b10eb	nfsd: move sc_file assignment into alloc_init_deleg Take an easy chance to simplify the caller a little. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>	2018-03-20 17:51:13 -04:00
J. Bruce Fields	0af6e690f0	nfsd: factor out common delegation-destruction code Pull some duplicated code into a common helper. This changes the order in destroy_delegation a little, but it looks to me like that shouldn't matter. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>	2018-03-20 17:51:13 -04:00
J. Bruce Fields	68b18f5294	nfsd: make nfs4_get_existing_delegation less confusing This doesn't "get" anything. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>	2018-03-20 17:51:12 -04:00
J. Bruce Fields	0c911f5408	nfsd4: dp->dl_stid.sc_file doesn't need locking The delegation isn't visible to anyone yet. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>	2018-03-20 17:51:12 -04:00
J. Bruce Fields	653e514e9e	nfsd4: set fl_owner to delegation, not file pointer For now this makes no difference, as for files having delegations, there's a one-to-one relationship between an nfs4_file and its nfs4_delegation. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>	2018-03-20 17:51:11 -04:00
J. Bruce Fields	cba7b3d150	nfsd: simplify nfs4_put_deleg_lease calls Every single caller gets the file out of the delegation, so let's do that once in nfs4_put_deleg_lease. Plus we'll need it there for other reasons. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>	2018-03-20 17:51:11 -04:00
J. Bruce Fields	b8232d3315	nfsd: simplify put of fi_deleg_file fi_delegees is basically just a reference count on users of fi_deleg_file, which is cleared when fi_delegees goes to zero. The fi_deleg_file check here is redundant. Also add an assertion to make sure we don't have unbalanced puts. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>	2018-03-20 17:51:10 -04:00
Jeff Layton	9258a2d5cd	nfsd: move nfs4_client allocation to dedicated slabcache On x86_64, it's 1152 bytes, so we can avoid wasting 896 bytes each. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2018-03-19 16:38:13 -04:00
Jeff Layton	bd2decac5e	nfsd4: send the special close_stateid in v4.0 replies as well We already send it for v4.1, but RFC7530 also notes that the stateid in the close reply is bogus. Always send the special close stateid, even in v4.0 responses. No client should put any meaning on it whatsoever. For now, we continue to increment the stateid value, though that might not be necessary either. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2018-03-19 16:38:12 -04:00
Jeff Layton	68ef3bc316	nfsd: remove blocked locks on client teardown We had some reports of panics in nfsd4_lm_notify, and that showed a nfs4_lockowner that had outlived its so_client. Ensure that we walk any leftover lockowners after tearing down all of the stateids, and remove any blocked locks that they hold. With this change, we also don't need to walk the nbl_lru on nfsd_net shutdown, as that will happen naturally when we tear down the clients. Fixes: `76d348fadf` (nfsd: have nfsd4_lock use blocking locks for v4.1+ locks) Reported-by: Frank Sorenson <fsorenso@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org # 4.9 Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2018-03-19 16:37:21 -04:00
J. Bruce Fields	2502072058	nfsd4: don't set lock stateid's sc_type to CLOSED There's no point I can see to stp->st_stid.sc_type = NFS4_CLOSED_STID; given release_lock_stateid immediately sets sc_type to 0. That set of sc_type to 0 should be enough to prevent it being used where we don't want it to be; NFS4_CLOSED_STID should only be needed for actual open stateid's that are actually closed. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2018-02-05 17:13:17 -05:00
Trond Myklebust	4f1764172a	nfsd: Detect unhashed stids in nfsd4_verify_open_stid() The state of the stid is guaranteed by 2 locks: - The nfs4_client 'cl_lock' spinlock - The nfs4_ol_stateid 'st_mutex' mutex so it is quite possible for the stid to be unhashed after lookup, but before calling nfsd4_lock_ol_stateid(). So we do need to check for a zero value for 'sc_type' in nfsd4_verify_open_stid(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Checuk Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Fixes: `659aefb68e` "nfsd: Ensure we don't recognise lock stateids..." Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2018-02-05 17:13:16 -05:00
Vasily Averin	81833de1a4	lockd: fix "list_add double add" caused by legacy signal interface restart_grace() uses hardcoded init_net. It can cause to "list_add double add" in following scenario: 1) nfsd and lockd was started in several net namespaces 2) nfsd in init_net was stopped (lockd was not stopped because it have users from another net namespaces) 3) lockd got signal, called restart_grace() -> set_grace_period() and enabled lock_manager in hardcoded init_net. 4) nfsd in init_net is started again, its lockd_up() calls set_grace_period() and tries to add lock_manager into init_net 2nd time. Jeff Layton suggest: "Make it safe to call locks_start_grace multiple times on the same lock_manager. If it's already on the global grace_list, then don't try to add it again. (But we don't intentionally add twice, so for now we WARN about that case.) With this change, we also need to ensure that the nfsd4 lock manager initializes the list before we call locks_start_grace. While we're at it, move the rest of the nfsd_net initialization into nfs4_state_create_net. I see no reason to have it spread over two functions like it is today." Suggested patch was updated to generate warning in described situation. Suggested-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2017-11-27 16:45:11 -05:00
Andrew Elble	ae254dac72	nfsd: check for use of the closed special stateid Prevent the use of the closed (invalid) special stateid by clients. Signed-off-by: Andrew Elble <aweits@rit.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2017-11-27 16:45:11 -05:00
Naofumi Honda	64ebe12494	nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat From kernel 4.9, my two nfsv4 servers sometimes suffer from "panic: unable to handle kernel page request" in posix_unblock_lock() called from nfs4_laundromat(). These panics diseappear if we revert the commit "nfsd: add a LRU list for blocked locks". The cause appears to be a typo in nfs4_laundromat(), which is also present in nfs4_state_shutdown_net(). Cc: stable@vger.kernel.org Fixes: `7919d0a27f` "nfsd: add a LRU list for blocked locks" Cc: jlayton@redhat.com Reveiwed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2017-11-27 16:45:11 -05:00
Andrew Elble	4f34bd0540	nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class The use of the st_mutex has been confusing the validator. Use the proper nested notation so as to not produce warnings. Signed-off-by: Andrew Elble <aweits@rit.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2017-11-27 16:45:10 -05:00
Trond Myklebust	03da3169c6	nfsd: Fix races with check_stateid_generation() The various functions that call check_stateid_generation() in order to compare a client-supplied stateid with the nfs4_stid state, usually need to atomically check for closed state. Those that perform the check after locking the st_mutex using nfsd4_lock_ol_stateid() should now be OK, but we do want to fix up the others. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2017-11-27 16:45:10 -05:00
Trond Myklebust	9271d7e509	nfsd: Ensure we check stateid validity in the seqid operation checks After taking the stateid st_mutex, we want to know that the stateid still represents valid state before performing any non-idempotent actions. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2017-11-27 16:45:10 -05:00
Trond Myklebust	beeca19cf1	nfsd: Fix race in lock stateid creation If we're looking up a new lock state, and the creation fails, then we want to unhash it, just like we do for OPEN. However in order to do so, we need to that no other LOCK requests can grab the mutex until we have unhashed it (and marked it as closed). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2017-11-27 16:45:10 -05:00
Trond Myklebust	fd1fd685b3	nfsd4: move find_lock_stateid Trivial cleanup to simplify following patch. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2017-11-27 16:45:10 -05:00
Trond Myklebust	659aefb68e	nfsd: Ensure we don't recognise lock stateids after freeing them In order to deal with lookup races, nfsd4_free_lock_stateid() needs to be able to signal to other stateful functions that the lock stateid is no longer valid. Right now, nfsd_lock() will check whether or not an existing stateid is still hashed, but only in the "new lock" path. To ensure the stateid invalidation is also recognised by the "existing lock" path, and also by a second call to nfsd4_free_lock_stateid() itself, we can change the type to NFS4_CLOSED_STID under the stp->st_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2017-11-27 16:45:10 -05:00
Trond Myklebust	fb500a7cfe	nfsd: CLOSE SHOULD return the invalid special stateid for NFSv4.x (x>0) Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2017-11-27 16:45:10 -05:00
Trond Myklebust	d8a1a00055	nfsd: Fix another OPEN stateid race If nfsd4_process_open2() is initialising a new stateid, and yet the call to nfs4_get_vfs_file() fails for some reason, then we must declare the stateid closed, and unhash it before dropping the mutex. Right now, we unhash the stateid after dropping the mutex, and without changing the stateid type, meaning that another OPEN could theoretically look it up and attempt to use it. Reported-by: Andrew W Elble <aweits@rit.edu> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2017-11-27 16:45:10 -05:00

1 2 3 4 5 ...

1223 Commits