Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.
In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:
git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
xargs perl -pi -e \
's/\buninitialized_var\(([^\)]+)\)/\1/g;
s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'
drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.
No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.
[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/
Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>
I don't understand this code well, but I'm seeing a warning about a
still-referenced inode on unmount, and every other similar filesystem
does a dput() here.
Fixes: e8a79fb14f ("nfsd: add nfsd/clients directory")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We don't drop the reference on the nfsdfs filesystem with
mntput(nn->nfsd_mnt) until nfsd_exit_net(), but that won't be called
until the nfsd module's unloaded, and we can't unload the module as long
as there's a reference on nfsdfs. So this prevents module unloading.
Fixes: 2c830dd720 ("nfsd: persist nfsd filesystem across mounts")
Reported-and-Tested-by: Luo Xiaogang <lxgrxd@163.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
fs/nfsd/nfsctl.c:256: warning: Function parameter or member 'file' not described in 'write_unlock_ip'
fs/nfsd/nfsctl.c:256: warning: Function parameter or member 'buf' not described in 'write_unlock_ip'
fs/nfsd/nfsctl.c:256: warning: Function parameter or member 'size' not described in 'write_unlock_ip'
fs/nfsd/nfsctl.c:295: warning: Function parameter or member 'file' not described in 'write_unlock_fs'
fs/nfsd/nfsctl.c:295: warning: Function parameter or member 'buf' not described in 'write_unlock_fs'
fs/nfsd/nfsctl.c:295: warning: Function parameter or member 'size' not described in 'write_unlock_fs'
fs/nfsd/nfsctl.c:352: warning: Function parameter or member 'file' not described in 'write_filehandle'
fs/nfsd/nfsctl.c:352: warning: Function parameter or member 'buf' not described in 'write_filehandle'
fs/nfsd/nfsctl.c:352: warning: Function parameter or member 'size' not described in 'write_filehandle'
fs/nfsd/nfsctl.c:434: warning: Function parameter or member 'file' not described in 'write_threads'
fs/nfsd/nfsctl.c:434: warning: Function parameter or member 'buf' not described in 'write_threads'
fs/nfsd/nfsctl.c:434: warning: Function parameter or member 'size' not described in 'write_threads'
fs/nfsd/nfsctl.c:478: warning: Function parameter or member 'file' not described in 'write_pool_threads'
fs/nfsd/nfsctl.c:478: warning: Function parameter or member 'buf' not described in 'write_pool_threads'
fs/nfsd/nfsctl.c:478: warning: Function parameter or member 'size' not described in 'write_pool_threads'
fs/nfsd/nfsctl.c:697: warning: Function parameter or member 'file' not described in 'write_versions'
fs/nfsd/nfsctl.c:697: warning: Function parameter or member 'buf' not described in 'write_versions'
fs/nfsd/nfsctl.c:697: warning: Function parameter or member 'size' not described in 'write_versions'
fs/nfsd/nfsctl.c:858: warning: Function parameter or member 'file' not described in 'write_ports'
fs/nfsd/nfsctl.c:858: warning: Function parameter or member 'buf' not described in 'write_ports'
fs/nfsd/nfsctl.c:858: warning: Function parameter or member 'size' not described in 'write_ports'
fs/nfsd/nfsctl.c:892: warning: Function parameter or member 'file' not described in 'write_maxblksize'
fs/nfsd/nfsctl.c:892: warning: Function parameter or member 'buf' not described in 'write_maxblksize'
fs/nfsd/nfsctl.c:892: warning: Function parameter or member 'size' not described in 'write_maxblksize'
fs/nfsd/nfsctl.c:941: warning: Function parameter or member 'file' not described in 'write_maxconn'
fs/nfsd/nfsctl.c:941: warning: Function parameter or member 'buf' not described in 'write_maxconn'
fs/nfsd/nfsctl.c:941: warning: Function parameter or member 'size' not described in 'write_maxconn'
fs/nfsd/nfsctl.c:1023: warning: Function parameter or member 'file' not described in 'write_leasetime'
fs/nfsd/nfsctl.c:1023: warning: Function parameter or member 'buf' not described in 'write_leasetime'
fs/nfsd/nfsctl.c:1023: warning: Function parameter or member 'size' not described in 'write_leasetime'
fs/nfsd/nfsctl.c:1039: warning: Function parameter or member 'file' not described in 'write_gracetime'
fs/nfsd/nfsctl.c:1039: warning: Function parameter or member 'buf' not described in 'write_gracetime'
fs/nfsd/nfsctl.c:1039: warning: Function parameter or member 'size' not described in 'write_gracetime'
fs/nfsd/nfsctl.c:1094: warning: Function parameter or member 'file' not described in 'write_recoverydir'
fs/nfsd/nfsctl.c:1094: warning: Function parameter or member 'buf' not described in 'write_recoverydir'
fs/nfsd/nfsctl.c:1094: warning: Function parameter or member 'size' not described in 'write_recoverydir'
fs/nfsd/nfsctl.c:1125: warning: Function parameter or member 'file' not described in 'write_v4_end_grace'
fs/nfsd/nfsctl.c:1125: warning: Function parameter or member 'buf' not described in 'write_v4_end_grace'
fs/nfsd/nfsctl.c:1125: warning: Function parameter or member 'size' not described in 'write_v4_end_grace'
fs/nfsd/nfs4proc.c:1164: warning: Function parameter or member 'nss' not described in 'nfsd4_interssc_connect'
fs/nfsd/nfs4proc.c:1164: warning: Function parameter or member 'rqstp' not described in 'nfsd4_interssc_connect'
fs/nfsd/nfs4proc.c:1164: warning: Function parameter or member 'mount' not described in 'nfsd4_interssc_connect'
fs/nfsd/nfs4proc.c:1262: warning: Function parameter or member 'rqstp' not described in 'nfsd4_setup_inter_ssc'
fs/nfsd/nfs4proc.c:1262: warning: Function parameter or member 'cstate' not described in 'nfsd4_setup_inter_ssc'
fs/nfsd/nfs4proc.c:1262: warning: Function parameter or member 'copy' not described in 'nfsd4_setup_inter_ssc'
fs/nfsd/nfs4proc.c:1262: warning: Function parameter or member 'mount' not described in 'nfsd4_setup_inter_ssc'
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Userspace should be able to monitor nfsd/clients/ to see when clients
come and go, but we're failing to send fsnotify events.
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
- Server-to-server copy code from Olga. To use it, client and
both servers must have support, the target server must be able
to access the source server over NFSv4.2, and the target
server must have the inter_copy_offload_enable module
parameter set.
- Improvements and bugfixes for the new filehandle cache,
especially in the container case, from Trond
- Also from Trond, better reporting of write errors.
- Y2038 work from Arnd.
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAl490mAVHGJmaWVsZHNA
ZmllbGRzZXMub3JnAAoJECebzXlCjuG+HkkP/33CsYXp0wvfNrfxCY3zHRxHpfw+
T9Ownxxw0RAJc/dRluC/2PIKJ20uVqtLrplU63bMBqJn84WF7OALq9twZ79a3fVF
mvdmnZbNq9B3ncKJlT7akkEelyJCRap7NgG/oTyubE8MlPl6gKpD8c+G7XdW/uN+
r0fprQz4rW4CYCBGSHq7HusEKqY4Gw+gbyAfJ6A79TMjF1ei51PG+9c8rkIsI5CO
1TQ3gY1gSJmGf2DoF86Q9WTVb+DvRTEs+t7QkxY/Vlo+QXY8CZyu+qSxN7i/F20m
gv2GrSpQMS9DEK/ZaG6cxaH+sM18Db4KLvcl3koL6lONHDR2OafSdKLyy0I60jhO
WfDSHhfDCrAdASTjNlTPrjBrdK3gafiaJVL9vy901ZJjPaNb3EH0nMQ5bEvOBECq
TCqPcQUcbku+qUVIcFwzSK1hXQFQHNh8WIuqXvNviZIzFDoipwsHVnQK02Owj89L
R2tbZue1O8voacg/9xw3tWAT7pI+SaBb0EvJuqRxBshiZEU8kKKtMchOwSECRDcu
k4lcqC5EFW7e4EzGlr6Wx8sI5lwCapva8ccjmPXX+R/vyM81oxWGB84GqWjjwubH
3Fcok23F9rW2IQJkqgPlNj/9hAjTn2+vM13UbfMlnchGNsQ2gbkc5CDGC/J6Wwpo
tHVristV9Gu5bJym
=FxLY
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.6' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Highlights:
- Server-to-server copy code from Olga.
To use it, client and both servers must have support, the target
server must be able to access the source server over NFSv4.2, and
the target server must have the inter_copy_offload_enable module
parameter set.
- Improvements and bugfixes for the new filehandle cache, especially
in the container case, from Trond
- Also from Trond, better reporting of write errors.
- Y2038 work from Arnd"
* tag 'nfsd-5.6' of git://linux-nfs.org/~bfields/linux: (55 commits)
sunrpc: expiry_time should be seconds not timeval
nfsd: make nfsd_filecache_wq variable static
nfsd4: fix double free in nfsd4_do_async_copy()
nfsd: convert file cache to use over/underflow safe refcount
nfsd: Define the file access mode enum for tracing
nfsd: Fix a perf warning
nfsd: Ensure sampling of the write verifier is atomic with the write
nfsd: Ensure sampling of the commit verifier is atomic with the commit
sunrpc: clean up cache entry add/remove from hashtable
sunrpc: Fix potential leaks in sunrpc_cache_unhash()
nfsd: Ensure exclusion between CLONE and WRITE errors
nfsd: Pass the nfsd_file as arguments to nfsd4_clone_file_range()
nfsd: Update the boot verifier on stable writes too.
nfsd: Fix stable writes
nfsd: Allow nfsd_vfs_write() to take the nfsd_file as an argument
nfsd: Fix a soft lockup race in nfsd_file_mark_find_or_create()
nfsd: Reduce the number of calls to nfsd_file_gc()
nfsd: Schedule the laundrette regularly irrespective of file errors
nfsd: Remove unused constant NFSD_FILE_LRU_RESCAN
nfsd: Containerise filecache laundrette
...
A couple of time_t variables are only used to track the state of the
lease time and its expiration. The code correctly uses the 'time_after()'
macro to make this work on 32-bit architectures even beyond year 2038,
but the get_seconds() function and the time_t type itself are deprecated
as they behave inconsistently between 32-bit and 64-bit architectures
and often lead to code that is not y2038 safe.
As a minor issue, using get_seconds() leads to problems with concurrent
settimeofday() or clock_settime() calls, in the worst case timeout never
triggering after the time has been set backwards.
Change nfsd to use time64_t and ktime_get_boottime_seconds() here. This
is clearly excessive, as boottime by itself means we never go beyond 32
bits, but it does mean we handle this correctly and consistently without
having to worry about corner cases and should be no more expensive than
the previous implementation on 64-bit architectures.
The max_cb_time() function gets changed in order to avoid an expensive
64-bit division operation, but as the lease time is at most one hour,
there is no change in behavior.
Also do the same for server-to-server copy expiration time.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[bfields@redhat.com: fix up copy expiration]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
- add a new knfsd file cache, so that we don't have to open and
close on each (NFSv2/v3) READ or WRITE. This can speed up
read and write in some cases. It also replaces our readahead
cache.
- Prevent silent data loss on write errors, by treating write
errors like server reboots for the purposes of write caching,
thus forcing clients to resend their writes.
- Tweak the code that allocates sessions to be more forgiving,
so that NFSv4.1 mounts are less likely to hang when a server
already has a lot of clients.
- Eliminate an arbitrary limit on NFSv4 ACL sizes; they should
now be limited only by the backend filesystem and the
maximum RPC size.
- Allow the server to enforce use of the correct kerberos
credentials when a client reclaims state after a reboot.
And some miscellaneous smaller bugfixes and cleanup.
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAl2OoFcVHGJmaWVsZHNA
ZmllbGRzZXMub3JnAAoJECebzXlCjuG+dRoP/3OW1NxPjpjbCQWZL0M+O3AYJJla
W8E+uoZKMosFEe/ymokMD0Vn5s47jPaMCifMjHZa2GygW8zHN9X2v0HURx/lob+o
/rJXwMn78N/8kdbfDz2FvaCPeT0IuNzRIFBV8/sSXofqwCBwvPo+cl0QGrd4/xLp
X35qlupx62TRk+kbdRjvv8kpS5SJ7BvR+FSA1WubNYWw2hpdEsr2OCFdGq2Wvthy
DK6AfGBXfJGsOE+HGCSj6ejRV6i0UOJ17P8gRSsx+YT0DOe5E7ROjt+qvvRwk489
wmR8Vjuqr1e40eGAUq3xuLfk5F5NgycY4ekVxk/cTVFNwWcz2DfdjXQUlyPAbrSD
SqIyxN1qdKT24gtr7AHOXUWJzBYPWDgObCVBXUGzyL81RiDdhf38HRNjL2TcSDld
tzCjQ0wbPw+iT74v6qQRY05oS+h3JOtDjU4pxsBnxVtNn4WhGJtaLfWW8o1C1QwU
bc4aX3TlYhDmzU7n7Zjt4rFXGJfyokM+o6tPao1Z60Pmsv1gOk4KQlzLtW/jPHx4
ZwYTwVQUKRDBfC62nmgqDyGI3/Qu11FuIxL2KXUCgkwDxNWN4YkwYjOGw9Lb5qKM
wFpxq6CDNZB/IWLEu8Yg85kDPPUJMoI657lZb7Osr/MfBpU0YljcMOIzLBy8uV1u
9COUbPaQipiWGu/0
=diBo
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.4' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Highlights:
- Add a new knfsd file cache, so that we don't have to open and close
on each (NFSv2/v3) READ or WRITE. This can speed up read and write
in some cases. It also replaces our readahead cache.
- Prevent silent data loss on write errors, by treating write errors
like server reboots for the purposes of write caching, thus forcing
clients to resend their writes.
- Tweak the code that allocates sessions to be more forgiving, so
that NFSv4.1 mounts are less likely to hang when a server already
has a lot of clients.
- Eliminate an arbitrary limit on NFSv4 ACL sizes; they should now be
limited only by the backend filesystem and the maximum RPC size.
- Allow the server to enforce use of the correct kerberos credentials
when a client reclaims state after a reboot.
And some miscellaneous smaller bugfixes and cleanup"
* tag 'nfsd-5.4' of git://linux-nfs.org/~bfields/linux: (34 commits)
sunrpc: clean up indentation issue
nfsd: fix nfs read eof detection
nfsd: Make nfsd_reset_boot_verifier_locked static
nfsd: degraded slot-count more gracefully as allocation nears exhaustion.
nfsd: handle drc over-allocation gracefully.
nfsd: add support for upcall version 2
nfsd: add a "GetVersion" upcall for nfsdcld
nfsd: Reset the boot verifier on all write I/O errors
nfsd: Don't garbage collect files that might contain write errors
nfsd: Support the server resetting the boot verifier
nfsd: nfsd_file cache entries should be per net namespace
nfsd: eliminate an unnecessary acl size limit
Deprecate nfsd fault injection
nfsd: remove duplicated include from filecache.c
nfsd: Fix the documentation for svcxdr_tmpalloc()
nfsd: Fix up some unused variable warnings
nfsd: close cached files prior to a REMOVE or RENAME that would replace target
nfsd: rip out the raparms cache
nfsd: have nfsd_test_lock use the nfsd_file cache
nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
...
Pull vfs mount API infrastructure updates from Al Viro:
"Infrastructure bits of mount API conversions.
The rest is more of per-filesystem updates and that will happen
in separate pull requests"
* 'work.mount-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
mtd: Provide fs_context-aware mount_mtd() replacement
vfs: Create fs_context-aware mount_bdev() replacement
new helper: get_tree_keyed()
vfs: set fs_context::user_ns for reconfigure
Add support to allow the server to reset the boot verifier in order to
force clients to resend I/O after a timeout failure.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
A process could race in an open and attempt to read one of these files
before i_private is initialized, and get a spurious error.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
synchronize_rcu() gets called multiple times each time a client is
destroyed. If the laundromat thread has a lot of clients to destroy,
the delay can be noticeable. This was causing pynfs test RENEW3 to
fail.
We could embed an rcu_head in each inode and do the kref_put in an rcu
callback. But simplest is just to take a lock here.
(I also wonder if the laundromat thread would be better replaced by a
bunch of scheduled work or timers or something.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
syzbot is reporting that nfsd_mkdir() forgot to remove dentry created by
d_alloc_name() when __nfsd_mkdir() failed (due to memory allocation fault
injection) [1].
[1] https://syzkaller.appspot.com/bug?id=ce41a1f769ea4637ebffedf004a803e8405b4674
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+2c95195d5d433f6ed6cb@syzkaller.appspotmail.com>
Fixes: e8a79fb14f ("nfsd: add nfsd/clients directory")
[bfields: clean up in nfsd_mkdir instead of __nfsd_mkdir]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull vfs mount updates from Al Viro:
"The first part of mount updates.
Convert filesystems to use the new mount API"
* 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
mnt_init(): call shmem_init() unconditionally
constify ksys_mount() string arguments
don't bother with registering rootfs
init_rootfs(): don't bother with init_ramfs_fs()
vfs: Convert smackfs to use the new mount API
vfs: Convert selinuxfs to use the new mount API
vfs: Convert securityfs to use the new mount API
vfs: Convert apparmorfs to use the new mount API
vfs: Convert openpromfs to use the new mount API
vfs: Convert xenfs to use the new mount API
vfs: Convert gadgetfs to use the new mount API
vfs: Convert oprofilefs to use the new mount API
vfs: Convert ibmasmfs to use the new mount API
vfs: Convert qib_fs/ipathfs to use the new mount API
vfs: Convert efivarfs to use the new mount API
vfs: Convert configfs to use the new mount API
vfs: Convert binfmt_misc to use the new mount API
convenience helper: get_tree_single()
convenience helper get_tree_nodev()
vfs: Kill sget_userns()
...
Here is the "big" driver core and debugfs changes for 5.3-rc1
It's a lot of different patches, all across the tree due to some api
changes and lots of debugfs cleanups. Because of this, there is going
to be some merge issues with your tree at the moment, I'll follow up
with the expected resolutions to make it easier for you.
Other than the debugfs cleanups, in this set of changes we have:
- bus iteration function cleanups (will cause build warnings
with s390 and coresight drivers in your tree)
- scripts/get_abi.pl tool to display and parse Documentation/ABI
entries in a simple way
- cleanups to Documenatation/ABI/ entries to make them parse
easier due to typos and other minor things
- default_attrs use for some ktype users
- driver model documentation file conversions to .rst
- compressed firmware file loading
- deferred probe fixes
All of these have been in linux-next for a while, with a bunch of merge
issues that Stephen has been patient with me for. Other than the merge
issues, functionality is working properly in linux-next :)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXSgpnQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykcwgCfS30OR4JmwZydWGJ7zK/cHqk+KjsAnjOxjC1K
LpRyb3zX29oChFaZkc5a
=XrEZ
-----END PGP SIGNATURE-----
Merge tag 'driver-core-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core and debugfs updates from Greg KH:
"Here is the "big" driver core and debugfs changes for 5.3-rc1
It's a lot of different patches, all across the tree due to some api
changes and lots of debugfs cleanups.
Other than the debugfs cleanups, in this set of changes we have:
- bus iteration function cleanups
- scripts/get_abi.pl tool to display and parse Documentation/ABI
entries in a simple way
- cleanups to Documenatation/ABI/ entries to make them parse easier
due to typos and other minor things
- default_attrs use for some ktype users
- driver model documentation file conversions to .rst
- compressed firmware file loading
- deferred probe fixes
All of these have been in linux-next for a while, with a bunch of
merge issues that Stephen has been patient with me for"
* tag 'driver-core-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (102 commits)
debugfs: make error message a bit more verbose
orangefs: fix build warning from debugfs cleanup patch
ubifs: fix build warning after debugfs cleanup patch
driver: core: Allow subsystems to continue deferring probe
drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDT
arch_topology: Remove error messages on out-of-memory conditions
lib: notifier-error-inject: no need to check return value of debugfs_create functions
swiotlb: no need to check return value of debugfs_create functions
ceph: no need to check return value of debugfs_create functions
sunrpc: no need to check return value of debugfs_create functions
ubifs: no need to check return value of debugfs_create functions
orangefs: no need to check return value of debugfs_create functions
nfsd: no need to check return value of debugfs_create functions
lib: 842: no need to check return value of debugfs_create functions
debugfs: provide pr_fmt() macro
debugfs: log errors when something goes wrong
drivers: s390/cio: Fix compilation warning about const qualifiers
drivers: Add generic helper to match by of_node
driver_find_device: Unify the match function with class_find_device()
bus_find_device: Unify the match callback with class_find_device
...
Fix sparse warning:
fs/nfsd/nfsctl.c:1221:22: warning:
symbol '__get_nfsdfs_client' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We want clientid's on the wire to be randomized for reasons explained in
ebd7c72c63 "nfsd: randomize SETCLIENTID reply to help distinguish
servers". But I'd rather have mostly small integers for the clients/
directory.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Keep around one internal mount of the nfsd filesystem so that we can add
stuff to it when clients come and go, regardless of whether anyone has
it mounted.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The nfsd duplicate reply cache should not be shared between network
namespaces.
The most straightforward way to fix this is just to move every global in
the code to per-net-namespace memory, so that's what we do.
Still todo: sort out which members of nfsd_stats should be global and
which per-net-namespace.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20190612152603.GB18440@kroah.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Convert the nfsctl filesystem to the new internal mount API as the old
one will be obsoleted and removed. This allows greater flexibility in
communication of mount parameters between userspace, the VFS and the
filesystem.
See Documentation/filesystems/mount_api.txt for more information.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: "J. Bruce Fields" <bfields@fieldses.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-nfs@vger.kernel.org
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have MODULE_LICENCE("GPL*") inside which was used in the initial
scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In order to be able to interpret uids and gids correctly in knfsd, we
should cache the user namespace of the process that created the RPC
server's listener. To do so, we refcount the credential of that process.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Support use of the --nfs-version/--no-nfs-version arguments to rpc.nfsd
in containers.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When using nfsdcld for NFSv4 client tracking, track the number of
RECLAIM_COMPLETE operations we receive from "known" clients to help in
deciding if we can lift the grace period early (or whether we need to
start a v4 grace period at all).
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
that could artificially limit NFSv4.1 performance by limiting the number
of oustanding rpcs from a single client. Neil Brown also gets a special
mention for fixing a 14.5-year-old memory-corruption bug in the encoding
of NFSv3 readdir responses.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJch/d7AAoJECebzXlCjuG+VzkP/0AlSz0+vC/65oyqxAtvsn4z
xxp6p+PcjuP7NhbImA9zO0d27rpkVFiY0vIArbcwI4R/PsqjRu8kW+dYzK2QAkY2
pzlV2aXxB9es16Vit5M+rvM4nwSsiYdnmN7kd4ntr8DKxtzePoCSL0/kGnbVE8tt
bnI5VhbWdd2kekR8xjRFhbHEQm2X6238K4Rv0wRc2o117GSBZCaHh7sgAdZ8BfFt
lJET9iOGagrusVWP0Cy6+Jomg0SzyPjcwn6WuBM7MRyn9Viso+ZEOkvnxVtlX+uZ
Z4GjbfAxC+7PKajovEW+/zIOjFgHYwNpQEsqGWAALsXXRj9vLgb0+a1DRWFCAj5n
KkWxPMcqqxpinhluLjw+r8U90KaA3DqewJdmmSol0IjM+mvVjhUuzhvCaMOhTxJM
WG3YZKhZzurQyp+q79xl2MycZ1QWDEDfEf7LD7MAfL88MSVbmYTM/744B4A61LZ4
Ap9p8Jy5LeT6SgSWUh9QBxMIrab4dY0L4xUs0e0s9VE62oS3R62BojTzXSXcztNI
UPwabKhQHnsNMAZoPlMoG5UjtCFcTHM24+NDYveWyi55dBwDq8q0FnAlKSevE1pi
BC3WtWtnLeL1YgAMH7TLjpTUcl6jVZLHuvELW/2Lz5o0zyo8qlgF3sC7eOtGNB4j
LQYmnf67aG+dGvZ91YPc
=ntUa
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.1' of git://linux-nfs.org/~bfields/linux
Pull NFS server updates from Bruce Fields:
"Miscellaneous NFS server fixes.
Probably the most visible bug is one that could artificially limit
NFSv4.1 performance by limiting the number of oustanding rpcs from a
single client.
Neil Brown also gets a special mention for fixing a 14.5-year-old
memory-corruption bug in the encoding of NFSv3 readdir responses"
* tag 'nfsd-5.1' of git://linux-nfs.org/~bfields/linux:
nfsd: allow nfsv3 readdir request to be larger.
nfsd: fix wrong check in write_v4_end_grace()
nfsd: fix memory corruption caused by readdir
nfsd: fix performance-limiting session calculation
svcrpc: fix UDP on servers with lots of threads
svcrdma: Remove syslog warnings in work completion handlers
svcrdma: Squelch compiler warning when SUNRPC_DEBUG is disabled
svcrdma: Use struct_size() in kmalloc()
svcrpc: fix unlikely races preventing queueing of sockets
svcrpc: svc_xprt_has_something_to_do seems a little long
SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
nfsd: fix an IS_ERR() vs NULL check
Commit 62a063b8e7 "nfsd4: fix crash on writing v4_end_grace before
nfsd startup" is trying to fix a NULL dereference issue, but it
mistakenly checks if the nfsd server is started. So fix it.
Fixes: 62a063b8e7 "nfsd4: fix crash on writing v4_end_grace before nfsd startup"
Cc: stable@vger.kernel.org
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This reverts commit d6ebf5088f.
I forgot that the kernel's default lease period should never be
decreased!
After a kernel upgrade, the kernel has no way of knowing on its own what
the previous lease time was. Unless userspace tells it otherwise, it
will assume the previous lease period was the same.
So if we decrease this value in a kernel upgrade, we end up enforcing a
grace period that's too short, and clients will fail to reclaim state in
time. Symptoms may include EIO and log messages like "NFS:
nfs4_reclaim_open_state: Lock reclaim failed!"
There was no real justification for the lease period decrease anyway.
Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Fixes: d6ebf5088f "nfsd4: return default lease period"
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Anatoly Trosinenko reports that this:
1) Checkout fresh master Linux branch (tested with commit e195ca6cb)
2) Copy x84_64-config-4.14 to .config, then enable NFS server v4 and build
3) From `kvm-xfstests shell`:
results in NULL dereference in locks_end_grace.
Check that nfsd has been started before trying to end the grace period.
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Upon receiving a request for async copy, create a new kthread. If we
get asynchronous request, make sure to copy the needed arguments/state
from the stack before starting the copy. Then start the thread and reply
back to the client indicating copy is asynchronous.
nfsd_copy_file_range() will copy in a loop over the total number of
bytes is needed to copy. In case a failure happens in the middle, we
ignore the error and return how much we copied so far. Once done
creating a workitem for the callback workqueue and send CB_OFFLOAD with
the results.
The lifetime of the copy stateid is bound to the vfs copy. This way we
don't need to keep the nfsd_net structure for the callback. We could
keep it around longer so that an OFFLOAD_STATUS that came late would
still get results, but clients should be able to deal without that.
We handle OFFLOAD_CANCEL by sending a signal to the copy thread and
calling kthread_stop.
A client should cancel any ongoing copies before calling DESTROY_CLIENT;
if not, we return a CLIENT_BUSY error.
If the client is destroyed for some other reason (lease expiration, or
server shutdown), we must clean up any ongoing copies ourselves.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
[colin.king@canonical.com: fix leak in error case]
[bfields@fieldses.org: remove signalling, merge patches]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
write_op[] is never modified, so make it 'const'.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
I don't have a good rationale for the lease period, but 90 seconds seems
long, and as long as we're allowing the server to extend the grace
period up to double the lease period, let's half the default to 45.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If the client is only renewing state a little sooner than once a lease
period, then it might not discover the server has restarted till close
to the end of the grace period, and might run out of time to do the
actual reclaim.
Extend the grace period by a second each time we notice there are
clients still trying to reclaim, up to a limit of another whole lease
period.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd_inet[6]addr_event uses nn->nfsd_serv without taking nfsd_mutex,
which can be changed during execution of notifiers and crash the host.
Moreover if notifiers were enabled in one net namespace they are enabled
in all other net namespaces, from creation until destruction.
This patch allows notifiers to access nn->nfsd_serv only after the
pointer is correctly initialized and delays cleanup until notifiers are
no longer in use.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
simple_fill_super() is passed an array of tree_descr structures which
describe the files to create in the filesystem's root directory. Since
these arrays are never modified intentionally, they should be 'const' so
that they are placed in .rodata and benefit from memory protection.
This patch updates the function signature and all users, and also
constifies tree_descr.name.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Prior to
e35659f1b0 ("NFSD: correctly range-check v4.x minor version when setting versions.")
v4.0 could not be disabled without disabling all NFSv4 protocols.
So the 'versions' file contained ±4 ±4.1 ±4.2.
Writing "-4" would disable all v4 completely. Writing +4 would enabled those
minor versions that are currently enabled, either by default or otherwise.
After that commit, it was possible to disable v4.0 independently. To
maximize backward compatibility with use cases which never disabled
v4.0, the "versions" file would never contain "+4.0" - that was implied
by "+4", unless explicitly negated by "-4.0".
This introduced an inconsistency in that it was possible to disable all
minor versions, but still have the major version advertised.
e.g. "-4.0 -4.1 -4.2 +4" would result in NFSv4 support being advertised,
but all attempts to use it rejected.
Commit
d3635ff07e ("nfsd: fix configuration of supported minor versions")
and following removed this inconsistency. If all minor version were disabled,
the major would be disabled too. If any minor was enabled, the major would be
disabled.
This patch also treated "+4" as equivalent to "+4.0" and "-4" as "-4.0".
A consequence of this is that writing "-4" would only disable 4.0.
This is a regression against the earlier behaviour, in a use case that rpc.nfsd
actually uses.
The command "rpc.nfsd -N 4" will write "+2 +3 -4" to the versions files.
Previously, that would disable v4 completely. Now it will only disable v4.0.
Also "4.0" never appears in the "versions" file when read.
So if only v4.1 is available, the previous kernel would have reported
"+4 -4.0 +4.1 -4.2" the current kernel reports "-4 +4.1 -4.2" which
could easily confuse.
This patch restores the implication that "+4" and "-4" apply more
globals and do not imply "4.0".
Specifically:
writing "-4" will disable all 4.x minor versions.
writing "+4" will enable all 4.1 minor version if none are currently enabled.
rpc.nfsd will list minor versions before major versions, so
rpc.nfsd -V 4.2 -N 4.1
will write "-4.1 +4.2 +2 +3 +4"
so it would be a regression for "+4" to enable always all versions.
reading "-4" implies that no v4.x are enabled
reading "+4" implies that some v4.x are enabled, and that v4.0 is enabled unless
"-4.0" is also present. All other minor versions will explicitly be listed.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The current display code assumes that v4 minor version 0 is tracked by
the call to nfsd_vers(). Now it is tracked by nfsd_minorversion(), and
so we need to adjust the display code.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When the user turns off all minor versions of NFSv4, that should be
equivalent to turning off NFSv4 support, so a mount attempt using NFSv4
should get RPC_PROG_MISMATCH, not NFSERR_MINOR_VERS_MISMATCH.
Allow the user to use either '4.0' or '4' to enable or disable minor
version 0. Other minor versions are still enabled or disabled using the
'4.x' format.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Writing to /proc/fs/nfsd/versions allows individual major versions
and NFSv4 minor versions to be enabled or disabled.
However NFSv4.0 cannot currently be disabled, thought there is no good reason.
Also the minor number is parsed as a 'long' but used as an 'int'
so '4294967297' will be incorrectly treated as '1'.
This patch removes the test on 'minor == 0' and switches to kstrtouint()
to get correct range checking.
When reading from /proc/fs/nfsd/versions, 4.0 is current not reported.
To allow the disabling for v4.0 to be visible, while maintaining
backward compatibility, change code to report "-4.0" if appropriate, but
not "+4.0".
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
that makes ACL inheritance a little more useful in environments that
default to restrictive umasks. Requires client-side support, also on
its way for 4.10.
Other than that, miscellaneous smaller fixes and cleanup, especially to
the server rdma code.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYVAEqAAoJECebzXlCjuG+VM0QAKaR+ibSM31Ahpnrgit5/wrb
n630KDFztO7iqEeuHfPQ4/n05T2QR0JWsLpjLMFvx88Gy4gyXYk9cuDPIrNKX1IS
3/nnhBo0+EVnjODjufommCrtbPZlqOSsS3N03vWkB7rTi8QYsWBOThh+XLRJYOXo
LZzJE1WmXNeCXV1kXPBsauryywql1fmwTXBzmIf1HbzoGAVROMEA2qqh4Z3nb7BP
sJuGchWx0STBOuAa278ighXQPUW2lUft9uzw2bssOtMwfNyOs/Pd6nx4F1Lg6WwD
1UQXoiR8K3PqelZfoeFJ05v0css/sbNKep+huWRdOXZj3Kjpa20lKBX8xHfat7sN
1OQ4FHx8ToigX3c+wwtlCqRMCcIxqUYkRjqzPHyeBiSSSp0rLrId44rI5x/K0yay
3bkGw7hFDSzc0Nq2uZgmtlbyTC71hLNhkWe7ThofcVG/pS0JtAqBiKIVwXJPh/e0
PLmVHYGU6Xowjag5edJlXY1tlIlxtWfqsWUarCXS5bfKUa3UjMVSjyuljsDqqJsn
96fEWu7DiUo4HeGYmf8MJoeZYV2y0DKSQGeguVkUKWp2DoTzinQHTfdKvrZVwNuu
hVE9/QeWzUvPY13HOUaKD2skozhbUChqv0NHESKUv8gxE3svTEpYZkXrE74WNqMk
l/WXAhw+RdKZof4+qdjU
=JANY
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.10' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"The one new feature is support for a new NFSv4.2 mode_umask attribute
that makes ACL inheritance a little more useful in environments that
default to restrictive umasks. Requires client-side support, also on
its way for 4.10.
Other than that, miscellaneous smaller fixes and cleanup, especially
to the server rdma code"
[ The client side of the umask attribute was merged yesterday ]
* tag 'nfsd-4.10' of git://linux-nfs.org/~bfields/linux:
nfsd: add support for the umask attribute
sunrpc: use DEFINE_SPINLOCK()
svcrdma: Further clean-up of svc_rdma_get_inv_rkey()
svcrdma: Break up dprintk format in svc_rdma_accept()
svcrdma: Remove unused variable in rdma_copy_tail()
svcrdma: Remove unused variables in xprt_rdma_bc_allocate()
svcrdma: Remove svc_rdma_op_ctxt::wc_status
svcrdma: Remove DMA map accounting
svcrdma: Remove BH-disabled spin locking in svc_rdma_send()
svcrdma: Renovate sendto chunk list parsing
svcauth_gss: Close connection when dropping an incoming message
svcrdma: Clear xpt_bc_xps in xprt_setup_rdma_bc() error exit arm
nfsd: constify reply_cache_stats_operations structure
nfsd: update workqueue creation
sunrpc: GFP_KERNEL should be GFP_NOFS in crypto code
nfsd: catch errors in decode_fattr earlier
nfsd: clean up supported attribute handling
nfsd: fix error handling for clients that fail to return the layout
nfsd: more robust allocation failure handling in nfsd_reply_cache_init
Make struct pernet_operations::id unsigned.
There are 2 reasons to do so:
1)
This field is really an index into an zero based array and
thus is unsigned entity. Using negative value is out-of-bound
access by definition.
2)
On x86_64 unsigned 32-bit data which are mixed with pointers
via array indexing or offsets added or subtracted to pointers
are preffered to signed 32-bit data.
"int" being used as an array index needs to be sign-extended
to 64-bit before being used.
void f(long *p, int i)
{
g(p[i]);
}
roughly translates to
movsx rsi, esi
mov rdi, [rsi+...]
call g
MOVSX is 3 byte instruction which isn't necessary if the variable is
unsigned because x86_64 is zero extending by default.
Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:
static inline void *net_generic(const struct net *net, int id)
{
...
ptr = ng->ptr[id - 1];
...
}
And this function is used a lot, so those sign extensions add up.
Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):
add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
Unfortunately some functions actually grow bigger.
This is a semmingly random artefact of code generation with register
allocator being used differently. gcc decides that some variable
needs to live in new r8+ registers and every access now requires REX
prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
used which is longer than [r8]
However, overall balance is in negative direction:
add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
function old new delta
nfsd4_lock 3886 3959 +73
tipc_link_build_proto_msg 1096 1140 +44
mac80211_hwsim_new_radio 2776 2808 +32
tipc_mon_rcv 1032 1058 +26
svcauth_gss_legacy_init 1413 1429 +16
tipc_bcbase_select_primary 379 392 +13
nfsd4_exchange_id 1247 1260 +13
nfsd4_setclientid_confirm 782 793 +11
...
put_client_renew_locked 494 480 -14
ip_set_sockfn_get 730 716 -14
geneve_sock_add 829 813 -16
nfsd4_sequence_done 721 703 -18
nlmclnt_lookup_host 708 686 -22
nfsd4_lockt 1085 1063 -22
nfs_get_client 1077 1050 -27
tcf_bpf_init 1106 1076 -30
nfsd4_encode_fattr 5997 5930 -67
Total: Before=154856051, After=154854321, chg -0.00%
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
reply_cache_stats_operations, of type struct file_operations, is never
modified, so declare it as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NFSv4.1 has built-in trunking support that allows a client to determine
whether two connections to two different IP addresses are actually to
the same server. NFSv4.0 does not, but RFC 7931 attempts to provide
clients a means to do this, basically by performing a SETCLIENTID to one
address and confirming it with a SETCLIENTID_CONFIRM to the other.
Linux clients since 05f4c350ee "NFS: Discover NFSv4 server trunking
when mounting" implement a variation on this suggestion. It is possible
that other clients do too.
This depends on the clientid and verifier not being accepted by an
unrelated server. Since both are 64-bit values, that would be very
unlikely if they were random numbers. But they aren't:
knfsd generates the 64-bit clientid by concatenating the 32-bit boot
time (in seconds) and a counter. This makes collisions between
clientids generated by the same server extremely unlikely. But
collisions are very likely between clientids generated by servers that
boot at the same time, and it's quite common for multiple servers to
boot at the same time. The verifier is a concatenation of the
SETCLIENTID time (in seconds) and a counter, so again collisions between
different servers are likely if multiple SETCLIENTIDs are done at the
same time, which is a common case.
Therefore recent NFSv4.0 clients may decide two different servers are
really the same, and mount a filesystem from the wrong server.
Fortunately the Linux client, since 55b9df93dd "nfsv4/v4.1: Verify the
client owner id during trunking detection", only does this when given
the non-default "migration" mount option.
The fault is really with RFC 7931, and needs a client fix, but in the
meantime we can mitigate the chance of these collisions by randomizing
the starting value of the counters used to generate clientids and
verifiers.
Reported-by: Frank Sorenson <fsorenso@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull userns vfs updates from Eric Biederman:
"This tree contains some very long awaited work on generalizing the
user namespace support for mounting filesystems to include filesystems
with a backing store. The real world target is fuse but the goal is
to update the vfs to allow any filesystem to be supported. This
patchset is based on a lot of code review and testing to approach that
goal.
While looking at what is needed to support the fuse filesystem it
became clear that there were things like xattrs for security modules
that needed special treatment. That the resolution of those concerns
would not be fuse specific. That sorting out these general issues
made most sense at the generic level, where the right people could be
drawn into the conversation, and the issues could be solved for
everyone.
At a high level what this patchset does a couple of simple things:
- Add a user namespace owner (s_user_ns) to struct super_block.
- Teach the vfs to handle filesystem uids and gids not mapping into
to kuids and kgids and being reported as INVALID_UID and
INVALID_GID in vfs data structures.
By assigning a user namespace owner filesystems that are mounted with
only user namespace privilege can be detected. This allows security
modules and the like to know which mounts may not be trusted. This
also allows the set of uids and gids that are communicated to the
filesystem to be capped at the set of kuids and kgids that are in the
owning user namespace of the filesystem.
One of the crazier corner casees this handles is the case of inodes
whose i_uid or i_gid are not mapped into the vfs. Most of the code
simply doesn't care but it is easy to confuse the inode writeback path
so no operation that could cause an inode write-back is permitted for
such inodes (aka only reads are allowed).
This set of changes starts out by cleaning up the code paths involved
in user namespace permirted mounts. Then when things are clean enough
adds code that cleanly sets s_user_ns. Then additional restrictions
are added that are possible now that the filesystem superblock
contains owner information.
These changes should not affect anyone in practice, but there are some
parts of these restrictions that are changes in behavior.
- Andy's restriction on suid executables that does not honor the
suid bit when the path is from another mount namespace (think
/proc/[pid]/fd/) or when the filesystem was mounted by a less
privileged user.
- The replacement of the user namespace implicit setting of MNT_NODEV
with implicitly setting SB_I_NODEV on the filesystem superblock
instead.
Using SB_I_NODEV is a stronger form that happens to make this state
user invisible. The user visibility can be managed but it caused
problems when it was introduced from applications reasonably
expecting mount flags to be what they were set to.
There is a little bit of work remaining before it is safe to support
mounting filesystems with backing store in user namespaces, beyond
what is in this set of changes.
- Verifying the mounter has permission to read/write the block device
during mount.
- Teaching the integrity modules IMA and EVM to handle filesystems
mounted with only user namespace root and to reduce trust in their
security xattrs accordingly.
- Capturing the mounters credentials and using that for permission
checks in d_automount and the like. (Given that overlayfs already
does this, and we need the work in d_automount it make sense to
generalize this case).
Furthermore there are a few changes that are on the wishlist:
- Get all filesystems supporting posix acls using the generic posix
acls so that posix_acl_fix_xattr_from_user and
posix_acl_fix_xattr_to_user may be removed. [Maintainability]
- Reducing the permission checks in places such as remount to allow
the superblock owner to perform them.
- Allowing the superblock owner to chown files with unmapped uids and
gids to something that is mapped so the files may be treated
normally.
I am not considering even obvious relaxations of permission checks
until it is clear there are no more corner cases that need to be
locked down and handled generically.
Many thanks to Seth Forshee who kept this code alive, and putting up
with me rewriting substantial portions of what he did to handle more
corner cases, and for his diligent testing and reviewing of my
changes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (30 commits)
fs: Call d_automount with the filesystems creds
fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns
evm: Translate user/group ids relative to s_user_ns when computing HMAC
dquot: For now explicitly don't support filesystems outside of init_user_ns
quota: Handle quota data stored in s_user_ns in quota_setxquota
quota: Ensure qids map to the filesystem
vfs: Don't create inodes with a uid or gid unknown to the vfs
vfs: Don't modify inodes with a uid or gid unknown to the vfs
cred: Reject inodes with invalid ids in set_create_file_as()
fs: Check for invalid i_uid in may_follow_link()
vfs: Verify acls are valid within superblock's s_user_ns.
userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS
fs: Refuse uid/gid changes which don't map into s_user_ns
selinux: Add support for unprivileged mounts from user namespaces
Smack: Handle labels consistently in untrusted mounts
Smack: Add support for unprivileged mounts from user namespaces
fs: Treat foreign mounts as nosuid
fs: Limit file caps to the user namespace of the super block
userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag
userns: Remove implicit MNT_NODEV fragility.
...
Today what is normally called data (the mount options) is not passed
to fill_super through mount_ns.
Pass the mount options and the namespace separately to mount_ns so
that filesystems such as proc that have mount options, can use
mount_ns.
Pass the user namespace to mount_ns so that the standard permission
check that verifies the mounter has permissions over the namespace can
be performed in mount_ns instead of in each filesystems .mount method.
Thus removing the duplication between mqueuefs and proc in terms of
permission checks. The extra permission check does not currently
affect the rpc_pipefs filesystem and the nfsd filesystem as those
filesystems do not currently allow unprivileged mounts. Without
unpvileged mounts it is guaranteed that the caller has already passed
capable(CAP_SYS_ADMIN) which guarantees extra permission check will
pass.
Update rpc_pipefs and the nfsd filesystem to ensure that the network
namespace reference is always taken in fill_super and always put in kill_sb
so that the logic is simpler and so that errors originating inside of
fill_super do not cause a network namespace leak.
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
it's not needed for file_operations of inodes located on fs defined
in the hosting module and for file_operations that go into procfs.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
nfsd triggered a BUG_ON in net_generic(...) when rpc_pipefs_event(...)
in fs/nfsd/nfs4recover.c was called before assigning ntfsd_net_id.
The following was observed on a MIPS 32-core processor:
kernel: Call Trace:
kernel: [<ffffffffc00bc5e4>] rpc_pipefs_event+0x7c/0x158 [nfsd]
kernel: [<ffffffff8017a2a0>] notifier_call_chain+0x70/0xb8
kernel: [<ffffffff8017a4e4>] __blocking_notifier_call_chain+0x4c/0x70
kernel: [<ffffffff8053aff8>] rpc_fill_super+0xf8/0x1a0
kernel: [<ffffffff8022204c>] mount_ns+0xb4/0xf0
kernel: [<ffffffff80222b48>] mount_fs+0x50/0x1f8
kernel: [<ffffffff8023dc00>] vfs_kern_mount+0x58/0xf0
kernel: [<ffffffff802404ac>] do_mount+0x27c/0xa28
kernel: [<ffffffff80240cf0>] SyS_mount+0x98/0xe8
kernel: [<ffffffff80135d24>] handle_sys64+0x44/0x68
kernel:
kernel:
Code: 0040f809 00000000 2e020001 <00020336> 3c12c00d
3c02801a de100000 6442eb98 0040f809
kernel: ---[ end trace 7471374335809536 ]---
Fixed this behaviour by calling register_pernet_subsys(&nfsd_net_ops) before
registering rpc_pipefs_event(...) with the notifier chain.
Signed-off-by: Giuseppe Cantavenera <giuseppe.cantavenera.ext@nokia.com>
Signed-off-by: Lorenzo Restelli <lorenzo.restelli.ext@nokia.com>
Reviewed-by: Kinlong Mee <kinglongmee@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Add support for the GETDEVICEINFO, LAYOUTGET, LAYOUTCOMMIT and
LAYOUTRETURN NFSv4.1 operations, as well as backing code to manage
outstanding layouts and devices.
Layout management is very straight forward, with a nfs4_layout_stateid
structure that extends nfs4_stid to manage layout stateids as the
top-level structure. It is linked into the nfs4_file and nfs4_client
structures like the other stateids, and contains a linked list of
layouts that hang of the stateid. The actual layout operations are
implemented in layout drivers that are not part of this commit, but
will be added later.
The worst part of this commit is the management of the pNFS device IDs,
which suffers from a specification that is not sanely implementable due
to the fact that the device-IDs are global and not bound to an export,
and have a small enough size so that we can't store the fsid portion of
a file handle, and must never be reused. As we still do need perform all
export authentication and validation checks on a device ID passed to
GETDEVICEINFO we are caught between a rock and a hard place. To work
around this issue we add a new hash that maps from a 64-bit integer to a
fsid so that we can look up the export to authenticate against it,
a 32-bit integer as a generation that we can bump when changing the device,
and a currently unused 32-bit integer that could be used in the future
to handle more than a single device per export. Entries in this hash
table are never deleted as we can't reuse the ids anyway, and would have
a severe lifetime problem anyway as Linux export structures are temporary
structures that can go away under load.
Parts of the XDR data, structures and marshaling/unmarshaling code, as
well as many concepts are derived from the old pNFS server implementation
from Andy Adamson, Benny Halevy, Dean Hildebrand, Marc Eshel, Fred Isaman,
Mike Sager, Ricardo Labiaga and many others.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pull nfsd updates from Bruce Fields:
"A comparatively quieter cycle for nfsd this time, but still with two
larger changes:
- RPC server scalability improvements from Jeff Layton (using RCU
instead of a spinlock to find idle threads).
- server-side NFSv4.2 ALLOCATE/DEALLOCATE support from Anna
Schumaker, enabling fallocate on new clients"
* 'for-3.19' of git://linux-nfs.org/~bfields/linux: (32 commits)
nfsd4: fix xdr4 count of server in fs_location4
nfsd4: fix xdr4 inclusion of escaped char
sunrpc/cache: convert to use string_escape_str()
sunrpc: only call test_bit once in svc_xprt_received
fs: nfsd: Fix signedness bug in compare_blob
sunrpc: add some tracepoints around enqueue and dequeue of svc_xprt
sunrpc: convert to lockless lookup of queued server threads
sunrpc: fix potential races in pool_stats collection
sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it
sunrpc: require svc_create callers to pass in meaningful shutdown routine
sunrpc: have svc_wake_up only deal with pool 0
sunrpc: convert sp_task_pending flag to use atomic bitops
sunrpc: move rq_cachetype field to better optimize space
sunrpc: move rq_splice_ok flag into rq_flags
sunrpc: move rq_dropme flag into rq_flags
sunrpc: move rq_usedeferral flag to rq_flags
sunrpc: move rq_local field to rq_flags
sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to it
nfsd: minor off by one checks in __write_versions()
sunrpc: release svc_pool_map reference when serv allocation fails
...
My static checker complains that if "len == remaining" then it means we
have truncated the last character off the version string.
The intent of the code is that we print as many versions as we can
without truncating a version. Then we put a newline at the end. If the
newline can't fit we return -EINVAL.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Allow a privileged userland process to end the v4 grace period early.
Writing "Y", "y", or "1" to the file will cause the v4 grace period to
be lifted. The basic idea with this will be to allow the userland
client tracking program to lift the grace period once it knows that no
more clients will be reclaiming state.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Currently, the maximum number of connections that nfsd will allow
is based on the number of threads spawned. While this is fine for a
default, there really isn't a clear relationship between the two.
The number of threads corresponds to the number of concurrent requests
that we want to allow the server to process at any given time. The
connection limit corresponds to the maximum number of clients that we
want to allow the server to handle. These are two entirely different
quantities.
Break the dependency on increasing threads in order to allow for more
connections, by adding a new per-net parameter that can be set to a
non-zero value. The default is still to base it on the number of threads,
so there should be no behavior change for anyone who doesn't use it.
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
There could be a case, when NFSd file system is mounted in network, different
to socket's one, like below:
"ip netns exec" creates new network and mount namespace, which duplicates NFSd
mount point, created in init_net context. And thus NFS server stop in nested
network context leads to RPCBIND client destruction in init_net.
Then, on NFSd start in nested network context, rpc.nfsd process creates socket
in nested net and passes it into "write_ports", which leads to RPCBIND sockets
creation in init_net context because of the same reason (NFSd monut point was
created in init_net context). An attempt to register passed socket in nested
net leads to panic, because no RPCBIND client present in nexted network
namespace.
This patch add check that passed socket's net matches NFSd superblock's one.
And returns -EINVAL error to user psace otherwise.
v2: Put socket on exit.
Reported-by: Weng Meiling <wengmeiling.weng@huawei.com>
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull nfsd changes from J Bruce Fields:
"Highlights include:
- Some more DRC cleanup and performance work from Jeff Layton
- A gss-proxy upcall from Simo Sorce: currently krb5 mounts to the
server using credentials from Active Directory often fail due to
limitations of the svcgssd upcall interface. This replacement
lifts those limitations. The existing upcall is still supported
for backwards compatibility.
- More NFSv4.1 support: at this point, if a user with a current
client who upgrades from 4.0 to 4.1 should see no regressions. In
theory we do everything a 4.1 server is required to do. Patches
for a couple minor exceptions are ready for 3.11, and with those
and some more testing I'd like to turn 4.1 on by default in 3.11."
Fix up semantic conflict as per Stephen Rothwell and linux-next:
Commit 030d794bf4 ("SUNRPC: Use gssproxy upcall for server RPCGSS
authentication") adds two new users of "PDE(inode)->data", but we're
supposed to use "PDE_DATA(inode)" instead since commit d9dda78bad
("procfs: new helper - PDE_DATA(inode)").
The old PDE() macro is no longer available since commit c30480b92c
("proc: Make the PROC_I() and PDE() macros internal to procfs")
* 'for-3.10' of git://linux-nfs.org/~bfields/linux: (60 commits)
NFSD: SECINFO doesn't handle unsupported pseudoflavors correctly
NFSD: Simplify GSS flavor encoding in nfsd4_do_encode_secinfo()
nfsd: make symbol nfsd_reply_cache_shrinker static
svcauth_gss: fix error return code in rsc_parse()
nfsd4: don't remap EISDIR errors in rename
svcrpc: fix gss-proxy to respect user namespaces
SUNRPC: gssp_procedures[] can be static
SUNRPC: define {create,destroy}_use_gss_proxy_proc_entry in !PROC case
nfsd4: better error return to indicate SSV non-support
nfsd: fix EXDEV checking in rename
SUNRPC: Use gssproxy upcall for server RPCGSS authentication.
SUNRPC: Add RPC based upcall mechanism for RPCGSS auth
SUNRPC: conditionally return endtime from import_sec_context
SUNRPC: allow disabling idle timeout
SUNRPC: attempt AF_LOCAL connect on setup
nfsd: Decode and send 64bit time values
nfsd4: put_client_renew_locked can be static
nfsd4: remove unused macro
nfsd4: remove some useless code
nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED
...
when create /proc/fs/nfs/exports error, we should remove /proc/fs/nfs,
if don't do it, it maybe cause Memory leak.
Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
Reviewed-by: chendt.fnst <chendt.fnst@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
For presenting statistics relating to duplicate reply cache.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.
A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.
Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.
Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives. Allowing simple, safe,
well understood work-arounds to known problematic software.
This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work. While writing this patch I saw a handful of such
cases. The most significant being autofs that lives in the module
autofs4.
This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.
After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module. The common pattern in the kernel is to call request_module()
without regards to the users permissions. In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted. In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Pull nfsd changes from J Bruce Fields:
"Miscellaneous bugfixes, plus:
- An overhaul of the DRC cache by Jeff Layton. The main effect is
just to make it larger. This decreases the chances of intermittent
errors especially in the UDP case. But we'll need to watch for any
reports of performance regressions.
- Containerized nfsd: with some limitations, we now support
per-container nfs-service, thanks to extensive work from Stanislav
Kinsbursky over the last year."
Some notes about conflicts, since there were *two* non-data semantic
conflicts here:
- idr_remove_all() had been added by a memory leak fix, but has since
become deprecated since idr_destroy() does it for us now.
- xs_local_connect() had been added by this branch to make AF_LOCAL
connections be synchronous, but in the meantime Trond had changed the
calling convention in order to avoid a RCU dereference.
There were a couple of more obvious actual source-level conflicts due to
the hlist traversal changes and one just due to code changes next to
each other, but those were trivial.
* 'for-3.9' of git://linux-nfs.org/~bfields/linux: (49 commits)
SUNRPC: make AF_LOCAL connect synchronous
nfsd: fix compiler warning about ambiguous types in nfsd_cache_csum
svcrpc: fix rpc server shutdown races
svcrpc: make svc_age_temp_xprts enqueue under sv_lock
lockd: nlmclnt_reclaim(): avoid stack overflow
nfsd: enable NFSv4 state in containers
nfsd: disable usermode helper client tracker in container
nfsd: use proper net while reading "exports" file
nfsd: containerize NFSd filesystem
nfsd: fix comments on nfsd_cache_lookup
SUNRPC: move cache_detail->cache_request callback call to cache_read()
SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function
SUNRPC: rework cache upcall logic
SUNRPC: introduce cache_detail->cache_request callback
NFS: simplify and clean cache library
NFS: use SUNRPC cache creation and destruction helper for DNS cache
nfsd4: free_stid can be static
nfsd: keep a checksum of the first 256 bytes of request
sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer
sunrpc: fix comment in struct xdr_buf definition
...
Functuon "exports_open" is used for both "/proc/fs/nfs/exports" and
"/proc/fs/nfsd/exports" files.
Now NFSd filesystem is containerised, so proper net can be taken from
superblock for "/proc/fs/nfsd/exports" reader.
But for "/proc/fs/nfsd/exports" only current->nsproxy->net_ns can be used.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This patch makes NFSD file system superblock to be created per net.
This makes possible to get proper network namespace from superblock instead of
using hard-coded "init_net".
Note: NFSd fs super-block holds network namespace. This garantees, that
network namespace won't disappear from underneath of it.
This, obviously, means, that in case of kill of a container's "init" (which is not a mount
namespace, but network namespace creator) netowrk namespace won't be
destroyed.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
These routines are used by server and client code, so having them in a
separate header would be best.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The current code will allow silly things like:
echo "+2 +3 +4 +7.1">/proc/fs/nfsd/versions
Reported-by: Fan Chaoting <fanchaoting@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This patch makes main step in NFSd containerisation.
There could be different approaches to how to make NFSd able to handle
incoming RPC request from different network namespaces. The two main
options are:
1) Share NFSd kthreads betwween all network namespaces.
2) Create separated pool of threads for each namespace.
While first approach looks more flexible, second one is simpler and
non-racy. This patch implements the second option.
To make it possible to allocate separate pools of threads, we have to
make it possible to allocate separate NFSd service structures per net.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
There were only a small number of functions in this file and since they
all affect stored state I think it makes sense to put them in state.h
instead. I also dropped most static inline declarations since there are
no callers when fault injection is not enabled.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Grace time is a part of NFSv4 state engine, which is constructed per network
namespace.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Lease time is a part of NFSv4 state engine, which is constructed per network
namespace.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This is a cleanup patch. Functions nfsd_pool_stats_open() and
nfsd_pool_stats_release() are declared in fs/nfsd/nfsd.h.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
You can use nfsd/portlist to give nfsd additional sockets to listen on.
In theory you can also remove listening sockets this way. But nobody's
ever done that as far as I can tell.
Also this was partially broken in 2.6.25, by
a217813f90 "knfsd: Support adding
transports by writing portlist file".
(Note that we decide whether to take the "delfd" case by checking for a
digit--but what's actually expected in that case is something made by
svc_one_sock_name(), which won't begin with a digit.)
So, let's just rip out this stuff.
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
"port" in all these functions is always NFS_PORT.
nfsd can already be run on a nonstandard port using the "nfsd/portlist"
interface.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The idea is to separate service destruction and per-net operations,
because these are two different things and the mix looks ugly.
Notes:
1) For NFS server this patch looks ugly (sorry for that). But these
place will be rewritten soon during NFSd containerization.
2) LockD per-net counter increase int lockd_up() was moved prior to
make_socks() to make lockd_down_net() call safe in case of error.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
These functions will be called from per-net operations.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This patch also changes svcauth_unix_purge() function: added network namespace
as a parameter and thus loop over all networks was replaced by only one call
for ip map cache purge.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This patch also changes prototypes of nfsd_export_flush() and exp_rootfh():
network namespace parameter added.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
These functions will be called from per-net operations.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Global svc_export_cache cache is going to be replaced with per-net instance. So
prepare the ground for it.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Otherwise, we get a warning or error similar to this when building with
CONFIG_NFSD_V4 disabled:
ERROR: "nfsd4_cld_block" [fs/nfsd/nfsd.ko] undefined!
Fix this by wrapping the calls to rpc_pipefs_notifier_register and
..._unregister in another function and providing no-op replacements
when CONFIG_NFSD_V4 is disabled.
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In the event that rpc_pipefs isn't mounted when nfsd starts, we
must register a notifier to handle creating the dentry once it
is mounted, and to remove the dentry on unmount.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Eventually, we'll need this when nfsd gets containerized fully. For
now, create a struct on a per-net-namespace basis that will just hold
a pointer to the cld_net structure. That struct will hold all of the
per-net data that we need for the cld tracker.
Eventually we can add other pernet objects to struct nfsd_net.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Service transports are parametrized by network namespace. And thus lookup of
transport instance have to take network namespace into account.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Parametrize rpc_pton() by network context and thus force it's callers to pass
in network context instead of using hard-coded "init_net".
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* 'for-3.3' of git://linux-nfs.org/~bfields/linux: (31 commits)
nfsd4: nfsd4_create_clid_dir return value is unused
NFSD: Change name of extended attribute containing junction
svcrpc: don't revert to SVC_POOL_DEFAULT on nfsd shutdown
svcrpc: fix double-free on shutdown of nfsd after changing pool mode
nfsd4: be forgiving in the absence of the recovery directory
nfsd4: fix spurious 4.1 post-reboot failures
NFSD: forget_delegations should use list_for_each_entry_safe
NFSD: Only reinitilize the recall_lru list under the recall lock
nfsd4: initialize special stateid's at compile time
NFSd: use network-namespace-aware cache registering routines
SUNRPC: create svc_xprt in proper network namespace
svcrpc: update outdated BKL comment
nfsd41: allow non-reclaim open-by-fh's in 4.1
svcrpc: avoid memory-corruption on pool shutdown
svcrpc: destroy server sockets all at once
svcrpc: make svc_delete_xprt static
nfsd: Fix oops when parsing a 0 length export
nfsd4: Use kmemdup rather than duplicating its implementation
nfsd4: add a separate (lockowner, inode) lookup
nfsd4: fix CONFIG_NFSD_FAULT_INJECTION compile error
...
init_nfsd() was calling free_slabs() during cleanup code, but the call
to init_slabs() was hidden in nfsd4_state_init(). This could be
confusing to people unfamiliar with the code.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Fault injection on the NFS server makes it easier to test the client's
state manager and recovery threads. Simulating errors on the server is
easier than finding the right conditions that cause them naturally.
This patch uses debugfs to add a simple framework for fault injection to
the server. This framework is a config option, and can be enabled
through CONFIG_NFSD_FAULT_INJECTION. Assuming you have debugfs mounted
to /sys/debug, a set of files will be created in /sys/debug/nfsd/.
Writing to any of these files will cause the corresponding action and
write a log entry to dmesg.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Some files were using the complete module.h infrastructure without
actually including the header at all. Fix them up in advance so
once the implicit presence is removed, we won't get failures like this:
CC [M] fs/nfsd/nfssvc.o
fs/nfsd/nfssvc.c: In function 'nfsd_create_serv':
fs/nfsd/nfssvc.c:335: error: 'THIS_MODULE' undeclared (first use in this function)
fs/nfsd/nfssvc.c:335: error: (Each undeclared identifier is reported only once
fs/nfsd/nfssvc.c:335: error: for each function it appears in.)
fs/nfsd/nfssvc.c: In function 'nfsd':
fs/nfsd/nfssvc.c:555: error: implicit declaration of function 'module_put_and_exit'
make[3]: *** [fs/nfsd/nfssvc.o] Error 1
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
As promised in feature-removal-schedule.txt it is time to
remove the nfsctl system call.
Userspace has perferred to not use this call throughout 2.6 and it has been
excluded in the default configuration since 2.6.36 (9 months ago).
So this patch removes all the code that was being compiled out.
There are still references to sys_nfsctl in various arch systemcall tables
and related code. These should be cleaned out too, probably in the next
merge window.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Commit b0b0c0a26e "nfsd: add proc file listing kernel's gss_krb5
enctypes" added an nunnecessary dependency of nfsd on the auth_rpcgss
module.
It's a little ad hoc, but since the only piece of information nfsd needs
from rpcsec_gss_krb5 is a single static string, one solution is just to
share it with an include file.
Cc: stable@kernel.org
Reported-by: Michael Guntsche <mike@it-loops.com>
Cc: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
"fs/built-in.o: In function `supported_enctypes_show':
nfsctl.c:(.text+0x7beb0): undefined reference to `gss_mech_get_by_name'
nfsctl.c:(.text+0x7bebc): undefined reference to `gss_mech_put'
"
Reported-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Add a new proc file which lists the encryption types supported
by the kernel's gss_krb5 code.
Newer MIT Kerberos libraries support the assertion of acceptor
subkeys. This enctype information allows user-land (svcgssd)
to request that the Kerberos libraries limit the encryption
types that it uses when generating the subkeys.
Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
these pieces of code only make sense when CONFIG_NFSD_DEPRECATED enabled
Signed-off-by: Jovi Zhang <bookjovi@gmail.com>
fs/nfsd/nfsctl.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: (99 commits)
svcrpc: svc_tcp_sendto XPT_DEAD check is redundant
svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue
svcrpc: assume svc_delete_xprt() called only once
svcrpc: never clear XPT_BUSY on dead xprt
nfsd4: fix connection allocation in sequence()
nfsd4: only require krb5 principal for NFSv4.0 callbacks
nfsd4: move minorversion to client
nfsd4: delay session removal till free_client
nfsd4: separate callback change and callback probe
nfsd4: callback program number is per-session
nfsd4: track backchannel connections
nfsd4: confirm only on succesful create_session
nfsd4: make backchannel sequence number per-session
nfsd4: use client pointer to backchannel session
nfsd4: move callback setup into session init code
nfsd4: don't cache seq_misordered replies
SUNRPC: Properly initialize sock_xprt.srcaddr in all cases
SUNRPC: Use conventional switch statement when reclassifying sockets
sunrpc/xprtrdma: clean up workqueue usage
sunrpc: Turn list_for_each-s into the ..._entry-s
...
Fix up trivial conflicts (two different deprecation notices added in
separate branches) in Documentation/feature-removal-schedule.txt
All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.
The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.
New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.
The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.
Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.
Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.
===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}
@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}
@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}
@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}
@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};
@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};
@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};
@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};
@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};
// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};
@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};
// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};
// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};
// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};
@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};
// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////
@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};
@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};
@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};
@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>
There are two calls that operate on ip_map_cache and are
directly called from the nfsd code. Other places will be
handled in a different way.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The git://linux-nfs.org/~bfields/linux.git nfsd-next branch doesn't
compile when nfsd is a module with the following error:
ERROR: "get_task_comm" [fs/nfsd/nfsd.ko] undefined!
Replace the get_task_comm call with direct comm access, which is
safe for current.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Add CONFIG_NFSD_DEPRECATED, default to y.
Only include deprecated interface if this is defined.
This allows distros to remove this interface before the official
removal, and allows developers to test without it.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The syscall interface is has been replaced by a more flexible
interface since 2.6.0. It is time to work towards discarding
the old interface.
So add a entry in feature-removal-schedule.txt and print a warning
when the interface is used.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
It's harmless to set this after the server is created, but also
ineffective, since the value is only used at the time of
svc_create_pooled(). So fail the attempt, in keeping with the pattern
set by write_versions, write_{lease,grace}time and write_recoverydir.
(This could break userspace that tried to write to nfsd/max_block_size
between setting up sockets and starting the server. However, such code
wouldn't have worked anyway, and I don't know of any examples--rpc.nfsd
in nfs-utils, probably the only user of the interface, doesn't do that.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Fixes at least one real minor bug: the nfs4 recovery dir sysctl
would not return its status properly.
Also I finished Al's 1e41568d73 ("Take ima_path_check() in nfsd
past dentry_open() in nfsd_open()") commit, it moved the IMA
code, but left the old path initializer in there.
The rest is just dead code removed I think, although I was not
fully sure about the "is_borc" stuff. Some more review
would be still good.
Found by gcc 4.6's new warnings.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Right now, nfsd keeps a lockd reference for each socket that it has
open. This is unnecessary and complicates the error handling on
startup and shutdown. Change it to just do a lockd_up when starting
the first nfsd thread just do a single lockd_down when taking down the
last nfsd thread. Because of the strange way the sv_count is handled
this requires an extra flag to tell whether the nfsd_serv holds a
reference for lockd or not.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
__write_ports_addxprt calls nfsd_create_serv. That increases the
refcount of nfsd_serv (which is tracked in sv_nrthreads). The service
only decrements the thread count on error, not on success like
__write_ports_addfd does, so using this interface leaves the nfsd
thread count high.
Fix this by having this function call svc_destroy() on error to release
the reference (and possibly to tear down the service) and simply
decrement the refcount without tearing down the service on success.
This makes the sv_threads handling work basically the same in both
__write_ports_addxprt and __write_ports_addfd.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The refcounting for nfsd is a little goofy. What happens is that we
create the nfsd RPC service, attach sockets to it but don't actually
start the threads until someone writes to the "threads" procfile. To do
this, __write_ports_addfd will create the nfsd service and then will
decrement the refcount when exiting but won't actually destroy the
service.
This is fine when there aren't errors, but when there are this can
cause later attempts to start nfsd to fail. nfsd_serv will be set,
and that causes __write_versions to return EBUSY.
Fix this by calling svc_destroy on nfsd_serv when this function is
going to return error.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not
USHORT_MAX/SHORT_MAX/SHORT_MIN.
- Make SHRT_MIN of type s16, not int, for consistency.
[akpm@linux-foundation.org: fix drivers/dma/timb_dma.c]
[akpm@linux-foundation.org: fix security/keys/keyring.c]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
The current documentation here is out of date, and not quite right.
(Future work: some user documentation would be useful.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Instead of accessing the lease time directly, some users call
nfs4_lease_time(), and some a macro, NFSD_LEASE_TIME, defined as
nfs4_lease_time(). Neither layer of indirection serves any purpose.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Try to create a PF_INET6 listener for NFSD, if IPv6 is enabled in the
kernel.
Make sure nfsd_serv's reference count is decreased if
__write_ports_addxprt() failed to create a listener. See
__write_ports_addfd().
Our current plan is to rely on rpc.nfsd to create appropriate IPv6
listeners when server-side NFS/IPv6 support is desired. Legacy
behavior, via the write_threads or write_svc kernel APIs, will remain
the same -- only IPv4 listeners are created.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[bfields@citi.umich.edu: Move error-handling code to end]
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
write_ports() converts svc_create_xprt()'s ENOENT error return to
EPROTONOSUPPORT so that rpc.nfsd (in user space) can report an error
message that makes sense.
It turns out that several of the other kernel APIs rpc.nfsd use can
also return ENOENT from svc_create_xprt(), by way of lockd_up().
On the client side, an NFSv2 or NFSv3 mount request can also return
the result of lockd_up(). This error may also be returned during an
NFSv4 mount request, since the NFSv4 callback service uses
svc_create_xprt() to create the callback listener. An ENOENT error
return results in a confusing error message from the mount command.
Let's have svc_create_xprt() return EPROTONOSUPPORT instead of ENOENT.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The new .h files have paths at the top that are now out of date. While
we're here, just remove all of those from fs/nfsd; they never served any
purpose.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Soon we will add the new V4ROOT flag, and allow the INSECURE flag to
vary by pseudoflavor. It would be useful for nfs-utils (for example,
for improved exportfs error reporting) to be able to know when this
happens. Use this new interface for that purpose.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Lots of include/linux/nfsd/* headers are only used by
nfsd module. Move them to the source directory
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Now that the headers are fixed and carry their own wait, all fs/nfsd/
source files can include a minimal set of headers. and still compile just
fine.
This patch should improve the compilation speed of the nfsd module.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
lock_kernel() in knfsd was replaced with a mutex. The later
commit 03cf6c9f49 ("knfsd:
add file to export stats about nfsd pools") did not follow
that change. This patch fixes the issue.
Also move the get and put of nfsd_serv to the open and close methods
(instead of start and stop methods) to allow atomic check and increment
of reference count in the open method (where we can still return an
error).
Signed-off-by: Ryusei Yamaguchi <mandel59@gmail.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: Greg Banks <gnb@fmeh.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
In write_failover_ip(), replace the sscanf() with a call to the common
sunrpc.ko presentation address parser.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
kmemleak produces the following warning
unreferenced object 0xc9ec02a0 (size 8):
comm "cat", pid 19048, jiffies 730243
backtrace:
[<c01bf970>] create_object+0x100/0x240
[<c01bfadb>] kmemleak_alloc+0x2b/0x60
[<c01bcd4b>] __kmalloc+0x14b/0x270
[<c02fd027>] write_pool_threads+0x87/0x1d0
[<c02fcc08>] nfsctl_transaction_write+0x58/0x70
[<c02fcc6f>] nfsctl_transaction_read+0x4f/0x60
[<c01c2574>] vfs_read+0x94/0x150
[<c01c297d>] sys_read+0x3d/0x70
[<c0102d6b>] sysenter_do_call+0x12/0x32
[<ffffffff>] 0xffffffff
write_pool_threads() only frees nthreads on error paths, in the success case
we leak it.
Signed-off-by: Eric Sesterhenn <eric.sesterhenn@lsexperts.de>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT
This will make hardirq.h inclusion cheaper for every PREEMPT=n config
(which includes allmodconfig/allyesconfig, BTW)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently when we write a number to 'threads' in nfsdfs,
we take the nfsd_mutex, update the number of threads, then take the
mutex again to read the number of threads.
Mostly this isn't a big deal. However if we are write '0', and
portmap happens to be dead, then we can get unpredictable behaviour.
If the nfsd threads all got killed quickly and the last thread is
waiting for portmap to respond, then the second time we take the mutex
we will block waiting for the last thread.
However if the nfsd threads didn't die quite that fast, then there
will be no contention when we try to take the mutex again.
Unpredictability isn't fun, and waiting for the last thread to exit is
pointless, so avoid taking the lock twice.
To achieve this, get nfsd_svc return a non-negative number of active
threads when not returning a negative error.
Signed-off-by: NeilBrown <neilb@suse.de>
Clean up: For consistency, handle output buffer size checking in a
other nfsctl functions the same way it's done for write_versions().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
While it's not likely today that there are enough NFS versions to
overflow the output buffer in write_versions(), we should be more
careful about detecting the end of the buffer.
The number of NFS versions will only increase as NFSv4 minor versions
are added.
Note that this API doesn't behave the same as portlist. Here we
attempt to display as many versions as will fit in the buffer, and do
not provide any indication that an overflow would have occurred. I
don't have any good rationale for that.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
While it's not likely a pathname will be longer than
SIMPLE_TRANSACTION_SIZE, we should be more careful about just
plopping it into the output buffer without bounds checking.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Adjust the synopsis of svc_sock_names() to pass in the size of the
output buffer. Add a documenting comment.
This is a cosmetic change for now. A subsequent patch will make sure
the buffer length is passed to one_sock_name(), where the length will
actually be useful.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>