2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

326 Commits

Author SHA1 Message Date
Chuck Lever
da1661b93b SUNRPC: Teach server to use xprt_sock_sendmsg for socket sends
xprt_sock_sendmsg uses the more efficient iov_iter-enabled kernel
socket API, and is a pre-requisite for server send-side support for
TLS.

Note that svc_process no longer needs to reserve a word for the
stream record marker, since the TCP transport now provides the
record marker automatically in a separate buffer.

The dprintk() in svc_send_common is also removed. It didn't seem
crucial for field troubleshooting. If more is needed there, a trace
point could be added in xprt_sock_sendmsg().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:33 -04:00
Chuck Lever
9e55eef4ab SUNRPC: Refactor xs_sendpages()
Re-locate xs_sendpages() so that it can be shared with server code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:33 -04:00
Chuck Lever
412055398b nfsd: Fix NFSv4 READ on RDMA when using readv
svcrdma expects that the payload falls precisely into the xdr_buf
page vector. This does not seem to be the case for
nfsd4_encode_readv().

This code is called only when fops->splice_read is missing or when
RQ_SPLICE_OK is clear, so it's not a noticeable problem in many
common cases.

Add new transport method: ->xpo_read_payload so that when a READ
payload does not fit exactly in rq_res's page vector, the XDR
encoder can inform the RPC transport exactly where that payload is,
without the payload's XDR pad.

That way, when a Write chunk is present, the transport knows what
byte range in the Reply message is supposed to be matched with the
chunk.

Note that the Linux NFS server implementation of NFS/RDMA can
currently handle only one Write chunk per RPC-over-RDMA message.
This simplifies the implementation of this fix.

Fixes: b042098063 ("nfsd4: allow exotic read compounds")
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=198053
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:31 -04:00
Thomas Gleixner
457c899653 treewide: Add SPDX license identifier for missed files
Add SPDX license identifiers to all files which:

 - Have no license information of any form

 - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
   initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 10:50:45 +02:00
Trond Myklebust
4df493a260 SUNRPC: Cache the process user cred in the RPC server listener
In order to be able to interpret uids and gids correctly in knfsd, we
should cache the user namespace of the process that created the RPC
server's listener. To do so, we refcount the credential of that process.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-24 09:46:35 -04:00
J. Bruce Fields
b7e5034cbe svcrpc: fix UDP on servers with lots of threads
James Pearson found that an NFS server stopped responding to UDP
requests if started with more than 1017 threads.

sv_max_mesg is about 2^20, so that is probably where the calculation
performed by

	svc_sock_setbufsize(svsk->sk_sock,
                            (serv->sv_nrthreads+3) * serv->sv_max_mesg,
                            (serv->sv_nrthreads+3) * serv->sv_max_mesg);

starts to overflow an int.

Reported-by: James Pearson <jcpearson@gmail.com>
Tested-by: James Pearson <jcpearson@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-21 10:17:36 -05:00
Linus Torvalds
43d86ee8c6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Several fixes here. Basically split down the line between newly
  introduced regressions and long existing problems:

   1) Double free in tipc_enable_bearer(), from Cong Wang.

   2) Many fixes to nf_conncount, from Florian Westphal.

   3) op->get_regs_len() can throw an error, check it, from Yunsheng
      Lin.

   4) Need to use GFP_ATOMIC in *_add_hash_mac_address() of fsl/fman
      driver, from Scott Wood.

   5) Inifnite loop in fib_empty_table(), from Yue Haibing.

   6) Use after free in ax25_fillin_cb(), from Cong Wang.

   7) Fix socket locking in nr_find_socket(), also from Cong Wang.

   8) Fix WoL wakeup enable in r8169, from Heiner Kallweit.

   9) On 32-bit sock->sk_stamp is not thread-safe, from Deepa Dinamani.

  10) Fix ptr_ring wrap during queue swap, from Cong Wang.

  11) Missing shutdown callback in hinic driver, from Xue Chaojing.

  12) Need to return NULL on error from ip6_neigh_lookup(), from Stefano
      Brivio.

  13) BPF out of bounds speculation fixes from Daniel Borkmann"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
  ipv6: Consider sk_bound_dev_if when binding a socket to an address
  ipv6: Fix dump of specific table with strict checking
  bpf: add various test cases to selftests
  bpf: prevent out of bounds speculation on pointer arithmetic
  bpf: fix check_map_access smin_value test when pointer contains offset
  bpf: restrict unknown scalars of mixed signed bounds for unprivileged
  bpf: restrict stack pointer arithmetic for unprivileged
  bpf: restrict map value pointer arithmetic for unprivileged
  bpf: enable access to ax register also from verifier rewrite
  bpf: move tmp variable into ax register in interpreter
  bpf: move {prev_,}insn_idx into verifier env
  isdn: fix kernel-infoleak in capi_unlocked_ioctl
  ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() error
  net/hamradio/6pack: use mod_timer() to rearm timers
  net-next/hinic:add shutdown callback
  net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT
  ip: validate header length on virtual device xmit
  tap: call skb_probe_transport_header after setting skb->dev
  ptr_ring: wrap back ->producer in __ptr_ring_swap_queue()
  net: rds: remove unnecessary NULL check
  ...
2019-01-03 12:53:47 -08:00
Deepa Dinamani
3a0ed3e961 sock: Make sock->sk_stamp thread-safe
Al Viro mentioned (Message-ID
<20170626041334.GZ10672@ZenIV.linux.org.uk>)
that there is probably a race condition
lurking in accesses of sk_stamp on 32-bit machines.

sock->sk_stamp is of type ktime_t which is always an s64.
On a 32 bit architecture, we might run into situations of
unsafe access as the access to the field becomes non atomic.

Use seqlocks for synchronization.
This allows us to avoid using spinlocks for readers as
readers do not need mutual exclusion.

Another approach to solve this is to require sk_lock for all
modifications of the timestamps. The current approach allows
for timestamps to have their own lock: sk_stamp_lock.
This allows for the patch to not compete with already
existing critical sections, and side effects are limited
to the paths in the patch.

The addition of the new field maintains the data locality
optimizations from
commit 9115e8cd2a ("net: reorganize struct sock for better data
locality")

Note that all the instances of the sk_stamp accesses
are either through the ioctl or the syscall recvmsg.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-01 09:47:59 -08:00
Vasily Averin
64e20ba204 sunrpc: remove unused xpo_prep_reply_hdr callback
xpo_prep_reply_hdr are not used now.

It was defined for tcp transport only, however it cannot be
called indirectly, so let's move it to its caller and
remove unused callback.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-27 21:01:41 -05:00
Vasily Averin
7f39154609 sunrpc: remove svc_tcp_bc_class
Remove svc_xprt_class svc_tcp_bc_class and related functions

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-27 21:01:41 -05:00
Vasily Averin
a289ce5311 sunrpc: replace svc_serv->sv_bc_xprt by boolean flag
svc_serv-> sv_bc_xprt is netns-unsafe and cannot be used as pointer.
To prevent its misuse in future it is replaced by new boolean flag.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-27 21:01:41 -05:00
Vasily Averin
d4b09acf92 sunrpc: use-after-free in svc_process_common()
if node have NFSv41+ mounts inside several net namespaces
it can lead to use-after-free in svc_process_common()

svc_process_common()
        /* Setup reply header */
        rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); <<< HERE

svc_process_common() can use incorrect rqstp->rq_xprt,
its caller function bc_svc_process() takes it from serv->sv_bc_xprt.
The problem is that serv is global structure but sv_bc_xprt
is assigned per-netnamespace.

According to Trond, the whole "let's set up rqstp->rq_xprt
for the back channel" is nothing but a giant hack in order
to work around the fact that svc_process_common() uses it
to find the xpt_ops, and perform a couple of (meaningless
for the back channel) tests of xpt_flags.

All we really need in svc_process_common() is to be able to run
rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr()

Bruce J Fields points that this xpo_prep_reply_hdr() call
is an awfully roundabout way just to do "svc_putnl(resv, 0);"
in the tcp case.

This patch does not initialiuze rqstp->rq_xprt in bc_svc_process(),
now it calls svc_process_common() with rqstp->rq_xprt = NULL.

To adjust reply header svc_process_common() just check
rqstp->rq_prot and calls svc_tcp_prep_reply_hdr() for tcp case.

To handle rqstp->rq_xprt = NULL case in functions called from
svc_process_common() patch intruduces net namespace pointer
svc_rqst->rq_bc_net and adjust SVC_NET() definition.
Some other function was also adopted to properly handle described case.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Cc: stable@vger.kernel.org
Fixes: 23c20ecd44 ("NFS: callback up - users counting cleanup")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-27 21:00:58 -05:00
Linus Torvalds
9931a07d51 Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull AFS updates from Al Viro:
 "AFS series, with some iov_iter bits included"

* 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
  missing bits of "iov_iter: Separate type from direction and use accessor functions"
  afs: Probe multiple fileservers simultaneously
  afs: Fix callback handling
  afs: Eliminate the address pointer from the address list cursor
  afs: Allow dumping of server cursor on operation failure
  afs: Implement YFS support in the fs client
  afs: Expand data structure fields to support YFS
  afs: Get the target vnode in afs_rmdir() and get a callback on it
  afs: Calc callback expiry in op reply delivery
  afs: Fix FS.FetchStatus delivery from updating wrong vnode
  afs: Implement the YFS cache manager service
  afs: Remove callback details from afs_callback_break struct
  afs: Commit the status on a new file/dir/symlink
  afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS
  afs: Don't invoke the server to read data beyond EOF
  afs: Add a couple of tracepoints to log I/O errors
  afs: Handle EIO from delivery function
  afs: Fix TTL on VL server and address lists
  afs: Implement VL server rotation
  afs: Improve FS server rotation error handling
  ...
2018-11-01 19:58:52 -07:00
Al Viro
78a63f1235 Merge tag 'nfs-for-4.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
backmerge to do fixup of iov_iter_kvec() conflict
2018-11-01 18:17:23 -04:00
Linus Torvalds
310c7585e8 Olga added support for the NFSv4.2 asynchronous copy protocol. We
already supported COPY, by copying a limited amount of data and then
 returning a short result, letting the client resend.  The asynchronous
 protocol should offer better performance at the expense of some
 complexity.
 
 The other highlight is Trond's work to convert the duplicate reply cache
 to a red-black tree, and to move it and some other server caches to RCU.
 (Previously these have meant taking global spinlocks on every RPC.)
 
 Otherwise, some RDMA work and miscellaneous bugfixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJb2KWzAAoJECebzXlCjuG+gcQP/3DldB86CFxgSFx0t+h+s+TV
 CdYJDPyLyRkEMiD+4dCPPuhueve+j5BPHVsDbn98FTWrEn131NMIs6uhU/VGTtAU
 6a8f/ExtZ5U7s39MJCzlk2ozVElBc3QPp7p3p9NKn0Wi0PXbVgjuIqR5o2vwa8Si
 KOVdLm6ylfav/HTH8DO6zFPJRsTgTwcJOivXXshjpglMKAcw8AuqSsGgBrDeGpgU
 u91Vi0EM1vt96+CA6a01mTgC/sFX7EqGvxUUHOrKWf5cIjnpT3FDvouYPxi+GH8Z
 SIDlaMQyXF5m4m6MhELNTP4v97XAHyPJtvLkEe5lggTyABPiA2heo9e8onysWkzV
 1v8OZHCVFa1UL34mDlnFxbFCYVr7FFKMGjTBR/ntinobPfAbWRCO1Hdd+bBGPDD4
 byf7ctDVp7KQ2bSatIdlYavikuGDHWFDZHzPHlqkD3gpIZSNvhe26sV3NZqIFlXO
 cMUega2Y5mXmULauHhxAcNGtDK7dF5hHoMWKJy0DNxiyDiDLylwDOIfwt1De3Q7V
 ycd/wUytUS2LkAhyS2mvoDK6eXTBAeQwzmXAqveh6rewwO83HC/t9mtKBBDomvKG
 xRpRPmmbj9ijbwkilEBmijjR47wrihmEVIFahznEerZ+//QOfVVOB0MNtzIyU9/k
 CnP1ZNvOs3LR1pxxwFa8
 =TTo0
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.20' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "Olga added support for the NFSv4.2 asynchronous copy protocol. We
  already supported COPY, by copying a limited amount of data and then
  returning a short result, letting the client resend. The asynchronous
  protocol should offer better performance at the expense of some
  complexity.

  The other highlight is Trond's work to convert the duplicate reply
  cache to a red-black tree, and to move it and some other server caches
  to RCU. (Previously these have meant taking global spinlocks on every
  RPC)

  Otherwise, some RDMA work and miscellaneous bugfixes"

* tag 'nfsd-4.20' of git://linux-nfs.org/~bfields/linux: (30 commits)
  lockd: fix access beyond unterminated strings in prints
  nfsd: Fix an Oops in free_session()
  nfsd: correctly decrement odstate refcount in error path
  svcrdma: Increase the default connection credit limit
  svcrdma: Remove try_module_get from backchannel
  svcrdma: Remove ->release_rqst call in bc reply handler
  svcrdma: Reduce max_send_sges
  nfsd: fix fall-through annotations
  knfsd: Improve lookup performance in the duplicate reply cache using an rbtree
  knfsd: Further simplify the cache lookup
  knfsd: Simplify NFS duplicate replay cache
  knfsd: Remove dead code from nfsd_cache_lookup
  SUNRPC: Simplify TCP receive code
  SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock
  SUNRPC: Remove non-RCU protected lookup
  NFS: Fix up a typo in nfs_dns_ent_put
  NFS: Lockless DNS lookups
  knfsd: Lockless lookup of NFSv4 identities.
  SUNRPC: Lockless server RPCSEC_GSS context lookup
  knfsd: Allow lockless lookups of the exports
  ...
2018-10-30 13:03:29 -07:00
Trond Myklebust
4c8e5537bb SUNRPC: Simplify TCP receive code
Use the fact that the iov iterators already have functionality for
skipping a base offset.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29 16:58:04 -04:00
David Howells
aa563d7bca iov_iter: Separate type from direction and use accessor functions
In the iov_iter struct, separate the iterator type from the iterator
direction and use accessor functions to access them in most places.

Convert a bunch of places to use switch-statements to access them rather
then chains of bitwise-AND statements.  This makes it easier to add further
iterator types.  Also, this can be more efficient as to implement a switch
of small contiguous integers, the compiler can use ~50% fewer compare
instructions than it has to use bitwise-and instructions.

Further, cease passing the iterator type into the iterator setup function.
The iterator function can set that itself.  Only the direction is required.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00
Trond Myklebust
75c84151a9 SUNRPC: Rename xprt->recv_lock to xprt->queue_lock
We will use the same lock to protect both the transmit and receive queues.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Linus Torvalds
5e4d659713 Chuck Lever did a bunch of work on nfsd tracepoints, on RDMA, and on
server xdr decoding (with an eye towards eliminating a data copy in the
 RDMA case).
 
 I did some refactoring of the delegation code in preparation for
 eliminating some delegation self-conflicts and implementing write
 delegations.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaxi5LAAoJECebzXlCjuG+deAQAL9NHsv6bIydkE6wX305c/bR
 gm73yryF1kfOuHmiLq15mrljiKyCEbRSPqzzM8k2TkywHhyvOLEjxHbhZnwDyQec
 DaZUzWLKNkK64UFXEvTKyNwfGObwsGQ+QLkV7N9mF3Ps9M9/u2vMHKQypvA9hJ7z
 DGN7MO7Ud7N0Viu03vp4m+p7gypoWGFj6Sh1QAkR/7TE/supcS+qqOWU4vLpYFhu
 /l2gJym59FWqHajwqs0Qu9LpHfsEx5HySZbj7GczbGRMka3y/AnjgnngcriP+63B
 ZcPpqSdD4Yeq1OJklU5Wicy+u54rFkA9VE1EArrC9RAEwav6iMhhhaESpRH2JFJE
 SO7cgUSCb2+65XgeSBfDygn+09PcN+eRF3sxXkQpHKsozQOH+qdyr7F4/ePunwi8
 Ah7pIkczRUrj7gMmlNOg97wpHbffO4YnpRESA934qf7MMHRQwDsEkl512kFAyadZ
 g1DI3iByfUpBQvRWJSLasyjyWUqRZDMmyO3yi3i/08sMI3XE1IOWpNkJAooNYC1X
 1FCDn1VXlTdcmC8yw6Da1L05PCVf25tSjpYQZ6r25KtrVi9iBOGV741vmyVMvDEw
 OwUuVatRd+AY+YJ6iUraNH4SlnHWZ/qKDFJECWbrG/4uUHyo8FIXGAgL8RgtbS8+
 fQeRjaDyhHD0XH7TJ10x
 =fJI1
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.17' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "Chuck Lever did a bunch of work on nfsd tracepoints, on RDMA, and on
  server xdr decoding (with an eye towards eliminating a data copy in
  the RDMA case).

  I did some refactoring of the delegation code in preparation for
  eliminating some delegation self-conflicts and implementing write
  delegations"

* tag 'nfsd-4.17' of git://linux-nfs.org/~bfields/linux: (40 commits)
  nfsd: fix incorrect umasks
  sunrpc: remove incorrect HMAC request initialization
  NFSD: Clean up legacy NFS SYMLINK argument XDR decoders
  NFSD: Clean up legacy NFS WRITE argument XDR decoders
  nfsd: Trace NFSv4 COMPOUND execution
  nfsd: Add I/O trace points in the NFSv4 read proc
  nfsd: Add I/O trace points in the NFSv4 write path
  nfsd: Add "nfsd_" to trace point names
  nfsd: Record request byte count, not count of vectors
  nfsd: Fix NFSD trace points
  svc: Report xprt dequeue latency
  sunrpc: Report per-RPC execution stats
  sunrpc: Re-purpose trace_svc_process
  sunrpc: Save remote presentation address in svc_xprt for trace events
  sunrpc: Simplify trace_svc_recv
  sunrpc: Simplify do_enqueue tracing
  sunrpc: Move trace_svc_xprt_dequeue()
  sunrpc: Update show_svc_xprt_flags() to include recently added flags
  svc: Simplify ->xpo_secure_port
  sunrpc: Remove unneeded pointer dereference
  ...
2018-04-05 19:15:29 -07:00
Chuck Lever
ece200ddd5 sunrpc: Save remote presentation address in svc_xprt for trace events
TP_printk defines a format string that is passed to user space for
converting raw trace event records to something human-readable.

My user space's printf (Oracle Linux 7), however, does not have a
%pI format specifier. The result is that what is supposed to be an
IP address in the output of "trace-cmd report" is just a string that
says the field couldn't be displayed.

To fix this, adopt the same approach as the client: maintain a pre-
formated presentation address for occasions when %pI is not
available.

The location of the trace_svc_send trace point is adjusted so that
rqst->rq_xprt is not NULL when the trace event is recorded.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:11 -04:00
Chuck Lever
989f881ebf svc: Simplify ->xpo_secure_port
Clean up: Instead of returning a value that is used to set or clear
a bit, just make ->xpo_secure_port mangle that bit, and return void.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:09 -04:00
Denys Vlasenko
9b2c45d479 net: make getname() functions return length rather than use int* parameter
Changes since v1:
Added changes in these files:
    drivers/infiniband/hw/usnic/usnic_transport.c
    drivers/staging/lustre/lnet/lnet/lib-socket.c
    drivers/target/iscsi/iscsi_target_login.c
    drivers/vhost/net.c
    fs/dlm/lowcomms.c
    fs/ocfs2/cluster/tcp.c
    security/tomoyo/network.c

Before:
All these functions either return a negative error indicator,
or store length of sockaddr into "int *socklen" parameter
and return zero on success.

"int *socklen" parameter is awkward. For example, if caller does not
care, it still needs to provide on-stack storage for the value
it does not need.

None of the many FOO_getname() functions of various protocols
ever used old value of *socklen. They always just overwrite it.

This change drops this parameter, and makes all these functions, on success,
return length of sockaddr. It's always >= 0 and can be differentiated
from an error.

Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.

rpc_sockname() lost "int buflen" parameter, since its only use was
to be passed to kernel_getsockname() as &buflen and subsequently
not used in any way.

Userspace API is not changed.

    text    data     bss      dec     hex filename
30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
30108109 2633612  873672 33615393 200ee21 vmlinux.o

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: David S. Miller <davem@davemloft.net>
CC: linux-kernel@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-bluetooth@vger.kernel.org
CC: linux-decnet-user@lists.sourceforge.net
CC: linux-wireless@vger.kernel.org
CC: linux-rdma@vger.kernel.org
CC: linux-sctp@vger.kernel.org
CC: linux-nfs@vger.kernel.org
CC: linux-x25@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-12 14:15:04 -05:00
Linus Torvalds
f1517df870 This request is late, apologies.
But it's also a fairly small update this time around.  Some cleanup,
 RDMA fixes, overlayfs fixes, and a fix for an NFSv4 state bug.
 
 The bigger deal for nfsd this time around is Jeff Layton's
 already-merged i_version patches.  This series has a minor conflict with
 that one, and the resolution should be obvious.  (Stephen Rothwell has
 been carrying it in linux-next for what it's worth.)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJafNVvAAoJECebzXlCjuG+yZUP/2SctFtkW638z9frLcIVt5M6
 x5hluw5jtFrVqq/KoMwi7rVaMzhdvcgwwfaLciqrPCOmcMKlOqiWslyCV0wZVCZS
 jabkOeinKVAyPTlESesNyArWKBWaB8QaYDwbkQ5Y76U9Ma5gwSghS1wc8vrNduZY
 2StieESOiOs9LljXf5SqCC5nN9s7gs4qtCK7aZ3JIt4661Lh39LqyO5zxLnc78eL
 USnJKHjTSreY2Vd1/TdNWyZhiim43wdrB+jpy6IoocTqyhYalkCz1iYdJn1arqtP
 iIddPpczKxkHekFVj7/Kfa+ATFtdXIpivOBhhOT0oY8HukTd58bh/oUMrFt4BSuP
 MQst0R9h1sanBE18XBPlXuIK51sm3AjjOGaQycl/Mzes+dMRgIP/KspAcnwwXHqG
 gyZsF3VzliFTc9s0SyiAz2AxNTUnjd+LV3E0DUeivURa6V3pc+sFlQzi8PRxRaep
 0gmhYcZsfwdDKZ/kbQyQdSWN48NxOLFke4fYjmoUtoyILa0NAHEqafeJkR5EiRTm
 tZsL9H/3THEGWygYlXGGBo/J4w5jE3uL/8KkfeuZefzSo0Ujqu0pBALMTnGFLKRx
 Mpw7JEqfUwqIVZ0Qh6q9yIcjr89qWv96UpBqRRIkFX5zOPN7B1BH8C89g8qy3Hyt
 gm/5BTw4FPE0uAM9Nhsd
 =icEX
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.16' of git://linux-nfs.org/~bfields/linux

Pull nfsd update from Bruce Fields:
 "A fairly small update this time around. Some cleanup, RDMA fixes,
  overlayfs fixes, and a fix for an NFSv4 state bug.

  The bigger deal for nfsd this time around was Jeff Layton's
  already-merged i_version patches"

* tag 'nfsd-4.16' of git://linux-nfs.org/~bfields/linux:
  svcrdma: Fix Read chunk round-up
  NFSD: hide unused svcxdr_dupstr()
  nfsd: store stat times in fill_pre_wcc() instead of inode times
  nfsd: encode stat->mtime for getattr instead of inode->i_mtime
  nfsd: return RESOURCE not GARBAGE_ARGS on too many ops
  nfsd4: don't set lock stateid's sc_type to CLOSED
  nfsd: Detect unhashed stids in nfsd4_verify_open_stid()
  sunrpc: remove dead code in svc_sock_setbufsize
  svcrdma: Post Receives in the Receive completion handler
  nfsd4: permit layoutget of executable-only files
  lockd: convert nlm_rqst.a_count from atomic_t to refcount_t
  lockd: convert nlm_lockowner.count from atomic_t to refcount_t
  lockd: convert nsm_handle.sm_count from atomic_t to refcount_t
2018-02-08 15:18:32 -08:00
Christoph Hellwig
d0945caafc sunrpc: remove dead code in svc_sock_setbufsize
Setting values in struct sock directly is the usual method.  Remove
the long dead code using set_fs() and the related comment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-02-05 17:13:16 -05:00
Al Viro
1e4ed1b7ae svc_recvfrom(): switch to sock_recvmsg()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-12-02 20:37:40 -05:00
Linus Torvalds
8e7757d83d NFS client updates for Linux 4.14
Hightlights include:
 
 Stable bugfixes:
 - Fix mirror allocation in the writeback code to avoid a use after free
 - Fix the O_DSYNC writes to use the correct byte range
 - Fix 2 use after free issues in the I/O code
 
 Features:
 - Writeback fixes to split up the inode->i_lock in order to reduce contention
 - RPC client receive fixes to reduce the amount of time the
   xprt->transport_lock is held when receiving data from a socket into am
   XDR buffer.
 - Ditto fixes to reduce contention between call side users of the rdma
   rb_lock, and its use in rpcrdma_reply_handler.
 - Re-arrange rdma stats to reduce false cacheline sharing.
 - Various rdma cleanups and optimisations.
 - Refactor the NFSv4.1 exchange id code and clean up the code.
 - Const-ify all instances of struct rpc_xprt_ops
 
 Bugfixes:
 - Fix the NFSv2 'sec=' mount option.
 - NFSv4.1: don't use machine credentials for CLOSE when using 'sec=sys'
 - Fix the NFSv3 GRANT callback when the port changes on the server.
 - Fix livelock issues with COMMIT
 - NFSv4: Use correct inode in _nfs4_opendata_to_nfs4_state() when doing
   and NFSv4.1 open by filehandle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZtbvIAAoJEGcL54qWCgDy/boP/jRuVk6B2VyhWnJkOgdQzIN3
 Q8PIR0oxkywH2MI7c9/G2k5b/HD9BK2iQrXzIoPxRuPrckKLwzqYclzG8PR4Niyg
 D3CCzrvGcEXZrv/nHQ+HDMD0ZuUyXFqhrYeyQwNSJ9p/oP0gaxnYwteennfJVa99
 mv6+LdoY+lzVYJI1gmMHVF2zOhN+rTe7xUVnjYnsVCpwMvL+u992oZl3qQJRFG6b
 HlXOy7h5JRFyue61P20PSgh9D1JUWWYD/V0EG+7cIvByAg5KxhvVgjqSsTTT7FXe
 Omn4fTv1MFzk8er9qYFRjpM2IoIdAejFMqX3/PxQVr2qOFNmHYrq+WsdWNQEr/Wu
 WREJu5Ac1Hboe2/scA+DtuVPFePPPyrolhwk533aNWrdDywg01e0XqBEDKR/atJd
 u5lvW20UfLQuCFLOpaxDpq2ngQSOg6t96N36tsydG0SAVpiydOPMLqkQi7Nb3aoB
 79xGpmtnijP5T6jnOI2/nexM08OMTI0BhMbXJC5v1+lnxIJKcKdnGlTM4UJyxUMq
 /3dFI4IQZLfkMEjIvZFoi+nKWx3DYhiUhkKhbBYwtB4P4q8Z2qKTPHFxORz9griZ
 Pa+8BPuDuodIWuDD97q1Dnw2NWjQim8Rx/ce4c8FHGzwMJLPkcVqk+guGsub5IdO
 7qF7Vvv02gJ48TAqTBDf
 =1Ssl
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Hightlights include:

  Stable bugfixes:
   - Fix mirror allocation in the writeback code to avoid a use after
     free
   - Fix the O_DSYNC writes to use the correct byte range
   - Fix 2 use after free issues in the I/O code

  Features:
   - Writeback fixes to split up the inode->i_lock in order to reduce
     contention
   - RPC client receive fixes to reduce the amount of time the
     xprt->transport_lock is held when receiving data from a socket into
     am XDR buffer.
   - Ditto fixes to reduce contention between call side users of the
     rdma rb_lock, and its use in rpcrdma_reply_handler.
   - Re-arrange rdma stats to reduce false cacheline sharing.
   - Various rdma cleanups and optimisations.
   - Refactor the NFSv4.1 exchange id code and clean up the code.
   - Const-ify all instances of struct rpc_xprt_ops

  Bugfixes:
   - Fix the NFSv2 'sec=' mount option.
   - NFSv4.1: don't use machine credentials for CLOSE when using
     'sec=sys'
   - Fix the NFSv3 GRANT callback when the port changes on the server.
   - Fix livelock issues with COMMIT
   - NFSv4: Use correct inode in _nfs4_opendata_to_nfs4_state() when
     doing and NFSv4.1 open by filehandle"

* tag 'nfs-for-4.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (69 commits)
  NFS: Count the bytes of skipped subrequests in nfs_lock_and_join_requests()
  NFS: Don't hold the group lock when calling nfs_release_request()
  NFS: Remove pnfs_generic_transfer_commit_list()
  NFS: nfs_lock_and_join_requests and nfs_scan_commit_list can deadlock
  NFS: Fix 2 use after free issues in the I/O code
  NFS: Sync the correct byte range during synchronous writes
  lockd: Delete an error message for a failed memory allocation in reclaimer()
  NFS: remove jiffies field from access cache
  NFS: flush data when locking a file to ensure cache coherence for mmap.
  SUNRPC: remove some dead code.
  NFS: don't expect errors from mempool_alloc().
  xprtrdma: Use xprt_pin_rqst in rpcrdma_reply_handler
  xprtrdma: Re-arrange struct rx_stats
  NFS: Fix NFSv2 security settings
  NFSv4.1: don't use machine credentials for CLOSE when using 'sec=sys'
  SUNRPC: ECONNREFUSED should cause a rebind.
  NFS: Remove unused parameter gfp_flags from nfs_pageio_init()
  NFSv4: Fix up mirror allocation
  SUNRPC: Add a separate spinlock to protect the RPC request receive list
  SUNRPC: Cleanup xs_tcp_read_common()
  ...
2017-09-11 22:01:44 -07:00
J. Bruce Fields
0828170f3d merge nfsd 4.13 bugfixes into nfsd for-4.14 branch 2017-09-05 15:11:47 -04:00
Chuck Lever
2412e92760 sunrpc: Const-ify instances of struct svc_xprt_ops
Close an attack vector by moving the arrays of server-side transport
methods to read-only memory.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-08-24 22:13:50 -04:00
Vadim Lomovtsev
eebe53e87f net: sunrpc: svcsock: fix NULL-pointer exception
While running nfs/connectathon tests kernel NULL-pointer exception
has been observed due to races in svcsock.c.

Race is appear when kernel accepts connection by kernel_accept
(which creates new socket) and start queuing ingress packets
to new socket. This happens in ksoftirq context which could run
concurrently on a different core while new socket setup is not done yet.

The fix is to re-order socket user data init sequence and add
write/read barrier calls to be sure that we got proper values
for callback pointers before actually calling them.

Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.

Crash log:
---<-snip->---
[ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 6708.647093] pgd = ffff0000094e0000
[ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
[ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
[ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
[ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
[ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
[ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
[ 6708.773822] PC is at 0x0
[ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
[ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
[ 6708.789248] sp : ffff810ffbad3900
[ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
[ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
[ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
[ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
[ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
[ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
[ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
[ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
[ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
[ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
[ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
[ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
[ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
[ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
[ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
[ 6708.872084]
[ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
[ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
[ 6708.886075] Call trace:
[ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
[ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
[ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
[ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
[ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
[ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
[ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
[ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
[ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
[ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
[ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
[ 6708.973117] [<          (null)>]           (null)
[ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
[ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
[ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
[ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
[ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
[ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
[ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
[ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
[ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
[ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
[ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
[ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
[ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
[ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
[ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
[ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
[ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
[ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
[ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
[ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
[ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
[ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
[ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
[ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
[ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
[ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
[ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
[ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
[ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
[ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
[ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
[ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
[ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
[ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
[ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
[ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
[ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
[ 6709.207967] Code: bad PC value
[ 6709.211061] SMP: stopping secondary CPUs
[ 6709.218830] Starting crashdump kernel...
[ 6709.222749] Bye!
---<-snip>---

Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-08-24 18:11:28 -04:00
Trond Myklebust
ce7c252a8c SUNRPC: Add a separate spinlock to protect the RPC request receive list
This further reduces contention with the transport_lock, and allows us
to convert to using a non-bh-safe spinlock, since the list is now never
accessed from a bh context.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-18 14:45:04 -04:00
Kinglong Mee
5427290d64 SUNRPC/backchanel: set XPT_CONG_CTRL flag for bc xprt
The xprt for backchannel is created separately, not in TCP/UDP code.  It
needs the XPT_CONG_CTRL flag set on it too--otherwise requests on the
NFSv4.1 backchannel are rjected in svc_process_common():

1191         if (versp->vs_need_cong_ctrl &&
1192             !test_bit(XPT_CONG_CTRL, &rqstp->rq_xprt->xpt_flags))
1193                 goto err_bad_vers;

Fixes: 5283b03ee5 ("nfs/nfsd/sunrpc: enforce transport...")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-03-09 15:20:46 -05:00
Linus Torvalds
8313064c2e The nfsd update this round is mainly a lot of miscellaneous cleanups and
bugfixes.
 
 A couple changes could theoretically break working setups on upgrade.  I
 don't expect complaints in practice, but they seem worth calling out
 just in case:
 
 	- NFS security labels are now off by default; a new
 	  security_label export flag reenables it per export.  But,
 	  having them on by default is a disaster, as it generally only
 	  makes sense if all your clients and servers have similar
 	  enough selinux policies.  Thanks to Jason Tibbitts for
 	  pointing this out.
 
 	- NFSv4/UDP support is off.  It was never really supported, and
 	  the spec explicitly forbids it.  We only ever left it on out
 	  of laziness; thanks to Jeff Layton for finally fixing that.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYtejbAAoJECebzXlCjuG+JhEQAK3YTYYNrPY26Pfiu0FghLFV
 4qOHK4DOkJzrWIom5uWyBo7yOwH6WnQtTe/gCx/voOEW3lsJO7F3IfTnTVp+Smp6
 GJeVtsr1vI9EBnwhMlyoJ5hZ2Ju5kX3MBVnew6+momt6620ZO7a+EtT+74ePaY8Y
 jxLzWVA1UqbWYoMabNQpqgKypKvNrhwst72iYyBhNuL/qtGeBDQWwcrA+TFeE9tv
 Ad7qB53xL1mr0Wn1CNIOR/IzVAj4o2H0vdjqrPjAdvbfwf8YYLNpJXt5k591wx/j
 1TpiWIPqnwLjMT3X5NkQN3agZKeD+2ZWPrClr35TgRRe62CK6JblK9/Wwc5BRSzV
 paMP3hOm6/dQOBA5C+mqPaHdEI8VqcHyZpxU4VC/ttsVEGTgaLhGwHGzSn+5lYiM
 Qx9Sh50yFV3oiBW/sb/y8lBDwYm/Cq0OyqAU277idbdzjcFerMg1qt06tjEQhYMY
 K2V7rS8NuADUF6F1BwOONZzvg7Rr7iWHmLh+iSM9TeQoEmz2jIHSaLIFaYOET5Jr
 PIZS3rOYoa0FaKOYVYnZMC74n/LqP/Aou8B+1rRcLy5YEdUIIIVpFvqpg1nGv6PI
 sA3zx/f13IRte3g0CuQiY0/2cx7uXk/gXJ7s5+ejEzljF/aYWiomx3mr6HqPQETn
 CWEtXlfyJCyX+A8hbO+U
 =iLLz
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.11' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "The nfsd update this round is mainly a lot of miscellaneous cleanups
  and bugfixes.

  A couple changes could theoretically break working setups on upgrade.
  I don't expect complaints in practice, but they seem worth calling out
  just in case:

   - NFS security labels are now off by default; a new security_label
     export flag reenables it per export. But, having them on by default
     is a disaster, as it generally only makes sense if all your clients
     and servers have similar enough selinux policies. Thanks to Jason
     Tibbitts for pointing this out.

   - NFSv4/UDP support is off. It was never really supported, and the
     spec explicitly forbids it. We only ever left it on out of
     laziness; thanks to Jeff Layton for finally fixing that"

* tag 'nfsd-4.11' of git://linux-nfs.org/~bfields/linux: (34 commits)
  nfsd: Fix display of the version string
  nfsd: fix configuration of supported minor versions
  sunrpc: don't register UDP port with rpcbind when version needs congestion control
  nfs/nfsd/sunrpc: enforce transport requirements for NFSv4
  sunrpc: flag transports as having congestion control
  sunrpc: turn bitfield flags in svc_version into bools
  nfsd: remove superfluous KERN_INFO
  nfsd: special case truncates some more
  nfsd: minor nfsd_setattr cleanup
  NFSD: Reserve adequate space for LOCKT operation
  NFSD: Get response size before operation for all RPCs
  nfsd/callback: Drop a useless data copy when comparing sessionid
  nfsd/callback: skip the callback tag
  nfsd/callback: Cleanup callback cred on shutdown
  nfsd/idmap: return nfserr_inval for 0-length names
  SUNRPC/Cache: Always treat the invalid cache as unexpired
  SUNRPC: Drop all entries from cache_detail when cache_purge()
  svcrdma: Poll CQs in "workqueue" mode
  svcrdma: Combine list fields in struct svc_rdma_op_ctxt
  svcrdma: Remove unused sc_dto_q field
  ...
2017-02-28 15:39:09 -08:00
Alexey Dobriyan
5b5e0928f7 lib/vsprintf.c: remove %Z support
Now that %z is standartised in C99 there is no reason to support %Z.
Unlike %L it doesn't even make format strings smaller.

Use BUILD_BUG_ON in a couple ATM drivers.

In case anyone didn't notice lib/vsprintf.o is about half of SLUB which
is in my opinion is quite an achievement.  Hopefully this patch inspires
someone else to trim vsprintf.c more.

Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:47 -08:00
Jeff Layton
362142b258 sunrpc: flag transports as having congestion control
NFSv4 requires a transport protocol with congestion control in most
cases.

On an IP network, that means that NFSv4 over UDP should be forbidden.

The situation with RDMA is a bit more nuanced, but most RDMA transports
are suitable for this. For now, we assume that all RDMA transports are
suitable, but we may need to revise that at some point.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-24 16:55:46 -05:00
Thomas Gleixner
2456e85535 ktime: Get rid of the union
ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.

Get rid of the union and just keep ktime_t as simple typedef of type s64.

The conversion was done with coccinelle and some manual mopping up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25 17:21:22 +01:00
Linus Torvalds
7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
David S. Miller
f9aa9dc7d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
All conflicts were simple overlapping changes except perhaps
for the Thunder driver.

That driver has a change_mtu method explicitly for sending
a message to the hardware.  If that fails it returns an
error.

Normally a driver doesn't need an ndo_change_mtu method becuase those
are usually just range changes, which are now handled generically.
But since this extra operation is needed in the Thunder driver, it has
to stay.

However, if the message send fails we have to restore the original
MTU before the change because the entire call chain expects that if
an error is thrown by ndo_change_mtu then the MTU did not change.
Therefore code is added to nicvf_change_mtu to remember the original
MTU, and to restore it upon nicvf_update_hw_max_frs() failue.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22 13:27:16 -05:00
Scott Mayhew
ea08e39230 sunrpc: svc_age_temp_xprts_now should not call setsockopt non-tcp transports
This fixes the following panic that can occur with NFSoRDMA.

general protection fault: 0000 [#1] SMP
Modules linked in: rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi
scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp
scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm
mlx5_ib ib_core intel_powerclamp coretemp kvm_intel kvm sg ioatdma
ipmi_devintf ipmi_ssif dcdbas iTCO_wdt iTCO_vendor_support pcspkr
irqbypass sb_edac shpchp dca crc32_pclmul ghash_clmulni_intel edac_core
lpc_ich aesni_intel lrw gf128mul glue_helper ablk_helper mei_me mei
ipmi_si cryptd wmi ipmi_msghandler acpi_pad acpi_power_meter nfsd
auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod
crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper
syscopyarea sysfillrect sysimgblt ahci fb_sys_fops ttm libahci mlx5_core
tg3 crct10dif_pclmul drm crct10dif_common
ptp i2c_core libata crc32c_intel pps_core fjes dm_mirror dm_region_hash
dm_log dm_mod
CPU: 1 PID: 120 Comm: kworker/1:1 Not tainted 3.10.0-514.el7.x86_64 #1
Hardware name: Dell Inc. PowerEdge R320/0KM5PX, BIOS 2.4.2 01/29/2015
Workqueue: events check_lifetime
task: ffff88031f506dd0 ti: ffff88031f584000 task.ti: ffff88031f584000
RIP: 0010:[<ffffffff8168d847>]  [<ffffffff8168d847>]
_raw_spin_lock_bh+0x17/0x50
RSP: 0018:ffff88031f587ba8  EFLAGS: 00010206
RAX: 0000000000020000 RBX: 20041fac02080072 RCX: ffff88031f587fd8
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 20041fac02080072
RBP: ffff88031f587bb0 R08: 0000000000000008 R09: ffffffff8155be77
R10: ffff880322a59b00 R11: ffffea000bf39f00 R12: 20041fac02080072
R13: 000000000000000d R14: ffff8800c4fbd800 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff880322a40000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3c52d4547e CR3: 00000000019ba000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
20041fac02080002 ffff88031f587bd0 ffffffff81557830 20041fac02080002
ffff88031f587c78 ffff88031f587c40 ffffffff8155ae08 000000010157df32
0000000800000001 ffff88031f587c20 ffffffff81096acb ffffffff81aa37d0
Call Trace:
[<ffffffff81557830>] lock_sock_nested+0x20/0x50
[<ffffffff8155ae08>] sock_setsockopt+0x78/0x940
[<ffffffff81096acb>] ? lock_timer_base.isra.33+0x2b/0x50
[<ffffffff8155397d>] kernel_setsockopt+0x4d/0x50
[<ffffffffa0386284>] svc_age_temp_xprts_now+0x174/0x1e0 [sunrpc]
[<ffffffffa03b681d>] nfsd_inetaddr_event+0x9d/0xd0 [nfsd]
[<ffffffff81691ebc>] notifier_call_chain+0x4c/0x70
[<ffffffff810b687d>] __blocking_notifier_call_chain+0x4d/0x70
[<ffffffff810b68b6>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff815e8538>] __inet_del_ifa+0x168/0x2d0
[<ffffffff815e8cef>] check_lifetime+0x25f/0x270
[<ffffffff810a7f3b>] process_one_work+0x17b/0x470
[<ffffffff810a8d76>] worker_thread+0x126/0x410
[<ffffffff810a8c50>] ? rescuer_thread+0x460/0x460
[<ffffffff810b052f>] kthread+0xcf/0xe0
[<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
[<ffffffff81696418>] ret_from_fork+0x58/0x90
[<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
Code: ca 75 f1 5d c3 0f 1f 80 00 00 00 00 eb d9 66 0f 1f 44 00 00 0f 1f
44 00 00 55 48 89 e5 53 48 89 fb e8 7e 04 a0 ff b8 00 00 02 00 <f0> 0f
c1 03 89 c2 c1 ea 10 66 39 c2 75 03 5b 5d c3 83 e2 fe 0f
RIP  [<ffffffff8168d847>] _raw_spin_lock_bh+0x17/0x50
RSP <ffff88031f587ba8>

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Fixes: c3d4879e ("sunrpc: Add a function to close temporary transports immediately")
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-11-14 10:30:58 -05:00
Paolo Abeni
7c13f97ffd udp: do fwd memory scheduling on dequeue
A new argument is added to __skb_recv_datagram to provide
an explicit skb destructor, invoked under the receive queue
lock.
The UDP protocol uses such argument to perform memory
reclaiming on dequeue, so that the UDP protocol does not
set anymore skb->desctructor.
Instead explicit memory reclaiming is performed at close() time and
when skbs are removed from the receive queue.
The in kernel UDP protocol users now need to call a
skb_recv_udp() variant instead of skb_recv_datagram() to
properly perform memory accounting on dequeue.

Overall, this allows acquiring only once the receive queue
lock on dequeue.

Tested using pktgen with random src port, 64 bytes packet,
wire-speed on a 10G link as sender and udp_sink as the receiver,
using an l4 tuple rxhash to stress the contention, and one or more
udp_sink instances with reuseport.

nr sinks	vanilla		patched
1		440		560
3		2150		2300
6		3650		3800
9		4450		4600
12		6250		6450

v1 -> v2:
 - do rmem and allocated memory scheduling under the receive lock
 - do bulk scheduling in first_packet_length() and in udp_destruct_sock()
 - avoid the typdef for the dequeue callback

Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 13:24:41 -05:00
Paolo Abeni
850cbaddb5 udp: use it's own memory accounting schema
Completely avoid default sock memory accounting and replace it
with udp-specific accounting.

Since the new memory accounting model encapsulates completely
the required locking, remove the socket lock on both enqueue and
dequeue, and avoid using the backlog on enqueue.

Be sure to clean-up rx queue memory on socket destruction, using
udp its own sk_destruct.

Tested using pktgen with random src port, 64 bytes packet,
wire-speed on a 10G link as sender and udp_sink as the receiver,
using an l4 tuple rxhash to stress the contention, and one or more
udp_sink instances with reuseport.

nr readers      Kpps (vanilla)  Kpps (patched)
1               170             440
3               1250            2150
6               3000            3650
9               4200            4450
12              5700            6250

v4 -> v5:
  - avoid unneeded test in first_packet_length

v3 -> v4:
  - remove useless sk_rcvqueues_full() call

v2 -> v3:
  - do not set the now unsed backlog_rcv callback

v1 -> v2:
  - add memory pressure support
  - fixed dropwatch accounting for ipv6

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22 17:05:05 -04:00
Trond Myklebust
c7995f8a70 SUNRPC: Detect immediate closure of accepted sockets
This modification is useful for debugging issues that happen while
the socket is being initialised.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-01 17:53:42 -04:00
Trond Myklebust
b2f21f7d85 SUNRPC: accept() may return sockets that are still in SYN_RECV
We're seeing traces of the following form:

 [10952.396347] svc: transport ffff88042ba4a 000 dequeued, inuse=2
 [10952.396351] svc: tcp_accept ffff88042ba4 a000 sock ffff88042a6e4c80
 [10952.396362] nfsd: connect from 10.2.6.1, port=187
 [10952.396364] svc: svc_setup_socket ffff8800b99bcf00
 [10952.396368] setting up TCP socket for reading
 [10952.396370] svc: svc_setup_socket created ffff8803eb10a000 (inet ffff88042b75b800)
 [10952.396373] svc: transport ffff8803eb10a000 put into queue
 [10952.396375] svc: transport ffff88042ba4a000 put into queue
 [10952.396377] svc: server ffff8800bb0ec000 waiting for data (to = 3600000)
 [10952.396380] svc: transport ffff8803eb10a000 dequeued, inuse=2
 [10952.396381] svc_recv: found XPT_CLOSE
 [10952.396397] svc: svc_delete_xprt(ffff8803eb10a000)
 [10952.396398] svc: svc_tcp_sock_detach(ffff8803eb10a000)
 [10952.396399] svc: svc_sock_detach(ffff8803eb10a000)
 [10952.396412] svc: svc_sock_free(ffff8803eb10a000)

i.e. an immediate close of the socket after initialisation.

The culprit appears to be the test at the end of svc_tcp_init, which
checks if the newly created socket is in the TCP_ESTABLISHED state,
and immediately closes it if not. The evidence appears to suggest that
the socket might still be in the SYN_RECV state at this time.

The fix is to check for both states, and then to add a check in
svc_tcp_state_change() to ensure we don't close the socket when
it transitions into TCP_ESTABLISHED.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-01 17:53:41 -04:00
Trond Myklebust
637600f3ff SUNRPC: Change TCP socket space reservation
The current server rpc tcp code attempts to predict how much writeable
socket space will be available to a given RPC call before accepting it
for processing.  On a 40GigE network, we've found this throttles
individual clients long before the network or disk is saturated.  The
server may handle more clients easily, but the bandwidth of individual
clients is still artificially limited.

Instead of trying (and failing) to predict how much writeable socket space
will be available to the RPC call, just fall back to the simple model of
deferring processing until the socket is uncongested.

This may increase the risk of fast clients starving slower clients; in
such cases, the previous patch allows setting a hard per-connection
limit.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-13 15:53:49 -04:00
Trond Myklebust
4720b0703a SUNRPC: Micro optimisation for svc_data_ready
Don't call svc_xprt_enqueue() if the XPT_DATA flag is already set.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-13 15:53:46 -04:00
Trond Myklebust
fa9251afc3 SUNRPC: Call the default socket callbacks instead of open coding
Rather than code up our own versions of the socket callbacks, just
call the defaults.
This also allows us to merge svc_udp_data_ready() and svc_tcp_data_ready().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-13 15:53:45 -04:00
Trond Myklebust
069c225b88 SUNRPC: lock the socket while detaching it
Prevent callbacks from triggering while we're detaching the socket.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-13 15:53:44 -04:00
Hannes Frederic Sowa
fafc4e1ea1 sock: tigthen lockdep checks for sock_owned_by_user
sock_owned_by_user should not be used without socket lock held. It seems
to be a common practice to check .owned before lock reclassification, so
provide a little help to abstract this check away.

Cc: linux-cifs@vger.kernel.org
Cc: linux-bluetooth@vger.kernel.org
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-13 22:37:20 -04:00
Willem de Bruijn
1da8c681d5 sunrpc: do not pull udp headers on receive
Commit e6afc8ace6 modified the udp receive path by pulling the udp
header before queuing an skbuff onto the receive queue.

Sunrpc also calls skb_recv_datagram to dequeue an skb from a udp
socket. Modify this receive path to also no longer expect udp
headers.

Fixes: e6afc8ace6 ("udp: remove headers from UDP packets before queueing")

Reported-by: Franklin S Cooper Jr. <fcooper@ti.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Tested-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-11 15:31:33 -04:00
J. Bruce Fields
0442f14b15 svcrpc: document lack of some memory barriers
We're missing memory barriers in net/sunrpc/svcsock.c in some spots we'd
expect them.  But it doesn't appear they're necessary in our case, and
this is likely a hot path--for now just document the odd behavior.

Kosuke Tatsukawa found this issue while looking through the linux source
code for places calling waitqueue_active() before wake_up*(), but
without preceding memory barriers, after sending a patch to fix a
similar issue in drivers/tty/n_tty.c  (Details about the original issue
can be found here: https://lkml.org/lkml/2015/9/28/849).

Reported-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-10 17:02:47 -05:00
Stefan Hajnoczi
ea833f5de3 SUNRPC: drop stale comment in svc_setup_socket()
The svc_setup_socket() function does set the send and receive buffer
sizes, so the comment is out-of-date:

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-10 09:25:50 -05:00
Trond Myklebust
226453d8cf SUNRPC: Use MSG_SENDPAGE_NOTLAST when calling sendpage()
If we're sending more pages via kernel_sendpage(), then set
MSG_SENDPAGE_NOTLAST.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-10-23 15:57:27 -04:00
Al Viro
d8725c86ae get rid of the size argument of sock_sendmsg()
it's equal to iov_iter_count(&msg->msg_iter) in all cases

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 15:27:37 -04:00
Jeff Layton
7501cc2bcf sunrpc: move rq_local field to rq_flags
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09 11:21:21 -05:00
Trond Myklebust
093a1468b6 SUNRPC: Fix locking around callback channel reply receive
Both xprt_lookup_rqst() and xprt_complete_rqst() require that you
take the transport lock in order to avoid races with xprt_transmit().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-11-19 12:03:20 -05:00
Chuck Lever
71efecb3f5 sunrpc: fix byte-swapping of displayed XID
xprt_lookup_rqst() and bc_send_request() display a byte-swapped XID,
but receive_cb_reply() does not.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-28 16:00:07 -04:00
Trond Myklebust
f8d1ff47b6 SUNRPC: Optimise away svc_recv_available
We really do not want to do ioctls in the server's fast path. Instead, let's
use the fact that we managed to read a full record as the indicator that
we should try to read the socket again.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-17 12:00:11 -04:00
Trond Myklebust
518776800c SUNRPC: Allow svc_reserve() to notify TCP socket that space has been freed
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-29 16:10:20 -04:00
Trond Myklebust
c7fb3f0631 SUNRPC: svc_tcp_write_space: don't clear SOCK_NOSPACE prematurely
If requests are queued in the socket inbuffer waiting for an
svc_tcp_has_wspace() requirement to be satisfied, then we do not want
to clear the SOCK_NOSPACE flag until we've satisfied that requirement.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-29 16:10:19 -04:00
Chuck Lever
3c45ddf823 svcrdma: Select NFSv4.1 backchannel transport based on forward channel
The current code always selects XPRT_TRANSPORT_BC_TCP for the back
channel, even when the forward channel was not TCP (eg, RDMA). When
a 4.1 mount is attempted with RDMA, the server panics in the TCP BC
code when trying to send CB_NULL.

Instead, construct the transport protocol number from the forward
channel transport or'd with XPRT_TRANSPORT_BC. Transports that do
not support bi-directional RPC will not have registered a "BC"
transport, causing create_backchannel_client() to fail immediately.

Fixes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=265
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-18 11:35:45 -04:00
Kinglong Mee
a48fd0f9f7 SUNRPC/NFSD: Remove using of dprintk with KERN_WARNING
When debugging, rpc prints messages from dprintk(KERN_WARNING ...)
with "^A4" prefixed,

[ 2780.339988] ^A4nfsd: connect from unprivileged port: 127.0.0.1, port=35316

Trond tells,
> dprintk != printk. We have NEVER supported dprintk(KERN_WARNING...)

This patch removes using of dprintk with KERN_WARNING.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 20:25:28 -04:00
NeilBrown
ef11ce2487 SUNRPC: track whether a request is coming from a loop-back interface.
If an incoming NFS request is coming from the local host, then
nfsd will need to perform some special handling.  So detect that
possibility and make the source visible in rq_local.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-22 15:59:18 -04:00
Chuck Lever
16e4d93f6d NFSD: Ignore client's source port on RDMA transports
An NFS/RDMA client's source port is meaningless for RDMA transports.
The transport layer typically sets the source port value on the
connection to a random ephemeral port.

Currently, NFS server administrators must specify the "insecure"
export option to enable clients to access exports via RDMA.

But this means NFS clients can access such an export via IP using an
ephemeral port, which may not be desirable.

This patch eliminates the need to specify the "insecure" export
option to allow NFS/RDMA clients access to an export.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=250
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-22 15:55:48 -04:00
Linus Torvalds
454fd351f2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull yet more networking updates from David Miller:

 1) Various fixes to the new Redpine Signals wireless driver, from
    Fariya Fatima.

 2) L2TP PPP connect code takes PMTU from the wrong socket, fix from
    Dmitry Petukhov.

 3) UFO and TSO packets differ in whether they include the protocol
    header in gso_size, account for that in skb_gso_transport_seglen().
   From Florian Westphal.

 4) If VLAN untagging fails, we double free the SKB in the bridging
    output path.  From Toshiaki Makita.

 5) Several call sites of sk->sk_data_ready() were referencing an SKB
    just added to the socket receive queue in order to calculate the
    second argument via skb->len.  This is dangerous because the moment
    the skb is added to the receive queue it can be consumed in another
    context and freed up.

    It turns out also that none of the sk->sk_data_ready()
    implementations even care about this second argument.

    So just kill it off and thus fix all these use-after-free bugs as a
    side effect.

 6) Fix inverted test in tcp_v6_send_response(), from Lorenzo Colitti.

 7) pktgen needs to do locking properly for LLTX devices, from Daniel
    Borkmann.

 8) xen-netfront driver initializes TX array entries in RX loop :-) From
    Vincenzo Maffione.

 9) After refactoring, some tunnel drivers allow a tunnel to be
    configured on top itself.  Fix from Nicolas Dichtel.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
  vti: don't allow to add the same tunnel twice
  gre: don't allow to add the same tunnel twice
  drivers: net: xen-netfront: fix array initialization bug
  pktgen: be friendly to LLTX devices
  r8152: check RTL8152_UNPLUG
  net: sun4i-emac: add promiscuous support
  net/apne: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
  net: ipv6: Fix oif in TCP SYN+ACK route lookup.
  drivers: net: cpsw: enable interrupts after napi enable and clearing previous interrupts
  drivers: net: cpsw: discard all packets received when interface is down
  net: Fix use after free by removing length arg from sk_data_ready callbacks.
  Drivers: net: hyperv: Address UDP checksum issues
  Drivers: net: hyperv: Negotiate suitable ndis version for offload support
  Drivers: net: hyperv: Allocate memory for all possible per-pecket information
  bridge: Fix double free and memory leak around br_allowed_ingress
  bonding: Remove debug_fs files when module init fails
  i40evf: program RSS LUT correctly
  i40evf: remove open-coded skb_cow_head
  ixgb: remove open-coded skb_cow_head
  igbvf: remove open-coded skb_cow_head
  ...
2014-04-12 17:31:22 -07:00
David S. Miller
676d23690f net: Fix use after free by removing length arg from sk_data_ready callbacks.
Several spots in the kernel perform a sequence like:

	skb_queue_tail(&sk->s_receive_queue, skb);
	sk->sk_data_ready(sk, skb->len);

But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up.  So this skb->len access is potentially
to freed up memory.

Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.

And finally, no actual implementation of this callback actually uses
the length argument.  And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.

So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.

Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-11 16:15:36 -04:00
Stanislav Kinsbursky
3064639423 nfsd: check passed socket's net matches NFSd superblock's one
There could be a case, when NFSd file system is mounted in network, different
to socket's one, like below:

"ip netns exec" creates new network and mount namespace, which duplicates NFSd
mount point, created in init_net context. And thus NFS server stop in nested
network context leads to RPCBIND client destruction in init_net.
Then, on NFSd start in nested network context, rpc.nfsd process creates socket
in nested net and passes it into "write_ports", which leads to RPCBIND sockets
creation in init_net context because of the same reason (NFSd monut point was
created in init_net context). An attempt to register passed socket in nested
net leads to panic, because no RPCBIND client present in nexted network
namespace.

This patch add check that passed socket's net matches NFSd superblock's one.
And returns -EINVAL error to user psace otherwise.

v2: Put socket on exit.

Reported-by: Weng Meiling <wengmeiling.weng@huawei.com>
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-31 16:58:16 -04:00
Eric Dumazet
c2bb06db59 net: fix build errors if ipv6 is disabled
CONFIG_IPV6=n is still a valid choice ;)

It appears we can remove dead code.

Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-09 13:04:03 -04:00
Eric Dumazet
efe4208f47 ipv6: make lookups simpler and faster
TCP listener refactoring, part 4 :

To speed up inet lookups, we moved IPv4 addresses from inet to struct
sock_common

Now is time to do the same for IPv6, because it permits us to have fast
lookups for all kind of sockets, including upcoming SYN_RECV.

Getting IPv6 addresses in TCP lookups currently requires two extra cache
lines, plus a dereference (and memory stall).

inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6

This patch is way bigger than its IPv4 counter part, because for IPv4,
we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
it's not doable easily.

inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr

And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
at the same offset.

We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
macro.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-09 00:01:25 -04:00
David S. Miller
0e76a3a587 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge net into net-next to setup some infrastructure Eric
Dumazet needs for usbnet changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-03 21:36:46 -07:00
NeilBrown
447383d2ba NFSD/sunrpc: avoid deadlock on TCP connection due to memory pressure.
Since we enabled auto-tuning for sunrpc TCP connections we do not
guarantee that there is enough write-space on each connection to
queue a reply.

If memory pressure causes the window to shrink too small, the request
throttling in sunrpc/svc will not accept any requests so no more requests
will be handled.  Even when pressure decreases the window will not
grow again until data is sent on the connection.
This means we get a deadlock:  no requests will be handled until there
is more space, and no space will be allocated until a request is
handled.

This can be simulated by modifying svc_tcp_has_wspace to inflate the
number of byte required and removing the 'svc_sock_setbufsize' calls
in svc_setup_socket.

I found that multiplying by 16 was enough to make the requirement
exceed the default allocation.  With this modification in place:
   mount -o vers=3,proto=tcp 127.0.0.1:/home /mnt
would block and eventually time out because the nfs server could not
accept any requests.

This patch relaxes the request throttling to always allow at least one
request through per connection.  It does this by checking both
  sk_stream_min_wspace() and xprt->xpt_reserved
are zero.
The first is zero when the TCP transmit queue is empty.
The second is zero when there are no RPC requests being processed.
When both of these are zero the socket is idle and so one more
request can safely be allowed through.

Applying this patch allows the above mount command to succeed cleanly.
Tracing shows that the allocated write buffer space quickly grows and
after a few requests are handled, the extra tests are no longer needed
to permit further requests to be processed.

The main purpose of request throttling is to handle the case when one
client is slow at collecting replies and the send queue gets full of
replies that the client hasn't acknowledged (at the TCP level) yet.
As we only change behaviour when the send queue is empty this main
purpose is still preserved.

Reported-by: Ben Myers <bpm@sgi.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-08-01 08:39:16 -04:00
Eric Dumazet
64dc61306c net: add sk_stream_is_writeable() helper
Several call sites use the hardcoded following condition :

sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)

Lets use a helper because TCP_NOTSENT_LOWAT support will change this
condition for TCP sockets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-24 17:54:48 -07:00
J. Bruce Fields
1f691b07c5 svcrpc: don't error out on small tcp fragment
Though clients we care about mostly don't do this, it is possible for
rpc requests to be sent in multiple fragments.  Here we have a sanity
check to ensure that the final received rpc isn't too small--except that
the number we're actually checking is the length of just the final
fragment, not of the whole rpc.  So a perfectly legal rpc that's
unluckily fragmented could cause the server to close the connection
here.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-01 17:32:04 -04:00
J. Bruce Fields
cf3aa02cb4 svcrpc: fix handling of too-short rpc's
If we detect that an rpc is too short, we abort and close the
connection.  Except, there's a bug here: we're leaving sk_datalen
nonzero without leaving any pages in the sk_pages array.  The most
likely result of the inconsistency is a subsequent crash in
svc_tcp_clear_pages.

Also demote the BUG_ON in svc_tcp_clear_pages to a WARN.

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-01 17:32:04 -04:00
Tom Parkin
73df66f8b1 ipv6: rename datagram_send_ctl and datagram_recv_ctl
The datagram_*_ctl functions in net/ipv6/datagram.c are IPv6-specific.  Since
datagram_send_ctl is publicly exported it should be appropriately named to
reflect the fact that it's for IPv6 only.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-31 13:53:08 -05:00
Linus Torvalds
982197277c Merge branch 'for-3.8' of git://linux-nfs.org/~bfields/linux
Pull nfsd update from Bruce Fields:
 "Included this time:

   - more nfsd containerization work from Stanislav Kinsbursky: we're
     not quite there yet, but should be by 3.9.

   - NFSv4.1 progress: implementation of basic backchannel security
     negotiation and the mandatory BACKCHANNEL_CTL operation.  See

       http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues

     for remaining TODO's

   - Fixes for some bugs that could be triggered by unusual compounds.
     Our xdr code wasn't designed with v4 compounds in mind, and it
     shows.  A more thorough rewrite is still a todo.

   - If you've ever seen "RPC: multiple fragments per record not
     supported" logged while using some sort of odd userland NFS client,
     that should now be fixed.

   - Further work from Jeff Layton on our mechanism for storing
     information about NFSv4 clients across reboots.

   - Further work from Bryan Schumaker on his fault-injection mechanism
     (which allows us to discard selective NFSv4 state, to excercise
     rarely-taken recovery code paths in the client.)

   - The usual mix of miscellaneous bugs and cleanup.

  Thanks to everyone who tested or contributed this cycle."

* 'for-3.8' of git://linux-nfs.org/~bfields/linux: (111 commits)
  nfsd4: don't leave freed stateid hashed
  nfsd4: free_stateid can use the current stateid
  nfsd4: cleanup: replace rq_resused count by rq_next_page pointer
  nfsd: warn on odd reply state in nfsd_vfs_read
  nfsd4: fix oops on unusual readlike compound
  nfsd4: disable zero-copy on non-final read ops
  svcrpc: fix some printks
  NFSD: Correct the size calculation in fault_inject_write
  NFSD: Pass correct buffer size to rpc_ntop
  nfsd: pass proper net to nfsd_destroy() from NFSd kthreads
  nfsd: simplify service shutdown
  nfsd: replace boolean nfsd_up flag by users counter
  nfsd: simplify NFSv4 state init and shutdown
  nfsd: introduce helpers for generic resources init and shutdown
  nfsd: make NFSd service structure allocated per net
  nfsd: make NFSd service boot time per-net
  nfsd: per-net NFSd up flag introduced
  nfsd: move per-net startup code to separated function
  nfsd: pass net to __write_ports() and down
  nfsd: pass net to nfsd_set_nrthreads()
  ...
2012-12-20 14:04:11 -08:00
J. Bruce Fields
afc59400d6 nfsd4: cleanup: replace rq_resused count by rq_next_page pointer
It may be a matter of personal taste, but I find this makes the code
clearer.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-17 22:00:16 -05:00
J. Bruce Fields
3a28e33111 svcrpc: fix some printks
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-17 16:02:40 -05:00
J. Bruce Fields
836fbadb96 svcrpc: support multiple-fragment rpc's
Over TCP, RPC's are preceded by a single 4-byte field telling you how
long the rpc is (in bytes).  The spec also allows you to send an RPC in
multiple such records (the high bit of the length field is used to tell
you whether this is the final record).

We've survived for years without supporting this because in practice the
clients we care about don't use it.  But the userland rpc libraries do,
and every now and then an experimental client will run into this.  (Most
recently I noticed it while trying to write a pynfs check.)  And we're
really on the wrong side of the spec here--let's fix this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-04 07:49:24 -05:00
J. Bruce Fields
8af345f58a svcrpc: track rpc data length separately from sk_tcplen
Keep a separate field, sk_datalen, that tracks only the data contained
in a fragment, not including the fragment header.

For now, this is always just max(0, sk_tcplen - 4), but after we allow
multiple fragments sk_datalen will accumulate the total rpc data size
while sk_tcplen only tracks progress receiving the current fragment.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-04 07:49:14 -05:00
J. Bruce Fields
6a72ae2e23 svcrpc: fix off-by-4 error in "incomplete TCP record" dprintk
The full reclen doesn't include the fragment header, but sk_tcplen does.
Fix this to make it an apples-to-apples comparison.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-04 07:49:06 -05:00
J. Bruce Fields
ad46ccf094 svcrpc: delay minimum-rpc-size check till later
Soon we want to support multiple fragments, in which case it may be
legal for a single fragment to be smaller than 8 bytes, so we'll want to
delay this check till we've reached the last fragment.

Also fix an outdated comment.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-04 07:47:58 -05:00
J. Bruce Fields
cc248d4b1d svcrpc: don't byte-swap sk_reclen in place
Byte-swapping in place is always a little dubious.

Let's instead define this field to always be big-endian, and do the
swapping on demand where we need it.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-04 07:47:23 -05:00
Weston Andros Adamson
1b7a181907 SUNRPC: remove BUG_ONs from *_reclassify_socket*
Replace multiple BUG_ON() calls with WARN_ON_ONCE() and early return when
sanity checking socket ownership (lock). The bind call will fail if the
socket was unsuccessfully reclassified.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:41 -05:00
J. Bruce Fields
eccf50c129 nfsd: remove unused listener-removal interfaces
You can use nfsd/portlist to give nfsd additional sockets to listen on.
In theory you can also remove listening sockets this way.  But nobody's
ever done that as far as I can tell.

Also this was partially broken in 2.6.25, by
a217813f90 "knfsd: Support adding
transports by writing portlist file".

(Note that we decide whether to take the "delfd" case by checking for a
digit--but what's actually expected in that case is something made by
svc_one_sock_name(), which won't begin with a digit.)

So, let's just rip out this stuff.

Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-09-10 10:55:19 -04:00
J. Bruce Fields
9f9d2ebe69 svcrpc: make xpo_recvfrom return only >=0
The only errors returned from xpo_recvfrom have been -EAGAIN and
-EAFNOSUPPORT.  The latter was removed by a previous patch.  That leaves
only -EAGAIN, which is treated just like 0 by the caller (svc_recv).

So, just ditch -EAGAIN and return 0 instead.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 17:41:07 -04:00
J. Bruce Fields
af6d572134 svcrpc: don't bother checking bad svc_addr_len result
None of the callers should see an unsupported address family (only one
of them even bothers to check for that case), so just check for the
buggy case in svc_addr_len and don't bother elsewhere.

Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 17:40:10 -04:00
J. Bruce Fields
f23abfdb94 svcrpc: minor udp code cleanup
Order the code in a more boring way.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 17:07:50 -04:00
J. Bruce Fields
39b5530137 svcrpc: share some setup of listening sockets
There's some duplicate code here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 17:07:48 -04:00
J. Bruce Fields
a8e10078a8 svcrpc: clean up control flow
Mainly, use the kernel standard

	err = -ERROR;
	if (something_bad)
		goto out;
	normal case;

rather than

	if (something_bad)
		err = -ERROR
	else {
		normal case;
	}

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 14:08:41 -04:00
J. Bruce Fields
72c3537607 svcrpc: standardize svc_setup_socket return convention
Use the kernel-standard ptr-or-error return convention instead of
passing a pointer to the error.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 14:08:40 -04:00
J. Bruce Fields
be1e44441a svcrpc: fix BUG() in svc_tcp_clear_pages
Examination of svc_tcp_clear_pages shows that it assumes sk_tcplen is
consistent with sk_pages[] (in particular, sk_pages[n] can't be NULL if
sk_tcplen would lead us to expect n pages of data).

svc_tcp_restore_pages zeroes out sk_pages[] while leaving sk_tcplen.
This is OK, since both functions are serialized by XPT_BUSY.  However,
that means the inconsistency must be repaired before dropping XPT_BUSY.

Therefore we should be ensuring that svc_tcp_save_pages repairs the
problem before exiting svc_tcp_recv_record on error.

Symptoms were a BUG() in svc_tcp_clear_pages.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20 18:38:44 -04:00
Eric Dumazet
22911fc581 net: skb_free_datagram_locked() doesnt drop all packets
dropwatch wrongly diagnose all received UDP packets as drops.

This patch removes trace_kfree_skb() done in skb_free_datagram_locked().

Locations calling skb_free_datagram_locked() should do it on their own.

As a result, drops are accounted on the right function.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-27 15:40:57 -07:00
Joe Perches
e87cc4728f net: Convert net_ratelimit uses to net_<level>_ratelimited
Standardize the net core ratelimited logging functions.

Coalesce formats, align arguments.
Change a printk then vprintk sequence to use printf extension %pV.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-15 13:45:03 -04:00
Pavel Emelyanov
4a17fd5229 sock: Introduce named constants for sk_reuse
Name them in a "backward compatible" manner, i.e. reuse or not
are still 1 and 0 respectively. The reuse value of 2 means that
the socket with it will forcibly reuse everyone else's port.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 15:52:25 -04:00
J. Bruce Fields
1df00640c9 Merge nfs containerization work from Trond's tree
The nfs containerization work is a prerequisite for Jeff Layton's reboot
recovery rework.
2012-03-26 11:48:54 -04:00
Trond Myklebust
09acfea5d8 SUNRPC: Fix a few sparse warnings
net/sunrpc/svcsock.c:412:22: warning: incorrect type in assignment
(different address spaces)
 - svc_partial_recvfrom now takes a struct kvec, so the variable
   save_iovbase needs to be an ordinary (void *)

Make a bunch of variables in net/sunrpc/xprtsock.c static

Fix a couple of "warning: symbol 'foo' was not declared. Should it be
static?" reports.

Fix a couple of conflicting function declarations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-11 19:30:02 -04:00
Stanislav Kinsbursky
9f912ceb7e SUNPRC: remove marking service temporary sockets with XPT_CHNGBUF
This is a cleanup patch.
Service temporary sockets can be TCP or RDMA only. But XPT_CHNGBUF service
socket flag is checked only for UDP sockets on receive.
Thus (if I don't miss something non-obvious) this bit raising for temporary
sockets can be removed.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-03 14:26:43 -05:00
Trond Myklebust
d3b773e4fd SUNRPC: fixup for namespace changes
Fixes this build error when CONFIG_NET_NS is not set:

net/sunrpc/svcsock.c: In function 'svc_setup_socket':
net/sunrpc/svcsock.c:1412:40: error: 'struct sock_common' has no member named 'skc_net'

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:22 -05:00
Stanislav Kinsbursky
5247fab5c8 SUNRPC: pass network namespace to service registering routines
Lockd and NFSd services will handle requests from and to many network
nsamespaces. And thus have to be registered and unregistered per network
namespace.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:13 -05:00
Linus Torvalds
0b48d42235 Merge branch 'for-3.3' of git://linux-nfs.org/~bfields/linux
* 'for-3.3' of git://linux-nfs.org/~bfields/linux: (31 commits)
  nfsd4: nfsd4_create_clid_dir return value is unused
  NFSD: Change name of extended attribute containing junction
  svcrpc: don't revert to SVC_POOL_DEFAULT on nfsd shutdown
  svcrpc: fix double-free on shutdown of nfsd after changing pool mode
  nfsd4: be forgiving in the absence of the recovery directory
  nfsd4: fix spurious 4.1 post-reboot failures
  NFSD: forget_delegations should use list_for_each_entry_safe
  NFSD: Only reinitilize the recall_lru list under the recall lock
  nfsd4: initialize special stateid's at compile time
  NFSd: use network-namespace-aware cache registering routines
  SUNRPC: create svc_xprt in proper network namespace
  svcrpc: update outdated BKL comment
  nfsd41: allow non-reclaim open-by-fh's in 4.1
  svcrpc: avoid memory-corruption on pool shutdown
  svcrpc: destroy server sockets all at once
  svcrpc: make svc_delete_xprt static
  nfsd: Fix oops when parsing a 0 length export
  nfsd4: Use kmemdup rather than duplicating its implementation
  nfsd4: add a separate (lockowner, inode) lookup
  nfsd4: fix CONFIG_NFSD_FAULT_INJECTION compile error
  ...
2012-01-14 12:26:41 -08:00
Stanislav Kinsbursky
bd4620ddf6 SUNRPC: create svc_xprt in proper network namespace
This patch makes svc_xprt inherit network namespace link from its socket.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-12-06 16:20:42 -05:00