linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00

Author	SHA1	Message	Date
Sathish Narasimman	abb638b311	Bluetooth: Handle own address type change with HCI_ENABLE_LL_PRIVACY own_address_type has to changed to 0x02 and 0x03 only when HCI_ENABLE_LL_PRIVACY flag is set. Signed-off-by: Sathish Narasimman <sathish.narasimman@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-04-06 10:47:45 +02:00
Daniel Winkler	b6f1b79dea	Bluetooth: Do not set cur_adv_instance in adv param MGMT request We set hdev->cur_adv_instance in the adv param MGMT request to allow the callback to the hci param request to set the tx power to the correct instance. Now that the callbacks use the advertising handle from the hci request (as they should), this workaround is no longer necessary. Furthermore, this change resolves a race condition that is more prevalent when using the extended advertising MGMT calls - if hdev->cur_adv_instance is set in the params request, then when the data request is called, we believe our new instance is already active. This treats it as an update and immediately schedules the instance with the controller, which has a potential race with the software rotation adv update. By not setting hdev->cur_adv_instance too early, the new instance is queued as it should be, to be used when the rotation comes around again. This change is tested on harrison peak to confirm that it resolves the race condition on registration, and that there is no regression in single- and multi-advertising automated tests. Reviewed-by: Miao-chen Chou <mcchou@chromium.org> Signed-off-by: Daniel Winkler <danielwinkler@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-04-06 10:43:26 +02:00
Daniel Winkler	25e70886c2	Bluetooth: Use ext adv handle from requests in CCs Some extended advertising hci command complete events are still using hdev->cur_adv_instance to map the request to the correct advertisement handle. However, with extended advertising, "current instance" doesn't make sense as we can have multiple concurrent advertisements. This change switches these command complete handlers to use the advertising handle from the request/event, to ensure we will always use the correct advertising handle regardless of the state of hdev->cur_adv_instance. This change is tested on hatch and kefka chromebooks and run through single- and multi-advertising automated tests to confirm callbacks report tx power to the correct advertising handle, etc. Reviewed-by: Miao-chen Chou <mcchou@chromium.org> Signed-off-by: Daniel Winkler <danielwinkler@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-04-06 10:43:25 +02:00
Kai Ye	93917fd224	Bluetooth: use the correct print format for L2CAP debug statements Use the correct print format. Printing an unsigned int value should use %u instead of %d. For details, please read document: Documentation/core-api/printk-formats.rst Signed-off-by: Kai Ye <yekai13@huawei.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-04-06 10:40:20 +02:00
Florian Westphal	1379940bf8	netfilter: conntrack: move ecache dwork to net_generic infra dwork struct is large (>128 byte) and not needed when conntrack module is not loaded. Place it in net_generic data instead. The struct net dwork member is now obsolete and will be removed in a followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-04-06 00:34:53 +02:00
Florian Westphal	7b5974709f	netfilter: conntrack: move sysctl pointer to net_generic infra No need to keep this in struct net, place it in the net_generic data. The sysctl pointer is removed from struct net in a followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-04-06 00:34:53 +02:00
Florian Westphal	1d610d4d31	netfilter: x_tables: move known table lists to net_generic infra Will reduce struct net size by 208 bytes. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-04-06 00:34:52 +02:00
Florian Westphal	0854db2aae	netfilter: nf_tables: use net_generic infra for transaction data This moves all nf_tables pernet data from struct net to a net_generic extension, with the exception of the gencursor. The latter is used in the data path and also outside of the nf_tables core. All others are only used from the configuration plane. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-04-06 00:34:52 +02:00
Florian Westphal	5b53951cfc	netfilter: ebtables: use net_generic infra ebtables currently uses net->xt.tables[BRIDGE], but upcoming patch will move net->xt.tables away from struct net. To avoid exposing x_tables internals to ebtables, use a private list instead. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-04-06 00:34:52 +02:00
Florian Westphal	7b1957b049	netfilter: nf_defrag_ipv4: use net_generic infra This allows followup patch to remove the defrag_ipv4 member from struct net. It also allows to auto-remove the hooks later on by adding a _disable() function. This will be done later in a follow patch series. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-04-06 00:34:52 +02:00
Florian Westphal	8b0adbe3e3	netfilter: nf_defrag_ipv6: use net_generic infra This allows followup patch to remove these members from struct net. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-04-06 00:34:51 +02:00
Florian Westphal	ebfbe67568	netfilter: cttimeout: use net_generic infra reduce size of struct net and make this self-contained. The member in struct net is kept to minimize changes to struct net layout, it will be removed in a separate patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-04-06 00:34:51 +02:00
Florian Westphal	1be05ea766	netfilter: nfnetlink: use net_generic infra No need to place it in struct net, nfnetlink is a module and usage doesn't occur in fastpath. Also remove rcu usage: Not a single reader of net->nfnl uses rcu accessors. When exit_batch callbacks are executed the net namespace is already dead so no calls to these functions are possible anymore (else we'd get NULL deref crash too). If the module is removed, then modules that call any of those functions have been removed too so no calls to nfnl functions are possible either. The nfnl and nfl_stash pointers in struct net are no longer used, they will be removed in a followup patch to minimize changes to struct net (causes rebuild for entire network stack). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-04-06 00:34:51 +02:00
Florian Westphal	237c609f87	netfilter: nfnetlink: add and use nfnetlink_broadcast This removes the only reference of net->nfnl outside of the nfnetlink module. This allows to move net->nfnl to net_generic infra. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-04-06 00:34:51 +02:00
Tetsuo Handa	08c27f3322	batman-adv: initialize "struct batadv_tvlv_tt_vlan_data"->reserved field KMSAN found uninitialized value at batadv_tt_prepare_tvlv_local_data() [1], for commit `ced72933a5` ("batman-adv: use CRC32C instead of CRC16 in TT code") inserted 'reserved' field into "struct batadv_tvlv_tt_data" and commit `7ea7b4a142` ("batman-adv: make the TT CRC logic VLAN specific") moved that field to "struct batadv_tvlv_tt_vlan_data" but left that field uninitialized. [1] https://syzkaller.appspot.com/bug?id=07f3e6dba96f0eb3cabab986adcd8a58b9bdbe9d Reported-by: syzbot <syzbot+50ee810676e6a089487b@syzkaller.appspotmail.com> Tested-by: syzbot <syzbot+50ee810676e6a089487b@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: `ced72933a5` ("batman-adv: use CRC32C instead of CRC16 in TT code") Fixes: `7ea7b4a142` ("batman-adv: make the TT CRC logic VLAN specific") Acked-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-05 15:06:03 -07:00
Andrei Vagin	eeb85a14ee	net: Allow to specify ifindex when device is moved to another namespace Currently, we can specify ifindex on link creation. This change allows to specify ifindex when a device is moved to another network namespace. Even now, a device ifindex can be changed if there is another device with the same ifindex in the target namespace. So this change doesn't introduce completely new behavior, it adds more control to the process. CRIU users want to restore containers with pre-created network devices. A user will provide network devices and instructions where they have to be restored, then CRIU will restore network namespaces and move devices into them. The problem is that devices have to be restored with the same indexes that they have before C/R. Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Suggested-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-05 14:49:40 -07:00
Zheng Yongjun	d3295869c4	net: nfc: Fix spelling errors in net/nfc module These patches fix a series of spelling errors in net/nfc module. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-05 12:58:31 -07:00
Maciej Żenczykowski	630e4576f8	net-ipv6: bugfix - raw & sctp - switch to ipv6_can_nonlocal_bind() Found by virtue of ipv6 raw sockets not honouring the per-socket IP{,V6}_FREEBIND setting. Based on hits found via: git grep '[.]ip_nonlocal_bind' We fix both raw ipv6 sockets to honour IP{,V6}_FREEBIND and IP{,V6}_TRANSPARENT, and we fix sctp sockets to honour IP{,V6}_TRANSPARENT (they already honoured FREEBIND), and not just the ipv6 'ip_nonlocal_bind' sysctl. The helper is defined as: static inline bool ipv6_can_nonlocal_bind(struct net net, struct inet_sock inet) { return net->ipv6.sysctl.ip_nonlocal_bind \|\| inet->freebind \|\| inet->transparent; } so this change only widens the accepted opt-outs and is thus a clean bugfix. I'm not entirely sure what 'fixes' tag to add, since this is AFAICT an ancient bug, but IMHO this should be applied to stable kernels as far back as possible. As such I'm adding a 'fixes' tag with the commit that originally added the helper, which happened in 4.19. Backporting to older LTS kernels (at least 4.9 and 4.14) would presumably require open-coding it or backporting the helper as well. Other possibly relevant commits: v4.18-rc6-1502-g83ba4645152d net: add helpers checking if socket can be bound to nonlocal address v4.18-rc6-1431-gd0c1f01138c4 net/ipv6: allow any source address for sendmsg pktinfo with ip_nonlocal_bind v4.14-rc5-271-gb71d21c274ef sctp: full support for ipv6 ip_nonlocal_bind & IP_FREEBIND v4.7-rc7-1883-g9b9742022888 sctp: support ipv6 nonlocal bind v4.1-12247-g35a256fee52c ipv6: Nonlocal bind Cc: Lorenzo Colitti <lorenzo@google.com> Fixes: `83ba464515` ("net: add helpers checking if socket can be bound to nonlocal address") Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-By: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-05 12:56:52 -07:00
Ilya Maximets	4d51419d49	openvswitch: fix send of uninitialized stack memory in ct limit reply 'struct ovs_zone_limit' has more members than initialized in ovs_ct_limit_get_default_limit(). The rest of the memory is a random kernel stack content that ends up being sent to userspace. Fix that by using designated initializer that will clear all non-specified fields. Fixes: `11efd5cb04` ("openvswitch: Support conntrack zone limit") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-05 12:54:42 -07:00
Wu XiangCheng	85d091a794	tipc: Fix a kernel-doc warning in name_table.c Fix kernel-doc warning: Documentation/networking/tipc:66: /home/sfr/next/next/net/tipc/name_table.c :558: WARNING: Unexpected indentation. Documentation/networking/tipc:66: /home/sfr/next/next/net/tipc/name_table.c :559: WARNING: Block quote ends without a blank line; unexpected unindent. Due to blank line missing. Fixes: `908148bc50` ("tipc: refactor tipc_sendmsg() and tipc_lookup_anycast()") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/netdev/20210318172255.63185609@canb.auug.org.au/ Signed-off-by: Wu XiangCheng <bobwxc@email.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-05 12:51:34 -07:00
Taehee Yoo	4b4b84468a	mld: change lockdep annotation for ip6_sf_socklist and ipv6_mc_socklist struct ip6_sf_socklist and ipv6_mc_socklist are per-socket MLD data. These data are protected by rtnl lock, socket lock, and RCU. So, when these are used, it verifies whether rtnl lock is acquired or not. ip6_mc_msfget() is called by do_ipv6_getsockopt(). But caller doesn't acquire rtnl lock. So, when these data are used in the ip6_mc_msfget() lockdep warns about it. But accessing these is actually safe because socket lock was acquired by do_ipv6_getsockopt(). So, it changes lockdep annotation from rtnl lock to socket lock. (rtnl_dereference -> sock_dereference) Locking graph for mld data is like below: When writing mld data: do_ipv6_setsockopt() rtnl_lock lock_sock (mld functions) idev->mc_lock(if per-interface mld data is modified) When reading mld data: do_ipv6_getsockopt() lock_sock ip6_mc_msfget() Splat looks like: ============================= WARNING: suspicious RCU usage 5.12.0-rc4+ #503 Not tainted ----------------------------- net/ipv6/mcast.c:610 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by mcast-listener-/923: #0: ffff888007958a70 (sk_lock-AF_INET6){+.+.}-{0:0}, at: ipv6_get_msfilter+0xaf/0x190 stack backtrace: CPU: 1 PID: 923 Comm: mcast-listener- Not tainted 5.12.0-rc4+ #503 Call Trace: dump_stack+0xa4/0xe5 ip6_mc_msfget+0x553/0x6c0 ? ipv6_sock_mc_join_ssm+0x10/0x10 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0 ? mark_held_locks+0xb7/0x120 ? lockdep_hardirqs_on_prepare+0x27c/0x3e0 ? __local_bh_enable_ip+0xa5/0xf0 ? lock_sock_nested+0x82/0xf0 ipv6_get_msfilter+0xc3/0x190 ? compat_ipv6_get_msfilter+0x300/0x300 ? lock_downgrade+0x690/0x690 do_ipv6_getsockopt.isra.6.constprop.13+0x1809/0x29e0 ? do_ipv6_mcast_group_source+0x150/0x150 ? register_lock_class+0x1750/0x1750 ? kvm_sched_clock_read+0x14/0x30 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x170 ? find_held_lock+0x3a/0x1c0 ? lock_downgrade+0x690/0x690 ? ipv6_getsockopt+0xdb/0x1b0 ipv6_getsockopt+0xdb/0x1b0 [ ... ] Fixes: `88e2ca3080` ("mld: convert ifmcaddr6 to RCU") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-05 12:50:04 -07:00
Benjamin Coddington	98b5cee373	SUNRPC: Ensure the transport backchannel association If the server sends CB_ calls on a connection that is not associated with the backchannel, refuse to process the call and shut down the connection. This avoids a NULL dereference crash in xprt_complete_bc_request(). There's not much more we can do in this situation unless we want to look into allowing all connections to be associated with the fore and back channel. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-04-05 09:04:21 -04:00
Eryu Guan	6b996476f3	sunrpc: honor rpc_task's timeout value in rpcb_create() Currently rpcbind client is created without setting rpc timeout (thus using the default value). But if the rpc_task already has a customized timeout in its tk_client field, it's also ignored. Let's use the same timeout setting in rpc_task->tk_client->cl_timeout for rpcbind connection. Signed-off-by: Eryu Guan <eguan@linux.alibaba.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-04-05 09:04:21 -04:00
Trond Myklebust	d737e5d418	SUNRPC: Set TCP_CORK until the transmit queue is empty When we have multiple RPC requests queued up, it makes sense to set the TCP_CORK option while the transmit queue is non-empty. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-04-05 09:04:20 -04:00
Greg Kroah-Hartman	9594408763	Merge 5.12-rc6 into tty-next We need the serial/tty fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-04-05 08:59:21 +02:00
Christophe JAILLET	7d42e84eb9	net: openvswitch: Use 'skb_push_rcsum()' instead of hand coding it 'skb_push()'/'skb_postpush_rcsum()' can be replaced by an equivalent 'skb_push_rcsum()' which is less verbose. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-04 01:43:02 -07:00
Pablo Neira Ayuso	8c56049fec	netfilter: nftables: remove documentation on static functions Since `4f16d25c68` ("netfilter: nftables: add nft_parse_register_load() and use it") and `345023b0db` ("netfilter: nftables: add nft_parse_register_store() and use it"), the following functions are not exported symbols anymore: - nft_parse_register() - nft_validate_register_load() - nft_validate_register_store() Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-04-03 20:36:03 +02:00
Dan Carpenter	dadf33c9f6	netfilter: nftables: fix a warning message in nf_tables_commit_audit_collect() The first argument of a WARN_ONCE() is a condition. This WARN_ONCE() will only print the table name, and is potentially problematic if the table name has a %s in it. Fixes: `c520292f29` ("audit: log nftables configuration change events once per table") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-04-03 20:19:37 +02:00
Florian Westphal	daf47a7c10	netfilter: ipvs: do not printk on netns creation This causes dmesg spew during normal operation, so remove this. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-04-03 20:17:11 +02:00
Florian Westphal	dc87efdb1a	mptcp: add mptcp reset option support The MPTCP reset option allows to carry a mptcp-specific error code that provides more information on the nature of a connection reset. Reset option data received gets stored in the subflow context so it can be sent to userspace via the 'subflow closed' netlink event. When a subflow is closed, the desired error code that should be sent to the peer is also placed in the subflow context structure. If a reset is sent before subflow establishment could complete, e.g. on HMAC failure during an MP_JOIN operation, the mptcp skb extension is used to store the reset information. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-02 14:21:50 -07:00
Paolo Abeni	781bf13d4f	mptcp: remove unneeded check on first subflow Currently we explicitly check for the first subflow being NULL in a couple of places, even if we don't need any special actions in such scenario. Just drop the unneeded checks, to avoid confusion. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-02 14:21:50 -07:00
Paolo Abeni	5695eb8891	mptcp: add active MPC mibs We are not currently tracking the active MPTCP connection attempts. Let's add the related counters. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-02 14:21:50 -07:00
Paolo Abeni	a16195e35c	mptcp: add mib for token creation fallback If the MPTCP protocol is unable to create a new token, the socket fallback to plain TCP, let's keep track of such events via a specific MIB. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-02 14:21:50 -07:00
Yunjian Wang	990b03b05b	net: cls_api: Fix uninitialised struct field bo->unlocked_driver_cb The 'unlocked_driver_cb' struct field in 'bo' is not being initialized in tcf_block_offload_init(). The uninitialized 'unlocked_driver_cb' will be used when calling unlocked_driver_cb(). So initialize 'bo' to zero to avoid the issue. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: `0fdcf78d59` ("net: use flow_indr_dev_setup_offload()") Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-02 14:14:22 -07:00
David S. Miller	c2bcb4cf02	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-04-01 The following pull-request contains BPF updates for your net-next tree. We've added 68 non-merge commits during the last 7 day(s) which contain a total of 70 files changed, 2944 insertions(+), 1139 deletions(-). The main changes are: 1) UDP support for sockmap, from Cong. 2) Verifier merge conflict resolution fix, from Daniel. 3) xsk selftests enhancements, from Maciej. 4) Unstable helpers aka kernel func calling, from Martin. 5) Batches ops for LPM map, from Pedro. 6) Fix race in bpf_get_local_storage, from Yonghong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-02 11:03:07 -07:00
Luiz Augusto von Dentz	0ae8ef674e	Bluetooth: SMP: Fix variable dereferenced before check 'conn' This fixes kbuild findings: smatch warnings: net/bluetooth/smp.c:1633 smp_user_confirm_reply() warn: variable dereferenced before check 'conn' (see line 1631) Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-04-02 11:09:12 +02:00
Archie Pusaka	06752d1678	Bluetooth: Check inquiry status before sending one There is a possibility where HCI_INQUIRY flag is set but we still send HCI_OP_INQUIRY anyway. Such a case can be reproduced by connecting to an LE device while active scanning. When the device is discovered, we initiate a connection, stop LE Scan, and send Discovery MGMT with status disabled, but we don't cancel the inquiry. Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reviewed-by: Sonny Sasaka <sonnysasaka@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-04-02 11:06:17 +02:00
Meng Yu	149b3f13b4	Bluetooth: Coding style fix 1. Add space when needed; 2. Block comments style fix; 3. Move open brace '{' following function definitions to the next line; 4. Remove unnecessary braces '{}' for single statement blocks. Signed-off-by: Meng Yu <yumeng18@huawei.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-04-02 11:03:04 +02:00
Meng Yu	82a1242619	Bluetooth: Remove 'return' in void function void function return statements are not generally useful. Signed-off-by: Meng Yu <yumeng18@huawei.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-04-02 11:01:35 +02:00
Paolo Abeni	0a3cc57978	mptcp: revert "mptcp: provide subflow aware release function" This change reverts commit `ad98dd3705` ("mptcp: provide subflow aware release function"). The latter introduced a deadlock spotted by syzkaller and is not needed anymore after the previous commit. Fixes: `ad98dd3705` ("mptcp: provide subflow aware release function") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-01 16:02:50 -07:00
Paolo Abeni	86581852d7	mptcp: forbit mcast-related sockopt on MPTCP sockets Unrolling mcast state at msk dismantel time is bug prone, as syzkaller reported: ====================================================== WARNING: possible circular locking dependency detected 5.11.0-syzkaller #0 Not tainted ------------------------------------------------------ syz-executor905/8822 is trying to acquire lock: ffffffff8d678fe8 (rtnl_mutex){+.+.}-{3:3}, at: ipv6_sock_mc_close+0xd7/0x110 net/ipv6/mcast.c:323 but task is already holding lock: ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1600 [inline] ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at: mptcp6_release+0x57/0x130 net/mptcp/protocol.c:3507 which lock already depends on the new lock. Instead we can simply forbit any mcast-related setsockopt Fixes: `717e79c867` ("mptcp: Add setsockopt()/getsockopt() socket operations") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-01 16:02:50 -07:00
Wan Jiabing	ec7e48ca4b	net: smc: Remove repeated struct declaration struct smc_clc_msg_local is declared twice. One is declared at 301st line. The blew one is not needed. Remove the duplicate. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-01 15:52:38 -07:00
Norman Maurer	98184612ac	net: udp: Add support for getsockopt(..., ..., UDP_GRO, ..., ...); Support for UDP_GRO was added in the past but the implementation for getsockopt was missed which did lead to an error when we tried to retrieve the setting for UDP_GRO. This patch adds the missing switch case for UDP_GRO Fixes: `e20cf8d3f1` ("udp: implement GRO for plain UDP sockets.") Signed-off-by: Norman Maurer <norman_maurer@apple.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-01 15:50:50 -07:00
Xu Jia	b7a320c3a1	net: ipv6: Refactor in rt6_age_examine_exception The logic in rt6_age_examine_exception is confusing. The commit is to refactor the code. Signed-off-by: Xu Jia <xujia39@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-01 15:45:26 -07:00
Hoang Le	f20a46c304	tipc: fix unique bearer names sanity check When enabling a bearer by name, we don't sanity check its name with higher slot in bearer list. This may have the effect that the name of an already enabled bearer bypasses the check. To fix the above issue, we just perform an extra checking with all existing bearers. Fixes: `cb30a63384` ("tipc: refactor function tipc_enable_bearer()") Cc: stable@vger.kernel.org Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-01 15:43:31 -07:00
Cong Wang	122e6c79ef	sock_map: Update sock type checks for UDP Now UDP supports sockmap and redirection, we can safely update the sock type checks for it accordingly. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-15-xiyou.wangcong@gmail.com	2021-04-01 10:56:14 -07:00
Cong Wang	1f5be6b3b0	udp: Implement udp_bpf_recvmsg() for sockmap We have to implement udp_bpf_recvmsg() to replace the ->recvmsg() to retrieve skmsg from ingress_msg. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-14-xiyou.wangcong@gmail.com	2021-04-01 10:56:14 -07:00
Cong Wang	2bc793e327	skmsg: Extract __tcp_bpf_recvmsg() and tcp_bpf_wait_data() Although these two functions are only used by TCP, they are not specific to TCP at all, both operate on skmsg and ingress_msg, so fit in net/core/skmsg.c very well. And we will need them for non-TCP, so rename and move them to skmsg.c and export them to modules. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210331023237.41094-13-xiyou.wangcong@gmail.com	2021-04-01 10:56:14 -07:00
Cong Wang	d7f571188e	udp: Implement ->read_sock() for sockmap This is similar to tcp_read_sock(), except we do not need to worry about connections, we just need to retrieve skb from UDP receive queue. Note, the return value of ->read_sock() is unused in sk_psock_verdict_data_ready(), and UDP still does not support splice() due to lack of ->splice_read(), so users can not reach udp_read_sock() directly. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-12-xiyou.wangcong@gmail.com	2021-04-01 10:56:14 -07:00
Cong Wang	8a59f9d1e3	sock: Introduce sk->sk_prot->psock_update_sk_prot() Currently sockmap calls into each protocol to update the struct proto and replace it. This certainly won't work when the protocol is implemented as a module, for example, AF_UNIX. Introduce a new ops sk->sk_prot->psock_update_sk_prot(), so each protocol can implement its own way to replace the struct proto. This also helps get rid of symbol dependencies on CONFIG_INET. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210331023237.41094-11-xiyou.wangcong@gmail.com	2021-04-01 10:56:14 -07:00
Cong Wang	a7ba4558e6	sock_map: Introduce BPF_SK_SKB_VERDICT Reusing BPF_SK_SKB_STREAM_VERDICT is possible but its name is confusing and more importantly we still want to distinguish them from user-space. So we can just reuse the stream verdict code but introduce a new type of eBPF program, skb_verdict. Users are not allowed to attach stream_verdict and skb_verdict programs to the same map. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-10-xiyou.wangcong@gmail.com	2021-04-01 10:56:14 -07:00
Cong Wang	b017055255	sock_map: Kill sock_map_link_no_progs() Now we can fold sock_map_link_no_progs() into sock_map_link() and get rid of sock_map_link_no_progs(). Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210331023237.41094-9-xiyou.wangcong@gmail.com	2021-04-01 10:56:13 -07:00
Cong Wang	2004fdbd8a	sock_map: Simplify sock_map_link() a bit sock_map_link() passes down map progs, but it is confusing to see both map progs and psock progs. Make the map progs more obvious by retrieving it directly with sock_map_progs() inside sock_map_link(). Now it is aligned with sock_map_link_no_progs() too. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-8-xiyou.wangcong@gmail.com	2021-04-01 10:56:13 -07:00
Cong Wang	190179f65b	skmsg: Use GFP_KERNEL in sk_psock_create_ingress_msg() This function is only called in process context. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-7-xiyou.wangcong@gmail.com	2021-04-01 10:56:13 -07:00
Cong Wang	7786dfc41a	skmsg: Use rcu work for destroying psock The RCU callback sk_psock_destroy() only queues work psock->gc, so we can just switch to rcu work to simplify the code. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-6-xiyou.wangcong@gmail.com	2021-04-01 10:56:13 -07:00
Cong Wang	799aa7f98d	skmsg: Avoid lock_sock() in sk_psock_backlog() We do not have to lock the sock to avoid losing sk_socket, instead we can purge all the ingress queues when we close the socket. Sending or receiving packets after orphaning socket makes no sense. We do purge these queues when psock refcnt reaches zero but here we want to purge them explicitly in sock_map_close(). There are also some nasty race conditions on testing bit SK_PSOCK_TX_ENABLED and queuing/canceling the psock work, we can expand psock->ingress_lock a bit to protect them too. As noticed by John, we still have to lock the psock->work, because the same work item could be running concurrently on different CPU's. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-5-xiyou.wangcong@gmail.com	2021-04-01 10:56:13 -07:00
Cong Wang	0739cd28f2	net: Introduce skb_send_sock() for sock_map We only have skb_send_sock_locked() which requires callers to use lock_sock(). Introduce a variant skb_send_sock() which locks on its own, callers do not need to lock it any more. This will save us from adding a ->sendmsg_locked for each protocol. To reuse the code, pass function pointers to __skb_send_sock() and build skb_send_sock() and skb_send_sock_locked() on top. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-4-xiyou.wangcong@gmail.com	2021-04-01 10:56:13 -07:00
Cong Wang	b01fd6e802	skmsg: Introduce a spinlock to protect ingress_msg Currently we rely on lock_sock to protect ingress_msg, it is too big for this, we can actually just use a spinlock to protect this list like protecting other skb queues. __tcp_bpf_recvmsg() is still special because of peeking, it still has to use lock_sock. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-3-xiyou.wangcong@gmail.com	2021-04-01 10:56:13 -07:00
Cong Wang	37f0e514db	skmsg: Lock ingress_skb when purging Currently we purge the ingress_skb queue only when psock refcnt goes down to 0, so locking the queue is not necessary, but in order to be called during ->close, we have to lock it here. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-2-xiyou.wangcong@gmail.com	2021-04-01 10:56:13 -07:00
Ong Boon Leong	622d13694b	xdp: fix xdp_return_frame() kernel BUG throw for page_pool memory model xdp_return_frame() may be called outside of NAPI context to return xdpf back to page_pool. xdp_return_frame() calls __xdp_return() with napi_direct = false. For page_pool memory model, __xdp_return() calls xdp_return_frame_no_direct() unconditionally and below false negative kernel BUG throw happened under preempt-rt build: [ 430.450355] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/3884 [ 430.451678] caller is __xdp_return+0x1ff/0x2e0 [ 430.452111] CPU: 0 PID: 3884 Comm: modprobe Tainted: G U E 5.12.0-rc2+ #45 Changes in v2: - This patch fixes the issue by making xdp_return_frame_no_direct() is only called if napi_direct = true, as recommended for better by Jesper Dangaard Brouer. Thanks! Fixes: `2539650fad` ("xdp: Helpers for disabling napi_direct of xdp_return_frame") Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 15:15:23 -07:00
Eric Dumazet	0d7a7b2014	ipv6: remove extra dev_hold() for fallback tunnels My previous commits added a dev_hold() in tunnels ndo_init(), but forgot to remove it from special functions setting up fallback tunnels. Fallback tunnels do call their respective ndo_init() This leads to various reports like : unregister_netdevice: waiting for ip6gre0 to become free. Usage count = 2 Fixes: `48bb569726` ("ip6_tunnel: sit: proper dev_{hold\|put} in ndo_[un]init methods") Fixes: `6289a98f08` ("sit: proper dev_{hold\|put} in ndo_[un]init methods") Fixes: `40cb881b5a` ("ip6_vti: proper dev_{hold\|put} in ndo_[un]init methods") Fixes: `7f700334be` ("ip6_gre: proper dev_{hold\|put} in ndo_[un]init methods") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:53:11 -07:00
Yang Yingliang	ac1db7acea	net/tipc: fix missing destroy_workqueue() on error in tipc_crypto_start() Add the missing destroy_workqueue() before return from tipc_crypto_start() in the error handling case. Fixes: `1ef6f7c939` ("tipc: add automatic session key exchange") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:52:00 -07:00
Eric Dumazet	a6175633a2	ipv6: convert elligible sysctls to u8 Convert most sysctls that can fit in a byte. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:48:20 -07:00
Eric Dumazet	1c3289c931	tcp: convert tcp_comp_sack_nr sysctl to u8 tcp_comp_sack_nr max value was already 255. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:48:20 -07:00
Eric Dumazet	7d4b37ebb9	ipv4: convert igmp_link_local_mcast_reports sysctl to u8 This sysctl is a bool, can use less storage. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:48:20 -07:00
Eric Dumazet	be205fe6ec	ipv4: convert fib_multipath_{use_neigh\|hash_policy} sysctls to u8 Make room for better packing of netns_ipv4 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:48:20 -07:00
Eric Dumazet	cd04bd0222	ipv4: convert udp_l3mdev_accept sysctl to u8 Reduce footprint of sysctls. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:48:20 -07:00
Eric Dumazet	b2908fac5b	ipv4: convert fib_notify_on_flag_change sysctl to u8 Reduce footprint of sysctls. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:48:19 -07:00
David S. Miller	c9170f1321	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2021-03-31 1) Fix ipv4 pmtu checks for xfrm anf vti interfaces. From Eyal Birger. 2) There are situations where the socket passed to xfrm_output_resume() is not the same as the one attached to the skb. Use the socket passed to xfrm_output_resume() to avoid lookup failures when xfrm is used with VRFs. From Evan Nimmo. 3) Make the xfrm_state_hash_generation sequence counter per network namespace because but its write serialization lock is also per network namespace. Write protection is insufficient otherwise. From Ahmed S. Darwish. 4) Fixup sctp featue flags when used with esp offload. From Xin Long. 5) xfrm BEET mode doesn't support fragments for inner packets. This is a limitation of the protocol, so no fix possible. Warn at least to notify the user about that situation. From Xin Long. 6) Fix NULL pointer dereference on policy lookup when namespaces are uses in combination with esp offload. 7) Fix incorrect transformation on esp offload when packets get segmented at layer 3. 8) Fix some user triggered usages of WARN_ONCE in the xfrm compat layer. From Dmitry Safonov. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:37:51 -07:00
Matthew Wilcox (Oracle)	3cbf7530a1	qrtr: Convert qrtr_ports from IDR to XArray The XArray interface is easier for this driver to use. Also fixes a bug reported by the improper use of GFP_ATOMIC. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:32:20 -07:00
Lv Yunlong	bdc2ab5c61	net/rds: Fix a use after free in rds_message_map_pages In rds_message_map_pages, the rm is freed by rds_message_put(rm). But rm is still used by rm->data.op_sg in return value. My patch assigns ERR_CAST(rm->data.op_sg) to err before the rm is freed to avoid the uaf. Fixes: `7dba92037b` ("net/rds: Use ERR_PTR for rds_message_alloc_sgs()") Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:26:56 -07:00
Eric Dumazet	48bb569726	ip6_tunnel: sit: proper dev_{hold\|put} in ndo_[un]init methods Same reasons than for the previous commits : `6289a98f08` ("sit: proper dev_{hold\|put} in ndo_[un]init methods") `40cb881b5a` ("ip6_vti: proper dev_{hold\|put} in ndo_[un]init methods") `7f700334be` ("ip6_gre: proper dev_{hold\|put} in ndo_[un]init methods") After adopting CONFIG_PCPU_DEV_REFCNT=n option, syzbot was able to trigger a warning [1] Issue here is that: - all dev_put() should be paired with a corresponding prior dev_hold(). - A driver doing a dev_put() in its ndo_uninit() MUST also do a dev_hold() in its ndo_init(), only when ndo_init() is returning 0. Otherwise, register_netdevice() would call ndo_uninit() in its error path and release a refcount too soon. [1] WARNING: CPU: 1 PID: 21059 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Modules linked in: CPU: 1 PID: 21059 Comm: syz-executor.4 Not tainted 5.12.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Code: 1d 6a 5a e8 09 31 ff 89 de e8 8d 1a ab fd 84 db 75 e0 e8 d4 13 ab fd 48 c7 c7 a0 e1 c1 89 c6 05 4a 5a e8 09 01 e8 2e 36 fb 04 <0f> 0b eb c4 e8 b8 13 ab fd 0f b6 1d 39 5a e8 09 31 ff 89 de e8 58 RSP: 0018:ffffc900025aefe8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000040000 RSI: ffffffff815c51f5 RDI: fffff520004b5def RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815bdf8e R11: 0000000000000000 R12: ffff888023488568 R13: ffff8880254e9000 R14: 00000000dfd82cfd R15: ffff88802ee2d7c0 FS: 00007f13bc590700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0943e74000 CR3: 0000000025273000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __refcount_dec include/linux/refcount.h:344 [inline] refcount_dec include/linux/refcount.h:359 [inline] dev_put include/linux/netdevice.h:4135 [inline] ip6_tnl_dev_uninit+0x370/0x3d0 net/ipv6/ip6_tunnel.c:387 register_netdevice+0xadf/0x1500 net/core/dev.c:10308 ip6_tnl_create2+0x1b5/0x400 net/ipv6/ip6_tunnel.c:263 ip6_tnl_newlink+0x312/0x580 net/ipv6/ip6_tunnel.c:2052 __rtnl_newlink+0x1062/0x1710 net/core/rtnetlink.c:3443 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3491 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5553 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: `919067cc84` ("net: add CONFIG_PCPU_DEV_REFCNT") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:16:43 -07:00
Jakub Kicinski	1e5d1f69d9	ethtool: support FEC settings over netlink Add FEC API to netlink. This is not a 1-to-1 conversion. FEC settings already depend on link modes to tell user which modes are supported. Take this further an use link modes for manual configuration. Old struct ethtool_fecparam is still used to talk to the drivers, so we need to translate back and forth. We can revisit the internal API if number of FEC encodings starts to grow. Enforce only one active FEC bit (by using a bit position rather than another mask). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:15:23 -07:00
Tong Zhu	d47ec7a0a7	neighbour: Disregard DEAD dst in neigh_update After a short network outage, the dst_entry is timed out and put in DST_OBSOLETE_DEAD. We are in this code because arp reply comes from this neighbour after network recovers. There is a potential race condition that dst_entry is still in DST_OBSOLETE_DEAD. With that, another neighbour lookup causes more harm than good. In best case all packets in arp_queue are lost. This is counterproductive to the original goal of finding a better path for those packets. I observed a worst case with 4.x kernel where a dst_entry in DST_OBSOLETE_DEAD state is associated with loopback net_device. It leads to an ethernet header with all zero addresses. A packet with all zero source MAC address is quite deadly with mac80211, ath9k and 802.11 block ack. It fails ieee80211_find_sta_by_ifaddr in ath9k (xmit.c). Ath9k flushes tx queue (ath_tx_complete_aggr). BAW (block ack window) is not updated. BAW logic is damaged and ath9k transmission is disabled. Signed-off-by: Tong Zhu <zhutong@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:10:46 -07:00
Pablo Neira Ayuso	19c28b1374	netfilter: add helper function to set up the nfnetlink header and use it This patch adds a helper function to set up the netlink and nfnetlink headers. Update existing codebase to use it. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-31 22:34:11 +02:00
Pablo Neira Ayuso	802b805162	netfilter: nftables: add helper function to set the base sequence number This patch adds a helper function to calculate the base sequence number field that is stored in the nfnetlink header. Use the helper function whenever possible. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-31 22:34:11 +02:00
Yang Yingliang	7726c9ce71	netfilter: nftables: remove unnecessary spin_lock_init() The spinlock nf_tables_destroy_list_lock is initialized statically. It is unnecessary to initialize by spin_lock_init(). Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-31 22:34:11 +02:00
Pablo Neira Ayuso	8b9229d158	netfilter: flowtable: dst_check() from garbage collector path Move dst_check() to the garbage collector path. Stale routes trigger the flow entry teardown state which makes affected flows go back to the classic forwarding path to re-evaluate flow offloading. IPv6 requires the dst cookie to work, store it in the flow_tuple, otherwise dst_check() always fails. Fixes: `e5075c0bad` ("netfilter: flowtable: call dst_check() to fall back to classic forwarding") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-31 22:34:11 +02:00
Richard Guy Briggs	c520292f29	audit: log nftables configuration change events once per table Reduce logging of nftables events to a level similar to iptables. Restore the table field to list the table, adding the generation. Indicate the op as the most significant operation in the event. A couple of sample events: type=PROCTITLE msg=audit(2021-03-18 09:30:49.801:143) : proctitle=/usr/bin/python3 -s /usr/sbin/firewalld --nofork --nopid type=SYSCALL msg=audit(2021-03-18 09:30:49.801:143) : arch=x86_64 syscall=sendmsg success=yes exit=172 a0=0x6 a1=0x7ffdcfcbe650 a2=0x0 a3=0x7ffdcfcbd52c items=0 ppid=1 pid=367 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=roo t sgid=root fsgid=root tty=(none) ses=unset comm=firewalld exe=/usr/bin/python3.9 subj=system_u:system_r:firewalld_t:s0 key=(null) type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.801:143) : table=firewalld:2 family=ipv6 entries=1 op=nft_register_table pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.801:143) : table=firewalld:2 family=ipv4 entries=1 op=nft_register_table pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.801:143) : table=firewalld:2 family=inet entries=1 op=nft_register_table pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld type=PROCTITLE msg=audit(2021-03-18 09:30:49.839:144) : proctitle=/usr/bin/python3 -s /usr/sbin/firewalld --nofork --nopid type=SYSCALL msg=audit(2021-03-18 09:30:49.839:144) : arch=x86_64 syscall=sendmsg success=yes exit=22792 a0=0x6 a1=0x7ffdcfcbe650 a2=0x0 a3=0x7ffdcfcbd52c items=0 ppid=1 pid=367 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=r oot sgid=root fsgid=root tty=(none) ses=unset comm=firewalld exe=/usr/bin/python3.9 subj=system_u:system_r:firewalld_t:s0 key=(null) type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.839:144) : table=firewalld:3 family=ipv6 entries=30 op=nft_register_chain pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.839:144) : table=firewalld:3 family=ipv4 entries=30 op=nft_register_chain pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.839:144) : table=firewalld:3 family=inet entries=165 op=nft_register_chain pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld The issue was originally documented in https://github.com/linux-audit/audit-kernel/issues/124 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-31 22:34:11 +02:00
Florian Westphal	cefa31a9d4	netfilter: nft_log: perform module load from nf_tables modprobe calls from the nf_logger_find_get() API causes deadlock in very special cases because they occur with the nf_tables transaction mutex held. In the specific case of nf_log, deadlock is via: A nf_tables -> transaction mutex -> nft_log -> modprobe -> nf_log_syslog \ -> pernet_ops rwsem -> wait for C B netlink event -> rtnl_mutex -> nf_tables transaction mutex -> wait for A C close() -> ip6mr_sk_done -> rtnl_mutex -> wait for B Earlier patch added NFLOG/xt_LOG module softdeps to avoid the need to load the backend module during a transaction. For nft_log we would have to add a softdep for both nfnetlink_log or nf_log_syslog, since we do not know in advance which of the two backends are going to be configured. This defers the modprobe op until after the transaction mutex is released. Tested-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-31 22:34:11 +02:00
Florian Westphal	a38b5b56d6	netfilter: nf_log: add module softdeps xt_LOG has no direct dependency on the syslog-based logger, it relies on the nf_log core to probe the requested backend. Now that all syslog-based loggers reside in the same module, we can just add a soft dependency on nf_log_syslog and let modprobe take care of it. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-31 22:34:10 +02:00
Florian Westphal	e465cccd0b	netfilter: nf_log_common: merge with nf_log_syslog Remove nf_log_common. Now that all per-af modules have been merged there is no longer a need to provide a helper module. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-31 22:34:10 +02:00
Florian Westphal	77ccee96a6	netfilter: nf_log_bridge: merge with nf_log_syslog Provide bridge log support from nf_log_syslog. After the merge there is no need to load the "real packet loggers", all of them now reside in the same module. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-31 22:34:05 +02:00
Chuck Lever	e3eded5e81	svcrdma: Clean up dto_q critical section in svc_rdma_recvfrom() This, to me, seems less cluttered and less redundant. I was hoping it could help reduce lock contention on the dto_q lock by reducing the size of the critical section, but alas, the only improvement is readability. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-31 15:58:48 -04:00
Chuck Lever	5533c4f4b9	svcrdma: Remove svc_rdma_recv_ctxt::rc_pages and ::rc_arg These fields are no longer used. The size of struct svc_rdma_recv_ctxt is now less than 300 bytes on x86_64, down from 2440 bytes. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-31 15:57:48 -04:00
Chuck Lever	9af723be86	svcrdma: Remove sc_read_complete_q Now that svc_rdma_recvfrom() waits for Read completion, sc_read_complete_q is no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-31 15:57:48 -04:00
Chuck Lever	7d81ee8722	svcrdma: Single-stage RDMA Read Currently the generic RPC server layer calls svc_rdma_recvfrom() twice to retrieve an RPC message that uses Read chunks. I'm not exactly sure why this design was chosen originally. Instead, let's wait for the Read chunk completion inline in the first call to svc_rdma_recvfrom(). The goal is to eliminate some page allocator churn. rdma_read_complete() replaces pages in the second svc_rqst by calling put_page() repeatedly while the upper layer waits for the request to be constructed, which adds unnecessary NFS WRITE round- trip latency. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Tom Talpey <tom@talpey.com>	2021-03-31 15:57:39 -04:00
Geliang Tang	740d798e87	mptcp: remove id 0 address This patch added a new function mptcp_nl_remove_id_zero_address to remove the id 0 address. In this function, traverse all the existing msk sockets to find the msk matched the input IP address. Then fill the removing list with id 0, and pass it to mptcp_pm_remove_addr and mptcp_pm_remove_subflow. Suggested-by: Paolo Abeni <pabeni@redhat.com> Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 17:42:23 -07:00
Geliang Tang	9f12e97bf1	mptcp: unify RM_ADDR and RM_SUBFLOW receiving There are some duplicate code in mptcp_pm_nl_rm_addr_received and mptcp_pm_nl_rm_subflow_received. This patch unifies them into a new function named mptcp_pm_nl_rm_addr_or_subflow. In it, use the input parameter rm_type to identify it's now removing an address or a subflow. Suggested-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 17:42:23 -07:00
Geliang Tang	774c8a8dcb	mptcp: remove all subflows involving id 0 address There's only one subflow involving the non-zero id address, but there may be multi subflows involving the id 0 address. Here's an example: local_id=0, remote_id=0 local_id=1, remote_id=0 local_id=0, remote_id=1 If the removing address id is 0, all the subflows involving the id 0 address need to be removed. In mptcp_pm_nl_rm_addr_received/mptcp_pm_nl_rm_subflow_received, the "break" prevents the iteration to the next subflow, so this patch dropped them. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 17:42:23 -07:00
Eric Dumazet	b8128656a5	net: fix icmp_echo_enable_probe sysctl sysctl_icmp_echo_enable_probe is an u8. ipv4_net_table entry should use .maxlen = sizeof(u8). .proc_handler = proc_dou8vec_minmax, Fixes: `f1b8fa9fa5` ("net: add sysctl for enabling RFC 8335 PROBE messages") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Andreas Roeseler <andreas.a.roeseler@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 17:38:43 -07:00
Paolo Abeni	78352f73dc	udp: never accept GSO_FRAGLIST packets Currently the UDP protocol delivers GSO_FRAGLIST packets to the sockets without the expected segmentation. This change addresses the issue introducing and maintaining a couple of new fields to explicitly accept SKB_GSO_UDP_L4 or GSO_FRAGLIST packets. Additionally updates udp_unexpected_gso() accordingly. UDP sockets enabling UDP_GRO stil keep accept_udp_fraglist zeroed. v1 -> v2: - use 2 bits instead of a whole GSO bitmask (Willem) Fixes: `9fd1ff5d2a` ("udp: Support UDP fraglist GRO/GSO.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 17:06:49 -07:00
Paolo Abeni	e0e3070a9b	udp: properly complete L4 GRO over UDP tunnel packet After the previous patch, the stack can do L4 UDP aggregation on top of a UDP tunnel. In such scenario, udp{4,6}_gro_complete will be called twice. This function will enter its is_flist branch immediately, even though that is only correct on the second call, as GSO_FRAGLIST is only relevant for the inner packet. Instead, we need to try first UDP tunnel-based aggregation, if the GRO packet requires that. This patch changes udp{4,6}_gro_complete to skip the frag list processing when while encap_mark == 1, identifying processing of the outer tunnel header. Additionally, clears the field in udp_gro_complete() so that we can enter the frag list path on the next round, for the inner header. v1 -> v2: - hopefully clarified the commit message Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 17:06:49 -07:00
Paolo Abeni	18f25dc399	udp: skip L4 aggregation for UDP tunnel packets If NETIF_F_GRO_FRAGLIST or NETIF_F_GRO_UDP_FWD are enabled, and there are UDP tunnels available in the system, udp_gro_receive() could end-up doing L4 aggregation (either SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST) at the outer UDP tunnel level for packets effectively carrying and UDP tunnel header. That could cause inner protocol corruption. If e.g. the relevant packets carry a vxlan header, different vxlan ids will be ignored/ aggregated to the same GSO packet. Inner headers will be ignored, too, so that e.g. TCP over vxlan push packets will be held in the GRO engine till the next flush, etc. Just skip the SKB_GSO_UDP_L4 and SKB_GSO_FRAGLIST code path if the current packet could land in a UDP tunnel, and let udp_gro_receive() do GRO via udp_sk(sk)->gro_receive. The check implemented in this patch is broader than what is strictly needed, as the existing UDP tunnel could be e.g. configured on top of a different device: we could end-up skipping GRO at-all for some packets. Anyhow, that is a very thin corner case and covering it will add quite a bit of complexity. v1 -> v2: - hopefully clarify the commit message Fixes: `9fd1ff5d2a` ("udp: Support UDP fraglist GRO/GSO.") Fixes: `36707061d6` ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets") Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 17:06:49 -07:00
Paolo Abeni	000ac44da7	udp: fixup csum for GSO receive slow path When UDP packets generated locally by a socket with UDP_SEGMENT traverse the following path: UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) -> UDP tunnel (rx) -> UDP socket (no UDP_GRO) ip_summed will be set to CHECKSUM_PARTIAL at creation time and such checksum mode will be preserved in the above path up to the UDP tunnel receive code where we have: __iptunnel_pull_header() -> skb_pull_rcsum() -> skb_postpull_rcsum() -> __skb_postpull_rcsum() The latter will convert the skb to CHECKSUM_NONE. The UDP GSO packet will be later segmented as part of the rx socket receive operation, and will present a CHECKSUM_NONE after segmentation. Additionally the segmented packets UDP CB still refers to the original GSO packet len. Overall that causes unexpected/wrong csum validation errors later in the UDP receive path. We could possibly address the issue with some additional checks and csum mangling in the UDP tunnel code. Since the issue affects only this UDP receive slow path, let's set a suitable csum status there. Note that SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST packets lacking an UDP encapsulation present a valid checksum when landing to udp_queue_rcv_skb(), as the UDP checksum has been validated by the GRO engine. v2 -> v3: - even more verbose commit message and comments v1 -> v2: - restrict the csum update to the packets strictly needing them - hopefully clarify the commit message and code comments Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 17:06:49 -07:00
Wang Qing	b9aa074b89	net/decnet: Delete obsolete TODO file The TODO file here has not been updated from 2005, and the function development described in the file have been implemented or abandoned. Its existence will mislead developers seeking to view outdated information. Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 16:54:50 -07:00
Wang Qing	8d9e5bbf5c	net/ax25: Delete obsolete TODO file The TODO file here has not been updated for 13 years, and the function development described in the file have been implemented or abandoned. Its existence will mislead developers seeking to view outdated information. Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 16:54:50 -07:00
Pablo Neira Ayuso	fbea31808c	netfilter: conntrack: do not print icmpv6 as unknown via /proc /proc/net/nf_conntrack shows icmpv6 as unknown. Fixes: `09ec82f5af` ("netfilter: conntrack: remove protocol name from l4proto struct") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-31 01:12:47 +02:00
Pablo Neira Ayuso	0e07e25b48	netfilter: flowtable: fix NAT IPv6 offload mangling Fix out-of-bound access in the address array. Fixes: `5c27d8d76c` ("netfilter: nf_flow_table_offload: add IPv6 support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-31 01:12:47 +02:00
Florian Westphal	1510618e45	netfilter: nf_log_netdev: merge with nf_log_syslog Provide netdev family support from the nf_log_syslog module. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-31 00:37:27 +02:00
Florian Westphal	f5466caab9	netfilter: nf_log_ipv6: merge with nf_log_syslog This removes the nf_log_ipv6 module, the functionality is now provided by nf_log_syslog. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-31 00:37:27 +02:00
Florian Westphal	f11d61e795	netfilter: nf_log_arp: merge with nf_log_syslog similar to previous change: nf_log_syslog now covers ARP logging as well, the old nf_log_arp module is removed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-31 00:37:27 +02:00
Florian Westphal	db3187ae21	netfilter: nf_log_ipv4: rename to nf_log_syslog Netfilter has multiple log modules: nf_log_arp nf_log_bridge nf_log_ipv4 nf_log_ipv6 nf_log_netdev nfnetlink_log nf_log_common With the exception of nfnetlink_log (packet is sent to userspace for dissection/logging), all of them log to the kernel ringbuffer. This is the first part of a series to merge all modules except nfnetlink_log into a single module: nf_log_syslog. This allows to reduce code. After the series, only two log modules remain: nfnetlink_log and nf_log_syslog. The latter provides the same functionality as the old per-af log modules. This renames nf_log_ipv4 to nf_log_syslog. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-31 00:37:27 +02:00
Paolo Abeni	9adc89af72	net: let skb_orphan_partial wake-up waiters. Currently the mentioned helper can end-up freeing the socket wmem without waking-up any processes waiting for more write memory. If the partially orphaned skb is attached to an UDP (or raw) socket, the lack of wake-up can hang the user-space. Even for TCP sockets not calling the sk destructor could have bad effects on TSQ. Address the issue using skb_orphan to release the sk wmem before setting the new sock_efree destructor. Additionally bundle the whole ownership update in a new helper, so that later other potential users could avoid duplicate code. v1 -> v2: - use skb_orphan() instead of sort of open coding it (Eric) - provide an helper for the ownership change (Eric) Fixes: `f6ba8d33cf` ("netem: fix skb_orphan_partial()") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 13:57:28 -07:00
Yunjian Wang	ae81feb733	sch_htb: fix null pointer dereference on a null new_q sch_htb: fix null pointer dereference on a null new_q Currently if new_q is null, the null new_q pointer will be dereference when 'q->offload' is true. Fix this by adding a braces around htb_parent_to_leaf_offload() to avoid it. Addresses-Coverity: ("Dereference after null check") Fixes: `d03b195b5a` ("sch_htb: Hierarchical QoS hardware offload") Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 13:50:03 -07:00
Loic Poulain	8a03dd9257	net: qrtr: Fix memory leak on qrtr_tx_wait failure qrtr_tx_wait does not check for radix_tree_insert failure, causing the 'flow' object to be unreferenced after qrtr_tx_wait return. Fix that by releasing flow on radix_tree_insert failure. Fixes: `5fdeb0d372` ("net: qrtr: Implement outgoing flow control") Reported-by: syzbot+739016799a89c530b32a@syzkaller.appspotmail.com Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 13:48:29 -07:00
Andreas Roeseler	d329ea5bd8	icmp: add response to RFC 8335 PROBE messages Modify the icmp_rcv function to check PROBE messages and call icmp_echo if a PROBE request is detected. Modify the existing icmp_echo function to respond ot both ping and PROBE requests. This was tested using a custom modification to the iputils package and wireshark. It supports IPV4 probing by name, ifindex, and probing by both IPV4 and IPV6 addresses. It currently does not support responding to probes off the proxy node (see RFC 8335 Section 2). The modification to the iputils package is still in development and can be found here: https://github.com/Juniper-Clinic-2020/iputils.git. It supports full sending functionality of PROBE requests, but currently does not parse the response messages, which is why Wireshark is required to verify the sent and recieved PROBE messages. The modification adds the ``-e'' flag to the command which allows the user to specify the interface identifier to query the probed host. An example usage would be <./ping -4 -e 1 [destination]> to send a PROBE request of ifindex 1 to the destination node. Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 13:29:39 -07:00
Andreas Roeseler	504a40113c	ipv6: add ipv6_dev_find to stubs Add ipv6_dev_find to ipv6_stub to allow lookup of net_devices by IPV6 address in net/ipv4/icmp.c. Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 13:29:39 -07:00
Andreas Roeseler	08baf54f01	net: add support for sending RFC 8335 PROBE messages Modify the ping_supported function to support PROBE message types. This allows tools such as the ping command in the iputils package to be modified to send PROBE requests through the existing framework for sending ping requests. Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 13:29:39 -07:00
Andreas Roeseler	f1b8fa9fa5	net: add sysctl for enabling RFC 8335 PROBE messages Section 8 of RFC 8335 specifies potential security concerns of responding to PROBE requests, and states that nodes that support PROBE functionality MUST be able to enable/disable responses and that responses MUST be disabled by default Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 13:29:39 -07:00
Kumar Kartikeya Dwivedi	6855e8213e	net: sched: bump refcount for new action in ACT replace mode Currently, action creation using ACT API in replace mode is buggy. When invoking for non-existent action index 42, tc action replace action bpf obj foo.o sec <xyz> index 42 kernel creates the action, fills up the netlink response, and then just deletes the action after notifying userspace. tc action show action bpf doesn't list the action. This happens due to the following sequence when ovr = 1 (replace mode) is enabled: tcf_idr_check_alloc is used to atomically check and either obtain reference for existing action at index, or reserve the index slot using a dummy entry (ERR_PTR(-EBUSY)). This is necessary as pointers to these actions will be held after dropping the idrinfo lock, so bumping the reference count is necessary as we need to insert the actions, and notify userspace by dumping their attributes. Finally, we drop the reference we took using the tcf_action_put_many call in tcf_action_add. However, for the case where a new action is created due to free index, its refcount remains one. This when paired with the put_many call leads to the kernel setting up the action, notifying userspace of its creation, and then tearing it down. For existing actions, the refcount is still held so they remain unaffected. Fortunately due to rtnl_lock serialization requirement, such an action with refcount == 1 will not be concurrently deleted by anything else, at best CLS API can move its refcount up and down by binding to it after it has been published from tcf_idr_insert_many. Since refcount is atleast one until put_many call, CLS API cannot delete it. Also __tcf_action_put release path already ensures deterministic outcome (either new action will be created or existing action will be reused in case CLS API tries to bind to action concurrently) due to idr lock serialization. We fix this by making refcount of newly created actions as 2 in ACT API replace mode. A relaxed store will suffice as visibility is ensured only after the tcf_idr_insert_many call. Note that in case of creation or overwriting using CLS API only (i.e. bind = 1), overwriting existing action object is not allowed, and any such request is silently ignored (without error). The refcount bump that occurs in tcf_idr_check_alloc call there for existing action will pair with tcf_exts_destroy call made from the owner module for the same action. In case of action creation, there is no existing action, so no tcf_exts_destroy callback happens. This means no code changes for CLS API. Fixes: `cae422f379` ("net: sched: use reference counting action init") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 13:18:02 -07:00
Milton Miller	03cb4d05b4	net/ncsi: Avoid channel_monitor hrtimer deadlock Calling ncsi_stop_channel_monitor from channel_monitor is a guaranteed deadlock on SMP because stop calls del_timer_sync on the timer that invoked channel_monitor as its timer function. Recognise the inherent race of marking the monitor disabled before deleting the timer by just returning if enable was cleared. After a timeout (the default case -- reset to START when response received) just mark the monitor.enabled false. If the channel has an entry on the channel_queue list, or if the state is not ACTIVE or INACTIVE, then warn and mark the timer stopped and don't restart, as the locking is broken somehow. Fixes: `0795fb2021` ("net/ncsi: Stop monitor if channel times out or is inactive") Signed-off-by: Milton Miller <miltonm@us.ibm.com> Signed-off-by: Eddie James <eajames@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 13:16:23 -07:00
Sven Eckelmann	35796c1d34	batman-adv: Fix misspelled "wont" checkpatch started to complain about the mispelling of: CHECK: 'wont' may be misspelled - perhaps 'won't'? #459: FILE: ./net/batman-adv/bat_iv_ogm.c:459: + * - the resulting packet wont be bigger than Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2021-03-30 21:19:50 +02:00
Dmitry Safonov	ef19e11133	xfrm/compat: Cleanup WARN()s that can be user-triggered Replace WARN_ONCE() that can be triggered from userspace with pr_warn_once(). Those still give user a hint what's the issue. I've left WARN()s that are not possible to trigger with current code-base and that would mean that the code has issues: - relying on current compat_msg_min[type] <= xfrm_msg_min[type] - expected 4-byte padding size difference between compat_msg_min[type] and xfrm_msg_min[type] - compat_policy[type].len <= xfrma_policy[type].len (for every type) Reported-by: syzbot+834ffd1afc7212eb8147@syzkaller.appspotmail.com Fixes: `5f3eea6b7e` ("xfrm/compat: Attach xfrm dumps to 64=>32 bit translator") Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2021-03-30 07:29:09 +02:00
Martin KaFai Lau	7aae231ac9	bpf: tcp: Limit calling some tcp cc functions to CONFIG_DYNAMIC_FTRACE pahole currently only generates the btf_id for external function and ftrace-able function. Some functions in the bpf_tcp_ca_kfunc_ids are static (e.g. cubictcp_init). Thus, unless CONFIG_DYNAMIC_FTRACE is set, btf_ids for those functions will not be generated and the compilation fails during resolve_btfids. This patch limits those functions to CONFIG_DYNAMIC_FTRACE. I will address the pahole generation in a followup and then remove the CONFIG_DYNAMIC_FTRACE limitation. Fixes: `e78aea8b21` ("bpf: tcp: Put some tcp cong functions in allowlist for bpf-tcp-cc") Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Reported-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210329221357.834438-1-kafai@fb.com	2021-03-29 18:42:43 -07:00
Eric Dumazet	d24f511b04	tcp: fix tcp_min_tso_segs sysctl tcp_min_tso_segs is now stored in u8, so max value is 255. 255 limit is enforced by proc_dou8vec_minmax(). We can therefore remove the gso_max_segs variable. Fixes: 47996b489bdc ("tcp: convert elligible sysctls to u8") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-29 16:33:48 -07:00
Eric Dumazet	6289a98f08	sit: proper dev_{hold\|put} in ndo_[un]init methods After adopting CONFIG_PCPU_DEV_REFCNT=n option, syzbot was able to trigger a warning [1] Issue here is that: - all dev_put() should be paired with a corresponding prior dev_hold(). - A driver doing a dev_put() in its ndo_uninit() MUST also do a dev_hold() in its ndo_init(), only when ndo_init() is returning 0. Otherwise, register_netdevice() would call ndo_uninit() in its error path and release a refcount too soon. Fixes: `919067cc84` ("net: add CONFIG_PCPU_DEV_REFCNT") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-29 16:31:51 -07:00
Eric Dumazet	40cb881b5a	ip6_vti: proper dev_{hold\|put} in ndo_[un]init methods After adopting CONFIG_PCPU_DEV_REFCNT=n option, syzbot was able to trigger a warning [1] Issue here is that: - all dev_put() should be paired with a corresponding prior dev_hold(). - A driver doing a dev_put() in its ndo_uninit() MUST also do a dev_hold() in its ndo_init(), only when ndo_init() is returning 0. Otherwise, register_netdevice() would call ndo_uninit() in its error path and release a refcount too soon. Therefore, we need to move dev_hold() call from vti6_tnl_create2() to vti6_dev_init_gen() [1] WARNING: CPU: 0 PID: 15951 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Modules linked in: CPU: 0 PID: 15951 Comm: syz-executor.3 Not tainted 5.12.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Code: 1d 6a 5a e8 09 31 ff 89 de e8 8d 1a ab fd 84 db 75 e0 e8 d4 13 ab fd 48 c7 c7 a0 e1 c1 89 c6 05 4a 5a e8 09 01 e8 2e 36 fb 04 <0f> 0b eb c4 e8 b8 13 ab fd 0f b6 1d 39 5a e8 09 31 ff 89 de e8 58 RSP: 0018:ffffc90001eaef28 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000040000 RSI: ffffffff815c51f5 RDI: fffff520003d5dd7 RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815bdf8e R11: 0000000000000000 R12: ffff88801bb1c568 R13: ffff88801f69e800 R14: 00000000ffffffff R15: ffff888050889d40 FS: 00007fc79314e700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1c1ff47108 CR3: 0000000020fd5000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __refcount_dec include/linux/refcount.h:344 [inline] refcount_dec include/linux/refcount.h:359 [inline] dev_put include/linux/netdevice.h:4135 [inline] vti6_dev_uninit+0x31a/0x360 net/ipv6/ip6_vti.c:297 register_netdevice+0xadf/0x1500 net/core/dev.c:10308 vti6_tnl_create2+0x1b5/0x400 net/ipv6/ip6_vti.c:190 vti6_newlink+0x9d/0xd0 net/ipv6/ip6_vti.c:1020 __rtnl_newlink+0x1062/0x1710 net/core/rtnetlink.c:3443 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3491 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5553 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x331/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmmsg+0x195/0x470 net/socket.c:2490 __do_sys_sendmmsg net/socket.c:2519 [inline] __se_sys_sendmmsg net/socket.c:2516 [inline] __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2516 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-29 16:31:51 -07:00
Eric Dumazet	7f700334be	ip6_gre: proper dev_{hold\|put} in ndo_[un]init methods After adopting CONFIG_PCPU_DEV_REFCNT=n option, syzbot was able to trigger a warning [1] Issue here is that: - all dev_put() should be paired with a corresponding dev_hold(), and vice versa. - A driver doing a dev_put() in its ndo_uninit() MUST also do a dev_hold() in its ndo_init(), only when ndo_init() is returning 0. Otherwise, register_netdevice() would call ndo_uninit() in its error path and release a refcount too soon. ip6_gre for example (among others problematic drivers) has to use dev_hold() in ip6gre_tunnel_init_common() instead of from ip6gre_newlink_common(), covering both ip6gre_tunnel_init() and ip6gre_tap_init()/ Note that ip6gre_tunnel_init_common() is not called from ip6erspan_tap_init() thus we also need to add a dev_hold() there, as ip6erspan_tunnel_uninit() does call dev_put() [1] refcount_t: decrement hit 0; leaking memory. WARNING: CPU: 0 PID: 8422 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Modules linked in: CPU: 1 PID: 8422 Comm: syz-executor854 Not tainted 5.12.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Code: 1d 6a 5a e8 09 31 ff 89 de e8 8d 1a ab fd 84 db 75 e0 e8 d4 13 ab fd 48 c7 c7 a0 e1 c1 89 c6 05 4a 5a e8 09 01 e8 2e 36 fb 04 <0f> 0b eb c4 e8 b8 13 ab fd 0f b6 1d 39 5a e8 09 31 ff 89 de e8 58 RSP: 0018:ffffc900018befd0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff88801ef19c40 RSI: ffffffff815c51f5 RDI: fffff52000317dec RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815bdf8e R11: 0000000000000000 R12: ffff888018cf4568 R13: ffff888018cf4c00 R14: ffff8880228f2000 R15: ffffffff8d659b80 FS: 00000000014eb300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055d7bf2b3138 CR3: 0000000014933000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __refcount_dec include/linux/refcount.h:344 [inline] refcount_dec include/linux/refcount.h:359 [inline] dev_put include/linux/netdevice.h:4135 [inline] ip6gre_tunnel_uninit+0x3d7/0x440 net/ipv6/ip6_gre.c:420 register_netdevice+0xadf/0x1500 net/core/dev.c:10308 ip6gre_newlink_common.constprop.0+0x158/0x410 net/ipv6/ip6_gre.c:1984 ip6gre_newlink+0x275/0x7a0 net/ipv6/ip6_gre.c:2017 __rtnl_newlink+0x1062/0x1710 net/core/rtnetlink.c:3443 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3491 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5553 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 Fixes: `919067cc84` ("net: add CONFIG_PCPU_DEV_REFCNT") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-29 16:31:51 -07:00
Jon Maloy	02fdc14d9b	tipc: fix htmldoc and smatch warnings We fix a warning from the htmldoc tool and an indentation error reported by smatch. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-29 16:28:50 -07:00
Lv Yunlong	6bf24dc0cc	net:tipc: Fix a double free in tipc_sk_mcast_rcv In the if(skb_peek(arrvq) == skb) branch, it calls __skb_dequeue(arrvq) to get the skb by skb = skb_peek(arrvq). Then __skb_dequeue() unlinks the skb from arrvq and returns the skb which equals to skb_peek(arrvq). After __skb_dequeue(arrvq) finished, the skb is freed by kfree_skb(__skb_dequeue(arrvq)) in the first time. Unfortunately, the same skb is freed in the second time by kfree_skb(skb) after the branch completed. My patch removes kfree_skb() in the if(skb_peek(arrvq) == skb) branch, because this skb will be freed by kfree_skb(skb) finally. Fixes: `cb1b728096` ("tipc: eliminate race condition at multicast reception") Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-29 16:21:37 -07:00
Maxim Kochetkov	fb6ec87f72	net: dsa: Fix type was not set for devlink port If PHY is not available on DSA port (described at devicetree but absent or failed to detect) then kernel prints warning after 3700 secs: [ 3707.948771] ------------[ cut here ]------------ [ 3707.948784] Type was not set for devlink port. [ 3707.948894] WARNING: CPU: 1 PID: 17 at net/core/devlink.c:8097 0xc083f9d8 We should unregister the devlink port as a user port and re-register it as an unused port before executing "continue" in case of dsa_port_setup error. Fixes: `86f8b1c01a` ("net: dsa: Do not make user port errors fatal") Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-29 13:49:04 -07:00
Oliver Hartkopp	f522d9559b	can: isotp: fix msg_namelen values depending on CAN_REQUIRED_SIZE Since commit `f5223e9eee` ("can: extend sockaddr_can to include j1939 members") the sockaddr_can has been extended in size and a new CAN_REQUIRED_SIZE macro has been introduced to calculate the protocol specific needed size. The ABI for the msg_name and msg_namelen has not been adapted to the new CAN_REQUIRED_SIZE macro for the other CAN protocols which leads to a problem when an existing binary reads the (increased) struct sockaddr_can in msg_name. Fixes: `e057dd3fc2` ("can: add ISO 15765-2:2016 transport protocol") Reported-by: Richard Weinberger <richard@nod.at> Acked-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> Link: https://lore.kernel.org/linux-can/1135648123.112255.1616613706554.JavaMail.zimbra@nod.at/T/#t Link: https://lore.kernel.org/r/20210325125850.1620-2-socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-03-29 09:51:43 +02:00
Oliver Hartkopp	9e9714742f	can: bcm/raw: fix msg_namelen values depending on CAN_REQUIRED_SIZE Since commit `f5223e9eee` ("can: extend sockaddr_can to include j1939 members") the sockaddr_can has been extended in size and a new CAN_REQUIRED_SIZE macro has been introduced to calculate the protocol specific needed size. The ABI for the msg_name and msg_namelen has not been adapted to the new CAN_REQUIRED_SIZE macro for the other CAN protocols which leads to a problem when an existing binary reads the (increased) struct sockaddr_can in msg_name. Fixes: `f5223e9eee` ("can: extend sockaddr_can to include j1939 members") Reported-by: Richard Weinberger <richard@nod.at> Tested-by: Richard Weinberger <richard@nod.at> Acked-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> Link: https://lore.kernel.org/linux-can/1135648123.112255.1616613706554.JavaMail.zimbra@nod.at/T/#t Link: https://lore.kernel.org/r/20210325125850.1620-1-socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-03-29 09:51:20 +02:00
Steffen Klassert	c7dbf4c088	xfrm: Provide private skb extensions for segmented and hw offloaded ESP packets Commit `94579ac3f6` ("xfrm: Fix double ESP trailer insertion in IPsec crypto offload.") added a XFRM_XMIT flag to avoid duplicate ESP trailer insertion on HW offload. This flag is set on the secpath that is shared amongst segments. This lead to a situation where some segments are not transformed correctly when segmentation happens at layer 3. Fix this by using private skb extensions for segmented and hw offloaded ESP packets. Fixes: `94579ac3f6` ("xfrm: Fix double ESP trailer insertion in IPsec crypto offload.") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2021-03-29 09:14:12 +02:00
Martin KaFai Lau	21cfd2db9f	bpf: tcp: Fix an error in the bpf_tcp_ca_kfunc_ids list There is a typo in the bbr function, s/even/event/. This patch fixes it. Fixes: `e78aea8b21` ("bpf: tcp: Put some tcp cong functions in allowlist for bpf-tcp-cc") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210329003213.2274210-1-kafai@fb.com	2021-03-28 18:17:08 -07:00
kernel test robot	284fda1eff	sit: use min Opportunity for min() Generated by: scripts/coccinelle/misc/minmax.cocci CC: Denis Efremov <efremov@linux.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:58:39 -07:00
Xiongfeng Wang	b6908cf795	NFC: digital: Correct function name in the kerneldoc comments Fix the following W=1 kernel build warning(s): net/nfc/digital_core.c:473: warning: expecting prototype for start_poll operation(). Prototype was for digital_start_poll() instead Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:56:56 -07:00
Xiongfeng Wang	f7b88985a1	ip6_tunnel:: Correct function name parse_tvl_tnl_enc_lim() in the kerneldoc comments Fix the following W=1 kernel build warning(s): net/ipv6/ip6_tunnel.c:401: warning: expecting prototype for parse_tvl_tnl_enc_lim(). Prototype was for ip6_tnl_parse_tlv_enc_lim() instead Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:56:56 -07:00
Xiongfeng Wang	03ff7371cb	net: 9p: Correct function names in the kerneldoc comments Fix the following W=1 kernel build warning(s): net/9p/client.c:133: warning: expecting prototype for parse_options(). Prototype was for parse_opts() instead net/9p/client.c:269: warning: expecting prototype for p9_req_alloc(). Prototype was for p9_tag_alloc() instead Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:56:56 -07:00
Xiongfeng Wang	54e625e3bd	9p/trans_fd: Correct function name p9_mux_destroy() in the kerneldoc Fix the following W=1 kernel build warning(s): net/9p/trans_fd.c:881: warning: expecting prototype for p9_mux_destroy(). Prototype was for p9_conn_destroy() instead Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:56:56 -07:00
Xiongfeng Wang	8bf94a9250	net: 9p: Correct function name errstr2errno() in the kerneldoc comments Fix the following W=1 kernel build warning(s): net/9p/error.c:207: warning: expecting prototype for errstr2errno(). Prototype was for p9_errstr2errno() instead Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:56:56 -07:00
Xiongfeng Wang	bb2882bc6c	net: core: Correct function name netevent_unregister_notifier() in the kerneldoc Fix the following W=1 kernel build warning(s): net/core/netevent.c:45: warning: expecting prototype for netevent_unregister_notifier(). Prototype was for unregister_netevent_notifier() instead Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:56:56 -07:00
Xiongfeng Wang	af82508743	net: core: Correct function name dev_uc_flush() in the kerneldoc Fix the following W=1 kernel build warning(s): net/core/dev_addr_lists.c:732: warning: expecting prototype for dev_uc_flush(). Prototype was for dev_uc_init() instead Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:56:56 -07:00
Xiongfeng Wang	3ba937fb95	netlabel: Correct function name netlbl_mgmt_add() in the kerneldoc comments Fix the following W=1 kernel build warning(s): net/netlabel/netlabel_mgmt.c:78: warning: expecting prototype for netlbl_mgmt_add(). Prototype was for netlbl_mgmt_add_common() instead Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:56:55 -07:00
Xiongfeng Wang	37569287cb	l3mdev: Correct function names in the kerneldoc comments Fix the following W=1 kernel build warning(s): net/l3mdev/l3mdev.c:111: warning: expecting prototype for l3mdev_master_ifindex(). Prototype was for l3mdev_master_ifindex_rcu() instead net/l3mdev/l3mdev.c:145: warning: expecting prototype for l3mdev_master_upper_ifindex_by_index(). Prototype was for l3mdev_master_upper_ifindex_by_index_rcu() instead Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:56:55 -07:00
Petr Machata	de1d1ee3e3	nexthop: Rename artifacts related to legacy multipath nexthop groups After resilient next-hop groups have been added recently, there are two types of multipath next-hop groups: the legacy "mpath", and the new "resilient". Calling the legacy next-hop group type "mpath" is unfortunate, because that describes the fact that a packet could be forwarded in one of several paths, which is also true for the resilient next-hop groups. Therefore, to make the naming clearer, rename various artifacts to reflect the assumptions made. Therefore as of this patch: - The flag for multipath groups is nh_grp_entry::is_multipath. This includes the legacy and resilient groups, as well as any future group types that behave as multipath groups. Functions that assume this have "mpath" in the name. - The flag for legacy multipath groups is nh_grp_entry::hash_threshold. Functions that assume this have "hthr" in the name. - The flag for resilient groups is nh_grp_entry::resilient. Functions that assume this have "res" in the name. Besides the above, struct nh_grp_entry::mpath was renamed to ::hthr as well. UAPI artifacts were obviously left intact. Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:53:39 -07:00
Lu Wei	9195f06b2d	net: vsock: Fix a typo Modify "occured" to "occurred" in net/vmw_vsock/af_vsock.c. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:52:51 -07:00
Lu Wei	21c00a186f	net: sctp: Fix some typos Modify "unkown" to "unknown" in net/sctp/sm_make_chunk.c and Modify "orginal" to "original" in net/sctp/socket.c. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:52:50 -07:00
Lu Wei	ebf893958c	net: rds: Fix a typo Modify "beween" to "between" in net/rds/send.c. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:52:50 -07:00
Bhaskar Chowdhury	a7fd0e6d75	xfrm_user.c: Added a punctuation s/wouldnt/wouldn\'t/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:31:14 -07:00
Bhaskar Chowdhury	aa8ef1b9ab	xfrm_policy.c : Mundane typo fix s/sucessful/successful/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:31:14 -07:00
Bhaskar Chowdhury	fb373c8455	sm_statefuns.c: Mundane spello fixes s/simulataneous/simultaneous/ ....in three dirrent places. s/tempory/temporary/ s/interpeter/interpreter/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:31:14 -07:00
Bhaskar Chowdhury	f2e3093172	reg.c: Fix a spello s/ingoring/ignoring/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:31:14 -07:00
Bhaskar Chowdhury	0184235ec6	node.c: A typo fix s/synching/syncing/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:31:14 -07:00
Bhaskar Chowdhury	bcae6d5faf	netfilter: nf_conntrack_acct.c: A typo fix s/Accouting/Accounting/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:31:14 -07:00
Bhaskar Chowdhury	f60d94f0d7	netfilter: ipvs: A spello fix s/registerd/registered/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:31:13 -07:00
Bhaskar Chowdhury	195a8ec403	ncsi: internal.h: Fix a spello s/Firware/Firmware/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:31:13 -07:00
Bhaskar Chowdhury	55320b82d6	mptcp: subflow.c: Fix a typo s/concerened/concerned/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:31:13 -07:00
Bhaskar Chowdhury	b18dacab6b	mac80211: cfg.c: A typo fix s/assocaited/associated/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:31:13 -07:00
Bhaskar Chowdhury	61f8406010	llc: llc_core.c: COuple of typo fixes s/searchs/searches/ ....two different places. Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:31:13 -07:00
Bhaskar Chowdhury	71a2fae508	kcm: kcmsock.c: Couple of typo fixes s/synchonization/synchronization/ s/aready/already/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:31:13 -07:00
Bhaskar Chowdhury	bf05d48dbd	iucv: af_iucv.c: Couple of typo fixes s/unitialized/uninitialized/ s/notifcations/notifications/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:31:13 -07:00
Bhaskar Chowdhury	89e8347f0f	ipv6: route.c: A spello fix s/notfication/notification/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:31:13 -07:00
Bhaskar Chowdhury	912b519afc	ipv6: addrconf.c: Fix a typo s/Identifers/Identifiers/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:31:13 -07:00
Bhaskar Chowdhury	e5ca43e82d	ipv4: tcp_lp.c: Couple of typo fixes s/resrved/reserved/ s/within/within/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:31:13 -07:00
Bhaskar Chowdhury	a66e04ce0e	ipv4: ip_output.c: Couple of typo fixes s/readibility/readability/ s/insufficent/insufficient/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:31:13 -07:00
Bhaskar Chowdhury	e919ee389c	bearer.h: Spellos fixed s/initalized/initialized/ ...three different places Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:31:13 -07:00
Bhaskar Chowdhury	8406d38fde	af_x25.c: Fix a spello s/facilties/facilities/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-28 17:31:13 -07:00
Atul Gopinathan	7e32a09fdc	bpf: tcp: Remove comma which is causing build error Currently, building the bpf-next source with the CONFIG_BPF_SYSCALL enabled is causing a compilation error: "net/ipv4/bpf_tcp_ca.c:209:28: error: expected identifier or '(' before ',' token" Fix this by removing an unnecessary comma. Fixes: `e78aea8b21` ("bpf: tcp: Put some tcp cong functions in allowlist for bpf-tcp-cc") Reported-by: syzbot+0b74d8ec3bf0cc4e4209@syzkaller.appspotmail.com Signed-off-by: Atul Gopinathan <atulgopinathan@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210328120515.113895-1-atulgopinathan@gmail.com	2021-03-28 11:23:55 -07:00
Martin KaFai Lau	7bd1590d4e	bpf: selftests: Add kfunc_call test This patch adds a few kernel function bpf_kfunc_call_test() for the selftest's test_run purpose. They will be allowed for tc_cls prog. The selftest calling the kernel function bpf_kfunc_call_test() is also added in this patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210325015252.1551395-1-kafai@fb.com	2021-03-26 20:41:52 -07:00
Martin KaFai Lau	e78aea8b21	bpf: tcp: Put some tcp cong functions in allowlist for bpf-tcp-cc This patch puts some tcp cong helper functions, tcp_slow_start() and tcp_cong_avoid_ai(), into the allowlist for the bpf-tcp-cc program. A few tcp cc implementation functions are also put into the allowlist. A potential use case is the bpf-tcp-cc implementation may only want to override a subset of a tcp_congestion_ops. For others, the bpf-tcp-cc can directly call the kernel counter parts instead of re-implementing (or copy-and-pasting) them to the bpf program. They will only be available to the bpf-tcp-cc typed program. The allowlist functions are not bounded to a fixed ABI contract. When any of them has changed, the bpf-tcp-cc program has to be changed like any in-tree/out-of-tree kernel tcp-cc implementations do also. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210325015201.1546345-1-kafai@fb.com	2021-03-26 20:41:51 -07:00
Martin KaFai Lau	d22f6ad187	tcp: Rename bictcp function prefix to cubictcp The cubic functions in tcp_cubic.c are using the bictcp prefix as in tcp_bic.c. This patch gives it the proper name cubictcp because the later patch will allow the bpf prog to directly call the cubictcp implementation. Renaming them will avoid the name collision when trying to find the intended one to call during bpf prog load time. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210325015155.1545532-1-kafai@fb.com	2021-03-26 20:41:51 -07:00
Yang Yingliang	72e6afe6b4	net: llc: Correct function name llc_pdu_set_pf_bit() in header Fix the following make W=1 kernel build warning: net/llc/llc_pdu.c:36: warning: expecting prototype for pdu_set_pf_bit(). Prototype was for llc_pdu_set_pf_bit() instead Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:24:14 -07:00
Yang Yingliang	8114f099d9	net: llc: Correct function name llc_sap_action_unitdata_ind() in header Fix the following make W=1 kernel build warning: net/llc/llc_s_ac.c:38: warning: expecting prototype for llc_sap_action_unit_data_ind(). Prototype was for llc_sap_action_unitdata_ind() instead Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:24:14 -07:00
Yang Yingliang	26440a63a1	net: llc: Correct some function names in header Fix the following make W=1 kernel build warning: net/llc/llc_c_ev.c:622: warning: expecting prototype for conn_ev_qlfy_last_frame_eq_1(). Prototype was for llc_conn_ev_qlfy_last_frame_eq_1() instead net/llc/llc_c_ev.c:636: warning: expecting prototype for conn_ev_qlfy_last_frame_eq_0(). Prototype was for llc_conn_ev_qlfy_last_frame_eq_0() instead Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:24:14 -07:00
Hoang Le	bc556d3edd	tipc: fix kernel-doc warnings Fix kernel-doc warning introduced in commit `b83e214b2e` ("tipc: add extack messages for bearer/media failure"): net/tipc/bearer.c:248: warning: Function parameter or member 'extack' not described in 'tipc_enable_bearer' Fixes: `b83e214b2e` ("tipc: add extack messages for bearer/media failure") Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:22:29 -07:00
Taehee Yoo	63ed8de4be	mld: add mc_lock for protecting per-interface mld data The purpose of this lock is to avoid a bottleneck in the query/report event handler logic. By previous patches, almost all mld data is protected by RTNL. So, the query and report event handler, which is data path logic acquires RTNL too. Therefore if a lot of query and report events are received, it uses RTNL for a long time. So it makes the control-plane bottleneck because of using RTNL. In order to avoid this bottleneck, mc_lock is added. mc_lock protect only per-interface mld data and per-interface mld data is used in the query/report event handler logic. So, no longer rtnl_lock is needed in the query/report event handler logic. Therefore bottleneck will be disappeared by mc_lock. Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:14:56 -07:00
Taehee Yoo	f185de28d9	mld: add new workqueues for process mld events When query/report packets are received, mld module processes them. But they are processed under BH context so it couldn't use sleepable functions. So, in order to switch context, the two workqueues are added which processes query and report event. In the struct inet6_dev, mc_{query \| report}_queue are added so it is per-interface queue. And mc_{query \| report}_work are workqueue structure. When the query or report event is received, skb is queued to proper queue and worker function is scheduled immediately. Workqueues and queues are protected by spinlock, which is mc_{query \| report}_lock, and worker functions are protected by RTNL. Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:14:56 -07:00
Taehee Yoo	88e2ca3080	mld: convert ifmcaddr6 to RCU The ifmcaddr6 has been protected by inet6_dev->lock(rwlock) so that the critical section is atomic context. In order to switch this context, changing locking is needed. The ifmcaddr6 actually already protected by RTNL So if it's converted to use RCU, its control path context can be switched to sleepable. Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:14:56 -07:00
Taehee Yoo	4b200e3989	mld: convert ip6_sf_list to RCU The ip6_sf_list has been protected by mca_lock(spin_lock) so that the critical section is atomic context. In order to switch this context, changing locking is needed. The ip6_sf_list actually already protected by RTNL So if it's converted to use RCU, its control path context can be switched to sleepable. But It doesn't remove mca_lock yet because ifmcaddr6 isn't converted to RCU yet. So, It's not fully converted to the sleepable context. Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:14:56 -07:00
Taehee Yoo	882ba1f73c	mld: convert ipv6_mc_socklist->sflist to RCU The sflist has been protected by rwlock so that the critical section is atomic context. In order to switch this context, changing locking is needed. The sflist actually already protected by RTNL So if it's converted to use RCU, its control path context can be switched to sleepable. Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:14:56 -07:00
Taehee Yoo	cf2ce339b4	mld: get rid of inet6_dev->mc_lock The purpose of mc_lock is to protect inet6_dev->mc_tomb. But mc_tomb is already protected by RTNL and all functions, which manipulate mc_tomb are called under RTNL. So, mc_lock is not needed. Furthermore, it is spinlock so the critical section is atomic. In order to reduce atomic context, it should be removed. Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:14:55 -07:00
Taehee Yoo	2d9a93b490	mld: convert from timer to delayed work mcast.c has several timers for delaying works. Timer's expire handler is working under atomic context so it can't use sleepable things such as GFP_KERNEL, mutex, etc. In order to use sleepable APIs, it converts from timers to delayed work. But there are some critical sections, which is used by both process and BH context. So that it still uses spin_lock_bh() and rwlock. Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:14:55 -07:00
Jakub Kicinski	cf2cc0bf4f	ethtool: fec: fix FEC_NONE check Dan points out we need to use the mask not the bit (which is 0). Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: `42ce127d98` ("ethtool: fec: sanitize ethtool_fecparam->fec") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:09:45 -07:00
Geliang Tang	b46a023810	mptcp: rename mptcp_pm_nl_add_addr_send_ack Since mptcp_pm_nl_add_addr_send_ack is now used for both ADD_ADDR and RM_ADDR cases, rename it to mptcp_pm_nl_addr_send_ack. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:05:15 -07:00
Geliang Tang	8dd5efb1f9	mptcp: send ack for rm_addr This patch changes the sending ACK conditions for the ADD_ADDR, send an ACK packet for RM_ADDR too. In mptcp_pm_remove_addr, invoke mptcp_pm_nl_add_addr_send_ack to send the ACK packet. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:05:15 -07:00
Geliang Tang	b65d95adb8	mptcp: drop useless addr_signal clear msk->pm.addr_signal is cleared in mptcp_pm_add_addr_signal, no need to clear it in mptcp_pm_nl_add_addr_send_ack again. Drop it. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:05:15 -07:00
Geliang Tang	557963c383	mptcp: move to next addr when subflow creation fail When an invalid address was announced, the subflow couldn't be created for this address. Therefore mptcp_pm_nl_subflow_established couldn't be invoked. Then the next addresses in the local address list didn't have a chance to be announced. This patch invokes the new function mptcp_pm_add_addr_echoed when the address is echoed. In it, use mptcp_lookup_anno_list_by_saddr to check whether this address is in the anno_list. If it is, PM schedules the status MPTCP_PM_SUBFLOW_ESTABLISHED to invoke mptcp_pm_create_subflow_or_signal_addr to deal with the next address in the local address list. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:05:15 -07:00
Geliang Tang	d88c476f4a	mptcp: export lookup_anno_list_by_saddr This patch exported the static function lookup_anno_list_by_saddr, and renamed it to mptcp_lookup_anno_list_by_saddr. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:05:15 -07:00
Geliang Tang	348d5c1dec	mptcp: move to next addr when timeout This patch called mptcp_pm_subflow_established to move to the next address when an ADD_ADDR has been retransmitted the maximum number of times. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:05:15 -07:00
Geliang Tang	62535200be	mptcp: drop unused subflow in mptcp_pm_subflow_established This patch drops the unused parameter subflow in mptcp_pm_subflow_established(). Fixes: `926bdeab55` ("mptcp: Implement path manager interface commands") Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:05:15 -07:00
Geliang Tang	d84ad04941	mptcp: skip connecting the connected address This patch added a new helper named lookup_subflow_by_daddr to find whether the destination address is in the msk's conn_list. In mptcp_pm_nl_add_addr_received, use lookup_subflow_by_daddr to check whether the announced address is already connected. If it is, skip connecting this address and send out the echo. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:05:15 -07:00
Geliang Tang	f7efc7771e	mptcp: drop argument port from mptcp_pm_announce_addr Drop the redundant argument 'port' from mptcp_pm_announce_addr, use the port field of another argument 'addr' instead. Fixes: `0f5c9e3f07` ("mptcp: add port parameter for mptcp_pm_announce_addr") Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:05:15 -07:00
Paolo Abeni	2d6f5a2b57	mptcp: clean-up the rtx path After the previous patch we can easily avoid invoking the workqueue to perform the retransmission, if the msk socket lock is held at rtx timer expiration. This also simplifies the relevant code. Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-26 15:05:15 -07:00
Marcel Holtmann	d58cf00dce	Bluetooth: Increment management interface revision Increment the mgmt revision due to recent changes. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2021-03-26 14:05:22 -07:00
Marcel Holtmann	3d34a71ff8	Bluetooth: Move the advertisement monitor events to correct list The list of trusted events should contain the advertisement monitor events and not the untrusted one, so move entries to the correct list. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2021-03-26 14:05:22 -07:00
Marcel Holtmann	02431b6cdb	Bluetooth: Add missing entries for PHY configuration commands The list of supported mgmt commands for PHY configuration is missing, so just add them. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2021-03-26 14:05:22 -07:00
Marcel Holtmann	21dd118f8d	Bluetooth: Fix wrong opcode error for read advertising features The read advertising features error handling returns false the opcode for the set advertising command. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2021-03-26 12:58:13 -07:00
Marcel Holtmann	353cac0e10	Bluetooth: Fix mgmt status for LL Privacy experimental feature The return error when trying to change the setting when a controller is powered up, shall be MGMT_STATUS_REJECTED. However instead now the error MGMT_STATUS_NOT_POWERED is used which is exactly the opposite. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2021-03-26 12:58:13 -07:00
Yonghong Song	b910eaaaa4	bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper Jiri Olsa reported a bug ([1]) in kernel where cgroup local storage pointer may be NULL in bpf_get_local_storage() helper. There are two issues uncovered by this bug: (1). kprobe or tracepoint prog incorrectly sets cgroup local storage before prog run, (2). due to change from preempt_disable to migrate_disable, preemption is possible and percpu storage might be overwritten by other tasks. This issue (1) is fixed in [2]. This patch tried to address issue (2). The following shows how things can go wrong: task 1: bpf_cgroup_storage_set() for percpu local storage preemption happens task 2: bpf_cgroup_storage_set() for percpu local storage preemption happens task 1: run bpf program task 1 will effectively use the percpu local storage setting by task 2 which will be either NULL or incorrect ones. Instead of just one common local storage per cpu, this patch fixed the issue by permitting 8 local storages per cpu and each local storage is identified by a task_struct pointer. This way, we allow at most 8 nested preemption between bpf_cgroup_storage_set() and bpf_cgroup_storage_unset(). The percpu local storage slot is released (calling bpf_cgroup_storage_unset()) by the same task after bpf program finished running. bpf_test_run() is also fixed to use the new bpf_cgroup_storage_set() interface. The patch is tested on top of [2] with reproducer in [1]. Without this patch, kernel will emit error in 2-3 minutes. With this patch, after one hour, still no error. [1] https://lore.kernel.org/bpf/CAKH8qBuXCfUz=w8L+Fj74OaUpbosO29niYwTki7e3Ag044_aww@mail.gmail.com/T [2] https://lore.kernel.org/bpf/20210309185028.3763817-1-yhs@fb.com Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Roman Gushchin <guro@fb.com> Link: https://lore.kernel.org/bpf/20210323055146.3334476-1-yhs@fb.com	2021-03-25 18:31:36 -07:00
Eric Dumazet	4ecc1baf36	tcp: convert elligible sysctls to u8 Many tcp sysctls are either bools or small ints that can fit into u8. Reducing space taken by sysctls can save few cache line misses when sending/receiving data while cpu caches are empty, for example after cpu idle period. This is hard to measure with typical network performance tests, but after this patch, struct netns_ipv4 has shrunk by three cache lines. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 17:39:33 -07:00
Eric Dumazet	2932bcda07	inet: convert tcp_early_demux and udp_early_demux to u8 For these sysctls, their dedicated helpers have to use proc_dou8vec_minmax(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 17:39:33 -07:00
Eric Dumazet	1c69dedc8f	ipv4: convert ip_forward_update_priority sysctl to u8 This sysctl uses ip_fwd_update_priority() helper, so the conversion needs to change it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 17:39:33 -07:00
Eric Dumazet	4b6bbf17d4	ipv4: shrink netns_ipv4 with sysctl conversions These sysctls that can fit in one byte instead of one int are converted to save space and thus reduce cache line misses. - icmp_echo_ignore_all, icmp_echo_ignore_broadcasts, - icmp_ignore_bogus_error_responses, icmp_errors_use_inbound_ifaddr - tcp_ecn, tcp_ecn_fallback - ip_default_ttl, ip_no_pmtu_disc, ip_fwd_use_pmtu - ip_nonlocal_bind, ip_autobind_reuse - ip_dynaddr, ip_early_demux, raw_l3mdev_accept - nexthop_compat_mode, fwmark_reflect Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 17:39:33 -07:00
Dmitry Vyukov	6c996e1994	net: change netdev_unregister_timeout_secs min value to 1 netdev_unregister_timeout_secs=0 can lead to printing the "waiting for dev to become free" message every jiffy. This is too frequent and unnecessary. Set the min value to 1 second. Also fix the merge issue introduced by "net: make unregister netdev warning timeout configurable": it changed "refcnt != 1" to "refcnt". Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Fixes: `5aa3afe107` ("net: make unregister netdev warning timeout configurable") Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 17:24:06 -07:00
Lu Wei	cbd801b3b0	net: ipv4: Fix some typos Modify "accomodate" to "accommodate" in net/ipv4/esp4.c. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 17:05:08 -07:00
Lu Wei	952a67f6f6	net: dsa: Fix a typo in tag_rtl4_a.c Modify "Apparantly" to "Apparently" in net/dsa/tag_rtl4_a.c.. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 17:05:08 -07:00
Lu Wei	e51443d54b	net: decnet: Fix a typo in dn_nsp_in.c Modify "erronous" to "erroneous" in net/decnet/dn_nsp_in.c. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 17:05:07 -07:00
Lu Wei	897b9fae7a	net: core: Fix a typo in dev_addr_lists.c Modify "funciton" to "function" in net/core/dev_addr_lists.c. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 17:05:07 -07:00
Lu Wei	3f9143f10c	net: ceph: Fix a typo in osdmap.c Modify "inital" to "initial" in net/ceph/osdmap.c. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 17:05:07 -07:00
Xiaoming Ni	4b5db93e7f	nfc: Avoid endless loops caused by repeated llcp_sock_connect() When sock_wait_state() returns -EINPROGRESS, "sk->sk_state" is LLCP_CONNECTING. In this case, llcp_sock_connect() is repeatedly invoked, nfc_llcp_sock_link() will add sk to local->connecting_sockets twice. sk->sk_node->next will point to itself, that will make an endless loop and hang-up the system. To fix it, check whether sk->sk_state is LLCP_CONNECTING in llcp_sock_connect() to avoid repeated invoking. Fixes: `b4011239a0` ("NFC: llcp: Fix non blocking sockets connections") Reported-by: "kiyin(尹亮)" <kiyin@tencent.com> Link: https://www.openwall.com/lists/oss-security/2020/11/01/1 Cc: <stable@vger.kernel.org> #v3.11 Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 17:02:01 -07:00
Xiaoming Ni	7574fcdbdc	nfc: fix memory leak in llcp_sock_connect() In llcp_sock_connect(), use kmemdup to allocate memory for "llcp_sock->service_name". The memory is not released in the sock_unlink label of the subsequent failure branch. As a result, memory leakage occurs. fix CVE-2020-25672 Fixes: `d646960f79` ("NFC: Initial LLCP support") Reported-by: "kiyin(尹亮)" <kiyin@tencent.com> Link: https://www.openwall.com/lists/oss-security/2020/11/01/1 Cc: <stable@vger.kernel.org> #v3.3 Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 17:02:01 -07:00
Xiaoming Ni	8a4cd82d62	nfc: fix refcount leak in llcp_sock_connect() nfc_llcp_local_get() is invoked in llcp_sock_connect(), but nfc_llcp_local_put() is not invoked in subsequent failure branches. As a result, refcount leakage occurs. To fix it, add calling nfc_llcp_local_put(). fix CVE-2020-25671 Fixes: `c7aa12252f` ("NFC: Take a reference on the LLCP local pointer when creating a socket") Reported-by: "kiyin(尹亮)" <kiyin@tencent.com> Link: https://www.openwall.com/lists/oss-security/2020/11/01/1 Cc: <stable@vger.kernel.org> #v3.6 Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 17:02:01 -07:00
Xiaoming Ni	c33b1cc62a	nfc: fix refcount leak in llcp_sock_bind() nfc_llcp_local_get() is invoked in llcp_sock_bind(), but nfc_llcp_local_put() is not invoked in subsequent failure branches. As a result, refcount leakage occurs. To fix it, add calling nfc_llcp_local_put(). fix CVE-2020-25670 Fixes: `c7aa12252f` ("NFC: Take a reference on the LLCP local pointer when creating a socket") Reported-by: "kiyin(尹亮)" <kiyin@tencent.com> Link: https://www.openwall.com/lists/oss-security/2020/11/01/1 Cc: <stable@vger.kernel.org> #v3.6 Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 17:02:01 -07:00
Lu Wei	f1dcffcc8a	net: Fix a misspell in socket.c s/addres/address Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:56:27 -07:00
Hoang Le	b83e214b2e	tipc: add extack messages for bearer/media failure Add extack error messages for -EINVAL errors when enabling bearer, getting/setting properties for a media/bearer Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:54:45 -07:00
Vladimir Oltean	479dc497db	net: dsa: only unset VLAN filtering when last port leaves last VLAN-aware bridge DSA is aware of switches with global VLAN filtering since the blamed commit, but it makes a bad decision when multiple bridges are spanning the same switch: ip link add br0 type bridge vlan_filtering 1 ip link add br1 type bridge vlan_filtering 1 ip link set swp2 master br0 ip link set swp3 master br0 ip link set swp4 master br1 ip link set swp5 master br1 ip link set swp5 nomaster ip link set swp4 nomaster [138665.939930] sja1105 spi0.1: port 3: dsa_core: VLAN filtering is a global setting [138665.947514] DSA: failed to notify DSA_NOTIFIER_BRIDGE_LEAVE When all ports leave br1, DSA blindly attempts to disable VLAN filtering on the switch, ignoring the fact that br0 still exists and is VLAN-aware too. It fails while doing that. This patch checks whether any port exists at all and is under a VLAN-aware bridge. Fixes: `d371b7c92d` ("net: dsa: Unset vlan_filtering when ports leave the bridge") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:48:45 -07:00
Jakub Kicinski	42ce127d98	ethtool: fec: sanitize ethtool_fecparam->fec Reject NONE on set, this mode means device does not support FEC so it's a little out of place in the set interface. This should be safe to do - user space ethtool does not allow the use of NONE on set. A few drivers treat it the same as OFF, but none use it instead of OFF. Similarly reject an empty FEC mask. The common user space tool will not send such requests and most drivers correctly reject it already. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:46:53 -07:00
Jakub Kicinski	d3b37fc805	ethtool: fec: sanitize ethtool_fecparam->active_fec struct ethtool_fecparam::active_fec is a GET-only field, all in-tree drivers correctly ignore it on SET. Clear the field on SET to avoid any confusion. Again, we can't reject non-zero now since ethtool user space does not zero-init the param correctly. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:46:53 -07:00
Jakub Kicinski	240e114411	ethtool: fec: sanitize ethtool_fecparam->reserved struct ethtool_fecparam::reserved is never looked at by the core. Make sure it's actually 0. Unfortunately we can't return an error because old ethtool doesn't zero-initialize the structure for SET. On GET we can be more verbose, there are no in tree (ab)users. Fix up the kdoc on the structure. Remove the mention of FEC bypass. Seems like a niche thing to configure in the first place. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:46:53 -07:00
David S. Miller	241949e488	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-03-24 The following pull-request contains BPF updates for your net-next tree. We've added 37 non-merge commits during the last 15 day(s) which contain a total of 65 files changed, 3200 insertions(+), 738 deletions(-). The main changes are: 1) Static linking of multiple BPF ELF files, from Andrii. 2) Move drop error path to devmap for XDP_REDIRECT, from Lorenzo. 3) Spelling fixes from various folks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:30:46 -07:00
David S. Miller	efd13b71a3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 15:31:22 -07:00
Bhaskar Chowdhury	5153ceb9e6	Bluetooth: L2CAP: Rudimentary typo fixes s/minium/minimum/ s/procdure/procedure/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-03-25 16:13:28 +01:00
Arnd Bergmann	6ad2dd6c14	ipv6: fix clang Wformat warning When building with 'make W=1', clang warns about a mismatched format string: net/ipv6/ah6.c:710:4: error: format specifies type 'unsigned short' but the argument has type 'int' [-Werror,-Wformat] aalg_desc->uinfo.auth.icv_fullbits/8); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/printk.h:375:34: note: expanded from macro 'pr_info' printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__) ~~~ ^~~~~~~~~~~ net/ipv6/esp6.c:1153:5: error: format specifies type 'unsigned short' but the argument has type 'int' [-Werror,-Wformat] aalg_desc->uinfo.auth.icv_fullbits / 8); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/printk.h:375:34: note: expanded from macro 'pr_info' printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__) ~~~ ^~~~~~~~~~~ Here, the result of dividing a 16-bit number by a 32-bit number produces a 32-bit result, which is printed as a 16-bit integer. Change the %hu format to the normal %u, which has the same effect but avoids the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2021-03-25 09:48:32 +01:00
Linus Torvalds	e138138003	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: "Various fixes, all over: 1) Fix overflow in ptp_qoriq_adjfine(), from Yangbo Lu. 2) Always store the rx queue mapping in veth, from Maciej Fijalkowski. 3) Don't allow vmlinux btf in map_create, from Alexei Starovoitov. 4) Fix memory leak in octeontx2-af from Colin Ian King. 5) Use kvalloc in bpf x86 JIT for storing jit'd addresses, from Yonghong Song. 6) Fix tx ptp stats in mlx5, from Aya Levin. 7) Check correct ip version in tun decap, fropm Roi Dayan. 8) Fix rate calculation in mlx5 E-Switch code, from arav Pandit. 9) Work item memork leak in mlx5, from Shay Drory. 10) Fix ip6ip6 tunnel crash with bpf, from Daniel Borkmann. 11) Lack of preemptrion awareness in macvlan, from Eric Dumazet. 12) Fix data race in pxa168_eth, from Pavel Andrianov. 13) Range validate stab in red_check_params(), from Eric Dumazet. 14) Inherit vlan filtering setting properly in b53 driver, from Florian Fainelli. 15) Fix rtnl locking in igc driver, from Sasha Neftin. 16) Pause handling fixes in igc driver, from Muhammad Husaini Zulkifli. 17) Missing rtnl locking in e1000_reset_task, from Vitaly Lifshits. 18) Use after free in qlcnic, from Lv Yunlong. 19) fix crash in fritzpci mISDN, from Tong Zhang. 20) Premature rx buffer reuse in igb, from Li RongQing. 21) Missing termination of ip[a driver message handler arrays, from Alex Elder. 22) Fix race between "x25_close" and "x25_xmit"/"x25_rx" in hdlc_x25 driver, from Xie He. 23) Use after free in c_can_pci_remove(), from Tong Zhang. 24) Uninitialized variable use in nl80211, from Jarod Wilson. 25) Off by one size calc in bpf verifier, from Piotr Krysiuk. 26) Use delayed work instead of deferrable for flowtable GC, from Yinjun Zhang. 27) Fix infinite loop in NPC unmap of octeontx2 driver, from Hariprasad Kelam. 28) Fix being unable to change MTU of dwmac-sun8i devices due to lack of fifo sizes, from Corentin Labbe. 29) DMA use after free in r8169 with WoL, fom Heiner Kallweit. 30) Mismatched prototypes in isdn-capi, from Arnd Bergmann. 31) Fix psample UAPI breakage, from Ido Schimmel" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (171 commits) psample: Fix user API breakage math: Export mul_u64_u64_div_u64 ch_ktls: fix enum-conversion warning octeontx2-af: Fix memory leak of object buf ptp_qoriq: fix overflow in ptp_qoriq_adjfine() u64 calcalation net: bridge: don't notify switchdev for local FDB addresses net/sched: act_ct: clear post_ct if doing ct_clear net: dsa: don't assign an error value to tag_ops isdn: capi: fix mismatched prototypes net/mlx5: SF, do not use ecpu bit for vhca state processing net/mlx5e: Fix division by 0 in mlx5e_select_queue net/mlx5e: Fix error path for ethtool set-priv-flag net/mlx5e: Offload tuple rewrite for non-CT flows net/mlx5e: Allow to match on MPLS parameters only for MPLS over UDP net/mlx5: Add back multicast stats for uplink representor net: ipconfig: ic_dev can be NULL in ic_close_devs MAINTAINERS: Combine "QLOGIC QLGE 10Gb ETHERNET DRIVER" sections into one docs: networking: Fix a typo r8169: fix DMA being used after buffer free if WoL is enabled net: ipa: fix init header command validation ...	2021-03-24 18:16:04 -07:00
Wang Hai	da1da87fa7	6lowpan: Fix some typos in nhc_udp.c s/Orignal/Original/ s/infered/inferred/ Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 17:52:11 -07:00
Wang Hai	0e4161d0ed	net/packet: Fix a typo in af_packet.c s/sequencially/sequentially/ Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 17:52:11 -07:00
Wang Hai	72a0f6d052	net/tls: Fix a typo in tls_device.c s/beggining/beginning/ Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 17:52:11 -07:00
Eric Dumazet	aa6dd211e4	inet: use bigger hash table for IP ID generation In commit `73f156a6e8` ("inetpeer: get rid of ip_id_count") I used a very small hash table that could be abused by patient attackers to reveal sensitive information. Switch to a dynamic sizing, depending on RAM size. Typical big hosts will now use 128x more storage (2 MB) to get a similar increase in security and reduction of hash collisions. As a bonus, use of alloc_large_system_hash() spreads allocated memory among all NUMA nodes. Fixes: `73f156a6e8` ("inetpeer: get rid of ip_id_count") Reported-by: Amit Klein <aksecurity@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 16:45:11 -07:00
Sai Kalyaan Palla	c3dde0ee71	net: decnet: Fixed multiple Coding Style issues Made changes to coding style as suggested by checkpatch.pl changes are of the type: space required before the open parenthesis '(' space required after that ',' Signed-off-by: Sai Kalyaan Palla <saikalyaan63@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 16:25:21 -07:00
Bhaskar Chowdhury	536e11f96b	net: sched: Mundane typo fixes s/procdure/procedure/ s/maintanance/maintenance/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 15:09:11 -07:00
Pablo Neira Ayuso	3fb24a43c9	dsa: slave: add support for TC_SETUP_FT The dsa infrastructure provides a well-defined hierarchy of devices, pass up the call to set up the flow block to the master device. From the software dataplane, the netfilter infrastructure uses the dsa slave devices to refer to the input and output device for the given skbuff. Similarly, the flowtable definition in the ruleset refers to the dsa slave port devices. This patch adds the glue code to call ndo_setup_tc with TC_SETUP_FT with the master device via the dsa slave devices. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso	17e52c0aaa	netfilter: flowtable: support for FLOW_ACTION_PPPOE_PUSH Add a PPPoE push action if layer 2 protocol is ETH_P_PPP_SES to add PPPoE flowtable hardware offload support. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:48:39 -07:00
Felix Fietkau	26267bf9bb	netfilter: flowtable: bridge vlan hardware offload and switchdev The switch might have already added the VLAN tag through PVID hardware offload. Keep this extra VLAN in the flowtable but skip it on egress. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso	73f97025a9	netfilter: nft_flow_offload: use direct xmit if hardware offload is enabled If there is a forward path to reach an ethernet device and hardware offload is enabled, then use the direct xmit path. Moreover, store the real device in the direct xmit path info since software datapath uses dev_hard_header() to push the layer encapsulation headers while hardware offload refers to the real device. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso	eeff3000f2	netfilter: flowtable: add offload support for xmit path types When the flow tuple xmit_type is set to FLOW_OFFLOAD_XMIT_DIRECT, the dst_cache pointer is not valid, and the h_source/h_dest/ifidx out fields need to be used. This patch also adds the FLOW_ACTION_VLAN_PUSH action to pass the VLAN tag to the driver. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso	a11e7973cf	netfilter: flowtable: add dsa support Replace the master ethernet device by the dsa slave port. Packets coming in from the software ingress path use the dsa slave port as input device. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso	72efd585f7	netfilter: flowtable: add pppoe support Add the PPPoE protocol and session id to the flow tuple using the encap fields to uniquely identify flows from the receive path. For the transmit path, dev_hard_header() on the vlan device push the headers. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso	e990cef651	netfilter: flowtable: add bridge vlan filtering support Add the vlan tag based when PVID is set on. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso	4cd91f7c29	netfilter: flowtable: add vlan support Add the vlan id and protocol to the flow tuple to uniquely identify flows from the receive path. For the transmit path, dev_hard_header() on the vlan device push the headers. This patch includes support for two vlan headers (QinQ) from the ingress path. Add a generic encap field to the flowtable entry which stores the protocol and the tag id. This allows to reuse these fields in the PPPoE support coming in a later patch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso	7a27f6ab41	netfilter: flowtable: use dev_fill_forward_path() to obtain egress device The egress device in the tuple is obtained from route. Use dev_fill_forward_path() instead to provide the real egress device for this flow whenever this is available. The new FLOW_OFFLOAD_XMIT_DIRECT type uses dev_queue_xmit() to transmit ethernet frames. Cache the source and destination hardware address to use dev_queue_xmit() to transfer packets. The FLOW_OFFLOAD_XMIT_DIRECT replaces FLOW_OFFLOAD_XMIT_NEIGH if dev_fill_forward_path() finds a direct transmit path. In case of topology updates, if peer is moved to different bridge port, the connection will time out, reconnect will result in a new entry with the correct path. Snooping fdb updates would allow for cleaning up stale flowtable entries. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso	c63a7cc4d7	netfilter: flowtable: use dev_fill_forward_path() to obtain ingress device Obtain the ingress device in the tuple from the route in the reply direction. Use dev_fill_forward_path() instead to get the real ingress device for this flow. Fall back to use the ingress device that the IP forwarding route provides if: - dev_fill_forward_path() finds no real ingress device. - the ingress device that is obtained is not part of the flowtable devices. - this route has a xfrm policy. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso	5139c0c007	netfilter: flowtable: add xmit path types Add the xmit_type field that defines the two supported xmit paths in the flowtable data plane, which are the neighbour and the xfrm xmit paths. This patch prepares for new flowtable xmit path types to come. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:48:39 -07:00
Felix Fietkau	0994d492a1	net: dsa: resolve forwarding path for dsa slave ports Add .ndo_fill_forward_path for dsa slave port devices Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:48:38 -07:00
Felix Fietkau	bcf2766b13	net: bridge: resolve forwarding path for VLAN tag actions in bridge devices Depending on the VLAN settings of the bridge and the port, the bridge can either add or remove a tag. When vlan filtering is enabled, the fdb lookup also needs to know the VLAN tag/proto for the destination address To provide this, keep track of the stack of VLAN tags for the path in the lookup context Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:48:38 -07:00
Pablo Neira Ayuso	ec9d16bab6	net: bridge: resolve forwarding path for bridge devices Add .ndo_fill_forward_path for bridge devices. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:48:38 -07:00
Pablo Neira Ayuso	e4417d6950	net: 8021q: resolve forwarding path for vlan devices Add .ndo_fill_forward_path for vlan devices. For instance, assuming the following topology: IP forwarding / \ eth0.100 eth0 \| eth0 . . . ethX ab💿ef🆎cd:ef For packets going through IP forwarding to eth0.100 whose destination MAC address is ab💿ef🆎cd:ef, dev_fill_forward_path() provides the following path: eth0.100 -> eth0 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:48:38 -07:00
Pablo Neira Ayuso	ddb94eafab	net: resolve forwarding path from virtual netdevice and HW destination address This patch adds dev_fill_forward_path() which resolves the path to reach the real netdevice from the IP forwarding side. This function takes as input the netdevice and the destination hardware address and it walks down the devices calling .ndo_fill_forward_path() for each device until the real device is found. For instance, assuming the following topology: IP forwarding / \ br0 eth0 / \ eth1 eth2 . . . ethX ab💿ef🆎cd:ef where eth1 and eth2 are bridge ports and eth0 provides WAN connectivity. ethX is the interface in another box which is connected to the eth1 bridge port. For packets going through IP forwarding to br0 whose destination MAC address is ab💿ef🆎cd:ef, dev_fill_forward_path() provides the following path: br0 -> eth1 .ndo_fill_forward_path for br0 looks up at the FDB for the bridge port from the destination MAC address to get the bridge port eth1. This information allows to create a fast path that bypasses the classic bridge and IP forwarding paths, so packets go directly from the bridge port eth1 to eth0 (wan interface) and vice versa. fast path .------------------------. / \ \| IP forwarding \| \| / \ \/ \| br0 eth0 . / \ -> eth1 eth2 . . . ethX ab💿ef🆎cd:ef Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:48:38 -07:00
Colin Ian King	ad248f7761	net: bridge: Fix missing return assignment from br_vlan_replay_one call The call to br_vlan_replay_one is returning an error return value but this is not being assigned to err and the following check on err is currently always false because err was initialized to zero. Fix this by assigning err. Addresses-Coverity: ("'Constant' variable guards dead code") Fixes: `22f67cdfae` ("net: bridge: add helper to replay VLANs installed on port") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:45:48 -07:00
Horatiu Vultur	b3cb91b97c	bridge: mrp: Disable roles before deleting the MRP instance When an MRP instance was created, the driver was notified that the instance is created and then in a different callback about role of the instance. But when the instance was deleted the driver was notified only that the MRP instance is deleted and not also that the role is disabled. This patch make sure that the driver is notified that the role is changed to disabled before the MRP instance is deleted to have similar callbacks with the creating of the instance. In this way it would simplify the logic in the drivers. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:14:08 -07:00
Xin Long	68dc022d04	xfrm: BEET mode doesn't support fragments for inner packets BEET mode replaces the IP(6) Headers with new IP(6) Headers when sending packets. However, when it's a fragment before the replacement, currently kernel keeps the fragment flag and replace the address field then encaps it with ESP. It would cause in RX side the fragments to get reassembled before decapping with ESP, which is incorrect. In Xiumei's testing, these fragments went over an xfrm interface and got encapped with ESP in the device driver, and the traffic was broken. I don't have a good way to fix it, but only to warn this out in dmesg. Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2021-03-24 09:58:19 +01:00
Meng Yu	0f90d320b4	Bluetooth: Remove trailing semicolon in macros Macros should not use a trailing semicolon. Signed-off-by: Meng Yu <yumeng18@huawei.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-03-24 08:56:24 +01:00
Dmitry Vyukov	5aa3afe107	net: make unregister netdev warning timeout configurable netdev_wait_allrefs() issues a warning if refcount does not drop to 0 after 10 seconds. While 10 second wait generally should not happen under normal workload in normal environment, it seems to fire falsely very often during fuzzing and/or in qemu emulation (~10x slower). At least it's not possible to understand if it's really a false positive or not. Automated testing generally bumps all timeouts to very high values to avoid flake failures. Add net.core.netdev_unregister_timeout_secs sysctl to make the timeout configurable for automated testing systems. Lowering the timeout may also be useful for e.g. manual bisection. The default value matches the current behavior. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=211877 Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-23 17:22:50 -07:00
Vladimir Oltean	010e269f91	net: dsa: sync up switchdev objects and port attributes when joining the bridge If we join an already-created bridge port, such as a bond master interface, then we can miss the initial switchdev notifications emitted by the bridge for this port, while it wasn't offloaded by anybody. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-23 14:49:06 -07:00
Vladimir Oltean	5961d6a12c	net: dsa: inherit the actual bridge port flags at join time DSA currently assumes that the bridge port starts off with this constellation of bridge port flags: - learning on - unicast flooding on - multicast flooding on - broadcast flooding on just by virtue of code copy-pasta from the bridge layer (new_nbp). This was a simple enough strategy thus far, because the 'bridge join' moment always coincided with the 'bridge port creation' moment. But with sandwiched interfaces, such as: br0 \| bond0 \| swp0 it may happen that the user has had time to change the bridge port flags of bond0 before enslaving swp0 to it. In that case, swp0 will falsely assume that the bridge port flags are those determined by new_nbp, when in fact this can happen: ip link add br0 type bridge ip link add bond0 type bond ip link set bond0 master br0 ip link set bond0 type bridge_slave learning off ip link set swp0 master br0 Now swp0 has learning enabled, bond0 has learning disabled. Not nice. Fix this by "dumpster diving" through the actual bridge port flags with br_port_flag_is_set, at bridge join time. We use this opportunity to split dsa_port_change_brport_flags into two distinct functions called dsa_port_inherit_brport_flags and dsa_port_clear_brport_flags, now that the implementation for the two cases is no longer similar. This patch also creates two functions called dsa_port_switchdev_sync and dsa_port_switchdev_unsync which collect what we have so far, even if that's asymmetrical. More is going to be added in the next patch. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-23 14:49:06 -07:00
Vladimir Oltean	2afc526ab3	net: dsa: pass extack to dsa_port_{bridge,lag}_join This is a pretty noisy change that was broken out of the larger change for replaying switchdev attributes and objects at bridge join time, which is when these extack objects are actually used. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-23 14:49:05 -07:00
Vladimir Oltean	185c9a760a	net: dsa: call dsa_port_bridge_join when joining a LAG that is already in a bridge DSA can properly detect and offload this sequence of operations: ip link add br0 type bridge ip link add bond0 type bond ip link set swp0 master bond0 ip link set bond0 master br0 But not this one: ip link add br0 type bridge ip link add bond0 type bond ip link set bond0 master br0 ip link set swp0 master bond0 Actually the second one is more complicated, due to the elapsed time between the enslavement of bond0 and the offloading of it via swp0, a lot of things could have happened to the bond0 bridge port in terms of switchdev objects (host MDBs, VLANs, altered STP state etc). So this is a bit of a can of worms, and making sure that the DSA port's state is in sync with this already existing bridge port is handled in the next patches. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-23 14:49:05 -07:00
Vladimir Oltean	22f67cdfae	net: bridge: add helper to replay VLANs installed on port Currently this simple setup with DSA: ip link add br0 type bridge vlan_filtering 1 ip link add bond0 type bond ip link set bond0 master br0 ip link set swp0 master bond0 will not work because the bridge has created the PVID in br_add_if -> nbp_vlan_init, and it has notified switchdev of the existence of VLAN 1, but that was too early, since swp0 was not yet a lower of bond0, so it had no reason to act upon that notification. We need a helper in the bridge to replay the switchdev VLAN objects that were notified since the bridge port creation, because some of them may have been missed. As opposed to the br_mdb_replay function, the vg->vlan_list write side protection is offered by the rtnl_mutex which is sleepable, so we don't need to queue up the objects in atomic context, we can replay them right away. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-23 14:49:05 -07:00
Vladimir Oltean	04846f903b	net: bridge: add helper to replay port and local fdb entries When a switchdev port starts offloading a LAG that is already in a bridge and has an FDB entry pointing to it: ip link set bond0 master br0 bridge fdb add dev bond0 00:01:02:03:04:05 master static ip link set swp0 master bond0 the switchdev driver will have no idea that this FDB entry is there, because it missed the switchdev event emitted at its creation. Ido Schimmel pointed this out during a discussion about challenges with switchdev offloading of stacked interfaces between the physical port and the bridge, and recommended to just catch that condition and deny the CHANGEUPPER event: https://lore.kernel.org/netdev/20210210105949.GB287766@shredder.lan/ But in fact, we might need to deal with the hard thing anyway, which is to replay all FDB addresses relevant to this port, because it isn't just static FDB entries, but also local addresses (ones that are not forwarded but terminated by the bridge). There, we can't just say 'oh yeah, there was an upper already so I'm not joining that'. So, similar to the logic for replaying MDB entries, add a function that must be called by individual switchdev drivers and replays local FDB entries as well as ones pointing towards a bridge port. This time, we use the atomic switchdev notifier block, since that's what FDB entries expect for some reason. Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-23 14:49:05 -07:00
Vladimir Oltean	4f2673b3a2	net: bridge: add helper to replay port and host-joined mdb entries I have a system with DSA ports, and udhcpcd is configured to bring interfaces up as soon as they are created. I create a bridge as follows: ip link add br0 type bridge As soon as I create the bridge and udhcpcd brings it up, I also have avahi which automatically starts sending IPv6 packets to advertise some local services, and because of that, the br0 bridge joins the following IPv6 groups due to the code path detailed below: 33:33:ff:6d:c1:9c vid 0 33:33:00:00:00:6a vid 0 33:33:00:00:00:fb vid 0 br_dev_xmit -> br_multicast_rcv -> br_ip6_multicast_add_group -> __br_multicast_add_group -> br_multicast_host_join -> br_mdb_notify This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host hooked up, and switchdev will attempt to offload the host joined groups to an empty list of ports. Of course nobody offloads them. Then when we add a port to br0: ip link set swp0 master br0 the bridge doesn't replay the host-joined MDB entries from br_add_if, and eventually the host joined addresses expire, and a switchdev notification for deleting it is emitted, but surprise, the original addition was already completely missed. The strategy to address this problem is to replay the MDB entries (both the port ones and the host joined ones) when the new port joins the bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can be populated and only then attached to a bridge that you offload). However there are 2 possibilities: the addresses can be 'pushed' by the bridge into the port, or the port can 'pull' them from the bridge. Considering that in the general case, the new port can be really late to the party, and there may have been many other switchdev ports that already received the initial notification, we would like to avoid delivering duplicate events to them, since they might misbehave. And currently, the bridge calls the entire switchdev notifier chain, whereas for replaying it should just call the notifier block of the new guy. But the bridge doesn't know what is the new guy's notifier block, it just knows where the switchdev notifier chain is. So for simplification, we make this a driver-initiated pull for now, and the notifier block is passed as an argument. To emulate the calling context for mdb objects (deferred and put on the blocking notifier chain), we must iterate under RCU protection through the bridge's mdb entries, queue them, and only call them once we're out of the RCU read-side critical section. There was some opportunity for reuse between br_mdb_switchdev_host_port, br_mdb_notify and the newly added br_mdb_queue_one in how the switchdev mdb object is created, so a helper was created. Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-23 14:49:05 -07:00
Vladimir Oltean	f1d42ea100	net: bridge: add helper to retrieve the current ageing time The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is only emitted from: sysfs/ioctl/netlink -> br_set_ageing_time -> __set_ageing_time therefore not at bridge port creation time, so: (a) switchdev drivers have to hardcode the initial value for the address ageing time, because they didn't get any notification (b) that hardcoded value can be out of sync, if the user changes the ageing time before enslaving the port to the bridge We need a helper in the bridge, such that switchdev drivers can query the current value of the bridge ageing time when they start offloading it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-23 14:49:05 -07:00
Vladimir Oltean	c0e715bbd5	net: bridge: add helper for retrieving the current bridge port STP state It may happen that we have the following topology with DSA or any other switchdev driver with LAG offload: ip link add br0 type bridge stp_state 1 ip link add bond0 type bond ip link set bond0 master br0 ip link set swp0 master bond0 ip link set swp1 master bond0 STP decides that it should put bond0 into the BLOCKING state, and that's that. The ports that are actively listening for the switchdev port attributes emitted for the bond0 bridge port (because they are offloading it) and have the honor of seeing that switchdev port attribute can react to it, so we can program swp0 and swp1 into the BLOCKING state. But if then we do: ip link set swp2 master bond0 then as far as the bridge is concerned, nothing has changed: it still has one bridge port. But this new bridge port will not see any STP state change notification and will remain FORWARDING, which is how the standalone code leaves it in. We need a function in the bridge driver which retrieves the current STP state, such that drivers can synchronize to it when they may have missed switchdev events. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-23 14:49:05 -07:00
Vladimir Oltean	6ab4c3117a	net: bridge: don't notify switchdev for local FDB addresses As explained in this discussion: https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/ the switchdev notifiers for FDB entries managed to have a zero-day bug. The bridge would not say that this entry is local: ip link add br0 type bridge ip link set swp0 master br0 bridge fdb add dev swp0 00:01:02:03:04:05 master local and the switchdev driver would be more than happy to offload it as a normal static FDB entry. This is despite the fact that 'local' and non-'local' entries have completely opposite directions: a local entry is locally terminated and not forwarded, whereas a static entry is forwarded and not locally terminated. So, for example, DSA would install this entry on swp0 instead of installing it on the CPU port as it should. There is an even sadder part, which is that the 'local' flag is implicit if 'static' is not specified, meaning that this command produces the same result of adding a 'local' entry: bridge fdb add dev swp0 00:01:02:03:04:05 master I've updated the man pages for 'bridge', and after reading it now, it should be pretty clear to any user that the commands above were broken and should have never resulted in the 00:01:02:03:04:05 address being forwarded (this behavior is coherent with non-switchdev interfaces): https://patchwork.kernel.org/project/netdevbpf/cover/20210211104502.2081443-1-olteanv@gmail.com/ If you're a user reading this and this is what you want, just use: bridge fdb add dev swp0 00:01:02:03:04:05 master static Because switchdev should have given drivers the means from day one to classify FDB entries as local/non-local, but didn't, it means that all drivers are currently broken. So we can just as well omit the switchdev notifications for local FDB entries, which is exactly what this patch does to close the bug in stable trees. For further development work where drivers might want to trap the local FDB entries to the host, we can add a 'bool is_local' to br_switchdev_fdb_call_notifiers(), and selectively make drivers act upon that bit, while all the others ignore those entries if the 'is_local' bit is set. Fixes: `6b26b51b1d` ("net: bridge: Add support for notifying devices about FDB add/del") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-23 14:39:41 -07:00
Marcelo Ricardo Leitner	8ca1b090e5	net/sched: act_ct: clear post_ct if doing ct_clear Invalid detection works with two distinct moments: act_ct tries to find a conntrack entry and set post_ct true, indicating that that was attempted. Then, when flow dissector tries to dissect CT info and no entry is there, it knows that it was tried and no entry was found, and synthesizes/sets key->ct_state = TCA_FLOWER_KEY_CT_FLAGS_TRACKED \| TCA_FLOWER_KEY_CT_FLAGS_INVALID; mimicing what OVS does. OVS has this a bit more streamlined, as it recomputes the key after trying to find a conntrack entry for it. Issue here is, when we have 'tc action ct clear', it didn't clear post_ct, causing a subsequent match on 'ct_state -trk' to fail, due to the above. The fix, thus, is to clear it. Reproducer rules: tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 0 \ protocol ip flower ip_proto tcp ct_state -trk \ action ct zone 1 pipe \ action goto chain 2 tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 2 \ protocol ip flower \ action ct clear pipe \ action goto chain 4 tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 4 \ protocol ip flower ct_state -trk \ action mirred egress redirect dev enp130s0f1np1_0 With the fix, the 3rd rule matches, like it does with OVS kernel datapath. Fixes: `7baf2429a1` ("net/sched: cls_flower add CT_FLAGS_INVALID flag support") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-23 14:32:26 -07:00
Xie He	65d2dbb300	net: lapb: Make "lapb_t1timer_running" able to detect an already running timer Problem: The "lapb_t1timer_running" function in "lapb_timer.c" is used in only one place: in the "lapb_kick" function in "lapb_out.c". "lapb_kick" calls "lapb_t1timer_running" to check if the timer is already pending, and if it is not, schedule it to run. However, if the timer has already fired and is running, and is waiting to get the "lapb->lock" lock, "lapb_t1timer_running" will not detect this, and "lapb_kick" will then schedule a new timer. The old timer will then abort when it sees a new timer pending. I think this is not right. The purpose of "lapb_kick" should be ensuring that the actual work of the timer function is scheduled to be done. If the timer function is already running but waiting for the lock, "lapb_kick" should not abort and reschedule it. Changes made: I added a new field "t1timer_running" in "struct lapb_cb" for "lapb_t1timer_running" to use. "t1timer_running" will accurately reflect whether the actual work of the timer is pending. If the timer has fired but is still waiting for the lock, "t1timer_running" will still correctly reflect whether the actual work is waiting to be done. The old "t1timer_stop" field, whose only responsibility is to ask a timer (that is already running but waiting for the lock) to abort, is no longer needed, because the new "t1timer_running" field can fully take over its responsibility. Therefore "t1timer_stop" is deleted. "t1timer_running" is not simply a negation of the old "t1timer_stop". At the end of the timer function, if it does not reschedule itself, "t1timer_running" is set to false to indicate that the timer is stopped. For consistency of the code, I also added "t2timer_running" and deleted "t2timer_stop". Signed-off-by: Xie He <xie.he.0141@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-23 14:14:50 -07:00
Sven Eckelmann	5fc087ff96	batman-adv: Drop unused header preempt.h The commit `b1de0f01b0` ("batman-adv: Use netif_rx_any_context().") removed the last user for a function declaration from linux/preempt.h. The include should therefore be cleaned up. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2021-03-23 21:52:28 +01:00
Linus Lüssing	549750babe	batman-adv: Fix order of kernel doc in batadv_priv During the inlining process of kerneldoc in commit `8b84cc4fb5` ("batman-adv: Use inline kernel-doc for enum/struct"), some comments were placed at the wrong struct members. Fixing this by reordering the comments. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2021-03-23 21:49:14 +01:00
Meng Yu	c29fb5f650	Bluetooth: Remove trailing semicolon in macros remove trailing semicolon in macros and coding style fix. Signed-off-by: Meng Yu <yumeng18@huawei.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-03-23 16:05:35 +01:00
Archie Pusaka	3af70b39fa	Bluetooth: check for zapped sk before connecting There is a possibility of receiving a zapped sock on l2cap_sock_connect(). This could lead to interesting crashes, one such case is tearing down an already tore l2cap_sock as is happened with this call trace: __dump_stack lib/dump_stack.c:15 [inline] dump_stack+0xc4/0x118 lib/dump_stack.c:56 register_lock_class kernel/locking/lockdep.c:792 [inline] register_lock_class+0x239/0x6f6 kernel/locking/lockdep.c:742 __lock_acquire+0x209/0x1e27 kernel/locking/lockdep.c:3105 lock_acquire+0x29c/0x2fb kernel/locking/lockdep.c:3599 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:137 [inline] _raw_spin_lock_bh+0x38/0x47 kernel/locking/spinlock.c:175 spin_lock_bh include/linux/spinlock.h:307 [inline] lock_sock_nested+0x44/0xfa net/core/sock.c:2518 l2cap_sock_teardown_cb+0x88/0x2fb net/bluetooth/l2cap_sock.c:1345 l2cap_chan_del+0xa3/0x383 net/bluetooth/l2cap_core.c:598 l2cap_chan_close+0x537/0x5dd net/bluetooth/l2cap_core.c:756 l2cap_chan_timeout+0x104/0x17e net/bluetooth/l2cap_core.c:429 process_one_work+0x7e3/0xcb0 kernel/workqueue.c:2064 worker_thread+0x5a5/0x773 kernel/workqueue.c:2196 kthread+0x291/0x2a6 kernel/kthread.c:211 ret_from_fork+0x4e/0x80 arch/x86/entry/entry_64.S:604 Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reported-by: syzbot+abfc0f5e668d4099af73@syzkaller.appspotmail.com Reviewed-by: Alain Michaud <alainm@chromium.org> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-03-23 16:03:48 +01:00
George McCollister	e0c755a45f	net: dsa: don't assign an error value to tag_ops Use a temporary variable to hold the return value from dsa_tag_driver_get() instead of assigning it to dst->tag_ops. Leaving an error value in dst->tag_ops can result in deferencing an invalid pointer when a deferred switch configuration happens later. Fixes: `357f203bb3` ("net: dsa: keep a copy of the tagging protocol in the DSA switch tree") Signed-off-by: George McCollister <george.mccollister@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 17:24:42 -07:00
David S. Miller	9a255a0635	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following batch contains Netfilter updates for net-next: 1) Split flowtable workqueues per events, from Oz Shlomo. 2) fall-through warnings for clang, from Gustavo A. R. Silva 3) Remove unused declaration in conntrack, from YueHaibing. 4) Consolidate skb_try_make_writable() in flowtable datapath, simplify some of the existing codebase. 5) Call dst_check() to fall back to static classic forwarding path. 6) Update table flags from commit phase. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 17:07:30 -07:00
Eric Dumazet	add2d73631	net: set initial device refcount to 1 When adding CONFIG_PCPU_DEV_REFCNT, I forgot that the initial net device refcount was 0. When CONFIG_PCPU_DEV_REFCNT is not set, this means the first dev_hold() triggers an illegal refcount operation (addition on 0) refcount_t: addition on 0; use-after-free. WARNING: CPU: 0 PID: 1 at lib/refcount.c:25 refcount_warn_saturate+0x128/0x1a4 Fix is to change initial (and final) refcount to be 1. Also add a missing kerneldoc piece, as reported by Stephen Rothwell. Fixes: `919067cc84` ("net: add CONFIG_PCPU_DEV_REFCNT") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Guenter Roeck <groeck@google.com> Tested-by: Guenter Roeck <groeck@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 16:57:36 -07:00
Nikolay Aleksandrov	0353b4a96b	net: bridge: when suppression is enabled exclude RARP packets Recently we had an interop issue where RARP packets got suppressed with bridge neigh suppression enabled, but the check in the code was meant to suppress GARP. Exclude RARP packets from it which would allow some VMWare setups to work, to quote the report: "Those RARP packets usually get generated by vMware to notify physical switches when vMotion occurs. vMware may use random sip/tip or just use sip=tip=0. So the RARP packet sometimes get properly flooded by the vtep and other times get dropped by the logic" Reported-by: Amer Abdalamer <amer@nvidia.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 13:30:24 -07:00
Antoine Tenart	7f08ec6e04	net-sysfs: remove possible sleep from an RCU read-side critical section xps_queue_show is mostly made of an RCU read-side critical section and calls bitmap_zalloc with GFP_KERNEL in the middle of it. That is not allowed as this call may sleep and such behaviours aren't allowed in RCU read-side critical sections. Fix this by using GFP_NOWAIT instead. Fixes: `5478fcd0f4` ("net: embed nr_ids in the xps maps") Reported-by: kernel test robot <oliver.sang@intel.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 13:28:13 -07:00
Bhaskar Chowdhury	aa785f93fc	net: l2tp: Fix a typo s/verifed/verified/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 13:17:49 -07:00
Vladimir Oltean	744b837663	net: move the ptype_all and ptype_base declarations to include/linux/netdevice.h ptype_all and ptype_base are declared in net/core/dev.c as non-static, because they are used by net-procfs.c too. However, a "make W=1" build complains that there was no previous declaration of ptype_all and ptype_base in a header file, so this way of declaring things constitutes a violation of coding style. Let's move the extern declarations of ptype_all and ptype_base to the linux/netdevice.h file, which is included by net-procfs.c too. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 13:14:45 -07:00
Vladimir Oltean	5da9ace340	net: make xps_needed and xps_rxqs_needed static Since their introduction in commit `04157469b7` ("net: Use static_key for XPS maps"), xps_needed and xps_rxqs_needed were never used outside net/core/dev.c, so I don't really understand why they were exported as symbols in the first place. This is needed in order to silence a "make W=1" warning about these static keys not being declared as static variables, but not having a previous declaration in a header file nonetheless. Cc: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 13:13:55 -07:00
Vladimir Oltean	f5fcca89f5	net: bridge: declare br_vlan_tunnel_lookup argument tunnel_id as __be64 The only caller of br_vlan_tunnel_lookup, br_handle_ingress_vlan_tunnel, extracts the tunnel_id from struct ip_tunnel_info::struct ip_tunnel_key:: tun_id which is a __be64 value. The exact endianness does not seem to matter, because the tunnel id is just used as a lookup key for the VLAN group's tunnel hash table, and the value is not interpreted directly per se. Moreover, rhashtable_lookup_fast treats the key argument as a const void *. Therefore, there is no functional change associated with this patch, just one to silence "make W=1" builds. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 13:11:59 -07:00
Bhaskar Chowdhury	f44773058c	openvswitch: Fix a typo s/subsytem/subsystem/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 12:59:46 -07:00
Vladimir Oltean	a50a151e31	net: ipconfig: ic_dev can be NULL in ic_close_devs ic_close_dev contains a generalization of the logic to not close a network interface if it's the host port for a DSA switch. This logic is disguised behind an iteration through the lowers of ic_dev in ic_close_dev. When no interface for ipconfig can be found, ic_dev is NULL, and ic_close_dev: - dereferences a NULL pointer when assigning selected_dev - would attempt to search through the lower interfaces of a NULL net_device pointer So we should protect against that case. The "lower_dev" iterator variable was shortened to "lower" in order to keep the 80 character limit. Fixes: `f68cbaed67` ("net: ipconfig: avoid use-after-free in ic_close_devs") Fixes: `46acf7bdbc` ("Revert "net: ipv4: handle DSA enabled master network devices"") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Heiko Thiery <heiko.thiery@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 12:57:51 -07:00
Vladimir Oltean	abee13f53e	net/sched: cls_flower: use nla_get_be32 for TCA_FLOWER_KEY_FLAGS The existing code is functionally correct: iproute2 parses the ip_flags argument for tc-flower and really packs it as big endian into the TCA_FLOWER_KEY_FLAGS netlink attribute. But there is a problem in the fact that W=1 builds complain: net/sched/cls_flower.c:1047:15: warning: cast to restricted __be32 This is because we should use the dedicated helper for obtaining a __be32 pointer to the netlink attribute, not a u32 one. This ensures type correctness for be32_to_cpu. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 12:48:20 -07:00
Vladimir Oltean	6215afcb9a	net/sched: cls_flower: use ntohs for struct flow_dissector_key_ports A make W=1 build complains that: net/sched/cls_flower.c:214:20: warning: cast from restricted __be16 net/sched/cls_flower.c:214:20: warning: incorrect type in argument 1 (different base types) net/sched/cls_flower.c:214:20: expected unsigned short [usertype] val net/sched/cls_flower.c:214:20: got restricted __be16 [usertype] dst This is because we use htons on struct flow_dissector_key_ports members src and dst, which are defined as __be16, so they are already in network byte order, not host. The byte swap function for the other direction should have been used. Because htons and ntohs do the same thing (either both swap, or none does), this change has no functional effect except to silence the warnings. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 12:48:20 -07:00
Paul Moore	4ebd7651bf	lsm: separate security_task_getsecid() into subjective and objective variants Of the three LSMs that implement the security_task_getsecid() LSM hook, all three LSMs provide the task's objective security credentials. This turns out to be unfortunate as most of the hook's callers seem to expect the task's subjective credentials, although a small handful of callers do correctly expect the objective credentials. This patch is the first step towards fixing the problem: it splits the existing security_task_getsecid() hook into two variants, one for the subjective creds, one for the objective creds. void security_task_getsecid_subj(struct task_struct p, u32 secid); void security_task_getsecid_obj(struct task_struct p, u32 secid); While this patch does fix all of the callers to use the correct variant, in order to keep this patch focused on the callers and to ease review, the LSMs continue to use the same implementation for both hooks. The net effect is that this patch should not change the behavior of the kernel in any way, it will be up to the latter LSM specific patches in this series to change the hook implementations and return the correct credentials. Acked-by: Mimi Zohar <zohar@linux.ibm.com> (IMA) Acked-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2021-03-22 15:23:32 -04:00
Alexander Lobakin	227d72063f	dsa: simplify Kconfig symbols and dependencies 1. Remove CONFIG_HAVE_NET_DSA. CONFIG_HAVE_NET_DSA is a legacy leftover from the times when drivers should have selected CONFIG_NET_DSA manually. Currently, all drivers has explicit 'depends on NET_DSA', so this is no more needed. 2. CONFIG_HAVE_NET_DSA dependencies became CONFIG_NET_DSA's ones. - dropped !S390 dependency which was introduced to be sure NET_DSA can select CONFIG_PHYLIB. DSA migrated to Phylink almost 3 years ago and the PHY library itself doesn't depend on !S390 since commit `870a2b5e4f` ("phylib: remove !S390 dependeny from Kconfig"); - INET dependency is kept to be sure we can select NET_SWITCHDEV; - NETDEVICES dependency is kept to be sure we can select PHYLINK. 3. DSA drivers menu now depends on NET_DSA. Instead on 'depends on NET_DSA' on every single driver, the entire menu now depends on it. This eliminates a lot of duplicated lines from Kconfig with no loss (when CONFIG_NET_DSA=m, drivers also can be only m or n). This also has a nice side effect that there's no more empty menu on configurations without DSA. 4. Kbuild will now descend into 'drivers/net/dsa' only when CONFIG_NET_DSA is y or m. This is safe since no objects inside this folder can be built without DSA core, as well as when CONFIG_NET_DSA=m, no objects can be built-in. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 12:15:37 -07:00
Chuck Lever	82011c80b3	SUNRPC: Move svc_xprt_received() call sites Currently, XPT_BUSY is not cleared until xpo_recvfrom returns. That effectively blocks the receipt and handling of the next RPC message until the current one has been taken off the transport. This strict ordering is a requirement for socket transports. For our kernel RPC/RDMA transport implementation, however, dequeuing an ingress message is nothing more than a list_del(). The transport can safely be marked un-busy as soon as that is done. To keep the changes simpler, this patch just moves the svc_xprt_received() call site from svc_handle_xprt() into the transports, so that the actual optimization can be done in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 13:22:13 -04:00
Chuck Lever	7dcfbd86ad	SUNRPC: Export svc_xprt_received() Prepare svc_xprt_received() to be called from transport code instead of from generic RPC server code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 13:22:13 -04:00
Chuck Lever	cc93ce9529	svcrdma: Retain the page backing rq_res.head[0].iov_base svc_rdma_sendto() now waits for the NIC hardware to finish with the pages backing rq_res. We still have to release the page array in some cases, but now it's always safe to immediately re-use the page backing rq_res's head buffer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 13:22:13 -04:00
Chuck Lever	579900670a	svcrdma: Remove unused sc_pages field Clean up. This significantly reduces the size of struct svc_rdma_send_ctxt. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 13:22:13 -04:00
Chuck Lever	2a1e4f21d8	svcrdma: Normalize Send page handling Currently svc_rdma_sendto() migrates xdr_buf pages into a separate page list and NULLs out a bunch of entries in rq_pages while the pages are under I/O. The Send completion handler then frees those pages later. Instead, let's wait for the Send completion, then handle page releasing in the nfsd thread. I'd like to avoid the cost of 250+ put_page() calls in the Send completion handler, which is single- threaded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 13:22:13 -04:00
Chuck Lever	e844d307d4	svcrdma: Add a "deferred close" helper Refactor a bit of commonly used logic so that every site that wants a close deferred to an nfsd thread does all the right things (set_bit(XPT_CLOSE) then enqueue). Also, once XPT_CLOSE is set on a transport, it is never cleared. If XPT_CLOSE is already set, then the close is already being handled and the enqueue can be skipped. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 13:22:13 -04:00
Chuck Lever	c558d47596	svcrdma: Maintain a Receive water mark Post more Receives when the number of pending Receives drops below a water mark. The batch mechanism is disabled if the underlying device cannot support a reasonably-sized Receive Queue. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 13:22:13 -04:00
Chuck Lever	7b748c30cc	svcrdma: Use svc_rdma_refresh_recvs() in wc_receive Replace svc_rdma_post_recv() with the new batch receive mechanism. For the moment it is posting just a single Receive WR at a time, so no change in behavior is expected. Since svc_rdma_wc_receive() was the last call site for svc_rdma_post_recv(), it is removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 13:22:13 -04:00
Chuck Lever	77f0a2aa5c	svcrdma: Add a batch Receive posting mechanism Introduce a server-side mechanism similar to commit `e340c2d6ef` ("xprtrdma: Reduce the doorbell rate (Receive)") to post Receive WRs in batch. Its first consumer is svc_rdma_post_recvs(), which posts the initial set of Receive WRs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 13:22:13 -04:00
Arnd Bergmann	8ff0278d10	Bluetooth: fix set_ecdh_privkey() prototype gcc-11 points out that the declaration does not match the definition: net/bluetooth/ecdh_helper.c:122:55: error: argument 2 of type ‘const u8[32]’ {aka ‘const unsigned char[32]’} with mismatched bound [-Werror=array-parameter=] 122 \| int set_ecdh_privkey(struct crypto_kpp tfm, const u8 private_key[32]) \| ~~~~~~~~~^~~~~~~~~~~~~~~ In file included from net/bluetooth/ecdh_helper.c:23: net/bluetooth/ecdh_helper.h:28:56: note: previously declared as ‘const u8 ’ {aka ‘const unsigned char ’} 28 \| int set_ecdh_privkey(struct crypto_kpp tfm, const u8 *private_key); \| ~~~~~~~~~~^~~~~~~~~~~ Change the declaration to contain the size of the array, rather than just a pointer. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-03-22 18:11:33 +01:00
Tetsuo Handa	be85972393	Bluetooth: initialize skb_queue_head at l2cap_chan_create() syzbot is hitting "INFO: trying to register non-static key." message [1], for "struct l2cap_chan"->tx_q.lock spinlock is not yet initialized when l2cap_chan_del() is called due to e.g. timeout. Since "struct l2cap_chan"->lock mutex is initialized at l2cap_chan_create() immediately after "struct l2cap_chan" is allocated using kzalloc(), let's as well initialize "struct l2cap_chan"->{tx_q,srej_q}.lock spinlocks there. [1] https://syzkaller.appspot.com/bug?extid=fadfba6a911f6bf71842 Reported-and-tested-by: syzbot <syzbot+fadfba6a911f6bf71842@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-03-22 17:04:22 +01:00
Archie Pusaka	5c4c8c9544	Bluetooth: verify AMP hci_chan before amp_destroy hci_chan can be created in 2 places: hci_loglink_complete_evt() if it is an AMP hci_chan, or l2cap_conn_add() otherwise. In theory, Only AMP hci_chan should be removed by a call to hci_disconn_loglink_complete_evt(). However, the controller might mess up, call that function, and destroy an hci_chan which is not initiated by hci_loglink_complete_evt(). This patch adds a verification that the destroyed hci_chan must have been init'd by hci_loglink_complete_evt(). Example crash call trace: Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xe3/0x144 lib/dump_stack.c:118 print_address_description+0x67/0x22a mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report mm/kasan/report.c:412 [inline] kasan_report+0x251/0x28f mm/kasan/report.c:396 hci_send_acl+0x3b/0x56e net/bluetooth/hci_core.c:4072 l2cap_send_cmd+0x5af/0x5c2 net/bluetooth/l2cap_core.c:877 l2cap_send_move_chan_cfm_icid+0x8e/0xb1 net/bluetooth/l2cap_core.c:4661 l2cap_move_fail net/bluetooth/l2cap_core.c:5146 [inline] l2cap_move_channel_rsp net/bluetooth/l2cap_core.c:5185 [inline] l2cap_bredr_sig_cmd net/bluetooth/l2cap_core.c:5464 [inline] l2cap_sig_channel net/bluetooth/l2cap_core.c:5799 [inline] l2cap_recv_frame+0x1d12/0x51aa net/bluetooth/l2cap_core.c:7023 l2cap_recv_acldata+0x2ea/0x693 net/bluetooth/l2cap_core.c:7596 hci_acldata_packet net/bluetooth/hci_core.c:4606 [inline] hci_rx_work+0x2bd/0x45e net/bluetooth/hci_core.c:4796 process_one_work+0x6f8/0xb50 kernel/workqueue.c:2175 worker_thread+0x4fc/0x670 kernel/workqueue.c:2321 kthread+0x2f0/0x304 kernel/kthread.c:253 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415 Allocated by task 38: set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0x8d/0x9a mm/kasan/kasan.c:553 kmem_cache_alloc_trace+0x102/0x129 mm/slub.c:2787 kmalloc include/linux/slab.h:515 [inline] kzalloc include/linux/slab.h:709 [inline] hci_chan_create+0x86/0x26d net/bluetooth/hci_conn.c:1674 l2cap_conn_add.part.0+0x1c/0x814 net/bluetooth/l2cap_core.c:7062 l2cap_conn_add net/bluetooth/l2cap_core.c:7059 [inline] l2cap_connect_cfm+0x134/0x852 net/bluetooth/l2cap_core.c:7381 hci_connect_cfm+0x9d/0x122 include/net/bluetooth/hci_core.h:1404 hci_remote_ext_features_evt net/bluetooth/hci_event.c:4161 [inline] hci_event_packet+0x463f/0x72fa net/bluetooth/hci_event.c:5981 hci_rx_work+0x197/0x45e net/bluetooth/hci_core.c:4791 process_one_work+0x6f8/0xb50 kernel/workqueue.c:2175 worker_thread+0x4fc/0x670 kernel/workqueue.c:2321 kthread+0x2f0/0x304 kernel/kthread.c:253 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415 Freed by task 1732: set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free mm/kasan/kasan.c:521 [inline] __kasan_slab_free+0x106/0x128 mm/kasan/kasan.c:493 slab_free_hook mm/slub.c:1409 [inline] slab_free_freelist_hook+0xaa/0xf6 mm/slub.c:1436 slab_free mm/slub.c:3009 [inline] kfree+0x182/0x21e mm/slub.c:3972 hci_disconn_loglink_complete_evt net/bluetooth/hci_event.c:4891 [inline] hci_event_packet+0x6a1c/0x72fa net/bluetooth/hci_event.c:6050 hci_rx_work+0x197/0x45e net/bluetooth/hci_core.c:4791 process_one_work+0x6f8/0xb50 kernel/workqueue.c:2175 worker_thread+0x4fc/0x670 kernel/workqueue.c:2321 kthread+0x2f0/0x304 kernel/kthread.c:253 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415 The buggy address belongs to the object at ffff8881d7af9180 which belongs to the cache kmalloc-128 of size 128 The buggy address is located 24 bytes inside of 128-byte region [ffff8881d7af9180, ffff8881d7af9200) The buggy address belongs to the page: page:ffffea00075ebe40 count:1 mapcount:0 mapping:ffff8881da403200 index:0x0 flags: 0x8000000000000200(slab) raw: 8000000000000200 dead000000000100 dead000000000200 ffff8881da403200 raw: 0000000000000000 0000000080150015 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8881d7af9080: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ffff8881d7af9100: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc >ffff8881d7af9180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8881d7af9200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8881d7af9280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reported-by: syzbot+98228e7407314d2d4ba2@syzkaller.appspotmail.com Reviewed-by: Alain Michaud <alainm@chromium.org> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-03-22 17:00:09 +01:00
Archie Pusaka	3a9d54b194	Bluetooth: Set CONF_NOT_COMPLETE as l2cap_chan default Currently l2cap_chan_set_defaults() reset chan->conf_state to zero. However, there is a flag CONF_NOT_COMPLETE which is set when creating the l2cap_chan. It is suggested that the flag should be cleared when l2cap_chan is ready, but when l2cap_chan_set_defaults() is called, l2cap_chan is not yet ready. Therefore, we must set this flag as the default. Example crash call trace: __dump_stack lib/dump_stack.c:15 [inline] dump_stack+0xc4/0x118 lib/dump_stack.c:56 panic+0x1c6/0x38b kernel/panic.c:117 __warn+0x170/0x1b9 kernel/panic.c:471 warn_slowpath_fmt+0xc7/0xf8 kernel/panic.c:494 debug_print_object+0x175/0x193 lib/debugobjects.c:260 debug_object_assert_init+0x171/0x1bf lib/debugobjects.c:614 debug_timer_assert_init kernel/time/timer.c:629 [inline] debug_assert_init kernel/time/timer.c:677 [inline] del_timer+0x7c/0x179 kernel/time/timer.c:1034 try_to_grab_pending+0x81/0x2e5 kernel/workqueue.c:1230 cancel_delayed_work+0x7c/0x1c4 kernel/workqueue.c:2929 l2cap_clear_timer+0x1e/0x41 include/net/bluetooth/l2cap.h:834 l2cap_chan_del+0x2d8/0x37e net/bluetooth/l2cap_core.c:640 l2cap_chan_close+0x532/0x5d8 net/bluetooth/l2cap_core.c:756 l2cap_sock_shutdown+0x806/0x969 net/bluetooth/l2cap_sock.c:1174 l2cap_sock_release+0x64/0x14d net/bluetooth/l2cap_sock.c:1217 __sock_release+0xda/0x217 net/socket.c:580 sock_close+0x1b/0x1f net/socket.c:1039 __fput+0x322/0x55c fs/file_table.c:208 ____fput+0x17/0x19 fs/file_table.c:244 task_work_run+0x19b/0x1d3 kernel/task_work.c:115 exit_task_work include/linux/task_work.h:21 [inline] do_exit+0xe4c/0x204a kernel/exit.c:766 do_group_exit+0x291/0x291 kernel/exit.c:891 get_signal+0x749/0x1093 kernel/signal.c:2396 do_signal+0xa5/0xcdb arch/x86/kernel/signal.c:737 exit_to_usermode_loop arch/x86/entry/common.c:243 [inline] prepare_exit_to_usermode+0xed/0x235 arch/x86/entry/common.c:277 syscall_return_slowpath+0x3a7/0x3b3 arch/x86/entry/common.c:348 int_ret_from_sys_call+0x25/0xa3 Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reported-by: syzbot+338f014a98367a08a114@syzkaller.appspotmail.com Reviewed-by: Alain Michaud <alainm@chromium.org> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-03-22 16:58:04 +01:00
Chuck Lever	c6b7ed8f94	svcrdma: Remove stale comment for svc_rdma_wc_receive() xprt pinning was removed in commit `365e9992b9` ("svcrdma: Remove transport reference counting"), but this comment was not updated to reflect that change. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:05 -04:00
Chuck Lever	270f25edcc	svcrdma: Provide an explanatory comment in CMA event handler Clean up: explain why svc_xprt_enqueue() is invoked in the event handler even though no xpt_flags bits are toggled here. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:05 -04:00
Chuck Lever	072db263e1	svcrdma: RPCDBG_FACILITY is no longer used Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:05 -04:00
Xin Long	154deab6a3	esp: delete NETIF_F_SCTP_CRC bit from features for esp offload Now in esp4/6_gso_segment(), before calling inner proto .gso_segment, NETIF_F_CSUM_MASK bits are deleted, as HW won't be able to do the csum for inner proto due to the packet encrypted already. So the UDP/TCP packet has to do the checksum on its own .gso_segment. But SCTP is using CRC checksum, and for that NETIF_F_SCTP_CRC should be deleted to make SCTP do the csum in own .gso_segment as well. In Xiumei's testing with SCTP over IPsec/veth, the packets are kept dropping due to the wrong CRC checksum. Reported-by: Xiumei Mu <xmu@redhat.com> Fixes: `7862b4058b` ("esp: Add gso handlers for esp4 and esp6") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2021-03-22 07:43:31 +01:00
Ahmed S. Darwish	bc8e0adff3	net: xfrm: Use sequence counter with associated spinlock A sequence counter write section must be serialized or its internal state can get corrupted. A plain seqcount_t does not contain the information of which lock must be held to guaranteee write side serialization. For xfrm_state_hash_generation, use seqcount_spinlock_t instead of plain seqcount_t. This allows to associate the spinlock used for write serialization with the sequence counter. It thus enables lockdep to verify that the write serialization lock is indeed held before entering the sequence counter write section. If lockdep is disabled, this lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2021-03-22 07:38:08 +01:00
Ahmed S. Darwish	e88add19f6	net: xfrm: Localize sequence counter per network namespace A sequence counter write section must be serialized or its internal state can get corrupted. The "xfrm_state_hash_generation" seqcount is global, but its write serialization lock (net->xfrm.xfrm_state_lock) is instantiated per network namespace. The write protection is thus insufficient. To provide full protection, localize the sequence counter per network namespace instead. This should be safe as both the seqcount read and write sections access data exclusively within the network namespace. It also lays the foundation for transforming "xfrm_state_hash_generation" data type from seqcount_t to seqcount_LOCKNAME_t in further commits. Fixes: `b65e3d7be0` ("xfrm: state: add sequence count to detect hash resizes") Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2021-03-22 07:35:42 +01:00
Sai Kalyaan Palla	b29648ad5b	net: decnet: Fixed multiple coding style issues Made changes to coding style as suggested by checkpatch.pl changes are of the type: open brace '{' following struct go on the same line do not use assignment in if condition Signed-off-by: Sai Kalyaan Palla <saikalyaan63@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-20 18:50:51 -07:00
Oliver Hartkopp	b5f020f82a	can: isotp: tx-path: zero initialize outgoing CAN frames Commit `d4eb538e1f` ("can: isotp: TX-path: ensure that CAN frame flags are initialized") ensured the TX flags to be properly set for outgoing CAN frames. In fact the root cause of the issue results from a missing initialization of outgoing CAN frames created by isotp. This is no problem on the CAN bus as the CAN driver only picks the correctly defined content from the struct can(fd)_frame. But when the outgoing frames are monitored (e.g. with candump) we potentially leak some bytes in the unused content of struct can(fd)_frame. Fixes: `e057dd3fc2` ("can: add ISO 15765-2:2016 transport protocol") Cc: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/r/20210319100619.10858-1-socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-03-20 20:21:35 +01:00
David Brazdil	1f935e8e72	selinux: vsock: Set SID for socket returned by accept() For AF_VSOCK, accept() currently returns sockets that are unlabelled. Other socket families derive the child's SID from the SID of the parent and the SID of the incoming packet. This is typically done as the connected socket is placed in the queue that accept() removes from. Reuse the existing 'security_sk_clone' hook to copy the SID from the parent (server) socket to the child. There is no packet SID in this case. Fixes: `d021c34405` ("VSOCK: Introduce VM Sockets") Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-19 13:46:55 -07:00
Eric Dumazet	919067cc84	net: add CONFIG_PCPU_DEV_REFCNT I was working on a syzbot issue, claiming one device could not be dismantled because its refcount was -1 unregister_netdevice: waiting for sit0 to become free. Usage count = -1 It would be nice if syzbot could trigger a warning at the time this reference count became negative. This patch adds CONFIG_PCPU_DEV_REFCNT options which defaults to per cpu variables (as before this patch) on SMP builds. v2: free_dev label in alloc_netdev_mqs() is moved to avoid a compiler warning (-Wunused-label), as reported by kernel test robot <lkp@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-19 13:38:46 -07:00
Xin Long	8ff0b1f08e	sctp: move sk_route_caps check and set into sctp_outq_flush_transports The sk's sk_route_caps is set in sctp_packet_config, and later it only needs to change when traversing the transport_list in a loop, as the dst might be changed in the tx path. So move sk_route_caps check and set into sctp_outq_flush_transports from sctp_packet_transmit. This also fixes a dst leak reported by Chen Yi: https://bugzilla.kernel.org/show_bug.cgi?id=212227 As calling sk_setup_caps() in sctp_packet_transmit may also set the sk_route_caps for the ctrl sock in a netns. When the netns is being deleted, the ctrl sock's releasing is later than dst dev's deleting, which will cause this dev's deleting to hang and dmesg error occurs: unregister_netdevice: waiting for xxx to become free. Usage count = 1 Reported-by: Chen Yi <yiche@redhat.com> Fixes: `bcd623d8e9` ("sctp: call sk_setup_caps in sctp_packet_transmit instead") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-19 11:34:49 -07:00
Kurt Kanzenbach	497cc00224	taprio: Handle short intervals and large packets When using short intervals e.g. below one millisecond, large packets won't be transmitted at all. The software implementations checks whether the packet can be fit into the remaining interval. Therefore, it takes the packet length and the transmission speed into account. That is correct. However, for large packets it may be that the transmission time exceeds the interval resulting in no packet transmission. The same situation works fine with hardware offloading applied. The problem has been observed with the following schedule and iperf3: \|tc qdisc replace dev lan1 parent root handle 100 taprio \ \| num_tc 8 \ \| map 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 \ \| queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ \| base-time $base \ \| sched-entry S 0x40 500000 \ \| sched-entry S 0xbf 500000 \ \| clockid CLOCK_TAI \ \| flags 0x00 [...] \|root@tsn:~# iperf3 -c 192.168.2.105 \|Connecting to host 192.168.2.105, port 5201 \|[ 5] local 192.168.2.121 port 52610 connected to 192.168.2.105 port 5201 \|[ ID] Interval Transfer Bitrate Retr Cwnd \|[ 5] 0.00-1.00 sec 45.2 KBytes 370 Kbits/sec 0 1.41 KBytes \|[ 5] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes After debugging, it seems that the packet length stored in the SKB is about 7000-8000 bytes. Using a 100 Mbit/s link the transmission time is about 600us which larger than the interval of 500us. Therefore, segment the SKB into smaller chunks if the packet is too big. This yields similar results than the hardware offload: \|root@tsn:~# iperf3 -c 192.168.2.105 \|Connecting to host 192.168.2.105, port 5201 \|- - - - - - - - - - - - - - - - - - - - - - - - - \|[ ID] Interval Transfer Bitrate Retr \|[ 5] 0.00-10.00 sec 48.9 MBytes 41.0 Mbits/sec 0 sender \|[ 5] 0.00-10.02 sec 48.7 MBytes 40.7 Mbits/sec receiver Furthermore, the segmentation can be skipped for the full offload case, as the driver or the hardware is expected to handle this. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-19 11:32:44 -07:00
Alexander Lobakin	5588796e89	ethernet: avoid retpoline overhead on TEB (GENEVE, NvGRE, VxLAN) GRO The two most popular headers going after Ethernet are IPv4 and IPv6. Retpoline overhead for them is addressed only in dev_gro_receive(), when they lie right after the outermost Ethernet header. Use the indirect call wrappers in TEB (Transparent Ethernet Bridging, such as GENEVE, NvGRE, VxLAN etc.) GRO receive code to reduce the penalty when processing the inner headers. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 19:51:12 -07:00
Alexander Lobakin	4a6e7ec93a	vlan/8021q: avoid retpoline overhead on GRO The two most popular headers going after VLAN are IPv4 and IPv6. Retpoline overhead for them is addressed only in dev_gro_receive(), when they lie right after the outermost Ethernet header. Use the indirect call wrappers in VLAN GRO receive code to reduce the penalty on receiving tagged frames (when hardware stripping is off or not available). Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 19:51:12 -07:00
David S. Miller	84f4aced67	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Several patches to testore use of memory barriers instead of RCU to ensure consistent access to ruleset, from Mark Tomlinson. 2) Fix dump of expectation via ctnetlink, from Florian Westphal. 3) GRE helper works for IPv6, from Ludovic Senecaux. 4) Set error on unsupported flowtable flags. 5) Use delayed instead of deferrable workqueue in the flowtable, from Yinjun Zhang. 6) Fix spurious EEXIST in case of add-after-delete flowtable in the same batch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 19:19:06 -07:00
Xiong Zhenwu	a835f9034e	/net/core/: fix misspellings using codespell tool A typo is found out by codespell tool in 1734th line of drop_monitor.c: $ codespell ./net/core/ ./net/core/drop_monitor.c:1734: guarnateed ==> guaranteed Fix a typo found by codespell. Signed-off-by: Xiong Zhenwu <xiong.zhenwu@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 19:13:41 -07:00
Xiong Zhenwu	7f1330c1b1	/net/hsr: fix misspellings using codespell tool A typo is found out by codespell tool in 111th line of hsr_debugfs.c: $ codespell ./net/hsr/ net/hsr/hsr_debugfs.c:111: Debufs ==> Debugfs Fix typos found by codespell. Signed-off-by: Xiong Zhenwu <xiong.zhenwu@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 19:13:41 -07:00
Tobias Waldekranz	cc76ce9e8d	net: dsa: Add helper to resolve bridge port from DSA port In order for a driver to be able to query a bridge for information about itself, e.g. reading out port flags, it has to use a netdev that is known to the bridge. In the simple case, that is just the netdev representing the port, e.g. swp0 or swp1 in this example: br0 / \ swp0 swp1 But in the case of an offloaded lag, this will be the bond or team interface, e.g. bond0 in this example: br0 / bond0 / \ swp0 swp1 Add a helper that hides some of this complexity from the drivers. Then, redefine dsa_port_offloads_bridge_port using the helper to avoid double accounting of the set of possible offloaded uppers. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 16:24:06 -07:00
Antoine Tenart	75b2758abc	net: NULL the old xps map entries when freeing them In __netif_set_xps_queue, old map entries from the old dev_maps are freed but their corresponding entry in the old dev_maps aren't NULLed. Fix this. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 14:56:22 -07:00
Antoine Tenart	2d05bf0153	net: fix use after free in xps When setting up an new dev_maps in __netif_set_xps_queue, we remove and free maps from unused CPUs/rx-queues near the end of the function; by calling remove_xps_queue. However it's possible those maps are also part of the old not-freed-yet dev_maps, which might be used concurrently. When that happens, a map can be freed while its corresponding entry in the old dev_maps table isn't NULLed, leading to: "BUG: KASAN: use-after-free" in different places. This fixes the map freeing logic for unused CPUs/rx-queues, to also NULL the map entries from the old dev_maps table. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 14:56:22 -07:00
Antoine Tenart	2db6cdaeba	net-sysfs: move the xps cpus/rxqs retrieval in a common function Most of the xps_cpus_show and xps_rxqs_show functions share the same logic. Having it in two different functions does not help maintenance. This patch moves their common logic into a new function, xps_queue_show, to improve this. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 14:56:22 -07:00
Antoine Tenart	d7be87a687	net-sysfs: move the rtnl unlock up in the xps show helpers Now that nr_ids and num_tc are stored in the xps dev_maps, which are RCU protected, we do not have the need to protect the maps in the rtnl lock. Move the rtnl unlock up so we reduce the rtnl locking section. We also increase the reference count on the subordinate device if any, as we don't want this device to be freed while we use it (now that the rtnl lock isn't protecting it in the whole function). Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 14:56:22 -07:00
Antoine Tenart	132f743b01	net: improve queue removal readability in __netif_set_xps_queue Improve the readability of the loop removing tx-queue from unused CPUs/rx-queues in __netif_set_xps_queue. The change should only be cosmetic. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 14:56:22 -07:00
Antoine Tenart	402fbb992e	net: add an helper to copy xps maps to the new dev_maps This patch adds an helper, xps_copy_dev_maps, to copy maps from dev_maps to new_dev_maps at a given index. The logic should be the same, with an improved code readability and maintenance. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 14:56:22 -07:00
Antoine Tenart	044ab86d43	net: move the xps maps to an array Move the xps maps (xps_cpus_map and xps_rxqs_map) to an array in net_device. That will simplify a lot the code removing the need for lots of if/else conditionals as the correct map will be available using its offset in the array. This should not modify the xps maps behaviour in any way. Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 14:56:22 -07:00
Antoine Tenart	6f36158e05	net: remove the xps possible_mask Remove the xps possible_mask. It was an optimization but we can just loop from 0 to nr_ids now that it is embedded in the xps dev_maps. That simplifies the code a bit. Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 14:56:22 -07:00
Antoine Tenart	5478fcd0f4	net: embed nr_ids in the xps maps Embed nr_ids (the number of cpu for the xps cpus map, and the number of rxqs for the xps cpus map) in dev_maps. That will help not accessing out of bound memory if those values change after dev_maps was allocated. Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 14:56:22 -07:00
Antoine Tenart	255c04a87f	net: embed num_tc in the xps maps The xps cpus/rxqs map is accessed using dev->num_tc, which is used when allocating the map. But later updates of dev->num_tc can lead to having a mismatch between the maps and how they're accessed. In such cases the map values do not make any sense and out of bound accesses can occur (that can be easily seen using KASAN). This patch aims at fixing this by embedding num_tc into the maps, using the value at the time the map is created. This brings two improvements: - The maps can be accessed using the embedded num_tc, so we know for sure we won't have out of bound accesses. - Checks can be made before accessing the maps so we know the values retrieved will make sense. We also update __netif_set_xps_queue to conditionally copy old maps from dev_maps in the new one only if the number of traffic classes from both maps match. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 14:56:22 -07:00
Antoine Tenart	73f5e52b15	net-sysfs: make xps_cpus_show and xps_rxqs_show consistent Make the implementations of xps_cpus_show and xps_rxqs_show to converge, as the two share the same logic but diverted over time. This should not modify their behaviour but will help future changes and improve maintenance. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 14:56:22 -07:00
Antoine Tenart	d9a063d207	net-sysfs: store the return of get_netdev_queue_index in an unsigned int In net-sysfs, get_netdev_queue_index returns an unsigned int. Some of its callers use an unsigned long to store the returned value. Update the code to be consistent, this should only be cosmetic. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 14:56:22 -07:00
Antoine Tenart	ea4fe7e842	net-sysfs: convert xps_cpus_show to bitmap_zalloc Use bitmap_zalloc instead of zalloc_cpumask_var in xps_cpus_show to align with xps_rxqs_show. This will improve maintenance and allow us to factorize the two functions. The function should behave the same. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 14:56:22 -07:00
Jiri Bohac	6c015a2256	net: check all name nodes in __dev_alloc_name __dev_alloc_name(), when supplied with a name containing '%d', will search for the first available device number to generate a unique device name. Since commit `ff92741270` ("net: introduce name_node struct to be used in hashlist") network devices may have alternate names. __dev_alloc_name() does take these alternate names into account, possibly generating a name that is already taken and failing with -ENFILE as a result. This demonstrates the bug: # rmmod dummy 2>/dev/null # ip link property add dev lo altname dummy0 # modprobe dummy numdummies=1 modprobe: ERROR: could not insert 'dummy': Too many open files in system Instead of creating a device named dummy1, modprobe fails. Fix this by checking all the names in the d->name_node list, not just d->name. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Fixes: `ff92741270` ("net: introduce name_node struct to be used in hashlist") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 14:40:53 -07:00
Jakub Kicinski	dcc32f4f18	ipv6: weaken the v4mapped source check This reverts commit `6af1799aaf`. Commit `6af1799aaf` ("ipv6: drop incoming packets having a v4mapped source address") introduced an input check against v4mapped addresses. Use of such addresses on the wire is indeed questionable and not allowed on public Internet. As the commit pointed out https://tools.ietf.org/html/draft-itojun-v6ops-v4mapped-harmful-02 lists potential issues. Unfortunately there are applications which use v4mapped addresses, and breaking them is a clear regression. For example v4mapped addresses (or any semi-valid addresses, really) may be used for uni-direction event streams or packet export. Since the issue which sparked the addition of the check was with TCP and request_socks in particular push the check down to TCPv6 and DCCP. This restores the ability to receive UDPv6 packets with v4mapped address as the source. Keep using the IPSTATS_MIB_INHDRERRORS statistic to minimize the user-visible changes. Fixes: `6af1799aaf` ("ipv6: drop incoming packets having a v4mapped source address") Reported-by: Sunyi Shao <sunyishao@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-18 11:19:23 -07:00
Yonghong Song	97a19caf1b	bpf: net: Emit anonymous enum with BPF_TCP_CLOSE value explicitly The selftest failed to compile with clang-built bpf-next. Adding LLVM=1 to your vmlinux and selftest build will use clang. The error message is: progs/test_sk_storage_tracing.c:38:18: error: use of undeclared identifier 'BPF_TCP_CLOSE' if (newstate == BPF_TCP_CLOSE) ^ 1 error generated. make: *** [Makefile:423: /bpf-next/tools/testing/selftests/bpf/test_sk_storage_tracing.o] Error 1 The reason for the failure is that BPF_TCP_CLOSE, a value of an anonymous enum defined in uapi bpf.h, is not defined in vmlinux.h. gcc does not have this problem. Since vmlinux.h is derived from BTF which is derived from vmlinux DWARF, that means gcc-produced vmlinux DWARF has BPF_TCP_CLOSE while llvm-produced vmlinux DWARF does not have. BPF_TCP_CLOSE is referenced in net/ipv4/tcp.c as BUILD_BUG_ON((int)BPF_TCP_CLOSE != (int)TCP_CLOSE); The following test mimics the above BUILD_BUG_ON, preprocessed with clang compiler, and shows gcc DWARF contains BPF_TCP_CLOSE while llvm DWARF does not. $ cat t.c enum { BPF_TCP_ESTABLISHED = 1, BPF_TCP_CLOSE = 7, }; enum { TCP_ESTABLISHED = 1, TCP_CLOSE = 7, }; int test() { do { extern void __compiletime_assert_767(void) ; if ((int)BPF_TCP_CLOSE != (int)TCP_CLOSE) __compiletime_assert_767(); } while (0); return 0; } $ clang t.c -O2 -c -g && llvm-dwarfdump t.o \| grep BPF_TCP_CLOSE $ gcc t.c -O2 -c -g && llvm-dwarfdump t.o \| grep BPF_TCP_CLOSE DW_AT_name ("BPF_TCP_CLOSE") Further checking clang code find clang actually tried to evaluate condition at compile time. If it is definitely true/false, it will perform optimization and the whole if condition will be removed before generating IR/debuginfo. This patch explicited add an expression after the above mentioned BUILD_BUG_ON in net/ipv4/tcp.c like (void)BPF_TCP_ESTABLISHED to enable generation of debuginfo for the anonymous enum which also includes BPF_TCP_CLOSE. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210317174132.589276-1-yhs@fb.com	2021-03-17 18:45:40 -07:00
Pablo Neira Ayuso	0ce7cf4127	netfilter: nftables: update table flags from the commit phase Do not update table flags from the preparation phase. Store the flags update into the transaction, then update the flags from the commit phase. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-18 01:35:39 +01:00
Pablo Neira Ayuso	86fe2c19ee	netfilter: nftables: skip hook overlap logic if flowtable is stale If the flowtable has been previously removed in this batch, skip the hook overlap checks. This fixes spurious EEXIST errors when removing and adding the flowtable in the same batch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-18 01:08:54 +01:00
Pablo Neira Ayuso	1b9cd7690a	netfilter: flowtable: refresh timeout after dst and writable checks Refresh the timeout (and retry hardware offload) once the skbuff dst is confirmed to be current and after the skbuff is made writable. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-18 00:44:00 +01:00
Pablo Neira Ayuso	e5075c0bad	netfilter: flowtable: call dst_check() to fall back to classic forwarding In case the route is stale, pass up the packet to the classic forwarding path for re-evaluation and schedule this flow entry for removal. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-18 00:44:00 +01:00
Pablo Neira Ayuso	f4401262b9	netfilter: flowtable: fast NAT functions never fail Simplify existing fast NAT routines by returning void. After the skb_try_make_writable() call consolidation, these routines cannot ever fail. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-18 00:44:00 +01:00
Pablo Neira Ayuso	4f08f173d0	netfilter: flowtable: move FLOW_OFFLOAD_DIR_MAX away from enumeration This allows to remove the default case which should not ever happen and that was added to avoid gcc warnings on unhandled FLOW_OFFLOAD_DIR_MAX enumeration case. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-18 00:44:00 +01:00
Pablo Neira Ayuso	2babb46c8c	netfilter: flowtable: move skb_try_make_writable() before NAT in IPv4 For consistency with the IPv6 flowtable datapath and to make sure the skbuff is writable right before the NAT header updates. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-18 00:44:00 +01:00
Pablo Neira Ayuso	2fc11745c3	netfilter: flowtable: consolidate skb_try_make_writable() call Fetch the layer 4 header size to be mangled by NAT when building the tuple, then use it to make writable the network and the transport headers. After this update, the NAT routines now assumes that the skbuff area is writable. Do the pointer refetch only after the single skb_try_make_writable() call. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-18 00:43:59 +01:00
Gustavo A. R. Silva	c2168e6bd7	netfilter: Fix fall-through warnings for Clang In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple warnings by explicitly adding multiple break statements instead of just letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-18 00:32:48 +01:00
Yinjun Zhang	740b486a8d	netfilter: flowtable: Make sure GC works periodically in idle system Currently flowtable's GC work is initialized as deferrable, which means GC cannot work on time when system is idle. So the hardware offloaded flow may be deleted for timeout, since its used time is not timely updated. Resolve it by initializing the GC work as delayed work instead of deferrable. Fixes: `c29f74e0df` ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-18 00:32:21 +01:00
Pablo Neira Ayuso	7b35582cd0	netfilter: nftables: allow to update flowtable flags Honor flowtable flags from the control update path. Disallow disabling to toggle hardware offload support though. Fixes: `8bb69f3b29` ("netfilter: nf_tables: add flowtable offload control plane") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-18 00:32:21 +01:00
Pablo Neira Ayuso	7e6136f1b7	netfilter: nftables: report EOPNOTSUPP on unsupported flowtable flags Error was not set accordingly. Fixes: `8bb69f3b29` ("netfilter: nf_tables: add flowtable offload control plane") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-18 00:32:21 +01:00
Ludovic Senecaux	8b2030b430	netfilter: conntrack: Fix gre tunneling over ipv6 This fix permits gre connections to be tracked within ip6tables rules Signed-off-by: Ludovic Senecaux <linuxludo@free.fr> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-18 00:32:06 +01:00
Oz Shlomo	2ed37183ab	netfilter: flowtable: separate replace, destroy and stats to different workqueues Currently the flow table offload replace, destroy and stats work items are executed on a single workqueue. As such, DESTROY and STATS commands may be backloged after a burst of REPLACE work items. This scenario can bloat up memory and may cause active connections to age. Instatiate add, del and stats workqueues to avoid backlogs of non-dependent actions. Provide sysfs control over the workqueue attributes, allowing userspace applications to control the workqueue cpumask. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-18 00:31:19 +01:00
Wei Wang	cb03835793	net: fix race between napi kthread mode and busy poll Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to determine if the kthread owns this napi and could call napi->poll() on it. However, if socket busy poll is enabled, it is possible that the busy poll thread grabs this SCHED bit (after the previous napi->poll() invokes napi_complete_done() and clears SCHED bit) and tries to poll on the same napi. napi_disable() could grab the SCHED bit as well. This patch tries to fix this race by adding a new bit NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in ____napi_schedule() if the threaded mode is enabled, and gets cleared in napi_complete_done(), and we only poll the napi in kthread if this bit is set. This helps distinguish the ownership of the napi between kthread and other scenarios and fixes the race issue. Fixes: `29863d41bb` ("net: implement threaded-able napi poll loop support") Reported-by: Martin Zaharinov <micron10@gmail.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Wei Wang <weiwan@google.com> Cc: Alexander Duyck <alexanderduyck@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 14:31:17 -07:00
Leon Romanovsky	6417f03132	module: remove never implemented MODULE_SUPPORTED_DEVICE MODULE_SUPPORTED_DEVICE was added in pre-git era and never was implemented. We can safely remove it, because the kernel has grown to have many more reliable mechanisms to determine if device is supported or not. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-03-17 13:16:18 -07:00
Álvaro Fernández Rojas	964dbf186e	net: dsa: tag_brcm: add support for legacy tags Add support for legacy Broadcom tags, which are similar to DSA_TAG_PROTO_BRCM. These tags are used on BCM5325, BCM5365 and BCM63xx switches. Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 12:24:36 -07:00
David S. Miller	0692c33c9c	First round of fixes for 5.12-rc: * HE (802.11ax) elements can be extended, handle that * fix locking in network namespace changes that was broken due to the RTNL-redux work * various other small fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmBR1WoACgkQB8qZga/f l8QvKQ/+I4a4gmno6A05asm6PY4sf8lQ6j90SGS0u82nJlFCMGlPNpDcTYDwozy1 bjwTBGn5Jc808UUriPKt0aY8sukgFuJmBHZsxDFOfx48t/hMDnbkf4u3pIvtqhsf OfcqsmS4Zp3r1r7xfKhResAg5t4phaRz7WGW+RYyAF8vVwSR6hDR/TmUI7sft3l4 0FlQEpdW/dm2cvy2tbUJtSyHrKEbrlVn55DDE3fWpfwaOQzHSFZPKbishmk4mhIR zJgdehaogkUxa5KbuImWIRHFudJ+Seths6+c8FPERQGjK7QGGyB/qrC1+mvw3wX+ pAf9+R54aBqcA8mEEryqprmEwo0i+YPo1kuXig6fE4D62d2dD9nZSghmgcqMM1Vp 6FwpU4g7Fn5mpyw3yAxdWbYzP/Ai4fb9QVaFsQvIv4PnpRB0wt+/1Aftd68HDNYk kWLRRsoCVrddr3yImbYTA0ytIwgi6wAUqoH9+JQSGkJbOh18G1gAwxsdz0BWZoYK 9GI8LST0Pr+mnwjo0E9ILHbIVKyGCcx7Porxbr2yEeIB3JBpBtZA1qTIX6MHcote BIx8LyEWrnEVCyWq1ZWcEdLdzzTRVgzYmFprdY+Zq/Lk1L9Gr7M8m4ri1VzMX2rF 9IwfKRjZPLD5v3A5ls9PrkUrns+6IjIpIYhwoqFuVyv24QHyyoo= =ePzY -----END PGP SIGNATURE----- Merge tag 'mac80211-for-net-2021-03-17' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== First round of fixes for 5.12-rc: * HE (802.11ax) elements can be extended, handle that * fix locking in network namespace changes that was broken due to the RTNL-redux work * various other small fixes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 12:22:13 -07:00
wenxu	afa536d840	net/sched: cls_flower: fix only mask bit check in the validate_ct_state The ct_state validate should not only check the mask bit and also check mask_bit & key_bit.. For the +new+est case example, The 'new' and 'est' bits should be set in both state_mask and state flags. Or the -new-est case also will be reject by kernel. When Openvswitch with two flows ct_state=+trk+new,action=commit,forward ct_state=+trk+est,action=forward A packet go through the kernel and the contrack state is invalid, The ct_state will be +trk-inv. Upcall to the ovs-vswitchd, the finally dp action will be drop with -new-est+trk. Fixes: `1bcc51ac07` ("net/sched: cls_flower: Reject invalid ct_state flags rules") Fixes: `3aed8b6333` ("net/sched: cls_flower: validate ct_state for invalid and reply flags") Signed-off-by: wenxu <wenxu@ucloud.cn> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 11:56:25 -07:00
Jon Maloy	5c8349503d	tipc: remove some unnecessary warnings We move some warning printouts to more strategic locations to avoid duplicates and yield more detailed information about the reported problem. Signed-off-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 11:51:05 -07:00
Jon Maloy	429189acac	tipc: add host-endian copy of user subscription to struct tipc_subscription We reduce and localize the usage of the tipc_sub_xx() macros by adding a corresponding member, with fields set in host-endian format, to struct tipc_subscription. Signed-off-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 11:51:05 -07:00
Jon Maloy	09f78b851e	tipc: simplify api between binding table and topology server The function tipc_report_overlap() is called from the binding table with numerous parameters taken from an instance of struct publication. A closer look reveals that it always is safe to send along a pointer to the instance itself, and hence reduce the call signature. We do that in this commit. Signed-off-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 11:51:05 -07:00
Jon Maloy	6e44867b01	tipc: simplify signature of tipc_find_service() We reduce the signature of tipc_find_service() and tipc_create_service(). The reason for doing this might not be obvious, but we plan to let struct tipc_uaddr contain information that is relevant for these functions in a later commit. Signed-off-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 11:51:05 -07:00
Jon Maloy	13c9d23f6a	tipc: simplify signature of tipc_service_find_range() We simplify the signatures of the functions tipc_service_create_range() and tipc_service_find_range(). Signed-off-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 11:51:04 -07:00
Jon Maloy	006ed14ef8	tipc: simplify signature of tipc_nametbl_lookup_group() We reduce the signature of tipc_nametbl_lookup_group() by using a struct tipc_uaddr pointer. This entails a couple of minor changes in the functions tipc_send_group_mcast/anycast/unicast/bcast() in socket.c Signed-off-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 11:51:04 -07:00
Jon Maloy	833f867089	tipc: simplify signature of tipc_nametbl_lookup_mcast_nodes() We follow up the preceding commits by reducing the signature of the function tipc_nametbl_lookup_mcast_nodes(). Signed-off-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 11:51:04 -07:00
Jon Maloy	45ceea2d40	tipc: simplify signature of tipc_namtbl_lookup_mcast_sockets() We reduce the signature of this function according to the same principle as the preceding commits. Signed-off-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 11:51:04 -07:00
Jon Maloy	908148bc50	tipc: refactor tipc_sendmsg() and tipc_lookup_anycast() We simplify the signature if function tipc_nametbl_lookup_anycast(), using address structures instead of discrete integers. This also makes it possible to make some improvements to the functions __tipc_sendmsg() in socket.c and tipc_msg_lookup_dest() in msg.c. Signed-off-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 11:51:04 -07:00
Jon Maloy	66db239c48	tipc: rename binding table lookup functions The binding table provides four different lookup functions, which purpose is not obvious neither by their names nor by the (lack of) descriptions. We now give these functions names that better match their purposes, and improve the comments that describe what they are doing. Signed-off-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 11:51:04 -07:00
Jon Maloy	2c98da0790	tipc: simplify signature of tipc_nametbl_withdraw() functions Following the principles of the preceding commits, we reduce the number of parameters passed along in tipc_sk_withdraw(), tipc_nametbl_withdraw() and associated functions. Signed-off-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 11:51:04 -07:00
Jon Maloy	a45ffa6857	tipc: simplify call signatures for publication creation We simplify the call signatures for tipc_nametbl_insert_publ() and tipc_publ_create() so that fewer parameters are passed around. Signed-off-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 11:51:04 -07:00
Jon Maloy	50a3499ab8	tipc: simplify signature of tipc_namtbl_publish() Using the new address structure tipc_uaddr, we simplify the signature of function tipc_sk_publish() and tipc_namtbl_publish() so that fewer parameters need to be passed around. Signed-off-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 11:51:04 -07:00
Jon Maloy	7823f04f34	tipc: introduce new unified address type for internal use We introduce a simplified version of struct sockaddr_tipc, using anonymous unions and structures. Apart from being nicer to work with, this struct will come in handy when we in a later commit add another address type. Signed-off-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 11:51:04 -07:00
Jon Maloy	b26b5aa9ce	tipc: move creation of publication item one level up in call chain We instantiate struct publication in tipc_nametbl_insert_publ() instead of as currently in tipc_service_insert_publ(). This has the advantage that we can pass a pointer to the publication struct to the next call levels, instead of the numerous individual parameters we pass on now. It also gives us a location to keep the contents of the additional fields we will introduce in a later commit. Signed-off-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 11:51:04 -07:00
Jon Maloy	998d3907f4	tipc: re-organize members of struct publication In a future commit we will introduce more members to struct publication. In order to keep this structure comprehensible we now group some of its current fields into the sub-structures where they really belong, - A struct tipc_service_range for the functional address the publication is representing. - A struct tipc_socket_addr for the socket bound to that service range. We also rename the stack variable 'publ' to just 'p' in a few places. This is just as easy to understand in the given context, and keeps the number of wrapped code lines to a minimum. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 11:51:04 -07:00
Alexander Duyck	7888fe53b7	ethtool: Add common function for filling out strings Add a function to handle the common pattern of printing a string into the ethtool strings interface and incrementing the string pointer by the ETH_GSTRING_LEN. Most of the drivers end up doing this and several have implemented their own versions of this function so it would make sense to consolidate on one implementation. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 11:42:30 -07:00
Ayush Garg	87df8bcccd	Bluetooth: Fix incorrect status handling in LE PHY UPDATE event Skip updation of tx and rx PHYs values, when PHY Update event's status is not successful. Signed-off-by: Ayush Garg <ayush.garg@samsung.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-03-17 15:21:29 +01:00
Flavio Leitner	ebfbc46b35	openvswitch: Warn over-mtu packets only if iface is UP. It is not unusual to have the bridge port down. Sometimes it has the old MTU, which is fine since it's not being used. However, the kernel spams the log with a warning message when a packet is going to be sent over such port. Fix that by warning only if the interface is UP. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-16 16:28:30 -07:00
Horatiu Vultur	2ed2c5f039	net: ocelot: Remove ocelot_xfh_get_cpuq Now when extracting frames from CPU the cpuq is not used anymore so remove it. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-16 15:49:52 -07:00
Horatiu Vultur	7c588c3e96	net: ocelot: Extend MRP This patch extends MRP support for Ocelot. It allows to have multiple rings and when the node has the MRC role it forwards MRP Test frames in HW. For MRM there is no change. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-16 15:49:52 -07:00
wenxu	d29334c15d	net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct When openvswitch conntrack offload with act_ct action. The first rule do conntrack in the act_ct in tc subsystem. And miss the next rule in the tc and fallback to the ovs datapath but miss set post_ct flag which will lead the ct_state_key with -trk flag. Fixes: `7baf2429a1` ("net/sched: cls_flower add CT_FLAGS_INVALID flag support") Signed-off-by: wenxu <wenxu@ucloud.cn> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-16 15:22:18 -07:00
Yejune Deng	f105f26e45	net: ipv4: route.c: simplify procfs code proc_creat_seq() that directly take a struct seq_operations, and deal with network namespaces in ->open. Signed-off-by: Yejune Deng <yejune.deng@gmail.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-16 14:43:49 -07:00
Jarod Wilson	239729a21e	wireless/nl80211: fix wdev_id may be used uninitialized Build currently fails with -Werror=maybe-uninitialized set: net/wireless/nl80211.c: In function '__cfg80211_wdev_from_attrs': net/wireless/nl80211.c:124:44: error: 'wdev_id' may be used uninitialized in this function [-Werror=maybe-uninitialized] Easy fix is to just initialize wdev_id to 0, since it's value doesn't otherwise matter unless have_wdev_id is true. Fixes: `a05829a722` ("cfg80211: avoid holding the RTNL when calling the driver") CC: Johannes Berg <johannes@sipsolutions.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jakub Kicinski <kuba@kernel.org> CC: linux-wireless@vger.kernel.org CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Link: https://lore.kernel.org/r/20210312163651.1398207-1-jarod@redhat.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-03-16 21:20:47 +01:00
Karthikeyan Kathirvel	041c881a0b	mac80211: choose first enabled channel for monitor Even if the first channel from sband channel list is invalid or disabled mac80211 ends up choosing it as the default channel for monitor interfaces, making them not usable. Fix this by assigning the first available valid or enabled channel instead. Signed-off-by: Karthikeyan Kathirvel <kathirve@codeaurora.org> Link: https://lore.kernel.org/r/1615440547-7661-1-git-send-email-kathirve@codeaurora.org [reword commit message, comment, code cleanups] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-03-16 21:20:47 +01:00
Johannes Berg	77cbf790e5	nl80211: fix locking for wireless device netns change We have all the network interfaces marked as netns-local since the only reasonable thing to do right now is to set a whole device, including all netdevs, into a different network namespace. For this reason, we also have our own way of changing the network namespace. Unfortunately, the RTNL locking changes broke this, and it now results in many RTNL assertions. The trivial fix for those (just hold RTNL for the changes) however leads to deadlocks in the cfg80211 netdev notifier. Since we only need the wiphy, and that's still protected by the RTNL, add a new NL80211_FLAG_NO_WIPHY_MTX flag to the nl80211 ops and use it to _not_ take the wiphy mutex but only the RTNL. This way, the notifier does all the work necessary during unregistration/registration of the netdevs from the old and in the new namespace. Reported-by: Sid Hayn <sidhayn@gmail.com> Fixes: `a05829a722` ("cfg80211: avoid holding the RTNL when calling the driver") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20210310215839.eadf7c43781b.I5fc6cf6676f800ab8008e03bbea9c3349b02d804@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-03-16 21:20:47 +01:00
Daniel Phan	58d25626f6	mac80211: Check crypto_aead_encrypt for errors crypto_aead_encrypt returns <0 on error, so if these calls are not checked, execution may continue with failed encrypts. It also seems that these two crypto_aead_encrypt calls are the only instances in the codebase that are not checked for errors. Signed-off-by: Daniel Phan <daniel.phan36@gmail.com> Link: https://lore.kernel.org/r/20210309204137.823268-1-daniel.phan36@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-03-16 21:20:41 +01:00
Brian Norris	0f7e90fadd	mac80211: Allow HE operation to be longer than expected. We observed some Cisco APs sending the following HE Operation IE in associate response: ff 0a 24 f4 3f 00 01 fc ff 00 00 00 Its HE operation parameter is 0x003ff4, so the expected total length is 7 which does not match the actual length = 10. This causes association failing with "HE AP is missing HE Capability/operation." According to P802.11ax_D4 Table9-94, HE operation is extensible, and according to 802.11-2016 10.27.8, STA should discard the part beyond the maximum length and parse the truncated element. Allow HE operation element to be longer than expected to handle this case and future extensions. Fixes: `e4d005b80d` ("mac80211: refactor extended element parsing") Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Yen-lin Lai <yenlinlai@chromium.org> Link: https://lore.kernel.org/r/20210223051926.2653301-1-yenlinlai@chromium.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-03-16 21:14:04 +01:00
Johannes Berg	29175be06d	mac80211: minstrel_ht: remove unused variable 'mg' This probably came in through some refactoring and what is now a call to minstrel_ht_group_min_rate_offset(), remove the unused variable. Reported-by: kernel test robot <lkp@intel.com> Acked-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20210219105744.f2538a80f6cf.I3d53554c158d5b896ac07ea546bceac67372ec28@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-03-16 21:13:13 +01:00
Markus Theil	3bd801b14e	mac80211: fix double free in ibss_leave Clear beacon ie pointer and ie length after free in order to prevent double free. ================================================================== BUG: KASAN: double-free or invalid-free \ in ieee80211_ibss_leave+0x83/0xe0 net/mac80211/ibss.c:1876 CPU: 0 PID: 8472 Comm: syz-executor100 Not tainted 5.11.0-rc6-syzkaller #0 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2c6 mm/kasan/report.c:230 kasan_report_invalid_free+0x51/0x80 mm/kasan/report.c:355 ____kasan_slab_free+0xcc/0xe0 mm/kasan/common.c:341 kasan_slab_free include/linux/kasan.h:192 [inline] __cache_free mm/slab.c:3424 [inline] kfree+0xed/0x270 mm/slab.c:3760 ieee80211_ibss_leave+0x83/0xe0 net/mac80211/ibss.c:1876 rdev_leave_ibss net/wireless/rdev-ops.h:545 [inline] __cfg80211_leave_ibss+0x19a/0x4c0 net/wireless/ibss.c:212 __cfg80211_leave+0x327/0x430 net/wireless/core.c:1172 cfg80211_leave net/wireless/core.c:1221 [inline] cfg80211_netdev_notifier_call+0x9e8/0x12c0 net/wireless/core.c:1335 notifier_call_chain+0xb5/0x200 kernel/notifier.c:83 call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:2040 call_netdevice_notifiers_extack net/core/dev.c:2052 [inline] call_netdevice_notifiers net/core/dev.c:2066 [inline] __dev_close_many+0xee/0x2e0 net/core/dev.c:1586 __dev_close net/core/dev.c:1624 [inline] __dev_change_flags+0x2cb/0x730 net/core/dev.c:8476 dev_change_flags+0x8a/0x160 net/core/dev.c:8549 dev_ifsioc+0x210/0xa70 net/core/dev_ioctl.c:265 dev_ioctl+0x1b1/0xc40 net/core/dev_ioctl.c:511 sock_do_ioctl+0x148/0x2d0 net/socket.c:1060 sock_ioctl+0x477/0x6a0 net/socket.c:1177 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported-by: syzbot+93976391bf299d425f44@syzkaller.appspotmail.com Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20210213133653.367130-1-markus.theil@tu-ilmenau.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-03-16 21:13:09 +01:00
Johannes Berg	1944015fe9	mac80211: fix rate mask reset Coverity reported the strange "if (~...)" condition that's always true. It suggested that ! was intended instead of ~, but upon further analysis I'm convinced that what really was intended was a comparison to 0xff/0xffff (in HT/VHT cases respectively), since this indicates that all of the rates are enabled. Change the comparison accordingly. I'm guessing this never really mattered because a reset to not having a rate mask is basically equivalent to having a mask that enables all rates. Reported-by: Colin Ian King <colin.king@canonical.com> Fixes: `2ffbe6d333` ("mac80211: fix and optimize MCS mask handling") Fixes: `b119ad6e72` ("mac80211: add rate mask logic for vht rates") Reviewed-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210212112213.36b38078f569.I8546a20c80bc1669058eb453e213630b846e107b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-03-16 21:13:06 +01:00
Nikolay Aleksandrov	e09cf58205	net: bridge: mcast: factor out common allow/block EHT handling We hande EHT state change for ALLOW messages in INCLUDE mode and for BLOCK messages in EXCLUDE mode similarly - create the new set entries with the proper filter mode. We also handle EHT state change for ALLOW messages in EXCLUDE mode and for BLOCK messages in INCLUDE mode in a similar way - delete the common entries (current set and new set). Factor out all the common code as follows: - ALLOW/INCLUDE, BLOCK/EXCLUDE: call __eht_create_set_entries() - ALLOW/EXCLUDE, BLOCK/INCLUDE: call __eht_del_common_set_entries() The set entries creation can be reused in __eht_inc_exc() as well. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-16 11:57:57 -07:00
Nikolay Aleksandrov	6aa2c371c7	net: bridge: mcast: remove unreachable EHT code In the initial EHT versions there were common functions which handled allow/block messages for both INCLUDE and EXCLUDE modes, but later they were separated. It seems I've left some common code which cannot be reached because the filter mode is checked before calling the respective functions, i.e. the host filter is always in EXCLUDE mode when using __eht_allow_excl() and __eht_block_excl() thus we can drop the host_excl checks inside and simplify the code a bit. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-16 11:57:57 -07:00
DENG Qingfang	5a30833b9a	net: dsa: mt7530: support MDB and bridge flag operations Support port MDB and bridge flag operations. As the hardware can manage multicast forwarding itself, offload_fwd_mark can be unconditionally set to true. Signed-off-by: DENG Qingfang <dqfext@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-16 11:54:41 -07:00
Linus Torvalds	4108e10197	Miscellaneous NFSD fixes for v5.12-rc. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmBLe0AACgkQM2qzM29m f5efQQ//RE+kcXPczmmlNuRWMsVVTmlLw7lV6qOKeiaQAoNCD+Y4I99iAJPyCLeH NbZrnSqJstvoRIo1fS9koVuOfSlIzBOvjKeQuIw4vP9pGyEHfKfxPe2BH9Ijlb9W BxasLSukin7ju+5MHVZz2Z1GYopHU+hjn33uRLZk/JcSA97bLfoJFWZbWafXFBiK 1OY0gK26tkucTEYDobwVn7uUM4Swl4VlpsqJOoR2wCiUwBa9aOo9A/zJaQ2XS7Ut 8y5AxiBiVRlhrFkrN4cidpzV3OhDXIxPP9sa3qQ6PVUE2waS1QC0vTEx3Bsw4X3G RzgrS3Ceq7YXJCMminzM9SbTtInsAeToJHDDzXiLDrzh+3u4u7dhknh+Ag8M2vDL s4ZSBoIX74XFEGTU/KMCRXtBjum4WfzGE5p1tXBx44hVBUx75i0Ktdgl/8ap1uxE YqlpH1zLXpfe1zxbOd7huD+A5QbXWoHjALwMj7KewBp8j8/UZ/RrHAWc+ZOaxJwO 7PejnTPn6agdpNRXHhnV+XIAU9eoaMbCTVnxRN++ddaP6jIS6CdHSRVOtpUvhW9u VMprXUQ8ozA6ZXyyV41sVmzVC8l2f8DtwMm14yONgTA/0DX+JgrDKzXHbkW7ol3+ 7RuSM8mcmXAq+PxG9osUFOGxcZktgGzUhhYINm3twI0YShBPY4k= =4Nt4 -----END PGP SIGNATURE----- Merge tag 'nfsd-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Miscellaneous NFSD fixes for v5.12-rc" * tag 'nfsd-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: svcrdma: Revert "svcrdma: Reduce Receive doorbell rate" NFSD: fix error handling in NFSv4.0 callbacks NFSD: fix dest to src mount in inter-server COPY Revert "nfsd4: a client's own opens needn't prevent delegations" Revert "nfsd4: remove check_conflicting_opens warning" rpc: fix NULL dereference on kmalloc failure sunrpc: fix refcount leak for rpc auth modules NFSD: Repair misuse of sv_lock in 5.10.16-rt30. nfsd: don't abort copies early fs: nfsd: fix kconfig dependency warning for NFSD_V4 svcrdma: disable timeouts on rdma backchannel nfsd: Don't keep looking up unhashed files in the nfsd file cache	2021-03-16 10:22:50 -07:00
Jiri Kosina	17486960d7	Bluetooth: avoid deadlock between hci_dev->lock and socket lock Commit `eab2404ba7` ("Bluetooth: Add BT_PHY socket option") added a dependency between socket lock and hci_dev->lock that could lead to deadlock. It turns out that hci_conn_get_phy() is not in any way relying on hdev being immutable during the runtime of this function, neither does it even look at any of the members of hdev, and as such there is no need to hold that lock. This fixes the lockdep splat below: ====================================================== WARNING: possible circular locking dependency detected 5.12.0-rc1-00026-g73d464503354 #10 Not tainted ------------------------------------------------------ bluetoothd/1118 is trying to acquire lock: ffff8f078383c078 (&hdev->lock){+.+.}-{3:3}, at: hci_conn_get_phy+0x1c/0x150 [bluetooth] but task is already holding lock: ffff8f07e831d920 (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.}-{0:0}, at: l2cap_sock_getsockopt+0x8b/0x610 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.}-{0:0}: lock_sock_nested+0x72/0xa0 l2cap_sock_ready_cb+0x18/0x70 [bluetooth] l2cap_config_rsp+0x27a/0x520 [bluetooth] l2cap_sig_channel+0x658/0x1330 [bluetooth] l2cap_recv_frame+0x1ba/0x310 [bluetooth] hci_rx_work+0x1cc/0x640 [bluetooth] process_one_work+0x244/0x5f0 worker_thread+0x3c/0x380 kthread+0x13e/0x160 ret_from_fork+0x22/0x30 -> #2 (&chan->lock#2/1){+.+.}-{3:3}: __mutex_lock+0xa3/0xa10 l2cap_chan_connect+0x33a/0x940 [bluetooth] l2cap_sock_connect+0x141/0x2a0 [bluetooth] __sys_connect+0x9b/0xc0 __x64_sys_connect+0x16/0x20 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #1 (&conn->chan_lock){+.+.}-{3:3}: __mutex_lock+0xa3/0xa10 l2cap_chan_connect+0x322/0x940 [bluetooth] l2cap_sock_connect+0x141/0x2a0 [bluetooth] __sys_connect+0x9b/0xc0 __x64_sys_connect+0x16/0x20 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (&hdev->lock){+.+.}-{3:3}: __lock_acquire+0x147a/0x1a50 lock_acquire+0x277/0x3d0 __mutex_lock+0xa3/0xa10 hci_conn_get_phy+0x1c/0x150 [bluetooth] l2cap_sock_getsockopt+0x5a9/0x610 [bluetooth] __sys_getsockopt+0xcc/0x200 __x64_sys_getsockopt+0x20/0x30 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Chain exists of: &hdev->lock --> &chan->lock#2/1 --> sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP); lock(&chan->lock#2/1); lock(sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP); lock(&hdev->lock); * DEADLOCK * 1 lock held by bluetoothd/1118: #0: ffff8f07e831d920 (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.}-{0:0}, at: l2cap_sock_getsockopt+0x8b/0x610 [bluetooth] stack backtrace: CPU: 3 PID: 1118 Comm: bluetoothd Not tainted 5.12.0-rc1-00026-g73d464503354 #10 Hardware name: LENOVO 20K5S22R00/20K5S22R00, BIOS R0IET38W (1.16 ) 05/31/2017 Call Trace: dump_stack+0x7f/0xa1 check_noncircular+0x105/0x120 ? __lock_acquire+0x147a/0x1a50 __lock_acquire+0x147a/0x1a50 lock_acquire+0x277/0x3d0 ? hci_conn_get_phy+0x1c/0x150 [bluetooth] ? __lock_acquire+0x2e1/0x1a50 ? lock_is_held_type+0xb4/0x120 ? hci_conn_get_phy+0x1c/0x150 [bluetooth] __mutex_lock+0xa3/0xa10 ? hci_conn_get_phy+0x1c/0x150 [bluetooth] ? lock_acquire+0x277/0x3d0 ? mark_held_locks+0x49/0x70 ? mark_held_locks+0x49/0x70 ? hci_conn_get_phy+0x1c/0x150 [bluetooth] hci_conn_get_phy+0x1c/0x150 [bluetooth] l2cap_sock_getsockopt+0x5a9/0x610 [bluetooth] __sys_getsockopt+0xcc/0x200 __x64_sys_getsockopt+0x20/0x30 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fb73df33eee Code: 48 8b 0d 85 0f 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 37 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 52 0f 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007fffcfbbbf08 EFLAGS: 00000203 ORIG_RAX: 0000000000000037 RAX: ffffffffffffffda RBX: 0000000000000019 RCX: 00007fb73df33eee RDX: 000000000000000e RSI: 0000000000000112 RDI: 0000000000000018 RBP: 0000000000000000 R08: 00007fffcfbbbf44 R09: 0000000000000000 R10: 00007fffcfbbbf3c R11: 0000000000000203 R12: 0000000000000000 R13: 0000000000000018 R14: 0000000000000000 R15: 0000556fcefc70d0 Fixes: `eab2404ba7` ("Bluetooth: Add BT_PHY socket option") Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-03-16 15:29:25 +01:00
Marc Kleine-Budde	d4eb538e1f	can: isotp: TX-path: ensure that CAN frame flags are initialized The previous patch ensures that the TX flags (struct can_isotp_ll_options::tx_flags) are 0 for classic CAN frames or a user configured value for CAN-FD frames. This patch sets the CAN frames flags unconditionally to the ISO-TP TX flags, so that they are initialized to a proper value. Otherwise when running "candump -x" on a classical CAN ISO-TP stream shows wrongly set "B" and "E" flags. \| $ candump any,0:0,#FFFFFFFF -extA \| [...] \| can0 TX B E 713 [8] 2B 0A 0B 0C 0D 0E 0F 00 \| can0 TX B E 713 [8] 2C 01 02 03 04 05 06 07 \| can0 TX B E 713 [8] 2D 08 09 0A 0B 0C 0D 0E \| can0 TX B E 713 [8] 2E 0F 00 01 02 03 04 05 Fixes: `e057dd3fc2` ("can: add ISO 15765-2:2016 transport protocol") Link: https://lore.kernel.org/r/20210218215434.1708249-2-mkl@pengutronix.de Cc: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-03-16 08:40:04 +01:00
Marc Kleine-Budde	e4912459bd	can: isotp: isotp_setsockopt(): only allow to set low level TX flags for CAN-FD CAN-FD frames have struct canfd_frame::flags, while classic CAN frames don't. This patch refuses to set TX flags (struct can_isotp_ll_options::tx_flags) on non CAN-FD isotp sockets. Fixes: `e057dd3fc2` ("can: add ISO 15765-2:2016 transport protocol") Link: https://lore.kernel.org/r/20210218215434.1708249-2-mkl@pengutronix.de Cc: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-03-16 08:40:04 +01:00
Martin Willi	3a5ca85707	can: dev: Move device back to init netns on owning netns delete When a non-initial netns is destroyed, the usual policy is to delete all virtual network interfaces contained, but move physical interfaces back to the initial netns. This keeps the physical interface visible on the system. CAN devices are somewhat special, as they define rtnl_link_ops even if they are physical devices. If a CAN interface is moved into a non-initial netns, destroying that netns lets the interface vanish instead of moving it back to the initial netns. default_device_exit() skips CAN interfaces due to having rtnl_link_ops set. Reproducer: ip netns add foo ip link set can0 netns foo ip netns delete foo WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60 CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1 Workqueue: netns cleanup_net [<c010e700>] (unwind_backtrace) from [<c010a1d8>] (show_stack+0x10/0x14) [<c010a1d8>] (show_stack) from [<c086dc10>] (dump_stack+0x94/0xa8) [<c086dc10>] (dump_stack) from [<c086b938>] (__warn+0xb8/0x114) [<c086b938>] (__warn) from [<c086ba10>] (warn_slowpath_fmt+0x7c/0xac) [<c086ba10>] (warn_slowpath_fmt) from [<c0629f20>] (ops_exit_list+0x38/0x60) [<c0629f20>] (ops_exit_list) from [<c062a5c4>] (cleanup_net+0x230/0x380) [<c062a5c4>] (cleanup_net) from [<c0142c20>] (process_one_work+0x1d8/0x438) [<c0142c20>] (process_one_work) from [<c0142ee4>] (worker_thread+0x64/0x5a8) [<c0142ee4>] (worker_thread) from [<c0148a98>] (kthread+0x148/0x14c) [<c0148a98>] (kthread) from [<c0100148>] (ret_from_fork+0x14/0x2c) To properly restore physical CAN devices to the initial netns on owning netns exit, introduce a flag on rtnl_link_ops that can be set by drivers. For CAN devices setting this flag, default_device_exit() considers them non-virtual, applying the usual namespace move. The issue was introduced in the commit mentioned below, as at that time CAN devices did not have a dellink() operation. Fixes: `e008b5fc8d` ("net: Simplfy default_device_exit and improve batching.") Link: https://lore.kernel.org/r/20210302122423.872326-1-martin@strongswan.org Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-03-16 08:40:04 +01:00
Davide Caratti	13832ae275	mptcp: fix ADD_ADDR HMAC in case port is specified Currently, Linux computes the HMAC contained in ADD_ADDR sub-option using the Address Id and the IP Address, and hardcodes a destination port equal to zero. This is not ok for ADD_ADDR with port: ensure to account for the endpoint port when computing the HMAC, in compliance with RFC8684 §3.4.1. Fixes: `22fb85ffae` ("mptcp: add port support for ADD_ADDR suboption writing") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-15 16:43:01 -07:00
Alexander Ovechkin	7233da8669	tcp: relookup sock for RST+ACK packets handled by obsolete req sock Currently tcp_check_req can be called with obsolete req socket for which big socket have been already created (because of CPU race or early demux assigning req socket to multiple packets in gro batch). Commit `e0f9759f53` ("tcp: try to keep packet if SYN_RCV race is lost") added retry in case when tcp_check_req is called for PSH\|ACK packet. But if client sends RST+ACK immediatly after connection being established (it is performing healthcheck, for example) retry does not occur. In that case tcp_check_req tries to close req socket, leaving big socket active. Fixes: `e0f9759f53` ("tcp: try to keep packet if SYN_RCV race is lost") Signed-off-by: Alexander Ovechkin <ovov@yandex-team.ru> Reported-by: Oleg Senin <olegsenin@yandex-team.ru> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-15 14:34:29 -07:00
Luiz Augusto von Dentz	2e1614f7d6	Bluetooth: SMP: Convert BT_ERR/BT_DBG to bt_dev_err/bt_dev_dbg This converts instances of BT_ERR and BT_DBG to bt_dev_err and bt_dev_dbg which can be enabled at runtime when BT_FEATURE_DEBUG is enabled. Note: Not all instances could be converted as some are exercised by selftest. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-03-15 21:42:25 +01:00
Eric Dumazet	0217ed2848	tipc: better validate user input in tipc_nl_retrieve_key() Before calling tipc_aead_key_size(ptr), we need to ensure we have enough data to dereference ptr->keylen. We probably also want to make sure tipc_aead_key_size() wont overflow with malicious ptr->keylen values. Syzbot reported: BUG: KMSAN: uninit-value in __tipc_nl_node_set_key net/tipc/node.c:2971 [inline] BUG: KMSAN: uninit-value in tipc_nl_node_set_key+0x9bf/0x13b0 net/tipc/node.c:3023 CPU: 0 PID: 21060 Comm: syz-executor.5 Not tainted 5.11.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x21c/0x280 lib/dump_stack.c:120 kmsan_report+0xfb/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197 __tipc_nl_node_set_key net/tipc/node.c:2971 [inline] tipc_nl_node_set_key+0x9bf/0x13b0 net/tipc/node.c:3023 genl_family_rcv_msg_doit net/netlink/genetlink.c:739 [inline] genl_family_rcv_msg net/netlink/genetlink.c:783 [inline] genl_rcv_msg+0x1319/0x1610 net/netlink/genetlink.c:800 netlink_rcv_skb+0x6fa/0x810 net/netlink/af_netlink.c:2494 genl_rcv+0x63/0x80 net/netlink/genetlink.c:811 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x11d6/0x14a0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x1740/0x1840 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg net/socket.c:672 [inline] ____sys_sendmsg+0xcfc/0x12f0 net/socket.c:2345 ___sys_sendmsg net/socket.c:2399 [inline] __sys_sendmsg+0x714/0x830 net/socket.c:2432 __compat_sys_sendmsg net/compat.c:347 [inline] __do_compat_sys_sendmsg net/compat.c:354 [inline] __se_compat_sys_sendmsg+0xa7/0xc0 net/compat.c:351 __ia32_compat_sys_sendmsg+0x4a/0x70 net/compat.c:351 do_syscall_32_irqs_on arch/x86/entry/common.c:79 [inline] __do_fast_syscall_32+0x102/0x160 arch/x86/entry/common.c:141 do_fast_syscall_32+0x6a/0xc0 arch/x86/entry/common.c:166 do_SYSENTER_32+0x73/0x90 arch/x86/entry/common.c:209 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c RIP: 0023:0xf7f60549 Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00 RSP: 002b:00000000f555a5fc EFLAGS: 00000296 ORIG_RAX: 0000000000000172 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000020000200 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline] kmsan_internal_poison_shadow+0x5c/0xf0 mm/kmsan/kmsan.c:104 kmsan_slab_alloc+0x8d/0xe0 mm/kmsan/kmsan_hooks.c:76 slab_alloc_node mm/slub.c:2907 [inline] __kmalloc_node_track_caller+0xa37/0x1430 mm/slub.c:4527 __kmalloc_reserve net/core/skbuff.c:142 [inline] __alloc_skb+0x2f8/0xb30 net/core/skbuff.c:210 alloc_skb include/linux/skbuff.h:1099 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1176 [inline] netlink_sendmsg+0xdbc/0x1840 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg net/socket.c:672 [inline] ____sys_sendmsg+0xcfc/0x12f0 net/socket.c:2345 ___sys_sendmsg net/socket.c:2399 [inline] __sys_sendmsg+0x714/0x830 net/socket.c:2432 __compat_sys_sendmsg net/compat.c:347 [inline] __do_compat_sys_sendmsg net/compat.c:354 [inline] __se_compat_sys_sendmsg+0xa7/0xc0 net/compat.c:351 __ia32_compat_sys_sendmsg+0x4a/0x70 net/compat.c:351 do_syscall_32_irqs_on arch/x86/entry/common.c:79 [inline] __do_fast_syscall_32+0x102/0x160 arch/x86/entry/common.c:141 do_fast_syscall_32+0x6a/0xc0 arch/x86/entry/common.c:166 do_SYSENTER_32+0x73/0x90 arch/x86/entry/common.c:209 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c Fixes: `e1f32190cf` ("tipc: add support for AEAD key setting via netlink") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tuong Lien <tuong.t.lien@dektech.com.au> Cc: Jon Maloy <jmaloy@redhat.com> Cc: Ying Xue <ying.xue@windriver.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-15 13:21:18 -07:00
Luiz Augusto von Dentz	7cf3b1dd6a	Bluetooth: L2CAP: Fix not checking for maximum number of DCID When receiving L2CAP_CREDIT_BASED_CONNECTION_REQ the remote may request more channels than allowed by the spec (10 octecs = 5 CIDs) so this checks if the number of channels is bigger than the maximum allowed and respond with an error. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-03-15 21:08:33 +01:00
Sonny Sasaka	c06632a4ec	Bluetooth: Cancel le_scan_restart work when stopping discovery Not cancelling it has caused a bug where passive background scanning is disabled out of the blue, preventing BLE keyboards/mice to reconnect. Here is how it happens: After hci_req_stop_discovery, there is still le_scan_restart_work scheduled. Invocation of le_scan_restart_work causes a harmful le_scan_disable_work to be scheduled. This le_scan_disable_work will eventually disable passive scanning when the timer fires. Sample btmon trace: < HCI Command: LE Set Scan Parameters (0x08\|0x000b) plen 7 Type: Passive (0x00) Interval: 367.500 msec (0x024c) Window: 37.500 msec (0x003c) Own address type: Public (0x00) Filter policy: Accept all advertisement (0x00) > HCI Event: Command Complete (0x0e) plen 4 LE Set Scan Parameters (0x08\|0x000b) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Scan Enable (0x08\|0x000c) plen 2 Scanning: Enabled (0x01) Filter duplicates: Disabled (0x00) > HCI Event: Command Complete (0x0e) plen 4 LE Set Scan Enable (0x08\|0x000c) ncmd 2 Status: Success (0x00) ... < HCI Command: LE Set Scan Enable (0x08\|0x000c) plen 2 Scanning: Disabled (0x00) Filter duplicates: Disabled (0x00) > HCI Event: Command Complete (0x0e) plen 4 LE Set Scan Enable (0x08\|0x000c) ncmd 2 Status: Success (0x00) // Background scanning is not working here onwards. Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Sonny Sasaka <sonnysasaka@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-03-15 21:06:51 +01:00
Lorenzo Bianconi	8f64860f8b	net: export dev_set_threaded symbol For wireless devices (e.g. mt76 driver) multiple net_devices belongs to the same wireless phy and the napi object is registered in a dummy netdevice related to the wireless phy. Export dev_set_threaded in order to be reused in device drivers enabling threaded NAPI. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-15 12:35:23 -07:00
Florian Westphal	b58f33d49e	netfilter: ctnetlink: fix dump of the expect mask attribute Before this change, the mask is never included in the netlink message, so "conntrack -E expect" always prints 0.0.0.0. In older kernels the l3num callback struct was passed as argument, based on tuple->src.l3num. After the l3num indirection got removed, the call chain is based on m.src.l3num, but this value is 0xffff. Init l3num to the correct value. Fixes: `f957be9d34` ("netfilter: conntrack: remove ctnetlink callbacks from l3 protocol trackers") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-15 18:42:01 +01:00
Mark Tomlinson	175e476b8c	netfilter: x_tables: Use correct memory barriers. When a new table value was assigned, it was followed by a write memory barrier. This ensured that all writes before this point would complete before any writes after this point. However, to determine whether the rules are unused, the sequence counter is read. To ensure that all writes have been done before these reads, a full memory barrier is needed, not just a write memory barrier. The same argument applies when incrementing the counter, before the rules are read. Changing to using smp_mb() instead of smp_wmb() fixes the kernel panic reported in `cc00bcaa58` (which is still present), while still maintaining the same speed of replacing tables. The smb_mb() barriers potentially slow the packet path, however testing has shown no measurable change in performance on a 4-core MIPS64 platform. Fixes: `7f5c6d4f66` ("netfilter: get rid of atomic ops in fast path") Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-15 18:42:01 +01:00
Mark Tomlinson	d3d40f2374	Revert "netfilter: x_tables: Switch synchronization to RCU" This reverts commit `cc00bcaa58`. This (and the preceding) patch basically re-implemented the RCU mechanisms of patch `784544739a`. That patch was replaced because of the performance problems that it created when replacing tables. Now, we have the same issue: the call to synchronize_rcu() makes replacing tables slower by as much as an order of magnitude. Prior to using RCU a script calling "iptables" approx. 200 times was taking 1.16s. With RCU this increased to 11.59s. Revert these patches and fix the issue in a different way. Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-15 18:42:01 +01:00
Mark Tomlinson	abe7034b9a	Revert "netfilter: x_tables: Update remaining dereference to RCU" This reverts commit `443d6e86f8`. This (and the following) patch basically re-implemented the RCU mechanisms of patch `784544739a`. That patch was replaced because of the performance problems that it created when replacing tables. Now, we have the same issue: the call to synchronize_rcu() makes replacing tables slower by as much as an order of magnitude. Revert these patches and fix the issue in a different way. Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-15 18:42:00 +01:00
Manu Bretelle	6503b9f29a	bpf: Add getter and setter for SO_REUSEPORT through bpf_{g,s}etsockopt Augment the current set of options that are accessible via bpf_{g,s}etsockopt to also support SO_REUSEPORT. Signed-off-by: Manu Bretelle <chantra@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210310182305.1910312-1-chantra@fb.com	2021-03-15 17:22:22 +01:00
Greg Kroah-Hartman	280def1e1c	Merge 5.12-rc3 into tty-next Resolves a merge issue with: drivers/tty/hvc/hvcs.c and we want the tty/serial fixes from 5.12-rc3 in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-15 08:43:49 +01:00
Ido Schimmel	07e1a5809b	psample: Add additional metadata attributes Extend psample to report the following attributes when available: * Output traffic class as a 16-bit value * Output traffic class occupancy in bytes as a 64-bit value * End-to-end latency of the packet in nanoseconds resolution * Software timestamp in nanoseconds resolution (always available) * Packet's protocol. Needed for packet dissection in user space (always available) Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-14 15:00:43 -07:00
Ido Schimmel	a03e99d39f	psample: Encapsulate packet metadata in a struct Currently, callers of psample_sample_packet() pass three metadata attributes: Ingress port, egress port and truncated size. Subsequent patches are going to add more attributes (e.g., egress queue occupancy), which also need an indication whether they are valid or not. Encapsulate packet metadata in a struct in order to keep the number of arguments reasonable. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-14 15:00:43 -07:00
Alexander Lobakin	59753ce8b1	ethernet: constify eth_get_headlen()'s data argument It's used only for flow dissection, which now takes constant data pointers. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-14 14:48:02 -07:00
Alexander Lobakin	f96533cded	flow_dissector: constify raw input data argument Flow Dissector code never modifies the input buffer, neither skb nor raw data. Make 'data' argument const for all of the Flow dissector's functions. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-14 14:46:32 -07:00
Alexander Lobakin	d0eed5c325	gro: give 'hash' variable in dev_gro_receive() a less confusing name 'hash' stores not the flow hash, but the index of the GRO bucket corresponding to it. Change its name to 'bucket' to avoid confusion while reading lines like '__set_bit(hash, &napi->gro_bitmask)'. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-14 14:41:09 -07:00
Alexander Lobakin	9dc2c31337	gro: consistentify napi->gro_hash[x] access in dev_gro_receive() GRO bucket index doesn't change through the entire function. Store a pointer to the corresponding bucket instead of its member and use it consistently through the function. It is performance-safe since &gro_list->list == gro_list. Misc: remove superfluous braces around single-line branches. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-14 14:41:08 -07:00
Alexander Lobakin	0ccf4d50d1	gro: simplify gro_list_prepare() gro_list_prepare() always returns &napi->gro_hash[bucket].list, without any variations. Moreover, it uses 'napi' argument only to have access to this list, and calculates the bucket index for the second time (firstly it happens at the beginning of dev_gro_receive()) to do that. Given that dev_gro_receive() already has an index to the needed list, just pass it as the first argument to eliminate redundant calculations, and make gro_list_prepare() return void. Also, both arguments of gro_list_prepare() can be constified since this function can only modify the skbs from the bucket list. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-14 14:41:08 -07:00

... 6 7 8 9 10 ...

65018 Commits