mirror of
				git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
				synced 2025-09-04 20:19:47 +08:00 
			
		
		
		
	
			
				
					
						
					
					80c80164a5
				
			
			
		
	
	
		
			11 Commits
		
	
	
	| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|  Jakub Kicinski | 7288dd2fd4 | genetlink: use attrs from struct genl_info Since dumps carry struct genl_info now, use the attrs pointer from genl_info and remove the one in struct genl_dumpit_info. Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> | ||
|  Jason A. Donenfeld | f58d0a9b4c | wireguard: netlink: send staged packets when setting initial private key Packets bound for peers can queue up prior to the device private key
being set. For example, if persistent keepalive is set, a packet is
queued up to be sent as soon as the device comes up. However, if the
private key hasn't been set yet, the handshake message never sends, and
no timer is armed to retry, since that would be pointless.
But, if a user later sets a private key, the expectation is that those
queued packets, such as a persistent keepalive, are actually sent. So
adjust the configuration logic to account for this edge case, and add a
test case to make sure this works.
Maxim noticed this with a wg-quick(8) config to the tune of:
    [Interface]
    PostUp = wg set %i private-key somefile
    [Peer]
    PublicKey = ...
    Endpoint = ...
    PersistentKeepalive = 25
Here, the private key gets set after the device comes up using a PostUp
script, triggering the bug.
Fixes:  | ||
|  Jakub Kicinski | 0140a7168f | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/net/ethernet/freescale/fec.h | ||
|  Jason A. Donenfeld | 26c013108c | wireguard: netlink: avoid variable-sized memcpy on sockaddr Doing a variable-sized memcpy is slower, and the compiler isn't smart
enough to turn this into a constant-size assignment.
Further, Kees' latest fortified memcpy will actually bark, because the
destination pointer is type sockaddr, not explicitly sockaddr_in or
sockaddr_in6, so it thinks there's an overflow:
    memcpy: detected field-spanning write (size 28) of single field
    "&endpoint.addr" at drivers/net/wireguard/netlink.c:446 (size 16)
Fix this by just assigning by using explicit casts for each checked
case.
Fixes:  | ||
|  Jakub Kicinski | 9c5d03d362 | genetlink: start to validate reserved header bytes We had historically not checked that genlmsghdr.reserved is 0 on input which prevents us from using those precious bytes in the future. One use case would be to extend the cmd field, which is currently just 8 bits wide and 256 is not a lot of commands for some core families. To make sure that new families do the right thing by default put the onus of opting out of validation on existing families. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel) Signed-off-by: David S. Miller <davem@davemloft.net> | ||
|  Johannes Berg | bc04358550 | netlink: consistently use NLA_POLICY_MIN_LEN() Change places that open-code NLA_POLICY_MIN_LEN() to use the macro instead, giving us flexibility in how we handle the details of the macro. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> | ||
|  Johannes Berg | 8140860c81 | netlink: consistently use NLA_POLICY_EXACT_LEN() Change places that open-code NLA_POLICY_EXACT_LEN() to use the macro instead, giving us flexibility in how we handle the details of the macro. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net> | ||
|  Jason A. Donenfeld | 900575aa33 | wireguard: device: avoid circular netns references Before, we took a reference to the creating netns if the new netns was
different. This caused issues with circular references, with two
wireguard interfaces swapping namespaces. The solution is to rather not
take any extra references at all, but instead simply invalidate the
creating netns pointer when that netns is deleted.
In order to prevent this from happening again, this commit improves the
rough object leak tracking by allowing it to account for created and
destroyed interfaces, aside from just peers and keys. That then makes it
possible to check for the object leak when having two interfaces take a
reference to each others' namespaces.
Fixes:  | ||
|  Jason A. Donenfeld | 11a7686aa9 | wireguard: noise: error out precomputed DH during handshake rather than config We precompute the static-static ECDH during configuration time, in order to save an expensive computation later when receiving network packets. However, not all ECDH computations yield a contributory result. Prior, we were just not letting those peers be added to the interface. However, this creates a strange inconsistency, since it was still possible to add other weird points, like a valid public key plus a low-order point, and, like points that result in zeros, a handshake would not complete. In order to make the behavior more uniform and less surprising, simply allow all peers to be added. Then, we'll error out later when doing the crypto if there's an issue. This also adds more separation between the crypto layer and the configuration layer. Discussed-with: Mathias Hall-Andersen <mathias@hall-andersen.dk> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net> | ||
|  Jason A. Donenfeld | ec31c2676a | wireguard: noise: reject peers with low order public keys Our static-static calculation returns a failure if the public key is of
low order. We check for this when peers are added, and don't allow them
to be added if they're low order, except in the case where we haven't
yet been given a private key. In that case, we would defer the removal
of the peer until we're given a private key, since at that point we're
doing new static-static calculations which incur failures we can act on.
This meant, however, that we wound up removing peers rather late in the
configuration flow.
Syzkaller points out that peer_remove calls flush_workqueue, which in
turn might then wait for sending a handshake initiation to complete.
Since handshake initiation needs the static identity lock, holding the
static identity lock while calling peer_remove can result in a rare
deadlock. We have precisely this case in this situation of late-stage
peer removal based on an invalid public key. We can't drop the lock when
removing, because then incoming handshakes might interact with a bogus
static-static calculation.
While the band-aid patch for this would involve breaking up the peer
removal into two steps like wg_peer_remove_all does, in order to solve
the locking issue, there's actually a much more elegant way of fixing
this:
If the static-static calculation succeeds with one private key, it
*must* succeed with all others, because all 32-byte strings map to valid
private keys, thanks to clamping. That means we can get rid of this
silly dance and locking headaches of removing peers late in the
configuration flow, and instead just reject them early on, regardless of
whether the device has yet been assigned a private key. For the case
where the device doesn't yet have a private key, we safely use zeros
just for the purposes of checking for low order points by way of
checking the output of the calculation.
The following PoC will trigger the deadlock:
ip link add wg0 type wireguard
ip addr add 10.0.0.1/24 dev wg0
ip link set wg0 up
ping -f 10.0.0.2 &
while true; do
        wg set wg0 private-key /dev/null peer AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= allowed-ips 10.0.0.0/24 endpoint 10.0.0.3:1234
        wg set wg0 private-key <(echo AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=)
done
[    0.949105] ======================================================
[    0.949550] WARNING: possible circular locking dependency detected
[    0.950143] 5.5.0-debug+ #18 Not tainted
[    0.950431] ------------------------------------------------------
[    0.950959] wg/89 is trying to acquire lock:
[    0.951252] ffff8880333e2128 ((wq_completion)wg-kex-wg0){+.+.}, at: flush_workqueue+0xe3/0x12f0
[    0.951865]
[    0.951865] but task is already holding lock:
[    0.952280] ffff888032819bc0 (&wg->static_identity.lock){++++}, at: wg_set_device+0x95d/0xcc0
[    0.953011]
[    0.953011] which lock already depends on the new lock.
[    0.953011]
[    0.953651]
[    0.953651] the existing dependency chain (in reverse order) is:
[    0.954292]
[    0.954292] -> #2 (&wg->static_identity.lock){++++}:
[    0.954804]        lock_acquire+0x127/0x350
[    0.955133]        down_read+0x83/0x410
[    0.955428]        wg_noise_handshake_create_initiation+0x97/0x700
[    0.955885]        wg_packet_send_handshake_initiation+0x13a/0x280
[    0.956401]        wg_packet_handshake_send_worker+0x10/0x20
[    0.956841]        process_one_work+0x806/0x1500
[    0.957167]        worker_thread+0x8c/0xcb0
[    0.957549]        kthread+0x2ee/0x3b0
[    0.957792]        ret_from_fork+0x24/0x30
[    0.958234]
[    0.958234] -> #1 ((work_completion)(&peer->transmit_handshake_work)){+.+.}:
[    0.958808]        lock_acquire+0x127/0x350
[    0.959075]        process_one_work+0x7ab/0x1500
[    0.959369]        worker_thread+0x8c/0xcb0
[    0.959639]        kthread+0x2ee/0x3b0
[    0.959896]        ret_from_fork+0x24/0x30
[    0.960346]
[    0.960346] -> #0 ((wq_completion)wg-kex-wg0){+.+.}:
[    0.960945]        check_prev_add+0x167/0x1e20
[    0.961351]        __lock_acquire+0x2012/0x3170
[    0.961725]        lock_acquire+0x127/0x350
[    0.961990]        flush_workqueue+0x106/0x12f0
[    0.962280]        peer_remove_after_dead+0x160/0x220
[    0.962600]        wg_set_device+0xa24/0xcc0
[    0.962994]        genl_rcv_msg+0x52f/0xe90
[    0.963298]        netlink_rcv_skb+0x111/0x320
[    0.963618]        genl_rcv+0x1f/0x30
[    0.963853]        netlink_unicast+0x3f6/0x610
[    0.964245]        netlink_sendmsg+0x700/0xb80
[    0.964586]        __sys_sendto+0x1dd/0x2c0
[    0.964854]        __x64_sys_sendto+0xd8/0x1b0
[    0.965141]        do_syscall_64+0x90/0xd9a
[    0.965408]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[    0.965769]
[    0.965769] other info that might help us debug this:
[    0.965769]
[    0.966337] Chain exists of:
[    0.966337]   (wq_completion)wg-kex-wg0 --> (work_completion)(&peer->transmit_handshake_work) --> &wg->static_identity.lock
[    0.966337]
[    0.967417]  Possible unsafe locking scenario:
[    0.967417]
[    0.967836]        CPU0                    CPU1
[    0.968155]        ----                    ----
[    0.968497]   lock(&wg->static_identity.lock);
[    0.968779]                                lock((work_completion)(&peer->transmit_handshake_work));
[    0.969345]                                lock(&wg->static_identity.lock);
[    0.969809]   lock((wq_completion)wg-kex-wg0);
[    0.970146]
[    0.970146]  *** DEADLOCK ***
[    0.970146]
[    0.970531] 5 locks held by wg/89:
[    0.970908]  #0: ffffffff827433c8 (cb_lock){++++}, at: genl_rcv+0x10/0x30
[    0.971400]  #1: ffffffff82743480 (genl_mutex){+.+.}, at: genl_rcv_msg+0x642/0xe90
[    0.971924]  #2: ffffffff827160c0 (rtnl_mutex){+.+.}, at: wg_set_device+0x9f/0xcc0
[    0.972488]  #3: ffff888032819de0 (&wg->device_update_lock){+.+.}, at: wg_set_device+0xb0/0xcc0
[    0.973095]  #4: ffff888032819bc0 (&wg->static_identity.lock){++++}, at: wg_set_device+0x95d/0xcc0
[    0.973653]
[    0.973653] stack backtrace:
[    0.973932] CPU: 1 PID: 89 Comm: wg Not tainted 5.5.0-debug+ #18
[    0.974476] Call Trace:
[    0.974638]  dump_stack+0x97/0xe0
[    0.974869]  check_noncircular+0x312/0x3e0
[    0.975132]  ? print_circular_bug+0x1f0/0x1f0
[    0.975410]  ? __kernel_text_address+0x9/0x30
[    0.975727]  ? unwind_get_return_address+0x51/0x90
[    0.976024]  check_prev_add+0x167/0x1e20
[    0.976367]  ? graph_lock+0x70/0x160
[    0.976682]  __lock_acquire+0x2012/0x3170
[    0.976998]  ? register_lock_class+0x1140/0x1140
[    0.977323]  lock_acquire+0x127/0x350
[    0.977627]  ? flush_workqueue+0xe3/0x12f0
[    0.977890]  flush_workqueue+0x106/0x12f0
[    0.978147]  ? flush_workqueue+0xe3/0x12f0
[    0.978410]  ? find_held_lock+0x2c/0x110
[    0.978662]  ? lock_downgrade+0x6e0/0x6e0
[    0.978919]  ? queue_rcu_work+0x60/0x60
[    0.979166]  ? netif_napi_del+0x151/0x3b0
[    0.979501]  ? peer_remove_after_dead+0x160/0x220
[    0.979871]  peer_remove_after_dead+0x160/0x220
[    0.980232]  wg_set_device+0xa24/0xcc0
[    0.980516]  ? deref_stack_reg+0x8e/0xc0
[    0.980801]  ? set_peer+0xe10/0xe10
[    0.981040]  ? __ww_mutex_check_waiters+0x150/0x150
[    0.981430]  ? __nla_validate_parse+0x163/0x270
[    0.981719]  ? genl_family_rcv_msg_attrs_parse+0x13f/0x310
[    0.982078]  genl_rcv_msg+0x52f/0xe90
[    0.982348]  ? genl_family_rcv_msg_attrs_parse+0x310/0x310
[    0.982690]  ? register_lock_class+0x1140/0x1140
[    0.983049]  netlink_rcv_skb+0x111/0x320
[    0.983298]  ? genl_family_rcv_msg_attrs_parse+0x310/0x310
[    0.983645]  ? netlink_ack+0x880/0x880
[    0.983888]  genl_rcv+0x1f/0x30
[    0.984168]  netlink_unicast+0x3f6/0x610
[    0.984443]  ? netlink_detachskb+0x60/0x60
[    0.984729]  ? find_held_lock+0x2c/0x110
[    0.984976]  netlink_sendmsg+0x700/0xb80
[    0.985220]  ? netlink_broadcast_filtered+0xa60/0xa60
[    0.985533]  __sys_sendto+0x1dd/0x2c0
[    0.985763]  ? __x64_sys_getpeername+0xb0/0xb0
[    0.986039]  ? sockfd_lookup_light+0x17/0x160
[    0.986397]  ? __sys_recvmsg+0x8c/0xf0
[    0.986711]  ? __sys_recvmsg_sock+0xd0/0xd0
[    0.987018]  __x64_sys_sendto+0xd8/0x1b0
[    0.987283]  ? lockdep_hardirqs_on+0x39b/0x5a0
[    0.987666]  do_syscall_64+0x90/0xd9a
[    0.987903]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[    0.988223] RIP: 0033:0x7fe77c12003e
[    0.988508] Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 4
[    0.989666] RSP: 002b:00007fffada2ed58 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[    0.990137] RAX: ffffffffffffffda RBX: 00007fe77c159d48 RCX: 00007fe77c12003e
[    0.990583] RDX: 0000000000000040 RSI: 000055fd1d38e020 RDI: 0000000000000004
[    0.991091] RBP: 000055fd1d38e020 R08: 000055fd1cb63358 R09: 000000000000000c
[    0.991568] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000002c
[    0.992014] R13: 0000000000000004 R14: 000055fd1d38e020 R15: 0000000000000001
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net> | ||
|  Jason A. Donenfeld | e7096c131e | net: WireGuard secure network tunnel WireGuard is a layer 3 secure networking tunnel made specifically for the kernel, that aims to be much simpler and easier to audit than IPsec. Extensive documentation and description of the protocol and considerations, along with formal proofs of the cryptography, are available at: * https://www.wireguard.com/ * https://www.wireguard.com/papers/wireguard.pdf This commit implements WireGuard as a simple network device driver, accessible in the usual RTNL way used by virtual network drivers. It makes use of the udp_tunnel APIs, GRO, GSO, NAPI, and the usual set of networking subsystem APIs. It has a somewhat novel multicore queueing system designed for maximum throughput and minimal latency of encryption operations, but it is implemented modestly using workqueues and NAPI. Configuration is done via generic Netlink, and following a review from the Netlink maintainer a year ago, several high profile userspace tools have already implemented the API. This commit also comes with several different tests, both in-kernel tests and out-of-kernel tests based on network namespaces, taking profit of the fact that sockets used by WireGuard intentionally stay in the namespace the WireGuard interface was originally created, exactly like the semantics of userspace tun devices. See wireguard.com/netns/ for pictures and examples. The source code is fairly short, but rather than combining everything into a single file, WireGuard is developed as cleanly separable files, making auditing and comprehension easier. Things are laid out as follows: * noise.[ch], cookie.[ch], messages.h: These implement the bulk of the cryptographic aspects of the protocol, and are mostly data-only in nature, taking in buffers of bytes and spitting out buffers of bytes. They also handle reference counting for their various shared pieces of data, like keys and key lists. * ratelimiter.[ch]: Used as an integral part of cookie.[ch] for ratelimiting certain types of cryptographic operations in accordance with particular WireGuard semantics. * allowedips.[ch], peerlookup.[ch]: The main lookup structures of WireGuard, the former being trie-like with particular semantics, an integral part of the design of the protocol, and the latter just being nice helper functions around the various hashtables we use. * device.[ch]: Implementation of functions for the netdevice and for rtnl, responsible for maintaining the life of a given interface and wiring it up to the rest of WireGuard. * peer.[ch]: Each interface has a list of peers, with helper functions available here for creation, destruction, and reference counting. * socket.[ch]: Implementation of functions related to udp_socket and the general set of kernel socket APIs, for sending and receiving ciphertext UDP packets, and taking care of WireGuard-specific sticky socket routing semantics for the automatic roaming. * netlink.[ch]: Userspace API entry point for configuring WireGuard peers and devices. The API has been implemented by several userspace tools and network management utility, and the WireGuard project distributes the basic wg(8) tool. * queueing.[ch]: Shared function on the rx and tx path for handling the various queues used in the multicore algorithms. * send.c: Handles encrypting outgoing packets in parallel on multiple cores, before sending them in order on a single core, via workqueues and ring buffers. Also handles sending handshake and cookie messages as part of the protocol, in parallel. * receive.c: Handles decrypting incoming packets in parallel on multiple cores, before passing them off in order to be ingested via the rest of the networking subsystem with GRO via the typical NAPI poll function. Also handles receiving handshake and cookie messages as part of the protocol, in parallel. * timers.[ch]: Uses the timer wheel to implement protocol particular event timeouts, and gives a set of very simple event-driven entry point functions for callers. * main.c, version.h: Initialization and deinitialization of the module. * selftest/*.h: Runtime unit tests for some of the most security sensitive functions. * tools/testing/selftests/wireguard/netns.sh: Aforementioned testing script using network namespaces. This commit aims to be as self-contained as possible, implementing WireGuard as a standalone module not needing much special handling or coordination from the network subsystem. I expect for future optimizations to the network stack to positively improve WireGuard, and vice-versa, but for the time being, this exists as intentionally standalone. We introduce a menu option for CONFIG_WIREGUARD, as well as providing a verbose debug log and self-tests via CONFIG_WIREGUARD_DEBUG. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: David Miller <davem@davemloft.net> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> |