Add handling of packets received from the tunnel. This implements
tunnel egress functionality.
Signed-off-by: Christian Hopps <chopps@labn.net>
Tested-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Add support for tunneling user (inner) packets that are larger than the
tunnel's path MTU (outer) using IP-TFS fragmentation.
Signed-off-by: Christian Hopps <chopps@labn.net>
Tested-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
When possible rather than appending secondary (aggregated) inner packets
to the fragment list, share their page fragments with the outer IPTFS
packet. This allows for more efficient packet transmission.
Signed-off-by: Christian Hopps <chopps@labn.net>
Tested-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Add tunnel packet output functionality. This is code handles
the ingress to the tunnel.
Signed-off-by: Christian Hopps <chopps@labn.net>
Tested-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Add a new xfrm mode implementing AggFrag/IP-TFS from RFC9347.
This utilizes the new xfrm_mode_cbs to implement demand-driven IP-TFS
functionality. This functionality can be used to increase bandwidth
utilization through small packet aggregation, as well as help solve PMTU
issues through it's efficient use of fragmentation.
Link: https://www.rfc-editor.org/rfc/rfc9347.txt
Multiple commits follow to build the functionality into xfrm_iptfs.c
Signed-off-by: Christian Hopps <chopps@labn.net>
Tested-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Define `XFRM_MODE_IPTFS` and `IPSEC_MODE_IPTFS` constants, and add these to
switch case and conditionals adjacent with the existing TUNNEL modes.
Signed-off-by: Christian Hopps <chopps@labn.net>
Tested-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Add a set of callbacks xfrm_mode_cbs to xfrm_state. These callbacks
enable the addition of new xfrm modes, such as IP-TFS to be defined
in modules.
Signed-off-by: Christian Hopps <chopps@labn.net>
Tested-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
lookup and resize can run in parallel.
The xfrm_state_hash_generation seqlock ensures a retry, but the hash
functions can observe a hmask value that is too large for the new hlist
array.
rehash does:
rcu_assign_pointer(net->xfrm.state_bydst, ndst) [..]
net->xfrm.state_hmask = nhashmask;
While state lookup does:
h = xfrm_dst_hash(net, daddr, saddr, tmpl->reqid, encap_family);
hlist_for_each_entry_rcu(x, net->xfrm.state_bydst + h, bydst) {
This is only safe in case the update to state_bydst is larger than
net->xfrm.xfrm_state_hmask (or if the lookup function gets
serialized via state spinlock again).
Fix this by prefetching state_hmask and the associated pointers.
The xfrm_state_hash_generation seqlock retry will ensure that the pointer
and the hmask will be consistent.
The existing helpers, like xfrm_dst_hash(), are now unsafe for RCU side,
add lockdep assertions to document that they are only safe for insert
side.
xfrm_state_lookup_byaddr() uses the spinlock rather than RCU.
AFAICS this is an oversight from back when state lookup was converted to
RCU, this lock should be replaced with RCU in a future patch.
Reported-by: syzbot+5f9f31cb7d985f584d8e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/CACT4Y+azwfrE3uz6A5ZErov5YN2LYBN5KrsymBerT36VU8qzBA@mail.gmail.com/
Diagnosed-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: c2f672fc94 ("xfrm: state lookup can be lockless")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
When skb needs GSO and wrap around happens, if xo->seq.low (seqno of
the first skb segment) is before the last seq number but oseq (seqno
of the last segment) is after it, xo->seq.low is still bigger than
replay_esn->oseq while oseq is smaller than it, so the update of
replay_esn->oseq_hi is missed for this case wrap around because of
the change in the cited commit.
For example, if sending a packet with gso_segs=3 while old
replay_esn->oseq=0xfffffffe, we calculate:
xo->seq.low = 0xfffffffe + 1 = 0x0xffffffff
oseq = 0xfffffffe + 3 = 0x1
(oseq < replay_esn->oseq) is true, but (xo->seq.low <
replay_esn->oseq) is false, so replay_esn->oseq_hi is not incremented.
To fix this issue, change the outer checking back for the update of
replay_esn->oseq_hi. And add new checking inside for the update of
packet's oseq_hi.
Fixes: 4b549ccce9 ("xfrm: replay: Fix ESN wrap around for GSO")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmc3A/gACgkQrB3Eaf9P
W7fNew//XCIhIvFYaQcP2x84T4EYB679NkGlwMATxXgn40+sp7muSwVweynEWNIu
FltfBAwYD/MxD7g519abVPMWXs/iYI5duw3vvqnxmkOoebWLLocg2VoqFIdVXlQw
/hj+1X/oNT4OKcaQAw/FAGRuYvkc90YB/rRG51RwAIR0tyBjRwfUsozMM8QX/zQI
I0cLCgGAf/kylQre+dhvUkMhXaLogMF5v0qzPxhyMBD02JaUpe6+5cdHQcmKOhqa
ksTpySYnIKIHZrLizeFGDZpinaDIph20vGaDvDXpqTYFuwvCQsZczJy02dF4otf2
2dZz6+2La+ZM+WsGIqpALqKCNhr8fOcQxCRH3eGLPBwoXXt5CFAMgJKob8hKuonW
FgJaYMBZOjYbgGah8WbEe/YsWq4y3uRs48pFtY+T5cn7AskNxIvUoLNjSS83Hlqu
PJbveiKsZygig966Q/zUFATYnvj3zEgjVEcSbK6LRyBXL79Njr8l+PZ0Zoz76tc4
bF1Xv0x+lRYmwa9rvOFaeqrP/GTe0xvlitFzuCN7HnXiN8URKnnDY2odkXYzo+Z7
MBbP8wR/CaoiAvdMw74116nAIFOW95LPtvdGJTvlS9jAOt1P7dWQ3/mFKEpItndv
cJjWzI7HKl0+85FcCDw+tmsDWWGbALUyPw96i8UgUcDGyqVKUgA=
=Ioo8
-----END PGP SIGNATURE-----
Merge tag 'ipsec-next-2024-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
ipsec-next-11-15
1) Add support for RFC 9611 per cpu xfrm state handling.
2) Add inbound and outbound xfrm state caches to speed up
state lookups.
3) Convert xfrm to dscp_t. From Guillaume Nault.
4) Fix error handling in build_aevent.
From Everest K.C.
5) Replace strncpy with strscpy_pad in copy_to_user_auth.
From Daniel Yang.
6) Fix an uninitialized symbol during acquire state insertion.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A recent commit jumped over the dst hash computation and
left the symbol uninitialized. Fix this by explicitly
computing the dst hash before it is used.
Fixes: 0045e3d806 ("xfrm: Cache used outbound xfrm states at the policy.")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The function strncpy is deprecated since it does not guarantee the
destination buffer is NULL terminated. Recommended replacement is
strscpy. The padded version was used to remain consistent with the other
strscpy_pad usage in the modified function.
Signed-off-by: Daniel Yang <danielyangkang@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Error handling is missing when call to nla_put_u32() fails.
Handle the error when the call to nla_put_u32() returns an error.
The error was reported by Coverity Scan.
Report:
CID 1601525: (#1 of 1): Unused value (UNUSED_VALUE)
returned_value: Assigning value from nla_put_u32(skb, XFRMA_SA_PCPU, x->pcpu_num)
to err here, but that stored value is overwritten before it can be used
Fixes: 1ddf9916ac ("xfrm: Add support for per cpu xfrm state handling.")
Signed-off-by: Everest K.C. <everestkc@everestkc.com.np>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Most of the original conversion is from the spatch below,
but I edited some and left out other instances that were
either buggy after conversion (where default values don't
fit into the type) or just looked strange.
@@
expression attr, def;
expression val;
identifier fn =~ "^nla_get_.*";
fresh identifier dfn = fn ## "_default";
@@
(
-if (attr)
- val = fn(attr);
-else
- val = def;
+val = dfn(attr, def);
|
-if (!attr)
- val = def;
-else
- val = fn(attr);
+val = dfn(attr, def);
|
-if (!attr)
- return def;
-return fn(attr);
+return dfn(attr, def);
|
-attr ? fn(attr) : def
+dfn(attr, def)
|
-!attr ? def : fn(attr)
+dfn(attr, def)
)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>
Link: https://patch.msgid.link/20241108114145.0580b8684e7f.I740beeaa2f70ebfc19bfca1045a24d6151992790@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add type annotation to the "tos" field of struct xfrm_dst_lookup_params,
to ensure that the ECN bits aren't mistakenly taken into account when
doing route lookups. Rename that field (tos -> dscp) to make that
change explicit.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Pass a dscp_t variable to xfrm_dst_lookup(), instead of an int, to
prevent accidental setting of ECN bits in ->flowi4_tos.
Only xfrm_bundle_create() actually calls xfrm_dst_lookup(). Since it
already has a dscp_t variable to pass as parameter, we only need to
remove the inet_dscp_to_dsfield() conversion.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Use a dscp_t variable to store the result of xfrm_get_dscp().
This prepares for the future conversion of xfrm_dst_lookup().
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Return a dscp_t variable to prepare for the future conversion of
xfrm_bundle_create() to dscp_t.
While there, rename the function "xfrm_get_dscp", to align its name
with the new return type.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Reject the usage of XFRMA_SA_PCPU in xfrm netlink messages when
it's not applicable.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Tested-by: Antony Antony <antony.antony@secunet.com>
Tested-by: Tobias Brunner <tobias@strongswan.org>
Now that we can have percpu xfrm states, the number of active
states might increase. To get a better lookup performance,
we add a percpu cache to cache the used inbound xfrm states.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Tested-by: Antony Antony <antony.antony@secunet.com>
Tested-by: Tobias Brunner <tobias@strongswan.org>
Now that we can have percpu xfrm states, the number of active
states might increase. To get a better lookup performance,
we cache the used xfrm states at the policy for outbound
IPsec traffic.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Tested-by: Antony Antony <antony.antony@secunet.com>
Tested-by: Tobias Brunner <tobias@strongswan.org>
Currently all flows for a certain SA must be processed by the same
cpu to avoid packet reordering and lock contention of the xfrm
state lock.
To get rid of this limitation, the IETF standardized per cpu SAs
in RFC 9611. This patch implements the xfrm part of it.
We add the cpu as a lookup key for xfrm states and a config option
to generate acquire messages for each cpu.
With that, we can have on each cpu a SA with identical traffic selector
so that flows can be processed in parallel on all cpus.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Tested-by: Antony Antony <antony.antony@secunet.com>
Tested-by: Tobias Brunner <tobias@strongswan.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmcXbBEACgkQrB3Eaf9P
W7e0hQ//XiBdyhArA8kYIgsCylrOr+y/uCErnIhzUTqo20uE3dMPvzQHwY1GIgiU
HYXKg49WLVxSuFtLRu32qCr0G+muU1UI5OL58IQuQ+TxKzj0hnV4BqAx+rNYhaFb
JxJhgAcQQu7VCL7/qgqGsQnhq/hhg29Rfqa1VTEZ4RthEMahPDbwyibjyOfqwSgm
fCPIl2FTkB7E0PZnwZJGxmaOJXS7g/djb+CPmBI6zxLQHG5VXY/UGNyObUvTLD9K
gV+N0u0ieyDTxpvpgh6HMAFSkORLS/PIUCAX0SZEW48+7DLbBeKMMYwegtxxJZ3D
3zaWi8uKGh5rjOslQbU4ZlpxJr7yvIV6RhGJhOPDYz5Es4EXHU7c0tZ/pma46eb0
2PJxQyTHW4O9fbybQvl0w9fUQlhjKMbv/TygJgpOIk9YUr2y8Yxc8yhmWi+669ly
e7PEi/33lqJI44gisu0BMresxJcPA3eFWje+Dzw/7N/tlLJzbWt3psRqB9u/JwVH
LD0YvXraZYvaRNzeGUfbXTrvmouhLcl15zAE8RFJBTgGJbpILviJ9NfUMOIO7Yor
BBKEWlylCm/4x5iOdVb17gFCi7uERiahbxNg3+hltAQuMvEdrhhWXp1N7esTRvkf
D1o0qR5C2k2jyc9LQNqfiGWDEOgTCt1DCdhpo2F/EtF5kSerp6s=
=Ai21
-----END PGP SIGNATURE-----
Merge tag 'ipsec-2024-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2024-10-22
1) Fix routing behavior that relies on L4 information
for xfrm encapsulated packets.
From Eyal Birger.
2) Remove leftovers of pernet policy_inexact lists.
From Florian Westphal.
3) Validate new SA's prefixlen when the selector family is
not set from userspace.
From Sabrina Dubroca.
4) Fix a kernel-infoleak when dumping an auth algorithm.
From Petr Vaganov.
Please pull or let me know if there are problems.
ipsec-2024-10-22
* tag 'ipsec-2024-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: fix one more kernel-infoleak in algo dumping
xfrm: validate new SA's prefixlen using SA family when sel.family is unset
xfrm: policy: remove last remnants of pernet inexact list
xfrm: respect ip protocols rules criteria when performing dst lookups
xfrm: extract dst lookup parameters into a struct
====================
Link: https://patch.msgid.link/20241022092226.654370-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
During fuzz testing, the following issue was discovered:
BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x598/0x2a30
_copy_to_iter+0x598/0x2a30
__skb_datagram_iter+0x168/0x1060
skb_copy_datagram_iter+0x5b/0x220
netlink_recvmsg+0x362/0x1700
sock_recvmsg+0x2dc/0x390
__sys_recvfrom+0x381/0x6d0
__x64_sys_recvfrom+0x130/0x200
x64_sys_call+0x32c8/0x3cc0
do_syscall_64+0xd8/0x1c0
entry_SYSCALL_64_after_hwframe+0x79/0x81
Uninit was stored to memory at:
copy_to_user_state_extra+0xcc1/0x1e00
dump_one_state+0x28c/0x5f0
xfrm_state_walk+0x548/0x11e0
xfrm_dump_sa+0x1e0/0x840
netlink_dump+0x943/0x1c40
__netlink_dump_start+0x746/0xdb0
xfrm_user_rcv_msg+0x429/0xc00
netlink_rcv_skb+0x613/0x780
xfrm_netlink_rcv+0x77/0xc0
netlink_unicast+0xe90/0x1280
netlink_sendmsg+0x126d/0x1490
__sock_sendmsg+0x332/0x3d0
____sys_sendmsg+0x863/0xc30
___sys_sendmsg+0x285/0x3e0
__x64_sys_sendmsg+0x2d6/0x560
x64_sys_call+0x1316/0x3cc0
do_syscall_64+0xd8/0x1c0
entry_SYSCALL_64_after_hwframe+0x79/0x81
Uninit was created at:
__kmalloc+0x571/0xd30
attach_auth+0x106/0x3e0
xfrm_add_sa+0x2aa0/0x4230
xfrm_user_rcv_msg+0x832/0xc00
netlink_rcv_skb+0x613/0x780
xfrm_netlink_rcv+0x77/0xc0
netlink_unicast+0xe90/0x1280
netlink_sendmsg+0x126d/0x1490
__sock_sendmsg+0x332/0x3d0
____sys_sendmsg+0x863/0xc30
___sys_sendmsg+0x285/0x3e0
__x64_sys_sendmsg+0x2d6/0x560
x64_sys_call+0x1316/0x3cc0
do_syscall_64+0xd8/0x1c0
entry_SYSCALL_64_after_hwframe+0x79/0x81
Bytes 328-379 of 732 are uninitialized
Memory access of size 732 starts at ffff88800e18e000
Data copied to user address 00007ff30f48aff0
CPU: 2 PID: 18167 Comm: syz-executor.0 Not tainted 6.8.11 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Fixes copying of xfrm algorithms where some random
data of the structure fields can end up in userspace.
Padding in structures may be filled with random (possibly sensitve)
data and should never be given directly to user-space.
A similar issue was resolved in the commit
8222d5910d ("xfrm: Zero padding when dumping algos and encap")
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: c7a5899eb2 ("xfrm: redact SA secret with lockdown confidentiality")
Cc: stable@vger.kernel.org
Co-developed-by: Boris Tonofa <b.tonofa@ideco.ru>
Signed-off-by: Boris Tonofa <b.tonofa@ideco.ru>
Signed-off-by: Petr Vaganov <p.vaganov@ideco.ru>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This expands the validation introduced in commit 07bf790895 ("xfrm:
Validate address prefix lengths in the xfrm selector.")
syzbot created an SA with
usersa.sel.family = AF_UNSPEC
usersa.sel.prefixlen_s = 128
usersa.family = AF_INET
Because of the AF_UNSPEC selector, verify_newsa_info doesn't put
limits on prefixlen_{s,d}. But then copy_from_user_state sets
x->sel.family to usersa.family (AF_INET). Do the same conversion in
verify_newsa_info before validating prefixlen_{s,d}, since that's how
prefixlen is going to be used later on.
Reported-by: syzbot+cc39f136925517aed571@syzkaller.appspotmail.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
The series in the "fixes" tag added the ability to consider L4 attributes
in routing rules.
The dst lookup on the outer packet of encapsulated traffic in the xfrm
code was not adapted to this change, thus routing behavior that relies
on L4 information is not respected.
Pass the ip protocol information when performing dst lookups.
Fixes: a25724b05a ("Merge branch 'fib_rules-support-sport-dport-and-proto-match'")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Tested-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Preparation for adding more fields to dst lookup functions without
changing their signatures.
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmbf6xAACgkQrB3Eaf9P
W7eZQA/9HuHTWBg0V43QDT1rjNnKult+uBKYpKrh045outqMs+cU8bsww5ZuIAKx
ktN66OCE67d7XeFttb9UAJUPqQ98RjwjVUOpjRJ5iRDtj2bmn/5VGSYuH7zx5so0
msFs5gkomo2ZZNjcMOSrDVGUoCdlHh1og5L2KN/FgztSA1smDdUBQOWNm1peezbI
eJFt2Q6KCNfzwPthmQte0dmDnK5gWPducereSx03tMuSyUmPML1zrzOFXBXSg09e
dAlDTxbAXZDrXS4Ii0y/FEM2Ugkjg9FXbE1kvM0i05GIc/SGnEBGEcdW5YbmRhOL
4JlLnpiLTmKTaIZ0GdpADv7XZMga6R01AalSPsJz+H7aNAHTKkK+SzQY4YXRucZy
SsASM39oRLzo9Bm4ZZ773Nw83cxBgO/ZixK4KVvCZI/1ftD+9zn72eqk+CeveSeE
ChaXGuWpRdfAOsgozFJNFx/ffK5qzxFKkIeN9KN0QYV/XJuZJ7nD6eQkH9ydgvTI
4cexY+cs4wgfdi9dDkVHPVhCR7mRlfi5r/VL8rtWWnWzR07okKF4rW6dgvx33m60
9MmF1/EdD2uh3CLcBMjNg6qXdC07VeDpFLqWs+utJvSHMuI43uE4FkRQui/J6T9N
RX7zzkFBsPvPpm5GHLx2u/wvnzX1co1Rk9xzbC+J6FEPlm2/0vI=
=ErGl
-----END PGP SIGNATURE-----
Merge tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2024-09-10
1) Remove an unneeded WARN_ON on packet offload.
From Patrisious Haddad.
2) Add a copy from skb_seq_state to buffer function.
This is needed for the upcomming IPTFS patchset.
From Christian Hopps.
3) Spelling fix in xfrm.h.
From Simon Horman.
4) Speed up xfrm policy insertions.
From Florian Westphal.
5) Add and revert a patch to support xfrm interfaces
for packet offload. This patch was just half cooked.
6) Extend usage of the new xfrm_policy_is_dead_or_sk helper.
From Florian Westphal.
7) Update comments on sdb and xfrm_policy.
From Florian Westphal.
8) Fix a null pointer dereference in the new policy insertion
code From Florian Westphal.
9) Fix an uninitialized variable in the new policy insertion
code. From Nathan Chancellor.
* tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: policy: Restore dir assignments in xfrm_hash_rebuild()
xfrm: policy: fix null dereference
Revert "xfrm: add SA information to the offloaded packet"
xfrm: minor update to sdb and xfrm_policy comments
xfrm: policy: use recently added helper in more places
xfrm: add SA information to the offloaded packet
xfrm: policy: remove remaining use of inexact list
xfrm: switch migrate to xfrm_policy_lookup_bytype
xfrm: policy: don't iterate inexact policies twice at insert time
selftests: add xfrm policy insertion speed test script
xfrm: Correct spelling in xfrm.h
net: add copy from skb_seq_state to buffer function
xfrm: Remove documentation WARN_ON to limit return values for offloaded SA
====================
Link: https://patch.msgid.link/20240910065507.2436394-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Clang warns (or errors with CONFIG_WERROR):
net/xfrm/xfrm_policy.c:1286:8: error: variable 'dir' is uninitialized when used here [-Werror,-Wuninitialized]
1286 | if ((dir & XFRM_POLICY_MASK) == XFRM_POLICY_OUT) {
| ^~~
net/xfrm/xfrm_policy.c:1257:9: note: initialize the variable 'dir' to silence this warning
1257 | int dir;
| ^
| = 0
1 error generated.
A recent refactoring removed some assignments to dir because
xfrm_policy_is_dead_or_sk() has a dir assignment in it. However, dir is
used elsewhere in xfrm_hash_rebuild(), including within loops where it
needs to be reloaded for each policy. Restore the assignments before the
first use of dir to fix the warning and ensure dir is properly
initialized throughout the function.
Fixes: 08c2182cf0 ("xfrm: policy: use recently added helper in more places")
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This reverts commit e7cd191f83.
While supporting xfrm interfaces in the packet offload API
is needed, this patch does not do the right thing. There are
more things to do to really support xfrm interfaces, so revert
it for now.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
NETIF_F_LLTX can't be changed via Ethtool and is not a feature,
rather an attribute, very similar to IFF_NO_QUEUE (and hot).
Free one netdev_features_t bit and make it a "hot" private flag.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The function returns a value that is used to initialize 'flowi4_tos'
before being passed to the FIB lookup API in the following call chain:
xfrm_bundle_create()
tos = xfrm_get_tos(fl, family)
xfrm_dst_lookup(..., tos, ...)
__xfrm_dst_lookup(..., tos, ...)
xfrm4_dst_lookup(..., tos, ...)
__xfrm4_dst_lookup(..., tos, ...)
fl4->flowi4_tos = tos
__ip_route_output_key(net, fl4)
Unmask the upper DSCP bits so that in the future the output route lookup
could be performed according to the full DSCP value.
Remove IPTOS_RT_MASK since it is no longer used.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The spd is no longer maintained as a linear list.
We also haven't been caching bundles in the xfrm_policy
struct since 2010.
While at it, add kdoc style comments for the xfrm_policy structure
and extend the description of the current rbtree based search to
mention why it needs to search the candidate set.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In packet offload mode, append Security Association (SA) information
to each packet, replicating the crypto offload implementation.
The XFRM_XMIT flag is set to enable packet to be returned immediately
from the validate_xmit_xfrm function, thus aligning with the existing
code path for packet offload mode.
This SA info helps HW offload match packets to their correct security
policies. The XFRM interface ID is included, which is crucial in setups
with multiple XFRM interfaces where source/destination addresses alone
can't pinpoint the right policy.
Signed-off-by: wangfe <wangfe@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
No consumers anymore, remove it. After this, insertion of policies
no longer require list walk of all inexact policies but only those
that are reachable via the candidate sets.
This gives almost linear insertion speeds provided the inserted
policies are for non-overlapping networks.
Before:
Inserted 1000 policies in 70 ms
Inserted 10000 policies in 1155 ms
Inserted 100000 policies in 216848 ms
After:
Inserted 1000 policies in 56 ms
Inserted 10000 policies in 478 ms
Inserted 100000 policies in 4580 ms
Insertion of 1m entries takes about ~40s after this change
on my test vm.
Cc: Noel Kuntze <noel@familie-kuntze.de>
Cc: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
XFRM_MIGRATE still uses the old lookup method:
first check the bydst hash table, then search the list of all the other
policies.
Switch MIGRATE to use the same lookup function as the packetpath.
This is done to remove the last remaining users of the pernet
xfrm.policy_inexact lists with the intent of removing this list.
After this patch, policies are still added to the list on insertion
and they are rehashed as-needed but no single API makes use of these
anymore.
This change is compile tested only.
Cc: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Since commit
6be3b0db6d ("xfrm: policy: add inexact policy search tree infrastructure")
policy lookup no longer walks a list but has a set of candidate lists.
This set has to be searched for the best match.
In case there are several matches, the priority wins.
If the priority is also the same, then the historic behaviour with
a single list was to return the first match (first-in-list).
With introduction of serval lists, this doesn't work and a new
'pos' member was added that reflects the xfrm_policy structs position
in the list.
This value is not exported to userspace and it does not need to be
the 'position in the list', it just needs to make sure that
a->pos < b->pos means that a was added to the lists more recently
than b.
This re-walk is expensive when many inexact policies are in use.
Speed this up: when appending the policy to the end of the walker list,
then just take the ->pos value of the last entry made and add 1.
Add a slowpath version to prevent overflow, if we'd assign UINT_MAX
then iterate the entire list and fix the ordering.
While this speeds up insertion considerably finding the insertion spot
in the inexact list still requires a partial list walk.
This is addressed in followup patches.
Before:
./xfrm_policy_add_speed.sh
Inserted 1000 policies in 72 ms
Inserted 10000 policies in 1540 ms
Inserted 100000 policies in 334780 ms
After:
Inserted 1000 policies in 68 ms
Inserted 10000 policies in 1137 ms
Inserted 100000 policies in 157307 ms
Reported-by: Noel Kuntze <noel@familie-kuntze.de>
Cc: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The original idea to put WARN_ON() on return value from driver code was
to make sure that packet offload doesn't have silent fallback to
SW implementation, like crypto offload has.
In reality, this is not needed as all *swan implementations followed
this request and used explicit configuration style to make sure that
"users will get what they ask".
So instead of forcing drivers to make sure that even their internal flows
don't return -EOPNOTSUPP, let's remove this WARN_ON.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Merge in late fixes to prepare for the 6.11 net-next PR.
Conflicts:
93c3a96c30 ("net: pse-pd: Do not return EOPNOSUPP if config is null")
4cddb0f15e ("net: ethtool: pse-pd: Fix possible null-deref")
30d7b67277 ("net: ethtool: Add new power limit get and set features")
https://lore.kernel.org/20240715123204.623520bb@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmaSU/QACgkQrB3Eaf9P
W7etjA/+I8bWTjMCCGFT7AXIisXWQhHbrRuaU6hpROxWUTAyjUuM4qhdXHYUyG6i
2mcg7Ppqn0etEnrvCDJqgWGPonSJuxKRMpRNiB2uRYZAKDK2X7d5gCVVK+xGyuYn
rXjAw3yQ9W6oV8lQvm7GqLYOFL5vj9UA5q8QEhyTxH11HDDRBjlHSgzgWovzGsjO
2qLHSh3wuBuuoWS6jhN5n0pA1mFiKxhzPRRvTV2Q8CEBt+JML0gGd08g0s6tSGMJ
qlEGdTHIkIGi/QsbOoRm14X5gYYrDz1EEATISZTA9/Pbb03MsQfxUp6EUZNZIM4O
/K9XO7LLXOYWXBcI3BDCHCOT1cJPw1WVvYwlwWzu4DpxelPAc+pk2/QZk9wV2cWd
MzScbhHKmZ5GnYnlfQAyOnC5tvQXUBG2OntyXMBGh9seh+H5Lcl1RJAflIwRvBx5
7cnR6HiTmLUlbBxKjSJF+xFPnTucp0J637DkY/ONtAA7qNHnOKh3LWqkIH80q/FI
7Ua0EpgTtzAzN6iR2ujMHusfAjJs4yhMGY5KFGcEHwqS2axYq+mpnaShYzNebzl6
9kOmj6UAVP0tivH2Ahmsz2HaNhZaJ3hXftZeF3zwcoN6XTc3jrQ4JuNyiDcsUdnf
ggyLMZ7VI6Jf38ep8LEnfpqQm5qFTVfto62goWWLlGgr4wsy66c=
=KyYL
-----END PGP SIGNATURE-----
Merge tag 'ipsec-next-2024-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2024-07-13
1) Support sending NAT keepalives in ESP in UDP states.
Userspace IKE daemon had to do this before, but the
kernel can better keep track of it.
From Eyal Birger.
2) Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated
ESP data paths. Currently, IPsec crypto offload is enabled for GRO
code path only. This patchset support UDP encapsulation for the non
GRO path. From Mike Yu.
* tag 'ipsec-next-2024-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: Support crypto offload for outbound IPv4 UDP-encapsulated ESP packet
xfrm: Support crypto offload for inbound IPv4 UDP-encapsulated ESP packet
xfrm: Allow UDP encapsulation in crypto offload control path
xfrm: Support crypto offload for inbound IPv6 ESP packets not in GRO path
xfrm: support sending NAT keepalives in ESP in UDP states
====================
Link: https://patch.msgid.link/20240713102416.3272997-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmaPqSEACgkQrB3Eaf9P
W7eOmQ//YVp6OL+oS5lRzLMvhKLXh42qGbaOPAZl/k0cOACsOnNhubTQHUToIMYt
FXLVCDrXHU3F4JVGdgzwJb+/2wqElP+3Wlw48WCnycAlB8NpFc24qKwZHWzo04Mv
uutWG5oVXXMYsnLEQhsQCMj+rCjDnSJG2bmsQCHS8GFB4PKP/SSGm/H0UFUbYjIE
leZ6rPmqmHf/FShqSmm0VTbXyeLE3bIJQ5zfDLzKW9/nO5h/VyZcZCEzEENF5i2i
bKaEGSNrK4evyj+9j/B8FDdujEfVbNyanTAkChJgx3Wug6rIy1QdsG2xDpPn3zm+
pdDvSLPAjjLHrCr7yPPnHEdtOYBvnvjW035VBG/q7pNZfHUaKcutvQJESiNVjsV0
hqmL8XhKgdT/0dPrevXVSXcLOXT25EkzLoN8W4P3qOY4OSFQPC8V+ELCOhWGlZwB
rKA8/NfEwV2yIlxhEzSYUTaGT3YZVLJsAVuEfR8Y3tq/j7X5G6h4lCKddxNKhLn+
jJroKlKQEHsC7HCMOW9kJijiXWxNjT4cAPRXMSIxf3cL29UwU9zPE1wx1oq1Pr97
FZiGg9IapcK5nKslaim+nwn6PtEJzVzCWtZ5gddtS4qOrZKuveql/B2P1I8EL9S6
LUqOE9gUeQpSdG/M5FqkLJnUE1knHYRZhQw682fA1zvZFj+G9lo=
=xFmH
-----END PGP SIGNATURE-----
Merge tag 'ipsec-2024-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2024-07-11
1) Fix esp_output_tail_tcp() on unsupported ESPINTCP.
From Hagar Hemdan.
2) Fix two bugs in the recently introduced SA direction separation.
From Antony Antony.
3) Fix unregister netdevice hang on hardware offload. We had to add another
list where skbs linked to that are unlinked from the lists (deleted)
but not yet freed.
4) Fix netdev reference count imbalance in xfrm_state_find.
From Jianbo Liu.
5) Call xfrm_dev_policy_delete when killingi them on offloaded policies.
Jianbo Liu.
* tag 'ipsec-2024-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: call xfrm_dev_policy_delete when kill policy
xfrm: fix netdev reference count imbalance
xfrm: Export symbol xfrm_dev_state_delete.
xfrm: Fix unregister netdevice hang on hardware offload.
xfrm: Log input direction mismatch error in one place
xfrm: Fix input error path memory access
net: esp: cleanup esp_output_tail_tcp() in case of unsupported ESPINTCP
====================
Link: https://patch.msgid.link/20240711100025.1949454-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If xfrm_input() is called with UDP_ENCAP_ESPINUDP, the packet is
already processed in UDP layer that removes the UDP header.
Therefore, there should be no much difference to treat it as an
ESP packet in the XFRM stack.
Test: Enabled dir=in IPsec crypto offload, and verified IPv4
UDP-encapsulated ESP packets on both wifi/cellular network
Signed-off-by: Mike Yu <yumike@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Unblock this limitation so that SAs with encapsulation specified
can be passed to HW drivers. HW drivers can still reject the SA
in their implementation of xdo_dev_state_add if the encapsulation
is not supported.
Test: Verified on Android device
Signed-off-by: Mike Yu <yumike@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
IPsec crypt offload supports outbound IPv6 ESP packets, but it doesn't
support inbound IPv6 ESP packets.
This change enables the crypto offload for inbound IPv6 ESP packets
that are not handled through GRO code path. If HW drivers add the
offload information to the skb, the packet will be handled in the
crypto offload rx code path.
Apart from the change in crypto offload rx code path, the change
in xfrm_policy_check is also needed.
Exampe of RX data path:
+-----------+ +-------+
| HW Driver |-->| wlan0 |--------+
+-----------+ +-------+ |
v
+---------------+ +------+
+------>| Network Stack |-->| Apps |
| +---------------+ +------+
| |
| v
+--------+ +------------+
| ipsec1 |<--| XFRM Stack |
+--------+ +------------+
Test: Enabled both in/out IPsec crypto offload, and verified IPv6
ESP packets on Android device on both wifi/cellular network
Signed-off-by: Mike Yu <yumike@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
xfrm_policy_kill() is called at different places to delete xfrm
policy. It will call xfrm_pol_put(). But xfrm_dev_policy_delete() is
not called to free the policy offloaded to hardware.
The three commits cited here are to handle this issue by calling
xfrm_dev_policy_delete() outside xfrm_get_policy(). But they didn't
cover all the cases. An example, which is not handled for now, is
xfrm_policy_insert(). It is called when XFRM_MSG_UPDPOLICY request is
received. Old policy is replaced by new one, but the offloaded policy
is not deleted, so driver doesn't have the chance to release hardware
resources.
To resolve this issue for all cases, move xfrm_dev_policy_delete()
into xfrm_policy_kill(), so the offloaded policy can be deleted from
hardware when it is called, which avoids hardware resources leakage.
Fixes: 919e43fad5 ("xfrm: add an interface to offload policy")
Fixes: bf06fcf4be ("xfrm: add missed call to delete offloaded policies")
Fixes: 982c3aca8b ("xfrm: delete offloaded policy")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In cited commit, netdev_tracker_alloc() is called for the newly
allocated xfrm state, but dev_hold() is missed, which causes netdev
reference count imbalance, because netdev_put() is called when the
state is freed in xfrm_dev_state_free(). Fix the issue by replacing
netdev_tracker_alloc() with netdev_hold().
Fixes: f8a70afafc ("xfrm: add TX datapath support for IPsec packet offload mode")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This fixes a build failure if xfrm_user is build as a module.
Fixes: 07b87f9eea ("xfrm: Fix unregister netdevice hang on hardware offload.")
Reported-by: Mark Brown <broonie@kernel.org>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Add the ability to send out RFC-3948 NAT keepalives from the xfrm stack.
To use, Userspace sets an XFRM_NAT_KEEPALIVE_INTERVAL integer property when
creating XFRM outbound states which denotes the number of seconds between
keepalive messages.
Keepalive messages are sent from a per net delayed work which iterates over
the xfrm states. The logic is guarded by the xfrm state spinlock due to the
xfrm state walk iterator.
Possible future enhancements:
- Adding counters to keep track of sent keepalives.
- deduplicate NAT keepalives between states sharing the same nat keepalive
parameters.
- provisioning hardware offloads for devices capable of implementing this.
- revise xfrm state list to use an rcu list in order to avoid running this
under spinlock.
Suggested-by: Paul Wouters <paul.wouters@aiven.io>
Tested-by: Paul Wouters <paul.wouters@aiven.io>
Tested-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
When offloading xfrm states to hardware, the offloading
device is attached to the skbs secpath. If a skb is free
is deferred, an unregister netdevice hangs because the
netdevice is still refcounted.
Fix this by removing the netdevice from the xfrm states
when the netdevice is unregistered. To find all xfrm states
that need to be cleared we add another list where skbs
linked to that are unlinked from the lists (deleted)
but not yet freed.
Fixes: d77e38e612 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Previously, the offload data path decrypted the packet before checking
the direction, leading to error logging and packet dropping. However,
dropped packets wouldn't be visible in tcpdump or audit log.
With this fix, the offload path, upon noticing SA direction mismatch,
will pass the packet to the stack without decrypting it. The L3 layer
will then log the error, audit, and drop ESP without decrypting or
decapsulating it.
This also ensures that the slow path records the error and audit log,
making dropped packets visible in tcpdump.
Fixes: 304b44f0d5 ("xfrm: Add dir validation to "in" data path lookup")
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
__dst_negative_advice() does not enforce proper RCU rules when
sk->dst_cache must be cleared, leading to possible UAF.
RCU rules are that we must first clear sk->sk_dst_cache,
then call dst_release(old_dst).
Note that sk_dst_reset(sk) is implementing this protocol correctly,
while __dst_negative_advice() uses the wrong order.
Given that ip6_negative_advice() has special logic
against RTF_CACHE, this means each of the three ->negative_advice()
existing methods must perform the sk_dst_reset() themselves.
Note the check against NULL dst is centralized in
__dst_negative_advice(), there is no need to duplicate
it in various callbacks.
Many thanks to Clement Lecigne for tracking this issue.
This old bug became visible after the blamed commit, using UDP sockets.
Fixes: a87cb3e48e ("net: Facility to report route quality of connected sockets")
Reported-by: Clement Lecigne <clecigne@google.com>
Diagnosed-by: Clement Lecigne <clecigne@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240528114353.1794151-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
35d92abfba ("net: hns3: fix kernel crash when devlink reload during initialization")
2a1a1a7b5f ("net: hns3: add command queue trace for hns3")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We want to be able to run rtnl_fill_ifinfo() under RCU protection
instead of RTNL in the future.
All rtnl_link_ops->get_link_net() methods already using dev_net()
are ready. I added READ_ONCE() annotations on others.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmY0mEIACgkQrB3Eaf9P
W7fU7g//bQyydwei/4Vo+cNCPp82k8wL/qhDY3IjN10PfJOSNmeCAcSgkuuHTRSx
g/hoxZEVzLrQT5bt+Sb38JxADFiL787GjdGEUy1gzF7CnDKcGT5KnydYNjDqDVGt
nOv9kAGfWIkMKdNqrhHifddPMWd+ZqvpUcFz5olvqIE2mpNgMwy2i3NID9bNAV31
v5AEvNINa1LKOhX9cEka8iPQXwp+I6yTLqyOd4VciOuFr8dPg0FQqFaYR+OtMsV0
kIxdGTVfmRWaNgq/Tsg4z/2rXEwmEjTWzAhNVGu8o8L3JozXOvbjIrDG7Ws6qB3V
XTFl8ueRMk0UCTlY/QAfip5H7IlAo+H0FUBC45FNP1UhHeWisXT4D5rqAEqQTlZR
bddtuueLZyKclFpXRNi+/8vdDrXhhEzeNINkc52Ef33rUTtZJR8bXrEUKzaYCIuF
ldub0PA3+e5wvwIxq5/Chc/+MIaIHnXBMUmbCJSPnMrupBQtO+i6arPQcbtaBAgS
YyVGTRk9YN0UAjSriIuiViLlgUCMsvsWgfSz9rd0PE54MFBrvcLPeCtPxKZ+sTVG
Y2iSZ8d3ThvsMiQVNU8gj3SlTY1oTvuaijDDGjnR0nWkxV9LMJHCPKfIzsbOKLJe
+ee5hKP4TOFygnV58BkqdGK/LavNpouTIbrM43hgmJ0IX9kSt4o=
=QiGZ
-----END PGP SIGNATURE-----
Merge tag 'ipsec-next-2024-05-03' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2024-05-03
1) Remove Obsolete UDP_ENCAP_ESPINUDP_NON_IKE Support.
This was defined by an early version of an IETF draft
that did not make it to a standard.
2) Introduce direction attribute for xfrm states.
xfrm states have a direction, a stsate can be used
either for input or output packet processing.
Add a direction to xfrm states to make it clear
for what a xfrm state is used.
* tag 'ipsec-next-2024-05-03' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: Restrict SA direction attribute to specific netlink message types
xfrm: Add dir validation to "in" data path lookup
xfrm: Add dir validation to "out" data path lookup
xfrm: Add Direction to the SA in or out
udpencap: Remove Obsolete UDP_ENCAP_ESPINUDP_NON_IKE Support
====================
Link: https://lore.kernel.org/r/20240503082732.2835810-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)
To avoid lots of small commits, this commit brings together network
changes from (as they appear in MAINTAINERS) LLC, MPTCP, NETROM NETWORK
LAYER, PHONET PROTOCOL, ROSE NETWORK LAYER, RXRPC SOCKETS, SCTP
PROTOCOL, SHARED MEMORY COMMUNICATIONS (SMC), TIPC NETWORK LAYER and
NETWORKING [IPSEC]
* Remove sentinel element from ctl_table structs.
* Replace empty array registration with the register_net_sysctl_sz call
in llc_sysctl_init
* Replace the for loop stop condition that tests for procname == NULL
with one that depends on array size in sctp_sysctl_net_register
* Remove instances where an array element is zeroed out to make it look
like a sentinel in xfrm_sysctl_init. This is not longer needed and is
safe after commit c899710fe7 ("networking: Update to
register_net_sysctl_sz") added the array size to the ctl_table
registration
* Use a table_size variable to keep the value of ARRAY_SIZE
Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reject the usage of the SA_DIR attribute in xfrm netlink messages when
it's not applicable. This ensures that SA_DIR is only accepted for
certain message types (NEWSA, UPDSA, and ALLOCSPI)
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Introduces validation for the x->dir attribute within the XFRM input
data lookup path. If the configured direction does not match the
expected direction, input, increment the XfrmInStateDirError counter
and drop the packet to ensure data integrity and correct flow handling.
grep -vw 0 /proc/net/xfrm_stat
XfrmInStateDirError 1
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Introduces validation for the x->dir attribute within the XFRM output
data lookup path. If the configured direction does not match the expected
direction, output, increment the XfrmOutStateDirError counter and drop
the packet to ensure data integrity and correct flow handling.
grep -vw 0 /proc/net/xfrm_stat
XfrmOutPolError 1
XfrmOutStateDirError 1
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This patch introduces the 'dir' attribute, 'in' or 'out', to the
xfrm_state, SA, enhancing usability by delineating the scope of values
based on direction. An input SA will restrict values pertinent to input,
effectively segregating them from output-related values.
And an output SA will restrict attributes for output. This change aims
to streamline the configuration process and improve the overall
consistency of SA attributes during configuration.
This feature sets the groundwork for future patches, including
the upcoming IP-TFS patch.
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Instead of (struct rt6_info *)dst casts, we can use :
#define dst_rt6_info(_ptr) \
container_of_const(_ptr, struct rt6_info, dst)
Some places needed missing const qualifiers :
ip6_confirm_neigh(), ipv6_anycast_destination(),
ipv6_unicast_destination(), has_gateway()
v2: added missing parts (David Ahern)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The software GRO path for esp transport mode uses skb_mac_header_rebuild
prior to re-injecting the packet via the xfrm_napi_dev. This only
copies skb->mac_len bytes of header which may not be sufficient if the
packet contains 802.1Q tags or other VLAN tags. Worse copying only the
initial header will leave a packet marked as being VLAN tagged but
without the corresponding tag leading to mangling when it is later
untagged.
The VLAN tags are important when receiving the decrypted esp transport
mode packet after GRO processing to ensure it is received on the correct
interface.
Therefore record the full mac header length in xfrm*_transport_input for
later use in corresponding xfrm*_transport_finish to copy the entire mac
header when rebuilding the mac header for GRO. The skb->data pointer is
left pointing skb->mac_header bytes after the start of the mac header as
is expected by the network stack and network and transport header
offsets reset to this location.
Fixes: 7785bba299 ("esp: Add a software GRO codepath")
Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
To be able to constify instances of struct ctl_tables it is necessary to
remove ways through which non-const versions are exposed from the
sysctl core.
One of these is the ctl_table_arg member of struct ctl_table_header.
Constify this reference as a prerequisite for the full constification of
struct ctl_table instances.
No functional change.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The missing check of x->encap caused to the situation where GSO packets
were created with UDP encapsulation.
As a solution return the encap check for non-offloaded SA.
Fixes: 983a73da1f ("xfrm: Pass UDP encapsulation in TX packet offload")
Closes: https://lore.kernel.org/all/a650221ae500f0c7cf496c61c96c1b103dcb6f67.camel@redhat.com
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmXoQdQACgkQrB3Eaf9P
W7dnTQ//RnTEaOPgTsHzhSwVOfWhsWkHx2xqUAlPNY8W2jrzxGgAIknPzobivvRJ
U2bYPXDocDHUJAHIELUlu+lzATEz8baBN5zK5a+pPx5hXJlf5UI95linNZ5rEIiV
RoxLpicnJqtWn1oMZ8d7Y0CknsLR/f4ruiVApzoifk1JaXC/zX8FcqqKsSPwVlqA
GKy4+f71rNrIE9fbBAqDpmt6RuyRp/5yXPHLBoZlEXfYrYU1JOG8b/HLtGMD0SzV
yHbDcgRPtbkWgAwNO/zxSDKa+PZr7NbVgakDzyHK+TltpU+6cOsajCaSXHWwsTBB
+AebDschYY1H49oQe4bwLbNdGY+4lFvXxtk02sa8eM5a104MWxxTEB1QGAEri6gQ
biAh3xTTbDpls26qkm97iZ6LlDE6pVIzF744buOYedvR8gjjoLt1z1PId05wMYGB
A/4P6WkM8I1CZL++ODVfT8qR2N6lwFAQ6AM/eqHLvc6QpZ5Hm3lQAdLz1tK6QlCP
MIV9uuNz8dFPrX1QifmLGojjdedB+4ASglxffOaoqRpHnMgHgzWTOux8tSFpuJGu
mIYO/Dv5sHMdH8Jm+xXX1549bRzR+KGuqjXPxOSiO1jbOb5VC5ZDd3LVWb7fpDid
K4eaU4Bo4R3eiCo1Bapt/1jKV1YFuyBKqTvObCDslVuN3Fu9d7I=
=e4aa
-----END PGP SIGNATURE-----
Merge tag 'ipsec-next-2024-03-06' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
1) Introduce forwarding of ICMP Error messages. That is specified
in RFC 4301 but was never implemented. From Antony Antony.
2) Use KMEM_CACHE instead of kmem_cache_create in xfrm6_tunnel_init()
and xfrm_policy_init(). From Kunwu Chan.
3) Do not allocate stats in the xfrm interface driver, this can be done
on net core now. From Breno Leitao.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
netdev_max_backlog is used in rx fat path.
Move it to net_hodata for better cache locality.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmXoPHUACgkQrB3Eaf9P
W7ehmxAAoemzwIDP0wcDi7U68Za7wBC7CbV6WoVmDNRsO+BwnqlCtd7+B9hi0Qd0
h+KCVYw5EbUJbsHcuefj/QMNO46ueZZLswRIEMKlkZOHdC8TTGzjYmjLnkOHKTCm
wpJ9QSrnBoy3MUcWbZCJh4BZXsTftbu1fHRWy9GdBERXYfHqWdQCq/ZMAgv3IwLF
KwZahoGZwCDkmWOpshbBRGj0lnONzZ3mW//bN5EB71rSi33gPEtABYBSw9E9sdMw
uZg/xRnHMhS5CQHRnFEqVUiqu3wDJYgs3kQIDFhC1T2w94GBF/R+HzzFiBKLzxr1
Dk17avoNexSYRThJfCk6fMbXT4GVaUSKSG6KI4CRLna/wAIb4QEVDPEdk1ybOxRy
eoUfo7GXkVqhJpqnOX0Sl3262DnhQ/syhmv3sWXmoSpa630mDuFleVuVJE81dtMu
jSfaXY7BNpEwTwj8kzabKq5cLkt4T4dAnXf0ao1ATNCzcFkUjSIWN5ylnOEtdVe/
wEZYp8oc1kPEDU0RC8LzpaEooTQlPceeIAZca7a/lAhnRreGyP9j56Y2oM9YS76o
JPRrznWAdXVa0gX81zGOjZDHWSyLgpJZ9z5MaAdquP9uhXS4buu+pxxoUcFEfLJd
FV+yY9j3o3l0HLL6j/y6EZy/QzWPV+K9NAhokNL/WTwuOZ2Lq5Y=
=FnLp
-----END PGP SIGNATURE-----
Merge tag 'ipsec-2024-03-06' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2024-03-06
1) Clear the ECN bits flowi4_tos in decode_session4().
This was already fixed but the bug was reintroduced
when decode_session4() switched to us the flow dissector.
From Guillaume Nault.
2) Fix UDP encapsulation in the TX path with packet offload mode.
From Leon Romanovsky,
3) Avoid clang fortify warning in copy_to_user_tmpl().
From Nathan Chancellor.
4) Fix inter address family tunnel in packet offload mode.
From Mike Yu.
* tag 'ipsec-2024-03-06' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: set skb control buffer based on packet offload as well
xfrm: fix xfrm child route lookup for packet offload
xfrm: Avoid clang fortify warning in copy_to_user_tmpl()
xfrm: Pass UDP encapsulation in TX packet offload
xfrm: Clear low order bits of ->flowi4_tos in decode_session4().
====================
Link: https://lore.kernel.org/r/20240306100438.3953516-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In packet offload, packets are not encrypted in XFRM stack, so
the next network layer which the packets will be forwarded to
should depend on where the packet came from (either xfrm4_output
or xfrm6_output) rather than the matched SA's family type.
Test: verified IPv6-in-IPv4 packets on Android device with
IPsec packet offload enabled
Signed-off-by: Mike Yu <yumike@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In current code, xfrm_bundle_create() always uses the matched
SA's family type to look up a xfrm child route for the skb.
The route returned by xfrm_dst_lookup() will eventually be
used in xfrm_output_resume() (skb_dst(skb)->ops->local_out()).
If packet offload is used, the above behavior can lead to
calling ip_local_out() for an IPv6 packet or calling
ip6_local_out() for an IPv4 packet, which is likely to fail.
This change fixes the behavior by checking if the matched SA
has packet offload enabled. If not, keep the same behavior;
if yes, use the matched SP's family type for the lookup.
Test: verified IPv6-in-IPv4 packets on Android device with
IPsec packet offload enabled
Signed-off-by: Mike Yu <yumike@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZeEKVAAKCRDbK58LschI
g7oYAQD5Jlv4fIVTvxvfZrTTZ2tU+OsPa75mc8SDKwpash3YygEA8kvESy8+t6pg
D6QmSf1DIZdFoSp/bV+pfkNWMeR8gwg=
=mTAj
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-02-29
We've added 119 non-merge commits during the last 32 day(s) which contain
a total of 150 files changed, 3589 insertions(+), 995 deletions(-).
The main changes are:
1) Extend the BPF verifier to enable static subprog calls in spin lock
critical sections, from Kumar Kartikeya Dwivedi.
2) Fix confusing and incorrect inference of PTR_TO_CTX argument type
in BPF global subprogs, from Andrii Nakryiko.
3) Larger batch of riscv BPF JIT improvements and enabling inlining
of the bpf_kptr_xchg() for RV64, from Pu Lehui.
4) Allow skeleton users to change the values of the fields in struct_ops
maps at runtime, from Kui-Feng Lee.
5) Extend the verifier's capabilities of tracking scalars when they
are spilled to stack, especially when the spill or fill is narrowing,
from Maxim Mikityanskiy & Eduard Zingerman.
6) Various BPF selftest improvements to fix errors under gcc BPF backend,
from Jose E. Marchesi.
7) Avoid module loading failure when the module trying to register
a struct_ops has its BTF section stripped, from Geliang Tang.
8) Annotate all kfuncs in .BTF_ids section which eventually allows
for automatic kfunc prototype generation from bpftool, from Daniel Xu.
9) Several updates to the instruction-set.rst IETF standardization
document, from Dave Thaler.
10) Shrink the size of struct bpf_map resp. bpf_array,
from Alexei Starovoitov.
11) Initial small subset of BPF verifier prepwork for sleepable bpf_timer,
from Benjamin Tissoires.
12) Fix bpftool to be more portable to musl libc by using POSIX's
basename(), from Arnaldo Carvalho de Melo.
13) Add libbpf support to gcc in CORE macro definitions,
from Cupertino Miranda.
14) Remove a duplicate type check in perf_event_bpf_event,
from Florian Lehner.
15) Fix bpf_spin_{un,}lock BPF helpers to actually annotate them
with notrace correctly, from Yonghong Song.
16) Replace the deprecated bpf_lpm_trie_key 0-length array with flexible
array to fix build warnings, from Kees Cook.
17) Fix resolve_btfids cross-compilation to non host-native endianness,
from Viktor Malik.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits)
selftests/bpf: Test if shadow types work correctly.
bpftool: Add an example for struct_ops map and shadow type.
bpftool: Generated shadow variables for struct_ops maps.
libbpf: Convert st_ops->data to shadow type.
libbpf: Set btf_value_type_id of struct bpf_map for struct_ops.
bpf: Replace bpf_lpm_trie_key 0-length array with flexible array
bpf, arm64: use bpf_prog_pack for memory management
arm64: patching: implement text_poke API
bpf, arm64: support exceptions
arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT
bpf: add is_async_callback_calling_insn() helper
bpf: introduce in_sleepable() helper
bpf: allow more maps in sleepable bpf programs
selftests/bpf: Test case for lacking CFI stub functions.
bpf: Check cfi_stubs before registering a struct_ops type.
bpf: Clarify batch lookup/lookup_and_delete semantics
bpf, docs: specify which BPF_ABS and BPF_IND fields were zero
bpf, docs: Fix typos in instruction-set.rst
selftests/bpf: update tcp_custom_syncookie to use scalar packet offset
bpf: Shrink size of struct bpf_map/bpf_array.
...
====================
Link: https://lore.kernel.org/r/20240301001625.8800-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With commit 34d21de99c ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation could be done on net core
instead of this driver.
With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.
Remove the allocation in the xfrm driver and leverage the network
core allocation.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
We want to be able to run rtnl_fill_ifinfo() under RCU protection
instead of RTNL in the future.
This patch prepares dev_get_iflink() and nla_put_iflink()
to run either with RTNL or RCU held.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After a couple recent changes in LLVM, there is a warning (or error with
CONFIG_WERROR=y or W=e) from the compile time fortify source routines,
specifically the memset() in copy_to_user_tmpl().
In file included from net/xfrm/xfrm_user.c:14:
...
include/linux/fortify-string.h:438:4: error: call to '__write_overflow_field' declared with 'warning' attribute: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror,-Wattribute-warning]
438 | __write_overflow_field(p_size_field, size);
| ^
1 error generated.
While ->xfrm_nr has been validated against XFRM_MAX_DEPTH when its value
is first assigned in copy_templates() by calling validate_tmpl() first
(so there should not be any issue in practice), LLVM/clang cannot really
deduce that across the boundaries of these functions. Without that
knowledge, it cannot assume that the loop stops before i is greater than
XFRM_MAX_DEPTH, which would indeed result a stack buffer overflow in the
memset().
To make the bounds of ->xfrm_nr clear to the compiler and add additional
defense in case copy_to_user_tmpl() is ever used in a path where
->xfrm_nr has not been properly validated against XFRM_MAX_DEPTH first,
add an explicit bound check and early return, which clears up the
warning.
Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1985
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the XFRM interface drivers.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240208164244.3818498-2-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
commit 0a31bd5f2b ("KMEM_CACHE(): simplify slab cache creation")
introduces a new macro.
Use the new KMEM_CACHE() macro instead of direct kmem_cache_create
to simplify the creation of SLAB caches.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
exit_batch_rtnl() is called while RTNL is held,
and devices to be unregistered can be queued in the dev_kill_list.
This saves one rtnl_lock()/rtnl_unlock() pair per netns
and one unregister_netdevice_many() call.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Antoine Tenart <atenart@kernel.org>
Link: https://lore.kernel.org/r/20240206144313.2050392-17-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Iterate over all SAs in order to fill global IPsec statistics.
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In order to allow drivers to fill all statistics, change the name
of xdo_dev_state_update_curlft to be xdo_dev_state_update_stats.
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This commit marks kfuncs as such inside the .BTF_ids section. The upshot
of these annotations is that we'll be able to automatically generate
kfunc prototypes for downstream users. The process is as follows:
1. In source, use BTF_KFUNCS_START/END macro pair to mark kfuncs
2. During build, pahole injects into BTF a "bpf_kfunc" BTF_DECL_TAG for
each function inside BTF_KFUNCS sets
3. At runtime, vmlinux or module BTF is made available in sysfs
4. At runtime, bpftool (or similar) can look at provided BTF and
generate appropriate prototypes for functions with "bpf_kfunc" tag
To ensure future kfunc are similarly tagged, we now also return error
inside kfunc registration for untagged kfuncs. For vmlinux kfuncs,
we also WARN(), as initcall machinery does not handle errors.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Acked-by: Benjamin Tissoires <bentiss@kernel.org>
Link: https://lore.kernel.org/r/e55150ceecbf0a5d961e608941165c0bee7bc943.1706491398.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This commit aligns with RFC 4301, Section 6, and addresses the
requirement to forward unauthenticated ICMP error messages that do not
match any xfrm policies. It utilizes the ICMP payload as an skb and
performs a reverse lookup. If a policy match is found, forward
the packet.
The ICMP payload typically contains a partial IP packet that is likely
responsible for the error message.
The following error types will be forwarded:
- IPv4 ICMP error types: ICMP_DEST_UNREACH & ICMP_TIME_EXCEEDED
- IPv6 ICMPv6 error types: ICMPV6_DEST_UNREACH, ICMPV6_PKT_TOOBIG,
ICMPV6_TIME_EXCEED
To implement this feature, a reverse lookup has been added to the xfrm
forward path, making use of the ICMP payload as the skb.
To enable this functionality from user space, the XFRM_POLICY_ICMP flag
should be added to the outgoing and forward policies, and the
XFRM_STATE_ICMP flag should be set on incoming states.
e.g.
ip xfrm policy add flag icmp tmpl
ip xfrm policy
src 192.0.2.0/24 dst 192.0.1.0/25
dir out priority 2084302 ptype main flag icmp
ip xfrm state add ...flag icmp
ip xfrm state
root@west:~#ip x s
src 192.1.2.23 dst 192.1.2.45
proto esp spi 0xa7b76872 reqid 16389 mode tunnel
replay-window 32 flag icmp af-unspec
Changes since v5:
- fix return values bool->int, feedback from Steffen
Changes since v4:
- split the series to only ICMP erorr forwarding
Changes since v3: no code chage
- add missing white spaces detected by checkpatch.pl
Changes since v2: reviewed by Steffen Klassert
- user consume_skb instead of kfree_skb for the inner skb
- fixed newskb leaks in error paths
- free the newskb once inner flow is decoded with change due to
commit 7a0207094f ("xfrm: policy: replace session decode with flow dissector")
- if xfrm_decode_session_reverse() on inner payload fails ignore.
do not increment error counter
Changes since v1:
- Move IPv6 variable declaration inside IS_ENABLED(CONFIG_IPV6)
Changes since RFC:
- Fix calculation of ICMPv6 header length
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Commit 23e7b1bfed ("xfrm: Don't accidentally set RTO_ONLINK in
decode_session4()") fixed a problem where decode_session4() could
erroneously set the RTO_ONLINK flag for IPv4 route lookups. This
problem was reintroduced when decode_session4() was modified to
use the flow dissector.
Fix this by clearing again the two low order bits of ->flowi4_tos.
Found by code inspection, compile tested only.
Fixes: 7a0207094f ("xfrm: policy: replace session decode with flow dissector")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This commit adds an unstable kfunc helper to access internal xfrm_state
associated with an SA. This is intended to be used for the upcoming
IPsec pcpu work to assign special pcpu SAs to a particular CPU. In other
words: for custom software RSS.
That being said, the function that this kfunc wraps is fairly generic
and used for a lot of xfrm tasks. I'm sure people will find uses
elsewhere over time.
This commit also adds a corresponding bpf_xdp_xfrm_state_release() kfunc
to release the refcnt acquired by bpf_xdp_get_xfrm_state(). The verifier
will require that all acquired xfrm_state's are released.
Co-developed-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/a29699c42f5fad456b875c98dd11c6afc3ffb707.1702593901.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Current release - regressions:
- sched: fix SKB_NOT_DROPPED_YET splat under debug config
Current release - new code bugs:
- tcp: fix usec timestamps with TCP fastopen
- tcp_sigpool: fix some off by one bugs
- tcp: fix possible out-of-bounds reads in tcp_hash_fail()
- tcp: fix SYN option room calculation for TCP-AO
- bpf: fix compilation error without CGROUPS
- ptp:
- ptp_read() should not release queue
- fix tsevqs corruption
Previous releases - regressions:
- llc: verify mac len before reading mac header
Previous releases - always broken:
- bpf:
- fix check_stack_write_fixed_off() to correctly spill imm
- fix precision tracking for BPF_ALU | BPF_TO_BE | BPF_END
- check map->usercnt after timer->timer is assigned
- dsa: lan9303: consequently nested-lock physical MDIO
- dccp/tcp: call security_inet_conn_request() after setting IP addr
- tg3: fix the TX ring stall due to incorrect full ring handling
- phylink: initialize carrier state at creation
- ice: fix direction of VF rules in switchdev mode
Misc:
- fill in a bunch of missing MODULE_DESCRIPTION()s, more to come
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmVNRnAACgkQMUZtbf5S
IrsYaA/+IUoYi96/oLtvvrET6HIbXeMaLKef0UlytEicQKy8h5EWlhcTZPhQEY0g
dtaKOemQsO0dQTma4eQBiBDHeCeSkitgD9p7fh0i+//QFYWSFqHrBiF2mlToc/ZQ
T1p4BlVL7D2Xsr1Lki93zk+EhFGEy2KroYgrWbZc9TWE5ap9PtSVF9eqeHAVCmZ7
ocre/eo4pqUM9rAHIAyhoL+0xtVQ59dBevbJC0qYcmflhafr82Gtdveo6pBBKuYm
GhwbRrAXER3Neav9c6NHqat4zsMwGpC27SiN9dYWm6dlkeS9U9t2PUu71OkJGVfw
VaSE+utkC/WmzGbuiUIjqQLBrRe372ItHCr78BfSRMshS+RBTHtoK7njeH8Iv67E
RsMeCyVNj9dtGlOQG5JAv8IoCQ1WbMw9B36Yzw3ip/MmDX/ntXz7Dcr4ZMZ6VURS
CHhHFZPnmMykMXkT6SIlxeAg2r8ELtESzkvLimdTVFPAlk3cPkibKJbh3F/tEqXS
PDb3y0uoEgRQBAsWXXx9FQEvv9rTL6YrzbMhmJBIIEoNxppQYQ7FZBJX9utAVp5B
1GdyqhR6IRTaKb9cMRj/K1xPwm2KgCw9xj9pjKdAA7QUMslXbFp8blv1rIkFGshg
hiNXmPcI8wo0j+0lZYktEcIERL5y6c8BgK2NnPU6RULua96tuQ4=
=k6Wk
-----END PGP SIGNATURE-----
Merge tag 'net-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter and bpf.
Current release - regressions:
- sched: fix SKB_NOT_DROPPED_YET splat under debug config
Current release - new code bugs:
- tcp:
- fix usec timestamps with TCP fastopen
- fix possible out-of-bounds reads in tcp_hash_fail()
- fix SYN option room calculation for TCP-AO
- tcp_sigpool: fix some off by one bugs
- bpf: fix compilation error without CGROUPS
- ptp:
- ptp_read() should not release queue
- fix tsevqs corruption
Previous releases - regressions:
- llc: verify mac len before reading mac header
Previous releases - always broken:
- bpf:
- fix check_stack_write_fixed_off() to correctly spill imm
- fix precision tracking for BPF_ALU | BPF_TO_BE | BPF_END
- check map->usercnt after timer->timer is assigned
- dsa: lan9303: consequently nested-lock physical MDIO
- dccp/tcp: call security_inet_conn_request() after setting IP addr
- tg3: fix the TX ring stall due to incorrect full ring handling
- phylink: initialize carrier state at creation
- ice: fix direction of VF rules in switchdev mode
Misc:
- fill in a bunch of missing MODULE_DESCRIPTION()s, more to come"
* tag 'net-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
net: ti: icss-iep: fix setting counter value
ptp: fix corrupted list in ptp_open
ptp: ptp_read should not release queue
net_sched: sch_fq: better validate TCA_FQ_WEIGHTS and TCA_FQ_PRIOMAP
net: kcm: fill in MODULE_DESCRIPTION()
net/sched: act_ct: Always fill offloading tuple iifidx
netfilter: nat: fix ipv6 nat redirect with mapped and scoped addresses
netfilter: xt_recent: fix (increase) ipv6 literal buffer length
ipvs: add missing module descriptions
netfilter: nf_tables: remove catchall element in GC sync path
netfilter: add missing module descriptions
drivers/net/ppp: use standard array-copy-function
net: enetc: shorten enetc_setup_xdp_prog() error message to fit NETLINK_MAX_FMTMSG_LEN
virtio/vsock: Fix uninit-value in virtio_transport_recv_pkt()
r8169: respect userspace disabling IFF_MULTICAST
selftests/bpf: get trusted cgrp from bpf_iter__cgroup directly
bpf: Let verifier consider {task,cgroup} is trusted in bpf_iter_reg
net: phylink: initialize carrier state at creation
test/vsock: add dobule bind connect test
test/vsock: refactor vsock_accept
...
API:
- Add virtual-address based lskcipher interface.
- Optimise ahash/shash performance in light of costly indirect calls.
- Remove ahash alignmask attribute.
Algorithms:
- Improve AES/XTS performance of 6-way unrolling for ppc.
- Remove some uses of obsolete algorithms (md4, md5, sha1).
- Add FIPS 202 SHA-3 support in pkcs1pad.
- Add fast path for single-page messages in adiantum.
- Remove zlib-deflate.
Drivers:
- Add support for S4 in meson RNG driver.
- Add STM32MP13x support in stm32.
- Add hwrng interface support in qcom-rng.
- Add support for deflate algorithm in hisilicon/zip.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmVB3vgACgkQxycdCkmx
i6dsOBAAykbnX8BpnpnOXYywE9ZWrl98rAk51MK0N9olZNfg78zRPIv7fFxFdC20
SDJrDSNPmn0Qvaa5e0EfoAdklsm0k2GkXL/BwPKMKWUsyIoJVYI3WrBMnjBy9xMp
yfME+h0bKoXJCZKnYkIUSGUejmUPSyRlEylrXoFlH/VWYwAaii/x9zwreQoF+0LR
KI24A1q8AYs6Dw9HSfndaAub9GOzrqKYs6fSaMG+77Y4UC5aoi5J9Bp2G3uVyHay
x/0bZtIxKXS9wn+LeG/3GspX23x/I5VwBOdAoMigrYmAIaIg5qgyMszudltTAs4R
zF1Kh7WsnM5+vpnBSeigzo+/GGOU3QTz8y3tBTg+3ZR7GWGOwQLiizhOYqCyOfAH
pIm6c++sZw/OOHiL69Nt4HeLKzGNYYWk3s4X/B/6cqoouPfOsfBaQobZNx9zfy7q
ZNEvSVBjrFX/L6wDSotny1LTWLUNjHbmLaMV5uQZ/SQKEtv19fp2Dl7SsLkHH+3v
ldOAwfoJR6QcSwz3Ez02TUAvQhtP172Hnxi7u44eiZu2aUboLhCFr7aEU6kVdBCx
1rIRVHD1oqlOEDRwPRXzhF3I8R4QDORJIxZ6UUhg7yueuI+XCGDsBNC+LqBrBmSR
IbdjqmSDUBhJyM5yMnt1VFYhqKQ/ZzwZ3JQviwW76Es9pwEIolM=
=IZmR
-----END PGP SIGNATURE-----
Merge tag 'v6.7-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Add virtual-address based lskcipher interface
- Optimise ahash/shash performance in light of costly indirect calls
- Remove ahash alignmask attribute
Algorithms:
- Improve AES/XTS performance of 6-way unrolling for ppc
- Remove some uses of obsolete algorithms (md4, md5, sha1)
- Add FIPS 202 SHA-3 support in pkcs1pad
- Add fast path for single-page messages in adiantum
- Remove zlib-deflate
Drivers:
- Add support for S4 in meson RNG driver
- Add STM32MP13x support in stm32
- Add hwrng interface support in qcom-rng
- Add support for deflate algorithm in hisilicon/zip"
* tag 'v6.7-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (283 commits)
crypto: adiantum - flush destination page before unmapping
crypto: testmgr - move pkcs1pad(rsa,sha3-*) to correct place
Documentation/module-signing.txt: bring up to date
module: enable automatic module signing with FIPS 202 SHA-3
crypto: asymmetric_keys - allow FIPS 202 SHA-3 signatures
crypto: rsa-pkcs1pad - Add FIPS 202 SHA-3 support
crypto: FIPS 202 SHA-3 register in hash info for IMA
x509: Add OIDs for FIPS 202 SHA-3 hash and signatures
crypto: ahash - optimize performance when wrapping shash
crypto: ahash - check for shash type instead of not ahash type
crypto: hash - move "ahash wrapping shash" functions to ahash.c
crypto: talitos - stop using crypto_ahash::init
crypto: chelsio - stop using crypto_ahash::init
crypto: ahash - improve file comment
crypto: ahash - remove struct ahash_request_priv
crypto: ahash - remove crypto_ahash_alignmask
crypto: gcm - stop using alignmask of ahash
crypto: chacha20poly1305 - stop using alignmask of ahash
crypto: ccm - stop using alignmask of ahash
net: ipv6: stop checking crypto_ahash_alignmask
...
BPF kfuncs are meant to be called from BPF programs. Accordingly, most
kfuncs are not called from anywhere in the kernel, which the
-Wmissing-prototypes warning is unhappy about. We've peppered
__diag_ignore_all("-Wmissing-prototypes", ... everywhere kfuncs are
defined in the codebase to suppress this warning.
This patch adds two macros meant to bound one or many kfunc definitions.
All existing kfunc definitions which use these __diag calls to suppress
-Wmissing-prototypes are migrated to use the newly-introduced macros.
A new __diag_ignore_all - for "-Wmissing-declarations" - is added to the
__bpf_kfunc_start_defs macro based on feedback from Andrii on an earlier
version of this patch [0] and another recent mailing list thread [1].
In the future we might need to ignore different warnings or do other
kfunc-specific things. This change will make it easier to make such
modifications for all kfunc defs.
[0]: https://lore.kernel.org/bpf/CAEf4BzaE5dRWtK6RPLnjTW-MW9sx9K3Fn6uwqCTChK2Dcb1Xig@mail.gmail.com/
[1]: https://lore.kernel.org/bpf/ZT+2qCc%2FaXep0%2FLf@krava/
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20231031215625.2343848-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmU8x1wACgkQrB3Eaf9P
W7eP7A//QaUgDJXR5ZrkU9SPp5GXufFew7Zq7F4ODQNR+2l/llNkvfVyp0W0va4D
Keu1pEGTLfis2pc3nR1zezTJU+ZTk0Y+AshqasvYE5uSCcn8BNLaaV+4AmVEHtp5
i3C+guevUnQ66IXhVu/sdPNduDUhMj/tSr3y9pT30oSINT4nsYY2Z8VZezrzdWrr
j+rRuqtrg0avuEkbbuXT5zS396ngEsrHd6RKQVN/eST0U787Xb2D+8hobvW6I9At
tWTa8la/qSTP/ez/m7Z/c6YDFECzRUPrY7S9dW7e3DOpJmAE48RZj/odE9wDZJKW
pu8EaCGGasDnYA3jWD8L65+/Mold9PbUc1m6HBGHYdCFR6UKpD7Eahl6kZ+p/9Wn
TjaVFqBKZkB38H5Opedr8V8vQUR/3dQ8MON+PscnG1SI1ZRGlZLc3+gS3c1zOIVC
iJG2GUaKTySmBEOL6YZbAzKA9nD4JdWMObZvxM8CwQ0q5AWR2Xzv187IVLF+NWMx
aEyyrIpy7BBAEoWjcXmuBceCcdUdsDmiIWplSUbEdTaSJfjFvxezK0GhiPCibUr+
FHvexPF1kuQAaNo+VNVLL5170RmxD1PoIcH3YIy1MLfv7sk+P3nd5EDmXAolrpXh
hFg247odTiRehOF/y3VEUnHTHErakM39jFzkwf43elVlwlnMWmY=
=slHB
-----END PGP SIGNATURE-----
Merge tag 'ipsec-next-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2023-10-28
1) Remove unused function declarations of xfrm4_extract_input and
xfrm6_extract_input. From Yue Haibing.
2) Annotate struct xfrm_sec_ctx with __counted_by.
From Kees Cook.
3) Support GRO decapsulation for ESP in UDP encapsulation.
From Antony Antony et all.
4) Replace the xfrm session decode with flow dissector.
From Florian Westphal.
5) Fix a use after free in __xfrm6_udp_encap_rcv.
6) Fix the layer 4 flowi decoding.
From Florian Westphal.
* tag 'ipsec-next-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: policy: fix layer 4 flowi decoding
xfrm Fix use after free in __xfrm6_udp_encap_rcv.
xfrm: policy: replace session decode with flow dissector
xfrm: move mark and oif flowi decode into common code
xfrm: pass struct net to xfrm_decode_session wrappers
xfrm: Support GRO for IPv6 ESP in UDP encapsulation
xfrm: Support GRO for IPv4 ESP in UDP encapsulation
xfrm: Use the XFRM_GRO to indicate a GRO call on input
xfrm: Annotate struct xfrm_sec_ctx with __counted_by
xfrm: Remove unused function declarations
====================
Link: https://lore.kernel.org/r/20231028084328.3119236-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The commit shipped with two bugs:
fl4->fl4_icmp_type = flkeys->icmp.type;
fl4->fl4_icmp_type = flkeys->icmp.code;
~~~~ should have been "code".
But the more severe bug is that I got fooled by flowi member defines:
fl4_icmp_type, fl4_gre_key and fl4_dport share the same union/address.
Fix typo and make gre/icmp key setting depend on the l4 protocol.
Fixes: 7a0207094f ("xfrm: policy: replace session decode with flow dissector")
Reported-and-tested-by: Antony Antony <antony@phenome.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmUuRZ4ACgkQrB3Eaf9P
W7e7eg//fgQux/sJoq2Qf7T8cvVt3NSpOLXB43NRsIb9nUyMNN/cYTtypYM/+0FM
f0zxmwtUkj9Mx/IL2nNzK/h8lMtJeKfy14tt8lAX2ux5oJltCcTiLgdp3C4HWxOh
9DrXMGIryr0apedkcStSzhRoJBd8giFViAQZE8NwO5EKG8WLJNVvw0YzlNgM0WOk
5y/sN1uXeUmUCYDkOGcdt6yIyV+GIo/nEg/XOY8LkaQDzKveDypydj0dPGL/Dj8U
MhczTCf1WdbTHh0dmITIdX/yZ8/bPNfV3EzAtAaYgceVh1DSq8/F9buRXz2oxxvh
r8Igv+640+SoWEtIsVoXHx7KIZ7LckasrJv1IoKRXGVV25LjnR+bxGVQNyUjdd6+
Cb11Q/m66fp/k7B+++b6eFVlKvr9O4jBogb3kfGpqMiwm6LNM+CJicdAfM8/GEgM
ejnVNSEaChhdgGvrXVbKhI2ACatwbPrt6d4PN0d9fpUbZJnuBZl4QHElLNSBznjO
xJUF8LvPjFGx6vj3YFnBg5f3uNH35nR2jQxPsx+yWdBEFqbxBt7yKr/pftwP1gxD
ZLHF8JbOT4J+96ClDRFnlWFdY+Phq0fg5S5WhPhNGQ4rj9rsjAS8ZMcHWZeA7iZ2
0w5P/DlFmmnw6CC+Xr+wQJlgB1tZ2sdC8nAK6gxQJ3eFVv4wnhk=
=yHtO
-----END PGP SIGNATURE-----
Merge tag 'ipsec-2023-10-17' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2023-10-17
1) Fix a slab-use-after-free in xfrm_policy_inexact_list_reinsert.
From Dong Chenchen.
2) Fix data-races in the xfrm interfaces dev->stats fields.
From Eric Dumazet.
3) Fix a data-race in xfrm_gen_index.
From Eric Dumazet.
4) Fix an inet6_dev refcount underflow.
From Zhang Changzhong.
5) Check the return value of pskb_trim in esp_remove_trailer
for esp4 and esp6. From Ma Ke.
6) Fix a data-race in xfrm_lookup_with_ifid.
From Eric Dumazet.
* tag 'ipsec-2023-10-17' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: fix a data-race in xfrm_lookup_with_ifid()
net: ipv4: fix return value check in esp_remove_trailer
net: ipv6: fix return value check in esp_remove_trailer
xfrm6: fix inet6_dev refcount underflow problem
xfrm: fix a data-race in xfrm_gen_index()
xfrm: interface: use DEV_STATS_INC()
net: xfrm: skip policies marked as dead while reinserting policies
====================
Link: https://lore.kernel.org/r/20231017083723.1364940-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>