2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

460 Commits

Author SHA1 Message Date
Florian Westphal
7613b92b1a xfrm: remove gso_segment indirection from xfrm_mode
These functions are small and we only have versions for tunnel
and transport mode for ipv4 and ipv6 respectively.

Just place the 'transport or tunnel' conditional in the protocol
specific function instead of using an indirection.

Before:
    3226       12       0     3238   net/ipv4/esp4_offload.o
    7004      492       0     7496   net/ipv4/ip_vti.o
    3339       12       0     3351   net/ipv6/esp6_offload.o
   11294      460       0    11754   net/ipv6/ip6_vti.o
    1180       72       0     1252   net/ipv4/xfrm4_mode_beet.o
     428       48       0      476   net/ipv4/xfrm4_mode_transport.o
    1271       48       0     1319   net/ipv4/xfrm4_mode_tunnel.o
    1083       60       0     1143   net/ipv6/xfrm6_mode_beet.o
     172       48       0      220   net/ipv6/xfrm6_mode_ro.o
     429       48       0      477   net/ipv6/xfrm6_mode_transport.o
    1164       48       0     1212   net/ipv6/xfrm6_mode_tunnel.o
15730428  6937008 4046908 26714344   vmlinux

After:
    3461       12       0     3473   net/ipv4/esp4_offload.o
    7000      492       0     7492   net/ipv4/ip_vti.o
    3574       12       0     3586   net/ipv6/esp6_offload.o
   11295      460       0    11755   net/ipv6/ip6_vti.o
    1180       64       0     1244   net/ipv4/xfrm4_mode_beet.o
     171       40       0      211   net/ipv4/xfrm4_mode_transport.o
    1163       40       0     1203   net/ipv4/xfrm4_mode_tunnel.o
    1083       52       0     1135   net/ipv6/xfrm6_mode_beet.o
     172       40       0      212   net/ipv6/xfrm6_mode_ro.o
     172       40       0      212   net/ipv6/xfrm6_mode_transport.o
    1056       40       0     1096   net/ipv6/xfrm6_mode_tunnel.o
15730424  6937008 4046908 26714340   vmlinux

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08 09:14:47 +02:00
Florian Westphal
303c5fab12 xfrm: remove xmit indirection from xfrm_mode
There are only two versions (tunnel and transport). The ip/ipv6 versions
are only differ in sizeof(iphdr) vs ipv6hdr.

Place this in the core and use x->outer_mode->encap type to call the
correct adjustment helper.

Before:
   text   data    bss     dec      filename
15730311  6937008 4046908 26714227 vmlinux

After:
15730428  6937008 4046908 26714344 vmlinux

(about 117 byte increase)

v2: use family from x->outer_mode, not inner

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08 09:14:34 +02:00
Florian Westphal
0c620e97b3 xfrm: remove output indirection from xfrm_mode
Same is input indirection.  Only exception: we need to export
xfrm_outer_mode_output for pktgen.

Increases size of vmlinux by about 163 byte:
Before:
   text    data     bss     dec      filename
15730208  6936948 4046908 26714064   vmlinux

After:
15730311  6937008 4046908 26714227   vmlinux

xfrm_inner_extract_output has no more external callers, make it static.

v2: add IS_ENABLED(IPV6) guard in xfrm6_prepare_output
    add two missing breaks in xfrm_outer_mode_output (Sabrina Dubroca)
    add WARN_ON_ONCE for 'call AF_INET6 related output function, but
    CONFIG_IPV6=n' case.
    make xfrm_inner_extract_output static

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08 09:14:28 +02:00
Florian Westphal
c2d305e510 xfrm: remove input indirection from xfrm_mode
No need for any indirection or abstraction here, both functions
are pretty much the same and quite small, they also have no external
dependencies.

xfrm_prepare_input can then be made static.

With allmodconfig build, size increase of vmlinux is 25 byte:

Before:
   text   data     bss     dec      filename
15730207  6936924 4046908 26714039  vmlinux

After:
15730208  6936948 4046908 26714064 vmlinux

v2: Fix INET_XFRM_MODE_TRANSPORT name in is-enabled test (Sabrina Dubroca)
    change copied comment to refer to transport and network header,
    not skb->{h,nh}, which don't exist anymore. (Sabrina)
    make xfrm_prepare_input static (Eyal Birger)

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08 09:14:21 +02:00
Florian Westphal
b262a69582 xfrm: place af number into xfrm_mode struct
This will be useful to know if we're supposed to decode ipv4 or ipv6.

While at it, make the unregister function return void, all module_exit
functions did just BUG(); there is never a point in doing error checks
if there is no way to handle such error.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08 09:13:46 +02:00
Martin Willi
025c65e119 xfrm: Honor original L3 slave device in xfrmi policy lookup
If an xfrmi is associated to a vrf layer 3 master device,
xfrm_policy_check() fails after traffic decapsulation. The input
interface is replaced by the layer 3 master device, and hence
xfrmi_decode_session() can't match the xfrmi anymore to satisfy
policy checking.

Extend ingress xfrmi lookup to honor the original layer 3 slave
device, allowing xfrm interfaces to operate within a vrf domain.

Fixes: f203b76d78 ("xfrm: Add virtual xfrm interfaces")
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-03-27 16:14:05 +01:00
Cong Wang
dbb2483b2a xfrm: clean up xfrm protocol checks
In commit 6a53b75932 ("xfrm: check id proto in validate_tmpl()")
I introduced a check for xfrm protocol, but according to Herbert
IPSEC_PROTO_ANY should only be used as a wildcard for lookup, so
it should be removed from validate_tmpl().

And, IPSEC_PROTO_ANY is expected to only match 3 IPSec-specific
protocols, this is why xfrm_state_flush() could still miss
IPPROTO_ROUTING, which leads that those entries are left in
net->xfrm.state_all before exit net. Fix this by replacing
IPSEC_PROTO_ANY with zero.

This patch also extracts the check from validate_tmpl() to
xfrm_id_proto_valid() and uses it in parse_ipsecrequest().
With this, no other protocols should be added into xfrm.

Fixes: 6a53b75932 ("xfrm: check id proto in validate_tmpl()")
Reported-by: syzbot+0bf0519d6e0de15914fe@syzkaller.appspotmail.com
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-03-26 08:35:36 +01:00
Thomas Gleixner
671422b220 xfrm: Replace hrtimer tasklet with softirq hrtimer
Switch the timer to HRTIMER_MODE_SOFT, which executed the timer
callback in softirq context and remove the hrtimer_tasklet.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lkml.kernel.org/r/20190301224821.29843-3-bigeasy@linutronix.de
2019-03-22 14:35:31 +01:00
Cong Wang
f75a2804da xfrm: destroy xfrm_state synchronously on net exit path
xfrm_state_put() moves struct xfrm_state to the GC list
and schedules the GC work to clean it up. On net exit call
path, xfrm_state_flush() is called to clean up and
xfrm_flush_gc() is called to wait for the GC work to complete
before exit.

However, this doesn't work because one of the ->destructor(),
ipcomp_destroy(), schedules the same GC work again inside
the GC work. It is hard to wait for such a nested async
callback. This is also why syzbot still reports the following
warning:

 WARNING: CPU: 1 PID: 33 at net/ipv6/xfrm6_tunnel.c:351 xfrm6_tunnel_net_exit+0x2cb/0x500 net/ipv6/xfrm6_tunnel.c:351
 ...
  ops_exit_list.isra.0+0xb0/0x160 net/core/net_namespace.c:153
  cleanup_net+0x51d/0xb10 net/core/net_namespace.c:551
  process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153
  worker_thread+0x143/0x14a0 kernel/workqueue.c:2296
  kthread+0x357/0x430 kernel/kthread.c:246
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352

In fact, it is perfectly fine to bypass GC and destroy xfrm_state
synchronously on net exit call path, because it is in process context
and doesn't need a work struct to do any blocking work.

This patch introduces xfrm_state_put_sync() which simply bypasses
GC, and lets its callers to decide whether to use this synchronous
version. On net exit path, xfrm_state_fini() and
xfrm6_tunnel_net_exit() use it. And, as ipcomp_destroy() itself is
blocking, it can use xfrm_state_put_sync() directly too.

Also rename xfrm_state_gc_destroy() to ___xfrm_state_destroy() to
reflect this change.

Fixes: b48c05ab5d ("xfrm: Fix warning in xfrm6_tunnel_net_exit.")
Reported-and-tested-by: syzbot+e9aebef558e3ed673934@syzkaller.appspotmail.com
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-02-05 06:29:20 +01:00
David S. Miller
2be09de7d6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Lots of conflicts, by happily all cases of overlapping
changes, parallel adds, things of that nature.

Thanks to Stephen Rothwell, Saeed Mahameed, and others
for their guidance in these resolutions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 11:53:36 -08:00
Florian Westphal
4165079ba3 net: switch secpath to use skb extension infrastructure
Remove skb->sp and allocate secpath storage via extension
infrastructure.  This also reduces sk_buff by 8 bytes on x86_64.

Total size of allyesconfig kernel is reduced slightly, as there is
less inlined code (one conditional atomic op instead of two on
skb_clone).

No differences in throughput in following ipsec performance tests:
- transport mode with aes on 10GB link
- tunnel mode between two network namespaces with aes and null cipher

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 11:21:38 -08:00
Florian Westphal
26912e3756 xfrm: use secpath_exist where applicable
Will reduce noise when skb->sp is removed later in this series.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 11:21:37 -08:00
Florian Westphal
2294be0f11 net: use skb_sec_path helper in more places
skb_sec_path gains 'const' qualifier to avoid
xt_policy.c: 'skb_sec_path' discards 'const' qualifier from pointer target type

same reasoning as previous conversions: Won't need to touch these
spots anymore when skb->sp is removed.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 11:21:37 -08:00
Florian Westphal
7af8f4ca31 net: move secpath_exist helper to sk_buff.h
Future patch will remove skb->sp pointer.
To reduce noise in those patches, move existing helper to
sk_buff and use it in more places to ease skb->sp replacement later.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 11:21:37 -08:00
Florian Westphal
0ca64da128 xfrm: change secpath_set to return secpath struct, not error value
It can only return 0 (success) or -ENOMEM.
Change return value to a pointer to secpath struct.

This avoids direct access to skb->sp:

err = secpath_set(skb);
if (!err) ..
skb->sp-> ...

Becomes:
sp = secpath_set(skb)
if (!sp) ..
sp-> ..

This reduces noise in followup patch which is going to remove skb->sp.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 11:21:37 -08:00
Colin Ian King
77990464bb xfrm: clean an indentation issue, remove a space
Trivial fix to clean up indentation issue, remove an extraneous
space.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-12-10 12:30:40 +01:00
Mathias Krause
4a135e5389 xfrm_user: fix freeing of xfrm states on acquire
Commit 565f0fa902 ("xfrm: use a dedicated slab cache for struct
xfrm_state") moved xfrm state objects to use their own slab cache.
However, it missed to adapt xfrm_user to use this new cache when
freeing xfrm states.

Fix this by introducing and make use of a new helper for freeing
xfrm_state objects.

Fixes: 565f0fa902 ("xfrm: use a dedicated slab cache for struct xfrm_state")
Reported-by: Pan Bian <bianpan2016@163.com>
Cc: <stable@vger.kernel.org> # v4.18+
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-11-23 07:51:32 +01:00
Florian Westphal
9cf545ebd5 xfrm: policy: store inexact policies in a tree ordered by destination address
This adds inexact lists per destination network, stored in a search tree.

Inexact lookups now return two 'candidate lists', the 'any' policies
('any' destionations), and a list of policies that share same
daddr/prefix.

Next patch will add a second search tree for 'saddr:any' policies
so we can avoid placing those on the 'any:any' list too.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-11-09 11:58:07 +01:00
Florian Westphal
6be3b0db6d xfrm: policy: add inexact policy search tree infrastructure
At this time inexact policies are all searched in-order until the first
match is found.  After removal of the flow cache, this resolution has
to be performed for every packetm resulting in major slowdown when
number of inexact policies is high.

This adds infrastructure to later sort inexact policies into a tree.
This only introduces a single class: any:any.

Next patch will add a search tree to pre-sort policies that
have a fixed daddr/prefixlen, so in this patch the any:any class
will still be used for all policies.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-11-09 11:57:59 +01:00
Florian Westphal
24969facd7 xfrm: policy: store inexact policies in an rhashtable
Switch packet-path lookups for inexact policies to rhashtable.

In this initial version, we now no longer need to search policies with
non-matching address family and type.

Next patch will add the if_id as well so lookups from the xfrm interface
driver only need to search inexact policies for that device.

Future patches will augment the hlist in each rhash bucket with a tree
and pre-sort policies according to daddr/prefix.

A single rhashtable is used.  In order to avoid a full rhashtable walk on
netns exit, the bins get placed on a pernet list, i.e. we add almost no
cost for network namespaces that had no xfrm policies.

The inexact lists are kept in place, and policies are added to both the
per-rhash-inexact list and a pernet one.

The latter is needed for the control plane to handle migrate -- these
requests do not consider the if_id, so if we'd remove the inexact_list
now we would have to search all hash buckets and then figure
out which matching policy candidate is the most recent one -- this appears
a bit harder than just keeping the 'old' inexact list for this purpose.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-11-09 11:57:47 +01:00
Linus Torvalds
9a76aba02a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   - Gustavo A. R. Silva keeps working on the implicit switch fallthru
     changes.

   - Support 802.11ax High-Efficiency wireless in cfg80211 et al, From
     Luca Coelho.

   - Re-enable ASPM in r8169, from Kai-Heng Feng.

   - Add virtual XFRM interfaces, which avoids all of the limitations of
     existing IPSEC tunnels. From Steffen Klassert.

   - Convert GRO over to use a hash table, so that when we have many
     flows active we don't traverse a long list during accumluation.

   - Many new self tests for routing, TC, tunnels, etc. Too many
     contributors to mention them all, but I'm really happy to keep
     seeing this stuff.

   - Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu.

   - Lots of cleanups and fixes in L2TP code from Guillaume Nault.

   - Add IPSEC offload support to netdevsim, from Shannon Nelson.

   - Add support for slotting with non-uniform distribution to netem
     packet scheduler, from Yousuk Seung.

   - Add UDP GSO support to mlx5e, from Boris Pismenny.

   - Support offloading of Team LAG in NFP, from John Hurley.

   - Allow to configure TX queue selection based upon RX queue, from
     Amritha Nambiar.

   - Support ethtool ring size configuration in aquantia, from Anton
     Mikaev.

   - Support DSCP and flowlabel per-transport in SCTP, from Xin Long.

   - Support list based batching and stack traversal of SKBs, this is
     very exciting work. From Edward Cree.

   - Busyloop optimizations in vhost_net, from Toshiaki Makita.

   - Introduce the ETF qdisc, which allows time based transmissions. IGB
     can offload this in hardware. From Vinicius Costa Gomes.

   - Add parameter support to devlink, from Moshe Shemesh.

   - Several multiplication and division optimizations for BPF JIT in
     nfp driver, from Jiong Wang.

   - Lots of prepatory work to make more of the packet scheduler layer
     lockless, when possible, from Vlad Buslov.

   - Add ACK filter and NAT awareness to sch_cake packet scheduler, from
     Toke Høiland-Jørgensen.

   - Support regions and region snapshots in devlink, from Alex Vesker.

   - Allow to attach XDP programs to both HW and SW at the same time on
     a given device, with initial support in nfp. From Jakub Kicinski.

   - Add TLS RX offload and support in mlx5, from Ilya Lesokhin.

   - Use PHYLIB in r8169 driver, from Heiner Kallweit.

   - All sorts of changes to support Spectrum 2 in mlxsw driver, from
     Ido Schimmel.

   - PTP support in mv88e6xxx DSA driver, from Andrew Lunn.

   - Make TCP_USER_TIMEOUT socket option more accurate, from Jon
     Maxwell.

   - Support for templates in packet scheduler classifier, from Jiri
     Pirko.

   - IPV6 support in RDS, from Ka-Cheong Poon.

   - Native tproxy support in nf_tables, from Máté Eckl.

   - Maintain IP fragment queue in an rbtree, but optimize properly for
     in-order frags. From Peter Oskolkov.

   - Improvde handling of ACKs on hole repairs, from Yuchung Cheng"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits)
  bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT"
  hv/netvsc: Fix NULL dereference at single queue mode fallback
  net: filter: mark expected switch fall-through
  xen-netfront: fix warn message as irq device name has '/'
  cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0
  net: dsa: mv88e6xxx: missing unlock on error path
  rds: fix building with IPV6=m
  inet/connection_sock: prefer _THIS_IP_ to current_text_addr
  net: dsa: mv88e6xxx: bitwise vs logical bug
  net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd()
  ieee802154: hwsim: using right kind of iteration
  net: hns3: Add vlan filter setting by ethtool command -K
  net: hns3: Set tx ring' tc info when netdev is up
  net: hns3: Remove tx ring BD len register in hns3_enet
  net: hns3: Fix desc num set to default when setting channel
  net: hns3: Fix for phy link issue when using marvell phy driver
  net: hns3: Fix for information of phydev lost problem when down/up
  net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero
  net: hns3: Add support for serdes loopback selftest
  bnxt_en: take coredump_record structure off stack
  ...
2018-08-15 15:04:25 -07:00
Benedict Wong
bc56b33404 xfrm: Remove xfrmi interface ID from flowi
In order to remove performance impact of having the extra u32 in every
single flowi, this change removes the flowi_xfrm struct, prefering to
take the if_id as a method parameter where needed.

In the inbound direction, if_id is only needed during the
__xfrm_check_policy() function, and the if_id can be determined at that
point based on the skb. As such, xfrmi_decode_session() is only called
with the skb in __xfrm_check_policy().

In the outbound direction, the only place where if_id is needed is the
xfrm_lookup() call in xfrmi_xmit2(). With this change, the if_id is
directly passed into the xfrm_lookup_with_ifid() call. All existing
callers can still call xfrm_lookup(), which uses a default if_id of 0.

This change does not change any behavior of XFRMIs except for improving
overall system performance via flowi size reduction.

This change has been tested against the Android Kernel Networking Tests:

https://android.googlesource.com/kernel/tests/+/master/net/test

Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-07-20 10:14:41 +02:00
Arnd Bergmann
03dc7a35fc ipv6: xfrm: use 64-bit timestamps
get_seconds() is deprecated because it can overflow on 32-bit
architectures.  For the xfrm_state->lastused member, we treat the data
as a 64-bit number already, so we just need to use the right accessor
that works on both 32-bit and 64-bit machines.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-07-11 15:26:35 +02:00
Florian Westphal
e4db5b61c5 xfrm: policy: remove pcpu policy cache
Kristian Evensen says:
  In a project I am involved in, we are running ipsec (Strongswan) on
  different mt7621-based routers. Each router is configured as an
  initiator and has around ~30 tunnels to different responders (running
  on misc. devices). Before the flow cache was removed (kernel 4.9), we
  got a combined throughput of around 70Mbit/s for all tunnels on one
  router. However, we recently switched to kernel 4.14 (4.14.48), and
  the total throughput is somewhere around 57Mbit/s (best-case). I.e., a
  drop of around 20%. Reverting the flow cache removal restores, as
  expected, performance levels to that of kernel 4.9.

When pcpu xdst exists, it has to be validated first before it can be
used.

A negative hit thus increases cost vs. no-cache.

As number of tunnels increases, hit rate decreases so this pcpu caching
isn't a viable strategy.

Furthermore, the xdst cache also needs to run with BH off, so when
removing this the bh disable/enable pairs can be removed too.

Kristian tested a 4.14.y backport of this change and reported
increased performance:

  In our tests, the throughput reduction has been reduced from around -20%
  to -5%. We also see that the overall throughput is independent of the
  number of tunnels, while before the throughput was reduced as the number
  of tunnels increased.

Reported-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-06-25 17:46:06 +02:00
Steffen Klassert
f203b76d78 xfrm: Add virtual xfrm interfaces
This patch adds support for virtual xfrm interfaces.
Packets that are routed through such an interface
are guaranteed to be IPsec transformed or dropped.
It is a generic virtual interface that ensures IPsec
transformation, no need to know what happens behind
the interface. This means that we can tunnel IPv4 and
IPv6 through the same interface and support all xfrm
modes (tunnel, transport and beet) on it.

Co-developed-by: Lorenzo Colitti <lorenzo@google.com>
Co-developed-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Benedict Wong <benedictwong@google.com>
Tested-by: Antony Antony <antony@phenome.org>
Reviewed-by: Eyal Birger <eyal.birger@gmail.com>
2018-06-23 16:07:25 +02:00
Steffen Klassert
7e6526404a xfrm: Add a new lookup key to match xfrm interfaces.
This patch adds the xfrm interface id as a lookup key
for xfrm states and policies. With this we can assign
states and policies to virtual xfrm interfaces.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Acked-by: Benedict Wong <benedictwong@google.com>
Tested-by: Benedict Wong <benedictwong@google.com>
Tested-by: Antony Antony <antony@phenome.org>
Reviewed-by: Eyal Birger <eyal.birger@gmail.com>
2018-06-23 16:07:15 +02:00
Steffen Klassert
9b42c1f179 xfrm: Extend the output_mark to support input direction and masking.
We already support setting an output mark at the xfrm_state,
unfortunately this does not support the input direction and
masking the marks that will be applied to the skb. This change
adds support applying a masked value in both directions.

The existing XFRMA_OUTPUT_MARK number is reused for this purpose
and as it is now bi-directional, it is renamed to XFRMA_SET_MARK.

An additional XFRMA_SET_MARK_MASK attribute is added for setting the
mask. If the attribute mask not provided, it is set to 0xffffffff,
keeping the XFRMA_OUTPUT_MARK existing 'full mask' semantics.

Co-developed-by: Tobias Brunner <tobias@strongswan.org>
Co-developed-by: Eyal Birger <eyal.birger@gmail.com>
Co-developed-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
2018-06-23 16:06:57 +02:00
Richard Guy Briggs
f7859590d9 audit: eliminate audit_enabled magic number comparison
Remove comparison of audit_enabled to magic numbers outside of audit.

Related: https://github.com/linux-audit/audit-kernel/issues/86

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-06-19 10:43:55 -04:00
Linus Torvalds
8b5c6a3a49 audit/stable-4.18 PR 20180605
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEcQCq365ubpQNLgrWVeRaWujKfIoFAlsXFUEUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQVeRaWujKfIoomg//eRNpc6x9kxTijN670AC2uD0CBTlZ
 2z6mHuJaOhG8bTxjZxQfUBoo6/eZJ2YC1yq6ornGFNzw4sfKsR/j86ujJim2HAmo
 opUhziq3SILGEvjsxfPkREe/wb49jy0AA/WjZqciitB1ig8Hz7xzqi0lpNaEspFh
 QJFB6XXkojWGFGrRzruAVJnPS+pDWoTQR0qafs3JWKnpeinpOdZnl1hPsysAEHt5
 Ag8o4qS/P9xJM0khi7T+jWECmTyT/mtWqEtFcZ0o+JLOgt/EMvNX6DO4ETDiYRD2
 mVChga9x5r78bRgNy2U8IlEWWa76WpcQAEODvhzbijX4RxMAmjsmLE+e+udZSnMZ
 eCITl2f7ExxrL5SwNFC/5h7pAv0RJ+SOC19vcyeV4JDlQNNVjUy/aNKv5baV0aeg
 EmkeobneMWxqHx52aERz8RF1in5pT8gLOYoYnWfNpcDEmjLrwhuZLX2asIzUEqrS
 SoPJ8hxIDCxceHOWIIrz5Dqef7x28Dyi46w3QINC8bSy2RnR/H3q40DRegvXOGiS
 9WcbbwbhnM4Kau413qKicGCvdqTVYdeyZqo7fVelSciD139Vk7pZotyom4MuU25p
 fIyGfXa8/8gkl7fZ+HNkZbba0XWNfAZt//zT095qsp3CkhVnoybwe6OwG1xRqErq
 W7OOQbS7vvN/KGo=
 =10u6
 -----END PGP SIGNATURE-----

Merge tag 'audit-pr-20180605' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:
 "Another reasonable chunk of audit changes for v4.18, thirteen patches
  in total.

  The thirteen patches can mostly be broken down into one of four
  categories: general bug fixes, accessor functions for audit state
  stored in the task_struct, negative filter matches on executable
  names, and extending the (relatively) new seccomp logging knobs to the
  audit subsystem.

  The main driver for the accessor functions from Richard are the
  changes we're working on to associate audit events with containers,
  but I think they have some standalone value too so I figured it would
  be good to get them in now.

  The seccomp/audit patches from Tyler apply the seccomp logging
  improvements from a few releases ago to audit's seccomp logging;
  starting with this patchset the changes in
  /proc/sys/kernel/seccomp/actions_logged should apply to both the
  standard kernel logging and audit.

  As usual, everything passes the audit-testsuite and it happens to
  merge cleanly with your tree"

[ Heh, except it had trivial merge conflicts with the SELinux tree that
  also came in from Paul   - Linus ]

* tag 'audit-pr-20180605' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: Fix wrong task in comparison of session ID
  audit: use existing session info function
  audit: normalize loginuid read access
  audit: use new audit_context access funciton for seccomp_actions_logged
  audit: use inline function to set audit context
  audit: use inline function to get audit context
  audit: convert sessionid unset to a macro
  seccomp: Don't special case audited processes when logging
  seccomp: Audit attempts to modify the actions_logged sysctl
  seccomp: Configurable separator for the actions_logged string
  seccomp: Separate read and write code for actions_logged sysctl
  audit: allow not equal op for audit by executable
  audit: add syscall information to FEATURE_CHANGE records
2018-06-06 16:34:00 -07:00
Richard Guy Briggs
cdfb6b341f audit: use inline function to get audit context
Recognizing that the audit context is an internal audit value, use an
access function to retrieve the audit context pointer for the task
rather than reaching directly into the task struct to get it.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: merge fuzz in auditsc.c and selinuxfs.c, checkpatch.pl fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-05-14 17:24:18 -04:00
Richard Guy Briggs
f0b752168d audit: convert sessionid unset to a macro
Use a macro, "AUDIT_SID_UNSET", to replace each instance of
initialization and comparison to an audit session ID.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-05-14 15:56:35 -04:00
Steffen Klassert
b48c05ab5d xfrm: Fix warning in xfrm6_tunnel_net_exit.
We need to make sure that all states are really deleted
before we check that the state lists are empty. Otherwise
we trigger a warning.

Fixes: baeb0dbbb5 ("xfrm6_tunnel: exit_net cleanup check added")
Reported-and-tested-by:syzbot+777bf170a89e7b326405@syzkaller.appspotmail.com
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-04-16 07:50:09 +02:00
Kirill Tkhai
e9a441b6e7 xfrm: Register xfrm_dev_notifier in appropriate place
Currently, driver registers it from pernet_operations::init method,
and this breaks modularity, because initialization of net namespace
and netdevice notifiers are orthogonal actions. We don't have
per-namespace netdevice notifiers; all of them are global for all
devices in all namespaces.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:59:23 -04:00
Stephen Hemminger
82695b30ff inet: whitespace cleanup
Ran simple script to find/remove trailing whitespace and blank lines
at EOF because that kind of stuff git whines about and editors leave
behind.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 11:43:28 -05:00
Yossef Efraim
50bd870a9e xfrm: Add ESN support for IPSec HW offload
This patch adds ESN support to IPsec device offload.
Adding new xfrm device operation to synchronize device ESN.

Signed-off-by: Yossef Efraim <yossefe@mellanox.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-01-18 10:42:59 +01:00
David S. Miller
6bb8824732 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
net/ipv6/ip6_gre.c is a case of parallel adds.

include/trace/events/tcp.h is a little bit more tricky.  The removal
of in-trace-macro ifdefs in 'net' paralleled with moving
show_tcp_state_name and friends over to include/trace/events/sock.h
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-29 15:42:26 -05:00
Shannon Nelson
7f05b467a7 xfrm: check for xdo_dev_state_free
The current XFRM code assumes that we've implemented the
xdo_dev_state_free() callback, even if it is meaningless to the driver.
This patch adds a check for it before calling, as done in other APIs,
to prevent a NULL function pointer kernel crash.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-21 08:17:48 +01:00
Steffen Klassert
2271d5190e xfrm: Allow IPsec GSO with software crypto for local sockets.
With support of async crypto operations in the GSO codepath
we have everything in place to allow GSO for local sockets.
This patch enables the GSO codepath.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-20 10:41:48 +01:00
Steffen Klassert
f53c723902 net: Add asynchronous callbacks for xfrm on layer 2.
This patch implements asynchronous crypto callbacks
and a backlog handler that can be used when IPsec
is done at layer 2 in the TX path. It also extends
the skb validate functions so that we can update
the driver transmit return codes based on async
crypto operation or to indicate that we queued the
packet in a backlog queue.

Joint work with: Aviv Heller <avivh@mellanox.com>

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-20 10:41:36 +01:00
Steffen Klassert
3dca3f38cf xfrm: Separate ESP handling from segmentation for GRO packets.
We change the ESP GSO handlers to only segment the packets.
The ESP handling and encryption is defered to validate_xmit_xfrm()
where this is done for non GRO packets too. This makes the code
more robust and prepares for asynchronous crypto handling.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-20 10:41:31 +01:00
Herbert Xu
acf568ee85 xfrm: Reinject transport-mode packets through tasklet
This is an old bugbear of mine:

https://www.mail-archive.com/netdev@vger.kernel.org/msg03894.html

By crafting special packets, it is possible to cause recursion
in our kernel when processing transport-mode packets at levels
that are only limited by packet size.

The easiest one is with DNAT, but an even worse one is where
UDP encapsulation is used in which case you just have to insert
an UDP encapsulation header in between each level of recursion.

This patch avoids this problem by reinjecting tranport-mode packets
through a tasklet.

Fixes: b05e106698 ("[IPV4/6]: Netfilter IPsec input hooks")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-19 08:23:21 +01:00
David Miller
0f6c480f23 xfrm: Move dst->path into struct xfrm_dst
The first member of an IPSEC route bundle chain sets it's dst->path to
the underlying ipv4/ipv6 route that carries the bundle.

Stated another way, if one were to follow the xfrm_dst->child chain of
the bundle, the final non-NULL pointer would be the path and point to
either an ipv4 or an ipv6 route.

This is largely used to make sure that PMTU events propagate down to
the correct ipv4 or ipv6 route.

When we don't have the top of an IPSEC bundle 'dst->path == dst'.

Move it down into xfrm_dst and key off of dst->xfrm.

Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
2017-11-30 09:54:26 -05:00
David Miller
b6ca8bd5a9 xfrm: Move child route linkage into xfrm_dst.
XFRM bundle child chains look like this:

	xdst1 --> xdst2 --> xdst3 --> path_dst

All of xdstN are xfrm_dst objects and xdst->u.dst.xfrm is non-NULL.
The final child pointer in the chain, here called 'path_dst', is some
other kind of route such as an ipv4 or ipv6 one.

The xfrm output path pops routes, one at a time, via the child
pointer, until we hit one which has a dst->xfrm pointer which
is NULL.

We can easily preserve the above mechanisms with child sitting
only in the xfrm_dst structure.  All children in the chain
before we break out of the xfrm_output() loop have dst->xfrm
non-NULL and are therefore xfrm_dst objects.

Since we break out of the loop when we find dst->xfrm NULL, we
will not try to dereference 'dst' as if it were an xfrm_dst.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30 09:54:26 -05:00
David Miller
45b018bedd ipsec: Create and use new helpers for dst child access.
This will make a future change moving the dst->child pointer less
invasive.

Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
2017-11-30 09:54:26 -05:00
David Miller
b92cf4aab8 net: Create and use new helper xfrm_dst_child().
Only IPSEC routes have a non-NULL dst->child pointer.  And IPSEC
routes are identified by a non-NULL dst->xfrm pointer.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30 09:54:25 -05:00
David S. Miller
2a171788ba Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Files removed in 'net-next' had their license header updated
in 'net'.  We take the remove from 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:26:51 +09:00
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Alexey Dobriyan
5e708e47c4 xfrm: make xfrm_replay_state_esn_len() return unsigned int
Replay detection bitmaps can't have negative length.

Comparisons with nla_len() are left signed just in case negative value
can sneak in there.

Propagate unsignedness for code size savings:

	add/remove: 0/0 grow/shrink: 0/5 up/down: 0/-38 (-38)
	function                                     old     new   delta
	xfrm_state_construct                        1802    1800      -2
	xfrm_update_ae_params                        295     289      -6
	xfrm_state_migrate                          1345    1339      -6
	xfrm_replay_notify_esn                       349     337     -12
	xfrm_replay_notify_bmp                       345     333     -12

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-09-25 07:14:06 +02:00
Alexey Dobriyan
1bd963a72e xfrm: make xfrm_alg_auth_len() return unsigned int
Key lengths can't be negative.

Comparison with nla_len() is left signed just in case negative value
can sneak in there.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-09-25 07:14:06 +02:00
Alexey Dobriyan
06cd22f830 xfrm: make xfrm_alg_len() return unsigned int
Key lengths can't be negative.

Comparison with nla_len() is left signed just in case negative value
can sneak in there.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-09-25 07:14:06 +02:00
Alexey Dobriyan
373b8eeb0c xfrm: make aead_len() return unsigned int
Key lengths can't be negative.

Comparison with nla_len() is left signed just in case negative value
can sneak in there.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-09-25 07:14:06 +02:00
Yossi Kuperman
47ebcc0bb1 xfrm: Add support for network devices capable of removing the ESP trailer
In conjunction with crypto offload [1], removing the ESP trailer by
hardware can potentially improve the performance by avoiding (1) a
cache miss incurred by reading the nexthdr field and (2) the necessity
to calculate the csum value of the trailer in order to keep skb->csum
valid.

This patch introduces the changes to the xfrm stack and merely serves
as an infrastructure. Subsequent patch to mlx5 driver will put this to
a good use.

[1] https://www.mail-archive.com/netdev@vger.kernel.org/msg175733.html

Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-31 09:04:03 +02:00
Lorenzo Colitti
077fbac405 net: xfrm: support setting an output mark.
On systems that use mark-based routing it may be necessary for
routing lookups to use marks in order for packets to be routed
correctly. An example of such a system is Android, which uses
socket marks to route packets via different networks.

Currently, routing lookups in tunnel mode always use a mark of
zero, making routing incorrect on such systems.

This patch adds a new output_mark element to the xfrm state and
a corresponding XFRMA_OUTPUT_MARK netlink attribute. The output
mark differs from the existing xfrm mark in two ways:

1. The xfrm mark is used to match xfrm policies and states, while
   the xfrm output mark is used to set the mark (and influence
   the routing) of the packets emitted by those states.
2. The existing mark is constrained to be a subset of the bits of
   the originating socket or transformed packet, but the output
   mark is arbitrary and depends only on the state.

The use of a separate mark provides additional flexibility. For
example:

- A packet subject to two transforms (e.g., transport mode inside
  tunnel mode) can have two different output marks applied to it,
  one for the transport mode SA and one for the tunnel mode SA.
- On a system where socket marks determine routing, the packets
  emitted by an IPsec tunnel can be routed based on a mark that
  is determined by the tunnel, not by the marks of the
  unencrypted packets.
- Support for setting the output marks can be introduced without
  breaking any existing setups that employ both mark-based
  routing and xfrm tunnel mode. Simply changing the code to use
  the xfrm mark for routing output packets could xfrm mark could
  change behaviour in a way that breaks these setups.

If the output mark is unspecified or set to zero, the mark is not
set or changed.

Tested: make allyesconfig; make -j64
Tested: https://android-review.googlesource.com/452776
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-11 07:03:00 +02:00
Steffen Klassert
f70f250a77 net: Allow IPsec GSO for local sockets
This patch allows local sockets to make use of XFRM GSO code path.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
2017-08-02 11:45:48 +02:00
Ilan Tayari
ffdb5211da xfrm: Auto-load xfrm offload modules
IPSec crypto offload depends on the protocol-specific
offload module (such as esp_offload.ko).

When the user installs an SA with crypto-offload, load
the offload module automatically, in the same way
that the protocol module is loaded (such as esp.ko)

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-02 11:00:15 +02:00
Florian Westphal
ec30d78c14 xfrm: add xdst pcpu cache
retain last used xfrm_dst in a pcpu cache.
On next request, reuse this dst if the policies are the same.

The cache will not help with strict RR workloads as there is no hit.

The cache packet-path part is reasonably small, the notifier part is
needed so we do not add long hangs when a device is dismantled but some
pcpu xdst still holds a reference, there are also calls to the flush
operation when userspace deletes SAs so modules can be removed
(there is no hit.

We need to run the dst_release on the correct cpu to avoid races with
packet path.  This is done by adding a work_struct for each cpu and then
doing the actual test/release on each affected cpu via schedule_work_on().

Test results using 4 network namespaces and null encryption:

ns1           ns2          -> ns3           -> ns4
netperf -> xfrm/null enc   -> xfrm/null dec -> netserver

what                    TCP_STREAM      UDP_STREAM      UDP_RR
Flow cache:             14644.61        294.35          327231.64
No flow cache:		14349.81	242.64		202301.72
Pcpu cache:		14629.70	292.21		205595.22

UDP tests used 64byte packets, tests ran for one minute each,
value is average over ten iterations.

'Flow cache' is 'net-next', 'No flow cache' is net-next plus this
series but without this patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:13:41 -07:00
Florian Westphal
09c7570480 xfrm: remove flow cache
After rcu conversions performance degradation in forward tests isn't that
noticeable anymore.

See next patch for some numbers.

A followup patcg could then also remove genid from the policies
as we do not cache bundles anymore.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:13:41 -07:00
Reshetova, Elena
55eabed60a net, xfrm: convert sec_path.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 22:35:18 +01:00
Reshetova, Elena
850a6212c6 net, xfrm: convert xfrm_policy.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 22:35:18 +01:00
Reshetova, Elena
88755e9c7c net, xfrm: convert xfrm_state.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 22:35:18 +01:00
David S. Miller
b079115937 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
A set of overlapping changes in macvlan and the rocker
driver, nothing serious.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-30 12:43:08 -04:00
David S. Miller
93bbbfbb4a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2017-06-23

1) Use memdup_user to spmlify xfrm_user_policy.
   From Geliang Tang.

2) Make xfrm_dev_register static to silence a sparse warning.
   From Wei Yongjun.

3) Use crypto_memneq to check the ICV in the AH protocol.
   From Sabrina Dubroca.

4) Remove some unused variables in esp6.
   From Stephen Hemminger.

5) Extend XFRM MIGRATE to allow to change the UDP encapsulation port.
   From Antony Antony.

6) Include the UDP encapsulation port to km_migrate announcements.
   From Antony Antony.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23 14:17:31 -04:00
Antony Antony
8bafd73093 xfrm: add UDP encapsulation port in migrate message
Add XFRMA_ENCAP, UDP encapsulation port, to km_migrate announcement
to userland. Only add if XFRMA_ENCAP was in user migrate request.

Signed-off-by: Antony Antony <antony@phenome.org>
Reviewed-by: Richard Guy Briggs <rgb@tricolour.ca>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-06-07 08:35:54 +02:00
Antony Antony
4ab47d47af xfrm: extend MIGRATE with UDP encapsulation port
Add UDP encapsulation port to XFRM_MSG_MIGRATE using an optional
netlink attribute XFRMA_ENCAP.

The devices that support IKE MOBIKE extension (RFC-4555 Section 3.8)
could go to sleep for a few minutes and wake up. When it wake up the
NAT mapping could have expired, the device send a MOBIKE UPDATE_SA
message to migrate the IPsec SA. The change could be a change UDP
encapsulation port, IP address, or both.

Reported-by: Paul Wouters <pwouters@redhat.com>
Signed-off-by: Antony Antony <antony@phenome.org>
Reviewed-by: Richard Guy Briggs <rgb@tricolour.ca>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-06-07 08:25:58 +02:00
Hangbin Liu
b81f884a54 xfrm: fix xfrm_dev_event() missing when compile without CONFIG_XFRM_OFFLOAD
In commit d77e38e612 ("xfrm: Add an IPsec hardware offloading API") we
make xfrm_device.o only compiled when enable option CONFIG_XFRM_OFFLOAD.
But this will make xfrm_dev_event() missing if we only enable default XFRM
options.

Then if we set down and unregister an interface with IPsec on it. there
will no xfrm_garbage_collect(), which will cause dev usage count hold and
get error like:

unregister_netdevice: waiting for <dev> to become free. Usage count = 4

Fixes: d77e38e612 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-06-07 08:16:27 +02:00
Sabrina Dubroca
9b3eb54106 xfrm: fix stack access out of bounds with CONFIG_XFRM_SUB_POLICY
When CONFIG_XFRM_SUB_POLICY=y, xfrm_dst stores a copy of the flowi for
that dst. Unfortunately, the code that allocates and fills this copy
doesn't care about what type of flowi (flowi, flowi4, flowi6) gets
passed. In multiple code paths (from raw_sendmsg, from TCP when
replying to a FIN, in vxlan, geneve, and gre), the flowi that gets
passed to xfrm is actually an on-stack flowi4, so we end up reading
stuff from the stack past the end of the flowi4 struct.

Since xfrm_dst->origin isn't used anywhere following commit
ca116922af ("xfrm: Eliminate "fl" and "pol" args to
xfrm_bundle_ok()."), just get rid of it.  xfrm_dst->partner isn't used
either, so get rid of that too.

Fixes: 9d6ec93801 ("ipv4: Use flowi4 in public route lookup interfaces.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-05-04 07:30:59 +02:00
Steffen Klassert
f6e27114a6 net: Add a xfrm validate function to validate_xmit_skb
When we do IPsec offloading, we need a fallback for
packets that were targeted to be IPsec offloaded but
rerouted to a device that does not support IPsec offload.
For that we add a function that checks the offloading
features of the sending device and and flags the
requirement of a fallback before it calls the IPsec
output function. The IPsec output function adds the IPsec
trailer and does encryption if needed.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14 10:07:28 +02:00
Steffen Klassert
d77e38e612 xfrm: Add an IPsec hardware offloading API
This patch adds all the bits that are needed to do
IPsec hardware offload for IPsec states and ESP packets.
We add xfrmdev_ops to the net_device. xfrmdev_ops has
function pointers that are needed to manage the xfrm
states in the hardware and to do a per packet
offloading decision.

Joint work with:
Ilan Tayari <ilant@mellanox.com>
Guy Shapiro <guysh@mellanox.com>
Yossi Kuperman <yossiku@mellanox.com>

Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14 10:06:10 +02:00
Steffen Klassert
c35fe4106b xfrm: Add mode handlers for IPsec on layer 2
This patch adds a gso_segment and xmit callback for the
xfrm_mode and implement these functions for tunnel and
transport mode.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14 10:06:01 +02:00
Steffen Klassert
21f42cc95f xfrm: Move device notifications to a sepatate file
This is needed for the upcomming IPsec device offloading.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14 10:05:53 +02:00
Steffen Klassert
9d389d7f84 xfrm: Add a xfrm type offload.
We add a struct  xfrm_type_offload so that we have the offloaded
codepath separated to the non offloaded codepath. With this the
non offloade and the offloaded codepath can coexist.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14 10:05:44 +02:00
Alexey Dobriyan
6c786bcb29 xfrm: branchless addr4_match() on 64-bit
Current addr4_match() code has special test for /0 prefixes because of
standard required undefined behaviour. However, it is possible to omit
it on 64-bit because shifting can be done within a 64-bit register and
then truncated to the expected value (which is 0 mask).

Implicit truncation by htonl() fits nicely into R32-within-R64 model
on x86-64.

Space savings: none (coincidence)
Branch savings: 1

Before:

	movzx  eax,BYTE PTR [rdi+0x2a]		# ->prefixlen_d
	test   al,al
	jne    xfrm_selector_match + 0x23f
		...
	movzx  eax,BYTE PTR [rbx+0x2b]		# ->prefixlen_s
	test   al,al
	je     xfrm_selector_match + 0x1c7

After (no branches):

	mov    r8d,0x20
	mov    rdx,0xffffffffffffffff
	mov    esi,DWORD PTR [rsi+0x2c]
	mov    ecx,r8d
	sub    cl,BYTE PTR [rdi+0x2a]
	xor    esi,DWORD PTR [rbx]
	mov    rdi,rdx
	xor    eax,eax
	shl    rdi,cl
	bswap  edi

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-03-27 07:04:14 +02:00
Alexey Dobriyan
e1b0048e18 xfrm: use "unsigned int" in addr_match()
x86_64 is zero-extending arch so "unsigned int" is preferred over "int"
for address calculations and extending to size_t.

Space savings:

	add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-24 (-24)
	function                                     old     new   delta
	xfrm_state_walk                              708     696     -12
	xfrm_selector_match                          918     906     -12

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-03-24 07:03:12 +01:00
Alexey Dobriyan
1560875600 xfrm: remove unused struct xfrm_mgr::id
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-03-24 07:03:12 +01:00
Steffen Klassert
7785bba299 esp: Add a software GRO codepath
This patch adds GRO ifrastructure and callbacks for ESP on
ipv4 and ipv6.

In case the GRO layer detects an ESP packet, the
esp{4,6}_gro_receive() function does a xfrm state lookup
and calls the xfrm input layer if it finds a matching state.
The packet will be decapsulated and reinjected it into layer 2.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-15 11:04:11 +01:00
Steffen Klassert
54ef207ac8 xfrm: Extend the sec_path for IPsec offloading
We need to keep per packet offloading informations across
the layers. So we extend the sec_path to carry these for
the input and output offload codepath.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-15 11:04:10 +01:00
Steffen Klassert
1e29537034 xfrm: Export xfrm_parse_spi.
We need it in the ESP offload handlers, so export it.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-15 09:39:49 +01:00
Steffen Klassert
b0fcee825c xfrm: Add a secpath_set helper.
Add a new helper to set the secpath to the skb.
This avoids code duplication, as this is used
in multiple places.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-15 09:39:24 +01:00
Florian Westphal
a2817d8b27 xfrm: policy: remove family field
Only needed it to register the policy backend at init time.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09 10:22:18 +01:00
Florian Westphal
3d7d25a68e xfrm: policy: remove garbage_collect callback
Just call xfrm_garbage_collect_deferred() directly.
This gets rid of a write to afinfo in register/unregister and allows to
constify afinfo later on.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09 10:22:18 +01:00
Florian Westphal
2b61997aa0 xfrm: policy: xfrm_policy_unregister_afinfo can return void
Nothing checks the return value. Also, the errors returned on unregister
are impossible (we only support INET and INET6, so no way
xfrm_policy_afinfo[afinfo->family] can be anything other than 'afinfo'
itself).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09 10:22:17 +01:00
Florian Westphal
960fdfdeb9 xfrm: input: constify xfrm_input_afinfo
Nothing writes to these structures (the module owner was not used).

While at it, size xfrm_input_afinfo[] by the highest existing xfrm family
(INET6), not AF_MAX.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09 10:22:17 +01:00
Steffen Klassert
cac2661c53 esp4: Avoid skb_cow_data whenever possible
This patch tries to avoid skb_cow_data on esp4.

On the encrypt side we add the IPsec tailbits
to the linear part of the buffer if there is
space on it. If there is no space on the linear
part, we add a page fragment with the tailbits to
the buffer and use separate src and dst scatterlists.

On the decrypt side, we leave the buffer as it is
if it is not cloned.

With this, we can avoid a linearization of the buffer
in most of the cases.

Joint work with:
Sowmini Varadhan <sowmini.varadhan@oracle.com>
Ilan Tayari <ilant@mellanox.com>

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-17 10:22:57 +01:00
Florian Westphal
711059b975 xfrm: add and use xfrm_state_afinfo_get_rcu
xfrm_init_tempstate is always called from within rcu read side section.
We can thus use a simpler function that doesn't call rcu_read_lock
again.

While at it, also make xfrm_init_tempstate return value void, the
return value was never tested.

A followup patch will replace remaining callers of xfrm_state_get_afinfo
with xfrm_state_afinfo_get_rcu variant and then remove the 'old'
get_afinfo interface.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-10 10:57:13 +01:00
Florian Westphal
af5d27c4e1 xfrm: remove xfrm_state_put_afinfo
commit 44abdc3047
("xfrm: replace rwlock on xfrm_state_afinfo with rcu") made
xfrm_state_put_afinfo equivalent to rcu_read_unlock.

Use spatch to replace it with direct calls to rcu_read_unlock:

@@
struct xfrm_state_afinfo *a;
@@

-  xfrm_state_put_afinfo(a);
+  rcu_read_unlock();

old:
 text    data     bss     dec     hex filename
22570      72     424   23066    5a1a xfrm_state.o
 1612       0       0    1612     64c xfrm_output.o
new:
22554      72     424   23050    5a0a xfrm_state.o
 1596       0       0    1596     63c xfrm_output.o

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-10 10:57:13 +01:00
David S. Miller
d6989d4bbe Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-09-23 06:46:57 -04:00
Nicolas Dichtel
63c43787d3 vti6: fix input path
Since commit 1625f45299, vti6 is broken, all input packets are dropped
(LINUX_MIB_XFRMINNOSTATES is incremented).

XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 is set by vti6_rcv() before calling
xfrm6_rcv()/xfrm6_rcv_spi(), thus we cannot set to NULL that value in
xfrm6_rcv_spi().

A new function xfrm6_rcv_tnl() that enables to pass a value to
xfrm6_rcv_spi() is added, so that xfrm6_rcv() is not touched (this function
is used in several handlers).

CC: Alexey Kodanev <alexey.kodanev@oracle.com>
Fixes: 1625f45299 ("net/xfrm_input: fix possible NULL deref of tunnel.ip6->parms.i_key")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-09-21 10:09:14 +02:00
Julia Lawall
e45a8a9e60 xfrm: constify xfrm_replay structures
The xfrm_replay structures are never modified, so declare them as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-10 11:18:49 +02:00
Eric Dumazet
b540f9d702 net: xfrm: kill XFRM_INC_STATS_BH()
Not used anymore.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 22:48:23 -04:00
Eric Dumazet
6aef70a851 net: snmp: kill various STATS_USER() helpers
In the old days (before linux-3.0), SNMP counters were duplicated,
one for user context, and one for BH context.

After commit 8f0ea0fe3a ("snmp: reduce percpu needs by 50%")
we have a single copy, and what really matters is preemption being
enabled or disabled, since we use this_cpu_inc() or __this_cpu_inc()
respectively.

We therefore kill SNMP_INC_STATS_USER(), SNMP_ADD_STATS_USER(),
NET_INC_STATS_USER(), NET_ADD_STATS_USER(), SCTP_INC_STATS_USER(),
SNMP_INC_STATS64_USER(), SNMP_ADD_STATS64_USER(), TCP_ADD_STATS_USER(),
UDP_INC_STATS_USER(), UDP6_INC_STATS_USER(), and XFRM_INC_STATS_USER()

Following patches will rename __BH helpers to make clear their
usage is not tied to BH being disabled.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 22:48:22 -04:00
Eric Dumazet
d188ba86dd xfrm: add rcu protection to sk->sk_policy[]
XFRM can deal with SYNACK messages, sent while listener socket
is not locked. We add proper rcu protection to __xfrm_sk_clone_policy()
and xfrm_sk_policy_lookup()

This might serve as the first step to remove xfrm.xfrm_policy_lock
use in fast path.

Fixes: fa76ce7328 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-11 19:22:06 -05:00
Eric Dumazet
56f047305d xfrm: add rcu grace period in xfrm_policy_destroy()
We will soon switch sk->sk_policy[] to RCU protection,
as SYNACK packets are sent while listener socket is not locked.

This patch simply adds RCU grace period before struct xfrm_policy
freeing, and the corresponding rcu_head in struct xfrm_policy.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-11 19:22:06 -05:00
Eric W. Biederman
ede2059dba dst: Pass net into dst->output
The network namespace is already passed into dst_output pass it into
dst->output lwt->output and friends.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 04:27:03 -07:00
Eric W. Biederman
a6568b2425 xfrm: Remove unused afinfo method init_dst
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 17:18:32 -07:00
David Ahern
42a7b32b73 xfrm: Add oif to dst lookups
Rules can be installed that direct route lookups to specific tables based
on oif. Plumb the oif through the xfrm lookups so it gets set in the flow
struct and passed to the resolver routines.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-08-11 12:41:35 +02:00
Herbert Xu
69b0137f61 ipsec: Add IV generator information to xfrm_state
This patch adds IV generator information to xfrm_state.  This
is currently obtained from our own list of algorithm descriptions.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-05-28 11:23:20 +08:00
Herbert Xu
165ecc6373 xfrm: Add IV generator information to xfrm_algo_desc
This patch adds IV generator information for each AEAD and block
cipher to xfrm_algo_desc.  This will be used to access the new
AEAD interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-05-28 11:23:19 +08:00
David Miller
7026b1ddb6 netfilter: Pass socket pointer down through okfn().
On the output paths in particular, we have to sometimes deal with two
socket contexts.  First, and usually skb->sk, is the local socket that
generated the frame.

And second, is potentially the socket used to control a tunneling
socket, such as one the encapsulates using UDP.

We do not want to disassociate skb->sk when encapsulating in order
to fix this, because that would break socket memory accounting.

The most extreme case where this can cause huge problems is an
AF_PACKET socket transmitting over a vxlan device.  We hit code
paths doing checks that assume they are dealing with an ipv4
socket, but are actually operating upon the AF_PACKET one.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-07 15:25:55 -04:00
Jiri Benc
15e318bdc6 xfrm: simplify xfrm_address_t use
In many places, the a6 field is typecasted to struct in6_addr. As the
fields are in union anyway, just add in6_addr type to the union and
get rid of the typecasting.

Modifying the uapi header is okay, the union has still the same size.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 13:58:35 -04:00
Eric W. Biederman
0c5c9fb551 net: Introduce possible_net_t
Having to say
> #ifdef CONFIG_NET_NS
> 	struct net *net;
> #endif

in structures is a little bit wordy and a little bit error prone.

Instead it is possible to say:
> typedef struct {
> #ifdef CONFIG_NET_NS
>       struct net *net;
> #endif
> } possible_net_t;

And then in a header say:

> 	possible_net_t net;

Which is cleaner and easier to use and easier to test, as the
possible_net_t is always there no matter what the compile options.

Further this allows read_pnet and write_pnet to be functions in all
cases which is better at catching typos.

This change adds possible_net_t, updates the definitions of read_pnet
and write_pnet, updates optional struct net * variables that
write_pnet uses on to have the type possible_net_t, and finally fixes
up the b0rked users of read_pnet and write_pnet.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 14:39:40 -04:00