2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

301 Commits

Author SHA1 Message Date
Paul Blakey
95a7233c45 net: openvswitch: Set OvS recirc_id from tc chain index
Offloaded OvS datapath rules are translated one to one to tc rules,
for example the following simplified OvS rule:

recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)

Will be translated to the following tc rule:

$ tc filter add dev dev1 ingress \
	    prio 1 chain 0 proto ip \
		flower tcp ct_state -trk \
		action ct pipe \
		action goto chain 2

Received packets will first travel though tc, and if they aren't stolen
by it, like in the above rule, they will continue to OvS datapath.
Since we already did some actions (action ct in this case) which might
modify the packets, and updated action stats, we would like to continue
the proccessing with the correct recirc_id in OvS (here recirc_id(2))
where we left off.

To support this, introduce a new skb extension for tc, which
will be used for translating tc chain to ovs recirc_id to
handle these miss cases. Last tc chain index will be set
by tc goto chain action and read by OvS datapath.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-06 14:59:18 +02:00
Vlad Buslov
1444c175a3 net: sched: copy tunnel info when setting flow_action entry->tunnel
In order to remove dependency on rtnl lock, modify tc_setup_flow_action()
to copy tunnel info, instead of just saving pointer to tunnel_key action
tunnel info. This is necessary to prevent concurrent action overwrite from
releasing tunnel info while it is being used by rtnl-unlocked driver.

Implement helper tcf_tunnel_info_copy() that is used to copy tunnel info
with all its options to dynamically allocated memory block. Modify
tc_cleanup_flow_action() to free dynamically allocated tunnel info.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26 14:17:43 -07:00
Vlad Buslov
5a6ff4b13d net: sched: take reference to action dev before calling offloads
In order to remove dependency on rtnl lock when calling hardware offload
API, take reference to action mirred dev when initializing flow_action
structure in tc_setup_flow_action(). Implement function
tc_cleanup_flow_action(), use it to release the device after hardware
offload API is done using it.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26 14:17:43 -07:00
Vlad Buslov
9838b20a7f net: sched: take rtnl lock in tc_setup_flow_action()
In order to allow using new flow_action infrastructure from unlocked
classifiers, modify tc_setup_flow_action() to accept new 'rtnl_held'
argument. Take rtnl lock before accessing tc_action data. This is necessary
to protect from concurrent action replace.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26 14:17:43 -07:00
Vlad Buslov
11bd634da2 net: sched: conditionally obtain rtnl lock in cls hw offloads API
In order to remove dependency on rtnl lock from offloads code of
classifiers, take rtnl lock conditionally before executing driver
callbacks. Only obtain rtnl lock if block is bound to devices that require
it.

Block bind/unbind code is rtnl-locked and obtains block->cb_lock while
holding rtnl lock. Obtain locks in same order in tc_setup_cb_*() functions
to prevent deadlock.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26 14:17:43 -07:00
Vlad Buslov
c9f14470d0 net: sched: add API for registering unlocked offload block callbacks
Extend struct flow_block_offload with "unlocked_driver_cb" flag to allow
registering and unregistering block hardware offload callbacks that do not
require caller to hold rtnl lock. Extend tcf_block with additional
lockeddevcnt counter that is incremented for each non-unlocked driver
callback attached to device. This counter is necessary to conditionally
obtain rtnl lock before calling hardware callbacks in following patches.

Register mlx5 tc block offload callbacks as "unlocked".

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26 14:17:43 -07:00
Vlad Buslov
a449a3e77a net: sched: notify classifier on successful offload add/delete
To remove dependency on rtnl lock, extend classifier ops with new
ops->hw_add() and ops->hw_del() callbacks. Call them from cls API while
holding cb_lock every time filter if successfully added to or deleted from
hardware.

Implement the new API in flower classifier. Use it to manage hw_filters
list under cb_lock protection, instead of relying on rtnl lock to
synchronize with concurrent fl_reoffload() call.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26 14:17:43 -07:00
Vlad Buslov
4011921137 net: sched: refactor block offloads counter usage
Without rtnl lock protection filters can no longer safely manage block
offloads counter themselves. Refactor cls API to protect block offloadcnt
with tcf_block->cb_lock that is already used to protect driver callback
list and nooffloaddevcnt counter. The counter can be modified by concurrent
tasks by new functions that execute block callbacks (which is safe with
previous patch that changed its type to atomic_t), however, block
bind/unbind code that checks the counter value takes cb_lock in write mode
to exclude any concurrent modifications. This approach prevents race
conditions between bind/unbind and callback execution code but allows for
concurrency for tc rule update path.

Move block offload counter, filter in hardware counter and filter flags
management from classifiers into cls hardware offloads API. Make functions
tcf_block_offload_{inc|dec}() and tc_cls_offload_cnt_update() to be cls API
private. Implement following new cls API to be used instead:

  tc_setup_cb_add() - non-destructive filter add. If filter that wasn't
  already in hardware is successfully offloaded, increment block offloads
  counter, set filter in hardware counter and flag. On failure, previously
  offloaded filter is considered to be intact and offloads counter is not
  decremented.

  tc_setup_cb_replace() - destructive filter replace. Release existing
  filter block offload counter and reset its in hardware counter and flag.
  Set new filter in hardware counter and flag. On failure, previously
  offloaded filter is considered to be destroyed and offload counter is
  decremented.

  tc_setup_cb_destroy() - filter destroy. Unconditionally decrement block
  offloads counter.

  tc_setup_cb_reoffload() - reoffload filter to single cb. Execute cb() and
  call tc_cls_offload_cnt_update() if cb() didn't return an error.

Refactor all offload-capable classifiers to atomically offload filters to
hardware, change block offload counter, and set filter in hardware counter
and flag by means of the new cls API functions.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26 14:17:43 -07:00
Vlad Buslov
97394bef56 net: sched: change tcf block offload counter type to atomic_t
As a preparation for running proto ops functions without rtnl lock, change
offload counter type to atomic. This is necessary to allow updating the
counter by multiple concurrent users when offloading filters to hardware
from unlocked classifiers.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26 14:17:43 -07:00
Vlad Buslov
4f8116c850 net: sched: protect block offload-related fields with rw_semaphore
In order to remove dependency on rtnl lock, extend tcf_block with 'cb_lock'
rwsem and use it to protect flow_block->cb_list and related counters from
concurrent modification. The lock is taken in read mode for read-only
traversal of cb_list in tc_setup_cb_call() and write mode in all other
cases. This approach ensures that:

- cb_list is not changed concurrently while filters is being offloaded on
  block.

- block->nooffloaddevcnt is checked while holding the lock in read mode,
  but is only changed by bind/unbind code when holding the cb_lock in write
  mode to prevent concurrent modification.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26 14:17:43 -07:00
wenxu
1150ab0f1b flow_offload: support get multi-subsystem block
It provide a callback list to find the blocks of tc
and nft subsystems

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:44:30 -07:00
wenxu
4e481908c5 flow_offload: move tc indirect block to flow offload
move tc indirect block to flow_offload and rename
it to flow indirect block.The nf_tables can use the
indr block architecture.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:44:30 -07:00
wenxu
e4da910211 cls_api: add flow_indr_block_call function
This patch make indr_block_call don't access struct tc_indr_block_cb
and tc_indr_block_dev directly

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:44:30 -07:00
wenxu
f843698857 cls_api: remove the tcf_block cache
Remove the tcf_block in the tc_indr_block_dev for muti-subsystem
support.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:44:30 -07:00
wenxu
242453c227 cls_api: modify the tc_indr_block_ing_cmd parameters.
This patch make tc_indr_block_ing_cmd can't access struct
tc_indr_block_dev and tc_indr_block_cb.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:44:30 -07:00
John Hurley
48e584ac58 net: sched: add ingress mirred action to hardware IR
TC mirred actions (redirect and mirred) can send to egress or ingress of a
device. Currently only egress is used for hw offload rules.

Modify the intermediate representation for hw offload to include mirred
actions that go to ingress. This gives drivers access to such rules and
can decide whether or not to offload them.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:24:21 -07:00
John Hurley
fb1b775a24 net: sched: add skbedit of ptype action to hardware IR
TC rules can impliment skbedit actions. Currently actions that modify the
skb mark are passed to offloading drivers via the hardware intermediate
representation in the flow_offload API.

Extend this to include skbedit actions that modify the packet type of the
skb. Such actions may be used to set the ptype to HOST when redirecting a
packet to ingress.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:24:21 -07:00
John Hurley
6749d59016 net: sched: include mpls actions in hardware intermediate representation
A recent addition to TC actions is the ability to manipulate the MPLS
headers on packets.

In preparation to offload such actions to hardware, update the IR code to
accept and prepare the new actions.

Note that no driver currently impliments the MPLS dec_ttl action so this
is not included.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-23 13:52:50 -07:00
Vlad Buslov
503d81d428 net: sched: verify that q!=NULL before setting q->flags
In function int tc_new_tfilter() q pointer can be NULL when adding filter
on a shared block. With recent change that resets TCQ_F_CAN_BYPASS after
filter creation, following NULL pointer dereference happens in case parent
block is shared:

[  212.925060] BUG: kernel NULL pointer dereference, address: 0000000000000010
[  212.925445] #PF: supervisor write access in kernel mode
[  212.925709] #PF: error_code(0x0002) - not-present page
[  212.925965] PGD 8000000827923067 P4D 8000000827923067 PUD 827924067 PMD 0
[  212.926302] Oops: 0002 [#1] SMP KASAN PTI
[  212.926539] CPU: 18 PID: 2617 Comm: tc Tainted: G    B             5.2.0+ #512
[  212.926938] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[  212.927364] RIP: 0010:tc_new_tfilter+0x698/0xd40
[  212.927633] Code: 74 0d 48 85 c0 74 08 48 89 ef e8 03 aa 62 00 48 8b 84 24 a0 00 00 00 48 8d 78 10 48 89 44 24 18 e8 4d 0c 6b ff 48 8b 44 24 18 <83> 60 10 f
b 48 85 ed 0f 85 3d fe ff ff e9 4f fe ff ff e8 81 26 f8
[  212.928607] RSP: 0018:ffff88884fd5f5d8 EFLAGS: 00010296
[  212.928905] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dffffc0000000000
[  212.929201] RDX: 0000000000000007 RSI: 0000000000000004 RDI: 0000000000000297
[  212.929402] RBP: ffff88886bedd600 R08: ffffffffb91d4b51 R09: fffffbfff7616e4d
[  212.929609] R10: fffffbfff7616e4c R11: ffffffffbb0b7263 R12: ffff88886bc61040
[  212.929803] R13: ffff88884fd5f950 R14: ffffc900039c5000 R15: ffff88835e927680
[  212.929999] FS:  00007fe7c50b6480(0000) GS:ffff88886f980000(0000) knlGS:0000000000000000
[  212.930235] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  212.930394] CR2: 0000000000000010 CR3: 000000085bd04002 CR4: 00000000001606e0
[  212.930588] Call Trace:
[  212.930682]  ? tc_del_tfilter+0xa40/0xa40
[  212.930811]  ? __lock_acquire+0x5b5/0x2460
[  212.930948]  ? find_held_lock+0x85/0xa0
[  212.931081]  ? tc_del_tfilter+0xa40/0xa40
[  212.931201]  rtnetlink_rcv_msg+0x4ab/0x5f0
[  212.931332]  ? rtnl_dellink+0x490/0x490
[  212.931454]  ? lockdep_hardirqs_on+0x260/0x260
[  212.931589]  ? netlink_deliver_tap+0xab/0x5a0
[  212.931717]  ? match_held_lock+0x1b/0x240
[  212.931844]  netlink_rcv_skb+0xd0/0x200
[  212.931958]  ? rtnl_dellink+0x490/0x490
[  212.932079]  ? netlink_ack+0x440/0x440
[  212.932205]  ? netlink_deliver_tap+0x161/0x5a0
[  212.932335]  ? lock_downgrade+0x360/0x360
[  212.932457]  ? lock_acquire+0xe5/0x210
[  212.932579]  netlink_unicast+0x296/0x350
[  212.932705]  ? netlink_attachskb+0x390/0x390
[  212.932834]  ? _copy_from_iter_full+0xe0/0x3a0
[  212.932976]  netlink_sendmsg+0x394/0x600
[  212.937998]  ? netlink_unicast+0x350/0x350
[  212.943033]  ? move_addr_to_kernel.part.0+0x90/0x90
[  212.948115]  ? netlink_unicast+0x350/0x350
[  212.953185]  sock_sendmsg+0x96/0xa0
[  212.958099]  ___sys_sendmsg+0x482/0x520
[  212.962881]  ? match_held_lock+0x1b/0x240
[  212.967618]  ? copy_msghdr_from_user+0x250/0x250
[  212.972337]  ? lock_downgrade+0x360/0x360
[  212.976973]  ? rwlock_bug.part.0+0x60/0x60
[  212.981548]  ? __mod_node_page_state+0x1f/0xa0
[  212.986060]  ? match_held_lock+0x1b/0x240
[  212.990567]  ? find_held_lock+0x85/0xa0
[  212.994989]  ? do_user_addr_fault+0x349/0x5b0
[  212.999387]  ? lock_downgrade+0x360/0x360
[  213.003713]  ? find_held_lock+0x85/0xa0
[  213.007972]  ? __fget_light+0xa1/0xf0
[  213.012143]  ? sockfd_lookup_light+0x91/0xb0
[  213.016165]  __sys_sendmsg+0xba/0x130
[  213.020040]  ? __sys_sendmsg_sock+0xb0/0xb0
[  213.023870]  ? handle_mm_fault+0x337/0x470
[  213.027592]  ? page_fault+0x8/0x30
[  213.031316]  ? lockdep_hardirqs_off+0xbe/0x100
[  213.034999]  ? mark_held_locks+0x24/0x90
[  213.038671]  ? do_syscall_64+0x1e/0xe0
[  213.042297]  do_syscall_64+0x74/0xe0
[  213.045828]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  213.049354] RIP: 0033:0x7fe7c527c7b8
[  213.052792] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f
0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54
[  213.060269] RSP: 002b:00007ffc3f7908a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  213.064144] RAX: ffffffffffffffda RBX: 000000005d34716f RCX: 00007fe7c527c7b8
[  213.068094] RDX: 0000000000000000 RSI: 00007ffc3f790910 RDI: 0000000000000003
[  213.072109] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007fe7c5340cc0
[  213.076113] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000080
[  213.080146] R13: 0000000000480640 R14: 0000000000000080 R15: 0000000000000000
[  213.084147] Modules linked in: act_gact cls_flower sch_ingress nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc sunrpc intel_rapl_msr intel_rapl_common
[<1;69;32Msb_edac rdma_ucm rdma_cm x86_pkg_temp_thermal iw_cm intel_powerclamp ib_cm coretemp kvm_intel kvm irqbypass mlx5_ib ib_uverbs ib_core crct10dif_pclmul crc32_pc
lmul crc32c_intel ghash_clmulni_intel mlx5_core intel_cstate intel_uncore iTCO_wdt igb iTCO_vendor_support mlxfw mei_me ptp ses intel_rapl_perf mei pcspkr ipmi
_ssif i2c_i801 joydev enclosure pps_core lpc_ich ioatdma wmi dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ast i2c_algo_bit drm_vram_helpe
r ttm drm_kms_helper drm mpt3sas raid_class scsi_transport_sas
[  213.112326] CR2: 0000000000000010
[  213.117429] ---[ end trace adb58eb0a4ee6283 ]---

Verify that q pointer is not NULL before setting the 'flags' field.

Fixes: 3f05e6886a ("net_sched: unset TCQ_F_CAN_BYPASS when adding filters")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21 11:49:53 -07:00
Pablo Neira Ayuso
14bfb13f0e net: flow_offload: add flow_block structure and use it
This object stores the flow block callbacks that are attached to this
block. Update flow_block_cb_lookup() to take this new object.

This patch restores the block sharing feature.

Fixes: da3eeb904f ("net: flow_offload: add list handling functions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-19 21:27:45 -07:00
Pablo Neira Ayuso
a732331151 net: flow_offload: rename tc_setup_cb_t to flow_setup_cb_t
Rename this type definition and adapt users.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-19 21:27:45 -07:00
Cong Wang
3f05e6886a net_sched: unset TCQ_F_CAN_BYPASS when adding filters
For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS,
notably fq_codel, it makes no sense to let packets bypass the TC
filters we setup in any scenario, otherwise our packets steering
policy could not be enforced.

This can be reproduced easily with the following script:

 ip li add dev dummy0 type dummy
 ifconfig dummy0 up
 tc qd add dev dummy0 root fq_codel
 tc filter add dev dummy0 parent 8001: protocol arp basic action mirred egress redirect dev lo
 tc filter add dev dummy0 parent 8001: protocol ip basic action mirred egress redirect dev lo
 ping -I dummy0 192.168.112.1

Without this patch, packets are sent directly to dummy0 without
hitting any of the filters. With this patch, packets are redirected
to loopback as expected.

This fix is not perfect, it only unsets the flag but does not set it back
because we have to save the information somewhere in the qdisc if we
really want that. Note, both fq_codel and sfq clear this flag in their
->bind_tcf() but this is clearly not sufficient when we don't use any
class ID.

Fixes: 23624935e0 ("net_sched: TCQ_F_CAN_BYPASS generalization")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17 13:34:09 -07:00
Vlad Buslov
c1a970d06f net: sched: Fix NULL-pointer dereference in tc_indr_block_ing_cmd()
After recent refactoring of block offlads infrastructure, indr_dev->block
pointer is dereferenced before it is verified to be non-NULL. Example stack
trace where this behavior leads to NULL-pointer dereference error when
creating vxlan dev on system with mlx5 NIC with offloads enabled:

[ 1157.852938] ==================================================================
[ 1157.866877] BUG: KASAN: null-ptr-deref in tc_indr_block_ing_cmd.isra.41+0x9c/0x160
[ 1157.880877] Read of size 4 at addr 0000000000000090 by task ip/3829
[ 1157.901637] CPU: 22 PID: 3829 Comm: ip Not tainted 5.2.0-rc6+ #488
[ 1157.914438] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 1157.929031] Call Trace:
[ 1157.938318]  dump_stack+0x9a/0xeb
[ 1157.948362]  ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
[ 1157.960262]  ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
[ 1157.972082]  __kasan_report+0x176/0x192
[ 1157.982513]  ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
[ 1157.994348]  kasan_report+0xe/0x20
[ 1158.004324]  tc_indr_block_ing_cmd.isra.41+0x9c/0x160
[ 1158.015950]  ? tcf_block_setup+0x430/0x430
[ 1158.026558]  ? kasan_unpoison_shadow+0x30/0x40
[ 1158.037464]  __tc_indr_block_cb_register+0x5f5/0xf20
[ 1158.049288]  ? mlx5e_rep_indr_tc_block_unbind+0xa0/0xa0 [mlx5_core]
[ 1158.062344]  ? tc_indr_block_dev_put.part.47+0x5c0/0x5c0
[ 1158.074498]  ? rdma_roce_rescan_device+0x20/0x20 [ib_core]
[ 1158.086580]  ? br_device_event+0x98/0x480 [bridge]
[ 1158.097870]  ? strcmp+0x30/0x50
[ 1158.107578]  mlx5e_nic_rep_netdevice_event+0xdd/0x180 [mlx5_core]
[ 1158.120212]  notifier_call_chain+0x6d/0xa0
[ 1158.130753]  register_netdevice+0x6fc/0x7e0
[ 1158.141322]  ? netdev_change_features+0xa0/0xa0
[ 1158.152218]  ? vxlan_config_apply+0x210/0x310 [vxlan]
[ 1158.163593]  __vxlan_dev_create+0x2ad/0x520 [vxlan]
[ 1158.174770]  ? vxlan_changelink+0x490/0x490 [vxlan]
[ 1158.185870]  ? rcu_read_unlock+0x60/0x60 [vxlan]
[ 1158.196798]  vxlan_newlink+0x99/0xf0 [vxlan]
[ 1158.207303]  ? __vxlan_dev_create+0x520/0x520 [vxlan]
[ 1158.218601]  ? rtnl_create_link+0x3d0/0x450
[ 1158.228900]  __rtnl_newlink+0x8a7/0xb00
[ 1158.238701]  ? stack_access_ok+0x35/0x80
[ 1158.248450]  ? rtnl_link_unregister+0x1a0/0x1a0
[ 1158.258735]  ? find_held_lock+0x6d/0xd0
[ 1158.268379]  ? is_bpf_text_address+0x67/0xf0
[ 1158.278330]  ? lock_acquire+0xc1/0x1f0
[ 1158.287686]  ? is_bpf_text_address+0x5/0xf0
[ 1158.297449]  ? is_bpf_text_address+0x86/0xf0
[ 1158.307310]  ? kernel_text_address+0xec/0x100
[ 1158.317155]  ? arch_stack_walk+0x92/0xe0
[ 1158.326497]  ? __kernel_text_address+0xe/0x30
[ 1158.336213]  ? unwind_get_return_address+0x2f/0x50
[ 1158.346267]  ? create_prof_cpu_mask+0x20/0x20
[ 1158.355936]  ? arch_stack_walk+0x92/0xe0
[ 1158.365117]  ? stack_trace_save+0x8a/0xb0
[ 1158.374272]  ? stack_trace_consume_entry+0x80/0x80
[ 1158.384226]  ? match_held_lock+0x33/0x210
[ 1158.393216]  ? kasan_unpoison_shadow+0x30/0x40
[ 1158.402593]  rtnl_newlink+0x53/0x80
[ 1158.410925]  rtnetlink_rcv_msg+0x3a5/0x600
[ 1158.419777]  ? validate_linkmsg+0x400/0x400
[ 1158.428620]  ? find_held_lock+0x6d/0xd0
[ 1158.437117]  ? match_held_lock+0x1b/0x210
[ 1158.445760]  ? validate_linkmsg+0x400/0x400
[ 1158.454642]  netlink_rcv_skb+0xc7/0x1f0
[ 1158.463150]  ? netlink_ack+0x470/0x470
[ 1158.471538]  ? netlink_deliver_tap+0x1f3/0x5a0
[ 1158.480607]  netlink_unicast+0x2ae/0x350
[ 1158.489099]  ? netlink_attachskb+0x340/0x340
[ 1158.497935]  ? _copy_from_iter_full+0xde/0x3b0
[ 1158.506945]  ? __virt_addr_valid+0xb6/0xf0
[ 1158.515578]  ? __check_object_size+0x159/0x240
[ 1158.524515]  netlink_sendmsg+0x4d3/0x630
[ 1158.532879]  ? netlink_unicast+0x350/0x350
[ 1158.541400]  ? netlink_unicast+0x350/0x350
[ 1158.549805]  sock_sendmsg+0x94/0xa0
[ 1158.557561]  ___sys_sendmsg+0x49d/0x570
[ 1158.565625]  ? copy_msghdr_from_user+0x210/0x210
[ 1158.574457]  ? __fput+0x1e2/0x330
[ 1158.581948]  ? __kasan_slab_free+0x130/0x180
[ 1158.590407]  ? kmem_cache_free+0xb6/0x2d0
[ 1158.598574]  ? mark_lock+0xc7/0x790
[ 1158.606177]  ? task_work_run+0xcf/0x100
[ 1158.614165]  ? exit_to_usermode_loop+0x102/0x110
[ 1158.622954]  ? __lock_acquire+0x963/0x1ee0
[ 1158.631199]  ? lockdep_hardirqs_on+0x260/0x260
[ 1158.639777]  ? match_held_lock+0x1b/0x210
[ 1158.647918]  ? lockdep_hardirqs_on+0x260/0x260
[ 1158.656501]  ? match_held_lock+0x1b/0x210
[ 1158.664643]  ? __fget_light+0xa6/0xe0
[ 1158.672423]  ? __sys_sendmsg+0xd2/0x150
[ 1158.680334]  __sys_sendmsg+0xd2/0x150
[ 1158.688063]  ? __ia32_sys_shutdown+0x30/0x30
[ 1158.696435]  ? lock_downgrade+0x2e0/0x2e0
[ 1158.704541]  ? mark_held_locks+0x1a/0x90
[ 1158.712611]  ? mark_held_locks+0x1a/0x90
[ 1158.720619]  ? do_syscall_64+0x1e/0x2c0
[ 1158.728530]  do_syscall_64+0x78/0x2c0
[ 1158.736254]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1158.745414] RIP: 0033:0x7f62d505cb87
[ 1158.753070] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00[87/1817]
 48 89 f3 48
[ 1158.780924] RSP: 002b:00007fffd9832268 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 1158.793204] RAX: ffffffffffffffda RBX: 000000005d26048f RCX: 00007f62d505cb87
[ 1158.805111] RDX: 0000000000000000 RSI: 00007fffd98322d0 RDI: 0000000000000003
[ 1158.817055] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
[ 1158.828987] R10: 00007f62d50ce260 R11: 0000000000000246 R12: 0000000000000001
[ 1158.840909] R13: 000000000067e540 R14: 0000000000000000 R15: 000000000067ed20
[ 1158.852873] ==================================================================

Introduce new function tcf_block_non_null_shared() that verifies block
pointer before dereferencing it to obtain index. Use the function in
tc_indr_block_ing_cmd() to prevent NULL pointer dereference.

Fixes: 955bcb6ea0 ("drivers: net: use flow block API")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-12 15:21:53 -07:00
Pablo Neira Ayuso
722d36e6e2 net: sched: remove tcf block API
Unused, now replaced by flow block API.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:50 -07:00
Pablo Neira Ayuso
955bcb6ea0 drivers: net: use flow block API
This patch updates flow_block_cb_setup_simple() to use the flow block API.
Several drivers are also adjusted to use it.

This patch introduces the per-driver list of flow blocks to account for
blocks that are already in use.

Remove tc_block_offload alias.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:50 -07:00
Pablo Neira Ayuso
59094b1e50 net: sched: use flow block API
This patch adds tcf_block_setup() which uses the flow block API.

This infrastructure takes the flow block callbacks coming from the
driver and register/unregister to/from the cls_api core.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:50 -07:00
Pablo Neira Ayuso
da3eeb904f net: flow_offload: add list handling functions
This patch adds the list handling functions for the flow block API:

* flow_block_cb_lookup() allows drivers to look up for existing flow blocks.
* flow_block_cb_add() adds a flow block to the per driver list to be registered
  by the core.
* flow_block_cb_remove() to remove a flow block from the list of existing
  flow blocks per driver and to request the core to unregister this.

The flow block API also annotates the netns this flow block belongs to.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:50 -07:00
Pablo Neira Ayuso
32f8c4093a net: flow_offload: rename TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_*
Rename from TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_* and
remove temporary tcf_block_binder_type alias.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:50 -07:00
Pablo Neira Ayuso
9c0e189ec9 net: flow_offload: rename TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND
Rename from TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND and remove
temporary tc_block_command alias.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:50 -07:00
Paul Blakey
b57dc7c13e net/sched: Introduce action ct
Allow sending a packet to conntrack module for connection tracking.

The packet will be marked with conntrack connection's state, and
any metadata such as conntrack mark and label. This state metadata
can later be matched against with tc classifers, for example with the
flower classifier as below.

In addition to committing new connections the user can optionally
specific a zone to track within, set a mark/label and configure nat
with an address range and port range.

Usage is as follows:
$ tc qdisc add dev ens1f0_0 ingress
$ tc qdisc add dev ens1f0_1 ingress

$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 0 proto ip \
  flower ip_proto tcp ct_state -trk \
  action ct zone 2 pipe \
  action goto chain 2
$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 2 proto ip \
  flower ct_state +trk+new \
  action ct zone 2 commit mark 0xbb nat src addr 5.5.5.7 pipe \
  action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 2 proto ip \
  flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
  action ct nat pipe \
  action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_1 ingress \
  prio 1 chain 0 proto ip \
  flower ip_proto tcp ct_state -trk \
  action ct zone 2 pipe \
  action goto chain 1
$ tc filter add dev ens1f0_1 ingress \
  prio 1 chain 1 proto ip \
  flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
  action ct nat pipe \
  action mirred egress redirect dev ens1f0_0

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>

Changelog:
V5->V6:
	Added CONFIG_NF_DEFRAG_IPV6 in handle fragments ipv6 case
V4->V5:
	Reordered nf_conntrack_put() in tcf_ct_skb_nfct_cached()
V3->V4:
	Added strict_start_type for act_ct policy
V2->V3:
	Fixed david's comments: Removed extra newline after rcu in tcf_ct_params , and indent of break in act_ct.c
V1->V2:
	Fixed parsing of ranges TCA_CT_NAT_IPV6_MAX as 'else' case overwritten ipv4 max
	Refactored NAT_PORT_MIN_MAX range handling as well
	Added ipv4/ipv6 defragmentation
	Removed extra skb pull push of nw offset in exectute nat
	Refactored tcf_ct_skb_network_trim after pull
	Removed TCA_ACT_CT define

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 12:11:59 -07:00
Thomas Gleixner
2874c5fd28 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:26:32 -07:00
Pieter Jansen van Vuuren
8c8cfc6ed2 net/sched: add police action to the hardware intermediate representation
Add police action to the hardware intermediate representation which
would subsequently allow it to be used by drivers for offload.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 21:49:24 -07:00
Pieter Jansen van Vuuren
a7a7be6087 net/sched: add sample action to the hardware intermediate representation
Add sample action to the hardware intermediate representation model which
would subsequently allow it to be used by drivers for offload.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 21:49:23 -07:00
Johannes Berg
8cb081746c netlink: make validation more configurable for future strictness
We currently have two levels of strict validation:

 1) liberal (default)
     - undefined (type >= max) & NLA_UNSPEC attributes accepted
     - attribute length >= expected accepted
     - garbage at end of message accepted
 2) strict (opt-in)
     - NLA_UNSPEC attributes accepted
     - attribute length >= expected accepted

Split out parsing strictness into four different options:
 * TRAILING     - check that there's no trailing data after parsing
                  attributes (in message or nested)
 * MAXTYPE      - reject attrs > max known type
 * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
 * STRICT_ATTRS - strictly validate attribute size

The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().

Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.

We end up with the following renames:
 * nla_parse           -> nla_parse_deprecated
 * nla_parse_strict    -> nla_parse_deprecated_strict
 * nlmsg_parse         -> nlmsg_parse_deprecated
 * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
 * nla_parse_nested    -> nla_parse_nested_deprecated
 * nla_validate_nested -> nla_validate_nested_deprecated

Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.

Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.

Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.

In effect then, this adds fully strict validation for any new command.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27 17:07:21 -04:00
Michal Kubecek
ae0be8de9a netlink: make nla_nest_start() add NLA_F_NESTED flag
Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
netlink based interfaces (including recently added ones) are still not
setting it in kernel generated messages. Without the flag, message parsers
not aware of attribute semantics (e.g. wireshark dissector or libmnl's
mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
the structure of their contents.

Unfortunately we cannot just add the flag everywhere as there may be
userspace applications which check nlattr::nla_type directly rather than
through a helper masking out the flags. Therefore the patch renames
nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
are rewritten to use nla_nest_start().

Except for changes in include/net/netlink.h, the patch was generated using
this semantic patch:

@@ expression E1, E2; @@
-nla_nest_start(E1, E2)
+nla_nest_start_noflag(E1, E2)

@@ expression E1, E2; @@
-nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
+nla_nest_start(E1, E2)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27 17:03:44 -04:00
Vlad Buslov
3eed52842b net: sched: don't set tunnel for decap action
Action tunnel_key doesn't have a metadata/tunnel for release(decap) action.
Drivers do not dereference entry->tunnel pointer for that action type, so
this behavior doesn't result in a crash at the moment. However, this needs
to be corrected as a preparation for updating hardware offloads API to not
rely on rtnl lock, for which flow_action code will copy the tunnel data to
temporary buffer to prevent concurrent action overwrite from
invalidating/freeing it.

Fixes: 3a7b68617d ("cls_api: add translator to flow_action representation")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-02 13:20:30 -07:00
Davide Caratti
ee3bbfe806 net/sched: let actions use RCU to access 'goto_chain'
use RCU when accessing the action chain, to avoid use after free in the
traffic path when 'goto chain' is replaced on existing TC actions (see
script below). Since the control action is read in the traffic path
without holding the action spinlock, we need to explicitly ensure that
a->goto_chain is not NULL before dereferencing (i.e it's not sufficient
to rely on the value of TC_ACT_GOTO_CHAIN bits). Not doing so caused NULL
dereferences in tcf_action_goto_chain_exec() when the following script:

 # tc chain add dev dd0 chain 42 ingress protocol ip flower \
 > ip_proto udp action pass index 4
 # tc filter add dev dd0 ingress protocol ip flower \
 > ip_proto udp action csum udp goto chain 42 index 66
 # tc chain del dev dd0 chain 42 ingress
 (start UDP traffic towards dd0)
 # tc action replace action csum udp pass index 66

was run repeatedly for several hours.

Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Suggested-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:42 -07:00
Zhike Wang
5b5f99b186 net_sched: return correct value for *notify* functions
It is confusing to directly use return value of netlink_send()/
netlink_unicast() as the return value of *notify*, as it may be not
error at all.

Example: in tc_del_tfilter(), after calling tfilter_del_notify(), it will
goto errout if (err). However, the netlink_send()/netlink_unicast() will
return positive value even for successful case. So it may not call
tcf_chain_tp_remove() and so on to clean up the resource, as a result,
resource is leaked.

It may be easier to only check the return value of tfilter_del_nofiy(),
but it is more clean to correct all related functions.

Co-developed-by: Zengmo Gao <gaozengmo@jd.com>
Signed-off-by: Zhike Wang <wangzhike@jd.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-13 13:48:27 -07:00
Vlad Buslov
b62989fc4e net: sched: fix potential use-after-free in __tcf_chain_put()
When used with unlocked classifier that have filters attached to actions
with goto chain, __tcf_chain_put() for last non action reference can race
with calls to same function from action cleanup code that releases last
action reference. In this case action cleanup handler could free the chain
if it executes after all references to chain were released, but before all
concurrent users finished using it. Modify __tcf_chain_put() to only access
tcf_chain fields when holding block->lock. Remove local variables that were
used to cache some tcf_chain fields and are no longer needed because their
values can now be obtained directly from chain under block->lock
protection.

Fixes: 726d061286 ("net: sched: prevent insertion of new classifiers during chain flush")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-08 15:17:47 -08:00
Vlad Buslov
268a351d4a net: sched: fix typo in walker_check_empty()
Function walker_check_empty() incorrectly verifies that tp pointer is not
NULL, instead of actual filter pointer. Fix conditional to check the right
pointer. Adjust filter pointer naming accordingly to other cls API
functions.

Fixes: 6676d5e416 ("net: sched: set dedicated tcf_walker flag when tp is empty")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26 09:17:49 -08:00
Vlad Buslov
ace4a267e8 net: sched: don't release block->lock when dumping chains
Function tc_dump_chain() obtains and releases block->lock on each iteration
of its inner loop that dumps all chains on block. Outputting chain template
info is fast operation so locking/unlocking mutex multiple times is an
overhead when lock is highly contested. Modify tc_dump_chain() to only
obtain block->lock once and dump all chains without releasing it.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25 10:25:02 -08:00
Vlad Buslov
6676d5e416 net: sched: set dedicated tcf_walker flag when tp is empty
Using tcf_walker->stop flag to determine when tcf_walker->fn() was called
at least once is unreliable. Some classifiers set 'stop' flag on error
before calling walker callback, other classifiers used to call it with NULL
filter pointer when empty. In order to prevent further regressions, extend
tcf_walker structure with dedicated 'nonempty' flag. Set this flag in
tcf_walker->fn() implementation that is used to check if classifier has
filters configured.

Fixes: 8b64678e0a ("net: sched: refactor tp insert/delete for concurrent execution")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25 10:18:17 -08:00
Cong Wang
14215108a1 net_sched: initialize net pointer inside tcf_exts_init()
For tcindex filter, it is too late to initialize the
net pointer in tcf_exts_validate(), as tcf_exts_get_net()
requires a non-NULL net pointer. We can just move its
initialization into tcf_exts_init(), which just requires
an additional parameter.

This makes the code in tcindex_alloc_perfect_hash()
prettier.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22 15:26:51 -08:00
Dan Carpenter
af736bf071 net: sched: potential NULL dereference in tcf_block_find()
The error code isn't set on this path so it would result in returning
ERR_PTR(0) and a NULL dereference in the caller.

Fixes: 18d3eefb17 ("net: sched: refactor tcf_block_find() into standalone functions")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21 13:11:07 -08:00
YueHaibing
fb14b09635 net: sched: remove duplicated include from cls_api.c
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 21:09:06 -08:00
Vlad Buslov
470502de5b net: sched: unlock rules update API
Register netlink protocol handlers for message types RTM_NEWTFILTER,
RTM_DELTFILTER, RTM_GETTFILTER as unlocked. Set rtnl_held variable that
tracks rtnl mutex state to be false by default.

Introduce tcf_proto_is_unlocked() helper that is used to check
tcf_proto_ops->flag to determine if ops can be called without taking rtnl
lock. Manually lookup Qdisc, class and block in rule update handlers.
Verify that both Qdisc ops and proto ops are unlocked before using any of
their callbacks, and obtain rtnl lock otherwise.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:33 -05:00
Vlad Buslov
18d3eefb17 net: sched: refactor tcf_block_find() into standalone functions
Refactor tcf_block_find() code into three standalone functions:
- __tcf_qdisc_find() to lookup Qdisc and increment its reference counter.
- __tcf_qdisc_cl_find() to lookup class.
- __tcf_block_find() to lookup block and increment its reference counter.

This change is necessary to allow netlink tc rule update handlers to call
these functions directly in order to conditionally take rtnl lock
according to Qdisc class ops flags before calling any of class ops
functions.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:33 -05:00
Vlad Buslov
12db03b65c net: sched: extend proto ops to support unlocked classifiers
Add 'rtnl_held' flag to tcf proto change, delete, destroy, dump, walk
functions to track rtnl lock status. Extend users of these function in cls
API to propagate rtnl lock status to them. This allows classifiers to
obtain rtnl lock when necessary and to pass rtnl lock status to extensions
and driver offload callbacks.

Add flags field to tcf proto ops. Add flag value to indicate that
classifier doesn't require rtnl lock.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:33 -05:00
Vlad Buslov
7d5509fa0d net: sched: extend proto ops with 'put' callback
Add optional tp->ops->put() API to be implemented for filter reference
counting. This new function is called by cls API to release filter
reference for filters returned by tp->ops->change() or tp->ops->get()
functions. Implement tfilter_put() helper to call tp->ops->put() only for
classifiers that implement it.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:33 -05:00
Vlad Buslov
ec6743a109 net: sched: track rtnl lock status when validating extensions
Actions API is already updated to not rely on rtnl lock for
synchronization. However, it need to be provided with rtnl status when
called from classifiers API in order to be able to correctly release the
lock when loading kernel module.

Extend extension validation function with 'rtnl_held' flag which is passed
to actions API. Add new 'rtnl_held' parameter to tcf_exts_validate() in cls
API. No classifier is currently updated to support unlocked execution, so
pass hardcoded 'true' flag parameter value.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:33 -05:00