Commit Graph

985278 Commits

Author SHA1 Message Date
Christina Jacob
d0cf9503e9 octeontx2-pf: ethtool fec mode support
Add ethtool support to configure fec modes baser/rs and
support to fecth FEC stats from CGX as well PHY.

Configure fec mode
	- ethtool --set-fec eth0 encoding rs/baser/off/auto
Query fec mode
	- ethtool --show-fec eth0

Signed-off-by: Christina Jacob <cjacob@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-10 15:19:53 -08:00
Felix Manlunas
bd74d4ea29 octeontx2-af: Add new CGX_CMD to get PHY FEC statistics
This patch adds support to fetch fec stats from PHY. The stats are
put in the shared data struct fwdata.  A PHY driver indicates
that it has FEC stats by setting the flag fwdata.phy.misc.has_fec_stats

Besides CGX_CMD_GET_PHY_FEC_STATS, also add CGX_CMD_PRBS and
CGX_CMD_DISPLAY_EYE to enum cgx_cmd_id so that Linux's enum list is in sync
with firmware's enum list.

Signed-off-by: Felix Manlunas <fmanlunas@marvell.com>
Signed-off-by: Christina Jacob <cjacob@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-10 15:19:53 -08:00
Christina Jacob
84c4f9cab4 octeontx2-af: forward error correction configuration
CGX block supports forward error correction modes baseR
and RS. This patch adds support to set encoding mode
and to read corrected/uncorrected block counters

Adds new mailbox handlers set_fec to configure encoding modes
and fec_stats to read counters and also increase mbox timeout
to accomdate firmware command response timeout.

Along with new CGX_CMD_SET_FEC command add other commands to
sync with kernel enum list with firmware.

Signed-off-by: Christina Jacob <cjacob@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-10 15:19:53 -08:00
Kevin Hao
1fb3ca7675 net: octeontx2: Fix the confusion in buffer alloc failure path
Pavel pointed that the return of dma_addr_t in
otx2_alloc_rbuf/__otx2_alloc_rbuf() seem suspicious because a negative
error code may be returned in some cases. For a dma_addr_t, the error
code such as -ENOMEM does seem a valid value, so we can't judge if the
buffer allocation fail or not based on that value. Add a parameter for
otx2_alloc_rbuf/__otx2_alloc_rbuf() to store the dma address and make
the return value to indicate if the buffer allocation really fail or
not.

Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Tested-by: Subbaraya Sundeep <sbhatta@marvell.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-10 15:16:35 -08:00
David S. Miller
d816f2a9cb Merge branch 'Add-MBIM-over-MHI-support'
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-10 15:11:51 -08:00
Loic Poulain
163c5e6262 net: mhi: Add mbim proto
MBIM has initially been specified by USB-IF for transporting data (IP)
between a modem and a host over USB. However some modern modems also
support MBIM over PCIe (via MHI). In the same way as QMAP(rmnet), it
allows to aggregate IP packets and to perform context multiplexing.

This change adds minimal MBIM data transport support to MHI, allowing
to support MBIM only modems. MBIM being based on USB NCM, it reuses
and copy some helpers/functions from the USB stack (cdc-ncm, cdc-mbim).

Note that is a subset of the CDC-MBIM specification, supporting only
transport of network data (IP), there is no support for DSS. Moreover
the multi-session (for multi-pdn) is not supported in this initial
version, but will be added latter, and aligned with the cdc-mbim
solution (VLAN tags).

This code has been inspired from the mhi_mbim downstream implementation
(Carl Yin <carl.yin@quectel.com>).

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-10 15:11:51 -08:00
Loic Poulain
84c55f16dc net: mhi: Add rx_length_errors stat
This can be used by proto when packet len is incorrect.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-10 15:11:51 -08:00
Loic Poulain
77e8080e12 net: mhi: Create mhi.h
Move mhi-net shared structures to mhi header, that will be used by
upcoming proto(s).

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-10 15:11:51 -08:00
Loic Poulain
b6ec6b8942 net: mhi: Add dedicated folder
Create a dedicated mhi directory for mhi-net, mhi-net is going to
be split into differente files (for additional protocol support).

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-10 15:11:51 -08:00
Loic Poulain
ddeb9bfa59 net: mhi: Add protocol support
MHI can transport different protocols, some are handled at upper level,
like IP and QMAP(rmnet/netlink), but others will need to be inside MHI
net driver, like mbim. This change adds support for protocol rx and
tx_fixup callbacks registration, that can be used to encode/decode the
targeted protocol.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-10 15:11:50 -08:00
Rahul Lakkireddy
24a1720a08 cxgb4: collect serial config version from register
Collect serial config version information directly from an internal
register, instead of explicitly resizing VPD.

v2:
- Add comments on info stored in PCIE_STATIC_SPARE2 register.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-10 15:05:40 -08:00
David S. Miller
dc9d87581d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-02-10 13:30:12 -08:00
Linus Torvalds
291009f656 Merge tag 'pm-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
 "Address a performance regression related to scale-invariance on x86
  that may prevent turbo CPU frequencies from being used in certain
  workloads on systems using acpi-cpufreq as the CPU performance scaling
  driver and schedutil as the scaling governor"

* tag 'pm-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: ACPI: Update arch scale-invariance max perf ratio if CPPC is not there
  cpufreq: ACPI: Extend frequency tables to cover boost frequencies
2021-02-10 12:03:35 -08:00
Linus Torvalds
a3961497bd Merge tag 'acpi-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
 "Revert a problematic ACPICA commit that changed the code to attempt to
  update memory regions which may be read-only on some systems (Ard
  Biesheuvel)"

* tag 'acpi-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "ACPICA: Interpreter: fix memory leak by using existing buffer"
2021-02-10 11:58:21 -08:00
Linus Torvalds
708c2e4181 Merge tag 'dmaengine-fix2-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:
 "Some late fixes for dmaengine:

  Core:
   - fix channel device_node deletion

  Driver fixes:
   - dw: revert of runtime pm enabling
   - idxd: device state fix, interrupt completion and list corruption
   - ti: resource leak

* tag 'dmaengine-fix2-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
  dmaengine dw: Revert "dmaengine: dw: Enable runtime PM"
  dmaengine: idxd: check device state before issue command
  dmaengine: ti: k3-udma: Fix a resource leak in an error handling path
  dmaengine: move channel device_node deletion to driver
  dmaengine: idxd: fix misc interrupt completion
  dmaengine: idxd: Fix list corruption in description completion
2021-02-10 11:51:25 -08:00
Linus Torvalds
6016bf19b3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
 "Another pile of networing fixes:

   1) ath9k build error fix from Arnd Bergmann

   2) dma memory leak fix in mediatec driver from Lorenzo Bianconi.

   3) bpf int3 kprobe fix from Alexei Starovoitov.

   4) bpf stackmap integer overflow fix from Bui Quang Minh.

   5) Add usb device ids for Cinterion MV31 to qmi_qwwan driver, from
      Christoph Schemmel.

   6) Don't update deleted entry in xt_recent netfilter module, from
      Jazsef Kadlecsik.

   7) Use after free in nftables, fix from Pablo Neira Ayuso.

   8) Header checksum fix in flowtable from Sven Auhagen.

   9) Validate user controlled length in qrtr code, from Sabyrzhan
      Tasbolatov.

  10) Fix race in xen/netback, from Juergen Gross,

  11) New device ID in cxgb4, from Raju Rangoju.

  12) Fix ring locking in rxrpc release call, from David Howells.

  13) Don't return LAPB error codes from x25_open(), from Xie He.

  14) Missing error returns in gsi_channel_setup() from Alex Elder.

  15) Get skb_copy_and_csum_datagram working properly with odd segment
      sizes, from Willem de Bruijn.

  16) Missing RFS/RSS table init in enetc driver, from Vladimir Oltean.

  17) Do teardown on probe failure in DSA, from Vladimir Oltean.

  18) Fix compilation failures of txtimestamp selftest, from Vadim
      Fedorenko.

  19) Limit rx per-napi gro queue size to fix latency regression, from
      Eric Dumazet.

  20) dpaa_eth xdp fixes from Camelia Groza.

  21) Missing txq mode update when switching CBS off, in stmmac driver,
      from Mohammad Athari Bin Ismail.

  22) Failover pending logic fix in ibmvnic driver, from Sukadev
      Bhattiprolu.

  23) Null deref fix in vmw_vsock, from Norbert Slusarek.

  24) Missing verdict update in xdp paths of ena driver, from Shay
      Agroskin.

  25) seq_file iteration fix in sctp from Neil Brown.

  26) bpf 32-bit src register truncation fix on div/mod, from Daniel
      Borkmann.

  27) Fix jmp32 pruning in bpf verifier, from Daniel Borkmann.

  28) Fix locking in vsock_shutdown(), from Stefano Garzarella.

  29) Various missing index bound checks in hns3 driver, from Yufeng Mo.

  30) Flush ports on .phylink_mac_link_down() in dsa felix driver, from
      Vladimir Oltean.

  31) Don't mix up stp and mrp port states in bridge layer, from Horatiu
      Vultur.

  32) Fix locking during netif_tx_disable(), from Edwin Peer"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
  bpf: Fix 32 bit src register truncation on div/mod
  bpf: Fix verifier jmp32 pruning decision logic
  bpf: Fix verifier jsgt branch analysis on max bound
  vsock: fix locking in vsock_shutdown()
  net: hns3: add a check for index in hclge_get_rss_key()
  net: hns3: add a check for tqp_index in hclge_get_ring_chain_from_mbx()
  net: hns3: add a check for queue_id in hclge_reset_vf_queue()
  net: dsa: felix: implement port flushing on .phylink_mac_link_down
  switchdev: mrp: Remove SWITCHDEV_ATTR_ID_MRP_PORT_STAT
  bridge: mrp: Fix the usage of br_mrp_port_switchdev_set_state
  net: watchdog: hold device global xmit lock during tx disable
  netfilter: nftables: relax check for stateful expressions in set definition
  netfilter: conntrack: skip identical origin tuple in same zone only
  vsock/virtio: update credit only if socket is not closed
  net: fix iteration for sctp transport seq_files
  net: ena: Update XDP verdict upon failure
  net/vmw_vsock: improve locking in vsock_connect_timeout()
  net/vmw_vsock: fix NULL pointer dereference
  ibmvnic: Clear failover_pending if unable to schedule
  net: stmmac: set TxQ mode back to DCB after disabling CBS
  ...
2021-02-10 11:33:39 -08:00
Linus Torvalds
4b16b656b1 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "14 patches.

  Subsystems affected by this patch series: mm (kasan, mremap, tmpfs,
  selftests, memcg, and slub), MAINTAINERS, squashfs, nilfs2, and
  firmware"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  nilfs2: make splice write available again
  mm, slub: better heuristic for number of cpus when calculating slab order
  Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"
  MAINTAINERS: update Andrey Ryabinin's email address
  selftests/vm: rename file run_vmtests to run_vmtests.sh
  tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha
  tmpfs: disallow CONFIG_TMPFS_INODE64 on s390
  mm/mremap: fix BUILD_BUG_ON() error in get_extent
  firmware_loader: align .builtin_fw to 8
  kasan: fix stack traces dependency for HW_TAGS
  squashfs: add more sanity checks in xattr id lookup
  squashfs: add more sanity checks in inode lookup
  squashfs: add more sanity checks in id lookup
  squashfs: avoid out of bounds writes in decompressors
2021-02-10 11:22:41 -08:00
Joachim Henke
a35d8f016e nilfs2: make splice write available again
Since 5.10, splice() or sendfile() to NILFS2 return EINVAL.  This was
caused by commit 36e2c7421f ("fs: don't allow splice read/write
without explicit ops").

This patch initializes the splice_write field in file_operations, like
most file systems do, to restore the functionality.

Link: https://lkml.kernel.org/r/1612784101-14353-1-git-send-email-konishi.ryusuke@gmail.com
Signed-off-by: Joachim Henke <joachim.henke@t-systems.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>	[5.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-10 11:19:58 -08:00
Vlastimil Babka
3286222fc6 mm, slub: better heuristic for number of cpus when calculating slab order
When creating a new kmem cache, SLUB determines how large the slab pages
will based on number of inputs, including the number of CPUs in the
system.  Larger slab pages mean that more objects can be allocated/free
from per-cpu slabs before accessing shared structures, but also
potentially more memory can be wasted due to low slab usage and
fragmentation.  The rough idea of using number of CPUs is that larger
systems will be more likely to benefit from reduced contention, and also
should have enough memory to spare.

Number of CPUs used to be determined as nr_cpu_ids, which is number of
possible cpus, but on some systems many will never be onlined, thus
commit 045ab8c948 ("mm/slub: let number of online CPUs determine the
slub page order") changed it to nr_online_cpus().  However, for kmem
caches created early before CPUs are onlined, this may lead to
permamently low slab page sizes.

Vincent reports a regression [1] of hackbench on arm64 systems:

  "I'm facing significant performances regression on a large arm64
   server system (224 CPUs). Regressions is also present on small arm64
   system (8 CPUs) but in a far smaller order of magnitude

   On 224 CPUs system : 9 iterations of hackbench -l 16000 -g 16
   v5.11-rc4 : 9.135sec (+/- 0.45%)
   v5.11-rc4 + revert this patch: 3.173sec (+/- 0.48%)
   v5.10: 3.136sec (+/- 0.40%)"

Mel reports a regression [2] of hackbench on x86_64, with lockstat suggesting
page allocator contention:

  "i.e. the patch incurs a 7% to 32% performance penalty. This bisected
   cleanly yesterday when I was looking for the regression and then
   found the thread.

   Numerous caches change size. For example, kmalloc-512 goes from
   order-0 (vanilla) to order-2 with the revert.

   So mostly this is down to the number of times SLUB calls into the
   page allocator which only caches order-0 pages on a per-cpu basis"

Clearly num_online_cpus() doesn't work too early in bootup.  We could
change the order dynamically in a memory hotplug callback, but runtime
order changing for existing kmem caches has been already shown as
dangerous, and removed in 32a6f409b6 ("mm, slub: remove runtime
allocation order changes").

It could be resurrected in a safe manner with some effort, but to fix
the regression we need something simpler.

We could use num_present_cpus() that should be the number of physically
present CPUs even before they are onlined.  That would work for PowerPC
[3], which triggered the original commit, but that still doesn't work on
arm64 [4] as explained in [5].

So this patch tries to determine the best available value without
specific arch knowledge.

 - num_present_cpus() if the number is larger than 1, as that means the
   arch is likely setting it properly

 - nr_cpu_ids otherwise

This should fix the reported regressions while also keeping the effect
of 045ab8c948 for PowerPC systems.  It's possible there are
configurations where num_present_cpus() is 1 during boot while
nr_cpu_ids is at the same time bloated, so these (if they exist) would
keep the large orders based on nr_cpu_ids as was before 045ab8c948.

[1] https://lore.kernel.org/linux-mm/CAKfTPtA_JgMf_+zdFbcb_V9rM7JBWNPjAz9irgwFj7Rou=xzZg@mail.gmail.com/
[2] https://lore.kernel.org/linux-mm/20210128134512.GF3592@techsingularity.net/
[3] https://lore.kernel.org/linux-mm/20210123051607.GC2587010@in.ibm.com/
[4] https://lore.kernel.org/linux-mm/CAKfTPtAjyVmS5VYvU6DBxg4-JEo5bdmWbngf-03YsY18cmWv_g@mail.gmail.com/
[5] https://lore.kernel.org/linux-mm/20210126230305.GD30941@willie-the-truck/

Link: https://lkml.kernel.org/r/20210208134108.22286-1-vbabka@suse.cz
Fixes: 045ab8c948 ("mm/slub: let number of online CPUs determine the slub page order")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jann Horn <jannh@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-10 11:19:27 -08:00
David S. Miller
b8776f14a4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2021-02-10

The following pull-request contains BPF updates for your *net* tree.

We've added 5 non-merge commits during the last 8 day(s) which contain
a total of 3 files changed, 22 insertions(+), 21 deletions(-).

The main changes are:

1) Fix missed execution of kprobes BPF progs when kprobe is firing via
   int3, from Alexei Starovoitov.

2) Fix potential integer overflow in map max_entries for stackmap on
   32 bit archs, from Bui Quang Minh.

3) Fix a verifier pruning and a insn rewrite issue related to 32 bit ops,
   from Daniel Borkmann.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
c# Please enter a commit message to explain why this merge is necessary,
2021-02-09 18:55:17 -08:00
Johannes Weiner
e82553c10b Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"
This reverts commit 536d3bf261, as it can
cause writers to memory.high to get stuck in the kernel forever,
performing page reclaim and consuming excessive amounts of CPU cycles.

Before the patch, a write to memory.high would first put the new limit
in place for the workload, and then reclaim the requested delta.  After
the patch, the kernel tries to reclaim the delta before putting the new
limit into place, in order to not overwhelm the workload with a sudden,
large excess over the limit.  However, if reclaim is actively racing
with new allocations from the uncurbed workload, it can keep the write()
working inside the kernel indefinitely.

This is causing problems in Facebook production.  A privileged
system-level daemon that adjusts memory.high for various workloads
running on a host can get unexpectedly stuck in the kernel and
essentially turn into a sort of involuntary kswapd for one of the
workloads.  We've observed that daemon busy-spin in a write() for
minutes at a time, neglecting its other duties on the system, and
expending privileged system resources on behalf of a workload.

To remedy this, we have first considered changing the reclaim logic to
break out after a couple of loops - whether the workload has converged
to the new limit or not - and bound the write() call this way.  However,
the root cause that inspired the sequence change in the first place has
been fixed through other means, and so a revert back to the proven
limit-setting sequence, also used by memory.max, is preferable.

The sequence was changed to avoid extreme latencies in the workload when
the limit was lowered: the sudden, large excess created by the limit
lowering would erroneously trigger the penalty sleeping code that is
meant to throttle excessive growth from below.  Allocating threads could
end up sleeping long after the write() had already reclaimed the delta
for which they were being punished.

However, erroneous throttling also caused problems in other scenarios at
around the same time.  This resulted in commit b3ff92916a ("mm, memcg:
reclaim more aggressively before high allocator throttling"), included
in the same release as the offending commit.  When allocating threads
now encounter large excess caused by a racing write() to memory.high,
instead of entering punitive sleeps, they will simply be tasked with
helping reclaim down the excess, and will be held no longer than it
takes to accomplish that.  This is in line with regular limit
enforcement - i.e.  if the workload allocates up against or over an
otherwise unchanged limit from below.

With the patch breaking userspace, and the root cause addressed by other
means already, revert it again.

Link: https://lkml.kernel.org/r/20210122184341.292461-1-hannes@cmpxchg.org
Fixes: 536d3bf261 ("mm: memcontrol: avoid workload stalls when lowering memory.high")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Tejun Heo <tj@kernel.org>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: <stable@vger.kernel.org>	[5.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09 17:26:44 -08:00
Andrey Ryabinin
a0c2eb0a43 MAINTAINERS: update Andrey Ryabinin's email address
Update my email, @virtuozzo.com will stop working shortly.

Link: https://lkml.kernel.org/r/20210204223904.3824-1-ryabinin.a.a@gmail.com
Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09 17:26:44 -08:00
Rong Chen
d52db80084 selftests/vm: rename file run_vmtests to run_vmtests.sh
Commit c2aa8afc36 has renamed run_vmtests in Makefile, but the file
still uses the old name.

The kernel test robot reported the following issue:

  # selftests: vm: run_vmtests.sh
  # Warning: file run_vmtests.sh is missing!
  not ok 1 selftests: vm: run_vmtests.sh

Link: https://lkml.kernel.org/r/20210205085507.1479894-1-rong.a.chen@intel.com
Fixes: c2aa8afc36 (selftests/vm: rename run_vmtests --> run_vmtests.sh)
Signed-off-by: Rong Chen <rong.a.chen@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09 17:26:44 -08:00
Seth Forshee
ad69c389ec tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha
As with s390, alpha is a 64-bit architecture with a 32-bit ino_t.  With
CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers and
display "inode64" in the mount options, whereas passing "inode64" in the
mount options will fail.  This leads to erroneous behaviours such as
this:

  # mkdir mnt
  # mount -t tmpfs nodev mnt
  # mount -o remount,rw mnt
  mount: /home/ubuntu/mnt: mount point not mounted or bad option.

Prevent CONFIG_TMPFS_INODE64 from being selected on alpha.

Link: https://lkml.kernel.org/r/20210208215726.608197-1-seth.forshee@canonical.com
Fixes: ea3271f719 ("tmpfs: support 64-bit inums per-sb")
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: <stable@vger.kernel.org>	[5.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09 17:26:44 -08:00
Seth Forshee
b85a7a8bb5 tmpfs: disallow CONFIG_TMPFS_INODE64 on s390
Currently there is an assumption in tmpfs that 64-bit architectures also
have a 64-bit ino_t.  This is not true on s390 which has a 32-bit ino_t.
With CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers
and display "inode64" in the mount options, but passing the "inode64"
mount option will fail.  This leads to the following behavior:

  # mkdir mnt
  # mount -t tmpfs nodev mnt
  # mount -o remount,rw mnt
  mount: /home/ubuntu/mnt: mount point not mounted or bad option.

As mount sees "inode64" in the mount options and thus passes it in the
options for the remount.

So prevent CONFIG_TMPFS_INODE64 from being selected on s390.

Link: https://lkml.kernel.org/r/20210205230620.518245-1-seth.forshee@canonical.com
Fixes: ea3271f719 ("tmpfs: support 64-bit inums per-sb")
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: <stable@vger.kernel.org>	[5.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09 17:26:44 -08:00
Arnd Bergmann
a30a29091b mm/mremap: fix BUILD_BUG_ON() error in get_extent
clang can't evaluate this function argument at compile time when the
function is not inlined, which leads to a link time failure:

  ld.lld: error: undefined symbol: __compiletime_assert_414
  >>> referenced by mremap.c
  >>>               mremap.o:(get_extent) in archive mm/built-in.a

Mark the function as __always_inline to avoid it.

Link: https://lkml.kernel.org/r/20201230154104.522605-1-arnd@kernel.org
Fixes: 9ad9718bfa ("mm/mremap: calculate extent in one place")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09 17:26:44 -08:00
Fangrui Song
793f49a87a firmware_loader: align .builtin_fw to 8
arm64 references the start address of .builtin_fw (__start_builtin_fw)
with a pair of R_AARCH64_ADR_PREL_PG_HI21/R_AARCH64_LDST64_ABS_LO12_NC
relocations.  The compiler is allowed to emit the
R_AARCH64_LDST64_ABS_LO12_NC relocation because struct builtin_fw in
include/linux/firmware.h is 8-byte aligned.

The R_AARCH64_LDST64_ABS_LO12_NC relocation requires the address to be a
multiple of 8, which may not be the case if .builtin_fw is empty.
Unconditionally align .builtin_fw to fix the linker error.  32-bit
architectures could use ALIGN(4) but that would add unnecessary
complexity, so just use ALIGN(8).

Link: https://lkml.kernel.org/r/20201208054646.2913063-1-maskray@google.com
Link: https://github.com/ClangBuiltLinux/linux/issues/1204
Fixes: 5658c76 ("firmware: allow firmware files to be built into kernel image")
Signed-off-by: Fangrui Song <maskray@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09 17:26:44 -08:00
Andrey Konovalov
1cc4cdb521 kasan: fix stack traces dependency for HW_TAGS
Currently, whether the alloc/free stack traces collection is enabled by
default for hardware tag-based KASAN depends on CONFIG_DEBUG_KERNEL.
The intention for this dependency was to only enable collection on slow
debug kernels due to a significant perf and memory impact.

As it turns out, CONFIG_DEBUG_KERNEL is not considered a debug option
and is enabled on many productions kernels including Android and Ubuntu.
As the result, this dependency is pointless and only complicates the
code and documentation.

Having stack traces collection disabled by default would make the
hardware mode work differently to to the software ones, which is
confusing.

This change removes the dependency and enables stack traces collection
by default.

Looking into the future, this default might makes sense for production
kernels, assuming we implement a fast stack trace collection approach.

Link: https://lkml.kernel.org/r/6678d77ceffb71f1cff2cf61560e2ffe7bb6bfe9.1612808820.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09 17:26:44 -08:00
Phillip Lougher
506220d2ba squashfs: add more sanity checks in xattr id lookup
Sysbot has reported a warning where a kmalloc() attempt exceeds the
maximum limit.  This has been identified as corruption of the xattr_ids
count when reading the xattr id lookup table.

This patch adds a number of additional sanity checks to detect this
corruption and others.

1. It checks for a corrupted xattr index read from the inode.  This could
   be because the metadata block is uncompressed, or because the
   "compression" bit has been corrupted (turning a compressed block
   into an uncompressed block).  This would cause an out of bounds read.

2. It checks against corruption of the xattr_ids count.  This can either
   lead to the above kmalloc failure, or a smaller than expected
   table to be read.

3. It checks the contents of the index table for corruption.

[phillip@squashfs.org.uk: fix checkpatch issue]
  Link: https://lkml.kernel.org/r/270245655.754655.1612770082682@webmail.123-reg.co.uk

Link: https://lkml.kernel.org/r/20210204130249.4495-5-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: syzbot+2ccea6339d368360800d@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09 17:26:44 -08:00
Phillip Lougher
eabac19e40 squashfs: add more sanity checks in inode lookup
Sysbot has reported an "slab-out-of-bounds read" error which has been
identified as being caused by a corrupted "ino_num" value read from the
inode.  This could be because the metadata block is uncompressed, or
because the "compression" bit has been corrupted (turning a compressed
block into an uncompressed block).

This patch adds additional sanity checks to detect this, and the
following corruption.

1. It checks against corruption of the inodes count.  This can either
   lead to a larger table to be read, or a smaller than expected
   table to be read.

   In the case of a too large inodes count, this would often have been
   trapped by the existing sanity checks, but this patch introduces
   a more exact check, which can identify too small values.

2. It checks the contents of the index table for corruption.

[phillip@squashfs.org.uk: fix checkpatch issue]
  Link: https://lkml.kernel.org/r/527909353.754618.1612769948607@webmail.123-reg.co.uk

Link: https://lkml.kernel.org/r/20210204130249.4495-4-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: syzbot+04419e3ff19d2970ea28@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09 17:26:44 -08:00
Phillip Lougher
f37aa4c736 squashfs: add more sanity checks in id lookup
Sysbot has reported a number of "slab-out-of-bounds reads" and
"use-after-free read" errors which has been identified as being caused
by a corrupted index value read from the inode.  This could be because
the metadata block is uncompressed, or because the "compression" bit has
been corrupted (turning a compressed block into an uncompressed block).

This patch adds additional sanity checks to detect this, and the
following corruption.

1. It checks against corruption of the ids count.  This can either
   lead to a larger table to be read, or a smaller than expected
   table to be read.

   In the case of a too large ids count, this would often have been
   trapped by the existing sanity checks, but this patch introduces
   a more exact check, which can identify too small values.

2. It checks the contents of the index table for corruption.

Link: https://lkml.kernel.org/r/20210204130249.4495-3-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: syzbot+b06d57ba83f604522af2@syzkaller.appspotmail.com
Reported-by: syzbot+c021ba012da41ee9807c@syzkaller.appspotmail.com
Reported-by: syzbot+5024636e8b5fd19f0f19@syzkaller.appspotmail.com
Reported-by: syzbot+bcbc661df46657d0fa4f@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09 17:26:44 -08:00
Phillip Lougher
e812cbbbbb squashfs: avoid out of bounds writes in decompressors
Patch series "Squashfs: fix BIO migration regression and add sanity checks".

Patch [1/4] fixes a regression introduced by the "migrate from
ll_rw_block usage to BIO" patch, which has produced a number of
Sysbot/Syzkaller reports.

Patches [2/4], [3/4], and [4/4] fix a number of filesystem corruption
issues which have produced Sysbot reports in the id, inode and xattr
lookup code.

Each patch has been tested against the Sysbot reproducers using the
given kernel configuration.  They have the appropriate "Reported-by:"
lines added.

Additionally, all of the reproducer filesystems are indirectly fixed by
patch [4/4] due to the fact they all have xattr corruption which is now
detected there.

Additional testing with other configurations and architectures (32bit,
big endian), and normal filesystems has also been done to trap any
inadvertent regressions caused by the additional sanity checks.

This patch (of 4):

This is a regression introduced by the patch "migrate from ll_rw_block
usage to BIO".

Sysbot/Syskaller has reported a number of "out of bounds writes" and
"unable to handle kernel paging request in squashfs_decompress" errors
which have been identified as a regression introduced by the above
patch.

Specifically, the patch removed the following sanity check

        if (length < 0 || length > output->length ||
		(index + length) > msblk->bytes_used)

This check did two things:

1. It ensured any reads were not beyond the end of the filesystem

2. It ensured that the "length" field read from the filesystem
   was within the expected maximum length.  Without this any
   corrupted values can over-run allocated buffers.

Link: https://lkml.kernel.org/r/20210204130249.4495-1-phillip@squashfs.org.uk
Link: https://lkml.kernel.org/r/20210204130249.4495-2-phillip@squashfs.org.uk
Fixes: 93e72b3c61 ("squashfs: migrate from ll_rw_block usage to BIO")
Reported-by: syzbot+6fba78f99b9afd4b5634@syzkaller.appspotmail.com
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Philippe Liard <pliard@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09 17:26:44 -08:00
Linus Torvalds
ef7d0b5999 Merge tag 'i3c/fixes-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Pull i3c fix from Alexandre Belloni:
 "A single build warning fix"

* tag 'i3c/fixes-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
  i3c/master/mipi-i3c-hci: Fix position of __maybe_unused in i3c_hci_of_match
2021-02-09 17:19:56 -08:00
Daniel Borkmann
e88b2c6e5a bpf: Fix 32 bit src register truncation on div/mod
While reviewing a different fix, John and I noticed an oddity in one of the
BPF program dumps that stood out, for example:

  # bpftool p d x i 13
   0: (b7) r0 = 808464450
   1: (b4) w4 = 808464432
   2: (bc) w0 = w0
   3: (15) if r0 == 0x0 goto pc+1
   4: (9c) w4 %= w0
  [...]

In line 2 we noticed that the mov32 would 32 bit truncate the original src
register for the div/mod operation. While for the two operations the dst
register is typically marked unknown e.g. from adjust_scalar_min_max_vals()
the src register is not, and thus verifier keeps tracking original bounds,
simplified:

  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (b7) r0 = -1
  1: R0_w=invP-1 R1=ctx(id=0,off=0,imm=0) R10=fp0
  1: (b7) r1 = -1
  2: R0_w=invP-1 R1_w=invP-1 R10=fp0
  2: (3c) w0 /= w1
  3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP-1 R10=fp0
  3: (77) r1 >>= 32
  4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP4294967295 R10=fp0
  4: (bf) r0 = r1
  5: R0_w=invP4294967295 R1_w=invP4294967295 R10=fp0
  5: (95) exit
  processed 6 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

Runtime result of r0 at exit is 0 instead of expected -1. Remove the
verifier mov32 src rewrite in div/mod and replace it with a jmp32 test
instead. After the fix, we result in the following code generation when
having dividend r1 and divisor r6:

  div, 64 bit:                             div, 32 bit:

   0: (b7) r6 = 8                           0: (b7) r6 = 8
   1: (b7) r1 = 8                           1: (b7) r1 = 8
   2: (55) if r6 != 0x0 goto pc+2           2: (56) if w6 != 0x0 goto pc+2
   3: (ac) w1 ^= w1                         3: (ac) w1 ^= w1
   4: (05) goto pc+1                        4: (05) goto pc+1
   5: (3f) r1 /= r6                         5: (3c) w1 /= w6
   6: (b7) r0 = 0                           6: (b7) r0 = 0
   7: (95) exit                             7: (95) exit

  mod, 64 bit:                             mod, 32 bit:

   0: (b7) r6 = 8                           0: (b7) r6 = 8
   1: (b7) r1 = 8                           1: (b7) r1 = 8
   2: (15) if r6 == 0x0 goto pc+1           2: (16) if w6 == 0x0 goto pc+1
   3: (9f) r1 %= r6                         3: (9c) w1 %= w6
   4: (b7) r0 = 0                           4: (b7) r0 = 0
   5: (95) exit                             5: (95) exit

x86 in particular can throw a 'divide error' exception for div
instruction not only for divisor being zero, but also for the case
when the quotient is too large for the designated register. For the
edx:eax and rdx:rax dividend pair it is not an issue in x86 BPF JIT
since we always zero edx (rdx). Hence really the only protection
needed is against divisor being zero.

Fixes: 68fda450a7 ("bpf: fix 32-bit divide by zero")
Co-developed-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-02-10 01:32:40 +01:00
Daniel Borkmann
fd675184fc bpf: Fix verifier jmp32 pruning decision logic
Anatoly has been fuzzing with kBdysch harness and reported a hang in
one of the outcomes:

  func#0 @0
  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (b7) r0 = 808464450
  1: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R10=fp0
  1: (b4) w4 = 808464432
  2: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP808464432 R10=fp0
  2: (9c) w4 %= w0
  3: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
  3: (66) if w4 s> 0x30303030 goto pc+0
   R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
  4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
  4: (7f) r0 >>= r0
  5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
  5: (9c) w4 %= w0
  6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  6: (66) if w0 s> 0x3030 goto pc+0
   R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
  7: (d6) if w0 s<= 0x303030 goto pc+1
  9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
  9: (95) exit
  propagating r0

  from 6 to 7: safe
  4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0
  4: (7f) r0 >>= r0
  5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0
  5: (9c) w4 %= w0
  6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  6: (66) if w0 s> 0x3030 goto pc+0
   R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  propagating r0
  7: safe
  propagating r0

  from 6 to 7: safe
  processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1

The underlying program was xlated as follows:

  # bpftool p d x i 10
   0: (b7) r0 = 808464450
   1: (b4) w4 = 808464432
   2: (bc) w0 = w0
   3: (15) if r0 == 0x0 goto pc+1
   4: (9c) w4 %= w0
   5: (66) if w4 s> 0x30303030 goto pc+0
   6: (7f) r0 >>= r0
   7: (bc) w0 = w0
   8: (15) if r0 == 0x0 goto pc+1
   9: (9c) w4 %= w0
  10: (66) if w0 s> 0x3030 goto pc+0
  11: (d6) if w0 s<= 0x303030 goto pc+1
  12: (05) goto pc-1
  13: (95) exit

The verifier rewrote original instructions it recognized as dead code with
'goto pc-1', but reality differs from verifier simulation in that we are
actually able to trigger a hang due to hitting the 'goto pc-1' instructions.

Taking a closer look at the verifier analysis, the reason is that it misjudges
its pruning decision at the first 'from 6 to 7: safe' occasion. What happens
is that while both old/cur registers are marked as precise, they get misjudged
for the jmp32 case as range_within() yields true, meaning that the prior
verification path with a wider register bound could be verified successfully
and therefore the current path with a narrower register bound is deemed safe
as well whereas in reality it's not. R0 old/cur path's bounds compare as
follows:

  old: smin_value=0x8000000000000000,smax_value=0x7fffffffffffffff,umin_value=0x0,umax_value=0xffffffffffffffff,var_off=(0x0; 0xffffffffffffffff)
  cur: smin_value=0x8000000000000000,smax_value=0x7fffffff7fffffff,umin_value=0x0,umax_value=0xffffffff7fffffff,var_off=(0x0; 0xffffffff7fffffff)

  old: s32_min_value=0x80000000,s32_max_value=0x00003030,u32_min_value=0x00000000,u32_max_value=0xffffffff
  cur: s32_min_value=0x00003031,s32_max_value=0x7fffffff,u32_min_value=0x00003031,u32_max_value=0x7fffffff

The 64 bit bounds generally look okay and while the information that got
propagated from 32 to 64 bit looks correct as well, it's not precise enough
for judging a conditional jmp32. Given the latter only operates on subregisters
we also need to take these into account as well for a range_within() probe
in order to be able to prune paths. Extending the range_within() constraint
to both bounds will be able to tell us that the old signed 32 bit bounds are
not wider than the cur signed 32 bit bounds.

With the fix in place, the program will now verify the 'goto' branch case as
it should have been:

  [...]
  6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  6: (66) if w0 s> 0x3030 goto pc+0
   R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
  7: (d6) if w0 s<= 0x303030 goto pc+1
  9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
  9: (95) exit

  7: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=12337,u32_min_value=12337,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  7: (d6) if w0 s<= 0x303030 goto pc+1
   R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  8: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  8: (30) r0 = *(u8 *)skb[808464432]
  BPF_LD_[ABS|IND] uses reserved fields
  processed 11 insns (limit 1000000) max_states_per_insn 1 total_states 1 peak_states 1 mark_read 1

The bug is quite subtle in the sense that when verifier would determine that
a given branch is dead code, it would (here: wrongly) remove these instructions
from the program and hard-wire the taken branch for privileged programs instead
of the 'goto pc-1' rewrites which will cause hard to debug problems.

Fixes: 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-02-10 01:31:46 +01:00
Daniel Borkmann
ee114dd64c bpf: Fix verifier jsgt branch analysis on max bound
Fix incorrect is_branch{32,64}_taken() analysis for the jsgt case. The return
code for both will tell the caller whether a given conditional jump is taken
or not, e.g. 1 means branch will be taken [for the involved registers] and the
goto target will be executed, 0 means branch will not be taken and instead we
fall-through to the next insn, and last but not least a -1 denotes that it is
not known at verification time whether a branch will be taken or not. Now while
the jsgt has the branch-taken case correct with reg->s32_min_value > sval, the
branch-not-taken case is off-by-one when testing for reg->s32_max_value < sval
since the branch will also be taken for reg->s32_max_value == sval. The jgt
branch analysis, for example, gets this right.

Fixes: 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Fixes: 4f7b3e8258 ("bpf: improve verifier branch analysis")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-02-10 01:31:45 +01:00
David S. Miller
de1db4a6ed Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
40GbE Intel Wired LAN Driver Updates 2021-02-08

This series contains updates to i40e driver only.

Cristian makes improvements to driver XDP path. Avoids writing
next-to-clean pointer on every update, removes redundant updates of
cleaned_count and buffer info, creates a helper function to consolidate
XDP actions and simplifies some of the behavior.

Eryk adds messages to inform the user when MTU is larger than supported
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 16:05:32 -08:00
David S. Miller
450bbc3395 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) nf_conntrack_tuple_taken() needs to recheck zone for
   NAT clash resolution, from Florian Westphal.

2) Restore support for stateful expressions when set definition
   specifies no stateful expressions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 15:55:59 -08:00
David S. Miller
74784ee0b9 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2021-02-08

This series contains updates to the ice driver and documentation.

Brett adds a log message when a trusted VF goes in and out of promiscuous
for consistency with i40e driver.

Dave implements a new LLDP command that allows adding VSI destinations to
existing filters and adds support for netdev bonding events, current
support is software based.

Michal refactors code to move from VSI stored xsk_buff_pools to
netdev-provided ones.

Kiran implements the creation scheduler aggregator nodes and distributing
VSIs within the nodes.

Ben modifies rate limit calculations to use clock frequency from the
hardware instead of using a hardcoded one.

Jesse adds support for user to control writeback frequency.

Chinh refactors DCB variables out of the ice_port_info struct.

Bruce removes some unnecessary casting.

Mitch fixes an error message that was reported as if_up instead of if_down.

Tony adjusts fallback allocation for MSI-X to use all given vectors instead
of using only the minimum configuration and updates documentation for
the ice driver.
====================

Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 15:47:00 -08:00
David S. Miller
3e566dacc9 Merge branch 'hns3-cleanups'
Huazhong Tan says:

====================
net: hns3: some cleanups for -next

There are some cleanups for the HNS3 ethernet driver.

change log:
V2: remove previous #3 which should target net.

previous version:
V1: https://patchwork.kernel.org/project/netdevbpf/cover/1612784382-27262-1-git-send-email-tanhuazhong@huawei.com/
====================

Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 15:35:34 -08:00
Jian Shen
55ff3ed57b net: hns3: cleanup for endian issue for VF RSS
Currently the RSS commands of VF are using host byte order.
According to the user manual, it should use little endian in
the command to firmware. For the host and firmware are both
using little endian, so it can work well in this case.

Do cleanup to make it more explicitly.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 15:34:07 -08:00
Peng Li
7ceb40b820 net: hns3: remove unused macro definition
Some macros are defined but unused, so remove them.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 15:34:07 -08:00
Huazhong Tan
11ef971f5a net: hns3: remove an unused parameter in hclge_vf_rate_param_check()
Parameter vf in hclge_vf_rate_param_check() is unused now,
so remove it.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 15:34:07 -08:00
Huazhong Tan
64749c9c38 net: hns3: remove redundant return value of hns3_uninit_all_ring()
Since hns3_uninit_all_ring() only returns 0, so remove this
redundant return value and function declaration in hns3_enet.h.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 15:34:07 -08:00
Peng Li
cad8dfe82a net: hns3: change hclge_query_bd_num() param type
The type of parameter mpf_bd_num and pf_bd_num in
hclge_query_bd_num() should be u32* instead of int*,
so change them.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 15:34:07 -08:00
Peng Li
6e7f109ee9 net: hns3: change hclge_parse_speed() param type
The type of parameters in hclge_parse_speed() should be
unsigned type, so change them.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 15:34:07 -08:00
Jiaran Zhang
c5aaf17618 net: hns3: modify some unmacthed types print parameter
Fix an issue where the formatting symbol of the formatting input and
output function does not match the actual type.

Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 15:34:07 -08:00
Yufeng Mo
9393eb5034 net: hns3: clean up unnecessary parentheses in macro definitions
In macro definitions, parentheses are unnecessary in some cases,
such as the calling parameter of a function, the left variable
of the equal sign, and so on. So remove these unnecessary
parentheses according to these rules.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 15:34:07 -08:00
Peng Li
9d2a1cea69 net: hns3: remove the shaper param magic number
To make the code more readable, this patch adds a definition for
the magic number 126 used for the default shaper param ir_b, and
rename macro DIVISOR_IR_B_126.

No functional change.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 15:34:07 -08:00
Jian Shen
ae9e492a36 net: hns3: remove redundant client_setup_tc handle
Since the real tx queue number and real rx queue number
always be updated when netdev opens, it's redundant
to call hclge_client_setup_tc to do the same thing.
So remove it.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 15:34:07 -08:00