Common code in mcdi_filters.c uses these flags, so by moving them to
either struct efx_nic (in the case of must_realloc_vis) or struct
efx_mcdi_filter_table (for must_restore_rss_contexts and
must_restore_filters), decouple this code from ef10's nic_data.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Various MCDI functions (especially in filter handling) need to check the
datapath caps, but those live in nic_data (since they don't exist on
Siena). Decouple from ef10-specific data structures by adding check_caps
to the nic_type, to allow using these functions from non-ef10 drivers.
Also add a convenience macro efx_has_cap() to reduce the amount of
boilerplate involved in calling it.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove some usage of ef10-specific nic_data structs from common MCDI
functions, in preparation for using them from a non-EF10 driver.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
XDP-redirect is broken in this driver sfc. XDP_REDIRECT requires
tailroom for skb_shared_info when creating an SKB based on the
redirected xdp_frame (both in cpumap and veth).
The fix requires some initial explaining. The driver uses RX page-split
when possible. It reserves the top 64 bytes in the RX-page for storing
dma_addr (struct efx_rx_page_state). It also have the XDP recommended
headroom of XDP_PACKET_HEADROOM (256 bytes). As it doesn't reserve any
tailroom, it can still fit two standard MTU (1500) frames into one page.
The sizeof struct skb_shared_info in 320 bytes. Thus drivers like ixgbe
and i40e, reduce their XDP headroom to 192 bytes, which allows them to
fit two frames with max 1536 bytes into a 4K page (192+1536+320=2048).
The fix is to reduce this drivers headroom to 128 bytes and add the 320
bytes tailroom. This account for reserved top 64 bytes in the page, and
still fit two frame in a page for normal MTUs.
We must never go below 128 bytes of headroom for XDP, as one cacheline
is for xdp_frame area and next cacheline is reserved for metadata area.
Fixes: eb9a36be7f ("sfc: perform XDP processing on received packets")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We now ignore the "completion" event when using tx queue timestamping,
and only pay attention to the two (high and low) timestamp events. The
NIC will send a pair of timestamp events for every packet transmitted.
The current firmware may merge the completion events, and it is possible
that future versions may reorder the completion and timestamp events.
As such the completion event is not useful.
Without this patch in place a merged completion event on a queue with
timestamping will cause a "spurious TX completion" error. This affects
SFN8000-series adapters.
Signed-off-by: Tom Zhao <tzhao@solarflare.com>
Acked-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Includes a couple of filtering functions and also renames a constant.
Style fixes included.
Signed-off-by: Alexandru-Mihai Maftei <amaftei@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New headers contain prototypes of functions that will be common between
ef10 and upcoming driver.
Removed static modifier from the affected functions.
Some function prototypes were removed from existing headers.
Signed-off-by: Alexandru-Mihai Maftei <amaftei@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only the SFC4000 code, now moved to sfc-falcon, needed I2C.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It was possible for channel allocation logic to get confused between what
it had and what it wanted, and end up trying to use the same channel for
both PTP and regular TX. This led to a kernel panic:
BUG: unable to handle page fault for address: 0000000000047635
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP PTI
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.4.0-rc3-ehc14+ #900
Hardware name: Dell Inc. PowerEdge R710/0M233H, BIOS 6.4.0 07/23/2013
RIP: 0010:native_queued_spin_lock_slowpath+0x188/0x1e0
Code: f3 90 48 8b 32 48 85 f6 74 f6 eb e8 c1 ee 12 83 e0 03 83 ee 01 48 c1 e0 05 48 63 f6 48 05 c0 98 02 00 48 03 04 f5 a0 c6 ed 81 <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32
RSP: 0018:ffffc90000003d28 EFLAGS: 00010006
RAX: 0000000000047635 RBX: 0000000000000246 RCX: 0000000000040000
RDX: ffff888627a298c0 RSI: 0000000000003ffe RDI: ffff88861f6b8dd4
RBP: ffff8886225c6e00 R08: 0000000000040000 R09: 0000000000000000
R10: 0000000616f080c6 R11: 00000000000000c0 R12: ffff88861f6b8dd4
R13: ffffc90000003dc8 R14: ffff88861942bf00 R15: ffff8886150f2000
FS: 0000000000000000(0000) GS:ffff888627a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000047635 CR3: 000000000200a000 CR4: 00000000000006f0
Call Trace:
<IRQ>
_raw_spin_lock_irqsave+0x22/0x30
skb_queue_tail+0x1b/0x50
sock_queue_err_skb+0x9d/0xf0
__skb_complete_tx_timestamp+0x9d/0xc0
efx_dequeue_buffer+0x126/0x180 [sfc]
efx_xmit_done+0x73/0x1c0 [sfc]
efx_ef10_ev_process+0x56a/0xfe0 [sfc]
? tick_sched_do_timer+0x60/0x60
? timerqueue_add+0x5d/0x70
? enqueue_hrtimer+0x39/0x90
efx_poll+0x111/0x380 [sfc]
? rcu_accelerate_cbs+0x50/0x160
net_rx_action+0x14a/0x400
__do_softirq+0xdd/0x2d0
irq_exit+0xa0/0xb0
do_IRQ+0x53/0xe0
common_interrupt+0xf/0xf
</IRQ>
In the long run we intend to rewrite the channel allocation code, but for
'net' fix this by allocating extra_channels, and giving them TX queues,
even if we do not in fact need them (e.g. on NICs without MAC TX
timestamping), and thereby using simpler logic to assign the channels
once they're allocated.
Fixes: 3990a8fffb ("sfc: allocate channels for XDP tx queues")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If there's no traffic on a channel, its ARFS expiry work will never get
scheduled by efx_poll() as that isn't being run.
So make efx_filter_rfs_expire() reschedule itself to run after 30 seconds.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Report the number of successful and failed insertions, and also the
current count of filters, to aid in tuning e.g. rps_flow_cnt.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
The old rfs_filters_added method for determining the quota could potentially
allow the NIC to become filled with old filters, which never get tested for
expiry. Instead, explicitly make expiry check work depend on the number of
filters installed, and don't count checking slots without filters in as
doing work. This guarantees that each filter will be checked for expiry at
least once every thirty seconds (assuming the channel to which it belongs is
NAPI polling actively) regardless of fill level.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Tested-by: David Ahern <dahern@digitalocean.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Count XDP packet drops, error drops, transmissions and redirects and
expose these counters via the ethtool stats command.
Signed-off-by: Charles McLachlan <cmclachlan@solarflare.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each CPU needs access to its own queue to allow uncontested
transmission of XDP_TX packets. This means we need to allocate (up
front) enough channels ("xdp transmit channels") to provide at least
one extra tx queue per CPU. These tx queues should not do TSO.
Signed-off-by: Charles McLachlan <cmclachlan@solarflare.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds a field to hold an attached xdp_prog, but never populates it (see
following patch). Also, XDP_TX support is deferred to a later patch
in the series.
Track failures of xdp_rxq_info_reg() via per-queue xdp_rxq_info_valid
flags and a per-nic xdp_rxq_info_failed flag. The per-queue flags are
needed to prevent attempts to xdp_rxq_info_unreg() structs that failed
to register. Possibly the API could be changed in the future to avoid
the need for these flags.
Signed-off-by: Charles McLachlan <cmclachlan@solarflare.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a field to efx_tx_buffer so that we can track xdp_frames. Add a
flag so that buffers that contain xdp_frames can be identified and
passed to xdp_return_frame.
Signed-off-by: Charles McLachlan <cmclachlan@solarflare.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Improves packet rate of 1-byte UDP receives by up to 10%.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Associate an arbitrary ID with each ARFS filter, allowing to properly query
for expiry. The association is maintained in a hash table, which is
protected by a spinlock.
v3: fix build warnings when CONFIG_RFS_ACCEL is disabled (thanks lkp-robot).
v2: fixed uninitialised variable (thanks davem and lkp-robot).
Fixes: 3af0f34290 ("sfc: replace asynchronous filter operations")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A misconfigured system (e.g. with all interrupts affinitised to all CPUs)
may produce a storm of ARFS steering events. With the existing sfc ARFS
implementation, that could create a backlog of workitems that grinds the
system to a halt. To prevent this, limit the number of workitems that
may be in flight for a given SFC device to 8 (EFX_RPS_MAX_IN_FLIGHT), and
return EBUSY from our ndo_rx_flow_steer method if the limit is reached.
Given this limit, also store the workitems in an array of slots within the
struct efx_nic, rather than dynamically allocating for each request.
The limit should not negatively impact performance, because it is only
likely to be hit in cases where ARFS will be ineffective anyway.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise races are possible between ethtool ops and
efx_ef10_rx_restore_rss_contexts().
Also, don't try to perform the restore on every reset, only after an MC
reboot, otherwise we'll leak RSS contexts on the NIC.
Fixes: 42356d9a13 ("sfc: support RSS spreading of ethtool ntuple filters")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this change, the spinlock efx->filter_lock is no longer used and is
thus removed.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
efx->filter_lock remains in place for use on farch, but EF10 now ignores it.
EFX_EF10_FILTER_FLAG_BUSY is no longer needed, hence it is removed.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of having an efx->type->filter_rfs_insert() method, just use
workitems with a worker function that calls efx->type->filter_insert().
The only user of this is efx_filter_rfs(), which now queues a call to
efx_filter_rfs_work().
Similarly, efx_filter_rfs_expire() is now a worker function called on a
new channel->filter_work work_struct, so the method
efx->type->filter_rfs_expire_one() is no longer called in atomic context.
We also add a new mutex efx->rps_mutex to protect the RPS state (efx->
rps_expire_channel, efx->rps_expire_index, and channel->rps_flow_id) so
that the taking of efx->filter_lock can be moved to
efx->type->filter_rfs_expire_one().
Thus, all filter table functions are now called in a sleepable context,
allowing them to use sleeping locks in a future patch.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As well as 'auto' and the forced 'off', 'rs' and 'baser' states, we also
handle combinations of settings (since the fecparam->fec field is a
bitmask), where auto|rs and auto|baser specify a preferred FEC mode but
will fall back to the other if the cable or link partner doesn't support
it. rs|baser (with or without auto bit) means prefer FEC even where
auto wouldn't use it, but let FW choose which encoding to use.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use a linked list to associate user-facing context IDs with FW-facing
context IDs (since the latter can change after an MC reset).
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For this we create and use one or more new TX queues on the PTP channel,
and enable sync events for it.
Based on a patch by Martin Habets <mhabets@solarflare.com>.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before this work, TX timestamping is done by sending each SKB to the MC.
On the 8000 series (Medford1) we have high speed timestamping via the
MAC, which means we can use normal TX queues for this without a
significant drop in bandwidth. On the X2000 series (Medford2) support
for transmitting via the MC is removed, so the new way must be used.
This patch enables timestamping on a TX queue, if requested.
It also enhances TX event handling to process the extra completion events,
and puts the time in the SKB.
Signed-off-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Store and handle ethtool link mode masks within the driver instead of
just a single u32. However, quite a significant amount of existing code
wants to manipulate the masks directly, and thus now uses the first
unsigned long (i.e. mask[0]) as though it were a legacy u32 mask. This
is ok because all the bits that code is interested in are in the first
32 bits of the mask; but it might be a good idea to change them in
future to use the proper bitmap API.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Medford2 NICs support more than MC_CMD_MAC_NSTATS stats, and report the new
count in a field of MC_CMD_GET_CAPABILITIES_V4. This also means that the
end generation count moves (it is, as before, the last 64 bits of the DMA
buffer, but that is no longer MC_CMD_MAC_GENERATION_END).
So read num_mac_stats from the GET_CAPABILITIES response, if present;
otherwise assume MC_CMD_MAC_NSTATS; and always use num_mac_stats - 1 rather
than MC_CMD_MAC_GENERATION_END.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Log a message if PTP probing fails; if we then, unexpectedly, get PTP
events, only log a message for the first one on each device.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Medford2 can also have 16k or 64k VI stride. This is reported by MCDI in
GET_CAPABILITIES, which fortunately is called before the driver does
anything sensitive to the VI stride (such as accessing or even allocating
VIs past the zeroth).
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support using BAR 0 on SFC9250, even though the driver doesn't bind to such
devices yet.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add min_interrupt_mode specification per NIC type.
It is a bit confusing because of "highest interrupt mode is less capable".
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement ndo_udp_tunnel_{add,del} to update the NIC's list of VXLAN and
GENEVE UDP ports. Also reset the port list to empty on driver load and
on driver unload, with appropriate flag set on the unload case.
These port numbers are used for RX inner checksum offload, and in future
will also be used for TX inner checksum offload and encapsulated TSO.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the csum_level for encapsulated packets where the encapsulation
type, l3 class and l4 class are sets that need it.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for RX checksum offload of encapsulated packets. This
essentially just means paying attention to the inner checksum flags
in the RX event, and if *either* checksum flag indicates a fail then
don't tell the kernel that checksum offload was successful.
Also, count these checksum errors and export the counts to ethtool -S.
Test the most common "good" case of RX events with a single bitmask
instead of a series of ifs. Move the more specific error checking
in to a separate function for clarity, and don't use unlikely() there
since we know at least one of the bits is bad.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In linux-4.5, busy polling was implemented in core
NAPI stack, meaning that all custom implementation can
be removed from drivers.
Not only we remove lot's of tricky code, we also remove
one lock operation in fast path.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ensures that we report the key and indirection table the NIC is using,
rather than (if setting them failed earlier) what we wanted it to use.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an option descriptor has been sent on a queue but not followed by a
packet, there will have been no completion event, so the read and write
counts won't match and we'll think we can't do PIO. This combines with
the fact that we have two TX queues (for en/disable checksum offload),
and that both must be empty for PIO to happen.
This patch adds a separate "packet_write_count" that tracks the most
recent write_count we expect to see a completion event for; this excludes
option descriptors but _includes_ PIO descriptors (even though they look
like option descriptors). This is then used, rather than write_count,
in efx_nic_tx_is_empty().
We only bother to maintain packet_write_count on EF10, since on Siena
(a) there are no option descriptors and it always equals write_count, and
(b) there's no PIO, so we don't need it anyway.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no good reason why this should be an SRIOV-only thing.
Thus, also move it out of SRIOV-specific files.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we failed to set up RSS on EF10 (e.g. because firmware declared
RX_RSS_LIMITED), ethtool --show-nfc $dev rx-flow-hash ... should report
no fields, rather than confusingly reporting what fields we _would_ be
hashing on if RSS was working.
Fixes: dcb4123cbe ("sfc: disable RSS when unsupported")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Tested-by: Bert Kenward <bkenward@solarflare.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Logically, EFX_BUG_ON_PARANOID can never be correct. For, BUG_ON should
only be used if it is not possible to continue without potential harm;
and since the non-DEBUG driver will continue regardless (as the BUG_ON is
compiled out), clearly the BUG_ON cannot be needed in the DEBUG driver.
So, replace every EFX_BUG_ON_PARANOID with either an EFX_WARN_ON_PARANOID
or the newly defined EFX_WARN_ON_ONCE_PARANOID.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rationale: The differences between Falcon and Siena are in many ways larger
than those between Siena and EF10 (despite Siena being nominally "Falcon-
architecture"); for instance, Falcon has no MCPU, so there is no MCDI.
Removing Falcon support from the sfc driver should simplify the latter,
and avoid the possibility of Falcon support being broken by changes to sfc
(which are rarely if ever tested on Falcon, it being end-of-lifed hardware).
The sfc-falcon driver created in this changeset is essentially a copy of the
sfc driver, but with Siena- and EF10-specific code, including MCDI, removed
and with the "efx_" identifier prefix changed to "ef4_" (for "EFX 4000-
series") to avoid collisions when both drivers are built-in.
This changeset removes Falcon from the sfc driver's PCI ID table; then in
sfc I've removed obvious Falcon-related code: I removed the Falcon NIC
functions, Falcon PHY code, and EFX_REV_FALCON_*, then fixed up everything
that referenced them.
Also, increment minor version of both drivers (to 4.1).
For now, CONFIG_SFC selects CONFIG_SFC_FALCON, so that updating old configs
doesn't cause Falcon support to disappear; but that should be undone at
some point in the future.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't use ->heap_buf after commit 46d1efd852 ("sfc: remove Software
TSO") so let's remove the last traces.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It gives no advantage over GSO now that xmit_more exists. If we find
ourselves unable to handle a TSO skb (because our TXQ doesn't have a
TSOv2 context and the NIC doesn't support TSOv1), hand it back to GSO.
Also do that if the TSO handler fails with EINVAL for any other reason.
As Falcon-architecture NICs don't support any firmware-assisted TSO,
they no longer advertise TSO feature flags at all.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for FATSOv2 to the driver. FATSOv2 offloads far more of the task
of TCP segmentation to the firmware, such that we now just pass a single
super-packet to the NIC. This means TSO has a great deal in common with a
normal DMA transmit, apart from adding a couple of option descriptors.
NIC-specific checks have been moved off the fast path and in to
initialisation where possible.
This also moves FATSOv1/SWTSO to a new file (tx_tso.c). The end of transmit
and some error handling is now outside TSO, since it is common with other
code.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Ma Yuying <yuma@redhat.com>
Suggested-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Reviewed-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MC_CMD_TRIGGER_INTERRUPT does not work on the SFC9140, as used in the
sfn7x42q and sfn7x24f.
Check for this using the MCDI workaround mechanism.
The command is only used during self test. If it's not supported, skip
the interrupt test.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On SFN8000 series adapters the MC provides a method to get the timer
quantum and the maximum timer setting. We revert to the old values if the
new call is unavailable.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SFN8000-series NICs require a new method of setting interrupt moderation,
via MCDI. This is indicated by a workaround flag. This new MCDI command
takes an explicit time value rather than a number of ticks. It therefore
makes sense to also store the moderation values in terms of time, since
that is what the ethtool interface is interested in.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If it is not supported we simply disable the feature.
For the feature to work we need firmware filter support for
OUTER_VID + LOC_MAC and for OUTER_VID + LOC_MAC_IG.
The low-latency firmware can match on OUTER_VID + LOC_MAC but not on
OUTER_VID + LOC_MAC_IG.
For the capture packet firmware it is the other way around.
Only the full-feature variant can match on both combinations.
Incorporates a fix by Andrew Rybchenko <Andrew.Rybchenko@oktetlabs.ru>
in the net_dev->[hw_]features handling.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Supports HW VLAN filtering, en/disabled using ethtool.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It allows to change set of fixed features on datapath reset.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is used for EF10 only and logically belongs to EF10 filter table state.
It is OK that it is reset to false on filter table recreation since all
filters are removed on destruction.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise we get confused when two flows on different channels get the
same flow ID.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't open-code it.
CC: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
CC: Shradha Shah <sshah@solarflare.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Solarflare 8000 series NIC will use a new TSO scheme. The current
driver refuses to load if the current TSO scheme is not found. Remove
that check and instead make the TSO version a per-queue parameter.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minor overlapping changes in net/ipv4/ipmr.c, in 'net' we were
fixing the "BH-ness" of the counter bumps whilst in 'net-next'
the functions were modified to take an explicit 'net' parameter.
Signed-off-by: David S. Miller <davem@davemloft.net>
When the IP stack passes SKBs the sfc driver puts them in 2 different TX
queues (called partners), one for checksummed and one for not checksummed.
If the SKB has xmit_more set the driver will delay pushing the work to the
NIC.
When later it does decide to push the buffers this patch ensures it also
pushes the partner queue, if that also has any delayed work. Before this
fix the work in the partner queue would be left for a long time and cause
a netdev watchdog.
Fixes: 70b33fb ("sfc: add support for skb->xmit_more")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reduces the overhead of locking for busy poll.
Previously the state was protected by a lock, whereas now
it's manipulated solely with atomic operations.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On EF10, MC_CMD_VPORT_RECONFIGURE can cause a CODE_MC_REBOOT event
to be sent to a function without incrementing the (adapter-wide)
warm_boot_count. In this case, the reboot is not detected by the
loop on efx_mcdi_poll_reboot(), so prepare for recovery from an MC
reboot anyway. When this codepath is run, the MC has always just
rebooted, so this recovery is valid.
The loop on efx_mcdi_poll_reboot() is still required for other MC
reboot cases, so that actions in response to an MC reboot are
performed, such as clearing locally calculated statistics.
Siena NICs are unaffected by this change as the above scenario
does not apply.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, the driver would refuse to load if it couldn't secure
enough VIs from the MC to fulfill its RSS requirements.
This was causing probe to fail on later functions in
configurations where we'd run out of VIs, such as having many
VFs.
This change allows the driver to load with fewer VIs, down to a
minimum of 2. A warning will be printed saying that RSS
requirements were not met, possibly affecting performance.
efx->max_tx_channels needs to be set to avoid going down the
failure path in efx_probe_nic() immediately in the loop after the
probe() NIC-type function.
Also, Set rc=ENOSPC when bombing out of efx_probe_nic due to lack
of VIs.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the workaround to support cascaded multicast filters ("workaround_26807") is
enabled, the broadcast filter and individual multicast filters are not inserted
when in promiscuous or allmulti mode.
There is a race while inserting and removing filters when entering and leaving
promiscuous mode. When changing promiscuous state with cascaded multicast
filters, the old multicast filters are removed before inserting the new filters
to avoid duplicating packets; this can lead to dropped packets until all
filters have been inserted.
The efx_nic:mc_promisc flag is added to record the presence of a multicast
promiscuous filter; this gives a simple way to tell if the promiscuous state is
changing.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The limit for BQL is updated each time we call
netdev_tx_completed_queue.
Without this patch the BQL limit was updated for every TX event we
see.
The issue was that this only updated the limit to handle the data
we complete in two events as the first event wouldn't show that
enough traffic had been processed between them.
This was OK when interrupt moderation was off but not when it was
on as more data had to be completed in a single interrupt.
The patch changes this so that we do report the completion to BQL
only when all the TX events in the interrupt have been processed.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a set_mac_address() NIC-type function for EF10 only, and
use this to set the MAC address on the vadaptor. For Siena and
earlier, the MAC address continues to be set by MC_CMD_SET_MAC;
this is still called on EF10, and including a MAC address in
this command has no effect.
The sriov_mac_address_changed() NIC-type function is no longer
needed on EF10, but it is needed for Siena where it is used to
update the peer address of the PF for VFDI. Change this to use
the new set_mac_address function pointer.
efx_ef10_sriov_mac_address_changed() is no longer called, as VFs
will try to change the MAC address on their vadaptor rather than
trying to change to the context of the PF to alter the vport.
When a VF is running in direct passthrough mode with MAC spoofing
enabled, it will be able to change the MAC address on its vadaptor.
In this case, there is a link to the PF, so find the correct VF in
its ef10_vf array and update the MAC address.
ndo_set_mac_address() can be called during driver unload while
bonding, and in this case the device has already been stopped, so
don't call efx_net_open() to restart it after reconfiguration.
efx->port_enabled is set to false in efx_stop_port(), so it is
indicator of whether the device needs to be restarted.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Exercised with
"ip link set <PF intf> vf <vf_i> state {auto|enable|disable}"
Sets the reporting policy for VF link state to either
- mirror physical link state
- always up
- always down
get VF link state mode in efx_ef10_sriov_get_vf_config
Exercised by
"ip link show <PF intf>";
output will include a line like
vf 0 MAC 12:34:56:78:9a:bc, link-state auto
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A VF's MAC address is set by its parent PF and added to its vport.
To get this MAC address, the VF must use MC_CMD_ VPORT_GET_MAC_ADDRESSES.
In the current scheme, a VF's vport should only have one MAC address,
so warn if this is not the case.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If MCDI timeouts are encountered during efx_ef10_filter_table_remove(),
an FLR will be queued, but efx->filter_state will still be kfree()d.
The queued FLR will then call efx_ef10_filter_table_restore(), which
will try to use efx->filter_state. This previously caused a panic.
This patch adds an rwsem to protect the existence of efx->filter_state,
separately from the spinlock protecting its contents. Users which can
race against efx_ef10_filter_table_remove() should down_read this rwsem.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise the PF and VF can disagree on the VF's MAC address and
this leads to strange behaviour, up to and including kernel panics.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the device ID of the VF to the PCI device ID table.
Added a boolean flag is_vf in efx_nic_type to differentiate
between a VF and PF at probe time. This flag is useful in later
patches while setting MAC address specially in the
PCI-passthrough case.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow PFs to allocate shared RSS contexts if we exhaust our
exclusive RSS contexts. Make VFs use shared RSS contexts in
all cases.
Spruce up error handling so that the shadow copy of the RSS
table is updated after successful update, rather than in all
cases, so that we report the actual contents of the RSS table
after a failure to set it, rather than what we'd like it to be.
Populate context_size parameter when vacuously allocating RSS
context of size 1.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added efx_nic_type structure for VF.
Mapped a different BAR for VF as it uses BAR 0 for memory.
Added functions sriov_init and sriov_fini.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds functions to allocate and free vswitches and vports; vadaptors
are automatically allocated and freed when TX/RX queues are
initialised and finalised. This vswitching structure is only created
if the firmware supports it, so a check that full-featured firmware
is running is performed first.
If the MC resets, the vswitching infrastructure will need to be
recreated, so mark the "must_probe_vswitching" flag when an MC reboot
is detected.
Don't try to create a vswitch if vf-count=0
This allocation of vswitches and vports does not currently support
configuring VLAN tags, but that can be added in a future change.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for the use of sriov_configure on EF10
to enable Virtual Functions while the driver is loaded.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The efx_vf struct contains Siena-specific fields for VFs,
so rename to siena_vf.
Also move it into the siena_nic_data struct, as EF10 will
track its VFs in its own ef10_nic_data, storing much less
information about them since VFDI is no longer used.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By putting all the efx_{siena,ef10}_sriov_* declarations in
{siena,ef10}_sriov.h, ensure they cannot be called from nic-generic code.
Also fixes up an instance of this, where mcdi.c was calling
efx_siena_sriov_flr.
The single instance of netdev_ops should call general high level
functions that can then call something adapter specific in efx_nic_type.
We should only do adapter specialisation via efx_nic_type.
Removal of sriov functionality from the Falcon code means that tests
are needed for the presence of some callbacks.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also add dummy functions where required to avoid NULL pointer dereference.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch series provides a base and cleanup for the
upcoming EF10 SRIOV support.
This patch moves the VF state into siena_nic_data as a basis to
save the VF state based on nic type.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the sfc driver code for implementing busy polling.
It adds ndo_busy_poll method and locking between it and napi poll.
It also adds each napi to the napi_hash right after netif_napi_add().
Uses efx_start_eventq and efx_stop_eventq in the self tests.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement per channel software TX and RX packet counters
accessed as ethtool statistics.
This allows confirmation with MAC statistics.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added a counter rx_noskb_drop for failure to allocate an skb.
Summed the per-channel rx_nodesc_trunc counters earlier so that they can
be included in rx_dropped.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an MCDI command times out (whether or not we find it
completed when we poll), call efx_mcdi_abandon(), which tells
all subsequent MCDI calls to fail-fast, and queues up an FLR.
Because an FLR doesn't lead to receiving any reboot even from
the MC (unlike most other types of reset), we have to call
efx_ef10_reset_mc_allocations.
In efx_start_all(), if a reset (of any kind) is pending, we
bail out.
Without this, attempts to reconfigure (e.g. change mtu) can
cause driver/mc state inconsistency if the first MCDI call
triggers an FLR.
For similar reasons, on EF10, in
efx_reset_down(method=RESET_TYPE_MCDI_TIMEOUT), set the number
of active queues to zero before calling efx_stop_all().
And, on farch, in efx_reset_up(method=RESET_TYPE_MCDI_TIMEOUT),
set active_queues and flushes pending & outstanding to zero.
efx_mcdi_mode_{poll,event}() should not take us out of fail-fast
mode. Instead, this is done by efx_mcdi_reset() after the FLR
completes.
The new FLR reset_type RESET_TYPE_MCDI_TIMEOUT doesn't really
fit into the hierarchy of reset 'scopes' whereby efx_reset()
decides some resets subsume others. Thus, it uses separate logic.
Also, fixed up some inconsistency around RESET_TYPE_MC_BIST,
which was in the wrong place in that hierarchy.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove trailing blank lines in several files.
Use only one blank line between functions.
Add a blank line as a separator in a few places.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The operation can now fail, so change its return type to int.
Remove the inline wrapper while we're changing the signature.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The EF10 implementation already does this, and it makes more logical
sense to group the RSS hash key and indirection table together.
Rename the operation to rx_push_rss_config.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The primary function of an EF10 controller will share its clock
device with other functions in the same domain (which we call
secondary functions). To this end, we need to associate functions
on the same controller.
We do not control probe order, so allow primary and secondary
functions to appear in any order. Maintain global lists of all
primary functions and of unassociated secondary functions,
and a list of secondary functions on each primary function.
Use the VPD serial number to tell whether functions are part of the
same controller. VPD will not be readable by virtual functions, so
this may need to be revisited later.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The EF10 firmware can optionally insert RX timestamps in the packet
prefix. These only include the clock minor value. We must also
enable periodic time sync events on each event queue which provide
the high bits of the clock value.
[bwh: Combined and rebased several changes.
Added the above description and some sanity checks for inline vs
separate timestamps.
Changed efx_rx_skb_attach_timestamp() to read the packet prefix
from the skb head area.]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We can potentially pull the entire packet contents into the head area
and then free the page it was in. In order to read an inline
timestamp safely, we need to copy the prefix into the head area as
well.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
I added efx_ptp_get_mode() to avoid moving the definition for
efx_ptp_data, since the current PTP mode is needed for
siena.c:siena_set_ptp_hwtstamp.
[bwh: Also move the rx_filters mask, and add kernel-doc]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We don't directly control RX ingress on Siena or any later
controllers, and so we cannot prevent packets from entering the RX
datapath while the RX queues are not set up. This results in
the hardware incrementing RX_NODESC_DROP_CNT, but it's not an
error and we should not include it in error stats.
When bringing an interface up or down, pull (or wait for) stats and
count the number of packets that were dropped while the interface was
down. Subtract this from the reported RX dropped count.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>