2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

498 Commits

Author SHA1 Message Date
David S. Miller
0b725a2ca6 net: Remove ndo_xmit_flush netdev operation, use signalling instead.
As reported by Jesper Dangaard Brouer, for high packet rates the
overhead of having another indirect call in the TX path is
non-trivial.

There is the indirect call itself, and then there is all of the
reloading of the state to refetch the tail pointer value and
then write the device register.

Move to a more passive scheme, which requires very light modifications
to the device drivers.

The signal is a new skb->xmit_more value, if it is non-zero it means
that more SKBs are pending to be transmitted on the same queue as the
current SKB.  And therefore, the driver may elide the tail pointer
update.

Right now skb->xmit_more is always zero.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-25 16:29:42 -07:00
David S. Miller
c223a078cb virtio_net: Support netdev_ops->ndo_xmit_flush()
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 23:02:45 -07:00
Jason Wang
91815639d8 virtio-net: rx busy polling support
Add basic support for rx busy polling. Instead of introducing new
states and spinlock to synchronize between NAPI and polling method,
this patch just reuse NAPI state to avoid extra overhead for fast path
and simplified the codes.

Test was done between a kvm guest and an external host. Two hosts were
connected through 40gb mlx4 cards. With both busy_poll and busy_read
are set to 50 in guest, 1 byte netperf tcp_rr shows 127% improvement:
transaction rate was increased from 8353.33 to 18966.87.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-23 15:12:02 -07:00
Jason Wang
2ffa75988f virtio-net: introduce virtnet_receive()
Move common receive logic to a new helper virtnet_receive(). It will
also be used by rx busy polling method.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-23 15:12:01 -07:00
Wilfried Klaebe
7ad24ea4bf net: get rid of SET_ETHTOOL_OPS
net: get rid of SET_ETHTOOL_OPS

Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone.
This does that.

Mostly done via coccinelle script:
@@
struct ethtool_ops *ops;
struct net_device *dev;
@@
-       SET_ETHTOOL_OPS(dev, ops);
+       dev->ethtool_ops = ops;

Compile tested only, but I'd seriously wonder if this broke anything.

Suggested-by: Dave Miller <davem@davemloft.net>
Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de>
Acked-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-13 17:43:20 -04:00
Zhangjie \(HZ\)
6ebbc1a638 virtio-net: Set needed_headroom for virtio-net when VIRTIO_F_ANY_LAYOUT is true
This is a small supplement for commit e7428e95a0
("virtio-net: put virtio-net header inline with data"). TCP packages have
enough room to put virtio-net header in, but UDP packages do not. By
setting dev->needed_headroom for virtio-net device, UDP packages could have
enough room.

For UDP packages, sk_buff is alloced in fun __ip_append_data. The size is
"alloclen + hh_len + 15", and "hh_len = LL_RESERVED_SPACE(rt-dst.dev);".
The Macro is defined as follows:
#define LL_RESERVED_SPACE(dev) \
     ((((dev)->hard_header_len+(dev)->needed_headroom)\
     &~(HH_DATA_MOD - 1)) + HH_DATA_MOD)
By default, for UDP packages, after skb is allocated, only 16 bytes
reserved. And 2 bytes remained after mac header is set. That is not enough
to put virtio-net header in. If we set dev->needed_headroom to 12 or 10
(according to mergeable_rx_bufs is on or off ), more room can be reserved.
Then there is enough room for UDP packages to put the header in.

test result list as below:
guest and host: suse11sp3, netperf, intel 2.4GHz
+-------+---------+---------+---------+---------+
|       |   old             |   new             |
+-------+---------+---------+---------+---------+
| UDP   |  Gbit/s | pps     |  Gbit/s | pps     |
| 64    |  0.57   | 692232  |  0.61   | 742420  |
| 256   |  1.60   | 686860  |  1.71   | 733331  |
| 512   |  2.92   | 674576  |  3.07   | 710446  |
| 1024  |  4.99   | 598977  |  5.17   | 620821  |
| 1460  |  5.68   | 483757  |  7.16   | 610519  |
| 4096  |  6.98   | 637468  |  7.21   | 658471  |
+-------+---------+---------+---------+---------+

Signed-off-by: Zhang Jie <zhangjie14@huawei.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-30 13:31:26 -04:00
Amos Kong
c18e9cd623 virtio_net: zero is an invald queue_pairs number
Execute "ethtool -L eth0 combined 0" in guest, if multiqueue
is enabled, virtnet_send_command() will return -EINVAL error,
there is a validation in QEMU.

But if multiqueue is disabled, virtnet_set_queues() will just
return zero (success). We should return error for this situation.

Signed-off-by: Amos Kong <akong@redhat.com>
Acked-by:  Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-22 16:01:35 -04:00
Linus Torvalds
cd6362befe Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Here is my initial pull request for the networking subsystem during
  this merge window:

   1) Support for ESN in AH (RFC 4302) from Fan Du.

   2) Add full kernel doc for ethtool command structures, from Ben
      Hutchings.

   3) Add BCM7xxx PHY driver, from Florian Fainelli.

   4) Export computed TCP rate information in netlink socket dumps, from
      Eric Dumazet.

   5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas
      Dichtel.

   6) Convert many drivers to pci_enable_msix_range(), from Alexander
      Gordeev.

   7) Record SKB timestamps more efficiently, from Eric Dumazet.

   8) Switch to microsecond resolution for TCP round trip times, also
      from Eric Dumazet.

   9) Clean up and fix 6lowpan fragmentation handling by making use of
      the existing inet_frag api for it's implementation.

  10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss.

  11) Auto size SKB lengths when composing netlink messages based upon
      past message sizes used, from Eric Dumazet.

  12) qdisc dumps can take a long time, add a cond_resched(), From Eric
      Dumazet.

  13) Sanitize netpoll core and drivers wrt.  SKB handling semantics.
      Get rid of never-used-in-tree netpoll RX handling.  From Eric W
      Biederman.

  14) Support inter-address-family and namespace changing in VTI tunnel
      driver(s).  From Steffen Klassert.

  15) Add Altera TSE driver, from Vince Bridgers.

  16) Optimizing csum_replace2() so that it doesn't adjust the checksum
      by checksumming the entire header, from Eric Dumazet.

  17) Expand BPF internal implementation for faster interpreting, more
      direct translations into JIT'd code, and much cleaner uses of BPF
      filtering in non-socket ocntexts.  From Daniel Borkmann and Alexei
      Starovoitov"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits)
  netpoll: Use skb_irq_freeable to make zap_completion_queue safe.
  net: Add a test to see if a skb is freeable in irq context
  qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port'
  net: ptp: move PTP classifier in its own file
  net: sxgbe: make "core_ops" static
  net: sxgbe: fix logical vs bitwise operation
  net: sxgbe: sxgbe_mdio_register() frees the bus
  Call efx_set_channels() before efx->type->dimension_resources()
  xen-netback: disable rogue vif in kthread context
  net/mlx4: Set proper build dependancy with vxlan
  be2net: fix build dependency on VxLAN
  mac802154: make csma/cca parameters per-wpan
  mac802154: allow only one WPAN to be up at any given time
  net: filter: minor: fix kdoc in __sk_run_filter
  netlink: don't compare the nul-termination in nla_strcmp
  can: c_can: Avoid led toggling for every packet.
  can: c_can: Simplify TX interrupt cleanup
  can: c_can: Store dlc private
  can: c_can: Reduce register access
  can: c_can: Make the code readable
  ...
2014-04-02 20:53:45 -07:00
Linus Torvalds
64056a9425 Nothing exciting: virtio-blk users might see a bit of a boost from the
doubling of the default queue length though.
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJTOirvAAoJENkgDmzRrbjxpqAP/3114WOubf4MNoi24eXNkU2x
 ZC++orq408A72g8dB7A0XAhatKc8Ay0bDP5hOZNAgTHm2hjYxmhpC9UlKEzzElpW
 yjwR0wYBXA0RoJuZqCp9MNgOtkU54QoQZ0c4EZblagslUZBmKQPUDE7XdgkaqO6o
 A3azfCjAFDu523Azep9Npj1sk+H+VH3OIXyMGY+uyHBw1a2rnmIhn4lQCQ+pX+YO
 3wpqxEragpjBAizs1CAAB9wWm2O8zVACxkoUXVYI8Tu60n99lwr9Abxlc0oHSIig
 E7kBnyzQVHfkDrPXR3EdwTi3Hwd/BaOiW4dPvQ3IJKvPOzoiS4H3IpJCnCV5PfRb
 VHl3q//SzPQ+GXH7WH2Fhb9JxoLxBRzFcy3kdIR1wBHYahAOiQjcLgapOO5mVq3X
 PJy9CDs2L9rjbtxvQWtnl62V3JFGw+ZdhhG/BjeC5Who/aSh/mDoss7/qdfrxKJx
 z5IWYSlJw7ighOuF0dPdCKAX9WiWqENvga31Q2svrH4Hxlx6eGumEmX+YQw0iRAf
 ddMYA+1qLT4myPTN0nORFM+T/mkZHNkNMCr0qylRFH0j6hSiDxwWqQG0eXA661By
 W6nIkW++sj2Vkk4knMGCXSyMmy9Nrv+1R+8unQJCXixYevotP5JEY0DoCQwlGuuq
 xa0UR+2q9Htnbytu8S0K
 =AoMS
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio updates from Rusty Russell:
 "Nothing exciting: virtio-blk users might see a bit of a boost from the
  doubling of the default queue length though"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  virtio-blk: base queue-depth on virtqueue ringsize or module param
  Revert a02bbb1ccf: MAINTAINERS: add virtio-dev ML for virtio
  virtio: fail adding buffer on broken queues.
  virtio-rng: don't crash if virtqueue is broken.
  virtio_balloon: don't crash if virtqueue is broken.
  virtio_blk: don't crash, report error if virtqueue is broken.
  virtio_net: don't crash if virtqueue is broken.
  virtio_balloon: don't softlockup on huge balloon changes.
  virtio: Use pci_enable_msix_exact() instead of pci_enable_msix()
  MAINTAINERS: virtio-dev is subscribers only
  tools/virtio: add a missing )
  tools/virtio: fix missing kmemleak_ignore symbol
  tools/virtio: update internal copies of headers
2014-04-02 14:43:17 -07:00
David S. Miller
64c27237a0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/marvell/mvneta.c

The mvneta.c conflict is a case of overlapping changes,
a conversion to devm_ioremap_resource() vs. a conversion
to netdev_alloc_pcpu_stats.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:48:54 -04:00
Jason Wang
681daee244 virtio-net: correct error handling of virtqueue_kick()
Current error handling of virtqueue_kick() was wrong in two places:
- The skb were freed immediately when virtqueue_kick() fail during
  xmit. This may lead double free since the skb was not detached from
  the virtqueue.
- try_fill_recv() returns false when virtqueue_kick() fail. This will
  lead unnecessary rescheduling of refill work.

Actually, it's safe to just ignore the kick failure in those two
places. So this patch fixes this by partially revert commit
6797590118.

Fixes 6797590118
(virtio_net: verify if virtqueue_kick() succeeded).

Cc: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:13:21 -04:00
Eric W. Biederman
85e9452539 virtio_net: Call dev_kfree_skb_any instead of dev_kfree_skb.
Replace dev_kfree_skb with dev_kfree_skb_any in start_xmit which can
be called in hard irq and other contexts.

start_xmit only frees skbs that it is dropping.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-03-24 21:19:25 -07:00
Eric W. Biederman
57a7744e09 net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq
Replace the bh safe variant with the hard irq safe variant.

We need a hard irq safe variant to deal with netpoll transmitting
packets from hard irq context, and we need it in most if not all of
the places using the bh safe variant.

Except on 32bit uni-processor the code is exactly the same so don't
bother with a bh variant, just have a hard irq safe variant that
everyone can use.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:41:36 -04:00
Rusty Russell
a7c58146cf virtio_net: don't crash if virtqueue is broken.
A bad implementation of virtio might cause us to mark the virtqueue
broken: we'll dev_err() in that case, and the device is useless, but
let's not BUG_ON().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-13 11:27:55 +10:30
Jason Wang
0e7ede80d9 virtio-net: alloc big buffers also when guest can receive UFO
We should alloc big buffers also when guest can receive UFO
packets to let the big packets fit into guest rx buffer.

Fixes 5c5167515d
(virtio-net: Allow UFO feature to be set and advertised.)

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-24 18:19:52 -05:00
Michael Dalton
fbf28d78f5 virtio-net: initial rx sysfs support, export mergeable rx buffer size
Add initial support for per-rx queue sysfs attributes to virtio-net. If
mergeable packet buffers are enabled, adds a read-only mergeable packet
buffer size sysfs attribute for each RX queue.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 23:46:07 -08:00
Michael Dalton
ab7db91705 virtio-net: auto-tune mergeable rx buffer size for improved performance
Commit 2613af0ed1 ("virtio_net: migrate mergeable rx buffers to page frag
allocators") changed the mergeable receive buffer size from PAGE_SIZE to
MTU-size, introducing a single-stream regression for benchmarks with large
average packet size. There is no single optimal buffer size for all
workloads.  For workloads with packet size <= MTU bytes, MTU + virtio-net
header-sized buffers are preferred as larger buffers reduce the TCP window
due to SKB truesize. However, single-stream workloads with large average
packet sizes have higher throughput if larger (e.g., PAGE_SIZE) buffers
are used.

This commit auto-tunes the mergeable receiver buffer packet size by
choosing the packet buffer size based on an EWMA of the recent packet
sizes for the receive queue. Packet buffer sizes range from MTU_SIZE +
virtio-net header len to PAGE_SIZE. This improves throughput for
large packet workloads, as any workload with average packet size >=
PAGE_SIZE will use PAGE_SIZE buffers.

These optimizations interact positively with recent commit
ba27524103 ("virtio-net: coalesce rx frags when possible during rx"),
which coalesces adjacent RX SKB fragments in virtio_net. The coalescing
optimizations benefit buffers of any size.

Benchmarks taken from an average of 5 netperf 30-second TCP_STREAM runs
between two QEMU VMs on a single physical machine. Each VM has two VCPUs
with all offloads & vhost enabled. All VMs and vhost threads run in a
single 4 CPU cgroup cpuset, using cgroups to ensure that other processes
in the system will not be scheduled on the benchmark CPUs. Trunk includes
SKB rx frag coalescing.

net-next w/ virtio_net before 2613af0ed1 (PAGE_SIZE bufs): 14642.85Gb/s
net-next (MTU-size bufs):  13170.01Gb/s
net-next + auto-tune: 14555.94Gb/s

Jason Wang also reported a throughput increase on mlx4 from 22Gb/s
using MTU-sized buffers to about 26Gb/s using auto-tuning.

Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 23:46:06 -08:00
Michael Dalton
fb51879dbc virtio-net: use per-receive queue page frag alloc for mergeable bufs
The virtio-net driver currently uses netdev_alloc_frag() for GFP_ATOMIC
mergeable rx buffer allocations. This commit migrates virtio-net to use
per-receive queue page frags for GFP_ATOMIC allocation. This change unifies
mergeable rx buffer memory allocation, which now will use skb_refill_frag()
for both atomic and GFP-WAIT buffer allocations.

To address fragmentation concerns, if after buffer allocation there
is too little space left in the page frag to allocate a subsequent
buffer, the remaining space is added to the current allocated buffer
so that the remaining space can be used to store packet data.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 23:46:06 -08:00
Jason Wang
be121f46af virtio-net: drop rq->max and rq->num
It looks like there's no need for those two fields:

- Unless there's a failure for the first refill try, rq->max should be always
  equal to the vring size.
- rq->num is only used to determine the condition that we need to do the refill,
  we could check vq->num_free instead.
- rq->num was required to be increased or decreased explicitly after each
  get/put which results a bad API.

So this patch removes them both to make the code simpler.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 17:30:42 -08:00
David S. Miller
56a4342dfe Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
	net/ipv6/ip6_tunnel.c
	net/ipv6/ip6_vti.c

ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.

qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 17:37:45 -05:00
Jason Wang
6cd4ce0099 virtio-net: fix refill races during restore
During restoring, try_fill_recv() was called with neither napi lock nor napi
disabled. This can lead two try_fill_recv() was called in the same time. Fix
this by refilling before trying to enable napi.

Fixes 0741bcb558
(virtio: net: Add freeze, restore handlers to support S4).

Cc: Amit Shah <amit.shah@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 19:23:03 -05:00
stephen hemminger
788a8b6dd3 virtio_net: spelling fixes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 22:28:06 -05:00
stephen hemminger
d24bae32fa virtio_net: remove unused parameter to send_command
All the code passes NULL for the last sg list (in).
Simplify by just removing it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 22:28:06 -05:00
David S. Miller
34f9f43710 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge 'net' into 'net-next' to get the AF_PACKET bug fix that
Daniel's direct transmit changes depend upon.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:20:14 -05:00
Michael Dalton
98bfd23cdb virtio-net: free bufs correctly on invalid packet length
When a packet with invalid length arrives, ensure that the packet
is freed correctly if mergeable packet buffers and big packets
(GUEST_TSO4) are both enabled.

Signed-off-by: Michael Dalton <mwdalton@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 16:31:43 -05:00
Andrey Vagin
d4fb84eefe virtio: delete napi structures from netdev before releasing memory
free_netdev calls netif_napi_del too, but it's too late, because napi
structures are placed on vi->rq. netif_napi_add() is called from
virtnet_alloc_queues.

general protection fault: 0000 [#1] SMP
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables virtio_balloon pcspkr virtio_net(-) i2c_pii
CPU: 1 PID: 347 Comm: rmmod Not tainted 3.13.0-rc2+ #171
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff8800b779c420 ti: ffff8800379e0000 task.ti: ffff8800379e0000
RIP: 0010:[<ffffffff81322e19>]  [<ffffffff81322e19>] __list_del_entry+0x29/0xd0
RSP: 0018:ffff8800379e1dd0  EFLAGS: 00010a83
RAX: 6b6b6b6b6b6b6b6b RBX: ffff8800379c2fd0 RCX: dead000000200200
RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000001 RDI: ffff8800379c2fd0
RBP: ffff8800379e1dd0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800379c2f90
R13: ffff880037839160 R14: 0000000000000000 R15: 00000000013352f0
FS:  00007f1400e34740(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f464124c763 CR3: 00000000b68cf000 CR4: 00000000000006e0
Stack:
 ffff8800379e1df0 ffffffff8155beab 6b6b6b6b6b6b6b2b ffff8800378391c0
 ffff8800379e1e18 ffffffff8156499b ffff880037839be0 ffff880037839d20
 ffff88003779d3f0 ffff8800379e1e38 ffffffffa003477c ffff88003779d388
Call Trace:
 [<ffffffff8155beab>] netif_napi_del+0x1b/0x80
 [<ffffffff8156499b>] free_netdev+0x8b/0x110
 [<ffffffffa003477c>] virtnet_remove+0x7c/0x90 [virtio_net]
 [<ffffffff813ae323>] virtio_dev_remove+0x23/0x80
 [<ffffffff813f62ef>] __device_release_driver+0x7f/0xf0
 [<ffffffff813f6ca0>] driver_detach+0xc0/0xd0
 [<ffffffff813f5f28>] bus_remove_driver+0x58/0xd0
 [<ffffffff813f72ec>] driver_unregister+0x2c/0x50
 [<ffffffff813ae65e>] unregister_virtio_driver+0xe/0x10
 [<ffffffffa0036942>] virtio_net_driver_exit+0x10/0x6ce [virtio_net]
 [<ffffffff810d7cf2>] SyS_delete_module+0x172/0x220
 [<ffffffff810a732d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff810f5d4c>] ? __audit_syscall_entry+0x9c/0xf0
 [<ffffffff81677f69>] system_call_fastpath+0x16/0x1b
Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00
RIP  [<ffffffff81322e19>] __list_del_entry+0x29/0xd0
 RSP <ffff8800379e1dd0>
---[ end trace d5931cd3f87c9763 ]---

Fixes: 986a4f4d45 (virtio_net: multiqueue support)
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 15:28:29 -05:00
Andrey Vagin
fa9fac1725 virtio-net: determine type of bufs correctly
free_unused_bufs must check vi->mergeable_rx_bufs before
vi->big_packets, because we use this sequence in other places.
Otherwise we allocate buffer of one type, then free it as another
type.

general protection fault: 0000 [#1] SMP
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables pcspkr virtio_balloon virtio_net(-) i2c_pii
CPU: 0 PID: 400 Comm: rmmod Not tainted 3.13.0-rc2+ #170
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff8800b6d2a210 ti: ffff8800aed32000 task.ti: ffff8800aed32000
RIP: 0010:[<ffffffffa00345f3>]  [<ffffffffa00345f3>] free_unused_bufs+0xc3/0x190 [virtio_net]
RSP: 0018:ffff8800aed33dd8  EFLAGS: 00010202
RAX: ffff8800b1fe2c00 RBX: ffff8800b66a7240 RCX: 6b6b6b6b6b6b6b6b
RDX: 6b6b6b6b6b6b6b6b RSI: ffff8800b8419a68 RDI: ffff8800b66a1148
RBP: ffff8800aed33e00 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: ffff8800b66a1148 R14: 0000000000000000 R15: 000077ff80000000
FS:  00007fc4f9c4e740(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f63f432f000 CR3: 00000000b6538000 CR4: 00000000000006f0
Stack:
 ffff8800b66a7240 ffff8800b66a7380 ffff8800377bd3f0 0000000000000000
 00000000023302f0 ffff8800aed33e18 ffffffffa00346e2 ffff8800b66a7240
 ffff8800aed33e38 ffffffffa003474d ffff8800377bd388 ffff8800377bd390
Call Trace:
 [<ffffffffa00346e2>] remove_vq_common+0x22/0x40 [virtio_net]
 [<ffffffffa003474d>] virtnet_remove+0x4d/0x90 [virtio_net]
 [<ffffffff813ae303>] virtio_dev_remove+0x23/0x80
 [<ffffffff813f62cf>] __device_release_driver+0x7f/0xf0
 [<ffffffff813f6c80>] driver_detach+0xc0/0xd0
 [<ffffffff813f5f08>] bus_remove_driver+0x58/0xd0
 [<ffffffff813f72cc>] driver_unregister+0x2c/0x50
 [<ffffffff813ae63e>] unregister_virtio_driver+0xe/0x10
 [<ffffffffa0036852>] virtio_net_driver_exit+0x10/0x7be [virtio_net]
 [<ffffffff810d7cf2>] SyS_delete_module+0x172/0x220
 [<ffffffff810a732d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff810f5d4c>] ? __audit_syscall_entry+0x9c/0xf0
 [<ffffffff81677f69>] system_call_fastpath+0x16/0x1b
Code: c0 74 55 0f 1f 44 00 00 80 7b 30 00 74 7a 48 8b 50 30 4c 89 e6 48 03 73 20 48 85 d2 0f 84 bb 00 00 00 66 0f
RIP  [<ffffffffa00345f3>] free_unused_bufs+0xc3/0x190 [virtio_net]
 RSP <ffff8800aed33dd8>
---[ end trace edb570ea923cce9c ]---

Fixes: 2613af0ed1 (virtio_net: migrate mergeable rx buffers to page frag allocators)
Cc: Michael Dalton <mwdalton@google.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 15:28:29 -05:00
Jeff Kirsher
adf8d3ff6e drivers/net/*: Fix FSF address in file headers
Several files refer to an old address for the Free Software Foundation
in the file header comment.  Resolve by replacing the address with
the URL <http://www.gnu.org/licenses/> so that we do not have to keep
updating the header comments anytime the address changes.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Veaceslav Falico <vfalico@redhat.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Paul Mackerras <paulus@samba.org>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 12:37:55 -05:00
Michael S. Tsirkin
f121159d72 virtio_net: make all RX paths handle erors consistently
receive mergeable now handles errors internally.
Do same for big and small packet paths, otherwise
the logic is too hard to follow.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-01 20:27:16 -05:00
Michael S. Tsirkin
8fc3b9e9a2 virtio_net: fix error handling for mergeable buffers
Eric Dumazet noticed that if we encounter an error
when processing a mergeable buffer, we don't
dequeue all of the buffers from this packet,
the result is almost sure to be loss of networking.

Jason Wang noticed that we also leak a page and that we don't decrement
the rq buf count, so we won't repost buffers (a resource leak).

Fix both issues.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael Dalton <mwdalton@google.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-01 20:27:16 -05:00
Thomas Huth
99e872ae1e virtio_net: Fixed a trivial typo (fitler --> filter)
"MAC filter" sounds more reasonable than "MAC fitler".

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-30 16:14:24 -05:00
Linus Torvalds
1ee2dcc224 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Mostly these are fixes for fallout due to merge window changes, as
  well as cures for problems that have been with us for a much longer
  period of time"

 1) Johannes Berg noticed two major deficiencies in our genetlink
    registration.  Some genetlink protocols we passing in constant
    counts for their ops array rather than something like
    ARRAY_SIZE(ops) or similar.  Also, some genetlink protocols were
    using fixed IDs for their multicast groups.

    We have to retain these fixed IDs to keep existing userland tools
    working, but reserve them so that other multicast groups used by
    other protocols can not possibly conflict.

    In dealing with these two problems, we actually now use less state
    management for genetlink operations and multicast groups.

 2) When configuring interface hardware timestamping, fix several
    drivers that simply do not validate that the hwtstamp_config value
    is one the driver actually supports.  From Ben Hutchings.

 3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.

 4) In dev_forward_skb(), set the skb->protocol in the right order
    relative to skb_scrub_packet().  From Alexei Starovoitov.

 5) Bridge erroneously fails to use the proper wrapper functions to make
    calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid.  Fix from Toshiaki
    Makita.

 6) When detaching a bridge port, make sure to flush all VLAN IDs to
    prevent them from leaking, also from Toshiaki Makita.

 7) Put in a compromise for TCP Small Queues so that deep queued devices
    that delay TX reclaim non-trivially don't have such a performance
    decrease.  One particularly problematic area is 802.11 AMPDU in
    wireless.  From Eric Dumazet.

 8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts
    here.  Fix from Eric Dumzaet, reported by Dave Jones.

 9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.

10) When computing mergeable buffer sizes, virtio-net fails to take the
    virtio-net header into account.  From Michael Dalton.

11) Fix seqlock deadlock in ip4_datagram_connect() wrt.  statistic
    bumping, this one has been with us for a while.  From Eric Dumazet.

12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
    Hugne.

13) 6lowpan bit used for traffic classification was wrong, from Jukka
    Rissanen.

14) macvlan has the same issue as normal vlans did wrt.  propagating LRO
    disabling down to the real device, fix it the same way.  From Michal
    Kubecek.

15) CPSW driver needs to soft reset all slaves during suspend, from
    Daniel Mack.

16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.

17) The xen-netfront RX buffer refill timer isn't properly scheduled on
    partial RX allocation success, from Ma JieYue.

18) When ipv6 ping protocol support was added, the AF_INET6 protocol
    initialization cleanup path on failure was borked a little.  Fix
    from Vlad Yasevich.

19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
    blocks we can do the wrong thing with the msg_name we write back to
    userspace.  From Hannes Frederic Sowa.  There is another fix in the
    works from Hannes which will prevent future problems of this nature.

20) Fix route leak in VTI tunnel transmit, from Fan Du.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
  genetlink: make multicast groups const, prevent abuse
  genetlink: pass family to functions using groups
  genetlink: add and use genl_set_err()
  genetlink: remove family pointer from genl_multicast_group
  genetlink: remove genl_unregister_mc_group()
  hsr: don't call genl_unregister_mc_group()
  quota/genetlink: use proper genetlink multicast APIs
  drop_monitor/genetlink: use proper genetlink multicast APIs
  genetlink: only pass array to genl_register_family_with_ops()
  tcp: don't update snd_nxt, when a socket is switched from repair mode
  atm: idt77252: fix dev refcnt leak
  xfrm: Release dst if this dst is improper for vti tunnel
  netlink: fix documentation typo in netlink_set_err()
  be2net: Delete secondary unicast MAC addresses during be_close
  be2net: Fix unconditional enabling of Rx interface options
  net, virtio_net: replace the magic value
  ping: prevent NULL pointer dereference on write to msg_name
  bnx2x: Prevent "timeout waiting for state X"
  bnx2x: prevent CFC attention
  bnx2x: Prevent panic during DMAE timeout
  ...
2013-11-19 15:50:47 -08:00
Zhi Yong Wu
0f13b66b01 net, virtio_net: replace the magic value
It is more appropriate to use # of queue pairs currently used by
the driver instead of a magic value.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-18 16:23:09 -05:00
Linus Torvalds
b746f9c794 Nothing really exciting: some groundwork for changing virtio endian, and
some robustness fixes for broken virtio devices, plus minor tweaks.
 
 [vs last pull request: added the virtio-scsi broken vq escape patch, which
 I somehow lost.]
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSgDsJAAoJENkgDmzRrbjxEE4P/jXqZHS/HdlxW9k0BjKKlEIF
 PdtCoP3UhWTdskXvy2pD8m6nYn214MEJYUIa4HFlIEZsdxhuexzQHY19Ynkjagyv
 57sRsUUm5fYQLIL7IUh2DUD1VU38hUFinno/y333szzvCj9qITDA/QABsiWxK8NO
 dq+Lmeixgrhc5yN9iryW+gZV+hekJIZ4LsU5ejSaJucKblzXUH8qIbmSthG7RTYJ
 tr4J7xTTXbhxY4CoC5Dpx2hvsFkvzaAIvI4Nr1mDjfq5cR8BaYvnC89U1IbhdAey
 p1AbZE58JLrY+Z8K8LBRGV2KjO8qSZ6R47hbZ9nAnodJYB7sZLyj6jUe1q+/htuC
 Dh9Xm9O4eW2xNaFk20dYeIF4UU5/HzdsbvG/IlH8x4sm8/K706ocYyAOHlzYUg2T
 k7gltrgDzDokMgb2R44gwnr4oaJ2q8Gne6JXswlPEv2eRs6vNnA5Xhc0rEHGkU6C
 gYn1vNFN6yx0vf2syG/Ce5pZtMxGpefKQkHzzWdq8FKr1B9s54dDuf2hls7J8A9t
 OQT1gE33yURSelf4Kh4k9zWXaWk/Ohv9l2R1cqpALnJ4/+q0fP5t7HdK500S7aax
 DxLeFeqvsBw7nlWgsGxQmt+fjITQFHhcDiwst0ehnt6RbDEW7XPIguz0K/gyhxYG
 +UNbl/5Gr64jnUX3YCzm
 =vY2L
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio updates from Rusty Russell:
 "Nothing really exciting: some groundwork for changing virtio endian,
  and some robustness fixes for broken virtio devices, plus minor
  tweaks"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  virtio_scsi: verify if queue is broken after virtqueue_get_buf()
  x86, asmlinkage, lguest: Pass in globals into assembler statement
  virtio: mmio: fix signature checking for BE guests
  virtio_ring: adapt to notify() returning bool
  virtio_net: verify if queue is broken after virtqueue_get_buf()
  virtio_console: verify if queue is broken after virtqueue_get_buf()
  virtio_blk: verify if queue is broken after virtqueue_get_buf()
  virtio_ring: add new function virtqueue_is_broken()
  virtio_test: verify if virtqueue_kick() succeeded
  virtio_net: verify if virtqueue_kick() succeeded
  virtio_ring: let virtqueue_{kick()/notify()} return a bool
  virtio_ring: change host notification API
  virtio_config: remove virtio_config_val
  virtio: use size-based config accessors.
  virtio_config: introduce size-based accessors.
  virtio_ring: plug kmemleak false positive.
  virtio: pm: use CONFIG_PM_SLEEP instead of CONFIG_PM
2013-11-15 13:28:47 +09:00
Michael Dalton
5061de3666 virtio-net: mergeable buffer size should include virtio-net header
Commit 2613af0ed1 ("virtio_net: migrate mergeable rx buffers to page
frag allocators") changed the mergeable receive buffer size from PAGE_SIZE
to MTU-size. However, the merge buffer size does not take into account the
size of the virtio-net header. Consequently, packets that are MTU-size
will take two buffers intead of one (to store the virtio-net header),
substantially decreasing the throughput of MTU-size traffic due to TCP
window / SKB truesize effects.

This commit changes the mergeable buffer size to include the virtio-net
header. The buffer size is cacheline-aligned because skb_page_frag_refill
will not automatically align the requested size.

Benchmarks taken from an average of 5 netperf 30-second TCP_STREAM runs
between two QEMU VMs on a single physical machine. Each VM has two VCPUs and
vhost enabled. All VMs and vhost threads run in a single 4 CPU cgroup
cpuset, using cgroups to ensure that other processes in the system will not
be scheduled on the benchmark CPUs. Transmit offloads and mergeable receive
buffers are enabled, but guest_tso4 / guest_csum are explicitly disabled to
force MTU-sized packets on the receiver.

next-net trunk before 2613af0ed1 (PAGE_SIZE buf): 3861.08Gb/s
net-next trunk (MTU 1500- packet uses two buf due to size bug): 4076.62Gb/s
net-next trunk (MTU 1480- packet fits in one buf): 6301.34Gb/s
net-next trunk w/ size fix (MTU 1500 - packet fits in one buf): 6445.44Gb/s

Suggested-by: Eric Northup <digitaleric@google.com>
Signed-off-by: Michael Dalton <mwdalton@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:22:56 -05:00
Linus Torvalds
5e30025a31 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking changes from Ingo Molnar:
 "The biggest changes:

   - add lockdep support for seqcount/seqlocks structures, this
     unearthed both bugs and required extra annotation.

   - move the various kernel locking primitives to the new
     kernel/locking/ directory"

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  block: Use u64_stats_init() to initialize seqcounts
  locking/lockdep: Mark __lockdep_count_forward_deps() as static
  lockdep/proc: Fix lock-time avg computation
  locking/doc: Update references to kernel/mutex.c
  ipv6: Fix possible ipv6 seqlock deadlock
  cpuset: Fix potential deadlock w/ set_mems_allowed
  seqcount: Add lockdep functionality to seqcount/seqlock structures
  net: Explicitly initialize u64_stats_sync structures for lockdep
  locking: Move the percpu-rwsem code to kernel/locking/
  locking: Move the lglocks code to kernel/locking/
  locking: Move the rwsem code to kernel/locking/
  locking: Move the rtmutex code to kernel/locking/
  locking: Move the semaphore core to kernel/locking/
  locking: Move the spinlock code to kernel/locking/
  locking: Move the lockdep code to kernel/locking/
  locking: Move the mutex code to kernel/locking/
  hung_task debugging: Add tracepoint to report the hang
  x86/locking/kconfig: Update paravirt spinlock Kconfig description
  lockstat: Report avg wait and hold times
  lockdep, x86/alternatives: Drop ancient lockdep fixup message
  ...
2013-11-14 16:30:30 +09:00
John Stultz
827da44c61 net: Explicitly initialize u64_stats_sync structures for lockdep
In order to enable lockdep on seqcount/seqlock structures, we
must explicitly initialize any locks.

The u64_stats_sync structure, uses a seqcount, and thus we need
to introduce a u64_stats_init() function and use it to initialize
the structure.

This unfortunately adds a lot of fairly trivial initialization code
to a number of drivers. But the benefit of ensuring correctness makes
this worth while.

Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into 30-some
separate patches, as I figured it would be better to get the various
maintainers thoughts on how to best merge this change along with
the seqcount lockdep enablement.

Feedback would be appreciated!

Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Roger Luethi <rl@hellgate.ch>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Simon Horman <horms@verge.net.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06 12:40:25 +01:00
Jason Wang
9bb8ca8607 virtio-net: switch to use XPS to choose txq
We used to use a percpu structure vq_index to record the cpu to queue
mapping, this is suboptimal since it duplicates the work of XPS and
loses all other XPS functionality such as allowing user to configure
their own transmission steering strategy.

So this patch switches to use XPS and suggest a default mapping when
the number of cpus is equal to the number of queues. With XPS support,
there's no need for keeping per-cpu vq_index and .ndo_select_queue(),
so they were removed also.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-05 22:20:29 -05:00
Jason Wang
ba27524103 virtio-net: coalesce rx frags when possible during rx
Commit 2613af0ed1 (virtio_net: migrate mergeable
rx buffers to page frag allocators) try to increase the payload/truesize for
MTU-sized traffic. But this will introduce the extra overhead for GSO packets
received because of the frag list. This commit tries to reduce this issue by
coalesce the possible rx frags when possible during rx. Test result shows the
about 15% improvement on full size GSO packet receiving (and even better than
before commit 2613af0ed1).

Before this commit:
./netperf -H 192.168.100.4
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4
() port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    20303.87

After this commit:
./netperf -H 192.168.100.4
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4
() port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    23841.26

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Michael Dalton <mwdalton@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 20:03:52 -05:00
David S. Miller
394efd19d5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be.h
	drivers/net/netconsole.c
	net/bridge/br_private.h

Three mostly trivial conflicts.

The net/bridge/br_private.h conflict was a function signature (argument
addition) change overlapping with the extern removals from Joe Perches.

In drivers/net/netconsole.c we had one change adjusting a printk message
whilst another changed "printk(KERN_INFO" into "pr_info(".

Lastly, the emulex change was a new inline function addition overlapping
with Joe Perches's extern removals.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 13:48:30 -05:00
Jason Wang
ec9debbd9a virtio-net: correctly handle cpu hotplug notifier during resuming
commit 3ab098df35 (virtio-net: don't respond to
cpu hotplug notifier if we're not ready) tries to bypass the cpu hotplug
notifier by checking the config_enable and does nothing is it was false. So it
need to try to hold the config_lock mutex which may happen in atomic
environment which leads the following warnings:

[  622.944441] CPU0 attaching NULL sched-domain.
[  622.944446] CPU1 attaching NULL sched-domain.
[  622.944485] CPU0 attaching NULL sched-domain.
[  622.950795] BUG: sleeping function called from invalid context at kernel/mutex.c:616
[  622.950796] in_atomic(): 1, irqs_disabled(): 1, pid: 10, name: migration/1
[  622.950796] no locks held by migration/1/10.
[  622.950798] CPU: 1 PID: 10 Comm: migration/1 Not tainted 3.12.0-rc5-wl-01249-gb91e82d #317
[  622.950799] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  622.950802]  0000000000000000 ffff88001d42dba0 ffffffff81a32f22 ffff88001bfb9c70
[  622.950803]  ffff88001d42dbb0 ffffffff810edb02 ffff88001d42dc38 ffffffff81a396ed
[  622.950805]  0000000000000046 ffff88001d42dbe8 ffffffff810e861d 0000000000000000
[  622.950805] Call Trace:
[  622.950810]  [<ffffffff81a32f22>] dump_stack+0x54/0x74
[  622.950815]  [<ffffffff810edb02>] __might_sleep+0x112/0x114
[  622.950817]  [<ffffffff81a396ed>] mutex_lock_nested+0x3c/0x3c6
[  622.950818]  [<ffffffff810e861d>] ? up+0x39/0x3e
[  622.950821]  [<ffffffff8153ea7c>] ? acpi_os_signal_semaphore+0x21/0x2d
[  622.950824]  [<ffffffff81565ed1>] ? acpi_ut_release_mutex+0x5e/0x62
[  622.950828]  [<ffffffff816d04ec>] virtnet_cpu_callback+0x33/0x87
[  622.950830]  [<ffffffff81a42576>] notifier_call_chain+0x3c/0x5e
[  622.950832]  [<ffffffff810e86a8>] __raw_notifier_call_chain+0xe/0x10
[  622.950835]  [<ffffffff810c5556>] __cpu_notify+0x20/0x37
[  622.950836]  [<ffffffff810c5580>] cpu_notify+0x13/0x15
[  622.950838]  [<ffffffff81a237cd>] take_cpu_down+0x27/0x3a
[  622.950841]  [<ffffffff81136289>] stop_machine_cpu_stop+0x93/0xf1
[  622.950842]  [<ffffffff81136167>] cpu_stopper_thread+0xa0/0x12f
[  622.950844]  [<ffffffff811361f6>] ? cpu_stopper_thread+0x12f/0x12f
[  622.950847]  [<ffffffff81119710>] ? lock_release_holdtime.part.7+0xa3/0xa8
[  622.950848]  [<ffffffff81135e4b>] ? cpu_stop_should_run+0x3f/0x47
[  622.950850]  [<ffffffff810ea9b0>] smpboot_thread_fn+0x1c5/0x1e3
[  622.950852]  [<ffffffff810ea7eb>] ? lg_global_unlock+0x67/0x67
[  622.950854]  [<ffffffff810e36b7>] kthread+0xd8/0xe0
[  622.950857]  [<ffffffff81a3bfad>] ? wait_for_common+0x12f/0x164
[  622.950859]  [<ffffffff810e35df>] ? kthread_create_on_node+0x124/0x124
[  622.950861]  [<ffffffff81a45ffc>] ret_from_fork+0x7c/0xb0
[  622.950862]  [<ffffffff810e35df>] ? kthread_create_on_node+0x124/0x124
[  622.950876] smpboot: CPU 1 is now offline
[  623.194556] SMP alternatives: lockdep: fixing up alternatives
[  623.194559] smpboot: Booting Node 0 Processor 1 APIC 0x1
...

A correct fix is to unregister the hotcpu notifier during restore and register a
new one in resume.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 22:43:41 -04:00
Michael Dalton
2613af0ed1 virtio_net: migrate mergeable rx buffers to page frag allocators
The virtio_net driver's mergeable receive buffer allocator
uses 4KB packet buffers. For MTU-sized traffic, SKB truesize
is > 4KB but only ~1500 bytes of the buffer is used to store
packet data, reducing the effective TCP window size
substantially. This patch addresses the performance concerns
with mergeable receive buffers by allocating MTU-sized packet
buffers using page frag allocators. If more than MAX_SKB_FRAGS
buffers are needed, the SKB frag_list is used.

Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 23:56:46 -04:00
Heinz Graalfs
047b9b9495 virtio_net: verify if queue is broken after virtqueue_get_buf()
If a virtqueue_get_buf() call returns a NULL pointer a possibly endless while
loop should be avoided by checking for a broken virtqueue.

Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-10-29 11:28:19 +10:30
Heinz Graalfs
6797590118 virtio_net: verify if virtqueue_kick() succeeded
Verify if a host kick succeeded by checking return value of virtqueue_kick().

Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-10-29 11:28:14 +10:30
Jason Wang
35ed159bfd virtio-net: refill only when device is up during setting queues
We used to schedule the refill work unconditionally after changing the
number of queues. This may lead an issue if the device is not
up. Since we only try to cancel the work in ndo_stop(), this may cause
the refill work still work after removing the device. Fix this by only
schedule the work when device is up.

The bug were introduce by commit 9b9cd8024a.
(virtio-net: fix the race between channels setting and refill)

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17 15:53:09 -04:00
Jason Wang
3ab098df35 virtio-net: don't respond to cpu hotplug notifier if we're not ready
We're trying to re-configure the affinity unconditionally in cpu hotplug
callback. This may lead the issue during resuming from s3/s4 since

- virt queues haven't been allocated at that time.
- it's unnecessary since thaw method will re-configure the affinity.

Fix this issue by checking the config_enable and do nothing is we're not ready.

The bug were introduced by commit 8de4b2f3ae
(virtio-net: reset virtqueue affinity when doing cpu hotplug).

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17 15:53:09 -04:00
Rusty Russell
855e0c5288 virtio: use size-based config accessors.
This lets the transport do endian conversion if necessary, and insulates
the drivers from the difference.

Most drivers can use the simple helpers virtio_cread() and virtio_cwrite().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-10-17 10:55:37 +10:30
Aaron Lu
8910700039 virtio: pm: use CONFIG_PM_SLEEP instead of CONFIG_PM
The freeze and restore functions defined in virtio drivers are used
for suspend and hibernate, so CONFIG_PM_SLEEP is more appropriate than
CONFIG_PM. This patch replace all CONFIG_PM with CONFIG_PM_SLEEP for
virtio drivers that implement freeze and restore callbacks.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-09-23 15:45:58 +09:30
Thomas Huth
4f49129be6 virtio-net: Set RXCSUM feature if GUEST_CSUM is available
If the VIRTIO_NET_F_GUEST_CSUM virtio feature is available, the guest
does not have to calculate the checksums on all received packets. This
is pretty much the same feature as RX checksum offloading on real
network cards, so the virtio-net driver should report this by setting
the NETIF_F_RXCSUM flag. When the user now runs "ethtool -k", he or she
can see whether the virtio-net interface has to calculate RX checksums
or not.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-03 21:36:41 -04:00
Michael S. Tsirkin
e7428e95a0 virtio-net: put virtio net header inline with data
For small packets we can simplify xmit processing
by linearizing buffers with the header:
most packets seem to have enough head room
we can use for this purpose.
Since existing hypervisors require that header
is the first s/g element, we need a feature bit
for this.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-27 20:06:10 -07:00
Linus Torvalds
5f12972171 No real surprises.
Thanks,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJR3NcfAAoJENkgDmzRrbjxtgAQAMAChsD867ADZPVaT1jcS6Me
 5OoavOnpw84AopY4cLOFxL13lDgHeDZxTbsS56WP/100WEDTFQqr9uWpuFxt/NU7
 ukXwMrN/5RBmTXtDpdApM+jnBdgRLxu6gJaEjmVZSW10XwPDar+zGJV+NUVX42uy
 8rhjZcFTPIhLz96VBeXgtnmlc8f33AwdFb1JtA6c5slMXmJ0pGKpme3Gri0Hqu0X
 sowVCchjuJXuhg3siyyjcyrDtWnH6hPNCbvAFuL4wwcGyMjb3TS7l922GiscD9pq
 KZcpkYywCosJEnaR4XZQ3ewDGnqaslgb556Tcigm2TXC9LfAR/k4Q1ZYiPGrxcmU
 zgiN/Nvr1PbMDcKr6Lr+utQGh82jeKTcrKz/CPlCJtusg+4hPCwuugDMhNqKK4/2
 3O+0c5t32K38TUdDhVRu2wXlvlyLoQc55yIbhw70O+G1Td7KMHrjXuxwyWbP+tGU
 X/D2DKQu5bcNcBv5sA04PdyRM2buYFvwSZUjxwdwWssmdUqU1xdybDK3pgWf92fF
 llsri86xZ/hOOE+A+jn/oUpXol2PAVOCoh80P7O9VeuDgPL2Fjl4UWf8HWqVvn/u
 A3kpCrIygogc4I7xQkdxBkR6Aa2Uh323rpXKm7E4Tlg0Ii6srqBWq74fz2Ou3TlS
 tOPNG/Npv6yiWo2ejXp8
 =XNkj
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio updates from Rusty Russell:
 "No real surprises"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  MAINTAINERS: add tools/virtio/ under virtio
  tools/virtio: move module license stub to module.h
  virtio: include asm/barrier explicitly
  virtio: VIRTIO_F_ANY_LAYOUT feature
  lguest: fix example launcher compilation for broken glibc headers.
  virtio-net: fix the race between channels setting and refill
  tools/lguest: real barriers.
  tools/lguest: fix missing rmb().
  virtio_balloon: leak_balloon(): only tell host if we got pages deflated
  virtio-pci: fix leaks of msix_affinity_masks
  Fix comment typo "CONFIG_PAE"
2013-07-10 14:50:58 -07:00
Michael S. Tsirkin
cbdadbbf0c virtio_net: fix race in RX VQ processing
virtio net called virtqueue_enable_cq on RX path after napi_complete, so
with NAPI_STATE_SCHED clear - outside the implicit napi lock.
This violates the requirement to synchronize virtqueue_enable_cq wrt
virtqueue_add_buf.  In particular, used event can move backwards,
causing us to lose interrupts.
In a debug build, this can trigger panic within START_USE.

Jason Wang reports that he can trigger the races artificially,
by adding udelay() in virtqueue_enable_cb() after virtio_mb().

However, we must call napi_complete to clear NAPI_STATE_SCHED before
polling the virtqueue for used buffers, otherwise napi_schedule_prep in
a callback will fail, causing us to lose RX events.

To fix, call virtqueue_enable_cb_prepare with NAPI_STATE_SCHED
set (under napi lock), later call virtqueue_poll with
NAPI_STATE_SCHED clear (outside the lock).

Reported-by: Jason Wang <jasowang@redhat.com>
Tested-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-09 12:45:37 -07:00
Jason Wang
9b9cd8024a virtio-net: fix the race between channels setting and refill
Commit 55257d72bd (virtio-net: fill only rx queues
which are being used) tries to refill on demand when changing the number of
channels by call try_refill_recv() directly, this may race:

- the refill work who may do the refill in the same time
- the try_refill_recv() called in bh since napi was not disabled

Which may led guest complain during setting channels:

virtio_net virtio0: input.1:id 0 is not a head!

Solve this issue by scheduling a refill work which can guarantee the
serialization of refill.

Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-07-04 11:25:06 +09:30
Jason Wang
e4166625ed virtio_net: enable napi for all possible queues during open
Commit 55257d72bd (virtio-net: fill only rx
queues which are being used) only does the napi enabling during open for
curr_queue_pairs. This will break multiqueue receiving since napi of new queues
were still disabled after changing the number of queues.

This patch fixes this by enabling napi for all possible queues during open.

Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-23 00:10:59 -07:00
Amerigo Wang
d34710e3e3 virtio_net: use default napi weight by default
Since commit 82dc3c63c6 ("net: introduce NAPI_POLL_WEIGHT")
we warn drivers when they use napi weight higher than NAPI_POLL_WEIGHT,
but virtio_net still uses 128 by default. This patch makes its default
value to NAPI_POLL_WEIGHT.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-11 17:40:14 -07:00
Linus Torvalds
736a2dd257 Lots of virtio work which wasn't quite ready for last merge window. Plus
I dived into lguest again, reworking the pagetable code so we can move
 the switcher page: our fixmaps sometimes take more than 2MB now...
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRga7lAAoJENkgDmzRrbjx/yIQAKpqIBtxOJeYH3SY+Uoe7Cfp
 toNYcpJEldvb0UcWN8M2cSZpHoxl1SUoq9djwcM29tcKa7EZAjHaGtb/Q1qMTDgv
 +B3WAfiGU2pmXFxLAkbrlLNGnysy24JspqJQ5hcYV84EiBxQdZp+nCYgOphd+GMK
 ww16vo9ya8jFjzt3GeRp/Heb3vEzV4Cp6BC3i0m8A3WNpEpbRb66pqXNk5o8ggJO
 SxQOKSXmUM+0m+jKSul5xn3e2Ls2LOrZZ8/DIHA+gW66N4Zab7n2/j1Q9VRxb4lh
 FqnR7KwgBX8OCh9IsBDqQYS7MohvMYge6eUdLtFrq84jvMleMEhrC8q9v2tucFUb
 5t18CLwvyK7Gdg6UCKiZ7YSPcuURAILO16al9bh5IseeBDsuX+43VsvQoBmFn9k6
 cLOVTZ6BlOmahK5PyRYFSvLa9Rxzr/05Mr7oYq9UgshD9io78dnqczFYIORF53rW
 zD7C4HuTZfYJFfNd0wAJ0RfVXnf8QvDlMdo7zPC26DSXNWqj8OexCY0qqSWUB+2F
 vcfJP6NkV4fZB8aawWIFUVwc64yqtt2uPVLa7ATZWqk16PgKrchGewmw3tiEwOgu
 1l7xgffTRRUIJsqaCZoXdgw3yezcKRjuUBcOxL09lDAAhc+NxWNvzZBsKp66DwDk
 yZQKn0OdXnuf0CeEOfFf
 =1tYL
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio & lguest updates from Rusty Russell:
 "Lots of virtio work which wasn't quite ready for last merge window.

  Plus I dived into lguest again, reworking the pagetable code so we can
  move the switcher page: our fixmaps sometimes take more than 2MB now..."

Ugh.  Annoying conflicts with the tcm_vhost -> vhost_scsi rename.
Hopefully correctly resolved.

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (57 commits)
  caif_virtio: Remove bouncing email addresses
  lguest: improve code readability in lg_cpu_start.
  virtio-net: fill only rx queues which are being used
  lguest: map Switcher below fixmap.
  lguest: cache last cpu we ran on.
  lguest: map Switcher text whenever we allocate a new pagetable.
  lguest: don't share Switcher PTE pages between guests.
  lguest: expost switcher_pages array (as lg_switcher_pages).
  lguest: extract shadow PTE walking / allocating.
  lguest: make check_gpte et. al return bool.
  lguest: assume Switcher text is a single page.
  lguest: rename switcher_page to switcher_pages.
  lguest: remove RESERVE_MEM constant.
  lguest: check vaddr not pgd for Switcher protection.
  lguest: prepare to make SWITCHER_ADDR a variable.
  virtio: console: replace EMFILE with EBUSY for already-open port
  virtio-scsi: reset virtqueue affinity when doing cpu hotplug
  virtio-scsi: introduce multiqueue support
  virtio-scsi: push vq lock/unlock into virtscsi_vq_done
  virtio-scsi: pass struct virtio_scsi to virtqueue completion function
  ...
2013-05-02 14:14:04 -07:00
Sasha Levin
55257d72bd virtio-net: fill only rx queues which are being used
Due to MQ support we may allocate a whole bunch of rx queues but
never use them. With this patch we'll safe the space used by
the receive buffers until they are actually in use:

sh-4.2# free -h
             total       used       free     shared    buffers     cached
Mem:          490M        35M       455M         0B         0B       4.1M
-/+ buffers/cache:        31M       459M
Swap:           0B         0B         0B
sh-4.2# ethtool -L eth0 combined 8
sh-4.2# free -h
             total       used       free     shared    buffers     cached
Mem:          490M       162M       327M         0B         0B       4.1M
-/+ buffers/cache:       158M       331M
Swap:           0B         0B         0B

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-04-29 12:47:05 +09:30
Patrick McHardy
80d5c3689b net: vlan: prepare for 802.1ad VLAN filtering offload
Change the rx_{add,kill}_vid callbacks to take a protocol argument in
preparation of 802.1ad support. The protocol argument used so far is
always htons(ETH_P_8021Q).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 14:45:27 -04:00
Patrick McHardy
f646968f8f net: vlan: rename NETIF_F_HW_VLAN_* feature flags to NETIF_F_HW_VLAN_CTAG_*
Rename the hardware VLAN acceleration features to include "CTAG" to indicate
that they only support CTAGs. Follow up patches will introduce 802.1ad
server provider tagging (STAGs) and require the distinction for hardware not
supporting acclerating both.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 14:45:26 -04:00
Jason Wang
4fda830263 virtio-net: initialize vlan_features
There's nothing that prevent passing the device features of virtio_net to its
vlan device. So this patch simply passes those to vlan device to benefit from
advanced features.

Netperf shows better sending performance for vlan device since TSO can work on
vlan now.

before:
netperf -H 192.168.5.2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.5.2 ()
port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    4162.35

after:
netperf -H 192.168.5.2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.5.2 ()
port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    9365.42

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-11 16:21:57 -04:00
Rusty Russell
9d0ca6ed6f virtio: remove obsolete virtqueue_get_queue_index()
You can access it directly now, since 3.8: v3.7-rc1-13-g06ca287
'virtio: move queue_index and num_free fields into core struct
virtqueue.'

Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-22 10:23:34 -04:00
Rusty Russell
9dc7b9e4d0 virtio_net: use simplified virtqueue accessors.
We never add buffers with input and output parts, so use the new accessors.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Reviewed-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-03-20 15:45:02 +10:30
Rusty Russell
f7bc959451 virtio_net: use virtqueue_add_sgs[] for command buffers.
It's a bit cleaner to hand multiple sgs, rather than one big one.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Tested-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-03-20 15:45:01 +10:30
Linus Torvalds
3c834b6f41 All trivial, thanks to the stuff which didn't quite make it time.
Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRLEkHAAoJENkgDmzRrbjx6K0P/3o9/iW5hkOPYpu+KV2nr0wG
 6RG0uu0DCOb/tckigYwnn5PkS7UEcJu6kDypnEgXfhcNhYiBjoGIAUEQ2jBYyzQm
 IYc4oZhxDdqigJL/FHi2zL50mkWacTdBK83udxim3eRkgW9ysBRoBFwqMruVyhZv
 474KGS4PNT8pHDOCAlrRS1I9oW2pYTuUidN+SnfVJ8gFSkH0tuHEGoJeGrtDa010
 XkiWqiTJq6pDHTM5f4Wwp/WNQ21UNDBlvRahg0nqTZXicQsvujV0Tdb+EnAfXwa+
 bssqvO4X2WqlraVK1TJteufhcdhUgt/I1+45p9eLZvOXizk7EBKxFWynE7C878GV
 dhpz8i4mc+u6aJlGoHzwvKah0zhwDtqWdbDS+LNQjRmjMK9v45ttoisU+iR1GjLt
 JpDVg73K315aWJ9RLsYu7mvZ8JY+qRFkwXqX3lZd+GdohY0DSmGMxMqJG93sLBYi
 42vyHMaBd1JY0rDVqpfzlmjnSxX+k05uB0GYB3fO0CXbPxmfXtRKz7/5wpSz0Up+
 nTCs6Xa7t5jtG9qpC4cKxyEDEwB9M6SSxi+lhrIBkeDqlFwXAv1Fean4jqqhtFwr
 HO6i1mYgy0xCVY7EmequVyWlH6/B5GVB2xTMQuAz3f8pKm+XUr6C/twROYkgoXbV
 AOVyRWtnbISDVjtVFn9c
 =KcDx
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio updates from Rusty Russell:
 "All trivial, thanks to the stuff which didn't quite make it time"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  virtio_console: Initialize guest_connected=true for rproc_serial
  virtio: use module_virtio_driver.
  virtio: Add module driver macro for virtio drivers.
  virtio_console: Use virtio device index to generate port name
  virtio: make pci_device_id const
  virtio: make config_ops const
  virtio-mmio: fix wrong comment about register offset
  virtio_console: Let unconnected rproc device receive data.
2013-02-26 14:49:12 -08:00
Pravin B Shelar
c9af6db4c1 net: Fix possible wrong checksum generation.
Patch cef401de7b (net: fix possible wrong checksum
generation) fixed wrong checksum calculation but it broke TSO by
defining new GSO type but not a netdev feature for that type.
net_gso_ok() would not allow hardware checksum/segmentation
offload of such packets without the feature.

Following patch fixes TSO and wrong checksum. This patch uses
same logic that Eric Dumazet used. Patch introduces new flag
SKBTX_SHARED_FRAG if at least one frag can be modified by
the user. but SKBTX_SHARED_FRAG flag is kept in skb shared
info tx_flags rather than gso_type.

tx_flags is better compared to gso_type since we can have skb with
shared frag without gso packet. It does not link SHARED_FRAG to
GSO, So there is no need to define netdev feature for this.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-13 13:30:10 -05:00
Rusty Russell
b2a17029c2 virtio: use module_virtio_driver.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-02-13 17:00:37 +10:30
Joe Perches
e68ed8f0d8 drivers:net:misc: Remove unnecessary alloc/OOM messages
alloc failures already get standardized OOM
messages and a dump_stack.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-04 13:22:35 -05:00
David S. Miller
f1e7b73acc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Bring in the 'net' tree so that we can get some ipv4/ipv6 bug
fixes that some net-next work will build upon.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 15:32:13 -05:00
Eric Dumazet
cef401de7b net: fix possible wrong checksum generation
Pravin Shelar mentioned that GSO could potentially generate
wrong TX checksum if skb has fragments that are overwritten
by the user between the checksum computation and transmit.

He suggested to linearize skbs but this extra copy can be
avoided for normal tcp skbs cooked by tcp_sendmsg().

This patch introduces a new SKB_GSO_SHARED_FRAG flag, set
in skb_shinfo(skb)->gso_type if at least one frag can be
modified by the user.

Typical sources of such possible overwrites are {vm}splice(),
sendfile(), and macvtap/tun/virtio_net drivers.

Tested:

$ netperf -H 7.7.8.84
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
7.7.8.84 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    3959.52

$ netperf -H 7.7.8.84 -t TCP_SENDFILE
TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 ()
port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    3216.80

Performance of the SENDFILE is impacted by the extra allocation and
copy, and because we use order-0 pages, while the TCP_STREAM uses
bigger pages.

Reported-by: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-28 00:27:15 -05:00
Wanlong Gao
8de4b2f3ae virtio-net: reset virtqueue affinity when doing cpu hotplug
Add a cpu notifier to virtio-net, so that we can reset the
virtqueue affinity if the cpu hotplug happens. It improve
the performance through enabling or disabling the virtqueue
affinity after doing cpu hotplug.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Eric Dumazet <erdnetdev@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: virtualization@lists.linux-foundation.org
Cc: netdev@vger.kernel.org
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-27 01:22:51 -05:00
Wanlong Gao
8898c21cf3 virtio-net: split out clean affinity function
Split out the clean affinity function to virtnet_clean_affinity().

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Eric Dumazet <erdnetdev@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: virtualization@lists.linux-foundation.org
Cc: netdev@vger.kernel.org
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-27 01:22:51 -05:00
Wanlong Gao
47be24796c virtio-net: fix the set affinity bug when CPU IDs are not consecutive
As Michael mentioned, set affinity and select queue will not work very
well when CPU IDs are not consecutive, this can happen with hot unplug.
Fix this bug by traversal the online CPUs, and create a per cpu variable
to find the mapping from CPU to the preferable virtual-queue.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Eric Dumazet <erdnetdev@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: virtualization@lists.linux-foundation.org
Cc: netdev@vger.kernel.org
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-27 01:22:51 -05:00
Amos Kong
7e58d5aea8 virtio-net: introduce a new control to set macaddr
Currently we write MAC address to pci config space byte by byte,
this means that we have an intermediate step where mac is wrong.
This patch introduced a new control command to set MAC address,
it's atomic.

VIRTIO_NET_F_CTRL_MAC_ADDR is a new feature bit for compatibility.

Signed-off-by: Amos Kong <akong@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 14:07:44 -05:00
Amos Kong
40cbfc3707 move virtnet_send_command() above virtnet_set_mac_address()
We want to send vq command to set mac address in
virtnet_set_mac_address(), so do this function moving.
Fixed a little issue of coding style.

Signed-off-by: Amos Kong <akong@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 14:07:43 -05:00
Linus Torvalds
b7dfde956d Some nice cleanups, and even a patch my wife did as a "live" demo for
Latinoware 2012.
 
 There's a slightly non-trivial merge in virtio-net, as we cleaned up the
 virtio add_buf interface while DaveM accepted the mq virtio-net patches.
 
 You can see my solution in my pending-rebases branch, if that helps, but I
 know you love merging:
 
 https://git.kernel.org/?p=linux/kernel/git/rusty/linux.git;a=commit;h=12e4e64fa66a4c812e4855de32abdb4d819526fe
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQz/vKAAoJENkgDmzRrbjx+eYQAK/egj9T8Nnth6mkzdbCFSO7
 Bciga2hDiudGCiGojTRGPRSc0VP9LgfvPbY2pxX+R9CfEqR+a8q/rRQhCS79ZwPB
 /mJy3HNiCx418HZxgwNtk6vPe0PjJm6SsjbXeB9hB+PQLCbdwA0BjpG6xjF/jitP
 noPqhhXreeQgYVxAKoFPvff/Byu2GlNnDdVMQxWRmo8hTKlTCzl0T/7BHRxthhJj
 iOrXTFzrT/osPT0zyqlngT03T4wlBvL2Bfw8d/kuRPEZ71dpIctWeH2KzdwXVCrz
 hFQGxAz4OWvW3xrNwj7c6O3SWj4VemUMjQqeA/PtRiOEI5gM0Y/Bit47dWL4wM/O
 OWUKFHzq4DFs8MmwXBgDDXl5xOjOBH9Ik4FZayn3Y7COT/B8CjFdOC2MdDGmZ9yd
 NInumg7FqP+u12g+9Vq8S/b0cfoQm4qFe8VHiPJu+jRmCZglyvLjk7oq/QwW8Gaq
 Pkzit1Ey0DWo2KvZ4D/nuXJCuhmzN/AJ10M48lLYZhtOIVg9gsa0xjhfgq4FnvSK
 xFCf3rcWnlGIXcOYh/hKU25WaCLzBuqMuSK35A72IujrQOL7OJTk4Oqote3Z3H9B
 08XJmyW6SOZdfw17X4Im1jbyuLek///xQJ9Jw/tya7j9lBt8zjJ+FmLPs4mLGEOm
 WJv9uZPs+QbIMNky2Lcb
 =myDR
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio update from Rusty Russell:
 "Some nice cleanups, and even a patch my wife did as a "live" demo for
  Latinoware 2012.

  There's a slightly non-trivial merge in virtio-net, as we cleaned up
  the virtio add_buf interface while DaveM accepted the mq virtio-net
  patches."

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (27 commits)
  virtio_console: Add support for remoteproc serial
  virtio_console: Merge struct buffer_token into struct port_buffer
  virtio: add drv_to_virtio to make code clearly
  virtio: use dev_to_virtio wrapper in virtio
  virtio-mmio: Fix irq parsing in command line parameter
  virtio_console: Free buffers from out-queue upon close
  virtio: Convert dev_printk(KERN_<LEVEL> to dev_<level>(
  virtio_console: Use kmalloc instead of kzalloc
  virtio_console: Free buffer if splice fails
  virtio: tools: make it clear that virtqueue_add_buf() no longer returns > 0
  virtio: scsi: make it clear that virtqueue_add_buf() no longer returns > 0
  virtio: rpmsg: make it clear that virtqueue_add_buf() no longer returns > 0
  virtio: net: make it clear that virtqueue_add_buf() no longer returns > 0
  virtio: console: make it clear that virtqueue_add_buf() no longer returns > 0
  virtio: make virtqueue_add_buf() returning 0 on success, not capacity.
  virtio: console: don't rely on virtqueue_add_buf() returning capacity.
  virtio_net: don't rely on virtqueue_add_buf() returning capacity.
  virtio-net: remove unused skb_vnet_hdr->num_sg field
  virtio-net: correct capacity math on ring full
  virtio: move queue_index and num_free fields into core struct virtqueue.
  ...
2012-12-20 08:37:05 -08:00
Rusty Russell
0e3daa6491 virtio: net: make it clear that virtqueue_add_buf() no longer returns > 0
We simplified virtqueue_add_buf(), make it clear in the callers.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-12-18 15:20:36 +10:30
Rusty Russell
9ed4cb0734 virtio_net: don't rely on virtqueue_add_buf() returning capacity.
Now we can easily use vq->num_free to determine if there are descriptors
left in the queue, we're about to change virtqueue_add_buf() to return 0
on success.  The virtio_net driver is the only one which actually uses
the return value, so change that.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-18 15:20:33 +10:30
Michael S. Tsirkin
7bedc7dc7c virtio-net: remove unused skb_vnet_hdr->num_sg field
[Split from "correct capacity math on ring full" -- Rusty]

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-18 15:20:32 +10:30
Michael S. Tsirkin
6ee57bcc1e virtio-net: correct capacity math on ring full
Capacity math on ring full is wrong: we are
looking at num_sg but that might be optimistic
because of indirect buffer use.

The implementation also penalizes fast path
with extra memory accesses for the benefit of
ring full condition handling which is slow path.

It's easy to query ring capacity so let's do just that.

This change also makes it easier to move vnet header
for tx around as follow-up patch does.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-18 15:20:32 +10:30
Amerigo Wang
008d427807 virtio_net: fix a typo in virtnet_alloc_queues()
Obviously it should check !vi->rq.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-10 13:46:28 -05:00
Jason Wang
d73bcd2c28 virtio-net: support changing the number of queue pairs through ethtool
This patch implements the ethtool_{set|get}_channels method of virtio-net to
allow user to change the number of queues when the device is running on demand.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09 00:30:55 -05:00
Jason Wang
986a4f4d45 virtio_net: multiqueue support
This patch adds the multiqueue (VIRTIO_NET_F_MQ) support to virtio_net
driver. VIRTIO_NET_F_MQ capable device could allow the driver to do packet
transmission and reception through multiple queue pairs and does the packet
steering to get better performance. By default, one one queue pair is used, user
could change the number of queue pairs by ethtool in the next patch.

When multiple queue pairs is used and the number of queue pairs is equal to the
number of vcpus. Driver does the following optimizations to implement per-cpu
virt queue pairs:

- select the txq based on the smp processor id.
- smp affinity hint to the cpu that owns the queue pairs.

This could be used with the flow steering support of the device to guarantee the
packets of a single flow is handled by the same cpu.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09 00:30:55 -05:00
Jason Wang
e9d7417b97 virtio-net: separate fields of sending/receiving queue from virtnet_info
To support multiqueue transmitq/receiveq, the first step is to separate queue
related structure from virtnet_info. This patch introduce send_queue and
receive_queue structure and use the pointer to them as the parameter in
functions handling sending/receiving.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09 00:30:54 -05:00
Bill Pemberton
8cc085d604 virtio_net: remove __dev* attributes
CONFIG_HOTPLUG is going away as an option.  As result the __dev*
markings will be going away.

Remove use of __devinit, __devexit_p, __devinitdata, __devinitconst,
and __devexit.

Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-12-03 11:16:59 -08:00
Amerigo Wang
be44389964 virtio_net: use net_*_ratelimited() helpers
These can be converted to net_*_ratelimited().

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-09 17:05:28 -05:00
Linus Torvalds
aecdc33e11 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David Miller:

 1) GRE now works over ipv6, from Dmitry Kozlov.

 2) Make SCTP more network namespace aware, from Eric Biederman.

 3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.

 4) Make openvswitch network namespace aware, from Pravin B Shelar.

 5) IPV6 NAT implementation, from Patrick McHardy.

 6) Server side support for TCP Fast Open, from Jerry Chu and others.

 7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
    Borkmann.

 8) Increate the loopback default MTU to 64K, from Eric Dumazet.

 9) Use a per-task rather than per-socket page fragment allocator for
    outgoing networking traffic.  This benefits processes that have very
    many mostly idle sockets, which is quite common.

    From Eric Dumazet.

10) Use up to 32K for page fragment allocations, with fallbacks to
    smaller sizes when higher order page allocations fail.  Benefits are
    a) less segments for driver to process b) less calls to page
    allocator c) less waste of space.

    From Eric Dumazet.

11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.

12) VXLAN device driver, one way to handle VLAN issues such as the
    limitation of 4096 VLAN IDs yet still have some level of isolation.
    From Stephen Hemminger.

13) As usual there is a large boatload of driver changes, with the scale
    perhaps tilted towards the wireless side this time around.

Fix up various fairly trivial conflicts, mostly caused by the user
namespace changes.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
  hyperv: Add buffer for extended info after the RNDIS response message.
  hyperv: Report actual status in receive completion packet
  hyperv: Remove extra allocated space for recv_pkt_list elements
  hyperv: Fix page buffer handling in rndis_filter_send_request()
  hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
  hyperv: Fix the max_xfer_size in RNDIS initialization
  vxlan: put UDP socket in correct namespace
  vxlan: Depend on CONFIG_INET
  sfc: Fix the reported priorities of different filter types
  sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
  sfc: Fix loopback self-test with separate_tx_channels=1
  sfc: Fix MCDI structure field lookup
  sfc: Add parentheses around use of bitfield macro arguments
  sfc: Fix null function pointer in efx_sriov_channel_type
  vxlan: virtual extensible lan
  igmp: export symbol ip_mc_leave_group
  netlink: add attributes to fdb interface
  tg3: unconditionally select HWMON support when tg3 is enabled.
  Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
  gre: fix sparse warning
  ...
2012-10-02 13:38:27 -07:00
Tejun Heo
3b07e9ca26 workqueue: deprecate system_nrt[_freezable]_wq
system_nrt[_freezable]_wq are now spurious.  Mark them deprecated and
convert all users to system[_freezable]_wq.

If you're cc'd and wondering what's going on: Now all workqueues are
non-reentrant, so there's no reason to use system_nrt[_freezable]_wq.
Please use system[_freezable]_wq instead.

This patch doesn't make any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-By: Lai Jiangshan <laijs@cn.fujitsu.com>

Cc: Jens Axboe <axboe@kernel.dk>
Cc: David Airlie <airlied@linux.ie>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
2012-08-20 14:51:24 -07:00
Amerigo Wang
ee89bab14e net: move and rename netif_notify_peers()
I believe net/core/dev.c is a better place for netif_notify_peers(),
because other net event notify functions also stay in this file.

And rename it to netdev_notify_peers().

Cc: David S. Miller <davem@davemloft.net>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 14:28:23 -07:00
Kevin Groeneveld
e3906486f6 net: fix race condition in several drivers when reading stats
Fix race condition in several network drivers when reading stats on 32bit
UP architectures.  These drivers update their stats in a BH context and
therefore should use u64_stats_fetch_begin_bh/u64_stats_fetch_retry_bh
instead of u64_stats_fetch_begin/u64_stats_fetch_retry when reading the
stats.

Signed-off-by: Kevin Groeneveld <kgroeneveld@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-22 12:12:32 -07:00
Jiri Pirko
f2f2c8b42d virtio_net: use IFF_LIVE_ADDR_CHANGE priv_flag
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-30 01:08:00 -07:00
Jiri Pirko
d4fc6918f4 virtio_net: allow to change mac when iface is running
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-27 21:25:58 -07:00
Eric Dumazet
83a27052c3 virtio-net: fix a race on 32bit arches
commit 3fa2a1df90 (virtio-net: per cpu 64 bit stats (v2)) added a race
on 32bit arches.

We must use separate syncp for rx and tx path as they can be run at the
same time on different cpus. Thus one sequence increment can be lost and
readers spin forever.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-10 20:23:20 -07:00
Michael S. Tsirkin
3bbf372c6c virtio-net: remove useless disable on freeze
disable_cb is just an optimization: it
can not guarantee that there are no callbacks.
In particular it doesn't have any effect when
event index is on.

Instead, detach, napi disable and reset on freeze ensure we don't run
concurrently with a callback.

Remove the useless calls so we get same behaviour
with and without event index.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-30 16:36:15 -04:00
David S. Miller
17eea0df5f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-05-20 21:53:04 -04:00
Michael S. Tsirkin
ec13ee8014 virtio_net: invoke softirqs after __napi_schedule
__napi_schedule might raise softirq but nothing
causes do_softirq to trigger, so it does not in fact
run. As a result,
the error message "NOHZ: local_softirq_pending 08"
sometimes occurs during boot of a KVM guest when the network service is
started and we are oom:

  ...
  Bringing up loopback interface:  [  OK  ]
  Bringing up interface eth0:
  Determining IP information for eth0...NOHZ: local_softirq_pending 08
   done.
  [  OK  ]
  ...

Further, receive queue processing might get delayed
indefinitely until some interrupt triggers:
virtio_net expected napi to be run immediately.

One way to cause do_softirq to be executed is by
invoking local_bh_enable(). As __napi_schedule is
normally called from bh or irq context, this
seems to make sense: disable bh before __napi_schedule
and enable afterwards.

In fact it's a very complicated way of calling do_softirq(),
and works since this function is only used when we are not
in interrupt context.  It's not hot at all, in any ideal scenario.

Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
Tested-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
2012-05-17 12:16:38 +03:00
Jason Wang
586d17c5a0 virtio-net: send gratuitous packets when needed
As hypervior does not have the knowledge of guest network configuration, it's
better to ask guest to send gratuitous packets when needed.

This patch implements VIRTIO_NET_F_GUEST_ANNOUNCE feature: hypervisor would
notice the guest when it thinks it's time for guest to announce the link
presnece. Guest tests VIRTIO_NET_S_ANNOUNCE bit during config change interrupt
and woule send gratuitous packets through netif_notify_peers() and ack the
notification through ctrl vq.

We need to make sure the atomicy of read and ack in guest otherwise we may ack
more times than being notified. This is done through handling the whole config
change interrupt in an non-reentrant workqueue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 03:23:31 -04:00
Torsten Kaiser
31304165ff net: Fix misplaced parenthesis in virtio_net.c
Commit 2e57b79cce misplaced its
parenthesis and now tx_fifo_errors will only be incremented if an
ENOMEM error is not written to the syslog.

Correct the parenthesis and indentation to the original goal of
counting all non ENOMEM errors and ratelimiting only the messages.

Signed-of-by: Torsten Kaiser <just.for.lkml@googlemail.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 11:01:44 -04:00
Rick Jones
2e57b79cce virtio_net: do not rate limit counter increments
While it is desirable to rate limit certain messages, it is not
desirable to rate limit the incrementing of counters associated
with those messages.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-28 04:40:45 -04:00
Danny Kukawka
f2cedb63df net: replace random_ether_addr() with eth_hw_addr_random()
Replace usage of random_ether_addr() with eth_hw_addr_random()
to set addr_assign_type correctly to NET_ADDR_RANDOM.

Change the trivial cases.

v2: adapt to renamed eth_hw_addr_random()

Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-15 15:34:16 -05:00
Eric Dumazet
58472a76cf virtio: net: remove sparse errors
commit 3fa2a1df90 (virtio-net: per cpu 64 bit stats (v2)) added extra
__percpu qualifiers and sparse errors.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-13 15:57:31 -05:00
Amit Shah
0741bcb558 virtio: net: Add freeze, restore handlers to support S4
Remove all the vqs, disable napi and detach from the netdev on
hibernation.

Re-create vqs after restoring from a hibernated image, re-enable napi
and re-attach the netdev.  This keeps networking working across
hibernation.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-12 15:44:46 +10:30
Amit Shah
04486ed019 virtio: net: Move vq and vq buf removal into separate function
The remove and PM freeze functions will share this code.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-12 15:44:46 +10:30
Amit Shah
3f9c10b0d4 virtio: net: Move vq initialization into separate function
The probe and PM restore functions will share this code.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-12 15:44:46 +10:30
Rusty Russell
f96fde41f7 virtio: rename virtqueue_add_buf_gfp to virtqueue_add_buf
Remove wrapper functions. This makes the allocation type explicit in
all callers; I used GPF_KERNEL where it seemed obvious, left it at
GFP_ATOMIC otherwise.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2012-01-12 15:44:42 +10:30
Mike Waychison
3464645a10 virtio_net: Pass gfp flags when allocating rx buffers.
Currently, the refill path for RX buffers will always allocate the
buffers as GFP_ATOMIC, even if we are in process context.  This will
fail to apply memory pressure as the worker thread will not contribute
to the freeing of memory.

Fix this by changing add_recvbuf_small to use the gfp variant allocator,
__netdev_alloc_skb_ip_align().

Signed-off-by: Mike Waychison <mikew@google.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-05 13:20:40 -05:00
Rusty Russell
f1776dade1 virtio_net: use non-reentrant workqueue.
Michael S. Tsirkin also noticed that we could run the refill work
multiple CPUs: if we kick off a refill on one CPU and then on another,
they would both manipulate the queue at the same time (they use
napi_disable to avoid racing against the receive handler itself).

Tejun points out that this is what the WQ_NON_REENTRANT flag is for,
and that there is a convenient system kthread we can use.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-29 16:44:29 -05:00
Rusty Russell
b2baed69e6 virtio_net: set/cancel work on ndo_open/ndo_stop
Michael S. Tsirkin noticed that we could run the refill work after
ndo_close, which can re-enable napi - we don't disable it until
virtnet_remove.  This is clearly wrong, so move the workqueue control
to ndo_open and ndo_stop (aka. virtnet_open and virtnet_close).

One subtle point: virtnet_probe() could simply fail if it couldn't
allocate a receive buffer, but that's less polite in virtnet_open() so
we schedule a refill as we do in the normal receive path if we run out
of memory.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-29 16:44:29 -05:00
Rusty Russell
eb93992207 module_param: make bool parameters really bool (net & drivers/net)
module_param(bool) used to counter-intuitively take an int.  In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option.  For this version
it'll simply give a warning, but it'll break next kernel version.

(Thanks to Joe Perches for suggesting coccinelle for 0/1 -> true/false).

Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-19 22:27:29 -05:00
Jiri Pirko
8e586137e6 net: make vlan ndo_vlan_rx_[add/kill]_vid return error value
Let caller know the result of adding/removing vlan id to/from vlan
filter.

In some drivers I make those functions to just return 0. But in those
where there is able to see if hw setup went correctly, return value is
set appropriately.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-08 19:52:37 -05:00
Rick Jones
021ac8d387 virtio_net: return already tracked tx_fifo_errors via virtnet_getstats()
Tx_fifo_errors are tracked in start_xmit_ for virtio_net, but not
reported in the tallies returned by virtnet_stats().  Return them
as the rx "sub-stats" rx_length_errors and rx_frame_errors are.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-26 16:00:12 -05:00
Rick Jones
66846048f5 enable virtio_net to return bus_info in ethtool -i consistent with emulated NICs
Add a new .bus_name to virtio_config_ops then modify virtio_net to
call through to it in an ethtool .get_drvinfo routine to report
bus_info in ethtool -i output which is consistent with other
emulated NICs and the output of lspci.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-16 17:26:46 -05:00
Sasha Levin
77dd7693c5 virtio-net: Use virtio_config_val() for retrieving config
This patch modifies virtio-net to use virtio_config_val() instead
of a 'if(virtio_has_feature()) vdev->config->get()' construct to retrieve
optional values from the config space.

Cc: Amit Shah <amit.shah@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-11-02 11:40:58 +10:30
Rick Jones
8f9f4668b3 Add ethtool -g support to virtio_net
Add support for reporting ring sizes via ethtool -g to the virtio_net
driver.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 02:07:21 -04:00
Eric Dumazet
4b727361f0 virtio_net: fix truesize underestimation
We must account in skb->truesize, the size of the fragments, not the
used part of them.

Doing this work is important to avoid unexpected OOM situations.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: virtualization@lists.linux-foundation.org
CC: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21 02:53:00 -04:00
Krishna Kumar
8a59a7b94f virtio_net: Clean up set_skb_frag()
Remove manual initialization in set_skb_frag, and instead
use __skb_fill_page_desc() to do the same. Patch tested
on net-next.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20 04:53:56 -04:00
Eric Dumazet
9e903e0852 net: add skb frag size accessors
To ease skb->truesize sanitization, its better to be able to localize
all references to skb frags size.

Define accessors : skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 03:10:46 -04:00
Sasha Levin
e878d78b9a virtio-net: Verify page list size before fitting into skb
This patch verifies that the length of a buffer stored in a linked list
of pages is small enough to fit into a skb.

If the size is larger than a max size of a skb, it means that we shouldn't
go ahead building skbs anyway since we won't be able to send the buffer as
the user requested.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: netdev@vger.kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-06 15:39:40 -04:00
Ian Campbell
86ee8130a4 virtionet: convert to SKB paged frag API.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-22 16:04:18 -04:00
Jiri Pirko
01789349ee net: introduce IFF_UNICAST_FLT private flag
Use IFF_UNICAST_FTL to find out if driver handles unicast address
filtering. In case it does not, promisc mode is entered.

Patch also fixes following drivers:
stmmac, niu: support uc filtering and yet it propagated
	ndo_set_multicast_list
bna, benet, pxa168_eth, ks8851, ks8851_mll, ksz884x : has set
	ndo_set_rx_mode but do not support uc filtering

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17 20:21:27 -07:00
Krishna Kumar
2e66f55b3a virtio_net: Fix panic in virtnet_remove
Fix a panic in virtnet_remove. unregister_netdev has already
freed up the netdev (and virtnet_info) due to dev->destructor
being set, while virtnet_info is still required. Remove
virtnet_free altogether, and move the freeing of the per-cpu
statistics from virtnet_free to virtnet_remove.

Tested patch below.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-21 21:29:34 -07:00
stephen hemminger
3fa2a1df90 virtio-net: per cpu 64 bit stats (v2)
Use per-cpu variables to maintain 64 bit statistics.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@conan.davemloft.net>
2011-06-30 20:43:38 -07:00
Jason Wang
10a8d94a95 virtio_net: introduce VIRTIO_NET_HDR_F_DATA_VALID
There's no need for the guest to validate the checksum if it have been
validated by host nics. So this patch introduces a new flag -
VIRTIO_NET_HDR_F_DATA_VALID which is used to bypass the checksum
examing in guest. The backend (tap/macvtap) may set this flag when
met skbs with CHECKSUM_UNNECESSARY to save cpu utilization.

No feature negotiation is needed as old driver just ignore this flag.

Iperf shows 12%-30% performance improvement for UDP traffic. For TCP,
when gro is on no difference as it produces skb with partial
checksum. But when gro is disabled, 20% or even higher improvement
could be measured by netperf.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-11 15:57:47 -07:00
Michael S. Tsirkin
7a66f78437 virtio_net: delay TX callbacks
Ask for delayed callbacks on TX ring full, to give the
other side more of a chance to make progress.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-05-30 11:14:16 +09:30
Michał Mirosław
98e778c9aa virtio_net: convert to hw_features
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-01 20:53:35 -07:00
Bruce Rogers
3e9d08ec0a virtio_net: Add schedule check to napi_enable call
Under harsh testing conditions, including low memory, the guest would
stop receiving packets. With this patch applied we no longer see any
problems in the driver while performing these tests for extended periods
of time.

Make sure napi is scheduled subsequent to each napi_enable.

Signed-off-by: Bruce Rogers <brogers@novell.com>
Signed-off-by: Olaf Kirch <okir@suse.de>
Cc: stable@kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-10 11:03:31 -08:00
Michał Mirosław
55508d601d net: Use skb_checksum_start_offset()
Replace skb->csum_start - skb_headroom(skb) with skb_checksum_start_offset().

Note for usb/smsc95xx: skb->data - skb->head == skb_headroom(skb).

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-16 14:43:14 -08:00
Jason Wang
167c25e4c5 virtio-net: init link state correctly
For device that supports VIRTIO_NET_F_STATUS, there's no need to
assume the link is up and we need to call nerif_carrier_off() before
querying device status, otherwise we may get wrong operstate after
diver was loaded because the link watch event was not fired as
expected.

For device that does not support VIRITO_NET_F_STATUS, we could not get
its status through virtnet_update_status() and what we can only do is
always assuming the link is up.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-12 12:21:18 -08:00
Ben Hutchings
0141480205 ethtool: Provide a default implementation of ethtool_ops::get_drvinfo
The driver name and bus address for a net_device can normally be found
through the driver model now.  Instead of requiring drivers to provide
this information redundantly through the ethtool_ops::get_drvinfo
operation, use the driver model to do so if the driver does not define
the operation.  Since ETHTOOL_GDRVINFO no longer requires the driver
to implement any operations, do not require net_device::ethtool_ops to
be set either.

Remove implementations of get_drvinfo and ethtool_ops that provide
only this information.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-17 02:31:15 -07:00
Rusty Russell
a767bde4d4 virtio_net: implements ethtool_ops.get_drvinfo
I often use "ethtool -i" command to check what driver controls the
ehternet device.  But because current virtio_net driver doesn't
support "ethtool -i", it becomes the following:

        # ethtool -i eth3
        Cannot get driver information: Operation not supported

This patch simply adds the "ethtool -i" support. The following is the
result when using the virtio_net driver with my patch applied to.

        # ethtool -i eth3
        driver: virtio_net
        version: N/A
        firmware-version: N/A
        bus-info: virtio0

Personally, "-i" is one of the most frequently-used option, and most
network drivers support "ethtool -i", so I think virtio_net also
should do.

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (use ARRAY_SIZE)
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-04 21:53:17 -07:00
Rusty Russell
58eba97d07 virtio_net: fix oom handling on tx
virtio net will never try to overflow the TX ring, so the only reason
add_buf may fail is out of memory. Thus, we can not stop the
device until some request completes - there's no guarantee anything
at all is outstanding.

Make the error message clearer as well: error here does not
indicate queue full.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (...and avoid TX_BUSY)
Cc: stable@kernel.org  # .34.x (s/virtqueue_/vi->svq->vq_ops->/)
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-02 22:27:26 -07:00
Michael S. Tsirkin
1788f49548 virtio_net: do not reschedule rx refill forever
We currently fill all of RX ring, then add_buf
returns ENOSPC, which gets mis-detected as an out of
memory condition and causes us to reschedule the work,
and so on forever. Fix this by oom = err == -ENOMEM;

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org # .34.x
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-02 22:27:25 -07:00
Michael S. Tsirkin
aa989f5e46 virtio-net: pass gfp to add_buf
virtio-net bounces buffer allocations off to
a thread if it can't allocate buffers from the atomic
pool. However, if posting buffers still requires atomic
buffers, this is unlikely to succeed.
Fix by passing in the proper gfp_t parameter.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-01 00:21:20 -07:00
Linus Torvalds
1756ac3d3c Merge branch 'virtio' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* 'virtio' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (27 commits)
  drivers/char: Eliminate use after free
  virtio: console: Accept console size along with resize control message
  virtio: console: Store each console's size in the console structure
  virtio: console: Resize console port 0 on config intr only if multiport is off
  virtio: console: Add support for nonblocking write()s
  virtio: console: Rename wait_is_over() to will_read_block()
  virtio: console: Don't always create a port 0 if using multiport
  virtio: console: Use a control message to add ports
  virtio: console: Move code around for future patches
  virtio: console: Remove config work handler
  virtio: console: Don't call hvc_remove() on unplugging console ports
  virtio: console: Return -EPIPE to hvc_console if we lost the connection
  virtio: console: Let host know of port or device add failures
  virtio: console: Add a __send_control_msg() that can send messages without a valid port
  virtio: Revert "virtio: disable multiport console support."
  virtio: add_buf_gfp
  trans_virtio: use virtqueue_xxx wrappers
  virtio-rng: use virtqueue_xxx wrappers
  virtio_ring: remove a level of indirection
  virtio_net: use virtqueue_xxx wrappers
  ...

Fix up conflicts in drivers/net/virtio_net.c due to new virtqueue_xxx
wrappers changes conflicting with some other cleanups.
2010-05-21 17:22:52 -07:00
Michael S. Tsirkin
1915a712f2 virtio_net: use virtqueue_xxx wrappers
Switch virtio_net to new virtqueue_xxx wrappers.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-05-19 22:15:43 +09:30
David S. Miller
b4bf665c57 virtio_net: Fix mis-merge.
Pointed out by Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-14 06:45:44 -07:00
David S. Miller
dad1e54b12 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/pcmcia/smc91c92_cs.c
	drivers/net/virtio_net.c
2010-04-14 05:01:33 -07:00
Shirley Ma
0e413f22e4 virtio_net: missing sg_init_table
Add missing sg_init_table for sg_set_buf in virtio_net which
induced in defer skb patch.

Reported-by: Thomas Müller <thomas@mathtm.de>
Tested-by: Thomas Müller <thomas@mathtm.de>
Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-12 22:00:34 -07:00
David S. Miller
871039f02f Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/stmmac/stmmac_main.c
	drivers/net/wireless/wl12xx/wl1271_cmd.c
	drivers/net/wireless/wl12xx/wl1271_main.c
	drivers/net/wireless/wl12xx/wl1271_spi.c
	net/core/ethtool.c
	net/mac80211/scan.c
2010-04-11 14:53:53 -07:00
Michael S. Tsirkin
5e01d2f91d virtio-net: move sg off stack
Move sg structure off stack and into virtnet_info structure.
This helps remove extra sg_init_table calls as well as reduce
stack usage.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 21:01:41 -07:00
Jiri Pirko
22bedad3ce net: convert multicast list to list_head
Converts the list and the core manipulating with it to be the same as uc_list.

+uses two functions for adding/removing mc address (normal and "global"
 variant) instead of a function parameter.
+removes dev_mcast.c completely.
+exposes netdev_hw_addr_list_* macros along with __hw_addr_* functions for
 manipulation with lists on a sandbox (used in bonding and 80211 drivers)

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-03 14:22:15 -07:00
Shirley Ma
2c45cd43ff virtio_net: missing sg_init_table
Add missing sg_init_table for sg_set_buf in virtio_net which
induced in defer skb patch.

Reported-by: Thomas Müller <thomas@mathtm.de>
Tested-by: Thomas Müller <thomas@mathtm.de>
Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30 18:31:59 -07:00
Tejun Heo
5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Jiri Pirko
2507c05ff5 virtio_net: remove forgotten assignment
This is no longer needed. I missed to remove this in
567ec874d1 ("net: convert multiple
drivers to use netdev_for_each_mc_addr, part6")

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-02 03:57:33 -08:00
Jiri Pirko
567ec874d1 net: convert multiple drivers to use netdev_for_each_mc_addr, part6
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-26 02:07:31 -08:00
Shirley Ma
830a8a976f virtio_net: remove send queue
Now we have a virtio detach API (in commit
f9bfbebf34), we don't need to track xmit
skbs in the virio_net driver, which improves transmission performance.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-12 12:27:56 -08:00
Jiri Pirko
4cd24eaf0c net: use netdev_mc_count and netdev_mc_empty when appropriate
This patch replaces dev->mc_count in all drivers (hopefully I didn't miss
anything). Used spatch and did small tweaks and conding style changes when
it was suitable.

Jirka

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-12 11:38:58 -08:00
Shirley Ma
9ab86bbcf8 virtio_net: Defer skb allocation in receive path Date: Wed, 13 Jan 2010 12:53:38 -0800
virtio_net receives packets from its pre-allocated vring buffers, then it
delivers these packets to upper layer protocols as skb buffs. So it's not
necessary to pre-allocate skb for each mergable buffer, then frees extra
skbs when buffers are merged into a large packet. This patch has deferred
skb allocation in receiving packets for both big packets and mergeable buffers
to reduce skb pre-allocations and skb frees. It frees unused buffers by calling
detach_unused_buf in vring, so recv skb queue is not needed.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-02 15:55:42 -08:00
David S. Miller
05ba712d7e Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-01-28 06:12:38 -08:00
Herbert Xu
39d3215774 virtio_net: Make delayed refill more reliable
I have seen RX stalls on a machine that experienced a suspected
OOM.  After the stall, the RX buffer is empty on the guest side
and there are exactly 16 entries available on the host side.  As
the number of entries is less than that required by a maximal
skb, the host cannot proceed.

The guest did not have a refill job scheduled.

My diagnosis is that an OOM had occured, with the delayed refill
job scheduled.  The job was able to allocate at least one skb, but
not enough to overcome the minimum required by the host to proceed.

As the refill job would only reschedule itself if it failed completely
to allocate any skbs, this would lead to an RX stall.

The following patch removes this stall possibility by always
rescheduling the refill job until the ring is totally refilled.

Testing has shown that the RX stall no longer occurs whereas
previously it would occur within a day.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-25 15:51:01 -08:00
Jiri Pirko
32e7bfc411 net: use helpers to access uc list V2
This patch introduces three macros to work with uc list from net drivers.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-25 13:36:10 -08:00
Joe Perches
8e95a2026f drivers/net: Move && and || to end of previous line
Only files where David Miller is the primary git-signer.
wireless, wimax, ixgbe, etc are not modified.

Compile tested x86 allyesconfig only
Not all files compiled (not x86 compatible)

Added a few > 80 column lines, which I ignored.
Existing checkpatch complaints ignored.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03 13:18:01 -08:00
David S. Miller
3505d1a9fd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/sfc/sfe4001.c
	drivers/net/wireless/libertas/cmd.c
	drivers/staging/Kconfig
	drivers/staging/Makefile
	drivers/staging/rtl8187se/Kconfig
	drivers/staging/rtl8192e/Kconfig
2009-11-18 22:19:03 -08:00
Linus Torvalds
1ce55238e2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
  net/fsl_pq_mdio: add module license GPL
  can: fix WARN_ON dump in net/core/rtnetlink.c:rtmsg_ifinfo()
  can: should not use __dev_get_by_index() without locks
  hisax: remove bad udelay call to fix build error on ARM
  ipip: Fix handling of DF packets when pmtudisc is OFF
  qlge: Set PCIe reset type for EEH to fundamental.
  qlge: Fix early exit from mbox cmd complete wait.
  ixgbe: fix traffic hangs on Tx with ioatdma loaded
  ixgbe: Fix checking TFCS register for TXOFF status when DCB is enabled
  ixgbe: Fix gso_max_size for 82599 when DCB is enabled
  macsonic: fix crash on PowerBook 520
  NET: cassini, fix lock imbalance
  ems_usb: Fix byte order issues on big endian machines
  be2net: Bug fix to send config commands to hardware after netdev_register
  be2net: fix to set proper flow control on resume
  netfilter: xt_connlimit: fix regression caused by zero family value
  rt2x00: Don't queue ieee80211 work after USB removal
  Revert "ipw2200: fix oops on missing firmware"
  decnet: netdevice refcount leak
  netfilter: nf_nat: fix NAT issue in 2.6.30.4+
  ...
2009-11-09 09:51:42 -08:00
David S. Miller
230f9bb701 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/usb/cdc_ether.c

All CDC ethernet devices of type USB_CLASS_COMM need to use
'&mbm_info'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06 00:55:55 -08:00
Uwe Kleine-König
22402529df virtio_net: rename driver struct to please modpost
Commit

	3d1285b (move virtnet_remove to .devexit.text)

introduced the first reference to __devexit in struct virtio_driver
virtio_net which upset modpost ("Section mismatch in reference from the
variable virtio_net to the function .devexit.text:virtnet_remove()").

Fix this by renaming virtio_net to virtio_net_driver.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Blame-taken-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-05 01:32:44 -08:00
David S. Miller
0519d83d83 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-10-29 21:28:59 -07:00
Linus Torvalds
49b2de8e6f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (43 commits)
  net: Fix 'Re: PACKET_TX_RING: packet size is too long'
  netdev: usb: dm9601.c can drive a device not supported yet, add support for it
  qlge: Fix firmware mailbox command timeout.
  qlge: Fix EEH handling.
  AF_RAW: Augment raw_send_hdrinc to expand skb to fit iphdr->ihl (v2)
  bonding: fix a race condition in calls to slave MII ioctls
  virtio-net: fix data corruption with OOM
  sfc: Set ip_summed correctly for page buffers passed to GRO
  cnic: Fix L2CTX_STATUSB_NUM offset in context memory.
  MAINTAINERS: rt2x00 list is moderated
  airo: Reorder tests, check bounds before element
  mac80211: fix for incorrect sequence number on hostapd injected frames
  libertas spi: fix sparse errors
  mac80211: trivial: fix spelling in mesh_hwmp
  cfg80211: sme: deauthenticate on assoc failure
  mac80211: keep auth state when assoc fails
  mac80211: fix ibss joining
  b43: add 'struct b43_wl' missing declaration
  b43: Fix Bugzilla #14181 and the bug from the previous 'fix'
  rt2x00: Fix crypto in TX frame for rt2800usb
  ...
2009-10-29 09:22:08 -07:00
Michael S. Tsirkin
03f191bab7 virtio-net: fix data corruption with OOM
virtio net used to unlink skbs from send queues on error,
but ever since 48925e372f
we do not do this. This causes guest data corruption and crashes
with vhost since net core can requeue the skb or free it without
it being taken off the list.

This patch fixes this by queueing the skb after successful
transmit.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28 04:03:38 -07:00
David S. Miller
cfadf853f6 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/sh_eth.c
2009-10-27 01:03:26 -07:00
Linus Torvalds
964fe080d9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  move virtrng_remove to .devexit.text
  move virtballoon_remove to .devexit.text
  virtio_blk: Revert serial number support
  virtio: let header files include virtio_ids.h
  virtio_blk: revert QUEUE_FLAG_VIRT addition
2009-10-23 07:35:16 +09:00
Christian Borntraeger
e95646c3ec virtio: let header files include virtio_ids.h
Rusty,

commit 3ca4f5ca73
    virtio: add virtio IDs file
moved all device IDs into a single file. While the change itself is
a very good one, it can break userspace applications. For example
if a userspace tool wanted to get the ID of virtio_net it used to
include virtio_net.h. This does no longer work, since virtio_net.h
does not include virtio_ids.h.
This patch moves all "#include <linux/virtio_ids.h>" from the C
files into the header files, making the header files compatible with
the old ones.

In addition, this patch exports virtio_ids.h to userspace.

CC: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-10-22 16:39:28 +10:30
Eric Dumazet
ed79bab847 virtio_net: use dev_kfree_skb_any() in free_old_xmit_skbs()
Because netpoll can call netdevice start_xmit() method with
irqs disabled, drivers should not call kfree_skb() from
their start_xmit(), but use dev_kfree_skb_any() instead.

Oct  8 11:16:52 172.30.1.31 [113074.791813] ------------[ cut here ]------------
Oct  8 11:16:52 172.30.1.31 [113074.791813] WARNING: at net/core/skbuff.c:398 \
                skb_release_head_state+0x64/0xc8()
Oct  8 11:16:52 172.30.1.31 [113074.791813] Hardware name:
Oct  8 11:16:52 172.30.1.31 [113074.791813] Modules linked in: netconsole ocfs2 jbd2 quota_tree \
ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs crc32c drbd cn loop \
serio_raw psmouse snd_pcm snd_timer snd soundcore snd_page_alloc virtio_net pcspkr parport_pc parport \
i2c_piix4 i2c_core button processor evdev ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot \
dm_mod ide_cd_mod cdrom ata_generic ata_piix virtio_blk libata scsi_mod piix ide_pci_generic ide_core \
                virtio_pci virtio_ring virtio floppy thermal fan thermal_sys [last unloaded: netconsole]
Oct  8 11:16:52 172.30.1.31 [113074.791813] Pid: 11132, comm: php5-cgi Tainted: G        W  \
                2.6.31.2-vserver #1
Oct  8 11:16:52 172.30.1.31 [113074.791813] Call Trace:
Oct  8 11:16:52 172.30.1.31 [113074.791813] <IRQ>  [<ffffffff81253cd5>] ? \
                skb_release_head_state+0x64/0xc8
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81253cd5>] ? skb_release_head_state+0x64/0xc8
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81049ae1>] ? warn_slowpath_common+0x77/0xa3
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81253cd5>] ? skb_release_head_state+0x64/0xc8
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81253a1a>] ? __kfree_skb+0x9/0x7d
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffffa01cb139>] ? free_old_xmit_skbs+0x51/0x6e \
                [virtio_net]
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffffa01cbc85>] ? start_xmit+0x26/0xf2 [virtio_net]
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff8126934f>] ? netpoll_send_skb+0xd2/0x205
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffffa0429216>] ? write_msg+0x90/0xeb [netconsole]
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81049f06>] ? __call_console_drivers+0x5e/0x6f
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff8102b49d>] ? kvm_clock_read+0x4d/0x52
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff8104a082>] ? release_console_sem+0x115/0x1ba
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff8104a632>] ? vprintk+0x2f2/0x34b
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff8106b142>] ? vx_update_load+0x18/0x13e
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81308309>] ? printk+0x4e/0x5d
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff8102b49d>] ? kvm_clock_read+0x4d/0x52
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81070b62>] ? getnstimeofday+0x55/0xaf
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81062683>] ? ktime_get_ts+0x21/0x49
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff810626b7>] ? ktime_get+0xc/0x41
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81062788>] ? hrtimer_interrupt+0x9c/0x146
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81024a4b>] ? smp_apic_timer_interrupt+0x80/0x93
Oct  8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81011663>] ? apic_timer_interrupt+0x13/0x20
Oct  8 11:16:52 172.30.1.31 [113074.791813] <EOI>  [<ffffffff8130a9eb>] ? _spin_unlock_irq+0xd/0x31

Reported-and-tested-by: Massimo Cetra <mcetra@navynet.it>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Bug-Entry: http://bugzilla.kernel.org/show_bug.cgi?id=14378
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-14 23:29:56 -07:00
Eric Dumazet
89d71a66c4 net: Use netdev_alloc_skb_ip_align()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-13 11:48:18 -07:00
Uwe Kleine-König
3d1285beff move virtnet_remove to .devexit.text
The function virtnet_remove is used only wrapped by __devexit_p so define
it using __devexit.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Alex Williamson <alex.williamson@hp.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-01 14:34:44 -07:00
Amit Shah
0aea51c37f virtio_net: Check for room in the vq before adding buffer
Saves us one cycle of alloc-add-free if the queue was full.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (modified)
2009-09-24 09:59:21 +09:30
Rusty Russell
48925e372f virtio_net: avoid (most) NETDEV_TX_BUSY by stopping queue early.
Now we can tell the theoretical capacity remaining in the output
queue, virtio_net can waste entries by stopping the queue early.

It doesn't work in the case of indirect buffers and kmalloc failure,
but that's rare (we could drop the packet in that case, but other
drivers return TX_BUSY for similar reasons).

For the record, I think this patch reflects poorly on the linux
network API.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dinesh Subhraveti <dineshs@us.ibm.com>
2009-09-24 09:59:20 +09:30
Rusty Russell
b3f24698a7 virtio_net: formalize skb_vnet_hdr
We put the virtio_net_hdr into the skb's cb region; turn this into a
union to clean up the code slightly and allow future expansion.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: Dinesh Subhraveti <dineshs@us.ibm.com>
2009-09-24 09:59:20 +09:30
Rusty Russell
b0c39dbdc2 virtio_net: don't free buffers in xmit ring
The virtio_net driver is complicated by the two methods of freeing old
xmit buffers (in addition to freeing old ones at the start of the xmit
path).

The original code used a 1/10 second timer attached to xmit_free(),
reset on every xmit.  Before we orphaned skbs on xmit, the
transmitting userspace could block with a full socket until the timer
fired, the skb destructor was called, and they were re-woken.

So we added the VIRTIO_F_NOTIFY_ON_EMPTY feature: supporting devices
send an interrupt (even if normally suppressed) on an empty xmit ring
which makes us schedule xmit_tasklet().  This was a benchmark win.

Unfortunately, VIRTIO_F_NOTIFY_ON_EMPTY makes quite a lot of work: a
host which is faster than the guest will fire the interrupt every xmit
packet (slowing the guest down further).  Attempting mitigation in the
host adds overhead of userspace timers (possibly with the additional
pain of signals), and risks increasing latency anyway if you get it
wrong.

In practice, this effect was masked by benchmarks which take advantage
of GSO (with its inherent transmit batching), but it's still there.

Now we orphan xmitted skbs, the pressure is off: remove both paths and
no longer request VIRTIO_F_NOTIFY_ON_EMPTY.  Note that the current
QEMU will notify us even if we don't negotiate this feature (legal,
but suboptimal); a patch is outstanding to improve that.

Move the skb_orphan/nf_reset to after we've done the send and notified
the other end, for a slight optimization.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Mark McLoughlin <markmc@redhat.com>
2009-09-24 09:59:19 +09:30
Rusty Russell
8958f574db virtio_net: return NETDEV_TX_BUSY instead of queueing an extra skb.
This effectively reverts 99ffc696d1
"virtio: wean net driver off NETDEV_TX_BUSY".

The complexity of queuing an skb (setting a tasklet to re-xmit) is
questionable, especially once we get rid of the other reason for the
tasklet in the next patch.

If the skb won't fit in the tx queue, just return NETDEV_TX_BUSY.
This is frowned upon, so a followup patch uses a more complex solution.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
2009-09-24 09:59:19 +09:30
Rusty Russell
2b5bbe3b8b virtio_net: skb_orphan() and nf_reset() in xmit path.
The complex transmit free logic was introduced to avoid hangs on
removing the ip_conntrack module and also because drivers aren't
generally supposed to keep stale skbs for unbounded times.

After some debate, it was decided that while doing skb_orphan()
generally is a rat's nest, we can do it in this driver.  Following
patches take advantage of this.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-24 09:59:18 +09:30
Fernando Luis Vazquez Cao
3ca4f5ca73 virtio: add virtio IDs file
Virtio IDs are spread all over the tree which makes assigning new IDs
bothersome. Putting them together should make the process less error-prone.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-23 22:26:32 +09:30
Rusty Russell
3c1b27d504 virtio: make add_buf return capacity remaining
This API change means that virtio_net can tell how much capacity
remains for buffers.  It's necessarily fuzzy, since
VIRTIO_RING_F_INDIRECT_DESC means we can fit any number of descriptors
in one, *if* we can kmalloc.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dinesh Subhraveti <dineshs@us.ibm.com>
2009-09-23 22:26:31 +09:30
Stephen Hemminger
0fc0b732ea netdev: drivers should make ethtool_ops const
No need to put ethtool_ops in data, they should be const.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 01:03:33 -07:00
David S. Miller
6cdee2f96a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/yellowfin.c
2009-09-02 00:32:56 -07:00
Stephen Hemminger
424efe9caf netdev: convert pseudo drivers to netdev_tx_t
These are all drivers that don't touch real hardware.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 01:13:40 -07:00
Rusty Russell
3161e453e4 virtio: net refill on out-of-memory
If we run out of memory, use keventd to fill the buffer.  There's a
report of this happening: "Page allocation failures in guest",
Message-ID: <20090713115158.0a4892b0@mjolnir.ossman.eu>

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-26 12:22:32 -07:00
Sridhar Samudrala
5c5167515d virtio-net: Allow UFO feature to be set and advertised.
- Allow setting UFO on virtio-net and advertise to host.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-17 10:10:58 -07:00
Jiri Pirko
31278e7147 net: group address list and its count
This patch is inspired by patch recently posted by Johannes Berg. Basically what
my patch does is to group list and a count of addresses into newly introduced
structure netdev_hw_addr_list. This brings us two benefits:
1) struct net_device becames a bit nicer.
2) in the future there will be a possibility to operate with lists independently
   on netdevices (with exporting right functions).
I wanted to introduce this patch before I'll post a multicast lists conversion.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

 drivers/net/bnx2.c              |    4 +-
 drivers/net/e1000/e1000_main.c  |    4 +-
 drivers/net/ixgbe/ixgbe_main.c  |    6 +-
 drivers/net/mv643xx_eth.c       |    2 +-
 drivers/net/niu.c               |    4 +-
 drivers/net/virtio_net.c        |   10 ++--
 drivers/s390/net/qeth_l2_main.c |    2 +-
 include/linux/netdevice.h       |   17 +++--
 net/core/dev.c                  |  130 ++++++++++++++++++--------------------
 9 files changed, 89 insertions(+), 90 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-18 00:29:08 -07:00
David S. Miller
9cbc1cb8cd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:
	Documentation/feature-removal-schedule.txt
	drivers/scsi/fcoe/fcoe.c
	net/core/drop_monitor.c
	net/core/net-traces.c
2009-06-15 03:02:23 -07:00
Michael S. Tsirkin
d2a7ddda9f virtio: find_vqs/del_vqs virtio operations
This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations,
and updates all drivers. This is needed for MSI support, because MSI
needs to know the total number of vectors upfront.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ lguest/9p compile fixes)
2009-06-12 22:16:36 +09:30
Rusty Russell
9499f5e7ed virtio: add names to virtqueue struct, mapping from devices to queues.
Add a linked list of all virtqueues for a virtio device: this helps for
debugging and is also needed for upcoming interface change.

Also, add a "name" field for clearer debug messages.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:16:36 +09:30
Herbert Xu
8981f01001 virtio_net: Fix IP alignment on non-mergeable RX path
We need to enforce the IP alignment on the non-mergeable RX path just
like the other RX path.  Not doing so results in misaligned IP
headers.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-11 20:55:17 -07:00
Herbert Xu
b82f08ea16 virtio_net: Set correct gso->hdr_len
Through a bug in the tun driver, I noticed that virtio_net is
producing bogus hdr_len values.  In particular, it only includes
the IP header in the linear area, and excludes the entire TCP
header.  This causes the TCP header to be copied twice for each
packet.  (The bug omitted the second copy :)

This patch corrects this.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-08 00:19:11 -07:00
Jiri Pirko
ccffad25b5 net: convert unicast addr list
This patch converts unicast address list to standard list_head using
previously introduced struct netdev_hw_addr. It also relaxes the
locking. Original spinlock (still used for multicast addresses) is not
needed and is no longer used for a protection of this list. All
reading and writing takes place under rtnl (with no changes).

I also removed a possibility to specify the length of the address
while adding or deleting unicast address. It's always dev->addr_len.

The convertion touched especially e1000 and ixgbe codes when the
change is not so trivial.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

 drivers/net/bnx2.c               |   13 +--
 drivers/net/e1000/e1000_main.c   |   24 +++--
 drivers/net/ixgbe/ixgbe_common.c |   14 ++--
 drivers/net/ixgbe/ixgbe_common.h |    4 +-
 drivers/net/ixgbe/ixgbe_main.c   |    6 +-
 drivers/net/ixgbe/ixgbe_type.h   |    4 +-
 drivers/net/macvlan.c            |   11 +-
 drivers/net/mv643xx_eth.c        |   11 +-
 drivers/net/niu.c                |    7 +-
 drivers/net/virtio_net.c         |    7 +-
 drivers/s390/net/qeth_l2_main.c  |    6 +-
 drivers/scsi/fcoe/fcoe.c         |   16 ++--
 include/linux/netdevice.h        |   18 ++--
 net/8021q/vlan.c                 |    4 +-
 net/8021q/vlan_dev.c             |   10 +-
 net/core/dev.c                   |  195 +++++++++++++++++++++++++++-----------
 net/dsa/slave.c                  |   10 +-
 net/packet/af_packet.c           |    4 +-
 18 files changed, 227 insertions(+), 137 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-29 22:12:32 -07:00
David S. Miller
d252a5e7b7 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-05-03 14:07:43 -07:00
Alex Williamson
1824a98974 virtio_net: Fix function name typo
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-01 21:26:36 -07:00
Alex Williamson
23e258e1a8 virtio_net: Cleanup command queue scatterlist usage
We were avoiding calling sg_init* on scatterlists passed
into virtnet_send_command to prevent extraneous end markers.
This caused build warnings for uninitialized variables.
Cleanup the code to create proper scatterlists.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-01 21:26:36 -07:00
Alexander Beregalov
0ee904c35c drivers/net: replace BUG() with BUG_ON() if possible
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-13 15:44:36 -07:00
Alex Williamson
62994b2d6b virtio_net: Set the mac config only when VIRITO_NET_F_MAC
VIRTIO_NET_F_MAC indicates the presence of the mac field in config
space, not the validity of the value it contains.  Allow the mac to be
changed at runtime, but only push the change into config space with the
VIRTIO_NET_F_MAC feature present.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-04 16:40:19 -07:00
David S. Miller
2b1c4354de Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/virtio_net.c
2009-03-20 02:27:41 -07:00
Pantelis Koukousoulas
4783256ef9 virtio_net: Make virtio_net support carrier detection
Impact: Make NetworkManager work with virtio_net

For now the semantics are simple: There is always carrier.

This allows a seamless experience with e.g., qemu/kvm
where NetworkManager just configures and sets up
everything automagically.

If/when a generally agreed-upon way to control
carrier on/off in the emulator/hypervisor level
emerges, it will be trivial to extend the driver
to support that too, but for now even this 2-liner
makes user experience that much better.

Signed-off-by: Pantelis Koukousoulas <pktoss@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-18 18:40:02 -07:00
Alex Williamson
9c46f6d42f virtio_net: Allow setting the MAC address of the NIC
Many physical NICs let the OS re-program the "hardware" MAC
address.  Virtual NICs should allow this too.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Acked-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-04 16:36:34 -08:00
Alex Williamson
0bde95690d virtio_net: Add support for VLAN filtering in the hypervisor
VLAN filtering allows the hypervisor to drop packets from VLANs
that we're not a part of, further reducing the number of extraneous
packets recieved.  This makes use of the VLAN virtqueue command class.
The CTRL_VLAN feature bit tells us whether the backend supports VLAN
filtering.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-04 16:35:13 -08:00
Alex Williamson
f565a7c259 virtio_net: Add a MAC filter table
Make use of the MAC control virtqueue class to support a MAC
filter table.  The filter table is managed by the hypervisor.
We consider the table to be available if the CTRL_RX feature
bit is set.  We leave it to the hypervisor to manage the table
and enable promiscuous or all-multi mode as necessary depending
on the resources available to it.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-04 16:35:13 -08:00
Alex Williamson
2af7698e2d virtio_net: Add a set_rx_mode interface
Make use of the RX_MODE control virtqueue class to enable the
set_rx_mode netdev interface.  This allows us to selectively
enable/disable promiscuous and allmulti mode so we don't see
packets we don't want.  For now, we automatically enable these
as needed if additional unicast or multicast addresses are
requested.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-04 16:35:12 -08:00
Alex Williamson
2a41f71d3b virtio_net: Add a virtqueue for outbound control commands
This will be used for RX mode, MAC filter table, VLAN filtering, etc...

The control transaction consists of one or more "out" sg entries and
one or more "in" sg entries.  The first out entry contains a header
defining the class and command.  Additional out entries may provide
data for the command.  The last in entry provides a status response
back from the command.

Virtqueues typically run asynchronous, running a callback function
when there's data in the channel.  We can't readily make use of this
in the command paths where we need to use this.  Instead, we kick
the virtqueue and spin.  The kick causes an I/O write, triggering an
immediate trap into the hypervisor.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-04 16:35:11 -08:00
David S. Miller
05bee47377 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/e1000/e1000_main.c
2009-01-30 14:31:07 -08:00
Ira W. Snyder
8527bec548 virtio_net: use correct accessors for scatterlists
Without this fix, virtio_net makes incorrect usage of scatterlists. It sets
the end of the scatterlist chain after the first element, despite the fact
that more entries come after it.

If you try to run dma_map_sg() on one of the scatterlists given to you by
add_buf(), you will get a null pointer oops.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-26 21:00:33 -08:00
David S. Miller
3eacdf58c2 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-01-26 17:43:16 -08:00
Alex Williamson
e918085aaf virtio_net: Fix MAX_PACKET_LEN to support 802.1Q VLANs
802.1Q expanded the maximum ethernet frame size by 4 bytes for the
VLAN tag.  We're not taking this into account in virtio_net, which
means the buffers we provide to the backend in the virtqueue RX ring
aren't big enough to hold a full MTU VLAN packet.  For QEMU/KVM,
this results in the backend exiting with a packet truncation error.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Acked-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-25 18:06:26 -08:00
Mark McLoughlin
9f4d26d0f3 virtio_net: add link status handling
Allow the host to inform us that the link is down by adding
a VIRTIO_NET_F_STATUS which indicates that device status is
available in virtio_net config.

This is currently useful for simulating link down conditions
(e.g. using proposed qemu 'set_link' monitor command) but
would also be needed if we were to support device assignment
via virtio.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (added future masking)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-21 14:34:53 -08:00
Ben Hutchings
288379f050 net: Remove redundant NAPI functions
Following the removal of the unused struct net_device * parameter from
the NAPI functions named *netif_rx_* in commit 908a7a1, they are
exactly equivalent to the corresponding *napi_* functions and are
therefore redundant.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-21 14:33:50 -08:00
Stephen Hemminger
76288b4e57 virtio: convert to net_device_ops
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-06 10:44:22 -08:00
Neil Horman
908a7a16b8 net: Remove unused netdev arg from some NAPI interfaces.
When the napi api was changed to separate its 1:1 binding to the net_device
struct, the netif_rx_[prep|schedule|complete] api failed to remove the now
vestigual net_device structure parameter.  This patch cleans up that api by
properly removing it..

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-22 20:43:12 -08:00
Mark McLoughlin
39da5814db virtio_net: large tx MTU support
We don't really have a max tx packet size limit, so allow configuring
the device with up to 64k tx MTU.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-12-02 22:12:49 +10:30
Mark McLoughlin
3f2c31d903 virtio_net: VIRTIO_NET_F_MSG_RXBUF (imprive rcv buffer allocation)
If segmentation offload is enabled by the host, we currently allocate
maximum sized packet buffers and pass them to the host. This uses up
20 ring entries, allowing us to supply only 20 packet buffers to the
host with a 256 entry ring. This is a huge overhead when receiving
small packets, and is most keenly felt when receiving MTU sized
packets from off-host.

The VIRTIO_NET_F_MRG_RXBUF feature flag is set by hosts which support
using receive buffers which are smaller than the maximum packet size.
In order to transfer large packets to the guest, the host merges
together multiple receive buffers to form a larger logical buffer.
The number of merged buffers is returned to the guest via a field in
the virtio_net_hdr.

Make use of this support by supplying single page receive buffers to
the host. On receive, we extract the virtio_net_hdr, copy 128 bytes of
the payload to the skb's linear data buffer and adjust the fragment
offset to point to the remaining data. This ensures proper alignment
and allows us to not use any paged data for small packets. If the
payload occupies multiple pages, we simply append those pages as
fragments and free the associated skbs.

This scheme allows us to be efficient in our use of ring entries
while still supporting large packets. Benchmarking using netperf from
an external machine to a guest over a 10Gb/s network shows a 100%
improvement from ~1Gb/s to ~2Gb/s. With a local host->guest benchmark
with GSO disabled on the host side, throughput was seen to increase
from 700Mb/s to 1.7Gb/s.

Based on a patch from Herbert Xu.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (use netdev_priv)
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16 22:41:34 -08:00
Mark McLoughlin
0276b4972e virtio_net: hook up the set-tso ethtool op
Seems like an oversight that we have set-tx-csum and set-sg hooked
up, but not set-tso.

Also leads to the strange situation that if you e.g. disable tx-csum,
then tso doesn't get disabled.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16 22:40:36 -08:00
Mark McLoughlin
0a888fd1f6 virtio_net: Recycle some more rx buffer pages
Each time we re-fill the recv queue with buffers, we allocate
one too many skbs and free it again when adding fails. We should
recycle the pages allocated in this case.

A previous version of this patch made trim_pages() trim trailing
unused pages from skbs with some paged data, but this actually
caused a barely measurable slowdown.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (use netdev_priv)
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16 22:39:18 -08:00
Wang Chen
8f15ea42b6 netdevice: safe convert to netdev_priv() #part-3
We have some reasons to kill netdev->priv:
1. netdev->priv is equal to netdev_priv().
2. netdev_priv() wraps the calculation of netdev->priv's offset, obviously
   netdev_priv() is more flexible than netdev->priv.
But we cann't kill netdev->priv, because so many drivers reference to it
directly.

This patch is a safe convert for netdev->priv to netdev_priv(netdev).
Since all of the netdev->priv is only for read.
But it is too big to be sent in one mail.
I split it to 4 parts and make every part smaller than 100,000 bytes,
which is max size allowed by vger.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-12 23:38:36 -08:00
Johannes Berg
e174961ca1 net: convert print_mac to %pM
This converts pretty much everything to print_mac. There were
a few things that had conflicts which I have just dropped for
now, no harm done.

I've built an allyesconfig with this and looked at the files
that weren't built very carefully, but it's a huge patch.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-27 17:06:18 -07:00
Rusty Russell
fb6813f480 virtio: Recycle unused recv buffer pages for large skbs in net driver
If we hack the virtio_net driver to always allocate full-sized (64k+)
skbuffs, the driver slows down (lguest numbers):

  Time to receive 1GB (small buffers): 10.85 seconds
  Time to receive 1GB (64k+ buffers): 24.75 seconds

Of course, large buffers use up more space in the ring, so we increase
that from 128 to 2048:

  Time to receive 1GB (64k+ buffers, 2k ring): 16.61 seconds

If we recycle pages rather than using alloc_page/free_page:

  Time to receive 1GB (64k+ buffers, 2k ring, recycle pages): 10.81 seconds

This demonstrates that with efficient allocation, we don't need to
have a separate "small buffer" queue.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-25 12:06:02 +10:00
Herbert Xu
97402b96f8 virtio net: Allow receiving SG packets
Finally this patch lets virtio_net receive GSO packets in addition
to sending them.  This can definitely be optimised for the non-GSO
case.  For comparison the Xen approach stores one page in each skb
and uses subsequent skb's pages to construct an SG skb instead of
preallocating the maximum amount of pages per skb.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (added feature bits)
2008-07-25 12:06:01 +10:00
Herbert Xu
a9ea3fc6f2 virtio net: Add ethtool ops for SG/GSO
This patch adds some basic ethtool operations to virtio_net so
I could test SG without GSO (which was really useful because TSO
turned out to be buggy :)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (remove MTU setting)
2008-07-25 12:06:01 +10:00
Mark McLoughlin
9953ca6cb7 virtio: fix virtio_net xmit of freed skb bug
On Mon, 2008-05-26 at 17:42 +1000, Rusty Russell wrote:
> If we fail to transmit a packet, we assume the queue is full and put
> the skb into last_xmit_skb.  However, if more space frees up before we
> xmit it, we loop, and the result can be transmitting the same skb twice.
>
> Fix is simple: set skb to NULL if we've used it in some way, and check
> before sending.
...
> diff -r 564237b31993 drivers/net/virtio_net.c
> --- a/drivers/net/virtio_net.c	Mon May 19 12:22:00 2008 +1000
> +++ b/drivers/net/virtio_net.c	Mon May 19 12:24:58 2008 +1000
> @@ -287,21 +287,25 @@ again:
>  	free_old_xmit_skbs(vi);
>
>  	/* If we has a buffer left over from last time, send it now. */
> -	if (vi->last_xmit_skb) {
> +	if (unlikely(vi->last_xmit_skb)) {
>  		if (xmit_skb(vi, vi->last_xmit_skb) != 0) {
>  			/* Drop this skb: we only queue one. */
>  			vi->dev->stats.tx_dropped++;
>  			kfree_skb(skb);
> +			skb = NULL;
>  			goto stop_queue;
>  		}
>  		vi->last_xmit_skb = NULL;

With this, may drop an skb and then later in the function discover that
we could have sent it after all. Poor wee skb :)

How about the incremental patch below?

Cheers,
Mark.

Subject: [PATCH] virtio_net: Delay dropping tx skbs

Currently we drop the skb in start_xmit() if we have a
queued buffer and fail to transmit it.

However, if we delay dropping it until we've stopped the
queue and enabled the tx notification callback, then there
is a chance space might become available for it.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-25 12:06:00 +10:00
Mark McLoughlin
5e4fe5c45a virtio_net: Set VIRTIO_NET_F_GUEST_CSUM feature
We can handle receiving partial csums, so set the
appropriate feature bit.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-07-11 01:20:33 -04:00
Rusty Russell
363f15149c virtio: use callback on empty in virtio_net
virtio_net uses a timer to free old transmitted packets, rather than
leaving callbacks enabled all the time.  If the host promises to
always notify us when the transmit ring is empty, we can free packets
at that point and avoid the timer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-10 18:20:32 -04:00
Mark McLoughlin
14c998f034 virtio: virtio_net free transmit skbs in a timer
virtio_net currently only frees old transmit skbs just
before queueing new ones. If the queue is full, it then
enables interrupts and waits for notification that more
work has been performed.

However, a side-effect of this scheme is that there are
always xmit skbs left dangling when no new packets are
sent, against the Documentation/networking/driver.txt
guideline:

  "... it is not allowed for your TX mitigation scheme
   to let TX packets "hang out" in the TX ring unreclaimed
   forever if no new TX packets are sent."

Add a timer to ensure that any time we queue new TX
skbs, we will shortly free them again.

This fixes an easily reproduced hang at shutdown where
iptables attempts to unload nf_conntrack and nf_conntrack
waits for an skb it is tracking to be freed, but virtio_net
never frees it.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-10 18:20:31 -04:00
Mark McLoughlin
23cde76d80 virtio_net: Fix skb->csum_start computation
hdr->csum_start is the offset from the start of the ethernet
header to the transport layer checksum field. skb->csum_start
is the offset from skb->head.

skb_partial_csum_set() assumes that skb->data points to the
ethernet header - i.e. it computes skb->csum_start by adding
the headroom to hdr->csum_start.

Since eth_type_trans() skb_pull()s the ethernet header,
skb_partial_csum_set() should be called before
eth_type_trans().

(Without this patch, GSO packets from a guest to the world outside the
host are corrupted).

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-10 18:20:29 -04:00
Rusty Russell
11a3a1546d virtio: fix delayed xmit of packet and freeing of old packets.
Because we cache the last failed-to-xmit packet, if there are no
packets queued behind that one we may never send it (reproduced here
as TCP stalls, "cured" by an outgoing ping).

Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-30 22:07:21 -04:00
Rusty Russell
7eb2e25112 virtio: fix virtio_net xmit of freed skb bug
If we fail to transmit a packet, we assume the queue is full and put
the skb into last_xmit_skb.  However, if more space frees up before we
xmit it, we loop, and the result can be transmitting the same skb twice.

Fix is simple: set skb to NULL if we've used it in some way, and check
before sending.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-30 22:07:20 -04:00
Wang Chen
288369cc25 VIRTIO: Use __skb_queue_purge()
Use standard routine for queue purging.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 14:01:02 -04:00
Rusty Russell
c45a6816c1 virtio: explicit advertisement of driver features
A recent proposed feature addition to the virtio block driver revealed
some flaws in the API: in particular, we assume that feature
negotiation is complete once a driver's probe function returns.

There is nothing in the API to require this, however, and even I
didn't notice when it was violated.

So instead, we require the driver to specify what features it supports
in a table, we can then move the feature negotiation into the virtio
core.  The intersection of device and driver features are presented in
a new 'features' bitmap in the struct virtio_device.

Note that this highlights the difference between Linux unsigned-long
bitmaps where each unsigned long is in native endian, and a
straight-forward little-endian array of bytes.

Drivers can still remove feature bits in their probe routine if they
really have to.

API changes:
- dev->config->feature() no longer gets and acks a feature.
- drivers should advertise their features in the 'feature_table' field
- use virtio_has_feature() for extra sanity when checking feature bits

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02 21:50:50 +10:00
Rusty Russell
5539ae9613 virtio: finer-grained features for virtio_net
So, we previously had a 'VIRTIO_NET_F_GSO' bit which meant that 'the
host can handle csum offload, and any TSO (v4&v6 incl ECN) or UFO
packets you might want to send.  I thought this was good enough for
Linux, but it actually isn't, since we don't do UFO in software.

So, add separate feature bits for what the host can handle.  Add
equivalent ones for the guest to say what it can handle, because LRO
is coming too (thanks Herbert!).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02 21:50:47 +10:00
Rusty Russell
99ffc696d1 virtio: wean net driver off NETDEV_TX_BUSY
Herbert tells me that returning NETDEV_TX_BUSY from hard_start_xmit is
seen as a poor thing to do; we should cache the packet and stop the queue.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
2008-05-02 21:50:46 +10:00
Rusty Russell
0527168522 virtio: fix scatterlist sizing in net driver.
Herbert Xu points out (within another patch) that my scatterlists are
too short: one entry for the gso header, one for the skb->data, and
MAX_SKB_FRAGS for all the fragments.

Fix both xmit and recv sides (recv currently unused, coming in later
patch).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02 21:50:45 +10:00
Rusty Russell
655aa31f02 virtio: fix tx_ stats in virtio_net
get_buf() gives the length written by the other side, which will be
zero.  We want to add the skb length.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02 21:50:44 +10:00
Linus Torvalds
90768c09bc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  [NETNS][IPV6] tcp - assign the netns for timewait sockets
  [IPV4]: Fix byte value boundary check in do_ip_getsockopt().
  BNX2X: Correct bringing chip out of reset
  [NETFILTER]: nf_nat: autoload IPv4 connection tracking
  [NETFILTER]: xt_hashlimit: fix mask calculation
  [XFRM]: xfrm_user: fix selector family initialization
  rt61pci: rt61pci_beacon_update do not free skb twice
  ssb-mipscore: Fix interrupt vectors
  ssb-pcicore: Fix IRQ TPS flag handling
  mac80211: use short_preamble mode from capability if ERP IE not present
  [NET]: Undo code bloat in hot paths due to print_mac().
  [TCP]: Don't allow FRTO to take place while MTU is being probed
  [TCP]: tcp_simple_retransmit can cause S+L
  [TCP]: Fix NewReno's fast rexmit/recovery problems with GSOed skb
  [TCP]: Restore 2.6.24 mark_head_lost behavior for newreno/fack
  nl80211: fix STA AID bug
  b43legacy: fix bcm4303 crash
  iwlwifi: fix n-band association problem
  ipw2200: set MAC address on radiotap interface
  libertas: fix mode initialization problem
2008-04-11 08:10:24 -07:00
David S. Miller
21f644f3ea [NET]: Undo code bloat in hot paths due to print_mac().
If print_mac() is used inside of a pr_debug() the compiler
can't see that the call is redundant so still performs it
even of pr_debug() ends up being a nop.

So don't use print_mac() in such cases in hot code paths,
use MAC_FMT et al. instead.

As noted by Joe Perches, pr_debug() could be modified to
handle this better, but that is a change to an interface
used by the entire kernel and thus needs to be validated
carefully.  This here is thus the less risky fix for
2.6.25

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-08 16:50:44 -07:00
Anthony Liguori
6ea0a4679d virtio_net: remove overzealous printk
The 'disable_cb' is really just a hint and as such, it's possible for more
work to get queued up while callbacks are disabled.  Under stress with an
SMP guest, this printk triggers very frequently.  There is no race here, this
is how things are designed to work so let's just remove the printk.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-07 13:56:33 -07:00
Christian Borntraeger
4265f161b6 virtio: fix race in enable_cb
There is a race in virtio_net, dealing with disabling/enabling the callback.
I saw the following oops:

kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:218!
illegal operation: 0001 [#1] SMP
Modules linked in: sunrpc dm_mod
CPU: 2 Not tainted 2.6.25-rc1zlive-host-10623-gd358142-dirty #99
Process swapper (pid: 0, task: 000000000f85a610, ksp: 000000000f873c60)
Krnl PSW : 0404300180000000 00000000002b81a6 (vring_disable_cb+0x16/0x20)
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3
Krnl GPRS: 0000000000000001 0000000000000001 0000000010005800 0000000000000001
           000000000f3a0900 000000000f85a610 0000000000000000 0000000000000000
           0000000000000000 000000000f870000 0000000000000000 0000000000001237
           000000000f3a0920 000000000010ff74 00000000002846f6 000000000fa0bcd8
Krnl Code: 00000000002b819a: a7110001           tmll    %r1,1
           00000000002b819e: a7840004           brc     8,2b81a6
           00000000002b81a2: a7f40001           brc     15,2b81a4
          >00000000002b81a6: a51b0001           oill    %r1,1
           00000000002b81aa: 40102000           sth     %r1,0(%r2)
           00000000002b81ae: 07fe               bcr     15,%r14
           00000000002b81b0: eb7ff0380024       stmg    %r7,%r15,56(%r15)
           00000000002b81b6: a7f13e00           tmll    %r15,15872
Call Trace:
([<000000000fa0bcd0>] 0xfa0bcd0)
 [<00000000002b8350>] vring_interrupt+0x5c/0x6c
 [<000000000010ab08>] do_extint+0xb8/0xf0
 [<0000000000110716>] ext_no_vtime+0x16/0x1a
 [<0000000000107e72>] cpu_idle+0x1c2/0x1e0

The problem can be triggered with a high amount of host->guest traffic.
I think its the following race:

poll says netif_rx_complete
poll calls enable_cb
enable_cb opens the interrupt mask
a new packet comes, an interrupt is triggered----\
enable_cb sees that there is more work           |
enable_cb disables the interrupt                 |
       .                                         V
       .                            interrupt is delivered
       .                            skb_recv_done does atomic napi test, ok
 some waiting                       disable_cb is called->check fails->bang!
       .
poll would do napi check
poll would do disable_cb

The fix is to let enable_cb not disable the interrupt again, but expect the
caller to do the cleanup if it returns false. In that case, the interrupt is
only disabled, if the napi test_set_bit was successful.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cleaned up doco)
2008-03-17 22:58:21 +11:00
Amit Shah
da74e89d40 virtio: Enable netpoll interface for netconsole logging
Add a new poll_controller handler that the netpoll interface needs.

This enables netconsole logging from a kvm guest over the virtio
net interface.

Signed-off-by: Amit Shah <amitshah@gmx.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-03-17 22:58:20 +11:00
Christian Borntraeger
d9d5dcc88c virtio_net: Fix oops on early interrupts - introduced by virtio reset code
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-02-23 23:55:04 -05:00
Christian Borntraeger
370076d932 virtio net: fix oops on interface-up
I got the following oops during interface ifup. Unfortunately its not
easily reproducable so I cant say for sure that my fix fixes this
problem, but I am confident and I think its correct anyway:

   <2>kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:234!
    <4>illegal operation: 0001 [#1] PREEMPT SMP
    <4>Modules linked in:
    <4>CPU: 0 Not tainted 2.6.24zlive-guest-07293-gf1ca151-dirty #91
    <4>Process swapper (pid: 0, task: 0000000000800938, ksp: 000000000084ddb8)
    <4>Krnl PSW : 0404300180000000 0000000000466374 (vring_disable_cb+0x30/0x34)
    <4>           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3
    <4>Krnl GPRS: 0000000000000001 0000000000000001 0000000010003800 0000000000466344
    <4>           000000000e980900 00000000008848b0 000000000084e748 0000000000000000
    <4>           000000000087b300 0000000000001237 0000000000001237 000000000f85bdd8
    <4>           000000000e980920 00000000001137c0 0000000000464754 000000000f85bdd8
    <4>Krnl Code: 0000000000466368: e3b0b0700004        lg      %r11,112(%r11)
    <4>           000000000046636e: 07fe                bcr     15,%r14
    <4>           0000000000466370: a7f40001            brc     15,466372
    <4>          >0000000000466374: a7f4fff6            brc     15,466360
    <4>           0000000000466378: eb7ff0500024        stmg    %r7,%r15,80(%r15)
    <4>           000000000046637e: a7f13e00            tmll    %r15,15872
    <4>           0000000000466382: b90400ef            lgr     %r14,%r15
    <4>           0000000000466386: a7840001            brc     8,466388
    <4>Call Trace:
    <4>([<000201500f85c000>] 0x201500f85c000)
    <4> [<0000000000466556>] vring_interrupt+0x72/0x88
    <4> [<00000000004801a0>] kvm_extint_handler+0x34/0x44
    <4> [<000000000010d22c>] do_extint+0xbc/0xf8
    <4> [<0000000000113f98>] ext_no_vtime+0x16/0x1a
    <4> [<000000000010a182>] cpu_idle+0x216/0x238
    <4>([<000000000010a162>] cpu_idle+0x1f6/0x238)
    <4> [<0000000000568656>] rest_init+0xaa/0xb8
    <4> [<000000000084ee2c>] start_kernel+0x3fc/0x490
    <4> [<0000000000100020>] _stext+0x20/0x80
    <4>
    <4> <0>Kernel panic - not syncing: Fatal exception in interrupt
    <4>

After looking at the code and the dump I think the following scenario
happened: Ifup was running on cpu2 and the interrupt arrived on cpu0.
Now virtnet_open on cpu 2 managed to execute napi_enable and disable_cb
but did not execute rx_schedule. Meanwhile on cpu 0 skb_recv_done was
called by vring_interrupt, executed netif_rx_schedule_prep, which
succeeded and therefore called disable_cb. This triggered the BUG_ON,
as interrupts were already disabled by cpu 2.

I think the proper solution is to make the call to disable_cb depend on
the atomic update of NAPI_STATE_SCHED by using netif_rx_schedule_prep
in the same way as skb_recv_done.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-02-06 06:42:30 -05:00
Dor Laor
6c0cd7c000 virtio_net: parametrize the napi_weight for virtio receive queue.
It is done in order to improve performance.

Signed-off-by: Dor Laor <dor.laor@qumranet.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:09 +11:00
Rusty Russell
2cb9c6bafc virtio: free transmit skbs when notified, not on next xmit.
This fixes a potential dangling xmit problem.

We also suppress refill interrupts until we need them.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:08 +11:00
Rusty Russell
a48bd8f670 virtio: flush buffers on open
Fix bug found by Christian Borntraeger: if the other side fills all
the registered network buffers before we enable NAPI, we will never
get an interrupt.  The simplest fix is to process the input queue once
on open.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:07 +11:00
Christian Borntraeger
e70f2f1bb8 virtnet: remove double ether_setup
Hello Rusty,

virtnet_probe already calls alloc_etherdev, which calls ether_setup.
There is no need to do that again.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:07 +11:00
Rusty Russell
6e5aa7efb2 virtio: reset function
A reset function solves three problems:

1) It allows us to renegotiate features, eg. if we want to upgrade a
   guest driver without rebooting the guest.

2) It gives us a clean way of shutting down virtqueues: after a reset,
   we know that the buffers won't be used by the host, and

3) It helps the guest recover from messed-up drivers.

So we remove the ->shutdown hook, and the only way we now remove
feature bits is via reset.

We leave it to the driver to do the reset before it deletes queues:
the balloon driver, for example, needs to chat to the host in its
remove function.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:03 +11:00
Rusty Russell
b3369c1fb4 virtio: populate network rings in the probe routine, not open
Since we want to reset the device to remove them, this is simpler
(device is reset for us on driver remove).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:03 +11:00
Rusty Russell
34a48579e4 virtio: Tweak virtio_net defines
1) Turn GSO on virtio net into an all-or-nothing (keep checksumming
   separate).  Having multiple bits is a pain: if you can't support something
   you should handle it in software, which is still a performance win.

2) Make VIRTIO_NET_HDR_GSO_ECN a flag in the header, so it can apply to
   IPv6 or v4.

3) Rename VIRTIO_NET_F_NO_CSUM to VIRTIO_NET_F_CSUM (ie. means we do
   checksumming).

4) Add csum and gso params to virtio_net to allow more testing.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:02 +11:00
Rusty Russell
50c8ea8080 virtio: Net header needs hdr_len
It's far easier to deal with packets if we don't have to parse the
packet to figure out the header length to know how much to pull into
the skb data.  Add the field to the virtio_net_hdr struct (and fix the
spaces that somehow crept in there).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:02 +11:00
Rusty Russell
18445c4d50 virtio: explicit enable_cb/disable_cb rather than callback return.
It seems that virtio_net wants to disable callbacks (interrupts) before
calling netif_rx_schedule(), so we can't use the return value to do so.

Rename "restart" to "cb_enable" and introduce "cb_disable" hook: callback
now returns void, rather than a boolean.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:49:58 +11:00
Rusty Russell
a586d4f601 virtio: simplify config mechanism.
Previously we used a type/len pair within the config space, but this
seems overkill.  We now simply define a structure which represents the
layout in the config space: the config space can now only be extended
at the end.

The main driver-visible changes:
1) We indicate what fields are present with an explicit feature bit.
2) Virtqueues are explicitly numbered, and not in the config space.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:49:57 +11:00
Rusty Russell
f35d9d8aae virtio: Implement skb_partial_csum_set, for setting partial csums on untrusted packets.
Use it in virtio_net (replacing buggy version there), it's also going
to be used by TAP for partial csum support.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: David S. Miller <davem@davemloft.net>
2008-02-04 23:49:56 +11:00
Rusty Russell
8329d98e48 virtio: fix net driver loop case where we fail to restart
skb is only NULL the first time around: it's more correct to test for
being under-budget.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-11-19 11:20:44 +11:00
Rusty Russell
74b2553f1d virtio: fix module/device unloading
The virtio code never hooked through the ->remove callback.  Although
noone supports device removal at the moment, this code is already
needed for module unloading.

This of course also revealed bugs in virtio_blk, virtio_net and lguest
unloading paths.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-11-19 11:20:42 +11:00
Rusty Russell
4d125de3a5 virtio: more fallout from scatterlist changes.
This fixes OOPS in network driver when CONFIG_DEBUG_SG=y.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-11-12 13:55:25 +11:00
Rusty Russell
296f96fcfc Net driver using virtio
The network driver uses two virtqueues: one for input packets and one
for output packets.  This has nice locking properties (ie. we don't do
any for recv vs send).

TODO:
	1) Big packets.
	2) Multi-client devices (maybe separate driver?).
	3) Resolve freeing of old xmit skbs (Christian Borntraeger)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: netdev@vger.kernel.org
2007-10-23 15:49:54 +10:00