2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

96 Commits

Author SHA1 Message Date
Eric Dumazet
e43ac79a4b sch_tbf: segment too big GSO packets
If a GSO packet has a length above tbf burst limit, the packet
is currently silently dropped.

Current way to handle this is to set the device in non GSO/TSO mode, or
setting high bursts, and its sub optimal.

We can actually segment too big GSO packets, and send individual
segments as tbf parameters allow, allowing for better interoperability.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-23 00:06:40 -07:00
Jiri Pirko
b757c9336d tbf: improved accuracy at high rates
Current TBF uses rate table computed by the "tc" userspace program,
which has the following issue:

The rate table has 256 entries to map packet lengths to
token (time units).  With TSO sized packets, the 256 entry granularity
leads to loss/gain of rate, making the token bucket inaccurate.

Thus, instead of relying on rate table, this patch explicitly computes
the time and accounts for packet transmission times with nanosecond
granularity.

This is a followup to 56b765b79e
("htb: improved accuracy at high rates").

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-12 18:59:45 -05:00
David S. Miller
1b34ec43c9 pkt_sched: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:11:37 -04:00
Eric Dumazet
b0460e4484 sch_tbf: report backlog information
Provide child qdisc backlog (byte count) information so that "tc -s
qdisc" can report it to user.

qdisc netem 30: root refcnt 18 limit 1000 delay 20.0ms  10.0ms
 Sent 948517 bytes 898 pkt (dropped 0, overlimits 0 requeues 1)
 rate 175056bit 16pps backlog 114b 1p requeues 1
qdisc tbf 40: parent 30: rate 256000bit burst 20Kb/8 mpu 0b lat 0us
 Sent 948517 bytes 898 pkt (dropped 15, overlimits 611 requeues 0)
 backlog 18168b 12p requeues 0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-29 15:07:21 -05:00
David S. Miller
5bdc22a565 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/sched/sch_hfsc.c
	net/sched/sch_htb.c
	net/sched/sch_tbf.c
2011-01-24 14:09:35 -08:00
Eric Dumazet
9190b3b320 net_sched: accurate bytes/packets stats/rates
In commit 44b8288308 (net_sched: pfifo_head_drop problem), we fixed
a problem with pfifo_head drops that incorrectly decreased
sch->bstats.bytes and sch->bstats.packets

Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a
previously enqueued packet, and bstats cannot be changed, so
bstats/rates are not accurate (over estimated)

This patch changes the qdisc_bstats updates to be done at dequeue() time
instead of enqueue() time. bstats counters no longer account for dropped
frames, and rates are more correct, since enqueue() bursts dont have
effect on dequeue() rate.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 23:31:33 -08:00
Eric Dumazet
fd245a4adb net_sched: move TCQ_F_THROTTLED flag
In commit 3711210576 (net: QDISC_STATE_RUNNING dont need atomic bit
ops) I moved QDISC_STATE_RUNNING flag to __state container, located in
the cache line containing qdisc lock and often dirtied fields.

I now move TCQ_F_THROTTLED bit too, so that we let first cache line read
mostly, and shared by all cpus. This should speedup HTB/CBQ for example.

Not using test_bit()/__clear_bit()/__test_and_set_bit allows to use an
"unsigned int" for __state container, reducing by 8 bytes Qdisc size.

Introduce helpers to hide implementation details.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Jesper Dangaard Brouer <hawk@diku.dk>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Jamal Hadi Salim <hadi@cyberus.ca>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 16:59:32 -08:00
Eric Dumazet
cc7ec456f8 net_sched: cleanups
Cleanup net/sched code to current CodingStyle and practices.

Reduce inline abuse

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 23:31:12 -08:00
Eric Dumazet
bfe0d0298f net_sched: factorize qdisc stats handling
HTB takes into account skb is segmented in stats updates.
Generalize this to all schedulers.

They should use qdisc_bstats_update() helper instead of manipulating
bstats.bytes and bstats.packets

Add bstats_update() helper too for classes that use
gnet_stats_basic_packed fields.

Note : Right now, TCQ_F_CAN_BYPASS shortcurt can be taken only if no
stab is setup on qdisc.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-10 16:07:54 -08:00
Ben Greear
9871e50edd net: Use NET_XMIT_SUCCESS where possible.
This is based on work originally done by Patric McHardy.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-10 02:51:11 -07:00
stephen hemminger
f0cd15081a tbf: stop wanton destruction of children (v2)
Several netem users use TBF for rate control. But every time the parameters
of TBF are changed it destroys the child qdisc, requiring reconfigation.
Better to just keep child qdisc and just notify it of changed limit.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:44:35 -07:00
Patrick McHardy
5b9a9ccfad net_sched: remove some unnecessary checks in classful schedulers
The class argument to the ->graft(), ->leaf(), ->dump(), ->dump_stats() all
originate from either ->get() or ->walk() and are always valid.

Remove unnecessary checks.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-06 02:07:02 -07:00
Patrick McHardy
de6d5cdf88 net_sched: make cls_ops->change and cls_ops->delete optional
Some schedulers don't support creating, changing or deleting classes.
Make the respective callbacks optionally and consistently return
-EOPNOTSUPP for unsupported operations, instead of currently either
-EOPNOTSUPP, -ENOSYS or no error.

In case of sch_prio and sch_multiq, the removed operations additionally
checked for an invalid class. This is not necessary since the class
argument can only orginate from ->get() or in case of ->change is 0
for creation of new classes, in which case ->change() incorrectly
returned -ENOENT.

As a side-effect, this patch fixes a possible (root-only) NULL pointer
function call in sch_ingress, which didn't implement a so far mandatory
->delete() operation.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-06 02:07:02 -07:00
Patrick McHardy
71ebe5e919 net_sched: make cls_ops->tcf_chain() optional
Some qdiscs don't support attaching filters. Handle this centrally in
cls_api and return a proper errno code (EOPNOTSUPP) instead of EINVAL.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-06 02:06:12 -07:00
Ilpo Järvinen
a0bffffc14 net/*: use linux/kernel.h swap()
tcp_sack_swap seems unnecessary so I pushed swap to the caller.
Also removed comment that seemed then pointless, and added include
when not already there. Compile tested.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 13:36:17 -07:00
Patrick McHardy
b94c8afcba pkt_sched: remove unnecessary xchg() in packet schedulers
The use of xchg() hasn't been necessary since 2.2.something when proper
locking was added to packet schedulers.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20 04:11:36 -08:00
Jarek Poplawski
f30ab418a1 pkt_sched: Remove qdisc->ops->requeue() etc.
After implementing qdisc->ops->peek() and changing sch_netem into
classless qdisc there are no more qdisc->ops->requeue() users. This
patch removes this method with its wrappers (qdisc_requeue()), and
also unused qdisc->requeue structure. There are a few minor fixes of
warnings (htb_enqueue()) and comments btw.

The idea to kill ->requeue() and a similar patch were first developed
by David S. Miller.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-13 22:56:30 -08:00
Jarek Poplawski
77be155cba pkt_sched: Add peek emulation for non-work-conserving qdiscs.
This patch adds qdisc_peek_dequeued() wrapper to emulate peek method
with qdisc->dequeue() and storing "peeked" skb in qdisc->gso_skb until
dequeuing. This is mainly for compatibility reasons not to break some
strange configs because peeking is expected for non-work-conserving
parent qdiscs to query work-conserving child qdiscs.

This implementation requires using qdisc_dequeue_peeked() wrapper
instead of directly calling qdisc->dequeue() for all qdiscs ever
querried with qdisc->ops->peek() or qdisc_peek_dequeued().

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31 00:47:01 -07:00
Jarek Poplawski
03c05f0d4b pkt_sched: Use qdisc->ops->peek() instead of ->dequeue() & ->requeue()
Use qdisc->ops->peek() instead of ->dequeue() & ->requeue() pair.
After this patch the only remaining user of qdisc->ops->requeue() is
netem_enqueue(). Based on ideas of Herbert Xu, Patrick McHardy and
David S. Miller.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31 00:46:19 -07:00
David S. Miller
69747650c8 pkt_sched: Fix return value corruption in HTB and TBF.
Based upon a bug report by Josip Rodin.

Packet schedulers should only return NET_XMIT_DROP iff
the packet really was dropped.  If the packet does reach
the device after we return NET_XMIT_DROP then TCP can
crash because it depends upon the enqueue path return
values being accurate.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-18 00:39:41 -07:00
Jarek Poplawski
378a2f090f net_sched: Add qdisc __NET_XMIT_STOLEN flag
Patrick McHardy <kaber@trash.net> noticed:
"The other problem that affects all qdiscs supporting actions is
TC_ACT_QUEUED/TC_ACT_STOLEN getting mapped to NET_XMIT_SUCCESS
even though the packet is not queued, corrupting upper qdiscs'
qlen counters."

and later explained:
"The reason why it translates it at all seems to be to not increase
the drops counter. Within a single qdisc this could be avoided by
other means easily, upper qdiscs would still increase the counter
when we return anything besides NET_XMIT_SUCCESS though.

This means we need a new NET_XMIT return value to indicate this to
the upper qdiscs. So I'd suggest to introduce NET_XMIT_STOLEN,
return that to upper qdiscs and translate it to NET_XMIT_SUCCESS
in dev_queue_xmit, similar to NET_XMIT_BYPASS."

David Miller <davem@davemloft.net> noticed:
"Maybe these NET_XMIT_* values being passed around should be a set of
bits. They could be composed of base meanings, combined with specific
attributes.

So you could say "NET_XMIT_DROP | __NET_XMIT_NO_DROP_COUNT"

The attributes get masked out by the top-level ->enqueue() caller,
such that the base meanings are the only thing that make their
way up into the stack. If it's only about communication within the
qdisc tree, let's simply code it that way."

This patch is trying to realize these ideas.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-04 22:31:03 -07:00
Jussi Kivilinna
0abf77e55a net_sched: Add accessor function for packet length for qdiscs
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-20 00:08:27 -07:00
Jussi Kivilinna
5f86173bdf net_sched: Add qdisc_enqueue wrapper
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-20 00:08:04 -07:00
Patrick McHardy
fb0305ce1b net-sched: consolidate default fifo qdisc setup
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 23:40:21 -07:00
Patrick McHardy
27a3421e48 [NET_SCHED]: Use nla_policy for attribute validation in packet schedulers
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:22 -08:00
Patrick McHardy
4b3550ef53 [NET_SCHED]: Use nla_nest_start/nla_nest_end
Use nla_nest_start/nla_nest_end for dumping nested attributes.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:18 -08:00
Patrick McHardy
cee63723b3 [NET_SCHED]: Propagate nla_parse return value
nla_parse() returns more detailed errno codes, propagate them back on
error.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:18 -08:00
Patrick McHardy
1e90474c37 [NET_SCHED]: Convert packet schedulers from rtnetlink to new netlink API
Convert packet schedulers to use the netlink API. Unfortunately a gradual
conversion is not possible without breaking compilation in the middle or
adding lots of casts, so this patch converts them all in one step. The
patch has been mostly generated automatically with some minor edits to
at least allow seperate conversion of classifiers and actions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:10 -08:00
Eric Dumazet
20fea08b5f [NET]: Move Qdisc_class_ops and Qdisc_ops in appropriate sections.
Qdisc_class_ops are const, and Qdisc_ops are mostly read.

Using "const" and "__read_mostly" qualifiers helps to reduce false
sharing.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:58 -08:00
Jesper Dangaard Brouer
e9bef55d3d [NET_SCHED]: Cleanup L2T macros and handle oversized packets
Change L2T (length to time) macros, in all rate based schedulers, to
call a common function qdisc_l2t() that does the rate table lookup.
This function handles if the packet size lookup is larger than the
rate table, which often occurs with TSO enabled.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:20 -07:00
Patrick McHardy
c3bc7cff8f [NET_SCHED]: Kill CONFIG_NET_CLS_POLICE
The NET_CLS_ACT option is now a full replacement for NET_CLS_POLICE,
remove the old code. The config option will be kept around to select
the equivalent NET_CLS_ACT options for a short time to allow easier
upgrades.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-15 00:03:05 -07:00
Patrick McHardy
73ca4918fb [NET_SCHED]: act_api: qdisc internal reclassify support
The behaviour of NET_CLS_POLICE for TC_POLICE_RECLASSIFY was to return
it to the qdisc, which could handle it internally or ignore it. With
NET_CLS_ACT however, tc_classify starts over at the first classifier
and never returns it to the qdisc. This makes it impossible to support
qdisc-internal reclassification, which in turn makes it impossible to
remove the old NET_CLS_POLICE code without breaking compatibility since
we have two qdiscs (CBQ and ATM) that support this.

This patch adds a tc_classify_compat function that handles
reclassification the old way and changes CBQ and ATM to use it.

This again is of course not fully backwards compatible with the previous
NET_CLS_ACT behaviour. Unfortunately there is no way to fully maintain
compatibility *and* support qdisc internal reclassification with
NET_CLS_ACT, but this seems like the better choice over keeping the two
incompatible options around forever.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-15 00:02:31 -07:00
Patrick McHardy
0ba4805383 [NET_SCHED]: Remove unnecessary includes
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:16:41 -07:00
Patrick McHardy
3bebcda280 [NET_SCHED]: turn PSCHED_GET_TIME into inline function
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:55 -07:00
Patrick McHardy
03cc45c0a5 [NET_SCHED]: turn PSCHED_TDIFF_SAFE into inline function
Also rename to psched_tdiff_bounded.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:54 -07:00
Arnaldo Carvalho de Melo
dc5fc579b9 [NETLINK]: Use nlmsg_trim() where appropriate
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:37 -07:00
Arnaldo Carvalho de Melo
27a884dc3c [SK_BUFF]: Convert skb->tail to sk_buff_data_t
So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
on 64bit architectures, allowing us to combine the 4 bytes hole left by the
layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
:-)

Many calculations that previously required that skb->{transport,network,
mac}_header be first converted to a pointer now can be done directly, being
meaningful as offsets or pointers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:28 -07:00
Patrick McHardy
f7f593e383 [NET_SCHED]: sch_tbf: use hrtimer based watchdog
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:07 -07:00
YOSHIFUJI Hideaki
10297b9931 [NET] SCHED: Fix whitespace errors.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:20:08 -08:00
Patrick McHardy
e488eafcc5 [NET_SCHED]: Fix endless loops (part 5): netem/tbf/hfsc ->requeue failures
When peeking at the next packet in a child qdisc by calling dequeue/requeue,
the upper qdisc qlen counter may get out of sync in case the requeue fails.
The qdisc and the child qdisc both have their counter decremented, but since
no packet is given to the upper qdisc it won't decrement its counter itself.

requeue should not fail, so this is mostly for "correctness".

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:46 -08:00
Patrick McHardy
5e50da01d0 [NET_SCHED]: Fix endless loops (part 2): "simple" qdiscs
Convert the "simple" qdiscs to use qdisc_tree_decrease_qlen() where
necessary:

- all graft operations
- destruction of old child qdiscs in prio, red and tbf change operation
- purging of queue in sfq change operation

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:43 -08:00
Patrick McHardy
9f9afec482 [NET_SCHED]: Set parent classid in default qdiscs
Set parent classids in default qdiscs to allow walking up the tree
from outside the qdiscs. This is needed by the next patch.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:41 -08:00
Jörn Engel
6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Patrick McHardy
053cfed75d [PKT_SCHED]: Restore TBF change semantic
When TBF was converted to a classful qdisc, the semantic of the limit
parameter was broken. On initilization an inner bfifo qdisc is created
for backwards compatibility, when changing parameters however the new
limit is ignored and the current child qdisc remains in place.

Always replace the child qdisc by the default bfifo when limit is above
zero, otherwise don't touch the inner qdisc. Current tc version enforce
a limit above zero, other users can avoid creating the inner qdisc by
using zero.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:01:21 -08:00
Patrick McHardy
6d037a26f0 [PKT_SCHED]: Qdisc drop operation is optional
The drop operation is optional and qdiscs must check if childs support it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:00:49 -08:00
Linus Torvalds
1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00