X-Git-Url: http://git.onelab.eu/?p=iproute2.git;a=blobdiff_plain;f=tc-cbq-details.8;fp=tc-cbq-details.8;h=0000000000000000000000000000000000000000;hp=e47da62b85ce8f6fad827bca06c3b5eddc718c9b;hb=3331a68859fd71047bb1f309048960b48eab2d83;hpb=2bd4a72f2100be7ad7d9518cb1d49bb2a5b71994 diff --git a/tc-cbq-details.8 b/tc-cbq-details.8 deleted file mode 100644 index e47da62..0000000 --- a/tc-cbq-details.8 +++ /dev/null @@ -1,425 +0,0 @@ -.TH CBQ 8 "8 December 2001" "iproute2" "Linux" -.SH NAME -CBQ \- Class Based Queueing -.SH SYNOPSIS -.B tc qdisc ... dev -dev -.B ( parent -classid -.B | root) [ handle -major: -.B ] cbq avpkt -bytes -.B bandwidth -rate -.B [ cell -bytes -.B ] [ ewma -log -.B ] [ mpu -bytes -.B ] - -.B tc class ... dev -dev -.B parent -major:[minor] -.B [ classid -major:minor -.B ] cbq allot -bytes -.B [ bandwidth -rate -.B ] [ rate -rate -.B ] prio -priority -.B [ weight -weight -.B ] [ minburst -packets -.B ] [ maxburst -packets -.B ] [ ewma -log -.B ] [ cell -bytes -.B ] avpkt -bytes -.B [ mpu -bytes -.B ] [ bounded isolated ] [ split -handle -.B & defmap -defmap -.B ] [ estimator -interval timeconstant -.B ] - -.SH DESCRIPTION -Class Based Queueing is a classful qdisc that implements a rich -linksharing hierarchy of classes. It contains shaping elements as -well as prioritizing capabilities. Shaping is performed using link -idle time calculations based on the timing of dequeue events and -underlying link bandwidth. - -.SH SHAPING ALGORITHM -Shaping is done using link idle time calculations, and actions taken if -these calculations deviate from set limits. - -When shaping a 10mbit/s connection to 1mbit/s, the link will -be idle 90% of the time. If it isn't, it needs to be throttled so that it -IS idle 90% of the time. - -From the kernel's perspective, this is hard to measure, so CBQ instead -derives the idle time from the number of microseconds (in fact, jiffies) -that elapse between requests from the device driver for more data. Combined -with the knowledge of packet sizes, this is used to approximate how full or -empty the link is. - -This is rather circumspect and doesn't always arrive at proper -results. For example, what is the actual link speed of an interface -that is not really able to transmit the full 100mbit/s of data, -perhaps because of a badly implemented driver? A PCMCIA network card -will also never achieve 100mbit/s because of the way the bus is -designed - again, how do we calculate the idle time? - -The physical link bandwidth may be ill defined in case of not-quite-real -network devices like PPP over Ethernet or PPTP over TCP/IP. The effective -bandwidth in that case is probably determined by the efficiency of pipes -to userspace - which not defined. - -During operations, the effective idletime is measured using an -exponential weighted moving average (EWMA), which considers recent -packets to be exponentially more important than past ones. The Unix -loadaverage is calculated in the same way. - -The calculated idle time is subtracted from the EWMA measured one, -the resulting number is called 'avgidle'. A perfectly loaded link has -an avgidle of zero: packets arrive exactly at the calculated -interval. - -An overloaded link has a negative avgidle and if it gets too negative, -CBQ throttles and is then 'overlimit'. - -Conversely, an idle link might amass a huge avgidle, which would then -allow infinite bandwidths after a few hours of silence. To prevent -this, avgidle is capped at -.B maxidle. - -If overlimit, in theory, the CBQ could throttle itself for exactly the -amount of time that was calculated to pass between packets, and then -pass one packet, and throttle again. Due to timer resolution constraints, -this may not be feasible, see the -.B minburst -parameter below. - -.SH CLASSIFICATION -Within the one CBQ instance many classes may exist. Each of these classes -contains another qdisc, by default -.BR tc-pfifo (8). - -When enqueueing a packet, CBQ starts at the root and uses various methods to -determine which class should receive the data. If a verdict is reached, this -process is repeated for the recipient class which might have further -means of classifying traffic to its children, if any. - -CBQ has the following methods available to classify a packet to any child -classes. -.TP -(i) -.B skb->priority class encoding. -Can be set from userspace by an application with the -.B SO_PRIORITY -setsockopt. -The -.B skb->priority class encoding -only applies if the skb->priority holds a major:minor handle of an existing -class within this qdisc. -.TP -(ii) -tc filters attached to the class. -.TP -(iii) -The defmap of a class, as set with the -.B split & defmap -parameters. The defmap may contain instructions for each possible Linux packet -priority. - -.P -Each class also has a -.B level. -Leaf nodes, attached to the bottom of the class hierarchy, have a level of 0. -.SH CLASSIFICATION ALGORITHM - -Classification is a loop, which terminates when a leaf class is found. At any -point the loop may jump to the fallback algorithm. - -The loop consists of the following steps: -.TP -(i) -If the packet is generated locally and has a valid classid encoded within its -.B skb->priority, -choose it and terminate. - -.TP -(ii) -Consult the tc filters, if any, attached to this child. If these return -a class which is not a leaf class, restart loop from the class returned. -If it is a leaf, choose it and terminate. -.TP -(iii) -If the tc filters did not return a class, but did return a classid, -try to find a class with that id within this qdisc. -Check if the found class is of a lower -.B level -than the current class. If so, and the returned class is not a leaf node, -restart the loop at the found class. If it is a leaf node, terminate. -If we found an upward reference to a higher level, enter the fallback -algorithm. -.TP -(iv) -If the tc filters did not return a class, nor a valid reference to one, -consider the minor number of the reference to be the priority. Retrieve -a class from the defmap of this class for the priority. If this did not -contain a class, consult the defmap of this class for the -.B BEST_EFFORT -class. If this is an upward reference, or no -.B BEST_EFFORT -class was defined, -enter the fallback algorithm. If a valid class was found, and it is not a -leaf node, restart the loop at this class. If it is a leaf, choose it and -terminate. If -neither the priority distilled from the classid, nor the -.B BEST_EFFORT -priority yielded a class, enter the fallback algorithm. -.P -The fallback algorithm resides outside of the loop and is as follows. -.TP -(i) -Consult the defmap of the class at which the jump to fallback occured. If -the defmap contains a class for the -.B -priority -of the class (which is related to the TOS field), choose this class and -terminate. -.TP -(ii) -Consult the map for a class for the -.B BEST_EFFORT -priority. If found, choose it, and terminate. -.TP -(iii) -Choose the class at which break out to the fallback algorithm occured. Terminate. -.P -The packet is enqueued to the class which was chosen when either algorithm -terminated. It is therefore possible for a packet to be enqueued *not* at a -leaf node, but in the middle of the hierarchy. - -.SH LINK SHARING ALGORITHM -When dequeuing for sending to the network device, CBQ decides which of its -classes will be allowed to send. It does so with a Weighted Round Robin process -in which each class with packets gets a chance to send in turn. The WRR process -starts by asking the highest priority classes (lowest numerically - -highest semantically) for packets, and will continue to do so until they -have no more data to offer, in which case the process repeats for lower -priorities. - -.B CERTAINTY ENDS HERE, ANK PLEASE HELP - -Each class is not allowed to send at length though - they can only dequeue a -configurable amount of data during each round. - -If a class is about to go overlimit, and it is not -.B bounded -it will try to borrow avgidle from siblings that are not -.B isolated. -This process is repeated from the bottom upwards. If a class is unable -to borrow enough avgidle to send a packet, it is throttled and not asked -for a packet for enough time for the avgidle to increase above zero. - -.B I REALLY NEED HELP FIGURING THIS OUT. REST OF DOCUMENT IS PRETTY CERTAIN -.B AGAIN. - -.SH QDISC -The root qdisc of a CBQ class tree has the following parameters: - -.TP -parent major:minor | root -This mandatory parameter determines the place of the CBQ instance, either at the -.B root -of an interface or within an existing class. -.TP -handle major: -Like all other qdiscs, the CBQ can be assigned a handle. Should consist only -of a major number, followed by a colon. Optional. -.TP -avpkt bytes -For calculations, the average packet size must be known. It is silently capped -at a minimum of 2/3 of the interface MTU. Mandatory. -.TP -bandwidth rate -To determine the idle time, CBQ must know the bandwidth of your underlying -physical interface, or parent qdisc. This is a vital parameter, more about it -later. Mandatory. -.TP -cell -The cell size determines he granularity of packet transmission time calculations. Has a sensible default. -.TP -mpu -A zero sized packet may still take time to transmit. This value is the lower -cap for packet transmission time calculations - packets smaller than this value -are still deemed to have this size. Defaults to zero. -.TP -ewma log -When CBQ needs to measure the average idle time, it does so using an -Exponentially Weighted Moving Average which smoothes out measurements into -a moving average. The EWMA LOG determines how much smoothing occurs. Defaults -to 5. Lower values imply greater sensitivity. Must be between 0 and 31. -.P -A CBQ qdisc does not shape out of its own accord. It only needs to know certain -parameters about the underlying link. Actual shaping is done in classes. - -.SH CLASSES -Classes have a host of parameters to configure their operation. - -.TP -parent major:minor -Place of this class within the hierarchy. If attached directly to a qdisc -and not to another class, minor can be omitted. Mandatory. -.TP -classid major:minor -Like qdiscs, classes can be named. The major number must be equal to the -major number of the qdisc to which it belongs. Optional, but needed if this -class is going to have children. -.TP -weight weight -When dequeuing to the interface, classes are tried for traffic in a -round-robin fashion. Classes with a higher configured qdisc will generally -have more traffic to offer during each round, so it makes sense to allow -it to dequeue more traffic. All weights under a class are normalized, so -only the ratios matter. Defaults to the configured rate, unless the priority -of this class is maximal, in which case it is set to 1. -.TP -allot bytes -Allot specifies how many bytes a qdisc can dequeue -during each round of the process. This parameter is weighted using the -renormalized class weight described above. - -.TP -priority priority -In the round-robin process, classes with the lowest priority field are tried -for packets first. Mandatory. - -.TP -rate rate -Maximum rate this class and all its children combined can send at. Mandatory. - -.TP -bandwidth rate -This is different from the bandwidth specified when creating a CBQ disc. Only -used to determine maxidle and offtime, which are only calculated when -specifying maxburst or minburst. Mandatory if specifying maxburst or minburst. - -.TP -maxburst -This number of packets is used to calculate maxidle so that when -avgidle is at maxidle, this number of average packets can be burst -before avgidle drops to 0. Set it higher to be more tolerant of -bursts. You can't set maxidle directly, only via this parameter. - -.TP -minburst -As mentioned before, CBQ needs to throttle in case of -overlimit. The ideal solution is to do so for exactly the calculated -idle time, and pass 1 packet. However, Unix kernels generally have a -hard time scheduling events shorter than 10ms, so it is better to -throttle for a longer period, and then pass minburst packets in one -go, and then sleep minburst times longer. - -The time to wait is called the offtime. Higher values of minburst lead -to more accurate shaping in the long term, but to bigger bursts at -millisecond timescales. - -.TP -minidle -If avgidle is below 0, we are overlimits and need to wait until -avgidle will be big enough to send one packet. To prevent a sudden -burst from shutting down the link for a prolonged period of time, -avgidle is reset to minidle if it gets too low. - -Minidle is specified in negative microseconds, so 10 means that -avgidle is capped at -10us. - -.TP -bounded -Signifies that this class will not borrow bandwidth from its siblings. -.TP -isolated -Means that this class will not borrow bandwidth to its siblings - -.TP -split major:minor & defmap bitmap[/bitmap] -If consulting filters attached to a class did not give a verdict, -CBQ can also classify based on the packet's priority. There are 16 -priorities available, numbered from 0 to 15. - -The defmap specifies which priorities this class wants to receive, -specified as a bitmap. The Least Significant Bit corresponds to priority -zero. The -.B split -parameter tells CBQ at which class the decision must be made, which should -be a (grand)parent of the class you are adding. - -As an example, 'tc class add ... classid 10:1 cbq .. split 10:0 defmap c0' -configures class 10:0 to send packets with priorities 6 and 7 to 10:1. - -The complimentary configuration would then -be: 'tc class add ... classid 10:2 cbq ... split 10:0 defmap 3f' -Which would send all packets 0, 1, 2, 3, 4 and 5 to 10:1. -.TP -estimator interval timeconstant -CBQ can measure how much bandwidth each class is using, which tc filters -can use to classify packets with. In order to determine the bandwidth -it uses a very simple estimator that measures once every -.B interval -microseconds how much traffic has passed. This again is a EWMA, for which -the time constant can be specified, also in microseconds. The -.B time constant -corresponds to the sluggishness of the measurement or, conversely, to the -sensitivity of the average to short bursts. Higher values mean less -sensitivity. - - - -.SH SOURCES -.TP -o -Sally Floyd and Van Jacobson, "Link-sharing and Resource -Management Models for Packet Networks", -IEEE/ACM Transactions on Networking, Vol.3, No.4, 1995 - -.TP -o -Sally Floyd, "Notes on CBQ and Guarantee Service", 1995 - -.TP -o -Sally Floyd, "Notes on Class-Based Queueing: Setting -Parameters", 1996 - -.TP -o -Sally Floyd and Michael Speer, "Experimental Results -for Class-Based Queueing", 1998, not published. - - - -.SH SEE ALSO -.BR tc (8) - -.SH AUTHOR -Alexey N. Kuznetsov, . This manpage maintained by -bert hubert - -