X-Git-Url: http://git.onelab.eu/?a=blobdiff_plain;f=doc%2Factions%2Factions-general;fp=doc%2Factions%2Factions-general;h=bb2295d89d85d433a38e3dea9b1134417a21ca1a;hb=fcabec0aee42af28e2846ef3674ed7ba7be72c42;hp=0000000000000000000000000000000000000000;hpb=cb820e861caa85bb3942ab0c673e04b9408be0ad;p=iproute2.git

diff --git a/doc/actions/actions-general b/doc/actions/actions-general
new file mode 100644
index 0000000..bb2295d
--- /dev/null
+++ b/doc/actions/actions-general
@@ -0,0 +1,254 @@
+
+This document is slightly dated but should give you an idea of how things
+work.
+
+What is it?
+-----------
+
+An extension to the filtering/classification architecture of Linux Traffic
+Control.
+Up to 2.6.8 the only action that could be "attached" to a filter was policing,
+i.e. you could say something like:
+
+-----
+tc filter add dev lo parent ffff: protocol ip prio 10 u32 match ip src \
+127.0.0.1/32 flowid 1:1 police mtu 4000 rate 1500kbit burst 90k
+-----
+
+which implies "if a packet is seen on the ingress of the lo device with
+a source IP address of 127.0.0.1/32 we give it a classification id of 1:1 and
+we execute a policing action which rate limits its bandwidth utilization
+to 1.5Mbps".
+
+The new extensions allow for more than just policing actions to be added.
+They are also fully backward compatible. If you have a kernel that doesn't
+understand them, then the effect is null, i.e. if you have a newer tc
+but older kernel, the actions are not installed. Likewise if you
+have a newer kernel but older tc, obviously the tc will use current
+syntax which will work fine. Of course to get the required effect you need
+both newer tc and kernel. If you are reading this you have the
+right tc ;->
+
+A side effect is that we can now get stateless firewalling to work with tc.
+Essentially this is now an alternative to iptables.
+I won't go into details of my dislike for iptables at times, but
+scalability is one of the main issues; however, if you need stateful
+classification - use netfilter (for now).
+
+This stuff works on both ingress and egress qdiscs.
+
+Features
+--------
+
+1) new additional syntax and actions enabled. Note old syntax is still valid.
+
+Essentially this is still the same syntax as tc with a new construct
+"action". The syntax is of the form:
+tc filter add parent 1:0 protocol ip prio 10
+flowid 1:1 action *
+
+You can have as many actions as you want (within reason).
+
+In the past the only real action was the policer; i.e. you could do something
+along the lines of:
+tc filter add dev lo parent ffff: protocol ip prio 10 u32 \
+match ip src 127.0.0.1/32 flowid 1:1 \
+police mtu 4000 rate 1500kbit burst 90k
+
+Although you can still use the same syntax, now you can say:
+
+tc filter add dev lo parent 1:0 protocol ip prio 10 u32 \
+match ip src 127.0.0.1/32 flowid 1:1 \
+action police mtu 4000 rate 1500kbit burst 90k
+
+"generic Actions" (gact) at the moment are:
+{ drop, pass, reclassify, continue}
+(If you have others not listed here, give me a reason and we will add them)
++drop says to drop the packet
++pass says to accept it
++reclassify requests for reclassification of the packet
++continue requests for next lookup to match
+
+2) In order to take advantage of some of the targets written by the
+iptables people, a classifier can have a packet being massaged by an
+iptables target. I have only tested with mangle targets up to now.
+(in fact, anything that is not in the mangle table is disabled right now)
+
+In terms of hooks:
+*ingress is mapped to pre-routing hook
+*egress is mapped to post-routing hook
+I don't see much value in the other hooks; if you do, email me good
+reasons, the addition is trivial.
+
+Example syntax for iptables targets usage becomes:
+tc filter add ..... u32 action ipt -j
+
+example:
+tc filter add dev lo parent ffff: protocol ip prio 8 u32 \
+match ip dst 127.0.0.8/32 flowid 1:12 \
+action ipt -j mark --set-mark 2
+
+3) A feature I call pipe
+The motivation is derived from the Unix pipe mechanism but applied to packets.
+Essentially take a matching packet and pass it through
+action1 | action2 | action3 etc.
+You could do something similar to this with the tc policer and the "continue"
+operator but this rather restricts it to just the policer and requires
+multiple rules (and lookups, hence quite inefficient);
+
+as an example -- and please note that this is just an example _not_ The
+Word You've Been Waiting For (yes, I have had problems giving examples
+which ended up becoming dogma in documents and people modifying them a little
+to look clever);
+
+I selected the metering rates to be small so that I can show better how
+things work.
+
+The script below does the following:
+- an incoming packet from 10.0.0.21 is first given a firewall mark of 1.
+
+- It is then metered to make sure it does not exceed its allocated rate of
+1Kbps. If it doesn't exceed rate, this is where we terminate action execution.
+
+- If it does exceed its rate, its "color" changes to a mark of 2 and it is
+then passed through a second meter.
+
+- The second meter is shared across all flows on that device [I am surprised
+that this seems to be not a well-known feature of the policer; Bert was telling
+me that someone was writing a qdisc just to do sharing across multiple devices;
+it must be the summer heat again; we've had someone doing that every year around
+summer -- the key to sharing is to use an operator "index" in your policer
+rules (example "index 20"). All your rules have to use the same index to
+share.]
+
+- If the second meter is exceeded the color of the flow changes further to 3.
+
+- We then pass the packet to another meter which is shared across all devices
+in the system. If this meter is exceeded we drop the packet.
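To make the pipe steps above concrete, here is a toy shell model of the three-meter chain. It is only a sketch, not how the kernel does it: the packet budgets (flow_left, dev_left, sys_left) and the run_chain function are made up for illustration, and a real policer is a token bucket that refills over time rather than a plain counter. Treating a conforming packet as "accept" is also an assumption of the model. It needs neither tc nor root.

```shell
#!/bin/sh
# Toy model of: mark 1 | meter | mark 2 | shared meter | mark 3 | shared meter.
# Budgets are hypothetical packet counts, not real rates.
flow_left=2   # per-flow meter ("police ... pipe")
dev_left=1    # meter shared by all flows on the device ("index 30")
sys_left=1    # meter shared by all devices in the system ("index 20")

# Run one packet through the chain; sets $mark and $verdict.
run_chain() {
    mark=1
    if [ "$flow_left" -gt 0 ]; then
        # within the flow's rate: action execution stops here
        flow_left=$((flow_left - 1)); verdict=accept; return 0
    fi
    mark=2    # exceeded: recolor and pipe to the next meter
    if [ "$dev_left" -gt 0 ]; then
        dev_left=$((dev_left - 1)); verdict=accept; return 0
    fi
    mark=3    # exceeded again: recolor and pipe to the last meter
    if [ "$sys_left" -gt 0 ]; then
        sys_left=$((sys_left - 1)); verdict=accept; return 0
    fi
    verdict=drop    # last meter exceeded: drop the packet
}

for n in 1 2 3 4 5; do
    run_chain
    echo "packet $n: mark=$mark verdict=$verdict"
done
```

With these budgets the first two packets leave with mark 1, the third with mark 2, the fourth with mark 3, and the fifth is dropped while carrying mark 3.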
+ +Note the mark can be used further up the system to do things like policy +or more interesting things on the egress. + +------------------ cut here ------------------------------- +# +# Add an ingress qdisc on eth0 +tc qdisc add dev eth0 ingress +# +#if you see an incoming packet from 10.0.0.21 +tc filter add dev eth0 parent ffff: protocol ip prio 1 \ +u32 match ip src 10.0.0.21/32 flowid 1:15 \ +# +# first give it a mark of 1 +action ipt -j mark --set-mark 1 index 2 \ +# +# then pass it through a policer which allows 1kbps; if the flow +# doesnt exceed that rate, this is where we stop, if it exceeds we +# pipe the packet to the next action +action police rate 1kbit burst 9k pipe \ +# +# which marks the packet fwmark as 2 and pipes +action ipt -j mark --set-mark 2 \ +# +# next attempt to borrow b/width from a meter +# used across all flows incoming on eth0("index 30") +# and if that is exceeded we pipe to the next action +action police index 30 mtu 5000 rate 1kbit burst 10k pipe \ +# mark it as fwmark 3 if exceeded +action ipt -j mark --set-mark 3 \ +# and then attempt to borrow from a meter used by all devices in the +# system. Should this be exceeded, drop the packet on the floor. 
+action police index 20 mtu 5000 rate 1kbit burst 90k drop +--------------------------------- + +Now lets see the actions installed with +"tc filter show parent ffff: dev eth0" + +-------- output ----------- +jroot# tc filter show parent ffff: dev eth0 +filter protocol ip pref 1 u32 +filter protocol ip pref 1 u32 fh 800: ht divisor 1 +filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15 + + action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING + target MARK set 0x1 index 2 + + action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb + + action order 3: tablename: mangle hook: NF_IP_PRE_ROUTING + target MARK set 0x2 index 1 + + action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b + + action order 5: tablename: mangle hook: NF_IP_PRE_ROUTING + target MARK set 0x3 index 3 + + action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b + + match 0a000015/ffffffff at 12 +------------------------------- + +Note the ordering of the actions is based on the order in which we entered +them. In the future i will add explicit priorities. + +Now lets run a ping -f from 10.0.0.21 to this host; stop the ping after +you see a few lines of dots + +---- +[root@jzny hadi]# ping -f 10.0.0.22 +PING 10.0.0.22 (10.0.0.22): 56 data bytes +.................................................................................................................................................................................................................................................................................................................................................................................................................................................... 
+--- 10.0.0.22 ping statistics ---
+2248 packets transmitted, 1811 packets received, 19% packet loss
+round-trip min/avg/max = 0.7/9.3/20.1 ms
+-----------------------------
+
+Now let's take a look at the stats with "tc -s filter show parent ffff: dev eth0"
+
+--------------
+jroot# tc -s filter show parent ffff: dev eth0
+filter protocol ip pref 1 u32
+filter protocol ip pref 1 u32 fh 800: ht divisor 1
+filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15
+
+   action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
+        target MARK set 0x1 index 2
+        Sent 188832 bytes 2248 pkts (dropped 0, overlimits 0)
+
+   action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb
+        Sent 188832 bytes 2248 pkts (dropped 0, overlimits 2122)
+
+   action order 3: tablename: mangle hook: NF_IP_PRE_ROUTING
+        target MARK set 0x2 index 1
+        Sent 178248 bytes 2122 pkts (dropped 0, overlimits 0)
+
+   action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b
+        Sent 178248 bytes 2122 pkts (dropped 0, overlimits 1945)
+
+   action order 5: tablename: mangle hook: NF_IP_PRE_ROUTING
+        target MARK set 0x3 index 3
+        Sent 163380 bytes 1945 pkts (dropped 0, overlimits 0)
+
+   action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b
+        Sent 163380 bytes 1945 pkts (dropped 0, overlimits 437)
+
+  match 0a000015/ffffffff at 12
+-------------------------------
+
+Neat, eh?
+
+
+Wanna write an action module?
+------------------------------
+It's easy. Either look at the code or send me email. I will document at
+some point; will also accept documentation.
+
+TODO
+----
+
+Lotsa goodies/features coming. Requests also being accepted.
+At the moment the focus has been on getting the architecture in place.
+Expect new things in the spare time I have to work on this
+(particularly around end of year when I typically get time off
+from work).
+