1 \documentstyle[12pt,twoside]{article}
2 \def\TITLE{IP Command Reference}
5 \Large\bf IP Command Reference.
10 { \large Alexey~N.~Kuznetsov } \\
11 \em Institute for Nuclear Research, Moscow \\
12 \verb|kuznet@ms2.inr.ac.ru| \\
22 \section{About this document}
24 This document presents a comprehensive description of the \verb|ip| utility
25 from the \verb|iproute2| package. It is not a tutorial or user's guide.
26 It is a {\em dictionary\/}, not explaining terms,
27 but translating them into other terms, which may also be unknown to the reader.
28 However, the document is self-contained and the reader, provided they have a
29 basic networking background, will find enough information
and examples to understand and configure Linux-2.2 IP and IPv6 networking.
This document is split into sections explaining \verb|ip| commands
and options, deciphering \verb|ip| output and giving a few examples.
35 More voluminous examples and some topics, which require more elaborate
36 discussion, are in the appendix.
38 The paragraphs beginning with NB contain side notes, warnings about
39 bugs and design drawbacks. They may be skipped at the first reading.
41 \section{{\tt ip} --- command syntax}
43 The generic form of an \verb|ip| command is:
\begin{verbatim}
ip [ OPTIONS ] OBJECT [ COMMAND [ ARGUMENTS ]]
\end{verbatim}
47 where \verb|OPTIONS| is a set of optional modifiers affecting the
48 general behaviour of the \verb|ip| utility or changing its output. All options
49 begin with the character \verb|'-'| and may be used in either long or abbreviated
50 forms. Currently, the following options are available:
53 \item \verb|-V|, \verb|-Version|
55 --- print the version of the \verb|ip| utility and exit.
58 \item \verb|-s|, \verb|-stats|, \verb|-statistics|
60 --- output more information. If the option
61 appears twice or more, the amount of information increases.
62 As a rule, the information is statistics or some time values.
65 \item \verb|-f|, \verb|-family| followed by a protocol family
66 identifier: \verb|inet|, \verb|inet6| or \verb|link|.
68 --- enforce the protocol family to use. If the option is not present,
69 the protocol family is guessed from other arguments. If the rest of the command
70 line does not give enough information to guess the family, \verb|ip| falls back to the default
71 one, usually \verb|inet| or \verb|any|. \verb|link| is a special family
72 identifier meaning that no networking protocol is involved.
76 --- shortcut for \verb|-family inet|.
80 --- shortcut for \verb|-family inet6|.
84 --- shortcut for \verb|-family link|.
87 \item \verb|-o|, \verb|-oneline|
89 --- output each record on a single line, replacing line feeds
90 with the \verb|'\'| character. This is convenient when you want to
91 count records with \verb|wc| or to \verb|grep| the output. The trivial
92 script \verb|rtpr| converts the output back into readable form.
94 \item \verb|-r|, \verb|-resolve|
--- use the system's name resolver to print DNS names instead of
host addresses.
100 Do not use this option when reporting bugs or asking for advice.
103 \verb|ip| never uses DNS to resolve names to addresses.
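
The conversion of \verb|-oneline| output back into readable form is simple
enough to sketch in Python. The fragment below only illustrates the record
format described above and is not the \verb|rtpr| script itself; the function
name \verb|unfold| and the sample record are assumptions made for this example.
\begin{verbatim}
def unfold(record):
    """Split one -oneline record back into its original lines."""
    # ip replaces each line feed inside a record with a backslash.
    return [part.rstrip() for part in record.split("\\")]

sample = ("3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100\\"
          "    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff")

for line in unfold(sample):
    print(line)
\end{verbatim}
Because every record occupies exactly one line, counting the records needs
nothing more than counting the lines of the output.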
108 \verb|OBJECT| is the object to manage or to get information about.
109 The object types currently understood by \verb|ip| are:
\begin{itemize}
\item \verb|link| --- network device
\item \verb|address| --- protocol (IP or IPv6) address on a device
\item \verb|neighbour| --- ARP or NDISC cache entry
\item \verb|route| --- routing table entry
\item \verb|rule| --- rule in routing policy database
\item \verb|maddress| --- multicast address
\item \verb|mroute| --- multicast routing cache entry
\item \verb|tunnel| --- tunnel over IP
\end{itemize}
122 Again, the names of all objects may be written in full or
123 abbreviated form, f.e.\ \verb|address| is abbreviated as \verb|addr|
126 \verb|COMMAND| specifies the action to perform on the object.
127 The set of possible actions depends on the object type.
128 As a rule, it is possible to \verb|add|, \verb|delete| and
129 \verb|show| (or \verb|list|) objects, but some objects
130 do not allow all of these operations or have some additional commands.
131 The \verb|help| command is available for all objects. It prints
132 out a list of available commands and argument syntax conventions.
134 If no command is given, some default command is assumed.
135 Usually it is \verb|list| or, if the objects of this class
136 cannot be listed, \verb|help|.
138 \verb|ARGUMENTS| is a list of arguments to the command.
139 The arguments depend on the command and object. There are two types of arguments:
140 {\em flags\/}, consisting of a single keyword, and {\em parameters\/},
141 consisting of a keyword followed by a value. For convenience,
142 each command has some {\em default parameter\/}
143 which may be omitted. F.e.\ parameter \verb|dev| is the default
144 for the {\tt ip link} command, so {\tt ip link ls eth0} is equivalent
145 to {\tt ip link ls dev eth0}.
146 In the command descriptions below such parameters
147 are distinguished with the marker: ``(default)''.
Almost all keywords may be abbreviated to their first few letters (or even
a single letter). The shortcuts are convenient when \verb|ip| is used interactively,
151 but they are not recommended in scripts or when reporting bugs
152 or asking for advice. ``Officially'' allowed abbreviations are listed
153 in the document body.
157 \section{{\tt ip} --- error messages}
159 \verb|ip| may fail for one of the following reasons:
163 A syntax error on the command line: an unknown keyword, incorrectly formatted
164 IP address {\em et al\/}. In this case \verb|ip| prints an error message
165 and exits. As a rule, the error message will contain information
166 about the reason for the failure. Sometimes it also prints a help page.
169 The arguments did not pass verification for self-consistency.
172 \verb|ip| failed to compile a kernel request from the arguments
173 because the user didn't give enough information.
176 The kernel returned an error to some syscall. In this case \verb|ip|
177 prints the error message, as it is output with \verb|perror(3)|,
178 prefixed with a comment and a syscall identifier.
181 The kernel returned an error to some RTNETLINK request.
182 In this case \verb|ip| prints the error message, as it is output
183 with \verb|perror(3)| prefixed with ``RTNETLINK answers:''.
187 All the operations are atomic, i.e.\
188 if the \verb|ip| utility fails, it does not change anything
in the system. One harmful exception is the \verb|ip link| command
(Sec.\ref{IP-LINK}, p.\pageref{IP-LINK}),
which may change only some of the device parameters given
on the command line.
194 It is difficult to list all the error messages (especially
195 syntax errors). However, as a rule, their meaning is clear
196 from the context of the command.
198 The most common mistakes are:
\begin{itemize}
\item Netlink is not configured in the kernel. The message is:
\begin{verbatim}
Cannot open netlink socket: Invalid value
\end{verbatim}

\item RTNETLINK is not configured in the kernel. In this case
one of the following messages may be printed, depending on the command:
\begin{verbatim}
Cannot talk to rtnetlink: Connection refused
Cannot send dump request: Connection refused
\end{verbatim}

\item The \verb|CONFIG_IP_MULTIPLE_TABLES| option was not selected
when configuring the kernel. In this case any attempt to use the
\verb|ip| \verb|rule| command will fail, f.e.
\begin{verbatim}
kuznet@kaiser $ ip rule list
RTNETLINK error: Invalid argument
\end{verbatim}
\end{itemize}
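
A script that wraps \verb|ip| may want to tell these cases apart. The Python
sketch below matches only the message prefixes quoted in this section; the
function name \verb|classify| and the wording of the returned strings are
assumptions of this example, and the matching is a heuristic rather than
a complete catalogue of \verb|ip| diagnostics.
\begin{verbatim}
def classify(stderr_text):
    """Map an ip error message to a probable cause (heuristic)."""
    text = stderr_text.strip()
    first = text.splitlines()[0] if text else ""
    if first.startswith("Cannot open netlink socket"):
        return "netlink is not configured in the kernel"
    if first.startswith(("Cannot talk to rtnetlink",
                         "Cannot send dump request")):
        return "rtnetlink is not configured in the kernel"
    if first.startswith("RTNETLINK"):
        return "the kernel rejected an rtnetlink request"
    return "other (syntax error or failed syscall)"

print(classify("RTNETLINK error: Invalid argument"))
\end{verbatim}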
225 \section{{\tt ip link} --- network device configuration}
228 \paragraph{Object:} A \verb|link| is a network device and the corresponding
229 commands display and change the state of devices.
231 \paragraph{Commands:} \verb|set| and \verb|show| (or \verb|list|).
233 \subsection{{\tt ip link set} --- change device attributes}
235 \paragraph{Abbreviations:} \verb|set|, \verb|s|.
237 \paragraph{Arguments:}
240 \item \verb|dev NAME| (default)
242 --- \verb|NAME| specifies the network device on which to operate.
244 \item \verb|up| and \verb|down|
246 --- change the state of the device to \verb|UP| or \verb|DOWN|.
248 \item \verb|arp on| or \verb|arp off|
250 --- change the \verb|NOARP| flag on the device.
This operation is {\em not allowed\/} if the device is in state \verb|UP|.
However, neither the \verb|ip| utility nor the kernel checks for this condition,
and changing this flag while the device is running may give unpredictable results.
259 \item \verb|multicast on| or \verb|multicast off|
261 --- change the \verb|MULTICAST| flag on the device.
263 \item \verb|dynamic on| or \verb|dynamic off|
265 --- change the \verb|DYNAMIC| flag on the device.
267 \item \verb|name NAME|
--- change the name of the device. This operation is not
recommended if the device is running or has some addresses
already configured.
273 \item \verb|txqueuelen NUMBER| or \verb|txqlen NUMBER|
275 --- change the transmit queue length of the device.
277 \item \verb|mtu NUMBER|
279 --- change the MTU of the device.
281 \item \verb|address LLADDRESS|
283 --- change the station address of the interface.
285 \item \verb|broadcast LLADDRESS|, \verb|brd LLADDRESS| or \verb|peer LLADDRESS|
287 --- change the link layer broadcast address or the peer address when
288 the interface is \verb|POINTOPOINT|.
292 For most devices (f.e.\ for Ethernet) changing the link layer
293 broadcast address will break networking.
Do not use it unless you understand what this operation really does.
297 \item \verb|netns PID|
299 --- move the device to the network namespace associated with the process PID.
305 The \verb|PROMISC| and \verb|ALLMULTI| flags are considered
306 obsolete and should not be changed administratively, though
307 the {\tt ip} utility will allow that.
310 \paragraph{Warning:} If multiple parameter changes are requested,
\verb|ip| aborts immediately after any of the changes has failed.
312 This is the only case when \verb|ip| can move the system to
313 an unpredictable state. The solution is to avoid changing
314 several parameters with one {\tt ip link set} call.
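
The advice above can be followed mechanically. The Python sketch below splits
a list of requested changes into one \verb|ip link set| command per change, so
that a failure leaves at most one parameter in doubt. The helper name
\verb|split_link_set| is hypothetical, and the sketch assumes that every
keyword except \verb|up| and \verb|down| takes exactly one value, which holds
for the arguments listed in this section.
\begin{verbatim}
FLAGS = {"up", "down"}  # keywords that take no value

def split_link_set(dev, args):
    """Return one 'ip link set' argument vector per requested change."""
    commands, i = [], 0
    while i < len(args):
        n = 1 if args[i] in FLAGS else 2   # flag, or keyword + value
        commands.append(["ip", "link", "set", "dev", dev] + args[i:i + n])
        i += n
    return commands

changes = ["mtu", "1400", "txqueuelen", "50", "up"]
for argv in split_link_set("dummy", changes):
    print(" ".join(argv))
\end{verbatim}
Each resulting command can then be run and checked separately, stopping at
the first one that fails.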
316 \paragraph{Examples:}
318 \item \verb|ip link set dummy address 00:00:00:00:00:01|
320 --- change the station address of the interface \verb|dummy|.
322 \item \verb|ip link set dummy up|
324 --- start the interface \verb|dummy|.
329 \subsection{{\tt ip link show} --- display device attributes}
332 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|lst|, \verb|sh|, \verb|ls|,
335 \paragraph{Arguments:}
337 \item \verb|dev NAME| (default)
339 --- \verb|NAME| specifies the network device to show.
340 If this argument is omitted all devices are listed.
344 --- only display running interfaces.
349 \paragraph{Output format:}
\begin{verbatim}
kuznet@alisa:~ $ ip link ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
kuznet@alisa:~ $ ip link ls sit0
5: sit0@NONE: <NOARP,UP> mtu 1480 qdisc noqueue
    link/sit 0.0.0.0 brd 0.0.0.0
kuznet@alisa:~ $ ip link ls dummy
2: dummy: <BROADCAST,NOARP> mtu 1500 qdisc noop
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
\end{verbatim}
365 The number before each colon is an {\em interface index\/} or {\em ifindex\/}.
366 This number uniquely identifies the interface. This is followed by the {\em interface name\/}
367 (\verb|eth0|, \verb|sit0| etc.). The interface name is also
368 unique at every given moment. However, the interface may disappear from the
369 list (f.e.\ when the corresponding driver module is unloaded) and another
370 one with the same name may be created later. Besides that,
371 the administrator may change the name of any device with
372 \verb|ip| \verb|link| \verb|set| \verb|name|
373 to make it more intelligible.
375 The interface name may have another name or \verb|NONE| appended
after the \verb|@| sign. This means that this device is bound to some other
device, i.e.\ packets sent through it are encapsulated and sent via the ``master''
379 device. If the name is \verb|NONE|, the master is unknown.
Then we see the interface {\em mtu\/} (``maximum transmission unit''). This determines
the maximal size of data which can be sent as a single packet over this interface.
384 {\em qdisc\/} (``queuing discipline'') shows the queuing algorithm used
385 on the interface. Particularly, \verb|noqueue| means that this interface
386 does not queue anything and \verb|noop| means that the interface is in blackhole
387 mode i.e.\ all packets sent to it are immediately discarded.
{\em qlen\/} is the default transmit queue length of the device measured
in packets.
391 The interface flags are summarized in the angle brackets.
394 \item \verb|UP| --- the device is turned on. It is ready to accept
395 packets for transmission and it may inject into the kernel packets received
396 from other nodes on the network.
398 \item \verb|LOOPBACK| --- the interface does not communicate with other
399 hosts. All packets sent through it will be returned
400 and nothing but bounced packets can be received.
402 \item \verb|BROADCAST| --- the device has the facility to send packets
403 to all hosts sharing the same link. A typical example is an Ethernet link.
405 \item \verb|POINTOPOINT| --- the link has only two ends with one node
406 attached to each end. All packets sent to this link will reach the peer
407 and all packets received by us came from this single peer.
If neither \verb|LOOPBACK| nor \verb|BROADCAST| nor \verb|POINTOPOINT|
is set, the interface is assumed to be NBMA (Non-Broadcast Multi-Access).
This is the most generic type of device and the most complicated one, because
the host attached to an NBMA link has no means to send to anyone
413 without additionally configured information.
415 \item \verb|MULTICAST| --- is an advisory flag indicating that the interface
416 is aware of multicasting i.e.\ sending packets to some subset of neighbouring
417 nodes. Broadcasting is a particular case of multicasting, where the multicast
418 group consists of all nodes on the link. It is important to emphasize
419 that software {\em must not\/} interpret the absence of this flag as the inability
420 to use multicasting on this interface. Any \verb|POINTOPOINT| and
421 \verb|BROADCAST| link is multicasting by definition, because we have
422 direct access to all the neighbours and, hence, to any part of them.
423 Certainly, the use of high bandwidth multicast transfers is not recommended
on broadcast-only links because of high expense, but it is not strictly
prohibited.
\item \verb|PROMISC| --- the device listens to and feeds to the kernel all
traffic on the link even if it is not destined for us, not broadcast
and not destined for a multicast group of which we are a member. Usually
this mode exists only on broadcast links and is used by bridges and for network
monitoring.
433 \item \verb|ALLMULTI| --- the device receives all multicast packets
434 wandering on the link. This mode is used by multicast routers.
\item \verb|NOARP| --- this flag is different from the other ones. It has
no invariant meaning and its interpretation depends on the network protocols
438 involved. As a rule, it indicates that the device needs no address
439 resolution and that the software or hardware knows how to deliver packets
440 without any help from the protocol stacks.
442 \item \verb|DYNAMIC| --- is an advisory flag indicating that the interface is
443 dynamically created and destroyed.
445 \item \verb|SLAVE| --- this interface is bonded to some other interfaces
446 to share link capacities.
There are other flags but they are either obsolete (\verb|NOTRAILERS|)
or not implemented (\verb|DEBUG|) or specific to some devices
(\verb|MASTER|, \verb|AUTOMEDIA| and \verb|PORTSEL|). We do not discuss
them here.
459 The second line contains information on the link layer addresses
460 associated with the device. The first word (\verb|ether|, \verb|sit|)
461 defines the interface hardware type. This type determines the format and semantics
462 of the addresses and is logically part of the address.
463 The default format of the station address and the broadcast address
464 (or the peer address for pointopoint links) is a
465 sequence of hexadecimal bytes separated by colons, but some link
466 types may have their natural address format, f.e.\ addresses
467 of tunnels over IP are printed as dotted-quad IP addresses.
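
The two output lines described above are regular enough to be parsed by
a script. The following Python sketch is written against the sample output
shown in this section only; the function name \verb|parse_link| and the keys of
the returned dictionary are assumptions of this example, and other versions of
\verb|ip| may print additional fields.
\begin{verbatim}
import re

HEAD = re.compile(r"^(\d+): ([^:@]+)(?:@([^:]+))?: <([^>]*)>\s*(.*)$")

def parse_link(first, second):
    """Parse the two lines printed by 'ip link ls' for one device."""
    m = HEAD.match(first.strip())
    if m is None:
        raise ValueError("unrecognized line: %r" % first)
    ifindex, name, master, flags, rest = m.groups()
    words = rest.split()
    attrs = dict(zip(words[0::2], words[1::2]))   # mtu, qdisc, qlen
    ltype, addr = second.split()[:2]              # e.g. link/ether, address
    return {"ifindex": int(ifindex), "name": name, "master": master,
            "flags": flags.split(",") if flags else [],
            "mtu": int(attrs["mtu"]), "qdisc": attrs.get("qdisc"),
            "type": ltype.split("/", 1)[1], "address": addr}

info = parse_link("5: sit0@NONE: <NOARP,UP> mtu 1480 qdisc noqueue",
                  "    link/sit 0.0.0.0 brd 0.0.0.0")
print(info["name"], info["master"], info["flags"], info["type"])
\end{verbatim}
The \verb|master| key is \verb|None| when the name carries no \verb|@| suffix.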
471 NBMA links have no well-defined broadcast or peer address,
472 however this field may contain useful information, f.e.\
473 about the address of broadcast relay or about the address of the ARP server.
476 Multicast addresses are not shown by this command, see
\verb|ip maddr ls| in~Sec.\ref{IP-MADDR} (p.\pageref{IP-MADDR} of this
document).
482 \paragraph{Statistics:} With the \verb|-statistics| option, \verb|ip| also
483 prints interface statistics:
\begin{verbatim}
kuznet@alisa:~ $ ip -s link ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    2449949362 2786187  0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    178558497  1783945  332     0       332     35172
\end{verbatim}
495 \verb|RX:| and \verb|TX:| lines summarize receiver and transmitter
496 statistics. They contain:
\begin{itemize}
\item \verb|bytes| --- the total number of bytes received or transmitted
on the interface. This number wraps when the maximal length of the data type
natural for the architecture is exceeded, so continuous monitoring requires
a user level daemon sampling it periodically.
\item \verb|packets| --- the total number of packets received or transmitted
on the interface.
\item \verb|errors| --- the total number of receiver or transmitter errors.
\item \verb|dropped| --- the total number of packets dropped due to lack
of resources.
\item \verb|overrun| --- the total number of receiver overruns resulting
in dropped packets. As a rule, if the interface is overrun, it means
serious problems in the kernel or that your machine is too slow
for this interface.
\item \verb|mcast| --- the total number of received multicast packets. This counter
is only supported by a few devices.
\item \verb|carrier| --- the total number of link media failures, f.e.\ because
of lost carrier.
\item \verb|collsns| --- the total number of collision events
on Ethernet-like media. This number may have a different sense on other
link types.
\item \verb|compressed| --- the total number of compressed packets. This is
available only for links using VJ header compression.
\end{itemize}
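
Since the counters wrap, a monitoring program should compare successive
samples modulo the counter width. The Python sketch below shows the
arithmetic; the name \verb|counter_delta| and the default width of 32 bits are
assumptions of this example, and the result is correct only if the counter
wrapped at most once between the two samples.
\begin{verbatim}
def counter_delta(previous, current, bits=32):
    """Difference between two samples of a counter wrapping at 2**bits."""
    return (current - previous) % (1 << bits)

# An ordinary interval and one in which the counter wrapped around.
print(counter_delta(100, 250))
print(counter_delta(4294967290, 10))
\end{verbatim}
The sampling period has to be short enough that the interface cannot transfer
a full counter range between two samples.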
523 If the \verb|-s| option is entered twice or more,
524 \verb|ip| prints more detailed statistics on receiver
525 and transmitter errors.
\begin{verbatim}
kuznet@alisa:~ $ ip -s -s link ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    2449949362 2786187  0       0       0       0
    RX errors: length   crc     frame   fifo    missed
    TX: bytes  packets  errors  dropped carrier collsns
    178558497  1783945  332     0       332     35172
    TX errors: aborted  fifo    window  heartbeat
\end{verbatim}
541 These error names are pure Ethernetisms. Other devices
542 may have non zero values in these fields but they may be
543 interpreted differently.
546 \section{{\tt ip address} --- protocol address management}
548 \paragraph{Abbreviations:} \verb|address|, \verb|addr|, \verb|a|.
550 \paragraph{Object:} The \verb|address| is a protocol (IP or IPv6) address attached
551 to a network device. Each device must have at least one address
552 to use the corresponding protocol. It is possible to have several
553 different addresses attached to one device. These addresses are not
554 discriminated, so that the term {\em alias\/} is not quite appropriate
555 for them and we do not use it in this document.
557 The \verb|ip addr| command displays addresses and their properties,
558 adds new addresses and deletes old ones.
560 \paragraph{Commands:} \verb|add|, \verb|delete|, \verb|flush| and \verb|show|
564 \subsection{{\tt ip address add} --- add a new protocol address}
567 \paragraph{Abbreviations:} \verb|add|, \verb|a|.
569 \paragraph{Arguments:}
572 \item \verb|dev NAME|
574 \noindent--- the name of the device to add the address to.
576 \item \verb|local ADDRESS| (default)
578 --- the address of the interface. The format of the address depends
579 on the protocol. It is a dotted quad for IP and a sequence of hexadecimal halfwords
580 separated by colons for IPv6. The \verb|ADDRESS| may be followed by
581 a slash and a decimal number which encodes the network prefix length.
584 \item \verb|peer ADDRESS|
586 --- the address of the remote endpoint for pointopoint interfaces.
587 Again, the \verb|ADDRESS| may be followed by a slash and a decimal number,
588 encoding the network prefix length. If a peer address is specified,
589 the local address {\em cannot\/} have a prefix length. The network prefix is associated
590 with the peer rather than with the local address.
593 \item \verb|broadcast ADDRESS|
595 --- the broadcast address on the interface.
597 It is possible to use the special symbols \verb|'+'| and \verb|'-'|
598 instead of the broadcast address. In this case, the broadcast address
599 is derived by setting/resetting the host bits of the interface prefix.
603 Unlike \verb|ifconfig|, the \verb|ip| utility {\em does not\/} set any broadcast
604 address unless explicitly requested.
608 \item \verb|label NAME|
610 --- Each address may be tagged with a label string.
611 In order to preserve compatibility with Linux-2.0 net aliases,
612 this string must coincide with the name of the device or must be prefixed
with the device name followed by a colon.
616 \item \verb|scope SCOPE_VALUE|
618 --- the scope of the area where this address is valid.
619 The available scopes are listed in file \verb|/etc/iproute2/rt_scopes|.
620 Predefined scope values are:
623 \item \verb|global| --- the address is globally valid.
624 \item \verb|site| --- (IPv6 only) the address is site local,
625 i.e.\ it is valid inside this site.
626 \item \verb|link| --- the address is link local, i.e.\
627 it is valid only on this device.
628 \item \verb|host| --- the address is valid only inside this host.
631 Appendix~\ref{ADDR-SEL} (p.\pageref{ADDR-SEL} of this document)
632 contains more details on address scopes.
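
The rule for the \verb|label| argument given above is easy to check before
calling \verb|ip|. The Python helper below is a minimal sketch of that rule
alone; the name \verb|label_is_valid| is hypothetical and the helper does not
check any length limit that the kernel may impose on labels.
\begin{verbatim}
def label_is_valid(dev, label):
    """True if the label equals the device name or starts with 'dev:'."""
    return label == dev or label.startswith(dev + ":")

print(label_is_valid("eth0", "eth0:Alias"))
print(label_is_valid("eth0", "Alias"))
\end{verbatim}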
636 \paragraph{Examples:}
638 \item \verb|ip addr add 127.0.0.1/8 dev lo brd + scope host|
640 --- add the usual loopback address to the loopback device.
642 \item \verb|ip addr add 10.0.0.1/24 brd + dev eth0 label eth0:Alias|
644 --- add the address 10.0.0.1 with prefix length 24 (i.e.\ netmask
645 \verb|255.255.255.0|), standard broadcast and label \verb|eth0:Alias|
646 to the interface \verb|eth0|.
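
The broadcast addresses implied by the symbols \verb|'+'| and \verb|'-'| in
these examples can be reproduced with the standard Python module
\verb|ipaddress|. This is only a sketch of the arithmetic (host bits of the
prefix set or reset), not of the \verb|ip| source code; the function name
\verb|derive_broadcast| is an assumption of this example.
\begin{verbatim}
import ipaddress

def derive_broadcast(address, symbol):
    """Broadcast implied by '+' (host bits set) or '-' (host bits reset)."""
    net = ipaddress.ip_interface(address).network
    if symbol == "+":
        return str(net.broadcast_address)
    if symbol == "-":
        return str(net.network_address)
    raise ValueError("symbol must be '+' or '-'")

print(derive_broadcast("10.0.0.1/24", "+"))   # 10.0.0.255
print(derive_broadcast("10.0.0.1/24", "-"))   # 10.0.0.0
\end{verbatim}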
650 \subsection{{\tt ip address delete} --- delete a protocol address}
652 \paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.
654 \paragraph{Arguments:} coincide with the arguments of \verb|ip addr add|.
655 The device name is a required argument. The rest are optional.
656 If no arguments are given, the first address is deleted.
658 \paragraph{Examples:}
660 \item \verb|ip addr del 127.0.0.1/8 dev lo|
662 --- deletes the loopback address from the loopback device.
663 It would be best not to repeat this experiment.
665 \item Disable IP on the interface \verb|eth0|:
\begin{verbatim}
while ip -f inet addr del dev eth0; do :; done
\end{verbatim}
671 Another method to disable IP on an interface using {\tt ip addr flush}
672 may be found in sec.\ref{IP-ADDR-FLUSH}, p.\pageref{IP-ADDR-FLUSH}.
677 \subsection{{\tt ip address show} --- display protocol addresses}
679 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|lst|, \verb|sh|, \verb|ls|,
682 \paragraph{Arguments:}
685 \item \verb|dev NAME| (default)
687 --- the name of the device.
689 \item \verb|scope SCOPE_VAL|
691 --- only list addresses with this scope.
693 \item \verb|to PREFIX|
695 --- only list addresses matching this prefix.
697 \item \verb|label PATTERN|
699 --- only list addresses with labels matching the \verb|PATTERN|.
700 \verb|PATTERN| is a usual shell style pattern.
703 \item \verb|dynamic| and \verb|permanent|
705 --- (IPv6 only) only list addresses installed due to stateless
706 address configuration or only list permanent (not dynamic) addresses.
708 \item \verb|tentative|
--- (IPv6 only) only list addresses which did not pass duplicate
address detection.
713 \item \verb|deprecated|
715 --- (IPv6 only) only list deprecated addresses.
718 \item \verb|primary| and \verb|secondary|
720 --- only list primary (or secondary) addresses.
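
The effect of the \verb|to| and \verb|label| selectors can be modelled in
a few lines of Python. The sketch below works on a hand-written list of
(address, label) pairs and does not call \verb|ip|; the function name
\verb|select| and the sample data are assumptions of this example. Note that
Python's \verb|ipaddress| module requires the prefix to be written in full
(\verb|10.0.0.0/8|).
\begin{verbatim}
import fnmatch
import ipaddress

def select(addresses, to=None, label=None):
    """Filter (address, label) pairs by prefix and shell-style pattern."""
    chosen = []
    for addr, lab in addresses:
        ip = ipaddress.ip_interface(addr).ip
        if to is not None and ip not in ipaddress.ip_network(to):
            continue
        if label is not None and not fnmatch.fnmatchcase(lab, label):
            continue
        chosen.append((addr, lab))
    return chosen

addrs = [("10.7.7.7/16", "dummy"), ("10.10.7.7/16", "eth0"),
         ("193.233.7.90/24", "eth1")]
print(select(addrs, to="10.0.0.0/8"))
print(select(addrs, label="eth*"))
\end{verbatim}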
725 \paragraph{Output format:}
\begin{verbatim}
kuznet@alisa:~ $ ip addr ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    inet 193.233.7.90/24 brd 193.233.7.255 scope global eth0
    inet6 3ffe:2400:0:1:2a0:ccff:fe66:1878/64 scope global dynamic
       valid_lft forever preferred_lft 604746sec
    inet6 fe80::2a0:ccff:fe66:1878/10 scope link
\end{verbatim}
738 The first two lines coincide with the output of \verb|ip link ls|.
739 It is natural to interpret link layer addresses
740 as addresses of the protocol family \verb|AF_PACKET|.
742 Then the list of IP and IPv6 addresses follows, accompanied by
743 additional address attributes: scope value (see Sec.\ref{IP-ADDR-ADD},
744 p.\pageref{IP-ADDR-ADD} above), flags and the address label.
746 Address flags are set by the kernel and cannot be changed
747 administratively. Currently, the following flags are defined:
750 \item \verb|secondary|
752 --- the address is not used when selecting the default source address
753 of outgoing packets (Cf.\ Appendix~\ref{ADDR-SEL}, p.\pageref{ADDR-SEL}.).
754 An IP address becomes secondary if another address with the same
755 prefix bits already exists. The first address is primary.
756 It is the leader of the group of all secondary addresses. When the leader
757 is deleted, all secondaries are purged too.
The tweak \verb|/proc/sys/net/ipv4/conf/<dev>/promote_secondaries|
activates promotion of secondaries when a primary is deleted.
To permanently enable this feature on all devices add
\verb|net.ipv4.conf.all.promote_secondaries=1| to \verb|/etc/sysctl.conf|.
This tweak is available in Linux 2.6.15 and later.
767 --- the address was created due to stateless autoconfiguration~\cite{RFC-ADDRCONF}.
In this case the output also shows how long
the address remains valid. After \verb|preferred_lft| expires the address is
770 moved to the deprecated state. After \verb|valid_lft| expires the address
771 is finally invalidated.
773 \item \verb|deprecated|
775 --- the address is deprecated, i.e.\ it is still valid, but cannot
776 be used by newly created connections.
778 \item \verb|tentative|
780 --- the address is not used because duplicate address detection~\cite{RFC-ADDRCONF}
781 is still not complete or failed.
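
The rule that makes an address \verb|secondary| can be illustrated with
a short Python sketch. It models only the statement above (the first address
with a given prefix is primary, later ones with the same prefix are secondary)
for the addresses of a single device; the function name \verb|mark_secondary|
is an assumption of this example and the code does not query the kernel.
\begin{verbatim}
import ipaddress

def mark_secondary(addresses):
    """Label each address, in configuration order, primary or secondary."""
    seen, result = set(), []
    for addr in addresses:
        net = ipaddress.ip_interface(addr).network
        result.append((addr, "secondary" if net in seen else "primary"))
        seen.add(net)
    return result

for addr, kind in mark_secondary(["10.0.0.1/24", "10.0.0.2/24",
                                  "10.0.1.1/24"]):
    print(addr, kind)
\end{verbatim}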
786 \subsection{{\tt ip address flush} --- flush protocol addresses}
787 \label{IP-ADDR-FLUSH}
789 \paragraph{Abbreviations:} \verb|flush|, \verb|f|.
791 \paragraph{Description:}This command flushes the protocol addresses
792 selected by some criteria.
794 \paragraph{Arguments:} This command has the same arguments as \verb|show|.
795 The difference is that it does not run when no arguments are given.
797 \paragraph{Warning:} This command (and other \verb|flush| commands
798 described below) is pretty dangerous. If you make a mistake, it will
799 not forgive it, but will cruelly purge all the addresses.
801 \paragraph{Statistics:} With the \verb|-statistics| option, the command
802 becomes verbose. It prints out the number of deleted addresses and the number
803 of rounds made to flush the address list. If this option is given
804 twice, \verb|ip addr flush| also dumps all the deleted addresses
805 in the format described in the previous subsection.
807 \paragraph{Example:} Delete all the addresses from the private network
\begin{verbatim}
netadm@amber:~ # ip -s -s a f to 10/8
2: dummy    inet 10.7.7.7/16 brd 10.7.255.255 scope global dummy
3: eth0    inet 10.10.7.7/16 brd 10.10.255.255 scope global eth0
4: eth1    inet 10.8.7.7/16 brd 10.8.255.255 scope global eth1

*** Round 1, deleting 3 addresses ***
*** Flush is complete after 1 round ***
\end{verbatim}
819 Another instructive example is disabling IP on all the Ethernets:
\begin{verbatim}
netadm@amber:~ # ip -4 addr flush label "eth*"
\end{verbatim}
823 And the last example shows how to flush all the IPv6 addresses
824 acquired by the host from stateless address autoconfiguration
825 after you enabled forwarding or disabled autoconfiguration.
\begin{verbatim}
netadm@amber:~ # ip -6 addr flush dynamic
\end{verbatim}
832 \section{{\tt ip neighbour} --- neighbour/arp tables management}
834 \paragraph{Abbreviations:} \verb|neighbour|, \verb|neighbor|, \verb|neigh|,
837 \paragraph{Object:} \verb|neighbour| objects establish bindings between protocol
838 addresses and link layer addresses for hosts sharing the same link.
839 Neighbour entries are organized into tables. The IPv4 neighbour table
840 is known by another name --- the ARP table.
842 The corresponding commands display neighbour bindings
843 and their properties, add new neighbour entries and delete old ones.
845 \paragraph{Commands:} \verb|add|, \verb|change|, \verb|replace|,
846 \verb|delete|, \verb|flush| and \verb|show| (or \verb|list|).
848 \paragraph{See also:} Appendix~\ref{PROXY-NEIGH}, p.\pageref{PROXY-NEIGH}
849 describes how to manage proxy ARP/NDISC with the \verb|ip| utility.
852 \subsection{{\tt ip neighbour add} --- add a new neighbour entry\\
853 {\tt ip neighbour change} --- change an existing entry\\
854 {\tt ip neighbour replace} --- add a new entry or change an existing one}
856 \paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
857 \verb|replace|, \verb|repl|.
859 \paragraph{Description:} These commands create new neighbour records
860 or update existing ones.
862 \paragraph{Arguments:}
865 \item \verb|to ADDRESS| (default)
867 --- the protocol address of the neighbour. It is either an IPv4 or IPv6 address.
869 \item \verb|dev NAME|
871 --- the interface to which this neighbour is attached.
874 \item \verb|lladdr LLADDRESS|
876 --- the link layer address of the neighbour. \verb|LLADDRESS| can also be
879 \item \verb|nud NUD_STATE|
881 --- the state of the neighbour entry. \verb|nud| is an abbreviation for ``Neighbour
882 Unreachability Detection''. The state can take one of the following values:
\item \verb|permanent| --- the neighbour entry is valid forever and can only be removed
administratively.
887 \item \verb|noarp| --- the neighbour entry is valid. No attempts to validate
888 this entry will be made but it can be removed when its lifetime expires.
\item \verb|reachable| --- the neighbour entry is valid until the reachability
timeout expires.
891 \item \verb|stale| --- the neighbour entry is valid but suspicious.
892 This option to \verb|ip neigh| does not change the neighbour state if
893 it was valid and the address is not changed by this command.
898 \paragraph{Examples:}
900 \item \verb|ip neigh add 10.0.0.3 lladdr 0:0:0:0:0:1 dev eth0 nud perm|
902 --- add a permanent ARP entry for the neighbour 10.0.0.3 on the device \verb|eth0|.
904 \item \verb|ip neigh chg 10.0.0.3 dev eth0 nud reachable|
906 --- change its state to \verb|reachable|.
910 \subsection{{\tt ip neighbour delete} --- delete a neighbour entry}
912 \paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.
914 \paragraph{Description:} This command invalidates a neighbour entry.
916 \paragraph{Arguments:} The arguments are the same as with \verb|ip neigh add|,
917 except that \verb|lladdr| and \verb|nud| are ignored.
922 \item \verb|ip neigh del 10.0.0.3 dev eth0|
924 --- invalidate an ARP entry for the neighbour 10.0.0.3 on the device \verb|eth0|.
929 The deleted neighbour entry will not disappear from the tables
930 immediately. If it is in use it cannot be deleted until the last
931 client releases it. Otherwise it will be destroyed during
932 the next garbage collection.
936 \paragraph{Warning:} Attempts to delete or manually change
937 a \verb|noarp| entry created by the kernel may result in unpredictable behaviour.
938 In particular, the kernel may try to resolve this address even
939 on a \verb|NOARP| interface or if the address is multicast or broadcast.
942 \subsection{{\tt ip neighbour show} --- list neighbour entries}
944 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|.
946 \paragraph{Description:} This command displays neighbour tables.
948 \paragraph{Arguments:}
952 \item \verb|to ADDRESS| (default)
954 --- the prefix selecting the neighbours to list.
956 \item \verb|dev NAME|
958 --- only list the neighbours attached to this device.
962 --- only list neighbours which are not currently in use.
964 \item \verb|nud NUD_STATE|
966 --- only list neighbour entries in this state. \verb|NUD_STATE| takes
967 values listed below or the special value \verb|all| which means all states.
968 This option may occur more than once. If this option is absent, \verb|ip|
969 lists all entries except for \verb|none| and \verb|noarp|.
974 \paragraph{Output format:}
977 kuznet@alisa:~ $ ip neigh ls
978 :: dev lo lladdr 00:00:00:00:00:00 nud noarp
979 fe80::200:cff:fe76:3f85 dev eth0 lladdr 00:00:0c:76:3f:85 router \
981 0.0.0.0 dev lo lladdr 00:00:00:00:00:00 nud noarp
982 193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 nud reachable
983 193.233.7.85 dev eth0 lladdr 00:e0:1e:63:39:00 nud stale
987 The first word of each line is the protocol address of the neighbour.
988 Then the device name follows. The rest of the line describes the contents of
989 the neighbour entry identified by the pair (device, address).
991 \verb|lladdr| is the link layer address of the neighbour.
993 \verb|nud| is the state of the ``neighbour unreachability detection'' machine
994 for this entry. The detailed description of the neighbour
995 state machine can be found in~\cite{RFC-NDISC}. Here is the full list
996 of the states with short descriptions:
999 \item\verb|none| --- the state of the neighbour is void.
1000 \item\verb|incomplete| --- the neighbour is in the process of resolution.
1001 \item\verb|reachable| --- the neighbour is valid and apparently reachable.
1002 \item\verb|stale| --- the neighbour is valid, but is probably already
1003 unreachable, so the kernel will try to check it at the first transmission.
1004 \item\verb|delay| --- a packet has been sent to the stale neighbour and the kernel is waiting
1006 \item\verb|probe| --- the delay timer expired but no confirmation was received.
1007 The kernel has started to probe the neighbour with ARP/NDISC messages.
1008 \item\verb|failed| --- resolution has failed.
1009 \item\verb|noarp| --- the neighbour is valid. No attempts to check the entry
1011 \item\verb|permanent| --- it is a \verb|noarp| entry, but only the administrator
1012 may remove the entry from the neighbour table.
1015 The link layer address is valid in all states except for \verb|none|,
1016 \verb|failed| and \verb|incomplete|.
1018 IPv6 neighbours can be marked with the additional flag \verb|router|
1019 which means that the neighbour introduced itself as an IPv6 router~\cite{RFC-NDISC}.
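
As a reading aid, the output format described above can be taken apart
mechanically. The following Python sketch is only an illustration and is not
part of \verb|iproute2|. The function name \verb|parse_neigh_line| and the
\verb|lladdr_valid| key are hypothetical, and the parser assumes that each of
the keywords \verb|dev|, \verb|lladdr| and \verb|nud| is followed by exactly
one value.

\begin{verbatim}
# Sketch: split one line of "ip neigh show" output into fields.
# States in which the link layer address is not valid (see above).
NO_LLADDR_STATES = {"none", "failed", "incomplete"}

def parse_neigh_line(line):
    words = line.split()
    entry = {"address": words[0], "router": False}
    i = 1
    while i < len(words):
        word = words[i]
        if word == "router":            # a flag without a value
            entry["router"] = True
            i += 1
        elif word in ("dev", "lladdr", "nud") and i + 1 < len(words):
            entry[word] = words[i + 1]
            i += 2
        else:                           # unknown token: skip it
            i += 1
    entry["lladdr_valid"] = ("lladdr" in entry and
                             entry.get("nud") not in NO_LLADDR_STATES)
    return entry

sample = "193.233.7.85 dev eth0 lladdr 00:e0:1e:63:39:00 nud stale"
print(parse_neigh_line(sample))
\end{verbatim}

The pair (\verb|dev|, \verb|address|) of the returned dictionary is the key
that identifies the entry.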
1021 \paragraph{Statistics:} The \verb|-statistics| option displays some usage
1025 kuznet@alisa:~ $ ip -s n ls 193.233.7.254
1026 193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 ref 5 used 12/13/20 \
1031 Here \verb|ref| is the number of users of this entry
1032 and \verb|used| is a triplet of time intervals in seconds
1033 separated by slashes. In this case they show that:
1036 \item the entry was used 12 seconds ago.
1037 \item the entry was confirmed 13 seconds ago.
1038 \item the entry was updated 20 seconds ago.
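
The meaning of the \verb|ref| and \verb|used| fields can be sketched in a few
lines of Python. This is an illustration only. The function name
\verb|parse_neigh_stats| is hypothetical, and the sample line is the output
shown above, joined into a single line with an assumed
\verb|nud reachable| ending.

\begin{verbatim}
# Sketch: decode the "ref" and "used" fields of "ip -s neigh show".
# The order used/confirmed/updated is taken from the text above.

def parse_neigh_stats(line):
    words = line.split()
    stats = {}
    for i, word in enumerate(words[:-1]):
        if word == "ref":
            stats["ref"] = int(words[i + 1])
        elif word == "used":
            used, confirmed, updated = words[i + 1].split("/")
            stats["used"] = int(used)            # seconds since last use
            stats["confirmed"] = int(confirmed)  # since last confirmation
            stats["updated"] = int(updated)      # since last update
    return stats

sample = ("193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 "
          "ref 5 used 12/13/20 nud reachable")
print(parse_neigh_stats(sample))
\end{verbatim}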
1041 \subsection{{\tt ip neighbour flush} --- flush neighbour entries}
1043 \paragraph{Abbreviations:} \verb|flush|, \verb|f|.
1045 \paragraph{Description:}This command flushes neighbour tables, selecting
1046 entries to flush by some criteria.
1048 \paragraph{Arguments:} This command has the same arguments as \verb|show|.
1049 The differences are that it does not run when no arguments are given,
1050 and that the default neighbour states to be flushed do not include
1051 \verb|permanent| and \verb|noarp|.
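
The different defaults of \verb|show| and \verb|flush| can be summarized in a
short Python sketch. It reads the two statements in the text literally:
\verb|show| omits \verb|none| and \verb|noarp|, \verb|flush| omits
\verb|permanent| and \verb|noarp|. The function name
\verb|selected_states| is hypothetical.

\begin{verbatim}
# Sketch: NUD states selected by "ip neigh show" and "ip neigh flush".
ALL_STATES = {"none", "incomplete", "reachable", "stale", "delay",
              "probe", "failed", "noarp", "permanent"}

def selected_states(command, nud_args=()):
    """nud_args holds the values of the (repeatable) "nud" option."""
    if nud_args:
        if "all" in nud_args:           # special value: every state
            return set(ALL_STATES)
        unknown = set(nud_args) - ALL_STATES
        if unknown:
            raise ValueError("unknown state(s): %s" % sorted(unknown))
        return set(nud_args)
    if command == "show":
        return ALL_STATES - {"none", "noarp"}
    if command == "flush":
        return ALL_STATES - {"permanent", "noarp"}
    raise ValueError("unknown command: " + command)

print(sorted(selected_states("flush")))
\end{verbatim}

Note that the sketch only models the state filter. As stated above,
\verb|flush| additionally refuses to run when it is given no arguments at all.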
1054 \paragraph{Statistics:} With the \verb|-statistics| option, the command
1055 becomes verbose. It prints out the number of deleted neighbours and the number
1056 of rounds made to flush the neighbour table. If the option is given
1057 twice, \verb|ip neigh flush| also dumps all the deleted neighbours
1058 in the format described in the previous subsection.
1060 \paragraph{Example:}
1062 netadm@alisa:~ # ip -s -s n f 193.233.7.254
1063 193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 ref 5 used 12/13/20 \
1066 *** Round 1, deleting 1 entries ***
1067 *** Flush is complete after 1 round ***
1072 \section{{\tt ip route} --- routing table management}
1075 \paragraph{Abbreviations:} \verb|route|, \verb|ro|, \verb|r|.
1077 \paragraph{Object:} \verb|route| entries in the kernel routing tables keep
1078 information about paths to other networked nodes.
1080 Each route entry has a {\em key\/} consisting of a {\em prefix\/}
1081 (i.e.\ a pair containing a network address and the length of its mask) and,
1082 optionally, the TOS value. An IP packet matches the route if the highest
1083 bits of its destination address are equal to the route prefix at least
1084 up to the prefix length and if the TOS of the route is zero or equal to
1085 the TOS of the packet.
1087 If several routes match the packet, the following pruning rules
1088 are used to select the best one (see~\cite{RFC1812}):
1090 \item The longest matching prefix is selected. All shorter ones
1093 \item If the TOS of some route with the longest prefix is equal to the TOS
1094 of the packet, the routes with different TOS are dropped.
1096 If no exact TOS match was found and routes with TOS=0 exist,
1097 the rest of the routes are pruned.
1099 Otherwise, the route lookup fails.
1101 \item If several routes remain after the previous steps, then
1102 the routes with the best preference values are selected.
1104 \item If we still have several routes, then the {\em first\/} of them
1108 Note the ambiguity of the last step. Unfortunately, Linux
1109 historically allows such a bizarre situation. The sense of the
1110 word ``first'' depends on the order of route additions and it is practically
1111 impossible to maintain a bundle of such routes in this order.
1114 For simplicity we will limit ourselves to the case where such a situation
1115 is impossible and routes are uniquely identified by the triplet
1116 \{prefix, tos, preference\}. Actually, it is impossible to create
1117 non-unique routes with \verb|ip| commands described in this section.
1119 One useful exception to this rule is the default route on non-forwarding
1120 hosts. It is ``officially'' allowed to have several fallback routes
1121 when several routers are present on directly connected networks.
1122 In this case, Linux-2.2 performs ``dead gateway detection''~\cite{RFC1122}
1123 controlled by neighbour unreachability detection and by advice
1124 from transport protocols to select a working router, so the order
1125 of the routes is not essential. However, in this case,
1126 fiddling with default routes manually is not recommended. Use the Router Discovery
1127 protocol (see Appendix~\ref{EXAMPLE-SETUP}, p.\pageref{EXAMPLE-SETUP})
1128 instead. Actually, Linux-2.2 IPv6 does not give user level applications
1129 any access to default routes.
1132 Certainly, the steps above are not performed exactly
1133 in this sequence. Instead, the routing table in the kernel is kept
1134 in some data structure to achieve the final result
1135 with minimal cost. However, regardless of the particular
1136 routing algorithm implemented in the kernel, we can summarize
1137 the statements above as: a route is identified by the triplet
1138 \{prefix, tos, preference\}. This {\em key\/} lets us locate
1139 the route in the routing table.
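
The matching condition and the pruning rules described above can be written
down as a small Python sketch. It follows the steps literally and is not the
algorithm used by the kernel. It assumes that a numerically smaller preference
value is the better one, which the text does not state. The \verb|Route|
tuple, the function names and the gateway addresses in the sample table are
hypothetical.

\begin{verbatim}
import ipaddress
from collections import namedtuple

Route = namedtuple("Route", "prefix tos preference via")

def matches(route, dst, tos):
    """dst lies inside the prefix and the route TOS is 0 or equal."""
    return (ipaddress.ip_address(dst) in route.prefix and
            route.tos in (0, tos))

def select_route(table, dst, tos=0):
    candidates = [r for r in table if matches(r, dst, tos)]
    if not candidates:
        return None                     # lookup fails
    # 1. Keep only the routes with the longest matching prefix.
    longest = max(r.prefix.prefixlen for r in candidates)
    candidates = [r for r in candidates if r.prefix.prefixlen == longest]
    # 2. An exact TOS match wins; otherwise only TOS=0 routes remain.
    exact = [r for r in candidates if r.tos == tos]
    candidates = exact or candidates
    # 3. Keep the best preference (assumed: the smallest number).
    best = min(r.preference for r in candidates)
    candidates = [r for r in candidates if r.preference == best]
    # 4. Still ambiguous: the first route in table order is taken.
    return candidates[0]

net = ipaddress.ip_network
table = [
    Route(net("0.0.0.0/0"),   0,    0,  "193.233.7.65"),
    Route(net("10.0.0.0/8"),  0,    0,  "193.233.7.65"),
    Route(net("10.0.0.0/24"), 0,    10, "193.233.7.66"),
    Route(net("10.0.0.0/24"), 0x10, 0,  "193.233.7.67"),
]
print(select_route(table, "10.0.0.3").via)
print(select_route(table, "10.0.0.3", tos=0x10).via)
\end{verbatim}

In this sketch the triplet \{prefix, tos, preference\} of the returned tuple
is exactly the key discussed above.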
1141 \paragraph{Route attributes:} Each route key refers to a routing
1142 information record containing
1143 the data required to deliver IP packets (e.g.\ output device and
1144 next hop router) and some optional attributes (e.g.\ the path MTU or
1145 the preferred source address when communicating with this destination).
1146 These attributes are described in the following subsection.
1148 \paragraph{Route types:} \label{IP-ROUTE-TYPES}
1149 It is important that the set
1150 of required and optional attributes depends on the route {\em type\/}.
1151 The most important route type
1152 is \verb|unicast|. It describes real paths to other hosts.
1153 As a rule, common routing tables contain only such routes. However,
1154 there are other types of routes with different semantics. The
1155 full list of types understood by Linux-2.2 is:
1157 \item \verb|unicast| --- the route entry describes real paths to the
1158 destinations covered by the route prefix.
1159 \item \verb|unreachable| --- these destinations are unreachable. Packets
1160 are discarded and the ICMP message {\em host unreachable\/} is generated.
1161 The local senders get an \verb|EHOSTUNREACH| error.
1162 \item \verb|blackhole| --- these destinations are unreachable. Packets
1163 are discarded silently. The local senders get an \verb|EINVAL| error.
1164 \item \verb|prohibit| --- these destinations are unreachable. Packets
1165 are discarded and the ICMP message {\em communication administratively
1166 prohibited\/} is generated. The local senders get an \verb|EACCES| error.
1167 \item \verb|local| --- the destinations are assigned to this
1168 host. The packets are looped back and delivered locally.
1169 \item \verb|broadcast| --- the destinations are broadcast addresses.
1170 The packets are sent as link broadcasts.
1171 \item \verb|throw| --- a special control route used together with policy
1172 rules (see sec.\ref{IP-RULE}, p.\pageref{IP-RULE}). If such a route is selected, lookup
1173 in this table is terminated pretending that no route was found.
1174 Without policy routing it is equivalent to the absence of the route in the routing
1175 table. The packets are dropped and the ICMP message {\em net unreachable\/}
1176 is generated. The local senders get an \verb|ENETUNREACH| error.
1177 \item \verb|nat| --- a special NAT route. Destinations covered by the prefix
1178 are considered to be dummy (or external) addresses which require translation
1179 to real (or internal) ones before forwarding. The addresses to translate to
1180 are selected with the attribute \verb|via|. More about NAT is
1181 in Appendix~\ref{ROUTE-NAT}, p.\pageref{ROUTE-NAT}.
1182 \item \verb|anycast| --- ({\em not implemented\/}) the destinations are
1183 {\em anycast\/} addresses assigned to this host. They are mainly equivalent
1184 to \verb|local| with one difference: such addresses are invalid when used
1185 as the source address of any packet.
1186 \item \verb|multicast| --- a special type used for multicast routing.
1187 It is not present in normal routing tables.
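
For the route types that reject traffic, the ICMP messages and error codes
listed above can be collected into one table. The Python sketch below is only a
summary of the text, not a kernel structure. The names \verb|REJECT_TYPES| and
\verb|describe| are hypothetical, and the \verb|throw| row describes the case
without policy routing.

\begin{verbatim}
import errno

# type: (ICMP message for remote senders, errno for local senders)
REJECT_TYPES = {
    "unreachable": ("host unreachable", errno.EHOSTUNREACH),
    "blackhole":   (None, errno.EINVAL),   # discarded silently
    "prohibit":    ("communication administratively prohibited",
                    errno.EACCES),
    "throw":       ("net unreachable", errno.ENETUNREACH),
}

def describe(route_type):
    icmp, err = REJECT_TYPES[route_type]
    return "%s: ICMP %s, local error %s" % (
        route_type, icmp or "none", errno.errorcode[err])

for route_type in REJECT_TYPES:
    print(describe(route_type))
\end{verbatim}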
1190 \paragraph{Route tables:} Linux-2.2 can pack routes into several routing
1191 tables identified by a number in the range from 1 to 255 or by
1192 name from the file \verb|/etc/iproute2/rt_tables|. By default all normal
1193 routes are inserted into the \verb|main| table (ID 254) and the kernel only uses
1194 this table when calculating routes.
1196 Actually, one other table always exists, which is invisible but
1197 even more important. It is the \verb|local| table (ID 255). This table
1198 consists of routes for local and broadcast addresses. The kernel maintains
1199 this table automatically and the administrator usually need not modify it
1202 The multiple routing tables enter the game when {\em policy routing\/}
1203 is used. See sec.\ref{IP-RULE}, p.\pageref{IP-RULE}.
1204 In this case, the table identifier effectively becomes
1205 one more parameter, which should be added to the triplet
1206 \{prefix, tos, preference\} to uniquely identify the route.
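
How a table given by number or by name maps to a table ID can be sketched in
Python as follows. The sketch assumes that the names file holds one
``ID NAME'' pair per line, that \verb|#| starts a comment, and that IDs are
written in decimal. The sample text and the function names are illustrative;
only the IDs 254 (\verb|main|) and 255 (\verb|local|) come from the text above.

\begin{verbatim}
SAMPLE_RT_TABLES = """\
# reserved values
255 local
254 main
"""

def load_table_names(text):
    """Return a dictionary mapping table names to numeric IDs."""
    names = {}
    for line in text.splitlines():
        parts = line.split("#", 1)[0].split()
        if len(parts) < 2:
            continue                    # blank or comment-only line
        names[parts[1]] = int(parts[0])
    return names

def resolve_table(tableid, names):
    if tableid.isdigit():
        return int(tableid)
    return names[tableid]               # KeyError for an unknown name

names = load_table_names(SAMPLE_RT_TABLES)
print(resolve_table("main", names), resolve_table("255", names))
\end{verbatim}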
1209 \subsection{{\tt ip route add} --- add a new route\\
1210 {\tt ip route change} --- change a route\\
1211 {\tt ip route replace} --- change a route or add a new one}
1212 \label{IP-ROUTE-ADD}
1214 \paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
1215 \verb|replace|, \verb|repl|.
1218 \paragraph{Arguments:}
1220 \item \verb|to PREFIX| or \verb|to TYPE PREFIX| (default)
1222 --- the destination prefix of the route. If \verb|TYPE| is omitted,
1223 \verb|ip| assumes type \verb|unicast|. Other values of \verb|TYPE|
1224 are listed above. \verb|PREFIX| is an IP or IPv6 address optionally followed
1225 by a slash and the prefix length. If the length of the prefix is missing,
1226 \verb|ip| assumes a full-length host route. There is also a special
1227 \verb|PREFIX| --- \verb|default| --- which is equivalent to IP \verb|0/0| or
1228 to IPv6 \verb|::/0|.
1230 \item \verb|tos TOS| or \verb|dsfield TOS|
1232 --- the Type Of Service (TOS) key. This key has no associated mask and
1233 the longest match is understood as: First, compare the TOS
1234 of the route and of the packet. If they are not equal, then the packet
1235 may still match a route with a zero TOS. \verb|TOS| is either an 8-bit hexadecimal
1236 number or an identifier from {\tt /etc/iproute2/rt\_dsfield}.
1239 \item \verb|metric NUMBER| or \verb|preference NUMBER|
1241 --- the preference value of the route. \verb|NUMBER| is an arbitrary 32-bit number.
1243 \item \verb|table TABLEID|
1245 --- the table to add this route to.
1246 \verb|TABLEID| may be a number or a string from the file
1247 \verb|/etc/iproute2/rt_tables|. If this parameter is omitted,
1248 \verb|ip| assumes the \verb|main| table, with the exception of
1249 \verb|local|, \verb|broadcast| and \verb|nat| routes, which are
1250 put into the \verb|local| table by default.
1252 \item \verb|dev NAME|
1254 --- the output device name.
1256 \item \verb|via ADDRESS|
1258 --- the address of the nexthop router. Actually, the sense of this field depends
1259 on the route type. For normal \verb|unicast| routes it is either the true nexthop
1260 router or, if it is a direct route installed in BSD compatibility mode,
1261 it can be a local address of the interface.
1262 For NAT routes it is the first address of the block of translated IP destinations.
1264 \item \verb|src ADDRESS|
1266 --- the source address to prefer when sending to the destinations
1267 covered by the route prefix.
1269 \item \verb|realm REALMID|
1271 --- the realm to which this route is assigned.
1272 \verb|REALMID| may be a number or a string from the file
1273 \verb|/etc/iproute2/rt_realms|. Sec.\ref{RT-REALMS} (p.\pageref{RT-REALMS})
1274 contains more information on realms.
1276 \item \verb|mtu MTU| or \verb|mtu lock MTU|
1278 --- the MTU along the path to the destination. If the modifier \verb|lock| is
1279 not used, the MTU may be updated by the kernel due to Path MTU Discovery.
1280 If the modifier \verb|lock| is used, no path MTU discovery will be tried,
1281 all packets will be sent without the DF bit in the IPv4 case
1282 or fragmented to MTU for IPv6.
1284 \item \verb|window NUMBER|
1286 --- the maximal window for TCP to advertise to these destinations,
1287 measured in bytes. It limits maximal data bursts that our TCP
1288 peers are allowed to send to us.
1290 \item \verb|rtt NUMBER|
1292 --- the initial RTT (``Round Trip Time'') estimate.
1295 \item \verb|rttvar NUMBER|
1297 --- \threeonly the initial RTT variance estimate.
1300 \item \verb|ssthresh NUMBER|
1302 --- \threeonly an estimate for the initial slow start threshold.
1305 \item \verb|cwnd NUMBER|
1307 --- \threeonly the clamp for the congestion window. It is ignored if the \verb|lock|
1311 \item \verb|advmss NUMBER|
1313 --- \threeonly the MSS (``Maximum Segment Size'') to advertise to these
1314 destinations when establishing TCP connections. If it is not given,
1315 Linux uses a default value calculated from the first hop device MTU.
1318 If the path to these destinations is asymmetric, this guess may be wrong.
1321 \item \verb|reordering NUMBER|
1323 --- \threeonly Maximal reordering on the path to this destination.
1324 If it is not given, Linux uses the value selected with \verb|sysctl|
1325 variable \verb|net/ipv4/tcp_reordering|.
1329 \item \verb|nexthop NEXTHOP|
1331 --- the nexthop of a multipath route. \verb|NEXTHOP| is a complex value
1332 with its own syntax similar to the top level argument lists:
1334 \item \verb|via ADDRESS| is the nexthop router.
1335 \item \verb|dev NAME| is the output device.
1336 \item \verb|weight NUMBER| is a weight for this element of a multipath
1337 route reflecting its relative bandwidth or quality.
1340 \item \verb|scope SCOPE_VAL|
1342 --- the scope of the destinations covered by the route prefix.
1343 \verb|SCOPE_VAL| may be a number or a string from the file
1344 \verb|/etc/iproute2/rt_scopes|.
1345 If this parameter is omitted,
1346 \verb|ip| assumes scope \verb|global| for all gatewayed \verb|unicast|
1347 routes, scope \verb|link| for direct \verb|unicast| and \verb|broadcast| routes
1348 and scope \verb|host| for \verb|local| routes.
1350 \item \verb|protocol RTPROTO|
1352 --- the routing protocol identifier of this route.
1353 \verb|RTPROTO| may be a number or a string from the file
1354 \verb|/etc/iproute2/rt_protos|. If the routing protocol ID is
1355 not given, \verb|ip| assumes protocol \verb|boot| (i.e.\
1356 it assumes the route was added by someone who doesn't
1357 understand what they are doing). Several protocol values have a fixed interpretation.
1360 \item \verb|redirect| --- the route was installed due to an ICMP redirect.
1361 \item \verb|kernel| --- the route was installed by the kernel during
1363 \item \verb|boot| --- the route was installed during the bootup sequence.
1364 If a routing daemon starts, it will purge all of them.
1365 \item \verb|static| --- the route was installed by the administrator
1366 to override dynamic routing. Routing daemons will respect them
1367 and, probably, even advertise them to its peers.
1368 \item \verb|ra| --- the route was installed by the Router Discovery protocol.
1370 The rest of the values are not reserved and the administrator is free
1371 to assign (or not to assign) protocol tags. At least, routing
1372 daemons should take care of setting some unique protocol values,
1373 e.g.\ as they are assigned in \verb|rtnetlink.h| or in \verb|rt_protos|
1379 --- pretend that the nexthop is directly attached to this link,
1380 even if it does not match any interface prefix. One application of this
1381 option may be found in~\cite{IP-TUNNELS}.
1383 \item \verb|equalize|
1385 --- allow packet by packet randomization on multipath routes.
1386 Without this modifier, the route will be frozen to one selected
1387 nexthop, so that load splitting will only occur on a per-flow basis.
1388 \verb|equalize| only works if the kernel is patched.
1395 Actually there are more commands: \verb|prepend| does the same
1396 thing as classic \verb|route add|, i.e.\ adds a route, even if another
1397 route to the same destination exists. Its opposite case is \verb|append|,
1398 which adds the route to the end of the list. Avoid these
1402 More sad news: IPv6 only understands the \verb|append| command correctly.
1403 All the others are translated into \verb|append| commands. Certainly,
1404 this will change in the future.
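
The defaults that the argument descriptions above attribute to \verb|ip| when
\verb|TYPE|, \verb|table|, \verb|scope| or \verb|protocol| is omitted can be
summarized in a Python sketch. It only restates the text. The function name
\verb|fill_defaults| and the dictionary keys are hypothetical, and a route is
treated as gatewayed when it has a \verb|via| argument.

\begin{verbatim}
def fill_defaults(route):
    """route is a dict such as {"to": "10.0.0.0/24", "dev": "dummy"}."""
    r = dict(route)
    r.setdefault("type", "unicast")
    if "table" not in r:
        in_local = r["type"] in ("local", "broadcast", "nat")
        r["table"] = "local" if in_local else "main"
    if "scope" not in r:
        if r["type"] == "local":
            r["scope"] = "host"
        elif r["type"] == "broadcast":
            r["scope"] = "link"
        elif r["type"] == "unicast":
            # gatewayed routes are global, direct routes are link
            r["scope"] = "global" if "via" in r else "link"
        # for other types the text states no default scope
    r.setdefault("protocol", "boot")
    return r

print(fill_defaults({"to": "10.0.0.0/24", "via": "193.233.7.65"}))
\end{verbatim}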
1407 \paragraph{Examples:}
1409 \item add a plain route to network 10.0.0/24 via gateway 193.233.7.65
1411 ip route add 10.0.0/24 via 193.233.7.65
1413 \item change it to a direct route via the \verb|dummy| device
1415 ip ro chg 10.0.0/24 dev dummy
1417 \item add a default multipath route splitting the load between \verb|ppp0|
1420 ip route add default scope global nexthop dev ppp0 \
1423 Note the scope value. It is not necessary but it informs the kernel
1424 that this route is gatewayed rather than direct. Actually, if you
1425 know the addresses of remote endpoints it would be better to use the
1426 \verb|via| parameter.
1427 \item announce that the address 192.203.80.144 is not a real one, but
1428 should be translated to 193.233.7.83 before forwarding
1430 ip route add nat 192.203.80.144 via 193.233.7.83
1432 Backward translation is set up with policy rules described
1433 in the following section (sec.\ref{IP-RULE}, p.\pageref{IP-RULE}).
1436 \subsection{{\tt ip route delete} --- delete a route}
1438 \paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.
1440 \paragraph{Arguments:} \verb|ip route del| has the same arguments as
1441 \verb|ip route add|, but their semantics are a bit different.
1443 Key values (\verb|to|, \verb|tos|, \verb|preference| and \verb|table|)
1444 select the route to delete. If optional attributes are present, \verb|ip|
1445 verifies that they coincide with the attributes of the route to delete.
1446 If no route with the given key and attributes was found, \verb|ip route del|
1449 Linux-2.0 had the option to delete a route selected only by prefix address,
1450 ignoring its length (i.e.\ netmask). This option no longer exists
1451 because it was ambiguous. However, look at {\tt ip route flush}
1452 (sec.\ref{IP-ROUTE-FLUSH}, p.\pageref{IP-ROUTE-FLUSH}) which
1453 provides similar and even richer functionality.
1456 \paragraph{Example:}
1458 \item delete the multipath route created by the command in the previous subsection
1460 ip route del default scope global nexthop dev ppp0 \
1467 \subsection{{\tt ip route show} --- list routes}
1469 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
1471 \paragraph{Description:} The command displays the contents of the routing tables
1472 or the route(s) selected by some criteria.
1475 \paragraph{Arguments:}
1477 \item \verb|to SELECTOR| (default)
1479 --- only select routes from the given range of destinations. \verb|SELECTOR|
1480 consists of an optional modifier (\verb|root|, \verb|match| or \verb|exact|)
1481 and a prefix. \verb|root PREFIX| selects routes with prefixes not shorter
1482 than \verb|PREFIX|. For example, \verb|root 0/0| selects the entire routing table.
1483 \verb|match PREFIX| selects routes with prefixes not longer than
1484 \verb|PREFIX|. For example, \verb|match 10.0/16| selects \verb|10.0/16|,
1485 \verb|10/8| and \verb|0/0|, but it does not select \verb|10.1/16| and
1486 \verb|10.0.0/24|. Finally, \verb|exact PREFIX| (or just \verb|PREFIX|)
1487 selects routes with this exact prefix. If none of these options
1488 is present, \verb|ip| assumes \verb|root 0/0|, i.e.\ it lists the entire table.
1491 \item \verb|tos TOS| or \verb|dsfield TOS|
1493 --- only select routes with the given TOS.
1496 \item \verb|table TABLEID|
1498 --- show the routes from the given table(s). The default setting is to show
1499 \verb|table| \verb|main|. \verb|TABLEID| may either be the ID of a real table
1500 or one of the special values:
1502 \item \verb|all| --- list all of the tables.
1503 \item \verb|cache| --- dump the routing cache.
1506 IPv6 has a single table. However, splitting it into \verb|main|, \verb|local|
1507 and \verb|cache| is emulated by the \verb|ip| utility.
1510 \item \verb|cloned| or \verb|cached|
1512 --- list cloned routes i.e.\ routes which were dynamically forked from
1513 other routes because some route attribute (e.g.\ MTU) was updated.
1514 Actually, it is equivalent to \verb|table cache|.
1516 \item \verb|from SELECTOR|
1518 --- the same syntax as for \verb|to|, but it binds the source address range
1519 rather than destinations. Note that the \verb|from| option only works with
1522 \item \verb|protocol RTPROTO|
1524 --- only list routes of this protocol.
1527 \item \verb|scope SCOPE_VAL|
1529 --- only list routes with this scope.
1531 \item \verb|type TYPE|
1533 --- only list routes of this type.
1535 \item \verb|dev NAME|
1537 --- only list routes going via this device.
1539 \item \verb|via PREFIX|
1541 --- only list routes going via the nexthop routers selected by \verb|PREFIX|.
1543 \item \verb|src PREFIX|
1545 --- only list routes with preferred source addresses selected
1548 \item \verb|realm REALMID| or \verb|realms FROMREALM/TOREALM|
1550 --- only list routes with these realms.
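
The \verb|root|, \verb|match| and \verb|exact| modifiers of the \verb|to|
selector described above can be expressed with Python's standard
\verb|ipaddress| module. This is a sketch of the selection rule only. The
function name \verb|selects| is hypothetical, and prefixes are written in full
because \verb|ipaddress| does not accept the shortened forms (\verb|10.0/16|,
\verb|0/0|) used in this document.

\begin{verbatim}
import ipaddress

def selects(modifier, selector, route_prefix):
    sel = ipaddress.ip_network(selector)
    route = ipaddress.ip_network(route_prefix)
    if modifier == "root":      # route prefix not shorter than selector
        return route.subnet_of(sel)
    if modifier == "match":     # route prefix not longer than selector
        return sel.subnet_of(route)
    if modifier == "exact":
        return route == sel
    raise ValueError("unknown modifier: " + modifier)

routes = ["0.0.0.0/0", "10.0.0.0/8", "10.0.0.0/16",
          "10.1.0.0/16", "10.0.0.0/24"]
print([r for r in routes if selects("match", "10.0.0.0/16", r)])
print([r for r in routes if selects("root", "10.0.0.0/16", r)])
\end{verbatim}

The first line printed lists the same three routes as the \verb|match 10.0/16|
example in the text; the second shows that \verb|root| selects the prefix
itself and the more specific \verb|10.0.0.0/24|.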
1554 \paragraph{Examples:} Let us count routes of protocol \verb|gated/bgp|
1557 kuznet@amber:~ $ ip ro ls proto gated/bgp | wc
1561 To count the size of the routing cache, we have to use the \verb|-o| option
1562 because cached attributes can take more than one line of output:
1564 kuznet@amber:~ $ ip -o ro ls cloned | wc
1570 \paragraph{Output format:} The output of this command consists
1571 of per-route records separated by line feeds.
1572 However, some records may consist
1573 of more than one line: particularly, this is the case when the route
1574 is cloned or you requested additional statistics. If the
1575 \verb|-o| option was given, then line feeds separating lines inside
1576 records are replaced with the backslash sign.
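
The effect of \verb|-o| on record counting can be illustrated with a short
Python sketch. The function name \verb|to_oneline| is hypothetical and the two
sample records are abbreviated forms of output shown in this section.

\begin{verbatim}
def to_oneline(record):
    """Replace the line feeds inside one record with a backslash."""
    return "\\".join(record.splitlines())

records = [
    "193.233.7.82 dev eth0 src 193.233.7.65 realms inr.ac\n"
    "    cache mtu 1500 rtt 300",
    "193.233.7.0/24 dev eth0 proto gated/conn scope link",
]

# Without -o a line counter sees every physical line ...
lines_plain = sum(len(r.splitlines()) for r in records)
# ... with -o it sees exactly one line per record.
lines_oneline = len([to_oneline(r) for r in records])
print(lines_plain, lines_oneline)
\end{verbatim}

This is why counting records with a line counter is only reliable when
\verb|-o| is given.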
1578 The output has the same syntax as arguments given to {\tt ip route add},
1579 so that it can be understood easily. For example:
1581 kuznet@amber:~ $ ip ro ls 193.233.7/24
1582 193.233.7.0/24 dev eth0 proto gated/conn scope link \
1583 src 193.233.7.65 realms inr.ac
1587 If you list cloned entries, the output contains other attributes which
1588 are evaluated during route calculation and updated during route
1589 lifetime. An example of the output is:
1591 kuznet@amber:~ $ ip ro ls 193.233.7.82 tab cache
1592 193.233.7.82 from 193.233.7.82 dev eth0 src 193.233.7.65 \
1593 realms inr.ac/inr.ac
1594 cache <src-direct,redirect> mtu 1500 rtt 300 iif eth0
1595 193.233.7.82 dev eth0 src 193.233.7.65 realms inr.ac
1596 cache mtu 1500 rtt 300
1600 \label{NB-strange-route}
1601 The route looks a bit strange, doesn't it? Did you notice that
1602 it is a path from 193.233.7.82 back to 193.233.7.82? Well, you will
1603 see in the section on \verb|ip route get| (p.\pageref{NB-nature-of-strangeness})
1606 The second line, starting with the word \verb|cache|, shows
1607 additional attributes which normal routes do not possess.
1608 Cached flags are summarized in angle brackets:
1610 \item \verb|local| --- packets are delivered locally.
1611 It stands for loopback unicast routes, for broadcast routes
1612 and for multicast routes, if this host is a member of the corresponding
1615 \item \verb|reject| --- the path is bad. Any attempt to use it results
1616 in an error. See attribute \verb|error| below (p.\pageref{IP-ROUTE-GET-error}).
1618 \item \verb|mc| --- the destination is multicast.
1620 \item \verb|brd| --- the destination is broadcast.
1622 \item \verb|src-direct| --- the source is on a directly connected
1625 \item \verb|redirected| --- the route was created by an ICMP Redirect.
1627 \item \verb|redirect| --- packets going via this route will
1628 trigger an ICMP redirect.
1630 \item \verb|fastroute| --- the route is eligible to be used for fastroute.
1632 \item \verb|equalize| --- make packet by packet randomization
1635 \item \verb|dst-nat| --- the destination address requires translation.
1637 \item \verb|src-nat| --- the source address requires translation.
1639 \item \verb|masq| --- the source address requires masquerading.
1640 This feature disappeared in Linux-2.4.
1642 \item \verb|notify| --- ({\em not implemented}) change/deletion
1643 of this route will trigger RTNETLINK notification.
1646 Then some optional attributes follow:
1648 \item \verb|error| --- on \verb|reject| routes it is the error code
1649 returned to local senders when they try to use this route.
1650 These error codes are translated into ICMP error codes, sent to remote
1651 senders, according to the rules described above in the subsection
1652 devoted to route types (p.\pageref{IP-ROUTE-TYPES}).
1653 \label{IP-ROUTE-GET-error}
1655 \item \verb|expires| --- this entry will expire after this timeout.
1657 \item \verb|iif| --- the packets for this path are expected to arrive
1661 \paragraph{Statistics:} With the \verb|-statistics| option, more
1662 information about this route is shown:
1664 \item \verb|users| --- the number of users of this entry.
1665 \item \verb|age| --- shows when this route was last used.
1666 \item \verb|used| --- the number of lookups of this route since its creation.
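
The layout of the \verb|cache| line (flags in angle brackets, then attributes
and statistics) can be taken apart with a few lines of Python. This sketch
assumes that the flags form a single comma-separated group and that the rest
of the line consists of ``name value'' pairs, as in the samples shown in this
section. It does not handle lines that deviate from this pattern, such as the
\verb|Oifs:| list of multicast routes. The function name
\verb|parse_cache_line| is hypothetical.

\begin{verbatim}
def parse_cache_line(line):
    words = line.split()
    if not words or words[0] != "cache":
        raise ValueError("not a cache line")
    flags, attrs = [], {}
    rest = words[1:]
    if rest and rest[0].startswith("<") and rest[0].endswith(">"):
        flags = rest[0][1:-1].split(",")
        rest = rest[1:]
    # Pair up the remaining words: name, value, name, value, ...
    for name, value in zip(rest[0::2], rest[1::2]):
        attrs[name] = value
    return flags, attrs

print(parse_cache_line(
    "    cache <src-direct,redirect> mtu 1500 rtt 300 iif eth0"))
print(parse_cache_line(
    "    cache users 1 used 1 age 23sec mtu 1500 rtt 300"))
\end{verbatim}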
1670 \subsection{{\tt ip route flush} --- flush routing tables}
1671 \label{IP-ROUTE-FLUSH}
1673 \paragraph{Abbreviations:} \verb|flush|, \verb|f|.
1675 \paragraph{Description:} This command flushes routes selected
1678 \paragraph{Arguments:} The arguments have the same syntax and semantics
1679 as the arguments of \verb|ip route show|, except that the selected routes are
1680 purged rather than listed. The only other difference is the default action: \verb|show|
1681 dumps the entire IP main routing table, whereas \verb|flush| prints the help page.
1682 The reason for this difference does not require any explanation, does it?
1685 \paragraph{Statistics:} With the \verb|-statistics| option, the command
1686 becomes verbose. It prints out the number of deleted routes and the number
1687 of rounds made to flush the routing table. If the option is given
1688 twice, \verb|ip route flush| also dumps all the deleted routes
1689 in the format described in the previous subsection.
1691 \paragraph{Examples:} The first example flushes all the
1692 gatewayed routes from the main table (e.g.\ after a routing daemon crash).
1694 netadm@amber:~ # ip -4 ro flush scope global type unicast
1696 This option deserves to be put into a scriptlet \verb|routef|.
1698 This option was described in the \verb|route(8)| man page borrowed
1699 from BSD, but was never implemented in Linux.
1702 The second example flushes all IPv6 cloned routes:
1704 netadm@amber:~ # ip -6 -s -s ro flush cache
1705 3ffe:2400::220:afff:fef4:c5d1 via 3ffe:2400::220:afff:fef4:c5d1 \
1707 cache used 2 age 12sec mtu 1500 rtt 300
1708 3ffe:2400::280:adff:feb7:8034 via 3ffe:2400::280:adff:feb7:8034 \
1710 cache used 2 age 15sec mtu 1500 rtt 300
1711 3ffe:2400::280:c8ff:fe59:5bcc via 3ffe:2400::280:c8ff:fe59:5bcc \
1713 cache users 1 used 1 age 23sec mtu 1500 rtt 300
1714 3ffe:2400:0:1:2a0:ccff:fe66:1878 via 3ffe:2400:0:1:2a0:ccff:fe66:1878 \
1716 cache used 2 age 20sec mtu 1500 rtt 300
1717 3ffe:2400:0:1:a00:20ff:fe71:fb30 via 3ffe:2400:0:1:a00:20ff:fe71:fb30 \
1719 cache used 2 age 33sec mtu 1500 rtt 300
1720 ff02::1 via ff02::1 dev eth1 metric 0
1721 cache users 1 used 1 age 45sec mtu 1500 rtt 300
1723 *** Round 1, deleting 6 entries ***
1724 *** Flush is complete after 1 round ***
1725 netadm@amber:~ # ip -6 -s -s ro flush cache
1730 The third example flushes BGP routing tables after a \verb|gated|
1733 netadm@amber:~ # ip ro ls proto gated/bgp | wc
1735 netadm@amber:~ # ip -s ro f proto gated/bgp
1737 *** Round 1, deleting 1408 entries ***
1738 *** Flush is complete after 1 round ***
1739 netadm@amber:~ # ip ro f proto gated/bgp
1741 netadm@amber:~ # ip ro ls proto gated/bgp
1746 \subsection{{\tt ip route get} --- get a single route}
1747 \label{IP-ROUTE-GET}
1749 \paragraph{Abbreviations:} \verb|get|, \verb|g|.
1751 \paragraph{Description:} This command gets a single route to a destination
1752 and prints its contents exactly as the kernel sees it.
1754 \paragraph{Arguments:}
1756 \item \verb|to ADDRESS| (default)
1758 --- the destination address.
1760 \item \verb|from ADDRESS|
1762 --- the source address.
1764 \item \verb|tos TOS| or \verb|dsfield TOS|
1766 --- the Type Of Service.
1768 \item \verb|iif NAME|
1770 --- the device from which this packet is expected to arrive.
1772 \item \verb|oif NAME|
1774 --- force the output device on which this packet will be routed.
1776 \item \verb|connected|
1778 --- if no source address (option \verb|from|) was given, relookup
1779 the route with the source set to the preferred address received from the first lookup.
1780 If policy routing is used, it may be a different route.
1784 Note that this operation is not equivalent to \verb|ip route show|.
1785 \verb|show| shows existing routes. \verb|get| resolves them and
1786 creates new clones if necessary. Essentially, \verb|get|
1787 is equivalent to sending a packet along this path.
1788 If the \verb|iif| argument is not given, the kernel creates a route
1789 to output packets towards the requested destination.
1790 This is equivalent to pinging the destination
1791 with a subsequent {\tt ip route ls cache}; however, no packets are
1792 actually sent. With the \verb|iif| argument, the kernel pretends
1793 that a packet arrived from this interface and searches for
1794 a path to forward the packet.
1796 \paragraph{Output format:} This command outputs routes in the same
1797 format as \verb|ip route ls|.
1799 \paragraph{Examples:}
1801 \item Find a route to output packets to 193.233.7.82:
1803 kuznet@amber:~ $ ip route get 193.233.7.82
1804 193.233.7.82 dev eth0 src 193.233.7.65 realms inr.ac
1805 cache mtu 1500 rtt 300
1809 \item Find a route to forward packets arriving on \verb|eth0|
1810 from 193.233.7.82 and destined for 193.233.7.82:
1812 kuznet@amber:~ $ ip r g 193.233.7.82 from 193.233.7.82 iif eth0
1813 193.233.7.82 from 193.233.7.82 dev eth0 src 193.233.7.65 \
1814 realms inr.ac/inr.ac
1815 cache <src-direct,redirect> mtu 1500 rtt 300 iif eth0
1819 \label{NB-nature-of-strangeness}
1820 This is the command that created the funny route from 193.233.7.82
1821 looped back to 193.233.7.82 (cf.\ NB on~p.\pageref{NB-strange-route}).
1822 Note the \verb|redirect| flag on it.
1825 \item Find a multicast route for packets arriving on \verb|eth0|
1826 from host 193.233.7.82 and destined for multicast group 224.2.127.254
1827 (it is assumed that a multicast routing daemon is running;
1828 in this case, it is \verb|pimd|):
1830 kuznet@amber:~ $ ip r g 224.2.127.254 from 193.233.7.82 iif eth0
1831 multicast 224.2.127.254 from 193.233.7.82 dev lo \
1832 src 193.233.7.65 realms inr.ac/cosmos
1833 cache <mc> iif eth0 Oifs: eth1 pimreg
1836 This route differs from the ones seen before. It contains a ``normal'' part
1837 and a ``multicast'' part. The normal part is used to deliver (or not to
1838 deliver) the packet to local IP listeners. In this case the router is not a member
1840 of this group, so the route has no \verb|local| flag and only
1841 forwards packets. The output device for such entries is always loopback.
1842 The multicast part consists of an additional \verb|Oifs:| list showing
1843 the output interfaces.
1847 It is time for a more complicated example. Let us add an invalid
1848 gatewayed route for a destination which is really directly connected:
1850 netadm@alisa:~ # ip route add 193.233.7.98 via 193.233.7.254
1851 netadm@alisa:~ # ip route get 193.233.7.98
1852 193.233.7.98 via 193.233.7.254 dev eth0 src 193.233.7.90
1853 cache mtu 1500 rtt 3072
1856 and probe it with ping:
1858 netadm@alisa:~ # ping -n 193.233.7.98
1859 PING 193.233.7.98 (193.233.7.98) from 193.233.7.90 : 56 data bytes
1860 From 193.233.7.254: Redirect Host(New nexthop: 193.233.7.98)
1861 64 bytes from 193.233.7.98: icmp_seq=0 ttl=255 time=3.5 ms
1862 From 193.233.7.254: Redirect Host(New nexthop: 193.233.7.98)
1863 64 bytes from 193.233.7.98: icmp_seq=1 ttl=255 time=2.2 ms
1864 64 bytes from 193.233.7.98: icmp_seq=2 ttl=255 time=0.4 ms
1865 64 bytes from 193.233.7.98: icmp_seq=3 ttl=255 time=0.4 ms
1866 64 bytes from 193.233.7.98: icmp_seq=4 ttl=255 time=0.4 ms
1868 --- 193.233.7.98 ping statistics ---
1869 5 packets transmitted, 5 packets received, 0% packet loss
1870 round-trip min/avg/max = 0.4/1.3/3.5 ms
1873 What happened? Router 193.233.7.254 understood that we have a much
1874 better path to the destination and sent us an ICMP redirect message.
1875 We may retry \verb|ip route get| to see what we have in the routing table now:
1878 netadm@alisa:~ # ip route get 193.233.7.98
1879 193.233.7.98 dev eth0 src 193.233.7.90
1880 cache <redirected> mtu 1500 rtt 3072
1886 \section{{\tt ip rule} --- routing policy database management}
1889 \paragraph{Abbreviations:} \verb|rule|, \verb|ru|.
1891 \paragraph{Object:} \verb|rule|s in the routing policy database control
1892 the route selection algorithm.
1894 Classic routing algorithms used in the Internet make routing decisions
1895 based only on the destination address of packets (and in theory,
1896 but not in practice, on the TOS field). The seminal review of classic
1897 routing algorithms and their modifications can be found in~\cite{RFC1812}.
1899 In some circumstances we want to route packets differently depending not only
1900 on destination addresses, but also on other packet fields: source address,
1901 IP protocol, transport protocol ports or even packet payload.
1902 This task is called ``policy routing''.
1905 ``policy routing'' $\neq$ ``routing policy''.
1907 \noindent ``policy routing'' $=$ ``cunning routing''.
1909 \noindent ``routing policy'' $=$ ``routing tactics'' or ``routing plan''.
1912 To solve this task, the conventional destination based routing table, ordered
1913 according to the longest match rule, is replaced with a ``routing policy
1914 database'' (or RPDB), which selects routes
1915 by executing a set of rules. The rules may have many keys of different
1916 natures, so they have no natural ordering other than the one imposed
1917 by the administrator. The Linux-2.2 RPDB is a linear list of rules
1918 ordered by numeric priority value.
1919 The RPDB explicitly allows matching a few packet fields:
1922 \item packet source address.
1923 \item packet destination address.
1924 \item TOS.
1925 \item incoming interface (which is packet metadata, rather than a packet field).
1928 Matching IP protocols and transport ports is also possible,
1929 indirectly, via \verb|ipchains|, by exploiting their ability
1930 to mark some classes of packets with \verb|fwmark|. Therefore,
1931 \verb|fwmark| is also included in the set of keys checked by rules.
1933 Each policy routing rule consists of a {\em selector\/} and an {\em action\/}
1934 predicate. The RPDB is scanned in order of increasing priority value, so the rule with the lowest number is tried first. The selector
1935 of each rule is applied to \{source address, destination address, incoming
1936 interface, tos, fwmark\} and, if the selector matches the packet,
1937 the action is performed. If the action predicate returns with success,
1938 it yields either a route or a failure indication
1939 and the RPDB lookup terminates. Otherwise, the RPDB program
1940 continues with the next rule.
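The scan described above can be sketched in Python. This is only an illustration under stated assumptions, not the kernel code: the names \verb|Rule|, \verb|longest_match| and \verb|rpdb_lookup| are hypothetical, the selector covers only the source and destination prefixes, only two actions are modelled (a table lookup and a failure indication), and the gateway 192.203.80.1 is invented for the example.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

ANY = ipaddress.ip_network("0.0.0.0/0")


@dataclass
class Rule:
    priority: int
    src: ipaddress.IPv4Network = ANY   # selector: source prefix
    dst: ipaddress.IPv4Network = ANY   # selector: destination prefix
    table: Optional[str] = None        # action: routing table to look up
    fail: bool = False                 # action: return a failure indication


def longest_match(table, dst):
    """Classic longest-prefix match inside one table (prefix -> nexthop)."""
    best = None
    for prefix, nexthop in table.items():
        net = ipaddress.ip_network(prefix)
        if dst in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, nexthop)
    return None if best is None else best[1]


def rpdb_lookup(rules, tables, src, dst):
    """Scan rules by increasing priority value; stop at the first success."""
    src, dst = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for rule in sorted(rules, key=lambda r: r.priority):
        if src not in rule.src or dst not in rule.dst:
            continue                     # selector does not match
        if rule.fail:
            return "unreachable"         # success: failure indication
        nexthop = longest_match(tables.get(rule.table, {}), dst)
        if nexthop is not None:
            return nexthop               # success: a route was found
        # the action did not succeed: continue with the next rule
    return None


# Hypothetical tables; inr.ruhep holds a single default route.
tables = {
    "main": {"0.0.0.0/0": "193.233.7.254", "193.233.7.0/24": "direct"},
    "inr.ruhep": {"0.0.0.0/0": "192.203.80.1"},
}
rules = [
    Rule(32766, table="main"),
    Rule(220, src=ipaddress.ip_network("192.203.80.0/24"), table="inr.ruhep"),
]
```

With these data, \verb|rpdb_lookup(rules, tables, "192.203.80.5", "10.0.0.1")| is answered by rule 220, while packets from other sources fall through to table \verb|main|.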
1942 What is the action, semantically? The natural action is to select the
1943 nexthop and the output device. This is what
1944 Cisco IOS~\cite{IOS} does. Let us call it ``match \& set''.
1945 The Linux-2.2 approach is more flexible. The action includes
1946 lookups in destination-based routing tables and selecting
1947 a route from these tables according to the classic longest match algorithm.
1948 The ``match \& set'' approach is the simplest case of the Linux one. It is realized
1949 when a second level routing table contains a single default route.
1950 Recall that Linux-2.2 supports multiple tables
1951 managed with the \verb|ip route| command, described in the previous section.
1953 At startup time the kernel configures the default RPDB consisting of three rules:
1957 \item Priority: 0, Selector: match anything, Action: lookup routing
1958 table \verb|local| (ID 255).
1959 The \verb|local| table is a special routing table containing
1960 high priority control routes for local and broadcast addresses.
1962 Rule 0 is special. It cannot be deleted or overridden.
1965 \item Priority: 32766, Selector: match anything, Action: lookup routing
1966 table \verb|main| (ID 254).
1967 The \verb|main| table is the normal routing table containing all non-policy
1968 routes. This rule may be deleted and/or overridden with other
1969 ones by the administrator.
1971 \item Priority: 32767, Selector: match anything, Action: lookup routing
1972 table \verb|default| (ID 253).
1973 The \verb|default| table is empty. It is reserved for some
1974 post-processing if no previous default rules selected the packet.
1975 This rule may also be deleted.
1979 Do not confuse routing tables with rules: rules point to routing tables,
1980 several rules may refer to one routing table and some routing tables
1981 may have no rules pointing to them. If the administrator deletes all the rules
1982 referring to a table, the table is not used, but it still exists
1983 and will disappear only after all the routes contained in it are deleted.
1986 \paragraph{Rule attributes:} Each RPDB entry has additional
1987 attributes. For example, each rule has a pointer to some routing
1988 table. NAT and masquerading rules have an attribute selecting the new IP
1989 address to translate/masquerade to. Besides that, rules have some of the
1990 optional attributes that routes have, namely \verb|realms|.
1991 These values do not override those contained in the routing tables. They
1992 are only used if the route did not select any attributes.
1995 \paragraph{Rule types:} The RPDB may contain rules of the following types:
1998 \item \verb|unicast| --- the rule prescribes to return the route found
1999 in the routing table referenced by the rule.
2000 \item \verb|blackhole| --- the rule prescribes to silently drop the packet.
2001 \item \verb|unreachable| --- the rule prescribes to generate a ``Network
2002 is unreachable'' error.
2003 \item \verb|prohibit| --- the rule prescribes to generate
2004 a ``Communication is administratively prohibited'' error.
2005 \item \verb|nat| --- the rule prescribes to translate the source address
2006 of the IP packet into some other value. More about NAT is
2007 in Appendix~\ref{ROUTE-NAT}, p.\pageref{ROUTE-NAT}.
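The rule types listed above can be summarized as a small dispatch table. The sketch below is illustrative only: the names \verb|RuleType| and \verb|outcome| are hypothetical, and the ICMP codes are the ones conventionally associated with the two error messages (destination unreachable, codes 0 and 13); that the kernel emits exactly these codes is an assumption, not something stated in this document.

```python
from enum import Enum


class RuleType(Enum):
    UNICAST = "unicast"
    BLACKHOLE = "blackhole"
    UNREACHABLE = "unreachable"
    PROHIBIT = "prohibit"
    NAT = "nat"


# ICMP "destination unreachable" (type 3) codes, assumed for illustration:
# 0 = network unreachable, 13 = communication administratively prohibited.
ICMP_CODE = {RuleType.UNREACHABLE: 0, RuleType.PROHIBIT: 13}


def outcome(rule_type, route=None):
    """Return a (verdict, detail) pair for a matching rule of the given type."""
    rule_type = RuleType(rule_type)
    if rule_type is RuleType.UNICAST:
        return ("route", route)            # return the route found in the table
    if rule_type is RuleType.BLACKHOLE:
        return ("drop", None)              # silently discard the packet
    if rule_type in ICMP_CODE:
        return ("icmp-error", ICMP_CODE[rule_type])
    return ("translate-source", route)     # nat: rewrite the source address
```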
2011 \paragraph{Commands:} \verb|add|, \verb|delete| and \verb|show| (or \verb|list|).
2014 \subsection{{\tt ip rule add} --- insert a new rule\\
2015 {\tt ip rule delete} --- delete a rule}
2018 \paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|delete|, \verb|del|, \verb|d|.
2021 \paragraph{Arguments:}
2024 \item \verb|type TYPE| (default)
2026 --- the type of this rule. The valid types are listed under ``Rule types'' above.
2029 \item \verb|from PREFIX|
2031 --- select the source prefix to match.
2033 \item \verb|to PREFIX|
2035 --- select the destination prefix to match.
2037 \item \verb|iif NAME|
2039 --- select the incoming device to match. If the interface is loopback,
2040 the rule only matches packets originating from this host. This means that you
2041 may create separate routing tables for forwarded and local packets and,
2042 hence, completely segregate them.
2044 \item \verb|tos TOS| or \verb|dsfield TOS|
2046 --- select the TOS value to match.
2048 \item \verb|fwmark MARK|
2050 --- select the \verb|fwmark| value to match.
2052 \item \verb|priority PREFERENCE|
2054 --- the priority of this rule. Each rule should have an explicitly
2055 set {\em unique\/} priority value.
2057 In reality, for historical reasons \verb|ip rule add| does not require a
2058 priority value and allows priorities to be non-unique.
2059 If the user does not supply a priority, it is selected by the kernel.
2060 If the user creates a rule with a priority value that
2061 already exists, the kernel does not reject the request. It adds
2062 the new rule before all old rules of the same priority.
2064 This is a design mistake, nothing more. It will be fixed one day,
2065 so do not rely on this feature. Use explicit priorities.
2069 \item \verb|table TABLEID|
2071 --- the routing table identifier to look up if the rule selector matches.
2073 \item \verb|realms FROM/TO|
2075 --- Realms to select if the rule matched and the routing table lookup
2076 succeeded. Realm \verb|TO| is only used if the route did not select any realm.
2079 \item \verb|nat ADDRESS|
2081 --- The base of the IP address block to translate (for source addresses).
2082 The \verb|ADDRESS| may be either the start of the block of NAT addresses
2083 (selected by NAT routes) or, in Linux-2.2, a local host address (or even zero).
2084 In the last case the router does not translate the packets,
2085 but masquerades them to this address; this feature disappeared in 2.4.
2086 More about NAT is in Appendix~\ref{ROUTE-NAT},
2087 p.\pageref{ROUTE-NAT}.
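The ordering quirk described in the NB under \verb|priority| above (a new rule goes before all old rules of the same priority) can be illustrated with a short sketch. The function \verb|add_rule| and the rule names are hypothetical. The case where the kernel chooses the priority itself is not modelled, since this document does not say how the value is chosen.

```python
def add_rule(rpdb, priority, name):
    """Insert (priority, name) before all existing rules of equal priority."""
    pos = 0
    # Skip only rules with a strictly lower priority value, so the new
    # rule lands in front of any older rule that shares its priority.
    while pos < len(rpdb) and rpdb[pos][0] < priority:
        pos += 1
    rpdb.insert(pos, (priority, name))
    return rpdb


rpdb = [(0, "local"), (32766, "main"), (32767, "default")]
add_rule(rpdb, 220, "first")
add_rule(rpdb, 220, "second")   # same priority: ends up before "first"
```

After these two calls the rule added last is consulted first, which is exactly why explicit, unique priorities are the safer habit.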
2091 \paragraph{Warning:} Changes to the RPDB made with these commands
2092 do not become active immediately. It is assumed that after
2093 a script finishes a batch of updates, it flushes the routing cache
2094 with \verb|ip route flush cache|.
2096 \paragraph{Examples:}
2098 \item Route packets with source addresses from 192.203.80.0/24
2099 according to routing table \verb|inr.ruhep|:
2101 ip ru add from 192.203.80.0/24 table inr.ruhep prio 220
2104 \item Translate packet source address 193.233.7.83 into 192.203.80.144
2105 and route it according to table \#1 (actually, it is \verb|inr.ruhep|):
2107 ip ru add from 193.233.7.83 nat 192.203.80.144 table 1 prio 320
2110 \item Delete the unused default rule:
2112 ip ru del prio 32767
2119 \subsection{{\tt ip rule show} --- list rules}
2120 \label{IP-RULE-SHOW}
2122 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
2125 \paragraph{Arguments:} Good news, this is one command that has no arguments.
2127 \paragraph{Output format:}
2130 kuznet@amber:~ $ ip ru ls
2131 0: from all lookup local
2132 200: from 192.203.80.0/24 to 193.233.7.0/24 lookup main
2133 210: from 192.203.80.0/24 to 192.203.80.0/24 lookup main
2134 220: from 192.203.80.0/24 lookup inr.ruhep realms inr.ruhep/radio-msu
2135 300: from 193.233.7.83 to 193.233.7.0/24 lookup main
2136 310: from 193.233.7.83 to 192.203.80.0/24 lookup main
2137 320: from 193.233.7.83 lookup inr.ruhep map-to 192.203.80.144
2138 32766: from all lookup main
2142 In the first column is the rule priority value followed
2143 by a colon. Then the selectors follow. Each key is prefixed
2144 with the same keyword that was used to create the rule.
2146 The keyword \verb|lookup| is followed by a routing table identifier,
2147 as it is recorded in the file \verb|/etc/iproute2/rt_tables|.
2149 If the rule does NAT (e.g.\ rule \#320), it is shown by the keyword
2150 \verb|map-to| followed by the start of the block of addresses to map.
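For scripts that need to consume this listing, the format described above is regular enough to parse line by line. The following is a minimal sketch, assuming every key is printed as a keyword followed by one value, as in the sample output; the function \verb|parse_rule| and the keyword list are hypothetical and cover only the keywords that appear in this document.

```python
import re

RULE_RE = re.compile(r"^(?P<prio>\d+):\s+(?P<rest>.*)$")
KEYWORDS = ("from", "to", "iif", "tos", "fwmark", "lookup", "realms", "map-to")


def parse_rule(line):
    """Split one line of rule listing output into a dictionary."""
    m = RULE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a rule line: {line!r}")
    rule = {"priority": int(m.group("prio"))}
    tokens = m.group("rest").split()
    if len(tokens) % 2:
        raise ValueError(f"dangling token in: {line!r}")
    # The selectors and attributes come as "keyword value" pairs.
    for key, value in zip(tokens[0::2], tokens[1::2]):
        if key not in KEYWORDS:
            raise ValueError(f"unknown keyword {key!r}")
        rule[key] = value
    return rule
```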
2152 The meaning of this example is pretty simple. The prefixes
2153 192.203.80.0/24 and 193.233.7.0/24 form the internal network, but
2154 they are routed differently when the packets leave it.
2155 Besides that, the host 193.233.7.83 is translated
2156 to look like 192.203.80.144 when talking to the outside world.
2161 \section{{\tt ip maddress} --- multicast addresses management}
2164 \paragraph{Object:} \verb|maddress| objects are multicast addresses.
2166 \paragraph{Commands:} \verb|add|, \verb|delete|, \verb|show| (or \verb|list|).
2168 \subsection{{\tt ip maddress show} --- list multicast addresses}
2170 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
2172 \paragraph{Arguments:}
2176 \item \verb|dev NAME| (default)
2178 --- the device name.
2182 \paragraph{Output format:}
2185 kuznet@alisa:~ $ ip maddr ls dummy
2187 link 33:33:00:00:00:01
2188 link 01:00:5e:00:00:01
2189 inet 224.0.0.1 users 2
2194 The first line of the output shows the interface index and its name.
2195 Then the multicast address list follows. Each line starts with the
2196 protocol identifier. The word \verb|link| denotes a link layer
2197 multicast address.
2199 If a multicast address has more than one user, the number
2200 of users is shown after the \verb|users| keyword.
2202 One additional feature not present in the example above
2203 is the \verb|static| flag, which indicates that the address was joined
2204 with \verb|ip maddr add|. See the following subsection.
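The two \verb|link| entries in the example are what the standard Ethernet mappings produce for the IPv4 all-hosts group 224.0.0.1 and the IPv6 all-nodes group ff02::1; that this is how they got onto the interface is our reading of the output, not something the listing states. The mapping itself can be sketched as follows (the function name \verb|multicast_mac| is hypothetical, and only Ethernet-style 48-bit addresses are covered).

```python
import ipaddress


def multicast_mac(group):
    """Map an IP multicast group to its Ethernet link layer address."""
    addr = ipaddress.ip_address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    raw = addr.packed
    if addr.version == 4:
        # 01:00:5e followed by the low 23 bits of the group address
        octets = bytes([0x01, 0x00, 0x5E, raw[1] & 0x7F, raw[2], raw[3]])
    else:
        # 33:33 followed by the low 32 bits of the group address
        octets = bytes([0x33, 0x33]) + raw[-4:]
    return ":".join(f"{b:02x}" for b in octets)
```

Because only 23 bits of an IPv4 group survive the mapping, several groups share one link layer address, which may explain a \verb|users| count greater than one.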
2208 \subsection{{\tt ip maddress add} --- add a multicast address\\
2209 {\tt ip maddress delete} --- delete a multicast address}
2211 \paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|delete|, \verb|del|, \verb|d|.
2213 \paragraph{Description:} these commands attach/detach
2214 a static link layer multicast address to listen on the interface.
2215 Note that it is impossible to join protocol multicast groups
2216 statically. This command only manages link layer addresses.
2219 \paragraph{Arguments:}
2222 \item \verb|address LLADDRESS| (default)
2224 --- the link layer multicast address.
2226 \item \verb|dev NAME|
2228 --- the device to join/leave this multicast address.
2233 \paragraph{Example:} Let us continue with the example from the previous subsection.
2236 netadm@alisa:~ # ip maddr add 33:33:00:00:00:01 dev dummy
2237 netadm@alisa:~ # ip -0 maddr ls dummy
2239 link 33:33:00:00:00:01 users 2 static
2240 link 01:00:5e:00:00:01
2241 netadm@alisa:~ # ip maddr del 33:33:00:00:00:01 dev dummy
2245 Neither \verb|ip| nor the kernel checks multicast addresses for validity.
2246 In particular, this means that you can try to load a unicast address
2247 instead of a multicast address. Most drivers will ignore such addresses,
2248 but several (e.g.\ Tulip) will load them into their on-board filter.
2249 The effects may be strange. Namely, the addresses become additional
2250 local link addresses and, if you load the address of another host
2251 into the router, expect duplicated packets on the wire.
2252 It is not a bug, but rather a hole in the API and intra-kernel interfaces.
2253 This feature is really more useful for traffic monitoring, but when using it
2254 with Linux-2.2 you {\em have to\/} be sure that the host is not
2255 a router and, especially, that it is not a transparent proxy or masquerading agent.
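Since nothing checks the address for you, a wrapper script may want to do it before calling \verb|ip maddr add|. A minimal sketch, assuming Ethernet-style 48-bit addresses written as six colon-separated hex octets; the function name \verb|is_multicast_lladdr| is hypothetical.

```python
def is_multicast_lladdr(lladdr):
    """True if the group bit (lowest bit of the first octet) is set."""
    octets = [int(part, 16) for part in lladdr.split(":")]
    if len(octets) != 6 or any(not 0 <= o <= 0xFF for o in octets):
        raise ValueError(f"not a 48-bit link layer address: {lladdr!r}")
    return bool(octets[0] & 0x01)
```

Both addresses from the example above pass this check, whereas a unicast address such as 02:00:00:00:00:01 (made up for illustration) does not.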
2261 \section{{\tt ip mroute} --- multicast routing cache management}
2264 \paragraph{Abbreviations:} \verb|mroute|, \verb|mr|.
2266 \paragraph{Object:} \verb|mroute| objects are multicast routing cache
2267 entries created by a user level mrouting daemon
2268 (e.g.\ \verb|pimd| or \verb|mrouted|).
2270 Due to the limitations of the current interface to the multicast routing
2271 engine, it is impossible to change \verb|mroute| objects administratively,
2272 so we may only display them. This limitation will be removed
2275 \paragraph{Commands:} \verb|show| (or \verb|list|).
2278 \subsection{{\tt ip mroute show} --- list mroute cache entries}
2280 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
2282 \paragraph{Arguments:}
2285 \item \verb|to PREFIX| (default)
2287 --- the prefix selecting the destination multicast addresses to list.
2290 \item \verb|iif NAME|
2292 --- the interface on which multicast packets are received.
2295 \item \verb|from PREFIX|
2297 --- the prefix selecting the IP source addresses of the multicast route.
2302 \paragraph{Output format:}
2305 kuznet@amber:~ $ ip mroute ls
2306 (193.232.127.6, 224.0.1.39) Iif: unresolved
2307 (193.232.244.34, 224.0.1.40) Iif: unresolved
2308 (193.233.7.65, 224.66.66.66) Iif: eth0 Oifs: pimreg
2312 Each line shows one (S,G) entry in the multicast routing cache,
2313 where S is the source address and G is the multicast group. \verb|Iif| is
2314 the interface on which multicast packets are expected to arrive.
2315 If the word \verb|unresolved| is there instead of the interface name,
2316 it means that the routing daemon still hasn't resolved this entry.
2317 The keyword \verb|Oifs| is followed by a list of output interfaces, separated
2318 by spaces. If a multicast routing entry is created with non-trivial
2319 TTL scope, administrative distances are appended to the device names
2320 in the \verb|Oifs| list.
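The (S,G) lines described above are easy to pick apart in a script. The sketch below is an assumption-laden illustration: \verb|parse_mroute| is a hypothetical helper, it handles only the layout shown in the sample output, and it does not interpret administrative distances appended to device names.

```python
import re

MROUTE_RE = re.compile(
    r"^\((?P<src>[^,]+),\s*(?P<grp>[^)]+)\)\s+Iif:\s+(?P<iif>\S+)"
    r"(?:\s+Oifs:\s+(?P<oifs>.*))?$"
)


def parse_mroute(line):
    """Parse one (S,G) line of multicast routing cache output."""
    m = MROUTE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not an mroute line: {line!r}")
    iif = m.group("iif")
    return {
        "source": m.group("src"),
        "group": m.group("grp"),
        # None marks an entry the routing daemon has not resolved yet
        "iif": None if iif == "unresolved" else iif,
        "oifs": (m.group("oifs") or "").split(),
    }
```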
2322 \paragraph{Statistics:} The \verb|-statistics| option also prints the
2323 number of packets and bytes forwarded along this route and
2324 the number of packets that arrived on the wrong interface, if this number is not zero.
2327 kuznet@amber:~ $ ip -s mr ls 224.66/16
2328 (193.233.7.65, 224.66.66.66) Iif: eth0 Oifs: pimreg
2329 9383 packets, 300256 bytes
2334 \section{{\tt ip tunnel} --- tunnel configuration}
2337 \paragraph{Abbreviations:} \verb|tunnel|, \verb|tunl|.
2339 \paragraph{Object:} \verb|tunnel| objects are tunnels, encapsulating
2340 packets in IPv4 packets and then sending them over the IP infrastructure.
2342 \paragraph{Commands:} \verb|add|, \verb|delete|, \verb|change|, \verb|show| (or \verb|list|).
2345 \paragraph{See also:} A more informal discussion of tunneling
2346 over IP and the \verb|ip tunnel| command can be found in~\cite{IP-TUNNELS}.
2348 \subsection{{\tt ip tunnel add} --- add a new tunnel\\
2349 {\tt ip tunnel change} --- change an existing tunnel\\
2350 {\tt ip tunnel delete} --- destroy a tunnel}
2352 \paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
2353 \verb|delete|, \verb|del|, \verb|d|.
2356 \paragraph{Arguments:}
2360 \item \verb|name NAME| (default)
2362 --- select the tunnel device name.
2364 \item \verb|mode MODE|
2366 --- set the tunnel mode. Three modes are currently available:
2367 \verb|ipip|, \verb|sit| and \verb|gre|.
2369 \item \verb|remote ADDRESS|
2371 --- set the remote endpoint of the tunnel.
2373 \item \verb|local ADDRESS|
2375 --- set the fixed local address for tunneled packets.
2376 It must be an address on another interface of this host.
2378 \item \verb|ttl N|
2380 --- set a fixed TTL \verb|N| on tunneled packets.
2381 \verb|N| is a number in the range 1--255. 0 is a special value
2382 meaning that packets inherit the TTL value.
2383 The default value is: \verb|inherit|.
2385 \item \verb|tos T| or \verb|dsfield T|
2387 --- set a fixed TOS \verb|T| on tunneled packets.
2388 The default value is: \verb|inherit|.
2392 \item \verb|dev NAME|
2394 --- bind the tunnel to the device \verb|NAME| so that
2395 tunneled packets will only be routed via this device and will
2396 not be able to escape to another device when the route to the endpoint changes.
2398 \item \verb|nopmtudisc|
2400 --- disable Path MTU Discovery on this tunnel.
2401 It is enabled by default. Note that a fixed TTL is incompatible
2402 with this option: tunnelling with a fixed TTL always performs PMTU discovery.
2404 \item \verb|key K|, \verb|ikey K|, \verb|okey K|
2406 --- (only GRE tunnels) use keyed GRE with key \verb|K|. \verb|K| is
2407 either a number or an IP address-like dotted quad.
2408 The \verb|key| parameter sets the key to use in both directions.
2409 The \verb|ikey| and \verb|okey| parameters set different keys for input and output.
2412 \item \verb|csum|, \verb|icsum|, \verb|ocsum|
2414 --- (only GRE tunnels) generate/require checksums for tunneled packets.
2415 The \verb|ocsum| flag calculates checksums for outgoing packets.
2416 The \verb|icsum| flag requires that all input packets have the correct
2417 checksum. The \verb|csum| flag is equivalent to the combination
2418 ``\verb|icsum| \verb|ocsum|''.
2420 \item \verb|seq|, \verb|iseq|, \verb|oseq|
2422 --- (only GRE tunnels) serialize packets.
2423 The \verb|oseq| flag enables sequencing of outgoing packets.
2424 The \verb|iseq| flag requires that all input packets are serialized.
2425 The \verb|seq| flag is equivalent to the combination ``\verb|iseq| \verb|oseq|''.
2428 I think this option does not
2429 work. At least, I did not test it, did not debug it and
2430 do not even understand how it is supposed to work or for what
2431 purpose Cisco planned to use it. Do not use it.
2437 \paragraph{Example:} Create a point-to-point IPv6 tunnel with a maximum TTL of 32.
2439 netadm@amber:~ # ip tunl add Cisco mode sit remote 192.31.7.104 \
2440 local 192.203.80.142 ttl 32
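The constraints stated in the argument list (three modes, TTL range, the conflict between \verb|nopmtudisc| and a fixed TTL, GRE-only options) can be collected into a small pre-flight check. This is a sketch of the rules as documented here, with a hypothetical function \verb|check_tunnel|; it makes no claim about which of these combinations \verb|ip| itself rejects.

```python
VALID_MODES = ("ipip", "sit", "gre")
GRE_ONLY = ("key", "ikey", "okey", "csum", "icsum", "ocsum",
            "seq", "iseq", "oseq")


def check_tunnel(mode, ttl="inherit", **options):
    """Return a list of problems found in a tunnel parameter set."""
    problems = []
    if mode not in VALID_MODES:
        problems.append(f"unknown mode {mode!r}")
    inherit = ttl in ("inherit", 0)        # 0 means: inherit the TTL
    if not inherit and not (isinstance(ttl, int) and 1 <= ttl <= 255):
        problems.append("ttl must be 1..255, or 0/inherit")
    if not inherit and options.get("nopmtudisc"):
        problems.append("nopmtudisc is incompatible with a fixed ttl")
    if mode != "gre":
        for name in GRE_ONLY:
            if name in options:
                problems.append(f"{name} applies only to GRE tunnels")
    return problems
```

The parameters of the example above pass cleanly: \verb|check_tunnel("sit", ttl=32)| returns an empty list.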
2443 \subsection{{\tt ip tunnel show} --- list tunnels}
2445 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
2448 \paragraph{Arguments:} None.
2450 \paragraph{Output format:}
2452 kuznet@amber:~ $ ip tunl ls Cisco
2453 Cisco: ipv6/ip remote 192.31.7.104 local 192.203.80.142 ttl 32
2456 The line starts with the tunnel device name followed by a colon.
2457 Then the tunnel mode follows. The parameters of the tunnel are listed
2458 with the same keywords that were used when creating the tunnel.
2460 \paragraph{Statistics:}
2463 kuznet@amber:~ $ ip -s tunl ls Cisco
2464 Cisco: ipv6/ip remote 192.31.7.104 local 192.203.80.142 ttl 32
2465 RX: Packets Bytes Errors CsumErrs OutOfSeq Mcasts
2466 12566 1707516 0 0 0 0
2467 TX: Packets Bytes Errors DeadLoop NoRoute NoBufs
2468 13445 1879677 0 0 0 0
2471 Essentially, these numbers are the same as the numbers
2472 printed with {\tt ip -s link show}
2473 (sec.\ref{IP-LINK-SHOW}, p.\pageref{IP-LINK-SHOW}) but the tags are different
2474 to reflect that they are tunnel specific.
2476 \item \verb|CsumErrs| --- the total number of packets dropped
2477 because of checksum failures for a GRE tunnel with checksumming enabled.
2478 \item \verb|OutOfSeq| --- the total number of packets dropped
2479 because they arrived out of sequence for a GRE tunnel with
2480 serialization enabled.
2481 \item \verb|Mcasts| --- the total number of multicast packets
2482 received on a broadcast GRE tunnel.
2483 \item \verb|DeadLoop| --- the total number of packets which were not
2484 transmitted because the tunnel is looped back to itself.
2485 \item \verb|NoRoute| --- the total number of packets which were not
2486 transmitted because there is no IP route to the remote endpoint.
2487 \item \verb|NoBufs| --- the total number of packets which were not
2488 transmitted because the kernel failed to allocate a buffer.
2492 \section{{\tt ip monitor} and {\tt rtmon} --- state monitoring}
2495 The \verb|ip| utility can monitor the state of devices, addresses
2496 and routes continuously. This option has a slightly different format:
2498 the \verb|monitor| command comes first on the command line, and then
2499 the object list follows:
2501 ip monitor [ file FILE ] [ all | OBJECT-LIST ]
2503 \verb|OBJECT-LIST| is the list of object types that we want to monitor.
2504 It may contain \verb|link|, \verb|address| and \verb|route|.
2505 If no \verb|file| argument is given, \verb|ip| opens RTNETLINK,
2506 listens on it and dumps state changes in the format described
2507 in previous sections.
2509 If a file name is given, it does not listen on RTNETLINK,
2510 but opens the file containing RTNETLINK messages saved in binary format
2511 and dumps them. Such a history file can be generated with the
2512 \verb|rtmon| utility. This utility has a command line syntax similar to \verb|ip monitor|.
2514 Ideally, \verb|rtmon| should be started before
2515 the first network configuration command is issued. For example, if you insert
2518 rtmon file /var/log/rtmon.log
2520 in a startup script, you will be able to view the full history
2523 Certainly, it is possible to start \verb|rtmon| at any time.
2524 It prepends the history with the state snapshot dumped at the moment of starting.
2528 \section{Route realms and policy propagation, {\tt rtacct}}
2531 On routers using OSPF ASE or, especially, the BGP protocol, routing
2532 tables may be huge. If we want to classify or to account for the packets
2533 per route, we will have to keep lots of information. Even worse, if we
2534 want to distinguish the packets not only by their destination, but
2535 also by their source, the task gets quadratic complexity and its solution
2536 is physically impossible.
2538 One approach to propagating the policy from routing protocols
2539 to the forwarding engine has been proposed in~\cite{IOS-BGP-PP}.
2540 Essentially, Cisco Policy Propagation via BGP is based on the fact
2541 that dedicated routers all have the RIB (Routing Information Base)
2542 close to the forwarding engine, so policy routing rules can
2543 check all the route attributes, including ASPATH information
2544 and community strings.
2546 The Linux architecture, splitting the RIB (maintained by a user level
2547 daemon) and the kernel based FIB (Forwarding Information Base),
2548 does not allow such a simple approach.
2550 Fortunately, there is another solution
2551 which allows even more flexible policy and richer semantics.
2553 Namely, routes can be clustered together in user space, based on their
2554 attributes. For example, a BGP router knows a route's ASPATH and its community;
2555 an OSPF router knows the route tag or its area. The administrator, when adding
2556 routes manually, also knows their nature. Provided that the number of such
2557 aggregates (we call them {\em realms\/}) is low, the task of full
2558 classification both by source and destination becomes quite manageable.
2560 So each route may be assigned to a realm. It is assumed that
2561 this identification is made by a routing daemon, but static routes
2562 can also be handled manually with \verb|ip route| (see sec.\ref{IP-ROUTE},
2563 p.\pageref{IP-ROUTE}).
2565 There is a patch to \verb|gated|, allowing classification of routes
2566 to realms with the full set of policy rules implemented in \verb|gated|:
2567 by prefix, by ASPATH, by origin, by tag etc.
2570 To facilitate the construction (e.g.\ in case the routing
2571 daemon is not aware of realms), missing realms may be completed
2572 with routing policy rules, see sec.~\ref{IP-RULE}, p.\pageref{IP-RULE}.
2574 For each packet the kernel calculates a tuple of realms: source realm
2575 and destination realm, using the following algorithm:
2578 \item If the route has a realm, the destination realm of the packet is set to it.
2579 \item If the rule has a source realm, the source realm of the packet is set to it.
2580 If the destination realm was not inherited from the route and the rule has a destination realm, the destination realm of the packet is set to it.
2582 \item If at least one of the realms is still unknown, the kernel finds
2583 the reversed route to the source of the packet.
2584 \item If the source realm is still unknown, get it from the reversed route.
2585 \item If one of the realms is still unknown, swap the realms of reversed
2586 routes and apply step 2 again.
2589 After this procedure is completed we know what realm the packet
2590 arrived from and the realm where it is going to propagate to.
2591 If some of the realms are unknown, they are initialized to zero
2592 (or realm \verb|unknown|).
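The five steps above can be written down as a short function. This is a sketch of one reading of the algorithm, not the kernel code: \verb|packet_realms| and \verb|fill_from_rule| are hypothetical names, realms are plain integers with 0 for unknown, and two points are assumptions on our part. First, when step 2 is applied again in step 5, realms that are already known are not overwritten. Second, ``swap the realms'' is taken to mean that the FROM and TO realms of the rule matched by the reverse lookup change places.

```python
UNKNOWN = 0


def fill_from_rule(src, dst, rule_realms):
    """Step 2: take realms from a rule's (FROM, TO) pair where still unknown."""
    rule_from, rule_to = rule_realms
    return (src or rule_from, dst or rule_to)


def packet_realms(route_realm, rule_realms, rev_route_realm, rev_rule_realms):
    """Return (source_realm, destination_realm) for a packet.

    route_realm / rule_realms come from the forward lookup,
    rev_route_realm / rev_rule_realms from the reversed route.
    """
    src, dst = UNKNOWN, route_realm                     # step 1
    src, dst = fill_from_rule(src, dst, rule_realms)    # step 2
    if src == UNKNOWN or dst == UNKNOWN:                # step 3: reverse lookup
        src = src or rev_route_realm                    # step 4
        if src == UNKNOWN or dst == UNKNOWN:            # step 5
            rev_from, rev_to = rev_rule_realms
            src, dst = fill_from_rule(src, dst, (rev_to, rev_from))
    return src, dst
```

Anything still equal to \verb|UNKNOWN| at the end corresponds to realm \verb|unknown| in the text.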
2594 The main application of realms is the TC \verb|route| classifier~\cite{TC-CREF},
2595 where they are used to help assign packets to traffic classes,
2596 to account, police and schedule them according to this classification.
2599 A much simpler but still very useful application is incoming packet
2600 accounting by realms. The kernel gathers a packet statistics summary
2601 which can be viewed with the \verb|rtacct| utility.
2603 kuznet@amber:~ $ rtacct russia
2604 Realm BytesTo PktsTo BytesFrom PktsFrom
2605 russia 20576778 169176 47080168 153805
2608 This shows that this router received 153805 packets from
2609 the realm \verb|russia| and forwarded 169176 packets to \verb|russia|.
2610 The realm \verb|russia| consists of routes with ASPATHs not leaving
2613 Note that locally originating packets are not accounted here;
2614 \verb|rtacct| shows incoming packets only. Using the \verb|route|
2615 classifier (see~\cite{TC-CREF}) you can get even more detailed
2616 accounting information about outgoing packets, optionally
2617 summarizing traffic not only by source or destination, but
2618 by any pair of source and destination realms.
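For feeding the per-realm counters into a billing or graphing script, the table printed by \verb|rtacct| can be read as follows. This is a minimal sketch, assuming the column layout and plain integer counters of the sample above; \verb|parse_rtacct| is a hypothetical helper and other versions of the utility may format numbers differently.

```python
def parse_rtacct(text):
    """Turn the per-realm accounting table into {realm: {column: value}}."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    table = {}
    for row in rows:
        realm, counters = row[0], [int(v) for v in row[1:]]
        # header[0] is the "Realm" column itself, the rest are counters
        table[realm] = dict(zip(header[1:], counters))
    return table


sample = """\
Realm      BytesTo    PktsTo     BytesFrom  PktsFrom
russia     20576778   169176     47080168   153805
"""
stats = parse_rtacct(sample)
```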
2621 \begin{thebibliography}{99}
2622 \addcontentsline{toc}{section}{References}
2623 \bibitem{RFC-NDISC} T.~Narten, E.~Nordmark, W.~Simpson.
2624 ``Neighbor Discovery for IP Version 6 (IPv6)'', RFC-2461.
2626 \bibitem{RFC-ADDRCONF} S.~Thomson, T.~Narten.
2627 ``IPv6 Stateless Address Autoconfiguration'', RFC-2462.
2629 \bibitem{RFC1812} F.~Baker.
2630 ``Requirements for IP Version 4 Routers'', RFC-1812.
2632 \bibitem{RFC1122} R.~T.~Braden.
2633 ``Requirements for Internet hosts --- communication layers'', RFC-1122.
2635 \bibitem{IOS} ``Cisco IOS Release 12.0 Network Protocols
2636 Command Reference, Part 1'' and
2637 ``Cisco IOS Release 12.0 Quality of Service Solutions
2638 Configuration Guide: Configuring Policy-Based Routing'',\\
2639 http://www.cisco.com/univercd/cc/td/doc/product/software/ios120.
2641 \bibitem{IP-TUNNELS} A.~N.~Kuznetsov.
2642 ``Tunnels over IP in Linux-2.2'', \\
2643 In: {\tt ftp://ftp.inr.ac.ru/ip-routing/iproute2-current.tar.gz}.
2645 \bibitem{TC-CREF} A.~N.~Kuznetsov. ``TC Command Reference'',\\
2646 In: {\tt ftp://ftp.inr.ac.ru/ip-routing/iproute2-current.tar.gz}.
2648 \bibitem{IOS-BGP-PP} ``Cisco IOS Release 12.0 Quality of Service Solutions
2649 Configuration Guide: Configuring QoS Policy Propagation via
2650 Border Gateway Protocol'',\\
2651 http://www.cisco.com/univercd/cc/td/doc/product/software/ios120.
2653 \bibitem{RFC-DHCP} R.~Droms.
``Dynamic Host Configuration Protocol'', RFC-2131.
2656 \end{thebibliography}
2662 \addcontentsline{toc}{section}{Appendix}
2664 \section{Source address selection}
2667 When a host creates an IP packet, it must select some source
2668 address. Correct source address selection is a critical procedure,
2669 because it gives the receiver the information needed to deliver a
reply. If the source is selected incorrectly, in the best case
the backward path may differ from the forward one, which
harms performance. In the worst case, when the addresses
are administratively scoped, the reply may be lost entirely.
2675 Linux-2.2 selects source addresses using the following algorithm:
The application may select a source address explicitly with the \verb|bind(2)|
syscall or by supplying it to \verb|sendmsg(2)| via the ancillary data object
2681 \verb|IP_PKTINFO|. In this case the kernel only checks the validity
2682 of the address and never tries to ``improve'' an incorrect user choice,
2683 generating an error instead.
2685 Never say ``Never''. The sysctl option \verb|ip_dynaddr| breaks
this axiom. It was introduced deliberately so that the address
is reselected automatically on hosts with dynamic dial-out interfaces.
2688 However, this hack {\em must not\/} be used on multihomed hosts
2689 and especially on routers: it would break them.
2693 \item Otherwise, IP routing tables can contain an explicit source
2694 address hint for this destination. The hint is set with the \verb|src| parameter
2695 to the \verb|ip route| command, sec.\ref{IP-ROUTE}, p.\pageref{IP-ROUTE}.
2698 \item Otherwise, the kernel searches through the list of addresses
2699 attached to the interface through which the packets will be routed.
2700 The search strategies are different for IP and IPv6. Namely:
2703 \item IPv6 searches for the first valid, not deprecated address
2704 with the same scope as the destination.
\item IP searches for the first valid address with a scope wider
than the scope of the destination, but it prefers addresses
that fall into the same subnet as the nexthop of the route
2709 to the destination. Unlike IPv6, the scopes of IPv4 destinations
2710 are not encoded in their addresses but are supplied
2711 in routing tables instead (the \verb|scope| parameter to the \verb|ip route| command,
2712 sec.\ref{IP-ROUTE}, p.\pageref{IP-ROUTE}).
2717 \item Otherwise, if the scope of the destination is \verb|link| or \verb|host|,
2718 the algorithm fails and returns a zero source address.
2720 \item Otherwise, all interfaces are scanned to search for an address
2721 with an appropriate scope. The loopback device \verb|lo| is always the first
2722 in the search list, so that if an address with global scope (not 127.0.0.1!)
2723 is configured on loopback, it is always preferred.
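
The IPv4 search through the addresses of the outgoing interface can be
illustrated with a small \verb|bash| sketch. It is not the kernel code.
The function names \verb|ip2int|, \verb|in_subnet| and \verb|select_src|
and the \verb|ADDR/LEN:SCOPE| argument format are invented for this
illustration, every listed address is assumed to be valid, and scopes are
given numerically (0 for global, 253 for link, 254 for host), so that
a wider scope is a smaller number.

```shell
#!/bin/bash
# Illustrative sketch only: a user-space model of the IPv4 interface search.

# Convert a dotted quad to a 32-bit integer.
ip2int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if address $1 lies inside prefix $2 (written as ADDR/LEN).
in_subnet() {
    local len=${2#*/}
    local mask=$(( len == 0 ? 0 : (0xffffffff << (32 - len)) & 0xffffffff ))
    (( ($(ip2int "$1") & mask) == ($(ip2int "${2%/*}") & mask) ))
}

# select_src DST_SCOPE NEXTHOP ADDR/LEN:SCOPE ...
# Prints the chosen source address; prints nothing and fails if none fits.
# NEXTHOP may be empty when the route has no gateway.
select_src() {
    local dst_scope=$1 nexthop=$2 entry pfx scope first=""
    shift 2
    for entry in "$@"; do
        pfx=${entry%:*}
        scope=${entry#*:}
        (( scope <= dst_scope )) || continue    # scope is too narrow
        if [ -n "$nexthop" ] && in_subnet "$nexthop" "$pfx"; then
            echo "${pfx%/*}"                    # same subnet as the nexthop
            return 0
        fi
        [ -n "$first" ] || first=${pfx%/*}      # remember the first candidate
    done
    [ -n "$first" ] && echo "$first"
}

# Two global addresses on the interface; the nexthop is in the second subnet.
select_src 0 193.233.7.65 192.203.80.10/24:0 193.233.7.90/24:0
# prints 193.233.7.90
```

The sketch covers only the per-interface step. The explicit choice made by
the application, the \verb|src| hint in the route and the final scan over
all interfaces are left out.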
2728 \section{Proxy ARP/NDISC}
2731 Routers may answer ARP/NDISC solicitations on behalf of other hosts.
2732 In Linux-2.2 proxy ARP on an interface may be enabled
2733 by setting the kernel \verb|sysctl| variable
2734 \verb|/proc/sys/net/ipv4/conf/<dev>/proxy_arp| to 1. After this, the router
2735 starts to answer ARP requests on the interface \verb|<dev>|, provided
2736 the route to the requested destination does {\em not\/} go back via the same
2739 The variable \verb|/proc/sys/net/ipv4/conf/all/proxy_arp| enables proxy
2740 ARP on all the IP devices.
2742 However, this approach fails in the case of IPv6 because the router
2743 must join the solicited node multicast address to listen for the corresponding
2744 NDISC queries. It means that proxy NDISC is possible only on a per destination
2747 Logically, proxy ARP/NDISC is not a kernel task. It can easily be implemented
2748 in user space. However, similar functionality was present in BSD kernels
2749 and in Linux-2.0, so we have to preserve it at least to the extent that
2750 is standardized in BSD.
2752 Linux-2.0 ARP had a feature called {\em subnet\/} proxy ARP.
2753 It is replaced with the sysctl flag in Linux-2.2.
2757 The \verb|ip| utility provides a way to manage proxy ARP/NDISC
2758 with the \verb|ip neigh| command, namely:
\begin{verbatim}
    ip neigh add proxy ADDRESS [ dev NAME ]
\end{verbatim}
adds a new proxy ARP/NDISC record and
\begin{verbatim}
    ip neigh del proxy ADDRESS [ dev NAME ]
\end{verbatim}
deletes it.
2768 If the name of the device is not given, the router will answer solicitations
2769 for address \verb|ADDRESS| on all devices, otherwise it will only serve
2770 the device \verb|NAME|. Even if the proxy entry is created with
2771 \verb|ip neigh|, the router {\em will not\/} answer a query if the route
2772 to the destination goes back via the interface from which the solicitation
2775 It is important to emphasize that proxy entries have {\em no\/}
2776 parameters other than these (IP/IPv6 address and optional device).
2777 Particularly, the entry does not store any link layer address.
2778 It always advertises the station address of the interface
on which it sends advertisements (i.e.\ its own station address).
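
The two ways of enabling proxy ARP described in this section can be
written down as a short \verb|bash| sketch. The helper names \verb|run|,
\verb|proxy_all| and \verb|proxy_one| and the \verb|DRY_RUN| variable are
hypothetical. The sketch assumes that the \verb|sysctl| utility is
installed, and the device and address are examples only. With
\verb|DRY_RUN=1| the commands are printed instead of being executed, so
the sketch can be tried without root privileges.

```shell
#!/bin/bash
# Illustrative sketch only: helper names and DRY_RUN are not part of iproute2.

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# proxy_all DEV: answer ARP on DEV for every destination that is
# routed via some other interface (the sysctl approach).
proxy_all() {
    run sysctl -w "net.ipv4.conf.$1.proxy_arp=1"
}

# proxy_one ADDRESS [DEV]: publish a single proxy ARP/NDISC record.
# Without DEV the record applies to all devices.
proxy_one() {
    if [ -n "$2" ]; then
        run ip neigh add proxy "$1" dev "$2"
    else
        run ip neigh add proxy "$1"
    fi
}
```

For example, \verb|DRY_RUN=1 proxy_one 193.233.7.90 eth0| prints the
\verb|ip neigh| command that would be run.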
2781 \section{Route NAT status}
2784 NAT (or ``Network Address Translation'') remaps some parts
2785 of the IP address space into other ones. Linux-2.2 route NAT is supposed
2786 to be used to facilitate policy routing by rewriting addresses
2787 to other routing domains or to help while renumbering sites
2790 \paragraph{What it is not:}
2791 It is necessary to emphasize that {\em it is not supposed\/}
2792 to be used to compress address space or to split load.
2793 This is not missing functionality but a design principle.
2794 Route NAT is {\em stateless\/}. It does not hold any state
2795 about translated sessions. This means that it handles any number
2796 of sessions flawlessly. But it also means that it is {\em static\/}.
2797 It cannot detect the moment when the last TCP client stops
2798 using an address. For the same reason, it will not help to split
2799 load between several servers.
2801 It is a pretty commonly held belief that it is useful to split load between
2802 several servers with NAT. This is a mistake. All you get from this
2803 is the requirement that the router keep the state of all the TCP connections
2804 going via it. Well, if the router is so powerful, run apache on it. 8)
2807 The second feature: it does not touch packet payload,
2808 does not try to ``improve'' broken protocols by looking
2809 through its data and mangling it. It mangles IP addresses,
2810 only IP addresses and nothing but IP addresses.
This, too, is not missing functionality.
To summarize: if you need to compress address space or keep
2814 active FTP clients happy, your choice is not route NAT but masquerading,
2815 port forwarding, NAPT etc.
2817 By the way, you may also want to look at
2818 http://www.suse.com/\~mha/HyperNews/get/linux-ip-nat.html
2822 \paragraph{How it works.}
2823 Some part of the address space is reserved for dummy addresses
2824 which will look for all the world like some host addresses
2825 inside your network. No other hosts may use these addresses,
2826 however other routers may also be configured to translate them.
2828 A great advantage of route NAT is that it may be used not
2829 only in stub networks but in environments with arbitrarily complicated
2830 structure. It does not firewall, it {\em forwards.}
2832 These addresses are selected by the \verb|ip route| command
(sec.\ref{IP-ROUTE-ADD}, p.\pageref{IP-ROUTE-ADD}). For example,
\begin{verbatim}
    ip route add nat 192.203.80.144 via 193.233.7.83
\end{verbatim}
2837 states that the single address 192.203.80.144 is a dummy NAT address.
2838 For all the world it looks like a host address inside our network.
2839 For neighbouring hosts and routers it looks like the local address
2840 of the translating router. The router answers ARP for it, advertises
this address as routed via it, etc. When the router
2842 receives a packet destined for 192.203.80.144, it replaces
2843 this address with 193.233.7.83 which is the address of some real
2844 host and forwards the packet. If you need to remap
2845 blocks of addresses, you may use a command like:
\begin{verbatim}
    ip route add nat 192.203.80.192/26 via 193.233.7.64
\end{verbatim}
This command will map a block of 64 addresses 192.203.80.192-255 to
193.233.7.64-127.
2852 When an internal host (193.233.7.83 in the example above)
2853 sends something to the outer world and these packets are forwarded
2854 by our router, it should translate the source address 193.233.7.83
2855 into 192.203.80.144. This task is solved by setting a special
2856 policy rule (sec.\ref{IP-RULE-ADD}, p.\pageref{IP-RULE-ADD}):
\begin{verbatim}
    ip rule add prio 320 from 193.233.7.83 nat 192.203.80.144
\end{verbatim}
2860 This rule says that the source address 193.233.7.83
2861 should be translated into 192.203.80.144 before forwarding.
2862 It is important that the address after the \verb|nat| keyword
2863 is some NAT address, declared by {\tt ip route add nat}.
2864 If it is just a random address the router will not map to it.
2866 The exception is when the address is a local address of this
router (or 0.0.0.0) and masquerading is configured in the Linux-2.2
kernel. In this case the router will masquerade the packets as this address.
If 0.0.0.0 is selected, the result is equivalent to the one
obtained with firewalling rules. Otherwise, you have a way
to order Linux to masquerade to this fixed address.
The NAT mechanism used in Linux-2.4 is more flexible than
masquerading, so this feature has lost its meaning and has been disabled.
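
The address arithmetic behind the two \verb|ip route add nat| examples
above can be sketched in \verb|bash|. The sketch assumes that the host
bits of the dummy address are kept and the network bits are taken from the
\verb|via| address. The helper names \verb|ip2int|, \verb|int2ip| and
\verb|nat_map| are hypothetical, and nothing here talks to the kernel.

```shell
#!/bin/bash
# Illustrative sketch only: a model of the stateless destination mapping.

# Convert a dotted quad to a 32-bit integer.
ip2int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Convert a 32-bit integer back to a dotted quad.
int2ip() {
    local n=$1
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# nat_map ADDR NAT_PREFIX/LEN REAL_BASE
# Prints the translated address; fails if ADDR is outside the NAT block.
nat_map() {
    local len=${2#*/}
    local mask=$(( len == 0 ? 0 : (0xffffffff << (32 - len)) & 0xffffffff ))
    local addr base real
    addr=$(ip2int "$1")
    base=$(ip2int "${2%/*}")
    real=$(ip2int "$3")
    (( (addr & mask) == (base & mask) )) || return 1
    # Network bits come from REAL_BASE, host bits from ADDR.
    int2ip $(( (real & mask) | (addr & ~mask & 0xffffffff) ))
}

nat_map 192.203.80.200 192.203.80.192/26 193.233.7.64
# prints 193.233.7.72
```

Because the mapping is a pure function of the address, no per-session
state is needed, which is the property the text above calls stateless.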
2876 If the network has non-trivial internal structure, it is
2877 useful and even necessary to add rules disabling translation
2878 when a packet does not leave this network. Let us return to the
2879 example from sec.\ref{IP-RULE-SHOW} (p.\pageref{IP-RULE-SHOW}).
\begin{verbatim}
300:    from 193.233.7.83 to 193.233.7.0/24 lookup main
310:    from 193.233.7.83 to 192.203.80.0/24 lookup main
320:    from 193.233.7.83 lookup inr.ruhep map-to 192.203.80.144
\end{verbatim}
2885 This block of rules causes normal forwarding when
2886 packets from 193.233.7.83 do not leave networks 193.233.7/24
2887 and 192.203.80/24. Also, if the \verb|inr.ruhep| table does not
2888 contain a route to the destination (which means that the routing
2889 domain owning addresses from 192.203.80/24 is dead), no translation
2890 will occur. Otherwise, the packets are translated.
2892 \paragraph{How to only translate selected ports:}
If you only want to translate selected ports (e.g.\ http)
and leave the rest intact, you may use \verb|ipchains|
to \verb|fwmark| a class of packets.
Suppose you did, and all the packets from 193.233.7.83
destined for port 80 are marked with marker 0x1234 in the input fwchain.
In this case you may replace rule \#320 with:
\begin{verbatim}
320:    from 193.233.7.83 fwmark 1234 lookup main map-to 192.203.80.144
\end{verbatim}
2902 and translation will only be enabled for outgoing http requests.
2904 \section{Example: minimal host setup}
2905 \label{EXAMPLE-SETUP}
The following script gives an example of a fail-safe
2908 setup of IP (and IPv6, if it is compiled into the kernel)
2909 in the common case of a node attached to a single broadcast
2910 network. A more advanced script, which may be used both on multihomed
2911 hosts and on routers, is described in the following
2914 The utilities used in the script may be found in the
2915 directory ftp://ftp.inr.ac.ru/ip-routing/:
2917 \item \verb|ip| --- package \verb|iproute2|.
2918 \item \verb|arping| --- package \verb|iputils|.
2919 \item \verb|rdisc| --- package \verb|iputils|.
2922 It also refers to a DHCP client, \verb|dhcpcd|. I should refrain from
2923 recommending a good DHCP client to use. All that I can
2924 say is that ISC \verb|dhcp-2.0b1pl6| patched with the patch that
2925 can be found in the \verb|dhcp.bootp.rarp| subdirectory of
2926 the same ftp site {\em does\/} work,
2927 at least on Ethernet and Token Ring.
2934 \# {\bf Usage: \verb|ifone ADDRESS[/PREFIX-LENGTH] [DEVICE]|}\\
2935 \# {\bf Parameters:}\\
2936 \# \$1 --- Static IP address, optionally followed by prefix length.\\
\# \$2 --- Device name. If it is missing, \verb|eth0| is assumed.\\
\# E.g. \verb|ifone 193.233.7.90|
2945 \# Parse IP address, splitting prefix length.
2947 if [ "$1" != "" ]; then
2949 if [ "$1" != "$ipaddr" ]; then
2954 pfx="${ipaddr}/${pfxlen}"
2958 \# {\bf Step 0} --- enable loopback.\\
\# This step is necessary on any networked box before attempting\\
\# to configure any other device.\\
2964 ip link set up dev lo
2965 ip addr add 127.0.0.1/8 dev lo brd + scope host
\# IPv6 autoconfigures itself on loopback.\\
\# If the user gave loopback as the device, we add the address as an alias and exit.
2973 if [ "$dev" = "lo" ]; then
2974 if [ "$ipaddr" != "" -a "$ipaddr" != "127.0.0.1" ]; then
2975 ip address add $ipaddr dev $dev
2982 \noindent\# {\bf Step 1} --- enable device \verb|$dev|
2985 if ! ip link set up dev $dev ; then
2986 echo "Cannot enable interface $dev. Aborting." 1>&2
2991 \# The interface is \verb|UP|. IPv6 started stateless autoconfiguration itself,\\
2992 \# and its configuration finishes here. However,\\
2993 \# IP still needs some static preconfigured address.
2996 if [ "$ipaddr" = "" ]; then
2997 echo "No address for $dev is configured, trying DHCP..." 1>&2
3004 \# {\bf Step 2} --- IP Duplicate Address Detection~\cite{RFC-DHCP}.\\
3005 \# Send two probes and wait for result for 3 seconds.\\
\# If the interface opens slower, e.g.\ due to long media detection,\\
\# you may want to increase the timeout.\\
3010 if ! arping -q -c 2 -w 3 -D -I $dev $ipaddr ; then
3011 echo "Address $ipaddr is busy, trying DHCP..." 1>&2
3017 \# OK, the address is unique, we may add it on the interface.\\
3019 \# {\bf Step 3} --- Configure the address on the interface.
3023 if ! ip address add $pfx brd + dev $dev; then
3024 echo "Failed to add $pfx on $dev, trying DHCP..." 1>&2
3030 \noindent\# {\bf Step 4} --- Announce our presence on the link.
3032 arping -A -c 1 -I $dev $ipaddr
3035 arping -U -c 1 -I $dev $ipaddr ) >& /dev/null </dev/null &
3039 \# {\bf Step 5} (optional) --- Add some control routes.\\
3041 \# 1. Prohibit link local multicast addresses.\\
3042 \# 2. Prohibit link local (alias, limited) broadcast.\\
3043 \# 3. Add default multicast route.
3046 ip route add unreachable 224.0.0.0/24
3047 ip route add unreachable 255.255.255.255
3048 if [ `ip link ls $dev | grep -c MULTICAST` -ge 1 ]; then
3049 ip route add 224.0.0.0/4 dev $dev scope global
3054 \# {\bf Step 6} --- Add fallback default route with huge metric.\\
3055 \# If a proxy ARP server is present on the interface, we will be\\
3056 \# able to talk to all the Internet without further configuration.\\
\# It is not so cheap though and we still hope that this route\\
\# will be overridden by a more correct one from rdisc.\\
\# Do not perform this step if the device is not ARPable,\\
\# because dead nexthop detection does not work on such devices.
3063 if [ "$noarp" = "0" ]; then
3064 ip ro add default dev $dev metric 30000 scope global
3069 \# {\bf Step 7} --- Restart router discovery and exit.
3072 killall -HUP rdisc || rdisc -fs
3077 \section{Example: {\protect\tt ifcfg} --- interface address management}
3078 \label{EXAMPLE-IFCFG}
3080 This is a simplistic script replacing one option of \verb|ifconfig|,
3081 namely, IP address management. It not only adds
3082 addresses, but also carries out Duplicate Address Detection~\cite{RFC-DHCP},
3083 sends unsolicited ARP to update the caches of other hosts sharing
3084 the interface, adds some control routes and restarts Router Discovery
3085 when it is necessary.
3087 I strongly recommend using it {\em instead\/} of \verb|ifconfig| both
3088 on hosts and on routers.
3094 \# {\bf Usage: \verb?ifcfg DEVICE[:ALIAS] [add|del] ADDRESS[/LENGTH] [PEER]?}\\
3095 \# {\bf Parameters:}\\
3096 \# ---Device name. It may have alias suffix, separated by colon.\\
3097 \# ---Command: add, delete or stop.\\
3098 \# ---IP address, optionally followed by prefix length.\\
3099 \# ---Optional peer address for pointopoint interfaces.\\
3100 \# F.e. \verb|ifcfg eth0 193.233.7.90/24|
\noindent\# This function determines whether the machine is a router or a host.\\
\# It returns 0 if the host is apparently not a router.
3106 CheckForwarding () {
3108 sbase=/proc/sys/net/ipv4/conf
3110 if [ -d $sbase ]; then
3111 for dir in $sbase/*/forwarding; do
      fwd=$(( fwd + $(cat "$dir") ))
3121 \# This function restarts Router Discovery.\\
3125 killall -HUP rdisc || rdisc -fs
3129 \# Calculate ABC "natural" mask length\\
3130 \# Arg: \$1 = dotquad address
3136 if [ $class -eq 0 -o $class -ge 224 ]; then return 0
3137 elif [ $class -ge 192 ]; then return 24
3138 elif [ $class -ge 128 ]; then return 16
3147 \# Strip alias suffix separated by colon.
3153 if [ "$dev" = "" -o "$1" = "help" ]; then
  echo "Usage: ifcfg DEV [[add|del] ADDR[/LEN] [PEER] | stop]" 1>&2
3155 echo " add - add new address" 1>&2
3156 echo " del - delete address" 1>&2
3157 echo " stop - completely disable IP" 1>&2
3166 \# Parse command. If it is ``stop'', flush and exit.
3173 if [ "$ldev" != "$dev" ]; then
3174 echo "Cannot stop alias $ldev" 1>&2
3177 ip -4 addr flush dev $dev $label || exit 1
3178 if [ $fwd -eq 0 ]; then RestartRDISC; fi
3181 deleting=1; shift ;;
3186 \# Parse prefix, split prefix length, separated by slash.
3191 if [ "$1" != "" ]; then
3193 if [ "$1" != "$ipaddr" ]; then
3196 if [ "$ipaddr" = "" ]; then
3197 echo "$1 is bad IP address." 1>&2
3204 \# If peer address is present, prefix length is 32.\\
3205 \# Otherwise, if prefix length was not given, guess it.
3209 if [ "$peer" != "" ]; then
3210 if [ "$pfxlen" != "" -a "$pfxlen" != "32" ]; then
3211 echo "Peer address with non-trivial netmask." 1>&2
3214 pfx="$ipaddr peer $peer"
3216 if [ "$pfxlen" = "" ]; then
3220 pfx="$ipaddr/$pfxlen"
3222 if [ "$ldev" = "$dev" -a "$ipaddr" != "" ]; then
3227 \# If deletion was requested, delete the address and restart RDISC
3230 if [ $deleting -ne 0 ]; then
3231 ip addr del $pfx dev $dev $label || exit 1
3232 if [ $fwd -eq 0 ]; then RestartRDISC; fi
3237 \# Start interface initialization.\\
3239 \# {\bf Step 0} --- enable device \verb|$dev|
3242 if ! ip link set up dev $dev ; then
3243 echo "Error: cannot enable interface $dev." 1>&2
3246 if [ "$ipaddr" = "" ]; then exit 0; fi
3249 \# {\bf Step 1} --- IP Duplicate Address Detection~\cite{RFC-DHCP}.\\
3250 \# Send two probes and wait for result for 3 seconds.\\
\# If the interface opens slower, e.g.\ due to long media detection,\\
\# you may want to increase the timeout.\\
3255 if ! arping -q -c 2 -w 3 -D -I $dev $ipaddr ; then
3256 echo "Error: some host already uses address $ipaddr on $dev." 1>&2
3261 \# OK, the address is unique. We may add it to the interface.\\
3263 \# {\bf Step 2} --- Configure the address on the interface.
3266 if ! ip address add $pfx brd + dev $dev $label; then
3267 echo "Error: failed to add $pfx on $dev." 1>&2
3271 \noindent\# {\bf Step 3} --- Announce our presence on the link
3273 arping -q -A -c 1 -I $dev $ipaddr
3276 arping -q -U -c 1 -I $dev $ipaddr ) >& /dev/null </dev/null &
3279 \# {\bf Step 4} (optional) --- Add some control routes.\\
3281 \# 1. Prohibit link local multicast addresses.\\
3282 \# 2. Prohibit link local (alias, limited) broadcast.\\
3283 \# 3. Add default multicast route.
3286 ip route add unreachable 224.0.0.0/24 >& /dev/null
3287 ip route add unreachable 255.255.255.255 >& /dev/null
3288 if [ `ip link ls $dev | grep -c MULTICAST` -ge 1 ]; then
3289 ip route add 224.0.0.0/4 dev $dev scope global >& /dev/null
3293 \# {\bf Step 5} --- Add fallback default route with huge metric.\\
3294 \# If a proxy ARP server is present on the interface, we will be\\
3295 \# able to talk to all the Internet without further configuration.\\
\# Do not perform this step on a router or if the device is not ARPable,\\
\# because dead nexthop detection does not work on them.
3300 if [ $fwd -eq 0 ]; then
3301 if [ $noarp -eq 0 ]; then
3302 ip ro append default dev $dev metric 30000 scope global
3303 elif [ "$peer" != "" ]; then
3304 if ping -q -c 2 -w 4 $peer ; then
3305 ip ro append default via $peer dev $dev metric 30001
3314 \# End of {\bf MAIN()}
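
The address parsing and the guess of the ``natural'' prefix length used in
the script above can be exercised in isolation with the following
\verb|bash| sketch. The function names \verb|natural_len| and
\verb|split_prefix| are hypothetical. Unlike the script, which passes the
length back through the return status, the sketch prints its results, and
the fallback to 8 for class A addresses is an assumption.

```shell
#!/bin/bash
# Illustrative sketch only: standalone versions of the parsing steps.

# natural_len ADDR: print the classful ("ABC") prefix length of ADDR.
natural_len() {
    local class=${1%%.*}
    if [ "$class" -eq 0 ] || [ "$class" -ge 224 ]; then
        echo 0      # unspecified, multicast or reserved: no natural length
    elif [ "$class" -ge 192 ]; then
        echo 24     # class C
    elif [ "$class" -ge 128 ]; then
        echo 16     # class B
    else
        echo 8      # class A (assumed fallback)
    fi
}

# split_prefix ADDR[/LEN]: print "ADDR LEN", guessing LEN when it is absent.
split_prefix() {
    local ipaddr=${1%/*} pfxlen=""
    [ "$1" != "$ipaddr" ] && pfxlen=${1#*/}
    [ -n "$pfxlen" ] || pfxlen=$(natural_len "$ipaddr")
    echo "$ipaddr $pfxlen"
}

split_prefix 193.233.7.90
# prints 193.233.7.90 24
```

An explicit length always wins: \verb|split_prefix 172.16.0.1/12| keeps
the length 12 instead of guessing 16.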