+ ========================
+ ovs-vswitchd Internals
+ ========================
+
+This document describes some of the internals of the ovs-vswitchd
+process. It is not complete. It tends to be updated on demand, so if
+you have questions about the vswitchd implementation, ask them and
+perhaps we'll add some appropriate documentation here.
+
+Most of the ovs-vswitchd implementation is in vswitchd/bridge.c, so
+code references below should be assumed to refer to that file except
+as otherwise specified.
+
+Bonding
+=======
+
+Bonding allows two or more interfaces (the "slaves") to share network
+traffic. From a high-level point of view, bonded interfaces act like
+a single port, but they have the bandwidth of multiple network
+devices, e.g. two 1 GB physical interfaces act like a single 2 GB
+interface. Bonds also increase robustness: the bonded port does not
+go down as long as at least one of its slaves is up.
+
+In vswitchd, a bond always has at least two slaves (and may have
+more). If a configuration error, etc. would cause a bond to have only
+one slave, the port becomes an ordinary port, not a bonded port, and
+none of the special features of bonded ports described in this section
+apply.
+
+There are many forms of bonding, but ovs-vswitchd currently implements
+only a single kind, called "source load balancing" or SLB bonding.
+SLB bonding divides traffic among the slaves based on the Ethernet
+source address. This is useful only if the traffic over the bond has
+multiple Ethernet source addresses, for example if network traffic
+from multiple VMs are multiplexed over the bond.
+
+Enabling and Disabling Slaves
+-----------------------------
+
+When a bond is created, a slave is initially enabled or disabled based
+on whether carrier is detected on the NIC (see iface_create()). After
+that, a slave is disabled if its carrier goes down for a period of
+time longer than the downdelay, and it is enabled if carrier comes up
+for longer than the updelay (see bond_link_status_update()). There is
+one exception where the updelay is skipped: if no slaves at all are
+currently enabled, then the first slave on which carrier comes up is
+enabled immediately.
+
+The updelay should be set to a time longer than the STP forwarding
+delay of the physical switch to which the bond port is connected (if
+STP is enabled on that switch). Otherwise, the slave will be enabled,
+and load may be shifted to it, before the physical switch starts
+forwarding packets on that port, which can cause some data to be
+"blackholed" for a time. The exception for a single enabled slave
+does not cause any problem in this regard because when no slaves are
+enabled all output packets are blackholed anyway.
+
+When a slave becomes disabled, the vswitch immediately chooses a new
+output port for traffic that was destined for that slave (see
+bond_enable_slave()). It also sends a "gratuitous learning packet" on
+the bond port (on the newly chosen slave) for each MAC address that
+the vswitch has learned on a port other than the bond (see
+bond_send_learning_packets()), to teach the physical switch that the
+new slave should be used in place of the one that is now disabled.
+(This behavior probably makes sense only for a vswitch that has only
+one port (the bond) connected to a physical switch; vswitchd should
+probably provide a way to disable or configure it in other scenarios.)
+
+Bond Packet Input
+-----------------
+
+Bond packet input processing takes place in process_flow().
+
+Bonding accepts unicast packets on any bond slave. This can
+occasionally cause packet duplication for the first few packets sent
+to a given MAC, if the physical switch attached to the bond is
+flooding packets to that MAC because it has not yet learned the
+correct slave for that MAC.
+
+Bonding only accepts multicast (and broadcast) packets on a single
+bond slave (the "active slave") at any given time. Multicast packets
+received on other slaves are dropped. Otherwise, every multicast
+packet would be duplicated, once for every bond slave, because the
+physical switch attached to the bond will flood those packets.
+
+Bonding also drops some multicast packets received on the active
+slave: those for the vswitch has learned that the packet's MAC is on a
+port other than the bond port itself. This is because it is likely
+that the vswitch itself sent the multicast packet out the bond port,
+on a slave other than the active slave, and is now receiving the
+packet back on the active slave. However, the vswitch makes an
+exception to this rule for broadcast ARP replies, which indicate that
+the MAC has moved to another switch, probably due to VM migration.
+(ARP replies are normally unicast, so this exception does not match
+normal ARP replies. It will match the learning packets sent on bond
+fail-over.)
+
+The active slave is simply the first slave to be enabled after the
+bond is created (see bond_choose_active_iface()). If the active slave
+is disabled, then a new active slave is chosen among the slaves that
+remain active. Currently due to the way that configuration works,
+this tends to be the remaining slave whose interface name is first
+alphabetically, but this is by no means guaranteed.
+
+Bond Packet Output
+------------------
+
+When a packet is sent out a bond port, the bond slave actually used is
+selected based on the packet's source MAC (see choose_output_iface()).
+In particular, the source MAC is hashed into one of 256 values, and
+that value is looked up in a hash table (the "bond hash") kept in the
+"bond_hash" member of struct port. The hash table entry identifies a
+bond slave. If no bond slave has yet been chosen for that hash table
+entry, vswitchd chooses one arbitrarily.
+
+Every 10 seconds, vswitchd rebalances the bond slaves (see
+bond_rebalance_port()). To rebalance, vswitchd examines the
+statistics for the number of bytes transmitted by each slave over
+approximately the past minute, with data sent more recently weighted
+more heavily than data sent less recently. It considers each of the
+slaves in order from most-loaded to least-loaded. If highly loaded
+slave H is significantly more heavily loaded than the least-loaded
+slave L, and slave H carries at least two hashes, then vswitchd shifts
+one of H's hashes to L. However, vswitchd will not shift a hash from
+H to L if that will cause L's load to exceed H's load.
+
+Currently, "significantly more loaded" means that H must carry at
+least 1 Mbps more traffic, and that traffic must be at least 3%
+greater than L's.