X-Git-Url: http://git.onelab.eu/?a=blobdiff_plain;f=vswitchd%2FINTERNALS;h=994353dc210743e88300289068d5ee0d4694ae5b;hb=HEAD;hp=300175627a18a3db9e6ceeeef3579a623a385fdc;hpb=5422a9e189c627202a0eaa568a52d17e088d82fb;p=sliver-openvswitch.git

diff --git a/vswitchd/INTERNALS b/vswitchd/INTERNALS
index 300175627..994353dc2 100644
--- a/vswitchd/INTERNALS
+++ b/vswitchd/INTERNALS
@@ -27,12 +27,13 @@ one slave, the port becomes an ordinary port, not a bonded port, and
 none of the special features of bonded ports described in this section
 apply.
 
-There are many forms of bonding, but ovs-vswitchd currently implements
-only a single kind, called "source load balancing" or SLB bonding.
-SLB bonding divides traffic among the slaves based on the Ethernet
-source address. This is useful only if the traffic over the bond has
-multiple Ethernet source addresses, for example if network traffic
-from multiple VMs are multiplexed over the bond.
+There are many forms of bonding of which ovs-vswitchd implements only
+a few. The most complex bond ovs-vswitchd implements is called
+"source load balancing" or SLB bonding. SLB bonding divides traffic
+among the slaves based on the Ethernet source address. This is useful
+only if the traffic over the bond has multiple Ethernet source
+addresses, for example if network traffic from multiple VMs are
+multiplexed over the bond.
 
 Enabling and Disabling Slaves
 -----------------------------
@@ -57,20 +58,19 @@ enabled all output packets are blackholed anyway.
 
 When a slave becomes disabled, the vswitch immediately chooses a new
 output port for traffic that was destined for that slave (see
-bond_enable_slave()). It also sends a "gratuitous learning packet" on
-the bond port (on the newly chosen slave) for each MAC address that
-the vswitch has learned on a port other than the bond (see
-bond_send_learning_packets()), to teach the physical switch that the
-new slave should be used in place of the one that is now disabled.
-(This behavior probably makes sense only for a vswitch that has only
-one port (the bond) connected to a physical switch; vswitchd should
-probably provide a way to disable or configure it in other scenarios.)
+bond_enable_slave()). It also sends a "gratuitous learning packet",
+specifically a RARP, on the bond port (on the newly chosen slave) for
+each MAC address that the vswitch has learned on a port other than the
+bond (see bond_send_learning_packets()), to teach the physical switch
+that the new slave should be used in place of the one that is now
+disabled. (This behavior probably makes sense only for a vswitch that
+has only one port (the bond) connected to a physical switch; vswitchd
+should probably provide a way to disable or configure it in other
+scenarios.)
 
 Bond Packet Input
 -----------------
 
-Bond packet input processing takes place in process_flow().
-
 Bonding accepts unicast packets on any bond slave. This can
 occasionally cause packet duplication for the first few packets sent
 to a given MAC, if the physical switch attached to the bond is
@@ -106,12 +106,13 @@ Bond Packet Output
 ------------------
 
 When a packet is sent out a bond port, the bond slave actually used is
-selected based on the packet's source MAC (see choose_output_iface()).
-In particular, the source MAC is hashed into one of 256 values, and
-that value is looked up in a hash table (the "bond hash") kept in the
-"bond_hash" member of struct port. The hash table entry identifies a
-bond slave. If no bond slave has yet been chosen for that hash table
-entry, vswitchd chooses one arbitrarily.
+selected based on the packet's source MAC and VLAN tag (see
+choose_output_iface()). In particular, the source MAC and VLAN tag
+are hashed into one of 256 values, and that value is looked up in a
+hash table (the "bond hash") kept in the "bond_hash" member of struct
+port. The hash table entry identifies a bond slave. If no bond slave
+has yet been chosen for that hash table entry, vswitchd chooses one
+arbitrarily.
 
 Every 10 seconds, vswitchd rebalances the bond slaves (see
 bond_rebalance_port()). To rebalance, vswitchd examines the
@@ -128,3 +129,111 @@ least 0.1.
 Currently, "significantly more loaded" means that H must carry at
 least 1 Mbps more traffic, and that traffic must be at least 3%
 greater than L's.
+
+Bond Balance Modes
+------------------
+
+Each bond balancing mode has different considerations, described
+below.
+
+LACP Bonding
+------------
+
+LACP bonding requires the remote switch to implement LACP, but it is
+otherwise very simple in that, after LACP negotiation is complete,
+there is no need for special handling of received packets.
+
+Several of the physical switches that support LACP block all traffic
+for ports that are configured to use LACP, until LACP is negotiated with
+the host. When configuring a LACP bond on a OVS host (eg: XenServer),
+this means that there will be an interruption of the network connectivity
+between the time the ports on the physical switch and the bond on the OVS
+host are configured. The interruption may be relatively long, if different
+people are responsible for managing the switches and the OVS host.
+
+Such network connectivity failure can be avoided if LACP can be configured
+on the OVS host before configuring the physical switch, and having
+the OVS host fall back to a bond mode (active-backup) till the physical
+switch LACP configuration is complete. An option "lacp-fallback-ab" exists to
+provide such behavior on openvswitch.
+
+Active Backup Bonding
+---------------------
+
+Active Backup bonds send all traffic out one "active" slave until that
+slave becomes unavailable. Since they are significantly less
+complicated than SLB bonds, they are preferred when LACP is not an
+option. Additionally, they are the only bond mode which supports
+attaching each slave to a different upstream switch.
+
+SLB Bonding
+-----------
+
+SLB bonding allows a limited form of load balancing without the remote
+switch's knowledge or cooperation. The basics of SLB are simple. SLB
+assigns each source MAC+VLAN pair to a link and transmits all packets
+from that MAC+VLAN through that link. Learning in the remote switch
+causes it to send packets to that MAC+VLAN through the same link.
+
+SLB bonding has the following complications:
+
+   0. When the remote switch has not learned the MAC for the
+      destination of a unicast packet and hence floods the packet to
+      all of the links on the SLB bond, Open vSwitch will forward
+      duplicate packets, one per link, to each other switch port.
+
+      Open vSwitch does not solve this problem.
+
+   1. When the remote switch receives a multicast or broadcast packet
+      from a port not on the SLB bond, it will forward it to all of
+      the links in the SLB bond. This would cause packet duplication
+      if not handled specially.
+
+      Open vSwitch avoids packet duplication by accepting multicast
+      and broadcast packets on only the active slave, and dropping
+      multicast and broadcast packets on all other slaves.
+
+   2. When Open vSwitch forwards a multicast or broadcast packet to a
+      link in the SLB bond other than the active slave, the remote
+      switch will forward it to all of the other links in the SLB
+      bond, including the active slave. Without special handling,
+      this would mean that Open vSwitch would forward a second copy of
+      the packet to each switch port (other than the bond), including
+      the port that originated the packet.
+
+      Open vSwitch deals with this case by dropping packets received
+      on any SLB bonded link that have a source MAC+VLAN that has been
+      learned on any other port. (This means that SLB as implemented
+      in Open vSwitch relies critically on MAC learning. Notably, SLB
+      is incompatible with the "flood_vlans" feature.)
+
+   3. Suppose that a MAC+VLAN moves to an SLB bond from another port
+      (e.g. when a VM is migrated from this hypervisor to a different
+      one). Without additional special handling, Open vSwitch will
+      not notice until the MAC learning entry expires, up to 60
+      seconds later as a consequence of rule #2.
+
+      Open vSwitch avoids a 60-second delay by listening for
+      gratuitous ARPs, which VMs commonly emit upon migration. As an
+      exception to rule #2, a gratuitous ARP received on an SLB bond
+      is not dropped and updates the MAC learning table in the usual
+      way. (If a move does not trigger a gratuitous ARP, or if the
+      gratuitous ARP is lost in the network, then a 60-second delay
+      still occurs.)
+
+   4. Suppose that a MAC+VLAN moves from an SLB bond to another port
+      (e.g. when a VM is migrated from a different hypervisor to this
+      one), that the MAC+VLAN emits a gratuitous ARP, and that Open
+      vSwitch forwards that gratuitous ARP to a link in the SLB bond
+      other than the active slave. The remote switch will forward the
+      gratuitous ARP to all of the other links in the SLB bond,
+      including the active slave. Without additional special
+      handling, this would mean that Open vSwitch would learn that the
+      MAC+VLAN was located on the SLB bond, as a consequence of rule
+      #3.
+
+      Open vSwitch avoids this problem by "locking" the MAC learning
+      table entry for a MAC+VLAN from which a gratuitous ARP was
+      received from a non-SLB bond port. For 5 seconds, a locked MAC
+      learning table entry will not be updated based on a gratuitous
+      ARP received on a SLB bond.
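
The receive-side handling described in SLB complications 1 through 4
can be sketched roughly as follows. This is a minimal Python
illustration of the rules as the text states them, not the
ovs-vswitchd C code. The names Switch, MacEntry, receive() and
GRAT_ARP_LOCK_SECS are hypothetical, MAC entry expiry is left out, and
dropping (rather than merely not learning from) a gratuitous ARP that
hits a locked entry is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Lock duration taken from the text ("For 5 seconds ...").
GRAT_ARP_LOCK_SECS = 5.0

Key = Tuple[str, int]  # (source MAC, VLAN)


@dataclass
class MacEntry:
    port: str                   # port on which the MAC+VLAN was learned
    locked_until: float = 0.0   # gratuitous-ARP lock expiry (complication 4)


@dataclass
class Switch:
    """Hypothetical model of a vswitch with a single SLB bond port."""
    bond_port: str
    active_slave: str
    mac_table: Dict[Key, MacEntry] = field(default_factory=dict)

    def receive(self, in_port: str, slave: Optional[str], src_mac: str,
                vlan: int, dst_is_multicast: bool, is_grat_arp: bool,
                now: float) -> bool:
        """Return True if the packet is accepted, False if it is dropped."""
        key = (src_mac, vlan)
        entry = self.mac_table.get(key)

        if in_port != self.bond_port:
            # Ordinary port: learn as usual.  A gratuitous ARP seen here
            # locks the entry against gratuitous ARPs from the bond.
            if is_grat_arp:
                locked_until = now + GRAT_ARP_LOCK_SECS
            elif entry is not None and entry.port == in_port:
                locked_until = entry.locked_until  # keep an existing lock
            else:
                locked_until = 0.0
            self.mac_table[key] = MacEntry(in_port, locked_until)
            return True

        # Complication 1: multicast and broadcast are accepted only on
        # the active slave.
        if dst_is_multicast and slave != self.active_slave:
            return False

        if entry is not None and entry.port != self.bond_port:
            # Complication 2: source learned elsewhere, so this is most
            # likely our own packet reflected by the remote switch.
            if not is_grat_arp:
                return False
            # Complication 4: a locked entry ignores gratuitous ARPs that
            # arrive on the bond (dropping them is an assumption).
            if now < entry.locked_until:
                return False

        # Complication 3 and ordinary learning: record the bond as the
        # location of this MAC+VLAN.
        self.mac_table[key] = MacEntry(self.bond_port)
        return True
```

For example, with a bond "bond0" whose active slave is "eth0", a
broadcast from a MAC learned on a local port "vif1" is dropped when it
comes back on the bond, whereas a gratuitous ARP from the same MAC
arriving on the active slave moves the learning entry to the bond,
unless "vif1" itself emitted a gratuitous ARP within the previous 5
seconds.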