ofproto-dpif: Avoid segfault deleting facets that execute LEARN actions.
authorBen Pfaff <blp@nicira.com>
Wed, 21 Mar 2012 16:01:02 +0000 (09:01 -0700)
committerBen Pfaff <blp@nicira.com>
Wed, 21 Mar 2012 16:25:07 +0000 (09:25 -0700)
commit19fc14ad08b4bc79515a52cc1a89b6f6e92225a3
treed8d9aa88e6a403cef6bfc9efb3edf5f1e708c1a8
parentf4f9f0868c9150c7b032884f212db1e3bcb6b1fa
ofproto-dpif: Avoid segfault deleting facets that execute LEARN actions.

"ovs-ofctl del-flows <bridge>" can result in the following call path:

  delete_flows_loose() in ofproto.c
    -> collect_rules_loose() -- uses 'ofproto_node' inside 'struct rule'
    -> rule_destruct() in ofproto-dpif.c
      -> facet_revalidate()
        -> facet_remove()
          -> facet_flush_stats()
            -> facet_account()
              -> xlate_actions()
                -> xlate_learn_action()
                  -> ofproto_flow_mod() back in ofproto.c
                    -> modify_flow_strict()
                      -> collect_rules_strict() -- also uses 'ofproto_node'

which goes "boom" when we fall back up the call chain because the nested
use of ofproto_node steps on the outer use of ofproto_node.

This commit fixes the problem by refusing to translate "learn" actions
within facet_flush_stats(), breaking the doubled use.

Another possible approach would be to switch to another way to keep track
of rules in the flow_mod implementations, so that there'd be no fighting
over 'ofproto_node'.  But then "ovs-ofctl del-flows" might still leave some
flows around (ones created by "learn" actions as flows are accounted as
facets get deleted), which would be surprising behavior.  And it seems in
general a bad idea to allow recursive flow_mods; the consequences have not
been carefully thought through.

Before this commit, one can reproduce the problem by running an "hping3"
between a couple of VMs plus the following commands on OVS in the middle.
Sometimes you have to run them a few times:

    ovs-ofctl del-flows br0
    ovs-ofctl add-flow br0 "table=0 actions=learn(table=1, \
              idle_timeout=600, NXM_OF_VLAN_TCI[0..11], \
              NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[], \
              output:NXM_OF_IN_PORT[], fin_idle_timeout=10), resubmit(,1)"
    ovs-ofctl add-flow br0 "table=1 priority=0 actions=flood"

This commit has a side effect that leftover unaccounted packets no longer
update the timeouts in MAC learning actions in some cases, when the facets
that cause updates are deleted.  At most one second of updates should  be
lost.

Bug #10184.
Reported-by: Michael Mao <mmao@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
ofproto/ofproto-dpif.c