Daniele Di Proietto daniele.di.proietto@gmail.com
Daniele Venturino daniele.venturino@m3s.it
Danny Kukawka danny.kukawka@bisect.de
+Dave Tucker dave@dtucker.co.uk
David Erickson derickso@stanford.edu
David S. Miller davem@davemloft.net
David Yang davidy@vmware.com
James Page james.page@ubuntu.com
Jarno Rajahalme jrajahalme@nicira.com
Jason Kölker jason@koelker.net
+Jasper Capel jasper@capel.tv
Jean Tourrilhes jt@hpl.hp.com
Jeremy Stribling strib@nicira.com
Jesse Gross jesse@nicira.com
Kyle Mestery kmestery@cisco.com
Leo Alterman lalterman@nicira.com
Linda Sun lsun@vmware.com
+Lior Neudorfer lior@guardicore.com
Lorand Jakab lojakab@cisco.com
Luca Giraudo lgiraudo@nicira.com
Luigi Rizzo rizzo@iet.unipi.it
Natasha Gude natasha@nicira.com
Neil McKee neil.mckee@inmon.com
Neil Zhu zhuj@centecnetworks.com
+Padmanabhan Krishnan kprad1@yahoo.com
Paraneetharan Chandrasekaran paraneetharanc@gmail.com
Paul Fazzone pfazzone@nicira.com
Paul Ingram paul@nicira.com
Bryan Osoro bosoro@nicira.com
Cedric Hobbs cedric@nicira.com
Chris Hydon chydon@aristanetworks.com
+Christian Stigen Larsen cslarsen@gmail.com
Christopher Paggen cpaggen@cisco.com
Dave Walker DaveWalker@ubuntu.com
David Palma palma@onesource.pt
Mikael Doverhag mdoverhag@nicira.com
Nagi Reddy Jonnala njonnala@Brocade.com
Niklas Andersson nandersson@nicira.com
-Padmanabhan Krishnan kprad1@yahoo.com
Pankaj Thakkar thakkar@nicira.com
Pasi Kärkkäinen pasik@iki.fi
Paulo Cravero pcravero@as2594.net
1.11.x 2.6.18 to 3.8
2.0.x 2.6.32 to 3.10
2.1.x 2.6.32 to 3.11
- 2.2.x 2.6.32 to 3.13
+ 2.2.x 2.6.32 to 3.14
Open vSwitch userspace should also work with the Linux kernel module
built into Linux 3.3 and later.
It should build against almost any kernel, certainly against 2.6.32
and later.
-Q: What Linux kernel versions does IPFIX flow monitoring work with?
+Q: I get an error like this when I configure Open vSwitch:
-A: IPFIX flow monitoring requires the Linux kernel module from Open
- vSwitch version 1.10.90 or later.
+ configure: error: Linux kernel in <dir> is version <x>, but
+ version newer than <y> is not supported (please refer to the
+ FAQ for advice)
-Q: Should userspace or kernel be upgraded first to minimize downtime?
+ What should I do?
- In general, the Open vSwitch userspace should be used with the
- kernel version included in the same release or with the version
- from upstream Linux. However, when upgrading between two releases
- of Open vSwitch it is best to migrate userspace first to reduce
- the possibility of incompatibilities.
+A: If there is a newer version of Open vSwitch, consider building that
+ one, because it may support the kernel that you are building
+ against. (To find out, consult the table in the previous answer.)
+
+ Otherwise, use the Linux kernel module supplied with the kernel
+ that you are using. All versions of Open vSwitch userspace are
+ compatible with all versions of the Open vSwitch kernel module, so
+ this will also work. See also the following question.
Q: What features are not available in the Open vSwitch kernel datapath
that ships as part of the upstream Linux kernel?
actions. On Linux kernels before 2.6.39, maximum-sized VLAN packets
may not be transmitted.
+Q: What Linux kernel versions does IPFIX flow monitoring work with?
+
+A: IPFIX flow monitoring requires the Linux kernel module from Open
+ vSwitch version 1.10.90 or later.
+
+Q: Should userspace or kernel be upgraded first to minimize downtime?
+
+A: In general, the Open vSwitch userspace should be used with the
+ kernel version included in the same release or with the version
+ from upstream Linux. However, when upgrading between two releases
+ of Open vSwitch it is best to migrate userspace first to reduce
+ the possibility of incompatibilities.
+
Q: What happened to the bridge compatibility feature?
A: Bridge compatibility was a feature of Open vSwitch 1.9 and earlier.
2.0 yes [*] [*] [*] ---
2.1 yes [*] [*] [*] ---
2.2 yes [*] [*] [*] [%]
+ 2.3 yes yes yes yes [%]
[*] Supported, with one or more missing features.
[%] Support is unsafe: ovs-vswitchd will abort when certain
unimplemented features are tested.
- Because of missing features, OpenFlow 1.1, 1.2, and 1.3 must be
- enabled manually. The following command enables OpenFlow 1.0, 1.1,
- 1.2, and 1.3 on bridge br0:
+ Open vSwitch 2.3 enables OpenFlow 1.0, 1.1, 1.2, and 1.3 by default
+ in ovs-vswitchd. In Open vSwitch 1.10 through 2.2, OpenFlow 1.1,
+ 1.2, and 1.3 must be enabled manually in ovs-vswitchd. Either way,
+ the user may override the default:
+
+ - To enable OpenFlow 1.0, 1.1, 1.2, and 1.3 on bridge br0:
+
+ ovs-vsctl set bridge br0 protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13
+
+ - To enable only OpenFlow 1.0 on bridge br0:
- ovs-vsctl set bridge br0 protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13
+ ovs-vsctl set bridge br0 protocols=OpenFlow10
- Use the -O option to enable support for later versions of OpenFlow
- in ovs-ofctl. For example:
+ All current versions of ovs-ofctl enable only OpenFlow 1.0 by
+ default. Use the -O option to enable support for later versions of
+ OpenFlow in ovs-ofctl. For example:
ovs-ofctl -O OpenFlow13 dump-flows br0
invoked with a special --enable-of14 command line option.
OPENFLOW-1.1+ in the Open vSwitch source tree tracks support for
- OpenFlow 1.1 and later features. When support for a given OpenFlow
- version is solidly implemented, Open vSwitch will enable that
- version by default.
+ OpenFlow 1.1 and later features. When support for OpenFlow 1.4 is
+ solidly implemented, Open vSwitch will enable that version by
+ default.
Q: Does Open vSwitch support MPLS?
% ./configure --with-linux=/lib/modules/`uname -r`/build
+ If --with-linux requests building for an unsupported version of
+ Linux, then "configure" will fail with an error message. Please
+ refer to the FAQ for advice in that case.
+
If you wish to build the kernel module for an architecture other
than the architecture of the machine used for the build, you may
specify the kernel architecture string using the KARCH variable
You must be superuser to install Debian packages.
-1. Start by installing an Open vSwitch kernel module. There are multiple ways
- to do this. In order of increasing manual effort, these are:
-
- * Use a Linux kernel 3.3 or later, which has an integrated Open
- vSwitch kernel module.
-
- The upstream Linux kernel module lacks a few features that
- are in the third-party module. For details, please see the
- FAQ, "What features are not available in the Open vSwitch
- kernel datapath that ships as part of the upstream Linux
- kernel?".
-
- * Install the "openvswitch-datapath-dkms" Debian package that
- you built earlier. This should automatically build and
- install the Open vSwitch kernel module for your running
- kernel.
-
- This option requires that you have a compiler and toolchain
- installed on the machine where you run Open vSwitch, which
- may be unacceptable in some production server environments.
-
- * Install the "openvswitch-datapath-source" Debian package, use
- "module-assistant" to build a Debian package of the Open
- vSwitch kernel module for your kernel, and then install that
- Debian package.
-
- You can install the kernel module Debian packages that you
- build this way on the same machine where you built it or on
- another machine or machines, which means that you don't
- necessarily have to have any build infrastructure on the
- machines where you use the kernel module.
-
- /usr/share/doc/openvswitch-datapath-source/README.Debian has
- details on the build process.
-
- * Build and install the kernel module by hand.
+1. Start by installing an Open vSwitch kernel module. See
+ debian/openvswitch-switch.README.Debian for the available options.
2. Install the "openvswitch-switch" and "openvswitch-common" packages.
These packages include the core userspace components of the switch.
-
Open vSwitch .deb packages not mentioned above are rarely useful.
Please refer to their individual package descriptions to find out
whether any of them are useful to you.
PORTING \
README-lisp \
REPORTING-BUGS \
+ TODO \
WHY-OVS \
boot.sh \
build-aux/cccl \
@if test -e '$(srcdir)'/.git && (git --version) >/dev/null 2>&1 && \
grep -n -f '$(srcdir)'/build-aux/thread-safety-blacklist \
`git ls-files '$(srcdir)' | grep '\.[ch]$$' \
- | $(EGREP) -v '^datapath|^lib/sflow|^third-party'` \
+ | $(EGREP) -v '^datapath|^lib/sflow|^third-party'` /dev/null \
| $(EGREP) -v ':[ ]*/?\*'; \
then \
echo "See above for list of calls to functions that are"; \
-Post-v2.1.0
+Post-v2.2.0
+---------------------
+ - OpenFlow 1.1, 1.2, and 1.3 are now enabled by default in
+ ovs-vswitchd.
+ - Linux kernel datapath now has an exact match cache optimizing the
+ flow matching process.
+ - Datapath flows now have partially wildcarded tranport port field
+ matches. This reduces userspace upcalls, but increases the
+ number of different masks in the datapath. The kernel datapath
+ exact match cache removes the overhead of matching the incoming
+ packets with the larger number of masks, but when paired with an
+ older kernel module, some workloads may perform worse with the
+ new userspace.
+
+v2.2.0 - xx xxx xxxx
---------------------
- Internal ports are no longer brought up by default, because it
should be an administrator task to bring up devices as they are
- Upon the receipt of a SIGHUP signal, ovs-vswitchd no longer reopens its
log file (it will terminate instead). Please use 'ovs-appctl vlog/reopen'
instead.
- - Support for Linux kernels up to 3.13. From Kernel 3.12 onwards OVS uses
+ - Support for Linux kernels up to 3.14. From Kernel 3.12 onwards OVS uses
tunnel API for GRE and VXLAN.
- Added DPDK support.
+ - Added support for custom vlog patterns in Python.
-v2.1.0 - xx xxx xxxx
+v2.1.0 - 19 Mar 2014
---------------------
- Address prefix tracking support for flow tables. New columns
"prefixes" in OVS-DB table "Flow_Table" controls which packet
--- /dev/null
+ Open vSwitch Project Ideas
+ ==========================
+
+This file lists a number of project ideas for Open vSwitch. The ideas
+here overlap somewhat with those in the OPENFLOW-1.1+ file.
+
+
+Programming Project Ideas
+=========================
+
+Each of these projects would ideally result in a patch or a short
+series of them posted to ovs-dev.
+
+Please read CONTRIBUTING and CodingStyle in the top of the source tree
+before you begin work. The OPENFLOW-1.1+ file also has an
+introduction to how OpenFlow is implemented in Open vSwitch. It is
+also a good idea to look around the source tree for related code, and
+back through the Git history for commits on related subjects, to allow
+you to follow existing patterns and conventions.
+
+Meters
+------
+
+Open vSwitch has OpenFlow protocol support for meters, but it does not
+have an implementation in the kernel or userspace datapaths. An
+implementation was proposed some time ago (I recommend looking for the
+discussion in the ovs-dev mailing list archives), but for a few
+different reasons it was not accepted. Some of those reasons apply
+only to a kernel implementation of meters. At the time, a userspace
+implementation wasn't as interesting, because the userspace switch
+did not perform at a production speed, but with the advent of
+multithreaded forwarding and, now, DPDK support, userspace-only meters
+would be a great way to get started.
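A userspace-only meter would likely center on a token bucket per meter band.
Here is a minimal, illustrative sketch in plain C; the struct and function
names are hypothetical, not OVS's actual datapath API, and a real
implementation would need per-band types (drop vs. DSCP remark) and
byte-based accounting:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single meter band: refills 'rate' tokens per second
 * up to 'burst', and charges 'cost' tokens per packet admitted. */
struct meter_band {
    uint64_t rate;        /* Tokens added per second. */
    uint64_t burst;       /* Maximum tokens in the bucket. */
    uint64_t tokens;      /* Current bucket level. */
    uint64_t last_fill;   /* Time of the last refill, in msec. */
};

/* Returns true if the packet passes, false if the band applies. */
static bool
meter_band_admit(struct meter_band *band, uint64_t now_msec, uint64_t cost)
{
    uint64_t elapsed = now_msec - band->last_fill;

    /* Refill the bucket in proportion to elapsed time. */
    band->tokens += elapsed * band->rate / 1000;
    if (band->tokens > band->burst) {
        band->tokens = band->burst;
    }
    band->last_fill = now_msec;

    if (band->tokens >= cost) {
        band->tokens -= cost;
        return true;
    }
    return false;
}
```

With a multithreaded datapath, each band would also need either a lock or
per-thread sub-buckets, which is part of what made the earlier kernel
proposal contentious.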
+
+Improve SSL/TLS Security
+------------------------
+
+Open vSwitch allows some weak ciphers to be used for its secure
+connections. Security audits often suggest that the project remove
+those ciphers, but there's not a clean way to modify the acceptable
+ciphers. At the very least, the cipher list should be audited, but it
+would be nice to make it configurable.
+
+Open vSwitch does not insist on perfect forward secrecy via ephemeral
+Diffie-Hellman key exchange when it establishes an SSL/TLS connection.
+Given the wiretapping revelations over the last year, it seems wise to
+turn this on. (This would probably amount to finding the right
+OpenSSL function to call or just reducing the acceptable ciphers
+further.)
+
+These changes might have backward-compatibility implications; one
+would have to test the behavior of OVS with the reduced cipher list
+against older versions.
+
+OpenFlow Group Bucket Stats
+---------------------------
+
+When OpenFlow group support was added, we forgot to support statistics
+for individual buckets. xlate_group_bucket() in
+ofproto/ofproto-dpif-xlate.c appears to be where we need to increment
+the counters, in the case where ctx->xin->resubmit_stats is
+nonnull. See the ovs-dev thread starting here:
+http://openvswitch.org/pipermail/dev/2014-January/036107.html
+
+Joe Stringer adds: If this involves resubmit_stats, then it would also
+need a new xc_type. The xlate_group_bucket() code would add an entry
+to ctx->xin->xcache if it is nonnull. This would also need to follow
+the code in xlate_push_stats() and xlate_cache_clear() for the new
+xc_type.
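The per-bucket accounting itself is simple; the work is wiring it into
xlate_group_bucket() and the xcache machinery. A userspace sketch of the
counters one would maintain (the struct and function names here are
hypothetical, not the actual ofproto-dpif types):

```c
#include <stdint.h>

/* Per-bucket statistics, analogous to OpenFlow's ofp_bucket_counter. */
struct bucket_stats {
    uint64_t packet_count;
    uint64_t byte_count;
};

/* Mirrors what xlate_group_bucket() would do when
 * ctx->xin->resubmit_stats is nonnull: credit the packet to the
 * bucket that the group selected. */
static void
bucket_stats_update(struct bucket_stats *stats, uint64_t bytes)
{
    stats->packet_count++;
    stats->byte_count += bytes;
}
```

The harder part, per Joe's note, is attributing stats pushed later from the
datapath, which is what the new xc_type entry in the xlate cache would cover.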
+
+
+Bash Command Completion
+-----------------------
+
+ovs-vsctl and other programs would be easier to use if bash command
+completion (with ``tab'', etc.) were supported. Alex Wang
+<alexw@nicira.com> is leading a team for this project.
+
+Auxiliary Connections
+---------------------
+
+Auxiliary connections are a feature of OpenFlow 1.3 and later that
+allow OpenFlow messages to be carried over datagram channels such as
+UDP or DTLS. One place to start would be to implement a datagram
+abstraction library for OVS analogous to the ``stream'' library
+that already abstracts TCP, SSL, and other stream protocols.
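Such a datagram library might mirror the stream library's
class-with-function-pointers pattern. A self-contained sketch with an
in-memory backend (all names here are hypothetical; a real library would add
UDP and DTLS classes behind the same interface):

```c
#include <stddef.h>
#include <string.h>

/* A ``dgram class'' bundles the operations one transport supports,
 * in the style of OVS's stream_class. */
struct dgram;
struct dgram_class {
    int (*send)(struct dgram *, const void *buf, size_t len);
    int (*recv)(struct dgram *, void *buf, size_t len);
};

/* In-memory backend: a one-message loopback, enough to exercise the
 * interface without a network. */
struct dgram {
    const struct dgram_class *class;
    char msg[128];
    size_t msg_len;
};

static int
memdgram_send(struct dgram *d, const void *buf, size_t len)
{
    if (len > sizeof d->msg) {
        return -1;           /* Datagrams do not fragment: reject. */
    }
    memcpy(d->msg, buf, len);
    d->msg_len = len;
    return 0;
}

static int
memdgram_recv(struct dgram *d, void *buf, size_t len)
{
    if (d->msg_len == 0 || len < d->msg_len) {
        return -1;
    }
    memcpy(buf, d->msg, d->msg_len);
    return (int) d->msg_len;
}

static const struct dgram_class memdgram_class = {
    memdgram_send, memdgram_recv,
};
```

Message-boundary preservation (one send is one recv) is the property that
distinguishes this interface from the byte-oriented stream library.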
+
+Controller connection logging to pcap file
+------------------------------------------
+
+http://patchwork.openvswitch.org/patch/2249/ is an RFC patch that
+allows the switch to record the traffic on OpenFlow controller
+connections to a pcap file for later analysis. The patch lacks a good
+way to enable and disable the feature. The task here would be to add
+that and repost the patch.
+
+Basic OpenFlow 1.4 support
+--------------------------
+
+Some basic support for OpenFlow 1.4 is missing and needs to be
+implemented. These can be found by looking through lib/ofp-util.c for
+mentions of OFP14_VERSION followed by a call to OVS_NOT_REACHED (which
+aborts the program).
+
+OpenFlow 1.4: Flow monitoring
+-----------------------------
+
+OpenFlow 1.4 introduces OFPMP_FLOW_MONITOR for notifying a controller
+of changes to selected flow tables. This feature is based on
+NXST_FLOW_MONITOR, which is already part of Open vSwitch, so
+implementing this feature is mostly a matter of extending that code to
+handle the OpenFlow 1.4 wire protocol.
+
+OpenFlow 1.3 also includes this feature as an ONF-defined extension, so
+ideally OVS would support that too.
+
+OpenFlow 1.4 Role Status Message
+--------------------------------
+
+OpenFlow 1.4 section 7.4.4 ``Controller Role Status Message''
+defines a new message sent by a switch to notify the controller that
+its role (whether it is a master or a slave) has changed. OVS should
+implement this.
+
+OpenFlow 1.3 also includes this feature as an ONF-defined extension, so
+ideally OVS would support that too.
+
+OpenFlow 1.4 Vacancy Events
+---------------------------
+
+OpenFlow 1.4 section 7.4.5 ``Table Status Message'' defines a new
+message sent by a switch to notify the controller that a flow table is
+close to filling up (or that it is no longer close to filling up).
+OVS should implement this.
+
+OpenFlow 1.3 also includes this feature as an ONF-defined extension, so
+ideally OVS would support that too.
+
+OpenFlow 1.4 Group and Meter Change Notification
+------------------------------------------------
+
+OpenFlow 1.4 adds a feature whereby a controller can ask the switch to
+send it copies of messages that change groups and meters. (This is
+only useful in the presence of multiple controllers.) OVS should
+implement this.
+
+OpenFlow 1.3 also includes this feature as an ONF-defined extension, so
+ideally OVS would support that too.
+
+
+Testing Project Ideas
+=====================
+
+Each of these projects would ideally result in confirmation that
+features work or bug reports explaining how they do not. Please send
+bug reports to dev at openvswitch.org, with as many details as you have.
+
+ONF Plugfest Results Analysis
+-----------------------------
+
+Ben Pfaff has a collection of files reporting Open vSwitch conformance
+to OpenFlow 1.3 provided by one of the vendors at the ONF plugfest
+last year. Some of the reported failures have been fixed, some of the
+other failures probably result from differing interpretations of
+OpenFlow 1.3, and others are probably genuine bugs in Open vSwitch.
+Open vSwitch has also improved in the meantime. Ben can provide the
+results, privately, to some person or team who wishes to check them
+out and try to pick out the genuine bugs.
+
+OpenFlow Fuzzer
+---------------
+
+Build a ``fuzzer'' for the OpenFlow protocol (or use an existing
+one, if there is one) and run it against the Open vSwitch
+implementation. One could also build a fuzzer for the OVSDB protocol.
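A fuzzer could start from a valid message and mutate it. The OpenFlow fixed
header is version (1 byte), type (1 byte), length (2 bytes, big-endian), and
xid (4 bytes); here is a minimal C sketch of the mutation step (the helper
names are hypothetical, and a real fuzzer would also need to send the result
to a switch and watch for crashes or protocol errors):

```c
#include <stdint.h>
#include <stdlib.h>

/* OpenFlow fixed header, as laid out in the specification. */
struct ofp_header {
    uint8_t version;    /* 0x01 for OpenFlow 1.0. */
    uint8_t type;       /* OFPT_* message type. */
    uint16_t length;    /* Total message length, big-endian. */
    uint32_t xid;       /* Transaction id. */
};

/* Flips 'n_mutations' random bits in 'buf'.  Starting from a valid
 * message and perturbing it lightly tends to reach deeper parsing
 * code than fully random input does. */
static void
mutate(uint8_t *buf, size_t len, int n_mutations, unsigned int *seed)
{
    for (int i = 0; i < n_mutations; i++) {
        buf[rand_r(seed) % len] ^= (uint8_t) (1 << (rand_r(seed) % 8));
    }
}
```

Mutated buffers that still carry a plausible length field are particularly
interesting, since they get past the first sanity checks.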
+
+Ryu Certification Tests Analysis
+--------------------------------
+
+The Ryu controller comes with a suite of ``certification tests''
+that check the correctness of a switch's implementation of various
+OpenFlow 1.3 features. The INSTALL file in the OVS source tree has a
+section that explains how to easily run these tests against an OVS
+source tree. Run the tests and figure out whether any tests fail but
+should pass. (Some tests fail and should fail because OVS does not
+implement the particular feature; for example, OVS does not implement
+PBB encapsulation, so related tests fail.)
+
+OFTest Results Analysis
+-----------------------
+
+OFTest is a test suite for OpenFlow 1.0 compliance. The INSTALL file
+in the OVS source tree has a section that explains how to easily run
+these tests against an OVS source tree. Run the tests and figure out
+whether any tests fail but should pass, and ideally why. OFTest is
+not particularly well vetted--in the past, at least, some tests have
+failed against OVS due to bugs in OFTest, not in OVS--so some care is
+warranted.
+
+
+Documentation Project Ideas
+===========================
+
+Each of these projects would ideally result in creating some new
+documentation for users. Some documentation might be suitable to
+accompany Open vSwitch as part of its source tree, most likely either
+in plain text or ``nroff'' (manpage) format.
+
+OpenFlow Basics Tutorial
+------------------------
+
+Open vSwitch has a tutorial that covers its advanced features, but it
+does not have a basic tutorial. There are several tutorials on the
+Internet already, so a new tutorial would have to distinguish itself
+in some way. One way would be to use the Open vSwitch ``sandbox''
+environment already used in the advanced tutorial. The sandbox does
+not require any real network or even supervisor privilege on the
+machine where it runs, and thus it is easy to use with hardly any
+up-front setup, so it is a gentle way to get started.
+
+FlowVisor via patch ports
+-------------------------
+
+FlowVisor is a proxy that sits between OpenFlow controllers and a
+switch. It divides up switch resources, allowing each controller to
+control a ``slice'' of the network. For example, it can break up a
+network based on VLAN, allowing different controllers to handle
+packets with different VLANs.
+
+It seems that Open vSwitch has features that allow it to implement at
+least simple forms of FlowVisor control without any need for
+FlowVisor. Consider an Open vSwitch instance with three bridges.
+Bridge br0 has physical ports eth0 and eth1. Bridge v9 has no
+physical ports, but it has two ``patch ports'' that connect it to
+br0. Bridge v11 has the same setup. Flows in br0 match packets
+received on VLAN 9, strip the VLAN header, and direct them to the
+appropriate patch port leading to v9. Additional flows in br0 match
+packets received from v9, attach a VLAN 9 tag to them, and direct them
+out eth0 or eth1 as appropriate. Other flows in br0 treat packets on
+VLAN 11 similarly. Controllers attached to bridge v9 or v11 may thus
+work as if they had full control of a network.
+
+It seems to me that this is a good example of the power of OpenFlow
+and Open vSwitch. The point of this project is to explain how to do
+this, with detailed examples, in case someone finds it handy and to
+open eyes toward the general usefulness of Open vSwitch.
+
+``Cookbooks''
+-------------
+
+The Open vSwitch website has a few ``cookbook'' entries that
+describe how to use Open vSwitch in a few scenarios. There are only a
+few of these and all of them are dated. It would be a good idea to
+come up with ideas for some more and write them. These could be added
+to the Open vSwitch website or the source tree or somewhere else.
+
+Demos
+-----
+
+Record a demo of Open vSwitch functionality in use (or something else
+relevant) and post it to YouTube or another video site so that we can
+link to it from openvswitch.org.
+
+
+How to contribute
+=================
+
+If you plan to contribute code for a feature, please let everyone know
+on ovs-dev before you start work. This will help avoid duplicating
+work.
+
+Please consider the following:
+
+ * Testing. Please test your code.
+
+ * Unit tests. Please consider writing some. The tests directory
+ has many examples that you can use as a starting point.
+
+ * ovs-ofctl. If you add a feature that is useful for some
+ ovs-ofctl command then you should add support for it there.
+
+ * Documentation. If you add a user-visible feature, then you
+ should document it in the appropriate manpage and mention it in
+ NEWS as well.
+
+ * Coding style (see the CodingStyle file at the top of the source
+ tree).
+
+ * The patch submission guidelines (see CONTRIBUTING). I
+ recommend using "git send-email", which automatically follows a
+ lot of those guidelines.
+
+
+Bug Reporting
+=============
+
+Please report problems to bugs@openvswitch.org.
+
+
+Local Variables:
+mode: text
+End:
AC_MSG_RESULT([$kversion])
if test "$version" -ge 3; then
- if test "$version" = 3 && test "$patchlevel" -le 13; then
+ if test "$version" = 3 && test "$patchlevel" -le 14; then
: # Linux 3.x
else
- AC_ERROR([Linux kernel in $KBUILD is version $kversion, but version newer than 3.13.x is not supported])
+ AC_ERROR([Linux kernel in $KBUILD is version $kversion, but version newer than 3.14.x is not supported (please refer to the FAQ for advice)])
fi
else
if test "$version" -le 1 || test "$patchlevel" -le 5 || test "$sublevel" -le 31; then
OVS_GREP_IFELSE([$KSRC/include/linux/err.h], [ERR_CAST])
OVS_GREP_IFELSE([$KSRC/include/linux/etherdevice.h], [eth_hw_addr_random])
+ OVS_GREP_IFELSE([$KSRC/include/linux/etherdevice.h], [ether_addr_copy])
OVS_GREP_IFELSE([$KSRC/include/linux/if_vlan.h], [vlan_set_encap_proto])
OVS_GREP_IFELSE([$KSRC/include/linux/netdevice.h], [__skb_gso_segment])
OVS_GREP_IFELSE([$KSRC/include/linux/netdevice.h], [can_checksum_protocol])
OVS_GREP_IFELSE([$KSRC/include/linux/netdevice.h], [netdev_features_t])
+ OVS_GREP_IFELSE([$KSRC/include/linux/netdevice.h], [pcpu_sw_netstats])
+
+ OVS_GREP_IFELSE([$KSRC/include/linux/random.h], [prandom_u32])
OVS_GREP_IFELSE([$KSRC/include/linux/rcupdate.h], [rcu_read_lock_held], [],
[OVS_GREP_IFELSE([$KSRC/include/linux/rtnetlink.h],
OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [__skb_fill_page_desc])
OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [skb_reset_mac_len])
OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [skb_unclone])
+ OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [skb_orphan_frags])
+ OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [skb_get_hash])
+ OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [skb_clear_hash])
+ OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [l4_rxhash])
OVS_GREP_IFELSE([$KSRC/include/linux/types.h], [bool],
[OVS_DEFINE([HAVE_BOOL_TYPE])])
# limitations under the License.
AC_PREREQ(2.64)
-AC_INIT(openvswitch, 2.1.90, bugs@openvswitch.org)
+AC_INIT(openvswitch, 2.2.90, bugs@openvswitch.org)
AC_CONFIG_SRCDIR([datapath/datapath.c])
AC_CONFIG_MACRO_DIR([m4])
AC_CONFIG_AUX_DIR([build-aux])
/*
- * Copyright (c) 2007-2013 Nicira, Inc.
+ * Copyright (c) 2007-2014 Nicira, Inc.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
}
csum_replace4(&nh->check, *addr, new_addr);
- skb_clear_rxhash(skb);
+ skb_clear_hash(skb);
*addr = new_addr;
}
if (recalculate_csum)
update_ipv6_checksum(skb, l4_proto, addr, new_addr);
- skb_clear_rxhash(skb);
+ skb_clear_hash(skb);
memcpy(addr, new_addr, sizeof(__be32[4]));
}
{
inet_proto_csum_replace2(check, skb, *port, new_port, 0);
*port = new_port;
- skb_clear_rxhash(skb);
+ skb_clear_hash(skb);
}
static void set_udp_port(struct sk_buff *skb, __be16 *port, __be16 new_port)
uh->check = CSUM_MANGLED_0;
} else {
*port = new_port;
- skb_clear_rxhash(skb);
+ skb_clear_hash(skb);
}
}
/* Carry any checksum errors through. */
sh->checksum = old_csum ^ old_correct_csum ^ new_csum;
- skb_clear_rxhash(skb);
+ skb_clear_hash(skb);
}
return 0;
a = nla_next(a, &rem)) {
switch (nla_type(a)) {
case OVS_SAMPLE_ATTR_PROBABILITY:
- if (net_random() >= nla_get_u32(a))
+ if (prandom_u32() >= nla_get_u32(a))
return 0;
break;
nla_len(acts_list), true);
}
+static void execute_hash(struct sk_buff *skb, const struct nlattr *attr)
+{
+ struct sw_flow_key *key = OVS_CB(skb)->pkt_key;
+ struct ovs_action_hash *hash_act = nla_data(attr);
+ u32 hash = 0;
+
+ /* OVS_HASH_ALG_L4 is the only possible hash algorithm. */
+ hash = skb_get_hash(skb);
+ hash = jhash_1word(hash, hash_act->hash_basis);
+ if (!hash)
+ hash = 0x1;
+
+ key->ovs_flow_hash = hash;
+}
+
static int execute_set_action(struct sk_buff *skb,
const struct nlattr *nested_attr)
{
return err;
}
+static int execute_recirc(struct datapath *dp, struct sk_buff *skb,
+ const struct nlattr *a)
+{
+ struct sw_flow_key recirc_key;
+ const struct vport *p = OVS_CB(skb)->input_vport;
+ uint32_t hash = OVS_CB(skb)->pkt_key->ovs_flow_hash;
+ int err;
+
+ err = ovs_flow_extract(skb, p->port_no, &recirc_key);
+ if (err)
+ return err;
+
+ recirc_key.ovs_flow_hash = hash;
+ recirc_key.recirc_id = nla_get_u32(a);
+
+ ovs_dp_process_packet_with_key(skb, &recirc_key, true);
+
+ return 0;
+}
+
/* Execute a list of actions against 'skb'. */
static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
const struct nlattr *attr, int len, bool keep_skb)
output_userspace(dp, skb, a);
break;
+ case OVS_ACTION_ATTR_HASH:
+ execute_hash(skb, a);
+ break;
+
case OVS_ACTION_ATTR_PUSH_VLAN:
err = push_vlan(skb, nla_data(a));
if (unlikely(err)) /* skb already freed. */
err = pop_vlan(skb);
break;
+ case OVS_ACTION_ATTR_RECIRC: {
+ struct sk_buff *recirc_skb;
+ const bool last_action = (a->nla_len == rem);
+
+ if (!last_action || keep_skb)
+ recirc_skb = skb_clone(skb, GFP_ATOMIC);
+ else
+ recirc_skb = skb;
+
+ err = execute_recirc(dp, recirc_skb, a);
+
+ if (last_action || err)
+ return err;
+
+ break;
+ }
+
case OVS_ACTION_ATTR_SET:
err = execute_set_action(skb, nla_data(a));
break;
case OVS_ACTION_ATTR_SAMPLE:
err = sample(dp, skb, a);
+ if (unlikely(err)) /* skb already freed. */
+ return err;
break;
}
}
/* We limit the number of times that we pass into execute_actions()
- * to avoid blowing out the stack in the event that we have a loop. */
-#define MAX_LOOPS 4
+ * to avoid blowing out the stack in the event that we have a loop.
+ *
+ * Each loop adds some (estimated) cost to the kernel stack.
+ * The loop terminates when the max cost is exceeded.
+ */
+#define RECIRC_STACK_COST 1
+#define DEFAULT_STACK_COST 4
+/* Allow up to 4 regular action executions, and up to 3 recirculations. */
+#define MAX_STACK_COST (DEFAULT_STACK_COST * 4 + RECIRC_STACK_COST * 3)
struct loop_counter {
- u8 count; /* Count. */
+ u8 stack_cost; /* loop stack cost. */
bool looping; /* Loop detected? */
};
static int loop_suppress(struct datapath *dp, struct sw_flow_actions *actions)
{
if (net_ratelimit())
- pr_warn("%s: flow looped %d times, dropping\n",
- ovs_dp_name(dp), MAX_LOOPS);
+ pr_warn("%s: flow loop detected, dropping\n",
+ ovs_dp_name(dp));
actions->actions_len = 0;
return -ELOOP;
}
/* Execute a list of actions against 'skb'. */
-int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb)
+int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb, bool recirc)
{
struct sw_flow_actions *acts = rcu_dereference(OVS_CB(skb)->flow->sf_acts);
+ const u8 stack_cost = recirc ? RECIRC_STACK_COST : DEFAULT_STACK_COST;
struct loop_counter *loop;
int error;
/* Check whether we've looped too much. */
loop = &__get_cpu_var(loop_counters);
- if (unlikely(++loop->count > MAX_LOOPS))
+ loop->stack_cost += stack_cost;
+ if (unlikely(loop->stack_cost > MAX_STACK_COST))
loop->looping = true;
if (unlikely(loop->looping)) {
error = loop_suppress(dp, acts);
error = loop_suppress(dp, acts);
out_loop:
- /* Decrement loop counter. */
- if (!--loop->count)
+ /* Decrement loop stack cost. */
+ loop->stack_cost -= stack_cost;
+ if (!loop->stack_cost)
loop->looping = false;
return error;
#include <net/route.h>
#include <net/xfrm.h>
-static inline void skb_clear_rxhash(struct sk_buff *skb)
-{
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,35)
- skb->rxhash = 0;
-#endif
-}
-
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,13,0)
#define GROUP_ID(grp) 0
#else
}
#endif
-static struct vport *new_vport(const struct vport_parms *);
static int queue_gso_packets(struct datapath *dp, struct sk_buff *,
const struct dp_upcall_info *);
static int queue_userspace_packet(struct datapath *dp, struct sk_buff *,
{
struct datapath *dp = container_of(rcu, struct datapath, rcu);
+ ovs_flow_tbl_destroy(&dp->table);
free_percpu(dp->stats_percpu);
release_net(ovs_dp_get_net(dp));
kfree(dp->ports);
ovs_vport_del(p);
}
-/* Must be called with rcu_read_lock. */
-void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
+void ovs_dp_process_packet_with_key(struct sk_buff *skb,
+ struct sw_flow_key *pkt_key,
+ bool recirc)
{
+ const struct vport *p = OVS_CB(skb)->input_vport;
struct datapath *dp = p->dp;
struct sw_flow *flow;
struct dp_stats_percpu *stats;
- struct sw_flow_key key;
u64 *stats_counter;
u32 n_mask_hit;
- int error;
stats = this_cpu_ptr(dp->stats_percpu);
- /* Extract flow from 'skb' into 'key'. */
- error = ovs_flow_extract(skb, p->port_no, &key);
- if (unlikely(error)) {
- kfree_skb(skb);
- return;
- }
-
/* Look up flow. */
- flow = ovs_flow_tbl_lookup_stats(&dp->table, &key, &n_mask_hit);
+ flow = ovs_flow_tbl_lookup_stats(&dp->table, pkt_key, skb_get_hash(skb),
+ &n_mask_hit);
if (unlikely(!flow)) {
struct dp_upcall_info upcall;
upcall.cmd = OVS_PACKET_CMD_MISS;
- upcall.key = &key;
+ upcall.key = pkt_key;
upcall.userdata = NULL;
- upcall.portid = p->upcall_portid;
+ upcall.portid = ovs_vport_find_upcall_portid(p, skb);
ovs_dp_upcall(dp, skb, &upcall);
consume_skb(skb);
stats_counter = &stats->n_missed;
goto out;
}
+ OVS_CB(skb)->pkt_key = pkt_key;
OVS_CB(skb)->flow = flow;
- OVS_CB(skb)->pkt_key = &key;
- ovs_flow_stats_update(OVS_CB(skb)->flow, key.tp.flags, skb);
- ovs_execute_actions(dp, skb);
+ ovs_flow_stats_update(OVS_CB(skb)->flow, pkt_key->tp.flags, skb);
+ ovs_execute_actions(dp, skb, recirc);
stats_counter = &stats->n_hit;
out:
u64_stats_update_end(&stats->sync);
}
+/* Must be called with rcu_read_lock. */
+void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
+{
+ int error;
+ struct sw_flow_key key;
+
+ OVS_CB(skb)->input_vport = p;
+
+ /* Extract flow from 'skb' into 'key'. */
+ error = ovs_flow_extract(skb, p->port_no, &key);
+ if (unlikely(error)) {
+ kfree_skb(skb);
+ return;
+ }
+
+ ovs_dp_process_packet_with_key(skb, &key, false);
+}
+
int ovs_dp_upcall(struct datapath *dp, struct sk_buff *skb,
const struct dp_upcall_info *upcall_info)
{
}
nla->nla_len = nla_attr_size(skb->len);
- skb_zerocopy(user_skb, skb, skb->len, hlen);
+ err = skb_zerocopy(user_skb, skb, skb->len, hlen);
+ if (err)
+ goto out;
/* Pad OVS_PACKET_ATTR_PACKET if linear copy was performed */
if (!(dp->user_features & OVS_DP_F_UNALIGNED)) {
err = genlmsg_unicast(ovs_dp_get_net(dp), user_skb, upcall_info->portid);
out:
+ if (err)
+ skb_tx_error(skb);
kfree_skb(nskb);
return err;
}
struct sw_flow *flow;
struct datapath *dp;
struct ethhdr *eth;
+ struct vport *input_vport;
int len;
int err;
if (!dp)
goto err_unlock;
+ input_vport = ovs_vport_rcu(dp, flow->key.phy.in_port);
+ if (!input_vport)
+ input_vport = ovs_vport_rcu(dp, OVSP_LOCAL);
+
+ if (!input_vport)
+ goto err_unlock;
+
+ OVS_CB(packet)->input_vport = input_vport;
+
local_bh_disable();
- err = ovs_execute_actions(dp, packet);
+ err = ovs_execute_actions(dp, packet, false);
local_bh_enable();
rcu_read_unlock();
parms.options = NULL;
parms.dp = dp;
parms.port_no = OVSP_LOCAL;
- parms.upcall_portid = nla_get_u32(a[OVS_DP_ATTR_UPCALL_PID]);
+ parms.upcall_portids = a[OVS_DP_ATTR_UPCALL_PID];
ovs_dp_change(dp, a);
err_destroy_percpu:
free_percpu(dp->stats_percpu);
err_destroy_table:
- ovs_flow_tbl_destroy(&dp->table, false);
+ ovs_flow_tbl_destroy(&dp->table);
err_free_dp:
release_net(ovs_dp_get_net(dp));
kfree(dp);
ovs_dp_detach_port(ovs_vport_ovsl(dp, OVSP_LOCAL));
/* RCU destroy the flow table */
- ovs_flow_tbl_destroy(&dp->table, true);
-
call_rcu(&dp->rcu, destroy_dp_rcu);
}
if (nla_put_u32(skb, OVS_VPORT_ATTR_PORT_NO, vport->port_no) ||
nla_put_u32(skb, OVS_VPORT_ATTR_TYPE, vport->ops->type) ||
- nla_put_string(skb, OVS_VPORT_ATTR_NAME, vport->ops->get_name(vport)) ||
- nla_put_u32(skb, OVS_VPORT_ATTR_UPCALL_PID, vport->upcall_portid))
+ nla_put_string(skb, OVS_VPORT_ATTR_NAME, vport->ops->get_name(vport)))
goto nla_put_failure;
ovs_vport_get_stats(vport, &vport_stats);
&vport_stats))
goto nla_put_failure;
+ if (ovs_vport_get_upcall_portids(vport, skb))
+ goto nla_put_failure;
+
err = ovs_vport_get_options(vport, skb);
if (err == -EMSGSIZE)
goto error;
parms.options = a[OVS_VPORT_ATTR_OPTIONS];
parms.dp = dp;
parms.port_no = port_no;
- parms.upcall_portid = nla_get_u32(a[OVS_VPORT_ATTR_UPCALL_PID]);
+ parms.upcall_portids = a[OVS_VPORT_ATTR_UPCALL_PID];
vport = new_vport(&parms);
err = PTR_ERR(vport);
if (a[OVS_VPORT_ATTR_STATS])
ovs_vport_set_stats(vport, nla_data(a[OVS_VPORT_ATTR_STATS]));
- if (a[OVS_VPORT_ATTR_UPCALL_PID])
- vport->upcall_portid = nla_get_u32(a[OVS_VPORT_ATTR_UPCALL_PID]);
+
+ if (a[OVS_VPORT_ATTR_UPCALL_PID]) {
+ err = ovs_vport_set_upcall_portids(vport,
+ a[OVS_VPORT_ATTR_UPCALL_PID]);
+ if (err)
+ goto exit_unlock_free;
+ }
err = ovs_vport_cmd_fill_info(vport, reply, info->snd_portid,
info->snd_seq, 0, OVS_VPORT_CMD_NEW);
/*
- * Copyright (c) 2007-2012 Nicira, Inc.
+ * Copyright (c) 2007-2014 Nicira, Inc.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* @flow: The flow associated with this packet. May be %NULL if no flow.
* @pkt_key: The flow information extracted from the packet. Must be nonnull.
* @tun_key: Key for the tunnel that encapsulated this packet. NULL if the
* packet is not being tunneled.
+ * @input_vport: The original vport the packet came in on. This value is
+ * cached when a packet is received by OVS.
*/
struct ovs_skb_cb {
struct sw_flow *flow;
struct sw_flow_key *pkt_key;
struct ovs_key_ipv4_tunnel *tun_key;
+ struct vport *input_vport;
};
#define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb)
extern struct genl_multicast_group ovs_dp_vport_multicast_group;
void ovs_dp_process_received_packet(struct vport *, struct sk_buff *);
+void ovs_dp_process_packet_with_key(struct sk_buff *,
+ struct sw_flow_key *pkt_key, bool recirc);
void ovs_dp_detach_port(struct vport *);
int ovs_dp_upcall(struct datapath *, struct sk_buff *,
const struct dp_upcall_info *);
struct sk_buff *ovs_vport_cmd_build_info(struct vport *, u32 portid, u32 seq,
u8 cmd);
-int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb);
+int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb, bool recirc);
void ovs_dp_notify_wq(struct work_struct *work);
#define OVS_NLERR(fmt, ...) \
u32 skb_mark; /* SKB mark. */
u16 in_port; /* Input switch port (or DP_MAX_PORTS). */
} __packed phy; /* Safe when right after 'tun_key'. */
+ u32 ovs_flow_hash; /* Datapath computed hash value. */
+ u32 recirc_id; /* Recirculation ID. */
struct {
u8 src[ETH_ALEN]; /* Ethernet source address. */
u8 dst[ETH_ALEN]; /* Ethernet destination address. */
struct sw_flow_mask {
int ref_count;
struct rcu_head rcu;
- struct list_head list;
struct sw_flow_key_range range;
struct sw_flow_key key;
};
[OVS_KEY_ATTR_ICMPV6] = sizeof(struct ovs_key_icmpv6),
[OVS_KEY_ATTR_ARP] = sizeof(struct ovs_key_arp),
[OVS_KEY_ATTR_ND] = sizeof(struct ovs_key_nd),
+ [OVS_KEY_ATTR_DP_HASH] = sizeof(u32),
+ [OVS_KEY_ATTR_RECIRC_ID] = sizeof(u32),
[OVS_KEY_ATTR_TUNNEL] = -1,
};
static int metadata_from_nlattrs(struct sw_flow_match *match, u64 *attrs,
const struct nlattr **a, bool is_mask)
{
+ if (*attrs & (1ULL << OVS_KEY_ATTR_DP_HASH)) {
+ u32 hash_val = nla_get_u32(a[OVS_KEY_ATTR_DP_HASH]);
+
+ SW_FLOW_KEY_PUT(match, ovs_flow_hash, hash_val, is_mask);
+ *attrs &= ~(1ULL << OVS_KEY_ATTR_DP_HASH);
+ }
+
+ if (*attrs & (1ULL << OVS_KEY_ATTR_RECIRC_ID)) {
+ u32 recirc_id = nla_get_u32(a[OVS_KEY_ATTR_RECIRC_ID]);
+
+ SW_FLOW_KEY_PUT(match, recirc_id, recirc_id, is_mask);
+ *attrs &= ~(1ULL << OVS_KEY_ATTR_RECIRC_ID);
+ }
+
if (*attrs & (1ULL << OVS_KEY_ATTR_PRIORITY)) {
SW_FLOW_KEY_PUT(match, phy.priority,
nla_get_u32(a[OVS_KEY_ATTR_PRIORITY]), is_mask);
flow->key.phy.in_port = DP_MAX_PORTS;
flow->key.phy.priority = 0;
flow->key.phy.skb_mark = 0;
+ flow->key.ovs_flow_hash = 0;
+ flow->key.recirc_id = 0;
memset(tun_key, 0, sizeof(flow->key.tun_key));
err = parse_flow_nlattrs(attr, a, &attrs);
struct nlattr *nla, *encap;
bool is_mask = (swkey != output);
+ if (nla_put_u32(skb, OVS_KEY_ATTR_DP_HASH, output->ovs_flow_hash))
+ goto nla_put_failure;
+
+ if (nla_put_u32(skb, OVS_KEY_ATTR_RECIRC_ID, output->recirc_id))
+ goto nla_put_failure;
+
if (nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, output->phy.priority))
goto nla_put_failure;
/* Expected argument lengths, (u32)-1 for variable length. */
static const u32 action_lens[OVS_ACTION_ATTR_MAX + 1] = {
[OVS_ACTION_ATTR_OUTPUT] = sizeof(u32),
+ [OVS_ACTION_ATTR_RECIRC] = sizeof(u32),
[OVS_ACTION_ATTR_USERSPACE] = (u32)-1,
[OVS_ACTION_ATTR_PUSH_VLAN] = sizeof(struct ovs_action_push_vlan),
[OVS_ACTION_ATTR_POP_VLAN] = 0,
[OVS_ACTION_ATTR_SET] = (u32)-1,
- [OVS_ACTION_ATTR_SAMPLE] = (u32)-1
+ [OVS_ACTION_ATTR_SAMPLE] = (u32)-1,
+ [OVS_ACTION_ATTR_HASH] = sizeof(struct ovs_action_hash)
};
const struct ovs_action_push_vlan *vlan;
int type = nla_type(a);
return -EINVAL;
break;
+ case OVS_ACTION_ATTR_HASH: {
+ const struct ovs_action_hash *act_hash = nla_data(a);
+
+ switch (act_hash->hash_alg) {
+ case OVS_HASH_ALG_L4:
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ break;
+ }
case OVS_ACTION_ATTR_POP_VLAN:
break;
return -EINVAL;
break;
+ case OVS_ACTION_ATTR_RECIRC:
+ break;
+
case OVS_ACTION_ATTR_SET:
err = validate_set(a, key, sfa, &skip_copy);
if (err)
#include "vlan.h"
#define TBL_MIN_BUCKETS 1024
+#define MASK_ARRAY_SIZE_MIN 16
#define REHASH_INTERVAL (10 * 60 * HZ)
+#define MC_HASH_SHIFT 8
+#define MC_HASH_ENTRIES (1u << MC_HASH_SHIFT)
+#define MC_HASH_SEGS ((sizeof(uint32_t) * 8) / MC_HASH_SHIFT)
+
static struct kmem_cache *flow_cache;
struct kmem_cache *flow_stats_cache __read_mostly;
return ti;
}
+static void mask_array_rcu_cb(struct rcu_head *rcu)
+{
+ struct mask_array *ma = container_of(rcu, struct mask_array, rcu);
+
+ kfree(ma);
+}
+
+static struct mask_array *tbl_mask_array_alloc(int size)
+{
+ struct mask_array *new;
+
+ new = kzalloc(sizeof(struct mask_array) +
+ sizeof(struct sw_flow_mask *) * size, GFP_KERNEL);
+ if (!new)
+ return NULL;
+
+ new->count = 0;
+ new->max = size;
+
+ return new;
+}
+
+static int tbl_mask_array_realloc(struct flow_table *tbl, int size)
+{
+ struct mask_array *old;
+ struct mask_array *new;
+
+ new = tbl_mask_array_alloc(size);
+ if (!new)
+ return -ENOMEM;
+
+ old = ovsl_dereference(tbl->mask_array);
+ if (old) {
+ int i;
+
+ for (i = 0; i < old->max; i++) {
+ if (old->masks[i])
+ new->masks[new->count++] = old->masks[i];
+ }
+ }
+ rcu_assign_pointer(tbl->mask_array, new);
+
+ if (old)
+ call_rcu(&old->rcu, mask_array_rcu_cb);
+
+ return 0;
+}
+
int ovs_flow_tbl_init(struct flow_table *table)
{
struct table_instance *ti;
+ struct mask_array *ma;
- ti = table_instance_alloc(TBL_MIN_BUCKETS);
+ table->mask_cache = __alloc_percpu(sizeof(struct mask_cache_entry) *
+ MC_HASH_ENTRIES, __alignof__(struct mask_cache_entry));
+ if (!table->mask_cache)
+ return -ENOMEM;
+
+ ma = tbl_mask_array_alloc(MASK_ARRAY_SIZE_MIN);
+ if (!ma)
+ goto free_mask_cache;
+ ti = table_instance_alloc(TBL_MIN_BUCKETS);
if (!ti)
- return -ENOMEM;
+ goto free_mask_array;
rcu_assign_pointer(table->ti, ti);
- INIT_LIST_HEAD(&table->mask_list);
+ rcu_assign_pointer(table->mask_array, ma);
table->last_rehash = jiffies;
table->count = 0;
return 0;
+
+free_mask_array:
+ kfree((struct mask_array __force *)table->mask_array);
+free_mask_cache:
+ free_percpu(table->mask_cache);
+ return -ENOMEM;
}
static void flow_tbl_destroy_rcu_cb(struct rcu_head *rcu)
__table_instance_destroy(ti);
}
-void ovs_flow_tbl_destroy(struct flow_table *table, bool deferred)
+/* No need for locking; this function is called only from an RCU
+ * callback or from the error path. */
+void ovs_flow_tbl_destroy(struct flow_table *table)
{
- struct table_instance *ti = ovsl_dereference(table->ti);
+ struct table_instance *ti = (struct table_instance __force *)table->ti;
- table_instance_destroy(ti, deferred);
+ free_percpu(table->mask_cache);
+ kfree((struct mask_array __force *)table->mask_array);
+ table_instance_destroy(ti, false);
}
struct sw_flow *ovs_flow_tbl_dump_next(struct table_instance *ti,
static struct sw_flow *masked_flow_lookup(struct table_instance *ti,
const struct sw_flow_key *unmasked,
- struct sw_flow_mask *mask)
+ struct sw_flow_mask *mask,
+ u32 *n_mask_hit)
{
struct sw_flow *flow;
struct hlist_head *head;
ovs_flow_mask_key(&masked_key, unmasked, mask);
hash = flow_hash(&masked_key, key_start, key_end);
head = find_bucket(ti, hash);
+ (*n_mask_hit)++;
hlist_for_each_entry_rcu(flow, head, hash_node[ti->node_ver]) {
if (flow->mask == mask && flow->hash == hash &&
flow_cmp_masked_key(flow, &masked_key,
return NULL;
}
+
+static struct sw_flow *flow_lookup(struct flow_table *tbl,
+ struct table_instance *ti,
+ struct mask_array *ma,
+ const struct sw_flow_key *key,
+ u32 *n_mask_hit,
+ u32 *index)
+{
+ struct sw_flow *flow;
+ int i;
+
+ for (i = 0; i < ma->max; i++) {
+ struct sw_flow_mask *mask;
+
+ mask = rcu_dereference_ovsl(ma->masks[i]);
+ if (mask) {
+ flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
+ if (flow) { /* Found */
+ *index = i;
+ return flow;
+ }
+ }
+ }
+
+ return NULL;
+}
+
+/*
+ * mask_cache maps a flow to its probable mask. The cache is not
+ * tightly coupled to the mask list, so updates to the mask list can
+ * leave stale entries in the cache; stale entries are detected and
+ * replaced on the next lookup.
+ * The cache is per-CPU and is divided into MC_HASH_SEGS segments;
+ * on a hash collision, the entry is hashed into the next segment.
+ */
struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
- const struct sw_flow_key *key,
- u32 *n_mask_hit)
+ const struct sw_flow_key *key,
+ u32 skb_hash,
+ u32 *n_mask_hit)
{
+ struct mask_array *ma = rcu_dereference_ovsl(tbl->mask_array);
struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
- struct sw_flow_mask *mask;
+ struct mask_cache_entry *entries, *ce, *del;
struct sw_flow *flow;
+ u32 hash = skb_hash;
+ int seg;
*n_mask_hit = 0;
- list_for_each_entry_rcu(mask, &tbl->mask_list, list) {
- (*n_mask_hit)++;
- flow = masked_flow_lookup(ti, key, mask);
- if (flow) /* Found */
- return flow;
+ if (unlikely(!skb_hash)) {
+ u32 __always_unused mask_index;
+
+ return flow_lookup(tbl, ti, ma, key, n_mask_hit, &mask_index);
}
- return NULL;
+
+ del = NULL;
+ entries = this_cpu_ptr(tbl->mask_cache);
+
+ for (seg = 0; seg < MC_HASH_SEGS; seg++) {
+ int index;
+
+ index = hash & (MC_HASH_ENTRIES - 1);
+ ce = &entries[index];
+
+ if (ce->skb_hash == skb_hash) {
+ struct sw_flow_mask *mask;
+
+ mask = rcu_dereference_ovsl(ma->masks[ce->mask_index]);
+ if (mask) {
+ flow = masked_flow_lookup(ti, key, mask,
+ n_mask_hit);
+ if (flow) /* Found */
+ return flow;
+
+ }
+ del = ce;
+ break;
+ }
+
+ if (!del || (del->skb_hash && !ce->skb_hash) ||
+ (rcu_dereference_ovsl(ma->masks[del->mask_index]) &&
+ !rcu_dereference_ovsl(ma->masks[ce->mask_index]))) {
+ del = ce;
+ }
+
+ hash >>= MC_HASH_SHIFT;
+ }
+
+ flow = flow_lookup(tbl, ti, ma, key, n_mask_hit, &del->mask_index);
+ if (flow)
+ del->skb_hash = skb_hash;
+
+ return flow;
}
struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *tbl,
const struct sw_flow_key *key)
{
+ struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
+ struct mask_array *ma = rcu_dereference_ovsl(tbl->mask_array);
u32 __always_unused n_mask_hit;
+ u32 __always_unused index;
- return ovs_flow_tbl_lookup_stats(tbl, key, &n_mask_hit);
+ n_mask_hit = 0;
+ return flow_lookup(tbl, ti, ma, key, &n_mask_hit, &index);
}
int ovs_flow_tbl_num_masks(const struct flow_table *table)
{
- struct sw_flow_mask *mask;
- int num = 0;
-
- list_for_each_entry(mask, &table->mask_list, list)
- num++;
+ struct mask_array *ma;
- return num;
+ ma = rcu_dereference_ovsl(table->mask_array);
+ return ma->count;
}
static struct table_instance *table_instance_expand(struct table_instance *ti)
mask->ref_count--;
if (!mask->ref_count) {
- list_del_rcu(&mask->list);
+ struct mask_array *ma;
+ int i;
+
+ ma = ovsl_dereference(tbl->mask_array);
+ for (i = 0; i < ma->max; i++) {
+ if (mask == ovsl_dereference(ma->masks[i])) {
+ RCU_INIT_POINTER(ma->masks[i], NULL);
+ ma->count--;
+ goto free;
+ }
+ }
+ BUG();
+free:
call_rcu(&mask->rcu, rcu_free_sw_flow_mask_cb);
}
}
static struct sw_flow_mask *flow_mask_find(const struct flow_table *tbl,
const struct sw_flow_mask *mask)
{
- struct list_head *ml;
+ struct mask_array *ma;
+ int i;
+
+ ma = ovsl_dereference(tbl->mask_array);
+ for (i = 0; i < ma->max; i++) {
+ struct sw_flow_mask *t;
- list_for_each(ml, &tbl->mask_list) {
- struct sw_flow_mask *m;
- m = container_of(ml, struct sw_flow_mask, list);
- if (mask_equal(mask, m))
- return m;
+ t = ovsl_dereference(ma->masks[i]);
+ if (t && mask_equal(mask, t))
+ return t;
}
return NULL;
struct sw_flow_mask *new)
{
struct sw_flow_mask *mask;
+
mask = flow_mask_find(tbl, new);
if (!mask) {
+ struct mask_array *ma;
+ int i;
+
		/* Allocate a new mask if none exists. */
mask = mask_alloc();
if (!mask)
return -ENOMEM;
+
mask->key = new->key;
mask->range = new->range;
- list_add_rcu(&mask->list, &tbl->mask_list);
+
+ /* Add mask to mask-list. */
+ ma = ovsl_dereference(tbl->mask_array);
+ if (ma->count >= ma->max) {
+ int err;
+
+ err = tbl_mask_array_realloc(tbl, ma->max +
+ MASK_ARRAY_SIZE_MIN);
+ if (err) {
+ kfree(mask);
+ return err;
+ }
+ ma = ovsl_dereference(tbl->mask_array);
+ }
+ for (i = 0; i < ma->max; i++) {
+ const struct sw_flow_mask *t;
+
+ t = ovsl_dereference(ma->masks[i]);
+ if (!t) {
+ rcu_assign_pointer(ma->masks[i], mask);
+ ma->count++;
+ break;
+ }
+ }
} else {
BUG_ON(!mask->ref_count);
mask->ref_count++;
#include "flow.h"
+struct mask_cache_entry {
+ u32 skb_hash;
+ u32 mask_index;
+};
+
+struct mask_array {
+ struct rcu_head rcu;
+ int count, max;
+ struct sw_flow_mask __rcu *masks[];
+};
+
struct table_instance {
struct flex_array *buckets;
unsigned int n_buckets;
struct flow_table {
struct table_instance __rcu *ti;
- struct list_head mask_list;
+ struct mask_cache_entry __percpu *mask_cache;
+ struct mask_array __rcu *mask_array;
unsigned long last_rehash;
unsigned int count;
};
int ovs_flow_tbl_init(struct flow_table *);
int ovs_flow_tbl_count(struct flow_table *table);
-void ovs_flow_tbl_destroy(struct flow_table *table, bool deferred);
+void ovs_flow_tbl_destroy(struct flow_table *table);
int ovs_flow_tbl_flush(struct flow_table *flow_table);
int ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow,
struct sw_flow *ovs_flow_tbl_dump_next(struct table_instance *table,
u32 *bucket, u32 *idx);
struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *,
- const struct sw_flow_key *,
- u32 *n_mask_hit);
+ const struct sw_flow_key *,
+ u32 skb_hash,
+ u32 *n_mask_hit);
struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *,
const struct sw_flow_key *);
linux/compat/include/linux/if.h \
linux/compat/include/linux/if_arp.h \
linux/compat/include/linux/if_ether.h \
- linux/compat/include/linux/if_tunnel.h \
linux/compat/include/linux/if_vlan.h \
linux/compat/include/linux/in.h \
linux/compat/include/linux/ip.h \
linux/compat/include/linux/list.h \
linux/compat/include/linux/log2.h \
linux/compat/include/linux/net.h \
+ linux/compat/include/linux/random.h \
linux/compat/include/linux/netdevice.h \
linux/compat/include/linux/netdev_features.h \
linux/compat/include/linux/netlink.h \
return jhash_3words(a, b, c, hashrnd);
}
-u32 __skb_get_rxhash(struct sk_buff *skb)
+u32 __skb_get_hash(struct sk_buff *skb)
{
struct flow_keys keys;
u32 hash;
}
#endif
+#ifndef HAVE_ETHER_ADDR_COPY
static inline void ether_addr_copy(u8 *dst, const u8 *src)
{
#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
a[2] = b[2];
#endif
}
+#endif
#endif
+++ /dev/null
-#ifndef _IF_TUNNEL_WRAPPER_H_
-#define _IF_TUNNEL_WRAPPER_H_
-
-#include <linux/version.h>
-#include_next <linux/if_tunnel.h>
-
-#if LINUX_VERSION_CODE < KERNEL_VERSION(3,8,0)
-
-#include <linux/u64_stats_sync.h>
-
-struct pcpu_tstats {
- u64 rx_packets;
- u64 rx_bytes;
- u64 tx_packets;
- u64 tx_bytes;
- struct u64_stats_sync syncp;
-};
-#endif
-
-#endif /* _IF_TUNNEL_WRAPPER_H_ */
}
#endif
+#ifndef HAVE_PCPU_SW_NETSTATS
+
+#include <linux/u64_stats_sync.h>
+
+struct pcpu_sw_netstats {
+ u64 rx_packets;
+ u64 rx_bytes;
+ u64 tx_packets;
+ u64 tx_bytes;
+ struct u64_stats_sync syncp;
+};
+#endif
+
#endif
--- /dev/null
+#ifndef __LINUX_RANDOM_WRAPPER_H
+#define __LINUX_RANDOM_WRAPPER_H 1
+
+#include_next <linux/random.h>
+
+#ifndef HAVE_PRANDOM_U32
+#define prandom_u32() random32()
+#endif
+
+#endif
}
#endif
+#ifndef HAVE_SKB_ORPHAN_FRAGS
+static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
+{
+ return 0;
+}
+#endif
+
+#ifndef HAVE_SKB_GET_HASH
#if LINUX_VERSION_CODE < KERNEL_VERSION(3,8,0)
-#define __skb_get_rxhash rpl__skb_get_rxhash
-#define skb_get_rxhash rpl_skb_get_rxhash
+#define __skb_get_hash rpl__skb_get_rxhash
+#define skb_get_hash rpl_skb_get_rxhash
-extern u32 __skb_get_rxhash(struct sk_buff *skb);
-static inline __u32 skb_get_rxhash(struct sk_buff *skb)
+extern u32 __skb_get_hash(struct sk_buff *skb);
+static inline __u32 skb_get_hash(struct sk_buff *skb)
{
#ifdef HAVE_RXHASH
if (skb->rxhash)
return skb->rxhash;
#endif
- return __skb_get_rxhash(skb);
+ return __skb_get_hash(skb);
}
-#endif
+
+#else
+#define skb_get_hash skb_get_rxhash
+#endif /* LINUX_VERSION_CODE < KERNEL_VERSION(3,8,0) */
+#endif /* HAVE_SKB_GET_HASH */
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,8,0)
+static inline void skb_tx_error(struct sk_buff *skb)
+{
+ return;
+}
+#endif /* LINUX_VERSION_CODE < KERNEL_VERSION(3,8,0) */
#if LINUX_VERSION_CODE < KERNEL_VERSION(3,14,0)
unsigned int skb_zerocopy_headlen(const struct sk_buff *from);
-void skb_zerocopy(struct sk_buff *to, const struct sk_buff *from, int len,
+int skb_zerocopy(struct sk_buff *to, struct sk_buff *from, int len,
int hlen);
#endif
+#ifndef HAVE_SKB_CLEAR_HASH
+static inline void skb_clear_hash(struct sk_buff *skb)
+{
+#ifdef HAVE_RXHASH
+ skb->rxhash = 0;
+#endif
+#ifdef HAVE_L4_RXHASH
+ skb->l4_rxhash = 0;
+#endif
+}
+#endif
#ifndef HAVE_SKB_HAS_FRAG_LIST
#define skb_has_frag_list skb_has_frags
nf_reset(skb);
secpath_reset(skb);
- skb_clear_rxhash(skb);
+ skb_clear_hash(skb);
skb_dst_drop(skb);
skb_dst_set(skb, &rt_dst(rt));
#if 0
nf_reset(skb);
secpath_reset(skb);
- skb_clear_rxhash(skb);
+ skb_clear_hash(skb);
skb_dst_drop(skb);
vlan_set_tci(skb, 0);
skb_set_queue_mapping(skb, 0);
*
* The `hlen` as calculated by skb_zerocopy_headlen() specifies the
* headroom in the `to` buffer.
+ *
+ * Return value:
+ * 0: everything is OK
+ * -ENOMEM: couldn't orphan frags of @from due to lack of memory
+ * -EFAULT: skb_copy_bits() found some problem with skb geometry
*/
-void
-skb_zerocopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
+int
+skb_zerocopy(struct sk_buff *to, struct sk_buff *from, int len, int hlen)
{
int i, j = 0;
int plen = 0; /* length of skb->head fragment */
+ int ret;
struct page *page;
unsigned int offset;
BUG_ON(!head_frag(from) && !hlen);
/* dont bother with small payloads */
- if (len <= skb_tailroom(to)) {
- skb_copy_bits(from, 0, skb_put(to, len), len);
- return;
- }
+ if (len <= skb_tailroom(to))
+ return skb_copy_bits(from, 0, skb_put(to, len), len);
if (hlen) {
- skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+ ret = skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+ if (unlikely(ret))
+ return ret;
len -= hlen;
} else {
plen = min_t(int, skb_headlen(from), len);
to->len += len + plen;
to->data_len += len + plen;
+ if (unlikely(skb_orphan_frags(from, GFP_ATOMIC))) {
+ skb_tx_error(from);
+ return -ENOMEM;
+ }
+
for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
if (!len)
break;
j++;
}
skb_shinfo(to)->nr_frags = j;
+
+ return 0;
}
#endif
unsigned int range = (port_max - port_min) + 1;
u32 hash;
- hash = skb_get_rxhash(skb);
+ hash = skb_get_hash(skb);
if (!hash)
hash = jhash(skb->data, 2 * ETH_ALEN,
(__force u32) skb->protocol);
*/
static u16 get_src_port(struct net *net, struct sk_buff *skb)
{
- u32 hash = skb_get_rxhash(skb);
+ u32 hash = skb_get_hash(skb);
unsigned int range;
int high;
int low;
vport->dp = parms->dp;
vport->port_no = parms->port_no;
- vport->upcall_portid = parms->upcall_portid;
vport->ops = ops;
INIT_HLIST_NODE(&vport->dp_hash_node);
- vport->percpu_stats = alloc_percpu(struct pcpu_tstats);
+ if (ovs_vport_set_upcall_portids(vport, parms->upcall_portids))
+ return ERR_PTR(-EINVAL);
+
+ vport->percpu_stats = alloc_percpu(struct pcpu_sw_netstats);
if (!vport->percpu_stats) {
kfree(vport);
return ERR_PTR(-ENOMEM);
}
for_each_possible_cpu(i) {
- struct pcpu_tstats *vport_stats;
+ struct pcpu_sw_netstats *vport_stats;
vport_stats = per_cpu_ptr(vport->percpu_stats, i);
u64_stats_init(&vport_stats->syncp);
}
*/
void ovs_vport_free(struct vport *vport)
{
+ kfree((struct vport_portids __force *)vport->upcall_portids);
free_percpu(vport->percpu_stats);
kfree(vport);
}
spin_unlock_bh(&vport->stats_lock);
for_each_possible_cpu(i) {
- const struct pcpu_tstats *percpu_stats;
- struct pcpu_tstats local_stats;
+ const struct pcpu_sw_netstats *percpu_stats;
+ struct pcpu_sw_netstats local_stats;
unsigned int start;
percpu_stats = per_cpu_ptr(vport->percpu_stats, i);
return 0;
}
+static void vport_portids_destroy_rcu_cb(struct rcu_head *rcu)
+{
+ struct vport_portids *ids = container_of(rcu, struct vport_portids,
+ rcu);
+
+ kfree(ids);
+}
+
+/**
+ * ovs_vport_set_upcall_portids - set upcall portids of @vport.
+ *
+ * @vport: vport to modify.
+ * @ids: new configuration, an array of port ids.
+ *
+ * Sets the vport's upcall_portids to @ids.
+ *
+ * Returns 0 if successful, -EINVAL if @ids is zero length or cannot be
+ * parsed as an array of u32s.
+ *
+ * Must be called with ovs_mutex.
+ */
+int ovs_vport_set_upcall_portids(struct vport *vport, struct nlattr *ids)
+{
+ struct vport_portids *old, *vport_portids;
+
+ if (!nla_len(ids) || nla_len(ids) % sizeof(u32))
+ return -EINVAL;
+
+ old = ovsl_dereference(vport->upcall_portids);
+
+	vport_portids = kmalloc(sizeof(*vport_portids) + nla_len(ids),
+				GFP_KERNEL);
+	if (!vport_portids)
+		return -ENOMEM;
+
+	vport_portids->n_ids = nla_len(ids) / sizeof(u32);
+ vport_portids->rn_ids = reciprocal_value(vport_portids->n_ids);
+ nla_memcpy(vport_portids->ids, ids, nla_len(ids));
+
+ rcu_assign_pointer(vport->upcall_portids, vport_portids);
+
+ if (old)
+ call_rcu(&old->rcu, vport_portids_destroy_rcu_cb);
+
+ return 0;
+}
+
+/**
+ * ovs_vport_get_upcall_portids - get the upcall_portids of @vport.
+ *
+ * @vport: vport from which to retrieve the portids.
+ * @skb: sk_buff where portids should be appended.
+ *
+ * Retrieves the configuration of the given vport, appending the
+ * %OVS_VPORT_ATTR_UPCALL_PID attribute which is the array of upcall
+ * portids to @skb.
+ *
+ * Returns 0 if successful, -EMSGSIZE if @skb has insufficient room.
+ * If an error occurs, @skb is left unmodified. Must be called with
+ * ovs_mutex or rcu_read_lock.
+ */
+int ovs_vport_get_upcall_portids(const struct vport *vport,
+ struct sk_buff *skb)
+{
+ struct vport_portids *ids;
+
+ ids = rcu_dereference_ovsl(vport->upcall_portids);
+
+ if (vport->dp->user_features & OVS_DP_F_VPORT_PIDS)
+ return nla_put(skb, OVS_VPORT_ATTR_UPCALL_PID,
+ ids->n_ids * sizeof(u32), (void *) ids->ids);
+ else
+ return nla_put_u32(skb, OVS_VPORT_ATTR_UPCALL_PID, ids->ids[0]);
+}
+
+/**
+ * ovs_vport_find_upcall_portid - find the upcall portid to send upcall.
+ *
+ * @vport: vport from which the missed packet is received.
+ * @skb: skb on which the missed packet was received.
+ *
+ * Uses skb_get_hash() to select the upcall portid to which the upcall
+ * is sent.
+ *
+ * Returns the portid of the target socket. Must be called with rcu_read_lock.
+ */
+u32 ovs_vport_find_upcall_portid(const struct vport *p, struct sk_buff *skb)
+{
+ struct vport_portids *ids;
+ u32 hash;
+
+ ids = rcu_dereference(p->upcall_portids);
+
+ if (ids->n_ids == 1 && ids->ids[0] == 0)
+ return 0;
+
+ hash = skb_get_hash(skb);
+ return ids->ids[hash - ids->n_ids * reciprocal_divide(hash, ids->rn_ids)];
+}
+
/**
* ovs_vport_receive - pass up received packet to the datapath for processing
*
void ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
struct ovs_key_ipv4_tunnel *tun_key)
{
- struct pcpu_tstats *stats;
+ struct pcpu_sw_netstats *stats;
stats = this_cpu_ptr(vport->percpu_stats);
u64_stats_update_begin(&stats->syncp);
int sent = vport->ops->send(vport, skb);
if (likely(sent > 0)) {
- struct pcpu_tstats *stats;
+ struct pcpu_sw_netstats *stats;
stats = this_cpu_ptr(vport->percpu_stats);
#include <linux/list.h>
#include <linux/netlink.h>
#include <linux/openvswitch.h>
+#include <linux/reciprocal_div.h>
#include <linux/skbuff.h>
#include <linux/spinlock.h>
#include <linux/u64_stats_sync.h>
int ovs_vport_set_options(struct vport *, struct nlattr *options);
int ovs_vport_get_options(const struct vport *, struct sk_buff *);
+int ovs_vport_set_upcall_portids(struct vport *, struct nlattr *pids);
+int ovs_vport_get_upcall_portids(const struct vport *, struct sk_buff *);
+u32 ovs_vport_find_upcall_portid(const struct vport *, struct sk_buff *);
+
int ovs_vport_send(struct vport *, struct sk_buff *);
/* The following definitions are for implementers of vport devices: */
u64 tx_dropped;
u64 tx_errors;
};
+/**
+ * struct vport_portids - array of netlink portids of a vport.
+ * Must be protected by RCU.
+ * @rn_ids: The reciprocal value of @n_ids.
+ * @rcu: RCU callback head for deferred destruction.
+ * @n_ids: Size of @ids array.
+ * @ids: Array storing the Netlink socket pids to be used for packets received
+ * on this port that miss the flow table.
+ */
+struct vport_portids {
+ struct reciprocal_value rn_ids;
+ struct rcu_head rcu;
+ u32 n_ids;
+ u32 ids[];
+};
/**
* struct vport - one port within a datapath
* @rcu: RCU callback head for deferred destruction.
* @dp: Datapath to which this port belongs.
- * @upcall_portid: The Netlink port to use for packets received on this port that
- * miss the flow table.
+ * @upcall_portids: RCU protected 'struct vport_portids'.
* @port_no: Index into @dp's @ports array.
* @hash_node: Element in @dev_table hash table in vport.c.
* @dp_hash_node: Element in @datapath->ports hash table in datapath.c.
struct vport {
struct rcu_head rcu;
struct datapath *dp;
- u32 upcall_portid;
+ struct vport_portids __rcu *upcall_portids;
u16 port_no;
struct hlist_node hash_node;
struct hlist_node dp_hash_node;
const struct vport_ops *ops;
- struct pcpu_tstats __percpu *percpu_stats;
+ struct pcpu_sw_netstats __percpu *percpu_stats;
spinlock_t stats_lock;
struct vport_err_stats err_stats;
/* For ovs_vport_alloc(). */
struct datapath *dp;
u16 port_no;
- u32 upcall_portid;
+ struct nlattr *upcall_portids;
};
/**
-openvswitch (2.1.90-1) unstable; urgency=low
+openvswitch (2.2.90-1) unstable; urgency=low
[ Open vSwitch team ]
* New upstream version
- Nothing yet! Try NEWS...
- -- Open vSwitch team <dev@openvswitch.org> Mon, 23 Dec 2013 18:07:18 -0700
+ -- Open vSwitch team <dev@openvswitch.org> Wed, 19 Mar 2014 16:08:38 -0700
+
+openvswitch (2.2.0-1) unstable; urgency=low
+ [ Open vSwitch team ]
+ * New upstream version
+ - Internal ports are no longer brought up by default, because bringing
+ up devices once they are properly configured should be an
+ administrator task.
+ - ovs-vsctl now reports when ovs-vswitchd fails to create a new port or
+ bridge.
+ - The "ovsdbmonitor" graphical tool has been removed, because it was
+ poorly maintained and not widely used.
+ - New "check-ryu" Makefile target for running Ryu tests for OpenFlow
+ controllers against Open vSwitch. See INSTALL for details.
+ - Added IPFIX support for SCTP flows and templates for ICMPv4/v6 flows.
+ - Upon receipt of a SIGHUP signal, ovs-vswitchd no longer reopens its
+ log file (it will terminate instead). Use 'ovs-appctl vlog/reopen' to
+ reopen the log file.
+ - Support for Linux kernels up to 3.14. From kernel 3.12 onward, OVS
+ uses the kernel tunnel API for GRE and VXLAN.
+ - Added DPDK support.
+ - Added support for custom vlog patterns in Python.
+
+ -- Open vSwitch team <dev@openvswitch.org> Wed, 19 Mar 2014 16:08:38 -0700
openvswitch (2.1.0-1) unstable; urgency=low
[ Open vSwitch team ]
users assumed incorrectly that ovs-controller was a necessary or
desirable part of an Open vSwitch deployment.
- -- Open vSwitch team <dev@openvswitch.org> Mon, 23 Dec 2013 18:07:18 -0700
+ -- Open vSwitch team <dev@openvswitch.org> Wed, 19 Mar 2014 16:08:38 -0700
openvswitch (2.0.0-1) unstable; urgency=low
[ Open vSwitch team ]
README.Debian for openvswitch-switch
---------------------------------
-* To use the Linux kernel-based switch implementation, you will need
- to build and install the Open vSwitch kernel module. To do so, install
- the openvswitch-datapath-source package, then follow the instructions
- given in /usr/share/doc/openvswitch-datapath-source/README.Debian
+To use the Linux kernel-based switch implementation, you will need an
+Open vSwitch kernel module. There are multiple ways to obtain one.
+In order of increasing manual effort, these are:
-* This package does not yet support the userspace datapath-based
- switch implementation.
+ * Use a Linux kernel 3.3 or later, which has an integrated Open
+ vSwitch kernel module.
+
+ The upstream Linux kernel module lacks a few features that
+ are in the third-party module. For details, please see the
+ FAQ, "What features are not available in the Open vSwitch
+ kernel datapath that ships as part of the upstream Linux
+ kernel?".
+
+ * Install the "openvswitch-datapath-dkms" Debian package that
+ you built earlier. This should automatically build and
+ install the Open vSwitch kernel module for your running
+ kernel.
+
+ This option requires that you have a compiler and toolchain
+ installed on the machine where you run Open vSwitch, which
+ may be unacceptable in some production server environments.
+
+ * Install the "openvswitch-datapath-source" Debian package, use
+ "module-assistant" to build a Debian package of the Open
+ vSwitch kernel module for your kernel, and then install that
+ Debian package.
+
+ You can install the kernel module Debian packages that you
+ build this way on the same machine where you built it or on
+ another machine or machines, which means that you don't
+ necessarily have to have any build infrastructure on the
+ machines where you use the kernel module.
+
+ /usr/share/doc/openvswitch-datapath-source/README.Debian has
+ details on the build process.
+
+ * Build and install the kernel module by hand.
- -- Ben Pfaff <blp@nicira.com>, Fri, 6 Jul 2012 15:12:38 -0700
Debian network scripts integration
----------------------------------
ifup --allow=ovs $list_of_bridges
ifdown --allow=ovs $list_of_bridges
-
--- Gurucharan Shetty <gshetty@nicira.com>, Fri, 04 May 2012 12:58:19 -0700
_debian/utilities/ovs-vsctl.8
_debian/vswitchd/ovs-vswitchd.8
_debian/vswitchd/ovs-vswitchd.conf.db.5
+utilities/ovs-ctl.8
#define OVS_DP_ATTR_MAX (__OVS_DP_ATTR_MAX - 1)
+/* All 64-bit integers within Netlink messages are 4-byte aligned only. */
struct ovs_dp_stats {
__u64 n_hit; /* Number of flow table matches. */
__u64 n_missed; /* Number of flow table misses. */
/* Allow last Netlink attribute to be unaligned */
#define OVS_DP_F_UNALIGNED (1 << 0)
+/* Allow datapath to associate multiple Netlink PIDs to each vport */
+#define OVS_DP_F_VPORT_PIDS (1 << 1)
+
/* Fixed logical ports. */
#define OVSP_LOCAL ((__u32)0)
* @OVS_PACKET_ATTR_KEY: Present for all notifications. Contains the flow key
* extracted from the packet as nested %OVS_KEY_ATTR_* attributes. This allows
* userspace to adapt its flow setup strategy by comparing its notion of the
- * flow key against the kernel's.
+ * flow key against the kernel's. When used with %OVS_PACKET_CMD_EXECUTE, only
+ * metadata key fields (e.g. priority, skb mark) are honored. All the packet
+ * header fields are parsed from the packet instead.
* @OVS_PACKET_ATTR_ACTIONS: Contains actions for the packet. Used
* for %OVS_PACKET_CMD_EXECUTE. It has nested %OVS_ACTION_ATTR_* attributes.
* @OVS_PACKET_ATTR_USERDATA: Present for an %OVS_PACKET_CMD_ACTION
* this is the name of the network device. Maximum length %IFNAMSIZ-1 bytes
* plus a null terminator.
* @OVS_VPORT_ATTR_OPTIONS: Vport-specific configuration information.
- * @OVS_VPORT_ATTR_UPCALL_PID: The Netlink socket in userspace that
- * OVS_PACKET_CMD_MISS upcalls will be directed to for packets received on
- * this port. A value of zero indicates that upcalls should not be sent.
+ * @OVS_VPORT_ATTR_UPCALL_PID: The array of Netlink socket pids in userspace
+ * among which OVS_PACKET_CMD_MISS upcalls will be distributed for packets
+ * received on this port. If this is a single-element array of value 0,
+ * upcalls should not be sent.
* @OVS_VPORT_ATTR_STATS: A &struct ovs_vport_stats giving statistics for
* packets sent or received through the vport.
*
OVS_VPORT_ATTR_TYPE, /* u32 OVS_VPORT_TYPE_* constant. */
OVS_VPORT_ATTR_NAME, /* string name, up to IFNAMSIZ bytes long */
OVS_VPORT_ATTR_OPTIONS, /* nested attributes, varies by vport type */
- OVS_VPORT_ATTR_UPCALL_PID, /* u32 Netlink PID to receive upcalls */
+ OVS_VPORT_ATTR_UPCALL_PID, /* array of u32 Netlink socket PIDs for */
+ /* receiving upcalls */
OVS_VPORT_ATTR_STATS, /* struct ovs_vport_stats */
__OVS_VPORT_ATTR_MAX
};
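The upcall PID attribute is now an array of Netlink socket PIDs rather than a single socket. How a datapath might spread upcalls across such an array can be sketched as below; this is an illustration only (`choose_upcall_pid` is an invented name, and the real kernel's selection logic may differ):

```c
#include <stdint.h>

/* Picks one upcall socket PID from an OVS_VPORT_ATTR_UPCALL_PID-style
 * array by indexing it with a packet hash.  A single-element array of
 * value 0 means upcalls are disabled on this vport. */
static uint32_t
choose_upcall_pid(const uint32_t *pids, unsigned int n_pids, uint32_t hash)
{
    if (n_pids == 1 && pids[0] == 0) {
        return 0;               /* Upcalls disabled on this vport. */
    }
    return pids[hash % n_pids];
}
```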
OVS_KEY_ATTR_TUNNEL, /* Nested set of ovs_tunnel attributes */
OVS_KEY_ATTR_SCTP, /* struct ovs_key_sctp */
OVS_KEY_ATTR_TCP_FLAGS, /* be16 TCP flags. */
- OVS_KEY_ATTR_DP_HASH, /* u32 hash value */
+ OVS_KEY_ATTR_DP_HASH, /* u32 hash value. Value 0 indicates the hash
+ is not computed by the datapath. */
OVS_KEY_ATTR_RECIRC_ID, /* u32 recirc id */
#ifdef __KERNEL__
/* Only used within kernel data path. */
/* Data path hash algorithm for computing Datapath hash.
*
- * The Algorithm type only specifies the fields in a flow
+ * The algorithm type only specifies the fields in a flow
* will be used as part of the hash. Each datapath is free
* to use its own hash algorithm. The hash value will be
* opaque to the user space daemon.
*/
-enum ovs_recirc_hash_alg {
- OVS_RECIRC_HASH_ALG_NONE,
- OVS_RECIRC_HASH_ALG_L4,
+enum ovs_hash_alg {
+ OVS_HASH_ALG_L4,
};
/*
- * struct ovs_action_recirc - %OVS_ACTION_ATTR_RECIRC action argument.
- * @recirc_id: The Recirculation label, Zero is invalid.
+ * struct ovs_action_hash - %OVS_ACTION_ATTR_HASH action argument.
* @hash_alg: Algorithm used to compute hash prior to recirculation.
- * @hash_bias: bias used for computing hash. used to compute hash prior to
- * recirculation.
+ * @hash_basis: basis used for computing hash.
*/
-struct ovs_action_recirc {
- uint32_t hash_alg; /* One of ovs_dp_hash_alg. */
- uint32_t hash_bias;
- uint32_t recirc_id; /* Recirculation label. */
+struct ovs_action_hash {
+ uint32_t hash_alg; /* One of ovs_hash_alg. */
+ uint32_t hash_basis;
};
/**
OVS_ACTION_ATTR_SAMPLE, /* Nested OVS_SAMPLE_ATTR_*. */
OVS_ACTION_ATTR_PUSH_MPLS, /* struct ovs_action_push_mpls. */
OVS_ACTION_ATTR_POP_MPLS, /* __be16 ethertype. */
- OVS_ACTION_ATTR_RECIRC, /* struct ovs_action_recirc. */
+ OVS_ACTION_ATTR_RECIRC, /* u32 recirc_id. */
+ OVS_ACTION_ATTR_HASH, /* struct ovs_action_hash. */
__OVS_ACTION_ATTR_MAX
};
};
OFP_ASSERT(sizeof(struct ofp11_group_stats_request) == 8);
+/* Used in group stats replies. */
+struct ofp11_bucket_counter {
+ ovs_be64 packet_count; /* Number of packets processed by bucket. */
+ ovs_be64 byte_count; /* Number of bytes processed by bucket. */
+};
+OFP_ASSERT(sizeof(struct ofp11_bucket_counter) == 16);
+
/* Body of reply to OFPST11_GROUP request */
struct ofp11_group_stats {
ovs_be16 length; /* Length of this entry. */
uint8_t pad2[4]; /* Align to 64 bits. */
ovs_be64 packet_count; /* Number of packets processed by group. */
ovs_be64 byte_count; /* Number of bytes processed by group. */
- /* struct ofp11_bucket_counter bucket_stats[0]; */
-
+ struct ofp11_bucket_counter bucket_stats[0];
};
OFP_ASSERT(sizeof(struct ofp11_group_stats) == 32);
-/* Used in group stats replies. */
-struct ofp11_bucket_counter {
- ovs_be64 packet_count; /* Number of packets processed by bucket. */
- ovs_be64 byte_count; /* Number of bytes processed by bucket. */
-};
-OFP_ASSERT(sizeof(struct ofp11_bucket_counter) == 16);
-
/* Body of reply to OFPST11_GROUP_DESC request. */
struct ofp11_group_desc_stats {
ovs_be16 length; /* Length of this entry. */
/* Body of reply to OFPMP13_GROUP request */
struct ofp13_group_stats {
struct ofp11_group_stats gs;
- ovs_be32 duration_sec; /* NEW: Time group has been alive in seconds. */
- ovs_be32 duration_nsec; /* NEW: Time group has been alive in nanoseconds
+ ovs_be32 duration_sec; /* Time group has been alive in seconds. */
+ ovs_be32 duration_nsec; /* Time group has been alive in nanoseconds
beyond duration_sec. */
- /* struct ofp11_bucket_counter bucket_stats[0]; */
+ struct ofp11_bucket_counter bucket_stats[0];
};
OFP_ASSERT(sizeof(struct ofp13_group_stats) == 40);
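With `bucket_stats[]` now declared as a flexible array member, the number of per-bucket counters in each reply entry is implied by its `length` field. A simplified, self-contained sketch of that arithmetic (toy structs standing in for the real wire layout):

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the wire structs; the real definitions use
 * big-endian ovs_be16/ovs_be64 fields. */
struct bucket_counter {
    uint64_t packet_count;
    uint64_t byte_count;
};

struct group_stats {
    uint16_t length;             /* Length of this entry, in bytes. */
    uint8_t pad[30];             /* Stands in for the other fixed fields. */
    struct bucket_counter bucket_stats[];
};

/* Number of per-bucket counters in one group stats entry: whatever
 * space the length field reports beyond the fixed part. */
static size_t
n_buckets(const struct group_stats *gs)
{
    return (gs->length - sizeof *gs) / sizeof gs->bucket_stats[0];
}
```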
OFPBF_ORDERED = 1 << 1, /* Execute in specified order. */
};
-/* Message structure for ONF_ET_BUNDLE_CONTROL. */
+/* Message structure for OFPT_BUNDLE_CONTROL and OFPT_BUNDLE_ADD_MESSAGE. */
struct ofp14_bundle_ctrl_msg {
ovs_be32 bundle_id; /* Identify the bundle. */
- ovs_be16 type; /* OFPBCT_*. */
+ ovs_be16 type; /* OFPT_BUNDLE_CONTROL: one of OFPBCT_*.
+ * OFPT_BUNDLE_ADD_MESSAGE: not used. */
ovs_be16 flags; /* Bitmap of OFPBF_* flags. */
- /* Bundle Property list. */
- /* struct ofp14_bundle_prop_header properties[0]; */
+ /* Followed by:
+ * - For OFPT_BUNDLE_ADD_MESSAGE only, an encapsulated OpenFlow message,
+ * beginning with an ofp_header whose xid is identical to this message's
+ * outer xid.
+ * - For OFPT_BUNDLE_ADD_MESSAGE only, and only if at least one property is
+ * present, 0 to 7 bytes of padding to align on a 64-bit boundary.
+ * - Zero or more properties (see struct ofp14_bundle_prop_header). */
};
OFP_ASSERT(sizeof(struct ofp14_bundle_ctrl_msg) == 8);
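The 0 to 7 bytes of padding described in the comment above follow directly from rounding the encapsulated message length up to a 64-bit boundary. A minimal sketch of that computation:

```c
/* Bytes of zero padding after an encapsulated message of 'len' bytes so
 * that any following bundle properties start on a 64-bit boundary:
 * (len + 7) / 8 * 8 - len, which is always between 0 and 7. */
static unsigned int
bundle_msg_pad(unsigned int len)
{
    return (len + 7) / 8 * 8 - len;
}
```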
-/* Message structure for OFP_BUNDLE_ADD_MESSAGE.
-* Adding a message in a bundle is done with. */
-struct ofp14_bundle_add_msg {
- ovs_be32 bundle_id; /* Identify the bundle. */
- uint8_t pad[2]; /* Align to 64 bits. */
- ovs_be16 flags; /* Bitmap of ONF_BF_* flags. */
-
- struct ofp_header message; /* Message added to the bundle. */
-
- /* If there is one property or more, 'message' is followed by:
- * - Exactly (message.length + 7)/8*8 - (message.length) (between 0 and 7)
- * bytes of all-zero bytes */
-
- /* Bundle Property list. */
- /* struct ofp14_bundle_prop_header properties[0]; */
-};
-OFP_ASSERT(sizeof(struct ofp14_bundle_add_msg) == 16);
#endif /* openflow/openflow-1.4.h */
lib/crc32c.h \
lib/csum.c \
lib/csum.h \
+ lib/daemon.c \
lib/daemon.h \
+ lib/daemon-private.h \
lib/dhcp.h \
lib/dummy.c \
lib/dummy.h \
lib/stream-fd-windows.c
else
lib_libopenvswitch_la_SOURCES += \
- lib/daemon.c \
+ lib/daemon-unix.c \
lib/latch-unix.c \
lib/signals.c \
lib/signals.h \
bool forwarding_if_rx;
long long int forwarding_if_rx_detect_time;
+    /* When 'bfd->forwarding_if_rx' is set, at least one BFD control packet
+     * must be received every 100 * bfd->cfg_min_rx.  If no BFD control
+     * packet is received within this interval, then even if data packets
+     * are received, bfd->forwarding will still be false. */
+ long long int demand_rx_bfd_time;
+
/* BFD decay related variables. */
bool in_decay; /* True when bfd is in decay. */
int decay_min_rx; /* min_rx is set to decay_min_rx when */
long long int decay_detect_time; /* Decay detection time. */
uint64_t flap_count; /* Counts bfd forwarding flaps. */
+
+    /* True when the variables returned by bfd_get_status() have changed
+     * since the last check. */
+ bool status_changed;
};
static struct ovs_mutex mutex = OVS_MUTEX_INITIALIZER;
static uint64_t bfd_rx_packets(const struct bfd *) OVS_REQUIRES(mutex);
static void bfd_try_decay(struct bfd *) OVS_REQUIRES(mutex);
static void bfd_decay_update(struct bfd *) OVS_REQUIRES(mutex);
+static void bfd_status_changed(struct bfd *) OVS_REQUIRES(mutex);
static void bfd_forwarding_if_rx_update(struct bfd *) OVS_REQUIRES(mutex);
static void bfd_unixctl_show(struct unixctl_conn *, int argc,
}
}
+/* Returns 'bfd->status_changed' and resets it to false. */
+bool
+bfd_check_status_change(struct bfd *bfd) OVS_EXCLUDED(mutex)
+{
+ bool ret;
+
+ ovs_mutex_lock(&mutex);
+ ret = bfd->status_changed;
+ bfd->status_changed = false;
+ ovs_mutex_unlock(&mutex);
+
+ return ret;
+}
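bfd_check_status_change() samples and clears the flag under the same mutex that writers hold, so each status change is reported exactly once. The same read-and-reset pattern can be sketched lock-free with a C11 atomic exchange (toy names, not the OVS API):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-in for the 'status_changed' flag. */
static atomic_bool status_changed;

/* Writer side: mark that status has changed. */
static void
set_status_changed(void)
{
    atomic_store(&status_changed, true);
}

/* Reader side: atomic_exchange() returns the old value and clears the
 * flag in a single step, so a change is never reported twice. */
static bool
check_status_change(void)
{
    return atomic_exchange(&status_changed, false);
}
```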
+
/* Returns a 'smap' of key value pairs representing the status of 'bfd'
* intended for the OVS database. */
void
}
if (bfd->rmt_state != rmt_state) {
- seq_change(connectivity_seq_get());
+ bfd_status_changed(bfd);
}
bfd->rmt_disc = ntohl(msg->my_disc);
}
/* XXX: RFC 5880 Section 6.8.6 Demand mode related calculations here. */
+ if (bfd->forwarding_if_rx) {
+ bfd->demand_rx_bfd_time = time_msec() + 100 * bfd->cfg_min_rx;
+ }
+
out:
bfd_forwarding__(bfd);
ovs_mutex_unlock(&mutex);
static bool
bfd_forwarding__(struct bfd *bfd) OVS_REQUIRES(mutex)
{
- long long int time;
+ long long int now = time_msec();
+ bool forwarding_if_rx;
bool last_forwarding = bfd->last_forwarding;
if (bfd->forwarding_override != -1) {
return bfd->forwarding_override == 1;
}
- time = bfd->forwarding_if_rx_detect_time;
- bfd->last_forwarding = (bfd->state == STATE_UP
- || (bfd->forwarding_if_rx && time > time_msec()))
- && bfd->rmt_diag != DIAG_PATH_DOWN
- && bfd->rmt_diag != DIAG_CPATH_DOWN
- && bfd->rmt_diag != DIAG_RCPATH_DOWN;
+ forwarding_if_rx = bfd->forwarding_if_rx
+ && bfd->forwarding_if_rx_detect_time > now
+ && bfd->demand_rx_bfd_time > now;
+
+ bfd->last_forwarding = (bfd->state == STATE_UP || forwarding_if_rx)
+ && bfd->rmt_diag != DIAG_PATH_DOWN
+ && bfd->rmt_diag != DIAG_CPATH_DOWN
+ && bfd->rmt_diag != DIAG_RCPATH_DOWN;
if (bfd->last_forwarding != last_forwarding) {
bfd->flap_count++;
- seq_change(connectivity_seq_get());
+ bfd_status_changed(bfd);
}
return bfd->last_forwarding;
}
bfd_decay_update(bfd);
}
- seq_change(connectivity_seq_get());
+ bfd_status_changed(bfd);
}
}
bfd->decay_detect_time = MAX(bfd->decay_min_rx, 2000) + time_msec();
}
+/* Records the status change and changes the global connectivity seq. */
+static void
+bfd_status_changed(struct bfd *bfd) OVS_REQUIRES(mutex)
+{
+ seq_change(connectivity_seq_get());
+ bfd->status_changed = true;
+}
+
static void
bfd_forwarding_if_rx_update(struct bfd *bfd) OVS_REQUIRES(mutex)
{
goto out;
}
bfd->forwarding_override = forwarding_override;
+ bfd_status_changed(bfd);
} else {
HMAP_FOR_EACH (bfd, node, all_bfds) {
bfd->forwarding_override = forwarding_override;
+ bfd_status_changed(bfd);
}
}
void bfd_account_rx(struct bfd *, const struct dpif_flow_stats *);
bool bfd_forwarding(struct bfd *);
+bool bfd_check_status_change(struct bfd *);
void bfd_get_status(const struct bfd *, struct smap *);
void bfd_set_netdev(struct bfd *, const struct netdev *);
long long int bfd_wake_time(const struct bfd *);
((((ovs_be64) (VALUE)) & UINT64_C(0xff00000000000000)) >> 56))
#endif
+#if WORDS_BIGENDIAN
+#define BYTES_TO_BE32(B1, B2, B3, B4) \
+ (OVS_FORCE ovs_be32)((uint32_t)(B1) << 24 | (B2) << 16 | (B3) << 8 | (B4))
+#define BE16S_TO_BE32(B1, B2) \
+ (OVS_FORCE ovs_be32)((uint32_t)(B1) << 16 | (B2))
+#else
+#define BYTES_TO_BE32(B1, B2, B3, B4) \
+ (OVS_FORCE ovs_be32)((uint32_t)(B1) | (B2) << 8 | (B3) << 16 | (B4) << 24)
+#define BE16S_TO_BE32(B1, B2) \
+ (OVS_FORCE ovs_be32)((uint32_t)(B1) | (B2) << 16)
+#endif
+
#endif /* byte-order.h */
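On a little-endian host, the new `BYTES_TO_BE32` macro places `B1` in the lowest-order position so that it becomes the first byte on the wire. A standalone reimplementation of the little-endian branch as plain functions, for illustration only (`bytes_to_be32_le` and `be16s_to_be32_le` are invented names):

```c
#include <stdint.h>

/* Little-endian variant of BYTES_TO_BE32: pack four bytes so that 'b1'
 * ends up lowest-order, i.e. first in memory on a little-endian host. */
static uint32_t
bytes_to_be32_le(uint8_t b1, uint8_t b2, uint8_t b3, uint8_t b4)
{
    return (uint32_t) b1 | (uint32_t) b2 << 8
           | (uint32_t) b3 << 16 | (uint32_t) b4 << 24;
}

/* Little-endian variant of BE16S_TO_BE32: 'b1' is the first 16-bit
 * group on the wire. */
static uint32_t
be16s_to_be32_le(uint16_t b1, uint16_t b2)
{
    return (uint32_t) b1 | (uint32_t) b2 << 16;
}
```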
struct ovs_refcount ref_cnt;
uint64_t flap_count; /* Count the flaps since boot. */
+
+    /* True when the variables returned by cfm_get_*() have changed
+     * since the last check. */
+ bool status_changed;
+
+    /* When 'cfm->demand' is set, at least one CCM must be received every
+     * 100 * cfm_interval.  If no CCM is received within this interval,
+     * then even if data packets are received, the CFM fault will be set. */
+ struct timer demand_rx_ccm_t;
};
/* Remote MPs represent foreign network entities that are configured to have
cfm_generate_maid(cfm);
hmap_insert(all_cfms, &cfm->hmap_node, hash_string(cfm->name, 0));
ovs_mutex_unlock(&mutex);
+
return cfm;
}
return cfm;
}
+/* Records the status change and changes the global connectivity seq. */
+static void
+cfm_status_changed(struct cfm *cfm) OVS_REQUIRES(mutex)
+{
+ seq_change(connectivity_seq_get());
+ cfm->status_changed = true;
+}
+
/* Should be run periodically to update fault statistics messages. */
void
cfm_run(struct cfm *cfm) OVS_EXCLUDED(mutex)
if (cfm->demand) {
uint64_t rx_packets = cfm_rx_packets(cfm);
demand_override = hmap_count(&cfm->remote_mps) == 1
- && rx_packets > cfm->rx_packets;
+ && rx_packets > cfm->rx_packets
+ && !timer_expired(&cfm->demand_rx_ccm_t);
cfm->rx_packets = rx_packets;
}
|| (old_rmps_array_len != cfm->rmps_array_len || old_rmps_deleted)
|| old_cfm_fault != cfm->fault
|| old_flap_count != cfm->flap_count) {
- seq_change(connectivity_seq_get());
+ cfm_status_changed(cfm);
}
cfm->booted = true;
rmp->mpid = ccm_mpid;
if (!cfm_fault) {
rmp->num_health_ccm++;
+ if (cfm->demand) {
+ timer_set_duration(&cfm->demand_rx_ccm_t,
+ 100 * cfm->ccm_interval_ms);
+ }
}
rmp->recv = true;
cfm->recv_fault |= cfm_fault;
ovs_mutex_unlock(&mutex);
}
+/* Returns 'cfm->status_changed' and resets it to false. */
+bool
+cfm_check_status_change(struct cfm *cfm) OVS_EXCLUDED(mutex)
+{
+ bool ret;
+
+ ovs_mutex_lock(&mutex);
+ ret = cfm->status_changed;
+ cfm->status_changed = false;
+ ovs_mutex_unlock(&mutex);
+
+ return ret;
+}
+
static int
cfm_get_fault__(const struct cfm *cfm) OVS_REQUIRES(mutex)
{
goto out;
}
cfm->fault_override = fault_override;
+ cfm_status_changed(cfm);
} else {
HMAP_FOR_EACH (cfm, hmap_node, all_cfms) {
cfm->fault_override = fault_override;
+ cfm_status_changed(cfm);
}
}
- seq_change(connectivity_seq_get());
unixctl_command_reply(conn, "OK");
out:
bool cfm_should_process_flow(const struct cfm *cfm, const struct flow *,
struct flow_wildcards *);
void cfm_process_heartbeat(struct cfm *, const struct ofpbuf *packet);
+bool cfm_check_status_change(struct cfm *);
int cfm_get_fault(const struct cfm *);
uint64_t cfm_get_flap_count(const struct cfm *);
int cfm_get_health(const struct cfm *);
VLOG_DEFINE_THIS_MODULE(classifier);
+struct trie_node;
struct trie_ctx;
-static struct cls_subtable *find_subtable(const struct classifier *,
+
+/* Ports trie depends on both ports sharing the same ovs_be32. */
+#define TP_PORTS_OFS32 (offsetof(struct flow, tp_src) / 4)
+BUILD_ASSERT_DECL(TP_PORTS_OFS32 == offsetof(struct flow, tp_dst) / 4);
+
+/* Prefix trie for a 'field' */
+struct cls_trie {
+ const struct mf_field *field; /* Trie field, or NULL. */
+ struct trie_node *root; /* NULL if none. */
+};
+
+struct cls_subtable_entry {
+ struct cls_subtable *subtable;
+ tag_type tag;
+ unsigned int max_priority;
+};
+
+struct cls_subtable_cache {
+ struct cls_subtable_entry *subtables;
+ size_t alloc_size; /* Number of allocated elements. */
+ size_t size; /* One past last valid array element. */
+};
+
+enum {
+ CLS_MAX_INDICES = 3 /* Maximum number of lookup indices per subtable. */
+};
+
+struct cls_classifier {
+ int n_rules; /* Total number of rules. */
+ uint8_t n_flow_segments;
+ uint8_t flow_segments[CLS_MAX_INDICES]; /* Flow segment boundaries to use
+ * for staged lookup. */
+ struct hmap subtables; /* Contains "struct cls_subtable"s. */
+ struct cls_subtable_cache subtables_priority;
+ struct hmap partitions; /* Contains "struct cls_partition"s. */
+ struct cls_trie tries[CLS_MAX_TRIES]; /* Prefix tries. */
+ unsigned int n_tries;
+};
+
+/* A set of rules that all have the same fields wildcarded. */
+struct cls_subtable {
+ struct hmap_node hmap_node; /* Within struct cls_classifier 'subtables'
+ * hmap. */
+ struct hmap rules; /* Contains "struct cls_rule"s. */
+ int n_rules; /* Number of rules, including duplicates. */
+ unsigned int max_priority; /* Max priority of any rule in the subtable. */
+ unsigned int max_count; /* Count of max_priority rules. */
+ tag_type tag; /* Tag generated from mask for partitioning. */
+ uint8_t n_indices; /* How many indices to use. */
+ uint8_t index_ofs[CLS_MAX_INDICES]; /* u32 flow segment boundaries. */
+ struct hindex indices[CLS_MAX_INDICES]; /* Staged lookup indices. */
+ unsigned int trie_plen[CLS_MAX_TRIES]; /* Trie prefix length in 'mask'. */
+ int ports_mask_len;
+ struct trie_node *ports_trie; /* NULL if none. */
+ struct minimask mask; /* Wildcards for fields. */
+ /* 'mask' must be the last field. */
+};
+
+/* Associates a metadata value (that is, a value of the OpenFlow 1.1+ metadata
+ * field) with tags for the "cls_subtable"s that contain rules that match that
+ * metadata value. */
+struct cls_partition {
+ struct hmap_node hmap_node; /* In struct cls_classifier's 'partitions'
+ * hmap. */
+ ovs_be64 metadata; /* metadata value for this partition. */
+ tag_type tags; /* OR of each flow's cls_subtable tag. */
+ struct tag_tracker tracker; /* Tracks the bits in 'tags'. */
+};
+
+/* Internal representation of a rule in a "struct cls_subtable". */
+struct cls_match {
+ struct cls_rule *cls_rule;
+ struct hindex_node index_nodes[CLS_MAX_INDICES]; /* Within subtable's
+ * 'indices'. */
+ struct hmap_node hmap_node; /* Within struct cls_subtable 'rules'. */
+ unsigned int priority; /* Larger numbers are higher priorities. */
+ struct cls_partition *partition;
+ struct list list; /* List of identical, lower-priority rules. */
+ struct miniflow flow; /* Matching rule. Mask is in the subtable. */
+ /* 'flow' must be the last field. */
+};
+
+static struct cls_match *
+cls_match_alloc(struct cls_rule *rule)
+{
+ int count = count_1bits(rule->match.flow.map);
+
+ struct cls_match *cls_match
+ = xmalloc(sizeof *cls_match - sizeof cls_match->flow.inline_values
+ + MINIFLOW_VALUES_SIZE(count));
+
+ cls_match->cls_rule = rule;
+ miniflow_clone_inline(&cls_match->flow, &rule->match.flow, count);
+ cls_match->priority = rule->priority;
+ rule->cls_match = cls_match;
+
+ return cls_match;
+}
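cls_match_alloc() sizes the allocation so that the nominal inline value storage at the end of the struct is replaced by exactly `count` values. A toy version of the same sizing trick, with invented struct names (the real `struct miniflow` layout is more involved):

```c
#include <stdlib.h>
#include <stdint.h>

/* Toy flow with a fixed nominal inline buffer as its last member. */
struct toy_flow {
    uint64_t map;
    uint32_t inline_values[4];   /* Nominal inline storage. */
};

struct toy_match {
    unsigned int priority;
    struct toy_flow flow;        /* Must be the last member. */
};

/* Allocates a toy_match whose trailing inline buffer holds exactly
 * 'n_values' elements: subtract the nominal buffer size, then add back
 * the exact amount needed.  sizeof does not evaluate its operand, so
 * using 'm' before initialization here is well-defined. */
static struct toy_match *
toy_match_alloc(unsigned int n_values)
{
    struct toy_match *m;

    m = malloc(sizeof *m - sizeof m->flow.inline_values
               + n_values * sizeof m->flow.inline_values[0]);
    return m;
}
```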
+
+static struct cls_subtable *find_subtable(const struct cls_classifier *,
const struct minimask *);
-static struct cls_subtable *insert_subtable(struct classifier *,
+static struct cls_subtable *insert_subtable(struct cls_classifier *,
const struct minimask *);
-static void destroy_subtable(struct classifier *, struct cls_subtable *);
+static void destroy_subtable(struct cls_classifier *, struct cls_subtable *);
-static void update_subtables_after_insertion(struct classifier *,
+static void update_subtables_after_insertion(struct cls_classifier *,
struct cls_subtable *,
unsigned int new_priority);
-static void update_subtables_after_removal(struct classifier *,
+static void update_subtables_after_removal(struct cls_classifier *,
struct cls_subtable *,
unsigned int del_priority);
-static struct cls_rule *find_match_wc(const struct cls_subtable *,
- const struct flow *, struct trie_ctx *,
- unsigned int n_tries,
- struct flow_wildcards *);
-static struct cls_rule *find_equal(struct cls_subtable *,
- const struct miniflow *, uint32_t hash);
-static struct cls_rule *insert_rule(struct classifier *,
- struct cls_subtable *, struct cls_rule *);
+static struct cls_match *find_match_wc(const struct cls_subtable *,
+ const struct flow *, struct trie_ctx *,
+ unsigned int n_tries,
+ struct flow_wildcards *);
+static struct cls_match *find_equal(struct cls_subtable *,
+ const struct miniflow *, uint32_t hash);
+static struct cls_match *insert_rule(struct cls_classifier *,
+ struct cls_subtable *, struct cls_rule *);
/* Iterates RULE over HEAD and all of the cls_rules on HEAD->list. */
#define FOR_EACH_RULE_IN_LIST(RULE, HEAD) \
(RULE) != NULL && ((NEXT) = next_rule_in_list(RULE), true); \
(RULE) = (NEXT))
-static struct cls_rule *next_rule_in_list__(struct cls_rule *);
-static struct cls_rule *next_rule_in_list(struct cls_rule *);
+static struct cls_match *next_rule_in_list__(struct cls_match *);
+static struct cls_match *next_rule_in_list(struct cls_match *);
static unsigned int minimask_get_prefix_len(const struct minimask *,
const struct mf_field *);
-static void trie_init(struct classifier *, int trie_idx,
+static void trie_init(struct cls_classifier *, int trie_idx,
const struct mf_field *);
static unsigned int trie_lookup(const struct cls_trie *, const struct flow *,
unsigned int *checkbits);
-
+static unsigned int trie_lookup_value(const struct trie_node *,
+ const ovs_be32 value[],
+ unsigned int *checkbits);
static void trie_destroy(struct trie_node *);
static void trie_insert(struct cls_trie *, const struct cls_rule *, int mlen);
+static void trie_insert_prefix(struct trie_node **, const ovs_be32 *prefix,
+ int mlen);
static void trie_remove(struct cls_trie *, const struct cls_rule *, int mlen);
+static void trie_remove_prefix(struct trie_node **, const ovs_be32 *prefix,
+ int mlen);
static void mask_set_prefix_bits(struct flow_wildcards *, uint8_t be32ofs,
unsigned int nbits);
static bool mask_prefix_bits_set(const struct flow_wildcards *,
uint8_t be32ofs, unsigned int nbits);
+
+static void
+cls_subtable_cache_init(struct cls_subtable_cache *array)
+{
+ memset(array, 0, sizeof *array);
+}
+
+static void
+cls_subtable_cache_destroy(struct cls_subtable_cache *array)
+{
+ free(array->subtables);
+ memset(array, 0, sizeof *array);
+}
+
+/* Array insertion. */
+static void
+cls_subtable_cache_push_back(struct cls_subtable_cache *array,
+ struct cls_subtable_entry a)
+{
+ if (array->size == array->alloc_size) {
+ array->subtables = x2nrealloc(array->subtables, &array->alloc_size,
+ sizeof a);
+ }
+
+ array->subtables[array->size++] = a;
+}
+
+/* Only for rearranging entries in the same cache. */
+static inline void
+cls_subtable_cache_splice(struct cls_subtable_entry *to,
+ struct cls_subtable_entry *start,
+ struct cls_subtable_entry *end)
+{
+ if (to > end) {
+ /* Same as splicing entries to (start) from [end, to). */
+ struct cls_subtable_entry *temp = to;
+ to = start; start = end; end = temp;
+ }
+ if (to < start) {
+ while (start != end) {
+ struct cls_subtable_entry temp = *start;
+
+ memmove(to + 1, to, (start - to) * sizeof *to);
+ *to = temp;
+ start++;
+ }
+ } /* Else nothing to be done. */
+}
+
+/* Array removal. */
+static inline void
+cls_subtable_cache_remove(struct cls_subtable_cache *array,
+ struct cls_subtable_entry *elem)
+{
+ ssize_t size = (&array->subtables[array->size]
+ - (elem + 1)) * sizeof *elem;
+ if (size > 0) {
+ memmove(elem, elem + 1, size);
+ }
+ array->size--;
+}
+
+#define CLS_SUBTABLE_CACHE_FOR_EACH(SUBTABLE, ITER, ARRAY) \
+ for (ITER = (ARRAY)->subtables; \
+ ITER < &(ARRAY)->subtables[(ARRAY)->size] \
+ && OVS_LIKELY(SUBTABLE = ITER->subtable); \
+ ++ITER)
+#define CLS_SUBTABLE_CACHE_FOR_EACH_CONTINUE(SUBTABLE, ITER, ARRAY) \
+ for (++ITER; \
+ ITER < &(ARRAY)->subtables[(ARRAY)->size] \
+ && OVS_LIKELY(SUBTABLE = ITER->subtable); \
+ ++ITER)
+#define CLS_SUBTABLE_CACHE_FOR_EACH_REVERSE(SUBTABLE, ITER, ARRAY) \
+ for (ITER = &(ARRAY)->subtables[(ARRAY)->size]; \
+ ITER > (ARRAY)->subtables \
+ && OVS_LIKELY(SUBTABLE = (--ITER)->subtable);)
+
+\f
+/* flow/miniflow/minimask/minimatch utilities.
+ * These are only used by the classifier, so place them here to allow
+ * for better optimization. */
+
+static inline uint64_t
+miniflow_get_map_in_range(const struct miniflow *miniflow,
+ uint8_t start, uint8_t end, unsigned int *offset)
+{
+ uint64_t map = miniflow->map;
+ *offset = 0;
+
+ if (start > 0) {
+ uint64_t msk = (UINT64_C(1) << start) - 1; /* 'start' LSBs set */
+ *offset = count_1bits(map & msk);
+ map &= ~msk;
+ }
+ if (end < FLOW_U32S) {
+ uint64_t msk = (UINT64_C(1) << end) - 1; /* 'end' LSBs set */
+ map &= msk;
+ }
+ return map;
+}
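miniflow_get_map_in_range() trims the occupancy map to bits [start, end) and reports how many 1-bits it skipped below 'start', which is the offset into the packed value array. The same bit manipulation can be sketched standalone (a portable popcount loop stands in for `count_1bits()`):

```c
#include <stdint.h>

/* Portable population count: clears the rightmost 1-bit per iteration. */
static unsigned int
popcount64(uint64_t x)
{
    unsigned int n = 0;

    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Keeps only bits [start, end) of a 64-bit occupancy map and stores in
 * '*offset' the number of 1-bits below 'start' (the index of the first
 * in-range element in the packed value array). */
static uint64_t
map_in_range(uint64_t map, unsigned int start, unsigned int end,
             unsigned int *offset)
{
    *offset = 0;
    if (start > 0) {
        uint64_t msk = (UINT64_C(1) << start) - 1;  /* 'start' LSBs set. */
        *offset = popcount64(map & msk);
        map &= ~msk;
    }
    if (end < 64) {
        map &= (UINT64_C(1) << end) - 1;            /* Drop bits >= 'end'. */
    }
    return map;
}
```

For example, with map 0xB5 (bits 0, 2, 4, 5, 7 set), range [3, 6) keeps bits 4 and 5 and skips the two set bits below bit 3.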
+
+/* Returns a hash value for the bits of 'flow' where there are 1-bits in
+ * 'mask', given 'basis'.
+ *
+ * The hash values returned by this function are the same as those returned by
+ * miniflow_hash_in_minimask(), only the form of the arguments differ. */
+static inline uint32_t
+flow_hash_in_minimask(const struct flow *flow, const struct minimask *mask,
+ uint32_t basis)
+{
+ const uint32_t *mask_values = miniflow_get_u32_values(&mask->masks);
+ const uint32_t *flow_u32 = (const uint32_t *)flow;
+ const uint32_t *p = mask_values;
+ uint32_t hash;
+ uint64_t map;
+
+ hash = basis;
+ for (map = mask->masks.map; map; map = zero_rightmost_1bit(map)) {
+ hash = mhash_add(hash, flow_u32[raw_ctz(map)] & *p++);
+ }
+
+ return mhash_finish(hash, (p - mask_values) * 4);
+}
+
+/* Returns a hash value for the bits of 'flow' where there are 1-bits in
+ * 'mask', given 'basis'.
+ *
+ * The hash values returned by this function are the same as those returned by
+ * flow_hash_in_minimask(), only the form of the arguments differ. */
+static inline uint32_t
+miniflow_hash_in_minimask(const struct miniflow *flow,
+ const struct minimask *mask, uint32_t basis)
+{
+ const uint32_t *mask_values = miniflow_get_u32_values(&mask->masks);
+ const uint32_t *p = mask_values;
+ uint32_t hash = basis;
+ uint32_t flow_u32;
+
+ MINIFLOW_FOR_EACH_IN_MAP(flow_u32, flow, mask->masks.map) {
+ hash = mhash_add(hash, flow_u32 & *p++);
+ }
+
+ return mhash_finish(hash, (p - mask_values) * 4);
+}
+
+/* Returns a hash value for the bits of range [start, end) in 'flow',
+ * where there are 1-bits in 'mask', given 'hash'.
+ *
+ * The hash values returned by this function are the same as those returned by
+ * minimatch_hash_range(), only the form of the arguments differ. */
+static inline uint32_t
+flow_hash_in_minimask_range(const struct flow *flow,
+ const struct minimask *mask,
+ uint8_t start, uint8_t end, uint32_t *basis)
+{
+ const uint32_t *mask_values = miniflow_get_u32_values(&mask->masks);
+ const uint32_t *flow_u32 = (const uint32_t *)flow;
+ unsigned int offset;
+ uint64_t map = miniflow_get_map_in_range(&mask->masks, start, end,
+ &offset);
+ const uint32_t *p = mask_values + offset;
+ uint32_t hash = *basis;
+
+ for (; map; map = zero_rightmost_1bit(map)) {
+ hash = mhash_add(hash, flow_u32[raw_ctz(map)] & *p++);
+ }
+
+ *basis = hash; /* Allow continuation from the unfinished value. */
+ return mhash_finish(hash, (p - mask_values) * 4);
+}
+
+/* Fold minimask 'mask''s wildcard mask into 'wc's wildcard mask. */
+static inline void
+flow_wildcards_fold_minimask(struct flow_wildcards *wc,
+ const struct minimask *mask)
+{
+ flow_union_with_miniflow(&wc->masks, &mask->masks);
+}
+
+/* Fold minimask 'mask''s wildcard mask into 'wc's wildcard mask
+ * in range [start, end). */
+static inline void
+flow_wildcards_fold_minimask_range(struct flow_wildcards *wc,
+ const struct minimask *mask,
+ uint8_t start, uint8_t end)
+{
+ uint32_t *dst_u32 = (uint32_t *)&wc->masks;
+ unsigned int offset;
+ uint64_t map = miniflow_get_map_in_range(&mask->masks, start, end,
+ &offset);
+ const uint32_t *p = miniflow_get_u32_values(&mask->masks) + offset;
+
+ for (; map; map = zero_rightmost_1bit(map)) {
+ dst_u32[raw_ctz(map)] |= *p++;
+ }
+}
+
+/* Returns a hash value for 'flow', given 'basis'. */
+static inline uint32_t
+miniflow_hash(const struct miniflow *flow, uint32_t basis)
+{
+ const uint32_t *values = miniflow_get_u32_values(flow);
+ const uint32_t *p = values;
+ uint32_t hash = basis;
+ uint64_t hash_map = 0;
+ uint64_t map;
+
+ for (map = flow->map; map; map = zero_rightmost_1bit(map)) {
+ if (*p) {
+ hash = mhash_add(hash, *p);
+ hash_map |= rightmost_1bit(map);
+ }
+ p++;
+ }
+ hash = mhash_add(hash, hash_map);
+ hash = mhash_add(hash, hash_map >> 32);
+
+ return mhash_finish(hash, p - values);
+}
+
+/* Returns a hash value for 'mask', given 'basis'. */
+static inline uint32_t
+minimask_hash(const struct minimask *mask, uint32_t basis)
+{
+ return miniflow_hash(&mask->masks, basis);
+}
+
+/* Returns a hash value for 'match', given 'basis'. */
+static inline uint32_t
+minimatch_hash(const struct minimatch *match, uint32_t basis)
+{
+ return miniflow_hash(&match->flow, minimask_hash(&match->mask, basis));
+}
+
+/* Returns a hash value for the bits of range [start, end) in 'minimatch',
+ * given 'basis'.
+ *
+ * The hash values returned by this function are the same as those returned by
+ * flow_hash_in_minimask_range(), only the form of the arguments differ. */
+static inline uint32_t
+minimatch_hash_range(const struct minimatch *match, uint8_t start, uint8_t end,
+ uint32_t *basis)
+{
+ unsigned int offset;
+ const uint32_t *p, *q;
+ uint32_t hash = *basis;
+ int n, i;
+
+ n = count_1bits(miniflow_get_map_in_range(&match->mask.masks, start, end,
+ &offset));
+ q = miniflow_get_u32_values(&match->mask.masks) + offset;
+ p = miniflow_get_u32_values(&match->flow) + offset;
+
+ for (i = 0; i < n; i++) {
+ hash = mhash_add(hash, p[i] & q[i]);
+ }
+ *basis = hash; /* Allow continuation from the unfinished value. */
+ return mhash_finish(hash, (offset + n) * 4);
+}
+
\f
/* cls_rule. */
{
minimatch_init(&rule->match, match);
rule->priority = priority;
+ rule->cls_match = NULL;
}
/* Same as cls_rule_init() for initialization from a "struct minimatch". */
{
minimatch_clone(&rule->match, match);
rule->priority = priority;
+ rule->cls_match = NULL;
}
/* Initializes 'dst' as a copy of 'src'.
{
minimatch_clone(&dst->match, &src->match);
dst->priority = src->priority;
+ dst->cls_match = NULL;
}
/* Initializes 'dst' with the data in 'src', destroying 'src'.
{
minimatch_move(&dst->match, &src->match);
dst->priority = src->priority;
+ dst->cls_match = NULL;
}
/* Frees memory referenced by 'rule'. Doesn't free 'rule' itself (it's
void
cls_rule_destroy(struct cls_rule *rule)
{
+ ovs_assert(!rule->cls_match);
minimatch_destroy(&rule->match);
}
/* Initializes 'cls' as a classifier that initially contains no classification
* rules. */
void
-classifier_init(struct classifier *cls, const uint8_t *flow_segments)
+classifier_init(struct classifier *cls_, const uint8_t *flow_segments)
{
+ struct cls_classifier *cls = xmalloc(sizeof *cls);
+
+ fat_rwlock_init(&cls_->rwlock);
+
+ cls_->cls = cls;
+
cls->n_rules = 0;
hmap_init(&cls->subtables);
- list_init(&cls->subtables_priority);
+ cls_subtable_cache_init(&cls->subtables_priority);
hmap_init(&cls->partitions);
- fat_rwlock_init(&cls->rwlock);
cls->n_flow_segments = 0;
if (flow_segments) {
while (cls->n_flow_segments < CLS_MAX_INDICES
/* Destroys 'cls'. Rules within 'cls', if any, are not freed; this is the
* caller's responsibility. */
void
-classifier_destroy(struct classifier *cls)
+classifier_destroy(struct classifier *cls_)
{
- if (cls) {
+ if (cls_) {
+ struct cls_classifier *cls = cls_->cls;
struct cls_subtable *partition, *next_partition;
struct cls_subtable *subtable, *next_subtable;
int i;
+ fat_rwlock_destroy(&cls_->rwlock);
+ if (!cls) {
+ return;
+ }
+
for (i = 0; i < cls->n_tries; i++) {
trie_destroy(cls->tries[i].root);
}
free(partition);
}
hmap_destroy(&cls->partitions);
- fat_rwlock_destroy(&cls->rwlock);
+
+ cls_subtable_cache_destroy(&cls->subtables_priority);
+ free(cls);
}
}
/* Set the fields for which prefix lookup should be performed. */
void
-classifier_set_prefix_fields(struct classifier *cls,
+classifier_set_prefix_fields(struct classifier *cls_,
const enum mf_field_id *trie_fields,
unsigned int n_fields)
{
+ struct cls_classifier *cls = cls_->cls;
uint64_t fields = 0;
int i, trie;
}
static void
-trie_init(struct classifier *cls, int trie_idx,
+trie_init(struct cls_classifier *cls, int trie_idx,
const struct mf_field *field)
{
struct cls_trie *trie = &cls->tries[trie_idx];
struct cls_subtable *subtable;
+ struct cls_subtable_entry *iter;
if (trie_idx < cls->n_tries) {
trie_destroy(trie->root);
trie->field = field;
/* Add existing rules to the trie. */
- LIST_FOR_EACH (subtable, list_node, &cls->subtables_priority) {
+ CLS_SUBTABLE_CACHE_FOR_EACH (subtable, iter, &cls->subtables_priority) {
unsigned int plen;
plen = field ? minimask_get_prefix_len(&subtable->mask, field) : 0;
subtable->trie_plen[trie_idx] = plen;
if (plen) {
- struct cls_rule *head;
+ struct cls_match *head;
HMAP_FOR_EACH (head, hmap_node, &subtable->rules) {
- struct cls_rule *rule;
+ struct cls_match *match;
- FOR_EACH_RULE_IN_LIST (rule, head) {
- trie_insert(trie, rule, plen);
+ FOR_EACH_RULE_IN_LIST (match, head) {
+ trie_insert(trie, match->cls_rule, plen);
}
}
}
bool
classifier_is_empty(const struct classifier *cls)
{
- return cls->n_rules == 0;
+ return cls->cls->n_rules == 0;
}
/* Returns the number of rules in 'cls'. */
int
classifier_count(const struct classifier *cls)
{
- return cls->n_rules;
+ return cls->cls->n_rules;
}
static uint32_t
}
static struct cls_partition *
-find_partition(const struct classifier *cls, ovs_be64 metadata, uint32_t hash)
+find_partition(const struct cls_classifier *cls, ovs_be64 metadata,
+ uint32_t hash)
{
struct cls_partition *partition;
}
static struct cls_partition *
-create_partition(struct classifier *cls, struct cls_subtable *subtable,
+create_partition(struct cls_classifier *cls, struct cls_subtable *subtable,
ovs_be64 metadata)
{
uint32_t hash = hash_metadata(metadata);
return partition;
}
+static inline ovs_be32 minimatch_get_ports(const struct minimatch *match)
+{
+ /* Could optimize to use the same map if needed for fast path. */
+ return MINIFLOW_GET_BE32(&match->flow, tp_src)
+ & MINIFLOW_GET_BE32(&match->mask.masks, tp_src);
+}
+
/* Inserts 'rule' into 'cls'. Until 'rule' is removed from 'cls', the caller
* must not modify or free it.
*
* rule, even rules that cannot have any effect because the new rule matches a
* superset of their flows and has higher priority. */
struct cls_rule *
-classifier_replace(struct classifier *cls, struct cls_rule *rule)
+classifier_replace(struct classifier *cls_, struct cls_rule *rule)
{
- struct cls_rule *old_rule;
+ struct cls_classifier *cls = cls_->cls;
+ struct cls_match *old_rule;
struct cls_subtable *subtable;
subtable = find_subtable(cls, &rule->match.mask);
if (!old_rule) {
int i;
+ rule->cls_match->partition = NULL;
if (minimask_get_metadata_mask(&rule->match.mask) == OVS_BE64_MAX) {
ovs_be64 metadata = miniflow_get_metadata(&rule->match.flow);
- rule->partition = create_partition(cls, subtable, metadata);
- } else {
- rule->partition = NULL;
+ rule->cls_match->partition = create_partition(cls, subtable,
+ metadata);
}
subtable->n_rules++;
trie_insert(&cls->tries[i], rule, subtable->trie_plen[i]);
}
}
+
+ /* Ports trie. */
+ if (subtable->ports_mask_len) {
+ /* We mask the value to be inserted to always have the wildcarded
+ * bits in a known (zero) state, so we can include them in the
+ * comparison and they will always match (their original value
+ * does not matter). */
+ ovs_be32 masked_ports = minimatch_get_ports(&rule->match);
+
+ trie_insert_prefix(&subtable->ports_trie, &masked_ports,
+ subtable->ports_mask_len);
+ }
+
+ return NULL;
} else {
- rule->partition = old_rule->partition;
+ struct cls_rule *old_cls_rule = old_rule->cls_rule;
+
+ rule->cls_match->partition = old_rule->partition;
+ old_cls_rule->cls_match = NULL;
+ free(old_rule);
+ return old_cls_rule;
}
- return old_rule;
}
/* Inserts 'rule' into 'cls'. Until 'rule' is removed from 'cls', the caller
* 'rule' with cls_rule_destroy(), freeing the memory block in which 'rule'
* resides, etc., as necessary. */
void
-classifier_remove(struct classifier *cls, struct cls_rule *rule)
+classifier_remove(struct classifier *cls_, struct cls_rule *rule)
{
+ struct cls_classifier *cls = cls_->cls;
struct cls_partition *partition;
- struct cls_rule *head;
+ struct cls_match *cls_match = rule->cls_match;
+ struct cls_match *head;
struct cls_subtable *subtable;
int i;
+ ovs_assert(cls_match);
+
subtable = find_subtable(cls, &rule->match.mask);
+ ovs_assert(subtable);
+
+ if (subtable->ports_mask_len) {
+ ovs_be32 masked_ports = minimatch_get_ports(&rule->match);
+ trie_remove_prefix(&subtable->ports_trie,
+ &masked_ports, subtable->ports_mask_len);
+ }
for (i = 0; i < cls->n_tries; i++) {
if (subtable->trie_plen[i]) {
trie_remove(&cls->tries[i], rule, subtable->trie_plen[i]);
/* Remove rule node from indices. */
for (i = 0; i < subtable->n_indices; i++) {
- hindex_remove(&subtable->indices[i], &rule->index_nodes[i]);
+ hindex_remove(&subtable->indices[i], &cls_match->index_nodes[i]);
}
- head = find_equal(subtable, &rule->match.flow, rule->hmap_node.hash);
- if (head != rule) {
- list_remove(&rule->list);
- } else if (list_is_empty(&rule->list)) {
- hmap_remove(&subtable->rules, &rule->hmap_node);
+ head = find_equal(subtable, &rule->match.flow, cls_match->hmap_node.hash);
+ if (head != cls_match) {
+ list_remove(&cls_match->list);
+ } else if (list_is_empty(&cls_match->list)) {
+ hmap_remove(&subtable->rules, &cls_match->hmap_node);
} else {
- struct cls_rule *next = CONTAINER_OF(rule->list.next,
- struct cls_rule, list);
+ struct cls_match *next = CONTAINER_OF(cls_match->list.next,
+ struct cls_match, list);
- list_remove(&rule->list);
- hmap_replace(&subtable->rules, &rule->hmap_node, &next->hmap_node);
+ list_remove(&cls_match->list);
+ hmap_replace(&subtable->rules, &cls_match->hmap_node,
+ &next->hmap_node);
}
- partition = rule->partition;
+ partition = cls_match->partition;
if (partition) {
tag_tracker_subtract(&partition->tracker, &partition->tags,
subtable->tag);
if (--subtable->n_rules == 0) {
destroy_subtable(cls, subtable);
} else {
- update_subtables_after_removal(cls, subtable, rule->priority);
+ update_subtables_after_removal(cls, subtable, cls_match->priority);
}
cls->n_rules--;
+
+ rule->cls_match = NULL;
+ free(cls_match);
}
/* Prefix tree context. Valid when 'lookup_done' is true. Can skip all
ctx->lookup_done = false;
}
+static inline void
+lookahead_subtable(const struct cls_subtable_entry *subtables)
+{
+ ovs_prefetch_range(subtables->subtable, sizeof *subtables->subtable);
+}
+
/* Finds and returns the highest-priority rule in 'cls' that matches 'flow'.
* Returns a null pointer if no rules in 'cls' match 'flow'. If multiple rules
* of equal priority match 'flow', returns one arbitrarily.
* earlier, 'wc' should have been initialized (e.g., by
* flow_wildcards_init_catchall()). */
struct cls_rule *
-classifier_lookup(const struct classifier *cls, const struct flow *flow,
+classifier_lookup(const struct classifier *cls_, const struct flow *flow,
struct flow_wildcards *wc)
{
+ struct cls_classifier *cls = cls_->cls;
const struct cls_partition *partition;
- struct cls_subtable *subtable;
- struct cls_rule *best;
tag_type tags;
+ struct cls_match *best;
struct trie_ctx trie_ctx[CLS_MAX_TRIES];
int i;
+ struct cls_subtable_entry *subtables = cls->subtables_priority.subtables;
+ int n_subtables = cls->subtables_priority.size;
+ int64_t best_priority = -1;
+
+ /* Prefetch the subtables array. */
+ ovs_prefetch_range(subtables, n_subtables * sizeof *subtables);
/* Determine 'tags' such that, if 'subtable->tag' doesn't intersect them,
* then 'flow' cannot possibly match in 'subtable':
for (i = 0; i < cls->n_tries; i++) {
trie_ctx_init(&trie_ctx[i], &cls->tries[i]);
}
+
+ /* Prefetch the first subtables. */
+ if (n_subtables > 1) {
+ lookahead_subtable(subtables);
+ lookahead_subtable(subtables + 1);
+ }
+
best = NULL;
- LIST_FOR_EACH (subtable, list_node, &cls->subtables_priority) {
- struct cls_rule *rule;
+ for (i = 0; OVS_LIKELY(i < n_subtables); i++) {
+ struct cls_match *rule;
+
+ if ((int64_t)subtables[i].max_priority <= best_priority) {
+ /* Subtables are in descending priority order,
+ * so nothing better can be found. */
+ break;
+ }
- if (!tag_intersects(tags, subtable->tag)) {
+ /* Prefetch a forthcoming subtable. */
+ if (i + 2 < n_subtables) {
+ lookahead_subtable(&subtables[i + 2]);
+ }
+
+ if (!tag_intersects(tags, subtables[i].tag)) {
continue;
}
- rule = find_match_wc(subtable, flow, trie_ctx, cls->n_tries, wc);
- if (rule) {
+ rule = find_match_wc(subtables[i].subtable, flow, trie_ctx,
+ cls->n_tries, wc);
+ if (rule && (int64_t)rule->priority > best_priority) {
+ best_priority = (int64_t)rule->priority;
best = rule;
- LIST_FOR_EACH_CONTINUE (subtable, list_node,
- &cls->subtables_priority) {
- if (subtable->max_priority <= best->priority) {
- /* Subtables are in descending priority order,
- * can not find anything better. */
- return best;
- }
- if (!tag_intersects(tags, subtable->tag)) {
- continue;
- }
+ }
+ }
- rule = find_match_wc(subtable, flow, trie_ctx, cls->n_tries,
- wc);
- if (rule && rule->priority > best->priority) {
- best = rule;
- }
- }
- break;
+ return best ? best->cls_rule : NULL;
+}
+
+/* Returns true if 'target' satisfies 'flow'/'mask', that is, if each bit
+ * for which 'mask' has a bit set, 'target' has the same value as 'flow'.
+ *
+ * Relies on the invariant that 'flow' and 'mask->masks' have the same map. */
+static bool
+miniflow_and_mask_matches_miniflow(const struct miniflow *flow,
+ const struct minimask *mask,
+ const struct miniflow *target)
+{
+ const uint32_t *flowp = miniflow_get_u32_values(flow);
+ const uint32_t *maskp = miniflow_get_u32_values(&mask->masks);
+ uint32_t target_u32;
+
+ MINIFLOW_FOR_EACH_IN_MAP(target_u32, target, mask->masks.map) {
+ if ((*flowp++ ^ target_u32) & *maskp++) {
+ return false;
}
}
- return best;
+ return true;
+}
+
+static inline struct cls_match *
+find_match_miniflow(const struct cls_subtable *subtable,
+ const struct miniflow *flow,
+ uint32_t hash)
+{
+ struct cls_match *rule;
+
+ HMAP_FOR_EACH_WITH_HASH (rule, hmap_node, hash, &subtable->rules) {
+ if (miniflow_and_mask_matches_miniflow(&rule->flow, &subtable->mask,
+ flow)) {
+ return rule;
+ }
+ }
+
+ return NULL;
+}
+
+/* Finds and returns the highest-priority rule in 'cls' that matches 'flow'.
+ * Returns a null pointer if no rules in 'cls' match 'flow'.  If multiple
+ * rules of equal priority match 'flow', returns one arbitrarily.
+ *
+ * This function is optimized for the userspace datapath, which only ever
+ * has one priority value for its flows.
+ */
+struct cls_rule *
+classifier_lookup_miniflow_first(const struct classifier *cls_,
+ const struct miniflow *flow)
+{
+ struct cls_classifier *cls = cls_->cls;
+ struct cls_subtable *subtable;
+ struct cls_subtable_entry *iter;
+
+ CLS_SUBTABLE_CACHE_FOR_EACH (subtable, iter, &cls->subtables_priority) {
+ struct cls_match *rule;
+
+ rule = find_match_miniflow(subtable, flow,
+ miniflow_hash_in_minimask(flow,
+ &subtable->mask,
+ 0));
+ if (rule) {
+ return rule->cls_rule;
+ }
+ }
+
+ return NULL;
}
/* Finds and returns a rule in 'cls' with exactly the same priority and
* matching criteria as 'target'. Returns a null pointer if 'cls' doesn't
* contain an exact match. */
struct cls_rule *
-classifier_find_rule_exactly(const struct classifier *cls,
+classifier_find_rule_exactly(const struct classifier *cls_,
const struct cls_rule *target)
{
- struct cls_rule *head, *rule;
+ struct cls_classifier *cls = cls_->cls;
+ struct cls_match *head, *rule;
struct cls_subtable *subtable;
subtable = find_subtable(cls, &target->match.mask);
&target->match.mask, 0));
FOR_EACH_RULE_IN_LIST (rule, head) {
if (target->priority >= rule->priority) {
- return target->priority == rule->priority ? rule : NULL;
+ return target->priority == rule->priority ? rule->cls_rule : NULL;
}
}
return NULL;
* considered to overlap if both rules have the same priority and a packet
* could match both. */
bool
-classifier_rule_overlaps(const struct classifier *cls,
+classifier_rule_overlaps(const struct classifier *cls_,
const struct cls_rule *target)
{
+ struct cls_classifier *cls = cls_->cls;
struct cls_subtable *subtable;
+ struct cls_subtable_entry *iter;
/* Iterate subtables in the descending max priority order. */
- LIST_FOR_EACH (subtable, list_node, &cls->subtables_priority) {
+ CLS_SUBTABLE_CACHE_FOR_EACH (subtable, iter, &cls->subtables_priority) {
uint32_t storage[FLOW_U32S];
struct minimask mask;
- struct cls_rule *head;
+ struct cls_match *head;
- if (target->priority > subtable->max_priority) {
+ if (target->priority > iter->max_priority) {
break; /* Can skip this and the rest of the subtables. */
}
minimask_combine(&mask, &target->match.mask, &subtable->mask, storage);
HMAP_FOR_EACH (head, hmap_node, &subtable->rules) {
- struct cls_rule *rule;
+ struct cls_match *rule;
FOR_EACH_RULE_IN_LIST (rule, head) {
if (rule->priority < target->priority) {
}
if (rule->priority == target->priority
&& miniflow_equal_in_minimask(&target->match.flow,
- &rule->match.flow, &mask)) {
+ &rule->flow, &mask)) {
return true;
}
}
/* Iteration. */
static bool
-rule_matches(const struct cls_rule *rule, const struct cls_rule *target)
+rule_matches(const struct cls_match *rule, const struct cls_rule *target)
{
return (!target
- || miniflow_equal_in_minimask(&rule->match.flow,
+ || miniflow_equal_in_minimask(&rule->flow,
&target->match.flow,
&target->match.mask));
}
-static struct cls_rule *
+static struct cls_match *
search_subtable(const struct cls_subtable *subtable,
const struct cls_rule *target)
{
if (!target || !minimask_has_extra(&subtable->mask, &target->match.mask)) {
- struct cls_rule *rule;
+ struct cls_match *rule;
HMAP_FOR_EACH (rule, hmap_node, &subtable->rules) {
if (rule_matches(rule, target)) {
cls_cursor_init(struct cls_cursor *cursor, const struct classifier *cls,
const struct cls_rule *target)
{
- cursor->cls = cls;
+ cursor->cls = cls->cls;
cursor->target = target && !cls_rule_is_catchall(target) ? target : NULL;
}
struct cls_subtable *subtable;
HMAP_FOR_EACH (subtable, hmap_node, &cursor->cls->subtables) {
- struct cls_rule *rule = search_subtable(subtable, cursor->target);
+ struct cls_match *rule = search_subtable(subtable, cursor->target);
if (rule) {
cursor->subtable = subtable;
- return rule;
+ return rule->cls_rule;
}
}
struct cls_rule *
cls_cursor_next(struct cls_cursor *cursor, const struct cls_rule *rule_)
{
- struct cls_rule *rule = CONST_CAST(struct cls_rule *, rule_);
+ struct cls_match *rule = CONST_CAST(struct cls_match *, rule_->cls_match);
const struct cls_subtable *subtable;
- struct cls_rule *next;
+ struct cls_match *next;
next = next_rule_in_list__(rule);
if (next->priority < rule->priority) {
- return next;
+ return next->cls_rule;
}
/* 'next' is the head of the list, that is, the rule that is included in
rule = next;
HMAP_FOR_EACH_CONTINUE (rule, hmap_node, &cursor->subtable->rules) {
if (rule_matches(rule, cursor->target)) {
- return rule;
+ return rule->cls_rule;
}
}
rule = search_subtable(subtable, cursor->target);
if (rule) {
cursor->subtable = subtable;
- return rule;
+ return rule->cls_rule;
}
}
}
\f
static struct cls_subtable *
-find_subtable(const struct classifier *cls, const struct minimask *mask)
+find_subtable(const struct cls_classifier *cls, const struct minimask *mask)
{
struct cls_subtable *subtable;
}
static struct cls_subtable *
-insert_subtable(struct classifier *cls, const struct minimask *mask)
+insert_subtable(struct cls_classifier *cls, const struct minimask *mask)
{
uint32_t hash = minimask_hash(mask, 0);
struct cls_subtable *subtable;
int i, index = 0;
struct flow_wildcards old, new;
uint8_t prev;
+ struct cls_subtable_entry elem;
+ int count = count_1bits(mask->masks.map);
- subtable = xzalloc(sizeof *subtable);
+ subtable = xzalloc(sizeof *subtable - sizeof mask->masks.inline_values
+ + MINIFLOW_VALUES_SIZE(count));
hmap_init(&subtable->rules);
- minimask_clone(&subtable->mask, mask);
+ miniflow_clone_inline(&subtable->mask.masks, &mask->masks, count);
/* Init indices for segmented lookup, if any. */
flow_wildcards_init_catchall(&new);
}
subtable->n_indices = index;
- hmap_insert(&cls->subtables, &subtable->hmap_node, hash);
- list_push_back(&cls->subtables_priority, &subtable->list_node);
subtable->tag = (minimask_get_metadata_mask(mask) == OVS_BE64_MAX
? tag_create_deterministic(hash)
: TAG_ALL);
cls->tries[i].field);
}
+ /* Ports trie. */
+ subtable->ports_trie = NULL;
+ subtable->ports_mask_len
+ = 32 - ctz32(ntohl(MINIFLOW_GET_BE32(&mask->masks, tp_src)));
+
+ hmap_insert(&cls->subtables, &subtable->hmap_node, hash);
+ elem.subtable = subtable;
+ elem.tag = subtable->tag;
+ elem.max_priority = subtable->max_priority;
+ cls_subtable_cache_push_back(&cls->subtables_priority, elem);
+
return subtable;
}
static void
-destroy_subtable(struct classifier *cls, struct cls_subtable *subtable)
+destroy_subtable(struct cls_classifier *cls, struct cls_subtable *subtable)
{
int i;
+ struct cls_subtable *table = NULL;
+ struct cls_subtable_entry *iter;
+
+ CLS_SUBTABLE_CACHE_FOR_EACH (table, iter, &cls->subtables_priority) {
+ if (table == subtable) {
+ cls_subtable_cache_remove(&cls->subtables_priority, iter);
+ break;
+ }
+ }
+
+ trie_destroy(subtable->ports_trie);
for (i = 0; i < subtable->n_indices; i++) {
hindex_destroy(&subtable->indices[i]);
minimask_destroy(&subtable->mask);
hmap_remove(&cls->subtables, &subtable->hmap_node);
hmap_destroy(&subtable->rules);
- list_remove(&subtable->list_node);
free(subtable);
}
* This function should only be called after adding a new rule, not after
* replacing a rule by an identical one or modifying a rule in-place. */
static void
-update_subtables_after_insertion(struct classifier *cls,
+update_subtables_after_insertion(struct cls_classifier *cls,
struct cls_subtable *subtable,
unsigned int new_priority)
{
if (new_priority == subtable->max_priority) {
++subtable->max_count;
} else if (new_priority > subtable->max_priority) {
- struct cls_subtable *iter;
+ struct cls_subtable *table;
+ struct cls_subtable_entry *iter, *subtable_iter = NULL;
subtable->max_priority = new_priority;
subtable->max_count = 1;
/* Possibly move 'subtable' earlier in the priority list. If we break
- * out of the loop, then 'subtable' should be moved just after that
+ * out of the loop, then 'subtable_iter' should be moved just before
* 'iter'. If the loop terminates normally, then 'iter' will be the
- * list head and we'll move subtable just after that (e.g. to the front
- * of the list). */
- iter = subtable;
- LIST_FOR_EACH_REVERSE_CONTINUE (iter, list_node,
- &cls->subtables_priority) {
- if (iter->max_priority >= subtable->max_priority) {
+ * first list element and we'll move subtable just before that
+ * (e.g. to the front of the list). */
+ CLS_SUBTABLE_CACHE_FOR_EACH_REVERSE (table, iter, &cls->subtables_priority) {
+ if (table == subtable) {
+ subtable_iter = iter; /* Locate the subtable as we go. */
+ iter->max_priority = new_priority;
+ } else if (table->max_priority >= new_priority) {
+ ovs_assert(subtable_iter != NULL);
+ iter++;
break;
}
}
- /* Move 'subtable' just after 'iter' (unless it's already there). */
- if (iter->list_node.next != &subtable->list_node) {
- list_splice(iter->list_node.next,
- &subtable->list_node, subtable->list_node.next);
+ /* Move 'subtable' just before 'iter' (unless it's already there). */
+ if (iter != subtable_iter) {
+ cls_subtable_cache_splice(iter, subtable_iter, subtable_iter + 1);
}
}
}
* This function should only be called after removing a rule, not after
* replacing a rule by an identical one or modifying a rule in-place. */
static void
-update_subtables_after_removal(struct classifier *cls,
+update_subtables_after_removal(struct cls_classifier *cls,
struct cls_subtable *subtable,
unsigned int del_priority)
{
- struct cls_subtable *iter;
-
if (del_priority == subtable->max_priority && --subtable->max_count == 0) {
- struct cls_rule *head;
+ struct cls_match *head;
+ struct cls_subtable *table;
+ struct cls_subtable_entry *iter, *subtable_iter = NULL;
subtable->max_priority = 0;
HMAP_FOR_EACH (head, hmap_node, &subtable->rules) {
* 'iter'. If the loop terminates normally, then 'iter' will be the
* list head and we'll move subtable just before that (e.g. to the back
* of the list). */
- iter = subtable;
- LIST_FOR_EACH_CONTINUE (iter, list_node, &cls->subtables_priority) {
- if (iter->max_priority <= subtable->max_priority) {
+ CLS_SUBTABLE_CACHE_FOR_EACH (table, iter, &cls->subtables_priority) {
+ if (table == subtable) {
+ subtable_iter = iter; /* Locate the subtable as we go. */
+ iter->max_priority = subtable->max_priority;
+ } else if (table->max_priority <= subtable->max_priority) {
+ ovs_assert(subtable_iter != NULL);
break;
}
}
/* Move 'subtable' just before 'iter' (unless it's already there). */
- if (iter->list_node.prev != &subtable->list_node) {
- list_splice(&iter->list_node,
- &subtable->list_node, subtable->list_node.next);
+ if (iter != subtable_iter) {
+ cls_subtable_cache_splice(iter, subtable_iter, subtable_iter + 1);
}
}
}
return false;
}
-static inline struct cls_rule *
+/* Returns true if 'target' satisfies 'flow'/'mask', that is, if each bit
+ * for which 'mask' has a bit set, 'target' has the same value as 'flow'.
+ *
+ * This function is equivalent to miniflow_equal_flow_in_minimask(flow,
+ * target, mask), but it is faster because of the invariant that
+ * flow->map and mask->masks.map are the same. */
+static inline bool
+miniflow_and_mask_matches_flow(const struct miniflow *flow,
+ const struct minimask *mask,
+ const struct flow *target)
+{
+ const uint32_t *flowp = miniflow_get_u32_values(flow);
+ const uint32_t *maskp = miniflow_get_u32_values(&mask->masks);
+ uint32_t target_u32;
+
+ FLOW_FOR_EACH_IN_MAP(target_u32, target, mask->masks.map) {
+ if ((*flowp++ ^ target_u32) & *maskp++) {
+ return false;
+ }
+ }
+
+ return true;
+}
+
+static inline struct cls_match *
find_match(const struct cls_subtable *subtable, const struct flow *flow,
uint32_t hash)
{
- struct cls_rule *rule;
+ struct cls_match *rule;
HMAP_FOR_EACH_WITH_HASH (rule, hmap_node, hash, &subtable->rules) {
- if (minimatch_matches_flow(&rule->match, flow)) {
+ if (miniflow_and_mask_matches_flow(&rule->flow, &subtable->mask,
+ flow)) {
return rule;
}
}
return NULL;
}
-static struct cls_rule *
+static struct cls_match *
find_match_wc(const struct cls_subtable *subtable, const struct flow *flow,
struct trie_ctx trie_ctx[CLS_MAX_TRIES], unsigned int n_tries,
struct flow_wildcards *wc)
{
uint32_t basis = 0, hash;
- struct cls_rule *rule = NULL;
+ struct cls_match *rule = NULL;
int i;
struct range ofs;
- if (!wc) {
+ if (OVS_UNLIKELY(!wc)) {
return find_match(subtable, flow,
flow_hash_in_minimask(flow, &subtable->mask, 0));
}
* not match, then we know that we will never get a match, but we do
* not yet know how many wildcards we need to fold into 'wc' so we
* continue iterating through indices to find that out. (We won't
- * waste time calling minimatch_matches_flow() again because we've set
- * 'rule' nonnull.)
+ * waste time calling miniflow_and_mask_matches_flow() again because
+ * we've set 'rule' nonnull.)
*
* This check shows a measurable benefit with non-trivial flow tables.
*
* optimization. */
if (!inode->s && !rule) {
ASSIGN_CONTAINER(rule, inode - i, index_nodes);
- if (minimatch_matches_flow(&rule->match, flow)) {
+ if (miniflow_and_mask_matches_flow(&rule->flow, &subtable->mask,
+ flow)) {
goto out;
}
}
* but it didn't match. */
rule = NULL;
}
+ if (!rule && subtable->ports_mask_len) {
+ /* Ports are always part of the final range, if any.
+ * No match was found for the ports. Use the ports trie to figure out
+ * which ports bits to unwildcard. */
+ unsigned int mbits;
+ ovs_be32 value, mask;
+
+ mask = MINIFLOW_GET_BE32(&subtable->mask.masks, tp_src);
+ value = ((OVS_FORCE ovs_be32 *)flow)[TP_PORTS_OFS32] & mask;
+ trie_lookup_value(subtable->ports_trie, &value, &mbits);
+
+ ((OVS_FORCE ovs_be32 *)&wc->masks)[TP_PORTS_OFS32] |=
+ mask & htonl(~0 << (32 - mbits));
+
+ ofs.start = TP_PORTS_OFS32;
+ goto range_out;
+ }
out:
/* Must unwildcard all the fields, as they were looked at. */
flow_wildcards_fold_minimask(wc, &subtable->mask);
return NULL;
}
-static struct cls_rule *
+static struct cls_match *
find_equal(struct cls_subtable *subtable, const struct miniflow *flow,
uint32_t hash)
{
- struct cls_rule *head;
+ struct cls_match *head;
HMAP_FOR_EACH_WITH_HASH (head, hmap_node, hash, &subtable->rules) {
- if (miniflow_equal(&head->match.flow, flow)) {
+ if (miniflow_equal(&head->flow, flow)) {
return head;
}
}
return NULL;
}
-static struct cls_rule *
-insert_rule(struct classifier *cls, struct cls_subtable *subtable,
+static struct cls_match *
+insert_rule(struct cls_classifier *cls, struct cls_subtable *subtable,
struct cls_rule *new)
{
- struct cls_rule *head;
- struct cls_rule *old = NULL;
+ struct cls_match *cls_match = cls_match_alloc(new);
+ struct cls_match *head;
+ struct cls_match *old = NULL;
int i;
uint32_t basis = 0, hash;
uint8_t prev_be32ofs = 0;
for (i = 0; i < subtable->n_indices; i++) {
hash = minimatch_hash_range(&new->match, prev_be32ofs,
subtable->index_ofs[i], &basis);
- hindex_insert(&subtable->indices[i], &new->index_nodes[i], hash);
+ hindex_insert(&subtable->indices[i], &cls_match->index_nodes[i], hash);
prev_be32ofs = subtable->index_ofs[i];
}
hash = minimatch_hash_range(&new->match, prev_be32ofs, FLOW_U32S, &basis);
head = find_equal(subtable, &new->match.flow, hash);
if (!head) {
- hmap_insert(&subtable->rules, &new->hmap_node, hash);
- list_init(&new->list);
+ hmap_insert(&subtable->rules, &cls_match->hmap_node, hash);
+ list_init(&cls_match->list);
goto out;
} else {
/* Scan the list for the insertion point that will keep the list in
* order of decreasing priority. */
- struct cls_rule *rule;
+ struct cls_match *rule;
- new->hmap_node.hash = hash; /* Otherwise done by hmap_insert. */
+ cls_match->hmap_node.hash = hash; /* Otherwise done by hmap_insert. */
FOR_EACH_RULE_IN_LIST (rule, head) {
- if (new->priority >= rule->priority) {
+ if (cls_match->priority >= rule->priority) {
if (rule == head) {
/* 'new' is the new highest-priority flow in the list. */
hmap_replace(&subtable->rules,
- &rule->hmap_node, &new->hmap_node);
+ &rule->hmap_node, &cls_match->hmap_node);
}
- if (new->priority == rule->priority) {
- list_replace(&new->list, &rule->list);
+ if (cls_match->priority == rule->priority) {
+ list_replace(&cls_match->list, &rule->list);
old = rule;
goto out;
} else {
- list_insert(&rule->list, &new->list);
+ list_insert(&rule->list, &cls_match->list);
goto out;
}
}
}
/* Insert 'new' at the end of the list. */
- list_push_back(&head->list, &new->list);
+ list_push_back(&head->list, &cls_match->list);
}
out:
if (!old) {
- update_subtables_after_insertion(cls, subtable, new->priority);
+ update_subtables_after_insertion(cls, subtable, cls_match->priority);
} else {
/* Remove old node from indices. */
for (i = 0; i < subtable->n_indices; i++) {
return old;
}
-static struct cls_rule *
-next_rule_in_list__(struct cls_rule *rule)
+static struct cls_match *
+next_rule_in_list__(struct cls_match *rule)
{
- struct cls_rule *next = OBJECT_CONTAINING(rule->list.next, next, list);
+ struct cls_match *next = OBJECT_CONTAINING(rule->list.next, next, list);
return next;
}
-static struct cls_rule *
-next_rule_in_list(struct cls_rule *rule)
+static struct cls_match *
+next_rule_in_list(struct cls_match *rule)
{
- struct cls_rule *next = next_rule_in_list__(rule);
+ struct cls_match *next = next_rule_in_list__(rule);
return next->priority < rule->priority ? next : NULL;
}
\f
static const ovs_be32 *
minimatch_get_prefix(const struct minimatch *match, const struct mf_field *mf)
{
- return (OVS_FORCE const ovs_be32 *)match->flow.values +
+ return miniflow_get_be32_values(&match->flow) +
count_1bits(match->flow.map & ((UINT64_C(1) << mf->flow_be32ofs) - 1));
}
static void
trie_insert(struct cls_trie *trie, const struct cls_rule *rule, int mlen)
{
- const ovs_be32 *prefix = minimatch_get_prefix(&rule->match, trie->field);
+ trie_insert_prefix(&trie->root,
+ minimatch_get_prefix(&rule->match, trie->field), mlen);
+}
+
+static void
+trie_insert_prefix(struct trie_node **edge, const ovs_be32 *prefix, int mlen)
+{
struct trie_node *node;
- struct trie_node **edge;
int ofs = 0;
/* Walk the tree. */
- for (edge = &trie->root;
- (node = *edge) != NULL;
+ for (; (node = *edge) != NULL;
edge = trie_next_edge(node, prefix, ofs)) {
unsigned int eqbits = trie_prefix_equal_bits(node, prefix, ofs, mlen);
ofs += eqbits;
static void
trie_remove(struct cls_trie *trie, const struct cls_rule *rule, int mlen)
{
- const ovs_be32 *prefix = minimatch_get_prefix(&rule->match, trie->field);
+ trie_remove_prefix(&trie->root,
+ minimatch_get_prefix(&rule->match, trie->field), mlen);
+}
+
+/* 'mlen' must be the (non-zero) CIDR prefix length of the prefix being
+ * removed. */
+static void
+trie_remove_prefix(struct trie_node **root, const ovs_be32 *prefix, int mlen)
+{
struct trie_node *node;
struct trie_node **edges[sizeof(union mf_value) * 8];
int depth = 0, ofs = 0;
/* Walk the tree. */
- for (edges[depth] = &trie->root;
+ for (edges[0] = root;
(node = *edges[depth]) != NULL;
edges[++depth] = trie_next_edge(node, prefix, ofs)) {
unsigned int eqbits = trie_prefix_equal_bits(node, prefix, ofs, mlen);
+
if (eqbits < node->nbits) {
/* Mismatch, nothing to be removed. This should never happen, as
* only rules in the classifier are ever removed. */
/* Needed only for the lock annotation in struct classifier. */
extern struct ovs_mutex ofproto_mutex;
-struct trie_node;
-/* Prefix trie for a 'field' */
-struct cls_trie {
- const struct mf_field *field; /* Trie field, or NULL. */
- struct trie_node *root; /* NULL if none. */
-};
+/* Classifier internal data structures. */
+struct cls_classifier;
+struct cls_subtable;
+struct cls_partition;
+struct cls_match;
enum {
- CLS_MAX_INDICES = 3, /* Maximum number of lookup indices per subtable. */
CLS_MAX_TRIES = 3 /* Maximum number of prefix trees per classifier. */
};
/* A flow classifier. */
struct classifier {
- int n_rules; /* Total number of rules. */
- uint8_t n_flow_segments;
- uint8_t flow_segments[CLS_MAX_INDICES]; /* Flow segment boundaries to use
- * for staged lookup. */
- struct hmap subtables; /* Contains "struct cls_subtable"s. */
- struct list subtables_priority; /* Subtables in descending priority order.
- */
- struct hmap partitions; /* Contains "struct cls_partition"s. */
struct fat_rwlock rwlock OVS_ACQ_AFTER(ofproto_mutex);
- struct cls_trie tries[CLS_MAX_TRIES]; /* Prefix tries. */
- unsigned int n_tries;
-};
-
-/* A set of rules that all have the same fields wildcarded. */
-struct cls_subtable {
- struct hmap_node hmap_node; /* Within struct classifier 'subtables' hmap.
- */
- struct list list_node; /* Within classifier 'subtables_priority' list.
- */
- struct hmap rules; /* Contains "struct cls_rule"s. */
- struct minimask mask; /* Wildcards for fields. */
- int n_rules; /* Number of rules, including duplicates. */
- unsigned int max_priority; /* Max priority of any rule in the subtable. */
- unsigned int max_count; /* Count of max_priority rules. */
- tag_type tag; /* Tag generated from mask for partitioning. */
- uint8_t n_indices; /* How many indices to use. */
- uint8_t index_ofs[CLS_MAX_INDICES]; /* u32 flow segment boundaries. */
- struct hindex indices[CLS_MAX_INDICES]; /* Staged lookup indices. */
- unsigned int trie_plen[CLS_MAX_TRIES]; /* Trie prefix length in 'mask'. */
+ struct cls_classifier *cls;
};
-/* Returns true if 'table' is a "catch-all" subtable that will match every
- * packet (if there is no higher-priority match). */
-static inline bool
-cls_subtable_is_catchall(const struct cls_subtable *subtable)
-{
- return minimask_is_catchall(&subtable->mask);
-}
-
-/* A rule in a "struct cls_subtable". */
+/* A rule to be inserted into the classifier. */
struct cls_rule {
- struct hmap_node hmap_node; /* Within struct cls_subtable 'rules'. */
- struct list list; /* List of identical, lower-priority rules. */
- struct minimatch match; /* Matching rule. */
- unsigned int priority; /* Larger numbers are higher priorities. */
- struct cls_partition *partition;
- struct hindex_node index_nodes[CLS_MAX_INDICES]; /* Within subtable's
- * 'indices'. */
-};
-
-/* Associates a metadata value (that is, a value of the OpenFlow 1.1+ metadata
- * field) with tags for the "cls_subtable"s that contain rules that match that
- * metadata value. */
-struct cls_partition {
- struct hmap_node hmap_node; /* In struct classifier's 'partitions' hmap. */
- ovs_be64 metadata; /* metadata value for this partition. */
- tag_type tags; /* OR of each flow's cls_subtable tag. */
- struct tag_tracker tracker; /* Tracks the bits in 'tags'. */
+ struct minimatch match; /* Matching rule. */
+ unsigned int priority; /* Larger numbers are higher priorities. */
+ struct cls_match *cls_match; /* NULL if rule is not in a classifier. */
};
void cls_rule_init(struct cls_rule *, const struct match *,
const struct flow *,
struct flow_wildcards *)
OVS_REQ_RDLOCK(cls->rwlock);
+struct cls_rule *classifier_lookup_miniflow_first(const struct classifier *cls,
+ const struct miniflow *)
+ OVS_REQ_RDLOCK(cls->rwlock);
bool classifier_rule_overlaps(const struct classifier *cls,
const struct cls_rule *)
OVS_REQ_RDLOCK(cls->rwlock);
/* Iteration. */
struct cls_cursor {
- const struct classifier *cls;
+ const struct cls_classifier *cls;
const struct cls_subtable *subtable;
const struct cls_rule *target;
};
static void f(void)
#endif
+/* OVS_PREFETCH() can be used to instruct the CPU to fetch the cache
+ * line containing the given address into a CPU cache.
+ * OVS_PREFETCH_WRITE() should be used when the memory is going to be
+ * written to. Depending on the target CPU, this can generate the same
+ * instruction as OVS_PREFETCH(), or bring the data into the cache in an
+ * exclusive state. */
+#if __GNUC__
+#define OVS_PREFETCH(addr) __builtin_prefetch((addr))
+#define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
+#else
+#define OVS_PREFETCH(addr)
+#define OVS_PREFETCH_WRITE(addr)
+#endif
+
#endif /* compiler.h */
--- /dev/null
+/*
+ * Copyright (c) 2014 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef DAEMON_PRIVATE_H
+#define DAEMON_PRIVATE_H 1
+
+extern bool detach;
+extern char *pidfile;
+
+char *make_pidfile_name(const char *name);
+
+#endif /* daemon-private.h */
--- /dev/null
+/*
+ * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include "daemon.h"
+#include "daemon-private.h"
+#include <errno.h>
+#include <fcntl.h>
+#include <signal.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/resource.h>
+#include <sys/wait.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include "command-line.h"
+#include "fatal-signal.h"
+#include "dirs.h"
+#include "lockfile.h"
+#include "ovs-thread.h"
+#include "process.h"
+#include "socket-util.h"
+#include "timeval.h"
+#include "util.h"
+#include "vlog.h"
+
+VLOG_DEFINE_THIS_MODULE(daemon_unix);
+
+/* --detach: Should we run in the background? */
+bool detach; /* Was --detach specified? */
+static bool detached; /* Have we already detached? */
+
+/* --pidfile: Name of pidfile (null if none). */
+char *pidfile;
+
+/* Device and inode of pidfile, so we can avoid reopening it. */
+static dev_t pidfile_dev;
+static ino_t pidfile_ino;
+
+/* --overwrite-pidfile: Create pidfile even if one already exists and is
+ locked? */
+static bool overwrite_pidfile;
+
+/* --no-chdir: Should we chdir to "/"? */
+static bool chdir_ = true;
+
+/* File descriptor used by daemonize_start() and daemonize_complete(). */
+static int daemonize_fd = -1;
+
+/* --monitor: Should a supervisory process monitor the daemon and restart it if
+ * it dies due to an error signal? */
+static bool monitor;
+
+static void check_already_running(void);
+static int lock_pidfile(FILE *, int command);
+static pid_t fork_and_clean_up(void);
+static void daemonize_post_detach(void);
+
+/* Returns the file name that would be used for a pidfile if 'name' were
+ * provided to set_pidfile(). The caller must free the returned string. */
+char *
+make_pidfile_name(const char *name)
+{
+ return (!name
+ ? xasprintf("%s/%s.pid", ovs_rundir(), program_name)
+ : abs_file_name(ovs_rundir(), name));
+}
+
+/* Sets up a following call to daemonize() not to chdir to "/". */
+void
+set_no_chdir(void)
+{
+ chdir_ = false;
+}
+
+/* Normally, daemonize() or daemonize_start() will terminate the program with a
+ * message if a locked pidfile already exists. If this function is called, an
+ * existing pidfile will be replaced, with a warning. */
+void
+ignore_existing_pidfile(void)
+{
+ overwrite_pidfile = true;
+}
+
+/* Sets up a following call to daemonize() to detach from the foreground
+ * session, running this process in the background. */
+void
+set_detach(void)
+{
+ detach = true;
+}
+
+/* Sets up a following call to daemonize() to fork a supervisory process to
+ * monitor the daemon and restart it if it dies due to an error signal. */
+void
+daemon_set_monitor(void)
+{
+ monitor = true;
+}
+
+/* If a pidfile has been configured, creates it and stores the running
+ * process's pid in it. Ensures that the pidfile will be deleted when the
+ * process exits. */
+static void
+make_pidfile(void)
+{
+ long int pid = getpid();
+ struct stat s;
+ char *tmpfile;
+ FILE *file;
+ int error;
+
+ /* Create a temporary pidfile. */
+ if (overwrite_pidfile) {
+ tmpfile = xasprintf("%s.tmp%ld", pidfile, pid);
+ fatal_signal_add_file_to_unlink(tmpfile);
+ } else {
+ /* Everyone shares the same file which will be treated as a lock. To
+ * avoid some uncomfortable race conditions, we can't set up the fatal
+ * signal unlink until we've acquired it. */
+ tmpfile = xasprintf("%s.tmp", pidfile);
+ }
+
+ file = fopen(tmpfile, "a+");
+ if (!file) {
+ VLOG_FATAL("%s: create failed (%s)", tmpfile, ovs_strerror(errno));
+ }
+
+ error = lock_pidfile(file, F_SETLK);
+ if (error) {
+ /* Looks like we failed to acquire the lock. Note that, if we failed
+ * for some other reason (and '!overwrite_pidfile'), we will have
+ * left 'tmpfile' as garbage in the file system. */
+ VLOG_FATAL("%s: fcntl(F_SETLK) failed (%s)", tmpfile,
+ ovs_strerror(error));
+ }
+
+ if (!overwrite_pidfile) {
+ /* We acquired the lock. Make sure to clean up on exit, and verify
+ * that we're allowed to create the actual pidfile. */
+ fatal_signal_add_file_to_unlink(tmpfile);
+ check_already_running();
+ }
+
+ if (fstat(fileno(file), &s) == -1) {
+ VLOG_FATAL("%s: fstat failed (%s)", tmpfile, ovs_strerror(errno));
+ }
+
+ if (ftruncate(fileno(file), 0) == -1) {
+ VLOG_FATAL("%s: truncate failed (%s)", tmpfile, ovs_strerror(errno));
+ }
+
+ fprintf(file, "%ld\n", pid);
+ if (fflush(file) == EOF) {
+ VLOG_FATAL("%s: write failed (%s)", tmpfile, ovs_strerror(errno));
+ }
+
+ error = rename(tmpfile, pidfile);
+
+ /* Due to a race, 'tmpfile' may be owned by a different process, so we
+ * shouldn't delete it on exit. */
+ fatal_signal_remove_file_to_unlink(tmpfile);
+
+ if (error < 0) {
+ VLOG_FATAL("failed to rename \"%s\" to \"%s\" (%s)",
+ tmpfile, pidfile, ovs_strerror(errno));
+ }
+
+ /* Ensure that the pidfile will get deleted on exit. */
+ fatal_signal_add_file_to_unlink(pidfile);
+
+ /* Clean up.
+ *
+ * We don't close 'file' because its file descriptor must remain open to
+ * hold the lock. */
+ pidfile_dev = s.st_dev;
+ pidfile_ino = s.st_ino;
+ free(tmpfile);
+}
+
+/* Calls fork() and on success returns its return value. On failure, logs an
+ * error and exits unsuccessfully.
+ *
+ * Post-fork, but before returning, this function calls a few other functions
+ * that are generally useful if the child isn't planning to exec a new
+ * process. */
+static pid_t
+fork_and_clean_up(void)
+{
+ pid_t pid = xfork();
+ if (pid > 0) {
+ /* Running in parent process. */
+ fatal_signal_fork();
+ } else if (!pid) {
+ /* Running in child process. */
+ lockfile_postfork();
+ }
+ return pid;
+}
+
+/* Forks, then:
+ *
+ * - In the parent, waits for the child to signal that it has completed its
+ * startup sequence. Then stores -1 in '*fdp' and returns the child's pid.
+ *
+ * - In the child, stores a fd in '*fdp' and returns 0. The caller should
+ * pass the fd to fork_notify_startup() after it finishes its startup
+ * sequence.
+ *
+ * If something goes wrong with the fork, logs a critical error and aborts the
+ * process. */
+static pid_t
+fork_and_wait_for_startup(int *fdp)
+{
+ int fds[2];
+ pid_t pid;
+
+ xpipe(fds);
+
+ pid = fork_and_clean_up();
+ if (pid > 0) {
+ /* Running in parent process. */
+ size_t bytes_read;
+ char c;
+
+ close(fds[1]);
+ if (read_fully(fds[0], &c, 1, &bytes_read) != 0) {
+ int retval;
+ int status;
+
+ do {
+ retval = waitpid(pid, &status, 0);
+ } while (retval == -1 && errno == EINTR);
+
+ if (retval == pid) {
+ if (WIFEXITED(status) && WEXITSTATUS(status)) {
+ /* Child exited with an error. Convey the same error
+ * to our parent process as a courtesy. */
+ exit(WEXITSTATUS(status));
+ } else {
+ char *status_msg = process_status_msg(status);
+ VLOG_FATAL("fork child died before signaling startup (%s)",
+ status_msg);
+ }
+ } else if (retval < 0) {
+ VLOG_FATAL("waitpid failed (%s)", ovs_strerror(errno));
+ } else {
+ OVS_NOT_REACHED();
+ }
+ }
+ close(fds[0]);
+ *fdp = -1;
+ } else if (!pid) {
+ /* Running in child process. */
+ close(fds[0]);
+ *fdp = fds[1];
+ }
+
+ return pid;
+}
+
+static void
+fork_notify_startup(int fd)
+{
+ if (fd != -1) {
+ size_t bytes_written;
+ int error;
+
+ error = write_fully(fd, "", 1, &bytes_written);
+ if (error) {
+ VLOG_FATAL("pipe write failed (%s)", ovs_strerror(error));
+ }
+
+ close(fd);
+ }
+}
+
+static bool
+should_restart(int status)
+{
+ if (WIFSIGNALED(status)) {
+ static const int error_signals[] = {
+ /* This list of signals is documented in daemon.man. If you
+ * change the list, update the documentation too. */
+ SIGABRT, SIGALRM, SIGBUS, SIGFPE, SIGILL, SIGPIPE, SIGSEGV,
+ SIGXCPU, SIGXFSZ
+ };
+
+ size_t i;
+
+ for (i = 0; i < ARRAY_SIZE(error_signals); i++) {
+ if (error_signals[i] == WTERMSIG(status)) {
+ return true;
+ }
+ }
+ }
+ return false;
+}
+
+static void
+monitor_daemon(pid_t daemon_pid)
+{
+ /* XXX Should log daemon's stderr output at startup time. */
+ time_t last_restart;
+ char *status_msg;
+ int crashes;
+
+ set_subprogram_name("monitor");
+ status_msg = xstrdup("healthy");
+ last_restart = TIME_MIN;
+ crashes = 0;
+ for (;;) {
+ int retval;
+ int status;
+
+ proctitle_set("monitoring pid %lu (%s)",
+ (unsigned long int) daemon_pid, status_msg);
+
+ do {
+ retval = waitpid(daemon_pid, &status, 0);
+ } while (retval == -1 && errno == EINTR);
+
+ if (retval == -1) {
+ VLOG_FATAL("waitpid failed (%s)", ovs_strerror(errno));
+ } else if (retval == daemon_pid) {
+ char *s = process_status_msg(status);
+ if (should_restart(status)) {
+ free(status_msg);
+ status_msg = xasprintf("%d crashes: pid %lu died, %s",
+ ++crashes,
+ (unsigned long int) daemon_pid, s);
+ free(s);
+
+ if (WCOREDUMP(status)) {
+ /* Disable further core dumps to save disk space. */
+ struct rlimit r;
+
+ r.rlim_cur = 0;
+ r.rlim_max = 0;
+ if (setrlimit(RLIMIT_CORE, &r) == -1) {
+ VLOG_WARN("failed to disable core dumps: %s",
+ ovs_strerror(errno));
+ }
+ }
+
+ /* Throttle restarts to no more than once every 10 seconds. */
+ if (time(NULL) < last_restart + 10) {
+ VLOG_WARN("%s, waiting until 10 seconds since last "
+ "restart", status_msg);
+ for (;;) {
+ time_t now = time(NULL);
+ time_t wakeup = last_restart + 10;
+ if (now >= wakeup) {
+ break;
+ }
+ xsleep(wakeup - now);
+ }
+ }
+ last_restart = time(NULL);
+
+ VLOG_ERR("%s, restarting", status_msg);
+ daemon_pid = fork_and_wait_for_startup(&daemonize_fd);
+ if (!daemon_pid) {
+ break;
+ }
+ } else {
+ VLOG_INFO("pid %lu died, %s, exiting",
+ (unsigned long int) daemon_pid, s);
+ free(s);
+ exit(0);
+ }
+ }
+ }
+ free(status_msg);
+
+ /* Running in new daemon process. */
+ proctitle_restore();
+ set_subprogram_name("");
+}
+
+/* If daemonization is configured, then starts daemonization by forking and
+ * returning in the child process.  The parent process hangs around until the
+ * child lets it know either that it completed startup successfully (by
+ * calling daemonize_complete()) or that it failed to start up (by exiting
+ * with a nonzero exit code). */
+void
+daemonize_start(void)
+{
+ assert_single_threaded();
+ daemonize_fd = -1;
+
+ if (detach) {
+ if (fork_and_wait_for_startup(&daemonize_fd) > 0) {
+ /* Running in parent process. */
+ exit(0);
+ }
+
+ /* Running in daemon or monitor process. */
+ setsid();
+ }
+
+ if (monitor) {
+ int saved_daemonize_fd = daemonize_fd;
+ pid_t daemon_pid;
+
+ daemon_pid = fork_and_wait_for_startup(&daemonize_fd);
+ if (daemon_pid > 0) {
+ /* Running in monitor process. */
+ fork_notify_startup(saved_daemonize_fd);
+ close_standard_fds();
+ monitor_daemon(daemon_pid);
+ }
+ /* Running in daemon process. */
+ }
+
+ forbid_forking("running in daemon process");
+
+ if (pidfile) {
+ make_pidfile();
+ }
+
+ /* Make sure that the unixctl commands for vlog get registered in a
+ * daemon, even before the first log message. */
+ vlog_init();
+}
+
+/* If daemonization is configured, then this function notifies the parent
+ * process that the child process has completed startup successfully. It also
+ * calls daemonize_post_detach().
+ *
+ * Calling this function more than once has no additional effect. */
+void
+daemonize_complete(void)
+{
+ if (pidfile) {
+ free(pidfile);
+ pidfile = NULL;
+ }
+
+ if (!detached) {
+ detached = true;
+
+ fork_notify_startup(daemonize_fd);
+ daemonize_fd = -1;
+ daemonize_post_detach();
+ }
+}
+
+/* If daemonization is configured, then this function does traditional Unix
+ * daemonization behavior: join a new session, chdir to the root (if not
+ * disabled), and close the standard file descriptors.
+ *
+ * It only makes sense to call this function as part of an implementation of a
+ * special daemon subprocess. A normal daemon should just call
+ * daemonize_complete(). */
+static void
+daemonize_post_detach(void)
+{
+ if (detach) {
+ if (chdir_) {
+ ignore(chdir("/"));
+ }
+ close_standard_fds();
+ }
+}
+
+void
+daemon_usage(void)
+{
+ printf(
+ "\nDaemon options:\n"
+ " --detach run in background as daemon\n"
+ " --no-chdir do not chdir to '/'\n"
+ " --pidfile[=FILE] create pidfile (default: %s/%s.pid)\n"
+ " --overwrite-pidfile with --pidfile, start even if already "
+ "running\n",
+ ovs_rundir(), program_name);
+}
+
+static int
+lock_pidfile__(FILE *file, int command, struct flock *lck)
+{
+ int error;
+
+ lck->l_type = F_WRLCK;
+ lck->l_whence = SEEK_SET;
+ lck->l_start = 0;
+ lck->l_len = 0;
+ lck->l_pid = 0;
+
+ do {
+ error = fcntl(fileno(file), command, lck) == -1 ? errno : 0;
+ } while (error == EINTR);
+ return error;
+}
+
+static int
+lock_pidfile(FILE *file, int command)
+{
+ struct flock lck;
+
+ return lock_pidfile__(file, command, &lck);
+}
+
+static pid_t
+read_pidfile__(const char *pidfile, bool delete_if_stale)
+{
+ struct stat s, s2;
+ struct flock lck;
+ char line[128];
+ FILE *file;
+ int error;
+
+ if ((pidfile_ino || pidfile_dev)
+ && !stat(pidfile, &s)
+ && s.st_ino == pidfile_ino && s.st_dev == pidfile_dev) {
+ /* It's our own pidfile. We can't afford to open it, because closing
+ * *any* fd for a file that a process has locked also releases all the
+ * locks on that file.
+ *
+ * Fortunately, we know the associated pid anyhow: */
+ return getpid();
+ }
+
+ file = fopen(pidfile, "r+");
+ if (!file) {
+ if (errno == ENOENT && delete_if_stale) {
+ return 0;
+ }
+ error = errno;
+ VLOG_WARN("%s: open: %s", pidfile, ovs_strerror(error));
+ goto error;
+ }
+
+ error = lock_pidfile__(file, F_GETLK, &lck);
+ if (error) {
+ VLOG_WARN("%s: fcntl: %s", pidfile, ovs_strerror(error));
+ goto error;
+ }
+ if (lck.l_type == F_UNLCK) {
+ /* pidfile exists but it isn't locked by anyone. We need to delete it
+ * so that a new pidfile can go in its place. But just calling
+ * unlink(pidfile) makes a nasty race: what if someone else unlinks it
+ * before we do and then replaces it by a valid pidfile? We'd unlink
+ * their valid pidfile. We do a little dance to avoid the race, by
+ * locking the invalid pidfile. Only one process can have the invalid
+ * pidfile locked, and only that process has the right to unlink it. */
+ if (!delete_if_stale) {
+ error = ESRCH;
+ VLOG_DBG("%s: pid file is stale", pidfile);
+ goto error;
+ }
+
+ /* Get the lock. */
+ error = lock_pidfile(file, F_SETLK);
+ if (error) {
+ /* We lost a race with someone else doing the same thing. */
+ VLOG_WARN("%s: lost race to lock pidfile", pidfile);
+ goto error;
+ }
+
+ /* Is the file we have locked still named 'pidfile'? */
+ if (stat(pidfile, &s) || fstat(fileno(file), &s2)
+ || s.st_ino != s2.st_ino || s.st_dev != s2.st_dev) {
+ /* No. We lost a race with someone else who got the lock before
+ * us, deleted the pidfile, and closed it (releasing the lock). */
+ error = EALREADY;
+ VLOG_WARN("%s: lost race to delete pidfile", pidfile);
+ goto error;
+ }
+
+ /* We won the right to delete the stale pidfile. */
+ if (unlink(pidfile)) {
+ error = errno;
+ VLOG_WARN("%s: failed to delete stale pidfile (%s)",
+ pidfile, ovs_strerror(error));
+ goto error;
+ }
+ VLOG_DBG("%s: deleted stale pidfile", pidfile);
+ fclose(file);
+ return 0;
+ }
+
+ if (!fgets(line, sizeof line, file)) {
+ if (ferror(file)) {
+ error = errno;
+ VLOG_WARN("%s: read: %s", pidfile, ovs_strerror(error));
+ } else {
+ error = ESRCH;
+ VLOG_WARN("%s: read: unexpected end of file", pidfile);
+ }
+ goto error;
+ }
+
+ if (lck.l_pid != strtoul(line, NULL, 10)) {
+ /* The process that has the pidfile locked is not the process that
+ * created it. It must be stale, with the process that has it locked
+ * preparing to delete it. */
+ error = ESRCH;
+ VLOG_WARN("%s: stale pidfile for pid %s being deleted by pid %ld",
+ pidfile, line, (long int) lck.l_pid);
+ goto error;
+ }
+
+ fclose(file);
+ return lck.l_pid;
+
+error:
+ if (file) {
+ fclose(file);
+ }
+ return -error;
+}
+
+/* Opens and reads a PID from 'pidfile'. Returns the positive PID if
+ * successful, otherwise a negative errno value. */
+pid_t
+read_pidfile(const char *pidfile)
+{
+ return read_pidfile__(pidfile, false);
+}
+
+/* Checks whether a process with the given 'pidfile' is already running and,
+ * if so, aborts. If 'pidfile' is stale, deletes it. */
+static void
+check_already_running(void)
+{
+ long int pid = read_pidfile__(pidfile, true);
+ if (pid > 0) {
+ VLOG_FATAL("%s: already running as pid %ld, aborting", pidfile, pid);
+ } else if (pid < 0) {
+ VLOG_FATAL("%s: pidfile check failed (%s), aborting",
+ pidfile, ovs_strerror(-pid));
+ }
+}
+
+\f
+/* Stub functions for non-Windows platforms. */
+
+void
+service_start(int *argc OVS_UNUSED, char **argv[] OVS_UNUSED)
+{
+}
+
+void
+service_stop(void)
+{
+}
+
+bool
+should_service_stop(void)
+{
+ return false;
+}
#include <config.h>
#include "daemon.h"
+#include "daemon-private.h"
#include <stdio.h>
#include <stdlib.h>
#include "poll-loop.h"
#include "vlog.h"
-VLOG_DEFINE_THIS_MODULE(daemon);
+VLOG_DEFINE_THIS_MODULE(daemon_windows);
-static bool detach; /* Was --service specified? */
-static bool detached; /* Have we already detached? */
+static bool service_create; /* Was --service specified? */
+static bool service_started; /* Have we dispatched service to start? */
/* --service-monitor: Should the service be restarted if it dies
* unexpectedly? */
static bool monitor;
+bool detach; /* Was --detach specified? */
+static bool detached; /* Running as the child process. */
+static HANDLE write_handle; /* End of pipe to write to parent. */
+
+char *pidfile; /* --pidfile: Name of pidfile (null if none). */
+static FILE *filep_pidfile; /* File pointer to access the pidfile. */
+
/* Handle to the Services Manager and the created service. */
static SC_HANDLE manager, service;
static void init_service_status(void);
static void set_config_failure_actions(void);
+static bool detach_process(int argc, char *argv[]);
+
extern int main(int argc, char *argv[]);
void
{NULL, NULL}
};
- /* 'detached' is 'false' when service_start() is called the first time.
- * It is 'true', when it is called the second time by the Windows services
- * manager. */
- if (detached) {
+    /* If one of the command line options is "--detach", the parent creates
+     * a new process, waits for the child to start, and exits.  The child
+     * just returns.  We should not be creating a service in either
+     * case. */
+ if (detach_process(argc, argv)) {
+ return;
+ }
+
+ /* 'service_started' is 'false' when service_start() is called the first
+ * time. It is 'true', when it is called the second time by the Windows
+ * services manager. */
+ if (service_started) {
init_service_status();
wevent = CreateEvent(NULL, TRUE, FALSE, NULL);
* options before the call-back from the service control manager. */
for (i = 0; i < argc; i ++) {
if (!strcmp(argv[i], "--service")) {
- detach = true;
+ service_create = true;
} else if (!strcmp(argv[i], "--service-monitor")) {
monitor = true;
}
}
/* If '--service' is not a command line option, run in foreground. */
- if (!detach) {
+ if (!service_create) {
return;
}
* script. */
check_service();
- detached = true;
+ service_started = true;
/* StartServiceCtrlDispatcher blocks and returns after the service is
* stopped. */
bool
should_service_stop(void)
{
- if (detached) {
+ if (service_started) {
if (service_status.dwCurrentState != SERVICE_RUNNING) {
return true;
} else {
}
}
-\f
-/* Stub functions to handle daemonize related calls in non-windows platform. */
-bool
-get_detach()
+/* When a daemon is passed the --detach option, we create a new
+ * process and pass an additional undocumented option called --pipe-handle.
+ * Through this option, the parent passes one end of a pipe handle. */
+void
+set_pipe_handle(const char *pipe_handle)
{
- return false;
+ write_handle = (HANDLE) atoi(pipe_handle);
}
-void
-daemon_save_fd(int fd OVS_UNUSED)
+/* If "--detach" is among the command line options, the parent creates a
+ * new process, waits for the child to start, and exits.  The child just
+ * returns. */
+static bool
+detach_process(int argc, char *argv[])
+{
+ SECURITY_ATTRIBUTES sa;
+ STARTUPINFO si;
+ PROCESS_INFORMATION pi;
+ HANDLE read_pipe, write_pipe;
+ char *buffer;
+ int error, i;
+ char ch;
+
+ /* We are only interested in the '--detach' and '--pipe-handle'. */
+ for (i = 0; i < argc; i ++) {
+ if (!strcmp(argv[i], "--detach")) {
+ detach = true;
+ } else if (!strncmp(argv[i], "--pipe-handle", 13)) {
+ /* If running as a child, return. */
+ detached = true;
+ return true;
+ }
+ }
+
+ /* Nothing to do if the option --detach is not set. */
+ if (!detach) {
+ return false;
+ }
+
+ /* Set the security attribute such that a process created will
+ * inherit the pipe handles. */
+ sa.nLength = sizeof(sa);
+ sa.lpSecurityDescriptor = NULL;
+ sa.bInheritHandle = TRUE;
+
+ /* Create an anonymous pipe to communicate with the child. */
+ error = CreatePipe(&read_pipe, &write_pipe, &sa, 0);
+ if (!error) {
+ VLOG_FATAL("CreatePipe failed (%s)", ovs_lasterror_to_string());
+ }
+
+ GetStartupInfo(&si);
+
+ /* To the child, we pass an extra argument '--pipe-handle=write_pipe' */
+    buffer = xasprintf("%s %s=%ld", GetCommandLine(), "--pipe-handle",
+                       (long int) write_pipe);
+
+ /* Create a detached child */
+ error = CreateProcess(NULL, buffer, NULL, NULL, TRUE, DETACHED_PROCESS,
+ NULL, NULL, &si, &pi);
+ if (!error) {
+ VLOG_FATAL("CreateProcess failed (%s)", ovs_lasterror_to_string());
+ }
+
+ /* Close one end of the pipe in the parent. */
+ CloseHandle(write_pipe);
+
+ /* Block and wait for child to say it is ready. */
+ error = ReadFile(read_pipe, &ch, 1, NULL, NULL);
+ if (!error) {
+ VLOG_FATAL("Failed to read from child (%s)",
+ ovs_lasterror_to_string());
+ }
+ /* The child has successfully started and is ready. */
+ exit(0);
+}
+
+static void
+unlink_pidfile(void)
{
+ if (filep_pidfile) {
+ fclose(filep_pidfile);
+ }
+ if (pidfile) {
+ unlink(pidfile);
+ }
}
-void
-daemonize(void)
+/* If a pidfile has been configured, creates it and stores the running
+ * process's pid in it. Ensures that the pidfile will be deleted when the
+ * process exits. */
+static void
+make_pidfile(void)
{
+ int error;
+
+ error = GetFileAttributes(pidfile);
+ if (error != INVALID_FILE_ATTRIBUTES) {
+ /* pidfile exists. Try to unlink() it. */
+ error = unlink(pidfile);
+ if (error) {
+ VLOG_FATAL("Failed to delete existing pidfile %s (%s)", pidfile,
+ ovs_strerror(errno));
+ }
+ }
+
+ filep_pidfile = fopen(pidfile, "w");
+ if (filep_pidfile == NULL) {
+ VLOG_FATAL("failed to open %s (%s)", pidfile, ovs_strerror(errno));
+ }
+
+ fatal_signal_add_hook(unlink_pidfile, NULL, NULL, true);
+
+ fprintf(filep_pidfile, "%d\n", _getpid());
+ if (fflush(filep_pidfile) == EOF) {
+ VLOG_FATAL("Failed to write into the pidfile %s", pidfile);
+ }
+
+ /* Don't close the pidfile till the process exits. */
}
void daemonize_start(void)
{
+ if (pidfile) {
+ make_pidfile();
+ }
}
void
daemonize_complete(void)
{
+ /* If running as a child because '--detach' option was specified,
+ * communicate with the parent to inform that the child is ready. */
+ if (detached) {
+ int error;
+
+ close_standard_fds();
+
+ error = WriteFile(write_handle, "a", 1, NULL, NULL);
+ if (!error) {
+ VLOG_FATAL("Failed to communicate with the parent (%s)",
+ ovs_lasterror_to_string());
+ }
+ }
+
service_complete();
}
+
+/* Returns the file name that would be used for a pidfile if 'name' were
+ * provided to set_pidfile(). The caller must free the returned string. */
+char *
+make_pidfile_name(const char *name)
+{
+ if (name && strchr(name, ':')) {
+        return xstrdup(name);
+ } else {
+ return xasprintf("%s/%s.pid", ovs_rundir(), program_name);
+ }
+}
/*
- * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013 Nicira, Inc.
+ * Copyright (c) 2014 Nicira, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-
#include <config.h>
#include "daemon.h"
+#include "daemon-private.h"
#include <errno.h>
#include <fcntl.h>
-#include <signal.h>
-#include <stdlib.h>
-#include <string.h>
-#include <sys/resource.h>
-#include <sys/wait.h>
-#include <sys/stat.h>
#include <unistd.h>
-#include "command-line.h"
-#include "fatal-signal.h"
-#include "dirs.h"
-#include "lockfile.h"
-#include "ovs-thread.h"
-#include "process.h"
-#include "socket-util.h"
-#include "timeval.h"
-#include "util.h"
#include "vlog.h"
VLOG_DEFINE_THIS_MODULE(daemon);
-/* --detach: Should we run in the background? */
-static bool detach; /* Was --detach specified? */
-static bool detached; /* Have we already detached? */
-
-/* --pidfile: Name of pidfile (null if none). */
-static char *pidfile;
-
-/* Device and inode of pidfile, so we can avoid reopening it. */
-static dev_t pidfile_dev;
-static ino_t pidfile_ino;
-
-/* --overwrite-pidfile: Create pidfile even if one already exists and is
- locked? */
-static bool overwrite_pidfile;
-
-/* --no-chdir: Should we chdir to "/"? */
-static bool chdir_ = true;
-
-/* File descriptor used by daemonize_start() and daemonize_complete(). */
-static int daemonize_fd = -1;
-
-/* --monitor: Should a supervisory process monitor the daemon and restart it if
- * it dies due to an error signal? */
-static bool monitor;
-
/* For each of the standard file descriptors, whether to replace it by
* /dev/null (if false) or keep it for the daemon to use (if true). */
static bool save_fds[3];
-static void check_already_running(void);
-static int lock_pidfile(FILE *, int command);
-static char *make_pidfile_name(const char *name);
-static pid_t fork_and_clean_up(void);
-static void daemonize_post_detach(void);
+/* Will daemonize() really detach? */
+bool
+get_detach(void)
+{
+ return detach;
+}
-/* Returns the file name that would be used for a pidfile if 'name' were
- * provided to set_pidfile(). The caller must free the returned string. */
-static char *
-make_pidfile_name(const char *name)
+/* If configured with set_pidfile() or set_detach(), creates the pid file and
+ * detaches from the foreground session. */
+void
+daemonize(void)
{
- return (!name
- ? xasprintf("%s/%s.pid", ovs_rundir(), program_name)
- : abs_file_name(ovs_rundir(), name));
+ daemonize_start();
+ daemonize_complete();
}
/* Sets up a following call to daemonize() to create a pidfile named 'name'.
- * If 'name' begins with '/', then it is treated as an absolute path.
- * Otherwise, it is taken relative to RUNDIR, which is $(prefix)/var/run by
- * default.
+ * If 'name' begins with '/' (or contains ':' in windows), then it is treated
+ * as an absolute path. Otherwise, it is taken relative to RUNDIR,
+ * which is $(prefix)/var/run by default.
*
* If 'name' is null, then program_name followed by ".pid" is used. */
void
pidfile = make_pidfile_name(name);
}
-/* Sets that we do not chdir to "/". */
-void
-set_no_chdir(void)
-{
- chdir_ = false;
-}
-
-/* Normally, daemonize() or damonize_start() will terminate the program with a
- * message if a locked pidfile already exists. If this function is called, an
- * existing pidfile will be replaced, with a warning. */
-void
-ignore_existing_pidfile(void)
-{
- overwrite_pidfile = true;
-}
-
-/* Sets up a following call to daemonize() to detach from the foreground
- * session, running this process in the background. */
-void
-set_detach(void)
-{
- detach = true;
-}
-
-/* Will daemonize() really detach? */
-bool
-get_detach(void)
-{
- return detach;
-}
-
-/* Sets up a following call to daemonize() to fork a supervisory process to
- * monitor the daemon and restart it if it dies due to an error signal. */
-void
-daemon_set_monitor(void)
-{
- monitor = true;
-}
-
/* A daemon doesn't normally have any use for the file descriptors for stdin,
* stdout, and stderr after it detaches. To keep these file descriptors from
* e.g. holding an SSH session open, by default detaching replaces each of
save_fds[fd] = true;
}
-/* If a pidfile has been configured, creates it and stores the running
- * process's pid in it. Ensures that the pidfile will be deleted when the
- * process exits. */
-static void
-make_pidfile(void)
-{
- long int pid = getpid();
- struct stat s;
- char *tmpfile;
- FILE *file;
- int error;
-
- /* Create a temporary pidfile. */
- if (overwrite_pidfile) {
- tmpfile = xasprintf("%s.tmp%ld", pidfile, pid);
- fatal_signal_add_file_to_unlink(tmpfile);
- } else {
- /* Everyone shares the same file which will be treated as a lock. To
- * avoid some uncomfortable race conditions, we can't set up the fatal
- * signal unlink until we've acquired it. */
- tmpfile = xasprintf("%s.tmp", pidfile);
- }
-
- file = fopen(tmpfile, "a+");
- if (!file) {
- VLOG_FATAL("%s: create failed (%s)", tmpfile, ovs_strerror(errno));
- }
-
- error = lock_pidfile(file, F_SETLK);
- if (error) {
- /* Looks like we failed to acquire the lock. Note that, if we failed
- * for some other reason (and '!overwrite_pidfile'), we will have
- * left 'tmpfile' as garbage in the file system. */
- VLOG_FATAL("%s: fcntl(F_SETLK) failed (%s)", tmpfile,
- ovs_strerror(error));
- }
-
- if (!overwrite_pidfile) {
- /* We acquired the lock. Make sure to clean up on exit, and verify
- * that we're allowed to create the actual pidfile. */
- fatal_signal_add_file_to_unlink(tmpfile);
- check_already_running();
- }
-
- if (fstat(fileno(file), &s) == -1) {
- VLOG_FATAL("%s: fstat failed (%s)", tmpfile, ovs_strerror(errno));
- }
-
- if (ftruncate(fileno(file), 0) == -1) {
- VLOG_FATAL("%s: truncate failed (%s)", tmpfile, ovs_strerror(errno));
- }
-
- fprintf(file, "%ld\n", pid);
- if (fflush(file) == EOF) {
- VLOG_FATAL("%s: write failed (%s)", tmpfile, ovs_strerror(errno));
- }
-
- error = rename(tmpfile, pidfile);
-
- /* Due to a race, 'tmpfile' may be owned by a different process, so we
- * shouldn't delete it on exit. */
- fatal_signal_remove_file_to_unlink(tmpfile);
-
- if (error < 0) {
- VLOG_FATAL("failed to rename \"%s\" to \"%s\" (%s)",
- tmpfile, pidfile, ovs_strerror(errno));
- }
-
- /* Ensure that the pidfile will get deleted on exit. */
- fatal_signal_add_file_to_unlink(pidfile);
-
- /* Clean up.
- *
- * We don't close 'file' because its file descriptor must remain open to
- * hold the lock. */
- pidfile_dev = s.st_dev;
- pidfile_ino = s.st_ino;
- free(tmpfile);
-}
-
-/* If configured with set_pidfile() or set_detach(), creates the pid file and
- * detaches from the foreground session. */
-void
-daemonize(void)
-{
- daemonize_start();
- daemonize_complete();
-}
-
-/* Calls fork() and on success returns its return value. On failure, logs an
- * error and exits unsuccessfully.
- *
- * Post-fork, but before returning, this function calls a few other functions
- * that are generally useful if the child isn't planning to exec a new
- * process. */
-static pid_t
-fork_and_clean_up(void)
-{
- pid_t pid = xfork();
- if (pid > 0) {
- /* Running in parent process. */
- fatal_signal_fork();
- } else if (!pid) {
- /* Running in child process. */
- lockfile_postfork();
- }
- return pid;
-}
-
-/* Forks, then:
- *
- * - In the parent, waits for the child to signal that it has completed its
- * startup sequence. Then stores -1 in '*fdp' and returns the child's pid.
- *
- * - In the child, stores a fd in '*fdp' and returns 0. The caller should
- * pass the fd to fork_notify_startup() after it finishes its startup
- * sequence.
- *
- * If something goes wrong with the fork, logs a critical error and aborts the
- * process. */
-static pid_t
-fork_and_wait_for_startup(int *fdp)
-{
- int fds[2];
- pid_t pid;
-
- xpipe(fds);
-
- pid = fork_and_clean_up();
- if (pid > 0) {
- /* Running in parent process. */
- size_t bytes_read;
- char c;
-
- close(fds[1]);
- if (read_fully(fds[0], &c, 1, &bytes_read) != 0) {
- int retval;
- int status;
-
- do {
- retval = waitpid(pid, &status, 0);
- } while (retval == -1 && errno == EINTR);
-
- if (retval == pid) {
- if (WIFEXITED(status) && WEXITSTATUS(status)) {
- /* Child exited with an error. Convey the same error
- * to our parent process as a courtesy. */
- exit(WEXITSTATUS(status));
- } else {
- char *status_msg = process_status_msg(status);
- VLOG_FATAL("fork child died before signaling startup (%s)",
- status_msg);
- }
- } else if (retval < 0) {
- VLOG_FATAL("waitpid failed (%s)", ovs_strerror(errno));
- } else {
- OVS_NOT_REACHED();
- }
- }
- close(fds[0]);
- *fdp = -1;
- } else if (!pid) {
- /* Running in child process. */
- close(fds[0]);
- *fdp = fds[1];
- }
-
- return pid;
-}
-
-static void
-fork_notify_startup(int fd)
-{
- if (fd != -1) {
- size_t bytes_written;
- int error;
-
- error = write_fully(fd, "", 1, &bytes_written);
- if (error) {
- VLOG_FATAL("pipe write failed (%s)", ovs_strerror(error));
- }
-
- close(fd);
- }
-}
-
-static bool
-should_restart(int status)
-{
- if (WIFSIGNALED(status)) {
- static const int error_signals[] = {
- /* This list of signals is documented in daemon.man. If you
- * change the list, update the documentation too. */
- SIGABRT, SIGALRM, SIGBUS, SIGFPE, SIGILL, SIGPIPE, SIGSEGV,
- SIGXCPU, SIGXFSZ
- };
-
- size_t i;
-
- for (i = 0; i < ARRAY_SIZE(error_signals); i++) {
- if (error_signals[i] == WTERMSIG(status)) {
- return true;
- }
- }
- }
- return false;
-}
-
-static void
-monitor_daemon(pid_t daemon_pid)
-{
- /* XXX Should log daemon's stderr output at startup time. */
- time_t last_restart;
- char *status_msg;
- int crashes;
-
- set_subprogram_name("monitor");
- status_msg = xstrdup("healthy");
- last_restart = TIME_MIN;
- crashes = 0;
- for (;;) {
- int retval;
- int status;
-
- proctitle_set("monitoring pid %lu (%s)",
- (unsigned long int) daemon_pid, status_msg);
-
- do {
- retval = waitpid(daemon_pid, &status, 0);
- } while (retval == -1 && errno == EINTR);
-
- if (retval == -1) {
- VLOG_FATAL("waitpid failed (%s)", ovs_strerror(errno));
- } else if (retval == daemon_pid) {
- char *s = process_status_msg(status);
- if (should_restart(status)) {
- free(status_msg);
- status_msg = xasprintf("%d crashes: pid %lu died, %s",
- ++crashes,
- (unsigned long int) daemon_pid, s);
- free(s);
-
- if (WCOREDUMP(status)) {
- /* Disable further core dumps to save disk space. */
- struct rlimit r;
-
- r.rlim_cur = 0;
- r.rlim_max = 0;
- if (setrlimit(RLIMIT_CORE, &r) == -1) {
- VLOG_WARN("failed to disable core dumps: %s",
- ovs_strerror(errno));
- }
- }
-
- /* Throttle restarts to no more than once every 10 seconds. */
- if (time(NULL) < last_restart + 10) {
- VLOG_WARN("%s, waiting until 10 seconds since last "
- "restart", status_msg);
- for (;;) {
- time_t now = time(NULL);
- time_t wakeup = last_restart + 10;
- if (now >= wakeup) {
- break;
- }
- xsleep(wakeup - now);
- }
- }
- last_restart = time(NULL);
-
- VLOG_ERR("%s, restarting", status_msg);
- daemon_pid = fork_and_wait_for_startup(&daemonize_fd);
- if (!daemon_pid) {
- break;
- }
- } else {
- VLOG_INFO("pid %lu died, %s, exiting",
- (unsigned long int) daemon_pid, s);
- free(s);
- exit(0);
- }
- }
- }
- free(status_msg);
-
- /* Running in new daemon process. */
- proctitle_restore();
- set_subprogram_name("");
-}
-
-
/* Returns a readable and writable fd for /dev/null, if successful, otherwise
* a negative errno value. The caller must not close the returned fd (because
* the same fd will be handed out to subsequent callers). */
get_null_fd(void)
{
static int null_fd;
+#ifndef _WIN32
+ char *device = "/dev/null";
+#else
+ char *device = "nul";
+#endif
if (!null_fd) {
- null_fd = open("/dev/null", O_RDWR);
+ null_fd = open(device, O_RDWR);
if (null_fd < 0) {
int error = errno;
- VLOG_ERR("could not open /dev/null: %s", ovs_strerror(error));
+ VLOG_ERR("could not open %s: %s", device, ovs_strerror(error));
null_fd = -error;
}
}
/* Close standard file descriptors (except any that the client has requested we
* leave open by calling daemon_save_fd()). If we're started from e.g. an SSH
* session, then this keeps us from holding that session open artificially. */
-static void
+void
close_standard_fds(void)
{
int null_fd = get_null_fd();
/* Disable logging to stderr to avoid wasting CPU time. */
vlog_set_levels(NULL, VLF_CONSOLE, VLL_OFF);
}
-
-/* If daemonization is configured, then starts daemonization, by forking and
- * returning in the child process. The parent process hangs around until the
- * child lets it know either that it completed startup successfully (by calling
- * daemon_complete()) or that it failed to start up (by exiting with a nonzero
- * exit code). */
-void
-daemonize_start(void)
-{
- assert_single_threaded();
- daemonize_fd = -1;
-
- if (detach) {
- if (fork_and_wait_for_startup(&daemonize_fd) > 0) {
- /* Running in parent process. */
- exit(0);
- }
-
- /* Running in daemon or monitor process. */
- setsid();
- }
-
- if (monitor) {
- int saved_daemonize_fd = daemonize_fd;
- pid_t daemon_pid;
-
- daemon_pid = fork_and_wait_for_startup(&daemonize_fd);
- if (daemon_pid > 0) {
- /* Running in monitor process. */
- fork_notify_startup(saved_daemonize_fd);
- close_standard_fds();
- monitor_daemon(daemon_pid);
- }
- /* Running in daemon process. */
- }
-
- forbid_forking("running in daemon process");
-
- if (pidfile) {
- make_pidfile();
- }
-
- /* Make sure that the unixctl commands for vlog get registered in a
- * daemon, even before the first log message. */
- vlog_init();
-}
-
-/* If daemonization is configured, then this function notifies the parent
- * process that the child process has completed startup successfully. It also
- * call daemonize_post_detach().
- *
- * Calling this function more than once has no additional effect. */
-void
-daemonize_complete(void)
-{
- if (pidfile) {
- free(pidfile);
- pidfile = NULL;
- }
-
- if (!detached) {
- detached = true;
-
- fork_notify_startup(daemonize_fd);
- daemonize_fd = -1;
- daemonize_post_detach();
- }
-}
-
-/* If daemonization is configured, then this function does traditional Unix
- * daemonization behavior: join a new session, chdir to the root (if not
- * disabled), and close the standard file descriptors.
- *
- * It only makes sense to call this function as part of an implementation of a
- * special daemon subprocess. A normal daemon should just call
- * daemonize_complete(). */
-static void
-daemonize_post_detach(void)
-{
- if (detach) {
- if (chdir_) {
- ignore(chdir("/"));
- }
- close_standard_fds();
- }
-}
-
-void
-daemon_usage(void)
-{
- printf(
- "\nDaemon options:\n"
- " --detach run in background as daemon\n"
- " --no-chdir do not chdir to '/'\n"
- " --pidfile[=FILE] create pidfile (default: %s/%s.pid)\n"
- " --overwrite-pidfile with --pidfile, start even if already "
- "running\n",
- ovs_rundir(), program_name);
-}
-
-static int
-lock_pidfile__(FILE *file, int command, struct flock *lck)
-{
- int error;
-
- lck->l_type = F_WRLCK;
- lck->l_whence = SEEK_SET;
- lck->l_start = 0;
- lck->l_len = 0;
- lck->l_pid = 0;
-
- do {
- error = fcntl(fileno(file), command, lck) == -1 ? errno : 0;
- } while (error == EINTR);
- return error;
-}
-
-static int
-lock_pidfile(FILE *file, int command)
-{
- struct flock lck;
-
- return lock_pidfile__(file, command, &lck);
-}
-
-static pid_t
-read_pidfile__(const char *pidfile, bool delete_if_stale)
-{
- struct stat s, s2;
- struct flock lck;
- char line[128];
- FILE *file;
- int error;
-
- if ((pidfile_ino || pidfile_dev)
- && !stat(pidfile, &s)
- && s.st_ino == pidfile_ino && s.st_dev == pidfile_dev) {
- /* It's our own pidfile. We can't afford to open it, because closing
- * *any* fd for a file that a process has locked also releases all the
- * locks on that file.
- *
- * Fortunately, we know the associated pid anyhow: */
- return getpid();
- }
-
- file = fopen(pidfile, "r+");
- if (!file) {
- if (errno == ENOENT && delete_if_stale) {
- return 0;
- }
- error = errno;
- VLOG_WARN("%s: open: %s", pidfile, ovs_strerror(error));
- goto error;
- }
-
- error = lock_pidfile__(file, F_GETLK, &lck);
- if (error) {
- VLOG_WARN("%s: fcntl: %s", pidfile, ovs_strerror(error));
- goto error;
- }
- if (lck.l_type == F_UNLCK) {
- /* pidfile exists but it isn't locked by anyone. We need to delete it
- * so that a new pidfile can go in its place. But just calling
- * unlink(pidfile) makes a nasty race: what if someone else unlinks it
- * before we do and then replaces it by a valid pidfile? We'd unlink
- * their valid pidfile. We do a little dance to avoid the race, by
- * locking the invalid pidfile. Only one process can have the invalid
- * pidfile locked, and only that process has the right to unlink it. */
- if (!delete_if_stale) {
- error = ESRCH;
- VLOG_DBG("%s: pid file is stale", pidfile);
- goto error;
- }
-
- /* Get the lock. */
- error = lock_pidfile(file, F_SETLK);
- if (error) {
- /* We lost a race with someone else doing the same thing. */
- VLOG_WARN("%s: lost race to lock pidfile", pidfile);
- goto error;
- }
-
- /* Is the file we have locked still named 'pidfile'? */
- if (stat(pidfile, &s) || fstat(fileno(file), &s2)
- || s.st_ino != s2.st_ino || s.st_dev != s2.st_dev) {
- /* No. We lost a race with someone else who got the lock before
- * us, deleted the pidfile, and closed it (releasing the lock). */
- error = EALREADY;
- VLOG_WARN("%s: lost race to delete pidfile", pidfile);
- goto error;
- }
-
- /* We won the right to delete the stale pidfile. */
- if (unlink(pidfile)) {
- error = errno;
- VLOG_WARN("%s: failed to delete stale pidfile (%s)",
- pidfile, ovs_strerror(error));
- goto error;
- }
- VLOG_DBG("%s: deleted stale pidfile", pidfile);
- fclose(file);
- return 0;
- }
-
- if (!fgets(line, sizeof line, file)) {
- if (ferror(file)) {
- error = errno;
- VLOG_WARN("%s: read: %s", pidfile, ovs_strerror(error));
- } else {
- error = ESRCH;
- VLOG_WARN("%s: read: unexpected end of file", pidfile);
- }
- goto error;
- }
-
- if (lck.l_pid != strtoul(line, NULL, 10)) {
- /* The process that has the pidfile locked is not the process that
- * created it. It must be stale, with the process that has it locked
- * preparing to delete it. */
- error = ESRCH;
- VLOG_WARN("%s: stale pidfile for pid %s being deleted by pid %ld",
- pidfile, line, (long int) lck.l_pid);
- goto error;
- }
-
- fclose(file);
- return lck.l_pid;
-
-error:
- if (file) {
- fclose(file);
- }
- return -error;
-}
-
-/* Opens and reads a PID from 'pidfile'. Returns the positive PID if
- * successful, otherwise a negative errno value. */
-pid_t
-read_pidfile(const char *pidfile)
-{
- return read_pidfile__(pidfile, false);
-}
-
-/* Checks whether a process with the given 'pidfile' is already running and,
- * if so, aborts. If 'pidfile' is stale, deletes it. */
-static void
-check_already_running(void)
-{
- long int pid = read_pidfile__(pidfile, true);
- if (pid > 0) {
- VLOG_FATAL("%s: already running as pid %ld, aborting", pidfile, pid);
- } else if (pid < 0) {
- VLOG_FATAL("%s: pidfile check failed (%s), aborting",
- pidfile, ovs_strerror(-pid));
- }
-}
-
-\f
-/* stub functions for non-windows platform. */
-
-void
-service_start(int *argc OVS_UNUSED, char **argv[] OVS_UNUSED)
-{
-}
-
-void
-service_stop(void)
-{
-}
-
-bool
-should_service_stop(void)
-{
- return false;
-}
* POSIX platforms and some are applicable only on Windows. As such, the
* function definitions unique to each platform are separated out with
* ifdef macros. More descriptive comments on individual functions are provided
- * in daemon.c (for Linux) and daemon-windows.c (for Windows).
+ * in daemon-unix.c (for POSIX platforms) and daemon-windows.c (for Windows).
* The DAEMON_OPTION_ENUMS, DAEMON_LONG_OPTIONS and DAEMON_OPTION_HANDLERS
* macros are useful for parsing command-line options in individual utilities.
- * For e.g., the command-line option "--detach" is recognized on Linux
- * and results in calling the set_detach() function. The same option is not
- * recognized on Windows platform.
+ * For example, the command-line option "--monitor" is recognized on Linux
+ * and results in calling the daemon_set_monitor() function. The same option
+ * is not recognized on Windows.
*/
#ifndef _WIN32
void set_detach(void);
void daemon_set_monitor(void);
-void set_pidfile(const char *name);
void set_no_chdir(void);
void ignore_existing_pidfile(void);
pid_t read_pidfile(const char *name);
#else
-#define DAEMON_OPTION_ENUMS \
- OPT_SERVICE, \
+#define DAEMON_OPTION_ENUMS \
+ OPT_DETACH, \
+ OPT_NO_CHDIR, \
+ OPT_PIDFILE, \
+ OPT_PIPE_HANDLE, \
+ OPT_SERVICE, \
OPT_SERVICE_MONITOR
-#define DAEMON_LONG_OPTIONS \
- {"service", no_argument, NULL, OPT_SERVICE}, \
+#define DAEMON_LONG_OPTIONS \
+ {"detach", no_argument, NULL, OPT_DETACH}, \
+ {"no-chdir", no_argument, NULL, OPT_NO_CHDIR}, \
+ {"pidfile", optional_argument, NULL, OPT_PIDFILE}, \
+ {"pipe-handle", required_argument, NULL, OPT_PIPE_HANDLE}, \
+ {"service", no_argument, NULL, OPT_SERVICE}, \
{"service-monitor", no_argument, NULL, OPT_SERVICE_MONITOR}
#define DAEMON_OPTION_HANDLERS \
+ case OPT_DETACH: \
+ break; \
+ \
+ case OPT_NO_CHDIR: \
+ break; \
+ \
+ case OPT_PIDFILE: \
+ set_pidfile(optarg); \
+ break; \
+ \
+ case OPT_PIPE_HANDLE: \
+ set_pipe_handle(optarg); \
+ break; \
+ \
case OPT_SERVICE: \
break; \
\
break;
void control_handler(DWORD request);
+void set_pipe_handle(const char *pipe_handle);
#endif /* _WIN32 */
bool get_detach(void);
void service_start(int *argcp, char **argvp[]);
void service_stop(void);
bool should_service_stop(void);
+void set_pidfile(const char *name);
+void close_standard_fds(void);
#endif /* daemon.h */
#include "dpif-provider.h"
#include "dynamic-string.h"
#include "flow.h"
+#include "fat-rwlock.h"
#include "netdev.h"
#include "netdev-linux.h"
#include "netdev-vport.h"
long long int last_poll; /* Last time this channel was polled. */
};
-static void report_loss(struct dpif *, struct dpif_channel *);
+struct dpif_handler {
+    struct dpif_channel *channels; /* Array of channels, one per port. */
+ struct epoll_event *epoll_events;
+ int epoll_fd; /* epoll fd that includes channel socks. */
+ int n_events; /* Num events returned by epoll_wait(). */
+ int event_offset; /* Offset into 'epoll_events'. */
+};
/* Datapath interface for the openvswitch Linux kernel module. */
struct dpif_linux {
int dp_ifindex;
/* Upcall messages. */
- struct ovs_mutex upcall_lock;
- int uc_array_size; /* Size of 'channels' and 'epoll_events'. */
- struct dpif_channel *channels;
- struct epoll_event *epoll_events;
- int epoll_fd; /* epoll fd that includes channel socks. */
- int n_events; /* Num events returned by epoll_wait(). */
- int event_offset; /* Offset into 'epoll_events'. */
+ struct fat_rwlock upcall_lock;
+ struct dpif_handler *handlers;
+ uint32_t n_handlers; /* Num of upcall handlers. */
+ int uc_array_size; /* Size of 'handler->channels' and */
+ /* 'handler->epoll_events'. */
/* Change notification. */
struct nl_sock *port_notifier; /* vport multicast group subscriber. */
bool refresh_channels;
};
+static void report_loss(struct dpif_linux *, struct dpif_channel *,
+ uint32_t ch_idx, uint32_t handler_id);
+
static struct vlog_rate_limit error_rl = VLOG_RATE_LIMIT_INIT(9999, 5);
/* Generic Netlink family numbers for OVS.
static int open_dpif(const struct dpif_linux_dp *, struct dpif **);
static uint32_t dpif_linux_port_get_pid(const struct dpif *,
odp_port_t port_no, uint32_t hash);
-static int dpif_linux_refresh_channels(struct dpif *);
-
+static int dpif_linux_refresh_channels(struct dpif_linux *,
+ uint32_t n_handlers);
static void dpif_linux_vport_to_ofpbuf(const struct dpif_linux_vport *,
struct ofpbuf *);
static int dpif_linux_vport_from_ofpbuf(struct dpif_linux_vport *,
}
dp_request.name = name;
dp_request.user_features |= OVS_DP_F_UNALIGNED;
+ dp_request.user_features |= OVS_DP_F_VPORT_PIDS;
error = dpif_linux_dp_transact(&dp_request, &dp, &buf);
if (error) {
return error;
dpif = xzalloc(sizeof *dpif);
dpif->port_notifier = NULL;
- ovs_mutex_init(&dpif->upcall_lock);
- dpif->epoll_fd = -1;
+ fat_rwlock_init(&dpif->upcall_lock);
dpif_init(&dpif->dpif, &dpif_linux_class, dp->name,
dp->dp_ifindex, dp->dp_ifindex);
return 0;
}
+/* Destroys the netlink sockets pointed to by the elements in 'socksp'
+ * and frees 'socksp'. */
static void
-destroy_channels(struct dpif_linux *dpif)
+vport_del_socksp(struct nl_sock **socksp, uint32_t n_socks)
{
- unsigned int i;
+ size_t i;
- if (dpif->epoll_fd < 0) {
- return;
+ for (i = 0; i < n_socks; i++) {
+ nl_sock_destroy(socksp[i]);
}
- for (i = 0; i < dpif->uc_array_size; i++ ) {
- struct dpif_linux_vport vport_request;
- struct dpif_channel *ch = &dpif->channels[i];
- uint32_t upcall_pid = 0;
+ free(socksp);
+}
- if (!ch->sock) {
- continue;
+/* Creates an array of 'n_socks' netlink sockets and returns the array of
+ * pointers.  On failure, records the error in '*error' and returns NULL. */
+static struct nl_sock **
+vport_create_socksp(uint32_t n_socks, int *error)
+{
+ struct nl_sock **socksp = xzalloc(n_socks * sizeof *socksp);
+ size_t i;
+
+ for (i = 0; i < n_socks; i++) {
+ *error = nl_sock_create(NETLINK_GENERIC, &socksp[i]);
+ if (*error) {
+ goto error;
}
+ }
- epoll_ctl(dpif->epoll_fd, EPOLL_CTL_DEL, nl_sock_fd(ch->sock), NULL);
+ return socksp;
- /* Turn off upcalls. */
- dpif_linux_vport_init(&vport_request);
- vport_request.cmd = OVS_VPORT_CMD_SET;
- vport_request.dp_ifindex = dpif->dp_ifindex;
- vport_request.port_no = u32_to_odp(i);
- vport_request.upcall_pid = &upcall_pid;
- dpif_linux_vport_transact(&vport_request, NULL, NULL);
+error:
+ vport_del_socksp(socksp, n_socks);
+
+ return NULL;
+}
- nl_sock_destroy(ch->sock);
+/* Given the array of pointers to netlink sockets 'socksp', returns
+ * the array of corresponding pids. If 'socksp' is NULL, returns
+ * a single-element array of value 0. */
+static uint32_t *
+vport_socksp_to_pids(struct nl_sock **socksp, uint32_t n_socks)
+{
+ uint32_t *pids;
+
+ if (!socksp) {
+ pids = xzalloc(sizeof *pids);
+ } else {
+ size_t i;
+
+ pids = xzalloc(n_socks * sizeof *pids);
+ for (i = 0; i < n_socks; i++) {
+ pids[i] = nl_sock_pid(socksp[i]);
+ }
}
- free(dpif->channels);
- dpif->channels = NULL;
- dpif->uc_array_size = 0;
+ return pids;
+}
+
+/* Given the port number 'port_idx', extracts the pids of the netlink
+ * sockets associated with the port and assigns them to '*upcall_pids'. */
+static bool
+vport_get_pids(struct dpif_linux *dpif, uint32_t port_idx,
+ uint32_t **upcall_pids)
+{
+ uint32_t *pids;
+ size_t i;
- free(dpif->epoll_events);
- dpif->epoll_events = NULL;
- dpif->n_events = dpif->event_offset = 0;
+    /* A port's nl_sock is assigned in either all or none of the
+     * "dpif->handlers" channels, so checking the first handler's
+     * channel suffices. */
+ if (!dpif->handlers[0].channels[port_idx].sock) {
+ return false;
+ }
- /* Don't close dpif->epoll_fd since that would cause other threads that
- * call dpif_recv_wait() to wait on an arbitrary fd or a closed fd. */
+ pids = xzalloc(dpif->n_handlers * sizeof *pids);
+
+ for (i = 0; i < dpif->n_handlers; i++) {
+ pids[i] = nl_sock_pid(dpif->handlers[i].channels[port_idx].sock);
+ }
+
+ *upcall_pids = pids;
+
+ return true;
}
static int
-add_channel(struct dpif_linux *dpif, odp_port_t port_no, struct nl_sock *sock)
+vport_add_channels(struct dpif_linux *dpif, odp_port_t port_no,
+ struct nl_sock **socksp)
{
struct epoll_event event;
uint32_t port_idx = odp_to_u32(port_no);
+ size_t i, j;
+ int error;
- if (dpif->epoll_fd < 0) {
+ if (dpif->handlers == NULL) {
return 0;
}
- /* We assume that the datapath densely chooses port numbers, which
- * can therefore be used as an index into an array of channels. */
+    /* We assume that the datapath densely chooses port numbers, which can
+     * therefore be used as an index into the 'channels' and 'epoll_events'
+     * arrays of each handler in 'dpif->handlers'. */
if (port_idx >= dpif->uc_array_size) {
uint32_t new_size = port_idx + 1;
- uint32_t i;
if (new_size > MAX_PORTS) {
VLOG_WARN_RL(&error_rl, "%s: datapath port %"PRIu32" too big",
return EFBIG;
}
- dpif->channels = xrealloc(dpif->channels,
- new_size * sizeof *dpif->channels);
- for (i = dpif->uc_array_size; i < new_size; i++) {
- dpif->channels[i].sock = NULL;
- }
+ for (i = 0; i < dpif->n_handlers; i++) {
+ struct dpif_handler *handler = &dpif->handlers[i];
- dpif->epoll_events = xrealloc(dpif->epoll_events,
- new_size * sizeof *dpif->epoll_events);
+ handler->channels = xrealloc(handler->channels,
+ new_size * sizeof *handler->channels);
+
+ for (j = dpif->uc_array_size; j < new_size; j++) {
+ handler->channels[j].sock = NULL;
+ }
+
+ handler->epoll_events = xrealloc(handler->epoll_events,
+ new_size * sizeof *handler->epoll_events);
+
+ }
dpif->uc_array_size = new_size;
}
memset(&event, 0, sizeof event);
event.events = EPOLLIN;
event.data.u32 = port_idx;
- if (epoll_ctl(dpif->epoll_fd, EPOLL_CTL_ADD, nl_sock_fd(sock),
- &event) < 0) {
- return errno;
- }
- nl_sock_destroy(dpif->channels[port_idx].sock);
- dpif->channels[port_idx].sock = sock;
- dpif->channels[port_idx].last_poll = LLONG_MIN;
+ for (i = 0; i < dpif->n_handlers; i++) {
+ struct dpif_handler *handler = &dpif->handlers[i];
+
+ if (epoll_ctl(handler->epoll_fd, EPOLL_CTL_ADD, nl_sock_fd(socksp[i]),
+ &event) < 0) {
+ error = errno;
+ goto error;
+ }
+ dpif->handlers[i].channels[port_idx].sock = socksp[i];
+ dpif->handlers[i].channels[port_idx].last_poll = LLONG_MIN;
+ }
return 0;
+
+error:
+ for (j = 0; j < i; j++) {
+ epoll_ctl(dpif->handlers[j].epoll_fd, EPOLL_CTL_DEL,
+ nl_sock_fd(socksp[j]), NULL);
+ dpif->handlers[j].channels[port_idx].sock = NULL;
+ }
+
+ return error;
}
static void
-del_channel(struct dpif_linux *dpif, odp_port_t port_no)
+vport_del_channels(struct dpif_linux *dpif, odp_port_t port_no)
{
- struct dpif_channel *ch;
uint32_t port_idx = odp_to_u32(port_no);
+ size_t i;
- if (dpif->epoll_fd < 0 || port_idx >= dpif->uc_array_size) {
+ if (!dpif->handlers || port_idx >= dpif->uc_array_size) {
return;
}
- ch = &dpif->channels[port_idx];
- if (!ch->sock) {
+    /* A port's sock is assigned in either all or none of the
+     * "dpif->handlers" channels, so checking the first handler's
+     * channel suffices. */
+ if (!dpif->handlers[0].channels[port_idx].sock) {
return;
}
- epoll_ctl(dpif->epoll_fd, EPOLL_CTL_DEL, nl_sock_fd(ch->sock), NULL);
- dpif->event_offset = dpif->n_events = 0;
+ for (i = 0; i < dpif->n_handlers; i++) {
+ struct dpif_handler *handler = &dpif->handlers[i];
- nl_sock_destroy(ch->sock);
- ch->sock = NULL;
+ epoll_ctl(handler->epoll_fd, EPOLL_CTL_DEL,
+ nl_sock_fd(handler->channels[port_idx].sock), NULL);
+ nl_sock_destroy(handler->channels[port_idx].sock);
+ handler->channels[port_idx].sock = NULL;
+ handler->event_offset = handler->n_events = 0;
+ }
+}
+
+static void
+destroy_all_channels(struct dpif_linux *dpif) OVS_REQ_WRLOCK(dpif->upcall_lock)
+{
+ unsigned int i;
+
+ if (!dpif->handlers) {
+ return;
+ }
+
+ for (i = 0; i < dpif->uc_array_size; i++ ) {
+ struct dpif_linux_vport vport_request;
+ uint32_t upcall_pids = 0;
+
+        /* A port's sock is assigned in either all or none of the
+         * "dpif->handlers" channels, so checking the first handler's
+         * channel suffices. */
+ if (!dpif->handlers[0].channels[i].sock) {
+ continue;
+ }
+
+ /* Turn off upcalls. */
+ dpif_linux_vport_init(&vport_request);
+ vport_request.cmd = OVS_VPORT_CMD_SET;
+ vport_request.dp_ifindex = dpif->dp_ifindex;
+ vport_request.port_no = u32_to_odp(i);
+ vport_request.upcall_pids = &upcall_pids;
+ dpif_linux_vport_transact(&vport_request, NULL, NULL);
+
+ vport_del_channels(dpif, u32_to_odp(i));
+ }
+
+ for (i = 0; i < dpif->n_handlers; i++) {
+ struct dpif_handler *handler = &dpif->handlers[i];
+
+ close(handler->epoll_fd);
+ free(handler->epoll_events);
+ free(handler->channels);
+ }
+
+ free(dpif->handlers);
+ dpif->handlers = NULL;
+ dpif->n_handlers = 0;
+ dpif->uc_array_size = 0;
}
static void
struct dpif_linux *dpif = dpif_linux_cast(dpif_);
nl_sock_destroy(dpif->port_notifier);
- destroy_channels(dpif);
- if (dpif->epoll_fd >= 0) {
- close(dpif->epoll_fd);
- }
- ovs_mutex_destroy(&dpif->upcall_lock);
+
+ fat_rwlock_wrlock(&dpif->upcall_lock);
+ destroy_all_channels(dpif);
+ fat_rwlock_unlock(&dpif->upcall_lock);
+
+ fat_rwlock_destroy(&dpif->upcall_lock);
free(dpif);
}
dpif_linux_run(struct dpif *dpif_)
{
struct dpif_linux *dpif = dpif_linux_cast(dpif_);
+
if (dpif->refresh_channels) {
dpif->refresh_channels = false;
- dpif_linux_refresh_channels(dpif_);
+ fat_rwlock_wrlock(&dpif->upcall_lock);
+ dpif_linux_refresh_channels(dpif, dpif->n_handlers);
+ fat_rwlock_unlock(&dpif->upcall_lock);
}
}
}
static int
-dpif_linux_port_add__(struct dpif *dpif_, struct netdev *netdev,
+dpif_linux_port_add__(struct dpif_linux *dpif, struct netdev *netdev,
odp_port_t *port_nop)
+ OVS_REQ_WRLOCK(dpif->upcall_lock)
{
- struct dpif_linux *dpif = dpif_linux_cast(dpif_);
const struct netdev_tunnel_config *tnl_cfg;
char namebuf[NETDEV_VPORT_NAME_BUFSIZE];
const char *name = netdev_vport_get_dpif_port(netdev,
namebuf, sizeof namebuf);
const char *type = netdev_get_type(netdev);
struct dpif_linux_vport request, reply;
- struct nl_sock *sock = NULL;
- uint32_t upcall_pid;
struct ofpbuf *buf;
uint64_t options_stub[64 / 8];
struct ofpbuf options;
- int error;
+ struct nl_sock **socksp = NULL;
+ uint32_t *upcall_pids;
+ int error = 0;
- if (dpif->epoll_fd >= 0) {
- error = nl_sock_create(NETLINK_GENERIC, &sock);
- if (error) {
+ if (dpif->handlers) {
+ socksp = vport_create_socksp(dpif->n_handlers, &error);
+ if (!socksp) {
return error;
}
}
if (request.type == OVS_VPORT_TYPE_UNSPEC) {
VLOG_WARN_RL(&error_rl, "%s: cannot create port `%s' because it has "
"unsupported type `%s'",
- dpif_name(dpif_), name, type);
- nl_sock_destroy(sock);
+ dpif_name(&dpif->dpif), name, type);
+ vport_del_socksp(socksp, dpif->n_handlers);
return EINVAL;
}
request.name = name;
}
request.port_no = *port_nop;
- upcall_pid = sock ? nl_sock_pid(sock) : 0;
- request.upcall_pid = &upcall_pid;
+ upcall_pids = vport_socksp_to_pids(socksp, dpif->n_handlers);
+ request.n_upcall_pids = socksp ? dpif->n_handlers : 1;
+ request.upcall_pids = upcall_pids;
error = dpif_linux_vport_transact(&request, &reply, &buf);
if (!error) {
*port_nop = reply.port_no;
- VLOG_DBG("%s: assigning port %"PRIu32" to netlink pid %"PRIu32,
- dpif_name(dpif_), reply.port_no, upcall_pid);
} else {
if (error == EBUSY && *port_nop != ODPP_NONE) {
VLOG_INFO("%s: requested port %"PRIu32" is in use",
- dpif_name(dpif_), *port_nop);
+ dpif_name(&dpif->dpif), *port_nop);
}
- nl_sock_destroy(sock);
- ofpbuf_delete(buf);
- return error;
+
+ vport_del_socksp(socksp, dpif->n_handlers);
+ goto exit;
}
- ofpbuf_delete(buf);
- if (sock) {
- error = add_channel(dpif, *port_nop, sock);
+ if (socksp) {
+ error = vport_add_channels(dpif, *port_nop, socksp);
if (error) {
VLOG_INFO("%s: could not add channel for port %s",
- dpif_name(dpif_), name);
+ dpif_name(&dpif->dpif), name);
/* Delete the port. */
dpif_linux_vport_init(&request);
request.dp_ifindex = dpif->dp_ifindex;
request.port_no = *port_nop;
dpif_linux_vport_transact(&request, NULL, NULL);
-
- nl_sock_destroy(sock);
- return error;
+ vport_del_socksp(socksp, dpif->n_handlers);
+ goto exit;
}
}
+ free(socksp);
- return 0;
+exit:
+ ofpbuf_delete(buf);
+ free(upcall_pids);
+
+ return error;
}
static int
struct dpif_linux *dpif = dpif_linux_cast(dpif_);
int error;
- ovs_mutex_lock(&dpif->upcall_lock);
- error = dpif_linux_port_add__(dpif_, netdev, port_nop);
- ovs_mutex_unlock(&dpif->upcall_lock);
+ fat_rwlock_wrlock(&dpif->upcall_lock);
+ error = dpif_linux_port_add__(dpif, netdev, port_nop);
+ fat_rwlock_unlock(&dpif->upcall_lock);
return error;
}
static int
-dpif_linux_port_del__(struct dpif *dpif_, odp_port_t port_no)
+dpif_linux_port_del__(struct dpif_linux *dpif, odp_port_t port_no)
+ OVS_REQ_WRLOCK(dpif->upcall_lock)
{
- struct dpif_linux *dpif = dpif_linux_cast(dpif_);
struct dpif_linux_vport vport;
int error;
vport.port_no = port_no;
error = dpif_linux_vport_transact(&vport, NULL, NULL);
- del_channel(dpif, port_no);
+ vport_del_channels(dpif, port_no);
return error;
}
struct dpif_linux *dpif = dpif_linux_cast(dpif_);
int error;
- ovs_mutex_lock(&dpif->upcall_lock);
- error = dpif_linux_port_del__(dpif_, port_no);
- ovs_mutex_unlock(&dpif->upcall_lock);
+ fat_rwlock_wrlock(&dpif->upcall_lock);
+ error = dpif_linux_port_del__(dpif, port_no);
+ fat_rwlock_unlock(&dpif->upcall_lock);
return error;
}
static int
-dpif_linux_port_query__(const struct dpif *dpif, odp_port_t port_no,
+dpif_linux_port_query__(const struct dpif_linux *dpif, odp_port_t port_no,
const char *port_name, struct dpif_port *dpif_port)
{
struct dpif_linux_vport request;
dpif_linux_vport_init(&request);
request.cmd = OVS_VPORT_CMD_GET;
- request.dp_ifindex = dpif_linux_cast(dpif)->dp_ifindex;
+ request.dp_ifindex = dpif->dp_ifindex;
request.port_no = port_no;
request.name = port_name;
}
static int
-dpif_linux_port_query_by_number(const struct dpif *dpif, odp_port_t port_no,
+dpif_linux_port_query_by_number(const struct dpif *dpif_, odp_port_t port_no,
struct dpif_port *dpif_port)
{
+ struct dpif_linux *dpif = dpif_linux_cast(dpif_);
+
return dpif_linux_port_query__(dpif, port_no, NULL, dpif_port);
}
static int
-dpif_linux_port_query_by_name(const struct dpif *dpif, const char *devname,
+dpif_linux_port_query_by_name(const struct dpif *dpif_, const char *devname,
struct dpif_port *dpif_port)
{
+ struct dpif_linux *dpif = dpif_linux_cast(dpif_);
+
return dpif_linux_port_query__(dpif, 0, devname, dpif_port);
}
static uint32_t
-dpif_linux_port_get_pid(const struct dpif *dpif_, odp_port_t port_no,
- uint32_t hash OVS_UNUSED)
+dpif_linux_port_get_pid__(const struct dpif_linux *dpif, odp_port_t port_no,
+ uint32_t hash)
+ OVS_REQ_RDLOCK(dpif->upcall_lock)
{
- struct dpif_linux *dpif = dpif_linux_cast(dpif_);
uint32_t port_idx = odp_to_u32(port_no);
uint32_t pid = 0;
- ovs_mutex_lock(&dpif->upcall_lock);
- if (dpif->epoll_fd >= 0) {
+ if (dpif->handlers) {
/* The ODPP_NONE "reserved" port number uses the "ovs-system"'s
* channel, since it is not heavily loaded. */
uint32_t idx = port_idx >= dpif->uc_array_size ? 0 : port_idx;
- const struct nl_sock *sock = dpif->channels[idx].sock;
- pid = sock ? nl_sock_pid(sock) : 0;
+ struct dpif_handler *h = &dpif->handlers[hash % dpif->n_handlers];
+
+ pid = nl_sock_pid(h->channels[idx].sock);
}
- ovs_mutex_unlock(&dpif->upcall_lock);
return pid;
}
+static uint32_t
+dpif_linux_port_get_pid(const struct dpif *dpif_, odp_port_t port_no,
+ uint32_t hash)
+{
+ const struct dpif_linux *dpif = dpif_linux_cast(dpif_);
+ uint32_t ret;
+
+ fat_rwlock_rdlock(&dpif->upcall_lock);
+ ret = dpif_linux_port_get_pid__(dpif, port_no, hash);
+ fat_rwlock_unlock(&dpif->upcall_lock);
+
+ return ret;
+}
+
static int
dpif_linux_flow_flush(struct dpif *dpif_)
{
};
static void
-dpif_linux_port_dump_start__(const struct dpif *dpif_, struct nl_dump *dump)
+dpif_linux_port_dump_start__(const struct dpif_linux *dpif,
+ struct nl_dump *dump)
{
- const struct dpif_linux *dpif = dpif_linux_cast(dpif_);
struct dpif_linux_vport request;
struct ofpbuf *buf;
}
static int
-dpif_linux_port_dump_start(const struct dpif *dpif, void **statep)
+dpif_linux_port_dump_start(const struct dpif *dpif_, void **statep)
{
+ struct dpif_linux *dpif = dpif_linux_cast(dpif_);
struct dpif_linux_port_state *state;
*statep = state = xmalloc(sizeof *state);
}
static int
-dpif_linux_port_dump_next__(const struct dpif *dpif_, struct nl_dump *dump,
+dpif_linux_port_dump_next__(const struct dpif_linux *dpif, struct nl_dump *dump,
struct dpif_linux_vport *vport,
struct ofpbuf *buffer)
{
- struct dpif_linux *dpif = dpif_linux_cast(dpif_);
struct ofpbuf buf;
int error;
}
static int
-dpif_linux_port_dump_next(const struct dpif *dpif OVS_UNUSED, void *state_,
+dpif_linux_port_dump_next(const struct dpif *dpif_, void *state_,
struct dpif_port *dpif_port)
{
+ struct dpif_linux *dpif = dpif_linux_cast(dpif_);
struct dpif_linux_port_state *state = state_;
struct dpif_linux_vport vport;
int error;
|| vport.cmd == OVS_VPORT_CMD_SET)) {
VLOG_DBG("port_changed: dpif:%s vport:%s cmd:%"PRIu8,
dpif->dpif.full_name, vport.name, vport.cmd);
- if (vport.cmd == OVS_VPORT_CMD_DEL) {
+ if (vport.cmd == OVS_VPORT_CMD_DEL && dpif->handlers) {
dpif->refresh_channels = true;
}
*devnamep = xstrdup(vport.name);
}
static int
-dpif_linux_flow_get__(const struct dpif *dpif_,
+dpif_linux_flow_get__(const struct dpif_linux *dpif,
const struct nlattr *key, size_t key_len,
struct dpif_linux_flow *reply, struct ofpbuf **bufp)
{
- const struct dpif_linux *dpif = dpif_linux_cast(dpif_);
struct dpif_linux_flow request;
dpif_linux_flow_init(&request);
const struct nlattr *key, size_t key_len,
struct ofpbuf **actionsp, struct dpif_flow_stats *stats)
{
+ const struct dpif_linux *dpif = dpif_linux_cast(dpif_);
struct dpif_linux_flow reply;
struct ofpbuf *buf;
int error;
- error = dpif_linux_flow_get__(dpif_, key, key_len, &reply, &buf);
+ error = dpif_linux_flow_get__(dpif, key, key_len, &reply, &buf);
if (!error) {
if (stats) {
dpif_linux_flow_get_stats(&reply, stats);
}
static void
-dpif_linux_init_flow_put(struct dpif *dpif_, const struct dpif_flow_put *put,
+dpif_linux_init_flow_put(struct dpif_linux *dpif, const struct dpif_flow_put *put,
struct dpif_linux_flow *request)
{
static const struct nlattr dummy_action;
- const struct dpif_linux *dpif = dpif_linux_cast(dpif_);
-
dpif_linux_flow_init(request);
request->cmd = (put->flags & DPIF_FP_CREATE
? OVS_FLOW_CMD_NEW : OVS_FLOW_CMD_SET);
static int
dpif_linux_flow_put(struct dpif *dpif_, const struct dpif_flow_put *put)
{
+ struct dpif_linux *dpif = dpif_linux_cast(dpif_);
struct dpif_linux_flow request, reply;
struct ofpbuf *buf;
int error;
- dpif_linux_init_flow_put(dpif_, put, &request);
+ dpif_linux_init_flow_put(dpif, put, &request);
error = dpif_linux_flow_transact(&request,
put->stats ? &reply : NULL,
put->stats ? &buf : NULL);
}
static void
-dpif_linux_init_flow_del(struct dpif *dpif_, const struct dpif_flow_del *del,
+dpif_linux_init_flow_del(struct dpif_linux *dpif, const struct dpif_flow_del *del,
struct dpif_linux_flow *request)
{
- const struct dpif_linux *dpif = dpif_linux_cast(dpif_);
-
dpif_linux_flow_init(request);
request->cmd = OVS_FLOW_CMD_DEL;
request->dp_ifindex = dpif->dp_ifindex;
static int
dpif_linux_flow_del(struct dpif *dpif_, const struct dpif_flow_del *del)
{
+ struct dpif_linux *dpif = dpif_linux_cast(dpif_);
struct dpif_linux_flow request, reply;
struct ofpbuf *buf;
int error;
- dpif_linux_init_flow_del(dpif_, del, &request);
+ dpif_linux_init_flow_del(dpif, del, &request);
error = dpif_linux_flow_transact(&request,
del->stats ? &reply : NULL,
del->stats ? &buf : NULL);
const struct nlattr **actions, size_t *actions_len,
const struct dpif_flow_stats **stats)
{
+ const struct dpif_linux *dpif = dpif_linux_cast(dpif_);
struct dpif_linux_flow_iter *iter = iter_;
struct dpif_linux_flow_state *state = state_;
struct ofpbuf buf;
}
if (actions && !state->flow.actions) {
- error = dpif_linux_flow_get__(dpif_, state->flow.key,
+ error = dpif_linux_flow_get__(dpif, state->flow.key,
state->flow.key_len,
&state->flow, &state->tmp);
if (error == ENOENT) {
#define MAX_OPS 50
static void
-dpif_linux_operate__(struct dpif *dpif_, struct dpif_op **ops, size_t n_ops)
+dpif_linux_operate__(struct dpif_linux *dpif, struct dpif_op **ops, size_t n_ops)
{
- const struct dpif_linux *dpif = dpif_linux_cast(dpif_);
struct op_auxdata {
struct nl_transaction txn;
switch (op->type) {
case DPIF_OP_FLOW_PUT:
put = &op->u.flow_put;
- dpif_linux_init_flow_put(dpif_, put, &flow);
+ dpif_linux_init_flow_put(dpif, put, &flow);
if (put->stats) {
flow.nlmsg_flags |= NLM_F_ECHO;
aux->txn.reply = &aux->reply;
case DPIF_OP_FLOW_DEL:
del = &op->u.flow_del;
- dpif_linux_init_flow_del(dpif_, del, &flow);
+ dpif_linux_init_flow_del(dpif, del, &flow);
if (del->stats) {
flow.nlmsg_flags |= NLM_F_ECHO;
aux->txn.reply = &aux->reply;
}
static void
-dpif_linux_operate(struct dpif *dpif, struct dpif_op **ops, size_t n_ops)
+dpif_linux_operate(struct dpif *dpif_, struct dpif_op **ops, size_t n_ops)
{
+ struct dpif_linux *dpif = dpif_linux_cast(dpif_);
+
while (n_ops > 0) {
size_t chunk = MIN(n_ops, MAX_OPS);
dpif_linux_operate__(dpif, ops, chunk);
}
}
-/* Synchronizes 'dpif->channels' with the set of vports currently in 'dpif' in
- * the kernel, by adding a new channel for any kernel vport that lacks one and
- * deleting any channels that have no backing kernel vports. */
+/* Synchronizes 'channels' in 'dpif->handlers' with the set of vports
+ * currently in 'dpif' in the kernel, by adding a new set of channels for
+ * any kernel vport that lacks one and deleting any channels that have no
+ * backing kernel vports. */
static int
-dpif_linux_refresh_channels(struct dpif *dpif_)
+dpif_linux_refresh_channels(struct dpif_linux *dpif, uint32_t n_handlers)
+ OVS_REQ_WRLOCK(dpif->upcall_lock)
{
- struct dpif_linux *dpif = dpif_linux_cast(dpif_);
unsigned long int *keep_channels;
struct dpif_linux_vport vport;
size_t keep_channels_nbits;
int retval = 0;
size_t i;
- /* To start with, we need an epoll fd. */
- if (dpif->epoll_fd < 0) {
- dpif->epoll_fd = epoll_create(10);
- if (dpif->epoll_fd < 0) {
- return errno;
+ if (dpif->n_handlers != n_handlers) {
+ destroy_all_channels(dpif);
+ dpif->handlers = xzalloc(n_handlers * sizeof *dpif->handlers);
+ for (i = 0; i < n_handlers; i++) {
+ struct dpif_handler *handler = &dpif->handlers[i];
+
+ handler->epoll_fd = epoll_create(10);
+ if (handler->epoll_fd < 0) {
+ size_t j;
+
+ for (j = 0; j < i; j++) {
+ close(dpif->handlers[j].epoll_fd);
+ }
+ free(dpif->handlers);
+ dpif->handlers = NULL;
+
+ return errno;
+ }
}
+ dpif->n_handlers = n_handlers;
+ }
+
+ for (i = 0; i < n_handlers; i++) {
+ struct dpif_handler *handler = &dpif->handlers[i];
+
+ handler->event_offset = handler->n_events = 0;
}
keep_channels_nbits = dpif->uc_array_size;
keep_channels = bitmap_allocate(keep_channels_nbits);
- dpif->n_events = dpif->event_offset = 0;
-
ofpbuf_use_stub(&buf, reply_stub, sizeof reply_stub);
- dpif_linux_port_dump_start__(dpif_, &dump);
- while (!dpif_linux_port_dump_next__(dpif_, &dump, &vport, &buf)) {
+ dpif_linux_port_dump_start__(dpif, &dump);
+ while (!dpif_linux_port_dump_next__(dpif, &dump, &vport, &buf)) {
uint32_t port_no = odp_to_u32(vport.port_no);
- struct nl_sock *sock = (port_no < dpif->uc_array_size
- ? dpif->channels[port_no].sock
- : NULL);
- bool new_sock = !sock;
+ uint32_t *upcall_pids = NULL;
int error;
- if (new_sock) {
- error = nl_sock_create(NETLINK_GENERIC, &sock);
+ if (port_no >= dpif->uc_array_size
+ || !vport_get_pids(dpif, port_no, &upcall_pids)) {
+ struct nl_sock **socksp = vport_create_socksp(dpif->n_handlers,
+ &error);
+
+ if (!socksp) {
+ goto error;
+ }
+
+ error = vport_add_channels(dpif, vport.port_no, socksp);
if (error) {
+ VLOG_INFO("%s: could not add channels for port %s",
+ dpif_name(&dpif->dpif), vport.name);
+ vport_del_socksp(socksp, dpif->n_handlers);
retval = error;
goto error;
}
+ upcall_pids = vport_socksp_to_pids(socksp, dpif->n_handlers);
+ free(socksp);
}
-    /* Configure the vport to deliver misses to 'sock'. */
+    /* Configure the vport to deliver misses to the per-handler sockets. */
- if (!vport.upcall_pid || *vport.upcall_pid != nl_sock_pid(sock)) {
- uint32_t upcall_pid = nl_sock_pid(sock);
+ if (vport.upcall_pids[0] == 0
+ || vport.n_upcall_pids != dpif->n_handlers
+ || memcmp(upcall_pids, vport.upcall_pids, n_handlers * sizeof
+ *upcall_pids)) {
struct dpif_linux_vport vport_request;
dpif_linux_vport_init(&vport_request);
vport_request.cmd = OVS_VPORT_CMD_SET;
vport_request.dp_ifindex = dpif->dp_ifindex;
vport_request.port_no = vport.port_no;
- vport_request.upcall_pid = &upcall_pid;
+ vport_request.n_upcall_pids = dpif->n_handlers;
+ vport_request.upcall_pids = upcall_pids;
error = dpif_linux_vport_transact(&vport_request, NULL, NULL);
- if (!error) {
- VLOG_DBG("%s: assigning port %"PRIu32" to netlink pid %"PRIu32,
- dpif_name(&dpif->dpif), vport_request.port_no,
- upcall_pid);
- } else {
+ if (error) {
VLOG_WARN_RL(&error_rl,
"%s: failed to set upcall pid on port: %s",
dpif_name(&dpif->dpif), ovs_strerror(error));
}
}
- if (new_sock) {
- error = add_channel(dpif, vport.port_no, sock);
- if (error) {
- VLOG_INFO("%s: could not add channel for port %s",
- dpif_name(dpif_), vport.name);
- retval = error;
- goto error;
- }
- }
-
if (port_no < keep_channels_nbits) {
bitmap_set1(keep_channels, port_no);
}
+ free(upcall_pids);
continue;
error:
- nl_sock_destroy(sock);
+ free(upcall_pids);
+ vport_del_channels(dpif, vport.port_no);
}
nl_dump_done(&dump);
ofpbuf_uninit(&buf);
/* Discard any saved channels that we didn't reuse. */
for (i = 0; i < keep_channels_nbits; i++) {
if (!bitmap_is_set(keep_channels, i)) {
- nl_sock_destroy(dpif->channels[i].sock);
- dpif->channels[i].sock = NULL;
+ vport_del_channels(dpif, u32_to_odp(i));
}
}
free(keep_channels);
}
static int
-dpif_linux_recv_set__(struct dpif *dpif_, bool enable)
+dpif_linux_recv_set__(struct dpif_linux *dpif, bool enable)
+ OVS_REQ_WRLOCK(dpif->upcall_lock)
{
- struct dpif_linux *dpif = dpif_linux_cast(dpif_);
-
- if ((dpif->epoll_fd >= 0) == enable) {
+ if ((dpif->handlers != NULL) == enable) {
return 0;
} else if (!enable) {
- destroy_channels(dpif);
+ destroy_all_channels(dpif);
return 0;
} else {
- return dpif_linux_refresh_channels(dpif_);
+ return dpif_linux_refresh_channels(dpif, 1);
}
}
struct dpif_linux *dpif = dpif_linux_cast(dpif_);
int error;
- ovs_mutex_lock(&dpif->upcall_lock);
- error = dpif_linux_recv_set__(dpif_, enable);
- ovs_mutex_unlock(&dpif->upcall_lock);
+ fat_rwlock_wrlock(&dpif->upcall_lock);
+ error = dpif_linux_recv_set__(dpif, enable);
+ fat_rwlock_unlock(&dpif->upcall_lock);
return error;
}
static int
-dpif_linux_handlers_set(struct dpif *dpif_ OVS_UNUSED,
- uint32_t n_handlers OVS_UNUSED)
+dpif_linux_handlers_set(struct dpif *dpif_, uint32_t n_handlers)
{
- return 0;
+ struct dpif_linux *dpif = dpif_linux_cast(dpif_);
+ int error = 0;
+
+ fat_rwlock_wrlock(&dpif->upcall_lock);
+ if (dpif->handlers) {
+ error = dpif_linux_refresh_channels(dpif, n_handlers);
+ }
+ fat_rwlock_unlock(&dpif->upcall_lock);
+
+ return error;
}
static int
}
static int
-dpif_linux_recv__(struct dpif *dpif_, struct dpif_upcall *upcall,
- struct ofpbuf *buf)
+dpif_linux_recv__(struct dpif_linux *dpif, uint32_t handler_id,
+ struct dpif_upcall *upcall, struct ofpbuf *buf)
+ OVS_REQ_RDLOCK(dpif->upcall_lock)
{
- struct dpif_linux *dpif = dpif_linux_cast(dpif_);
+ struct dpif_handler *handler;
int read_tries = 0;
- if (dpif->epoll_fd < 0) {
- return EAGAIN;
+ if (!dpif->handlers || handler_id >= dpif->n_handlers) {
+ return EAGAIN;
}
- if (dpif->event_offset >= dpif->n_events) {
+ handler = &dpif->handlers[handler_id];
+ if (handler->event_offset >= handler->n_events) {
int retval;
- dpif->event_offset = dpif->n_events = 0;
+ handler->event_offset = handler->n_events = 0;
do {
- retval = epoll_wait(dpif->epoll_fd, dpif->epoll_events,
+ retval = epoll_wait(handler->epoll_fd, handler->epoll_events,
dpif->uc_array_size, 0);
} while (retval < 0 && errno == EINTR);
if (retval < 0) {
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
VLOG_WARN_RL(&rl, "epoll_wait failed (%s)", ovs_strerror(errno));
} else if (retval > 0) {
- dpif->n_events = retval;
+ handler->n_events = retval;
}
}
- while (dpif->event_offset < dpif->n_events) {
- int idx = dpif->epoll_events[dpif->event_offset].data.u32;
- struct dpif_channel *ch = &dpif->channels[idx];
+ while (handler->event_offset < handler->n_events) {
+ int idx = handler->epoll_events[handler->event_offset].data.u32;
+ struct dpif_channel *ch = &dpif->handlers[handler_id].channels[idx];
- dpif->event_offset++;
+ handler->event_offset++;
for (;;) {
int dp_ifindex;
* packets that the buffer overflowed. Try again
* immediately because there's almost certainly a packet
* waiting for us. */
- report_loss(dpif_, ch);
+ report_loss(dpif, ch, idx, handler_id);
continue;
}
}
static int
-dpif_linux_recv(struct dpif *dpif_, uint32_t handler_id OVS_UNUSED,
+dpif_linux_recv(struct dpif *dpif_, uint32_t handler_id,
struct dpif_upcall *upcall, struct ofpbuf *buf)
{
struct dpif_linux *dpif = dpif_linux_cast(dpif_);
int error;
- ovs_mutex_lock(&dpif->upcall_lock);
- error = dpif_linux_recv__(dpif_, upcall, buf);
- ovs_mutex_unlock(&dpif->upcall_lock);
+ fat_rwlock_rdlock(&dpif->upcall_lock);
+ error = dpif_linux_recv__(dpif, handler_id, upcall, buf);
+ fat_rwlock_unlock(&dpif->upcall_lock);
return error;
}
static void
-dpif_linux_recv_wait(struct dpif *dpif_, uint32_t handler_id OVS_UNUSED)
+dpif_linux_recv_wait__(struct dpif_linux *dpif, uint32_t handler_id)
+ OVS_REQ_RDLOCK(dpif->upcall_lock)
{
- struct dpif_linux *dpif = dpif_linux_cast(dpif_);
+ if (dpif->handlers && handler_id < dpif->n_handlers) {
+ struct dpif_handler *handler = &dpif->handlers[handler_id];
- ovs_mutex_lock(&dpif->upcall_lock);
- if (dpif->epoll_fd >= 0) {
- poll_fd_wait(dpif->epoll_fd, POLLIN);
+ poll_fd_wait(handler->epoll_fd, POLLIN);
}
- ovs_mutex_unlock(&dpif->upcall_lock);
}
static void
-dpif_linux_recv_purge(struct dpif *dpif_)
+dpif_linux_recv_wait(struct dpif *dpif_, uint32_t handler_id)
{
struct dpif_linux *dpif = dpif_linux_cast(dpif_);
- ovs_mutex_lock(&dpif->upcall_lock);
- if (dpif->epoll_fd >= 0) {
- struct dpif_channel *ch;
+ fat_rwlock_rdlock(&dpif->upcall_lock);
+ dpif_linux_recv_wait__(dpif, handler_id);
+ fat_rwlock_unlock(&dpif->upcall_lock);
+}
+
+static void
+dpif_linux_recv_purge__(struct dpif_linux *dpif)
+ OVS_REQ_WRLOCK(dpif->upcall_lock)
+{
+ if (dpif->handlers) {
+ size_t i, j;
- for (ch = dpif->channels; ch < &dpif->channels[dpif->uc_array_size];
- ch++) {
- if (ch->sock) {
- nl_sock_drain(ch->sock);
+ for (i = 0; i < dpif->uc_array_size; i++ ) {
+ if (!dpif->handlers[0].channels[i].sock) {
+ continue;
+ }
+
+ for (j = 0; j < dpif->n_handlers; j++) {
+ nl_sock_drain(dpif->handlers[j].channels[i].sock);
}
}
}
- ovs_mutex_unlock(&dpif->upcall_lock);
+}
+
+static void
+dpif_linux_recv_purge(struct dpif *dpif_)
+{
+ struct dpif_linux *dpif = dpif_linux_cast(dpif_);
+
+ fat_rwlock_wrlock(&dpif->upcall_lock);
+ dpif_linux_recv_purge__(dpif);
+ fat_rwlock_unlock(&dpif->upcall_lock);
}
const struct dpif_class dpif_linux_class = {
[OVS_VPORT_ATTR_PORT_NO] = { .type = NL_A_U32 },
[OVS_VPORT_ATTR_TYPE] = { .type = NL_A_U32 },
[OVS_VPORT_ATTR_NAME] = { .type = NL_A_STRING, .max_len = IFNAMSIZ },
- [OVS_VPORT_ATTR_UPCALL_PID] = { .type = NL_A_U32 },
+ [OVS_VPORT_ATTR_UPCALL_PID] = { .type = NL_A_UNSPEC },
[OVS_VPORT_ATTR_STATS] = { NL_POLICY_FOR(struct ovs_vport_stats),
.optional = true },
[OVS_VPORT_ATTR_OPTIONS] = { .type = NL_A_NESTED, .optional = true },
vport->type = nl_attr_get_u32(a[OVS_VPORT_ATTR_TYPE]);
vport->name = nl_attr_get_string(a[OVS_VPORT_ATTR_NAME]);
if (a[OVS_VPORT_ATTR_UPCALL_PID]) {
- vport->upcall_pid = nl_attr_get(a[OVS_VPORT_ATTR_UPCALL_PID]);
+ vport->n_upcall_pids = nl_attr_get_size(a[OVS_VPORT_ATTR_UPCALL_PID])
+ / (sizeof *vport->upcall_pids);
+ vport->upcall_pids = nl_attr_get(a[OVS_VPORT_ATTR_UPCALL_PID]);
+
}
if (a[OVS_VPORT_ATTR_STATS]) {
vport->stats = nl_attr_get(a[OVS_VPORT_ATTR_STATS]);
nl_msg_put_string(buf, OVS_VPORT_ATTR_NAME, vport->name);
}
- if (vport->upcall_pid) {
- nl_msg_put_u32(buf, OVS_VPORT_ATTR_UPCALL_PID, *vport->upcall_pid);
+ if (vport->upcall_pids) {
+ nl_msg_put_unspec(buf, OVS_VPORT_ATTR_UPCALL_PID,
+ vport->upcall_pids,
+ vport->n_upcall_pids * sizeof *vport->upcall_pids);
}
if (vport->stats) {
/* Logs information about a packet that was recently lost in 'ch' (in
* 'dpif_'). */
static void
-report_loss(struct dpif *dpif_, struct dpif_channel *ch)
+report_loss(struct dpif_linux *dpif, struct dpif_channel *ch, uint32_t ch_idx,
+ uint32_t handler_id)
{
- struct dpif_linux *dpif = dpif_linux_cast(dpif_);
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5);
struct ds s;
time_msec() - ch->last_poll);
}
- VLOG_WARN("%s: lost packet on channel %"PRIdPTR"%s",
- dpif_name(dpif_), ch - dpif->channels, ds_cstr(&s));
+ VLOG_WARN("%s: lost packet on port channel %u of handler %u%s",
+ dpif_name(&dpif->dpif), ch_idx, handler_id, ds_cstr(&s));
ds_destroy(&s);
}
* 32-bit boundaries, so use get_unaligned_u64() to access its values.
*/
const char *name; /* OVS_VPORT_ATTR_NAME. */
- const uint32_t *upcall_pid; /* OVS_VPORT_ATTR_UPCALL_PID. */
+ uint32_t n_upcall_pids;
+ const uint32_t *upcall_pids; /* OVS_VPORT_ATTR_UPCALL_PID. */
const struct ovs_vport_stats *stats; /* OVS_VPORT_ATTR_STATS. */
const struct nlattr *options; /* OVS_VPORT_ATTR_OPTIONS. */
size_t options_len;
pthread_t thread;
int id;
atomic_uint change_seq;
- char *name;
};
/* Interface to netdev-based datapath. */
bool create, struct dpif **);
static int dp_netdev_output_userspace(struct dp_netdev *dp, struct ofpbuf *,
int queue_no, int type,
- const struct flow *,
+ const struct miniflow *,
const struct nlattr *userdata);
static void dp_netdev_execute_actions(struct dp_netdev *dp,
- const struct flow *, struct ofpbuf *, bool may_steal,
+ const struct miniflow *,
+ struct ofpbuf *, bool may_steal,
struct pkt_metadata *,
const struct nlattr *actions,
size_t actions_len);
}
static struct dp_netdev_flow *
-dp_netdev_lookup_flow(const struct dp_netdev *dp, const struct flow *flow)
+dp_netdev_lookup_flow(const struct dp_netdev *dp, const struct miniflow *key)
OVS_EXCLUDED(dp->cls.rwlock)
{
struct dp_netdev_flow *netdev_flow;
+ struct cls_rule *rule;
fat_rwlock_rdlock(&dp->cls.rwlock);
- netdev_flow = dp_netdev_flow_cast(classifier_lookup(&dp->cls, flow, NULL));
+ rule = classifier_lookup_miniflow_first(&dp->cls, key);
+ netdev_flow = dp_netdev_flow_cast(rule);
fat_rwlock_unlock(&dp->cls.rwlock);
return netdev_flow;
struct dp_netdev *dp = get_dp_netdev(dpif);
struct dp_netdev_flow *netdev_flow;
struct flow flow;
+ struct miniflow miniflow;
struct flow_wildcards wc;
int error;
if (error) {
return error;
}
+ miniflow_init(&miniflow, &flow);
ovs_mutex_lock(&dp->flow_mutex);
- netdev_flow = dp_netdev_lookup_flow(dp, &flow);
+ netdev_flow = dp_netdev_lookup_flow(dp, &miniflow);
if (!netdev_flow) {
if (put->flags & DPIF_FP_CREATE) {
if (hmap_count(&dp->flow_table) < MAX_FLOWS) {
struct dp_netdev_flow_state *state = state_;
struct dp_netdev *dp = get_dp_netdev(dpif);
struct dp_netdev_flow *netdev_flow;
+ struct flow_wildcards wc;
int error;
ovs_mutex_lock(&iter->mutex);
return error;
}
+ minimask_expand(&netdev_flow->cr.match.mask, &wc);
+
if (key) {
struct ofpbuf buf;
ofpbuf_use_stack(&buf, &state->keybuf, sizeof state->keybuf);
- odp_flow_key_from_flow(&buf, &netdev_flow->flow,
+ odp_flow_key_from_flow(&buf, &netdev_flow->flow, &wc.masks,
netdev_flow->flow.in_port.odp_port);
*key = ofpbuf_data(&buf);
if (key && mask) {
struct ofpbuf buf;
- struct flow_wildcards wc;
ofpbuf_use_stack(&buf, &state->maskbuf, sizeof state->maskbuf);
- minimask_expand(&netdev_flow->cr.match.mask, &wc);
odp_flow_key_from_mask(&buf, &wc.masks, &netdev_flow->flow,
odp_to_u32(wc.masks.in_port.odp_port),
SIZE_MAX);
{
struct dp_netdev *dp = get_dp_netdev(dpif);
struct pkt_metadata *md = &execute->md;
- struct flow key;
+ struct {
+ struct miniflow flow;
+ uint32_t buf[FLOW_U32S];
+ } key;
if (ofpbuf_size(execute->packet) < ETH_HEADER_LEN ||
ofpbuf_size(execute->packet) > UINT16_MAX) {
}
/* Extract flow key. */
- flow_extract(execute->packet, md, &key);
+ miniflow_initialize(&key.flow, key.buf);
+ miniflow_extract(execute->packet, md, &key.flow);
ovs_rwlock_rdlock(&dp->port_rwlock);
- dp_netdev_execute_actions(dp, &key, execute->packet, false, md,
+ dp_netdev_execute_actions(dp, &key.flow, execute->packet, false, md,
execute->actions, execute->actions_len);
ovs_rwlock_unlock(&dp->port_rwlock);
int poll_cnt;
int i;
- f->name = xasprintf("pmd_%u", ovsthread_id_self());
- set_subprogram_name("%s", f->name);
poll_cnt = 0;
poll_list = NULL;
}
free(poll_list);
- free(f->name);
return NULL;
}
/* Each thread will distribute all devices rx-queues among
* themselves. */
- xpthread_create(&f->thread, NULL, pmd_thread_main, f);
+ f->thread = ovs_thread_create("pmd", pmd_thread_main, f);
}
}
static void
dp_netdev_flow_used(struct dp_netdev_flow *netdev_flow,
const struct ofpbuf *packet,
- const struct flow *key)
+ const struct miniflow *key)
{
- uint16_t tcp_flags = ntohs(key->tcp_flags);
+ uint16_t tcp_flags = miniflow_get_tcp_flags(key);
long long int now = time_msec();
struct dp_netdev_flow_stats *bucket;
OVS_REQ_RDLOCK(dp->port_rwlock)
{
struct dp_netdev_flow *netdev_flow;
- struct flow key;
+ struct {
+ struct miniflow flow;
+ uint32_t buf[FLOW_U32S];
+ } key;
if (ofpbuf_size(packet) < ETH_HEADER_LEN) {
ofpbuf_delete(packet);
return;
}
- flow_extract(packet, md, &key);
- netdev_flow = dp_netdev_lookup_flow(dp, &key);
+ miniflow_initialize(&key.flow, key.buf);
+ miniflow_extract(packet, md, &key.flow);
+
+ netdev_flow = dp_netdev_lookup_flow(dp, &key.flow);
if (netdev_flow) {
struct dp_netdev_actions *actions;
- dp_netdev_flow_used(netdev_flow, packet, &key);
+ dp_netdev_flow_used(netdev_flow, packet, &key.flow);
actions = dp_netdev_flow_get_actions(netdev_flow);
- dp_netdev_execute_actions(dp, &key, packet, true, md,
+ dp_netdev_execute_actions(dp, &key.flow, packet, true, md,
actions->actions, actions->size);
dp_netdev_count_packet(dp, DP_STAT_HIT);
} else if (dp->handler_queues) {
dp_netdev_count_packet(dp, DP_STAT_MISS);
dp_netdev_output_userspace(dp, packet,
- flow_hash_5tuple(&key, 0) % dp->n_handlers,
- DPIF_UC_MISS, &key, NULL);
+ miniflow_hash_5tuple(&key.flow, 0)
+ % dp->n_handlers,
+ DPIF_UC_MISS, &key.flow, NULL);
ofpbuf_delete(packet);
}
}
static int
dp_netdev_output_userspace(struct dp_netdev *dp, struct ofpbuf *packet,
- int queue_no, int type, const struct flow *flow,
+ int queue_no, int type, const struct miniflow *key,
const struct nlattr *userdata)
{
struct dp_netdev_queue *q;
struct dpif_upcall *upcall = &u->upcall;
struct ofpbuf *buf = &u->buf;
size_t buf_size;
+ struct flow flow;
upcall->type = type;
ofpbuf_init(buf, buf_size);
/* Put ODP flow. */
- odp_flow_key_from_flow(buf, flow, flow->in_port.odp_port);
+ miniflow_expand(key, &flow);
+ odp_flow_key_from_flow(buf, &flow, NULL, flow.in_port.odp_port);
upcall->key = ofpbuf_data(buf);
upcall->key_len = ofpbuf_size(buf);
struct dp_netdev_execute_aux {
struct dp_netdev *dp;
- const struct flow *key;
+ const struct miniflow *key;
};
static void
userdata = nl_attr_find_nested(a, OVS_USERSPACE_ATTR_USERDATA);
dp_netdev_output_userspace(aux->dp, packet,
- flow_hash_5tuple(aux->key, 0)
+ miniflow_hash_5tuple(aux->key, 0)
% aux->dp->n_handlers,
DPIF_UC_ACTION, aux->key,
userdata);
break;
}
+ case OVS_ACTION_ATTR_HASH: {
+ const struct ovs_action_hash *hash_act;
+ uint32_t hash;
+
+ hash_act = nl_attr_get(a);
+ if (hash_act->hash_alg == OVS_HASH_ALG_L4) {
+ /* Hash need not be symmetric, nor does it need to include
+ * L2 fields. */
+ hash = miniflow_hash_5tuple(aux->key, hash_act->hash_basis);
+ if (!hash) {
+ hash = 1; /* 0 is not valid */
+ }
+
+ } else {
+ VLOG_WARN("Unknown hash algorithm specified for the hash action.");
+ hash = 2;
+ }
+
+ md->dp_hash = hash;
+ break;
+ }
+
case OVS_ACTION_ATTR_RECIRC:
if (*depth < MAX_RECIRC_DEPTH) {
struct pkt_metadata recirc_md = *md;
struct ofpbuf *recirc_packet;
- const struct ovs_action_recirc *act;
recirc_packet = may_steal ? packet : ofpbuf_clone(packet);
-
- act = nl_attr_get(a);
- recirc_md.recirc_id = act->recirc_id;
- recirc_md.dp_hash = 0;
-
- if (act->hash_alg == OVS_RECIRC_HASH_ALG_L4) {
- recirc_md.dp_hash = flow_hash_symmetric_l4(aux->key,
- act->hash_bias);
- if (!recirc_md.dp_hash) {
- recirc_md.dp_hash = 1; /* 0 is not valid */
- }
- }
+ recirc_md.recirc_id = nl_attr_get_u32(a);
(*depth)++;
dp_netdev_input(aux->dp, recirc_packet, &recirc_md);
}
static void
-dp_netdev_execute_actions(struct dp_netdev *dp, const struct flow *key,
+dp_netdev_execute_actions(struct dp_netdev *dp, const struct miniflow *key,
struct ofpbuf *packet, bool may_steal,
struct pkt_metadata *md,
const struct nlattr *actions, size_t actions_len)
case OVS_ACTION_ATTR_SAMPLE:
case OVS_ACTION_ATTR_UNSPEC:
case OVS_ACTION_ATTR_RECIRC:
+ case OVS_ACTION_ATTR_HASH:
case __OVS_ACTION_ATTR_MAX:
OVS_NOT_REACHED();
}
FLOW_U32S
};
-static struct arp_eth_header *
-pull_arp(struct ofpbuf *packet)
-{
- return ofpbuf_try_pull(packet, ARP_ETH_HEADER_LEN);
-}
-
-static struct ip_header *
-pull_ip(struct ofpbuf *packet)
-{
- if (ofpbuf_size(packet) >= IP_HEADER_LEN) {
- struct ip_header *ip = ofpbuf_data(packet);
- int ip_len = IP_IHL(ip->ip_ihl_ver) * 4;
- if (ip_len >= IP_HEADER_LEN && ofpbuf_size(packet) >= ip_len) {
- return ofpbuf_pull(packet, ip_len);
- }
- }
- return NULL;
-}
-
-static struct icmp_header *
-pull_icmp(struct ofpbuf *packet)
-{
- return ofpbuf_try_pull(packet, ICMP_HEADER_LEN);
-}
-
-static struct icmp6_hdr *
-pull_icmpv6(struct ofpbuf *packet)
-{
- return ofpbuf_try_pull(packet, sizeof(struct icmp6_hdr));
-}
-
-static void
-parse_mpls(struct ofpbuf *b, struct flow *flow)
-{
- struct mpls_hdr *mh;
- int idx = 0;
+/* miniflow_extract() assumes the following to be true to optimize the
+ * extraction process. */
+BUILD_ASSERT_DECL(offsetof(struct flow, dl_type) + 2
+ == offsetof(struct flow, vlan_tci) &&
+ offsetof(struct flow, dl_type) / 4
                  == offsetof(struct flow, vlan_tci) / 4);
+
+BUILD_ASSERT_DECL(offsetof(struct flow, nw_frag) + 3
+ == offsetof(struct flow, nw_proto) &&
+ offsetof(struct flow, nw_tos) + 2
+ == offsetof(struct flow, nw_proto) &&
+ offsetof(struct flow, nw_ttl) + 1
+ == offsetof(struct flow, nw_proto) &&
+ offsetof(struct flow, nw_frag) / 4
+ == offsetof(struct flow, nw_tos) / 4 &&
+ offsetof(struct flow, nw_ttl) / 4
+ == offsetof(struct flow, nw_tos) / 4 &&
+ offsetof(struct flow, nw_proto) / 4
+ == offsetof(struct flow, nw_tos) / 4);
+
+/* TCP flags in the first half of a BE32, zeroes in the other half. */
+BUILD_ASSERT_DECL(offsetof(struct flow, tcp_flags) + 2
+ == offsetof(struct flow, pad) &&
+ offsetof(struct flow, tcp_flags) / 4
+ == offsetof(struct flow, pad) / 4);
+#if WORDS_BIGENDIAN
+#define TCP_FLAGS_BE32(tcp_ctl) ((OVS_FORCE ovs_be32)TCP_FLAGS_BE16(tcp_ctl) \
+ << 16)
+#else
+#define TCP_FLAGS_BE32(tcp_ctl) ((OVS_FORCE ovs_be32)TCP_FLAGS_BE16(tcp_ctl))
+#endif
+
+BUILD_ASSERT_DECL(offsetof(struct flow, tp_src) + 2
+ == offsetof(struct flow, tp_dst) &&
+ offsetof(struct flow, tp_src) / 4
+ == offsetof(struct flow, tp_dst) / 4);
+
+/* Removes 'size' bytes from the head end of '*datap', of size '*sizep', which
+ * must contain at least 'size' bytes of data. Returns the first byte of data
+ * removed. */
+static inline const void *
+data_pull(void **datap, size_t *sizep, size_t size)
+{
+ char *data = (char *)*datap;
+ *datap = data + size;
+ *sizep -= size;
+ return data;
+}
+
+/* If '*datap' has at least 'size' bytes of data, removes that many bytes from
+ * the head end of '*datap' and returns the first byte removed. Otherwise,
+ * returns a null pointer without modifying '*datap'. */
+static inline const void *
+data_try_pull(void **datap, size_t *sizep, size_t size)
+{
+ return OVS_LIKELY(*sizep >= size) ? data_pull(datap, sizep, size) : NULL;
+}
+
+/* Context for pushing data to a miniflow. */
+struct mf_ctx {
+ uint64_t map;
+ uint32_t *data;
+ uint32_t * const end;
+};
- while ((mh = ofpbuf_try_pull(b, sizeof *mh))) {
- ovs_be32 mpls_lse = get_16aligned_be32(&mh->mpls_lse);
- if (idx < FLOW_MAX_MPLS_LABELS) {
- flow->mpls_lse[idx++] = mpls_lse;
- }
- if (mpls_lse & htonl(MPLS_BOS_MASK)) {
+/* The miniflow_push_* macros allow filling in miniflow data values in order.
+ * Assertions are needed only when the layout of the struct flow is modified.
+ * 'ofs' is a compile-time constant, which allows most of the code to be
+ * optimized away. Some GCC versions gave warnings on ALWAYS_INLINE, so these
+ * are defined as macros. */
+
+#if (FLOW_WC_SEQ != 26)
+#define MINIFLOW_ASSERT(X) ovs_assert(X)
+#else
+#define MINIFLOW_ASSERT(X)
+#endif
+
+#define miniflow_push_uint32_(MF, OFS, VALUE) \
+{ \
+ MINIFLOW_ASSERT(MF.data < MF.end && (OFS) % 4 == 0 \
+ && !(MF.map & (UINT64_MAX << (OFS) / 4))); \
+ *MF.data++ = VALUE; \
+ MF.map |= UINT64_C(1) << (OFS) / 4; \
+}
+
+#define miniflow_push_be32_(MF, OFS, VALUE) \
+ miniflow_push_uint32_(MF, OFS, (OVS_FORCE uint32_t)(VALUE))
+
+#define miniflow_push_uint16_(MF, OFS, VALUE) \
+{ \
+ MINIFLOW_ASSERT(MF.data < MF.end && \
+ (((OFS) % 4 == 0 && !(MF.map & (UINT64_MAX << (OFS) / 4))) \
+ || ((OFS) % 4 == 2 && MF.map & (UINT64_C(1) << (OFS) / 4) \
+ && !(MF.map & (UINT64_MAX << ((OFS) / 4 + 1)))))); \
+ \
+ if ((OFS) % 4 == 0) { \
+ *(uint16_t *)MF.data = VALUE; \
+ MF.map |= UINT64_C(1) << (OFS) / 4; \
+ } else if ((OFS) % 4 == 2) { \
+ *((uint16_t *)MF.data + 1) = VALUE; \
+ MF.data++; \
+ } \
+}
+
+#define miniflow_push_be16_(MF, OFS, VALUE) \
+ miniflow_push_uint16_(MF, OFS, (OVS_FORCE uint16_t)VALUE);
+
+/* Data at 'valuep' may be unaligned. */
+#define miniflow_push_words_(MF, OFS, VALUEP, N_WORDS) \
+{ \
+ int ofs32 = (OFS) / 4; \
+ \
+ MINIFLOW_ASSERT(MF.data + (N_WORDS) <= MF.end && (OFS) % 4 == 0 \
+ && !(MF.map & (UINT64_MAX << ofs32))); \
+ \
+ memcpy(MF.data, (VALUEP), (N_WORDS) * sizeof *MF.data); \
+ MF.data += (N_WORDS); \
+ MF.map |= ((UINT64_MAX >> (64 - (N_WORDS))) << ofs32); \
+}
+
+#define miniflow_push_uint32(MF, FIELD, VALUE) \
+ miniflow_push_uint32_(MF, offsetof(struct flow, FIELD), VALUE)
+
+#define miniflow_push_be32(MF, FIELD, VALUE) \
+ miniflow_push_be32_(MF, offsetof(struct flow, FIELD), VALUE)
+
+#define miniflow_push_uint32_check(MF, FIELD, VALUE) \
+ { if (OVS_LIKELY(VALUE)) { \
+ miniflow_push_uint32_(MF, offsetof(struct flow, FIELD), VALUE); \
+ } \
+ }
+
+#define miniflow_push_be32_check(MF, FIELD, VALUE) \
+ { if (OVS_LIKELY(VALUE)) { \
+ miniflow_push_be32_(MF, offsetof(struct flow, FIELD), VALUE); \
+ } \
+ }
+
+#define miniflow_push_uint16(MF, FIELD, VALUE) \
+ miniflow_push_uint16_(MF, offsetof(struct flow, FIELD), VALUE)
+
+#define miniflow_push_be16(MF, FIELD, VALUE) \
+ miniflow_push_be16_(MF, offsetof(struct flow, FIELD), VALUE)
+
+#define miniflow_push_words(MF, FIELD, VALUEP, N_WORDS) \
+ miniflow_push_words_(MF, offsetof(struct flow, FIELD), VALUEP, N_WORDS)
+
+/* Pulls the MPLS headers at '*datap' and returns the count of them, capped
+ * at FLOW_MAX_MPLS_LABELS. */
+static inline int
+parse_mpls(void **datap, size_t *sizep)
+{
+ const struct mpls_hdr *mh;
+ int count = 0;
+
+ while ((mh = data_try_pull(datap, sizep, sizeof *mh))) {
+ count++;
+ if (mh->mpls_lse.lo & htons(1 << MPLS_BOS_SHIFT)) {
break;
}
}
+ return MIN(count, FLOW_MAX_MPLS_LABELS);
}
-static void
-parse_vlan(struct ofpbuf *b, struct flow *flow)
+static inline ovs_be16
+parse_vlan(void **datap, size_t *sizep)
{
+ const struct eth_header *eth = *datap;
+
struct qtag_prefix {
ovs_be16 eth_type; /* ETH_TYPE_VLAN */
ovs_be16 tci;
};
- if (ofpbuf_size(b) >= sizeof(struct qtag_prefix) + sizeof(ovs_be16)) {
- struct qtag_prefix *qp = ofpbuf_pull(b, sizeof *qp);
- flow->vlan_tci = qp->tci | htons(VLAN_CFI);
+ data_pull(datap, sizep, ETH_ADDR_LEN * 2);
+
+ if (eth->eth_type == htons(ETH_TYPE_VLAN)) {
+ if (OVS_LIKELY(*sizep
+ >= sizeof(struct qtag_prefix) + sizeof(ovs_be16))) {
+ const struct qtag_prefix *qp = data_pull(datap, sizep, sizeof *qp);
+ return qp->tci | htons(VLAN_CFI);
+ }
}
+ return 0;
}
-static ovs_be16
-parse_ethertype(struct ofpbuf *b)
+static inline ovs_be16
+parse_ethertype(void **datap, size_t *sizep)
{
- struct llc_snap_header *llc;
+ const struct llc_snap_header *llc;
ovs_be16 proto;
- proto = *(ovs_be16 *) ofpbuf_pull(b, sizeof proto);
- if (ntohs(proto) >= ETH_TYPE_MIN) {
+ proto = *(ovs_be16 *) data_pull(datap, sizep, sizeof proto);
+ if (OVS_LIKELY(ntohs(proto) >= ETH_TYPE_MIN)) {
return proto;
}
- if (ofpbuf_size(b) < sizeof *llc) {
+ if (OVS_UNLIKELY(*sizep < sizeof *llc)) {
return htons(FLOW_DL_TYPE_NONE);
}
- llc = ofpbuf_data(b);
- if (llc->llc.llc_dsap != LLC_DSAP_SNAP
- || llc->llc.llc_ssap != LLC_SSAP_SNAP
- || llc->llc.llc_cntl != LLC_CNTL_SNAP
- || memcmp(llc->snap.snap_org, SNAP_ORG_ETHERNET,
- sizeof llc->snap.snap_org)) {
+ llc = *datap;
+ if (OVS_UNLIKELY(llc->llc.llc_dsap != LLC_DSAP_SNAP
+ || llc->llc.llc_ssap != LLC_SSAP_SNAP
+ || llc->llc.llc_cntl != LLC_CNTL_SNAP
+ || memcmp(llc->snap.snap_org, SNAP_ORG_ETHERNET,
+ sizeof llc->snap.snap_org))) {
return htons(FLOW_DL_TYPE_NONE);
}
- ofpbuf_pull(b, sizeof *llc);
+ data_pull(datap, sizep, sizeof *llc);
- if (ntohs(llc->snap.snap_type) >= ETH_TYPE_MIN) {
+ if (OVS_LIKELY(ntohs(llc->snap.snap_type) >= ETH_TYPE_MIN)) {
return llc->snap.snap_type;
}
return htons(FLOW_DL_TYPE_NONE);
}
-static int
-parse_ipv6(struct ofpbuf *packet, struct flow *flow)
+static inline bool
+parse_icmpv6(void **datap, size_t *sizep, const struct icmp6_hdr *icmp,
+ const struct in6_addr **nd_target,
+ uint8_t arp_buf[2][ETH_ADDR_LEN])
{
- const struct ovs_16aligned_ip6_hdr *nh;
- ovs_be32 tc_flow;
- int nexthdr;
-
- nh = ofpbuf_try_pull(packet, sizeof *nh);
- if (!nh) {
- return EINVAL;
- }
-
- nexthdr = nh->ip6_nxt;
-
- memcpy(&flow->ipv6_src, &nh->ip6_src, sizeof flow->ipv6_src);
- memcpy(&flow->ipv6_dst, &nh->ip6_dst, sizeof flow->ipv6_dst);
-
- tc_flow = get_16aligned_be32(&nh->ip6_flow);
- flow->nw_tos = ntohl(tc_flow) >> 20;
- flow->ipv6_label = tc_flow & htonl(IPV6_LABEL_MASK);
- flow->nw_ttl = nh->ip6_hlim;
- flow->nw_proto = IPPROTO_NONE;
-
- while (1) {
- if ((nexthdr != IPPROTO_HOPOPTS)
- && (nexthdr != IPPROTO_ROUTING)
- && (nexthdr != IPPROTO_DSTOPTS)
- && (nexthdr != IPPROTO_AH)
- && (nexthdr != IPPROTO_FRAGMENT)) {
- /* It's either a terminal header (e.g., TCP, UDP) or one we
- * don't understand. In either case, we're done with the
- * packet, so use it to fill in 'nw_proto'. */
- break;
- }
-
- /* We only verify that at least 8 bytes of the next header are
- * available, but many of these headers are longer. Ensure that
- * accesses within the extension header are within those first 8
- * bytes. All extension headers are required to be at least 8
- * bytes. */
- if (ofpbuf_size(packet) < 8) {
- return EINVAL;
- }
-
- if ((nexthdr == IPPROTO_HOPOPTS)
- || (nexthdr == IPPROTO_ROUTING)
- || (nexthdr == IPPROTO_DSTOPTS)) {
- /* These headers, while different, have the fields we care about
- * in the same location and with the same interpretation. */
- const struct ip6_ext *ext_hdr = ofpbuf_data(packet);
- nexthdr = ext_hdr->ip6e_nxt;
- if (!ofpbuf_try_pull(packet, (ext_hdr->ip6e_len + 1) * 8)) {
- return EINVAL;
- }
- } else if (nexthdr == IPPROTO_AH) {
- /* A standard AH definition isn't available, but the fields
- * we care about are in the same location as the generic
- * option header--only the header length is calculated
- * differently. */
- const struct ip6_ext *ext_hdr = ofpbuf_data(packet);
- nexthdr = ext_hdr->ip6e_nxt;
- if (!ofpbuf_try_pull(packet, (ext_hdr->ip6e_len + 2) * 4)) {
- return EINVAL;
- }
- } else if (nexthdr == IPPROTO_FRAGMENT) {
- const struct ovs_16aligned_ip6_frag *frag_hdr = ofpbuf_data(packet);
-
- nexthdr = frag_hdr->ip6f_nxt;
- if (!ofpbuf_try_pull(packet, sizeof *frag_hdr)) {
- return EINVAL;
- }
-
- /* We only process the first fragment. */
- if (frag_hdr->ip6f_offlg != htons(0)) {
- flow->nw_frag = FLOW_NW_FRAG_ANY;
- if ((frag_hdr->ip6f_offlg & IP6F_OFF_MASK) != htons(0)) {
- flow->nw_frag |= FLOW_NW_FRAG_LATER;
- nexthdr = IPPROTO_FRAGMENT;
- break;
- }
- }
- }
- }
-
- flow->nw_proto = nexthdr;
- return 0;
-}
-
-static void
-parse_tcp(struct ofpbuf *b, struct flow *flow)
-{
- if (ofpbuf_size(b) >= TCP_HEADER_LEN) {
- const struct tcp_header *tcp = ofpbuf_data(b);
-
- flow->tp_src = tcp->tcp_src;
- flow->tp_dst = tcp->tcp_dst;
- flow->tcp_flags = tcp->tcp_ctl & htons(0x0fff);
- }
-}
-
-static void
-parse_udp(struct ofpbuf *b, struct flow *flow)
-{
- if (ofpbuf_size(b) >= UDP_HEADER_LEN) {
- const struct udp_header *udp = ofpbuf_data(b);
-
- flow->tp_src = udp->udp_src;
- flow->tp_dst = udp->udp_dst;
- }
-}
-
-static void
-parse_sctp(struct ofpbuf *b, struct flow *flow)
-{
- if (ofpbuf_size(b) >= SCTP_HEADER_LEN) {
- const struct sctp_header *sctp = ofpbuf_data(b);
-
- flow->tp_src = sctp->sctp_src;
- flow->tp_dst = sctp->sctp_dst;
- }
-}
-
-static void
-parse_icmpv6(struct ofpbuf *b, struct flow *flow)
-{
- const struct icmp6_hdr *icmp = pull_icmpv6(b);
-
- if (!icmp) {
- return;
- }
-
- /* The ICMPv6 type and code fields use the 16-bit transport port
- * fields, so we need to store them in 16-bit network byte order. */
- flow->tp_src = htons(icmp->icmp6_type);
- flow->tp_dst = htons(icmp->icmp6_code);
-
if (icmp->icmp6_code == 0 &&
(icmp->icmp6_type == ND_NEIGHBOR_SOLICIT ||
icmp->icmp6_type == ND_NEIGHBOR_ADVERT)) {
- const struct in6_addr *nd_target;
- nd_target = ofpbuf_try_pull(b, sizeof *nd_target);
- if (!nd_target) {
- return;
+ *nd_target = data_try_pull(datap, sizep, sizeof *nd_target);
+ if (OVS_UNLIKELY(!*nd_target)) {
+ return false;
}
- flow->nd_target = *nd_target;
- while (ofpbuf_size(b) >= 8) {
+ while (*sizep >= 8) {
/* The minimum size of an option is 8 bytes, which also is
* the size of Ethernet link-layer options. */
- const struct nd_opt_hdr *nd_opt = ofpbuf_data(b);
+ const struct nd_opt_hdr *nd_opt = *datap;
int opt_len = nd_opt->nd_opt_len * 8;
- if (!opt_len || opt_len > ofpbuf_size(b)) {
+ if (!opt_len || opt_len > *sizep) {
goto invalid;
}
* layer option is specified twice. */
if (nd_opt->nd_opt_type == ND_OPT_SOURCE_LINKADDR
&& opt_len == 8) {
- if (eth_addr_is_zero(flow->arp_sha)) {
- memcpy(flow->arp_sha, nd_opt + 1, ETH_ADDR_LEN);
+ if (OVS_LIKELY(eth_addr_is_zero(arp_buf[0]))) {
+ memcpy(arp_buf[0], nd_opt + 1, ETH_ADDR_LEN);
} else {
goto invalid;
}
} else if (nd_opt->nd_opt_type == ND_OPT_TARGET_LINKADDR
&& opt_len == 8) {
- if (eth_addr_is_zero(flow->arp_tha)) {
- memcpy(flow->arp_tha, nd_opt + 1, ETH_ADDR_LEN);
+ if (OVS_LIKELY(eth_addr_is_zero(arp_buf[1]))) {
+ memcpy(arp_buf[1], nd_opt + 1, ETH_ADDR_LEN);
} else {
goto invalid;
}
}
- if (!ofpbuf_try_pull(b, opt_len)) {
+ if (OVS_UNLIKELY(!data_try_pull(datap, sizep, opt_len))) {
goto invalid;
}
}
}
- return;
+ return true;
invalid:
- memset(&flow->nd_target, 0, sizeof(flow->nd_target));
- memset(flow->arp_sha, 0, sizeof(flow->arp_sha));
- memset(flow->arp_tha, 0, sizeof(flow->arp_tha));
-
- return;
+ return false;
}
/* Initializes 'flow' members from 'packet' and 'md'
flow_extract(struct ofpbuf *packet, const struct pkt_metadata *md,
struct flow *flow)
{
- struct ofpbuf b = *packet;
- struct eth_header *eth;
+ struct {
+ struct miniflow mf;
+ uint32_t buf[FLOW_U32S];
+ } m;
COVERAGE_INC(flow_extract);
- memset(flow, 0, sizeof *flow);
+ miniflow_initialize(&m.mf, m.buf);
+ miniflow_extract(packet, md, &m.mf);
+ miniflow_expand(&m.mf, flow);
+}
+/* Caller is responsible for initializing 'dst' with enough storage for
+ * FLOW_U32S * 4 bytes. */
+void
+miniflow_extract(struct ofpbuf *packet, const struct pkt_metadata *md,
+ struct miniflow *dst)
+{
+ void *data = ofpbuf_data(packet);
+ size_t size = ofpbuf_size(packet);
+ uint32_t *values = miniflow_values(dst);
+ struct mf_ctx mf = { 0, values, values + FLOW_U32S };
+ char *l2;
+ ovs_be16 dl_type;
+ uint8_t nw_frag, nw_tos, nw_ttl, nw_proto;
+
+ /* Metadata. */
if (md) {
- flow->tunnel = md->tunnel;
- flow->in_port = md->in_port;
- flow->skb_priority = md->skb_priority;
- flow->pkt_mark = md->pkt_mark;
- flow->recirc_id = md->recirc_id;
- flow->dp_hash = md->dp_hash;
+ if (md->tunnel.ip_dst) {
+ miniflow_push_words(mf, tunnel, &md->tunnel,
+ sizeof md->tunnel / 4);
+ }
+ miniflow_push_uint32_check(mf, skb_priority, md->skb_priority);
+ miniflow_push_uint32_check(mf, pkt_mark, md->pkt_mark);
+ miniflow_push_uint32_check(mf, recirc_id, md->recirc_id);
+ miniflow_push_uint32(mf, in_port, odp_to_u32(md->in_port.odp_port));
}
- ofpbuf_set_frame(packet, ofpbuf_data(packet));
+ /* Initialize packet's layer pointer and offsets. */
+ l2 = data;
+ ofpbuf_set_frame(packet, data);
- if (ofpbuf_size(&b) < sizeof *eth) {
- return;
- }
-
- /* Link layer. */
- eth = ofpbuf_data(&b);
- memcpy(flow->dl_src, eth->eth_src, ETH_ADDR_LEN);
- memcpy(flow->dl_dst, eth->eth_dst, ETH_ADDR_LEN);
+ /* Must have full Ethernet header to proceed. */
+ if (OVS_UNLIKELY(size < sizeof(struct eth_header))) {
+ goto out;
+ } else {
+ ovs_be16 vlan_tci;
- /* dl_type, vlan_tci. */
- ofpbuf_pull(&b, ETH_ADDR_LEN * 2);
- if (eth->eth_type == htons(ETH_TYPE_VLAN)) {
- parse_vlan(&b, flow);
+ /* Link layer. */
+ BUILD_ASSERT(offsetof(struct flow, dl_dst) + 6
+ == offsetof(struct flow, dl_src));
+ miniflow_push_words(mf, dl_dst, data, ETH_ADDR_LEN * 2 / 4);
+ /* dl_type, vlan_tci. */
+ vlan_tci = parse_vlan(&data, &size);
+ dl_type = parse_ethertype(&data, &size);
+ miniflow_push_be16(mf, dl_type, dl_type);
+ miniflow_push_be16(mf, vlan_tci, vlan_tci);
}
- flow->dl_type = parse_ethertype(&b);
- /* Parse mpls, copy l3 ttl. */
- if (eth_type_mpls(flow->dl_type)) {
- ofpbuf_set_l2_5(packet, ofpbuf_data(&b));
- parse_mpls(&b, flow);
+ /* Parse mpls. */
+ if (OVS_UNLIKELY(eth_type_mpls(dl_type))) {
+ int count;
+ const void *mpls = data;
+
+ packet->l2_5_ofs = (char *)data - l2;
+ count = parse_mpls(&data, &size);
+ miniflow_push_words(mf, mpls_lse, mpls, count);
}
/* Network layer. */
- ofpbuf_set_l3(packet, ofpbuf_data(&b));
- if (flow->dl_type == htons(ETH_TYPE_IP)) {
- const struct ip_header *nh = pull_ip(&b);
- if (nh) {
- ofpbuf_set_l4(packet, ofpbuf_data(&b));
-
- flow->nw_src = get_16aligned_be32(&nh->ip_src);
- flow->nw_dst = get_16aligned_be32(&nh->ip_dst);
- flow->nw_proto = nh->ip_proto;
-
- flow->nw_tos = nh->ip_tos;
- if (IP_IS_FRAGMENT(nh->ip_frag_off)) {
- flow->nw_frag = FLOW_NW_FRAG_ANY;
- if (nh->ip_frag_off & htons(IP_FRAG_OFF_MASK)) {
- flow->nw_frag |= FLOW_NW_FRAG_LATER;
- }
+ packet->l3_ofs = (char *)data - l2;
+
+ nw_frag = 0;
+ if (OVS_LIKELY(dl_type == htons(ETH_TYPE_IP))) {
+ const struct ip_header *nh = data;
+ int ip_len;
+
+ if (OVS_UNLIKELY(size < IP_HEADER_LEN)) {
+ goto out;
+ }
+ ip_len = IP_IHL(nh->ip_ihl_ver) * 4;
+
+ if (OVS_UNLIKELY(ip_len < IP_HEADER_LEN)) {
+ goto out;
+ }
+
+ /* Push both source and destination address at once. */
+ miniflow_push_words(mf, nw_src, &nh->ip_src, 2);
+
+ nw_tos = nh->ip_tos;
+ nw_ttl = nh->ip_ttl;
+ nw_proto = nh->ip_proto;
+ if (OVS_UNLIKELY(IP_IS_FRAGMENT(nh->ip_frag_off))) {
+ nw_frag = FLOW_NW_FRAG_ANY;
+ if (nh->ip_frag_off & htons(IP_FRAG_OFF_MASK)) {
+ nw_frag |= FLOW_NW_FRAG_LATER;
+ }
+ }
+ if (OVS_UNLIKELY(size < ip_len)) {
+ goto out;
+ }
+ data_pull(&data, &size, ip_len);
+
+ } else if (dl_type == htons(ETH_TYPE_IPV6)) {
+ const struct ovs_16aligned_ip6_hdr *nh;
+ ovs_be32 tc_flow;
+
+ if (OVS_UNLIKELY(size < sizeof *nh)) {
+ goto out;
+ }
+ nh = data_pull(&data, &size, sizeof *nh);
+
+ miniflow_push_words(mf, ipv6_src, &nh->ip6_src,
+ sizeof nh->ip6_src / 4);
+ miniflow_push_words(mf, ipv6_dst, &nh->ip6_dst,
+ sizeof nh->ip6_dst / 4);
+
+ tc_flow = get_16aligned_be32(&nh->ip6_flow);
+ {
+ ovs_be32 label = tc_flow & htonl(IPV6_LABEL_MASK);
+ miniflow_push_be32_check(mf, ipv6_label, label);
+ }
+
+ nw_tos = ntohl(tc_flow) >> 20;
+ nw_ttl = nh->ip6_hlim;
+ nw_proto = nh->ip6_nxt;
+
+ while (1) {
+ if (OVS_LIKELY((nw_proto != IPPROTO_HOPOPTS)
+ && (nw_proto != IPPROTO_ROUTING)
+ && (nw_proto != IPPROTO_DSTOPTS)
+ && (nw_proto != IPPROTO_AH)
+ && (nw_proto != IPPROTO_FRAGMENT))) {
+ /* It's either a terminal header (e.g., TCP, UDP) or one we
+ * don't understand. In either case, we're done with the
+ * packet, so use it to fill in 'nw_proto'. */
+ break;
+ }
+
+ /* We only verify that at least 8 bytes of the next header are
+ * available, but many of these headers are longer. Ensure that
+ * accesses within the extension header are within those first 8
+ * bytes. All extension headers are required to be at least 8
+ * bytes. */
+ if (OVS_UNLIKELY(size < 8)) {
+ goto out;
}
- flow->nw_ttl = nh->ip_ttl;
-
- if (!(nh->ip_frag_off & htons(IP_FRAG_OFF_MASK))) {
- if (flow->nw_proto == IPPROTO_TCP) {
- parse_tcp(&b, flow);
- } else if (flow->nw_proto == IPPROTO_UDP) {
- parse_udp(&b, flow);
- } else if (flow->nw_proto == IPPROTO_SCTP) {
- parse_sctp(&b, flow);
- } else if (flow->nw_proto == IPPROTO_ICMP) {
- const struct icmp_header *icmp = pull_icmp(&b);
- if (icmp) {
- flow->tp_src = htons(icmp->icmp_type);
- flow->tp_dst = htons(icmp->icmp_code);
+
+ if ((nw_proto == IPPROTO_HOPOPTS)
+ || (nw_proto == IPPROTO_ROUTING)
+ || (nw_proto == IPPROTO_DSTOPTS)) {
+ /* These headers, while different, have the fields we care
+ * about in the same location and with the same
+ * interpretation. */
+ const struct ip6_ext *ext_hdr = data;
+ nw_proto = ext_hdr->ip6e_nxt;
+ if (OVS_UNLIKELY(!data_try_pull(&data, &size,
+ (ext_hdr->ip6e_len + 1) * 8))) {
+ goto out;
+ }
+ } else if (nw_proto == IPPROTO_AH) {
+ /* A standard AH definition isn't available, but the fields
+ * we care about are in the same location as the generic
+ * option header--only the header length is calculated
+ * differently. */
+ const struct ip6_ext *ext_hdr = data;
+ nw_proto = ext_hdr->ip6e_nxt;
+ if (OVS_UNLIKELY(!data_try_pull(&data, &size,
+ (ext_hdr->ip6e_len + 2) * 4))) {
+ goto out;
+ }
+ } else if (nw_proto == IPPROTO_FRAGMENT) {
+ const struct ovs_16aligned_ip6_frag *frag_hdr = data;
+
+ nw_proto = frag_hdr->ip6f_nxt;
+ if (!data_try_pull(&data, &size, sizeof *frag_hdr)) {
+ goto out;
+ }
+
+ /* We only process the first fragment. */
+ if (frag_hdr->ip6f_offlg != htons(0)) {
+ nw_frag = FLOW_NW_FRAG_ANY;
+ if ((frag_hdr->ip6f_offlg & IP6F_OFF_MASK) != htons(0)) {
+ nw_frag |= FLOW_NW_FRAG_LATER;
+ nw_proto = IPPROTO_FRAGMENT;
+ break;
}
}
}
}
- } else if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
- if (parse_ipv6(&b, flow)) {
- return;
- }
+ } else {
+ if (dl_type == htons(ETH_TYPE_ARP) ||
+ dl_type == htons(ETH_TYPE_RARP)) {
+ uint8_t arp_buf[2][ETH_ADDR_LEN];
+ const struct arp_eth_header *arp = (const struct arp_eth_header *)
+ data_try_pull(&data, &size, ARP_ETH_HEADER_LEN);
+
+ if (OVS_LIKELY(arp) && OVS_LIKELY(arp->ar_hrd == htons(1))
+ && OVS_LIKELY(arp->ar_pro == htons(ETH_TYPE_IP))
+ && OVS_LIKELY(arp->ar_hln == ETH_ADDR_LEN)
+ && OVS_LIKELY(arp->ar_pln == 4)) {
+ miniflow_push_words(mf, nw_src, &arp->ar_spa, 1);
+ miniflow_push_words(mf, nw_dst, &arp->ar_tpa, 1);
+
+ /* We only match on the lower 8 bits of the opcode. */
+ if (OVS_LIKELY(ntohs(arp->ar_op) <= 0xff)) {
+ miniflow_push_be32(mf, nw_frag, htonl(ntohs(arp->ar_op)));
+ }
- ofpbuf_set_l4(packet, ofpbuf_data(&b));
- if (flow->nw_proto == IPPROTO_TCP) {
- parse_tcp(&b, flow);
- } else if (flow->nw_proto == IPPROTO_UDP) {
- parse_udp(&b, flow);
- } else if (flow->nw_proto == IPPROTO_SCTP) {
- parse_sctp(&b, flow);
- } else if (flow->nw_proto == IPPROTO_ICMPV6) {
- parse_icmpv6(&b, flow);
+ /* Must be adjacent. */
+ BUILD_ASSERT(offsetof(struct flow, arp_sha) + 6
+ == offsetof(struct flow, arp_tha));
+
+ memcpy(arp_buf[0], arp->ar_sha, ETH_ADDR_LEN);
+ memcpy(arp_buf[1], arp->ar_tha, ETH_ADDR_LEN);
+ miniflow_push_words(mf, arp_sha, arp_buf,
+ ETH_ADDR_LEN * 2 / 4);
+ }
}
- } else if (flow->dl_type == htons(ETH_TYPE_ARP) ||
- flow->dl_type == htons(ETH_TYPE_RARP)) {
- const struct arp_eth_header *arp = pull_arp(&b);
- if (arp && arp->ar_hrd == htons(1)
- && arp->ar_pro == htons(ETH_TYPE_IP)
- && arp->ar_hln == ETH_ADDR_LEN
- && arp->ar_pln == 4) {
- /* We only match on the lower 8 bits of the opcode. */
- if (ntohs(arp->ar_op) <= 0xff) {
- flow->nw_proto = ntohs(arp->ar_op);
+ goto out;
+ }
+
+ packet->l4_ofs = (char *)data - l2;
+ miniflow_push_be32(mf, nw_frag,
+ BYTES_TO_BE32(nw_frag, nw_tos, nw_ttl, nw_proto));
+
+ if (OVS_LIKELY(!(nw_frag & FLOW_NW_FRAG_LATER))) {
+ if (OVS_LIKELY(nw_proto == IPPROTO_TCP)) {
+ if (OVS_LIKELY(size >= TCP_HEADER_LEN)) {
+ const struct tcp_header *tcp = data;
+
+ miniflow_push_be32(mf, tcp_flags,
+ TCP_FLAGS_BE32(tcp->tcp_ctl));
+ miniflow_push_words(mf, tp_src, &tcp->tcp_src, 1);
}
+ } else if (OVS_LIKELY(nw_proto == IPPROTO_UDP)) {
+ if (OVS_LIKELY(size >= UDP_HEADER_LEN)) {
+ const struct udp_header *udp = data;
- flow->nw_src = get_16aligned_be32(&arp->ar_spa);
- flow->nw_dst = get_16aligned_be32(&arp->ar_tpa);
- memcpy(flow->arp_sha, arp->ar_sha, ETH_ADDR_LEN);
- memcpy(flow->arp_tha, arp->ar_tha, ETH_ADDR_LEN);
+ miniflow_push_words(mf, tp_src, &udp->udp_src, 1);
+ }
+ } else if (OVS_LIKELY(nw_proto == IPPROTO_SCTP)) {
+ if (OVS_LIKELY(size >= SCTP_HEADER_LEN)) {
+ const struct sctp_header *sctp = data;
+
+ miniflow_push_words(mf, tp_src, &sctp->sctp_src, 1);
+ }
+ } else if (OVS_LIKELY(nw_proto == IPPROTO_ICMP)) {
+ if (OVS_LIKELY(size >= ICMP_HEADER_LEN)) {
+ const struct icmp_header *icmp = data;
+
+ miniflow_push_be16(mf, tp_src, htons(icmp->icmp_type));
+ miniflow_push_be16(mf, tp_dst, htons(icmp->icmp_code));
+ }
+ } else if (OVS_LIKELY(nw_proto == IPPROTO_ICMPV6)) {
+ if (OVS_LIKELY(size >= sizeof(struct icmp6_hdr))) {
+ const struct in6_addr *nd_target = NULL;
+ uint8_t arp_buf[2][ETH_ADDR_LEN];
+ const struct icmp6_hdr *icmp = data_pull(&data, &size,
+ sizeof *icmp);
+ memset(arp_buf, 0, sizeof arp_buf);
+ if (OVS_LIKELY(parse_icmpv6(&data, &size, icmp, &nd_target,
+ arp_buf))) {
+ if (nd_target) {
+ miniflow_push_words(mf, nd_target, nd_target,
+ sizeof *nd_target / 4);
+ }
+ miniflow_push_words(mf, arp_sha, arp_buf,
+ ETH_ADDR_LEN * 2 / 4);
+ miniflow_push_be16(mf, tp_src, htons(icmp->icmp6_type));
+ miniflow_push_be16(mf, tp_dst, htons(icmp->icmp6_code));
+ }
+ }
}
}
+ if (md) {
+ miniflow_push_uint32_check(mf, dp_hash, md->dp_hash);
+ }
+ out:
+ dst->map = mf.map;
}
/* For every bit of a field that is wildcarded in 'wildcards', sets the
void
flow_get_metadata(const struct flow *flow, struct flow_metadata *fmd)
{
- BUILD_ASSERT_DECL(FLOW_WC_SEQ == 25);
+ BUILD_ASSERT_DECL(FLOW_WC_SEQ == 26);
fmd->dp_hash = flow->dp_hash;
fmd->recirc_id = flow->recirc_id;
}
}
-/* Perform a bitwise OR of miniflow 'src' flow data with the equivalent
- * fields in 'dst', storing the result in 'dst'. */
-static void
-flow_union_with_miniflow(struct flow *dst, const struct miniflow *src)
-{
- uint32_t *dst_u32 = (uint32_t *) dst;
- const uint32_t *p = src->values;
- uint64_t map;
-
- for (map = src->map; map; map = zero_rightmost_1bit(map)) {
- dst_u32[raw_ctz(map)] |= *p++;
- }
-}
-
-/* Fold minimask 'mask''s wildcard mask into 'wc's wildcard mask. */
-void
-flow_wildcards_fold_minimask(struct flow_wildcards *wc,
- const struct minimask *mask)
-{
- flow_union_with_miniflow(&wc->masks, &mask->masks);
-}
-
-uint64_t
-miniflow_get_map_in_range(const struct miniflow *miniflow,
- uint8_t start, uint8_t end, unsigned int *offset)
-{
- uint64_t map = miniflow->map;
- *offset = 0;
-
- if (start > 0) {
- uint64_t msk = (UINT64_C(1) << start) - 1; /* 'start' LSBs set */
- *offset = count_1bits(map & msk);
- map &= ~msk;
- }
- if (end < FLOW_U32S) {
- uint64_t msk = (UINT64_C(1) << end) - 1; /* 'end' LSBs set */
- map &= msk;
- }
- return map;
-}
-
-/* Fold minimask 'mask''s wildcard mask into 'wc's wildcard mask
- * in range [start, end). */
-void
-flow_wildcards_fold_minimask_range(struct flow_wildcards *wc,
- const struct minimask *mask,
- uint8_t start, uint8_t end)
-{
- uint32_t *dst_u32 = (uint32_t *)&wc->masks;
- unsigned int offset;
- uint64_t map = miniflow_get_map_in_range(&mask->masks, start, end,
- &offset);
- const uint32_t *p = mask->masks.values + offset;
-
- for (; map; map = zero_rightmost_1bit(map)) {
- dst_u32[raw_ctz(map)] |= *p++;
- }
-}
-
/* Returns a hash of the wildcards in 'wc'. */
uint32_t
flow_wildcards_hash(const struct flow_wildcards *wc, uint32_t basis)
wc->masks.regs[idx] = mask;
}
+/* Calculates the 5-tuple hash from the given miniflow.
+ * This returns the same value as flow_hash_5tuple for the corresponding
+ * flow. */
+uint32_t
+miniflow_hash_5tuple(const struct miniflow *flow, uint32_t basis)
+{
+ uint32_t hash = basis;
+
+ if (flow) {
+ ovs_be16 dl_type = MINIFLOW_GET_BE16(flow, dl_type);
+
+ hash = mhash_add(hash, MINIFLOW_GET_U8(flow, nw_proto));
+
+ /* Separate loops for better optimization. */
+ if (dl_type == htons(ETH_TYPE_IPV6)) {
+ uint64_t map = MINIFLOW_MAP(ipv6_src) | MINIFLOW_MAP(ipv6_dst)
+ | MINIFLOW_MAP(tp_src); /* Covers both ports */
+ uint32_t value;
+
+ MINIFLOW_FOR_EACH_IN_MAP(value, flow, map) {
+ hash = mhash_add(hash, value);
+ }
+ } else {
+ uint64_t map = MINIFLOW_MAP(nw_src) | MINIFLOW_MAP(nw_dst)
+ | MINIFLOW_MAP(tp_src); /* Covers both ports */
+ uint32_t value;
+
+ MINIFLOW_FOR_EACH_IN_MAP(value, flow, map) {
+ hash = mhash_add(hash, value);
+ }
+ }
+ hash = mhash_finish(hash, 42); /* Arbitrary number. */
+ }
+ return hash;
+}
+
+BUILD_ASSERT_DECL(offsetof(struct flow, tp_src) + 2
+ == offsetof(struct flow, tp_dst) &&
+ offsetof(struct flow, tp_src) / 4
+ == offsetof(struct flow, tp_dst) / 4);
+BUILD_ASSERT_DECL(offsetof(struct flow, ipv6_src) + 16
+ == offsetof(struct flow, ipv6_dst));
+
/* Calculates the 5-tuple hash from the given flow. */
uint32_t
flow_hash_5tuple(const struct flow *flow, uint32_t basis)
{
- uint32_t hash = 0;
+ uint32_t hash = basis;
- if (!flow) {
- return 0;
- }
+ if (flow) {
+ const uint32_t *flow_u32 = (const uint32_t *)flow;
+
+ hash = mhash_add(hash, flow->nw_proto);
+
+ if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
+ int ofs = offsetof(struct flow, ipv6_src) / 4;
+ int end = ofs + 2 * sizeof flow->ipv6_src / 4;
- hash = mhash_add(basis, (OVS_FORCE uint32_t) flow->nw_src);
- hash = mhash_add(hash, (OVS_FORCE uint32_t) flow->nw_dst);
- hash = mhash_add(hash, ((OVS_FORCE uint32_t) flow->tp_src << 16)
- | (OVS_FORCE uint32_t) flow->tp_dst);
- hash = mhash_add(hash, flow->nw_proto);
+ while (ofs < end) {
+ hash = mhash_add(hash, flow_u32[ofs++]);
+ }
+ } else {
+ hash = mhash_add(hash, (OVS_FORCE uint32_t) flow->nw_src);
+ hash = mhash_add(hash, (OVS_FORCE uint32_t) flow->nw_dst);
+ }
+ hash = mhash_add(hash, flow_u32[offsetof(struct flow, tp_src) / 4]);
- return mhash_finish(hash, 13);
+ hash = mhash_finish(hash, 42); /* Arbitrary number. */
+ }
+ return hash;
}
/* Hashes 'flow' based on its L2 through L4 protocol information. */
flow->mpls_lse[0] = set_mpls_lse_values(ttl, tc, 1, htonl(label));
/* Clear all L3 and L4 fields. */
- BUILD_ASSERT(FLOW_WC_SEQ == 25);
+ BUILD_ASSERT(FLOW_WC_SEQ == 26);
memset((char *) flow + FLOW_SEGMENT_2_ENDS_AT, 0,
sizeof(struct flow) - FLOW_SEGMENT_2_ENDS_AT);
}
static uint32_t *
miniflow_alloc_values(struct miniflow *flow, int n)
{
- if (n <= MINI_N_INLINE) {
+ int size = MINIFLOW_VALUES_SIZE(n);
+
+ if (size <= sizeof flow->inline_values) {
+ flow->values_inline = true;
return flow->inline_values;
} else {
COVERAGE_INC(miniflow_malloc);
- return xmalloc(n * sizeof *flow->values);
+ flow->values_inline = false;
+ flow->offline_values = xmalloc(size);
+ return flow->offline_values;
}
}
* when a miniflow is initialized from a (mini)mask, the values can be zeroes,
* so that the flow and mask always have the same maps.
*
- * This function initializes 'dst->values' (either inline if possible or with
+ * This function initializes values (either inline if possible or with
* malloc() otherwise) and copies the uint32_t elements of 'src' indicated by
* 'dst->map' into it. */
static void
miniflow_init__(struct miniflow *dst, const struct flow *src, int n)
{
const uint32_t *src_u32 = (const uint32_t *) src;
- unsigned int ofs;
+ uint32_t *dst_u32 = miniflow_alloc_values(dst, n);
uint64_t map;
- dst->values = miniflow_alloc_values(dst, n);
- ofs = 0;
for (map = dst->map; map; map = zero_rightmost_1bit(map)) {
- dst->values[ofs++] = src_u32[raw_ctz(map)];
+ *dst_u32++ = src_u32[raw_ctz(map)];
}
}
/* Initializes 'dst' as a copy of 'src'. The caller must eventually free 'dst'
- * with miniflow_destroy(). */
+ * with miniflow_destroy().
+ * Always allocates offline storage. */
void
miniflow_init(struct miniflow *dst, const struct flow *src)
{
void
miniflow_clone(struct miniflow *dst, const struct miniflow *src)
{
- int n = miniflow_n_values(src);
+ int size = MINIFLOW_VALUES_SIZE(miniflow_n_values(src));
+ uint32_t *values;
+
dst->map = src->map;
- dst->values = miniflow_alloc_values(dst, n);
- memcpy(dst->values, src->values, n * sizeof *dst->values);
+ if (size <= sizeof dst->inline_values) {
+ dst->values_inline = true;
+ values = dst->inline_values;
+ } else {
+ dst->values_inline = false;
+ COVERAGE_INC(miniflow_malloc);
+ dst->offline_values = xmalloc(size);
+ values = dst->offline_values;
+ }
+ memcpy(values, miniflow_get_values(src), size);
+}
+
+/* Initializes 'dst' as a copy of 'src'. The caller must have allocated
+ * 'dst' to have inline space for all data in 'src'. */
+void
+miniflow_clone_inline(struct miniflow *dst, const struct miniflow *src,
+ size_t n_values)
+{
+ dst->values_inline = true;
+ dst->map = src->map;
+ memcpy(dst->inline_values, miniflow_get_values(src),
+ MINIFLOW_VALUES_SIZE(n_values));
}
/* Initializes 'dst' with the data in 'src', destroying 'src'.
- * The caller must eventually free 'dst' with miniflow_destroy(). */
+ * The caller must eventually free 'dst' with miniflow_destroy().
+ * 'dst' must be a regularly sized miniflow, but 'src' can have
+ * larger than default inline values. */
void
miniflow_move(struct miniflow *dst, struct miniflow *src)
{
- if (src->values == src->inline_values) {
- dst->values = dst->inline_values;
- memcpy(dst->values, src->values,
- miniflow_n_values(src) * sizeof *dst->values);
+ int size = MINIFLOW_VALUES_SIZE(miniflow_n_values(src));
+
+ dst->map = src->map;
+ if (size <= sizeof dst->inline_values) {
+ dst->values_inline = true;
+ memcpy(dst->inline_values, miniflow_get_values(src), size);
+ miniflow_destroy(src);
+ } else if (src->values_inline) {
+ dst->values_inline = false;
+ COVERAGE_INC(miniflow_malloc);
+ dst->offline_values = xmalloc(size);
+ memcpy(dst->offline_values, src->inline_values, size);
} else {
- dst->values = src->values;
+ dst->values_inline = false;
+ dst->offline_values = src->offline_values;
}
- dst->map = src->map;
}
/* Frees any memory owned by 'flow'. Does not free the storage in which 'flow'
void
miniflow_destroy(struct miniflow *flow)
{
- if (flow->values != flow->inline_values) {
- free(flow->values);
+ if (!flow->values_inline) {
+ free(flow->offline_values);
}
}
flow_union_with_miniflow(dst, src);
}
-static const uint32_t *
-miniflow_get__(const struct miniflow *flow, unsigned int u32_ofs)
-{
- if (!(flow->map & (UINT64_C(1) << u32_ofs))) {
- static const uint32_t zero = 0;
- return &zero;
- }
- return flow->values +
- count_1bits(flow->map & ((UINT64_C(1) << u32_ofs) - 1));
-}
-
/* Returns the uint32_t that would be at byte offset '4 * u32_ofs' if 'flow'
* were expanded into a "struct flow". */
-uint32_t
+static uint32_t
miniflow_get(const struct miniflow *flow, unsigned int u32_ofs)
{
- return *miniflow_get__(flow, u32_ofs);
-}
-
-/* Returns the ovs_be16 that would be at byte offset 'u8_ofs' if 'flow' were
- * expanded into a "struct flow". */
-static ovs_be16
-miniflow_get_be16(const struct miniflow *flow, unsigned int u8_ofs)
-{
- const uint32_t *u32p = miniflow_get__(flow, u8_ofs / 4);
- const ovs_be16 *be16p = (const ovs_be16 *) u32p;
- return be16p[u8_ofs % 4 != 0];
-}
-
-/* Returns the VID within the vlan_tci member of the "struct flow" represented
- * by 'flow'. */
-uint16_t
-miniflow_get_vid(const struct miniflow *flow)
-{
- ovs_be16 tci = miniflow_get_be16(flow, offsetof(struct flow, vlan_tci));
- return vlan_tci_to_vid(tci);
+ return (flow->map & UINT64_C(1) << u32_ofs)
+ ? *(miniflow_get_u32_values(flow) +
+ count_1bits(flow->map & ((UINT64_C(1) << u32_ofs) - 1)))
+ : 0;
}
/* Returns true if 'a' and 'b' are the same flow, false otherwise. */
bool
miniflow_equal(const struct miniflow *a, const struct miniflow *b)
{
- const uint32_t *ap = a->values;
- const uint32_t *bp = b->values;
+ const uint32_t *ap = miniflow_get_u32_values(a);
+ const uint32_t *bp = miniflow_get_u32_values(b);
const uint64_t a_map = a->map;
const uint64_t b_map = b->map;
- uint64_t map;
- if (a_map == b_map) {
- for (map = a_map; map; map = zero_rightmost_1bit(map)) {
+ if (OVS_LIKELY(a_map == b_map)) {
+ int count = miniflow_n_values(a);
+
+ while (count--) {
if (*ap++ != *bp++) {
return false;
}
}
} else {
+ uint64_t map;
+
for (map = a_map | b_map; map; map = zero_rightmost_1bit(map)) {
uint64_t bit = rightmost_1bit(map);
uint64_t a_value = a_map & bit ? *ap++ : 0;
miniflow_equal_in_minimask(const struct miniflow *a, const struct miniflow *b,
const struct minimask *mask)
{
- const uint32_t *p;
+ const uint32_t *p = miniflow_get_u32_values(&mask->masks);
uint64_t map;
- p = mask->masks.values;
-
for (map = mask->masks.map; map; map = zero_rightmost_1bit(map)) {
int ofs = raw_ctz(map);
- if ((miniflow_get(a, ofs) ^ miniflow_get(b, ofs)) & *p) {
+ if ((miniflow_get(a, ofs) ^ miniflow_get(b, ofs)) & *p++) {
return false;
}
- p++;
}
return true;
const struct minimask *mask)
{
const uint32_t *b_u32 = (const uint32_t *) b;
- const uint32_t *p;
+ const uint32_t *p = miniflow_get_u32_values(&mask->masks);
uint64_t map;
- p = mask->masks.values;
-
for (map = mask->masks.map; map; map = zero_rightmost_1bit(map)) {
int ofs = raw_ctz(map);
- if ((miniflow_get(a, ofs) ^ b_u32[ofs]) & *p) {
+ if ((miniflow_get(a, ofs) ^ b_u32[ofs]) & *p++) {
return false;
}
- p++;
}
return true;
}
-/* Returns a hash value for 'flow', given 'basis'. */
-uint32_t
-miniflow_hash(const struct miniflow *flow, uint32_t basis)
-{
- const uint32_t *p = flow->values;
- uint32_t hash = basis;
- uint64_t hash_map = 0;
- uint64_t map;
-
- for (map = flow->map; map; map = zero_rightmost_1bit(map)) {
- if (*p) {
- hash = mhash_add(hash, *p);
- hash_map |= rightmost_1bit(map);
- }
- p++;
- }
- hash = mhash_add(hash, hash_map);
- hash = mhash_add(hash, hash_map >> 32);
-
- return mhash_finish(hash, p - flow->values);
-}
-
-/* Returns a hash value for the bits of 'flow' where there are 1-bits in
- * 'mask', given 'basis'.
- *
- * The hash values returned by this function are the same as those returned by
- * flow_hash_in_minimask(), only the form of the arguments differ. */
-uint32_t
-miniflow_hash_in_minimask(const struct miniflow *flow,
- const struct minimask *mask, uint32_t basis)
-{
- const uint32_t *p = mask->masks.values;
- uint32_t hash;
- uint64_t map;
-
- hash = basis;
-
- for (map = mask->masks.map; map; map = zero_rightmost_1bit(map)) {
- hash = mhash_add(hash, miniflow_get(flow, raw_ctz(map)) & *p++);
- }
-
- return mhash_finish(hash, (p - mask->masks.values) * 4);
-}
-
-/* Returns a hash value for the bits of 'flow' where there are 1-bits in
- * 'mask', given 'basis'.
- *
- * The hash values returned by this function are the same as those returned by
- * miniflow_hash_in_minimask(), only the form of the arguments differ. */
-uint32_t
-flow_hash_in_minimask(const struct flow *flow, const struct minimask *mask,
- uint32_t basis)
-{
- const uint32_t *flow_u32 = (const uint32_t *)flow;
- const uint32_t *p = mask->masks.values;
- uint32_t hash;
- uint64_t map;
-
- hash = basis;
- for (map = mask->masks.map; map; map = zero_rightmost_1bit(map)) {
- hash = mhash_add(hash, flow_u32[raw_ctz(map)] & *p++);
- }
-
- return mhash_finish(hash, (p - mask->masks.values) * 4);
-}
-
-/* Returns a hash value for the bits of range [start, end) in 'flow',
- * where there are 1-bits in 'mask', given 'hash'.
- *
- * The hash values returned by this function are the same as those returned by
- * minimatch_hash_range(), only the form of the arguments differ. */
-uint32_t
-flow_hash_in_minimask_range(const struct flow *flow,
- const struct minimask *mask,
- uint8_t start, uint8_t end, uint32_t *basis)
-{
- const uint32_t *flow_u32 = (const uint32_t *)flow;
- unsigned int offset;
- uint64_t map = miniflow_get_map_in_range(&mask->masks, start, end,
- &offset);
- const uint32_t *p = mask->masks.values + offset;
- uint32_t hash = *basis;
-
- for (; map; map = zero_rightmost_1bit(map)) {
- hash = mhash_add(hash, flow_u32[raw_ctz(map)] & *p++);
- }
-
- *basis = hash; /* Allow continuation from the unfinished value. */
- return mhash_finish(hash, (p - mask->masks.values) * 4);
-}
-
\f
/* Initializes 'dst' as a copy of 'src'. The caller must eventually free 'dst'
* with minimask_destroy(). */
uint32_t storage[FLOW_U32S])
{
struct miniflow *dst = &dst_->masks;
+ uint32_t *dst_values = storage;
const struct miniflow *a = &a_->masks;
const struct miniflow *b = &b_->masks;
uint64_t map;
int n = 0;
- dst->values = storage;
+ dst->values_inline = false;
+ dst->offline_values = storage;
dst->map = 0;
for (map = a->map & b->map; map; map = zero_rightmost_1bit(map)) {
if (mask) {
dst->map |= rightmost_1bit(map);
- dst->values[n++] = mask;
+ dst_values[n++] = mask;
}
}
}
return miniflow_get(&mask->masks, u32_ofs);
}
-/* Returns the VID mask within the vlan_tci member of the "struct
- * flow_wildcards" represented by 'mask'. */
-uint16_t
-minimask_get_vid_mask(const struct minimask *mask)
-{
- return miniflow_get_vid(&mask->masks);
-}
-
/* Returns true if 'a' and 'b' are the same flow mask, false otherwise. */
bool
minimask_equal(const struct minimask *a, const struct minimask *b)
return miniflow_equal(&a->masks, &b->masks);
}
-/* Returns a hash value for 'mask', given 'basis'. */
-uint32_t
-minimask_hash(const struct minimask *mask, uint32_t basis)
-{
- return miniflow_hash(&mask->masks, basis);
-}
-
-/* Returns true if at least one bit is wildcarded in 'a_' but not in 'b_',
+/* Returns true if at least one bit matched by 'b' is wildcarded by 'a',
* false otherwise. */
bool
-minimask_has_extra(const struct minimask *a_, const struct minimask *b_)
+minimask_has_extra(const struct minimask *a, const struct minimask *b)
{
- const struct miniflow *a = &a_->masks;
- const struct miniflow *b = &b_->masks;
+ const uint32_t *p = miniflow_get_u32_values(&b->masks);
uint64_t map;
- for (map = a->map | b->map; map; map = zero_rightmost_1bit(map)) {
- int ofs = raw_ctz(map);
- uint32_t a_u32 = miniflow_get(a, ofs);
- uint32_t b_u32 = miniflow_get(b, ofs);
+ for (map = b->masks.map; map; map = zero_rightmost_1bit(map)) {
+ uint32_t a_u32 = minimask_get(a, raw_ctz(map));
+ uint32_t b_u32 = *p++;
if ((a_u32 & b_u32) != b_u32) {
return true;
return false;
}
-
-/* Returns true if 'mask' matches every packet, false if 'mask' fixes any bits
- * or fields. */
-bool
-minimask_is_catchall(const struct minimask *mask_)
-{
- const struct miniflow *mask = &mask_->masks;
- const uint32_t *p = mask->values;
- uint64_t map;
-
- for (map = mask->map; map; map = zero_rightmost_1bit(map)) {
- if (*p++) {
- return false;
- }
- }
- return true;
-}
#include "byte-order.h"
#include "openflow/nicira-ext.h"
#include "openflow/openflow.h"
+#include "packets.h"
#include "hash.h"
#include "util.h"
/* This sequence number should be incremented whenever anything involving flows
* or the wildcarding of flows changes. This will cause build assertion
* failures in places which likely need to be updated. */
-#define FLOW_WC_SEQ 25
+#define FLOW_WC_SEQ 26
#define FLOW_N_REGS 8
BUILD_ASSERT_DECL(FLOW_N_REGS <= NXM_NX_MAX_REGS);
const char *flow_tun_flag_to_string(uint32_t flags);
-struct flow_tnl {
- ovs_be64 tun_id;
- ovs_be32 ip_src;
- ovs_be32 ip_dst;
- uint16_t flags;
- uint8_t ip_tos;
- uint8_t ip_ttl;
-};
-
-/* Unfortunately, a "struct flow" sometimes has to handle OpenFlow port
- * numbers and other times datapath (dpif) port numbers. This union allows
- * access to both. */
-union flow_in_port {
- odp_port_t odp_port;
- ofp_port_t ofp_port;
-};
-
/* Maximum number of supported MPLS labels. */
#define FLOW_MAX_MPLS_LABELS 3
* The fields are organized in four segments to facilitate staged lookup, where
* lower layer fields are first used to determine if the later fields need to
* be looked at. This enables better wildcarding for datapath flows.
+ *
+ * NOTE: Order of the fields is significant; any change in the order must be
+ * reflected in miniflow_extract()!
*/
struct flow {
/* L1 */
uint32_t recirc_id; /* Must be exact match. */
union flow_in_port in_port; /* Input port.*/
- /* L2 */
- uint8_t dl_src[6]; /* Ethernet source address. */
+ /* L2, order the same as in the Ethernet header! */
uint8_t dl_dst[6]; /* Ethernet destination address. */
+ uint8_t dl_src[6]; /* Ethernet source address. */
ovs_be16 dl_type; /* Ethernet frame type. */
ovs_be16 vlan_tci; /* If 802.1Q, TCI | VLAN_CFI; otherwise 0. */
ovs_be32 mpls_lse[FLOW_MAX_MPLS_LABELS]; /* MPLS label stack entry. */
/* L3 */
struct in6_addr ipv6_src; /* IPv6 source address. */
struct in6_addr ipv6_dst; /* IPv6 destination address. */
- struct in6_addr nd_target; /* IPv6 neighbor discovery (ND) target. */
ovs_be32 ipv6_label; /* IPv6 flow label. */
ovs_be32 nw_src; /* IPv4 source address. */
ovs_be32 nw_dst; /* IPv4 destination address. */
uint8_t nw_proto; /* IP protocol or low 8 bits of ARP opcode. */
uint8_t arp_sha[6]; /* ARP/ND source hardware address. */
uint8_t arp_tha[6]; /* ARP/ND target hardware address. */
+ struct in6_addr nd_target; /* IPv6 neighbor discovery (ND) target. */
ovs_be16 tcp_flags; /* TCP flags. With L3 to avoid matching L4. */
ovs_be16 pad; /* Padding. */
/* L4 */
- uint32_t dp_hash; /* Datapath computed hash value. The exact
- computation is opaque to the user space.*/
ovs_be16 tp_src; /* TCP/UDP/SCTP source port. */
ovs_be16 tp_dst; /* TCP/UDP/SCTP destination port.
* Keep last for the BUILD_ASSERT_DECL below */
+ uint32_t dp_hash; /* Datapath computed hash value. The exact
+ computation is opaque to the user space.*/
};
BUILD_ASSERT_DECL(sizeof(struct flow) % 4 == 0);
#define FLOW_U32S (sizeof(struct flow) / 4)
/* Remember to update FLOW_WC_SEQ when changing 'struct flow'. */
-BUILD_ASSERT_DECL(offsetof(struct flow, tp_dst) + 2
+BUILD_ASSERT_DECL(offsetof(struct flow, dp_hash) + sizeof(uint32_t)
== sizeof(struct flow_tnl) + 172
- && FLOW_WC_SEQ == 25);
+ && FLOW_WC_SEQ == 26);
/* Incremental points at which flow classification may be performed in
* segments.
* This is located here since this is dependent on the structure of the
* struct flow defined above:
- * Each offset must be on a distint, successive U32 boundary srtictly
+ * Each offset must be on a distinct, successive U32 boundary strictly
* within the struct flow. */
enum {
- FLOW_SEGMENT_1_ENDS_AT = offsetof(struct flow, dl_src),
+ FLOW_SEGMENT_1_ENDS_AT = offsetof(struct flow, dl_dst),
FLOW_SEGMENT_2_ENDS_AT = offsetof(struct flow, ipv6_src),
- FLOW_SEGMENT_3_ENDS_AT = offsetof(struct flow, dp_hash),
+ FLOW_SEGMENT_3_ENDS_AT = offsetof(struct flow, tp_src),
};
BUILD_ASSERT_DECL(FLOW_SEGMENT_1_ENDS_AT % 4 == 0);
BUILD_ASSERT_DECL(FLOW_SEGMENT_2_ENDS_AT % 4 == 0);
{
return hash_int(odp_to_u32(odp_port), 0);
}
-
-uint32_t flow_hash_in_minimask(const struct flow *, const struct minimask *,
- uint32_t basis);
-uint32_t flow_hash_in_minimask_range(const struct flow *,
- const struct minimask *,
- uint8_t start, uint8_t end,
- uint32_t *basis);
\f
/* Wildcards for a flow.
*
const struct flow_wildcards *src2);
bool flow_wildcards_has_extra(const struct flow_wildcards *,
const struct flow_wildcards *);
-
-void flow_wildcards_fold_minimask(struct flow_wildcards *,
- const struct minimask *);
-void flow_wildcards_fold_minimask_range(struct flow_wildcards *,
- const struct minimask *,
- uint8_t start, uint8_t end);
-
uint32_t flow_wildcards_hash(const struct flow_wildcards *, uint32_t basis);
bool flow_wildcards_equal(const struct flow_wildcards *,
const struct flow_wildcards *);
/* Compressed flow. */
#define MINI_N_INLINE (sizeof(void *) == 4 ? 7 : 8)
-BUILD_ASSERT_DECL(FLOW_U32S <= 64);
+BUILD_ASSERT_DECL(FLOW_U32S <= 63);
/* A sparse representation of a "struct flow".
*
*
* The 'map' member holds one bit for each uint32_t in a "struct flow". Each
* 0-bit indicates that the corresponding uint32_t is zero, each 1-bit that it
- * *may* be nonzero.
- *
- * 'values' points to the start of an array that has one element for each 1-bit
- * in 'map'. The least-numbered 1-bit is in values[0], the next 1-bit is in
- * values[1], and so on.
- *
- * 'values' may point to a few different locations:
+ * *may* be nonzero (see below how this applies to minimasks).
*
- * - If 'map' has MINI_N_INLINE or fewer 1-bits, it may point to
- * 'inline_values'. One hopes that this is the common case.
+ * The 'values_inline' boolean member indicates that the values are at
+ * 'inline_values'. If 'values_inline' is zero, then the values are
+ * offline at 'offline_values'. In either case, the values form an array with
+ * one element for each 1-bit in 'map'. The least-numbered 1-bit is in
+ * the first element of the values array, the next 1-bit is in the next array
+ * element, and so on.
*
- * - If 'map' has more than MINI_N_INLINE 1-bits, it may point to memory
- * allocated with malloc().
- *
- * - The caller could provide storage on the stack for situations where
- * that makes sense. So far that's only proved useful for
- * minimask_combine(), but the principle works elsewhere.
- *
- * Elements in 'values' are allowed to be zero. This is useful for "struct
+ * Elements in the values array are allowed to be zero. This is useful for "struct
* minimatch", for which ensuring that the miniflow and minimask members have
* same 'map' allows optimization. This allowance applies only to a miniflow
* that is not a mask. That is, a minimask may NOT have zero elements in
* its 'values'.
*/
struct miniflow {
- uint64_t map;
- uint32_t *values;
- uint32_t inline_values[MINI_N_INLINE];
+ uint64_t map:63;
+ uint64_t values_inline:1;
+ union {
+ uint32_t *offline_values;
+ uint32_t inline_values[MINI_N_INLINE];
+ };
};
+#define MINIFLOW_VALUES_SIZE(COUNT) ((COUNT) * sizeof(uint32_t))
+
+static inline uint32_t *miniflow_values(struct miniflow *mf)
+{
+ return OVS_LIKELY(mf->values_inline)
+ ? mf->inline_values : mf->offline_values;
+}
+
+static inline const uint32_t *miniflow_get_values(const struct miniflow *mf)
+{
+ return OVS_LIKELY(mf->values_inline)
+ ? mf->inline_values : mf->offline_values;
+}
+
+static inline const uint32_t *miniflow_get_u32_values(const struct miniflow *mf)
+{
+ return miniflow_get_values(mf);
+}
+
+static inline const ovs_be32 *miniflow_get_be32_values(const struct miniflow *mf)
+{
+ return (OVS_FORCE const ovs_be32 *)miniflow_get_values(mf);
+}
+
+/* This is useful for initializing a miniflow for a miniflow_extract() call. */
+static inline void miniflow_initialize(struct miniflow *mf,
+ uint32_t buf[FLOW_U32S])
+{
+ mf->map = 0;
+ mf->values_inline = (buf == (uint32_t *)(mf + 1));
+ if (!mf->values_inline) {
+ mf->offline_values = buf;
+ }
+}
+
+struct pkt_metadata;
+
+/* 'dst->values' must be initialized with a buffer large enough to hold
+ * FLOW_U32S u32s. 'dst->map' is ignored on input and set on output to
+ * indicate which fields were extracted. */
+void miniflow_extract(struct ofpbuf *packet, const struct pkt_metadata *,
+ struct miniflow *dst);
void miniflow_init(struct miniflow *, const struct flow *);
void miniflow_init_with_minimask(struct miniflow *, const struct flow *,
const struct minimask *);
void miniflow_clone(struct miniflow *, const struct miniflow *);
+void miniflow_clone_inline(struct miniflow *, const struct miniflow *,
+ size_t n_values);
void miniflow_move(struct miniflow *dst, struct miniflow *);
void miniflow_destroy(struct miniflow *);
void miniflow_expand(const struct miniflow *, struct flow *);
-uint32_t miniflow_get(const struct miniflow *, unsigned int u32_ofs);
-uint16_t miniflow_get_vid(const struct miniflow *);
+static inline bool
+flow_get_next_in_map(const struct flow *flow, uint64_t map, uint32_t *value)
+{
+ if (map) {
+ *value = ((const uint32_t *)flow)[raw_ctz(map)];
+ return true;
+ }
+ return false;
+}
+
+/* Iterate through all flow u32 values specified by 'MAP'.
+ * This works as the first statement in a block. */
+#define FLOW_FOR_EACH_IN_MAP(VALUE, FLOW, MAP) \
+ uint64_t map_; \
+ for (map_ = (MAP); \
+ flow_get_next_in_map(FLOW, map_, &(VALUE)); \
+ map_ = zero_rightmost_1bit(map_))
+
+#define FLOW_U32_SIZE(FIELD) \
+ DIV_ROUND_UP(sizeof(((struct flow *)0)->FIELD), sizeof(uint32_t))
+
+#define MINIFLOW_MAP(FIELD) \
+ (((UINT64_C(1) << FLOW_U32_SIZE(FIELD)) - 1) \
+ << (offsetof(struct flow, FIELD) / 4))
+
+static inline bool
+mf_get_next_in_map(uint64_t *fmap, uint64_t rm1bit, const uint32_t **fp,
+ uint32_t *value)
+{
+ *value = 0;
+ if (*fmap & rm1bit) {
+ uint64_t trash = *fmap & (rm1bit - 1);
+
+ if (trash) {
+ *fmap -= trash;
+ *fp += count_1bits(trash);
+ }
+ *value = **fp;
+ }
+ return rm1bit != 0;
+}
+
+/* Iterate through all miniflow u32 values specified by 'MAP'.
+ * This works as the first statement in a block. */
+#define MINIFLOW_FOR_EACH_IN_MAP(VALUE, FLOW, MAP) \
+ const uint32_t *fp_ = miniflow_get_u32_values(FLOW); \
+ uint64_t rm1bit_, fmap_, map_; \
+ for (fmap_ = (FLOW)->map, map_ = (MAP), rm1bit_ = rightmost_1bit(map_); \
+ mf_get_next_in_map(&fmap_, rm1bit_, &fp_, &(VALUE)); \
+ map_ -= rm1bit_, rm1bit_ = rightmost_1bit(map_))
+
+/* Gets the value at byte offset 'OFS' in a miniflow as integer type 'TYPE',
+ * which must be at most 4 bytes wide. */
+#define MINIFLOW_GET_TYPE(MF, TYPE, OFS) \
+ (((MF)->map & (UINT64_C(1) << (OFS) / 4)) \
+ ? ((OVS_FORCE const TYPE *) \
+ (miniflow_get_u32_values(MF) \
+ + count_1bits((MF)->map & ((UINT64_C(1) << (OFS) / 4) - 1)))) \
+ [(OFS) % 4 / sizeof(TYPE)] \
+ : 0)
+
+#define MINIFLOW_GET_U8(FLOW, FIELD) \
+ MINIFLOW_GET_TYPE(FLOW, uint8_t, offsetof(struct flow, FIELD))
+#define MINIFLOW_GET_U16(FLOW, FIELD) \
+ MINIFLOW_GET_TYPE(FLOW, uint16_t, offsetof(struct flow, FIELD))
+#define MINIFLOW_GET_BE16(FLOW, FIELD) \
+ MINIFLOW_GET_TYPE(FLOW, ovs_be16, offsetof(struct flow, FIELD))
+#define MINIFLOW_GET_U32(FLOW, FIELD) \
+ MINIFLOW_GET_TYPE(FLOW, uint32_t, offsetof(struct flow, FIELD))
+#define MINIFLOW_GET_BE32(FLOW, FIELD) \
+ MINIFLOW_GET_TYPE(FLOW, ovs_be32, offsetof(struct flow, FIELD))
+
+static inline uint16_t miniflow_get_vid(const struct miniflow *);
+static inline uint16_t miniflow_get_tcp_flags(const struct miniflow *);
static inline ovs_be64 miniflow_get_metadata(const struct miniflow *);
bool miniflow_equal(const struct miniflow *a, const struct miniflow *b);
bool miniflow_equal_flow_in_minimask(const struct miniflow *a,
const struct flow *b,
const struct minimask *);
-uint32_t miniflow_hash(const struct miniflow *, uint32_t basis);
-uint32_t miniflow_hash_in_minimask(const struct miniflow *,
- const struct minimask *, uint32_t basis);
-uint64_t miniflow_get_map_in_range(const struct miniflow *miniflow,
- uint8_t start, uint8_t end,
- unsigned int *offset);
+uint32_t miniflow_hash_5tuple(const struct miniflow *flow, uint32_t basis);
\f
/* Compressed flow wildcards. */
void minimask_expand(const struct minimask *, struct flow_wildcards *);
uint32_t minimask_get(const struct minimask *, unsigned int u32_ofs);
-uint16_t minimask_get_vid_mask(const struct minimask *);
+static inline uint16_t minimask_get_vid_mask(const struct minimask *);
static inline ovs_be64 minimask_get_metadata_mask(const struct minimask *);
bool minimask_equal(const struct minimask *a, const struct minimask *b);
-uint32_t minimask_hash(const struct minimask *, uint32_t basis);
-
bool minimask_has_extra(const struct minimask *, const struct minimask *);
-bool minimask_is_catchall(const struct minimask *);
+
\f
+/* Returns true if 'mask' matches every packet, false if 'mask' fixes any bits
+ * or fields. */
+static inline bool
+minimask_is_catchall(const struct minimask *mask)
+{
+ /* For every 1-bit in mask's map, the corresponding value is non-zero,
+ * so the only way the mask can avoid fixing any bits or fields is for the
+ * map to be zero. */
+ return mask->masks.map == 0;
+}
+
+/* Returns the VID within the vlan_tci member of the "struct flow" represented
+ * by 'flow'. */
+static inline uint16_t
+miniflow_get_vid(const struct miniflow *flow)
+{
+ ovs_be16 tci = MINIFLOW_GET_BE16(flow, vlan_tci);
+ return vlan_tci_to_vid(tci);
+}
+
+/* Returns the VID mask within the vlan_tci member of the "struct
+ * flow_wildcards" represented by 'mask'. */
+static inline uint16_t
+minimask_get_vid_mask(const struct minimask *mask)
+{
+ return miniflow_get_vid(&mask->masks);
+}
+
+/* Returns the value of the "tcp_flags" field in 'flow'. */
+static inline uint16_t
+miniflow_get_tcp_flags(const struct miniflow *flow)
+{
+ return ntohs(MINIFLOW_GET_BE16(flow, tcp_flags));
+}
+
/* Returns the value of the OpenFlow 1.1+ "metadata" field in 'flow'. */
static inline ovs_be64
miniflow_get_metadata(const struct miniflow *flow)
{
+ union {
+ ovs_be64 be64;
+ struct {
+ ovs_be32 hi;
+ ovs_be32 lo;
+ };
+ } value;
+
enum { MD_OFS = offsetof(struct flow, metadata) };
BUILD_ASSERT_DECL(MD_OFS % sizeof(uint32_t) == 0);
- ovs_be32 hi = (OVS_FORCE ovs_be32) miniflow_get(flow, MD_OFS / 4);
- ovs_be32 lo = (OVS_FORCE ovs_be32) miniflow_get(flow, MD_OFS / 4 + 1);
+ value.hi = MINIFLOW_GET_TYPE(flow, ovs_be32, MD_OFS);
+ value.lo = MINIFLOW_GET_TYPE(flow, ovs_be32, MD_OFS + 4);
- return htonll(((uint64_t) ntohl(hi) << 32) | ntohl(lo));
+ return value.be64;
}
/* Returns the mask for the OpenFlow 1.1+ "metadata" field in 'mask'.
return miniflow_get_metadata(&mask->masks);
}
+/* Perform a bitwise OR of miniflow 'src' flow data with the equivalent
+ * fields in 'dst', storing the result in 'dst'. */
+static inline void
+flow_union_with_miniflow(struct flow *dst, const struct miniflow *src)
+{
+ uint32_t *dst_u32 = (uint32_t *) dst;
+ const uint32_t *p = miniflow_get_u32_values(src);
+ uint64_t map;
+
+ for (map = src->map; map; map = zero_rightmost_1bit(map)) {
+ dst_u32[raw_ctz(map)] |= *p++;
+ }
+}
+
+static inline struct pkt_metadata
+pkt_metadata_from_flow(const struct flow *flow)
+{
+ struct pkt_metadata md;
+
+ md.recirc_id = flow->recirc_id;
+ md.dp_hash = flow->dp_hash;
+ md.tunnel = flow->tunnel;
+ md.skb_priority = flow->skb_priority;
+ md.pkt_mark = flow->pkt_mark;
+ md.in_port = flow->in_port;
+
+ return md;
+}
+
+static inline bool is_ip_any(const struct flow *flow)
+{
+ return dl_type_is_ip_any(flow->dl_type);
+}
+
+static inline bool is_icmpv4(const struct flow *flow)
+{
+ return (flow->dl_type == htons(ETH_TYPE_IP)
+ && flow->nw_proto == IPPROTO_ICMP);
+}
+
+static inline bool is_icmpv6(const struct flow *flow)
+{
+ return (flow->dl_type == htons(ETH_TYPE_IPV6)
+ && flow->nw_proto == IPPROTO_ICMPV6);
+}
+
+static inline bool is_stp(const struct flow *flow)
+{
+ return (eth_addr_equals(flow->dl_dst, eth_addr_stp)
+ && flow->dl_type == htons(FLOW_DL_TYPE_NONE));
+}
+
#endif /* flow.h */
size_t n;
};
+#define GUARDED_LIST_INITIALIZER(LIST) { \
+ .mutex = OVS_MUTEX_INITIALIZER, \
+ .list = LIST_INITIALIZER(&((LIST)->list)), \
+ .n = 0 }
+
void guarded_list_init(struct guarded_list *);
void guarded_list_destroy(struct guarded_list *);
return mask;
}
-/* Returns the head node in 'hindex' with the given 'hash', or a null pointer
- * if no nodes have that hash value. */
-struct hindex_node *
-hindex_node_with_hash(const struct hindex *hindex, size_t hash)
-{
- struct hindex_node *node = hindex->buckets[hash & hindex->mask];
-
- while (node && node->hash != hash) {
- node = node->d;
- }
- return node;
-}
-
/* Returns the head node in 'hindex' with the given 'hash'. 'hindex' must
* contain a head node with the given hash. */
static struct hindex_node *
NODE != OBJECT_CONTAINING(NULL, NODE, MEMBER); \
ASSIGN_CONTAINER(NODE, (NODE)->MEMBER.s, MEMBER))
-struct hindex_node *hindex_node_with_hash(const struct hindex *, size_t hash);
+/* Returns the head node in 'hindex' with the given 'hash', or a null pointer
+ * if no nodes have that hash value. */
+static inline struct hindex_node *
+hindex_node_with_hash(const struct hindex *hindex, size_t hash)
+{
+ struct hindex_node *node = hindex->buckets[hash & hindex->mask];
+
+ while (node && node->hash != hash) {
+ node = node->d;
+ }
+ return node;
+}
/* Iteration. */
size_t n, i;
/* Choose a random non-empty bucket. */
- for (i = random_uint32(); ; i++) {
- bucket = hmap->buckets[i & hmap->mask];
+ for (;;) {
+ bucket = hmap->buckets[random_uint32() & hmap->mask];
if (bucket) {
break;
}
enum lacp_status
lacp_status(const struct lacp *lacp) OVS_EXCLUDED(mutex)
{
- enum lacp_status ret;
+ if (lacp) {
+ enum lacp_status ret;
- ovs_mutex_lock(&mutex);
- if (!lacp) {
- ret = LACP_DISABLED;
- } else if (lacp->negotiated) {
- ret = LACP_NEGOTIATED;
+ ovs_mutex_lock(&mutex);
+ ret = lacp->negotiated ? LACP_NEGOTIATED : LACP_CONFIGURED;
+ ovs_mutex_unlock(&mutex);
+ return ret;
} else {
- ret = LACP_CONFIGURED;
+ /* Don't take 'mutex'. It might not even be initialized, since we
+ * don't know that any lacp object has been created. */
+ return LACP_DISABLED;
}
- ovs_mutex_unlock(&mutex);
- return ret;
}
/* Registers 'slave_' as subordinate to 'lacp'. This should be called at least
case OFPTYPE_METER_FEATURES_STATS_REPLY:
case OFPTYPE_TABLE_FEATURES_STATS_REQUEST:
case OFPTYPE_TABLE_FEATURES_STATS_REPLY:
+ case OFPTYPE_BUNDLE_CONTROL:
+ case OFPTYPE_BUNDLE_ADD_MESSAGE:
default:
if (VLOG_IS_DBG_ENABLED()) {
char *s = ofp_to_string(ofpbuf_data(msg), ofpbuf_size(msg), 2);
int i;
- BUILD_ASSERT_DECL(FLOW_WC_SEQ == 25);
+ BUILD_ASSERT_DECL(FLOW_WC_SEQ == 26);
if (priority != OFP_DEFAULT_PRIORITY) {
ds_put_format(s, "priority=%u,", priority);
&& minimask_equal(&a->mask, &b->mask));
}
-/* Returns a hash value for 'match', given 'basis'. */
-uint32_t
-minimatch_hash(const struct minimatch *match, uint32_t basis)
-{
- return miniflow_hash(&match->flow, minimask_hash(&match->mask, basis));
-}
-
/* Returns true if 'target' satisifies 'match', that is, if each bit for which
* 'match' specifies a particular value has the correct value in 'target'.
*
const struct flow *target)
{
const uint32_t *target_u32 = (const uint32_t *) target;
- const uint32_t *flowp = match->flow.values;
- const uint32_t *maskp = match->mask.masks.values;
+ const uint32_t *flowp = miniflow_get_u32_values(&match->flow);
+ const uint32_t *maskp = miniflow_get_u32_values(&match->mask.masks);
uint64_t map;
for (map = match->flow.map; map; map = zero_rightmost_1bit(map)) {
return true;
}
-/* Returns a hash value for the bits of range [start, end) in 'minimatch',
- * given 'basis'.
- *
- * The hash values returned by this function are the same as those returned by
- * flow_hash_in_minimask_range(), only the form of the arguments differ. */
-uint32_t
-minimatch_hash_range(const struct minimatch *match, uint8_t start, uint8_t end,
- uint32_t *basis)
-{
- unsigned int offset;
- const uint32_t *p, *q;
- uint32_t hash = *basis;
- int n, i;
-
- n = count_1bits(miniflow_get_map_in_range(&match->mask.masks, start, end,
- &offset));
- q = match->mask.masks.values + offset;
- p = match->flow.values + offset;
-
- for (i = 0; i < n; i++) {
- hash = mhash_add(hash, p[i] & q[i]);
- }
- *basis = hash; /* Allow continuation from the unfinished value. */
- return mhash_finish(hash, (offset + n) * 4);
-}
-
/* Appends a string representation of 'match' to 's'. If 'priority' is
* different from OFP_DEFAULT_PRIORITY, includes it in 's'. */
void
void minimatch_expand(const struct minimatch *, struct match *);
bool minimatch_equal(const struct minimatch *a, const struct minimatch *b);
-uint32_t minimatch_hash(const struct minimatch *, uint32_t basis);
bool minimatch_matches_flow(const struct minimatch *, const struct flow *);
-uint32_t minimatch_hash_range(const struct minimatch *,
- uint8_t start, uint8_t end, uint32_t *basis);
-
void minimatch_format(const struct minimatch *, struct ds *,
unsigned int priority);
char *minimatch_to_string(const struct minimatch *, unsigned int priority);
MFF_MPLS_BOS, /* u8 */
/* L3. */
+ /* Update mf_is_l3_or_higher() if MFF_IPV4_SRC is
+ * no longer the first element for a field of layer 3 or higher */
MFF_IPV4_SRC, /* be32 */
MFF_IPV4_DST, /* be32 */
bool mf_are_prereqs_ok(const struct mf_field *, const struct flow *);
void mf_mask_field_and_prereqs(const struct mf_field *, struct flow *mask);
+static inline bool
+mf_is_l3_or_higher(const struct mf_field *mf)
+{
+ return mf->id >= MFF_IPV4_SRC;
+}
+
/* Field values. */
bool mf_is_value_valid(const struct mf_field *, const union mf_value *value);
#endif
#include "rtbsd.h"
-#include "connectivity.h"
#include "coverage.h"
#include "dpif-netdev.h"
#include "dynamic-string.h"
#include "ovs-thread.h"
#include "packets.h"
#include "poll-loop.h"
-#include "seq.h"
#include "shash.h"
#include "socket-util.h"
#include "svec.h"
if (is_netdev_bsd_class(netdev_class)) {
dev = netdev_bsd_cast(base_dev);
dev->cache_valid = 0;
- seq_change(connectivity_seq_get());
+ netdev_change_seq_changed(base_dev);
}
netdev_close(base_dev);
}
struct netdev *netdev = node->data;
dev = netdev_bsd_cast(netdev);
dev->cache_valid = 0;
- seq_change(connectivity_seq_get());
+ netdev_change_seq_changed(netdev);
netdev_close(netdev);
}
shash_destroy(&device_shash);
if (!error) {
netdev->cache_valid |= VALID_ETHERADDR;
memcpy(netdev->etheraddr, mac, ETH_ADDR_LEN);
- seq_change(connectivity_seq_get());
+ netdev_change_seq_changed(netdev_);
}
}
ovs_mutex_unlock(&netdev->mutex);
netdev->netmask = mask;
}
}
- seq_change(connectivity_seq_get());
+ netdev_change_seq_changed(netdev_);
}
ovs_mutex_unlock(&netdev->mutex);
new_flags = (old_flags & ~nd_to_iff_flags(off)) | nd_to_iff_flags(on);
if (new_flags != old_flags) {
error = set_flags(netdev_get_kernel_name(netdev_), new_flags);
- seq_change(connectivity_seq_get());
+ netdev_change_seq_changed(netdev_);
}
}
return error;
#include <unistd.h>
#include <stdio.h>
-#include "connectivity.h"
#include "dpif-netdev.h"
#include "list.h"
#include "netdev-dpdk.h"
#include "ovs-rcu.h"
#include "packets.h"
#include "shash.h"
-#include "seq.h"
#include "sset.h"
#include "unaligned.h"
#include "timeval.h"
static struct list dpdk_mp_list OVS_GUARDED_BY(dpdk_mutex)
= LIST_INITIALIZER(&dpdk_mp_list);
-static pthread_t watchdog_thread;
-
struct dpdk_mp {
struct rte_mempool *mp;
int mtu;
rte_eth_link_get_nowait(dev->port_id, &link);
if (dev->link.link_status != link.link_status) {
- seq_change(connectivity_seq_get());
+ netdev_change_seq_changed(&dev->up);
dev->link_reset_cnt++;
dev->link = link;
ovs_mutex_lock(&dev->mutex);
if (!eth_addr_equals(dev->hwaddr, mac)) {
memcpy(dev->hwaddr, mac, ETH_ADDR_LEN);
+ netdev_change_seq_changed(netdev);
}
ovs_mutex_unlock(&dev->mutex);
}
dpdk_mp_put(old_mp);
+ netdev_change_seq_changed(netdev);
out:
ovs_mutex_unlock(&dev->mutex);
ovs_mutex_unlock(&dpdk_mutex);
"[netdev] up|down", 1, 2,
netdev_dpdk_set_admin_state, NULL);
- xpthread_create(&watchdog_thread, NULL, dpdk_watchdog, NULL);
+ ovs_thread_create("dpdk_watchdog", dpdk_watchdog, NULL);
return 0;
}
#include <errno.h>
-#include "connectivity.h"
#include "dpif-netdev.h"
#include "flow.h"
#include "list.h"
#include "packets.h"
#include "pcap-file.h"
#include "poll-loop.h"
-#include "seq.h"
#include "shash.h"
#include "sset.h"
#include "stream.h"
ovs_mutex_lock(&dev->mutex);
if (!eth_addr_equals(dev->hwaddr, mac)) {
memcpy(dev->hwaddr, mac, ETH_ADDR_LEN);
- seq_change(connectivity_seq_get());
+ netdev_change_seq_changed(netdev);
}
ovs_mutex_unlock(&dev->mutex);
netdev->flags |= on;
netdev->flags &= ~off;
if (*old_flagsp != netdev->flags) {
- seq_change(connectivity_seq_get());
+ netdev_change_seq_changed(&netdev->up);
}
return 0;
#include <string.h>
#include <unistd.h>
-#include "connectivity.h"
#include "coverage.h"
#include "dpif-linux.h"
#include "dpif-netdev.h"
#include "packets.h"
#include "poll-loop.h"
#include "rtnetlink-link.h"
-#include "seq.h"
#include "shash.h"
#include "socket-util.h"
#include "sset.h"
unsigned int ifi_flags, unsigned int mask)
OVS_REQUIRES(dev->mutex)
{
- seq_change(connectivity_seq_get());
+ netdev_change_seq_changed(&dev->up);
if ((dev->ifi_flags ^ ifi_flags) & IFF_RUNNING) {
dev->carrier_resets++;
error = 0;
}
} else if (netdev->vport_stats_error) {
- /* stats not available from OVS then use ioctl stats. */
+ /* Stats are not available from OVS, so use netdev stats. */
*stats = dev_stats;
} else {
+ /* Use kernel netdev's packet and byte counts since vport's counters
+ * do not reflect packet counts on the wire when GSO, TSO or GRO are
+ * enabled. */
+ stats->rx_packets = dev_stats.rx_packets;
+ stats->rx_bytes = dev_stats.rx_bytes;
+ stats->tx_packets = dev_stats.tx_packets;
+ stats->tx_bytes = dev_stats.tx_bytes;
+
stats->rx_errors += dev_stats.rx_errors;
stats->tx_errors += dev_stats.tx_errors;
stats->rx_dropped += dev_stats.rx_dropped;
stats->tx_heartbeat_errors = 0;
stats->tx_window_errors = 0;
} else {
+ /* Use kernel netdev's packet and byte counts since vport counters
+ * do not reflect packet counts on the wire when GSO, TSO or GRO
+ * are enabled. */
+ stats->rx_packets = dev_stats.tx_packets;
+ stats->rx_bytes = dev_stats.tx_bytes;
+ stats->tx_packets = dev_stats.rx_packets;
+ stats->tx_bytes = dev_stats.rx_bytes;
+
stats->rx_dropped += dev_stats.tx_dropped;
stats->tx_dropped += dev_stats.rx_dropped;
/* Generic interface to network devices. */
+#include "connectivity.h"
#include "netdev.h"
#include "list.h"
+#include "seq.h"
#include "shash.h"
#include "smap.h"
const struct netdev_class *netdev_class; /* Functions to control
this device. */
+ /* A sequence number which indicates changes in one of 'netdev''s
+ * properties. It must be nonzero so that users have a value which
+ * they may use as a reset when tracking 'netdev'.
+ *
+ * Minimally, the sequence number is required to change whenever
+ * 'netdev''s flags, features, ethernet address, or carrier changes. */
+ uint64_t change_seq;
+
/* The following are protected by 'netdev_mutex' (internal to netdev.c). */
int n_rxq;
int ref_cnt; /* Times this devices was opened. */
struct list saved_flags_list; /* Contains "struct netdev_saved_flags". */
};
+static inline void
+netdev_change_seq_changed(struct netdev *netdev)
+{
+ seq_change(connectivity_seq_get());
+ netdev->change_seq++;
+ if (!netdev->change_seq) {
+ netdev->change_seq++;
+ }
+}
+
const char *netdev_get_type(const struct netdev *);
const struct netdev_class *netdev_get_class(const struct netdev *);
const char *netdev_get_name(const struct netdev *);
struct netdev *netdev_from_name(const char *name);
void netdev_get_devices(const struct netdev_class *,
struct shash *device_list);
+struct netdev **netdev_get_vports(size_t *size);
/* A data structure for capturing packets received by a network device.
*
#include <sys/ioctl.h>
#include "byte-order.h"
-#include "connectivity.h"
#include "daemon.h"
#include "dirs.h"
#include "dpif.h"
#include "netdev-provider.h"
#include "ofpbuf.h"
#include "packets.h"
+#include "poll-loop.h"
#include "route-table.h"
-#include "seq.h"
#include "shash.h"
#include "socket-util.h"
#include "vlog.h"
/* Tunnels. */
struct netdev_tunnel_config tnl_cfg;
+ char egress_iface[IFNAMSIZ];
+ bool carrier_status;
/* Patch Ports. */
char *peer;
struct netdev_class netdev_class;
};
+/* Last read of the route-table's change number. */
+static uint64_t rt_change_seqno;
+
static int netdev_vport_construct(struct netdev *);
static int get_patch_config(const struct netdev *netdev, struct smap *args);
static int get_tunnel_config(const struct netdev *, struct smap *args);
+static bool tunnel_check_status_change__(struct netdev_vport *);
static bool
is_vport_class(const struct netdev_class *class)
return class->construct == netdev_vport_construct;
}
+bool
+netdev_vport_is_vport_class(const struct netdev_class *class)
+{
+ return is_vport_class(class);
+}
+
static const struct vport_class *
vport_class_cast(const struct netdev_class *class)
{
sizeof namebuf));
}
+/* Whenever the route-table change number is incremented,
+ * netdev_vport_route_changed() should be called to update
+ * the corresponding tunnel interface status. */
+static void
+netdev_vport_route_changed(void)
+{
+ struct netdev **vports;
+ size_t i, n_vports;
+
+ vports = netdev_get_vports(&n_vports);
+ for (i = 0; i < n_vports; i++) {
+ struct netdev *netdev_ = vports[i];
+ struct netdev_vport *netdev = netdev_vport_cast(netdev_);
+
+ ovs_mutex_lock(&netdev->mutex);
+ /* Finds all tunnel vports. */
+ if (netdev->tnl_cfg.ip_dst) {
+ if (tunnel_check_status_change__(netdev)) {
+ netdev_change_seq_changed(netdev_);
+ }
+ }
+ ovs_mutex_unlock(&netdev->mutex);
+ netdev_close(netdev_);
+ }
+
+ free(vports);
+}
+
static struct netdev *
netdev_vport_alloc(void)
{
ovs_mutex_lock(&netdev->mutex);
memcpy(netdev->etheraddr, mac, ETH_ADDR_LEN);
ovs_mutex_unlock(&netdev->mutex);
- seq_change(connectivity_seq_get());
+ netdev_change_seq_changed(netdev_);
return 0;
}
return 0;
}
-static int
-tunnel_get_status(const struct netdev *netdev_, struct smap *smap)
+/* Returns true if the tunnel status has changed, false otherwise.
+ * Also refreshes the cached status when it has changed. */
+static bool
+tunnel_check_status_change__(struct netdev_vport *netdev)
+ OVS_REQUIRES(netdev->mutex)
{
- struct netdev_vport *netdev = netdev_vport_cast(netdev_);
char iface[IFNAMSIZ];
+ bool status = false;
ovs_be32 route;
- ovs_mutex_lock(&netdev->mutex);
+ iface[0] = '\0';
route = netdev->tnl_cfg.ip_dst;
- ovs_mutex_unlock(&netdev->mutex);
-
if (route_table_get_name(route, iface)) {
struct netdev *egress_netdev;
- smap_add(smap, "tunnel_egress_iface", iface);
-
if (!netdev_open(iface, "system", &egress_netdev)) {
- smap_add(smap, "tunnel_egress_iface_carrier",
- netdev_get_carrier(egress_netdev) ? "up" : "down");
+ status = netdev_get_carrier(egress_netdev);
netdev_close(egress_netdev);
}
}
+ if (strcmp(netdev->egress_iface, iface)
+ || netdev->carrier_status != status) {
+ ovs_strlcpy(netdev->egress_iface, iface, IFNAMSIZ);
+ netdev->carrier_status = status;
+
+ return true;
+ }
+
+ return false;
+}
+
+static int
+tunnel_get_status(const struct netdev *netdev_, struct smap *smap)
+{
+ struct netdev_vport *netdev = netdev_vport_cast(netdev_);
+
+ if (netdev->egress_iface[0]) {
+ smap_add(smap, "tunnel_egress_iface", netdev->egress_iface);
+
+ smap_add(smap, "tunnel_egress_iface_carrier",
+ netdev->carrier_status ? "up" : "down");
+ }
+
return 0;
}
static void
netdev_vport_run(void)
{
+ uint64_t seq;
+
route_table_run();
+ seq = route_table_get_change_seq();
+ if (rt_change_seqno != seq) {
+ rt_change_seqno = seq;
+ netdev_vport_route_changed();
+ }
}
static void
netdev_vport_wait(void)
{
+ uint64_t seq;
+
route_table_wait();
+ seq = route_table_get_change_seq();
+ if (rt_change_seqno != seq) {
+ poll_immediate_wake();
+ }
}
\f
/* Code specific to tunnel types. */
ovs_mutex_lock(&dev->mutex);
dev->tnl_cfg = tnl_cfg;
- seq_change(connectivity_seq_get());
+ tunnel_check_status_change__(dev);
+ netdev_change_seq_changed(dev_);
ovs_mutex_unlock(&dev->mutex);
return 0;
ovs_mutex_lock(&dev->mutex);
free(dev->peer);
dev->peer = xstrdup(peer);
- seq_change(connectivity_seq_get());
+ netdev_change_seq_changed(dev_);
ovs_mutex_unlock(&dev->mutex);
return 0;
void netdev_vport_inc_tx(const struct netdev *,
const struct dpif_flow_stats *);
+bool netdev_vport_is_vport_class(const struct netdev_class *);
const char *netdev_vport_class_get_dpif_port(const struct netdev_class *);
#ifndef _WIN32
#include <string.h>
#include <unistd.h>
-#include "connectivity.h"
#include "coverage.h"
#include "dpif.h"
#include "dynamic-string.h"
#include "openflow/openflow.h"
#include "packets.h"
#include "poll-loop.h"
-#include "seq.h"
#include "shash.h"
#include "smap.h"
#include "sset.h"
}
static void
-netdev_initialize(void)
+netdev_class_mutex_initialize(void)
OVS_EXCLUDED(netdev_class_mutex, netdev_mutex)
{
static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
if (ovsthread_once_start(&once)) {
ovs_mutex_init_recursive(&netdev_class_mutex);
+ ovsthread_once_done(&once);
+ }
+}
+
+static void
+netdev_initialize(void)
+ OVS_EXCLUDED(netdev_class_mutex, netdev_mutex)
+{
+ static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
+
+ if (ovsthread_once_start(&once)) {
+ netdev_class_mutex_initialize();
fatal_signal_add_hook(restore_all_flags, NULL, NULL, true);
netdev_vport_patch_register();
{
struct netdev_registered_class *rc;
+ netdev_initialize();
ovs_mutex_lock(&netdev_class_mutex);
HMAP_FOR_EACH (rc, hmap_node, &netdev_classes) {
if (rc->class->run) {
{
int error;
+ netdev_class_mutex_initialize();
ovs_mutex_lock(&netdev_class_mutex);
if (netdev_lookup_class(new_class->type)) {
VLOG_WARN("attempted to register duplicate netdev provider: %s",
memset(netdev, 0, sizeof *netdev);
netdev->netdev_class = rc->class;
netdev->name = xstrdup(name);
+ netdev->change_seq = 1;
netdev->node = shash_add(&netdev_shash, name, netdev);
/* By default enable one rx queue per netdev. */
int old_ref_cnt;
atomic_add(&rc->ref_cnt, 1, &old_ref_cnt);
- seq_change(connectivity_seq_get());
+ netdev_change_seq_changed(netdev);
} else {
free(netdev->name);
ovs_assert(list_is_empty(&netdev->saved_flags_list));
ovs_mutex_unlock(&netdev_mutex);
}
+/* Extracts pointers to all 'netdev-vports' into an array 'vports'
+ * and returns it. Stores the size of the array into '*size'.
+ *
+ * The caller is responsible for freeing 'vports' and must close
+ * each 'netdev-vport' in the list. */
+struct netdev **
+netdev_get_vports(size_t *size)
+ OVS_EXCLUDED(netdev_mutex)
+{
+ struct netdev **vports;
+ struct shash_node *node;
+ size_t n = 0;
+
+ if (!size) {
+ return NULL;
+ }
+
+ /* Explicitly allocate a chunk of memory big enough for every netdev. */
+ vports = xmalloc(shash_count(&netdev_shash) * sizeof *vports);
+ ovs_mutex_lock(&netdev_mutex);
+ SHASH_FOR_EACH (node, &netdev_shash) {
+ struct netdev *dev = node->data;
+
+ if (netdev_vport_is_vport_class(dev->netdev_class)) {
+ dev->ref_cnt++;
+ vports[n] = dev;
+ n++;
+ }
+ }
+ ovs_mutex_unlock(&netdev_mutex);
+ *size = n;
+
+ return vports;
+}
+
const char *
netdev_get_type_from_name(const char *name)
{
}
}
}
+
+uint64_t
+netdev_get_change_seq(const struct netdev *netdev)
+{
+ return netdev->change_seq;
+}
int netdev_delete_queue(struct netdev *, unsigned int queue_id);
int netdev_get_queue_stats(const struct netdev *, unsigned int queue_id,
struct netdev_queue_stats *);
+uint64_t netdev_get_change_seq(const struct netdev *);
struct netdev_queue_dump {
struct netdev *netdev;
int match_len;
int i;
- BUILD_ASSERT_DECL(FLOW_WC_SEQ == 25);
+ BUILD_ASSERT_DECL(FLOW_WC_SEQ == 26);
/* Metadata. */
if (match->wc.masks.dp_hash) {
case OVS_ACTION_ATTR_OUTPUT:
case OVS_ACTION_ATTR_USERSPACE:
case OVS_ACTION_ATTR_RECIRC:
+ case OVS_ACTION_ATTR_HASH:
if (dp_execute_action) {
/* Allow 'dp_execute_action' to steal the packet data if we do
* not need it any more. */
case OVS_ACTION_ATTR_POP_VLAN: return 0;
case OVS_ACTION_ATTR_PUSH_MPLS: return sizeof(struct ovs_action_push_mpls);
case OVS_ACTION_ATTR_POP_MPLS: return sizeof(ovs_be16);
- case OVS_ACTION_ATTR_RECIRC: return sizeof(struct ovs_action_recirc);
+ case OVS_ACTION_ATTR_RECIRC: return sizeof(uint32_t);
+ case OVS_ACTION_ATTR_HASH: return sizeof(struct ovs_action_hash);
case OVS_ACTION_ATTR_SET: return -2;
case OVS_ACTION_ATTR_SAMPLE: return -2;
}
static void
-format_odp_recirc_action(struct ds *ds,
- const struct ovs_action_recirc *act)
+format_odp_recirc_action(struct ds *ds, uint32_t recirc_id)
{
- ds_put_format(ds, "recirc(");
+ ds_put_format(ds, "recirc(%"PRIu32")", recirc_id);
+}
- if (act->hash_alg == OVS_RECIRC_HASH_ALG_L4) {
- ds_put_format(ds, "hash_l4(%"PRIu32"), ", act->hash_bias);
- }
+static void
+format_odp_hash_action(struct ds *ds, const struct ovs_action_hash *hash_act)
+{
+ ds_put_format(ds, "hash(");
- ds_put_format(ds, "%"PRIu32")", act->recirc_id);
+ if (hash_act->hash_alg == OVS_HASH_ALG_L4) {
+ ds_put_format(ds, "hash_l4(%"PRIu32")", hash_act->hash_basis);
+ } else {
+ ds_put_format(ds, "Unknown hash algorithm(%"PRIu32")",
+ hash_act->hash_alg);
+ }
+ ds_put_format(ds, ")");
}
static void
format_odp_userspace_action(ds, a);
break;
case OVS_ACTION_ATTR_RECIRC:
- format_odp_recirc_action(ds, nl_attr_get(a));
+ format_odp_recirc_action(ds, nl_attr_get_u32(a));
+ break;
+ case OVS_ACTION_ATTR_HASH:
+ format_odp_hash_action(ds, nl_attr_get(a));
break;
case OVS_ACTION_ATTR_SET:
ds_put_cstr(ds, "set(");
expected_len = odp_flow_key_attr_len(nl_attr_type(a));
if (expected_len != -2) {
bool bad_key_len = nl_attr_get_size(a) != expected_len;
- bool bad_mask_len = ma && nl_attr_get_size(a) != expected_len;
+ bool bad_mask_len = ma && nl_attr_get_size(ma) != expected_len;
if (bad_key_len || bad_mask_len) {
if (bad_key_len) {
ds_put_format(ds, "(bad key length %"PRIuSIZE", expected %d)(",
- nl_attr_get_size(a),
- odp_flow_key_attr_len(nl_attr_type(a)));
+ nl_attr_get_size(a), expected_len);
}
format_generic_odp_key(a, ds);
- if (bad_mask_len) {
+ if (ma) {
ds_put_char(ds, '/');
- ds_put_format(ds, "(bad mask length %"PRIuSIZE", expected %d)(",
- nl_attr_get_size(ma),
- odp_flow_key_attr_len(nl_attr_type(ma)));
+ if (bad_mask_len) {
+ ds_put_format(ds, "(bad mask length %"PRIuSIZE", expected %d)(",
+ nl_attr_get_size(ma), expected_len);
+ }
+ format_generic_odp_key(ma, ds);
}
- format_generic_odp_key(ma, ds);
ds_put_char(ds, ')');
return;
}
} else {
const struct ovs_key_sctp *sctp_key = nl_attr_get(a);
- ds_put_format(ds, "(src=%"PRIu16",dst=%"PRIu16")",
+ ds_put_format(ds, "src=%"PRIu16",dst=%"PRIu16,
ntohs(sctp_key->sctp_src), ntohs(sctp_key->sctp_dst));
}
break;
}
static void
-odp_flow_key_from_flow__(struct ofpbuf *buf, const struct flow *data,
- const struct flow *flow, odp_port_t odp_in_port,
- size_t max_mpls_depth)
+odp_flow_key_from_flow__(struct ofpbuf *buf, const struct flow *flow,
+ const struct flow *mask, odp_port_t odp_in_port,
+ size_t max_mpls_depth, bool export_mask)
{
- bool is_mask;
struct ovs_key_ethernet *eth_key;
size_t encap;
-
- /* We assume that if 'data' and 'flow' are not the same, we should
- * treat 'data' as a mask. */
- is_mask = (data != flow);
+ const struct flow *data = export_mask ? mask : flow;
nl_msg_put_u32(buf, OVS_KEY_ATTR_PRIORITY, data->skb_priority);
- if (flow->tunnel.ip_dst || is_mask) {
+ if (flow->tunnel.ip_dst || export_mask) {
tun_key_to_attr(buf, &data->tunnel);
}
nl_msg_put_u32(buf, OVS_KEY_ATTR_SKB_MARK, data->pkt_mark);
- if (flow->recirc_id) {
+ if (data->recirc_id || (mask && mask->recirc_id)) {
nl_msg_put_u32(buf, OVS_KEY_ATTR_RECIRC_ID, data->recirc_id);
}
- if (flow->dp_hash) {
+ if (data->dp_hash || (mask && mask->dp_hash)) {
nl_msg_put_u32(buf, OVS_KEY_ATTR_DP_HASH, data->dp_hash);
}
/* Add an ingress port attribute if this is a mask or 'odp_in_port'
* is not the magical value "ODPP_NONE". */
- if (is_mask || odp_in_port != ODPP_NONE) {
+ if (export_mask || odp_in_port != ODPP_NONE) {
nl_msg_put_odp_port(buf, OVS_KEY_ATTR_IN_PORT, odp_in_port);
}
memcpy(eth_key->eth_dst, data->dl_dst, ETH_ADDR_LEN);
if (flow->vlan_tci != htons(0) || flow->dl_type == htons(ETH_TYPE_VLAN)) {
- if (is_mask) {
+ if (export_mask) {
nl_msg_put_be16(buf, OVS_KEY_ATTR_ETHERTYPE, OVS_BE16_MAX);
} else {
nl_msg_put_be16(buf, OVS_KEY_ATTR_ETHERTYPE, htons(ETH_TYPE_VLAN));
* <none> 0xffff Any non-Ethernet II frame (except valid
* 802.3 SNAP packet with valid eth_type).
*/
- if (is_mask) {
+ if (export_mask) {
nl_msg_put_be16(buf, OVS_KEY_ATTR_ETHERTYPE, OVS_BE16_MAX);
}
goto unencap;
ipv4_key->ipv4_proto = data->nw_proto;
ipv4_key->ipv4_tos = data->nw_tos;
ipv4_key->ipv4_ttl = data->nw_ttl;
- ipv4_key->ipv4_frag = is_mask ? ovs_to_odp_frag_mask(data->nw_frag)
+ ipv4_key->ipv4_frag = export_mask ? ovs_to_odp_frag_mask(data->nw_frag)
: ovs_to_odp_frag(data->nw_frag);
} else if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
struct ovs_key_ipv6 *ipv6_key;
ipv6_key->ipv6_proto = data->nw_proto;
ipv6_key->ipv6_tclass = data->nw_tos;
ipv6_key->ipv6_hlimit = data->nw_ttl;
- ipv6_key->ipv6_frag = is_mask ? ovs_to_odp_frag_mask(data->nw_frag)
+ ipv6_key->ipv6_frag = export_mask ? ovs_to_odp_frag_mask(data->nw_frag)
: ovs_to_odp_frag(data->nw_frag);
} else if (flow->dl_type == htons(ETH_TYPE_ARP) ||
flow->dl_type == htons(ETH_TYPE_RARP)) {
if (flow->tp_dst == htons(0) &&
(flow->tp_src == htons(ND_NEIGHBOR_SOLICIT) ||
flow->tp_src == htons(ND_NEIGHBOR_ADVERT)) &&
- (!is_mask || (data->tp_src == htons(0xffff) &&
+ (!export_mask || (data->tp_src == htons(0xffff) &&
data->tp_dst == htons(0xffff)))) {
struct ovs_key_nd *nd_key;
* capable of being expanded to allow for that much space. */
void
odp_flow_key_from_flow(struct ofpbuf *buf, const struct flow *flow,
- odp_port_t odp_in_port)
+ const struct flow *mask, odp_port_t odp_in_port)
{
- odp_flow_key_from_flow__(buf, flow, flow, odp_in_port, SIZE_MAX);
+ odp_flow_key_from_flow__(buf, flow, mask, odp_in_port, SIZE_MAX, false);
}
/* Appends a representation of 'mask' as OVS_KEY_ATTR_* attributes to
const struct flow *flow, uint32_t odp_in_port_mask,
size_t max_mpls_depth)
{
- odp_flow_key_from_flow__(buf, mask, flow, u32_to_odp(odp_in_port_mask),
- max_mpls_depth);
+ odp_flow_key_from_flow__(buf, flow, mask,
+ u32_to_odp(odp_in_port_mask), max_mpls_depth, true);
}
/* Generate ODP flow key from the given packet metadata */
const struct simap *port_names,
struct ofpbuf *, struct ofpbuf *);
-void odp_flow_key_from_flow(struct ofpbuf *, const struct flow *,
- odp_port_t odp_in_port);
+void odp_flow_key_from_flow(struct ofpbuf *, const struct flow * flow,
+ const struct flow *mask, odp_port_t odp_in_port);
void odp_flow_key_from_mask(struct ofpbuf *, const struct flow *mask,
const struct flow *flow, uint32_t odp_in_port,
size_t max_mpls_depth);
* Used for NXAST_OUTPUT_REG. */
struct ofpact_output_reg {
struct ofpact ofpact;
- struct mf_subfield src;
uint16_t max_len;
+ struct mf_subfield src;
};
/* OFPACT_BUNDLE.
* After using this function to add a variable-length action, add the
* elements of the flexible array (e.g. with ofpbuf_put()), then use
* ofpact_update_len() to update the length embedded into the action.
- * (Keep in mind the need to refresh the structure from 'ofpacts->l2' after
- * adding data to 'ofpacts'.)
+ * (Keep in mind the need to refresh the structure from 'ofpacts->frame'
+ * after adding data to 'ofpacts'.)
*
* struct <STRUCT> *ofpact_get_<ENUM>(const struct ofpact *ofpact);
*
/* OF1.3+(13,5). Permissions error. */
OFPERR_OFPTFFC_EPERM,
+/* ## -------------------- ## */
+/* ## OFPET_BUNDLE_FAILED ## */
+/* ## -------------------- ## */
+
+ /* OF1.4+(17,0). Unspecified error. */
+ OFPERR_OFPBFC_UNKNOWN,
+
+ /* OF1.4+(17,1). Permissions error. */
+ OFPERR_OFPBFC_EPERM,
+
+ /* OF1.4+(17,2). Bundle ID doesn't exist. */
+ OFPERR_OFPBFC_BAD_ID,
+
+ /* OF1.4+(17,3). Bundle ID already exists. */
+ OFPERR_OFPBFC_BUNDLE_EXIST,
+
+ /* OF1.4+(17,4). Bundle ID is closed. */
+ OFPERR_OFPBFC_BUNDLE_CLOSED,
+
+ /* OF1.4+(17,5). Too many bundle IDs. */
+ OFPERR_OFPBFC_OUT_OF_BUNDLES,
+
+ /* OF1.4+(17,6). Unsupported or unknown message control type. */
+ OFPERR_OFPBFC_BAD_TYPE,
+
+ /* OF1.4+(17,7). Unsupported, unknown, or inconsistent flags. */
+ OFPERR_OFPBFC_BAD_FLAGS,
+
+ /* OF1.4+(17,8). Length problem in included message. */
+ OFPERR_OFPBFC_MSG_BAD_LEN,
+
+ /* OF1.4+(17,9). Inconsistent or duplicate XID. */
+ OFPERR_OFPBFC_MSG_BAD_XID,
+
+ /* OF1.4+(17,10). Unsupported message in this bundle. */
+ OFPERR_OFPBFC_MSG_UNSUP,
+
+ /* OF1.4+(17,11). Unsupported message combination in this bundle. */
+ OFPERR_OFPBFC_MSG_CONFLICT,
+
+ /* OF1.4+(17,12). Can't handle this many messages in bundle. */
+ OFPERR_OFPBFC_MSG_TOO_MANY,
+
+ /* OF1.4+(17,13). One message in bundle failed. */
+ OFPERR_OFPBFC_MSG_FAILED,
+
+ /* OF1.4+(17,14). Bundle is taking too long. */
+ OFPERR_OFPBFC_TIMEOUT,
+
+ /* OF1.4+(17,15). Bundle is locking the resource. */
+ OFPERR_OFPBFC_BUNDLE_IN_PROGRESS,
+
/* ## ------------------ ## */
/* ## OFPET_EXPERIMENTER ## */
/* ## ------------------ ## */
/* OFPT 1.4+ (30): struct ofp14_role_status, uint8_t[8][]. */
OFPRAW_OFPT14_ROLE_STATUS,
+ /* OFPT 1.4+ (33): struct ofp14_bundle_ctrl_msg, uint8_t[8][]. */
+ OFPRAW_OFPT14_BUNDLE_CONTROL,
+
+ /* OFPT 1.4+ (34): struct ofp14_bundle_ctrl_msg, uint8_t[]. */
+ OFPRAW_OFPT14_BUNDLE_ADD_MESSAGE,
+
/* Standard statistics. */
/* OFPST 1.0+ (0): void. */
/* Controller role change event messages. */
OFPTYPE_ROLE_STATUS, /* OFPRAW_OFPT14_ROLE_STATUS. */
+ OFPTYPE_BUNDLE_CONTROL, /* OFPRAW_OFPT14_BUNDLE_CONTROL. */
+
+ OFPTYPE_BUNDLE_ADD_MESSAGE, /* OFPRAW_OFPT14_BUNDLE_ADD_MESSAGE. */
+
/* Statistics. */
OFPTYPE_DESC_STATS_REQUEST, /* OFPRAW_OFPST_DESC_REQUEST. */
OFPTYPE_DESC_STATS_REPLY, /* OFPRAW_OFPST_DESC_REPLY. */
size_t i;
for (i = 0; i < *n_fms; i++) {
- free((*fms)[i].ofpacts);
+ free(CONST_CAST(struct ofpact *, (*fms)[i].ofpacts));
}
free(*fms);
*fms = NULL;
case OFP13_VERSION:
ds_put_cstr(string, " (OF1.3)");
break;
+ case OFP14_VERSION:
+ ds_put_cstr(string, " (OF1.4)");
+ break;
default:
ds_put_format(string, " (OF 0x%02"PRIx8")", oh->version);
break;
}
}
+static const char *
+bundle_flags_to_name(uint32_t bit)
+{
+ switch (bit) {
+ case OFPBF_ATOMIC:
+ return "atomic";
+ case OFPBF_ORDERED:
+ return "ordered";
+ default:
+ return NULL;
+ }
+}
+
+static void
+ofp_print_bundle_ctrl(struct ds *s, const struct ofp_header *oh)
+{
+ int error;
+ struct ofputil_bundle_ctrl_msg bctrl;
+
+ error = ofputil_decode_bundle_ctrl(oh, &bctrl);
+ if (error) {
+ ofp_print_error(s, error);
+ return;
+ }
+
+ ds_put_char(s, '\n');
+
+ ds_put_format(s, " bundle_id=%#"PRIx32" type=", bctrl.bundle_id);
+ switch (bctrl.type) {
+ case OFPBCT_OPEN_REQUEST:
+ ds_put_cstr(s, "OPEN_REQUEST");
+ break;
+ case OFPBCT_OPEN_REPLY:
+ ds_put_cstr(s, "OPEN_REPLY");
+ break;
+ case OFPBCT_CLOSE_REQUEST:
+ ds_put_cstr(s, "CLOSE_REQUEST");
+ break;
+ case OFPBCT_CLOSE_REPLY:
+ ds_put_cstr(s, "CLOSE_REPLY");
+ break;
+ case OFPBCT_COMMIT_REQUEST:
+ ds_put_cstr(s, "COMMIT_REQUEST");
+ break;
+ case OFPBCT_COMMIT_REPLY:
+ ds_put_cstr(s, "COMMIT_REPLY");
+ break;
+ case OFPBCT_DISCARD_REQUEST:
+ ds_put_cstr(s, "DISCARD_REQUEST");
+ break;
+ case OFPBCT_DISCARD_REPLY:
+ ds_put_cstr(s, "DISCARD_REPLY");
+ break;
+ }
+
+ ds_put_cstr(s, " flags=");
+ ofp_print_bit_names(s, bctrl.flags, bundle_flags_to_name, ' ');
+}
+
+static void
+ofp_print_bundle_add(struct ds *s, const struct ofp_header *oh, int verbosity)
+{
+ int error;
+ struct ofputil_bundle_add_msg badd;
+ char *msg;
+
+ error = ofputil_decode_bundle_add(oh, &badd);
+ if (error) {
+ ofp_print_error(s, error);
+ return;
+ }
+
+ ds_put_char(s, '\n');
+ ds_put_format(s, " bundle_id=%#"PRIx32, badd.bundle_id);
+ ds_put_cstr(s, " flags=");
+ ofp_print_bit_names(s, badd.flags, bundle_flags_to_name, ' ');
+
+ ds_put_char(s, '\n');
+ msg = ofp_to_string(badd.msg, ntohs(badd.msg->length), verbosity);
+ if (msg) {
+ ds_put_cstr(s, msg);
+ free(msg);
+ }
+}
+
static void
ofp_to_string__(const struct ofp_header *oh, enum ofpraw raw,
struct ds *string, int verbosity)
case OFPTYPE_FLOW_MONITOR_STATS_REPLY:
ofp_print_nxst_flow_monitor_reply(string, msg);
break;
+
+ case OFPTYPE_BUNDLE_CONTROL:
+ ofp_print_bundle_ctrl(string, msg);
+ break;
+
+ case OFPTYPE_BUNDLE_ADD_MESSAGE:
+ ofp_print_bundle_add(string, msg, verbosity);
+ break;
}
}
void
ofputil_wildcard_from_ofpfw10(uint32_t ofpfw, struct flow_wildcards *wc)
{
- BUILD_ASSERT_DECL(FLOW_WC_SEQ == 25);
+ BUILD_ASSERT_DECL(FLOW_WC_SEQ == 26);
/* Initialize most of wc. */
flow_wildcards_init_catchall(wc);
bool
ofputil_port_from_string(const char *s, ofp_port_t *portp)
{
- uint32_t port32;
+ unsigned int port32; /* str_to_uint() takes an unsigned int. */
+ if (*s == '-') {
+ VLOG_WARN("Negative value %s is not a valid port number.", s);
+ return false;
+ }
*portp = 0;
if (str_to_uint(s, 10, &port32)) {
if (port32 < ofp_to_u16(OFPP_MAX)) {
return request;
}
-static void *
-ofputil_group_stats_to_ofp11(const struct ofputil_group_stats *ogs,
- size_t base_len, struct list *replies)
+static void
+ofputil_group_stats_to_ofp11__(const struct ofputil_group_stats *gs,
+ struct ofp11_group_stats *gs11, size_t length,
+ struct ofp11_bucket_counter bucket_cnts[])
{
- struct ofp11_bucket_counter *bc11;
- struct ofp11_group_stats *gs11;
- size_t length;
int i;
- length = base_len + sizeof(struct ofp11_bucket_counter) * ogs->n_buckets;
-
- gs11 = ofpmp_append(replies, length);
- memset(gs11, 0, base_len);
+ memset(gs11, 0, length);
gs11->length = htons(length);
- gs11->group_id = htonl(ogs->group_id);
- gs11->ref_count = htonl(ogs->ref_count);
- gs11->packet_count = htonll(ogs->packet_count);
- gs11->byte_count = htonll(ogs->byte_count);
+ gs11->group_id = htonl(gs->group_id);
+ gs11->ref_count = htonl(gs->ref_count);
+ gs11->packet_count = htonll(gs->packet_count);
+ gs11->byte_count = htonll(gs->byte_count);
- bc11 = (void *) (((uint8_t *) gs11) + base_len);
- for (i = 0; i < ogs->n_buckets; i++) {
- const struct bucket_counter *obc = &ogs->bucket_stats[i];
-
- bc11[i].packet_count = htonll(obc->packet_count);
- bc11[i].byte_count = htonll(obc->byte_count);
+ for (i = 0; i < gs->n_buckets; i++) {
+ bucket_cnts[i].packet_count = htonll(gs->bucket_stats[i].packet_count);
+ bucket_cnts[i].byte_count = htonll(gs->bucket_stats[i].byte_count);
}
-
- return gs11;
}
static void
-ofputil_append_of13_group_stats(const struct ofputil_group_stats *ogs,
- struct list *replies)
+ofputil_group_stats_to_ofp11(const struct ofputil_group_stats *gs,
+ struct ofp11_group_stats *gs11, size_t length)
{
- struct ofp13_group_stats *gs13;
+ ofputil_group_stats_to_ofp11__(gs, gs11, length, gs11->bucket_stats);
+}
- gs13 = ofputil_group_stats_to_ofp11(ogs, sizeof *gs13, replies);
- gs13->duration_sec = htonl(ogs->duration_sec);
- gs13->duration_nsec = htonl(ogs->duration_nsec);
+static void
+ofputil_group_stats_to_ofp13(const struct ofputil_group_stats *gs,
+ struct ofp13_group_stats *gs13, size_t length)
+{
+ ofputil_group_stats_to_ofp11__(gs, &gs13->gs, length, gs13->bucket_stats);
+ gs13->duration_sec = htonl(gs->duration_sec);
+ gs13->duration_nsec = htonl(gs->duration_nsec);
}
-/* Encodes 'ogs' properly for the format of the list of group statistics
+/* Encodes 'gs' properly for the format of the list of group statistics
* replies already begun in 'replies' and appends it to the list. 'replies'
* must have originally been initialized with ofpmp_init(). */
void
ofputil_append_group_stats(struct list *replies,
- const struct ofputil_group_stats *ogs)
+ const struct ofputil_group_stats *gs)
{
struct ofpbuf *msg = ofpbuf_from_list(list_back(replies));
struct ofp_header *oh = ofpbuf_data(msg);
+ size_t length;
- switch ((enum ofp_version)oh->version) {
+ switch ((enum ofp_version) oh->version) {
case OFP11_VERSION:
- case OFP12_VERSION:
- ofputil_group_stats_to_ofp11(ogs, sizeof(struct ofp11_group_stats),
- replies);
- break;
+ case OFP12_VERSION: {
+ struct ofp11_group_stats *reply;
+
+ length = gs->n_buckets * sizeof reply->bucket_stats[0]
+ + sizeof *reply;
+ reply = ofpmp_append(replies, length);
+ ofputil_group_stats_to_ofp11(gs, reply, length);
+ break;
+ }
case OFP13_VERSION:
- ofputil_append_of13_group_stats(ogs, replies);
- break;
+ case OFP14_VERSION: {
+ struct ofp13_group_stats *reply;
- case OFP14_VERSION:
- OVS_NOT_REACHED();
- break;
+ length = gs->n_buckets * sizeof reply->bucket_stats[0]
+ + sizeof *reply;
+ reply = ofpmp_append(replies, length);
+ ofputil_group_stats_to_ofp13(gs, reply, length);
+ break;
+ }
case OFP10_VERSION:
default:
OVS_NOT_REACHED();
}
}
-
/* Returns an OpenFlow group features request for OpenFlow version
* 'ofp_version'. */
struct ofpbuf *
OVS_NOT_REACHED();
}
}
+
+enum ofperr
+ofputil_decode_bundle_ctrl(const struct ofp_header *oh,
+ struct ofputil_bundle_ctrl_msg *msg)
+{
+ struct ofpbuf b;
+ enum ofpraw raw;
+ const struct ofp14_bundle_ctrl_msg *m;
+
+ ofpbuf_use_const(&b, oh, ntohs(oh->length));
+ raw = ofpraw_pull_assert(&b);
+ ovs_assert(raw == OFPRAW_OFPT14_BUNDLE_CONTROL);
+
+ m = ofpbuf_l3(&b);
+ msg->bundle_id = ntohl(m->bundle_id);
+ msg->type = ntohs(m->type);
+ msg->flags = ntohs(m->flags);
+
+ return 0;
+}
+
+struct ofpbuf *
+ofputil_encode_bundle_ctrl_reply(const struct ofp_header *oh,
+ struct ofputil_bundle_ctrl_msg *msg)
+{
+ struct ofpbuf *buf;
+ struct ofp14_bundle_ctrl_msg *m;
+
+ buf = ofpraw_alloc_reply(OFPRAW_OFPT14_BUNDLE_CONTROL, oh, 0);
+ m = ofpbuf_put_zeros(buf, sizeof *m);
+
+ m->bundle_id = htonl(msg->bundle_id);
+ m->type = htons(msg->type);
+ m->flags = htons(msg->flags);
+
+ return buf;
+}
+
+enum ofperr
+ofputil_decode_bundle_add(const struct ofp_header *oh,
+ struct ofputil_bundle_add_msg *msg)
+{
+ const struct ofp14_bundle_ctrl_msg *m;
+ struct ofpbuf b;
+ enum ofpraw raw;
+ size_t inner_len;
+
+ ofpbuf_use_const(&b, oh, ntohs(oh->length));
+ raw = ofpraw_pull_assert(&b);
+ ovs_assert(raw == OFPRAW_OFPT14_BUNDLE_ADD_MESSAGE);
+
+ m = ofpbuf_pull(&b, sizeof *m);
+ msg->bundle_id = ntohl(m->bundle_id);
+ msg->flags = ntohs(m->flags);
+
+ msg->msg = ofpbuf_data(&b);
+ inner_len = ntohs(msg->msg->length);
+ if (inner_len < sizeof(struct ofp_header) || inner_len > ofpbuf_size(&b)) {
+ return OFPERR_OFPBFC_MSG_BAD_LEN;
+ }
+
+ return 0;
+}
+
+struct ofpbuf *
+ofputil_encode_bundle_add(enum ofp_version ofp_version,
+ struct ofputil_bundle_add_msg *msg)
+{
+ struct ofpbuf *request;
+ struct ofp14_bundle_ctrl_msg *m;
+
+ request = ofpraw_alloc(OFPRAW_OFPT14_BUNDLE_ADD_MESSAGE, ofp_version, 0);
+ m = ofpbuf_put_zeros(request, sizeof *m);
+
+ m->bundle_id = htonl(msg->bundle_id);
+ m->flags = htons(msg->flags);
+ ofpbuf_put(request, msg->msg, ntohs(msg->msg->length));
+
+ return request;
+}
uint32_t ofputil_protocols_to_version_bitmap(enum ofputil_protocol);
enum ofputil_protocol ofputil_protocols_from_version_bitmap(uint32_t bitmap);
-/* Bitmap of OpenFlow versions that Open vSwitch supports. */
-#define OFPUTIL_SUPPORTED_VERSIONS \
- ((1u << OFP10_VERSION) | (1u << OFP12_VERSION) | (1u << OFP13_VERSION))
-
-/* Bitmap of OpenFlow versions to enable by default (a subset of
- * OFPUTIL_SUPPORTED_VERSIONS). */
-#define OFPUTIL_DEFAULT_VERSIONS (1u << OFP10_VERSION)
+/* Bitmaps of OpenFlow versions that Open vSwitch supports, and that it enables
+ * by default. When Open vSwitch has experimental or incomplete support for
+ * newer versions of OpenFlow, those versions should not be supported by
+ * default and thus should be omitted from the latter bitmap. */
+#define OFPUTIL_SUPPORTED_VERSIONS ((1u << OFP10_VERSION) | \
+ (1u << OFP11_VERSION) | \
+ (1u << OFP12_VERSION) | \
+ (1u << OFP13_VERSION))
+#define OFPUTIL_DEFAULT_VERSIONS OFPUTIL_SUPPORTED_VERSIONS
enum ofputil_protocol ofputil_protocols_from_string(const char *s);
ofp_port_t out_port;
uint32_t out_group;
enum ofputil_flow_mod_flags flags;
- struct ofpact *ofpacts; /* Series of "struct ofpact"s. */
- size_t ofpacts_len; /* Length of ofpacts, in bytes. */
+ struct ofpact *ofpacts; /* Series of "struct ofpact"s. */
+ size_t ofpacts_len; /* Length of ofpacts, in bytes. */
};
enum ofperr ofputil_decode_flow_mod(struct ofputil_flow_mod *,
int hard_age; /* Seconds since last change, -1 if unknown. */
uint64_t packet_count; /* Packet count, UINT64_MAX if unknown. */
uint64_t byte_count; /* Byte count, UINT64_MAX if unknown. */
- struct ofpact *ofpacts;
+ const struct ofpact *ofpacts;
size_t ofpacts_len;
enum ofputil_flow_mod_flags flags;
};
uint16_t priority;
ovs_be64 cookie;
struct match *match;
- struct ofpact *ofpacts;
+ const struct ofpact *ofpacts;
size_t ofpacts_len;
/* Used only for NXFME_ABBREV. */
struct list *replies);
struct ofpbuf *ofputil_encode_group_desc_request(enum ofp_version);
+struct ofputil_bundle_ctrl_msg {
+ uint32_t bundle_id;
+ uint16_t type;
+ uint16_t flags;
+};
+
+struct ofputil_bundle_add_msg {
+ uint32_t bundle_id;
+ uint16_t flags;
+ const struct ofp_header *msg;
+};
+
+enum ofperr ofputil_decode_bundle_ctrl(const struct ofp_header *,
+ struct ofputil_bundle_ctrl_msg *);
+
+struct ofpbuf *ofputil_encode_bundle_ctrl_reply(const struct ofp_header *,
+ struct ofputil_bundle_ctrl_msg *);
+
+struct ofpbuf *ofputil_encode_bundle_add(enum ofp_version ofp_version,
+ struct ofputil_bundle_add_msg *msg);
+
+enum ofperr ofputil_decode_bundle_add(const struct ofp_header *,
+ struct ofputil_bundle_add_msg *);
#endif /* ofp-util.h */
ofputil_format_version_bitmap_names(&msg, OFPUTIL_DEFAULT_VERSIONS);
printf(
- "\nOpen Flow Version options:\n"
+ "\nOpenFlow version options:\n"
" -V, --version display version information\n"
- " -O, --protocols set allowed Open Flow versions\n"
+ " -O, --protocols set allowed OpenFlow versions\n"
" (default: %s)\n",
ds_cstr(&msg));
ds_destroy(&msg);
#include "ovs-thread.h"
#include "poll-loop.h"
#include "seq.h"
+#include "timeval.h"
+#include "vlog.h"
+
+VLOG_DEFINE_THIS_MODULE(ovs_rcu);
struct ovsrcu_cb {
void (*function)(void *aux);
};
struct ovsrcu_perthread {
- struct list list_node; /* In global list. */
+ struct list list_node; /* In global list. */
struct ovs_mutex mutex;
uint64_t seqno;
struct ovsrcu_cbset *cbset;
+ char name[16]; /* This thread's name. */
};
static struct seq *global_seqno;
perthread = pthread_getspecific(perthread_key);
if (!perthread) {
+ const char *name = get_subprogram_name();
+
perthread = xmalloc(sizeof *perthread);
ovs_mutex_init(&perthread->mutex);
perthread->seqno = seq_read(global_seqno);
perthread->cbset = NULL;
+ ovs_strlcpy(perthread->name, name[0] ? name : "main",
+ sizeof perthread->name);
ovs_mutex_lock(&ovsrcu_threads_mutex);
list_push_back(&ovsrcu_threads, &perthread->list_node);
} else {
static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
if (ovsthread_once_start(&once)) {
- xpthread_create(NULL, NULL, ovsrcu_postpone_thread, NULL);
+ ovs_thread_create("urcu", ovsrcu_postpone_thread, NULL);
ovsthread_once_done(&once);
}
}
ovsrcu_quiesced();
}
+bool
+ovsrcu_is_quiescent(void)
+{
+ ovsrcu_init();
+ return pthread_getspecific(perthread_key) == NULL;
+}
+
static void
ovsrcu_synchronize(void)
{
+ unsigned int warning_threshold = 1000;
uint64_t target_seqno;
+ long long int start;
if (single_threaded()) {
return;
target_seqno = seq_read(global_seqno);
ovsrcu_quiesce_start();
+ start = time_msec();
for (;;) {
uint64_t cur_seqno = seq_read(global_seqno);
struct ovsrcu_perthread *perthread;
+ char stalled_thread[16];
+ unsigned int elapsed;
bool done = true;
ovs_mutex_lock(&ovsrcu_threads_mutex);
LIST_FOR_EACH (perthread, list_node, &ovsrcu_threads) {
if (perthread->seqno <= target_seqno) {
+ ovs_strlcpy(stalled_thread, perthread->name,
+ sizeof stalled_thread);
done = false;
break;
}
break;
}
+ elapsed = time_msec() - start;
+ if (elapsed >= warning_threshold) {
+ VLOG_WARN("blocked %u ms waiting for %s to quiesce",
+ elapsed, stalled_thread);
+ warning_threshold *= 2;
+ }
+ poll_timer_wait_until(start + warning_threshold);
+
seq_wait(global_seqno, cur_seqno);
poll_block();
}
void ovsrcu_quiesce_start(void);
void ovsrcu_quiesce_end(void);
void ovsrcu_quiesce(void);
+bool ovsrcu_is_quiescent(void);
#endif /* ovs-rcu.h */
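The doubling warning threshold in ovsrcu_synchronize() above keeps log noise bounded during a long stall. A standalone sketch of that backoff (warnings_logged() is a hypothetical helper, not OVS code):

```c
#include <assert.h>

/* Model of the backoff in ovsrcu_synchronize(): the first "blocked ... ms"
 * warning fires once 'elapsed' reaches 1000 ms, and the threshold doubles
 * after each warning, so a long stall produces O(log t) log messages
 * instead of one per wakeup. */
unsigned int
warnings_logged(unsigned int elapsed_ms)
{
    unsigned int warning_threshold = 1000;
    unsigned int n_warnings = 0;

    while (elapsed_ms >= warning_threshold) {
        n_warnings++;
        warning_threshold *= 2;
    }
    return n_warnings;
}
```

A stall of 4 seconds thus logs only at the 1000 ms, 2000 ms, and 4000 ms marks.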
OVS_NO_THREAD_SAFETY_ANALYSIS \
{ \
struct ovs_##TYPE *l = CONST_CAST(struct ovs_##TYPE *, l_); \
- int error = pthread_##TYPE##_##FUN(&l->lock); \
+ int error; \
+ \
+ /* Verify that 'l' was initialized. */ \
+ ovs_assert(l->where); \
+ \
+ error = pthread_##TYPE##_##FUN(&l->lock); \
if (OVS_UNLIKELY(error)) { \
ovs_abort(error, "pthread_%s_%s failed", #TYPE, #FUN); \
} \
l->where = where; \
- }
+ }
LOCK_FUNCTION(mutex, lock);
LOCK_FUNCTION(rwlock, rdlock);
LOCK_FUNCTION(rwlock, wrlock);
OVS_NO_THREAD_SAFETY_ANALYSIS \
{ \
struct ovs_##TYPE *l = CONST_CAST(struct ovs_##TYPE *, l_); \
- int error = pthread_##TYPE##_##FUN(&l->lock); \
+ int error; \
+ \
+ /* Verify that 'l' was initialized. */ \
+ ovs_assert(l->where); \
+ \
+ error = pthread_##TYPE##_##FUN(&l->lock); \
if (OVS_UNLIKELY(error) && error != EBUSY) { \
ovs_abort(error, "pthread_%s_%s failed", #TYPE, #FUN); \
} \
TRY_LOCK_FUNCTION(rwlock, tryrdlock);
TRY_LOCK_FUNCTION(rwlock, trywrlock);
-#define UNLOCK_FUNCTION(TYPE, FUN) \
+#define UNLOCK_FUNCTION(TYPE, FUN, WHERE) \
void \
ovs_##TYPE##_##FUN(const struct ovs_##TYPE *l_) \
OVS_NO_THREAD_SAFETY_ANALYSIS \
{ \
struct ovs_##TYPE *l = CONST_CAST(struct ovs_##TYPE *, l_); \
int error; \
- l->where = NULL; \
+ \
+ /* Verify that 'l' was initialized. */ \
+ ovs_assert(l->where); \
+ \
+ l->where = WHERE; \
error = pthread_##TYPE##_##FUN(&l->lock); \
if (OVS_UNLIKELY(error)) { \
ovs_abort(error, "pthread_%s_%s failed", #TYPE, #FUN); \
} \
}
-UNLOCK_FUNCTION(mutex, unlock);
-UNLOCK_FUNCTION(mutex, destroy);
-UNLOCK_FUNCTION(rwlock, unlock);
-UNLOCK_FUNCTION(rwlock, destroy);
+UNLOCK_FUNCTION(mutex, unlock, "<unlocked>");
+UNLOCK_FUNCTION(mutex, destroy, NULL);
+UNLOCK_FUNCTION(rwlock, unlock, "<unlocked>");
+UNLOCK_FUNCTION(rwlock, destroy, NULL);
#define XPTHREAD_FUNC1(FUNCTION, PARAM1) \
void \
pthread_mutexattr_t attr;
int error;
- l->where = NULL;
+ l->where = "<unlocked>";
xpthread_mutexattr_init(&attr);
xpthread_mutexattr_settype(&attr, type);
error = pthread_mutex_init(&l->lock, &attr);
pthread_rwlockattr_t attr;
int error;
- l->where = NULL;
+ l->where = "<unlocked>";
xpthread_rwlockattr_init(&attr);
#ifdef PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP
{
int error;
+ ovsrcu_quiesce_start();
error = pthread_barrier_wait(barrier);
+ ovsrcu_quiesce_end();
+
if (error && OVS_UNLIKELY(error != PTHREAD_BARRIER_SERIAL_THREAD)) {
ovs_abort(error, "pthread_barrier_wait failed");
}
struct ovsthread_aux {
void *(*start)(void *);
void *arg;
+ char name[16];
};
static void *
aux = *auxp;
free(auxp);
+ /* The order of the following calls is important, because
+ * ovsrcu_quiesce_end() saves a copy of the thread name. */
+ set_subprogram_name("%s%u", aux.name, id);
ovsrcu_quiesce_end();
+
return aux.start(aux.arg);
}
-void
-xpthread_create(pthread_t *threadp, pthread_attr_t *attr,
- void *(*start)(void *), void *arg)
+/* Starts a thread that calls 'start(arg)'. Sets the thread's name to 'name'
+ * (suffixed by its ovsthread_id()). Returns the new thread's pthread_t. */
+pthread_t
+ovs_thread_create(const char *name, void *(*start)(void *), void *arg)
{
struct ovsthread_aux *aux;
pthread_t thread;
aux = xmalloc(sizeof *aux);
aux->start = start;
aux->arg = arg;
+ ovs_strlcpy(aux->name, name, sizeof aux->name);
- error = pthread_create(threadp ? threadp : &thread, attr,
- ovsthread_wrapper, aux);
+ error = pthread_create(&thread, NULL, ovsthread_wrapper, aux);
if (error) {
ovs_abort(error, "pthread_create failed");
}
+ return thread;
}
\f
bool
/* Mutex. */
struct OVS_LOCKABLE ovs_mutex {
pthread_mutex_t lock;
- const char *where;
+ const char *where; /* NULL if and only if uninitialized. */
};
/* "struct ovs_mutex" initializer. */
#ifdef PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP
-#define OVS_MUTEX_INITIALIZER { PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP, NULL }
+#define OVS_MUTEX_INITIALIZER { PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP, \
+ "<unlocked>" }
#else
-#define OVS_MUTEX_INITIALIZER { PTHREAD_MUTEX_INITIALIZER, NULL }
+#define OVS_MUTEX_INITIALIZER { PTHREAD_MUTEX_INITIALIZER, "<unlocked>" }
#endif
#ifdef PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP
#define OVS_ADAPTIVE_MUTEX_INITIALIZER \
- { PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP, NULL }
+ { PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP, "<unlocked>" }
#else
#define OVS_ADAPTIVE_MUTEX_INITIALIZER OVS_MUTEX_INITIALIZER
#endif
* than exposing them only to porters. */
struct OVS_LOCKABLE ovs_rwlock {
pthread_rwlock_t lock;
- const char *where;
+ const char *where; /* NULL if and only if uninitialized. */
};
/* Initializer. */
#ifdef PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP
#define OVS_RWLOCK_INITIALIZER \
- { PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP, NULL }
+ { PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP, "<unlocked>" }
#else
-#define OVS_RWLOCK_INITIALIZER { PTHREAD_RWLOCK_INITIALIZER, NULL }
+#define OVS_RWLOCK_INITIALIZER { PTHREAD_RWLOCK_INITIALIZER, "<unlocked>" }
#endif
/* ovs_rwlock functions analogous to pthread_rwlock_*() functions.
void xpthread_key_delete(pthread_key_t);
void xpthread_setspecific(pthread_key_t, const void *);
-void xpthread_create(pthread_t *, pthread_attr_t *, void *(*)(void *), void *);
+pthread_t ovs_thread_create(const char *name, void *(*)(void *), void *);
void xpthread_join(pthread_t, void **);
\f
/* Per-thread data.
#include <stdint.h>
#include <string.h>
#include "compiler.h"
-#include "flow.h"
#include "openvswitch/types.h"
#include "random.h"
#include "hash.h"
struct ofpbuf;
struct ds;
+/* Tunnel information used in flow key and metadata. */
+struct flow_tnl {
+ ovs_be64 tun_id;
+ ovs_be32 ip_src;
+ ovs_be32 ip_dst;
+ uint16_t flags;
+ uint8_t ip_tos;
+ uint8_t ip_ttl;
+};
+
+/* Unfortunately, a "struct flow" sometimes has to handle OpenFlow port
+ * numbers and other times datapath (dpif) port numbers. This union allows
+ * access to both. */
+union flow_in_port {
+ odp_port_t odp_port;
+ ofp_port_t ofp_port;
+};
+
/* Datapath packet metadata */
struct pkt_metadata {
uint32_t recirc_id; /* Recirculation id carried with the
#define PKT_METADATA_INITIALIZER(PORT) \
(struct pkt_metadata){ 0, 0, { 0, 0, 0, 0, 0, 0}, 0, 0, {(PORT)} }
-static inline struct pkt_metadata
-pkt_metadata_from_flow(const struct flow *flow)
-{
- struct pkt_metadata md;
-
- md.recirc_id = flow->recirc_id;
- md.dp_hash = flow->dp_hash;
- md.tunnel = flow->tunnel;
- md.skb_priority = flow->skb_priority;
- md.pkt_mark = flow->pkt_mark;
- md.in_port = flow->in_port;
-
- return md;
-}
-
bool dpid_from_string(const char *s, uint64_t *dpidp);
#define ETH_ADDR_LEN 6
#define TCP_CTL(flags, offset) (htons((flags) | ((offset) << 12)))
#define TCP_FLAGS(tcp_ctl) (ntohs(tcp_ctl) & 0x0fff)
+#define TCP_FLAGS_BE16(tcp_ctl) ((tcp_ctl) & htons(0x0fff))
#define TCP_OFFSET(tcp_ctl) (ntohs(tcp_ctl) >> 12)
#define TCP_HEADER_LEN 20
|| dl_type == htons(ETH_TYPE_IPV6);
}
-static inline bool is_ip_any(const struct flow *flow)
-{
- return dl_type_is_ip_any(flow->dl_type);
-}
-
-static inline bool is_icmpv4(const struct flow *flow)
-{
- return (flow->dl_type == htons(ETH_TYPE_IP)
- && flow->nw_proto == IPPROTO_ICMP);
-}
-
-static inline bool is_icmpv6(const struct flow *flow)
-{
- return (flow->dl_type == htons(ETH_TYPE_IPV6)
- && flow->nw_proto == IPPROTO_ICMPV6);
-}
-
void format_ipv6_addr(char *addr_str, const struct in6_addr *addr);
void print_ipv6_addr(struct ds *string, const struct in6_addr *addr);
void print_ipv6_masked(struct ds *string, const struct in6_addr *addr,
case OFPTYPE_GROUP_FEATURES_STATS_REPLY:
case OFPTYPE_TABLE_FEATURES_STATS_REQUEST:
case OFPTYPE_TABLE_FEATURES_STATS_REPLY:
+ case OFPTYPE_BUNDLE_CONTROL:
+ case OFPTYPE_BUNDLE_ADD_MESSAGE:
return false;
case OFPTYPE_PACKET_IN:
return false;
}
+uint64_t
+route_table_get_change_seq(void)
+{
+ return 0;
+}
+
void
route_table_register(void)
{
-/* Copyright (c) 2012 Nicira, Inc.
+/* Copyright (c) 2012, 2013, 2014 Nicira, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
return false;
}
+uint64_t
+route_table_get_change_seq(void)
+{
+ return 0;
+}
+
void
route_table_register(void)
{
/*
- * Copyright (c) 2011, 2012, 2013 Nicira, Inc.
+ * Copyright (c) 2011, 2012, 2013, 2014 Nicira, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+/* Global change number for route-table, which should be incremented
+ * every time route_table_reset() is called. */
+static uint64_t rt_change_seq;
+
static unsigned int register_count = 0;
static struct nln *nln = NULL;
static struct route_table_msg rtmsg;
return false;
}
+uint64_t
+route_table_get_change_seq(void)
+{
+ return rt_change_seq;
+}
+
/* Users of the route_table module should register themselves with this
* function before making any other route_table function calls. */
void
if (nln) {
rtnetlink_link_run();
nln_run(nln);
+
+ if (!route_table_valid) {
+ route_table_reset();
+ }
}
}
route_map_clear();
route_table_valid = true;
+ rt_change_seq++;
ofpbuf_init(&request, 0);
/*
- * Copyright (c) 2011 Nicira, Inc.
+ * Copyright (c) 2011, 2012, 2013, 2014 Nicira, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
bool route_table_get_ifindex(ovs_be32 ip, int *ifindex);
bool route_table_get_name(ovs_be32 ip, char name[IFNAMSIZ]);
+uint64_t route_table_get_change_seq(void);
void route_table_register(void);
void route_table_unregister(void);
void route_table_run(void);
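A client of the new route_table_get_change_seq() API typically caches the last value it saw and treats any difference as "the routing table changed". A standalone sketch of that polling pattern (rt_change_seq and the helpers below stand in for the module's private counter; they are not OVS code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the module's private counter, which route_table_reset()
 * increments. */
uint64_t rt_change_seq;

/* Returns true exactly once per observed change, caching the last seen
 * sequence number in '*last_seq'. */
bool
route_table_changed(uint64_t *last_seq)
{
    if (*last_seq != rt_change_seq) {
        *last_seq = rt_change_seq;
        return true;
    }
    return false;
}

/* Exercises the pattern: no change, one simulated reset, change consumed.
 * Returns 0 on success. */
int
demo(void)
{
    uint64_t last = rt_change_seq;

    if (route_table_changed(&last)) {
        return 1;               /* Nothing changed yet. */
    }
    rt_change_seq++;            /* Simulate route_table_reset(). */
    if (!route_table_changed(&last)) {
        return 2;
    }
    if (route_table_changed(&last)) {
        return 3;               /* Change already consumed. */
    }
    return 0;
}
```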
/* Active and passive stream classes. */
extern const struct stream_class tcp_stream_class;
extern const struct pstream_class ptcp_pstream_class;
+#ifndef _WIN32
extern const struct stream_class unix_stream_class;
extern const struct pstream_class punix_pstream_class;
+#else
+extern const struct stream_class windows_stream_class;
+extern const struct pstream_class pwindows_pstream_class;
+#endif
#ifdef HAVE_OPENSSL
extern const struct stream_class ssl_stream_class;
extern const struct pstream_class pssl_pstream_class;
NULL, /* run_wait */
NULL, /* wait */
};
+
+#ifdef _WIN32
+static int
+windows_open(const char *name, char *suffix, struct stream **streamp,
+ uint8_t dscp)
+{
+ int error, port;
+ FILE *file;
+ char *suffix_new, *path;
+
+ /* If the path does not contain a ':', assume it is relative to
+ * OVS_RUNDIR. */
+ if (!strchr(suffix, ':')) {
+ path = xasprintf("%s/%s", ovs_rundir(), suffix);
+ } else {
+ path = xstrdup(suffix);
+ }
+
+ file = fopen(path, "r");
+ if (!file) {
+ error = errno;
+ VLOG_DBG("%s: could not open %s (%s)", name, path,
+ ovs_strerror(error));
+ free(path);
+ return error;
+ }
+
+ error = fscanf(file, "%d", &port);
+ if (error != 1) {
+ VLOG_ERR("failed to read port from %s", path);
+ fclose(file);
+ free(path);
+ return EINVAL;
+ }
+ fclose(file);
+
+ suffix_new = xasprintf("127.0.0.1:%d", port);
+
+ error = tcp_open(name, suffix_new, streamp, dscp);
+
+ free(suffix_new);
+ free(path);
+ return error;
+}
+
+const struct stream_class windows_stream_class = {
+ "unix", /* name */
+ false, /* needs_probes */
+ windows_open, /* open */
+ NULL, /* close */
+ NULL, /* connect */
+ NULL, /* recv */
+ NULL, /* send */
+ NULL, /* run */
+ NULL, /* run_wait */
+ NULL, /* wait */
+};
+#endif
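The port-file handshake above can be modeled without sockets: the passive side writes its ephemeral localhost TCP port into the named file, and the active side reads that file and rewrites its connect target. A standalone sketch (target_from_port_file() and demo_target() are hypothetical helpers, not part of OVS):

```c
#include <stdio.h>
#include <string.h>

/* Parses the decimal port the passive side wrote into its port file and
 * builds the "127.0.0.1:<port>" target that the active side hands to
 * tcp_open().  Returns 0 on success, -1 if the contents are not a valid
 * port number. */
int
target_from_port_file(const char *file_contents, char *buf, size_t size)
{
    int port;

    if (sscanf(file_contents, "%d", &port) != 1 || port < 0 || port > 65535) {
        return -1;
    }
    snprintf(buf, size, "127.0.0.1:%d", port);
    return 0;
}

/* Convenience wrapper so the result can be compared as a string. */
const char *
demo_target(const char *contents)
{
    static char buf[32];

    return target_from_port_file(contents, buf, sizeof buf) < 0
           ? "<error>" : buf;
}
```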
\f
/* Passive TCP. */
NULL,
};
+#ifdef _WIN32
+static int
+pwindows_open(const char *name OVS_UNUSED, char *suffix,
+ struct pstream **pstreamp, uint8_t dscp)
+{
+ int error;
+ char *suffix_new, *path;
+ FILE *file;
+ struct pstream *listener;
+
+ suffix_new = xstrdup("0:127.0.0.1");
+ error = ptcp_open(name, suffix_new, pstreamp, dscp);
+ if (error) {
+ goto exit;
+ }
+ listener = *pstreamp;
+
+ /* If the path does not contain a ':', assume it is relative to
+ * OVS_RUNDIR. */
+ if (!strchr(suffix, ':')) {
+ path = xasprintf("%s/%s", ovs_rundir(), suffix);
+ } else {
+ path = xstrdup(suffix);
+ }
+
+ file = fopen(path, "w");
+ if (!file) {
+ error = errno;
+ VLOG_DBG("could not open %s (%s)", path, ovs_strerror(error));
+ free(path);
+ goto exit;
+ }
+
+ fprintf(file, "%d\n", ntohs(listener->bound_port));
+ if (fflush(file) == EOF) {
+ error = EIO;
+ VLOG_ERR("write failed for %s", path);
+ fclose(file);
+ free(path);
+ goto exit;
+ }
+ fclose(file);
+ free(path);
+
+exit:
+ free(suffix_new);
+ return error;
+}
+
+const struct pstream_class pwindows_pstream_class = {
+ "punix",
+ false,
+ pwindows_open,
+ NULL,
+ NULL,
+ NULL,
+ NULL,
+};
+#endif
&tcp_stream_class,
#ifndef _WIN32
&unix_stream_class,
+#else
+ &windows_stream_class,
#endif
#ifdef HAVE_OPENSSL
&ssl_stream_class,
&ptcp_pstream_class,
#ifndef _WIN32
&punix_pstream_class,
+#else
+ &pwindows_pstream_class,
#endif
#ifdef HAVE_OPENSSL
&pssl_pstream_class,
{
long long int *last_wakeup = last_wakeup_get();
long long int start;
+ bool quiescent;
int retval = 0;
time_init();
start = time_msec();
timeout_when = MIN(timeout_when, deadline);
+ quiescent = ovsrcu_is_quiescent();
for (;;) {
long long int now = time_msec();
time_left = timeout_when - now;
}
- if (!time_left) {
- ovsrcu_quiesce();
- } else {
- ovsrcu_quiesce_start();
+ if (!quiescent) {
+ if (!time_left) {
+ ovsrcu_quiesce();
+ } else {
+ ovsrcu_quiesce_start();
+ }
}
#ifndef _WIN32
}
#endif
- if (time_left) {
+ if (!quiescent && time_left) {
ovsrcu_quiesce_end();
}
{
struct unixctl_server *server;
struct pstream *listener;
- char *punix_path, *abs_path = NULL;
+ char *punix_path;
int error;
-#ifdef _WIN32
- FILE *file;
-#endif
*serverp = NULL;
if (path && !strcmp(path, "none")) {
return 0;
}
-#ifndef _WIN32
if (path) {
+ char *abs_path;
+#ifndef _WIN32
abs_path = abs_file_name(ovs_rundir(), path);
+#else
+ abs_path = strdup(path);
+#endif
punix_path = xasprintf("punix:%s", abs_path);
+ free(abs_path);
} else {
+#ifndef _WIN32
punix_path = xasprintf("punix:%s/%s.%ld.ctl", ovs_rundir(),
program_name, (long int) getpid());
- }
#else
- punix_path = xstrdup("ptcp:0:127.0.0.1");
+ punix_path = xasprintf("punix:%s/%s.ctl", ovs_rundir(), program_name);
#endif
+ }
error = pstream_open(punix_path, &listener, 0);
if (error) {
goto exit;
}
-#ifdef _WIN32
- if (path) {
- abs_path = xstrdup(path);
- } else {
- abs_path = xasprintf("%s/%s.ctl", ovs_rundir(), program_name);
- }
-
- file = fopen(abs_path, "w");
- if (!file) {
- error = errno;
- ovs_error(error, "could not open %s", abs_path);
- goto exit;
- }
-
- fprintf(file, "%d\n", ntohs(listener->bound_port));
- if (fflush(file) == EOF) {
- error = EIO;
- ovs_error(error, "write failed for %s", abs_path);
- fclose(file);
- goto exit;
- }
- fclose(file);
-#endif
-
unixctl_command_register("help", "", 0, 0, unixctl_help, NULL);
unixctl_command_register("version", "", 0, 0, unixctl_version, NULL);
*serverp = server;
exit:
- if (abs_path) {
- free(abs_path);
- }
free(punix_path);
return error;
}
char *abs_path, *unix_path;
struct stream *stream;
int error;
-#ifdef _WIN32
- FILE *file;
- int port;
+#ifdef _WIN32
abs_path = strdup(path);
- file = fopen(abs_path, "r");
- if (!file) {
- int error = errno;
- ovs_error(error, "could not open %s", abs_path);
- free(abs_path);
- return error;
- }
-
- error = fscanf(file, "%d", &port);
- if (error != 1) {
- ovs_error(errno, "failed to read port from %s", abs_path);
- free(abs_path);
- return EINVAL;
- }
- fclose(file);
-
- unix_path = xasprintf("tcp:127.0.0.1:%d", port);
#else
abs_path = abs_file_name(ovs_rundir(), path);
- unix_path = xasprintf("unix:%s", abs_path);
#endif
+ unix_path = xasprintf("unix:%s", abs_path);
*client = NULL;
}
}
+bool
+str_to_uint(const char *s, int base, unsigned int *u)
+{
+ long long ll;
+ bool ok = str_to_llong(s, base, &ll);
+ if (!ok || ll < 0 || ll > UINT_MAX) {
+ *u = 0;
+ return false;
+ } else {
+ *u = ll;
+ return true;
+ }
+}
+
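The old inline str_to_uint() cast through a signed int, so negative input "parsed" successfully as garbage; the new out-of-line version above range-checks against [0, UINT_MAX]. A standalone model of the fixed behavior (model_str_to_uint() and parse_result() are hypothetical helpers using strtoll() in place of OVS's str_to_llong()):

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Like the new str_to_uint() above: parse as a wide signed value, then
 * range-check against [0, UINT_MAX] instead of casting. */
bool
model_str_to_uint(const char *s, int base, unsigned int *u)
{
    char *end;
    long long ll;

    errno = 0;
    ll = strtoll(s, &end, base);
    if (errno || end == s || *end != '\0' || ll < 0 || ll > UINT_MAX) {
        *u = 0;
        return false;
    }
    *u = ll;
    return true;
}

/* Returns the parsed value, or -1 on failure, for easy checking. */
long long
parse_result(const char *s)
{
    unsigned int u;

    return model_str_to_uint(s, 10, &u) ? (long long) u : -1;
}
```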
/* Converts floating-point string 's' into a double. If successful, stores
* the double in '*d' and returns true; on failure, stores 0 in '*d' and
* returns false.
#define CACHE_LINE_SIZE 64
BUILD_ASSERT_DECL(IS_POW2(CACHE_LINE_SIZE));
+#define CACHE_LINE_SIZE 64 /* Correct for most CPUs. */
+
+static inline void
+ovs_prefetch_range(const void *start, size_t size)
+{
+ const char *addr = (const char *)start;
+ size_t ofs;
+
+ for (ofs = 0; ofs < size; ofs += CACHE_LINE_SIZE) {
+ OVS_PREFETCH(addr + ofs);
+ }
+}
+
#ifndef MIN
#define MIN(X, Y) ((X) < (Y) ? (X) : (Y))
#endif
bool str_to_int(const char *, int base, int *);
bool str_to_long(const char *, int base, long *);
bool str_to_llong(const char *, int base, long long *);
-
-static inline bool
-str_to_uint(const char *s, int base, unsigned int *u)
-{
- return str_to_int(s, base, (int *) u);
-}
-
-static inline bool
-str_to_ulong(const char *s, int base, unsigned long *ul)
-{
- return str_to_long(s, base, (long *) ul);
-}
-
-static inline bool
-str_to_ullong(const char *s, int base, unsigned long long *ull)
-{
- return str_to_llong(s, base, (long long *) ull);
-}
+bool str_to_uint(const char *, int base, unsigned int *);
bool ovs_scan(const char *s, const char *format, ...) SCANF_FORMAT(2, 3);
unsigned int src_ofs, unsigned int n_bits);
void xsleep(unsigned int seconds);
+
#ifdef _WIN32
\f
char *ovs_format_message(int error);
value.
.TP
\fBunix:\fIfile\fR
-The Unix domain server socket named \fIfile\fR.
+On POSIX, a Unix domain server socket named \fIfile\fR.
+.IP
+On Windows, a localhost TCP port written in \fIfile\fR.
useful along with the \fB\-\-syslog\-target\fR option (the word has no
effect otherwise).
.
-.IP \(bu
+.IP \(bu
\fBoff\fR, \fBemer\fR, \fBerr\fR, \fBwarn\fR, \fBinfo\fR, or
\fBdbg\fR, to control the log level. Messages of the given severity
or higher will be logged, and messages of lower severity will be
Sets the maximum logging verbosity level, equivalent to
\fB\-\-verbose=dbg\fR.
.
-.\" Python vlog doesn't implement -vPATTERN so only document it if
-.\" \*(PY is empty:
-.ie dPY
-.el \{
.IP "\fB\-vPATTERN:\fIfacility\fB:\fIpattern\fR"
.IQ "\fB\-\-verbose=PATTERN:\fIfacility\fB:\fIpattern\fR"
Sets the log pattern for \fIfacility\fR to \fIpattern\fR. Refer to
\fBovs\-appctl\fR(8) for a description of the valid syntax for \fIpattern\fR.
-\}
.
.TP
\fB\-\-log\-file\fR[\fB=\fIfile\fR]
ofproto/pinsched.c \
ofproto/pinsched.h \
ofproto/tunnel.c \
- ofproto/tunnel.h
+ ofproto/tunnel.h \
+ ofproto/bundles.c \
+ ofproto/bundles.h
+
ofproto_libofproto_la_CPPFLAGS = $(AM_CPPFLAGS)
ofproto_libofproto_la_CFLAGS = $(AM_CFLAGS)
ofproto_libofproto_la_LIBADD = lib/libsflow.la
static struct hmap all_bonds__ = HMAP_INITIALIZER(&all_bonds__);
static struct hmap *const all_bonds OVS_GUARDED_BY(rwlock) = &all_bonds__;
-/* Bit-mask for hashing a flow down to a bucket.
- * There are (BOND_MASK + 1) buckets. */
+/* Bit-mask for hashing a flow down to a bucket. */
#define BOND_MASK 0xff
+#define BOND_BUCKETS (BOND_MASK + 1)
#define RECIRC_RULE_PRIORITY 20 /* Priority level for internal rules */
/* A hash bucket for mapping a flow to a slave.
- * "struct bond" has an array of (BOND_MASK + 1) of these. */
+ * "struct bond" has an array of BOND_BUCKETS of these. */
struct bond_entry {
struct bond_slave *slave; /* Assigned slave, NULL if unassigned. */
- uint64_t tx_bytes; /* Count of bytes recently transmitted. */
+ uint64_t tx_bytes /* Count of bytes recently transmitted. */
+ OVS_GUARDED_BY(rwlock);
struct list list_node; /* In bond_slave's 'entries' list. */
- /* Recirculation. */
- struct rule *pr_rule; /* Post recirculation rule for this entry.*/
- uint64_t pr_tx_bytes; /* Record the rule tx_bytes to figure out
- the delta to update the tx_bytes entry
- above.*/
+ /* Recirculation.
+ *
+ * 'pr_rule' is the post-recirculation rule for this entry.
+ * 'pr_tx_bytes' is the most recent byte count seen for 'pr_rule'; it is
+ * used to compute the delta applied to 'tx_bytes' above. */
+ struct rule *pr_rule;
+ uint64_t pr_tx_bytes OVS_GUARDED_BY(rwlock);
};
/* A bond slave, that is, one of the links comprising a bond. */
uint32_t basis; /* Basis for flow hash function. */
/* SLB specific bonding info. */
- struct bond_entry *hash; /* An array of (BOND_MASK + 1) elements. */
+ struct bond_entry *hash; /* An array of BOND_BUCKETS elements. */
int rebalance_interval; /* Interval between rebalances, in ms. */
long long int next_rebalance; /* Next rebalancing time. */
bool send_learning_packets;
struct match match;
ofp_port_t out_ofport;
enum bond_op op;
- struct rule *pr_rule;
+ struct rule **pr_rule;
};
static void bond_entry_reset(struct bond *) OVS_REQ_WRLOCK(rwlock);
static void bond_link_status_update(struct bond_slave *)
OVS_REQ_WRLOCK(rwlock);
static void bond_choose_active_slave(struct bond *)
- OVS_REQ_WRLOCK(rwlock);;
+ OVS_REQ_WRLOCK(rwlock);
static unsigned int bond_hash_src(const uint8_t mac[ETH_ADDR_LEN],
uint16_t vlan, uint32_t basis);
static unsigned int bond_hash_tcp(const struct flow *, uint16_t vlan,
static void
add_pr_rule(struct bond *bond, const struct match *match,
- ofp_port_t out_ofport, struct rule *rule)
+ ofp_port_t out_ofport, struct rule **rule)
{
uint32_t hash = match_hash(match, 0);
struct bond_pr_rule_op *pr_op;
pr_op->op = DEL;
}
- if ((bond->hash == NULL) || (!bond->recirc_id)) {
- return;
- }
-
- for (i = 0; i < BOND_MASK + 1; i++) {
- struct bond_slave *slave = bond->hash[i].slave;
+ if (bond->hash && bond->recirc_id) {
+ for (i = 0; i < BOND_BUCKETS; i++) {
+ struct bond_slave *slave = bond->hash[i].slave;
- if (slave) {
- match_init_catchall(&match);
- match_set_recirc_id(&match, bond->recirc_id);
- /* recirc_id -> metadata to speed up look ups. */
- match_set_metadata(&match, htonll(bond->recirc_id));
- match_set_dp_hash_masked(&match, i, BOND_MASK);
+ if (slave) {
+ match_init_catchall(&match);
+ match_set_recirc_id(&match, bond->recirc_id);
+ match_set_dp_hash_masked(&match, i, BOND_MASK);
- add_pr_rule(bond, &match, slave->ofp_port,
- bond->hash[i].pr_rule);
+ add_pr_rule(bond, &match, slave->ofp_port,
+ &bond->hash[i].pr_rule);
+ }
}
}
HMAP_FOR_EACH_SAFE(pr_op, next_op, hmap_node, &bond->pr_rule_ops) {
int error;
- struct rule *rule;
switch (pr_op->op) {
case ADD:
ofpbuf_clear(&ofpacts);
error = ofproto_dpif_add_internal_flow(bond->ofproto,
&pr_op->match,
RECIRC_RULE_PRIORITY,
- &ofpacts, &rule);
+ &ofpacts, pr_op->pr_rule);
if (error) {
char *err_s = match_to_string(&pr_op->match,
RECIRC_RULE_PRIORITY);
VLOG_ERR("failed to add post recirculation flow %s", err_s);
free(err_s);
- pr_op->pr_rule = NULL;
- } else {
- pr_op->pr_rule = rule;
}
break;
}
hmap_remove(&bond->pr_rule_ops, &pr_op->hmap_node);
- pr_op->pr_rule = NULL;
+ *pr_op->pr_rule = NULL;
free(pr_op);
break;
}
/* Recirculation. */
static void
bond_entry_account(struct bond_entry *entry, uint64_t rule_tx_bytes)
- OVS_REQ_RDLOCK(rwlock)
+ OVS_REQ_WRLOCK(rwlock)
{
if (entry->slave) {
uint64_t delta;
}
/* Maintain bond stats using post recirculation rule byte counters.*/
-void
+static void
bond_recirculation_account(struct bond *bond)
{
int i;
- ovs_rwlock_rdlock(&rwlock);
+ ovs_rwlock_wrlock(&rwlock);
for (i = 0; i <= BOND_MASK; i++) {
struct bond_entry *entry = &bond->hash[i];
struct rule *rule = entry->pr_rule;
bond_may_recirc(const struct bond *bond, uint32_t *recirc_id,
uint32_t *hash_bias)
{
- if (bond->balance == BM_TCP) {
+ if (bond->balance == BM_TCP && bond->recirc_id) {
if (recirc_id) {
*recirc_id = bond->recirc_id;
}
static void
log_bals(struct bond *bond, const struct list *bals)
+ OVS_REQ_RDLOCK(rwlock)
{
if (VLOG_IS_DBG_ENABLED()) {
struct ds ds = DS_EMPTY_INITIALIZER;
/* Shifts 'hash' from its current slave to 'to'. */
static void
bond_shift_load(struct bond_entry *hash, struct bond_slave *to)
+ OVS_REQ_WRLOCK(rwlock)
{
struct bond_slave *from = hash->slave;
struct bond *bond = from->bond;
* shift away small hashes or large hashes. */
static struct bond_entry *
choose_entry_to_migrate(const struct bond_slave *from, uint64_t to_tx_bytes)
+ OVS_REQ_WRLOCK(rwlock)
{
struct bond_entry *e;
 * The caller should have called bond_account() for each active flow or, in
 * case recirculation is used, bond_recirculation_account(bond), to ensure
 * that flow data is consistently accounted at this point.
- *
- * Return whether rebalancing took place.*/
-bool
+ */
+void
bond_rebalance(struct bond *bond)
{
struct bond_slave *slave;
struct bond_entry *e;
struct list bals;
bool rebalanced = false;
+ bool use_recirc;
ovs_rwlock_wrlock(&rwlock);
if (!bond_is_balanced(bond) || time_msec() < bond->next_rebalance) {
}
bond->next_rebalance = time_msec() + bond->rebalance_interval;
+ use_recirc = ofproto_dpif_get_enable_recirc(bond->ofproto) &&
+ bond_may_recirc(bond, NULL, NULL);
+
+ if (use_recirc) {
+ bond_recirculation_account(bond);
+ }
+
/* Add each bond_entry to its slave's 'entries' list.
* Compute each slave's tx_bytes as the sum of its entries' tx_bytes. */
HMAP_FOR_EACH (slave, hmap_node, &bond->slaves) {
/* Re-sort 'bals'. */
reinsert_bal(&bals, from);
reinsert_bal(&bals, to);
- rebalanced = true;
+ rebalanced = true;
} else {
/* Can't usefully migrate anything away from 'from'.
* Don't reconsider it. */
* take 20 rebalancing runs to decay to 0 and get deleted entirely. */
for (e = &bond->hash[0]; e <= &bond->hash[BOND_MASK]; e++) {
e->tx_bytes /= 2;
- if (!e->tx_bytes) {
- e->slave = NULL;
- }
+ }
+
+ if (use_recirc && rebalanced) {
+ bond_update_post_recirc_rules(bond, true);
}
done:
ovs_rwlock_unlock(&rwlock);
- return rebalanced;
}
\f
/* Bonding unixctl user interface functions. */
/* Hashes. */
for (be = bond->hash; be <= &bond->hash[BOND_MASK]; be++) {
int hash = be - bond->hash;
+ uint64_t be_tx_k;
if (be->slave != slave) {
continue;
}
- ds_put_format(ds, "\thash %d: %"PRIu64" kB load\n",
- hash, be->tx_bytes / 1024);
+ be_tx_k = be->tx_bytes / 1024;
+ if (be_tx_k) {
+ ds_put_format(ds, "\thash %d: %"PRIu64" kB load\n",
+ hash, be_tx_k);
+ }
/* XXX How can we list the MACs assigned to hashes of SLB bonds? */
}
bond_entry_reset(struct bond *bond)
{
if (bond->balance != BM_AB) {
- size_t hash_len = (BOND_MASK + 1) * sizeof *bond->hash;
+ size_t hash_len = BOND_BUCKETS * sizeof *bond->hash;
if (!bond->hash) {
bond->hash = xmalloc(hash_len);
/* Rebalancing. */
void bond_account(struct bond *, const struct flow *, uint16_t vlan,
uint64_t n_bytes);
-bool bond_rebalance(struct bond *);
+void bond_rebalance(struct bond *);
/* Recirculation
*
*
* When recirculation is used, each bond port is assigned with a unique
* recirc_id. The output action to the bond port will be replaced by
- * a RECIRC action.
+ * a Hash action, followed by a RECIRC action.
*
- * ... actions= ... RECIRC(L4_HASH, recirc_id) ....
+ * ... actions= ... HASH(hash(L4)), RECIRC(recirc_id) ....
*
* On handling first output packet, 256 post recirculation flows are installed:
*
void bond_update_post_recirc_rules(struct bond *, const bool force);
bool bond_may_recirc(const struct bond *, uint32_t *recirc_id,
uint32_t *hash_bias);
-void bond_recirculation_account(struct bond *);
#endif /* bond.h */
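The BOND_MASK/BOND_BUCKETS rename above makes the hashing scheme explicit: a flow hash is reduced to one of 256 buckets by masking, which is also what the dp_hash match in the post-recirculation rules selects on. A minimal standalone sketch (bond_bucket() is a hypothetical helper, not OVS code):

```c
#include <stdint.h>

#define BOND_MASK 0xff
#define BOND_BUCKETS (BOND_MASK + 1)

/* Reduces a flow hash to one of BOND_BUCKETS (256) 'struct bond_entry'
 * slots; the same masking underlies
 * match_set_dp_hash_masked(&match, i, BOND_MASK) in the bond code. */
unsigned int
bond_bucket(uint32_t flow_hash)
{
    return flow_hash & BOND_MASK;
}
```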
--- /dev/null
+/*
+ * Copyright (c) 2013, 2014 Alexandru Copot <alex.mihai.c@gmail.com>, with support from IXIA.
+ * Copyright (c) 2013, 2014 Daniel Baluta <dbaluta@ixiacom.com>
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+
+#include "coverage.h"
+#include "fail-open.h"
+#include "in-band.h"
+#include "odp-util.h"
+#include "ofp-actions.h"
+#include "ofp-msgs.h"
+#include "ofp-util.h"
+#include "ofpbuf.h"
+#include "ofproto-provider.h"
+#include "pinsched.h"
+#include "poll-loop.h"
+#include "pktbuf.h"
+#include "rconn.h"
+#include "shash.h"
+#include "simap.h"
+#include "stream.h"
+#include "timeval.h"
+#include "vconn.h"
+#include "vlog.h"
+
+#include "bundles.h"
+
+VLOG_DEFINE_THIS_MODULE(bundles);
+
+enum bundle_state {
+ BS_OPEN,
+ BS_CLOSED
+};
+
+struct ofp_bundle {
+ struct hmap_node node; /* In struct ofconn's "bundles" hmap. */
+ uint32_t id;
+ uint16_t flags;
+ enum bundle_state state;
+
+ /* List of 'struct bundle_message's */
+ struct list msg_list;
+};
+
+struct bundle_message {
+ struct ofp_header *msg;
+ struct list node; /* Element in struct ofp_bundle's 'msg_list'. */
+};
+
+static uint32_t
+bundle_hash(uint32_t id)
+{
+ return hash_int(id, 0);
+}
+
+static struct ofp_bundle *
+ofp_bundle_find(struct hmap *bundles, uint32_t id)
+{
+ struct ofp_bundle *bundle;
+
+ HMAP_FOR_EACH_IN_BUCKET(bundle, node, bundle_hash(id), bundles) {
+ if (bundle->id == id) {
+ return bundle;
+ }
+ }
+
+ return NULL;
+}
+
+static struct ofp_bundle *
+ofp_bundle_create(uint32_t id, uint16_t flags)
+{
+ struct ofp_bundle *bundle;
+
+ bundle = xmalloc(sizeof(*bundle));
+
+ bundle->id = id;
+ bundle->flags = flags;
+
+ list_init(&bundle->msg_list);
+
+ return bundle;
+}
+
+static void
+ofp_bundle_remove(struct ofconn *ofconn, struct ofp_bundle *item)
+{
+ struct bundle_message *msg, *next;
+ struct hmap *bundles;
+
+ LIST_FOR_EACH_SAFE (msg, next, node, &item->msg_list) {
+ list_remove(&msg->node);
+ free(msg->msg);
+ free(msg);
+ }
+
+ bundles = ofconn_get_bundles(ofconn);
+ hmap_remove(bundles, &item->node);
+
+ free(item);
+}
+
+void
+ofp_bundle_remove_all(struct ofconn *ofconn)
+{
+ struct ofp_bundle *b, *next;
+ struct hmap *bundles;
+
+ bundles = ofconn_get_bundles(ofconn);
+
+ HMAP_FOR_EACH_SAFE (b, next, node, bundles) {
+ ofp_bundle_remove(ofconn, b);
+ }
+}
+
+enum ofperr
+ofp_bundle_open(struct ofconn *ofconn, uint32_t id, uint16_t flags)
+{
+ struct hmap *bundles;
+ struct ofp_bundle *bundle;
+
+ bundles = ofconn_get_bundles(ofconn);
+ bundle = ofp_bundle_find(bundles, id);
+
+ if (bundle) {
+ VLOG_INFO("Bundle %x already exists.", id);
+ ofp_bundle_remove(ofconn, bundle);
+
+ return OFPERR_OFPBFC_BAD_ID;
+ }
+
+ /* TODO: Check the limit of open bundles */
+
+ bundle = ofp_bundle_create(id, flags);
+ bundle->state = BS_OPEN;
+
+    hmap_insert(bundles, &bundle->node, bundle_hash(id));
+
+ return 0;
+}
+
+enum ofperr
+ofp_bundle_close(struct ofconn *ofconn, uint32_t id, uint16_t flags)
+{
+ struct hmap *bundles;
+ struct ofp_bundle *bundle;
+
+ bundles = ofconn_get_bundles(ofconn);
+ bundle = ofp_bundle_find(bundles, id);
+
+ if (!bundle) {
+ return OFPERR_OFPBFC_BAD_ID;
+ }
+
+ if (bundle->state == BS_CLOSED) {
+ ofp_bundle_remove(ofconn, bundle);
+ return OFPERR_OFPBFC_BUNDLE_CLOSED;
+ }
+
+ if (bundle->flags != flags) {
+ ofp_bundle_remove(ofconn, bundle);
+ return OFPERR_OFPBFC_BAD_FLAGS;
+ }
+
+ bundle->state = BS_CLOSED;
+ return 0;
+}
+
+enum ofperr
+ofp_bundle_commit(struct ofconn *ofconn, uint32_t id, uint16_t flags)
+{
+ struct hmap *bundles;
+ struct ofp_bundle *bundle;
+
+ bundles = ofconn_get_bundles(ofconn);
+ bundle = ofp_bundle_find(bundles, id);
+
+ if (!bundle) {
+ return OFPERR_OFPBFC_BAD_ID;
+ }
+ if (bundle->flags != flags) {
+ ofp_bundle_remove(ofconn, bundle);
+ return OFPERR_OFPBFC_BAD_FLAGS;
+ }
+
+ /* TODO: actual commit */
+
+ return OFPERR_OFPBFC_MSG_UNSUP;
+}
+
+enum ofperr
+ofp_bundle_discard(struct ofconn *ofconn, uint32_t id)
+{
+ struct hmap *bundles;
+ struct ofp_bundle *bundle;
+
+ bundles = ofconn_get_bundles(ofconn);
+ bundle = ofp_bundle_find(bundles, id);
+
+ if (!bundle) {
+ return OFPERR_OFPBFC_BAD_ID;
+ }
+
+ ofp_bundle_remove(ofconn, bundle);
+
+ return 0;
+}
+
+enum ofperr
+ofp_bundle_add_message(struct ofconn *ofconn, struct ofputil_bundle_add_msg *badd)
+{
+ struct hmap *bundles;
+ struct ofp_bundle *bundle;
+ struct bundle_message *bmsg;
+
+ bundles = ofconn_get_bundles(ofconn);
+ bundle = ofp_bundle_find(bundles, badd->bundle_id);
+
+ if (!bundle) {
+ bundle = ofp_bundle_create(badd->bundle_id, badd->flags);
+ bundle->state = BS_OPEN;
+
+        hmap_insert(bundles, &bundle->node, bundle_hash(badd->bundle_id));
+ }
+
+ if (bundle->state == BS_CLOSED) {
+ ofp_bundle_remove(ofconn, bundle);
+ return OFPERR_OFPBFC_BUNDLE_CLOSED;
+ }
+
+ bmsg = xmalloc(sizeof *bmsg);
+ bmsg->msg = xmemdup(badd->msg, ntohs(badd->msg->length));
+ list_push_back(&bundle->msg_list, &bmsg->node);
+ return 0;
+}
--- /dev/null
+/*
+ * Copyright (c) 2013, 2014 Alexandru Copot <alex.mihai.c@gmail.com>, with support from IXIA.
+ * Copyright (c) 2013, 2014 Daniel Baluta <dbaluta@ixiacom.com>
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef BUNDLES_H
+#define BUNDLES_H 1
+
+#include <sys/types.h>
+
+#include "ofp-msgs.h"
+#include "connmgr.h"
+#include "ofp-util.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+enum ofperr ofp_bundle_open(struct ofconn *ofconn, uint32_t id, uint16_t flags);
+
+enum ofperr ofp_bundle_close(struct ofconn *ofconn, uint32_t id, uint16_t flags);
+
+enum ofperr ofp_bundle_commit(struct ofconn *ofconn, uint32_t id, uint16_t flags);
+
+enum ofperr ofp_bundle_discard(struct ofconn *ofconn, uint32_t id);
+
+enum ofperr ofp_bundle_add_message(struct ofconn *ofconn,
+ struct ofputil_bundle_add_msg *badd);
+
+void ofp_bundle_remove_all(struct ofconn *ofconn);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
#include "vconn.h"
#include "vlog.h"
+#include "bundles.h"
+
VLOG_DEFINE_THIS_MODULE(connmgr);
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
* contains an update event of type NXFME_ABBREV and false otherwise.. */
struct list updates OVS_GUARDED_BY(ofproto_mutex);
bool sent_abbrev_update OVS_GUARDED_BY(ofproto_mutex);
+
+ /* Active bundles. Contains "struct ofp_bundle"s. */
+ struct hmap bundles;
};
static struct ofconn *ofconn_create(struct connmgr *, struct rconn *,
{
list_push_back(&ofconn->opgroups, ofconn_node);
}
+
+struct hmap *
+ofconn_get_bundles(struct ofconn *ofconn)
+{
+ return &ofconn->bundles;
+}
+
\f
/* Private ofconn functions. */
hmap_init(&ofconn->monitors);
list_init(&ofconn->updates);
+ hmap_init(&ofconn->bundles);
+
ofconn_flush(ofconn);
return ofconn;
hmap_remove(&ofconn->connmgr->controllers, &ofconn->hmap_node);
}
+ ofp_bundle_remove_all(ofconn);
+
hmap_destroy(&ofconn->monitors);
list_remove(&ofconn->node);
rconn_destroy(ofconn->rconn);
ovs_mutex_unlock(&rule->mutex);
if (flags & NXFMF_ACTIONS) {
- struct rule_actions *actions = rule_get_actions(rule);
+ const struct rule_actions *actions = rule_get_actions(rule);
fu.ofpacts = actions->ofpacts;
fu.ofpacts_len = actions->ofpacts_len;
} else {
void ofconn_remove_opgroup(struct ofconn *, struct list *,
const struct ofp_header *request, int error);
+struct hmap *ofconn_get_bundles(struct ofconn *ofconn);
+
/* Sending asynchronous messages. */
bool connmgr_wants_packet_in_on_miss(struct connmgr *mgr);
void connmgr_send_port_status(struct connmgr *, struct ofconn *source,
}
/* Update flow tracking data. */
- nf_flow->created = 0;
nf_flow->packet_count = 0;
nf_flow->byte_count = 0;
nf_flow->tcp_flags = 0;
#include "bfd.h"
#include "cfm.h"
+#include "guarded-list.h"
#include "hash.h"
#include "heap.h"
#include "hmap.h"
uint8_t hw_addr[OFP_ETH_ALEN]; /* Hardware address. */
};
+/* Entry of the 'send_soon' list.  Contains a pointer to the
+ * 'ofport_dpif'.  Note that the pointed-to object is not protected, so
+ * users should always use mport_find() to convert it to an 'mport'. */
+struct send_soon_entry {
+ struct list list_node; /* In send_soon. */
+ const struct ofport_dpif *ofport;
+};
+
/* hmap that contains "struct mport"s. */
static struct hmap monitor_hmap = HMAP_INITIALIZER(&monitor_hmap);
/* heap for ordering mport based on bfd/cfm wakeup time. */
static struct heap monitor_heap;
+/* Guarded list for storing the mports that need to send bfd/cfm control
+ * packets soon. */
+static struct guarded_list send_soon = GUARDED_LIST_INITIALIZER(&send_soon);
+
/* The monitor thread id. */
static pthread_t monitor_tid;
/* True if the monitor thread is running. */
static struct ovs_mutex monitor_mutex = OVS_MUTEX_INITIALIZER;
static void *monitor_main(void *);
+static void monitor_check_send_soon(struct ofpbuf *);
static void monitor_run(void);
+static void monitor_mport_run(struct mport *, struct ofpbuf *);
static void mport_register(const struct ofport_dpif *, struct bfd *,
struct cfm *, uint8_t[ETH_ADDR_LEN])
static void *
monitor_main(void * args OVS_UNUSED)
{
- set_subprogram_name("monitor");
VLOG_INFO("monitor thread created");
while (!latch_is_set(&monitor_exit_latch)) {
monitor_run();
* reconfigured monitoring ports are run in a timely manner. */
#define MONITOR_INTERVAL_MSEC 100
-/* Checks the sending of control packets on mports that have timed out.
- * Sends the control packets if needed. Executes bfd and cfm periodic
- * functions (run, wait) on those mports. */
+/* Checks the 'send_soon' list and the heap for mports whose bfd/cfm
+ * timers have expired. */
static void
monitor_run(void)
{
ofpbuf_use_stub(&packet, stub, sizeof stub);
ovs_mutex_lock(&monitor_mutex);
+
+    /* monitor_check_send_soon() needs to be run twice.  The first run
+     * prevents the same 'mport' from being processed twice (i.e. once
+     * from the heap, again from the 'send_soon' list).  The second run
+     * covers the case in which a control packet is sent via a patch port
+     * and the other end needs to send one back immediately. */
+ monitor_check_send_soon(&packet);
+
prio_now = MSEC_TO_PRIO(time_msec());
/* Peeks the top of heap and checks if we should run this mport. */
while (!heap_is_empty(&monitor_heap)
&& heap_max(&monitor_heap)->priority >= prio_now) {
- long long int next_wake_time;
struct mport *mport;
mport = CONTAINER_OF(heap_max(&monitor_heap), struct mport, heap_node);
- if (mport->cfm && cfm_should_send_ccm(mport->cfm)) {
- ofpbuf_clear(&packet);
- cfm_compose_ccm(mport->cfm, &packet, mport->hw_addr);
- ofproto_dpif_send_packet(mport->ofport, &packet);
- }
- if (mport->bfd && bfd_should_send_packet(mport->bfd)) {
- ofpbuf_clear(&packet);
- bfd_put_packet(mport->bfd, &packet, mport->hw_addr);
- ofproto_dpif_send_packet(mport->ofport, &packet);
- }
- if (mport->cfm) {
- cfm_run(mport->cfm);
- cfm_wait(mport->cfm);
- }
- if (mport->bfd) {
- bfd_run(mport->bfd);
- bfd_wait(mport->bfd);
- }
- /* Computes the next wakeup time for this mport. */
- next_wake_time = MIN(bfd_wake_time(mport->bfd),
- cfm_wake_time(mport->cfm));
- heap_change(&monitor_heap, &mport->heap_node,
- MSEC_TO_PRIO(next_wake_time));
+ monitor_mport_run(mport, &packet);
}
+ monitor_check_send_soon(&packet);
+
/* Waits on the earliest next wakeup time. */
if (!heap_is_empty(&monitor_heap)) {
long long int next_timeout, next_mport_wakeup;
ovs_mutex_unlock(&monitor_mutex);
ofpbuf_uninit(&packet);
}
+
+/* Checks the 'send_soon' list for any mport that needs to send a cfm/bfd
+ * control packet immediately, and calls monitor_mport_run() on it. */
+static void
+monitor_check_send_soon(struct ofpbuf *packet)
+ OVS_REQUIRES(monitor_mutex)
+{
+ while (!guarded_list_is_empty(&send_soon)) {
+ struct send_soon_entry *entry;
+ struct mport *mport;
+
+ entry = CONTAINER_OF(guarded_list_pop_front(&send_soon),
+ struct send_soon_entry, list_node);
+ mport = mport_find(entry->ofport);
+ if (mport) {
+ monitor_mport_run(mport, packet);
+ }
+ free(entry);
+ }
+}
+
+/* Checks whether a control packet needs to be sent on 'mport' and sends
+ * it if so.  Executes the bfd and cfm periodic functions (run, wait) on
+ * 'mport', and updates 'mport''s position in the heap based on its next
+ * timeout. */
+static void
+monitor_mport_run(struct mport *mport, struct ofpbuf *packet)
+ OVS_REQUIRES(monitor_mutex)
+{
+ long long int next_wake_time;
+
+ if (mport->cfm && cfm_should_send_ccm(mport->cfm)) {
+ ofpbuf_clear(packet);
+ cfm_compose_ccm(mport->cfm, packet, mport->hw_addr);
+ ofproto_dpif_send_packet(mport->ofport, packet);
+ }
+ if (mport->bfd && bfd_should_send_packet(mport->bfd)) {
+ ofpbuf_clear(packet);
+ bfd_put_packet(mport->bfd, packet, mport->hw_addr);
+ ofproto_dpif_send_packet(mport->ofport, packet);
+ }
+ if (mport->cfm) {
+ cfm_run(mport->cfm);
+ cfm_wait(mport->cfm);
+ }
+ if (mport->bfd) {
+ bfd_run(mport->bfd);
+ bfd_wait(mport->bfd);
+ }
+ /* Computes the next wakeup time for this mport. */
+ next_wake_time = MIN(bfd_wake_time(mport->bfd),
+ cfm_wake_time(mport->cfm));
+ heap_change(&monitor_heap, &mport->heap_node,
+ MSEC_TO_PRIO(next_wake_time));
+}
\f
/* Creates the mport in monitor module if either bfd or cfm
* terminates it. */
if (!monitor_running && !hmap_is_empty(&monitor_hmap)) {
latch_init(&monitor_exit_latch);
- xpthread_create(&monitor_tid, NULL, monitor_main, NULL);
+ monitor_tid = ovs_thread_create("monitor", monitor_main, NULL);
monitor_running = true;
} else if (monitor_running && hmap_is_empty(&monitor_hmap)) {
latch_set(&monitor_exit_latch);
}
}
-/* Moves the mport on top of the heap. This is necessary when
- * for example, bfd POLL is received and the mport should
- * immediately send FINAL back. */
-void
-ofproto_dpif_monitor_port_send_soon_safe(const struct ofport_dpif *ofport)
-{
- ovs_mutex_lock(&monitor_mutex);
- ofproto_dpif_monitor_port_send_soon(ofport);
- ovs_mutex_unlock(&monitor_mutex);
-}
-
+/* Registers 'ofport' in the 'send_soon' list.  We cannot directly
+ * insert the corresponding mport into the 'send_soon' list, since that
+ * list is not updated when an mport is removed.
+ *
+ * The reader of the 'send_soon' list is responsible for freeing each
+ * entry. */
void
ofproto_dpif_monitor_port_send_soon(const struct ofport_dpif *ofport)
- OVS_REQUIRES(monitor_mutex)
{
- struct mport *mport;
+ struct send_soon_entry *entry = xzalloc(sizeof *entry);
+ entry->ofport = ofport;
- mport = mport_find(ofport);
- if (mport) {
- heap_change(&monitor_heap, &mport->heap_node, LLONG_MAX);
- }
+ guarded_list_push_back(&send_soon, &entry->list_node, SIZE_MAX);
}
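The handoff above stores only the 'ofport' pointer and lets the reader re-validate it with mport_find() and free the entry. A minimal standalone sketch of this writer-allocates/reader-frees pattern over a mutex-guarded list (toy names, not the OVS guarded-list API):

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct soon_entry {
    int port_no;                /* Stand-in for the ofport pointer. */
    struct soon_entry *next;
};

static struct soon_entry *soon_head;
static pthread_mutex_t soon_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Writer side: allocate an entry and push it under the lock, as
 * ofproto_dpif_monitor_port_send_soon() does. */
static void
soon_push(int port_no)
{
    struct soon_entry *e = malloc(sizeof *e);

    e->port_no = port_no;
    pthread_mutex_lock(&soon_mutex);
    e->next = soon_head;
    soon_head = e;
    pthread_mutex_unlock(&soon_mutex);
}

/* Reader side: detach the whole list under the lock, then process and
 * free each entry outside it, as monitor_check_send_soon() does.
 * Returns the number of entries processed. */
static int
soon_drain(void)
{
    struct soon_entry *e, *next;
    int n = 0;

    pthread_mutex_lock(&soon_mutex);
    e = soon_head;
    soon_head = NULL;
    pthread_mutex_unlock(&soon_mutex);

    for (; e; e = next) {
        next = e->next;
        n++;
        free(e);
    }
    return n;
}
```

Detaching the whole list under the lock keeps the critical section short, so the writer is never blocked while the reader processes entries.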
#include <stdint.h>
+#include "openflow/openflow.h"
#include "packets.h"
struct bfd;
struct ofport_dpif;
void ofproto_dpif_monitor_port_send_soon(const struct ofport_dpif *);
-void ofproto_dpif_monitor_port_send_soon_safe(const struct ofport_dpif *);
void ofproto_dpif_monitor_port_update(const struct ofport_dpif *,
struct bfd *, struct cfm *,
VLOG_DEFINE_THIS_MODULE(ofproto_dpif_upcall);
-COVERAGE_DEFINE(upcall_queue_overflow);
+COVERAGE_DEFINE(upcall_duplicate_flow);
-/* A thread that processes each upcall handed to it by the dispatcher thread,
- * forwards the upcall's packet, and possibly sets up a kernel flow as a
- * cache. */
+/* A thread that reads upcalls from dpif, forwards each upcall's packet,
+ * and possibly sets up a kernel flow as a cache. */
struct handler {
struct udpif *udpif; /* Parent udpif. */
pthread_t thread; /* Thread ID. */
- char *name; /* Thread name. */
-
- struct ovs_mutex mutex; /* Mutex guarding the following. */
-
- /* Atomic queue of unprocessed upcalls. */
- struct list upcalls OVS_GUARDED;
- size_t n_upcalls OVS_GUARDED;
-
- bool need_signal; /* Only changed by the dispatcher. */
-
- pthread_cond_t wake_cond; /* Wakes 'thread' while holding
- 'mutex'. */
+ uint32_t handler_id; /* Handler id. */
};
-/* A thread that processes each kernel flow handed to it by the flow_dumper
- * thread, updates OpenFlow statistics, and updates or removes the kernel flow
- * as necessary. */
+/* A thread that processes datapath flows, updates OpenFlow statistics, and
+ * updates or removes the flows as necessary. */
struct revalidator {
struct udpif *udpif; /* Parent udpif. */
- char *name; /* Thread name. */
-
pthread_t thread; /* Thread ID. */
- struct hmap ukeys; /* Datapath flow keys. */
-
- uint64_t dump_seq;
-
- struct ovs_mutex mutex; /* Mutex guarding the following. */
- pthread_cond_t wake_cond;
- struct list udumps OVS_GUARDED; /* Unprocessed udumps. */
- size_t n_udumps OVS_GUARDED; /* Number of unprocessed udumps. */
+ unsigned int id; /* ovsthread_id_self(). */
+ struct hmap *ukeys; /* Points into udpif->ukeys for this
+ revalidator. Used for GC phase. */
};
/* An upcall handler for ofproto_dpif.
*
- * udpif has two logically separate pieces:
+ * udpif keeps records of two kinds of logically separate units:
+ *
+ * upcall handling
+ * ---------------
+ *
+ * - An array of 'struct handler's for upcall handling and flow
+ * installation.
*
- * - A "dispatcher" thread that reads upcalls from the kernel and dispatches
- * them to one of several "handler" threads (see struct handler).
+ * flow revalidation
+ * -----------------
*
- * - A "flow_dumper" thread that reads the kernel flow table and dispatches
- * flows to one of several "revalidator" threads (see struct
- * revalidator). */
+ *    - Revalidator threads which read the datapath flow table and
+ *      maintain its flows.
+ */
struct udpif {
struct list list_node; /* In all_udpifs list. */
uint32_t secret; /* Random seed for upcall hash. */
- pthread_t dispatcher; /* Dispatcher thread ID. */
- pthread_t flow_dumper; /* Flow dumper thread ID. */
-
struct handler *handlers; /* Upcall handlers. */
size_t n_handlers;
struct revalidator *revalidators; /* Flow revalidators. */
size_t n_revalidators;
- uint64_t last_reval_seq; /* 'reval_seq' at last revalidation. */
- struct seq *reval_seq; /* Incremented to force revalidation. */
-
- struct seq *dump_seq; /* Increments each dump iteration. */
-
struct latch exit_latch; /* Tells child threads to exit. */
+ /* Revalidation. */
+ struct seq *reval_seq; /* Incremented to force revalidation. */
+ bool need_revalidate; /* As indicated by 'reval_seq'. */
+    bool reval_exit;                   /* Set by leader on 'exit_latch'. */
+ pthread_barrier_t reval_barrier; /* Barrier used by revalidators. */
+ struct dpif_flow_dump dump; /* DPIF flow dump state. */
long long int dump_duration; /* Duration of the last flow dump. */
+ struct seq *dump_seq; /* Increments each dump iteration. */
+
+ /* There are 'n_revalidators' ukey hmaps. Each revalidator retains a
+ * reference to one of these for garbage collection.
+ *
+ * During the flow dump phase, revalidators insert into these with a random
+ * distribution. During the garbage collection phase, each revalidator
+ * takes care of garbage collecting one of these hmaps. */
+ struct {
+ struct ovs_mutex mutex; /* Guards the following. */
+ struct hmap hmap OVS_GUARDED; /* Datapath flow keys. */
+ } *ukeys;
/* Datapath flow statistics. */
unsigned int max_n_flows;
};
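The 'ukeys' member above shards the datapath flow keys into 'n_revalidators' hmaps, each guarded by its own mutex, so concurrent inserts from different revalidators rarely contend. A minimal standalone sketch of per-shard locking (illustrative names only, not the udpif code):

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define N_SHARDS 4

/* One shard: an independent count (standing in for an hmap) plus the
 * mutex that guards it. */
struct shard {
    pthread_mutex_t mutex;
    int n_keys;                 /* Stand-in for hmap_count(). */
};

static struct shard shards[N_SHARDS];

static void
shards_init(void)
{
    int i;

    for (i = 0; i < N_SHARDS; i++) {
        pthread_mutex_init(&shards[i].mutex, NULL);
        shards[i].n_keys = 0;
    }
}

/* Writers pick a shard by hash, so writers with different hashes
 * usually lock different mutexes and do not contend. */
static void
shard_add(uint32_t hash)
{
    struct shard *s = &shards[hash % N_SHARDS];

    pthread_mutex_lock(&s->mutex);
    s->n_keys++;
    pthread_mutex_unlock(&s->mutex);
}

/* A full accounting pass locks each shard in turn, like the
 * "udpif keys" accounting loop in the patch. */
static int
shards_total(void)
{
    int i, total = 0;

    for (i = 0; i < N_SHARDS; i++) {
        pthread_mutex_lock(&shards[i].mutex);
        total += shards[i].n_keys;
        pthread_mutex_unlock(&shards[i].mutex);
    }
    return total;
}
```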
struct upcall {
- struct list list_node; /* For queuing upcalls. */
struct flow_miss *flow_miss; /* This upcall's flow_miss. */
/* Raw upcall plus data for keeping track of the memory backing it. */
/* 'udpif_key's are responsible for tracking the little bit of state udpif
* needs to do flow expiration which can't be pulled directly from the
- * datapath. They are owned, created by, maintained, and destroyed by a single
- * revalidator making them easy to efficiently handle with multiple threads. */
+ * datapath. They may be created or maintained by any revalidator during
+ * the dump phase, but are owned by a single revalidator, and are destroyed
+ * by that revalidator during the garbage-collection phase.
+ *
+ * While some elements of a udpif_key are protected by a mutex, the ukey itself
+ * is not.  Therefore it is not safe to destroy a udpif_key except when all
+ * revalidators are in the garbage-collection phase, or none is running. */
struct udpif_key {
struct hmap_node hmap_node; /* In parent revalidator 'ukeys' map. */
- struct nlattr *key; /* Datapath flow key. */
+ /* These elements are read only once created, and therefore aren't
+ * protected by a mutex. */
+ const struct nlattr *key; /* Datapath flow key. */
size_t key_len; /* Length of 'key'. */
- struct dpif_flow_stats stats; /* Stats at most recent flow dump. */
- long long int created; /* Estimation of creation time. */
-
- bool mark; /* Used by mark and sweep GC algorithm. */
-
- struct odputil_keybuf key_buf; /* Memory for 'key'. */
-};
-
-/* 'udpif_flow_dump's hold the state associated with one iteration in a flow
- * dump operation. This is created by the flow_dumper thread and handed to the
- * appropriate revalidator thread to be processed. */
-struct udpif_flow_dump {
- struct list list_node;
-
- struct nlattr *key; /* Datapath flow key. */
- size_t key_len; /* Length of 'key'. */
- uint32_t key_hash; /* Hash of 'key'. */
-
- struct odputil_keybuf mask_buf;
- struct nlattr *mask; /* Datapath mask for 'key'. */
- size_t mask_len; /* Length of 'mask'. */
-
- struct dpif_flow_stats stats; /* Stats pulled from the datapath. */
-
- bool need_revalidate; /* Key needs revalidation? */
-
- struct odputil_keybuf key_buf;
+ struct ovs_mutex mutex; /* Guards the following. */
+    struct dpif_flow_stats stats OVS_GUARDED; /* Last known stats. */
+ long long int created OVS_GUARDED; /* Estimate of creation time. */
+ bool mark OVS_GUARDED; /* For mark and sweep garbage
+ collection. */
+ bool flow_exists OVS_GUARDED; /* Ensures flows are only deleted
+ once. */
+
+ struct xlate_cache *xcache OVS_GUARDED; /* Cache for xlate entries that
+ * are affected by this ukey.
+                                             * Used for stats and learning. */
+ struct odputil_keybuf key_buf; /* Memory for 'key'. */
};
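The 'mark' flag above drives a mark-and-sweep pass: the dump phase marks every ukey still present in the datapath, and the sweep phase deletes the unmarked ones. A toy standalone sketch of that idea over a plain array (hypothetical names, not the OVS implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A toy flow key: 'in_use' marks a live slot, 'mark' records whether
 * the current dump phase saw the key. */
struct toy_ukey {
    bool in_use;
    bool mark;
};

/* Dump phase: mark a key that is still present in the datapath. */
static void
toy_mark(struct toy_ukey *k)
{
    k->mark = true;
}

/* Sweep phase: delete every live key the dump did not mark, and clear
 * all marks for the next round.  Returns the number of keys deleted. */
static int
toy_sweep(struct toy_ukey keys[], size_t n)
{
    int swept = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (keys[i].in_use && !keys[i].mark) {
            keys[i].in_use = false;    /* Analogous to ukey_delete(). */
            swept++;
        }
        keys[i].mark = false;
    }
    return swept;
}
```

Clearing the marks during the sweep is what lets the next dump start from a clean slate; a live flow survives only if each dump re-marks it.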
/* Flow miss batching.
bool put;
};
-static void upcall_destroy(struct upcall *);
-
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
static struct list all_udpifs = LIST_INITIALIZER(&all_udpifs);
-static void recv_upcalls(struct udpif *);
-static void handle_upcalls(struct handler *handler, struct list *upcalls);
-static void *udpif_flow_dumper(void *);
-static void *udpif_dispatcher(void *);
+static size_t read_upcalls(struct handler *,
+ struct upcall upcalls[FLOW_MISS_MAX_BATCH],
+ struct flow_miss miss_buf[FLOW_MISS_MAX_BATCH],
+ struct hmap *);
+static void handle_upcalls(struct handler *, struct hmap *, struct upcall *,
+ size_t n_upcalls);
+static void udpif_stop_threads(struct udpif *);
+static void udpif_start_threads(struct udpif *, size_t n_handlers,
+ size_t n_revalidators);
static void *udpif_upcall_handler(void *);
static void *udpif_revalidator(void *);
static uint64_t udpif_get_n_flows(struct udpif *);
-static void revalidate_udumps(struct revalidator *, struct list *udumps);
+static void revalidate(struct revalidator *);
static void revalidator_sweep(struct revalidator *);
static void revalidator_purge(struct revalidator *);
static void upcall_unixctl_show(struct unixctl_conn *conn, int argc,
const char *argv[], void *aux);
static void upcall_unixctl_set_flow_limit(struct unixctl_conn *conn, int argc,
const char *argv[], void *aux);
+
+static struct udpif_key *ukey_create(const struct nlattr *key, size_t key_len,
+ long long int used);
static void ukey_delete(struct revalidator *, struct udpif_key *);
static atomic_bool enable_megaflows = ATOMIC_VAR_INIT(true);
void
udpif_destroy(struct udpif *udpif)
{
- udpif_set_threads(udpif, 0, 0);
- udpif_flush(udpif);
+ udpif_stop_threads(udpif);
list_remove(&udpif->list_node);
latch_destroy(&udpif->exit_latch);
free(udpif);
}
-/* Tells 'udpif' how many threads it should use to handle upcalls. Disables
- * all threads if 'n_handlers' and 'n_revalidators' is zero. 'udpif''s
- * datapath handle must have packet reception enabled before starting threads.
- */
-void
-udpif_set_threads(struct udpif *udpif, size_t n_handlers,
- size_t n_revalidators)
+/* Stops the handler and revalidator threads.  The caller must be within
+ * an ovsrcu quiescent state, except when destroying the udpif. */
+static void
+udpif_stop_threads(struct udpif *udpif)
{
- int error;
-
- ovsrcu_quiesce_start();
- /* Stop the old threads (if any). */
- if (udpif->handlers &&
- (udpif->n_handlers != n_handlers
- || udpif->n_revalidators != n_revalidators)) {
+ if (udpif && (udpif->n_handlers != 0 || udpif->n_revalidators != 0)) {
size_t i;
latch_set(&udpif->exit_latch);
for (i = 0; i < udpif->n_handlers; i++) {
struct handler *handler = &udpif->handlers[i];
- ovs_mutex_lock(&handler->mutex);
- xpthread_cond_signal(&handler->wake_cond);
- ovs_mutex_unlock(&handler->mutex);
xpthread_join(handler->thread, NULL);
}
for (i = 0; i < udpif->n_revalidators; i++) {
- struct revalidator *revalidator = &udpif->revalidators[i];
-
- ovs_mutex_lock(&revalidator->mutex);
- xpthread_cond_signal(&revalidator->wake_cond);
- ovs_mutex_unlock(&revalidator->mutex);
- xpthread_join(revalidator->thread, NULL);
+ xpthread_join(udpif->revalidators[i].thread, NULL);
}
- xpthread_join(udpif->flow_dumper, NULL);
- xpthread_join(udpif->dispatcher, NULL);
-
for (i = 0; i < udpif->n_revalidators; i++) {
struct revalidator *revalidator = &udpif->revalidators[i];
- struct udpif_flow_dump *udump, *next_udump;
-
- LIST_FOR_EACH_SAFE (udump, next_udump, list_node,
- &revalidator->udumps) {
- list_remove(&udump->list_node);
- free(udump);
- }
/* Delete ukeys, and delete all flows from the datapath to prevent
* double-counting stats. */
revalidator_purge(revalidator);
- hmap_destroy(&revalidator->ukeys);
- ovs_mutex_destroy(&revalidator->mutex);
- free(revalidator->name);
+ hmap_destroy(&udpif->ukeys[i].hmap);
+ ovs_mutex_destroy(&udpif->ukeys[i].mutex);
}
- for (i = 0; i < udpif->n_handlers; i++) {
- struct handler *handler = &udpif->handlers[i];
- struct upcall *miss, *next;
-
- LIST_FOR_EACH_SAFE (miss, next, list_node, &handler->upcalls) {
- list_remove(&miss->list_node);
- upcall_destroy(miss);
- }
- ovs_mutex_destroy(&handler->mutex);
-
- xpthread_cond_destroy(&handler->wake_cond);
- free(handler->name);
- }
latch_poll(&udpif->exit_latch);
+ xpthread_barrier_destroy(&udpif->reval_barrier);
+
free(udpif->revalidators);
udpif->revalidators = NULL;
udpif->n_revalidators = 0;
free(udpif->handlers);
udpif->handlers = NULL;
udpif->n_handlers = 0;
- }
- error = dpif_handlers_set(udpif->dpif, 1);
- if (error) {
- VLOG_ERR("failed to configure handlers in dpif %s: %s",
- dpif_name(udpif->dpif), ovs_strerror(error));
- return;
+ free(udpif->ukeys);
+ udpif->ukeys = NULL;
}
+}
- /* Start new threads (if necessary). */
- if (!udpif->handlers && n_handlers) {
+/* Starts the handler and revalidator threads.  The caller must be within
+ * an ovsrcu quiescent state. */
+static void
+udpif_start_threads(struct udpif *udpif, size_t n_handlers,
+ size_t n_revalidators)
+{
+ if (udpif && n_handlers && n_revalidators) {
size_t i;
udpif->n_handlers = n_handlers;
struct handler *handler = &udpif->handlers[i];
handler->udpif = udpif;
- list_init(&handler->upcalls);
- handler->need_signal = false;
- xpthread_cond_init(&handler->wake_cond, NULL);
- ovs_mutex_init(&handler->mutex);
- xpthread_create(&handler->thread, NULL, udpif_upcall_handler,
- handler);
+ handler->handler_id = i;
+ handler->thread = ovs_thread_create(
+ "handler", udpif_upcall_handler, handler);
}
+ xpthread_barrier_init(&udpif->reval_barrier, NULL,
+ udpif->n_revalidators);
+ udpif->reval_exit = false;
udpif->revalidators = xzalloc(udpif->n_revalidators
* sizeof *udpif->revalidators);
+ udpif->ukeys = xmalloc(sizeof *udpif->ukeys * n_revalidators);
for (i = 0; i < udpif->n_revalidators; i++) {
struct revalidator *revalidator = &udpif->revalidators[i];
revalidator->udpif = udpif;
- list_init(&revalidator->udumps);
- hmap_init(&revalidator->ukeys);
- ovs_mutex_init(&revalidator->mutex);
- xpthread_cond_init(&revalidator->wake_cond, NULL);
- xpthread_create(&revalidator->thread, NULL, udpif_revalidator,
- revalidator);
+ hmap_init(&udpif->ukeys[i].hmap);
+ ovs_mutex_init(&udpif->ukeys[i].mutex);
+ revalidator->ukeys = &udpif->ukeys[i].hmap;
+ revalidator->thread = ovs_thread_create(
+ "revalidator", udpif_revalidator, revalidator);
}
- xpthread_create(&udpif->dispatcher, NULL, udpif_dispatcher, udpif);
- xpthread_create(&udpif->flow_dumper, NULL, udpif_flow_dumper, udpif);
}
+}
+/* Tells 'udpif' how many threads it should use to handle upcalls.
+ * 'n_handlers' and 'n_revalidators' must both be nonzero.  'udpif''s
+ * datapath handle must have packet reception enabled before starting
+ * threads. */
+void
+udpif_set_threads(struct udpif *udpif, size_t n_handlers,
+ size_t n_revalidators)
+{
+ int error;
+
+ ovs_assert(udpif);
+ ovs_assert(n_handlers && n_revalidators);
+
+ ovsrcu_quiesce_start();
+ if (udpif->n_handlers != n_handlers
+ || udpif->n_revalidators != n_revalidators) {
+ udpif_stop_threads(udpif);
+ }
+
+ error = dpif_handlers_set(udpif->dpif, n_handlers);
+ if (error) {
+ VLOG_ERR("failed to configure handlers in dpif %s: %s",
+ dpif_name(udpif->dpif), ovs_strerror(error));
+ return;
+ }
+
+ if (!udpif->handlers && !udpif->revalidators) {
+ udpif_start_threads(udpif, n_handlers, n_revalidators);
+ }
ovsrcu_quiesce_end();
}
* its main loop once. */
size_t n_handlers = udpif->n_handlers;
size_t n_revalidators = udpif->n_revalidators;
- udpif_set_threads(udpif, 0, 0);
- udpif_set_threads(udpif, n_handlers, n_revalidators);
+
+ ovsrcu_quiesce_start();
+ udpif_stop_threads(udpif);
+ udpif_start_threads(udpif, n_handlers, n_revalidators);
+ ovsrcu_quiesce_end();
}
/* Notifies 'udpif' that something changed which may render previous
{
size_t i;
- simap_increase(usage, "dispatchers", 1);
- simap_increase(usage, "flow_dumpers", 1);
-
simap_increase(usage, "handlers", udpif->n_handlers);
- for (i = 0; i < udpif->n_handlers; i++) {
- struct handler *handler = &udpif->handlers[i];
- ovs_mutex_lock(&handler->mutex);
- simap_increase(usage, "handler upcalls", handler->n_upcalls);
- ovs_mutex_unlock(&handler->mutex);
- }
simap_increase(usage, "revalidators", udpif->n_revalidators);
for (i = 0; i < udpif->n_revalidators; i++) {
- struct revalidator *revalidator = &udpif->revalidators[i];
- ovs_mutex_lock(&revalidator->mutex);
- simap_increase(usage, "revalidator dumps", revalidator->n_udumps);
-
- /* XXX: This isn't technically thread safe because the revalidator
- * ukeys maps isn't protected by a mutex since it's per thread. */
- simap_increase(usage, "revalidator keys",
- hmap_count(&revalidator->ukeys));
- ovs_mutex_unlock(&revalidator->mutex);
+ ovs_mutex_lock(&udpif->ukeys[i].mutex);
+ simap_increase(usage, "udpif keys", hmap_count(&udpif->ukeys[i].hmap));
+ ovs_mutex_unlock(&udpif->ukeys[i].mutex);
}
}
n_handlers = udpif->n_handlers;
n_revalidators = udpif->n_revalidators;
- udpif_set_threads(udpif, 0, 0);
+ ovsrcu_quiesce_start();
+
+ udpif_stop_threads(udpif);
dpif_flow_flush(udpif->dpif);
- udpif_set_threads(udpif, n_handlers, n_revalidators);
+ udpif_start_threads(udpif, n_handlers, n_revalidators);
+
+ ovsrcu_quiesce_end();
}
/* Removes all flows from all datapaths. */
}
\f
-/* Destroys and deallocates 'upcall'. */
-static void
-upcall_destroy(struct upcall *upcall)
-{
- if (upcall) {
- ofpbuf_uninit(&upcall->dpif_upcall.packet);
- ofpbuf_uninit(&upcall->upcall_buf);
- free(upcall);
- }
-}
-
static uint64_t
udpif_get_n_flows(struct udpif *udpif)
{
return flow_count;
}
-/* The dispatcher thread is responsible for receiving upcalls from the kernel,
- * assigning them to a upcall_handler thread. */
-static void *
-udpif_dispatcher(void *arg)
-{
- struct udpif *udpif = arg;
-
- set_subprogram_name("dispatcher");
- while (!latch_is_set(&udpif->exit_latch)) {
- recv_upcalls(udpif);
- dpif_recv_wait(udpif->dpif, 0);
- latch_wait(&udpif->exit_latch);
- poll_block();
- }
-
- return NULL;
-}
-
-static void *
-udpif_flow_dumper(void *arg)
-{
- struct udpif *udpif = arg;
-
- set_subprogram_name("flow_dumper");
- while (!latch_is_set(&udpif->exit_latch)) {
- const struct dpif_flow_stats *stats;
- long long int start_time, duration;
- const struct nlattr *key, *mask;
- struct dpif_flow_dump dump;
- size_t key_len, mask_len;
- unsigned int flow_limit;
- bool need_revalidate;
- uint64_t reval_seq;
- size_t n_flows, i;
- int error;
- void *state = NULL;
-
- reval_seq = seq_read(udpif->reval_seq);
- need_revalidate = udpif->last_reval_seq != reval_seq;
- udpif->last_reval_seq = reval_seq;
-
- n_flows = udpif_get_n_flows(udpif);
- udpif->max_n_flows = MAX(n_flows, udpif->max_n_flows);
- udpif->avg_n_flows = (udpif->avg_n_flows + n_flows) / 2;
-
- start_time = time_msec();
- error = dpif_flow_dump_start(&dump, udpif->dpif);
- if (error) {
- VLOG_INFO("Failed to start flow dump (%s)", ovs_strerror(error));
- goto skip;
- }
- dpif_flow_dump_state_init(udpif->dpif, &state);
- while (dpif_flow_dump_next(&dump, state, &key, &key_len,
- &mask, &mask_len, NULL, NULL, &stats)
- && !latch_is_set(&udpif->exit_latch)) {
- struct udpif_flow_dump *udump = xmalloc(sizeof *udump);
- struct revalidator *revalidator;
-
- udump->key_hash = hash_bytes(key, key_len, udpif->secret);
- memcpy(&udump->key_buf, key, key_len);
- udump->key = (struct nlattr *) &udump->key_buf;
- udump->key_len = key_len;
-
- memcpy(&udump->mask_buf, mask, mask_len);
- udump->mask = (struct nlattr *) &udump->mask_buf;
- udump->mask_len = mask_len;
-
- udump->stats = *stats;
- udump->need_revalidate = need_revalidate;
-
- revalidator = &udpif->revalidators[udump->key_hash
- % udpif->n_revalidators];
-
- ovs_mutex_lock(&revalidator->mutex);
- while (revalidator->n_udumps >= REVALIDATE_MAX_BATCH * 3
- && !latch_is_set(&udpif->exit_latch)) {
- ovs_mutex_cond_wait(&revalidator->wake_cond,
- &revalidator->mutex);
- }
- list_push_back(&revalidator->udumps, &udump->list_node);
- revalidator->n_udumps++;
- xpthread_cond_signal(&revalidator->wake_cond);
- ovs_mutex_unlock(&revalidator->mutex);
- }
- dpif_flow_dump_state_uninit(udpif->dpif, state);
- dpif_flow_dump_done(&dump);
-
- /* Let all the revalidators finish and garbage collect. */
- seq_change(udpif->dump_seq);
- for (i = 0; i < udpif->n_revalidators; i++) {
- struct revalidator *revalidator = &udpif->revalidators[i];
- ovs_mutex_lock(&revalidator->mutex);
- xpthread_cond_signal(&revalidator->wake_cond);
- ovs_mutex_unlock(&revalidator->mutex);
- }
-
- for (i = 0; i < udpif->n_revalidators; i++) {
- struct revalidator *revalidator = &udpif->revalidators[i];
-
- ovs_mutex_lock(&revalidator->mutex);
- while (revalidator->dump_seq != seq_read(udpif->dump_seq)
- && !latch_is_set(&udpif->exit_latch)) {
- ovs_mutex_cond_wait(&revalidator->wake_cond,
- &revalidator->mutex);
- }
- ovs_mutex_unlock(&revalidator->mutex);
- }
-
- duration = MAX(time_msec() - start_time, 1);
- udpif->dump_duration = duration;
- atomic_read(&udpif->flow_limit, &flow_limit);
- if (duration > 2000) {
- flow_limit /= duration / 1000;
- } else if (duration > 1300) {
- flow_limit = flow_limit * 3 / 4;
- } else if (duration < 1000 && n_flows > 2000
- && flow_limit < n_flows * 1000 / duration) {
- flow_limit += 1000;
- }
- flow_limit = MIN(ofproto_flow_limit, MAX(flow_limit, 1000));
- atomic_store(&udpif->flow_limit, flow_limit);
-
- if (duration > 2000) {
- VLOG_INFO("Spent an unreasonably long %lldms dumping flows",
- duration);
- }
-
-skip:
- poll_timer_wait_until(start_time + MIN(ofproto_max_idle, 500));
- seq_wait(udpif->reval_seq, udpif->last_reval_seq);
- latch_wait(&udpif->exit_latch);
- poll_block();
- }
-
- return NULL;
-}
-
-/* The miss handler thread is responsible for processing miss upcalls retrieved
- * by the dispatcher thread. Once finished it passes the processed miss
- * upcalls to ofproto-dpif where they're installed in the datapath. */
+/* The upcall handler thread tries to read a batch of up to
+ * FLOW_MISS_MAX_BATCH upcalls from dpif, processes the batch, and installs
+ * the corresponding flows in dpif. */
static void *
udpif_upcall_handler(void *arg)
{
struct handler *handler = arg;
-
- handler->name = xasprintf("handler_%u", ovsthread_id_self());
- set_subprogram_name("%s", handler->name);
+ struct udpif *udpif = handler->udpif;
+ struct hmap misses = HMAP_INITIALIZER(&misses);
while (!latch_is_set(&handler->udpif->exit_latch)) {
- struct list misses = LIST_INITIALIZER(&misses);
- size_t i;
-
- ovs_mutex_lock(&handler->mutex);
- /* Must check the 'exit_latch' again to make sure the main thread is
- * not joining on the handler thread. */
- if (!handler->n_upcalls
- && !latch_is_set(&handler->udpif->exit_latch)) {
- ovs_mutex_cond_wait(&handler->wake_cond, &handler->mutex);
- }
+ struct upcall upcalls[FLOW_MISS_MAX_BATCH];
+ struct flow_miss miss_buf[FLOW_MISS_MAX_BATCH];
+ struct flow_miss *miss;
+ size_t n_upcalls, i;
+
+ n_upcalls = read_upcalls(handler, upcalls, miss_buf, &misses);
+ if (!n_upcalls) {
+ dpif_recv_wait(udpif->dpif, handler->handler_id);
+ latch_wait(&udpif->exit_latch);
+ poll_block();
+ } else {
+ handle_upcalls(handler, &misses, upcalls, n_upcalls);
- for (i = 0; i < FLOW_MISS_MAX_BATCH; i++) {
- if (handler->n_upcalls) {
- handler->n_upcalls--;
- list_push_back(&misses, list_pop_front(&handler->upcalls));
- } else {
- break;
+ HMAP_FOR_EACH (miss, hmap_node, &misses) {
+ xlate_out_uninit(&miss->xout);
+ }
+ hmap_clear(&misses);
+ for (i = 0; i < n_upcalls; i++) {
+ ofpbuf_uninit(&upcalls[i].dpif_upcall.packet);
+ ofpbuf_uninit(&upcalls[i].upcall_buf);
}
}
- ovs_mutex_unlock(&handler->mutex);
-
- handle_upcalls(handler, &misses);
-
coverage_clear();
}
+ hmap_destroy(&misses);
return NULL;
}
static void *
udpif_revalidator(void *arg)
{
+ /* Used by all revalidators. */
struct revalidator *revalidator = arg;
+ struct udpif *udpif = revalidator->udpif;
+ bool leader = revalidator == &udpif->revalidators[0];
- revalidator->name = xasprintf("revalidator_%u", ovsthread_id_self());
- set_subprogram_name("%s", revalidator->name);
+ /* Used only by the leader. */
+ long long int start_time = 0;
+ uint64_t last_reval_seq = 0;
+ unsigned int flow_limit = 0;
+ size_t n_flows = 0;
+
+ revalidator->id = ovsthread_id_self();
for (;;) {
- struct list udumps = LIST_INITIALIZER(&udumps);
- struct udpif *udpif = revalidator->udpif;
- size_t i;
+ if (leader) {
+ uint64_t reval_seq;
- ovs_mutex_lock(&revalidator->mutex);
- if (latch_is_set(&udpif->exit_latch)) {
- ovs_mutex_unlock(&revalidator->mutex);
- return NULL;
- }
+ reval_seq = seq_read(udpif->reval_seq);
+ udpif->need_revalidate = last_reval_seq != reval_seq;
+ last_reval_seq = reval_seq;
- if (!revalidator->n_udumps) {
- if (revalidator->dump_seq != seq_read(udpif->dump_seq)) {
- revalidator->dump_seq = seq_read(udpif->dump_seq);
- revalidator_sweep(revalidator);
- } else {
- ovs_mutex_cond_wait(&revalidator->wake_cond,
- &revalidator->mutex);
+ n_flows = udpif_get_n_flows(udpif);
+ udpif->max_n_flows = MAX(n_flows, udpif->max_n_flows);
+ udpif->avg_n_flows = (udpif->avg_n_flows + n_flows) / 2;
+
+            /* Only the leader checks the exit latch, to prevent a race in
+             * which some threads see it as true and exit while others see
+             * it as false and block indefinitely on the reval_barrier. */
+ udpif->reval_exit = latch_is_set(&udpif->exit_latch);
+
+ start_time = time_msec();
+ if (!udpif->reval_exit) {
+ dpif_flow_dump_start(&udpif->dump, udpif->dpif);
}
}
- for (i = 0; i < REVALIDATE_MAX_BATCH && revalidator->n_udumps; i++) {
- list_push_back(&udumps, list_pop_front(&revalidator->udumps));
- revalidator->n_udumps--;
+ /* Wait for the leader to start the flow dump. */
+ xpthread_barrier_wait(&udpif->reval_barrier);
+ if (udpif->reval_exit) {
+ break;
}
+ revalidate(revalidator);
+
+ /* Wait for all flows to have been dumped before we garbage collect. */
+ xpthread_barrier_wait(&udpif->reval_barrier);
+ revalidator_sweep(revalidator);
+
+ /* Wait for all revalidators to finish garbage collection. */
+ xpthread_barrier_wait(&udpif->reval_barrier);
+
+ if (leader) {
+ long long int duration;
+
+ dpif_flow_dump_done(&udpif->dump);
+ seq_change(udpif->dump_seq);
+
+ duration = MAX(time_msec() - start_time, 1);
+ atomic_read(&udpif->flow_limit, &flow_limit);
+ udpif->dump_duration = duration;
+ if (duration > 2000) {
+ flow_limit /= duration / 1000;
+ } else if (duration > 1300) {
+ flow_limit = flow_limit * 3 / 4;
+ } else if (duration < 1000 && n_flows > 2000
+ && flow_limit < n_flows * 1000 / duration) {
+ flow_limit += 1000;
+ }
+ flow_limit = MIN(ofproto_flow_limit, MAX(flow_limit, 1000));
+ atomic_store(&udpif->flow_limit, flow_limit);
- /* Wake up the flow dumper. */
- xpthread_cond_signal(&revalidator->wake_cond);
- ovs_mutex_unlock(&revalidator->mutex);
+ if (duration > 2000) {
+ VLOG_INFO("Spent an unreasonably long %lldms dumping flows",
+ duration);
+ }
- if (!list_is_empty(&udumps)) {
- revalidate_udumps(revalidator, &udumps);
+ poll_timer_wait_until(start_time + MIN(ofproto_max_idle, 500));
+ seq_wait(udpif->reval_seq, last_reval_seq);
+ latch_wait(&udpif->exit_latch);
+ poll_block();
}
}
}
}
-static void
-recv_upcalls(struct udpif *udpif)
-{
- int n;
-
- for (;;) {
- uint32_t hash = udpif->secret;
- struct handler *handler;
- struct upcall *upcall;
- size_t n_bytes, left;
- struct nlattr *nla;
- int error;
-
- upcall = xmalloc(sizeof *upcall);
- ofpbuf_use_stub(&upcall->upcall_buf, upcall->upcall_stub,
- sizeof upcall->upcall_stub);
- error = dpif_recv(udpif->dpif, 0, &upcall->dpif_upcall,
- &upcall->upcall_buf);
- if (error) {
- /* upcall_destroy() can only be called on successfully received
- * upcalls. */
- ofpbuf_uninit(&upcall->upcall_buf);
- free(upcall);
- break;
- }
-
- n_bytes = 0;
- NL_ATTR_FOR_EACH (nla, left, upcall->dpif_upcall.key,
- upcall->dpif_upcall.key_len) {
- enum ovs_key_attr type = nl_attr_type(nla);
- if (type == OVS_KEY_ATTR_IN_PORT
- || type == OVS_KEY_ATTR_TCP
- || type == OVS_KEY_ATTR_UDP) {
- if (nl_attr_get_size(nla) == 4) {
- hash = mhash_add(hash, nl_attr_get_u32(nla));
- n_bytes += 4;
- } else {
- VLOG_WARN_RL(&rl,
- "Netlink attribute with incorrect size.");
- }
- }
- }
- hash = mhash_finish(hash, n_bytes);
-
- handler = &udpif->handlers[hash % udpif->n_handlers];
-
- ovs_mutex_lock(&handler->mutex);
- if (handler->n_upcalls < MAX_QUEUE_LENGTH) {
- list_push_back(&handler->upcalls, &upcall->list_node);
- if (handler->n_upcalls == 0) {
- handler->need_signal = true;
- }
- handler->n_upcalls++;
- if (handler->need_signal &&
- handler->n_upcalls >= FLOW_MISS_MAX_BATCH) {
- handler->need_signal = false;
- xpthread_cond_signal(&handler->wake_cond);
- }
- ovs_mutex_unlock(&handler->mutex);
- if (!VLOG_DROP_DBG(&rl)) {
- struct ds ds = DS_EMPTY_INITIALIZER;
-
- odp_flow_key_format(upcall->dpif_upcall.key,
- upcall->dpif_upcall.key_len,
- &ds);
- VLOG_DBG("dispatcher: enqueue (%s)", ds_cstr(&ds));
- ds_destroy(&ds);
- }
- } else {
- ovs_mutex_unlock(&handler->mutex);
- COVERAGE_INC(upcall_queue_overflow);
- upcall_destroy(upcall);
- }
- }
-
- for (n = 0; n < udpif->n_handlers; ++n) {
- struct handler *handler = &udpif->handlers[n];
-
- if (handler->need_signal) {
- handler->need_signal = false;
- ovs_mutex_lock(&handler->mutex);
- xpthread_cond_signal(&handler->wake_cond);
- ovs_mutex_unlock(&handler->mutex);
- }
- }
-}
-
/* Calculates slow path actions for 'xout'. 'buf' must statically be
* initialized with at least 128 bytes of space. */
static void
compose_slow_path(struct udpif *udpif, struct xlate_out *xout,
- odp_port_t odp_in_port, struct ofpbuf *buf)
+ struct flow *flow, odp_port_t odp_in_port,
+ struct ofpbuf *buf)
{
union user_action_cookie cookie;
odp_port_t port;
port = xout->slow & (SLOW_CFM | SLOW_BFD | SLOW_LACP | SLOW_STP)
? ODPP_NONE
: odp_in_port;
- pid = dpif_port_get_pid(udpif->dpif, port, 0);
+ pid = dpif_port_get_pid(udpif->dpif, port, flow_hash_5tuple(flow, 0));
odp_put_userspace_action(pid, &cookie, sizeof cookie.slow_path, buf);
}
return NULL;
}
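The `compose_slow_path()` change above derives the Netlink PID from `flow_hash_5tuple()`, so slow-path packets of different flows spread across a port's upcall channels while packets of one flow stay ordered on one channel. A toy version of hash-based channel selection, assuming nothing about OVS internals; `hash_5tuple` here is an invented FNV-1a stand-in, not OVS's `flow_hash_5tuple()`:

```c
#include <stddef.h>
#include <stdint.h>

struct five_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t proto;
};

/* FNV-1a over the 5-tuple fields, hashed word by word to avoid reading
 * struct padding.  A stand-in for a real flow hash function. */
uint32_t
hash_5tuple(const struct five_tuple *t)
{
    uint32_t words[4] = { t->src_ip, t->dst_ip,
                          ((uint32_t) t->src_port << 16) | t->dst_port,
                          t->proto };
    uint32_t hash = 2166136261u;

    for (size_t i = 0; i < 4; i++) {
        for (size_t b = 0; b < 4; b++) {
            hash = (hash ^ ((words[i] >> (8 * b)) & 0xff)) * 16777619u;
        }
    }
    return hash;
}

/* Maps a flow to one of 'n' upcall channels: same flow, same channel. */
unsigned int
pick_channel(const struct five_tuple *t, unsigned int n)
{
    return hash_5tuple(t) % n;
}
```

The key property is determinism: a given 5-tuple always lands on the same channel, which preserves per-flow packet ordering while balancing unrelated flows.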
-static void
-handle_upcalls(struct handler *handler, struct list *upcalls)
+/* Reads and classifies upcalls. Returns the number of upcalls successfully
+ * read. */
+static size_t
+read_upcalls(struct handler *handler,
+ struct upcall upcalls[FLOW_MISS_MAX_BATCH],
+ struct flow_miss miss_buf[FLOW_MISS_MAX_BATCH],
+ struct hmap *misses)
{
- struct hmap misses = HMAP_INITIALIZER(&misses);
struct udpif *udpif = handler->udpif;
+ size_t i;
+ size_t n_misses = 0;
+ size_t n_upcalls = 0;
- struct flow_miss miss_buf[FLOW_MISS_MAX_BATCH];
- struct dpif_op *opsp[FLOW_MISS_MAX_BATCH * 2];
- struct dpif_op ops[FLOW_MISS_MAX_BATCH * 2];
- struct flow_miss *miss, *next_miss;
- struct upcall *upcall, *next;
- size_t n_misses, n_ops, i;
- unsigned int flow_limit;
- bool fail_open, may_put;
- enum upcall_type type;
-
- atomic_read(&udpif->flow_limit, &flow_limit);
- may_put = udpif_get_n_flows(udpif) < flow_limit;
-
- /* Extract the flow from each upcall. Construct in 'misses' a hash table
+ /*
+ * Try reading FLOW_MISS_MAX_BATCH upcalls from dpif.
+ *
+ * Extract the flow from each upcall. Construct in 'misses' a hash table
* that maps each unique flow to a 'struct flow_miss'.
*
* Most commonly there is a single packet per flow_miss, but there are
* other end of the connection, which gives OVS a chance to set up a
* datapath flow.)
*/
- n_misses = 0;
- LIST_FOR_EACH_SAFE (upcall, next, list_node, upcalls) {
- struct dpif_upcall *dupcall = &upcall->dpif_upcall;
+ for (i = 0; i < FLOW_MISS_MAX_BATCH; i++) {
+ struct upcall *upcall = &upcalls[n_upcalls];
struct flow_miss *miss = &miss_buf[n_misses];
- struct ofpbuf *packet = &dupcall->packet;
+ struct dpif_upcall *dupcall;
+ struct ofpbuf *packet;
struct flow_miss *existing_miss;
struct ofproto_dpif *ofproto;
struct dpif_sflow *sflow;
struct dpif_ipfix *ipfix;
- odp_port_t odp_in_port;
struct flow flow;
+ enum upcall_type type;
+ odp_port_t odp_in_port;
int error;
+ ofpbuf_use_stub(&upcall->upcall_buf, upcall->upcall_stub,
+ sizeof upcall->upcall_stub);
+ error = dpif_recv(udpif->dpif, handler->handler_id,
+ &upcall->dpif_upcall, &upcall->upcall_buf);
+ if (error) {
+ ofpbuf_uninit(&upcall->upcall_buf);
+ break;
+ }
+
+ dupcall = &upcall->dpif_upcall;
+ packet = &dupcall->packet;
error = xlate_receive(udpif->backer, packet, dupcall->key,
dupcall->key_len, &flow,
&ofproto, &ipfix, &sflow, NULL, &odp_in_port);
dupcall->key, dupcall->key_len, NULL, 0, NULL, 0,
NULL);
}
- list_remove(&upcall->list_node);
- upcall_destroy(upcall);
- continue;
+ goto destroy_upcall;
}
type = classify_upcall(upcall);
flow_extract(packet, &md, &miss->flow);
hash = flow_hash(&miss->flow, 0);
- existing_miss = flow_miss_find(&misses, ofproto, &miss->flow,
+ existing_miss = flow_miss_find(misses, ofproto, &miss->flow,
hash);
if (!existing_miss) {
- hmap_insert(&misses, &miss->hmap_node, hash);
+ hmap_insert(misses, &miss->hmap_node, hash);
miss->ofproto = ofproto;
miss->key = dupcall->key;
miss->key_len = dupcall->key_len;
miss->stats.tcp_flags = 0;
miss->odp_in_port = odp_in_port;
miss->put = false;
-
n_misses++;
} else {
miss = existing_miss;
miss->stats.n_packets++;
upcall->flow_miss = miss;
+ n_upcalls++;
continue;
}
dpif_ipfix_unref(ipfix);
dpif_sflow_unref(sflow);
- list_remove(&upcall->list_node);
- upcall_destroy(upcall);
+destroy_upcall:
+ ofpbuf_uninit(&upcall->dpif_upcall.packet);
+ ofpbuf_uninit(&upcall->upcall_buf);
}
+ return n_upcalls;
+}
+
+static void
+handle_upcalls(struct handler *handler, struct hmap *misses,
+ struct upcall *upcalls, size_t n_upcalls)
+{
+ struct udpif *udpif = handler->udpif;
+ struct dpif_op *opsp[FLOW_MISS_MAX_BATCH * 2];
+ struct dpif_op ops[FLOW_MISS_MAX_BATCH * 2];
+ struct flow_miss *miss;
+ size_t n_ops, i;
+ unsigned int flow_limit;
+ bool fail_open, may_put;
+
+ atomic_read(&udpif->flow_limit, &flow_limit);
+ may_put = udpif_get_n_flows(udpif) < flow_limit;
+
/* Initialize each 'struct flow_miss's ->xout.
*
* We do this per-flow_miss rather than per-packet because, most commonly,
* We can't do this in the previous loop because we need the TCP flags for
* all the packets in each miss. */
fail_open = false;
- HMAP_FOR_EACH (miss, hmap_node, &misses) {
+ HMAP_FOR_EACH (miss, hmap_node, misses) {
struct xlate_in xin;
xlate_in_init(&xin, miss->ofproto, &miss->flow, NULL,
* The loop fills 'ops' with an array of operations to execute in the
* datapath. */
n_ops = 0;
- LIST_FOR_EACH (upcall, list_node, upcalls) {
+ for (i = 0; i < n_upcalls; i++) {
+ struct upcall *upcall = &upcalls[i];
struct flow_miss *miss = upcall->flow_miss;
struct ofpbuf *packet = &upcall->dpif_upcall.packet;
struct dpif_op *op;
ofpbuf_use_stack(&buf, miss->slow_path_buf,
sizeof miss->slow_path_buf);
- compose_slow_path(udpif, &miss->xout, miss->odp_in_port, &buf);
+ compose_slow_path(udpif, &miss->xout, &miss->flow,
+ miss->odp_in_port, &buf);
op->u.flow_put.actions = ofpbuf_data(&buf);
op->u.flow_put.actions_len = ofpbuf_size(&buf);
}
*
* Copy packets before they are modified by execution. */
if (fail_open) {
- LIST_FOR_EACH (upcall, list_node, upcalls) {
+ for (i = 0; i < n_upcalls; i++) {
+ struct upcall *upcall = &upcalls[i];
struct flow_miss *miss = upcall->flow_miss;
struct ofpbuf *packet = &upcall->dpif_upcall.packet;
struct ofproto_packet_in *pin;
opsp[i] = &ops[i];
}
dpif_operate(udpif->dpif, opsp, n_ops);
-
- HMAP_FOR_EACH_SAFE (miss, next_miss, hmap_node, &misses) {
- hmap_remove(&misses, &miss->hmap_node);
- xlate_out_uninit(&miss->xout);
- }
- hmap_destroy(&misses);
-
- LIST_FOR_EACH_SAFE (upcall, next, list_node, upcalls) {
- list_remove(&upcall->list_node);
- upcall_destroy(upcall);
- }
}
+/* Must be called with udpif->ukeys[hash % udpif->n_revalidators].mutex
+ * held. */
static struct udpif_key *
-ukey_lookup(struct revalidator *revalidator, struct udpif_flow_dump *udump)
+ukey_lookup__(struct udpif *udpif, const struct nlattr *key, size_t key_len,
+ uint32_t hash)
{
struct udpif_key *ukey;
+ struct hmap *hmap = &udpif->ukeys[hash % udpif->n_revalidators].hmap;
- HMAP_FOR_EACH_WITH_HASH (ukey, hmap_node, udump->key_hash,
- &revalidator->ukeys) {
- if (ukey->key_len == udump->key_len
- && !memcmp(ukey->key, udump->key, udump->key_len)) {
+ HMAP_FOR_EACH_WITH_HASH (ukey, hmap_node, hash, hmap) {
+ if (ukey->key_len == key_len && !memcmp(ukey->key, key, key_len)) {
return ukey;
}
}
return NULL;
}
+static struct udpif_key *
+ukey_lookup(struct udpif *udpif, const struct nlattr *key, size_t key_len,
+ uint32_t hash)
+{
+ struct udpif_key *ukey;
+ uint32_t idx = hash % udpif->n_revalidators;
+
+ ovs_mutex_lock(&udpif->ukeys[idx].mutex);
+ ukey = ukey_lookup__(udpif, key, key_len, hash);
+ ovs_mutex_unlock(&udpif->ukeys[idx].mutex);
+
+ return ukey;
+}
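`ukey_lookup()` and `udpif_insert_ukey()` implement a lock-striped table: the key's hash selects both the shard (`hash % n_revalidators`) and the mutex that guards it, so revalidators rarely contend. A minimal standalone sketch of the same sharding idea, with OVS's hmap replaced by a flat array and all names (`shard_insert`, `SHARD_CAP`, ...) invented for the sketch:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define N_SHARDS 4
#define SHARD_CAP 64

struct shard {
    pthread_mutex_t mutex;
    uint32_t keys[SHARD_CAP];
    size_t n;
};

static struct shard shards[N_SHARDS];

void
shards_init(void)
{
    for (size_t i = 0; i < N_SHARDS; i++) {
        pthread_mutex_init(&shards[i].mutex, NULL);
        shards[i].n = 0;
    }
}

/* Must be called with the shard's mutex held (compare ukey_lookup__()). */
static bool
shard_contains__(const struct shard *s, uint32_t key)
{
    for (size_t i = 0; i < s->n; i++) {
        if (s->keys[i] == key) {
            return true;
        }
    }
    return false;
}

/* Inserts 'key' unless a duplicate already exists; returns true on
 * insertion (the same contract as udpif_insert_ukey() above). */
bool
shard_insert(uint32_t hash, uint32_t key)
{
    struct shard *s = &shards[hash % N_SHARDS];
    bool ok;

    pthread_mutex_lock(&s->mutex);
    ok = !shard_contains__(s, key) && s->n < SHARD_CAP;
    if (ok) {
        s->keys[s->n++] = key;
    }
    pthread_mutex_unlock(&s->mutex);
    return ok;
}
```

Because the duplicate check and the insert happen under one shard lock, two revalidators racing to create the same ukey cannot both succeed, which is exactly what the patch relies on.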
+
static struct udpif_key *
ukey_create(const struct nlattr *key, size_t key_len, long long int used)
{
struct udpif_key *ukey = xmalloc(sizeof *ukey);
+ ovs_mutex_init(&ukey->mutex);
ukey->key = (struct nlattr *) &ukey->key_buf;
memcpy(&ukey->key_buf, key, key_len);
ukey->key_len = key_len;
+ ovs_mutex_lock(&ukey->mutex);
ukey->mark = false;
+ ukey->flow_exists = true;
ukey->created = used ? used : time_msec();
memset(&ukey->stats, 0, sizeof ukey->stats);
+ ukey->xcache = NULL;
+ ovs_mutex_unlock(&ukey->mutex);
return ukey;
}
+/* Checks for a ukey in 'udpif->ukeys' with the same 'ukey->key' and 'hash',
+ * and inserts 'ukey' if it does not exist.
+ *
+ * Returns true if 'ukey' was inserted into 'udpif->ukeys', false otherwise. */
+static bool
+udpif_insert_ukey(struct udpif *udpif, struct udpif_key *ukey, uint32_t hash)
+{
+ struct udpif_key *duplicate;
+ uint32_t idx = hash % udpif->n_revalidators;
+ bool ok;
+
+ ovs_mutex_lock(&udpif->ukeys[idx].mutex);
+ duplicate = ukey_lookup__(udpif, ukey->key, ukey->key_len, hash);
+ if (duplicate) {
+ ok = false;
+ } else {
+ hmap_insert(&udpif->ukeys[idx].hmap, &ukey->hmap_node, hash);
+ ok = true;
+ }
+ ovs_mutex_unlock(&udpif->ukeys[idx].mutex);
+
+ return ok;
+}
+
static void
ukey_delete(struct revalidator *revalidator, struct udpif_key *ukey)
+ OVS_NO_THREAD_SAFETY_ANALYSIS
{
- hmap_remove(&revalidator->ukeys, &ukey->hmap_node);
+ if (revalidator) {
+ hmap_remove(revalidator->ukeys, &ukey->hmap_node);
+ }
+ xlate_cache_delete(ukey->xcache);
+ ovs_mutex_destroy(&ukey->mutex);
free(ukey);
}
static bool
-revalidate_ukey(struct udpif *udpif, struct udpif_flow_dump *udump,
- struct udpif_key *ukey)
+should_revalidate(uint64_t packets, long long int used)
+{
+ long long int metric, now, duration;
+
+ /* Calculate the mean time between seeing these packets. If this
+ * exceeds the threshold, then delete the flow rather than performing
+ * costly revalidation for flows that aren't being hit frequently.
+ *
+ * This is targeted at situations where the dump_duration is high (~1s),
+ * and revalidation is triggered by a call to udpif_revalidate(). In
+ * these situations, revalidation of all flows causes fluctuations in the
+ * flow_limit due to the interaction with the dump_duration and max_idle.
+ * This tends to result in deletion of low-throughput flows anyway, so
+ * skip the revalidation and just delete those flows. */
+ packets = MAX(packets, 1);
+ now = MAX(used, time_msec());
+ duration = now - used;
+ metric = duration / packets;
+
+ if (metric > 200) {
+ return false;
+ }
+ return true;
+}
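The heuristic above boils down to: skip revalidation (and delete) any flow that averages more than 200 ms between packets. A testable restatement under that reading; `worth_revalidating` is a sketch with `now` passed in purely so the function is deterministic, not the OVS function itself:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a flow that has seen 'packets' packets since time 'used'
 * (milliseconds) is busy enough at time 'now' to be worth revalidating,
 * using the same 200 ms-per-packet threshold as should_revalidate(). */
bool
worth_revalidating(uint64_t packets, long long int used, long long int now)
{
    long long int duration;

    packets = packets ? packets : 1;       /* Avoid dividing by zero. */
    now = now > used ? now : used;         /* Clamp against clock skew. */
    duration = now - used;
    return duration / (long long int) packets <= 200;
}
```

For example, 10 packets over one second averages 100 ms/packet and passes, while 1 packet over the same second averages 1000 ms/packet and would be deleted instead of revalidated.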
+
+static bool
+revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey,
+ const struct nlattr *mask, size_t mask_len,
+ const struct nlattr *actions, size_t actions_len,
+ const struct dpif_flow_stats *stats)
{
- struct ofpbuf xout_actions, *actions;
uint64_t slow_path_buf[128 / 8];
struct xlate_out xout, *xoutp;
struct netflow *netflow;
- struct flow flow, udump_mask;
struct ofproto_dpif *ofproto;
struct dpif_flow_stats push;
- uint32_t *udump32, *xout32;
+ struct ofpbuf xout_actions;
+ struct flow flow, dp_mask;
+ uint32_t *dp32, *xout32;
odp_port_t odp_in_port;
struct xlate_in xin;
+ long long int last_used;
int error;
size_t i;
- bool ok;
+ bool may_learn, ok;
ok = false;
xoutp = NULL;
- actions = NULL;
netflow = NULL;
- /* If we don't need to revalidate, we can simply push the stats contained
- * in the udump, otherwise we'll have to get the actions so we can check
- * them. */
- if (udump->need_revalidate) {
- if (dpif_flow_get(udpif->dpif, ukey->key, ukey->key_len, &actions,
- &udump->stats)) {
- goto exit;
- }
- }
-
- push.used = udump->stats.used;
- push.tcp_flags = udump->stats.tcp_flags;
- push.n_packets = udump->stats.n_packets > ukey->stats.n_packets
- ? udump->stats.n_packets - ukey->stats.n_packets
+ ovs_mutex_lock(&ukey->mutex);
+ last_used = ukey->stats.used;
+ push.used = stats->used;
+ push.tcp_flags = stats->tcp_flags;
+ push.n_packets = stats->n_packets > ukey->stats.n_packets
+ ? stats->n_packets - ukey->stats.n_packets
: 0;
- push.n_bytes = udump->stats.n_bytes > ukey->stats.n_bytes
- ? udump->stats.n_bytes - ukey->stats.n_bytes
+ push.n_bytes = stats->n_bytes > ukey->stats.n_bytes
+ ? stats->n_bytes - ukey->stats.n_bytes
: 0;
- ukey->stats = udump->stats;
- if (!push.n_packets && !udump->need_revalidate) {
+ if (!ukey->flow_exists) {
+ /* Don't bother revalidating if the flow was already deleted. */
+ goto exit;
+ }
+
+ if (udpif->need_revalidate && last_used
+ && !should_revalidate(push.n_packets, last_used)) {
+ ok = false;
+ goto exit;
+ }
+
+ /* We will push the stats, so update the ukey stats cache. */
+ ukey->stats = *stats;
+ if (!push.n_packets && !udpif->need_revalidate) {
+ ok = true;
+ goto exit;
+ }
+
+ may_learn = push.n_packets > 0;
+ if (ukey->xcache && !udpif->need_revalidate) {
+ xlate_push_stats(ukey->xcache, may_learn, &push);
ok = true;
goto exit;
}
goto exit;
}
+ if (udpif->need_revalidate) {
+ xlate_cache_clear(ukey->xcache);
+ }
+ if (!ukey->xcache) {
+ ukey->xcache = xlate_cache_new();
+ }
+
xlate_in_init(&xin, ofproto, &flow, NULL, push.tcp_flags, NULL);
xin.resubmit_stats = push.n_packets ? &push : NULL;
- xin.may_learn = push.n_packets > 0;
- xin.skip_wildcards = !udump->need_revalidate;
+ xin.xcache = ukey->xcache;
+ xin.may_learn = may_learn;
+ xin.skip_wildcards = !udpif->need_revalidate;
xlate_actions(&xin, &xout);
xoutp = &xout;
- if (!udump->need_revalidate) {
+ if (!udpif->need_revalidate) {
ok = true;
goto exit;
}
ofpbuf_size(&xout.odp_actions));
} else {
ofpbuf_use_stack(&xout_actions, slow_path_buf, sizeof slow_path_buf);
- compose_slow_path(udpif, &xout, odp_in_port, &xout_actions);
+ compose_slow_path(udpif, &xout, &flow, odp_in_port, &xout_actions);
}
- if (!ofpbuf_equal(&xout_actions, actions)) {
+ if (actions_len != ofpbuf_size(&xout_actions)
+ || memcmp(ofpbuf_data(&xout_actions), actions, actions_len)) {
goto exit;
}
- if (odp_flow_key_to_mask(udump->mask, udump->mask_len, &udump_mask, &flow)
+ if (odp_flow_key_to_mask(mask, mask_len, &dp_mask, &flow)
== ODP_FIT_ERROR) {
goto exit;
}
* mask in the kernel is more specific i.e. less wildcarded, than what
* we've calculated here. This guarantees we don't catch any packets we
* shouldn't with the megaflow. */
- udump32 = (uint32_t *) &udump_mask;
+ dp32 = (uint32_t *) &dp_mask;
xout32 = (uint32_t *) &xout.wc.masks;
for (i = 0; i < FLOW_U32S; i++) {
- if ((udump32[i] | xout32[i]) != udump32[i]) {
+ if ((dp32[i] | xout32[i]) != dp32[i]) {
goto exit;
}
}
ok = true;
exit:
+ ovs_mutex_unlock(&ukey->mutex);
if (netflow) {
if (!ok) {
netflow_expire(netflow, &flow);
}
netflow_unref(netflow);
}
- ofpbuf_delete(actions);
xlate_out_uninit(xoutp);
return ok;
}
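The `FLOW_U32S` loop in `revalidate_ukey()` is a bitwise subset test: the flow may stay installed only if every bit the translation wants masked is also masked by the datapath. Isolated into a standalone sketch (function and parameter names invented here):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if every bit set in 'required' is also set in 'installed',
 * i.e. the installed datapath mask is at least as specific (less
 * wildcarded) as the mask the translation computed. */
bool
mask_is_specific_enough(const uint32_t *installed, const uint32_t *required,
                        size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if ((installed[i] | required[i]) != installed[i]) {
            return false;   /* 'required' checks a bit the flow ignores. */
        }
    }
    return true;
}
```

A more-specific installed mask is always safe (it merely matches fewer packets), but a less-specific one could swallow packets the new translation would handle differently, so the flow must be deleted.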
struct dump_op {
struct udpif_key *ukey;
- struct udpif_flow_dump *udump;
struct dpif_flow_stats stats; /* Stats for 'op'. */
struct dpif_op op; /* Flow del operation. */
};
static void
dump_op_init(struct dump_op *op, const struct nlattr *key, size_t key_len,
- struct udpif_key *ukey, struct udpif_flow_dump *udump)
+ struct udpif_key *ukey)
{
op->ukey = ukey;
- op->udump = udump;
op->op.type = DPIF_OP_FLOW_DEL;
op->op.u.flow_del.key = key;
op->op.u.flow_del.key_len = key_len;
}
static void
-push_dump_ops(struct revalidator *revalidator,
- struct dump_op *ops, size_t n_ops)
+push_dump_ops__(struct udpif *udpif, struct dump_op *ops, size_t n_ops)
{
- struct udpif *udpif = revalidator->udpif;
struct dpif_op *opsp[REVALIDATE_MAX_BATCH];
size_t i;
stats = op->op.u.flow_del.stats;
if (op->ukey) {
push = &push_buf;
+ ovs_mutex_lock(&op->ukey->mutex);
push->used = MAX(stats->used, op->ukey->stats.used);
push->tcp_flags = stats->tcp_flags | op->ukey->stats.tcp_flags;
push->n_packets = stats->n_packets - op->ukey->stats.n_packets;
push->n_bytes = stats->n_bytes - op->ukey->stats.n_bytes;
+ ovs_mutex_unlock(&op->ukey->mutex);
} else {
push = stats;
}
struct ofproto_dpif *ofproto;
struct netflow *netflow;
struct flow flow;
+ bool may_learn;
+
+ may_learn = push->n_packets > 0;
+ if (op->ukey) {
+ ovs_mutex_lock(&op->ukey->mutex);
+ if (op->ukey->xcache) {
+ xlate_push_stats(op->ukey->xcache, may_learn, push);
+ ovs_mutex_unlock(&op->ukey->mutex);
+ continue;
+ }
+ ovs_mutex_unlock(&op->ukey->mutex);
+ }
if (!xlate_receive(udpif->backer, NULL, op->op.u.flow_del.key,
op->op.u.flow_del.key_len, &flow, &ofproto,
xlate_in_init(&xin, ofproto, &flow, NULL, push->tcp_flags,
NULL);
xin.resubmit_stats = push->n_packets ? push : NULL;
- xin.may_learn = push->n_packets > 0;
+ xin.may_learn = may_learn;
xin.skip_wildcards = true;
xlate_actions_for_side_effects(&xin);
}
}
}
+}
- for (i = 0; i < n_ops; i++) {
- struct udpif_key *ukey;
+static void
+push_dump_ops(struct revalidator *revalidator,
+ struct dump_op *ops, size_t n_ops)
+{
+    size_t i;
- /* If there's a udump, this ukey came directly from a datapath flow
- * dump. Sometimes a datapath can send duplicates in flow dumps, in
- * which case we wouldn't want to double-free a ukey, so avoid that by
- * looking up the ukey again.
- *
- * If there's no udump then we know what we're doing. */
- ukey = (ops[i].udump
- ? ukey_lookup(revalidator, ops[i].udump)
- : ops[i].ukey);
- if (ukey) {
- ukey_delete(revalidator, ukey);
- }
+ push_dump_ops__(revalidator->udpif, ops, n_ops);
+ for (i = 0; i < n_ops; i++) {
+ ukey_delete(revalidator, ops[i].ukey);
}
}
static void
-revalidate_udumps(struct revalidator *revalidator, struct list *udumps)
+revalidate(struct revalidator *revalidator)
{
struct udpif *udpif = revalidator->udpif;
struct dump_op ops[REVALIDATE_MAX_BATCH];
- struct udpif_flow_dump *udump, *next_udump;
- size_t n_ops, n_flows;
+ const struct nlattr *key, *mask, *actions;
+ size_t key_len, mask_len, actions_len;
+ const struct dpif_flow_stats *stats;
+ long long int now;
unsigned int flow_limit;
- long long int max_idle;
- bool must_del;
+ size_t n_ops;
+ void *state;
+ n_ops = 0;
+ now = time_msec();
atomic_read(&udpif->flow_limit, &flow_limit);
- n_flows = udpif_get_n_flows(udpif);
-
- must_del = false;
- max_idle = ofproto_max_idle;
- if (n_flows > flow_limit) {
- must_del = n_flows > 2 * flow_limit;
- max_idle = 100;
- }
-
- n_ops = 0;
- LIST_FOR_EACH_SAFE (udump, next_udump, list_node, udumps) {
- long long int used, now;
+ dpif_flow_dump_state_init(udpif->dpif, &state);
+ while (dpif_flow_dump_next(&udpif->dump, state, &key, &key_len, &mask,
+ &mask_len, &actions, &actions_len, &stats)) {
struct udpif_key *ukey;
+ bool mark, may_destroy;
+ long long int used, max_idle;
+ uint32_t hash;
+ size_t n_flows;
- now = time_msec();
- ukey = ukey_lookup(revalidator, udump);
+ hash = hash_bytes(key, key_len, udpif->secret);
+ ukey = ukey_lookup(udpif, key, key_len, hash);
- used = udump->stats.used;
+ used = stats->used;
if (!used && ukey) {
+ ovs_mutex_lock(&ukey->mutex);
+
+ if (ukey->mark || !ukey->flow_exists) {
+ /* The flow has already been dumped. This can occasionally
+ * occur if the datapath is changed in the middle of a flow
+ * dump. Rather than perform the same work twice, skip the
+ * flow this time. */
+ ovs_mutex_unlock(&ukey->mutex);
+ COVERAGE_INC(upcall_duplicate_flow);
+ continue;
+ }
+
used = ukey->created;
+ ovs_mutex_unlock(&ukey->mutex);
}
- if (must_del || (used && used < now - max_idle)) {
- struct dump_op *dop = &ops[n_ops++];
+ n_flows = udpif_get_n_flows(udpif);
+ max_idle = ofproto_max_idle;
+ if (n_flows > flow_limit) {
+ max_idle = 100;
+ }
- dump_op_init(dop, udump->key, udump->key_len, ukey, udump);
- continue;
+ if ((used && used < now - max_idle) || n_flows > flow_limit * 2) {
+ mark = false;
+ } else {
+ if (!ukey) {
+ ukey = ukey_create(key, key_len, used);
+ if (!udpif_insert_ukey(udpif, ukey, hash)) {
+ /* The same ukey has already been created. This means that
+ * another revalidator is processing this flow
+ * concurrently, so don't bother processing it. */
+ ukey_delete(NULL, ukey);
+ continue;
+ }
+ }
+
+ mark = revalidate_ukey(udpif, ukey, mask, mask_len, actions,
+ actions_len, stats);
}
- if (!ukey) {
- ukey = ukey_create(udump->key, udump->key_len, used);
- hmap_insert(&revalidator->ukeys, &ukey->hmap_node,
- udump->key_hash);
+ if (ukey) {
+ ovs_mutex_lock(&ukey->mutex);
+ ukey->mark = ukey->flow_exists = mark;
+ ovs_mutex_unlock(&ukey->mutex);
}
- ukey->mark = true;
- if (!revalidate_ukey(udpif, udump, ukey)) {
- dpif_flow_del(udpif->dpif, udump->key, udump->key_len, NULL);
- ukey_delete(revalidator, ukey);
+ if (!mark) {
+ dump_op_init(&ops[n_ops++], key, key_len, ukey);
}
- list_remove(&udump->list_node);
- free(udump);
- }
+ may_destroy = dpif_flow_dump_next_may_destroy_keys(&udpif->dump,
+ state);
- push_dump_ops(revalidator, ops, n_ops);
+        /* Only update 'now' immediately before the dump buffer will be
+         * refilled.  This keeps 'now' close to the time at which the
+         * datapath wrote the 'stats' for the flows in the buffer. */
+ if (may_destroy) {
+ now = time_msec();
+ }
+
+ /* Only do a dpif_operate when we've hit our maximum batch, or when our
+ * memory is about to be clobbered by the next call to
+ * dpif_flow_dump_next(). */
+ if (n_ops == REVALIDATE_MAX_BATCH || (n_ops && may_destroy)) {
+ push_dump_ops__(udpif, ops, n_ops);
+ n_ops = 0;
+ }
+ }
- LIST_FOR_EACH_SAFE (udump, next_udump, list_node, udumps) {
- list_remove(&udump->list_node);
- free(udump);
+ if (n_ops) {
+ push_dump_ops__(udpif, ops, n_ops);
}
+
+ dpif_flow_dump_state_uninit(udpif->dpif, state);
}
static void
revalidator_sweep__(struct revalidator *revalidator, bool purge)
+ OVS_NO_THREAD_SAFETY_ANALYSIS
{
struct dump_op ops[REVALIDATE_MAX_BATCH];
struct udpif_key *ukey, *next;
n_ops = 0;
- HMAP_FOR_EACH_SAFE (ukey, next, hmap_node, &revalidator->ukeys) {
+ /* During garbage collection, this revalidator completely owns its ukeys
+ * map, and therefore doesn't need to do any locking. */
+ HMAP_FOR_EACH_SAFE (ukey, next, hmap_node, revalidator->ukeys) {
if (!purge && ukey->mark) {
ukey->mark = false;
+ } else if (!ukey->flow_exists) {
+ ukey_delete(revalidator, ukey);
} else {
struct dump_op *op = &ops[n_ops++];
/* If we have previously seen a flow in the datapath, but didn't
* see it during the most recent dump, delete it. This allows us
* to clean up the ukey and keep the statistics consistent. */
- dump_op_init(op, ukey->key, ukey->key_len, ukey, NULL);
+ dump_op_init(op, ukey->key, ukey->key_len, ukey);
if (n_ops == REVALIDATE_MAX_BATCH) {
push_dump_ops(revalidator, ops, n_ops);
n_ops = 0;
udpif->avg_n_flows, udpif->max_n_flows, flow_limit);
ds_put_format(&ds, "\tdump duration : %lldms\n", udpif->dump_duration);
- ds_put_char(&ds, '\n');
- for (i = 0; i < udpif->n_handlers; i++) {
- struct handler *handler = &udpif->handlers[i];
-
- ovs_mutex_lock(&handler->mutex);
- ds_put_format(&ds, "\t%s: (upcall queue %"PRIuSIZE")\n",
- handler->name, handler->n_upcalls);
- ovs_mutex_unlock(&handler->mutex);
- }
-
ds_put_char(&ds, '\n');
for (i = 0; i < n_revalidators; i++) {
struct revalidator *revalidator = &udpif->revalidators[i];
- /* XXX: The result of hmap_count(&revalidator->ukeys) may not be
- * accurate because it's not protected by the revalidator mutex. */
- ovs_mutex_lock(&revalidator->mutex);
- ds_put_format(&ds, "\t%s: (dump queue %"PRIuSIZE") (keys %"PRIuSIZE
- ")\n", revalidator->name, revalidator->n_udumps,
- hmap_count(&revalidator->ukeys));
- ovs_mutex_unlock(&revalidator->mutex);
+ ovs_mutex_lock(&udpif->ukeys[i].mutex);
+ ds_put_format(&ds, "\t%u: (keys %"PRIuSIZE")\n",
+ revalidator->id, hmap_count(&udpif->ukeys[i].hmap));
+ ovs_mutex_unlock(&udpif->ukeys[i].mutex);
}
}
uint16_t user_cookie_offset;/* Used for user_action_cookie fixup. */
bool exit; /* No further actions should be processed. */
+ bool use_recirc; /* Should generate recirc? */
+    struct xlate_recirc recirc;    /* Information used for generating
+                                    * recirculation actions. */
+
/* OpenFlow 1.1+ action set.
*
* 'action_set' accumulates "struct ofpact"s added by OFPACT_WRITE_ACTIONS.
uint8_t dscp; /* DSCP bits to mark outgoing traffic with. */
};
+enum xc_type {
+ XC_RULE,
+ XC_BOND,
+ XC_NETDEV,
+ XC_NETFLOW,
+ XC_MIRROR,
+ XC_LEARN,
+ XC_NORMAL,
+ XC_FIN_TIMEOUT,
+};
+
+/* xlate_cache entries hold enough information to perform the side effects of
+ * xlate_actions() for a rule, without needing to perform rule translation
+ * from scratch. The primary usage of these is to submit statistics to objects
+ * that a flow relates to, although they may be used for other effects as well
+ * (for instance, refreshing hard timeouts for learned flows). */
+struct xc_entry {
+ enum xc_type type;
+ union {
+ struct rule_dpif *rule;
+ struct {
+ struct netdev *tx;
+ struct netdev *rx;
+ struct bfd *bfd;
+ } dev;
+ struct {
+ struct netflow *netflow;
+ struct flow *flow;
+ ofp_port_t iface;
+ } nf;
+ struct {
+ struct mbridge *mbridge;
+ mirror_mask_t mirrors;
+ } mirror;
+ struct {
+ struct bond *bond;
+ struct flow *flow;
+ uint16_t vid;
+ } bond;
+ struct {
+ struct ofproto_dpif *ofproto;
+ struct rule_dpif *rule;
+ } learn;
+ struct {
+ struct ofproto_dpif *ofproto;
+ struct flow *flow;
+ int vlan;
+ } normal;
+ struct {
+ struct rule_dpif *rule;
+ uint16_t idle;
+ uint16_t hard;
+ } fin;
+ } u;
+};
+
+#define XC_ENTRY_FOR_EACH(entry, entries, xcache) \
+ entries = xcache->entries; \
+ for (entry = ofpbuf_try_pull(&entries, sizeof *entry); \
+ entry; \
+ entry = ofpbuf_try_pull(&entries, sizeof *entry))
+
+struct xlate_cache {
+ struct ofpbuf entries;
+};
+
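For reviewers unfamiliar with the entry layout: xlate_cache packs fixed-size 'struct xc_entry' records into one contiguous ofpbuf, and XC_ENTRY_FOR_EACH replays them in insertion order by repeatedly pulling sizeof *entry bytes. A minimal self-contained sketch of the same pattern, with a toy buffer type standing in for struct ofpbuf (all names below are illustrative, not OVS APIs):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for ofpbuf and xc_entry. */
struct toybuf {
    unsigned char data[256];
    size_t used;      /* Bytes appended so far. */
    size_t pulled;    /* Bytes consumed by toybuf_try_pull(). */
};

struct toy_entry {
    int type;
    int value;
};

/* Appends a zeroed entry, like xlate_cache_add_entry(). */
static struct toy_entry *
toybuf_put_zeros(struct toybuf *b)
{
    struct toy_entry *e = (struct toy_entry *) (b->data + b->used);

    memset(e, 0, sizeof *e);
    b->used += sizeof *e;
    return e;
}

/* Returns the next unread entry, or NULL when the buffer is exhausted,
 * like ofpbuf_try_pull() in the macro. */
static struct toy_entry *
toybuf_try_pull(struct toybuf *b)
{
    struct toy_entry *e;

    if (b->used - b->pulled < sizeof *e) {
        return NULL;
    }
    e = (struct toy_entry *) (b->data + b->pulled);
    b->pulled += sizeof *e;
    return e;
}

/* Mirrors XC_ENTRY_FOR_EACH: visit every entry in insertion order. */
static int
toybuf_sum(struct toybuf *b)
{
    struct toy_entry *e;
    int sum = 0;

    b->pulled = 0;
    for (e = toybuf_try_pull(b); e; e = toybuf_try_pull(b)) {
        sum += e->value;
    }
    return sum;
}
```

Note that the real macro pulls from a local copy of 'xcache->entries', so iterating does not disturb the cache itself.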
static struct hmap xbridges = HMAP_INITIALIZER(&xbridges);
static struct hmap xbundles = HMAP_INITIALIZER(&xbundles);
static struct hmap xports = HMAP_INITIALIZER(&xports);
static bool dscp_from_skb_priority(const struct xport *, uint32_t skb_priority,
uint8_t *dscp);
+static struct xc_entry *xlate_cache_add_entry(struct xlate_cache *xc,
+ enum xc_type type);
+
void
xlate_ofproto_set(struct ofproto_dpif *ofproto, const char *name,
struct dpif *dpif, struct rule_dpif *miss_rule,
static bool
stp_should_process_flow(const struct flow *flow, struct flow_wildcards *wc)
{
+ /* is_stp() also checks dl_type, but dl_type is always set in 'wc'. */
memset(&wc->masks.dl_dst, 0xff, sizeof wc->masks.dl_dst);
- return eth_addr_equals(flow->dl_dst, eth_addr_stp);
+ return is_stp(flow);
}
static void
return xport->xbundle;
}
- /* Special-case OFPP_NONE, which a controller may use as the ingress
- * port for traffic that it is sourcing. */
- if (in_port == OFPP_NONE) {
+ /* Special-case OFPP_NONE (OF1.0) and OFPP_CONTROLLER (OF1.1+),
+ * which a controller may use as the ingress port for traffic that
+ * it is sourcing. */
+ if (in_port == OFPP_CONTROLLER || in_port == OFPP_NONE) {
return &ofpp_none_bundle;
}
/* Partially configured bundle with no slaves. Drop the packet. */
return;
} else if (!out_xbundle->bond) {
- ctx->xout->use_recirc = false;
+ ctx->use_recirc = false;
xport = CONTAINER_OF(list_front(&out_xbundle->xports), struct xport,
bundle_node);
} else {
struct ofport_dpif *ofport;
- struct xlate_recirc *xr = &ctx->xout->recirc;
+ struct xlate_recirc *xr = &ctx->recirc;
struct flow_wildcards *wc = &ctx->xout->wc;
if (ctx->xbridge->enable_recirc) {
-            ctx->xout->use_recirc = bond_may_recirc(
-                out_xbundle->bond, &xr->recirc_id, &xr->hash_bias);
-            if (ctx->xout->use_recirc) {
+            ctx->use_recirc = bond_may_recirc(
+                out_xbundle->bond, &xr->recirc_id, &xr->hash_basis);
+            if (ctx->use_recirc) {
/* Only TCP mode uses recirculation. */
- xr->hash_alg = OVS_RECIRC_HASH_ALG_L4;
+ xr->hash_alg = OVS_HASH_ALG_L4;
bond_update_post_recirc_rules(out_xbundle->bond, false);
/* Recirculation does not require unmasking hash fields. */
return;
}
- if (ctx->xin->resubmit_stats) {
- bond_account(out_xbundle->bond, &ctx->xin->flow, vid,
- ctx->xin->resubmit_stats->n_bytes);
+        /* If ctx->use_recirc is set, the main thread will handle stats
+         * accounting for this bond. */
+ if (!ctx->use_recirc) {
+ if (ctx->xin->resubmit_stats) {
+ bond_account(out_xbundle->bond, &ctx->xin->flow, vid,
+ ctx->xin->resubmit_stats->n_bytes);
+ }
+ if (ctx->xin->xcache) {
+ struct xc_entry *entry;
+ struct flow *flow;
+
+ flow = &ctx->xin->flow;
+ entry = xlate_cache_add_entry(ctx->xin->xcache, XC_BOND);
+ entry->u.bond.bond = bond_ref(out_xbundle->bond);
+ entry->u.bond.flow = xmemdup(flow, sizeof *flow);
+ entry->u.bond.vid = vid;
+ }
}
}
if (ctx->xin->may_learn) {
update_learning_table(ctx->xbridge, flow, wc, vlan, in_xbundle);
}
+ if (ctx->xin->xcache) {
+ struct xc_entry *entry;
+
+ /* Save enough info to update mac learning table later. */
+ entry = xlate_cache_add_entry(ctx->xin->xcache, XC_NORMAL);
+ entry->u.normal.ofproto = ctx->xin->ofproto;
+ entry->u.normal.flow = xmemdup(flow, sizeof *flow);
+ entry->u.normal.vlan = vlan;
+ }
/* Determine output bundle. */
ovs_rwlock_rdlock(&ctx->xbridge->ml->rwlock);
actions_offset = nl_msg_start_nested(odp_actions, OVS_SAMPLE_ATTR_ACTIONS);
odp_port = ofp_port_to_odp_port(xbridge, flow->in_port.ofp_port);
- pid = dpif_port_get_pid(xbridge->dpif, odp_port, 0);
- cookie_offset = odp_put_userspace_action(pid, cookie, cookie_size, odp_actions);
+ pid = dpif_port_get_pid(xbridge->dpif, odp_port,
+ flow_hash_5tuple(flow, 0));
+ cookie_offset = odp_put_userspace_action(pid, cookie, cookie_size,
+ odp_actions);
nl_msg_end_nested(odp_actions, actions_offset);
nl_msg_end_nested(odp_actions, sample_offset);
bfd_process_packet(xport->bfd, flow, packet);
/* If POLL received, immediately sends FINAL back. */
if (bfd_should_send_packet(xport->bfd)) {
- if (xport->peer) {
- ofproto_dpif_monitor_port_send_soon(xport->ofport);
- } else {
- ofproto_dpif_monitor_port_send_soon_safe(xport->ofport);
- }
+ ofproto_dpif_monitor_port_send_soon(xport->ofport);
}
}
return SLOW_BFD;
/* If 'struct flow' gets additional metadata, we'll need to zero it out
* before traversing a patch port. */
- BUILD_ASSERT_DECL(FLOW_WC_SEQ == 25);
+ BUILD_ASSERT_DECL(FLOW_WC_SEQ == 26);
if (!xport) {
xlate_report(ctx, "Nonexistent output port");
xlate_report(ctx, "OFPPC_NO_FWD set, skipping output");
return;
} else if (check_stp) {
- if (eth_addr_equals(ctx->base_flow.dl_dst, eth_addr_stp)) {
+ if (is_stp(&ctx->base_flow)) {
if (!xport_stp_listen_state(xport)) {
xlate_report(ctx, "STP not in listening state, "
"skipping bpdu output");
bfd_account_rx(peer->bfd, ctx->xin->resubmit_stats);
}
}
+ if (ctx->xin->xcache) {
+ struct xc_entry *entry;
+
+ entry = xlate_cache_add_entry(ctx->xin->xcache, XC_NETDEV);
+ entry->u.dev.tx = netdev_ref(xport->netdev);
+ entry->u.dev.rx = netdev_ref(peer->netdev);
+ entry->u.dev.bfd = bfd_ref(peer->bfd);
+ }
return;
}
if (ctx->xin->resubmit_stats) {
netdev_vport_inc_tx(xport->netdev, ctx->xin->resubmit_stats);
}
+ if (ctx->xin->xcache) {
+ struct xc_entry *entry;
+
+ entry = xlate_cache_add_entry(ctx->xin->xcache, XC_NETDEV);
+ entry->u.dev.tx = netdev_ref(xport->netdev);
+ }
out_port = odp_port;
commit_odp_tunnel_action(flow, &ctx->base_flow,
&ctx->xout->odp_actions);
&ctx->xout->odp_actions,
&ctx->xout->wc);
-    if (ctx->xout->use_recirc) {
-        struct ovs_action_recirc *act_recirc;
-        struct xlate_recirc *xr = &ctx->xout->recirc;
-        act_recirc = nl_msg_put_unspec_uninit(&ctx->xout->odp_actions,
-                OVS_ACTION_ATTR_RECIRC, sizeof *act_recirc);
-        act_recirc->recirc_id = xr->recirc_id;
-        act_recirc->hash_alg = xr->hash_alg;
-        act_recirc->hash_bias = xr->hash_bias;
+    if (ctx->use_recirc) {
+        struct ovs_action_hash *act_hash;
+        struct xlate_recirc *xr = &ctx->recirc;
+
+        /* Hash action. */
+        act_hash = nl_msg_put_unspec_uninit(&ctx->xout->odp_actions,
+                                            OVS_ACTION_ATTR_HASH,
+                                            sizeof *act_hash);
+        act_hash->hash_alg = xr->hash_alg;
+        act_hash->hash_basis = xr->hash_basis;
+        /* Recirc action. */
+        nl_msg_put_u32(&ctx->xout->odp_actions, OVS_ACTION_ATTR_RECIRC,
+                       xr->recirc_id);
} else {
nl_msg_put_odp_port(&ctx->xout->odp_actions, OVS_ACTION_ATTR_OUTPUT,
out_port);
xlate_recursively(struct xlate_ctx *ctx, struct rule_dpif *rule)
{
struct rule_dpif *old_rule = ctx->rule;
- struct rule_actions *actions;
+ const struct rule_actions *actions;
if (ctx->xin->resubmit_stats) {
rule_dpif_credit_stats(rule, ctx->xin->resubmit_stats);
!skip_wildcards
? &ctx->xout->wc : NULL,
honor_table_miss,
- &ctx->table_id, &rule);
+ &ctx->table_id, &rule,
+ ctx->xin->xcache != NULL);
ctx->xin->flow.in_port.ofp_port = old_in_port;
if (ctx->xin->resubmit_hook) {
}
choose_miss_rule(config, ctx->xbridge->miss_rule,
- ctx->xbridge->no_packet_in_rule, &rule);
+ ctx->xbridge->no_packet_in_rule, &rule,
+ ctx->xin->xcache != NULL);
match:
if (rule) {
+ /* Fill in the cache entry here instead of xlate_recursively
+ * to make the reference counting more explicit. We take a
+ * reference in the lookups above if we are going to cache the
+ * rule. */
+ if (ctx->xin->xcache) {
+ struct xc_entry *entry;
+
+ entry = xlate_cache_add_entry(ctx->xin->xcache, XC_RULE);
+ entry->u.rule = rule;
+ }
xlate_recursively(ctx, rule);
- rule_dpif_unref(rule);
}
ctx->table_id = old_table_id;
learn_execute(learn, &ctx->xin->flow, &fm, &ofpacts);
ofproto_dpif_flow_mod(ctx->xbridge->ofproto, &fm);
ofpbuf_uninit(&ofpacts);
+
+ if (ctx->xin->xcache) {
+ struct xc_entry *entry;
+
+ entry = xlate_cache_add_entry(ctx->xin->xcache, XC_LEARN);
+ entry->u.learn.ofproto = ctx->xin->ofproto;
+        /* Look up the learned rule, taking a reference on it. The reference
+         * is released when this cache entry is deleted. */
+ rule_dpif_lookup(ctx->xbridge->ofproto, &ctx->xin->flow, NULL,
+ &entry->u.learn.rule, true);
+ }
+}
+
+static void
+xlate_fin_timeout__(struct rule_dpif *rule, uint16_t tcp_flags,
+ uint16_t idle_timeout, uint16_t hard_timeout)
+{
+ if (tcp_flags & (TCP_FIN | TCP_RST)) {
+ rule_dpif_reduce_timeouts(rule, idle_timeout, hard_timeout);
+ }
}
static void
xlate_fin_timeout(struct xlate_ctx *ctx,
const struct ofpact_fin_timeout *oft)
{
- if (ctx->xin->tcp_flags & (TCP_FIN | TCP_RST) && ctx->rule) {
- rule_dpif_reduce_timeouts(ctx->rule, oft->fin_idle_timeout,
- oft->fin_hard_timeout);
+ if (ctx->rule) {
+ xlate_fin_timeout__(ctx->rule, ctx->xin->tcp_flags,
+ oft->fin_idle_timeout, oft->fin_hard_timeout);
+ if (ctx->xin->xcache) {
+ struct xc_entry *entry;
+
+ entry = xlate_cache_add_entry(ctx->xin->xcache, XC_FIN_TIMEOUT);
+ /* XC_RULE already holds a reference on the rule, none is taken
+ * here. */
+ entry->u.fin.rule = ctx->rule;
+ entry->u.fin.idle = oft->fin_idle_timeout;
+ entry->u.fin.hard = oft->fin_hard_timeout;
+ }
}
}
static bool
may_receive(const struct xport *xport, struct xlate_ctx *ctx)
{
- if (xport->config & (eth_addr_equals(ctx->xin->flow.dl_dst, eth_addr_stp)
+ if (xport->config & (is_stp(&ctx->xin->flow)
? OFPUTIL_PC_NO_RECV_STP
: OFPUTIL_PC_NO_RECV)) {
return false;
xin->packet = packet;
xin->may_learn = packet != NULL;
xin->rule = rule;
+ xin->xcache = NULL;
xin->ofpacts = NULL;
xin->ofpacts_len = 0;
xin->tcp_flags = tcp_flags;
struct flow *flow = &xin->flow;
struct rule_dpif *rule = NULL;
- struct rule_actions *actions = NULL;
+ const struct rule_actions *actions = NULL;
enum slow_path_reason special;
const struct ofpact *ofpacts;
struct xport *in_port;
ctx.xbridge = xbridge_lookup(xin->ofproto);
if (!ctx.xbridge) {
- goto out;
+ return;
}
ctx.rule = xin->rule;
ctx.orig_skb_priority = flow->skb_priority;
ctx.table_id = 0;
ctx.exit = false;
+ ctx.use_recirc = false;
if (!xin->ofpacts && !ctx.rule) {
ctx.table_id = rule_dpif_lookup(ctx.xbridge->ofproto, flow,
!xin->skip_wildcards ? wc : NULL,
- &rule);
+ &rule, ctx.xin->xcache != NULL);
if (ctx.xin->resubmit_stats) {
rule_dpif_credit_stats(rule, ctx.xin->resubmit_stats);
}
+ if (ctx.xin->xcache) {
+ struct xc_entry *entry;
+
+ entry = xlate_cache_add_entry(ctx.xin->xcache, XC_RULE);
+ entry->u.rule = rule;
+ }
ctx.rule = rule;
}
xout->fail_open = ctx.rule && rule_dpif_is_fail_open(ctx.rule);
- xout->use_recirc = false;
if (xin->ofpacts) {
ofpacts = xin->ofpacts;
break;
case OFPC_FRAG_DROP:
- goto out;
+ return;
case OFPC_FRAG_REASM:
OVS_NOT_REACHED();
}
in_port = get_ofp_port(ctx.xbridge, flow->in_port.ofp_port);
- if (in_port && in_port->is_tunnel && ctx.xin->resubmit_stats) {
- netdev_vport_inc_rx(in_port->netdev, ctx.xin->resubmit_stats);
- if (in_port->bfd) {
- bfd_account_rx(in_port->bfd, ctx.xin->resubmit_stats);
+ if (in_port && in_port->is_tunnel) {
+ if (ctx.xin->resubmit_stats) {
+ netdev_vport_inc_rx(in_port->netdev, ctx.xin->resubmit_stats);
+ if (in_port->bfd) {
+ bfd_account_rx(in_port->bfd, ctx.xin->resubmit_stats);
+ }
+ }
+ if (ctx.xin->xcache) {
+ struct xc_entry *entry;
+
+ entry = xlate_cache_add_entry(ctx.xin->xcache, XC_NETDEV);
+ entry->u.dev.rx = netdev_ref(in_port->netdev);
+ entry->u.dev.bfd = bfd_ref(in_port->bfd);
}
}
ctx.xout->slow |= SLOW_ACTION;
}
- if (ctx.xin->resubmit_stats) {
- mirror_update_stats(ctx.xbridge->mbridge, xout->mirrors,
- ctx.xin->resubmit_stats->n_packets,
- ctx.xin->resubmit_stats->n_bytes);
-
- if (ctx.xbridge->netflow) {
- const struct ofpact *ofpacts;
- size_t ofpacts_len;
-
- ofpacts_len = actions->ofpacts_len;
- ofpacts = actions->ofpacts;
- if (ofpacts_len == 0
- || ofpacts->type != OFPACT_CONTROLLER
- || ofpact_next(ofpacts) < ofpact_end(ofpacts, ofpacts_len)) {
- /* Only update netflow if we don't have controller flow. We don't
- * report NetFlow expiration messages for such facets because they
- * are just part of the control logic for the network, not real
- * traffic. */
+ if (mbridge_has_mirrors(ctx.xbridge->mbridge)) {
+ if (ctx.xin->resubmit_stats) {
+ mirror_update_stats(ctx.xbridge->mbridge, xout->mirrors,
+ ctx.xin->resubmit_stats->n_packets,
+ ctx.xin->resubmit_stats->n_bytes);
+ }
+ if (ctx.xin->xcache) {
+ struct xc_entry *entry;
+
+ entry = xlate_cache_add_entry(ctx.xin->xcache, XC_MIRROR);
+ entry->u.mirror.mbridge = mbridge_ref(ctx.xbridge->mbridge);
+ entry->u.mirror.mirrors = xout->mirrors;
+ }
+ }
+
+ if (ctx.xbridge->netflow) {
+ const struct ofpact *ofpacts = actions->ofpacts;
+ size_t ofpacts_len = actions->ofpacts_len;
+
+ /* Only update netflow if we don't have controller flow. We don't
+ * report NetFlow expiration messages for such facets because they
+ * are just part of the control logic for the network, not real
+ * traffic. */
+ if (ofpacts_len == 0
+ || ofpacts->type != OFPACT_CONTROLLER
+ || ofpact_next(ofpacts) < ofpact_end(ofpacts, ofpacts_len)) {
+ if (ctx.xin->resubmit_stats) {
netflow_flow_update(ctx.xbridge->netflow, flow,
xout->nf_output_iface,
ctx.xin->resubmit_stats);
}
+ if (ctx.xin->xcache) {
+ struct xc_entry *entry;
+
+ entry = xlate_cache_add_entry(ctx.xin->xcache, XC_NETFLOW);
+ entry->u.nf.netflow = netflow_ref(ctx.xbridge->netflow);
+ entry->u.nf.flow = xmemdup(flow, sizeof *flow);
+ entry->u.nf.iface = xout->nf_output_iface;
+ }
}
}
wc->masks.tp_src &= htons(UINT8_MAX);
wc->masks.tp_dst &= htons(UINT8_MAX);
}
-
-out:
- rule_dpif_unref(rule);
}
/* Sends 'packet' out 'ofport'.
&output.ofpact, sizeof output,
packet);
}
+
+struct xlate_cache *
+xlate_cache_new(void)
+{
+ struct xlate_cache *xcache = xmalloc(sizeof *xcache);
+
+ ofpbuf_init(&xcache->entries, 512);
+ return xcache;
+}
+
+static struct xc_entry *
+xlate_cache_add_entry(struct xlate_cache *xcache, enum xc_type type)
+{
+ struct xc_entry *entry;
+
+ entry = ofpbuf_put_zeros(&xcache->entries, sizeof *entry);
+ entry->type = type;
+
+ return entry;
+}
+
+static void
+xlate_cache_netdev(struct xc_entry *entry, const struct dpif_flow_stats *stats)
+{
+ if (entry->u.dev.tx) {
+ netdev_vport_inc_tx(entry->u.dev.tx, stats);
+ }
+ if (entry->u.dev.rx) {
+ netdev_vport_inc_rx(entry->u.dev.rx, stats);
+ }
+ if (entry->u.dev.bfd) {
+ bfd_account_rx(entry->u.dev.bfd, stats);
+ }
+}
+
+static void
+xlate_cache_normal(struct ofproto_dpif *ofproto, struct flow *flow, int vlan)
+{
+ struct xbridge *xbridge;
+ struct xbundle *xbundle;
+ struct flow_wildcards wc;
+
+ xbridge = xbridge_lookup(ofproto);
+ if (!xbridge) {
+ return;
+ }
+
+ xbundle = lookup_input_bundle(xbridge, flow->in_port.ofp_port, false,
+ NULL);
+ if (!xbundle) {
+ return;
+ }
+
+ update_learning_table(xbridge, flow, &wc, vlan, xbundle);
+}
+
+/* Push stats and perform side effects of flow translation. */
+void
+xlate_push_stats(struct xlate_cache *xcache, bool may_learn,
+ const struct dpif_flow_stats *stats)
+{
+ struct xc_entry *entry;
+ struct ofpbuf entries = xcache->entries;
+
+ XC_ENTRY_FOR_EACH (entry, entries, xcache) {
+ switch (entry->type) {
+ case XC_RULE:
+ rule_dpif_credit_stats(entry->u.rule, stats);
+ break;
+ case XC_BOND:
+ bond_account(entry->u.bond.bond, entry->u.bond.flow,
+ entry->u.bond.vid, stats->n_bytes);
+ break;
+ case XC_NETDEV:
+ xlate_cache_netdev(entry, stats);
+ break;
+ case XC_NETFLOW:
+ netflow_flow_update(entry->u.nf.netflow, entry->u.nf.flow,
+ entry->u.nf.iface, stats);
+ break;
+ case XC_MIRROR:
+ mirror_update_stats(entry->u.mirror.mbridge,
+ entry->u.mirror.mirrors,
+ stats->n_packets, stats->n_bytes);
+ break;
+ case XC_LEARN:
+ if (may_learn) {
+ struct rule_dpif *rule = entry->u.learn.rule;
+
+ /* Reset the modified time for a rule that is equivalent to
+ * the currently cached rule. If the rule is not the exact
+ * rule we have cached, update the reference that we have. */
+ entry->u.learn.rule = ofproto_dpif_refresh_rule(rule);
+ }
+ break;
+ case XC_NORMAL:
+ xlate_cache_normal(entry->u.normal.ofproto, entry->u.normal.flow,
+ entry->u.normal.vlan);
+ break;
+ case XC_FIN_TIMEOUT:
+ xlate_fin_timeout__(entry->u.fin.rule, stats->tcp_flags,
+ entry->u.fin.idle, entry->u.fin.hard);
+ break;
+ default:
+ OVS_NOT_REACHED();
+ }
+ }
+}
+
+static void
+xlate_dev_unref(struct xc_entry *entry)
+{
+ if (entry->u.dev.tx) {
+ netdev_close(entry->u.dev.tx);
+ }
+ if (entry->u.dev.rx) {
+ netdev_close(entry->u.dev.rx);
+ }
+ if (entry->u.dev.bfd) {
+ bfd_unref(entry->u.dev.bfd);
+ }
+}
+
+static void
+xlate_cache_clear_netflow(struct netflow *netflow, struct flow *flow)
+{
+ netflow_expire(netflow, flow);
+ netflow_flow_clear(netflow, flow);
+ netflow_unref(netflow);
+ free(flow);
+}
+
+void
+xlate_cache_clear(struct xlate_cache *xcache)
+{
+ struct xc_entry *entry;
+ struct ofpbuf entries;
+
+ if (!xcache) {
+ return;
+ }
+
+ XC_ENTRY_FOR_EACH (entry, entries, xcache) {
+ switch (entry->type) {
+ case XC_RULE:
+ rule_dpif_unref(entry->u.rule);
+ break;
+ case XC_BOND:
+ free(entry->u.bond.flow);
+ bond_unref(entry->u.bond.bond);
+ break;
+ case XC_NETDEV:
+ xlate_dev_unref(entry);
+ break;
+ case XC_NETFLOW:
+ xlate_cache_clear_netflow(entry->u.nf.netflow, entry->u.nf.flow);
+ break;
+ case XC_MIRROR:
+ mbridge_unref(entry->u.mirror.mbridge);
+ break;
+ case XC_LEARN:
+ /* 'u.learn.rule' is the learned rule. */
+ rule_dpif_unref(entry->u.learn.rule);
+ break;
+ case XC_NORMAL:
+ free(entry->u.normal.flow);
+ break;
+ case XC_FIN_TIMEOUT:
+            /* 'u.fin.rule' is always already held as an XC_RULE, which
+             * has already released its reference above. */
+ break;
+ default:
+ OVS_NOT_REACHED();
+ }
+ }
+
+ ofpbuf_clear(&xcache->entries);
+}
+
+void
+xlate_cache_delete(struct xlate_cache *xcache)
+{
+ xlate_cache_clear(xcache);
+ ofpbuf_uninit(&xcache->entries);
+ free(xcache);
+}
struct dpif_ipfix;
struct dpif_sflow;
struct mac_learning;
+struct xlate_cache;
struct xlate_recirc {
uint32_t recirc_id; /* !0 Use recirculation instead of output. */
uint8_t hash_alg; /* !0 Compute hash for recirc before. */
- uint32_t hash_bias; /* Compute hash for recirc before. */
+ uint32_t hash_basis; /* Compute hash for recirc before. */
};
struct xlate_out {
ofp_port_t nf_output_iface; /* Output interface index for NetFlow. */
mirror_mask_t mirrors; /* Bitmap of associated mirrors. */
- bool use_recirc; /* Should generate recirc? */
- struct xlate_recirc recirc; /* Information used for generating
- * recirculation actions */
uint64_t odp_actions_stub[256 / 8];
struct ofpbuf odp_actions;
};
* This is normally null so the client has to set it manually after
* calling xlate_in_init(). */
const struct dpif_flow_stats *resubmit_stats;
+
+ /* If nonnull, flow translation populates this cache with references to all
+ * modules that are affected by translation. This 'xlate_cache' may be
+ * passed to xlate_push_stats() to perform the same function as
+ * xlate_actions() without the full cost of translation.
+ *
+ * This is normally null so the client has to set it manually after
+ * calling xlate_in_init(). */
+ struct xlate_cache *xcache;
};
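The comment on 'xcache' above describes a populate-once, replay-many pattern: translation records which modules a flow touches, and xlate_push_stats() later credits fresh stats to all of them without re-translating. A toy model of that idea (this stands in for the real xlate_cache_new()/xlate_actions()/xlate_push_stats() flow; every type and name here is illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative entry types, echoing enum xc_type. */
enum toy_xc_type { TOY_XC_RULE, TOY_XC_NETDEV };

struct toy_xc_entry {
    enum toy_xc_type type;
    unsigned long long *counter;   /* Byte counter owned by some module. */
};

struct toy_xcache {
    struct toy_xc_entry entries[8];
    size_t n;
};

/* "Translation": decides which modules the flow affects and caches
 * references to their counters. */
static void
toy_translate(struct toy_xcache *xcache,
              unsigned long long *rule_bytes,
              unsigned long long *netdev_bytes)
{
    xcache->entries[xcache->n++] =
        (struct toy_xc_entry) { TOY_XC_RULE, rule_bytes };
    xcache->entries[xcache->n++] =
        (struct toy_xc_entry) { TOY_XC_NETDEV, netdev_bytes };
}

/* Replays the cached side effects with fresh stats, like
 * xlate_push_stats(), at a fraction of the cost of translation. */
static void
toy_push_stats(struct toy_xcache *xcache, unsigned long long n_bytes)
{
    size_t i;

    for (i = 0; i < xcache->n; i++) {
        *xcache->entries[i].counter += n_bytes;
    }
}
```

The design choice this models: revalidators can account datapath flow stats periodically against every affected module while paying the translation cost only once per flow.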
extern struct ovs_rwlock xlate_rwlock;
int xlate_send_packet(const struct ofport_dpif *, struct ofpbuf *);
+struct xlate_cache *xlate_cache_new(void);
+void xlate_push_stats(struct xlate_cache *, bool may_learn,
+ const struct dpif_flow_stats *);
+void xlate_cache_clear(struct xlate_cache *);
+void xlate_cache_delete(struct xlate_cache *);
+
#endif /* ofproto-dpif-xlate.h */
COVERAGE_DEFINE(ofproto_dpif_expired);
COVERAGE_DEFINE(packet_in_overflow);
-/* Number of implemented OpenFlow tables. */
-enum { N_TABLES = 255 };
-enum { TBL_INTERNAL = N_TABLES - 1 }; /* Used for internal hidden rules. */
-BUILD_ASSERT_DECL(N_TABLES >= 2 && N_TABLES <= 255);
+/* No bfd/cfm status change. */
+#define NO_STATUS_CHANGE -1
struct flow_miss;
struct dpif_flow_stats stats OVS_GUARDED;
};
+/* RULE_CAST() depends on this. */
+BUILD_ASSERT_DECL(offsetof(struct rule_dpif, up) == 0);
+
static void rule_get_stats(struct rule *, uint64_t *packets, uint64_t *bytes,
long long int *used);
static struct rule_dpif *rule_dpif_cast(const struct rule *);
/* Work queues. */
struct guarded_list pins; /* Contains "struct ofputil_packet_in"s. */
+ struct seq *pins_seq; /* For notifying 'pins' reception. */
+ uint64_t pins_seqno;
};
/* All existing ofproto_dpif instances, indexed by ->up.name. */
ofproto_flow_mod(&ofproto->up, fm);
}
+/* Resets the modified time for 'rule' or an equivalent rule. If 'rule' is not
+ * in the classifier, but an equivalent rule is, unref 'rule' and ref the new
+ * rule. Otherwise if 'rule' is no longer installed in the classifier,
+ * reinstall it.
+ *
+ * Returns the rule whose modified time has been reset. */
+struct rule_dpif *
+ofproto_dpif_refresh_rule(struct rule_dpif *rule)
+{
+ return rule_dpif_cast(ofproto_refresh_rule(&rule->up));
+}
+
/* Appends 'pin' to the queue of "packet ins" to be sent to the controller.
* Takes ownership of 'pin' and pin->packet. */
void
free(CONST_CAST(void *, pin->up.packet));
free(pin);
}
+
+ /* Wakes up main thread for packet-in I/O. */
+ seq_change(ofproto->pins_seq);
}
/* The default "table-miss" behaviour for OpenFlow1.3+ is to drop the
return error;
}
-/* Tests whether 'backer''s datapath supports recirculation Only newer datapath
- * supports OVS_KEY_ATTR in OVS_ACTION_ATTR_USERSPACE actions. We need to
- * disable some features on older datapaths that don't support this feature.
+/* Tests whether 'backer''s datapath supports recirculation. Only newer
+ * datapaths support OVS_KEY_ATTR_RECIRC_ID in keys. We need to disable some
+ * features on older datapaths that don't support this feature.
*
* Returns false if 'backer' definitely does not support recirculation, true if
* it seems to support recirculation or if at least the error we get is
flow.dp_hash = 1;
ofpbuf_use_stack(&key, &keybuf, sizeof keybuf);
- odp_flow_key_from_flow(&key, &flow, 0);
+ odp_flow_key_from_flow(&key, &flow, NULL, 0);
error = dpif_flow_put(backer->dpif, DPIF_FP_CREATE | DPIF_FP_MODIFY,
ofpbuf_data(&key), ofpbuf_size(&key), NULL, 0, NULL,
flow_set_mpls_bos(&flow, n, 1);
ofpbuf_use_stack(&key, &keybuf, sizeof keybuf);
- odp_flow_key_from_flow(&key, &flow, 0);
+ odp_flow_key_from_flow(&key, &flow, NULL, 0);
error = dpif_flow_put(backer->dpif, DPIF_FP_CREATE | DPIF_FP_MODIFY,
ofpbuf_data(&key), ofpbuf_size(&key), NULL, 0, NULL, 0, NULL);
sset_init(&ofproto->port_poll_set);
ofproto->port_poll_errno = 0;
ofproto->change_seq = 0;
+ ofproto->pins_seq = seq_create();
+ ofproto->pins_seqno = seq_read(ofproto->pins_seq);
+
SHASH_FOR_EACH_SAFE (node, next, &init_ofp_ports) {
struct iface_hint *iface_hint = node->data;
ovs_mutex_destroy(&ofproto->stats_mutex);
ovs_mutex_destroy(&ofproto->vsp_mutex);
+ seq_destroy(ofproto->pins_seq);
+
close_dpif_backer(ofproto->backer);
}
{
struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofproto_);
uint64_t new_seq, new_dump_seq;
- const bool enable_recirc = ofproto_dpif_get_enable_recirc(ofproto);
if (mbridge_need_revalidate(ofproto->mbridge)) {
ofproto->backer->need_revalidate = REV_RECONFIGURE;
ovs_rwlock_unlock(&ofproto->ml->rwlock);
}
+    /* Always update 'ofproto->pins_seqno' to avoid frequent wakeups during
+ * flow restore. Even though nothing is processed during flow restore,
+ * all queued 'pins' will be handled immediately when flow restore
+ * completes. */
+ ofproto->pins_seqno = seq_read(ofproto->pins_seq);
+
/* Do not perform any periodic activity required by 'ofproto' while
* waiting for flow restore to complete. */
if (!ofproto_get_flow_restore_wait()) {
/* All outstanding data in existing flows has been accounted, so it's a
* good time to do bond rebalancing. */
- if (enable_recirc && ofproto->has_bonded_bundles) {
+ if (ofproto->has_bonded_bundles) {
struct ofbundle *bundle;
HMAP_FOR_EACH (bundle, hmap_node, &ofproto->bundles) {
- struct bond *bond = bundle->bond;
-
- if (bond && bond_may_recirc(bond, NULL, NULL)) {
- bond_recirculation_account(bond);
- if (bond_rebalance(bundle->bond)) {
- bond_update_post_recirc_rules(bond, true);
- }
+ if (bundle->bond) {
+ bond_rebalance(bundle->bond);
}
}
}
}
seq_wait(udpif_dump_seq(ofproto->backer->udpif), ofproto->dump_seq);
+ seq_wait(ofproto->pins_seq, ofproto->pins_seqno);
}
static void
return error;
}
-static bool
+static int
get_cfm_status(const struct ofport *ofport_,
struct ofproto_cfm_status *status)
{
struct ofport_dpif *ofport = ofport_dpif_cast(ofport_);
+ int ret = 0;
if (ofport->cfm) {
- status->faults = cfm_get_fault(ofport->cfm);
- status->flap_count = cfm_get_flap_count(ofport->cfm);
- status->remote_opstate = cfm_get_opup(ofport->cfm);
- status->health = cfm_get_health(ofport->cfm);
- cfm_get_remote_mpids(ofport->cfm, &status->rmps, &status->n_rmps);
- return true;
+ if (cfm_check_status_change(ofport->cfm)) {
+ status->faults = cfm_get_fault(ofport->cfm);
+ status->flap_count = cfm_get_flap_count(ofport->cfm);
+ status->remote_opstate = cfm_get_opup(ofport->cfm);
+ status->health = cfm_get_health(ofport->cfm);
+ cfm_get_remote_mpids(ofport->cfm, &status->rmps, &status->n_rmps);
+ } else {
+ ret = NO_STATUS_CHANGE;
+ }
} else {
- return false;
+ ret = ENOENT;
}
+
+ return ret;
}
static int
get_bfd_status(struct ofport *ofport_, struct smap *smap)
{
struct ofport_dpif *ofport = ofport_dpif_cast(ofport_);
+ int ret = 0;
if (ofport->bfd) {
- bfd_get_status(ofport->bfd, smap);
- return 0;
+ if (bfd_check_status_change(ofport->bfd)) {
+ bfd_get_status(ofport->bfd, smap);
+ } else {
+ ret = NO_STATUS_CHANGE;
+ }
} else {
- return ENOENT;
+ ret = ENOENT;
}
+
+ return ret;
}
\f
/* Spanning Tree. */
ovs_mutex_unlock(&rule->stats_mutex);
}
-bool
-rule_dpif_is_fail_open(const struct rule_dpif *rule)
-{
- return is_fail_open_rule(&rule->up);
-}
-
-bool
-rule_dpif_is_table_miss(const struct rule_dpif *rule)
-{
- return rule_is_table_miss(&rule->up);
-}
-
-bool
-rule_dpif_is_internal(const struct rule_dpif *rule)
-{
- return rule_is_internal(&rule->up);
-}
-
ovs_be64
rule_dpif_get_flow_cookie(const struct rule_dpif *rule)
OVS_REQUIRES(rule->up.mutex)
/* Returns 'rule''s actions. The caller owns a reference on the returned
* actions and must eventually release it (with rule_actions_unref()) to avoid
* a memory leak. */
-struct rule_actions *
+const struct rule_actions *
rule_dpif_get_actions(const struct rule_dpif *rule)
{
return rule_get_actions(&rule->up);
}
-static uint8_t
-rule_dpif_lookup__ (struct ofproto_dpif *ofproto, const struct flow *flow,
- struct flow_wildcards *wc, struct rule_dpif **rule)
+/* Look up 'flow' in table 0 of 'ofproto''s classifier.
+ * If 'wc' is non-null, sets the fields that were relevant as part of
+ * the lookup. Returns the table_id where a match or miss occurred.
+ *
+ * The return value will be zero unless there was a miss and
+ * OFPTC11_TABLE_MISS_CONTINUE is in effect for the sequence of tables
+ * where misses occur.
+ *
+ * The rule is returned in '*rule', which is valid at least until the next
+ * RCU quiescent period. If the '*rule' needs to stay around longer,
+ * a non-zero 'take_ref' must be passed in to cause a reference to be taken
+ * on it before this returns. */
+uint8_t
+rule_dpif_lookup(struct ofproto_dpif *ofproto, struct flow *flow,
+ struct flow_wildcards *wc, struct rule_dpif **rule,
+ bool take_ref)
{
enum rule_dpif_lookup_verdict verdict;
enum ofputil_port_config config = 0;
- uint8_t table_id = TBL_INTERNAL;
+ uint8_t table_id;
+
+ if (ofproto_dpif_get_enable_recirc(ofproto)) {
+ /* Always exactly match recirc_id since datapath supports
+ * recirculation. */
+ if (wc) {
+ wc->masks.recirc_id = UINT32_MAX;
+ }
+
+ /* Start looking up from internal table for post recirculation flows
+ * or packets. We can also simply send all, including normal flows
+ * or packets to the internal table. They will not match any post
+     * recirculation rules except the 'catch all' rule that resubmits
+     * them to table 0.
+ *
+ * As an optimization, we send normal flows and packets to table 0
+ * directly, saving one table lookup. */
+ table_id = flow->recirc_id ? TBL_INTERNAL : 0;
+ } else {
+ table_id = 0;
+ }
verdict = rule_dpif_lookup_from_table(ofproto, flow, wc, true,
- &table_id, rule);
+ &table_id, rule, take_ref);
switch (verdict) {
case RULE_DPIF_LOOKUP_VERDICT_MATCH:
}
choose_miss_rule(config, ofproto->miss_rule,
- ofproto->no_packet_in_rule, rule);
+ ofproto->no_packet_in_rule, rule, take_ref);
return table_id;
}
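The table-selection logic in rule_dpif_lookup() above reduces to a small decision: with recirculation enabled, packets carrying a nonzero recirc_id start in the internal table, and everything else starts in table 0. A sketch of just that decision (TOY_TBL_INTERNAL is an illustrative stand-in for TBL_INTERNAL, whose actual value lives elsewhere in ofproto):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the hidden internal table id. */
enum { TOY_TBL_INTERNAL = 254 };

/* Mirrors the decision described above: post-recirculation packets
 * (nonzero recirc_id) start in the internal table; normal traffic, and
 * all traffic on datapaths without recirculation support, starts in
 * table 0, saving one table lookup. */
static int
toy_start_table(unsigned int recirc_id, bool enable_recirc)
{
    if (!enable_recirc) {
        return 0;
    }
    return recirc_id ? TOY_TBL_INTERNAL : 0;
}
```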
-/* Lookup 'flow' in table 0 of 'ofproto''s classifier.
- * If 'wc' is non-null, sets the fields that were relevant as part of
- * the lookup. Returns the table_id where a match or miss occurred.
- *
- * The return value will be zero unless there was a miss and
- * OFPTC11_TABLE_MISS_CONTINUE is in effect for the sequence of tables
- * where misses occur. */
-uint8_t
-rule_dpif_lookup(struct ofproto_dpif *ofproto, struct flow *flow,
- struct flow_wildcards *wc, struct rule_dpif **rule)
-{
- /* Set metadata to the value of recirc_id to speed up internal
- * rule lookup. */
- flow->metadata = htonll(flow->recirc_id);
- return rule_dpif_lookup__(ofproto, flow, wc, rule);
-}
-
+/* The returned rule is valid at least until the next RCU quiescent period.
+ * If the '*rule' needs to stay around longer, a non-zero 'take_ref' must be
+ * passed in to cause a reference to be taken on it before this returns. */
static struct rule_dpif *
rule_dpif_lookup_in_table(struct ofproto_dpif *ofproto, uint8_t table_id,
- const struct flow *flow, struct flow_wildcards *wc)
+ const struct flow *flow, struct flow_wildcards *wc,
+ bool take_ref)
{
struct classifier *cls = &ofproto->up.tables[table_id].cls;
const struct cls_rule *cls_rule;
}
rule = rule_dpif_cast(rule_from_cls_rule(cls_rule));
- rule_dpif_ref(rule);
+ if (take_ref) {
+ rule_dpif_ref(rule);
+ }
fat_rwlock_unlock(&cls->rwlock);
return rule;
 *    - RULE_DPIF_LOOKUP_VERDICT_CONTROLLER if no rule was found and either:
* + 'honor_table_miss' is false
* + a table miss configuration specified that the packet should be
+ * sent to the controller in this case.
*
* - RULE_DPIF_LOOKUP_VERDICT_DROP if no rule was found, 'honor_table_miss'
* is true and a table miss configuration specified that the packet
*
* - RULE_DPIF_LOOKUP_VERDICT_DEFAULT if no rule was found,
* 'honor_table_miss' is true and a table miss configuration has
- * not been specified in this case. */
+ * not been specified in this case.
+ *
+ * The rule is returned in '*rule', which is valid at least until the next
+ * RCU quiescent period. If the '*rule' needs to stay around longer,
+ * a non-zero 'take_ref' must be passed in to cause a reference to be taken
+ * on it before this returns. */
enum rule_dpif_lookup_verdict
rule_dpif_lookup_from_table(struct ofproto_dpif *ofproto,
const struct flow *flow,
struct flow_wildcards *wc,
bool honor_table_miss,
- uint8_t *table_id, struct rule_dpif **rule)
+ uint8_t *table_id, struct rule_dpif **rule,
+ bool take_ref)
{
uint8_t next_id;
next_id++, next_id += (next_id == TBL_INTERNAL))
{
*table_id = next_id;
- *rule = rule_dpif_lookup_in_table(ofproto, *table_id, flow, wc);
+ *rule = rule_dpif_lookup_in_table(ofproto, *table_id, flow, wc,
+ take_ref);
if (*rule) {
return RULE_DPIF_LOOKUP_VERDICT_MATCH;
} else if (!honor_table_miss) {
/* Given a port configuration (specified as zero if there's no port), chooses
* which of 'miss_rule' and 'no_packet_in_rule' should be used in case of a
- * flow table miss. */
+ * flow table miss.
+ *
+ * The rule is returned in '*rule', which is valid at least until the next
+ * RCU quiescent period. If the '*rule' needs to stay around longer,
+ * a reference must be taken on it (rule_dpif_ref()).
+ */
void
choose_miss_rule(enum ofputil_port_config config, struct rule_dpif *miss_rule,
- struct rule_dpif *no_packet_in_rule, struct rule_dpif **rule)
+ struct rule_dpif *no_packet_in_rule, struct rule_dpif **rule,
+ bool take_ref)
{
*rule = config & OFPUTIL_PC_NO_PACKET_IN ? no_packet_in_rule : miss_rule;
- rule_dpif_ref(*rule);
-}
-
-void
-rule_dpif_ref(struct rule_dpif *rule)
-{
- if (rule) {
- ofproto_rule_ref(&rule->up);
- }
-}
-
-void
-rule_dpif_unref(struct rule_dpif *rule)
-{
- if (rule) {
- ofproto_rule_unref(&rule->up);
+ if (take_ref) {
+ rule_dpif_ref(*rule);
}
}
static void
trace_format_rule(struct ds *result, int level, const struct rule_dpif *rule)
{
- struct rule_actions *actions;
+ const struct rule_actions *actions;
ovs_be64 cookie;
ds_put_char_multiple(result, '\t', level);
if (ofpacts) {
rule = NULL;
} else {
- rule_dpif_lookup(ofproto, flow, &trace.wc, &rule);
+ rule_dpif_lookup(ofproto, flow, &trace.wc, &rule, false);
trace_format_rule(ds, 0, rule);
if (rule == ofproto->miss_rule) {
xlate_out_uninit(&trace.xout);
}
-
- rule_dpif_unref(rule);
}
/* Store the current ofprotos in 'ofproto_shash'. Returns a sorted list
ofproto_unixctl_dpif_dump_flows, NULL);
}
-
-/* Returns true if 'rule' is an internal rule, false otherwise. */
+/* Returns true if 'table' is the table used for internal rules,
+ * false otherwise. */
bool
-rule_is_internal(const struct rule *rule)
+table_is_internal(uint8_t table_id)
{
- return rule->table_id == TBL_INTERNAL;
+ return table_id == TBL_INTERNAL;
}
\f
/* Linux VLAN device support (e.g. "eth0.10" for VLAN 10.)
}
rule = rule_dpif_lookup_in_table(ofproto, TBL_INTERNAL, &match->flow,
- &match->wc);
+ &match->wc, false);
if (rule) {
- rule_dpif_unref(rule);
*rulep = &rule->up;
} else {
OVS_NOT_REACHED();
#include <stdint.h>
+#include "fail-open.h"
#include "hmapx.h"
#include "odp-util.h"
#include "ofp-util.h"
bool ofproto_dpif_get_enable_recirc(const struct ofproto_dpif *);
uint8_t rule_dpif_lookup(struct ofproto_dpif *, struct flow *,
- struct flow_wildcards *, struct rule_dpif **rule);
+ struct flow_wildcards *, struct rule_dpif **rule,
+ bool take_ref);
enum rule_dpif_lookup_verdict rule_dpif_lookup_from_table(struct ofproto_dpif *,
const struct flow *,
struct flow_wildcards *,
bool force_controller_on_miss,
uint8_t *table_id,
- struct rule_dpif **rule);
+ struct rule_dpif **rule,
+ bool take_ref);
-void rule_dpif_ref(struct rule_dpif *);
-void rule_dpif_unref(struct rule_dpif *);
+static inline void rule_dpif_ref(struct rule_dpif *);
+static inline void rule_dpif_unref(struct rule_dpif *);
void rule_dpif_credit_stats(struct rule_dpif *rule ,
const struct dpif_flow_stats *);
-bool rule_dpif_is_fail_open(const struct rule_dpif *);
-bool rule_dpif_is_table_miss(const struct rule_dpif *);
-bool rule_dpif_is_internal(const struct rule_dpif *);
+static inline bool rule_dpif_is_fail_open(const struct rule_dpif *);
+static inline bool rule_dpif_is_table_miss(const struct rule_dpif *);
+static inline bool rule_dpif_is_internal(const struct rule_dpif *);
+
uint8_t rule_dpif_get_table(const struct rule_dpif *);
-struct rule_actions *rule_dpif_get_actions(const struct rule_dpif *);
+bool table_is_internal(uint8_t table_id);
+
+const struct rule_actions *rule_dpif_get_actions(const struct rule_dpif *);
ovs_be64 rule_dpif_get_flow_cookie(const struct rule_dpif *rule);
void choose_miss_rule(enum ofputil_port_config,
struct rule_dpif *miss_rule,
struct rule_dpif *no_packet_in_rule,
- struct rule_dpif **rule);
+ struct rule_dpif **rule, bool take_ref);
bool group_dpif_lookup(struct ofproto_dpif *ofproto, uint32_t group_id,
struct group_dpif **group);
bool ofproto_dpif_wants_packet_in_on_miss(struct ofproto_dpif *);
int ofproto_dpif_send_packet(const struct ofport_dpif *, struct ofpbuf *);
void ofproto_dpif_flow_mod(struct ofproto_dpif *, struct ofputil_flow_mod *);
+struct rule_dpif *ofproto_dpif_refresh_rule(struct rule_dpif *);
struct ofport_dpif *odp_port_to_ofport(const struct dpif_backer *, odp_port_t);
int ofproto_dpif_delete_internal_flow(struct ofproto_dpif *, struct match *,
int priority);
+/* Number of implemented OpenFlow tables. */
+enum { N_TABLES = 255 };
+enum { TBL_INTERNAL = N_TABLES - 1 }; /* Used for internal hidden rules. */
+BUILD_ASSERT_DECL(N_TABLES >= 2 && N_TABLES <= 255);
+
+\f
+/* struct rule_dpif has struct rule as its first member. */
+#define RULE_CAST(RULE) ((struct rule *)RULE)
+
+static inline void rule_dpif_ref(struct rule_dpif *rule)
+{
+ if (rule) {
+ ofproto_rule_ref(RULE_CAST(rule));
+ }
+}
+
+static inline void rule_dpif_unref(struct rule_dpif *rule)
+{
+ if (rule) {
+ ofproto_rule_unref(RULE_CAST(rule));
+ }
+}
+
+static inline bool rule_dpif_is_fail_open(const struct rule_dpif *rule)
+{
+ return is_fail_open_rule(RULE_CAST(rule));
+}
+
+static inline bool rule_dpif_is_table_miss(const struct rule_dpif *rule)
+{
+ return rule_is_table_miss(RULE_CAST(rule));
+}
+
+/* Returns true if 'rule' is an internal rule, false otherwise. */
+static inline bool rule_dpif_is_internal(const struct rule_dpif *rule)
+{
+ return RULE_CAST(rule)->table_id == TBL_INTERNAL;
+}
+
+#undef RULE_CAST
+
#endif /* ofproto-dpif.h */
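The RULE_CAST macro above is safe only because struct rule is the first member of struct rule_dpif, so a pointer to the derived struct is also a valid pointer to its base. A minimal standalone sketch of this first-member-cast idiom (hypothetical base_rule/derived_rule types, not the real OVS structs):

```c
#include <assert.h>

/* Hypothetical stand-ins for struct rule / struct rule_dpif. */
struct base_rule {
    int ref_count;
};

struct derived_rule {
    struct base_rule up;    /* Must be the FIRST member for the cast. */
    int table_id;
};

/* Mirrors RULE_CAST(): a derived pointer reinterpreted as its base. */
#define BASE_CAST(RULE) ((struct base_rule *) (RULE))

/* NULL-safe ref, like rule_dpif_ref(). */
static void
base_ref(struct base_rule *rule)
{
    if (rule) {
        rule->ref_count++;
    }
}
```

C guarantees that a pointer to a struct, suitably converted, points to its first member, which is what makes the cast (rather than an explicit `&rule->up`) legal here.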
#include "heap.h"
#include "hindex.h"
#include "list.h"
+#include "ofp-actions.h"
#include "ofp-errors.h"
#include "ofp-util.h"
#include "ofproto/ofproto.h"
#include "timeval.h"
struct match;
-struct ofpact;
struct ofputil_flow_mod;
struct bfd_cfg;
struct meter;
struct netdev *netdev;
struct ofputil_phy_port pp;
ofp_port_t ofp_port; /* OpenFlow port number. */
+ uint64_t change_seq;
long long int created; /* Time created, in msec. */
int mtu;
};
/* OpenFlow actions. See struct rule_actions for more thread-safety
* notes. */
- OVSRCU_TYPE(struct rule_actions *) actions;
+ OVSRCU_TYPE(const struct rule_actions *) actions;
/* In owning meter's 'rules' list. An empty list if there is no meter. */
struct list meter_list_node OVS_GUARDED_BY(ofproto_mutex);
void ofproto_rule_ref(struct rule *);
void ofproto_rule_unref(struct rule *);
-static inline struct rule_actions *
-rule_get_actions(const struct rule *rule)
-{
- return ovsrcu_get(struct rule_actions *, &rule->actions);
-}
-
-/* Returns true if 'rule' is an OpenFlow 1.3 "table-miss" rule, false
- * otherwise.
- *
- * ("Table-miss" rules are special because a packet_in generated through one
- * uses OFPR_NO_MATCH as its reason, whereas packet_ins generated by any other
- * rule use OFPR_ACTION.) */
-static inline bool
-rule_is_table_miss(const struct rule *rule)
-{
- return rule->cr.priority == 0 && cls_rule_is_catchall(&rule->cr);
-}
-bool rule_is_internal(const struct rule *);
+static inline const struct rule_actions *rule_get_actions(const struct rule *);
+static inline bool rule_is_table_miss(const struct rule *);
/* A set of actions within a "struct rule".
*
* Thread-safety
* =============
*
- * A struct rule_actions 'actions' may be accessed without a risk of being
+ * A struct rule_actions may be accessed without a risk of being
* freed by code that holds a read-lock or write-lock on 'rule->mutex' (where
- * 'rule' is the rule for which 'rule->actions == actions') or that owns a
- * reference to 'actions->ref_count' (or both). */
+ * 'rule' is the rule for which 'rule->actions == actions') or during the RCU
+ * active period. */
struct rule_actions {
/* These members are immutable: they do not change during the struct's
* lifetime. */
- struct ofpact *ofpacts; /* Sequence of "struct ofpacts". */
- unsigned int ofpacts_len; /* Size of 'ofpacts', in bytes. */
- uint32_t provider_meter_id; /* Datapath meter_id, or UINT32_MAX. */
+ uint32_t ofpacts_len; /* Size of 'ofpacts', in bytes. */
+ uint32_t provider_meter_id; /* Datapath meter_id, or UINT32_MAX. */
+ struct ofpact ofpacts[]; /* Sequence of "struct ofpacts". */
};
+BUILD_ASSERT_DECL(offsetof(struct rule_actions, ofpacts) % OFPACT_ALIGNTO == 0);
-struct rule_actions *rule_actions_create(const struct ofproto *,
- const struct ofpact *, size_t);
-void rule_actions_destroy(struct rule_actions *);
+const struct rule_actions *rule_actions_create(const struct ofproto *,
+ const struct ofpact *, size_t);
+void rule_actions_destroy(const struct rule_actions *);
/* A set of rules to which an OpenFlow operation applies. */
struct rule_collection {
* ofproto-dpif implementation. */
extern size_t n_handlers, n_revalidators;
-static inline struct rule *
-rule_from_cls_rule(const struct cls_rule *cls_rule)
-{
- return cls_rule ? CONTAINER_OF(cls_rule, struct rule, cr) : NULL;
-}
+static inline struct rule *rule_from_cls_rule(const struct cls_rule *);
void ofproto_rule_expire(struct rule *rule, uint8_t reason)
OVS_REQUIRES(ofproto_mutex);
* support CFM, as does a null pointer. */
int (*set_cfm)(struct ofport *ofport, const struct cfm_settings *s);
- /* Checks the status of CFM configured on 'ofport'. Returns true if the
- * port's CFM status was successfully stored into '*status'. Returns false
- * if the port did not have CFM configured, in which case '*status' is
- * indeterminate.
+ /* Checks the status of CFM configured on 'ofport'. Returns 0 if the
+ * port's CFM status was successfully stored into '*status'. Returns
+ * negative number if there is no status change since last update.
+ * Returns positive errno otherwise.
+ *
+ * EOPNOTSUPP as a return value indicates that this ofproto_class does not
+ * support CFM, as does a null pointer.
*
- * The caller must provide and owns '*status', but it does not own and must
- * not modify or free the array returned in 'status->rmps'. */
- bool (*get_cfm_status)(const struct ofport *ofport,
- struct ofproto_cfm_status *status);
+ * The caller must provide and own '*status', and it must free the array
+ * returned in 'status->rmps'. '*status' is indeterminate if the return
+ * value is non-zero. */
+ int (*get_cfm_status)(const struct ofport *ofport,
+ struct ofproto_cfm_status *status);
/* Configures BFD on 'ofport'.
*
int (*set_bfd)(struct ofport *ofport, const struct smap *cfg);
/* Populates 'smap' with the status of BFD on 'ofport'. Returns 0 on
- * success, or a positive errno. EOPNOTSUPP as a return value indicates
- * that this ofproto_class does not support BFD, as does a null pointer. */
+ * success. Returns a negative number if there is no status change since
+ * last update. Returns a positive errno otherwise.
+ *
+ * EOPNOTSUPP as a return value indicates that this ofproto_class does not
+ * support BFD, as does a null pointer. */
int (*get_bfd_status)(struct ofport *ofport, struct smap *smap);
/* Configures spanning tree protocol (STP) on 'ofproto' using the
int ofproto_flow_mod(struct ofproto *, struct ofputil_flow_mod *)
OVS_EXCLUDED(ofproto_mutex);
+struct rule *ofproto_refresh_rule(struct rule *rule)
+ OVS_EXCLUDED(ofproto_mutex);
void ofproto_add_flow(struct ofproto *, const struct match *,
unsigned int priority,
const struct ofpact *ofpacts, size_t ofpacts_len)
OVS_EXCLUDED(ofproto_mutex);
void ofproto_flush_flows(struct ofproto *);
+\f
+static inline const struct rule_actions *
+rule_get_actions(const struct rule *rule)
+{
+ return ovsrcu_get(const struct rule_actions *, &rule->actions);
+}
+
+/* Returns true if 'rule' is an OpenFlow 1.3 "table-miss" rule, false
+ * otherwise.
+ *
+ * ("Table-miss" rules are special because a packet_in generated through one
+ * uses OFPR_NO_MATCH as its reason, whereas packet_ins generated by any other
+ * rule use OFPR_ACTION.) */
+static inline bool
+rule_is_table_miss(const struct rule *rule)
+{
+ return rule->cr.priority == 0 && cls_rule_is_catchall(&rule->cr);
+}
+
+static inline struct rule *
+rule_from_cls_rule(const struct cls_rule *cls_rule)
+{
+ return cls_rule ? CONTAINER_OF(cls_rule, struct rule, cr) : NULL;
+}
+
#endif /* ofproto/ofproto-provider.h */
#include "unaligned.h"
#include "unixctl.h"
#include "vlog.h"
+#include "bundles.h"
VLOG_DEFINE_THIS_MODULE(ofproto);
/* OFOPERATION_MODIFY, OFOPERATION_REPLACE: The old actions, if the actions
* are changing. */
- struct rule_actions *actions;
+ const struct rule_actions *actions;
/* OFOPERATION_DELETE. */
enum ofp_flow_removed_reason reason; /* Reason flow was removed. */
};
/* rule. */
-static void ofproto_rule_destroy__(struct rule *);
static void ofproto_rule_send_removed(struct rule *, uint8_t reason);
static bool rule_is_modifiable(const struct rule *rule,
enum ofputil_flow_mod_flags flag);
static enum ofperr add_flow(struct ofproto *, struct ofconn *,
struct ofputil_flow_mod *,
const struct ofp_header *);
+static void do_add_flow(struct ofproto *, struct ofconn *,
+ const struct ofp_header *request, uint32_t buffer_id,
+ struct rule *);
static enum ofperr modify_flows__(struct ofproto *, struct ofconn *,
struct ofputil_flow_mod *,
const struct ofp_header *,
}
}
-/* Populates 'status' with key value pairs indicating the status of the BFD
- * session on 'ofp_port'. This information is intended to be populated in the
- * OVS database. Has no effect if 'ofp_port' is not na OpenFlow port in
- * 'ofproto'. */
+/* Populates 'status' with the status of BFD on 'ofport'. Returns 0 on
+ * success. Returns a negative number if there is no status change since
+ * last update. Returns a positive errno otherwise. Has no effect if
+ * 'ofp_port' is not an OpenFlow port in 'ofproto'.
+ *
+ * The caller must provide and own '*status'. */
int
ofproto_port_get_bfd_status(struct ofproto *ofproto, ofp_port_t ofp_port,
struct smap *status)
ovs_assert(list_is_empty(&ofproto->pending));
destroy_rule_executes(ofproto);
- guarded_list_destroy(&ofproto->rule_executes);
-
delete_group(ofproto, OFPG_ALL);
+
+ guarded_list_destroy(&ofproto->rule_executes);
ovs_rwlock_destroy(&ofproto->groups_rwlock);
hmap_destroy(&ofproto->groups);
}
p->ofproto_class->destruct(p);
- ofproto_destroy__(p);
+ /* Destroying rules is deferred, must have 'ofproto' around for them. */
+ ovsrcu_postpone(ofproto_destroy__, p);
}
/* Destroys the datapath with the respective 'name' and 'type'. With the Linux
* need this two-phase approach. */
sset_init(&devnames);
HMAP_FOR_EACH (ofport, hmap_node, &p->ports) {
- sset_add(&devnames, netdev_get_name(ofport->netdev));
+ uint64_t port_change_seq;
+
+ port_change_seq = netdev_get_change_seq(ofport->netdev);
+ if (ofport->change_seq != port_change_seq) {
+ ofport->change_seq = port_change_seq;
+ sset_add(&devnames, netdev_get_name(ofport->netdev));
+ }
}
SSET_FOR_EACH (devname, &devnames) {
update_port(p, devname);
rule = rule_from_cls_rule(classifier_find_match_exactly(
&ofproto->tables[0].cls, match, priority));
if (rule) {
- struct rule_actions *actions = rule_get_actions(rule);
+ const struct rule_actions *actions = rule_get_actions(rule);
must_add = !ofpacts_equal(actions->ofpacts, actions->ofpacts_len,
ofpacts, ofpacts_len);
} else {
if (fm->command == OFPFC_MODIFY_STRICT && fm->table_id != OFPTT_ALL
&& !(fm->flags & OFPUTIL_FF_RESET_COUNTS)) {
struct oftable *table = &ofproto->tables[fm->table_id];
- struct cls_rule match_rule;
struct rule *rule;
bool done = false;
- cls_rule_init(&match_rule, &fm->match, fm->priority);
fat_rwlock_rdlock(&table->cls.rwlock);
- rule = rule_from_cls_rule(classifier_find_rule_exactly(&table->cls,
- &match_rule));
+ rule = rule_from_cls_rule(classifier_find_match_exactly(&table->cls,
+ &fm->match,
+ fm->priority));
if (rule) {
/* Reading many of the rule fields and writing on 'modified'
* requires the rule->mutex. Also, rule->actions may change
return handle_flow_mod__(ofproto, NULL, fm, NULL);
}
+/* Resets the modified time for 'rule' or an equivalent rule. If 'rule' is not
+ * in the classifier, but an equivalent rule is, unref 'rule' and ref the new
+ * rule. Otherwise if 'rule' is no longer installed in the classifier,
+ * reinstall it.
+ *
+ * Returns the rule whose modified time has been reset. */
+struct rule *
+ofproto_refresh_rule(struct rule *rule)
+{
+ const struct oftable *table = &rule->ofproto->tables[rule->table_id];
+ const struct cls_rule *cr = &rule->cr;
+ struct rule *r;
+
+ /* do_add_flow() requires that the rule is not installed. We lock the
+ * ofproto_mutex here so that another thread cannot add the flow before
+ * we get a chance to add it. */
+ ovs_mutex_lock(&ofproto_mutex);
+
+ fat_rwlock_rdlock(&table->cls.rwlock);
+ r = rule_from_cls_rule(classifier_find_rule_exactly(&table->cls, cr));
+ if (r != rule) {
+ ofproto_rule_ref(r);
+ }
+ fat_rwlock_unlock(&table->cls.rwlock);
+
+ if (!r) {
+ do_add_flow(rule->ofproto, NULL, NULL, 0, rule);
+ } else if (r != rule) {
+ ofproto_rule_unref(rule);
+ rule = r;
+ }
+ ovs_mutex_unlock(&ofproto_mutex);
+
+ /* Refresh the modified time for the rule. */
+ ovs_mutex_lock(&rule->mutex);
+ rule->modified = MAX(rule->modified, time_msec());
+ ovs_mutex_unlock(&rule->mutex);
+
+ return rule;
+}
+
/* Searches for a rule with matching criteria exactly equal to 'target' in
* ofproto's table 0 and, if it finds one, deletes it.
*
}
ofport->ofproto = p;
ofport->netdev = netdev;
+ ofport->change_seq = netdev_get_change_seq(netdev);
ofport->pp = *pp;
ofport->ofp_port = pp->port_no;
ofport->created = time_msec();
* Don't close the old netdev yet in case port_modified has to
* remove a retained reference to it.*/
port->netdev = netdev;
+ port->change_seq = netdev_get_change_seq(netdev);
if (port->ofproto->ofproto_class->port_modified) {
port->ofproto->ofproto_class->port_modified(port);
}
}
\f
+static void
+ofproto_rule_destroy__(struct rule *rule)
+ OVS_NO_THREAD_SAFETY_ANALYSIS
+{
+ cls_rule_destroy(CONST_CAST(struct cls_rule *, &rule->cr));
+ rule_actions_destroy(rule_get_actions(rule));
+ ovs_mutex_destroy(&rule->mutex);
+ rule->ofproto->ofproto_class->rule_dealloc(rule);
+}
+
+static void
+rule_destroy_cb(struct rule *rule)
+{
+ rule->ofproto->ofproto_class->rule_destruct(rule);
+ ofproto_rule_destroy__(rule);
+}
+
void
ofproto_rule_ref(struct rule *rule)
{
}
}
+/* Decrements 'rule''s ref_count and schedules 'rule' to be destroyed if the
+ * ref_count reaches 0.
+ *
+ * Use of RCU allows short-term use (between RCU quiescent periods) without
+ * keeping a reference. A reference must be taken if the rule needs to
+ * stay around across the RCU quiescent periods. */
void
ofproto_rule_unref(struct rule *rule)
{
if (rule && ovs_refcount_unref(&rule->ref_count) == 1) {
- rule->ofproto->ofproto_class->rule_destruct(rule);
- ofproto_rule_destroy__(rule);
+ ovsrcu_postpone(rule_destroy_cb, rule);
}
}
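The unref path above no longer destroys the rule inline: the last unref schedules destruction through ovsrcu_postpone(), so readers still inside the current RCU grace period never see a freed rule. A toy single-threaded sketch of this defer-until-quiescent idea (hypothetical toy_rule type and a single deferred slot; real RCU tracks per-thread quiescent states):

```c
#include <assert.h>
#include <stddef.h>

typedef void (*destroy_cb)(void *);

/* One deferred destruction slot, standing in for ovsrcu_postpone()'s
 * callback queue. */
static destroy_cb deferred_cb;
static void *deferred_arg;

struct toy_rule {
    int ref_count;
    int destroyed;
};

static void
toy_rule_destroy(void *r_)
{
    struct toy_rule *r = r_;
    r->destroyed = 1;
}

/* NULL-safe unref: the last unref only *queues* destruction. */
static void
toy_rule_unref(struct toy_rule *r)
{
    if (r && --r->ref_count == 0) {
        deferred_cb = toy_rule_destroy;
        deferred_arg = r;
    }
}

/* Runs once all "readers" have quiesced, like the end of an RCU
 * grace period. */
static void
quiescent_flush(void)
{
    if (deferred_cb) {
        deferred_cb(deferred_arg);
        deferred_cb = NULL;
    }
}
```

This is why lookups may return a rule without taking a reference: the rule cannot be destroyed before the next quiescent period, so short-lived users are safe without one.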
-static void
-ofproto_rule_destroy__(struct rule *rule)
- OVS_NO_THREAD_SAFETY_ANALYSIS
-{
- cls_rule_destroy(CONST_CAST(struct cls_rule *, &rule->cr));
- rule_actions_destroy(rule_get_actions(rule));
- ovs_mutex_destroy(&rule->mutex);
- rule->ofproto->ofproto_class->rule_dealloc(rule);
-}
-
static uint32_t get_provider_meter_id(const struct ofproto *,
uint32_t of_meter_id);
-/* Creates and returns a new 'struct rule_actions', with a ref_count of 1,
- * whose actions are a copy of from the 'ofpacts_len' bytes of 'ofpacts'. */
-struct rule_actions *
+/* Creates and returns a new 'struct rule_actions', whose actions are a copy
+ * of the 'ofpacts_len' bytes of 'ofpacts'. */
+const struct rule_actions *
rule_actions_create(const struct ofproto *ofproto,
const struct ofpact *ofpacts, size_t ofpacts_len)
{
struct rule_actions *actions;
- actions = xmalloc(sizeof *actions);
- actions->ofpacts = xmemdup(ofpacts, ofpacts_len);
+ actions = xmalloc(sizeof *actions + ofpacts_len);
actions->ofpacts_len = ofpacts_len;
actions->provider_meter_id
= get_provider_meter_id(ofproto,
ofpacts_get_meter(ofpacts, ofpacts_len));
+ memcpy(actions->ofpacts, ofpacts, ofpacts_len);
return actions;
}
-static void
-rule_actions_destroy_cb(struct rule_actions *actions)
-{
- free(actions->ofpacts);
- free(actions);
-}
-
-/* Decrements 'actions''s ref_count and frees 'actions' if the ref_count
- * reaches 0. */
+/* Free the actions after the RCU quiescent period is reached. */
void
-rule_actions_destroy(struct rule_actions *actions)
+rule_actions_destroy(const struct rule_actions *actions)
{
if (actions) {
- ovsrcu_postpone(rule_actions_destroy_cb, actions);
+ ovsrcu_postpone(free, CONST_CAST(struct rule_actions *, actions));
}
}
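rule_actions_create() now inlines the action buffer after the struct with a flexible array member, so a single xmalloc()/free() pair covers both header and payload where the old layout needed a separate xmemdup() for 'ofpacts'. A minimal sketch of the same layout (hypothetical inline_actions type, plain malloc instead of xmalloc):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the new struct rule_actions layout: the
 * payload is a flexible array member at the end of the struct. */
struct inline_actions {
    unsigned int len;        /* Size of 'data', in bytes. */
    unsigned char data[];    /* Inlined payload; must be last. */
};

/* One allocation sized for header plus payload, mirroring
 * xmalloc(sizeof *actions + ofpacts_len) followed by memcpy(). */
static struct inline_actions *
inline_actions_create(const void *src, unsigned int len)
{
    struct inline_actions *a = malloc(sizeof *a + len);
    if (a) {
        a->len = len;
        memcpy(a->data, src, len);
    }
    return a;
}
```

The single-allocation layout is also what lets rule_actions_destroy() collapse to a plain ovsrcu_postpone(free, ...) above, since there is no longer a separate 'ofpacts' buffer to free first.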
long long int now = time_msec();
struct ofputil_flow_stats fs;
long long int created, used, modified;
- struct rule_actions *actions;
+ const struct rule_actions *actions;
enum ofputil_flow_mod_flags flags;
ovs_mutex_lock(&rule->mutex);
flow_stats_ds(struct rule *rule, struct ds *results)
{
uint64_t packet_count, byte_count;
- struct rule_actions *actions;
+ const struct rule_actions *actions;
long long int created, used;
rule->ofproto->ofproto_class->rule_get_stats(rule, &packet_count,
ofproto->ofproto_class->get_netflow_ids(ofproto, engine_type, engine_id);
}
-/* Checks the status of CFM configured on 'ofp_port' within 'ofproto'. Returns
- * true if the port's CFM status was successfully stored into '*status'.
- * Returns false if the port did not have CFM configured, in which case
- * '*status' is indeterminate.
+/* Checks the status of CFM configured on 'ofp_port' within 'ofproto'.
+ * Returns 0 if the port's CFM status was successfully stored into
+ * '*status'. Returns positive errno if the port did not have CFM
+ * configured. Returns negative number if there is no status change
+ * since last update.
*
- * The caller must provide and owns '*status', and must free 'status->rmps'. */
-bool
+ * The caller must provide and own '*status', and must free 'status->rmps'.
+ * '*status' is indeterminate if the return value is non-zero. */
+int
ofproto_port_get_cfm_status(const struct ofproto *ofproto, ofp_port_t ofp_port,
struct ofproto_cfm_status *status)
{
struct ofport *ofport = ofproto_get_port(ofproto, ofp_port);
- return (ofport
- && ofproto->ofproto_class->get_cfm_status
- && ofproto->ofproto_class->get_cfm_status(ofport, status));
+ return (ofport && ofproto->ofproto_class->get_cfm_status
+ ? ofproto->ofproto_class->get_cfm_status(ofport, status)
+ : EOPNOTSUPP);
}
static enum ofperr
HMAP_FOR_EACH_WITH_HASH (op, hmap_node,
cls_rule_hash(cls_rule, table_id),
&ofproto->deletions) {
- if (cls_rule_equal(cls_rule, &op->rule->cr)) {
+ if (op->rule->table_id == table_id
+ && cls_rule_equal(cls_rule, &op->rule->cr)) {
return true;
}
}
OVS_REQUIRES(ofproto_mutex)
{
struct oftable *table;
- struct ofopgroup *group;
struct cls_rule cr;
struct rule *rule;
uint8_t table_id;
}
/* Insert rule. */
+ do_add_flow(ofproto, ofconn, request, fm->buffer_id, rule);
+
+ return error;
+}
+
+static void
+do_add_flow(struct ofproto *ofproto, struct ofconn *ofconn,
+ const struct ofp_header *request, uint32_t buffer_id,
+ struct rule *rule)
+ OVS_REQUIRES(ofproto_mutex)
+{
+ struct ofopgroup *group;
+
oftable_insert_rule(rule);
- group = ofopgroup_create(ofproto, ofconn, request, fm->buffer_id);
+ group = ofopgroup_create(ofproto, ofconn, request, buffer_id);
ofoperation_create(group, rule, OFOPERATION_ADD, 0);
ofproto->ofproto_class->rule_insert(rule);
ofopgroup_submit(group);
-
- return error;
}
\f
/* OFPFC_MODIFY and OFPFC_MODIFY_STRICT. */
reset_counters = (fm->flags & OFPUTIL_FF_RESET_COUNTS) != 0;
if (actions_changed || reset_counters) {
- struct rule_actions *new_actions;
+ const struct rule_actions *new_actions;
op->actions = rule_get_actions(rule);
new_actions = rule_actions_create(ofproto,
return table_mod(ofproto, &tm);
}
+static enum ofperr
+handle_bundle_control(struct ofconn *ofconn, const struct ofp_header *oh)
+{
+ enum ofperr error;
+ struct ofputil_bundle_ctrl_msg bctrl;
+ struct ofpbuf *buf;
+ struct ofputil_bundle_ctrl_msg reply;
+
+ error = ofputil_decode_bundle_ctrl(oh, &bctrl);
+ if (error) {
+ return error;
+ }
+ reply.flags = 0;
+ reply.bundle_id = bctrl.bundle_id;
+
+ switch (bctrl.type) {
+ case OFPBCT_OPEN_REQUEST:
+ error = ofp_bundle_open(ofconn, bctrl.bundle_id, bctrl.flags);
+ reply.type = OFPBCT_OPEN_REPLY;
+ break;
+ case OFPBCT_CLOSE_REQUEST:
+ error = ofp_bundle_close(ofconn, bctrl.bundle_id, bctrl.flags);
+        reply.type = OFPBCT_CLOSE_REPLY;
+ break;
+ case OFPBCT_COMMIT_REQUEST:
+ error = ofp_bundle_commit(ofconn, bctrl.bundle_id, bctrl.flags);
+ reply.type = OFPBCT_COMMIT_REPLY;
+ break;
+ case OFPBCT_DISCARD_REQUEST:
+ error = ofp_bundle_discard(ofconn, bctrl.bundle_id);
+ reply.type = OFPBCT_DISCARD_REPLY;
+ break;
+
+ case OFPBCT_OPEN_REPLY:
+ case OFPBCT_CLOSE_REPLY:
+ case OFPBCT_COMMIT_REPLY:
+ case OFPBCT_DISCARD_REPLY:
+ return OFPERR_OFPBFC_BAD_TYPE;
+ }
+
+ if (!error) {
+ buf = ofputil_encode_bundle_ctrl_reply(oh, &reply);
+ ofconn_send_reply(ofconn, buf);
+ }
+ return error;
+}
+
+static enum ofperr
+handle_bundle_add(struct ofconn *ofconn, const struct ofp_header *oh)
+{
+ enum ofperr error;
+ struct ofputil_bundle_add_msg badd;
+
+ error = ofputil_decode_bundle_add(oh, &badd);
+ if (error) {
+ return error;
+ }
+
+ return ofp_bundle_add_message(ofconn, &badd);
+}
+
static enum ofperr
handle_openflow__(struct ofconn *ofconn, const struct ofpbuf *msg)
OVS_EXCLUDED(ofproto_mutex)
case OFPTYPE_QUEUE_GET_CONFIG_REQUEST:
return handle_queue_get_config_request(ofconn, oh);
+ case OFPTYPE_BUNDLE_CONTROL:
+ return handle_bundle_control(ofconn, oh);
+
+ case OFPTYPE_BUNDLE_ADD_MESSAGE:
+ return handle_bundle_add(ofconn, oh);
+
case OFPTYPE_HELLO:
case OFPTYPE_ERROR:
case OFPTYPE_FEATURES_REPLY:
rule->hard_timeout = op->hard_timeout;
ovs_mutex_unlock(&rule->mutex);
if (op->actions) {
- struct rule_actions *old_actions;
+ const struct rule_actions *old_actions;
ovs_mutex_lock(&rule->mutex);
old_actions = rule_get_actions(rule);
{
struct ofproto *ofproto = rule->ofproto;
struct oftable *table = &ofproto->tables[rule->table_id];
- struct rule_actions *actions;
+ const struct rule_actions *actions;
bool may_expire;
ovs_mutex_lock(&rule->mutex);
void
ofproto_get_vlan_usage(struct ofproto *ofproto, unsigned long int *vlan_bitmap)
{
+ struct match match;
+ struct cls_rule target;
const struct oftable *oftable;
+ match_init_catchall(&match);
+ match_set_vlan_vid_masked(&match, htons(VLAN_CFI), htons(VLAN_CFI));
+ cls_rule_init(&target, &match, 0);
+
free(ofproto->vlan_bitmap);
ofproto->vlan_bitmap = bitmap_allocate(4096);
ofproto->vlans_changed = false;
OFPROTO_FOR_EACH_TABLE (oftable, ofproto) {
- const struct cls_subtable *table;
+ struct cls_cursor cursor;
+ struct rule *rule;
fat_rwlock_rdlock(&oftable->cls.rwlock);
- HMAP_FOR_EACH (table, hmap_node, &oftable->cls.subtables) {
- if (minimask_get_vid_mask(&table->mask) == VLAN_VID_MASK) {
- const struct cls_rule *rule;
-
- HMAP_FOR_EACH (rule, hmap_node, &table->rules) {
- uint16_t vid = miniflow_get_vid(&rule->match.flow);
- bitmap_set1(vlan_bitmap, vid);
- bitmap_set1(ofproto->vlan_bitmap, vid);
- }
+ cls_cursor_init(&cursor, &oftable->cls, &target);
+ CLS_CURSOR_FOR_EACH (rule, cr, &cursor) {
+ if (minimask_get_vid_mask(&rule->cr.match.mask) == VLAN_VID_MASK) {
+ uint16_t vid = miniflow_get_vid(&rule->cr.match.flow);
+
+ bitmap_set1(vlan_bitmap, vid);
+ bitmap_set1(ofproto->vlan_bitmap, vid);
}
}
fat_rwlock_unlock(&oftable->cls.rwlock);
size_t n_rmps;
};
-bool ofproto_port_get_cfm_status(const struct ofproto *,
- ofp_port_t ofp_port,
- struct ofproto_cfm_status *);
+int ofproto_port_get_cfm_status(const struct ofproto *,
+ ofp_port_t ofp_port,
+ struct ofproto_cfm_status *);
\f
/* Linux VLAN device support (e.g. "eth0.10" for VLAN 10.)
*
struct flow_wildcards *wc)
{
if (is_ip_any(base_flow)) {
- wc->masks.nw_tos |= IP_ECN_MASK;
if ((flow->tunnel.ip_tos & IP_ECN_MASK) == IP_ECN_CE) {
+ wc->masks.nw_tos |= IP_ECN_MASK;
if ((base_flow->nw_tos & IP_ECN_MASK) == IP_ECN_NOT_ECT) {
VLOG_WARN_RL(&rl, "dropping tunnel packet marked ECN CE"
" but is not ECN capable");
tnl_xlate_init(const struct flow *base_flow, struct flow *flow,
struct flow_wildcards *wc)
{
+ /* tnl_port_should_receive() examines the 'tunnel.ip_dst' field to
+ * determine the presence of the tunnel metadata. However, since tunnels'
+ * datapath port numbers are different from the non-tunnel ports, and we
+ * always unwildcard the 'in_port', we do not need to unwildcard
+ * the 'tunnel.ip_dst' for non-tunneled packets. */
if (tnl_port_should_receive(flow)) {
wc->masks.tunnel.tun_id = OVS_BE64_MAX;
wc->masks.tunnel.ip_src = OVS_BE32_MAX;
square brackets, e.g.: \fBtcp:[::1]:6632\fR.
.
.IP "\fBunix:\fIfile\fR"
-Connect to the Unix domain server socket named \fIfile\fR.
+On POSIX, connect to the Unix domain server socket named \fIfile\fR.
+.IP
+On Windows, connect to a localhost TCP port whose value is written in
+\fIfile\fR.
\fBptcp:6632:[::1]\fR.
.
.IP "\fBpunix:\fIfile\fR"
-Listen on the Unix domain server socket named \fIfile\fR for a
+On POSIX, listen on the Unix domain server socket named \fIfile\fR for a
connection.
+.IP
+On Windows, listen on a kernel-chosen TCP port on localhost. The chosen
+TCP port value is written in \fIfile\fR.
import datetime
import logging
import logging.handlers
+import os
import re
import socket
import sys
+import threading
import ovs.dirs
import ovs.unixctl
import ovs.util
FACILITIES = {"console": "info", "file": "info", "syslog": "info"}
+PATTERNS = {
+ "console": "%D{%Y-%m-%dT%H:%M:%SZ}|%05N|%c%T|%p|%m",
+ "file": "%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m",
+ "syslog": "ovs|%05N|%c%T|%p|%m",
+}
LEVELS = {
"dbg": logging.DEBUG,
"info": logging.INFO,
class Vlog:
__inited = False
__msg_num = 0
+ __start_time = 0
__mfl = {} # Module -> facility -> level
__log_file = None
__file_handler = None
+ __log_patterns = PATTERNS
def __init__(self, name):
"""Creates a new Vlog object representing a module called 'name'. The
if not Vlog.__inited:
return
- dt = datetime.datetime.utcnow();
- now = dt.strftime("%Y-%m-%dT%H:%M:%S.%%03iZ") % (dt.microsecond/1000)
- syslog_message = ("%s|%s|%s|%s"
- % (Vlog.__msg_num, self.name, level, message))
-
- level = LEVELS.get(level.lower(), logging.DEBUG)
+ level_num = LEVELS.get(level.lower(), logging.DEBUG)
+ msg_num = Vlog.__msg_num
Vlog.__msg_num += 1
for f, f_level in Vlog.__mfl[self.name].iteritems():
f_level = LEVELS.get(f_level, logging.CRITICAL)
- if level >= f_level:
- if f == "syslog":
- message = "ovs|" + syslog_message
+ if level_num >= f_level:
+ msg = self._build_message(message, f, level, msg_num)
+ logging.getLogger(f).log(level_num, msg, **kwargs)
+
+ def _build_message(self, message, facility, level, msg_num):
+ pattern = self.__log_patterns[facility]
+ tmp = pattern
+
+ tmp = self._format_time(tmp)
+
+ matches = re.findall("(%-?[0]?[0-9]?[AcmNnpPrtT])", tmp)
+ for m in matches:
+ if "A" in m:
+ tmp = self._format_field(tmp, m, ovs.util.PROGRAM_NAME)
+ elif "c" in m:
+ tmp = self._format_field(tmp, m, self.name)
+ elif "m" in m:
+ tmp = self._format_field(tmp, m, message)
+ elif "N" in m:
+ tmp = self._format_field(tmp, m, str(msg_num))
+ elif "n" in m:
+ tmp = re.sub(m, "\n", tmp)
+ elif "p" in m:
+ tmp = self._format_field(tmp, m, level.upper())
+ elif "P" in m:
+                tmp = self._format_field(tmp, m, str(os.getpid()))
+ elif "r" in m:
+ now = datetime.datetime.utcnow()
+ delta = now - self.__start_time
+ ms = delta.microseconds / 1000
+ tmp = self._format_field(tmp, m, str(ms))
+ elif "t" in m:
+ subprogram = threading.current_thread().name
+ if subprogram == "MainThread":
+ subprogram = "main"
+ tmp = self._format_field(tmp, m, subprogram)
+ elif "T" in m:
+ subprogram = threading.current_thread().name
+            if subprogram != "MainThread":
+ subprogram = "({})".format(subprogram)
else:
- message = "%s|%s" % (now, syslog_message)
- logging.getLogger(f).log(level, message, **kwargs)
+ subprogram = ""
+ tmp = self._format_field(tmp, m, subprogram)
+ return tmp.strip()
+
+ def _format_field(self, tmp, match, replace):
+ formatting = re.compile("^%(0)?([1-9])?")
+ matches = formatting.match(match)
+ # Do we need to apply padding?
+ if not matches.group(1) and replace != "":
+ replace = replace.center(len(replace)+2)
+        # Does the field have a minimum width?
+ if matches.group(2):
+ min_width = int(matches.group(2))
+ if len(replace) < min_width:
+ replace = replace.center(min_width)
+ return re.sub(match, replace, tmp)
+
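The padding rules implemented by `_format_field` above can be summarized: a field without the `0` flag is centered with one space of padding on each side, and a single-digit width gives a centered minimum field width. A standalone sketch of just those two rules (illustrative, not the patched code itself):

```python
import re

def pad_field(spec, value):
    # spec is a pattern directive such as "%c", "%0c" or "%5c";
    # mirrors the two padding rules in _format_field above
    m = re.match("^%(0)?([1-9])?", spec)
    if not m.group(1) and value != "":
        # no "0" flag: surround the value with one space on each side
        value = value.center(len(value) + 2)
    if m.group(2):
        # single-digit minimum field width, centered
        width = int(m.group(2))
        if len(value) < width:
            value = value.center(width)
    return value
```

For example, `pad_field("%c", "mod")` yields `" mod "`, while `pad_field("%0c", "mod")` returns `"mod"` unchanged.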
+ def _format_time(self, tmp):
+ date_regex = re.compile('(%(0?[1-9]?[dD])(\{(.*)\})?)')
+ match = date_regex.search(tmp)
+
+ if match is None:
+ return tmp
+
+ # UTC date or Local TZ?
+ if match.group(2) == "d":
+ now = datetime.datetime.now()
+ elif match.group(2) == "D":
+ now = datetime.datetime.utcnow()
+
+ # Custom format or ISO format?
+ if match.group(3):
+ time = datetime.date.strftime(now, match.group(4))
+ try:
+ i = len(re.search("#+", match.group(4)).group(0))
+ msec = '{0:0>{i}.{i}}'.format(str(now.microsecond / 1000), i=i)
+ time = re.sub('#+', msec, time)
+ except AttributeError:
+ pass
+ else:
+ time = datetime.datetime.isoformat(now.replace(microsecond=0))
+
+ return self._format_field(tmp, match.group(1), time)
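A run of `#` in a custom `%D{...}` pattern stands for the milliseconds, zero-padded or truncated to the run's length. A minimal sketch of that substitution under the same scheme as `_format_time` above (the function name is illustrative):

```python
import datetime
import re

def render_time(fmt, now):
    # strftime first, then replace a run of '#' with the millisecond
    # count, padded/truncated to the run's length, as _format_time does
    out = now.strftime(fmt)
    run = re.search("#+", fmt)
    if run:
        i = len(run.group(0))
        msec = '{0:0>{i}.{i}}'.format(str(now.microsecond // 1000), i=i)
        out = re.sub('#+', msec, out)
    return out
```

With `fmt = "%Y-%m-%dT%H:%M:%S.###Z"` this produces timestamps in the shape used by the default `file` pattern above.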
def emer(self, message, **kwargs):
self.__log("EMER", message, **kwargs)
return
Vlog.__inited = True
+ Vlog.__start_time = datetime.datetime.utcnow()
logging.raiseExceptions = False
Vlog.__log_file = log_file
for f in FACILITIES:
for f in facilities:
Vlog.__mfl[m][f] = level
+ @staticmethod
+ def set_pattern(facility, pattern):
+        """Sets the log pattern of 'facility' to 'pattern'."""
+ facility = facility.lower()
+ Vlog.__log_patterns[facility] = pattern
+
@staticmethod
def set_levels_from_string(s):
module = None
level = None
facility = None
- for word in [w.lower() for w in re.split('[ :]', s)]:
+ words = re.split('[ :]', s)
+ if words[0] == "pattern":
+ try:
+ if words[1] in FACILITIES and words[2]:
+                pattern = "".join(words[2:])
+ Vlog.set_pattern(words[1], pattern)
+ return
+ else:
+ return "Facility %s does not exist" % words[1]
+ except IndexError:
+ return "Please supply a valid pattern and facility"
+
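The `pattern` branch added above accepts strings of the form `pattern:FACILITY:PATTERN`; note that because the input is split on both spaces and colons, the pattern segments are rejoined without separators. A standalone sketch of the parsing, with a stub `FACILITIES` set standing in for the module's real one:

```python
import re

FACILITIES = {"console", "file", "syslog"}  # stub for the module's dict

def parse_pattern_command(s):
    # Mirrors the "pattern" branch of set_levels_from_string above.
    words = re.split('[ :]', s)
    if words[0] != "pattern":
        return None
    try:
        if words[1] in FACILITIES and words[2]:
            return words[1], "".join(words[2:])
        return "Facility %s does not exist" % words[1]
    except IndexError:
        return "Please supply a valid pattern and facility"
```

For example, `parse_pattern_command("pattern:file:%c|%p|%m")` yields `("file", "%c|%p|%m")`.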
+ for word in [w.lower() for w in words]:
if word == "any":
pass
elif word in FACILITIES:
def _unixctl_vlog_list(conn, unused_argv, unused_aux):
conn.reply(Vlog.get_levels())
+
def add_args(parser):
"""Adds vlog related options to 'parser', an ArgumentParser object. The
resulting arguments parsed by 'parser' should be passed to handle_args."""
# Spec file for Open vSwitch.
-# Copyright (C) 2009, 2010, 2013 Nicira Networks, Inc.
+# Copyright (C) 2009, 2010, 2013, 2014 Nicira Networks, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright
%doc /usr/share/man/man5/vtep.5.gz
%doc /usr/share/man/man8/ovs-appctl.8.gz
%doc /usr/share/man/man8/ovs-bugtool.8.gz
+%doc /usr/share/man/man8/ovs-ctl.8.gz
%doc /usr/share/man/man8/ovs-dpctl.8.gz
%doc /usr/share/man/man8/ovs-dpctl-top.8.gz
%doc /usr/share/man/man8/ovs-ofctl.8.gz
%exclude /usr/share/man/man1/ovs-benchmark.1.gz
%exclude /usr/share/man/man1/ovs-pcap.1.gz
%exclude /usr/share/man/man1/ovs-tcpundump.1.gz
-%exclude /usr/share/man/man8/ovs-ctl.8.gz
%exclude /usr/share/man/man8/ovs-vlan-bug-workaround.8.gz
%exclude /usr/share/man/man8/ovs-vlan-test.8.gz
%exclude /usr/share/openvswitch/scripts/ovs-save
Source: openvswitch-%{version}.tar.gz
Buildroot: /tmp/openvswitch-rpm
Requires: openvswitch-kmod, logrotate, python
+BuildRequires: openssl-devel
%description
Open vSwitch provides standard network bridging functions and
LOOPBACK_INTERFACE=lo0
;;
esac
+
+# Check for MINGW platform.
+case `uname` in
+MINGW*)
+ IS_WIN32="yes"
+ ;;
+*)
+ IS_WIN32="no"
+ ;;
+esac
# forwarding_if_rx Test1
# Test1 tests the case when bfd is only enabled on one end of the link.
-# Under this situation, the bfd state should be DOWN and the forwarding
-# flag should be FALSE by default. However, if forwarding_if_rx is
-# enabled, as long as there is packet received, the bfd forwarding flag
-# should be TRUE.
+# In this situation, the forwarding flag should always be false, even
+# though data packets are received, since no bfd control packet is
+# received during the demand_rx_bfd interval.
AT_SETUP([bfd - bfd forwarding_if_rx - bfd on one side])
OVS_VSWITCHD_START([add-br br1 -- set bridge br1 datapath-type=dummy -- \
add-port br1 p1 -- set Interface p1 type=patch \
AT_CHECK([ovs-ofctl packet-out br1 3 2 "90e2ba01475000101856b2e80806000108000604000100101856b2e80202020300000000000002020202"],
[0], [stdout], [])
done
-# the forwarding flag should be true, since there is data received.
-BFD_CHECK([p0], [true], [false], [none], [down], [No Diagnostic], [none], [down], [No Diagnostic])
-
-# reset bfd forwarding_if_rx.
-AT_CHECK([ovs-vsctl set Interface p0 bfd:forwarding_if_rx=false], [0])
-# forwarding flag should turn to false since the STATE is DOWN.
+# the forwarding flag should be false, due to the demand_rx_bfd check.
BFD_CHECK([p0], [false], [false], [none], [down], [No Diagnostic], [none], [down], [No Diagnostic])
-BFD_CHECK_TX([p0], [1000ms], [1000ms], [0ms])
-BFD_CHECK_RX([p0], [500ms], [500ms], [1ms])
AT_CHECK([ovs-vsctl del-br br1], [0], [ignore])
AT_CLEANUP
AT_CHECK([ovs-vsctl del-br br1], [0], [ignore])
AT_CLEANUP
+# forwarding_if_rx Test4
+# Test4 tests the demand_rx_bfd feature.
+# bfd is enabled on both ends of the link.
+AT_SETUP([bfd - bfd forwarding_if_rx - demand_rx_bfd])
+OVS_VSWITCHD_START([add-br br1 -- set bridge br1 datapath-type=dummy -- \
+ add-port br1 p1 -- set Interface p1 type=patch \
+ options:peer=p0 ofport_request=2 -- \
+ add-port br0 p0 -- set Interface p0 type=patch \
+ options:peer=p1 ofport_request=1 -- \
+ set Interface p0 bfd:enable=true bfd:min_tx=300 bfd:min_rx=300 bfd:forwarding_if_rx=true -- \
+ set Interface p1 bfd:enable=true bfd:min_tx=500 bfd:min_rx=500])
+
+ovs-appctl time/stop
+# advance the clock to stabilize the states.
+for i in `seq 0 19`; do ovs-appctl time/warp 500; done
+BFD_CHECK([p0], [true], [false], [none], [up], [No Diagnostic], [none], [up], [No Diagnostic])
+BFD_CHECK([p1], [true], [false], [none], [up], [No Diagnostic], [none], [up], [No Diagnostic])
+BFD_CHECK_TX([p0], [500ms], [300ms], [500ms])
+
+# disable the bfd on p1.
+AT_CHECK([ovs-vsctl set Interface p1 bfd:enable=false], [0])
+
+# advance the clock by 4000ms while receiving data packets.
+# the STATE should go DOWN, due to Control Detection Time Expired,
+# but the forwarding flag should still be true.
+for i in `seq 0 7`
+do
+ ovs-appctl time/warp 500
+ AT_CHECK([ovs-ofctl packet-out br1 3 2 "90e2ba01475000101856b2e80806000108000604000100101856b2e80202020300000000000002020202"],
+ [0], [stdout], [])
+done
+BFD_CHECK([p0], [true], [false], [none], [down], [Control Detection Time Expired], [none], [down], [No Diagnostic])
+
+# advance the clock long enough to trigger the demand_rx_bfd interval
+# (100 * bfd->cfm_min_rx); the forwarding flag should go false since no
+# bfd control packet is received during the demand_rx_bfd interval.
+for i in `seq 0 120`
+do
+ ovs-appctl time/warp 300
+ AT_CHECK([ovs-ofctl packet-out br1 3 2 "90e2ba01475000101856b2e80806000108000604000100101856b2e80202020300000000000002020202"],
+ [0], [stdout], [])
+done
+BFD_CHECK([p0], [false], [false], [none], [down], [Control Detection Time Expired], [none], [down], [No Diagnostic])
+
+# now enable the bfd on p1 again.
+AT_CHECK([ovs-vsctl set Interface p1 bfd:enable=true], [0])
+# advance the clock by 5000ms; p0 and p1 should both be up.
+for i in `seq 0 9`; do ovs-appctl time/warp 500; done
+BFD_CHECK([p0], [true], [false], [none], [up], [Control Detection Time Expired], [none], [up], [No Diagnostic])
+BFD_CHECK([p1], [true], [false], [none], [up], [No Diagnostic], [none], [up], [Control Detection Time Expired])
+BFD_CHECK_TX([p0], [500ms], [300ms], [500ms])
+
+# disable the bfd on p1 again.
+AT_CHECK([ovs-vsctl set Interface p1 bfd:enable=false], [0])
+# advance the clock long enough to trigger the demand_rx_bfd interval;
+# the forwarding flag should go false since no bfd control packet is
+# received during the demand_rx_bfd interval.
+for i in `seq 0 120`
+do
+ ovs-appctl time/warp 300
+ AT_CHECK([ovs-ofctl packet-out br1 3 2 "90e2ba01475000101856b2e80806000108000604000100101856b2e80202020300000000000002020202"],
+ [0], [stdout], [])
+done
+BFD_CHECK([p0], [false], [false], [none], [down], [Control Detection Time Expired], [none], [down], [No Diagnostic])
+
+AT_CHECK([ovs-vsctl del-br br1], [0], [ignore])
+AT_CLEANUP
+
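The expectations exercised by Tests 1 and 4 condense into one rule: with forwarding_if_rx enabled, incoming data packets keep the forwarding flag true only while a bfd control packet has been seen within the demand-rx interval (100 * min_rx). A rough model of that rule (the names are illustrative and do not match the actual bfd implementation):

```python
def bfd_forwarding(state_up, forwarding_if_rx, data_rx_recent,
                   now_ms, last_ctl_rx_ms, min_rx_ms):
    # Hypothetical condensation of the test expectations above,
    # not the real ovs decision logic.
    if state_up:
        return True
    if not (forwarding_if_rx and data_rx_recent):
        return False
    # demand-rx check: data traffic only keeps forwarding up while a
    # control packet has arrived within 100 * min_rx
    return (now_ms - last_ctl_rx_ms) <= 100 * min_rx_ms
```

With min_rx = 300 ms, forwarding drops once no control packet has been seen for 30 seconds, no matter how much data traffic arrives, which is what Test4 checks.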
# test bfd:flap_count.
# This test contains three parts:
# part 1. tests the flap_count on normal bfd monitored link.
# Part-3 now turn on forwarding_if_rx.
AT_CHECK([ovs-vsctl set Interface p0 bfd:forwarding_if_rx=true], [0])
+for i in `seq 0 10`; do ovs-appctl time/warp 100; done
# disable the bfd on p1.
AT_CHECK([ovs-vsctl set Interface p1 bfd:enable=false], [0])
# flap_count should remain unchanged.
BFD_VSCTL_LIST_IFACE([p0], ["s/^.*flap_count=\(.*\), forwarding.*$/\1/p"], ["5"])
-# stop the traffic for 4000ms, the forwarding flag of p0 should turn false.
+# stop the traffic for more than (100 * bfd->cfm_min_rx) ms; the forwarding flag of p0 should turn false.
# and there should be the increment of flap_count.
-for i in `seq 0 7`; do ovs-appctl time/warp 500; done
+for i in `seq 0 120`; do ovs-appctl time/warp 100; done
BFD_CHECK([p0], [false], [false], [none], [down], [Control Detection Time Expired], [none], [down], [No Diagnostic])
BFD_VSCTL_LIST_IFACE([p0], ["s/^.*flap_count=\(.*\), forwarding.*$/\1/p"], ["6"])
AT_CHECK([ovs-ofctl packet-out br1 3 2 "90e2ba01475000101856b2e80806000108000604000100101856b2e80202020300000000000002020202"],
[0], [stdout], [])
done
-BFD_CHECK([p0], [true], [false], [none], [down], [Control Detection Time Expired], [none], [down], [No Diagnostic])
-# flap_count should be incremented again.
-BFD_VSCTL_LIST_IFACE([p0], ["s/^.*flap_count=\(.*\), forwarding.*$/\1/p"], ["7"])
-
-# stop the traffic for 4000ms, the forwarding flag of p0 should turn false.
-# and there should be the increment of flap_count.
-for i in `seq 0 7`; do ovs-appctl time/warp 500; done
+# forwarding should be false, since there is still no bfd control packet received.
BFD_CHECK([p0], [false], [false], [none], [down], [Control Detection Time Expired], [none], [down], [No Diagnostic])
-BFD_VSCTL_LIST_IFACE([p0], ["s/^.*flap_count=\(.*\), forwarding.*$/\1/p"], ["8"])
+BFD_VSCTL_LIST_IFACE([p0], ["s/^.*flap_count=\(.*\), forwarding.*$/\1/p"], ["6"])
# turn on the bfd on p1.
AT_CHECK([ovs-vsctl set interface p1 bfd:enable=true])
for i in `seq 0 49`; do ovs-appctl time/warp 100; done
# even though there is no data traffic, the flap_count should be incremented, since bfd on p1 is enabled again.
-BFD_VSCTL_LIST_IFACE([p0], ["s/^.*flap_count=\(.*\), forwarding.*$/\1/p"], ["9"])
+BFD_VSCTL_LIST_IFACE([p0], ["s/^.*flap_count=\(.*\), forwarding.*$/\1/p"], ["7"])
BFD_VSCTL_LIST_IFACE([p1], ["s/^.*flap_count=\(.*\), forwarding.*$/\1/p"], ["1"])
OVS_VSWITCHD_STOP
])
])
+m4_define([CFM_CHECK_EXTENDED_FAULT], [
+AT_CHECK([ovs-appctl cfm/show $1 | sed -e '/next CCM tx:/d' | sed -e '/next fault check:/d' | sed -e '/recv since check:/d'],[0],
+[dnl
+---- $1 ----
+MPID $2: extended
+ fault: $3
+ average health: $4
+ opstate: $5
+ remote_opstate: $6
+ interval: $7
+])
+])
+
m4_define([CFM_VSCTL_LIST_IFACE], [
AT_CHECK([ovs-vsctl list interface $1 | sed -n '/$2/p'],[0],
[dnl
OVS_VSWITCHD_STOP
AT_CLEANUP
+# test demand_rx_ccm under demand mode.
+AT_SETUP([cfm - demand_rx_ccm])
+#Create 2 bridges connected by patch ports and enable cfm
+OVS_VSWITCHD_START([add-br br1 -- \
+ set bridge br1 datapath-type=dummy \
+ other-config:hwaddr=aa:55:aa:56:00:00 -- \
+ add-port br1 p1 -- set Interface p1 type=patch \
+ options:peer=p0 ofport_request=2 -- \
+ add-port br0 p0 -- set Interface p0 type=patch \
+ options:peer=p1 ofport_request=1 -- \
+ set Interface p0 cfm_mpid=1 other_config:cfm_interval=300 other_config:cfm_extended=true other_config:cfm_demand=true -- \
+ set Interface p1 cfm_mpid=2 other_config:cfm_interval=300 other_config:cfm_extended=true other_config:cfm_demand=true])
+
+ovs-appctl time/stop
+# wait for a while to stabilize cfm. (need a longer time, since in demand mode
+# the fault interval is (MAX(ccm_interval_ms, 500) * 3.5) ms)
+for i in `seq 0 200`; do ovs-appctl time/warp 100; done
+CFM_CHECK_EXTENDED([p0], [1], [100], [up], [up], [300ms], [2], [up])
+CFM_CHECK_EXTENDED([p1], [2], [100], [up], [up], [300ms], [1], [up])
+
+# turn off the cfm on p1.
+AT_CHECK([ovs-vsctl clear Interface p1 cfm_mpid])
+# cfm should never go down while receiving data packets.
+for i in `seq 0 200`
+do
+ ovs-appctl time/warp 100
+ AT_CHECK([ovs-ofctl packet-out br1 3 2 "90e2ba01475000101856b2e80806000108000604000100101856b2e80202020300000000000002020202"],
+ [0], [stdout], [])
+done
+CFM_CHECK_EXTENDED([p0], [1], [0], [up], [up], [300ms], [2], [up])
+
+# wait longer, since the demand_rx_ccm interval is 100 * 300 ms.
+# since there is no ccm received, the [recv] fault should be raised.
+for i in `seq 0 200`
+do
+ ovs-appctl time/warp 100
+ AT_CHECK([ovs-ofctl packet-out br1 3 2 "90e2ba01475000101856b2e80806000108000604000100101856b2e80202020300000000000002020202"],
+ [0], [stdout], [])
+done
+CFM_CHECK_EXTENDED_FAULT([p0], [1], [recv], [0], [up], [up], [300ms])
+
+# now turn on the cfm on p1 again,
+AT_CHECK([ovs-vsctl set Interface p1 cfm_mpid=2])
+# cfm should be up for both p0 and p1
+for i in `seq 0 200`; do ovs-appctl time/warp 100; done
+CFM_CHECK_EXTENDED([p0], [1], [100], [up], [up], [300ms], [2], [up])
+CFM_CHECK_EXTENDED([p1], [2], [100], [up], [up], [300ms], [1], [up])
+
+# now turn off the cfm on p1 again
+AT_CHECK([ovs-vsctl clear Interface p1 cfm_mpid])
+# since there is no ccm received, the [recv] fault should be raised.
+for i in `seq 0 400`
+do
+ ovs-appctl time/warp 100
+ AT_CHECK([ovs-ofctl packet-out br1 3 2 "90e2ba01475000101856b2e80806000108000604000100101856b2e80202020300000000000002020202"],
+ [0], [stdout], [])
+done
+CFM_CHECK_EXTENDED_FAULT([p0], [1], [recv], [0], [up], [up], [300ms])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
# test cfm_flap_count.
AT_SETUP([cfm - flap_count])
#Create 2 bridges connected by patch ports and enable cfm
])
AT_CHECK([ovs-appctl ofproto/trace br0 'in_port=1,dl_src=50:54:00:00:00:05,dl_dst=50:54:00:00:00:07,dl_type=0x0800,nw_src=192.168.0.1,nw_dst=10.1.2.15,nw_proto=6,nw_tos=0,nw_ttl=128,tp_src=8,tp_dst=79'], [0], [stdout])
AT_CHECK([tail -2 stdout], [0],
- [Megaflow: recirc_id=0,skb_priority=0,tcp,in_port=1,nw_dst=10.1.2.15,nw_frag=no,tp_dst=79
+ [Megaflow: recirc_id=0,skb_priority=0,tcp,in_port=1,nw_dst=10.1.2.15,nw_frag=no,tp_dst=0x40/0xfff0
Datapath actions: 2
])
OVS_VSWITCHD_STOP
table=0 in_port=1 priority=16,tcp,nw_dst=10.1.0.0/255.255.0.0,action=output(3)
table=0 in_port=1 priority=32,tcp,nw_dst=10.1.2.0/255.255.255.0,tp_src=79,action=output(2)
table=0 in_port=1 priority=33,tcp,nw_dst=10.1.2.15,tp_dst=80,action=drop
+table=0 in_port=1 priority=33,tcp,nw_dst=10.1.2.15,tp_dst=8080,action=output(2)
+table=0 in_port=1 priority=33,tcp,nw_dst=10.1.2.15,tp_dst=192,action=output(2)
table=0 in_port=1 priority=0,ip,action=drop
table=0 in_port=2 priority=16,tcp,nw_dst=192.168.0.0/255.255.0.0,action=output(1)
table=0 in_port=2 priority=0,ip,action=drop
])
AT_CHECK([ovs-appctl ofproto/trace br0 'in_port=1,dl_src=50:54:00:00:00:05,dl_dst=50:54:00:00:00:07,dl_type=0x0800,nw_src=192.168.0.1,nw_dst=10.1.2.15,nw_proto=6,nw_tos=0,nw_ttl=128,tp_src=8,tp_dst=79'], [0], [stdout])
AT_CHECK([tail -2 stdout], [0],
- [Megaflow: recirc_id=0,skb_priority=0,tcp,in_port=1,nw_dst=10.1.2.15,nw_frag=no,tp_src=8,tp_dst=79
+ [Megaflow: recirc_id=0,skb_priority=0,tcp,in_port=1,nw_dst=10.1.2.15,nw_frag=no,tp_src=0x0/0xffc0,tp_dst=0x40/0xfff0
Datapath actions: 3
])
OVS_VSWITCHD_STOP(["/'prefixes' with incompatible field: ipv6_label/d"])
set(ipv4(src=35.8.2.41,dst=172.16.0.20,proto=5,tos=0x80,ttl=128,frag=no))
set(tcp(src=80,dst=8080))
set(udp(src=81,dst=6632))
+set(sctp(src=82,dst=6633))
set(icmp(type=1,code=2))
set(ipv6(src=::1,dst=::2,label=0,proto=10,tclass=0x70,hlimit=128,frag=no))
set(icmpv6(type=1,code=2))
05 1e 00 18 00 00 00 0a \
00 00 00 02 02 00 00 00 ff ff ff ff ff ff ff ff \
"], [0], [dnl
-OFPT_ROLE_STATUS (OF 0x05) (xid=0xa): role=master reason=experimenter_data_changed
+OFPT_ROLE_STATUS (OF1.4) (xid=0xa): role=master reason=experimenter_data_changed
])
AT_CLEANUP
05 1e 00 18 00 00 00 0a \
00 00 00 02 01 00 00 00 ff ff ff ff ff ff ff ff \
"], [0], [dnl
-OFPT_ROLE_STATUS (OF 0x05) (xid=0xa): role=master reason=configuration_changed
+OFPT_ROLE_STATUS (OF1.4) (xid=0xa): role=master reason=configuration_changed
])
AT_CLEANUP
05 1e 00 18 00 00 00 0a \
00 00 00 02 01 00 00 00 00 00 00 00 00 00 00 10 \
"], [0], [dnl
-OFPT_ROLE_STATUS (OF 0x05) (xid=0xa): role=master generation_id=16 reason=configuration_changed
+OFPT_ROLE_STATUS (OF1.4) (xid=0xa): role=master generation_id=16 reason=configuration_changed
])
AT_CLEANUP
event=ABBREV xid=0x186a0
])
AT_CLEANUP
+
+
+AT_SETUP([OFPT_BUNDLE_CONTROL - OPEN_REQUEST])
+AT_KEYWORDS([ofp-print])
+AT_CHECK([ovs-ofctl ofp-print "\
+05 21 00 10 00 00 00 00 \
+00 00 00 01 00 00 00 01 \
+"], [0], [dnl
+OFPT_BUNDLE_CONTROL (OF1.4) (xid=0x0):
+ bundle_id=0x1 type=OPEN_REQUEST flags=atomic
+])
+AT_CLEANUP
+
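Each OFPT_BUNDLE_CONTROL message in these tests is 16 bytes: the 8-byte OpenFlow header (version 0x05 = OF1.4, type 0x21, length, xid) followed by a 32-bit bundle_id, a 16-bit control type (OPEN_REQUEST = 0 up through DISCARD_REPLY = 7), and 16-bit flags (bit 0 = atomic, bit 1 = ordered). A sketch assembling the hex dump of the first test:

```python
import struct

OFP14_VERSION = 0x05
OFPT_BUNDLE_CONTROL = 0x21
OFPBCT_OPEN_REQUEST = 0
OFPBF_ATOMIC = 1 << 0
OFPBF_ORDERED = 1 << 1

def bundle_control(bundle_id, ctrl_type, flags, xid=0):
    # 8-byte OpenFlow header + 8-byte bundle control body
    # (bundle_id, control type, flags), all big-endian
    header = struct.pack("!BBHI", OFP14_VERSION, OFPT_BUNDLE_CONTROL, 16, xid)
    body = struct.pack("!IHH", bundle_id, ctrl_type, flags)
    return header + body
```

`bundle_control(1, OFPBCT_OPEN_REQUEST, OFPBF_ATOMIC)` reproduces the bytes fed to `ovs-ofctl ofp-print` in the first OPEN_REQUEST test above.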
+AT_SETUP([OFPT_BUNDLE_CONTROL - OPEN_REQUEST - ordered])
+AT_KEYWORDS([ofp-print])
+AT_CHECK([ovs-ofctl ofp-print "\
+05 21 00 10 00 00 00 00 \
+00 00 00 01 00 00 00 02 \
+"], [0], [dnl
+OFPT_BUNDLE_CONTROL (OF1.4) (xid=0x0):
+ bundle_id=0x1 type=OPEN_REQUEST flags=ordered
+])
+AT_CLEANUP
+
+AT_SETUP([OFPT_BUNDLE_CONTROL - OPEN_REQUEST - atomic ordered])
+AT_KEYWORDS([ofp-print])
+AT_CHECK([ovs-ofctl ofp-print "\
+05 21 00 10 00 00 00 00 \
+00 00 00 01 00 00 00 03 \
+"], [0], [dnl
+OFPT_BUNDLE_CONTROL (OF1.4) (xid=0x0):
+ bundle_id=0x1 type=OPEN_REQUEST flags=atomic ordered
+])
+AT_CLEANUP
+
+AT_SETUP([OFPT_BUNDLE_CONTROL - OPEN_REPLY])
+AT_KEYWORDS([ofp-print])
+AT_CHECK([ovs-ofctl ofp-print "\
+05 21 00 10 00 00 00 00 \
+00 00 00 01 00 01 00 01 \
+"], [0], [dnl
+OFPT_BUNDLE_CONTROL (OF1.4) (xid=0x0):
+ bundle_id=0x1 type=OPEN_REPLY flags=atomic
+])
+AT_CLEANUP
+
+AT_SETUP([OFPT_BUNDLE_CONTROL - CLOSE_REQUEST])
+AT_KEYWORDS([ofp-print])
+AT_CHECK([ovs-ofctl ofp-print "\
+05 21 00 10 00 00 00 00 \
+00 00 00 01 00 02 00 01 \
+"], [0], [dnl
+OFPT_BUNDLE_CONTROL (OF1.4) (xid=0x0):
+ bundle_id=0x1 type=CLOSE_REQUEST flags=atomic
+])
+AT_CLEANUP
+
+AT_SETUP([OFPT_BUNDLE_CONTROL - CLOSE_REPLY])
+AT_KEYWORDS([ofp-print])
+AT_CHECK([ovs-ofctl ofp-print "\
+05 21 00 10 00 00 00 00 \
+00 00 00 01 00 03 00 01 \
+"], [0], [dnl
+OFPT_BUNDLE_CONTROL (OF1.4) (xid=0x0):
+ bundle_id=0x1 type=CLOSE_REPLY flags=atomic
+])
+AT_CLEANUP
+
+AT_SETUP([OFPT_BUNDLE_CONTROL - COMMIT_REQUEST])
+AT_KEYWORDS([ofp-print])
+AT_CHECK([ovs-ofctl ofp-print "\
+05 21 00 10 00 00 00 00 \
+00 00 00 01 00 04 00 01 \
+"], [0], [dnl
+OFPT_BUNDLE_CONTROL (OF1.4) (xid=0x0):
+ bundle_id=0x1 type=COMMIT_REQUEST flags=atomic
+])
+AT_CLEANUP
+
+AT_SETUP([OFPT_BUNDLE_CONTROL - COMMIT_REPLY])
+AT_KEYWORDS([ofp-print])
+AT_CHECK([ovs-ofctl ofp-print "\
+05 21 00 10 00 00 00 00 \
+00 00 00 01 00 05 00 01 \
+"], [0], [dnl
+OFPT_BUNDLE_CONTROL (OF1.4) (xid=0x0):
+ bundle_id=0x1 type=COMMIT_REPLY flags=atomic
+])
+AT_CLEANUP
+
+AT_SETUP([OFPT_BUNDLE_CONTROL - DISCARD_REQUEST])
+AT_KEYWORDS([ofp-print])
+AT_CHECK([ovs-ofctl ofp-print "\
+05 21 00 10 00 00 00 00 \
+00 00 00 01 00 06 00 01 \
+"], [0], [dnl
+OFPT_BUNDLE_CONTROL (OF1.4) (xid=0x0):
+ bundle_id=0x1 type=DISCARD_REQUEST flags=atomic
+])
+AT_CLEANUP
+
+AT_SETUP([OFPT_BUNDLE_CONTROL - DISCARD_REPLY])
+AT_KEYWORDS([ofp-print])
+AT_CHECK([ovs-ofctl ofp-print "\
+05 21 00 10 00 00 00 00 \
+00 00 00 01 00 07 00 01 \
+"], [0], [dnl
+OFPT_BUNDLE_CONTROL (OF1.4) (xid=0x0):
+ bundle_id=0x1 type=DISCARD_REPLY flags=atomic
+])
+AT_CLEANUP
+
+AT_SETUP([OFPT_BUNDLE_ADD_MESSAGE - OFPT_HELLO])
+AT_KEYWORDS([ofp-print])
+AT_CHECK([ovs-ofctl ofp-print "\
+05 22 00 20 00 00 00 00 \
+00 00 00 01 00 01 00 01 02 00 00 08 00 00 00 00 \
+00 00 00 00 00 00 00 00 \
+"], [0], [dnl
+OFPT_BUNDLE_ADD_MESSAGE (OF1.4) (xid=0x0):
+ bundle_id=0x1 flags=atomic
+OFPT_HELLO (OF1.1) (xid=0x0):
+ version bitmap: 0x01, 0x02
+])
+AT_CLEANUP
])
AT_CHECK([ovs-ofctl encode-hello 0x3e], [0], [dnl
00000000 05 00 00 08 00 00 00 01-
-OFPT_HELLO (OF 0x05) (xid=0x1):
+OFPT_HELLO (OF1.4) (xid=0x1):
version bitmap: 0x01, 0x02, 0x03, 0x04, 0x05
])
ovs-appctl lacp/show > lacp.txt
ovs-appctl bond/show > bond.txt
(
-for i in `seq 10 100` ;
+for i in `seq 0 255` ;
do
pkt="in_port(7),eth(src=50:54:00:00:00:05,dst=50:54:00:00:01:00),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1,proto=6,tos=0,ttl=64,frag=no),tcp(src=8,dst=$i),tcp_flags(0x010)"
AT_CHECK([ovs-appctl netdev-dummy/receive p7 $pkt])
ovs-appctl netdev-dummy/receive p1 '50 54 00 00 00 07 20 22 22 22 22 22 08 00 45 00 00 24 00 00 00 00 00 84 00 00 C0 A8 00 01 C0 A8 00 02 04 58 08 af 00 00 00 00 d9 d7 91 57 01 00 00 34 cf 28 ec 4e 00 01 40 00 00 0a ff ff b7 53 24 19 00 05 00 08 7f 00 00 01 00 05 00 08 c0 a8 02 07 00 0c 00 06 00 05 00 00 80 00 00 04 c0 00 00 04'
done
-OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 9])
+OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 18])
OVS_WAIT_UNTIL([ovs-appctl -t ovs-ofctl exit])
AT_CHECK([cat ofctl_monitor.log], [0], [dnl
NXT_PACKET_IN (xid=0x0): cookie=0x1 total_len=98 in_port=1 (via action) data_len=98 (unbuffered)
AT_CHECK([ovs-appctl netdev-dummy/receive p3 'in_port(3),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
AT_CHECK([ovs-appctl dpif/dump-flows br0 | sort | STRIP_USED], [0], [dnl
-skb_priority(0),in_port(1),eth_type(0x0800),ipv4(src=192.168.0.1/0.0.0.0,dst=192.168.0.2/0.0.0.0,proto=1/0,tos=0/0,ttl=64/0,frag=no/0xff), packets:0, bytes:0, used:never, actions:drop
-skb_priority(0),in_port(2),eth_type(0x0800),ipv4(src=192.168.0.2/0.0.0.0,dst=192.168.0.1/0.0.0.0,proto=1/0,tos=0/0,ttl=64/0,frag=no/0xff), packets:0, bytes:0, used:never, actions:drop
+skb_priority(0),recirc_id(0),in_port(1),eth_type(0x0800),ipv4(src=192.168.0.1/0.0.0.0,dst=192.168.0.2/0.0.0.0,proto=1/0,tos=0/0,ttl=64/0,frag=no/0xff), packets:0, bytes:0, used:never, actions:drop
+skb_priority(0),recirc_id(0),in_port(2),eth_type(0x0800),ipv4(src=192.168.0.2/0.0.0.0,dst=192.168.0.1/0.0.0.0,proto=1/0,tos=0/0,ttl=64/0,frag=no/0xff), packets:0, bytes:0, used:never, actions:drop
])
AT_CHECK([ovs-appctl dpif/dump-flows br1 | sort | STRIP_USED], [0], [dnl
-skb_priority(0),in_port(3),eth_type(0x0800),ipv4(src=10.0.0.2/0.0.0.0,dst=10.0.0.1/0.0.0.0,proto=1/0,tos=0/0,ttl=64/0,frag=no/0xff), packets:0, bytes:0, used:never, actions:drop
+skb_priority(0),recirc_id(0),in_port(3),eth_type(0x0800),ipv4(src=10.0.0.2/0.0.0.0,dst=10.0.0.1/0.0.0.0,proto=1/0,tos=0/0,ttl=64/0,frag=no/0xff), packets:0, bytes:0, used:never, actions:drop
])
AT_CHECK([ovs-appctl dpif/dump-flows -m br0 | sort | STRIP_USED], [0], [dnl
-skb_priority(0),skb_mark(0/0),in_port(p1),eth(src=50:54:00:00:00:05/00:00:00:00:00:00,dst=50:54:00:00:00:07/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=192.168.0.1/0.0.0.0,dst=192.168.0.2/0.0.0.0,proto=1/0,tos=0/0,ttl=64/0,frag=no/0xff),icmp(type=8/0,code=0/0), packets:0, bytes:0, used:never, actions:drop
-skb_priority(0),skb_mark(0/0),in_port(p2),eth(src=50:54:00:00:00:07/00:00:00:00:00:00,dst=50:54:00:00:00:05/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=192.168.0.2/0.0.0.0,dst=192.168.0.1/0.0.0.0,proto=1/0,tos=0/0,ttl=64/0,frag=no/0xff),icmp(type=0/0,code=0/0), packets:0, bytes:0, used:never, actions:drop
+skb_priority(0),skb_mark(0/0),recirc_id(0),in_port(p1),eth(src=50:54:00:00:00:05/00:00:00:00:00:00,dst=50:54:00:00:00:07/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=192.168.0.1/0.0.0.0,dst=192.168.0.2/0.0.0.0,proto=1/0,tos=0/0,ttl=64/0,frag=no/0xff),icmp(type=8/0,code=0/0), packets:0, bytes:0, used:never, actions:drop
+skb_priority(0),skb_mark(0/0),recirc_id(0),in_port(p2),eth(src=50:54:00:00:00:07/00:00:00:00:00:00,dst=50:54:00:00:00:05/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=192.168.0.2/0.0.0.0,dst=192.168.0.1/0.0.0.0,proto=1/0,tos=0/0,ttl=64/0,frag=no/0xff),icmp(type=0/0,code=0/0), packets:0, bytes:0, used:never, actions:drop
])
AT_CHECK([ovs-appctl dpif/dump-flows -m br1 | sort | STRIP_USED], [0], [dnl
-skb_priority(0),skb_mark(0/0),in_port(p3),eth(src=50:54:00:00:00:09/00:00:00:00:00:00,dst=50:54:00:00:00:0a/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=10.0.0.2/0.0.0.0,dst=10.0.0.1/0.0.0.0,proto=1/0,tos=0/0,ttl=64/0,frag=no/0xff),icmp(type=8/0,code=0/0), packets:0, bytes:0, used:never, actions:drop
+skb_priority(0),skb_mark(0/0),recirc_id(0),in_port(p3),eth(src=50:54:00:00:00:09/00:00:00:00:00:00,dst=50:54:00:00:00:0a/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=10.0.0.2/0.0.0.0,dst=10.0.0.1/0.0.0.0,proto=1/0,tos=0/0,ttl=64/0,frag=no/0xff),icmp(type=8/0,code=0/0), packets:0, bytes:0, used:never, actions:drop
])
OVS_VSWITCHD_STOP
])
AT_CHECK([cat ovs-vswitchd.log | grep -e 'in_port(100).*packets:9' | FILTER_FLOW_DUMP], [0], [dnl
-skb_priority(0),skb_mark(0/0),in_port(100),eth(src=50:54:00:00:00:05/00:00:00:00:00:00,dst=50:54:00:00:00:07/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=192.168.0.1/0.0.0.0,dst=192.168.0.2/0.0.0.0,proto=1/0,tos=0/0,ttl=64/0,frag=no/0xff),icmp(type=8/0,code=0/0), packets:9, bytes:540, used:0.0s
+skb_priority(0),skb_mark(0/0),recirc_id(0),in_port(100),eth(src=50:54:00:00:00:05/00:00:00:00:00:00,dst=50:54:00:00:00:07/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=192.168.0.1/0.0.0.0,dst=192.168.0.2/0.0.0.0,proto=1/0,tos=0/0,ttl=64/0,frag=no/0xff),icmp(type=8/0,code=0/0), packets:9, bytes:540, used:0.0s, actions:101,3,2
])
AT_CHECK([cat ovs-vswitchd.log | grep -e 'in_port(101).*packets:4' | FILTER_FLOW_DUMP], [0], [dnl
-skb_priority(0),skb_mark(0/0),in_port(101),eth(src=50:54:00:00:00:07/00:00:00:00:00:00,dst=50:54:00:00:00:05/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=192.168.0.2/0.0.0.0,dst=192.168.0.1/0.0.0.0,proto=1/0,tos=0/0,ttl=64/0,frag=no/0xff),icmp(type=8/0,code=0/0), packets:4, bytes:240, used:0.0s
+skb_priority(0),skb_mark(0/0),recirc_id(0),in_port(101),eth(src=50:54:00:00:00:07/00:00:00:00:00:00,dst=50:54:00:00:00:05/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=192.168.0.2/0.0.0.0,dst=192.168.0.1/0.0.0.0,proto=1/0,tos=0/0,ttl=64/0,frag=no/0xff),icmp(type=8/0,code=0/0), packets:4, bytes:240, used:0.0s, actions:100,2,3
])
AT_CHECK([ovs-ofctl dump-ports br0 pbr0], [0], [dnl
skb_priority(0),skb_mark(0),in_port(1),eth(src=50:54:00:00:00:0b,dst=50:54:00:00:00:0c),eth_type(0x0800),ipv4(src=10.0.0.4,dst=10.0.0.3,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0), actions:drop
])
AT_CHECK([cat ovs-vswitchd.log | grep '00:09.*packets:3' | FILTER_FLOW_DUMP], [0], [dnl
-skb_priority(0),skb_mark(0),in_port(1),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0), packets:3, bytes:180, used:0.0s
+skb_priority(0),skb_mark(0),recirc_id(0),dp_hash(0),in_port(1),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0), packets:3, bytes:180, used:0.0s, actions:2
])
AT_CHECK([cat ovs-vswitchd.log | grep '00:0b.*packets:3' | FILTER_FLOW_DUMP], [0], [dnl
-skb_priority(0),skb_mark(0),in_port(1),eth(src=50:54:00:00:00:0b,dst=50:54:00:00:00:0c),eth_type(0x0800),ipv4(src=10.0.0.4,dst=10.0.0.3,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0), packets:3, bytes:180, used:0.0s
+skb_priority(0),skb_mark(0),recirc_id(0),dp_hash(0),in_port(1),eth(src=50:54:00:00:00:0b,dst=50:54:00:00:00:0c),eth_type(0x0800),ipv4(src=10.0.0.4,dst=10.0.0.3,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0), packets:3, bytes:180, used:0.0s, actions:drop
])
OVS_VSWITCHD_STOP
AT_CLEANUP
# enable bfd on p0.
AT_CHECK([ovs-vsctl set interface p0 bfd:enable=true])
# check log.
-AT_CHECK([sed -n "s/^.*|ofproto_dpif_monitor(monitor)|INFO|\(.* created\)$/\1/p" ovs-vswitchd.log], [0], [dnl
-monitor thread created
-])
+OVS_WAIT_UNTIL([grep "monitor thread created" ovs-vswitchd.log])
# disable bfd on p0.
AT_CHECK([ovs-vsctl set interface p0 bfd:enable=false])
# check log.
-AT_CHECK([sed -n "s/^.*|ofproto_dpif_monitor(monitor)|INFO|\(.* terminated\)$/\1/p" ovs-vswitchd.log], [0], [dnl
-monitor thread terminated
-])
+OVS_WAIT_UNTIL([grep "monitor thread terminated" ovs-vswitchd.log])
AT_CHECK([cat ovs-vswitchd.log | sed -e '/^.*ofproto_dpif_monitor.*$/d' > ovs-vswitchd.log])
# enable cfm on p0.
AT_CHECK([ovs-vsctl set interface p0 cfm_mpid=10])
# check log.
-AT_CHECK([sed -n "s/^.*|ofproto_dpif_monitor(monitor)|INFO|\(.* created\)$/\1/p" ovs-vswitchd.log], [0], [dnl
-monitor thread created
-])
+OVS_WAIT_UNTIL([grep "monitor thread created" ovs-vswitchd.log])
# disable cfm on p0.
AT_CHECK([ovs-vsctl remove interface p0 cfm_mpid 10])
# check log.
-AT_CHECK([sed -n "s/^.*|ofproto_dpif_monitor(monitor)|INFO|\(.* terminated\)$/\1/p" ovs-vswitchd.log], [0], [dnl
-monitor thread terminated
-])
+OVS_WAIT_UNTIL([grep "monitor thread terminated" ovs-vswitchd.log])
AT_CHECK([cat ovs-vswitchd.log | sed -e '/^.*ofproto_dpif_monitor.*$/d' > ovs-vswitchd.log])
# enable both bfd and cfm on p0.
AT_CHECK([ovs-vsctl set interface p0 bfd:enable=true cfm_mpid=10])
# check log.
-AT_CHECK([sed -n "s/^.*|ofproto_dpif_monitor(monitor)|INFO|\(.* created\)$/\1/p" ovs-vswitchd.log], [0], [dnl
-monitor thread created
-])
+OVS_WAIT_UNTIL([grep "monitor thread created" ovs-vswitchd.log])
# disable bfd on p0.
AT_CHECK([ovs-vsctl set interface p0 bfd:enable=false])
# check log, there should not be the log of thread terminated.
-AT_CHECK([sed -n "s/^.*|ofproto_dpif_monitor(monitor)|INFO|\(.* terminated\)$/\1/p" ovs-vswitchd.log], [0], [dnl
+AT_CHECK([sed -n "s/^.*|ofproto_dpif_monitor(monitor[[0-9]]*)|INFO|\(.* terminated\)$/\1/p" ovs-vswitchd.log], [0], [dnl
])
# reenable bfd on p0.
AT_CHECK([ovs-vsctl set interface p0 bfd:enable=true])
# check log, should still be on log of thread created.
-AT_CHECK([sed -n "s/^.*|ofproto_dpif_monitor(monitor)|INFO|\(.* created\)$/\1/p" ovs-vswitchd.log], [0], [dnl
+AT_CHECK([sed -n "s/^.*|ofproto_dpif_monitor(monitor[[0-9]]*)|INFO|\(.* created\)$/\1/p" ovs-vswitchd.log], [0], [dnl
monitor thread created
])
# disable bfd and cfm together.
AT_CHECK([ovs-vsctl set interface p0 bfd:enable=false -- remove interface p0 cfm_mpid 10])
# check log.
-AT_CHECK([sed -n "s/^.*|ofproto_dpif_monitor(monitor)|INFO|\(.* terminated\)$/\1/p" ovs-vswitchd.log], [0], [dnl
-monitor thread terminated
-])
+OVS_WAIT_UNTIL([grep "monitor thread terminated" ovs-vswitchd.log])
OVS_VSWITCHD_STOP
AT_CLEANUP
AT_CHECK([ovs-vsctl --no-wait init])
dnl Start ovs-vswitchd.
- AT_CHECK([ovs-vswitchd --detach --no-chdir --pidfile --enable-dummy$3 --disable-system --log-file -vvconn -vofproto_dpif], [0], [], [stderr])
+ AT_CHECK([ovs-vswitchd --detach --no-chdir --pidfile --enable-dummy$3 --disable-system --log-file -vvconn -vofproto_dpif --enable-of14], [0], [], [stderr])
AT_CAPTURE_FILE([ovs-vswitchd.log])
AT_CHECK([[sed < stderr '
/vlog|INFO|opened log file/d
/ofproto|INFO|datapath ID changed to fedcba9876543210/d']])
dnl Add bridges, ports, etc.
- AT_CHECK([ovs-vsctl -- add-br br0 -- set bridge br0 datapath-type=dummy other-config:datapath-id=fedcba9876543210 other-config:hwaddr=aa:55:aa:55:00:00 protocols=[[OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13]] fail-mode=secure -- $1 m4_if([$2], [], [], [| ${PERL} $srcdir/uuidfilt.pl])], [0], [$2])
+ AT_CHECK([ovs-vsctl -- add-br br0 -- set bridge br0 datapath-type=dummy other-config:datapath-id=fedcba9876543210 other-config:hwaddr=aa:55:aa:55:00:00 protocols=[[OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14]] fail-mode=secure -- $1 m4_if([$2], [], [], [| ${PERL} $srcdir/uuidfilt.pl])], [0], [$2])
])
m4_divert_push([PREPARE_TESTS])
/timeval.*faults: [[0-9]]* minor, [[0-9]]* major/d
/timeval.*disk: [[0-9]]* reads, [[0-9]]* writes/d
/timeval.*context switches: [[0-9]]* voluntary, [[0-9]]* involuntary/d
+/ovs_rcu.*blocked [[0-9]]* ms waiting for .* to quiesce/d
/|WARN|/p
/|ERR|/p
/|EMER|/p" ovs-vswitchd.log ovsdb-server.log
queue_size=`expr $rmem_max + 128 \* 1024`
echo rmem_max=$rmem_max queue_size=$queue_size
+# If there's too much queuing, skip the test to avoid timing out.
+AT_SKIP_IF([test $rmem_max -gt 1048576])
+
# Each flow update message takes up at least 48 bytes of space in queues
# and in practice more than that.
n_msgs=`expr $queue_size / 48`
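The sizing arithmetic above can be sketched directly; the `rmem_max` value below is illustrative, not taken from any particular kernel:

```python
# Sketch of the queue sizing used by the test; rmem_max is a made-up
# example of the kernel's socket receive-buffer maximum.
rmem_max = 212992
queue_size = rmem_max + 128 * 1024   # buffer plus 128 kB of slack
n_msgs = queue_size // 48            # each flow update >= 48 bytes
```

With a large `rmem_max` the message count grows proportionally, which is why the test skips when the buffer exceeds 1 MiB.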
done
OVS_VSWITCHD_STOP
AT_CLEANUP
+
+
+AT_SETUP([ofproto - bundles, open (OpenFlow 1.4)])
+AT_KEYWORDS([monitor])
+OVS_VSWITCHD_START
+
+# Start a monitor, use the required protocol version
+ovs-ofctl -O OpenFlow14 monitor br0 --detach --no-chdir --pidfile >monitor.log 2>&1
+AT_CAPTURE_FILE([monitor.log])
+
+# Send an OpenFlow14 message (05), OFPT_BUNDLE_CONTROL (21), length (10), xid (0a)
+ovs-appctl -t ovs-ofctl ofctl/send "05 21 00 10 00 00 00 0a 00 00 00 01 00 00 00 01"
+ovs-appctl -t ovs-ofctl ofctl/barrier
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([ofctl_strip < monitor.log], [], [dnl
+send: OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=OPEN_REQUEST flags=atomic
+OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=OPEN_REPLY flags=0
+OFPT_BARRIER_REPLY (OF1.4):
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
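The raw hex in these tests is an OpenFlow 1.4 bundle control message: an 8-byte `ofp_header` followed by `bundle_id`, control type, and flags, all big-endian. Decoding the exact bytes sent above, as a sketch:

```python
import struct

# The 16-byte OFPT_BUNDLE_CONTROL message used in the test, as hex.
msg = bytes.fromhex(
    "05 21 00 10 00 00 00 0a 00 00 00 01 00 00 00 01".replace(" ", ""))

# ofp_header (version, type, length, xid), then bundle_id, control
# type, and flags.  All fields are network byte order.
version, msg_type, length, xid, bundle_id, ctrl_type, flags = \
    struct.unpack("!BBHIIHH", msg)
# version=5 (OF1.4), msg_type=0x21 (OFPT_BUNDLE_CONTROL), length=16,
# xid=0xa, bundle_id=1, ctrl_type=0 (OPEN_REQUEST), flags=1 (atomic)
```

Changing the control-type field to 2, 4, or 6 yields the CLOSE_REQUEST, COMMIT_REQUEST, and DISCARD_REQUEST messages used by the later tests.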
+
+AT_SETUP([ofproto - bundles, double open (OpenFlow 1.4)])
+AT_KEYWORDS([monitor])
+OVS_VSWITCHD_START
+
+# Start a monitor, use the required protocol version
+ovs-ofctl -O OpenFlow14 monitor br0 --detach --no-chdir --pidfile >monitor.log 2>&1
+AT_CAPTURE_FILE([monitor.log])
+
+# Send the same OpenFlow14 message (05), OFPT_BUNDLE_CONTROL (21), length (10), xid (0a), twice
+ovs-appctl -t ovs-ofctl ofctl/send "05 21 00 10 00 00 00 0a 00 00 00 01 00 00 00 01"
+ovs-appctl -t ovs-ofctl ofctl/barrier
+ovs-appctl -t ovs-ofctl ofctl/send "05 21 00 10 00 00 00 0a 00 00 00 01 00 00 00 01"
+ovs-appctl -t ovs-ofctl ofctl/barrier
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([ofctl_strip < monitor.log], [0], [dnl
+send: OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=OPEN_REQUEST flags=atomic
+OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=OPEN_REPLY flags=0
+OFPT_BARRIER_REPLY (OF1.4):
+send: OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=OPEN_REQUEST flags=atomic
+OFPT_ERROR (OF1.4): OFPBFC_BAD_ID
+OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=OPEN_REQUEST flags=atomic
+OFPT_BARRIER_REPLY (OF1.4):
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([ofproto - bundle close without open (OpenFlow 1.4)])
+AT_KEYWORDS([monitor])
+OVS_VSWITCHD_START
+
+# Start a monitor, use the required protocol version
+ovs-ofctl -O OpenFlow14 monitor br0 --detach --no-chdir --pidfile >monitor.log 2>&1
+AT_CAPTURE_FILE([monitor.log])
+
+ovs-appctl -t ovs-ofctl ofctl/send "05 21 00 10 00 00 00 0a 00 00 00 01 00 02 00 01"
+ovs-appctl -t ovs-ofctl ofctl/barrier
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([ofctl_strip < monitor.log], [0], [dnl
+send: OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=CLOSE_REQUEST flags=atomic
+OFPT_ERROR (OF1.4): OFPBFC_BAD_ID
+OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=CLOSE_REQUEST flags=atomic
+OFPT_BARRIER_REPLY (OF1.4):
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([ofproto - bundle double close (OpenFlow 1.4)])
+AT_KEYWORDS([monitor])
+OVS_VSWITCHD_START
+
+# Start a monitor, use the required protocol version
+ovs-ofctl -O OpenFlow14 monitor br0 --detach --no-chdir --pidfile >monitor.log 2>&1
+AT_CAPTURE_FILE([monitor.log])
+
+# Open, Close, Close
+ovs-appctl -t ovs-ofctl ofctl/send "05 21 00 10 00 00 00 0a 00 00 00 01 00 00 00 01"
+ovs-appctl -t ovs-ofctl ofctl/barrier
+ovs-appctl -t ovs-ofctl ofctl/send "05 21 00 10 00 00 00 0a 00 00 00 01 00 02 00 01"
+ovs-appctl -t ovs-ofctl ofctl/barrier
+ovs-appctl -t ovs-ofctl ofctl/send "05 21 00 10 00 00 00 0a 00 00 00 01 00 02 00 01"
+ovs-appctl -t ovs-ofctl ofctl/barrier
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([ofctl_strip < monitor.log], [0], [dnl
+send: OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=OPEN_REQUEST flags=atomic
+OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=OPEN_REPLY flags=0
+OFPT_BARRIER_REPLY (OF1.4):
+send: OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=CLOSE_REQUEST flags=atomic
+OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=CLOSE_REPLY flags=0
+OFPT_BARRIER_REPLY (OF1.4):
+send: OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=CLOSE_REQUEST flags=atomic
+OFPT_ERROR (OF1.4): OFPBFC_BUNDLE_CLOSED
+OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=CLOSE_REQUEST flags=atomic
+OFPT_BARRIER_REPLY (OF1.4):
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([ofproto - bundle close, different flags (OpenFlow 1.4)])
+AT_KEYWORDS([monitor])
+OVS_VSWITCHD_START
+
+# Start a monitor, use the required protocol version
+ovs-ofctl -O OpenFlow14 monitor br0 --detach --no-chdir --pidfile >monitor.log 2>&1
+AT_CAPTURE_FILE([monitor.log])
+
+# Open with flags=atomic, then close with flags=ordered
+ovs-appctl -t ovs-ofctl ofctl/send "05 21 00 10 00 00 00 0a 00 00 00 01 00 00 00 01"
+ovs-appctl -t ovs-ofctl ofctl/barrier
+ovs-appctl -t ovs-ofctl ofctl/send "05 21 00 10 00 00 00 0a 00 00 00 01 00 02 00 02"
+ovs-appctl -t ovs-ofctl ofctl/barrier
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([ofctl_strip < monitor.log], [0], [dnl
+send: OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=OPEN_REQUEST flags=atomic
+OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=OPEN_REPLY flags=0
+OFPT_BARRIER_REPLY (OF1.4):
+send: OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=CLOSE_REQUEST flags=ordered
+OFPT_ERROR (OF1.4): OFPBFC_BAD_FLAGS
+OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=CLOSE_REQUEST flags=ordered
+OFPT_BARRIER_REPLY (OF1.4):
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([ofproto - bundle commit without open (OpenFlow 1.4)])
+AT_KEYWORDS([monitor])
+OVS_VSWITCHD_START
+
+# Start a monitor, use the required protocol version
+ovs-ofctl -O OpenFlow14 monitor br0 --detach --no-chdir --pidfile >monitor.log 2>&1
+AT_CAPTURE_FILE([monitor.log])
+
+# Commit without opening a bundle first
+ovs-appctl -t ovs-ofctl ofctl/send "05 21 00 10 00 00 00 0a 00 00 00 01 00 04 00 01"
+ovs-appctl -t ovs-ofctl ofctl/barrier
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([ofctl_strip < monitor.log], [0], [dnl
+send: OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=COMMIT_REQUEST flags=atomic
+OFPT_ERROR (OF1.4): OFPBFC_BAD_ID
+OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=COMMIT_REQUEST flags=atomic
+OFPT_BARRIER_REPLY (OF1.4):
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([ofproto - bundle commit, different flags (OpenFlow 1.4)])
+AT_KEYWORDS([monitor])
+OVS_VSWITCHD_START
+
+# Start a monitor, use the required protocol version
+ovs-ofctl -O OpenFlow14 monitor br0 --detach --no-chdir --pidfile >monitor.log 2>&1
+AT_CAPTURE_FILE([monitor.log])
+
+# Open with flags=atomic, then commit with flags=ordered
+ovs-appctl -t ovs-ofctl ofctl/send "05 21 00 10 00 00 00 0a 00 00 00 01 00 00 00 01"
+ovs-appctl -t ovs-ofctl ofctl/barrier
+ovs-appctl -t ovs-ofctl ofctl/send "05 21 00 10 00 00 00 0a 00 00 00 01 00 04 00 02"
+ovs-appctl -t ovs-ofctl ofctl/barrier
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([ofctl_strip < monitor.log], [0], [dnl
+send: OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=OPEN_REQUEST flags=atomic
+OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=OPEN_REPLY flags=0
+OFPT_BARRIER_REPLY (OF1.4):
+send: OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=COMMIT_REQUEST flags=ordered
+OFPT_ERROR (OF1.4): OFPBFC_BAD_FLAGS
+OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=COMMIT_REQUEST flags=ordered
+OFPT_BARRIER_REPLY (OF1.4):
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([ofproto - bundle discard without open (OpenFlow 1.4)])
+AT_KEYWORDS([monitor])
+OVS_VSWITCHD_START
+
+# Start a monitor, use the required protocol version
+ovs-ofctl -O OpenFlow14 monitor br0 --detach --no-chdir --pidfile >monitor.log 2>&1
+AT_CAPTURE_FILE([monitor.log])
+
+# Discard without opening a bundle first
+ovs-appctl -t ovs-ofctl ofctl/send "05 21 00 10 00 00 00 0a 00 00 00 01 00 06 00 01"
+ovs-appctl -t ovs-ofctl ofctl/barrier
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([ofctl_strip < monitor.log], [0], [dnl
+send: OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=DISCARD_REQUEST flags=atomic
+OFPT_ERROR (OF1.4): OFPBFC_BAD_ID
+OFPT_BUNDLE_CONTROL (OF1.4):
+ bundle_id=0x1 type=DISCARD_REQUEST flags=atomic
+OFPT_BARRIER_REPLY (OF1.4):
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
dnl ovs-vswitchd detached OK or we wouldn't have made it this far. Success.
AT_CLEANUP
+
+
+dnl ----------------------------------------------------------------------
+m4_define([OVS_VSCTL_CHECK_RX_PKT], [
+AT_CHECK([ovs-vsctl list int $1 | grep statistics | sed -n 's/^.*\(rx_packets=[[0-9]]\+\).*$/\1/p'],[0],
+[dnl
+rx_packets=$2
+])
+])
+
+AT_SETUP([ovs-vswitchd -- stats-update-interval])
+OVS_VSWITCHD_START([add-port br0 p1 -- set int p1 type=internal])
+ovs-appctl time/stop
+
+dnl At the beginning, the update of rx_packets should happen every 5 seconds.
+for i in `seq 0 10`; do ovs-appctl time/warp 1000; done
+OVS_VSCTL_CHECK_RX_PKT([p1], [0])
+AT_CHECK([ovs-appctl netdev-dummy/receive p1 'eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
+for i in `seq 0 10`; do ovs-appctl time/warp 1000; done
+OVS_VSCTL_CHECK_RX_PKT([p1], [1])
+
+dnl set the stats update interval to 100K ms, the following 'recv' should not be updated.
+AT_CHECK([ovs-vsctl set O . other_config:stats-update-interval=100000])
+for i in `seq 0 50`; do ovs-appctl time/warp 1000; done
+for i in `seq 1 5`; do
+ AT_CHECK([ovs-appctl netdev-dummy/receive p1 'eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
+done
+
+OVS_VSCTL_CHECK_RX_PKT([p1], [1])
+dnl advance the clock by 100K ms, the previous 'recv' should be updated.
+for i in `seq 0 99`; do ovs-appctl time/warp 1000; done
+OVS_VSCTL_CHECK_RX_PKT([p1], [6])
+
+dnl Now remove the configuration and receive one packet; there should be an update after 5000 ms.
+AT_CHECK([ovs-vsctl clear O . other_config])
+AT_CHECK([ovs-appctl netdev-dummy/receive p1 'eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
+for i in `seq 0 10`; do ovs-appctl time/warp 1000; done
+OVS_VSCTL_CHECK_RX_PKT([p1], [7])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
\ No newline at end of file
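The stats-update-interval test above warps time and checks when `rx_packets` reaches the database. The underlying decision is simple: push statistics only once the configured interval has elapsed. A minimal sketch (names invented, not the real bridge code):

```python
# Sketch of the interval logic the test exercises: stats are pushed to
# the database only when the configured interval has elapsed.
def should_push(now_ms, last_push_ms, interval_ms=5000):
    """Default interval mirrors the test's 5-second behavior."""
    return now_ms - last_push_ms >= interval_ms

pushes = []
last = 0
for now in range(0, 20001, 1000):   # warp time 1 s per step, as the test does
    if should_push(now, last):
        pushes.append(now)
        last = now
```

Raising `interval_ms` to 100000 reproduces the middle section of the test, where received packets do not appear in the database until the clock has advanced by 100K ms.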
n_iterations=`expr $rmem_max / 25000 + 5`
echo rmem_max=$rmem_max n_iterations=$n_iterations
+# If there's too much queuing, skip the test to avoid timing out.
+AT_SKIP_IF([test $rmem_max -gt 1048576])
+
# Calculate the exact number of monitor updates expected for $n_iterations,
# assuming no updates are combined. The "extra" update is for the initial
# contents of the database.
# Add bridges for Ryu to use, and configure them to connect to Ryu.
for config in \
- 'br0 0000000000000001 a c b d' \
- 'br1 0000000000000002 c a d b'
+ 'br0 0000000000000001 a b pstream=punix' \
+ 'br1 0000000000000002 c d stream=unix'
do
set $config
- bridge=$1 dpid=$2 port1=$3 peer1=$4 port2=$5 peer2=$6
+ bridge=$1 dpid=$2 port1=$3 port2=$4 stream_mode=$5
run ovs-vsctl --no-wait \
-- add-br $bridge \
-- set bridge $bridge \
-- set controller $bridge connection-mode=out-of-band \
max-backoff=1000 \
-- add-port $bridge $port1 \
- -- set interface $port1 ofport_request=1 type=patch options:peer=$peer1 \
+ -- set interface $port1 ofport_request=1 type=dummy \
+ options:${stream_mode}:"$sandbox"/p1.sock \
-- add-port $bridge $port2 \
- -- set interface $port2 ofport_request=2 type=patch options:peer=$peer2
+ -- set interface $port2 ofport_request=2 type=dummy \
+ options:${stream_mode}:"$sandbox"/p2.sock
done
logs=
*/
#include <config.h>
-#include "classifier.h"
#include <errno.h>
#include <limits.h>
#include "byte-order.h"
#undef NDEBUG
#include <assert.h>
+/* We need access to classifier internal definitions to be able to fully
+ * test them. The alternative would be to expose them all in the classifier
+ * API. */
+#include "classifier.c"
+
/* Fields in a rule. */
#define CLS_FIELDS \
/* struct flow all-caps */ \
int found_dups = 0;
int found_rules2 = 0;
- HMAP_FOR_EACH (table, hmap_node, &cls->subtables) {
- const struct cls_rule *head;
+ HMAP_FOR_EACH (table, hmap_node, &cls->cls->subtables) {
+ const struct cls_match *head;
unsigned int max_priority = 0;
unsigned int max_count = 0;
found_tables++;
HMAP_FOR_EACH (head, hmap_node, &table->rules) {
unsigned int prev_priority = UINT_MAX;
- const struct cls_rule *rule;
+ const struct cls_match *rule;
if (head->priority > max_priority) {
max_priority = head->priority;
prev_priority = rule->priority;
found_rules++;
found_dups++;
- assert(classifier_find_rule_exactly(cls, rule) == rule);
+ assert(classifier_find_rule_exactly(cls, rule->cls_rule)
+ == rule->cls_rule);
}
}
assert(table->max_priority == max_priority);
assert(table->max_count == max_count);
}
- assert(found_tables == hmap_count(&cls->subtables));
- assert(n_tables == -1 || n_tables == hmap_count(&cls->subtables));
+ assert(found_tables == hmap_count(&cls->cls->subtables));
+ assert(n_tables == -1 || n_tables == hmap_count(&cls->cls->subtables));
assert(n_rules == -1 || found_rules == n_rules);
assert(n_dups == -1 || found_dups == n_dups);
compare_classifiers(&cls, &tcls);
}
- fat_rwlock_unlock(&cls.rwlock);
- classifier_destroy(&cls);
- tcls_destroy(&tcls);
-
for (i = 0; i < N_RULES; i++) {
+ if (rules[i]->cls_rule.cls_match) {
+ classifier_remove(&cls, &rules[i]->cls_rule);
+ }
free_rule(rules[i]);
}
+
+ fat_rwlock_unlock(&cls.rwlock);
+ classifier_destroy(&cls);
+ tcls_destroy(&tcls);
} while (next_permutation(ops, ARRAY_SIZE(ops)));
assert(n_permutations == (factorial(N_RULES * 2) >> N_RULES));
}
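The closing assertion counts the distinct orderings of N_RULES paired operations: `factorial(N_RULES * 2) >> N_RULES`, i.e. (2N)!/2^N, the number of distinct permutations of a multiset containing N interchangeable pairs. A quick check of that formula for a small N, under that (assumed) reading of the test:

```python
from itertools import permutations
from math import factorial

N = 3
ops = [i // 2 for i in range(2 * N)]        # [0, 0, 1, 1, 2, 2]
distinct = len(set(permutations(ops)))      # distinct orderings only

# Matches the test's assertion: factorial(N * 2) >> N
assert distinct == factorial(2 * N) >> N
```

For N = 3 this gives 6!/2^3 = 90 distinct permutations, so `next_permutation` over the op array visits exactly that many orderings.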
/* Check that the flow equals its miniflow. */
assert(miniflow_get_vid(&miniflow) == vlan_tci_to_vid(flow.vlan_tci));
for (i = 0; i < FLOW_U32S; i++) {
- assert(miniflow_get(&miniflow, i) == flow_u32[i]);
+ assert(MINIFLOW_GET_TYPE(&miniflow, uint32_t, i * 4)
+ == flow_u32[i]);
}
/* Check that the miniflow equals itself. */
/* Convert cls_rule back to odp_key. */
ofpbuf_uninit(&odp_key);
ofpbuf_init(&odp_key, 0);
- odp_flow_key_from_flow(&odp_key, &flow, flow.in_port.odp_port);
+ odp_flow_key_from_flow(&odp_key, &flow, NULL,
+ flow.in_port.odp_port);
if (ofpbuf_size(&odp_key) > ODPUTIL_FLOW_KEY_BYTES) {
printf ("too long: %"PRIu32" > %d\n",
#include <config.h>
+#include <arpa/inet.h>
#include <errno.h>
#include <getopt.h>
#include <signal.h>
if (retval == sizeof hello) {
enum ofpraw raw;
- CHECK(hello.version, OFP10_VERSION);
+ CHECK(hello.version, OFP13_VERSION);
CHECK(ofpraw_decode_partial(&raw, &hello, sizeof hello), 0);
CHECK(raw, OFPRAW_OFPT_HELLO);
CHECK(ntohs(hello.length), sizeof hello);
if (retval == sizeof hello) {
enum ofpraw raw;
- CHECK(hello.version, OFP10_VERSION);
+ CHECK(hello.version, OFP13_VERSION);
CHECK(ofpraw_decode_partial(&raw, &hello, sizeof hello), 0);
CHECK(raw, OFPRAW_OFPT_HELLO);
CHECK(ntohs(hello.length), sizeof hello);
const char *type = argv[1];
struct ofpbuf *hello;
- hello = ofpraw_alloc_xid(OFPRAW_OFPT_HELLO, OFP10_VERSION,
+ hello = ofpraw_alloc_xid(OFPRAW_OFPT_HELLO, OFP13_VERSION,
htonl(0x12345678), 0);
test_send_hello(type, ofpbuf_data(hello), ofpbuf_size(hello), 0);
ofpbuf_delete(hello);
struct ofpbuf *hello;
enum { EXTRA_BYTES = 8 };
- hello = ofpraw_alloc_xid(OFPRAW_OFPT_HELLO, OFP10_VERSION,
+ hello = ofpraw_alloc_xid(OFPRAW_OFPT_HELLO, OFP13_VERSION,
htonl(0x12345678), EXTRA_BYTES);
ofpbuf_put_zeros(hello, EXTRA_BYTES);
ofpmsg_update_length(hello);
const char *type = argv[1];
struct ofpbuf *echo;
- echo = ofpraw_alloc_xid(OFPRAW_OFPT_ECHO_REQUEST, OFP10_VERSION,
+ echo = ofpraw_alloc_xid(OFPRAW_OFPT_ECHO_REQUEST, OFP13_VERSION,
htonl(0x12345678), 0);
test_send_hello(type, ofpbuf_data(echo), ofpbuf_size(echo), EPROTO);
ofpbuf_delete(echo);
const char *type = argv[1];
struct ofpbuf *hello;
- hello = ofpraw_alloc_xid(OFPRAW_OFPT_HELLO, OFP10_VERSION,
+ hello = ofpraw_alloc_xid(OFPRAW_OFPT_HELLO, OFP13_VERSION,
htonl(0x12345678), 0);
((struct ofp_header *) ofpbuf_data(hello))->version = 0;
test_send_hello(type, ofpbuf_data(hello), ofpbuf_size(hello), EPROTO);
set `expr $1 + ${3-1}` $2 $3
done
}
+
+if test "$IS_WIN32" = "yes"; then
+ pwd () {
+ command pwd -W "$@"
+ }
+
+ diff () {
+ command diff --strip-trailing-cr "$@"
+ }
+
+ kill () {
+ case "$1" in
+ -0)
+ shift
+ for i in $*; do
+ # tasklist will always have return code 0.
+ # If pid does exist, there will be a line with the pid.
+ if tasklist //fi "PID eq $i" | grep $i; then
+ :
+ else
+ return 1
+ fi
+ done
+ return 0
+ ;;
+ -[1-9]*)
+ shift
+ for i in $*; do
+ taskkill //F //PID $i
+ done
+ ;;
+        [1-9][0-9]*)
+ for i in $*; do
+ taskkill //F //PID $i
+ done
+ ;;
+ esac
+ }
+fi
]
m4_divert_pop([PREPARE_TESTS])
AT_CHECK([$PYTHON $srcdir/test-vlog.py --log-file log_file \
-v dbg module_1:info module_2:warn syslog:off 2>stderr_log])
-AT_CHECK([diff log_file stderr_log])
-
-AT_CHECK([sed -e 's/.*-.*-.*T..:..:..\....Z|//' \
+AT_CHECK([sed -e 's/.*-.*-.*T..:..:..Z |//' \
-e 's/File ".*", line [[0-9]][[0-9]]*,/File <name>, line <number>,/' \
stderr_log], [0], [dnl
-0|module_0|EMER|emergency
-1|module_0|ERR|error
-2|module_0|WARN|warning
-3|module_0|INFO|information
-4|module_0|DBG|debug
-5|module_0|EMER|emergency exception
+ 0 | module_0 | EMER | emergency
+ 1 | module_0 | ERR | error
+ 2 | module_0 | WARN | warning
+ 3 | module_0 | INFO | information
+ 4 | module_0 | DBG | debug
+ 5 | module_0 | EMER | emergency exception
Traceback (most recent call last):
File <name>, line <number>, in main
assert fail
AssertionError
-6|module_0|ERR|error exception
+ 6 | module_0 | ERR | error exception
Traceback (most recent call last):
File <name>, line <number>, in main
assert fail
AssertionError
-7|module_0|WARN|warn exception
+ 7 | module_0 | WARN | warn exception
Traceback (most recent call last):
File <name>, line <number>, in main
assert fail
AssertionError
-8|module_0|INFO|information exception
+ 8 | module_0 | INFO | information exception
Traceback (most recent call last):
File <name>, line <number>, in main
assert fail
AssertionError
-9|module_0|DBG|debug exception
+ 9 | module_0 | DBG | debug exception
Traceback (most recent call last):
File <name>, line <number>, in main
assert fail
AssertionError
-10|module_0|ERR|exception
+ 10 | module_0 | ERR | exception
Traceback (most recent call last):
File <name>, line <number>, in main
assert fail
AssertionError
-11|module_1|EMER|emergency
-12|module_1|ERR|error
-13|module_1|WARN|warning
-14|module_1|INFO|information
-16|module_1|EMER|emergency exception
+ 11 | module_1 | EMER | emergency
+ 12 | module_1 | ERR | error
+ 13 | module_1 | WARN | warning
+ 14 | module_1 | INFO | information
+ 16 | module_1 | EMER | emergency exception
Traceback (most recent call last):
File <name>, line <number>, in main
assert fail
AssertionError
-17|module_1|ERR|error exception
+ 17 | module_1 | ERR | error exception
Traceback (most recent call last):
File <name>, line <number>, in main
assert fail
AssertionError
-18|module_1|WARN|warn exception
+ 18 | module_1 | WARN | warn exception
Traceback (most recent call last):
File <name>, line <number>, in main
assert fail
AssertionError
-19|module_1|INFO|information exception
+ 19 | module_1 | INFO | information exception
Traceback (most recent call last):
File <name>, line <number>, in main
assert fail
AssertionError
-21|module_1|ERR|exception
+ 21 | module_1 | ERR | exception
Traceback (most recent call last):
File <name>, line <number>, in main
assert fail
AssertionError
-22|module_2|EMER|emergency
-23|module_2|ERR|error
-24|module_2|WARN|warning
-27|module_2|EMER|emergency exception
+ 22 | module_2 | EMER | emergency
+ 23 | module_2 | ERR | error
+ 24 | module_2 | WARN | warning
+ 27 | module_2 | EMER | emergency exception
Traceback (most recent call last):
File <name>, line <number>, in main
assert fail
AssertionError
-28|module_2|ERR|error exception
+ 28 | module_2 | ERR | error exception
Traceback (most recent call last):
File <name>, line <number>, in main
assert fail
AssertionError
-29|module_2|WARN|warn exception
+ 29 | module_2 | WARN | warn exception
Traceback (most recent call last):
File <name>, line <number>, in main
assert fail
AssertionError
-32|module_2|ERR|exception
+ 32 | module_2 | ERR | exception
Traceback (most recent call last):
File <name>, line <number>, in main
assert fail
AT_CHECK([APPCTL -t test-unixctl.py exit])
AT_CHECK([sed 's/.*|//' log.old], [0], [dnl
-Entering run loop.
-message
-message2
+ Entering run loop.
+ message
+ message2
])
AT_CHECK([sed 's/.*|//' log], [0], [dnl
-message3
+ message3
])
AT_CLEANUP
AT_CHECK([APPCTL -t test-unixctl.py log message3])
AT_CHECK([APPCTL -t test-unixctl.py exit])
AT_CHECK([sed 's/.*|//' log.old], [0], [dnl
-Entering run loop.
-message
+ Entering run loop.
+ message
])
AT_CHECK([sed 's/.*|//' log], [0], [dnl
-message3
+ message3
])
AT_CLEANUP
test-unixctl info info dbg
unixctl_server info info dbg
])
+
+AT_CHECK([APPCTL -t test-unixctl.py vlog/set pattern], [0],
+ [Please supply a valid pattern and facility
+])
+AT_CHECK([APPCTL -t test-unixctl.py vlog/set pattern:nonexistent], [0],
+ [Facility nonexistent does not exist
+])
+AT_CHECK([APPCTL -t test-unixctl.py vlog/set pattern:file:'I<3OVS|%m'])
+AT_CHECK([APPCTL -t test-unixctl.py log patterntest])
+AT_CHECK([grep -q 'I<3OVS' log])
AT_CLEANUP
man_MANS += \
utilities/ovs-appctl.8 \
utilities/ovs-benchmark.1 \
+ utilities/ovs-ctl.8 \
utilities/ovs-dpctl.8 \
utilities/ovs-dpctl-top.8 \
utilities/ovs-l3ping.8 \
utilities/ovs-test.8 \
utilities/ovs-vlan-test.8 \
utilities/ovs-vsctl.8
-dist_man_MANS += utilities/ovs-ctl.8
utilities_ovs_appctl_SOURCES = utilities/ovs-appctl.c
utilities_ovs_appctl_LDADD = lib/libopenvswitch.la
/*
- * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013 Nicira, Inc.
+ * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2014 Nicira, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
uint32_t versions;
enum ofputil_protocol version_protocols;
+ /* For now, ovs-ofctl only enables OpenFlow 1.0 by default. This is
+ * because ovs-ofctl implements command such as "add-flow" as raw OpenFlow
+ * requests, but those requests have subtly different semantics in
+ * different OpenFlow versions. For example:
+ *
+ * - In OpenFlow 1.0, a "mod-flow" operation that does not find any
+ * existing flow to modify adds a new flow.
+ *
+ * - In OpenFlow 1.1, a "mod-flow" operation that does not find any
+ * existing flow to modify adds a new flow, but only if the mod-flow
+ * did not match on the flow cookie.
+ *
+ * - In OpenFlow 1.2 and a later, a "mod-flow" operation never adds a
+ * new flow.
+ */
+ set_allowed_ofp_versions("OpenFlow10");
+
for (;;) {
unsigned long int timeout;
int c;
ds_destroy(&s);
for (i = 0; i < n_fses; i++) {
- free(fses[i].ofpacts);
+ free(CONST_CAST(struct ofpact *, fses[i].ofpacts));
}
free(fses);
struct ofputil_flow_mod *fm = &fms[i];
transact_noreply(vconn, ofputil_encode_flow_mod(fm, protocol));
- free(fm->ofpacts);
+ free(CONST_CAST(struct ofpact *, fm->ofpacts));
}
vconn_close(vconn);
}
fte_version_free(struct fte_version *version)
{
if (version) {
- free(version->ofpacts);
+ free(CONST_CAST(struct ofpact *, version->ofpacts));
free(version);
}
}
ofp_print(stdout, ofpbuf_data(msg), ofpbuf_size(msg), verbosity);
ofpbuf_delete(msg);
- free(fm->ofpacts);
+ free(CONST_CAST(struct ofpact *, fm->ofpacts));
}
}
char *name; /* Host network device name. */
struct netdev *netdev; /* Network device. */
ofp_port_t ofp_port; /* OpenFlow port number. */
+ uint64_t change_seq;
/* These members are valid only within bridge_reconfigure(). */
const char *type; /* Usually same as cfg->type. */
/* Track changes to port connectivity. */
static uint64_t connectivity_seqno = LLONG_MIN;
+/* Status update to database.
+ *
+ * Some information in the database must be kept as up-to-date as possible to
+ * allow controllers to respond rapidly to network outages. That status is
+ * updated via 'status_txn'.
+ *
+ * We use the global connectivity sequence number to detect status changes.
+ * Also, to prevent status updates from flooding the database, we check the
+ * return status of each update transaction and do not start a new update if
+ * the previous transaction's status is 'TXN_INCOMPLETE'.
+ *
+ * 'status_txn' is NULL if there is no ongoing status update.
+ */
+static struct ovsdb_idl_txn *status_txn;
+
+/* When the status update transaction returns 'TXN_INCOMPLETE', we register a
+ * timeout of 'STATUS_CHECK_AGAIN_MSEC' ms to check again. */
+#define STATUS_CHECK_AGAIN_MSEC 100
+
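The pattern described in the comment above has two parts: track a connectivity sequence number to detect change, and never start a new transaction while one is still in flight. A hedged sketch of that control flow (the transaction object and commit callback are stand-ins, not the real OVSDB IDL API):

```python
# Sketch of the status-update pattern described above.  'txn' and
# 'commit' are invented stand-ins for the OVSDB IDL transaction API.
class StatusUpdater:
    def __init__(self):
        self.txn = None        # ongoing transaction, None when idle
        self.last_seq = -1     # last connectivity seqno we reported

    def run(self, seq, commit):
        """commit(txn) returns 'INCOMPLETE' or 'SUCCESS'."""
        if self.txn is not None:
            if commit(self.txn) == "INCOMPLETE":
                return "waiting"      # check again after the timeout
            self.txn = None
        if seq == self.last_seq:
            return "unchanged"        # nothing new to report
        self.last_seq = seq
        self.txn = object()           # start a fresh update
        return "started"
```

The "waiting" branch is where the real code registers the STATUS_CHECK_AGAIN_MSEC timeout instead of spinning on the incomplete transaction.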
/* Each time this timer expires, the bridge fetches interface and mirror
* statistics and pushes them into the database. */
-#define IFACE_STATS_INTERVAL (5 * 1000) /* In milliseconds. */
-static long long int iface_stats_timer = LLONG_MIN;
+static int stats_timer_interval;
+static long long int stats_timer = LLONG_MIN;
/* Set to true to allow experimental use of OpenFlow 1.4.
* This is false initially because OpenFlow 1.4 is not yet safe to use: it can
static const char *iface_get_type(const struct ovsrec_interface *,
const struct ovsrec_bridge *);
static void iface_destroy(struct iface *);
+static void iface_destroy__(struct iface *);
static struct iface *iface_lookup(const struct bridge *, const char *name);
static struct iface *iface_find(const char *name);
static struct iface *iface_from_ofp_port(const struct bridge *,
static void iface_configure_cfm(struct iface *);
static void iface_refresh_cfm_stats(struct iface *);
static void iface_refresh_stats(struct iface *);
-static void iface_refresh_status(struct iface *);
+static void iface_refresh_netdev_status(struct iface *);
+static void iface_refresh_ofproto_status(struct iface *);
static bool iface_is_synthetic(const struct iface *);
static ofp_port_t iface_get_requested_ofp_port(
const struct ovsrec_interface *);
bridge_configure_stp(br);
bridge_configure_tables(br);
bridge_configure_dp_desc(br);
-
- if (smap_get(&br->cfg->other_config, "flow-eviction-threshold")) {
- /* XXX: Remove this warning message eventually. */
- VLOG_WARN_ONCE("As of June 2013, flow-eviction-threshold has been"
- " moved to the Open_vSwitch table. Ignoring its"
- " setting in the bridge table.");
- }
}
free(managers);
struct ofproto_port ofproto_port;
struct ofproto_port_dump dump;
+ struct sset ofproto_ports;
+ struct port *port, *port_next;
+
/* List of "ofp_port"s to delete. We make a list instead of deleting them
* right away because ofproto implementations aren't necessarily able to
* iterate through a changing list of ports in an entirely robust way. */
del = NULL;
n = allocated = 0;
+ sset_init(&ofproto_ports);
+ /* Main task: Iterate over the ports in 'br->ofproto' and remove the ports
+ * that are not configured in the database. (This commonly happens when
+ * ports have been deleted, e.g. with "ovs-vsctl del-port".)
+ *
+ * Side tasks: Reconfigure the ports that are still in 'br'. Delete ports
+ * that have the wrong OpenFlow port number (and arrange to add them back
+ * with the correct OpenFlow port number). */
OFPROTO_PORT_FOR_EACH (&ofproto_port, &dump, br->ofproto) {
ofp_port_t requested_ofp_port;
struct iface *iface;
+ sset_add(&ofproto_ports, ofproto_port.name);
+
iface = iface_lookup(br, ofproto_port.name);
if (!iface) {
/* No such iface is configured, so we should delete this
iface_destroy(iface);
del = add_ofp_port(ofproto_port.ofp_port, del, &n, &allocated);
}
-
for (i = 0; i < n; i++) {
ofproto_port_del(br->ofproto, del[i]);
}
free(del);
+
+ /* Iterate over this module's idea of interfaces in 'br'. Remove any ports
+ * that we didn't see when we iterated through the datapath, i.e. ports
+ * that disappeared underneath us. This is an unusual situation, but it
+ * can happen in some cases:
+ *
+ * - An admin runs a command like "ovs-dpctl del-port" (which is a bad
+ * idea but could happen).
+ *
+ * - The port represented a device that disappeared, e.g. a tuntap
+ * device destroyed via "tunctl -d", a physical Ethernet device
+ * whose module was just unloaded via "rmmod", or a virtual NIC for a
+ * VM whose VM was just terminated. */
+ HMAP_FOR_EACH_SAFE (port, port_next, hmap_node, &br->ports) {
+ struct iface *iface, *iface_next;
+
+ LIST_FOR_EACH_SAFE (iface, iface_next, port_elem, &port->ifaces) {
+ if (!sset_contains(&ofproto_ports, iface->name)) {
+ iface_destroy__(iface);
+ }
+ }
+
+ if (list_is_empty(&port->ifaces)) {
+ port_destroy(port);
+ }
+ }
+ sset_destroy(&ofproto_ports);
}
static void
/* Populate initial status in database. */
iface_refresh_stats(iface);
- iface_refresh_status(iface);
+ iface_refresh_netdev_status(iface);
/* Add bond fake iface if necessary. */
if (port_is_bond_fake_iface(port)) {
}
static void
-iface_refresh_status(struct iface *iface)
+iface_refresh_netdev_status(struct iface *iface)
{
struct smap smap;
enum netdev_features current;
- int64_t bps;
- int mtu;
- int64_t mtu_64;
+ enum netdev_flags flags;
+ const char *link_state;
uint8_t mac[ETH_ADDR_LEN];
- int64_t ifindex64;
- int error;
+ int64_t bps, mtu_64, ifindex64, link_resets;
+ int mtu, error;
if (iface_is_synthetic(iface)) {
return;
}
+ if (iface->change_seq == netdev_get_change_seq(iface->netdev)) {
+ return;
+ }
+
+ iface->change_seq = netdev_get_change_seq(iface->netdev);
+
smap_init(&smap);
if (!netdev_get_status(iface->netdev, &smap)) {
smap_destroy(&smap);
+ error = netdev_get_flags(iface->netdev, &flags);
+ if (!error) {
+ const char *state = flags & NETDEV_UP ? "up" : "down";
+
+ ovsrec_interface_set_admin_state(iface->cfg, state);
+ } else {
+ ovsrec_interface_set_admin_state(iface->cfg, NULL);
+ }
+
+ link_state = netdev_get_carrier(iface->netdev) ? "up" : "down";
+ ovsrec_interface_set_link_state(iface->cfg, link_state);
+
+ link_resets = netdev_get_carrier_resets(iface->netdev);
+ ovsrec_interface_set_link_resets(iface->cfg, &link_resets, 1);
+
    error = netdev_get_features(iface->netdev, &current, NULL, NULL, NULL);
bps = !error ? netdev_features_to_bps(current, 0) : 0;
if (bps) {
ovsrec_interface_set_ifindex(iface->cfg, &ifindex64, 1);
}
+static void
+iface_refresh_ofproto_status(struct iface *iface)
+{
+ struct smap smap;
+ int current, error;
+
+ if (iface_is_synthetic(iface)) {
+ return;
+ }
+
+ current = ofproto_port_is_lacp_current(iface->port->bridge->ofproto,
+ iface->ofp_port);
+ if (current >= 0) {
+ bool bl = current;
+ ovsrec_interface_set_lacp_current(iface->cfg, &bl, 1);
+ } else {
+ ovsrec_interface_set_lacp_current(iface->cfg, NULL, 0);
+ }
+
+ iface_refresh_cfm_stats(iface);
+
+ smap_init(&smap);
+ error = ofproto_port_get_bfd_status(iface->port->bridge->ofproto,
+ iface->ofp_port, &smap);
+ if (error >= 0) {
+ ovsrec_interface_set_bfd_status(iface->cfg, &smap);
+ }
+ smap_destroy(&smap);
+}
+
/* Writes 'iface''s CFM statistics to the database. 'iface' must not be
* synthetic. */
static void
{
const struct ovsrec_interface *cfg = iface->cfg;
struct ofproto_cfm_status status;
+ int error;
- if (!ofproto_port_get_cfm_status(iface->port->bridge->ofproto,
- iface->ofp_port, &status)) {
+ error = ofproto_port_get_cfm_status(iface->port->bridge->ofproto,
+ iface->ofp_port, &status);
+ if (error < 0) {
+ /* Do nothing if there is no status change since last update. */
+ } else if (error > 0) {
ovsrec_interface_set_cfm_fault(cfg, NULL, 0);
ovsrec_interface_set_cfm_fault_status(cfg, NULL, 0);
ovsrec_interface_set_cfm_remote_opstate(cfg, NULL);
ofproto_free_ofproto_controller_info(&info);
}
\f
-/* "Instant" stats.
- *
- * Some information in the database must be kept as up-to-date as possible to
- * allow controllers to respond rapidly to network outages. We call these
- * statistics "instant" stats.
- *
- * We wish to update these statistics every INSTANT_INTERVAL_MSEC milliseconds,
- * assuming that they've changed. The only means we have to determine whether
- * they have changed are:
- *
- * - Try to commit changes to the database. If nothing changed, then
- * ovsdb_idl_txn_commit() returns TXN_UNCHANGED, otherwise some other
- * value.
- *
- * - instant_stats_run() is called late in the run loop, after anything that
- * might change any of the instant stats.
- *
- * We use these two facts together to avoid waking the process up every
- * INSTANT_INTERVAL_MSEC whether there is any change or not.
- */
-
-/* Minimum interval between writing updates to the instant stats to the
- * database. */
-#define INSTANT_INTERVAL_MSEC 100
-
-/* Current instant stats database transaction, NULL if there is no ongoing
- * transaction. */
-static struct ovsdb_idl_txn *instant_txn;
-
-/* Next time (in msec on monotonic clock) at which we will update the instant
- * stats. */
-static long long int instant_next_txn = LLONG_MIN;
-
-/* True if the run loop has run since we last saw that the instant stats were
- * unchanged, that is, this is true if we need to wake up at 'instant_next_txn'
- * to refresh the instant stats. */
-static bool instant_stats_could_have_changed;
-
-static void
-instant_stats_run(void)
-{
- enum ovsdb_idl_txn_status status;
-
- instant_stats_could_have_changed = true;
-
- if (!instant_txn) {
- struct bridge *br;
- uint64_t seq;
-
- if (time_msec() < instant_next_txn) {
- return;
- }
- instant_next_txn = time_msec() + INSTANT_INTERVAL_MSEC;
-
- seq = seq_read(connectivity_seq_get());
- if (seq == connectivity_seqno) {
- return;
- }
- connectivity_seqno = seq;
-
- instant_txn = ovsdb_idl_txn_create(idl);
- HMAP_FOR_EACH (br, node, &all_bridges) {
- struct iface *iface;
- struct port *port;
-
- br_refresh_stp_status(br);
-
- HMAP_FOR_EACH (port, hmap_node, &br->ports) {
- port_refresh_stp_status(port);
- }
-
- HMAP_FOR_EACH (iface, name_node, &br->iface_by_name) {
- enum netdev_flags flags;
- struct smap smap;
- const char *link_state;
- int64_t link_resets;
- int current, error;
-
- if (iface_is_synthetic(iface)) {
- continue;
- }
-
- current = ofproto_port_is_lacp_current(br->ofproto,
- iface->ofp_port);
- if (current >= 0) {
- bool bl = current;
- ovsrec_interface_set_lacp_current(iface->cfg, &bl, 1);
- } else {
- ovsrec_interface_set_lacp_current(iface->cfg, NULL, 0);
- }
-
- error = netdev_get_flags(iface->netdev, &flags);
- if (!error) {
- const char *state = flags & NETDEV_UP ? "up" : "down";
- ovsrec_interface_set_admin_state(iface->cfg, state);
- } else {
- ovsrec_interface_set_admin_state(iface->cfg, NULL);
- }
-
- link_state = netdev_get_carrier(iface->netdev) ? "up" : "down";
- ovsrec_interface_set_link_state(iface->cfg, link_state);
-
- link_resets = netdev_get_carrier_resets(iface->netdev);
- ovsrec_interface_set_link_resets(iface->cfg, &link_resets, 1);
-
- iface_refresh_cfm_stats(iface);
-
- smap_init(&smap);
- ofproto_port_get_bfd_status(br->ofproto, iface->ofp_port,
- &smap);
- ovsrec_interface_set_bfd_status(iface->cfg, &smap);
- smap_destroy(&smap);
- }
- }
- }
-
- status = ovsdb_idl_txn_commit(instant_txn);
- if (status != TXN_INCOMPLETE) {
- ovsdb_idl_txn_destroy(instant_txn);
- instant_txn = NULL;
- }
- if (status == TXN_UNCHANGED) {
- instant_stats_could_have_changed = false;
- }
-}
-
-static void
-instant_stats_wait(void)
-{
- if (instant_txn) {
- ovsdb_idl_txn_wait(instant_txn);
- } else if (instant_stats_could_have_changed) {
- poll_timer_wait_until(instant_next_txn);
- }
-}
-\f
static void
bridge_run__(void)
{
bool vlan_splinters_changed;
struct bridge *br;
+ int stats_interval;
ovsrec_open_vswitch_init(&null_cfg);
}
}
+        /* The statistics update interval must be at least 5000 ms. */
+ stats_interval = MAX(smap_get_int(&cfg->other_config,
+ "stats-update-interval", 5000), 5000);
+ if (stats_timer_interval != stats_interval) {
+ stats_timer_interval = stats_interval;
+ stats_timer = LLONG_MIN;
+ }
+
/* Refresh interface and mirror stats if necessary. */
- if (time_msec() >= iface_stats_timer) {
+ if (time_msec() >= stats_timer) {
if (cfg) {
struct ovsdb_idl_txn *txn;
LIST_FOR_EACH (iface, port_elem, &port->ifaces) {
iface_refresh_stats(iface);
- iface_refresh_status(iface);
}
port_refresh_stp_stats(port);
ovsdb_idl_txn_destroy(txn); /* XXX */
}
- iface_stats_timer = time_msec() + IFACE_STATS_INTERVAL;
+ stats_timer = time_msec() + stats_timer_interval;
+ }
+
+ if (!status_txn) {
+ uint64_t seq;
+
+ /* Check the need to update status. */
+ seq = seq_read(connectivity_seq_get());
+ if (seq != connectivity_seqno) {
+ connectivity_seqno = seq;
+ status_txn = ovsdb_idl_txn_create(idl);
+ HMAP_FOR_EACH (br, node, &all_bridges) {
+ struct port *port;
+
+ br_refresh_stp_status(br);
+ HMAP_FOR_EACH (port, hmap_node, &br->ports) {
+ struct iface *iface;
+
+ port_refresh_stp_status(port);
+ LIST_FOR_EACH (iface, port_elem, &port->ifaces) {
+ iface_refresh_netdev_status(iface);
+ iface_refresh_ofproto_status(iface);
+ }
+ }
+ }
+ }
+ }
+
+ if (status_txn) {
+ enum ovsdb_idl_txn_status status;
+
+ status = ovsdb_idl_txn_commit(status_txn);
+ /* Do not destroy "status_txn" if the transaction is
+ * "TXN_INCOMPLETE". */
+ if (status != TXN_INCOMPLETE) {
+ ovsdb_idl_txn_destroy(status_txn);
+ status_txn = NULL;
+ }
}
run_system_stats();
- instant_stats_run();
}
void
HMAP_FOR_EACH (br, node, &all_bridges) {
ofproto_wait(br->ofproto);
}
- poll_timer_wait_until(iface_stats_timer);
+
+ poll_timer_wait_until(stats_timer);
+ }
+
+    /* If the status database transaction is 'TXN_INCOMPLETE' in this run,
+     * register a timeout in 'STATUS_CHECK_AGAIN_MSEC'.  Otherwise, wait on the
+     * global connectivity sequence number.  Note that this also helps batch
+     * multiple status changes into one transaction. */
+ if (status_txn) {
+ poll_timer_wait_until(time_msec() + STATUS_CHECK_AGAIN_MSEC);
+ } else {
+ seq_wait(connectivity_seq_get(), connectivity_seqno);
}
system_stats_wait();
- instant_stats_wait();
}
/* Adds some memory usage statistics for bridges into 'usage', for use with
\f
/* Port functions. */
-static void iface_destroy__(struct iface *);
-
static struct port *
port_create(struct bridge *br, const struct ovsrec_port *cfg)
{
ovs_mutex_lock(&mutex);
if (enable) {
if (!started) {
- xpthread_create(NULL, NULL, system_stats_thread_func, NULL);
+ ovs_thread_create("system_stats",
+ system_stats_thread_func, NULL);
latch_init(&latch);
started = true;
}
host as displayed by <code>xe host-list</code>.
</column>
+ <column name="other_config" key="stats-update-interval"
+ type='{"type": "integer", "minInteger": 5000}'>
+          <p>
+            The interval, in milliseconds, at which statistics are written to
+            the database.  This setting controls updates to the
+            <code>statistics</code> column in the <code>Port</code>,
+            <code>Interface</code>, and <code>Mirror</code> tables.
+          </p>
+          <p>
+            The default is 5000 ms.
+          </p>
+          <p>
+            Statistics can be polled more frequently via OpenFlow.
+          </p>
+ </column>
+
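The interval above can be adjusted at runtime with ovs-vsctl; a minimal sketch, in which the bridge name <code>br0</code> is a placeholder:

```shell
# Raise the statistics refresh period to 10 seconds; values below
# 5000 ms are clamped up to 5000 ms by ovs-vswitchd.
ovs-vsctl set Open_vSwitch . other_config:stats-update-interval=10000

# For finer-grained counters, poll OpenFlow directly instead of the
# database, e.g. per-port counters on a hypothetical bridge "br0":
ovs-ofctl dump-ports br0
```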
<column name="other_config" key="flow-restore-wait"
type='{"type": "boolean"}'>
<p>
<column name="protocols">
<p>
- List of OpenFlow protocols that may be used when negotiating a
- connection with a controller. A default value of
- <code>OpenFlow10</code> will be used if this column is empty.
+ List of OpenFlow protocols that may be used when negotiating
+ a connection with a controller. OpenFlow 1.0, 1.1, 1.2, and
+ 1.3 are enabled by default if this column is empty.
</p>
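For instance, a bridge can be restricted to a single protocol version from the command line; a sketch, with <code>br0</code> as a placeholder bridge name:

```shell
# Allow only OpenFlow 1.3 when negotiating with controllers:
ovs-vsctl set Bridge br0 protocols=OpenFlow13

# Inspect the current setting:
ovs-vsctl get Bridge br0 protocols
```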
<p>
<group title="Port Statistics">
<p>
- Key-value pairs that report port statistics.
+ Key-value pairs that report port statistics. The update period
+ is controlled by <ref column="other_config"
+ key="stats-update-interval"/> in the <code>Open_vSwitch</code> table.
</p>
<group title="Statistics: STP transmit and receive counters">
<column name="statistics" key="stp_tx_count">
<group title="Statistics">
<p>
Key-value pairs that report interface statistics. The current
- implementation updates these counters periodically. Future
- implementations may update them when an interface is created, when they
- are queried (e.g. using an OVSDB <code>select</code> operation), and
- just before an interface is deleted due to virtual interface hot-unplug
- or VM shutdown, and perhaps at other times, but not on any regular
- periodic basis.
+ implementation updates these counters periodically. The update period
+ is controlled by <ref column="other_config"
+ key="stats-update-interval"/> in the <code>Open_vSwitch</code> table.
+ Future implementations may update them when an interface is created,
+ when they are queried (e.g. using an OVSDB <code>select</code>
+ operation), and just before an interface is deleted due to virtual
+ interface hot-unplug or VM shutdown, and perhaps at other times, but
+ not on any regular periodic basis.
</p>
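The periodically refreshed counters described above can be read back from the database; a sketch, where <code>eth0</code> stands in for a real interface name:

```shell
# Dump only the statistics column for one interface:
ovs-vsctl get Interface eth0 statistics

# Or list every column, statistics included:
ovs-vsctl list Interface eth0
```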
<p>
These are the same statistics reported by OpenFlow in its <code>struct
</column>
<column name="bfd" key="forwarding_if_rx" type='{"type": "boolean"}'>
- True to consider the interface capable of packet I/O as long as it
- continues to receive any packets (not just BFD packets). This
- prevents link congestion that causes consecutive BFD control packets
- to be lost from marking the interface down.
+        When <code>true</code>, traffic received on the
+        <ref table="Interface"/> is used to indicate that the interface is
+        capable of packet I/O.  BFD control packets are still transmitted
+        and received.  At least one BFD control packet must be received
+        every 100 * <ref column="bfd" key="min_rx"/> amount of time.
+        Otherwise, even if traffic is received, <ref column="bfd"
+        key="forwarding"/> will be <code>false</code>.
</column>
<column name="bfd" key="cpath_down" type='{"type": "boolean"}'>
<ref column="other_config" key="cfm_extended"/> is true, the CFM
module operates in demand mode. When in demand mode, traffic
received on the <ref table="Interface"/> is used to indicate
- liveness. CCMs are still transmitted and received, but if the
- <ref table="Interface"/> is receiving traffic, their absence does not
- cause a connectivity fault.
+      liveness.  CCMs are still transmitted and received.  At least one
+      CCM must be received every 100 * <ref column="other_config"
+      key="cfm_interval"/> amount of time.  Otherwise, even if traffic
+      is received, the CFM module will raise a connectivity fault.
</p>
<p>
<group title="Statistics: Mirror counters">
<p>
- Key-value pairs that report mirror statistics.
+ Key-value pairs that report mirror statistics. The update period
+ is controlled by <ref column="other_config"
+ key="stats-update-interval"/> in the <code>Open_vSwitch</code> table.
</p>
<column name="statistics" key="tx_packets">
Number of packets transmitted through this mirror.
{
"name": "hardware_vtep",
- "cksum": "3096797177 6063",
+ "cksum": "1687941026 6625",
"tables": {
"Global": {
"columns": {
"type": {"key": {"type": "uuid",
"refTable": "Manager"},
"min": 0, "max": "unlimited"}},
- "switches": {
- "type": {"key": {"type": "uuid", "refTable": "Physical_Switch"},
- "min": 0, "max": "unlimited"}}
+ "switches": {
+ "type": {"key": {"type": "uuid", "refTable": "Physical_Switch"},
+ "min": 0, "max": "unlimited"}}
},
"maxRows": 1,
"isRoot": true},
"Physical_Switch": {
"columns": {
- "ports": {
- "type": {"key": {"type": "uuid", "refTable": "Physical_Port"},
- "min": 0, "max": "unlimited"}},
+ "ports": {
+ "type": {"key": {"type": "uuid", "refTable": "Physical_Port"},
+ "min": 0, "max": "unlimited"}},
"name": {"type": "string"},
"description": {"type": "string"},
"management_ips": {
- "type": {"key": {"type": "string"}, "min": 0, "max": "unlimited"}},
+ "type": {"key": {"type": "string"}, "min": 0, "max": "unlimited"}},
"tunnel_ips": {
- "type": {"key": {"type": "string"}, "min": 0, "max": "unlimited"}},
+ "type": {"key": {"type": "string"}, "min": 0, "max": "unlimited"}},
"switch_fault_status": {
"type": {
"key": "string", "min": 0, "max": "unlimited"},
- "ephemeral": true}},
+ "ephemeral": true}},
"indexes": [["name"]]},
"Physical_Port": {
"columns": {
"name": {"type": "string"},
"description": {"type": "string"},
- "vlan_bindings": {
- "type": {"key": {"type": "integer",
- "minInteger": 0, "maxInteger": 4095},
- "value": {"type": "uuid", "refTable": "Logical_Switch"},
- "min": 0, "max": "unlimited"}},
+ "vlan_bindings": {
+ "type": {"key": {"type": "integer",
+ "minInteger": 0, "maxInteger": 4095},
+ "value": {"type": "uuid", "refTable": "Logical_Switch"},
+ "min": 0, "max": "unlimited"}},
"vlan_stats": {
- "type": {"key": {"type": "integer",
- "minInteger": 0, "maxInteger": 4095},
- "value": {"type": "uuid",
- "refTable": "Logical_Binding_Stats"},
- "min": 0, "max": "unlimited"}},
+ "type": {"key": {"type": "integer",
+ "minInteger": 0, "maxInteger": 4095},
+ "value": {"type": "uuid",
+ "refTable": "Logical_Binding_Stats"},
+ "min": 0, "max": "unlimited"}},
"port_fault_status": {
"type": {
"key": "string", "min": 0, "max": "unlimited"},
- "ephemeral": true}}},
+ "ephemeral": true}}},
"Logical_Binding_Stats": {
"columns": {
"bytes_from_local": {"type": "integer"},
"columns": {
"name": {"type": "string"},
"description": {"type": "string"},
- "tunnel_key": {"type": {"key": "integer", "min": 0, "max": 1}}},
+ "tunnel_key": {"type": {"key": "integer", "min": 0, "max": 1}}},
"isRoot": true,
"indexes": [["name"]]},
"Ucast_Macs_Local": {
"columns": {
"MAC": {"type": "string"},
- "logical_switch": {
+ "logical_switch": {
"type": {"key": {"type": "uuid",
- "refTable": "Logical_Switch"}}},
- "locator": {
+ "refTable": "Logical_Switch"}}},
+ "locator": {
"type": {"key": {"type": "uuid",
- "refTable": "Physical_Locator"}}},
+ "refTable": "Physical_Locator"}}},
"ipaddr": {"type": "string"}},
"isRoot": true},
"Ucast_Macs_Remote": {
"columns": {
"MAC": {"type": "string"},
- "logical_switch": {
+ "logical_switch": {
"type": {"key": {"type": "uuid",
- "refTable": "Logical_Switch"}}},
- "locator": {
+ "refTable": "Logical_Switch"}}},
+ "locator": {
"type": {"key": {"type": "uuid",
- "refTable": "Physical_Locator"}}},
+ "refTable": "Physical_Locator"}}},
"ipaddr": {"type": "string"}},
"isRoot": true},
"Mcast_Macs_Local": {
"columns": {
"MAC": {"type": "string"},
- "logical_switch": {
+ "logical_switch": {
"type": {"key": {"type": "uuid",
- "refTable": "Logical_Switch"}}},
- "locator_set": {
+ "refTable": "Logical_Switch"}}},
+ "locator_set": {
"type": {"key": {"type": "uuid",
- "refTable": "Physical_Locator_Set"}}},
+ "refTable": "Physical_Locator_Set"}}},
"ipaddr": {"type": "string"}},
"isRoot": true},
"Mcast_Macs_Remote": {
"columns": {
"MAC": {"type": "string"},
- "logical_switch": {
+ "logical_switch": {
"type": {"key": {"type": "uuid",
- "refTable": "Logical_Switch"}}},
- "locator_set": {
+ "refTable": "Logical_Switch"}}},
+ "locator_set": {
"type": {"key": {"type": "uuid",
- "refTable": "Physical_Locator_Set"}}},
+ "refTable": "Physical_Locator_Set"}}},
"ipaddr": {"type": "string"}},
"isRoot": true},
"Logical_Router": {
"columns": {
"name": {"type": "string"},
"description": {"type": "string"},
- "switch_binding": {
- "type": {"key": {"type": "string"},
- "value": {"type": "uuid",
+ "switch_binding": {
+ "type": {"key": {"type": "string"},
+ "value": {"type": "uuid",
"refTable": "Logical_Switch"},
- "min": 0, "max": "unlimited"}},
- "static_routes": {
- "type": {"key": {"type": "string"},
- "value": {"type" : "string"},
- "min": 0, "max": "unlimited"}}},
+ "min": 0, "max": "unlimited"}},
+ "static_routes": {
+ "type": {"key": {"type": "string"},
+ "value": {"type" : "string"},
+ "min": 0, "max": "unlimited"}}},
"isRoot": true,
"indexes": [["name"]]},
"Arp_Sources_Local": {
"Physical_Locator_Set": {
"columns": {
"locators": {
- "type": {"key": {"type": "uuid", "refTable": "Physical_Locator"},
- "min": 1, "max": "unlimited"},
- "mutable": false}}},
+ "type": {"key": {"type": "uuid", "refTable": "Physical_Locator"},
+ "min": 1, "max": "unlimited"},
+ "mutable": false}}},
"Physical_Locator": {
"columns": {
"encapsulation_type": {
"key": {
"enum": ["set", ["vxlan_over_ipv4"]],
"type": "string"}},
- "mutable": false},
+ "mutable": false},
"dst_ip": {"type": "string", "mutable": false},
- "bfd": {
+ "bfd": {
"type": {"key": "string", "value": "string",
"min": 0, "max": "unlimited"}},
- "bfd_status": {
+ "bfd_status": {
"type": {"key": "string", "value": "string",
"min": 0, "max": "unlimited"}}},
"indexes": [["encapsulation_type", "dst_ip"]]},
(not a DNS name).
</p>
<p>
- SSL key and certificate configuration happens outside the
- database.
+ SSL key and certificate configuration happens outside the
+ database.
</p>
</dd>
<group title="Identification">
<column name="name">
- Symbolic name for the switch, such as its hostname.
+ Symbolic name for the switch, such as its hostname.
</column>
<column name="description">
- An extended description for the switch, such as its switch login
- banner.
+ An extended description for the switch, such as its switch login
+ banner.
</column>
</group>
<group title="Error Notification">
<p>
- An entry in this column indicates to the NVC that this switch
- has encountered a fault. The switch must clear this column
- when the fault has been cleared.
+ An entry in this column indicates to the NVC that this switch
+ has encountered a fault. The switch must clear this column
+ when the fault has been cleared.
</p>
<column name="switch_fault_status" key="mac_table_exhaustion">
<group title="Identification">
<column name="name">
- Symbolic name for the port. The name ought to be unique within a given
- <ref table="Physical_Switch"/>, but the database is not capable of
- enforcing this.
+ Symbolic name for the port. The name ought to be unique within a given
+ <ref table="Physical_Switch"/>, but the database is not capable of
+ enforcing this.
</column>
<column name="description">
- An extended description for the port.
+ An extended description for the port.
</column>
</group>
<group title="Error Notification">
<p>
- An entry in this column indicates to the NVC that the physical port has
- encountered a fault. The switch must clear this column when the errror
- has been cleared.
+ An entry in this column indicates to the NVC that the physical port has
+          encountered a fault.  The switch must clear this column when the error
+ has been cleared.
</p>
<column name="port_fault_status" key="invalid_vlan_map">
- <p>
- Indicates that a VLAN-to-logical-switch mapping requested by
- the controller could not be instantiated by the switch
- because of a conflict with local configuration.
- </p>
+ <p>
+ Indicates that a VLAN-to-logical-switch mapping requested by
+ the controller could not be instantiated by the switch
+ because of a conflict with local configuration.
+ </p>
</column>
<column name="port_fault_status" key="unspecified_fault">
- <p>
- Indicates that an error has occurred on the port but that no
- more specific information is available.
- </p>
+ <p>
+ Indicates that an error has occurred on the port but that no
+ more specific information is available.
+ </p>
</column>
</group>
<group title="Identification">
<column name="name">
- Symbolic name for the logical switch.
+ Symbolic name for the logical switch.
</column>
<column name="description">
- An extended description for the logical switch, such as its switch
- login banner.
+ An extended description for the logical switch, such as its switch
+ login banner.
</column>
</group>
</table>
<column name="MAC">
<p>
- A MAC address that has been learned by the VTEP.
+ A MAC address that has been learned by the VTEP.
</p>
<p>
- The keyword <code>unknown-dst</code> is used as a special
- ``Ethernet address'' that indicates the locations to which
- packets in a logical switch whose destination addresses do not
- otherwise appear in <ref table="Ucast_Macs_Local"/> (for
- unicast addresses) or <ref table="Mcast_Macs_Local"/> (for
- multicast addresses) should be sent.
+ The keyword <code>unknown-dst</code> is used as a special
+ ``Ethernet address'' that indicates the locations to which
+ packets in a logical switch whose destination addresses do not
+ otherwise appear in <ref table="Ucast_Macs_Local"/> (for
+ unicast addresses) or <ref table="Mcast_Macs_Local"/> (for
+ multicast addresses) should be sent.
</p>
</column>
<column name="MAC">
<p>
- A MAC address that has been learned by the NVC.
+ A MAC address that has been learned by the NVC.
</p>
<p>
- The keyword <code>unknown-dst</code> is used as a special
- ``Ethernet address'' that indicates the locations to which
- packets in a logical switch whose destination addresses do not
- otherwise appear in <ref table="Ucast_Macs_Remote"/> (for
- unicast addresses) or <ref table="Mcast_Macs_Remote"/> (for
- multicast addresses) should be sent.
+ The keyword <code>unknown-dst</code> is used as a special
+ ``Ethernet address'' that indicates the locations to which
+ packets in a logical switch whose destination addresses do not
+ otherwise appear in <ref table="Ucast_Macs_Remote"/> (for
+ unicast addresses) or <ref table="Mcast_Macs_Remote"/> (for
+ multicast addresses) should be sent.
</p>
</column>
<group title="Identification">
<column name="name">
- Symbolic name for the logical router.
+ Symbolic name for the logical router.
</column>
<column name="description">
- An extended description for the logical router.
+ An extended description for the logical router.
</column>
</group>
</table>
<group title="Bidirectional Forwarding Detection (BFD)">
<p>
- BFD, defined in RFC 5880, allows point to point detection of
- connectivity failures by occasional transmission of BFD control
- messages. VTEPs are expected to implement BFD.
+        BFD, defined in RFC 5880, allows point-to-point detection of
+ connectivity failures by occasional transmission of BFD control
+ messages. VTEPs are expected to implement BFD.
</p>
<p>
- BFD operates by regularly transmitting BFD control messages at a
- rate negotiated independently in each direction. Each endpoint
- specifies the rate at which it expects to receive control messages,
- and the rate at which it's willing to transmit them. An endpoint
- which fails to receive BFD control messages for a period of three
- times the expected reception rate will signal a connectivity
- fault. In the case of a unidirectional connectivity issue, the
- system not receiving BFD control messages will signal the problem
- to its peer in the messages it transmits.
+ BFD operates by regularly transmitting BFD control messages at a
+ rate negotiated independently in each direction. Each endpoint
+ specifies the rate at which it expects to receive control messages,
+ and the rate at which it's willing to transmit them. An endpoint
+ which fails to receive BFD control messages for a period of three
+ times the expected reception rate will signal a connectivity
+ fault. In the case of a unidirectional connectivity issue, the
+ system not receiving BFD control messages will signal the problem
+ to its peer in the messages it transmits.
</p>
<p>
- A hardware VTEP is expected to use BFD to determine reachability of
- devices at the end of the tunnels with which it exchanges data. This
- can enable the VTEP to choose a functioning service node among a set of
- service nodes providing high availability. It also enables the NVC to
- report the health status of tunnels.
+ A hardware VTEP is expected to use BFD to determine reachability of
+ devices at the end of the tunnels with which it exchanges data. This
+ can enable the VTEP to choose a functioning service node among a set of
+ service nodes providing high availability. It also enables the NVC to
+ report the health status of tunnels.
</p>
<p>
- In most cases the BFD peer of a hardware VTEP will be an Open vSwitch
- instance. The Open vSwitch implementation of BFD aims to comply
- faithfully with the requirements put forth in RFC 5880. Open vSwitch
- does not implement the optional Authentication or ``Echo Mode''
- features.
+ In most cases the BFD peer of a hardware VTEP will be an Open vSwitch
+ instance. The Open vSwitch implementation of BFD aims to comply
+ faithfully with the requirements put forth in RFC 5880. Open vSwitch
+ does not implement the optional Authentication or ``Echo Mode''
+ features.
</p>
<group title="BFD Configuration">
- <p>
- A controller sets up key-value pairs in the <ref column="bfd"/>
- column to enable and configure BFD.
+ <p>
+ A controller sets up key-value pairs in the <ref column="bfd"/>
+ column to enable and configure BFD.
</p>
<column name="bfd" key="enable" type='{"type": "boolean"}'>
<code>00:23:20:00:00:01</code>.
</column>
- <column name="bfd" key="bfd_src_ip">
+ <column name="bfd" key="bfd_src_ip">
Set to an IPv4 address to set the IP address used as source for
transmitted BFD packets. The default is <code>169.254.1.0</code>.
- </column>
+ </column>
- <column name="bfd" key="bfd_dst_ip">
+ <column name="bfd" key="bfd_dst_ip">
Set to an IPv4 address to set the IP address used as destination
for transmitted BFD packets. The default is <code>169.254.1.1</code>.
- </column>
+ </column>
</group>
<group title="BFD Status">
- <p>
- The VTEP sets key-value pairs in the <ref column="bfd_status"/>
- column to report the status of BFD on this interface. When BFD is
- not enabled, with <ref column="bfd" key="enable"/>, the switch clears
- all key-value pairs from <ref column="bfd_status"/>.
- </p>
-
- <column name="bfd_status" key="state"
- type='{"type": "string",
- "enum": ["set", ["admin_down", "down", "init", "up"]]}'>
- Reports the state of the BFD session. The BFD session is fully
- healthy and negotiated if <code>UP</code>.
- </column>
-
- <column name="bfd_status" key="forwarding" type='{"type": "boolean"}'>
- Reports whether the BFD session believes this <ref
- table="Physical_Locator"/> may be used to forward traffic. Typically
- this means the local session is signaling <code>UP</code>, and the
- remote system isn't signaling a problem such as concatenated path
- down.
- </column>
-
- <column name="bfd_status" key="diagnostic">
- In case of a problem, set to a short message that reports what the
- local BFD session thinks is wrong.
- </column>
-
- <column name="bfd_status" key="remote_state"
- type='{"type": "string",
- "enum": ["set", ["admin_down", "down", "init", "up"]]}'>
- Reports the state of the remote endpoint's BFD session.
- </column>
-
- <column name="bfd_status" key="remote_diagnostic">
- In case of a problem, set to a short message that reports what the
- remote endpoint's BFD session thinks is wrong.
- </column>
+ <p>
+ The VTEP sets key-value pairs in the <ref column="bfd_status"/>
+ column to report the status of BFD on this interface. When BFD is
+ not enabled, with <ref column="bfd" key="enable"/>, the switch clears
+ all key-value pairs from <ref column="bfd_status"/>.
+ </p>
+
+ <column name="bfd_status" key="state"
+ type='{"type": "string",
+ "enum": ["set", ["admin_down", "down", "init", "up"]]}'>
+ Reports the state of the BFD session. The BFD session is fully
+ healthy and negotiated if <code>UP</code>.
+ </column>
+
+ <column name="bfd_status" key="forwarding" type='{"type": "boolean"}'>
+ Reports whether the BFD session believes this <ref
+ table="Physical_Locator"/> may be used to forward traffic. Typically
+ this means the local session is signaling <code>UP</code>, and the
+ remote system isn't signaling a problem such as concatenated path
+ down.
+ </column>
+
+ <column name="bfd_status" key="diagnostic">
+ In case of a problem, set to a short message that reports what the
+ local BFD session thinks is wrong.
+ </column>
+
+ <column name="bfd_status" key="remote_state"
+ type='{"type": "string",
+ "enum": ["set", ["admin_down", "down", "init", "up"]]}'>
+ Reports the state of the remote endpoint's BFD session.
+ </column>
+
+ <column name="bfd_status" key="remote_diagnostic">
+ In case of a problem, set to a short message that reports what the
+ remote endpoint's BFD session thinks is wrong.
+ </column>
</group>
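Assuming vtep-ctl's generic database commands are available (they follow the same conventions as ovs-vsctl), the reported BFD state can be inspected from the shell; the locator IP <code>10.0.0.5</code> below is a placeholder:

```shell
# Find the Physical_Locator record for a tunnel endpoint and show
# the BFD session status the VTEP reports for it:
vtep-ctl --columns=dst_ip,bfd_status find Physical_Locator dst_ip=10.0.0.5
```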
</group>
</table>