sliver-openvswitch.git
10 years agoclassifier: Use array for subtables instead of a list.
Jarno Rajahalme [Tue, 29 Apr 2014 22:50:38 +0000 (15:50 -0700)]
classifier: Use array for subtables instead of a list.

Using a linear array allows more efficient memory access for lookups.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agolib: Add prefetch support (for GCC)
Jarno Rajahalme [Tue, 29 Apr 2014 22:50:38 +0000 (15:50 -0700)]
lib: Add prefetch support (for GCC)

Define OVS_PREFETCH() and OVS_PREFETCH_WRITE() using builtin prefetch
for GCC, and ovs_prefetch_range() for prefetching a range of addresses.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
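
A minimal sketch of the kind of wrappers the commit above describes, built on
GCC's __builtin_prefetch().  The SKETCH_* names, the 64-byte cache line size,
and the returned count are assumptions for illustration; the commit's actual
definitions of OVS_PREFETCH(), OVS_PREFETCH_WRITE() and ovs_prefetch_range()
are not shown on this page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the commit's macros.  On compilers without
 * the builtin they only evaluate the address. */
#if defined(__GNUC__)
#define SKETCH_PREFETCH(addr)       __builtin_prefetch((addr))
#define SKETCH_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
#else
#define SKETCH_PREFETCH(addr)       ((void) (addr))
#define SKETCH_PREFETCH_WRITE(addr) ((void) (addr))
#endif

/* Assumed cache line size; real code would take this from the build
 * configuration. */
#define SKETCH_CACHE_LINE_SIZE 64

/* Issues one read prefetch per assumed cache line in
 * [start, start + size) and returns the number of prefetches issued. */
static inline size_t
sketch_prefetch_range(const void *start, size_t size)
{
    const char *p = start;
    size_t n = 0;
    size_t ofs;

    assert(start != NULL || size == 0);
    for (ofs = 0; ofs < size; ofs += SKETCH_CACHE_LINE_SIZE) {
        SKETCH_PREFETCH(p + ofs);
        n++;
    }
    return n;
}
```

A prefetch is only a hint, so the sketch has no observable effect beyond the
count it returns, which is there purely to make the loop testable.
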
10 years agolib/flow: Optimize minimask_has_extra() and minimask_is_catchall()
Jarno Rajahalme [Tue, 29 Apr 2014 22:50:38 +0000 (15:50 -0700)]
lib/flow: Optimize minimask_has_extra() and minimask_is_catchall()

We only need to iterate over the bits masked by the 'b' in
minimask_has_extra(), since for zeroes in 'b' there can be no 'extra'
wildcards in 'a', as 'b' has already wildcarded all the bits.

minimask_is_catchall() can be simplified by the invariant that mask's
map never has 1-bits for all-zero values.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
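
The reasoning in the commit above can be sketched on a simplified compressed
mask.  The struct layout and all sketch_* names are assumptions made for this
example, not the real struct minimask; the bit-counting builtins are
GCC/Clang specific.  A 1-bit in a mask word means "match exactly" and a 0-bit
means "wildcard".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a compressed mask: bit i of 'map' is set only
 * if 32-bit word i of the full mask is nonzero, and 'values' stores just
 * those nonzero words in ascending index order. */
struct sketch_minimask {
    uint64_t map;
    const uint32_t *values;
};

/* Returns word 'idx' of the full mask, or 0 if it is not stored. */
static uint32_t
sketch_get(const struct sketch_minimask *m, unsigned int idx)
{
    uint64_t bit;

    assert(idx < 64);
    bit = UINT64_C(1) << idx;
    if (!(m->map & bit)) {
        return 0;
    }
    /* The word's position in 'values' is the number of lower map bits. */
    return m->values[__builtin_popcountll(m->map & (bit - 1))];
}

/* Returns true if 'a' wildcards at least one bit that 'b' matches on.
 * Only words present in b's map are visited: where 'b' is all-zero it
 * already wildcards everything, so 'a' cannot be "extra" there. */
static bool
sketch_has_extra(const struct sketch_minimask *a,
                 const struct sketch_minimask *b)
{
    const uint32_t *bp = b->values;
    uint64_t map;

    for (map = b->map; map; map &= map - 1) {
        unsigned int idx = __builtin_ctzll(map);
        uint32_t b_word = *bp++;

        if ((sketch_get(a, idx) & b_word) != b_word) {
            return true;
        }
    }
    return false;
}

/* Given the invariant that 'map' never has a 1-bit for an all-zero word,
 * a mask wildcards everything exactly when its map is empty. */
static bool
sketch_is_catchall(const struct sketch_minimask *m)
{
    return m->map == 0;
}
```
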
10 years agolib/classifier: Hide more of the internal data structures.
Jarno Rajahalme [Tue, 29 Apr 2014 22:50:38 +0000 (15:50 -0700)]
lib/classifier: Hide more of the internal data structures.

It is better not to expose definitions not needed by users.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Use classifer cursor API to collect vlan usage.
Jarno Rajahalme [Tue, 29 Apr 2014 22:50:38 +0000 (15:50 -0700)]
ofproto: Use classifer cursor API to collect vlan usage.

This was the only place in OVS code that accessed classifier internal
data structures directly.  Use the classifier cursor API instead, so
that following patches can hide classifier internal data structures.

Note: There seems to be no test case to verify that this vlan usage
collection is implemented correctly.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agolib: Inline functions used in classifier_lookup.
Jarno Rajahalme [Tue, 29 Apr 2014 22:50:38 +0000 (15:50 -0700)]
lib: Inline functions used in classifier_lookup.

This helps about 1% in TCP_CRR performance test.  However, this also
helps by clearly showing the classifier_lookup() cost in perf reports
as one item.

This also cleans up the flow/match APIs from functionality only used
by the classifier, making it more straightforward to evolve them
later.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agolib/flow: Simplify miniflow accessors, add ipv6 support.
Jarno Rajahalme [Tue, 29 Apr 2014 22:50:38 +0000 (15:50 -0700)]
lib/flow: Simplify miniflow accessors, add ipv6 support.

Add new macro MINIFLOW_MAP(FIELD) that returns the map covering the
given struct flow field.

Change the miniflow accessors to macros so that they can take the
field name directly.

Use these to add ipv6 support to miniflow_hash_5tuple().

Add ipv6 support to flow_hash_5tuple() as well so that these two
functions continue to return the same hash value for the corresponding
flows.

Also, simplify miniflow_get_metadata().

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Inline trivial functions.
Jarno Rajahalme [Tue, 29 Apr 2014 22:50:38 +0000 (15:50 -0700)]
ofproto: Inline trivial functions.

rule_dpif_is_internal is among the top ten OVS internal functions in
recent perf reports.  Inline it and some other equally trivial
functions.

This change removes rule_is_internal(), since the fact that a table is
an internal one is defined within ofproto-dpif, not ofproto.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agoofproto: Inline actions in struct rule_actions.
Jarno Rajahalme [Tue, 29 Apr 2014 22:50:38 +0000 (15:50 -0700)]
ofproto: Inline actions in struct rule_actions.

Allocate struct rule_actions and the space for the actions at once.
This reduces one memory indirection and helps reduce cache misses
visible in perf annotations.

Fix some old comments referring to ref count, since we now use RCU for
this.

Enforce constness of the actions that are assigned from rule_actions
throughout the code.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
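
Allocating a header and its payload at once, as the commit above describes,
is commonly done in C with a flexible array member.  The sketch below shows
that general technique under assumed names (struct sketch_actions and
sketch_actions_create() are hypothetical); the real struct rule_actions has
its own fields and allocation helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative layout only. */
struct sketch_actions {
    size_t len;        /* Number of bytes in 'data'. */
    uint8_t data[];    /* Action bytes stored inline (flexible array). */
};

/* Allocates the header and a copy of 'len' bytes of actions in a single
 * block, so readers reach the actions without following a second
 * pointer.  Returns NULL if allocation fails.  The caller releases the
 * result with free(). */
static struct sketch_actions *
sketch_actions_create(const void *actions, size_t len)
{
    struct sketch_actions *sa;

    assert(actions != NULL || len == 0);
    sa = malloc(sizeof *sa + len);
    if (sa) {
        sa->len = len;
        if (len) {
            memcpy(sa->data, actions, len);
        }
    }
    return sa;
}
```

Because the bytes follow the header directly, a reader that has already
loaded 'len' is likely to find the start of 'data' in the same cache line.
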
10 years agoovs-rcu: Log the name of the main thread as "main" instead of "".
Ben Pfaff [Tue, 29 Apr 2014 21:44:39 +0000 (14:44 -0700)]
ovs-rcu: Log the name of the main thread as "main" instead of "".

The main thread has the empty string as its name, but that's not a good
log string.

Without this patch we can get a log message like
    blocked 1000 ms waiting for  to quiesce
from ovsrcu_synchronize().

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
10 years agotests: Whitelist messages about RCU blocking in the testsuite.
Ben Pfaff [Tue, 29 Apr 2014 21:40:51 +0000 (14:40 -0700)]
tests: Whitelist messages about RCU blocking in the testsuite.

In production these may indicate a bug, but in the testsuite they probably
just indicate a time-warp.

Reported-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
10 years agoofp-version-opt: Fix spelling and capitalization.
Ben Pfaff [Tue, 29 Apr 2014 20:55:54 +0000 (13:55 -0700)]
ofp-version-opt: Fix spelling and capitalization.

"OpenFlow" is one word.
"Version" isn't a proper noun.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
10 years agodatapath: Convert mask list in mask array.
Pravin B Shelar [Wed, 23 Apr 2014 15:34:51 +0000 (08:34 -0700)]
datapath: Convert mask list in mask array.

The mask cache stores the index of a mask in mask_list.  On packet
receive, OVS needs to traverse the mask list to get the cached mask.
Therefore an array is better for retrieving the cached mask.  This also
allows a better cache replacement algorithm by directly checking the
mask's existence.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
10 years agodatapath: Add flow mask cache.
Pravin B Shelar [Wed, 23 Apr 2014 15:34:08 +0000 (08:34 -0700)]
datapath: Add flow mask cache.

On every packet, OVS needs to look up the flow table with every mask
until it finds a match.  The packet flow key is first masked
with a mask in the list and then the masked key is looked up in
the flow table.  Therefore the number of masks can affect packet
processing performance.

This patch adds a mask cache that records the mask used by the last
packet lookup in the same flow.  The index of the mask is stored in
this cache.  The cache is searched by 5-tuple hash (skb rxhash).

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
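
The idea of the commit above can be sketched in userspace C as follows.  This
is a toy model under stated assumptions: keys are single 32-bit words, the
flow table is a linear array rather than a hash table, and every sketch_*
name, the cache size, and the 'masks_tried' counter are invented for the
example.  It is not the datapath code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_ENTRIES 256

struct sketch_flow {
    uint32_t masked_key;    /* Key after applying masks[mask_index]. */
    size_t mask_index;
};

struct sketch_cache_entry {
    uint32_t hash;          /* Packet hash that last used this slot. */
    size_t mask_index;      /* Mask that matched for that hash. */
    bool valid;
};

struct sketch_table {
    const uint32_t *masks;
    size_t n_masks;
    const struct sketch_flow *flows;
    size_t n_flows;
    struct sketch_cache_entry cache[SKETCH_CACHE_ENTRIES];
    unsigned int masks_tried;    /* Instrumentation for the example. */
};

/* Looks up 'key' under one mask; a linear scan stands in for the hash
 * table lookup. */
static const struct sketch_flow *
sketch_lookup_masked(struct sketch_table *t, uint32_t key, size_t mask_index)
{
    uint32_t masked = key & t->masks[mask_index];
    size_t i;

    t->masks_tried++;
    for (i = 0; i < t->n_flows; i++) {
        if (t->flows[i].mask_index == mask_index
            && t->flows[i].masked_key == masked) {
            return &t->flows[i];
        }
    }
    return NULL;
}

/* Tries the mask cached for 'hash' first, then falls back to trying the
 * remaining masks in order and records the one that matched. */
static const struct sketch_flow *
sketch_lookup(struct sketch_table *t, uint32_t key, uint32_t hash)
{
    struct sketch_cache_entry *ce = &t->cache[hash % SKETCH_CACHE_ENTRIES];
    const struct sketch_flow *flow;
    size_t skip = t->n_masks;    /* Index already tried, if any. */
    size_t i;

    if (ce->valid && ce->hash == hash && ce->mask_index < t->n_masks) {
        flow = sketch_lookup_masked(t, key, ce->mask_index);
        if (flow) {
            return flow;
        }
        skip = ce->mask_index;
    }
    for (i = 0; i < t->n_masks; i++) {
        if (i == skip) {
            continue;
        }
        flow = sketch_lookup_masked(t, key, i);
        if (flow) {
            ce->hash = hash;
            ce->mask_index = i;
            ce->valid = true;
            return flow;
        }
    }
    return NULL;
}
```

In this model, the first packet of a flow pays for every mask tried, while
later packets with the same hash usually need a single masked lookup.
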
10 years agodatapath: Move table destroy to dp-rcu callback.
Pravin B Shelar [Wed, 23 Apr 2014 15:33:38 +0000 (08:33 -0700)]
datapath: Move table destroy to dp-rcu callback.

This simplifies the flow-table-destroy API.  This change is required
for the following patches.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
10 years agonetdev: Fix an use of uninitialized mutex.
Alex Wang [Tue, 29 Apr 2014 06:42:51 +0000 (23:42 -0700)]
netdev: Fix an use of uninitialized mutex.

Commit 05bf6d3c62e1d (ovs-thread: Add checking for mutex and
rwlock initialization.) helps find a use of an uninitialized
mutex (netdev_class_mutex) during upgrade.  The assertion
check aborts OVS.

This commit fixes the issue by adding the proper initialization.

Bug #1239914.
Bug #1240598.
Bug #1240626.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agodpif-linux: Fix a bug in creating port.
Alex Wang [Tue, 29 Apr 2014 05:05:05 +0000 (22:05 -0700)]
dpif-linux: Fix a bug in creating port.

According to the policy of 'OVS_VPORT_ATTR_UPCALL_PID', if upcalls should
not be sent to userspace (i.e. the number of handler threads is zero),
the netlink message for creating a vport should carry a one-element array
of value 0.  However, dpif_linux_port_add__() fails to obey this and
generates a zero-payload netlink message, which causes the netlink
transaction to fail with an ERANGE error.

This is particularly bad when 'flow-restore-wait' is set during upgrade,
since the number of handler threads is not set in the dpif-linux module and
OVS is not able to add ports to the datapath until 'flow-restore-wait' is
disabled.  Connectivity may be lost due to this bug.

This bug was introduced by commit 1579cf677fc (dpif-linux: Implement the
API functions to allow multiple handler threads read upcall.).

This commit fixes the bug by making dpif_linux_port_add__() generate
the correct netlink message when the number of handler threads is not set.

Bug #1239914.
Bug #1240598.
Bug #1240626.

Reported-by: Gurucharan Shetty <gshetty@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agoofproto-dpif: restore bond rebalance for non-recirc bond
Andy Zhou [Fri, 25 Apr 2014 02:20:05 +0000 (19:20 -0700)]
ofproto-dpif: restore bond rebalance for non-recirc bond

Bond rebalancing was disabled for bonds not using recirculation. The
patch fixes this bug.

While fixing the bug, the bond_rebalance() was also restructured
slightly to move bond related logic back into ofproto/bond.c

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agoofproto-bond: do not allow recirculation when we failed to allocate recirc_id
Andy Zhou [Fri, 25 Apr 2014 02:21:40 +0000 (19:21 -0700)]
ofproto-bond: do not allow recirculation when we failed to allocate recirc_id

When the recirc pool is exhausted, a new bond won't be allocated a new
recirc_id.  The bond->recirc_id will remain zero.  This condition
should prevent the bond from using recirculation.  This check was missing
before this patch.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agoopenvswitch.h: Clarify use of key attributes.
Jarno Rajahalme [Tue, 29 Apr 2014 00:31:26 +0000 (17:31 -0700)]
openvswitch.h: Clarify use of key attributes.

Key attributes relating to actual packet headers are ignored for
OVS_PACKET_CMD_EXECUTE as the header key attributes are retrieved
from the packet itself.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agolib/odp-util: Remove extra parenthesis from sctp key output.
Jarno Rajahalme [Tue, 29 Apr 2014 00:31:25 +0000 (17:31 -0700)]
lib/odp-util: Remove extra parenthesis from sctp key output.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
10 years agoopenvswitch.h: Note that 64 bit ints are 4-aligned.
Jarno Rajahalme [Tue, 29 Apr 2014 00:31:25 +0000 (17:31 -0700)]
openvswitch.h: Note that 64 bit ints are 4-aligned.

In general, all Netlink 64-bit data may be 4-byte aligned, due to
netlink header and attributes being 4-aligned.

To avoid unaligned access the data should be copied out of the netlink
attribute before access.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
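
"Copied out of the netlink attribute before access" usually means a memcpy()
into a properly aligned local variable.  A minimal sketch of that pattern
follows; sketch_get_unaligned_u64() is a hypothetical helper name, not a
function from openvswitch.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a 64-bit value in host byte order from 'p', which may be only
 * 4-byte aligned (as a Netlink attribute payload can be).  memcpy()
 * makes no alignment assumption, unlike dereferencing a 'uint64_t *'. */
static uint64_t
sketch_get_unaligned_u64(const void *p)
{
    uint64_t value;

    assert(p != NULL);
    memcpy(&value, p, sizeof value);
    return value;
}
```

Compilers typically turn the fixed-size memcpy() into a plain load on
architectures that tolerate unaligned access, so the copy costs little.
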
10 years agolib/ofp-actions: Update comment.
Jarno Rajahalme [Tue, 29 Apr 2014 00:31:25 +0000 (17:31 -0700)]
lib/ofp-actions: Update comment.

We recently renamed ofpbuf's 'l2' member as 'frame'.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
10 years agotests: Fix up "ofproto-dpif - ofproto-dpif-monitor 1".
Ben Pfaff [Mon, 28 Apr 2014 22:59:40 +0000 (15:59 -0700)]
tests: Fix up "ofproto-dpif - ofproto-dpif-monitor 1".

Commit 1335a8d578b03e (tests: Fix race condition waiting for monitor
thread.) fixed a race condition in a test.  Commit 8ba0a5227f6 (ovs-thread:
Make caller provide thread name when creating a thread.) slightly changed
the output that the test checked, breaking the test.  However, I was used
to the test occasionally failing due to the race (not realizing that the
race had been fixed) so I applied the commit anyway.

This commit fixes the broken test.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
10 years agoovs-thread: Add checking for mutex and rwlock initialization.
Ben Pfaff [Thu, 24 Apr 2014 23:49:05 +0000 (16:49 -0700)]
ovs-thread: Add checking for mutex and rwlock initialization.

With glibc, a mutex or rwlock filled with all-zero-bytes is properly
initialized for use, but this is not true for any other libc that OVS
supports.  However, OVS gets a lot more testing with glibc than any other
libc.  This means that developers keep introducing bugs that do not
manifest on the main development platform.

This commit should help avoid the problem, by reusing the existing 'where'
members to indicate whether a mutex or rwlock has been initialized.

Signed-off-by: Ben Pfaff <blp@nicira.com>
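
The approach in the commit above, reusing a 'where' member as an "initialized"
marker, can be sketched as below.  The struct, the function names, and the
choice to return EINVAL are assumptions for this example; a later commit on
this page notes that the real check is an assertion that aborts.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_mutex {
    pthread_mutex_t lock;
    const char *where;    /* NULL until initialized; afterwards either
                           * "<unlocked>" or the locker's location. */
};

/* Initializes 'm'.  Setting 'where' to a non-null string is what marks
 * the mutex as initialized. */
static void
sketch_mutex_init(struct sketch_mutex *m)
{
    if (pthread_mutex_init(&m->lock, NULL)) {
        abort();
    }
    m->where = "<unlocked>";
}

/* Locks 'm' and records 'where'.  Returns EINVAL instead of locking if
 * 'm' is still all-zero bytes, which would happen to work with glibc
 * but not with every libc. */
static int
sketch_mutex_lock_at(struct sketch_mutex *m, const char *where)
{
    int error;

    assert(where != NULL);
    if (!m->where) {
        return EINVAL;
    }
    error = pthread_mutex_lock(&m->lock);
    if (!error) {
        m->where = where;
    }
    return error;
}

/* Unlocks 'm', with the same initialization check. */
static int
sketch_mutex_unlock(struct sketch_mutex *m)
{
    if (!m->where) {
        return EINVAL;
    }
    m->where = "<unlocked>";
    return pthread_mutex_unlock(&m->lock);
}
```

The check is portable because it never inspects the pthread_mutex_t itself,
only the wrapper's own field.
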
10 years agolacp: Don't lock potentially uninitialized mutex in lacp_status().
Ben Pfaff [Thu, 24 Apr 2014 23:58:45 +0000 (16:58 -0700)]
lacp: Don't lock potentially uninitialized mutex in lacp_status().

If the 'lacp' parameter is nonnull, then we know that the file scope mutex
has been initialized, since that's done as a side effect of creating a
lacp object, but otherwise there's no guarantee.

Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agoPrepare for post-2.2.0 (2.2.90).
Justin Pettit [Mon, 28 Apr 2014 21:45:07 +0000 (14:45 -0700)]
Prepare for post-2.2.0 (2.2.90).

Signed-off-by: Justin Pettit <jpettit@nicira.com>
10 years agoPrepare for 2.2.0.
Justin Pettit [Mon, 28 Apr 2014 21:30:27 +0000 (14:30 -0700)]
Prepare for 2.2.0.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
10 years agoovs-rcu: Log a helpful warning when ovsrcu_synchronize() stalls.
Ben Pfaff [Mon, 28 Apr 2014 22:25:19 +0000 (15:25 -0700)]
ovs-rcu: Log a helpful warning when ovsrcu_synchronize() stalls.

This made it easier for me to find a thread that was causing stalls.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
10 years agoovs-thread: Make caller provide thread name when creating a thread.
Ben Pfaff [Sat, 26 Apr 2014 00:46:21 +0000 (17:46 -0700)]
ovs-thread: Make caller provide thread name when creating a thread.

Thread names are occasionally very useful for debugging, but from time to
time we've forgotten to set one.  This commit adds the new thread's name
as a parameter to the function to start a thread, to make that mistake
impossible.  This also simplifies code, since two function calls become
only one.

This makes a few other changes to the thread creation function:

    * Since it is no longer a direct wrapper around a pthread function,
      rename it to avoid giving that impression.

    * Remove 'pthread_attr_t *' param that every caller supplied as NULL.

    * Change 'pthread *' parameter into a return value, for convenience.

The system-stats code hadn't set a thread name, so this fixes that issue.

This patch is a prerequisite for making RCU report the name of a thread
that is blocking RCU synchronization, because the easiest way to do that is
for ovsrcu_quiesce_end() to record the current thread's name.
ovsrcu_quiesce_end() is called before the thread function is called, so it
won't get a name set within the thread function itself.  Setting the thread
name earlier, as in this patch, avoids the problem.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
10 years agoovs-thread: Quiesce in xpthread_barrier_wait().
Ben Pfaff [Fri, 25 Apr 2014 20:50:48 +0000 (13:50 -0700)]
ovs-thread: Quiesce in xpthread_barrier_wait().

Otherwise the udpif revalidator threads can postpone RCU callbacks
essentially forever, especially if there are many revalidator threads and
little network traffic.

Reported-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
10 years agotimeval: Preserve quiescence across time_poll().
Ben Pfaff [Sat, 26 Apr 2014 01:25:06 +0000 (18:25 -0700)]
timeval: Preserve quiescence across time_poll().

Otherwise ovsrcu_synchronize() busy-waits in its loop because its
poll_block() un-quiesces, causing the global_seqno to increase, which is
what it waits for.

Reported-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
10 years agodatapath: Check for backported skb_orphan_frags().
Joe Stringer [Mon, 28 Apr 2014 01:59:25 +0000 (13:59 +1200)]
datapath: Check for backported skb_orphan_frags().

This was causing build failures on debian wheezy. Check for the feature
rather than the version.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agoPython Logging Formatting Improvements
Dave Tucker [Sun, 30 Mar 2014 11:26:55 +0000 (12:26 +0100)]
Python Logging Formatting Improvements

The Open vSwitch daemons written in C support user-configured logging
patterns as described in ovs-appctl(8). This commit adds this capability
to the daemons written in Python.

- Add a '__log_patterns' attribute to the Vlog class
- Populate this using the default patterns in ovs-appctl(8)
- Add a '__start_time' attribute to the Vlog class to support '%r'
- Update the '_log' method to build the log message according to the
  pattern
- Add a 'set_pattern' method to allow the default patterns to be changed
- Update 'set_levels_from_string' to support setting the pattern from a
  string

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agobridge: Refactor the 'Instant' stats logic.
Alex Wang [Thu, 3 Apr 2014 20:12:35 +0000 (13:12 -0700)]
bridge: Refactor the 'Instant' stats logic.

This commit refactors the 'Instant' stats related logic in bridge.c
by moving it into bridge_run().

This change brings the following effects:

1. bridge.c will wait on the global connectivity sequence number when
   there is no pending instant stats transaction, and the main thread
   will no longer be woken up every 100 ms for the 'Instant' stats
   check.  The related overhead is eliminated.

2. The netdev's sequence number is used to avoid updating unchanged netdev
   status, so the update is more efficient.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
10 years agoofproto-dpif: Use sequence number to wake up main thread for
Alex Wang [Thu, 17 Apr 2014 19:24:45 +0000 (12:24 -0700)]
ofproto-dpif: Use sequence number to wake up main thread for
packet-in I/O.

This commit adds per 'struct ofproto_dpif' sequence number for
packet-in I/O.  Whenever ofproto_dpif_send_packet_in() is called,
the calling thread will change the sequence number to wake up the
main thread.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
10 years agotests: Fix race condition waiting for monitor thread.
Joe Stringer [Sun, 27 Apr 2014 23:51:28 +0000 (11:51 +1200)]
tests: Fix race condition waiting for monitor thread.

Occasionally, test #770 "ofproto-dpif - ofproto-dpif-monitor 1" would
fail, because the testsuite looked in the logs for evidence of a thread
being created, but it checked before vswitchd was able to spawn the
thread.

This patch fixes the race by modifying the commands that check for
creation/termination of threads to wait until they see the messages
instead.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
10 years agorevalidator: Fix ukey stats cache updating.
Joe Stringer [Fri, 25 Apr 2014 22:23:43 +0000 (15:23 -0700)]
revalidator: Fix ukey stats cache updating.

revalidate_ukey() had a bug where it would update the ukey->stats even
if it decided not to push stats (as an optimisation). ukey->stats should
only be updated when those stats are pushed.

This bug would arise in the following situation:
* A flow has been dumped before.
* The flow needs to be revalidated.
* The flow is low-throughput.
* The flow has new statistics to push.

Such cases rely on flow deletion to update the stats. However, that code
pushes the delta between the ukey->stats and the final flow dump. If the
ukey stats cache is updated without the stats being pushed, those stats
would be lost.

This caused intermittent testsuite failures on "learning action -
self-modifying flow with idle_timeout". Introduced by 698ffe3623f1b630ae
"revalidator: Only revalidate high-throughput flows."

Bug #1238927.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
10 years agoofproto-dpif-upcall: Fix a bug.
Alex Wang [Fri, 25 Apr 2014 17:39:53 +0000 (10:39 -0700)]
ofproto-dpif-upcall: Fix a bug.

Commit 7d170098 (ofproto-dpif-upcall: Remove the flow_dumper thread.)
initialized the memory barrier inside the udpif_start_threads() function.
However, the udpif_start_threads() function does not check the number of
revalidator threads specified in udpif.  So, when the number is zero, it
causes the error in barrier initialization.  This could happen when the
other_config:flow-restore-wait is set and the udpif_flush() is called.

This commit fixes the issue, by checking the specified number of threads
in udpif_start_threads().

Reported-by: Gurucharan Shetty <gshetty@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
10 years agoofproto: Don't destroy mutex before its use.
Gurucharan Shetty [Tue, 22 Apr 2014 17:18:02 +0000 (10:18 -0700)]
ofproto: Don't destroy mutex before its use.

Currently, we are calling guarded_list_destroy()
to destroy a mutex and then go ahead and use it through
delete_group
 ->delete_group__
   ->handle_flow_mod__
    ->run_rule_executes
     ->guarded_list_pop_all

The group related unit tests cause ovs-vswitchd to crash
because of this (on windows).

Calling guarded_list_destroy() after delete_group() solves the
problem.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agonetdev: Initialize netdev_class_mutex.
Gurucharan Shetty [Wed, 16 Apr 2014 20:33:26 +0000 (13:33 -0700)]
netdev: Initialize netdev_class_mutex.

This code path currently does not initialize
netdev_class_mutex.
dummy_enable
 ->netdev_dummy_register
   ->netdev_register_provider
     ->ovs_mutex_lock(&netdev_class_mutex)

ovsdb-server on windows crashes without it.

This commit adds a new initialization function.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agoFAQ: Explain what to do when building against a too-new kernel.
Ben Pfaff [Thu, 24 Apr 2014 22:35:35 +0000 (15:35 -0700)]
FAQ: Explain what to do when building against a too-new kernel.

Also add references to this FAQ from INSTALL and configure.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
10 years agobfd/cfm: Check status change before update status to database.
Alex Wang [Thu, 3 Apr 2014 17:20:44 +0000 (10:20 -0700)]
bfd/cfm: Check status change before update status to database.

This commit adds boolean flag in bfd/cfm module for checking
status change.  If there is no status change, the current
update to OVS database will skip the bfd/cfm session.

In the experiment with 5K bfd sessions, when one session is
flapping at rate of every 0.3 second, this patch reduces the
cpu utilization of the ovs-vswitchd thread from 13 to 6.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
10 years agodaemon: Move some common code to daemon.c
Gurucharan Shetty [Wed, 23 Apr 2014 21:22:38 +0000 (14:22 -0700)]
daemon: Move some common code to daemon.c

We have some common code between daemon-unix.c and
daemon-windows.c. Move them to daemon.c

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agodaemon: Close standard file descriptors after detach for windows.
Gurucharan Shetty [Wed, 23 Apr 2014 17:28:00 +0000 (10:28 -0700)]
daemon: Close standard file descriptors after detach for windows.

In the unit tests, we check for some logs stored in stderr. In case
of windows, unit tests fail because the child writes additional information
into stderr because it does not have it closed. This commit
closes standard file descriptors for windows too.

Because the functions related to closing file descriptors are common
to both windows and unix, add them to the common daemonization file
daemon.c

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agodaemon: Rename daemon.c as daemon-unix.c
Gurucharan Shetty [Wed, 23 Apr 2014 16:03:38 +0000 (09:03 -0700)]
daemon: Rename daemon.c as daemon-unix.c

An upcoming commit re-introduces daemon.c to have
common functions across daemon-unix.c and daemon-windows.c

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agoofproto-dpif-xlate: Identify STP BPDUs more specifically.
Padmanabhan Krishnan [Thu, 24 Apr 2014 20:18:18 +0000 (13:18 -0700)]
ofproto-dpif-xlate: Identify STP BPDUs more specifically.

Apart from STP, EVB extension of LLDP as well as IEEE 802.1QBG use the
Nearest Customer Bridge (NCB) DMAC which has a value of 0180.c200.0000.
STP can be distinguished by Ethertype from these protocols.

Signed-off-by: Padmanabhan Krishnan <kprad1@yahoo.com>
[blp@nicira.com rewrote the details of the patch]
Signed-off-by: Ben Pfaff <blp@nicira.com>
Tested-by: Padmanabhan Krishnan <kprad1@yahoo.com>
10 years agoofproto: Reduce taking rule references.
Jarno Rajahalme [Thu, 24 Apr 2014 15:21:49 +0000 (08:21 -0700)]
ofproto: Reduce taking rule references.

Only take reference to a looked up rule when needed.

This reduces the total CPU utilization of rule_ref/unref calls by 80%,
from 5% of total server CPU capacity to 1% in a netperf TCP_CRR
test stressing the userspace.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Make taking rule reference conditional on lookup.
Jarno Rajahalme [Thu, 24 Apr 2014 15:21:49 +0000 (08:21 -0700)]
ofproto: Make taking rule reference conditional on lookup.

Prior to this patch, the rule lookup functions have always taken a
reference on the found rule before returning.  Make this conditional,
so that unnecessary refs/unrefs can be avoided in a later patch.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: RCU postpone rule destruction.
Jarno Rajahalme [Thu, 24 Apr 2014 15:21:49 +0000 (08:21 -0700)]
ofproto: RCU postpone rule destruction.

This allows rules to be used without taking references while RCU
protected.

The last step of destroying an ofproto also needs to be postponed, as
the rule destruction requires the class structure to be available at
the postponed destruction callback.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agohmap_random_node: Improve distribution
YAMAMOTO Takashi [Tue, 22 Apr 2014 04:34:36 +0000 (13:34 +0900)]
hmap_random_node: Improve distribution

Improve the random distribution for an hmap with a small number of nodes,
at the expense of increased CPU cost.
This is a fair trade-off because the situation is rather common
for bond, which is currently the only consumer of this API in the tree.

Consider 2 items, 4 buckets, no collision.

    bucket 0   item 0
    bucket 1
    bucket 2
    bucket 3   item 1

The old algorithm picks item 0 if rand % 4 == 0.  (25%)
Otherwise it picks item 1.  (75%)

This change makes them 50%.

Acked-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
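
The 25%/75% split in the example above can be reproduced by enumerating the
four possible values of rand % 4.  The sketch assumes the old behaviour was
"start at the random bucket and take the first nonempty bucket at or after
it", which is one reading consistent with the percentages quoted.  The second
function shows rejection sampling as one way to get a 50/50 split; it is not
necessarily what the patch implements.  All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_N_BUCKETS 4

/* sketch_buckets[i] is the item stored in bucket i, or -1 if the bucket
 * is empty: item 0 in bucket 0 and item 1 in bucket 3, as in the commit
 * message's example. */
static const int sketch_buckets[SKETCH_N_BUCKETS] = {0, -1, -1, 1};

/* Assumed old behaviour: scan forward (wrapping) from the random bucket
 * to the first nonempty one. */
static int
sketch_pick_scan(unsigned int rand_value)
{
    size_t i;

    for (i = 0; i < SKETCH_N_BUCKETS; i++) {
        int item = sketch_buckets[(rand_value + i) % SKETCH_N_BUCKETS];

        if (item >= 0) {
            return item;
        }
    }
    return -1;
}

/* Rejection sampling: returns the item in the random bucket, or -1 to
 * tell the caller to draw a new random value. */
static int
sketch_pick_reject(unsigned int rand_value)
{
    return sketch_buckets[rand_value % SKETCH_N_BUCKETS];
}

/* Counts how many of the equally likely values 0..3 make 'pick' return
 * 'item'. */
static unsigned int
sketch_count(int (*pick)(unsigned int), int item)
{
    unsigned int r, n = 0;

    assert(pick != NULL);
    for (r = 0; r < SKETCH_N_BUCKETS; r++) {
        if (pick(r) == item) {
            n++;
        }
    }
    return n;
}
```

With the scan, one value selects item 0 and three select item 1.  With
rejection, each item is selected by exactly one value and the other two
values trigger a redraw, which is where the extra CPU cost comes from.
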
10 years agoofproto-dpif-upcall: Remove the flow_dumper thread.
Ethan Jackson [Thu, 10 Apr 2014 07:14:08 +0000 (07:14 +0000)]
ofproto-dpif-upcall: Remove the flow_dumper thread.

Previously, we had a separate flow_dumper thread that fetched flows from
the datapath to distribute to revalidator threads. This patch takes the
logic for dumping and pushes it into the revalidator threads, resulting
in simpler code with similar performance to the current code.

One thread, the "leader", is responsible for beginning and ending each
flow dump, maintaining the flow_limit, and checking whether the
revalidator threads need to exit. All revalidator threads dump,
revalidate, delete datapath flows and garbage collect ukeys.

Co-authored-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agobridge: When ports disappear from a datapath, add them back.
Ben Pfaff [Thu, 24 Apr 2014 01:33:36 +0000 (18:33 -0700)]
bridge: When ports disappear from a datapath, add them back.

Before commit 2a73b1d73d4bdb (bridge: Reconfigure in single pass.), if a
port disappeared, for one reason or another, from a datapath, the next
bridge reconfiguration pass would notice and, if the port was still
configured in the database, add the port back to the datapath.  That
commit, however, removed the logic from bridge_refresh_ofp_port() that
did that and failed to add the same logic to the replacement function
bridge_delete_or_reconfigure_ports().  This commit fixes the problem.

To see this problem on a Linux kernel system:

ovs-vsctl add-br br0                             # 1
tunctl -t tap                                    # 2
ovs-vsctl add-port br0 tap                       # 3
ovs-dpctl show                                   # 4
tunctl -d tap                                    # 5
ovs-dpctl show                                   # 6
tunctl -t tap                                    # 7
ovs-vsctl del-port tap -- add-port br0 tap       # 8
ovs-dpctl show                                   # 9

Steps 1-4 create a bridge and a tap and add it to the bridge and
demonstrate that the tap is part of the datapath.  Step 5 and 6 delete
the tap and demonstrate that it has therefore disappeared from the
datapath.  Step 7 recreates a tap with the same name, and step 8
forces ovs-vswitchd to reconfigure.  Step 9 shows the effect of the
fix: without the fix, the new tap is not added back to the datapath;
with this fix, it is.

Special thanks to Gurucharan Shetty <gshetty@nicira.com> for finding a
simple reproduction case and then bisecting to find the commit that
introduced the problem.

Bug #1238467.
Reported-by: Ronald Lee <ronaldlee@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agorevalidator: Prevent handling the same flow twice.
Joe Stringer [Wed, 23 Apr 2014 03:31:17 +0000 (15:31 +1200)]
revalidator: Prevent handling the same flow twice.

When the datapath flow table is modified while a flow dump operation is
in progress, it is possible for the same flow to be dumped twice. In
such cases, revalidators may perform redundant work, or attempt to
delete the same flow twice.

This was causing intermittent testsuite failures for test #670 -
"ofproto-dpif, active-backup bonding" where a flow (that had not
previously been dumped) was dumped, revalidated and deleted twice.

The logs show errors such as:
"failed to flow_get (No such file or directory) skb_priority(0),..."
"failed to flow_del (No such file or directory) skb_priority(0),..."

This patch adds a 'flow_exists' field to 'struct udpif_key' to track
whether the flow has been deleted or is in the process of being deleted.
After doing a ukey lookup, we check whether ukey->mark or
ukey->flow_exists indicates that the flow has already been handled. If
it has, we skip handling the flow again.

We also defer ukey cleanup for flows that fail revalidation, so that the
ukey will still exist if the same flow is dumped twice. This allows the
above logic to work in this case.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
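
The duplicate-dump check described in the revalidator commit above can be
sketched roughly as follows. This is an illustration only: the struct is a
hypothetical, pared-down stand-in for 'struct udpif_key', and the function
name is invented here. The assumption is that 'mark' is set when a flow is
first seen in a dump and that 'flow_exists' is cleared once deletion starts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-in for 'struct udpif_key'. */
struct ukey_sketch {
    bool mark;         /* Set once the flow has been seen in this dump. */
    bool flow_exists;  /* Cleared once deletion of the flow has begun. */
};

/* Returns true if the dumped flow still needs handling and marks it as
 * seen.  Returns false if the same flow was already handled, either
 * because it was seen earlier in this dump or because it is already
 * being deleted. */
bool
ukey_should_handle(struct ukey_sketch *ukey)
{
    assert(ukey != NULL);

    if (ukey->mark || !ukey->flow_exists) {
        return false;           /* Second dump of the same flow: skip. */
    }
    ukey->mark = true;
    return true;
}
```

With this shape, a flow that is dumped twice is handled on the first pass
and skipped on the second, which matches the intent stated in the commit
message.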
10 years agoofproto-dpif: Improve code clarity and comments on recirc changes to rule_dpif_lookup()
Andy Zhou [Wed, 23 Apr 2014 20:05:40 +0000 (13:05 -0700)]
ofproto-dpif: Improve code clarity and comments on recirc changes to rule_dpif_lookup()

This patch improves the code readability and comments on the
recirculation-related changes to rule_dpif_lookup(), based on off-line
discussions with Jarno.  There are no behavior changes.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agolib/util: Input validation in str_to_uint
Zoltan Kiss [Wed, 23 Apr 2014 13:45:21 +0000 (14:45 +0100)]
lib/util: Input validation in str_to_uint

Currently this function returns true even when 's' is negative or greater than
UINT_MAX. Also, the representation of 'int' and 'unsigned int' is implementation
dependent, so converting [INT_MAX..UINT_MAX] values with str_to_int is fragile.
Instead, we should convert straight to 'long long' and do a boundary check
before returning the converted value.
This patch also moves the function to the .c file, as it is now nontrivial, and
deletes the other str_to_u* functions, as they are not used.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
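
The approach described in the str_to_uint commit above (convert to
'long long' first, then range-check) might look something like the sketch
below. The function name and exact signature are assumptions made for
illustration; this is not the code from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical sketch: parses 's' in the given 'base' into '*u'.
 * Returns true only if all of 's' is a number in [0, UINT_MAX].
 * On failure, stores 0 in '*u' and returns false. */
bool
str_to_uint_sketch(const char *s, int base, unsigned int *u)
{
    char *tail = NULL;
    long long ll;

    assert(s != NULL && u != NULL);

    errno = 0;
    ll = strtoll(s, &tail, base);
    if (errno != 0              /* Out of range for long long, bad base. */
        || tail == s            /* No digits at all. */
        || *tail != '\0'        /* Trailing garbage. */
        || ll < 0               /* Negative input. */
        || ll > UINT_MAX) {     /* Too large for unsigned int. */
        *u = 0;
        return false;
    }
    *u = (unsigned int) ll;
    return true;
}
```

Doing the comparison in 'long long' avoids relying on how negative or
large values wrap when stored in an 'int' or 'unsigned int'.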
10 years agorun-ryu: Use unix socket rather than patch ports
Simon Horman [Wed, 23 Apr 2014 05:06:12 +0000 (14:06 +0900)]
run-ryu: Use unix socket rather than patch ports

My understanding of the implementation of patch ports is that they
are rather special, being handled as a special case inside
compose_output_action__() and do not exist in the datapath.

A side effect of the implementation of patch ports (though perhaps not the
portion mentioned above) is that the OFPUTIL_PC_PORT_DOWN bit may not be
set via a port mod message. In particular, the call to
netdev_turn_flags_on() in update_port_config() fails.

There is a test provided by Ryu that tests this via port mod and thus fails.

While that test could be modified or the results ignored it seems to me
that it would be best if ryu-check used ports which were more fully
featured and not special cases.

Thus this patch moves run-ryu to use unix socket backed ports rather than
patch ports.

I believe a more significant problem with the use of patch ports
is that they will require (more) special case code in order to correctly
handle recirculation. As Ryu provides many tests that exercise
recirculation for MPLS it would be nice if they could be used to exercise
recirculation for MPLS (which I have provided patches for separately[1])
without the need to add more special-case code for that purpose.

I believe that patch ports are also incompatible with recirculation for
bonding, which has already been merged, though I have not verified that
and it is not strictly related to this patch as I do not believe that Ryu
provides any tests to exercise that case.

The key problem with patch ports in the context of recirculation is that
the ofproto and in_port may change during translation.  And this
information is lost by the time that execution occurs.

Furthermore, the new in_port will not exist in the datapath as it is a
patch port. That particular problem may be addressed by executing the
actions in user-space; I have posted patches to provide infrastructure
for that[1].

Overall it is not clear to me that the complexity of supporting
recirculation for patch-ports would have sufficient pay-off.

[1] [PATCH v3 00/16] Flow-Based Recirculation for MPLS

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agodaemon-windows: Recognize --no-chdir option for windows.
Gurucharan Shetty [Fri, 18 Apr 2014 18:04:14 +0000 (11:04 -0700)]
daemon-windows: Recognize --no-chdir option for windows.

The option won't have any effect on the running of the daemon.
Recognizing the option lets us avoid if else conditions in unit
tests.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agoTODO: Add the project list from the hackathon.
Ben Pfaff [Tue, 22 Apr 2014 23:26:05 +0000 (16:26 -0700)]
TODO: Add the project list from the hackathon.

I've had a couple of requests for an updated project list, so this commit
adds it to the tree.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
10 years agotestsuite.at: Workaround for carriage returns on windows.
Gurucharan Shetty [Fri, 18 Apr 2014 16:00:46 +0000 (09:00 -0700)]
testsuite.at: Workaround for carriage returns on windows.

In unit tests, we compare text written in logs or stdout/stderr
to figure out the success or failure of tests. In Windows,
since new line is represented by CR+LF, autoconf tests run in
MinGW environment fail.

Asking diff to ignore trailing carriage returns is one way
to solve the problem.

Suggested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
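
To make the carriage-return workaround above concrete, here is a minimal
sketch of a line comparison that ignores a single trailing CR, which is
the effect the commit asks diff to provide (GNU diff has a
--strip-trailing-cr option for this; whether the patch uses that exact
option is not stated in the message). The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns the length of 'line', not counting one trailing '\r'. */
size_t
len_without_trailing_cr(const char *line)
{
    size_t n = strlen(line);

    return (n > 0 && line[n - 1] == '\r') ? n - 1 : n;
}

/* Returns true if 'a' and 'b' are equal once a trailing carriage return
 * (as left behind by CR+LF line endings) is ignored on each side. */
bool
lines_equal_ignoring_cr(const char *a, const char *b)
{
    size_t na, nb;

    assert(a != NULL && b != NULL);

    na = len_without_trailing_cr(a);
    nb = len_without_trailing_cr(b);
    return na == nb && memcmp(a, b, na) == 0;
}
```

Only a CR at the very end of a line is ignored; a CR in the middle of a
line still counts as a difference.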
10 years agotestsuite.at: kill for windows.
Gurucharan Shetty [Fri, 18 Apr 2014 15:24:42 +0000 (08:24 -0700)]
testsuite.at: kill for windows.

We use kill to clean up processes from pidfiles.
Windows has a 'taskkill' which does something similar.
We can check if the process with a PID exists with
'tasklist'. Both tasklist and taskkill return 0 for
both success and failure. So, we will have to grep
to see if there is any output.

A typical output of tasklist is:
$ tasklist | grep ovs
ovsdb-server.exe              3228 RDP-Tcp#0                  2      6,132 K
ovs-vswitchd.exe              2080 RDP-Tcp#0                  2      5,808 K

$ tasklist //fi "PID eq 3228"

Image Name                     PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
ovsdb-server.exe              3228 RDP-Tcp#0                  2      6,132 K

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agotestsuite.at: pwd for windows.
Gurucharan Shetty [Fri, 18 Apr 2014 15:17:11 +0000 (08:17 -0700)]
testsuite.at: pwd for windows.

On MinGW, "pwd -W" gives the present working directory
in the form of a Windows path (i.e., C:/temp instead of /c/temp).
When we pass the directory path to daemons as arguments,
we should be passing it in the form of a Windows path.

Suggested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agodaemon-windows: Create pidfiles with option --pidfile.
Gurucharan Shetty [Tue, 15 Apr 2014 20:26:53 +0000 (13:26 -0700)]
daemon-windows: Create pidfiles with option --pidfile.

In Windows, we cannot delete a file that has been opened.
We use this feature to "lock" the pidfile.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agodaemon-windows: Implement --detach option for Windows.
Gurucharan Shetty [Tue, 15 Apr 2014 17:14:05 +0000 (10:14 -0700)]
daemon-windows: Implement --detach option for Windows.

When "--detach" is specified, a daemon will create a new
process with the same command line options as the parent.
Additionally, an undocumented command line option "--pipe-handle"
is passed to the child. Once the child is ready to handle external
commands, it tells the parent that it is ready using
the pipe handle. The parent then exits. This lets us run the daemons
in the background. This will also help the unit tests because currently
most of the unit tests pass the '--detach' option to the daemons.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agodaemon-windows: Rename service variables.
Gurucharan Shetty [Mon, 14 Apr 2014 19:57:53 +0000 (12:57 -0700)]
daemon-windows: Rename service variables.

So far, we have been using the variable 'detach' to indicate whether the
option "--service" has been set, and the variable 'detached' to indicate that
the daemon is being called from the Windows services manager.

An upcoming commit introduces command line option "--detach" for daemons
running on Windows. This will cause confusion with variable names.
Therefore, rename the variables.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agostream: Introduce [p]windows_[p]stream_class.
Gurucharan Shetty [Fri, 4 Apr 2014 21:13:32 +0000 (14:13 -0700)]
stream: Introduce [p]windows_[p]stream_class.

On Linux, we heavily use --remote=punix:* to listen for
connections through unix domain sockets. We also use unix:*
to connect to a daemon that is listening on unix domain sockets.
Many times, we create default unix domain sockets for listening
and many utilities connect to these sockets by default.

Windows does not have unix domain sockets. So far, we could just use
ptcp:* and tcp:* for listening and initiating connections respectively.
The drawback here is that one has to provide a specific TCP port.

For unit tests, it looks useful to let the kernel choose that port.
As such, we can let the kernel-chosen port be stored in the
file specified with punix:* and unix:*. For this purpose, introduce
a new [p]windows_[p]stream_class. Since it is just a wrapper around
[p]tcp_[p]stream_class, add it to stream-tcp.c.

commit cb54a8c (unixctl: Add support for Windows.) used the above concept
for only control channel connections (i.e., --unixctl for daemons and its
interaction with ovs-appctl). This commit adds the same support for
all unix domain sockets.  Now that we have a separate class
[p]stream_class for hiding kernel assigned TCP port inside a file meant for
unix domain sockets in windows, make unixctl use it.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
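
The idea in the stream commit above, storing a kernel-assigned TCP port in
the file that would otherwise name a unix domain socket, can be sketched
as a pair of helpers. Everything here is hypothetical (names, file format
as decimal text). Obtaining the port itself, typically by binding to port
0 and then querying the socket with getsockname(), is left out so that
the sketch stays portable.

```c
#include <assert.h>
#include <stdio.h>

/* Listener side: records the kernel-chosen 'port' as decimal text in
 * 'path'.  Returns 0 on success, -1 on failure. */
int
save_port_to_file(const char *path, unsigned short port)
{
    FILE *f;
    int ok;

    assert(path != NULL);

    f = fopen(path, "w");
    if (!f) {
        return -1;
    }
    ok = fprintf(f, "%u\n", (unsigned int) port) > 0;
    if (fclose(f) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;
}

/* Connecting side: reads the port back from 'path'.  Returns the port
 * (1..65535) on success, or -1 if the file is missing or malformed. */
int
load_port_from_file(const char *path)
{
    FILE *f;
    unsigned int port = 0;
    int n;

    assert(path != NULL);

    f = fopen(path, "r");
    if (!f) {
        return -1;
    }
    n = fscanf(f, "%u", &port);
    fclose(f);
    return (n == 1 && port > 0 && port <= 65535) ? (int) port : -1;
}
```

A client given unix:FILE would call the second helper and then connect
over TCP to the returned port, so neither side has to agree on a fixed
port number in advance.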
10 years agotests: Define a variable "IS_WIN32" for tests.
Gurucharan Shetty [Thu, 3 Apr 2014 22:52:45 +0000 (15:52 -0700)]
tests: Define a variable "IS_WIN32" for tests.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agoovs-rcu: Name the ovsrcu_postpone_thread to 'urcu'.
Alex Wang [Tue, 15 Apr 2014 18:32:26 +0000 (11:32 -0700)]
ovs-rcu: Name the ovsrcu_postpone_thread to 'urcu'.

The ovs-rcu module adds a new thread for checking the grace period.
Since the thread name is not set, it will inherit the name of the
thread that creates it.  This makes the 'top' output quite confusing.

This commit names the thread to 'urcu' for clarity.

Acked-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
10 years agoofproto-dpif-upcall: Fix logic error in handler/revalidator threads
Alex Wang [Tue, 22 Apr 2014 03:05:08 +0000 (20:05 -0700)]
ofproto-dpif-upcall: Fix logic error in handler/revalidator threads
creation and deletion.

Commit 1f8675481e (ofproto-dpif-upcall: Fix ovs-vswitchd crash.)
directly copied the udpif_set_threads() logic to udpif_stop_threads()
and udpif_start_threads().  In fact, this was erroneous and caused
unit test failures.

This commit fixes the above issue by correcting the checks in
udpif_stop_threads() and udpif_start_threads(), and adding necessary
checks in udpif_set_threads().

Acked-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
10 years agoofproto-dpif-upcall: Fix ovs-vswitchd crash.
Alex Wang [Tue, 22 Apr 2014 00:31:11 +0000 (17:31 -0700)]
ofproto-dpif-upcall: Fix ovs-vswitchd crash.

On current master, a caller of udpif_set_threads() can pass 0 for
n_handlers and n_revalidators to delete all handler and revalidator
threads.

After commit 9a159f748866 (ofproto-dpif-upcall: Remove the dispatcher
thread.), udpif_set_threads() also calls dpif_handlers_set() with
the zero 'n_handlers' value.  Since the dpif level always assumes that
'n_handlers' is non-zero, this causes warnings and even crashes of
ovs-vswitchd.

This commit fixes the above issue by defining separate functions for
starting and stopping handler and revalidator threads.  So
udpif_set_threads() will never be called with 0 value arguments.

Reported-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Co-authored-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agodatapath: add recirc action
Andy Zhou [Tue, 8 Apr 2014 11:13:42 +0000 (11:13 +0000)]
datapath: add recirc action

Recirculation implementation for Linux kernel data path.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agodatapath: add hash action
Andy Zhou [Fri, 11 Apr 2014 08:41:18 +0000 (01:41 -0700)]
datapath: add hash action

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agodatapath: remove unneeded declaration of new_vport().
Rami Rosen [Sun, 20 Apr 2014 09:19:44 +0000 (12:19 +0300)]
datapath: remove unneeded declaration of new_vport().

This patch removes the new_vport() forward declaration in datapath.c
as it is not needed.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
10 years agoopenvswitch.h: rename hash action definition
Andy Zhou [Fri, 18 Apr 2014 03:06:58 +0000 (20:06 -0700)]
openvswitch.h: rename hash action definition

Rename hash_bias to hash_basis to make it consistent with similar
usages.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agoodp-util: Always generate key/mask pair in netlink for recirc_id
Andy Zhou [Fri, 18 Apr 2014 06:13:46 +0000 (23:13 -0700)]
odp-util: Always generate key/mask pair in netlink for recirc_id

Currently netlink flow (and mask) recirc_id attribute is only
serialized when the recirc_id value is non-zero. For this logic
to work correctly, the interpretation of the missing recirc_id
depends on whether the datapath supports recirculation.

This patch removes the ambiguity in the meaning of a missing recirc_id
attribute in a netlink message.  When recirc_id is non-zero, or when
it is not a wildcard match, both key and mask attributes are
serialized.  On the other hand, when recirc_id is zero and
wildcarded, they are not serialized.  A missing recirc_id key and
mask attribute should thus always be interpreted as a wildcard,
the same as for other flow fields.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
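
The serialization rule stated in the odp-util commit above reduces to a
small predicate. The sketch below uses a hypothetical struct and function
name, and assumes that an all-zero mask is what "wildcarded" means for
this field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the recirc_id key and its mask. */
struct recirc_match_sketch {
    uint32_t recirc_id;       /* Key value. */
    uint32_t recirc_id_mask;  /* 0 means the field is fully wildcarded. */
};

/* Returns true if the recirc_id key and mask attributes should both be
 * written to the netlink message: either the value is non-zero or the
 * match is not a wildcard.  A zero, wildcarded recirc_id is omitted. */
bool
recirc_id_is_serialized(const struct recirc_match_sketch *m)
{
    bool wildcarded;

    assert(m != NULL);

    wildcarded = m->recirc_id_mask == 0;
    return m->recirc_id != 0 || !wildcarded;
}
```

The receiving side can then treat an absent attribute as a wildcard
without needing to know whether the datapath supports recirculation.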
10 years agoofproto-dpif: Rule lookup starts from table zero for non-recirc datapath
Andy Zhou [Fri, 18 Apr 2014 06:40:27 +0000 (23:40 -0700)]
ofproto-dpif: Rule lookup starts from table zero for non-recirc datapath

Currently, all packet lookups start from the internal table for possible
matching of post-recirculation rules. This is not necessary for a
datapath that does not support recirculation.

This patch adds the ability to choose the starting table for rule lookup
based on whether the datapath supports recirculation.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
10 years agoofproto-dpif-upcall: Don't use stack garbage
YAMAMOTO Takashi [Fri, 18 Apr 2014 02:13:01 +0000 (11:13 +0900)]
ofproto-dpif-upcall: Don't use stack garbage

Caught by the "learning action - self-modifying flow with hard_timeout"
test case.

The bug was introduced by commit b256dc52
("ofproto-dpif-xlate: Cache xlate_actions() effects.").

Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
10 years agoofproto-dpif-upcall: Remove the dispatcher thread.
Alex Wang [Thu, 27 Feb 2014 07:03:24 +0000 (23:03 -0800)]
ofproto-dpif-upcall: Remove the dispatcher thread.

With the foundation laid in previous commits, this commit
removes the 'dispatcher' thread by allowing 'handler'
threads to read upcalls directly from dpif.

This commit significantly simplifies the flow miss handling
code and brings slight improvement to flow setup rate.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agodpif-linux: Add thread-safety annotations.
Alex Wang [Fri, 18 Apr 2014 00:16:34 +0000 (17:16 -0700)]
dpif-linux: Add thread-safety annotations.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agodpif-linux: Pass 'struct dpif_linux *' to internal static functions.
Alex Wang [Thu, 17 Apr 2014 23:33:17 +0000 (16:33 -0700)]
dpif-linux: Pass 'struct dpif_linux *' to internal static functions.

This commit reformats the dpif-linux module so that all internal
static functions take 'struct dpif_linux *' as input argument.
This will allow the adding of thread-safety annotations.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agodpif-linux: Implement the API functions to allow multiple handler threads read upcall.
Alex Wang [Wed, 26 Feb 2014 18:10:29 +0000 (10:10 -0800)]
dpif-linux: Implement the API functions to allow multiple handler threads read upcall.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agodpif-netdev: Use miniflow as a flow key.
Jarno Rajahalme [Fri, 18 Apr 2014 15:26:57 +0000 (08:26 -0700)]
dpif-netdev: Use miniflow as a flow key.

Use miniflow as a flow key in the userspace datapath classifier.  The
miniflow is expanded for upcalls, but for existing datapath flows, the
key need not be expanded.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
10 years agoclassifier: Support miniflow as a key.
Jarno Rajahalme [Fri, 18 Apr 2014 15:26:56 +0000 (08:26 -0700)]
classifier: Support miniflow as a key.

Support struct miniflow as a key for datapath flow lookup.

The new classifier interface classifier_lookup_miniflow_first() takes
a miniflow as a key and stops at the first match with no regard to
flow priorities.  This works only if the classifier has no
conflicting rules (as is the case with the userspace datapath
classifier).

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
10 years agolib/flow: Possibly faster miniflow_hash_in_minimask()
Jarno Rajahalme [Fri, 18 Apr 2014 15:26:56 +0000 (08:26 -0700)]
lib/flow: Possibly faster miniflow_hash_in_minimask()

Upcoming patches add classifier lookups using miniflows, and those
lookups use this function heavily.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
10 years agolib/flow: Add miniflow accessors and miniflow_get_tcp_flags().
Jarno Rajahalme [Fri, 18 Apr 2014 15:26:56 +0000 (08:26 -0700)]
lib/flow: Add miniflow accessors and miniflow_get_tcp_flags().

Add inlined generic accessors for miniflow integer type fields, and a
new miniflow_get_tcp_flags() using these.  These will be used in a
later patch.

Some definitions also used in lib/packets.h had to be moved there to
resolve circular include dependencies.  Similarly, some inline
functions using struct flow are now in lib/flow.h.  IMO this is
cleaner, since now the lib/flow.h need not be included from
lib/packets.h.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
10 years agolib/flow: Introduce miniflow_extract().
Jarno Rajahalme [Fri, 18 Apr 2014 15:26:56 +0000 (08:26 -0700)]
lib/flow: Introduce miniflow_extract().

miniflow_extract() extracts packet headers directly to a miniflow,
which is a compressed form of the struct flow.  This does not require
a large struct to be cleared to begin with, and accesses less memory.
These performance benefits should allow this to be used in the DPDK
datapath.

miniflow_extract() takes a miniflow as an input/output parameter.  On
input the buffer for values to be extracted must be properly
initialized.  On output the map contains ones for all the fields that
have been extracted.

Some struct flow fields are reordered so that miniflow_extract()
progresses in the logical order.

Some explicit "inline" keywords are necessary for GCC to optimize this
properly.  Also, for the same reason, macros are used instead of inline
functions for pushing data to the miniflow.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
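
As a rough picture of the "compressed form" mentioned in the
miniflow_extract() commit above, the sketch below keeps a bitmap of which
fields are present plus a packed array of their values. It is a toy model
with hypothetical names and a fixed 32-bit field size, not the layout used
in lib/flow.h.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_FIELDS 64

/* Toy miniflow: bit i of 'map' is 1 if field i is present.  'values'
 * holds only the present fields, packed in increasing field order. */
struct miniflow_sketch {
    uint64_t map;
    uint32_t values[SKETCH_MAX_FIELDS];
};

/* Counts the 1-bits in 'x'. */
unsigned int
count_1bits(uint64_t x)
{
    unsigned int n = 0;

    for (; x; x &= x - 1) {
        n++;
    }
    return n;
}

/* Appends field 'idx' with 'value'.  Fields must be pushed in increasing
 * index order, which is why extraction has to walk the packet headers in
 * the same order as the fields are laid out. */
void
miniflow_sketch_push(struct miniflow_sketch *mf, unsigned int idx,
                     uint32_t value)
{
    assert(idx < SKETCH_MAX_FIELDS);
    assert((mf->map >> idx) == 0);  /* 'idx' is above all pushed fields. */

    mf->values[count_1bits(mf->map)] = value;
    mf->map |= UINT64_C(1) << idx;
}

/* Returns the value of field 'idx', or 0 if the field is absent. */
uint32_t
miniflow_sketch_get(const struct miniflow_sketch *mf, unsigned int idx)
{
    uint64_t bit;

    assert(idx < SKETCH_MAX_FIELDS);

    bit = UINT64_C(1) << idx;
    if (!(mf->map & bit)) {
        return 0;
    }
    /* The number of present fields below 'idx' is its packed position. */
    return mf->values[count_1bits(mf->map & (bit - 1))];
}
```

Only the map has to start out as zero, so no large structure needs to be
cleared before extraction, and absent fields cost no memory accesses.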
10 years agolib/ofp-util: Restore the check for minus sign in port number strings.
Jarno Rajahalme [Fri, 18 Apr 2014 15:26:56 +0000 (08:26 -0700)]
lib/ofp-util: Restore the check for minus sign in port number strings.

Commit 33ab38d9 (meta-flow: Simplify mf_from_ofp_port_string())
inadvertently removed a check for minus sign at the beginning of a
port number string introduced by commit 05dddba (meta-flow: Don't
allow negative port numbers).  This check is still needed, so put it
back, but in ofputil_port_from_string() this time.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
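
The reason an explicit minus-sign check matters, as the ofp-util commit
above implies, is that the standard strtoul() accepts a leading '-' and
returns the negated value converted to an unsigned type rather than
failing. A minimal sketch, with a hypothetical function name that is not
the one used in the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical sketch: parses a decimal port number from 's' into
 * '*port'.  Rejects strings that begin with '-', which strtoul() would
 * otherwise accept and wrap around to a large unsigned value. */
bool
port_number_from_string_sketch(const char *s, unsigned long *port)
{
    char *tail = NULL;
    unsigned long value;

    assert(s != NULL && port != NULL);

    if (s[0] == '-') {
        return false;           /* Negative port numbers are not allowed. */
    }

    errno = 0;
    value = strtoul(s, &tail, 10);
    if (errno != 0 || tail == s || *tail != '\0') {
        return false;           /* Overflow, no digits, or trailing text. */
    }
    *port = value;
    return true;
}
```

Like the check described in the commit, this only looks at the first
character of the string.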
10 years agoMerge pull request #3 from joestringer/submit/xlate_cache_v2
ejj [Fri, 18 Apr 2014 00:35:52 +0000 (17:35 -0700)]
Merge pull request #3 from joestringer/submit/xlate_cache_v2

Cache the modules affected by xlate_actions().

10 years agorevalidator: Only revalidate high-throughput flows.
Joe Stringer [Tue, 4 Mar 2014 17:36:37 +0000 (09:36 -0800)]
revalidator: Only revalidate high-throughput flows.

Previously we would revalidate all flows if the "need_revalidate" flag
was raised. This patch modifies the logic to delete low-throughput flows
rather than revalidate them. High-throughput flows are unaffected by
this change. This patch identifies the flows based on the mean time
between packets since the last dump.

This change is primarily targeted at situations where:
* Flow dump duration is high (~1 second).
* Revalidation is triggered (e.g., by bridge reconfiguration or learning).

After the need_revalidate flag is set, next time a new flow dump session
starts, revalidators will begin revalidating the flows. This full
revalidation is more expensive, which significantly increases the flow
dump duration. At the end of this dump session, the datapath flow
management algorithms kick in for the next dump:

* If flow dump duration becomes too long, the flow limit is decreased.
* The number of flows in the datapath then exceeds the flow_limit.
* As the flow_limit is exceeded, max_idle is temporarily set to 100ms.
* Revalidators delete all flows that haven't seen traffic recently.

The effect of this is that many low-throughput flows are deleted after
revalidation, even if they are valid. The revalidation is unnecessary
for flows that would be deleted anyway, so this patch skips the
revalidation step for those flows.

Note that this patch will only perform this optimization if the flow has
already been dumped at least once, and only if the time since the last
dump is sufficiently long. This gives the flow a chance to become
high-throughput.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
---
v2: Acked.
v1: Determine "high-throughput" by packets rather than bytes.
    Calculate the mean time between packets for comparison, rather than
      comparing the number of packets since the last dump.
RFC: First post.
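
The "mean time between packets since the last dump" test described in the
revalidator commit above can be sketched as below. The function name, the
millisecond units, and the threshold parameter are assumptions for
illustration; the commit message does not state the threshold it uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch.  'packets' is the number of packets the flow has
 * seen since it was last dumped at 'last_dump_ms'; 'now_ms' is the
 * current time.  Returns true if the flow looks high-throughput (worth
 * revalidating), false if it should simply be deleted. */
bool
should_revalidate_sketch(uint64_t packets, long long last_dump_ms,
                         long long now_ms, long long threshold_ms)
{
    unsigned long long duration, mean_gap;

    assert(threshold_ms >= 0);

    if (packets == 0) {
        packets = 1;            /* Avoid dividing by zero. */
    }
    duration = now_ms > last_dump_ms
               ? (unsigned long long) (now_ms - last_dump_ms)
               : 0;
    mean_gap = duration / packets;  /* Mean time between packets. */

    return mean_gap <= (unsigned long long) threshold_ms;
}
```

For example, 100 packets over one second gives a 10 ms mean gap, while 2
packets over the same second gives 500 ms, so with a threshold between
those values only the first flow would be revalidated.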

10 years agoofproto-dpif-xlate: Cache xlate_actions() effects.
Joe Stringer [Thu, 10 Apr 2014 04:00:28 +0000 (16:00 +1200)]
ofproto-dpif-xlate: Cache xlate_actions() effects.

This patch adds a new object called 'struct xlate_cache' which can be
set in 'struct xlate_in', and passed to xlate_actions() to cache the
modules affected by this flow translation. Subsequently, the caller can
pass the xcache to xlate_push_stats() to credit stats and perform side
effects for a lower cost than full flow translation.

These changes are aimed currently at long-lived flows, decreasing the
average dump duration for such flows by 50-80%. This allows more flows
to be supported in the datapath at a given time. Applying these changes
to short-lived flows is left for a later commit.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
---
v2: Acked.
v1: Add caching for fin_timeout action.
    Expire netflows on xlate_cache_clear().
    Account to bonds using a copy of 'flow' rather than hash.
    Always build XC_NORMAL entry (previously only if may_learn is true)
    Rename xlate_from_cache()->xlate_push_stats()
    Add may_learn parameter to xlate_push_stats()
    Tidy up xlate_actions__() mirror/netflow code.
    Fold in style fixups.
RFC: First post.

10 years agoofproto: New function ofproto_refresh_rule().
Joe Stringer [Tue, 4 Mar 2014 01:23:12 +0000 (17:23 -0800)]
ofproto: New function ofproto_refresh_rule().

This function checks for a rule in the classifier:
* If the rule exists, reset its modified time.
* If an equivalent rule exists, reset that rule's modified time.
* If no rule exists, re-install the rule and reset its modified time.
* Finally, return the rule that was modified.

This function will be used to ensure that hard timeouts for learnt rules
are refreshed if traffic consistently hits a rule with a learn action in
it. The first user will be the next commit.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
---
v2: Acked.
v1: Ensure rule->modified is updated correctly.
RFC: First post.

10 years agoflow.h: Fix a comment typo
YAMAMOTO Takashi [Fri, 11 Apr 2014 01:19:24 +0000 (10:19 +0900)]
flow.h: Fix a comment typo

Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
10 years agoofproto-dpif.at: Fix a race.
Alex Wang [Wed, 16 Apr 2014 18:01:12 +0000 (11:01 -0700)]
ofproto-dpif.at: Fix a race.

For the fixed line, the ofctl_monitor.log should have 18 lines.
So the test should wait until the log reaches 18 lines.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
10 years agodpif-netdev: Move hash function out of the recirc action, into its own action
Andy Zhou [Wed, 9 Apr 2014 01:42:39 +0000 (18:42 -0700)]
dpif-netdev: Move hash function out of the recirc action, into its own action

Currently the recirculation action can optionally compute a hash. This
patch adds a hash action that is independent of the recirc action, which
no longer computes a hash.  For a megaflow bond with recirc, the output
to a bond port action will look like:

    hash(hash_l4(0)), recirc(<recirc_id>)

A recirculation application that does not depend on the hash value
can just use the recirc action alone.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
10 years agodatapath: Fix a double free bug for the sample action
Andy Zhou [Tue, 15 Apr 2014 23:28:15 +0000 (16:28 -0700)]
datapath: Fix a double free bug for the sample action

When the sample action returns with an error, the skb has already been
freed. This patch fixes a bug to make sure we don't free it again.

10 years agoofproto-dpif: xlate should not attribute stats to bond entry when using recirc
Andy Zhou [Wed, 16 Apr 2014 15:04:23 +0000 (08:04 -0700)]
ofproto-dpif: xlate should not attribute stats to bond entry when using recirc

When recirculation is used to implement bond, the bond entry stats are
collected from the hidden post-recirculation rules. Attributing stats to
the bond entry during xlate as well causes double counting of stats for
some bond entries.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
10 years agoofproto/bond: properly maintain hash entry pr_rule
Andy Zhou [Wed, 16 Apr 2014 15:01:32 +0000 (08:01 -0700)]
ofproto/bond: properly maintain hash entry pr_rule

This patch fixes a bug in which each hash entry's pr_rule pointer was
not properly maintained: the pointers became NULL after each
rebalancing.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
10 years agoofproto/bond: Keep hash entry slave valid.
Andy Zhou [Wed, 16 Apr 2014 14:51:02 +0000 (07:51 -0700)]
ofproto/bond: Keep hash entry slave valid.

Bond recirculation needs to refresh the 'hidden rules' from
time to time. Keep hash entry slave valid to prevent those
hidden rules from being removed.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
10 years agoofproto/bond: only display hash entries with tx_byptes > 1KB
Andy Zhou [Wed, 16 Apr 2014 08:36:59 +0000 (01:36 -0700)]
ofproto/bond: only display hash entries with tx_byptes > 1KB

When recirculation is used to implement bond, all bond entries are
always populated regardless of whether there is traffic going through
them or not. This change cuts down the noise when running
'ovs-appctl bond/show', by skipping '0KB' entries.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
10 years agodatapath: Allow each vport to have an array of 'port_id's.
Alex Wang [Tue, 15 Apr 2014 06:37:10 +0000 (23:37 -0700)]
datapath: Allow each vport to have an array of 'port_id's.

In order to allow handlers to read upcalls directly from the datapath,
we need to support a per-handler netlink socket for each vport in the
datapath.  This commit makes this happen.  Also, it is guaranteed
to be backward compatible with the previous branch.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
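
One way to picture "an array of 'port_id's" per vport, as in the datapath
commit above, is a table from which each upcall picks one handler socket.
The sketch below is an assumption-laden illustration: the struct, the
function name, and the choice of selecting by flow hash modulo the array
size are all hypothetical, and the commit message does not say how the
datapath makes the selection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vport table of netlink port IDs, one per handler. */
struct vport_portids_sketch {
    size_t n_ids;
    const uint32_t *ids;
};

/* Picks the netlink port ID that should receive an upcall for a packet
 * with the given 'flow_hash'.  Using the hash keeps packets of the same
 * flow on the same handler thread. */
uint32_t
pick_upcall_portid(const struct vport_portids_sketch *p, uint32_t flow_hash)
{
    assert(p != NULL && p->ids != NULL && p->n_ids > 0);

    return p->ids[flow_hash % p->n_ids];
}
```

A table with a single entry behaves like the older one-socket-per-vport
arrangement, which is one way such a design can stay backward compatible.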