[ovs-discuss] xenServer and openVswitch 1.0.99

Juan Tellez juan at embrane.com
Fri Feb 3 16:29:26 PST 2012


Jesse,

> I don't really see anything in the information that you've given that
> indicates OVS is the one dropping packets

We do not see the same problem with the Linux Bridge, and we want to use the vswitch.

Is it possible that OVS disrupts long-lived TCP connections in the presence of many short flows? After running for about 16 hours, the long-lived connections experience 10-15 s stalls caused by excessive packet drops.

Again, I'm asking whether this could be a bug that has already been fixed in 1.2, 1.3, or 1.4.

Thanks,

Juan

-----Original Message-----
From: Jesse Gross [mailto:jesse at nicira.com] 
Sent: Friday, February 03, 2012 9:20 AM
To: Juan Tellez
Cc: discuss at openvswitch.org; Vijay Chander
Subject: Re: [ovs-discuss] xenServer and openVswitch 1.0.99

On Thu, Feb 2, 2012 at 10:06 PM, Juan Tellez <juan at embrane.com> wrote:
> Jesse,
>
>>>> What's the traffic mixture like when you have this problem with vlans (i.e. single flow vs. many connections)?  If you run a single stream, what is the ratio of hits to misses on the relevant datapath?
>
> Our traffic is varied: some very short flows, others long-lived TCP connections.

If you have many short flows, it's possible that the CPU load you see
is simply the result of normal processing.

> We are mostly concerned about the long flows dropping lots of packets.  When we see messages like the ones above, can we expect that the vswitch has dropped packets?

I don't really see anything in the information that you've given that
indicates OVS is the one dropping packets.

> I think the relevant traffic is on vif*.0 below, which is at 1.1%.  Can you explain the hit/miss/lost statistics below?

Hits are packets processed entirely in the kernel, misses are packets
sent to userspace for flow setup, and lost are packets that were queued
to userspace but dropped because the queue length was exceeded.
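For reference, these counters appear on the `lookups:` line of `ovs-dpctl show`. A quick way to turn them into a miss ratio (a sketch; the sample counter values below are invented, and the exact output format can differ between OVS versions) is:

```shell
# Sample "lookups" line as printed by `ovs-dpctl show` (values invented).
line='lookups: hit:123456 missed:789 lost:12'

# Split on colons/spaces; $3 = hit, $5 = missed.  Print the flow-setup
# miss ratio as a percentage of all lookups.
echo "$line" | awk -F'[: ]+' \
    '{ printf "miss ratio: %.2f%%\n", 100 * $5 / ($3 + $5) }'
# prints: miss ratio: 0.64%
```

A persistently high miss ratio would mean most packets are taking the slow userspace flow-setup path rather than the fast kernel path.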

> The kern.log traces are interesting; they do seem to correlate with some of the failures we see:
>
> /var/log/kern.log.9.gz:Oct  2 20:55:58 localhost kernel: vif122.0: draining TX queue
> /var/log/kern.log.9.gz:Oct  2 20:56:00 localhost kernel: vif117.0: draining TX queue
> /var/log/kern.log.9.gz:Oct  2 20:56:00 localhost kernel: vif121.0: draining TX queue
> /var/log/kern.log.9.gz:Oct  2 20:56:02 localhost kernel: vif112.0: draining TX queue
> /var/log/kern.log.9.gz:Oct  2 20:56:05 localhost kernel: vif113.0: draining TX queue
>
> Is the draining occurring at a regular interval?

Those messages are coming from netback, not OVS.  Combined with the
fact that you see drop counts going up on the interface itself, that
seems to be the likely cause of the problem.  Probably something on
the guest side is not keeping up, but you should talk to the Xen guys.
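To correlate those netback messages with actual drops, one way to watch the per-vif drop counters is to read `/proc/net/dev` (a sketch; it assumes the standard Linux `/proc/net/dev` column layout and that the guest interfaces are named `vif*`):

```shell
# Print RX/TX drop counters for each vif interface from /proc/net/dev.
# After the interface name there are 8 RX columns then 8 TX columns;
# "drop" is the 4th column in each group ($6 and $14 after splitting).
awk -F'[: ]+' '/vif/ {
    printf "%s rx_dropped=%s tx_dropped=%s\n", $2, $6, $14
}' /proc/net/dev
```

Sampling this periodically (e.g. under `watch`) would show whether the TX drop counter on a vif climbs at the same time the "draining TX queue" messages appear.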

