[ovs-discuss] xenServer and openVswitch 1.0.99
juan at embrane.com
Thu Feb 2 22:06:08 PST 2012
>>> What's the traffic mixture like when you have this problem with vlans (i.e. single flow vs. many connections)? If you run a single stream, what is the ratio of hits to misses on the relevant datapath?
Our traffic is varied: some flows are very short, others are long-lasting TCP connections. We are mostly concerned about the long flows dropping lots of packets. When we see messages like the ones above, can we expect that the vswitch has dropped packets?
I think the relevant traffic is on the vif*.0 ports below, where the miss ratio is about 1.1%. Can you explain the hit/missed/lost statistics below?
system@xenbr2:
lookups: frags:0, hit:110495900, missed:1284600, lost:34
port 0: xenbr2 (internal)
port 1: eth2
port 2: xapi21 (internal)
port 3: vif455.1
port 4: vif462.0
port 5: vif456.0
port 6: vif461.0
port 7: vif457.0
port 8: vif458.0
port 9: vif465.0
port 10: vif459.0
port 11: vif467.0
port 12: vif460.0
port 13: vif463.0
port 14: vif464.0
port 16: vif466.0
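For reference, the 1.1% figure can be rechecked from the lookups line with a small Python sketch. This is only an illustration: the helper name parse_lookups is made up here, and the reading of the counters (hit = matched in the kernel flow table, missed = passed up to ovs-vswitchd, lost = dropped before reaching userspace) is an assumption about the usual meaning, not something confirmed in this thread.

```python
import re

def parse_lookups(line):
    """Turn a 'lookups:' line from ovs-dpctl show into a dict of integer counters.

    parse_lookups is a hypothetical helper written for this example only.
    """
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", line)}

# The lookups line exactly as pasted above.
stats = parse_lookups("lookups: frags:0, hit:110495900, missed:1284600, lost:34")

# Share of all lookups that missed the kernel flow table, in percent.
total = stats["hit"] + stats["missed"]
miss_pct = 100.0 * stats["missed"] / total

# Share of missed packets that were counted as lost (assumed meaning:
# dropped before userspace could handle them), in percent.
lost_pct_of_missed = 100.0 * stats["lost"] / stats["missed"]

print(f"miss ratio: {miss_pct:.2f}% of {total} lookups")
print(f"lost: {stats['lost']} packets ({lost_pct_of_missed:.4f}% of misses)")
```

Note that these counters are for the whole datapath, so they cannot by themselves say which vif the misses belong to.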
We are attempting to stress the host by sending lots of small flows, to see if we can reproduce the problem on our system more easily.
>>> Is there anything interesting in the ovs-vswitchd log?
No. It is empty for the times we are interested in.
The kern.log traces are interesting, though: they do seem to correlate with some of the failures we see:
/var/log/kern.log.9.gz:Oct 2 20:55:58 localhost kernel: vif122.0: draining TX queue
/var/log/kern.log.9.gz:Oct 2 20:56:00 localhost kernel: vif117.0: draining TX queue
/var/log/kern.log.9.gz:Oct 2 20:56:00 localhost kernel: vif121.0: draining TX queue
/var/log/kern.log.9.gz:Oct 2 20:56:02 localhost kernel: vif112.0: draining TX queue
/var/log/kern.log.9.gz:Oct 2 20:56:05 localhost kernel: vif113.0: draining TX queue
Is the draining occurring at a regular interval?
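One way to look at the spacing is to pull the timestamps out of the grep output and print the gaps between consecutive messages. A minimal Python sketch, assuming the syslog format shown above; the names drain_events and PLACEHOLDER_YEAR are made up for this example, and the year is a placeholder because kern.log does not record one.

```python
import re
from datetime import datetime

# The kern.log lines exactly as pasted above.
LOG_LINES = [
    "/var/log/kern.log.9.gz:Oct 2 20:55:58 localhost kernel: vif122.0: draining TX queue",
    "/var/log/kern.log.9.gz:Oct 2 20:56:00 localhost kernel: vif117.0: draining TX queue",
    "/var/log/kern.log.9.gz:Oct 2 20:56:00 localhost kernel: vif121.0: draining TX queue",
    "/var/log/kern.log.9.gz:Oct 2 20:56:02 localhost kernel: vif112.0: draining TX queue",
    "/var/log/kern.log.9.gz:Oct 2 20:56:05 localhost kernel: vif113.0: draining TX queue",
]

# Month, day, time, hostname, then the interface name before the message.
PATTERN = re.compile(
    r"(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2}) \S+ kernel: (\S+): draining TX queue"
)

# Assumption: syslog timestamps carry no year, so any fixed year works for gaps
# within the same day.
PLACEHOLDER_YEAR = 2000

def drain_events(lines, year=PLACEHOLDER_YEAR):
    """Return sorted (timestamp, interface) pairs for 'draining TX queue' lines."""
    events = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            month, day, clock, vif = match.groups()
            stamp = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
            events.append((stamp, vif))
    return sorted(events)

events = drain_events(LOG_LINES)

# Seconds between each message and the one before it.
gaps = [(later[0] - earlier[0]).total_seconds() for earlier, later in zip(events, events[1:])]

for (stamp, vif), gap in zip(events[1:], gaps):
    print(f"{stamp:%H:%M:%S} {vif}: {gap:.0f}s after previous message")
```

Five lines are far too few to conclude anything; the same loop would need to run over the full logs, and per interface, before calling the interval regular or not.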
From: Jesse Gross [mailto:jesse at nicira.com]
Sent: Thursday, February 02, 2012 6:36 PM
To: Juan Tellez
Cc: discuss at openvswitch.org; Vijay Chander
Subject: Re: [ovs-discuss] xenServer and openVswitch 1.0.99
On Wed, Feb 1, 2012 at 6:07 PM, Juan Tellez <juan at embrane.com> wrote:
> Dmesg hasn't changed for a while .. and sadly it is not time-stamped. Below is the tail:
> device vif467.1 entered promiscuous mode
> device tap467.0 entered promiscuous mode
> device tap467.1 entered promiscuous mode
> /local/domain/465/device/vif/0: Connected
> /local/domain/465/device/vif/1: Connected
> /local/domain/466/device/vif/0: Connected
> /local/domain/466/device/vif/1: Connected
> /local/domain/467/device/vif/0: Connected
> /local/domain/467/device/vif/1: Connected
> vif458.2: draining TX queue
> vif456.2: draining TX queue
> vif457.2: draining TX queue
> vif459.2: draining TX queue
>> What are the outputs of dmesg and ovs-dpctl show?
What's the traffic mixture like when you have this problem with vlans
(i.e. single flow vs. many connections)? If you run a single stream,
what is the ratio of hits to misses on the relevant datapath?
Is there anything interesting in the ovs-vswitchd log?