[ovs-discuss] Upgraded to openvswitch-1.4.1 and still high load and polluted syslog

Luiz Ozaki luiz.ozaki at locaweb.com.br
Thu May 24 09:41:19 PDT 2012


On 5/17/12 8:58 AM, Oliver Francke wrote:
> Hi,
>
> uhm, I think I have my firewall-provisioning ready for production, but still temporary high load of the ovs-vswitchd.
>
> Anybody with a clue of what's going on there?
>
> --- 8-<  ---
>
> May 17 13:54:07 fcmsnode10 ovs-vswitchd: 1844633|poll_loop|WARN|Dropped 771 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
> May 17 13:54:07 fcmsnode10 ovs-vswitchd: 1844634|poll_loop|WARN|wakeup due to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at lib/dpif-linux.c:1197 (101% CPU usage)
> May 17 13:54:07 fcmsnode10 ovs-vswitchd: 1844635|poll_loop|WARN|wakeup due to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at lib/dpif-linux.c:1197 (101% CPU usage)
> May 17 13:54:08 fcmsnode10 ovs-vswitchd: 1844636|timeval|WARN|105 ms poll interval (56 ms user, 44 ms system) is over 152 times the weighted mean interval 1 ms (342116319 samples)
> May 17 13:54:08 fcmsnode10 ovs-vswitchd: 1844637|timeval|WARN|context switches: 0 voluntary, 2 involuntary
> May 17 13:54:08 fcmsnode10 ovs-vswitchd: 1844638|coverage|INFO|Skipping details of duplicate event coverage for hash=959f79a0 in epoch 342116319
> May 17 13:54:08 fcmsnode10 ovs-vswitchd: 1844639|poll_loop|WARN|Dropped 880 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
>
> --- 8-<  ---
>
> and ovs-dpctl shows:
>
> system@vmbr1:
>          lookups: hit:269430948 missed:1076470 lost:1
>          flows: 6
>          port 0: vmbr1 (internal)
>          port 1: eth1
>          port 4: vlan10 (internal)
>          port 5: tap822i1d0
>          port 6: tap822i1d1
>          port 7: tap410i1d0
>          port 9: tap1113i1d0
>          port 13: tap433i1d0
>          port 15: tap377i1d0
>          port 16: tap416i1d0
>          port 18: tap287i1d0
>          port 19: tap451i1d0
>          port 23: tap160i1d0
>          port 24: tap376i1d0
>          port 27: tap1084i1d0
>          port 28: tap1085i1d0
>          port 30: tap760i1d0
>          port 31: tap339i1d0
> system@vmbr0:
>          lookups: hit:15321943230 missed:8565995663 lost:201094006
>          flows: 15216
>          port 0: vmbr0 (internal)
>          port 1: vlan146 (internal)
>          port 2: eth0
OVS can't keep up with some kinds of traffic, which causes packet loss
and high load, as the vmbr0 counters show (about 8.5 billion missed
lookups and 201 million lost packets).
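
To put rough numbers on that, here is a minimal Python sketch that
computes the miss ratio from the "lookups:" lines quoted above. It
assumes that "missed" counts packets that matched no kernel flow and
were passed up to ovs-vswitchd, and that "lost" counts packets dropped
before they reached userspace. The name miss_stats is only illustrative
and is not part of any OVS tool.

```python
import re

# Matches the "lookups:" line as printed in the ovs-dpctl output quoted above.
LOOKUPS_RE = re.compile(r"lookups: hit:(\d+) missed:(\d+) lost:(\d+)")


def miss_stats(line):
    """Return miss and loss ratios for one 'lookups:' line.

    Assumption: total lookups = hit + missed, and 'lost' is the subset
    of missed packets that never reached userspace.
    """
    m = LOOKUPS_RE.search(line)
    if m is None:
        raise ValueError("not a lookups line: %r" % line)
    hit, missed, lost = (int(g) for g in m.groups())
    total = hit + missed
    return {
        # Share of all lookups that had to go to userspace.
        "miss_ratio": missed / total if total else 0.0,
        # Share of missed packets that were dropped on the way up.
        "lost_ratio": lost / missed if missed else 0.0,
    }


# Counter values copied from the quoted output.
vmbr0 = miss_stats("lookups: hit:15321943230 missed:8565995663 lost:201094006")
vmbr1 = miss_stats("lookups: hit:269430948 missed:1076470 lost:1")

for name, stats in (("vmbr0", vmbr0), ("vmbr1", vmbr1)):
    print("%s: %.1f%% missed, %.2f%% of misses lost"
          % (name, 100 * stats["miss_ratio"], 100 * stats["lost_ratio"]))
```

With these counters, roughly a third of all lookups on vmbr0 miss the
kernel flow table, against well under 1% on vmbr1, which fits the high
ovs-vswitchd load.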

If you have a firewall VM, that might be the problem. OVS probably can't
handle that much traffic concentrated on one VM. If it is running NAT,
it's worse.


We had one firewall VM running NAT for about 100 VMs and ran into
problems. We had to either split the firewall across multiple VMs/hosts
to spread the workload or switch to a normal bridge.

-- 
Luiz Henrique Ozaki


