[ovs-discuss] Upgraded to openvswitch-1.4.1 and still high load and polluted syslog

Volkan YAZICI volkan.yazici at gmail.com
Thu May 24 03:21:41 PDT 2012


Hi Oliver!

I have a triangle shaped network topology of 3 ovs 1.4.1
switches. When I plug-in a host into the network I immediatly
start to observe the same log pollution and excessive CPU usage
similar to yours. I either need to un-plug the machine or break
the triangle into a straight line by removing one of the
connections between the switches. Have you made any progress on
your side about the problem? Any clues?


Best.

On Thu, 17 May 2012 13:58:00 +0200, Oliver Francke writes:
> Hi,
>
> uhm, I think I have my firewall-provisioning ready for production, but still temporary high load of the ovs-vswitchd.
>
> Anybody with a clue of what's going on there?
>
> --- 8-< ---
>
> May 17 13:54:07 fcmsnode10 ovs-vswitchd: 1844633|poll_loop|WARN|Dropped 771 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
> May 17 13:54:07 fcmsnode10 ovs-vswitchd: 1844634|poll_loop|WARN|wakeup due to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at lib/dpif-linux.c:1197 (101% CPU usage)
> May 17 13:54:07 fcmsnode10 ovs-vswitchd: 1844635|poll_loop|WARN|wakeup due to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at lib/dpif-linux.c:1197 (101% CPU usage)
> May 17 13:54:08 fcmsnode10 ovs-vswitchd: 1844636|timeval|WARN|105 ms poll interval (56 ms user, 44 ms system) is over 152 times the weighted mean interval 1 ms (342116319 samples)
> May 17 13:54:08 fcmsnode10 ovs-vswitchd: 1844637|timeval|WARN|context switches: 0 voluntary, 2 involuntary
> May 17 13:54:08 fcmsnode10 ovs-vswitchd: 1844638|coverage|INFO|Skipping details of duplicate event coverage for hash=959f79a0 in epoch 342116319
> May 17 13:54:08 fcmsnode10 ovs-vswitchd: 1844639|poll_loop|WARN|Dropped 880 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
>
> --- 8-< ---
>
> and ovs-dpctl shows:
>
> system at vmbr1:
>         lookups: hit:269430948 missed:1076470 lost:1
>         flows: 6
>         port 0: vmbr1 (internal)
>         port 1: eth1
>         port 4: vlan10 (internal)
>         port 5: tap822i1d0
>         port 6: tap822i1d1
>         port 7: tap410i1d0
>         port 9: tap1113i1d0
>         port 13: tap433i1d0
>         port 15: tap377i1d0
>         port 16: tap416i1d0
>         port 18: tap287i1d0
>         port 19: tap451i1d0
>         port 23: tap160i1d0
>         port 24: tap376i1d0
>         port 27: tap1084i1d0
>         port 28: tap1085i1d0
>         port 30: tap760i1d0
>         port 31: tap339i1d0
> system at vmbr0:
>         lookups: hit:15321943230 missed:8565995663 lost:201094006
>         flows: 15216
>         port 0: vmbr0 (internal)
>         port 1: vlan146 (internal)
>         port 2: eth0
> .
> .
> .
>
> Hints welcome, cause my fear is, if I put a couple of rules to some 120 VM's, that there could be performance issues when load is high anyway?
>
> Kind regards,
>
> Oliver.
>
> Am 10.05.2012 um 10:25 schrieb Oliver Francke:
>
>> Hi *,
>> 
>> as suggested by Ben I upgraded to 1.4.1 and configured with following command:
>> 
>> >          ovs-vsctl set bridge vmbr0 other-config:flow-eviction-threshold=10000
>> 
>> all 4 nodes, which worked seamlessly. And now to the section with the "but" in it:
>> 
>> But: still high load with polluting syslog with: ( not constantly, but ever so often)
>> 
>> --- 8-< ---
>> 
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139643|poll_loop|WARN|wakeup due to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at lib/dpif-linux.c:1197 (52% CPU usage)
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139644|poll_loop|WARN|wakeup due to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at lib/dpif-linux.c:1197 (52% CPU usage)
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139645|poll_loop|WARN|wakeup due to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at lib/dpif-linux.c:1197 (52% CPU usage)
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139646|poll_loop|WARN|wakeup due to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at lib/dpif-linux.c:1197 (52% CPU usage)
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139647|poll_loop|WARN|wakeup due to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at lib/dpif-linux.c:1197 (52% CPU usage)
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139648|timeval|WARN|52 ms poll interval (10 ms user, 10 ms system) is over 19 times the weighted mean interval 3 ms (31293432 samples)
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139649|timeval|WARN|context switches: 0 voluntary, 136 involuntary
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139650|coverage|INFO|Event coverage (epoch 31293432/entire run), hash=0a8403eb:
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139651|coverage|INFO|ofproto_dpif_xlate          30 / 418241133
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139652|coverage|INFO|flow_extract                15 / 119894932
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139653|coverage|INFO|hmap_pathological            6 / 187853714
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139654|coverage|INFO|hmap_expand                275 / 1030432349
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139655|coverage|INFO|netdev_get_stats           117 /   1275136
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139656|coverage|INFO|poll_fd_wait                15 / 469400883
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139657|coverage|INFO|util_xalloc              21740 / 85658737570
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139658|coverage|INFO|netdev_ethtool             234 /   2550522
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139659|coverage|INFO|netlink_received           486 / 444888366
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139660|coverage|INFO|netlink_sent               264 / 361913061
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139661|coverage|INFO|bridge_reconfigure           0 /         6
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139662|coverage|INFO|ofproto_flush                0 /         2
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139663|coverage|INFO|ofproto_update_port          0 /       131
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139664|coverage|INFO|facet_revalidate             0 /    157765
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139665|coverage|INFO|facet_unexpected             0 /         1
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139666|coverage|INFO|dpif_port_add                0 /         2
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139667|coverage|INFO|dpif_port_del                0 /         2
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139668|coverage|INFO|dpif_flow_flush              0 /         4
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139669|coverage|INFO|dpif_flow_put                0 /       445
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139670|coverage|INFO|dpif_flow_del                0 / 119661491
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139671|coverage|INFO|dpif_purge                   0 /         2
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139672|coverage|INFO|mac_learning_learned         0 /      6111
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139673|coverage|INFO|mac_learning_expired         0 /      5598
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139674|coverage|INFO|poll_zero_timeout            0 /      6190
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139675|coverage|INFO|pstream_open                 0 /         4
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139676|coverage|INFO|stream_open                  0 /         1
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139677|coverage|INFO|netdev_set_policing          0 /       706
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139678|coverage|INFO|netdev_get_ifindex           0 /       123
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139679|coverage|INFO|netdev_get_hwaddr            0 /       125
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139680|coverage|INFO|nln_changed                  0 /       137
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139681|coverage|INFO|netlink_recv_jumbo           0 /  16397628
>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139682|coverage|INFO|47 events never hit
>> May  8 06:30:16 fcmsnode0 ovs-vswitchd: 139683|poll_loop|WARN|Dropped 216 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
>> May  8 06:30:16 fcmsnode0 ovs-vswitchd: 139684|poll_loop|WARN|wakeup due to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at lib/dpif-linux.c:1197 (52% CPU usage)
>> May  8 06:30:16 fcmsnode0 ovs-vswitchd: 139685|poll_loop|WARN|wakeup due to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at lib/dpif-linux.c:1197 (52% CPU usage)
>> May  8 06:30:17 fcmsnode0 ovs-vswitchd: 139686|poll_loop|WARN|Dropped 480 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
>> 
>> --- 8-< ---
>> 
>> Perhaps there is already a hint in the stats... If not, how to dig into it further...?
>> 
>> Kind regards,
>> 
>> Oliver.
>> 
>> -- 
>> 
>> Oliver Francke
>> 
>> filoo GmbH
>> Moltkestraße 25a
>> 33330 Gütersloh
>> HRB4355 AG Gütersloh
>> 
>> Geschäftsführer: S.Grewing | J.Rehpöhler | C.Kunz
>> 
>> Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh
>> 
>> _______________________________________________
>> discuss mailing list
>> discuss at openvswitch.org
>> http://openvswitch.org/mailman/listinfo/discuss
>
> _______________________________________________
> discuss mailing list
> discuss at openvswitch.org
> http://openvswitch.org/mailman/listinfo/discuss


More information about the discuss mailing list