Note: This is a beta release of Red Hat Bugzilla 5.0. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Also email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback here.
Bug 1386514 - [fdBeta] ovs-vswitchd takes 100% CPU time
Summary: [fdBeta] ovs-vswitchd takes 100% CPU time
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: openvswitch
Version: 7.1
Hardware: Unspecified
OS: Unspecified
high
urgent
Target Milestone: rc
: ---
Assignee: Lance Richardson
QA Contact: qding
URL:
Whiteboard:
Depends On:
Blocks: 1366899 1397502 1397504 1475772 1477658 1496194
TreeView+ depends on / blocked
 
Reported: 2016-10-19 07:11 UTC by Marcin Mirecki
Modified: 2017-09-26 15:53 UTC (History)
11 users (show)

Fixed In Version: openvswitch-2.5.0-22.git20160727.el7fdb
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1397502 1397504 1475772 (view as bug list)
Environment:
Last Closed: 2017-01-12 17:16:09 UTC
Target Upstream Version:


Attachments (Terms of Use)
Result of: strace -Cvfr -p `pidof ovs-vswitchd` -o vswitchd.strace (deleted)
2016-10-19 07:11 UTC, Marcin Mirecki
no flags Details
patched ovs with fix and debug (deleted)
2016-10-30 09:09 UTC, Thadeu Lima de Souza Cascardo
no flags Details
openswitch logs after patch from 2016-10-30 (deleted)
2016-11-02 13:11 UTC, Marcin Mirecki
no flags Details
ovs-vswitchd log (deleted)
2016-11-13 08:07 UTC, Mor
no flags Details
ovs-vswitchd log (deleted)
2016-11-21 13:37 UTC, Mor
no flags Details
strace log (deleted)
2016-11-21 17:22 UTC, Mor
no flags Details

Description Marcin Mirecki 2016-10-19 07:11:54 UTC
Created attachment 1211993 [details]
Result of: strace -Cvfr -p `pidof ovs-vswitchd` -o vswitchd.strace

When the openvswitch service is started, the ovs-vswitchd process takes a 100% of CPU time.

Top dump:
 1227 root      10 -10   48208  11240   9908 R 100.0  0.1   1534:17 ovs-vswitchd                                                                                
The error seems to have appeared without any specific cause.
Once it started, it is now always occurring whenever the openvswitch service is started.
                                                   
Result of: strace -Cvfr -p `pidof ovs-vswitchd` -o vswitchd.strace
shows the following recurring piece of log being generated at an enormous rate:


1227       0.000013 recvfrom(3, 0x55f5a48f6dcd, 339, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
1227       0.000013 recvmsg(11, {msg_namelen=0}, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1227       0.000012 accept(10, 0x7ffc6ea263d0, [128]) = -1 EAGAIN (Resource temporarily unavailable)
1227       0.000014 recvmsg(14, {msg_namelen=0}, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1227       0.000012 recvmsg(13, {msg_namelen=0}, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1227       0.000013 recvmsg(13, {msg_namelen=0}, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1227       0.000012 recvmsg(13, {msg_namelen=0}, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1227       0.000014 poll([{fd=13, events=POLLIN}, {fd=14, events=POLLIN}, {fd=11, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=3, events=POLLIN}], 6, 0) = 0 (Timeout)
1227       0.000015 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={53405, 379161}, ru_stime={38423, 776487}, ru_maxrss=11240, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=496, ru_majflt=0, ru_nswap=0, ru_inblock=1688, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=53779241, ru_nivcsw=260447}) = 0

Comment 2 Thadeu Lima de Souza Cascardo 2016-10-30 09:09:35 UTC
Created attachment 1215516 [details]
patched ovs with fix and debug

Hi.

Can you test this package and let me know if it fixes the problem? If it doesn't, can you enable debug for poll-loop and send the logs, please?

ovs-appctl vlog/set poll_loop:DBG

Thanks.
Cascardo.

Comment 3 Marcin Mirecki 2016-11-02 13:11:06 UTC
Created attachment 1216532 [details]
openswitch logs after patch from 2016-10-30

The patch seems to have fixed the issue.

I attach the collected openvswitch logs.

Comment 4 Thadeu Lima de Souza Cascardo 2016-11-03 18:09:50 UTC
Hi, Marcin.

The fix has been applied upstream, can you test master and see if it fixes your problem? Just in case it is really fixed by the fix and not the debug messages I added.

Thanks.
Cascardo.

Comment 5 Mor 2016-11-13 08:07:50 UTC
Created attachment 1220123 [details]
ovs-vswitchd log

Comment 6 Mor 2016-11-13 08:10:53 UTC
Hi guys, 
I have the problem reproduced on my environment: (Open vSwitch) 2.6.90.

Comment 7 Thadeu Lima de Souza Cascardo 2016-11-14 12:22:26 UTC
The logs seem to indicate timeouts every 500ms for revalidator and every 5s for stats update. All normal timeouts. Also, only 0% CPU use is shown. Either the logs do not represent the time when the issue happened, or there is something that is not being accounted.

Cascardo.

Comment 8 Mor 2016-11-14 12:52:02 UTC
I grabbed the log only after verifying that ovs-vswitchd process consumes high CPU usage. I also set the OVS logging level to debug (ovs-appctl vlog/set dbg). I will try to upgrade to OVS 2.6.1.

Thanks,
Mor

Comment 9 qding 2016-11-18 10:06:55 UTC
Hi  Marcin,

I'm network-QE and reviewing the BZ, but cannot reproduce the issue. Could you please give more info? like kernel version, openvswitch package, steps, etc.

Thanks
QJ

Comment 10 Mor 2016-11-21 13:37:07 UTC
Hi,

Its not always reproducible on my environment. And I have no steps to reproduce it.

I am attaching additional debug level logs taken from OVN host.

Kernel version: 
3.10.0-512.el7.x86_64

OVS packages:
openvswitch-2.6.90-1.fc24.x86_64

Comment 11 Mor 2016-11-21 13:37:52 UTC
Created attachment 1222398 [details]
ovs-vswitchd log

Comment 12 Mor 2016-11-21 17:22:20 UTC
Created attachment 1222447 [details]
strace log

Comment 13 Mor 2016-11-21 17:58:02 UTC
I will try to upgrade to the latest master release and update with my findings.

Comment 14 Mor 2016-11-22 07:21:09 UTC
It seems to be fixed with the latest master release.

Comment 15 qding 2016-11-22 07:39:00 UTC
(In reply to Mor from comment #10)
> 
> Its not always reproducible on my environment. And I have no steps to
> reproduce it.
> 
> I am attaching additional debug level logs taken from OVN host.
> 
> Kernel version: 
> 3.10.0-512.el7.x86_64
> 
> OVS packages:
> openvswitch-2.6.90-1.fc24.x86_64

Thank you for the update.
Without steps, could you please give something like command line history? I'd like to know what you have done before the issue happens to help me to reproduce.

thanks
QJ

Comment 16 Mor 2016-11-22 10:19:21 UTC
In our case, the changes are done by a software provider (which provides CMS access to OVN). This software component uses the OVS API. So unfortunately, the command-line history won't us to track the changes.

Comment 18 Flavio Leitner 2016-11-22 16:36:59 UTC
The validated fix is applied to fdBeta package.
Please open a fresh a bug if there is another issue to be resolved.
Thanks!

Comment 19 qding 2016-11-23 01:17:03 UTC
(In reply to Mor from comment #16)
> In our case, the changes are done by a software provider (which provides CMS
> access to OVN). This software component uses the OVS API. So unfortunately,
> the command-line history won't us to track the changes.

Now the issue has been fixed in openvswitch-2.5.0-22.git20160727.el7fdb. I cannot reproduce the issue. Could you please help verify it?

thank you
QJ

Comment 21 Mor 2016-11-24 07:47:31 UTC
Hi QJ,

In our environment we require OVS 2.6.0 or higher, so I unable to verify it on 2.5.0.

THanks.
Mor

Comment 22 qding 2016-11-24 07:59:22 UTC
(In reply to Mor from comment #21)
> Hi QJ,
> 
> In our environment we require OVS 2.6.0 or higher, so I unable to verify it
> on 2.5.0.
> 
It's ok. Thank you anyway.

Comment 24 Marcin Mirecki 2017-06-09 08:21:36 UTC
clearing needinfo (already resolved)


Note You need to log in before you can comment on or make changes to this bug.