Bug 1694783 - ovs-vswitchd CPU usage is high and lots of "ip netns" stuck in "D" state
Summary: ovs-vswitchd CPU usage is high and lots of "ip netns" stuck in "D" state
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Flavio Leitner
QA Contact: Roee Agiman
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-01 16:49 UTC by David Hill
Modified: 2019-04-16 14:01 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Attachments


Links
System: Red Hat Knowledge Base (Solution)
ID: 4023011
Priority: Troubleshoot
Status: None
Summary: ovs-vswitchd CPU usage is high and lots of "ip netns" stuck in "D" state
Last Updated: 2019-04-01 17:02:13 UTC

Description David Hill 2019-04-01 16:49:19 UTC
Description of problem:
ovs-vswitchd CPU usage is high and lots of "ip netns" processes are stuck in "D" state. There are ~140 HA routers, and the controllers were rebooted one by one; rebooting them all together is what solved the problem.
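
For triage on a box in this state, a quick way to enumerate the stuck processes and see where they are blocked (a generic sketch, not taken from the case data; the pid is a placeholder):

  ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'   # list D-state tasks with their kernel wait channel
  cat /proc/12345/stack                            # kernel stack of one stuck "ip netns" pid (needs root)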

Version-Release number of selected component (if applicable):
openvswitch-2.9.0-19

How reproducible:
Random

Steps to Reproduce:
1. Randomly occurring
2.
3.

Actual results:
ovs-vswitchd CPU usage is high, and "ip netns" commands get stuck/delayed in D state for a while. Rebooting all 3 controllers solved the problem, but generating sosreports while in this state is not possible.
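
A couple of generic checks that can narrow down where the CPU is going while the problem is live (a diagnostic sketch using standard tools, not commands from the case):

  top -H -p $(pidof ovs-vswitchd)   # per-thread view; shows whether handler/revalidator threads are spinning
  ovs-appctl coverage/show          # OVS internal event counters; hints at what the daemon is busy doing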

Expected results:


Additional info:

Comment 3 David Hill 2019-04-03 15:41:32 UTC
We have sosreports available attached to the case.  I'm yanking them right now and will do a first analysis of the logs.

Comment 4 Flavio Leitner 2019-04-11 18:57:08 UTC
Hi David,

Could you point me to the right sosreport?
What's the current status? Is the problem reproducible? If yes, what are the steps?

IIRC the kernel has to take the RTNL lock to run the ip netns commands, and since that lock is used by many things, we will need to identify the one holding it for too long. Usually in that scenario the CPU gets stuck and the kernel prints a warning followed by a stack trace.
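
One generic way to capture those stack traces on demand instead of waiting for the hung-task watchdog (a sketch; assumes sysrq is enabled, run as root):

  echo w > /proc/sysrq-trigger                    # dump stacks of all blocked (D state) tasks to the kernel log
  dmesg | grep -B1 -A20 'blocked for more than'   # hung-task watchdog warnings, if any fired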

Thanks,
fbl

Comment 5 David Hill 2019-04-12 18:29:31 UTC
Hey Flavio,

   We have all of that on collabshell/supportshell if you have access. You can ping me on IRC if you can't access them.
The sosreports we have were generated after the reboot of all 3 controllers at the same time; before that, sosreport would break on the network information gathering due to this issue.
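
If it helps next time this reproduces, sosreport can be told to skip the plugin that hangs (a workaround sketch, assuming the networking plugin is the one blocking on the netns commands):

  sosreport --skip-plugins networking   # collect everything except the network information gathering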

Thank you very much,

David Hill

