Bug 1692440 - Neutron SRIOV agent loop iteration exceeding polling interval, causing high CPU usage in neutron-rootwrap-daemon
Keywords:
Status: POST
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Rodolfo Alonso
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-03-25 15:43 UTC by Bernard Cafarelli
Modified: 2019-04-16 04:37 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Links
System ID Priority Status Summary Last Updated
Launchpad 1760471 None None None 2019-04-05 17:47:29 UTC
OpenStack gerrit 650418 None None None 2019-04-05 18:31:14 UTC

Description Bernard Cafarelli 2019-03-25 15:43:05 UTC
On a 13z5 deployment with SRIOV, the neutron SRIOV agent on the compute node keeps showing up in CPU usage because of its rootwrap calls to "ip link". Sample log:
2019-03-25 13:09:02.345 59675 DEBUG neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-1ec83b30-7aa4-440f-91eb-26ad1b84561f - - - - -] Loop iteration exceeded interval (2 vs. 2.27140307426)! daemon_loop /usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py:412
2019-03-25 13:09:02.345 59675 DEBUG neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-1ec83b30-7aa4-440f-91eb-26ad1b84561f - - - - -] Agent rpc_loop - iteration:2 started daemon_loop /usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py:374
2019-03-25 13:09:02.345 59675 DEBUG neutron.agent.linux.utils [req-1ec83b30-7aa4-440f-91eb-26ad1b84561f - - - - -] Running command (rootwrap daemon): ['ip', 'link', 'show'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103
2019-03-25 13:09:03.096 59675 DEBUG neutron.agent.linux.utils [req-1ec83b30-7aa4-440f-91eb-26ad1b84561f - - - - -] Running command (rootwrap daemon): ['ip', 'link', 'show'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103
2019-03-25 13:09:03.847 59675 DEBUG neutron.agent.linux.utils [req-1ec83b30-7aa4-440f-91eb-26ad1b84561f - - - - -] Running command (rootwrap daemon): ['ip', 'link', 'show'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103
2019-03-25 13:09:04.598 59675 DEBUG neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-1ec83b30-7aa4-440f-91eb-26ad1b84561f - - - - -] Loop iteration exceeded interval (2 vs. 2.25298190117)! daemon_loop /usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py:412
2019-03-25 13:09:04.598 59675 DEBUG neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-1ec83b30-7aa4-440f-91eb-26ad1b84561f - - - - -] Agent rpc_loop - iteration:3 started daemon_loop /usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py:374

Based on the "Loop iteration exceeded" message, I modified /var/lib/config-data/puppet-generated/neutron/etc/neutron/plugins/ml2/sriov_agent.ini to add "polling_interval=5" to the [agent] section. After restarting neutron_sriov_agent, activity returned to normal (one iteration doing some ip link rootwrap calls, then pausing until the next iteration).
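
The "exceeded interval" message suggests the agent paces its loop by sleeping for whatever is left of the polling interval and logging when nothing is left. The following is a minimal sketch of that pacing logic under this assumption; pace_iteration is a hypothetical helper for illustration, not a Neutron function.

```python
import time


def pace_iteration(start, polling_interval, now=None):
    """Return (seconds_to_sleep, overran) for one loop iteration.

    Hypothetical helper: sleeps for the remainder of the interval,
    or reports an overrun when the iteration took too long.
    """
    if now is None:
        now = time.time()
    elapsed = now - start
    if elapsed < polling_interval:
        return polling_interval - elapsed, False
    # No time left: the next iteration starts immediately.
    return 0.0, True


# Values from the sample log: a 2 s interval and a 2.27 s iteration.
sleep_for, overran = pace_iteration(0.0, 2, now=2.27140307426)

# With polling_interval=5 the same iteration leaves time to pause.
sleep_for_5, overran_5 = pace_iteration(0.0, 5, now=2.27140307426)
```

With a 2 second interval the loop never pauses, which would match the back-to-back "ip link show" calls in the log; with 5 seconds the agent gets roughly 2.7 seconds of idle time per iteration.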

Now the question is: is it normal for the agent loop to exceed 2 seconds on a small deployment? Should we configure this parameter to a longer value?

Comment 6 Bernard Cafarelli 2019-04-04 15:52:12 UTC
To extend the loop interval, edit /var/lib/config-data/puppet-generated/neutron/etc/neutron/plugins/ml2/sriov_agent.ini on the compute node.

In the [agent] section, set:
polling_interval=5
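
If the edit has to be repeated on several compute nodes, it could be scripted. The sketch below is one possible approach, not part of any Neutron or TripleO tooling: set_ini_option is a hypothetical helper that rewrites the file text line by line so that comments and other options are preserved. Try it on a copy of the file first.

```python
def set_ini_option(text, section, key, value):
    """Return `text` with `key=value` set inside `[section]`.

    Hypothetical helper: replaces an existing uncommented key,
    otherwise appends it at the end of the section (creating the
    section if it is missing).
    """
    out = []
    in_section = False
    done = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving the target section without the key: add it now.
            if in_section and not done:
                out.append("%s=%s" % (key, value))
                done = True
            in_section = stripped[1:-1].strip() == section
        elif in_section and not done and not stripped.startswith(("#", ";")):
            if "=" in stripped and stripped.split("=", 1)[0].strip() == key:
                line = "%s=%s" % (key, value)
                done = True
        out.append(line)
    if not done:
        if not in_section:
            out.append("[%s]" % section)
        out.append("%s=%s" % (key, value))
    return "\n".join(out) + "\n"


# Made-up file content for illustration only.
sample = "[DEFAULT]\ndebug=True\n[agent]\n#polling_interval=2\n[sriov_nic]\nx=1\n"
updated = set_ini_option(sample, "agent", "polling_interval", "5")
```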

Then restart the container:
# docker restart neutron_sriov_agent

To check whether iterations take too long, look in /var/log/containers/neutron/sriov-nic-agent.log for lines similar to:
2019-04-04 15:18:31.366 59965 DEBUG neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-1263b780-bfad-408b-a2fa-8d89af87dc95 - - - - -] Loop iteration exceeded interval (2 vs. 3.00258302689)! daemon_loop /usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py:412
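
To summarize these messages instead of reading them one by one, a small script can extract the configured interval and the measured duration from each line. This is a rough sketch; find_overruns is a hypothetical name, and the regular expression assumes the message format shown in this report.

```python
import re

# Matches e.g. "Loop iteration exceeded interval (2 vs. 3.00258302689)!"
OVERRUN_RE = re.compile(
    r"Loop iteration exceeded interval "
    r"\((?P<interval>[\d.]+) vs\. (?P<elapsed>[\d.]+)\)!"
)


def find_overruns(lines):
    """Return a list of (interval, elapsed) floats, one per overrun line."""
    results = []
    for line in lines:
        match = OVERRUN_RE.search(line)
        if match:
            results.append(
                (float(match.group("interval")), float(match.group("elapsed")))
            )
    return results


# Lines shortened from the log excerpts quoted in this report.
sample = [
    "2019-04-04 15:18:31.366 59965 DEBUG sriov_nic_agent "
    "Loop iteration exceeded interval (2 vs. 3.00258302689)! daemon_loop",
    "2019-03-25 13:09:02.345 59675 DEBUG sriov_nic_agent "
    "Agent rpc_loop - iteration:2 started daemon_loop",
]
overruns = find_overruns(sample)
```

For a real log, pass an open file object instead of the sample list, since the function only needs an iterable of lines.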

Comment 15 Boris Deschenes 2019-04-15 13:57:38 UTC
Another impact of this issue is high CPU usage (around 80% for us), even on inactive SR-IOV compute nodes. It disappears completely if the container is started with the ulimit nofiles=16384 (as proposed in https://review.openstack.org/#/c/650418/).
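
To compare a running container against the 16384 value mentioned above, the open-files limit of a process can be read from Python. This is only a sketch: it reports the limit of the interpreter it runs in, so running it inside the agent container is an assumption about how one would use it, and the report does not state how the limit reaches the rootwrap daemon.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-files limit of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = nofile_limits()
print("open files: soft=%s hard=%s" % (soft, hard))
```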

