Bug 1694877 - top shows super high loads when tuned profile realtime-virtual-host is applied
Summary: top shows super high loads when tuned profile realtime-virtual-host is applied
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel-rt
Version: 7.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Scott Wood
QA Contact: Qiao Zhao
URL:
Whiteboard:
Depends On:
Blocks: 1672377
 
Reported: 2019-04-01 22:36 UTC by Yichen Wang
Modified: 2019-04-10 22:12 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Attachments

Description Yichen Wang 2019-04-01 22:36:18 UTC
Description of problem:

As recommended by Red Hat, I am trying to switch over to tuned. When applying the realtime-virtual-host profile, we are seeing very weird behavior. It may not be tuned's fault, but it is only seen after applying this tuned profile.
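
For reference, the profile was applied roughly like this (a minimal sketch; the isolated_cores value below is only a placeholder for illustration, not our actual setting):

echo "isolated_cores=2-19,22-39" >> /etc/tuned/realtime-virtual-host-variables.conf   # normally edit this file; value is a placeholder
tuned-adm profile realtime-virtual-host   # activate the profile
tuned-adm active                          # should report realtime-virtual-host as the active profile
tuned-adm verify                          # check that the applied settings match the profile
reboot                                    # bootloader changes only take effect after a reboot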

In order to better reveal this, we configured our testbed with a mix of HT and non-HT deployments. On HT-enabled nodes the issue is 100% reproducible (4/4); on non-HT nodes it is roughly 30% reproducible (1/3). All nodes are running essentially the same hardware and software. On the HT-enabled node, top shows:

[root@quincy-compute-4 ~]# top
top - 13:03:08 up 13:39,  1 user,  load average: 149.30, 149.47, 149.73
Tasks: 1622 total,   9 running, 1613 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.7 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 39469430+total, 19495484+free, 19490510+used,  4834360 buff/cache
KiB Swap:  2097148 total,  2097148 free,        0 used. 19361452+avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 15856 root      39  19       0      0      0 S  13.9  0.0  77:30.36 kipmi0
   610 root      -4   0       0      0      0 S  11.7  0.0  74:54.14 ktimersoftd/60
    23 root      -3   0       0      0      0 S   4.5  0.0  14:18.30 ksoftirqd/1
   213 root      -4   0       0      0      0 S   4.5  0.0  33:38.03 ktimersoftd/20
226553 root      20   0  163648   3976   1592 R   1.9  0.0   0:00.38 top

On the non-HT testbed, top shows:
[root@quincy-control-2 ~]# top
top - 13:03:43 up 13:40,  1 user,  load average: 73.13, 73.89, 73.84
Tasks: 1133 total,   2 running, 1131 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.3 us,  1.6 sy,  0.0 ni, 97.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 19652601+total, 91847016 free, 96707744 used,  7971260 buff/cache
KiB Swap: 33554428 total, 33554428 free,        0 used. 92710192 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
183236 rabbitmq  20   0 3574340 107592   2804 S  28.3  0.1   0:02.50 beam.smp
 54714 rabbitmq  20   0 7835916 268384   4348 S  27.7  0.1 224:43.84 beam.smp
     4 root      -4   0       0      0      0 S   2.9  0.0  27:16.72 ktimersoftd/0
   213 root      -4   0       0      0      0 S   2.6  0.0  21:12.23 ktimersoftd/20
 71258 cloudpu+  20   0  298596  83080   7696 S   2.0  0.0  14:13.75 cloudpulse-serv
 53655 mysql     20   0   15.6g 531176 146124 S   1.6  0.3  10:44.52 mysqld
183237 root      20   0  163116   3476   1592 R   1.6  0.0   0:00.20 top

You can see the load average is around 150 for HT and around 74 for non-HT. Given these are 20 * 2 = 40-core machines, if the system were really behaving the way "top" says, it should be badly overloaded. However, things are actually still fine; it is just that the top output scares people away.
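
One way to sanity-check this (a rough sketch; my understanding is that the Linux load average counts tasks in the R and D states, so the count below should be in the hundreds if the reported load were real):

cat /proc/loadavg                 # 1/5/15-minute load averages, plus currently-runnable/total tasks
ps -eo state= | grep -c '[RD]'    # tasks currently runnable (R) or in uninterruptible sleep (D)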

The issue is confirmed to be related to tuned, on both new and old kernel versions. I tried the realtime tuned profile and saw the same thing. Then I tried realtime's parent profile, network-latency, and the issue is not seen. So clearly something goes wrong when the realtime tuned profiles are applied. Once reproduced, it happens 100% of the time, and a reboot doesn't change anything.

First of all, I would like some help understanding how to start debugging this...
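
A few things I can check on our side, if they help (a rough sketch, assuming a standard RHEL 7 RT setup):

cat /proc/cmdline                                      # boot parameters added by the profile (e.g. isolcpus/nohz_full/rcu_nocbs)
tuned-adm active                                       # which profile is active
tuned-adm verify                                       # whether it applied cleanly
grep -E '^cpu#|nr_running' /proc/sched_debug | head -60   # per-CPU runqueue lengths (available when CONFIG_SCHED_DEBUG is enabled)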

Version-Release number of selected component (if applicable):
RHEL: 3.10.0-957.10.1.rt56.924.el7.x86_64
tuned-profiles-nfv-guest-2.10.0-6.el7.noarch
tuned-profiles-realtime-2.10.0-6.el7.noarch
tuned-profiles-nfv-host-2.10.0-6.el7.noarch
tuned-profiles-nfv-2.10.0-6.el7.noarch
tuned-2.10.0-6.el7.noarch

How reproducible:
5/7

Steps to Reproduce:
1. Install RHEL 7.6, then install the RT kernel with the version above;
2. Apply tuned profile realtime-virtual-host
3. Reboot

Actual results:
top shows super high loads (7x.xx (non-HT)/15x.xx (HT));

Expected results:
top should show very minor load (0.xx)

Additional info:
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                80
On-line CPU(s) list:   0-79
Thread(s) per core:    2
Core(s) per socket:    20
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Stepping:              4
CPU MHz:               1601.000
CPU max MHz:           1601.0000
CPU min MHz:           1000.0000
BogoMIPS:              3200.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              28160K
NUMA node0 CPU(s):     0-19,40-59
NUMA node1 CPU(s):     20-39,60-79
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke spec_ctrl intel_stibp flush_l1d

Comment 4 Yichen Wang 2019-04-04 23:33:19 UTC
Hi, guys. This is not a super urgent priority for us, as the normal functionality works fine. But I want to get the debugging started rather than wait until the last moment when customers start to panic.

Honestly I have no idea where to start. Maybe with some scheduler debugging tools/methods to understand what jobs are actually queueing? I would appreciate some guidance or professional tips from you on where to start looking at problems like this...

Thanks very much!

Regards,
Yichen

Comment 5 Clark Williams 2019-04-07 23:35:36 UTC
(In reply to Yichen Wang from comment #4)
> Hi, guys. This is not a super urgent priority for us, as the normal
> functionality works fine. But I want to get the debugging started rather
> than wait until the last moment when customers start to panic.
> 
> Honestly I have no idea where to start. Maybe with some scheduler
> debugging tools/methods to understand what jobs are actually queueing?
> I would appreciate some guidance or professional tips from you on where
> to start looking at problems like this...
> 
> Thanks very much!
> 

Just so I'm clear: you're seeing this with RT kernels running on the host? And the
high load average is being seen on the host?

If that's the case, I'd start by looking at top or 'perf top' output to see what's
eating up CPU time.
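
Something along these lines should be enough to start (a sketch; adjust the sampling window as needed):

perf top -g                          # live view with call graphs
perf record -a -g -- sleep 30        # system-wide sample for 30 seconds
perf report --sort comm,dso,symbol   # then see where the cycles go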

Comment 6 Yichen Wang 2019-04-08 05:59:47 UTC
Hi, Clark,

Yes, RT kernel; the version is given above. The high load is being seen at the host level.

The key mystery is that "top" shows a very high load, but the CPU usage shown in top does not add up to anywhere near that. The "top" output is given above, and "perf top" is also not showing much of interest:
Samples: 195K of event 'cycles:ppp', Event count (approx.): 77185401655
Overhead  Shared Object                Symbol
  82.01%  [kernel]                     [k] cpu_idle_poll
   1.75%  [kernel]                     [k] port_inb
   0.94%  [kernel]                     [k] system_call_after_swapgs
   0.84%  [kernel]                     [k] retint_userspace_restore_args
   0.72%  [kernel]                     [k] enqueue_entity
   0.68%  [kernel]                     [k] try_to_wake_up
   0.66%  [kernel]                     [k] __schedule
   0.64%  [kernel]                     [k] select_task_rq_fair
   0.59%  [kernel]                     [k] rt_spin_unlock
   0.53%  [kernel]                     [k] __enqueue_entity
   0.52%  [kernel]                     [k] __switch_to
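
For what it's worth, my understanding is that cpu_idle_poll is the kernel's polling idle loop (used, for example, when idle=poll ends up on the kernel command line), so the 82% above looks like idle polling rather than real work. A couple of quick checks I can run (a sketch):

cat /proc/cmdline                                                     # look for idle=poll / isolcpus / nohz_full
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name 2>/dev/null   # available idle states, if cpuidle is registered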

Thanks very much!

Regards,
Yichen

