Bug 1364461 - Network utilization charts are not working properly
Summary: Network utilization charts are not working properly
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat
Component: core
Version: 2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 2
Assignee: anmol babu
QA Contact: Daniel Horák
URL:
Whiteboard:
Duplicates: 1366083 (view as bug list)
Depends On:
Blocks: Console-2-GA
 
Reported: 2016-08-05 12:00 UTC by Daniel Horák
Modified: 2016-10-04 06:59 UTC
CC List: 10 users

Fixed In Version: RHEL: rhscon-agent-0.0.18-1.el7scon Ubuntu: rhscon_agent-0.0.18-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-04 06:59:39 UTC
Target Upstream Version:


Attachments
Host page: Network utilization vs Network Throughput (deleted)
  2016-08-05 12:00 UTC, Daniel Horák
Network traffic measured by `nload` (deleted)
  2016-08-05 12:02 UTC, Daniel Horák
network throughput (deleted)
  2016-08-09 13:33 UTC, Martin Kudlej


Links
System                  ID              Priority  Status        Summary                                    Last Updated
Gerrithub.io            286478          None      None          None                                       2016-08-08 03:44:13 UTC
Red Hat Bugzilla        1338692         None      None          None                                       Never
Red Hat Bugzilla        1365578         None      None          None                                       Never
Red Hat Bugzilla        1365989         None      None          None                                       Never
Red Hat Bugzilla        1365995         None      None          None                                       Never
Red Hat Bugzilla        1366242         None      None          None                                       Never
Red Hat Product Errata  RHEA-2016:1754  normal    SHIPPED_LIVE  New packages: Red Hat Storage Console 2.0  2017-04-18 19:09:06 UTC


Description Daniel Horák 2016-08-05 12:00:59 UTC
Created attachment 1187860 [details]
Host page: Network utilization vs Network Throughput

Description of problem:
  It seems like the Network utilization chart doesn't work properly.

  I have a HW cluster with 3 MON and 4 OSD nodes with two configured networks (1G and 10G).

  I utilize the network via the iperf command: `iperf -s` on the first OSD node and `iperf -c 192.168.100.101 --time 36000` on the second OSD node (192.168.100.101 is the IP address of the 10G network interface on the first OSD node).

  Command `nload` on the interface p2p1 (192.168.100.102) shows the following values:
    Curr: 4.65 GBit/s
    Avg: 4.65 GBit/s
    Min: 4.64 GBit/s
    Max: 4.65 GBit/s
    Ttl: 11045.28 GByte

  And it is also visible in the Host -> Performance -> Network Throughput chart in USM.

  But the Network Utilization section shows zero for all values.

Version-Release number of selected component (if applicable):
  USM Server (RHEL 7.2):
  ceph-installer-1.0.14-1.el7scon.noarch
  libcollection-0.6.2-25.el7.x86_64
  ceph-ansible-1.0.5-32.el7scon.noarch
  rhscon-core-0.0.39-1.el7scon.x86_64
  rhscon-ui-0.0.51-1.el7scon.noarch
  rhscon-core-selinux-0.0.39-1.el7scon.noarch
  rhscon-ceph-0.0.39-1.el7scon.x86_64

  Ceph OSD/MON node (RHEL 7.2):
  calamari-server-1.4.8-1.el7cp.x86_64
  ceph-base-10.2.2-33.el7cp.x86_64
  ceph-common-10.2.2-33.el7cp.x86_64
  ceph-mon-10.2.2-33.el7cp.x86_64
  ceph-osd-10.2.2-33.el7cp.x86_64
  ceph-selinux-10.2.2-33.el7cp.x86_64
  collectd-ping-5.5.1-1.1.el7.x86_64
  collectd-5.5.1-1.1.el7.x86_64
  libcephfs1-10.2.2-33.el7cp.x86_64
  libcollection-0.6.2-25.el7.x86_64
  python-cephfs-10.2.2-33.el7cp.x86_64
  rhscon-agent-0.0.16-1.el7scon.noarch
  rhscon-core-selinux-0.0.39-1.el7scon.noarch

How reproducible:
  100%

Steps to Reproduce:
1. Utilize the network with `iperf -s` on one node and `iperf -c 192.168.100.101 --time 36000` on the second node (see the sketch below).
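
A minimal shell sketch of that step (assuming plain iperf, not iperf3, and the addresses from the description above):

  # On the first OSD node: start the iperf server.
  iperf -s

  # On the second OSD node: push traffic to the first node's 10G address
  # for 10 hours (36000 s).
  iperf -c 192.168.100.101 --time 36000

  # On the second OSD node: watch the real rate on its 10G interface
  # (p2p1 in the description) and compare it with the Network
  # Utilization chart in the console.
  nload p2p1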

Actual results:
  Network utilization charts show zeros.

Expected results:
  Network utilization charts show meaningful data.

Additional info:
  See the attached screenshots.

Comment 1 Daniel Horák 2016-08-05 12:02:49 UTC
Created attachment 1187861 [details]
Network traffic measured by `nload`

Comment 6 Martin Kudlej 2016-08-09 09:34:55 UTC
Tested with:
server
ceph-ansible-1.0.5-32.el7scon.noarch
ceph-installer-1.0.14-1.el7scon.noarch
rhscon-ceph-0.0.40-1.el7scon.x86_64
rhscon-core-0.0.41-1.el7scon.x86_64
rhscon-core-selinux-0.0.41-1.el7scon.noarch
rhscon-ui-0.0.52-1.el7scon.noarch
salt-2015.5.5-1.el7.noarch
salt-master-2015.5.5-1.el7.noarch
salt-selinux-0.0.41-1.el7scon.noarch

node
calamari-server-1.4.8-1.el7cp.x86_64
ceph-base-10.2.2-36.el7cp.x86_64
ceph-common-10.2.2-36.el7cp.x86_64
ceph-mon-10.2.2-36.el7cp.x86_64
ceph-selinux-10.2.2-36.el7cp.x86_64
libcephfs1-10.2.2-36.el7cp.x86_64
python-cephfs-10.2.2-36.el7cp.x86_64
rhscon-agent-0.0.18-1.el7scon.noarch
rhscon-core-selinux-0.0.41-1.el7scon.noarch
salt-2015.5.5-1.el7.noarch
salt-minion-2015.5.5-1.el7.noarch
salt-selinux-0.0.41-1.el7scon.noarch

and there are these issues:

1) The Host dashboard shows Performance->Throughput in the wrong units. Now it says "312112629.0 KB/s", but it should be "312112629.0 packets/s", because the value is interface-rx_tx from Graphite.

2) The Network->Utilization units are wrong. It currently shows "GB" and it should be "GB/s".

Comment 7 anmol babu 2016-08-09 13:31:34 UTC
Network throughput is calculated as the sum, across all interfaces of the node, of the average interface rx and the average interface tx.

Network utilization is calculated as the sum of the rx and tx of all interfaces in the node, divided by the sum of the bandwidths of all interfaces, with the result multiplied by 100 to get a percentage.
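
A worked example of both formulas with made-up numbers (hypothetical two-interface node; all rates and bandwidths in bytes/s):

  # Hypothetical node: a 10G interface (bandwidth 1250000000 B/s) doing
  # 580000000 B/s rx and 1000000 B/s tx, plus a 1G interface
  # (bandwidth 125000000 B/s) doing 2000000 B/s rx and 3000000 B/s tx.

  # Throughput = sum of (avg rx + avg tx) over all interfaces:
  echo $(( (580000000 + 1000000) + (2000000 + 3000000) ))    # 586000000 B/s

  # Utilization = 100 * (sum of rx + tx) / (sum of bandwidths):
  awk 'BEGIN { printf "%.1f\n", 100 * 586000000 / (1250000000 + 125000000) }'    # 42.6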

Comment 8 Martin Kudlej 2016-08-09 13:33:40 UTC
Created attachment 1189267 [details]
network throughput

Could you please explain how these two numbers are related and how the throughput is calculated?

Comment 9 Martin Kudlej 2016-08-09 13:49:28 UTC
(In reply to anmol babu from comment #7)
"average of interface rx and average of interface tx across all interfaces of node" in other words means "packets/s" and not KB/s
For example for this node RX an TX from ifconfig output:
em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.16.157.12  netmask 255.255.248.0  broadcast 10.16.159.255
        inet6 fe80::d6be:d9ff:feb3:8ef0  prefixlen 64  scopeid 0x20<link>
        ether d4:be:d9:b3:8e:f0  txqueuelen 1000  (Ethernet)
---->        RX packets 964147750  bytes 1076760752912 (1002.8 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
---->        TX packets 2685424374  bytes 3982185569935 (3.6 TiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

em2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether d4:be:d9:b3:8e:f2  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

em3: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether d4:be:d9:b3:8e:f4  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

em4: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether d4:be:d9:b3:8e:f6  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 2608699  bytes 2085705817 (1.9 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2608699  bytes 2085705817 (1.9 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

p1p1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        inet 192.168.100.105  netmask 255.255.255.0  broadcast 192.168.100.255
        inet6 fe80::92e2:baff:fe04:7e80  prefixlen 64  scopeid 0x20<link>
        ether 90:e2:ba:04:7e:80  txqueuelen 1000  (Ethernet)
---->        RX packets 1780672903  bytes 25882228119558 (23.5 TiB)
        RX errors 0  dropped 132320  overruns 0  frame 0
---->        TX packets 2928330198  bytes 18071710203275 (16.4 TiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

p1p2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 90:e2:ba:04:7e:81  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0


So please change the unit KB/s to packets/s in the right graph.
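
For reference, both series can be sampled straight from the kernel counters. A small sketch (assumes a Linux node and the interface name p1p1 from the output above), showing that bytes/s and packets/s are distinct quantities:

  IFACE=p1p1
  B1=$(cat /sys/class/net/$IFACE/statistics/rx_bytes)
  P1=$(cat /sys/class/net/$IFACE/statistics/rx_packets)
  sleep 10
  B2=$(cat /sys/class/net/$IFACE/statistics/rx_bytes)
  P2=$(cat /sys/class/net/$IFACE/statistics/rx_packets)
  # Average rates over the 10 s window; the two numbers differ unless
  # packets happen to be exactly 1 byte each.
  echo "rx: $(( (B2 - B1) / 10 )) B/s, $(( (P2 - P1) / 10 )) packets/s"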

Comment 10 Nishanth Thomas 2016-08-09 14:25:43 UTC
As anmol explained in the call, we are taking octets/sec (an octet is nothing but a byte), not packets/s, from collectd. So changing the UI to packets per second would not be correct.

As discussed in the call, we will make two changes in the UI:

1) Network utilization  KB/MB/GB per sec
2) Network throughput   KB/MB/GB per sec
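
For what it's worth, collectd's interface plugin exports octets and packets as separate metrics (if_octets vs if_packets), so which one the charts read can be checked against Graphite directly. A hedged sketch, assuming the usual collectd-to-Graphite naming; <graphite-host> and <node> are placeholders, and the exact path prefix may differ in this deployment:

  # if_octets carries bytes/s, if_packets carries packets/s.
  curl -s "http://<graphite-host>/render?target=collectd.<node>.interface-p1p1.if_octets.rx&from=-10min&format=json"
  curl -s "http://<graphite-host>/render?target=collectd.<node>.interface-p1p1.if_packets.rx&from=-10min&format=json"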

Comment 11 Karnan 2016-08-10 08:37:17 UTC
(In reply to Nishanth Thomas from comment #10)
> As anmol explained in the call, we are taking octets/sec(octet is nothing
> but a byte) not packets/s from collectd. So if you change in the UI to
> packet per sec it won't be correct. 
> 
> As discussed in the call we will make two changes in the UI
> 
> 1) Network utilization  KB/MB/GB per sec
> 2) Network throughput   KB/MB/GB per sec

Network throughput is time series data; we cannot convert it to KB/MB/GB per sec, so it will be plotted as B/s.

Comment 12 Karnan 2016-08-10 10:03:27 UTC
(In reply to Karnan from comment #11)
> (In reply to Nishanth Thomas from comment #10)
> > As anmol explained in the call, we are taking octets/sec(octet is nothing
> > but a byte) not packets/s from collectd. So if you change in the UI to
> > packet per sec it won't be correct. 
> > 
> > As discussed in the call we will make two changes in the UI
> > 
> > 1) Network utilization  KB/MB/GB per sec
> > 2) Network throughput   KB/MB/GB per sec
> 
> Network throughput is a time series data. we cannot convert KB/MB/GB per sec.
> so it will be plotted as B/s

The thresholds come as time series data in bytes. At one point a value can be in KB and the next moment in GB, so dynamically switching units for the whole data set is not feasible. So we are sticking with B/s.
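
A quick illustration of why per-point scaling is confusing, using numfmt from GNU coreutils (the second sample value is made up): consecutive samples of one series would render with different units, so the axis stays in B/s instead:

  numfmt --to=iec-i --suffix=B/s 312112629   # -> 298MiB/s
  numfmt --to=iec-i --suffix=B/s 1843200     # -> 1.8MiB/s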

Comment 13 Nishanth Thomas 2016-08-10 12:46:08 UTC
Moving to ON_QA. Fixed in version: rhscon-ui-0.0.53-1.el7scon

Comment 14 Martin Kudlej 2016-08-10 17:25:22 UTC
Tested with 
rhscon-ui-0.0.53-1.el7scon.noarch.rpm
and the units are correct now.
In the "network throughput" graph the values are in B/s, even for big numbers, because units cannot be dynamically changed; see bug 1365995.

There is also a request to document how network throughput is calculated: https://bugzilla.redhat.com/show_bug.cgi?id=1338692#c5

There is also bug 1365989 for calculating network throughput only from the interfaces related to Ceph.

Comment 15 Nishanth Thomas 2016-08-11 06:20:29 UTC
*** Bug 1366083 has been marked as a duplicate of this bug. ***

Comment 17 errata-xmlrpc 2016-08-23 19:58:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1754

