Bug 1366044 - Performance information displayed is incorrect
Summary: Performance information displayed is incorrect
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat
Component: UI
Version: 2
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: sankarshan
QA Contact: sds-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-08-10 21:32 UTC by Jean-Charles Lopez
Modified: 2017-03-23 04:06 UTC (History)
5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-23 04:06:25 UTC



Description Jean-Charles Lopez 2016-08-10 21:32:07 UTC
Description of problem: When looking at the cluster home page, the performance information displayed is inaccurate and appears to be the last information that was polled.


Version-Release number of selected component (if applicable): 2.0 (0.0.39)


How reproducible:
100%

Steps to Reproduce:
1. Create an RBD image (rbd create <pool_name>/<rbd_name> --size 128)
2. Map the RBD to a node (rbd map <pool_name>/<rbd_name>)
3. Start a test job against the cluster to generate load (dd if=/dev/zero of=/dev/rbd0 bs=1k count=1024)
4. Observe the performance information in the UI
5. When the dd command stops, non-zero information is still displayed

Actual results:
Non-zero values are displayed

Expected results:
Zero values should be displayed

Additional info:

Comment 4 Nishanth Thomas 2016-08-11 02:07:34 UTC
Information in the dashboards is polled and collected into a time-series DB periodically, and USM gets the data from there to display it on the dashboard. The polling interval is configured as 10 minutes, so it will take 10-15 minutes for data to appear on the dashboard. This is expected behavior and not a bug.

Comment 5 Nishanth Thomas 2016-08-11 06:33:35 UTC
We already have a bug to track this issue. Martin, can you please link that up here?

Comment 6 Jean-Charles Lopez 2016-08-11 15:21:14 UTC
(In reply to Nishanth Thomas from comment #4)
> Information in the dashboards is polled and collected into a time-series DB
> periodically, and USM gets the data from there to display it on the
> dashboard. The polling interval is configured as 10 minutes, so it will
> take 10-15 minutes for data to appear on the dashboard. This is expected
> behavior and not a bug.

My cluster has been doing nothing for the last 12 hours and it still shows 10199.3 KB/s bandwidth usage, 0.6 K IOPS and 0.4 ms latency. Is it still expected behaviour?

Comment 7 Nishanth Thomas 2016-08-12 13:40:09 UTC
(In reply to Jean-Charles Lopez from comment #6)
> My cluster has been doing nothing for the last 12 hours and it still shows
> 10199.3 KB/s bandwidth usage, 0.6 K IOPS and 0.4 ms latency. Is it still
> expected behaviour?

It is not 10199.3 KB/s but 10199.3 B/s (corrected in the latest version).
One thing to understand here is that these values are not specific to the cluster alone. We retrieve the values on a per-host basis (which you can see on the host dashboard). For example, network throughput takes all the network interfaces into account, not only the one used by the Ceph cluster. For a cluster it takes all the host values, aggregates them, and averages them out. The main dashboard shows the aggregate of all cluster values. This applies to all the stats mentioned above. So I think what you are seeing is expected.
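The aggregation described above can be sketched as follows (a minimal illustration of the scheme as described in this comment, not the actual USM code; all function names and sample figures are hypothetical):

```python
def host_throughput(interface_rates):
    """Total for one host: sum over ALL its network interfaces,
    not only the one used by the Ceph cluster."""
    return sum(interface_rates)

def cluster_throughput(hosts):
    """Cluster value: average of the per-host totals."""
    totals = [host_throughput(ifaces) for ifaces in hosts]
    return sum(totals) / len(totals)

def main_dashboard_throughput(clusters):
    """Main dashboard: aggregate of all cluster values."""
    return sum(cluster_throughput(hosts) for hosts in clusters)

# Example: two hosts, each with two interfaces (rates in B/s)
hosts = [[6000.0, 4000.0], [8000.0, 2400.0]]
print(cluster_throughput(hosts))  # (10000 + 10400) / 2 = 10200.0
```

This is why an idle Ceph cluster can still show non-zero figures: any traffic on any interface of any host feeds into the average.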

Comment 8 Jean-Charles Lopez 2016-08-12 15:22:37 UTC
(In reply to Nishanth Thomas from comment #7)
> (In reply to Jean-Charles Lopez from comment #6)
> 
> It is not 10199.3 KB/s but 10199.3 B/s (corrected in the latest version).
> One thing to understand here is that these values are not specific to the
> cluster alone. We retrieve the values on a per-host basis (which you can
> see on the host dashboard). For example, network throughput takes all the
> network interfaces into account, not only the one used by the Ceph
> cluster. For a cluster it takes all the host values, aggregates them, and
> averages them out. The main dashboard shows the aggregate of all cluster
> values. This applies to all the stats mentioned above. So I think what you
> are seeing is expected.

Yes, the KB was a typo, sorry, as I installed the latest update that fixes the hard-coded unit.

OK for this explanation regarding where the numbers come from.

Still, as it is, this is unusable for a Ceph storage admin, so I'll live with it.

Comment 9 Ju Lim 2016-08-24 18:58:58 UTC
JC Lopez:
1366044 and 1364461: Make sure all units displayed are the same across all windows. A detail, but details make the difference, and it should be an easy fix.

1366044 and 1364461: Make sure performance information is cluster-related and not network-related. If it is network-oriented, make sure you split the figures between the public and cluster interfaces to distinguish client traffic from replication traffic.

1366044 and 1364461: Make sure performance information is refreshed like it should be. Performance refreshed every 10 to 15 minutes is not performance, it's trending or a snapshot, and then you should change all the labels.

Comment 10 Ju Lim 2016-08-26 03:59:58 UTC
See https://bugzilla.redhat.com/show_bug.cgi?id=1364461 as well.  Both this and BZ1364461 seem to contain a few different issues raised by JC Lopez.

Specific to the "performance" metrics, the way they are currently calculated is not perceived as useful to the storage admin.  What I believe JC Lopez is asking for is for the information to be shown at a cluster level, which impacts the Main Dashboard and the Cluster Overview tab (Cluster object details).

If we do show network-specific performance metrics, the ask is to split the front-end and back-end traffic, versus how it's currently done: taking all (front-end and back-end) interfaces and then averaging the values.

If doable, this would mean two lines on the network-related "Performance" charts, one for the front-end and one for the back-end, plus a legend and/or a hover/tooltip to indicate whether it's front-end or back-end traffic.  Note: This would have to be done in all the places where network performance is shown, i.e. the main dashboard, cluster overview tab, and host overview tab.

Note: All "Performance"-related widgets in the 3 "dashboards" should be re-labeled as Performance Trends.

Beyond this, I think JC Lopez would probably also like to see cluster-specific telemetry information that is not addressed here, and further discussion is probably warranted to add that as an RFE for consideration in a future release.

