Bug 1517233 - clearing info alert doesn't remove warning alert
Summary: clearing info alert doesn't remove warning alert
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: web-admin-tendrl-monitoring-integration
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: gowtham
QA Contact: Filip Balák
URL:
Whiteboard:
Depends On:
Blocks: 1503134
 
Reported: 2017-11-24 11:17 UTC by Lubos Trilety
Modified: 2018-08-24 09:55 UTC
CC List: 6 users

Fixed In Version: tendrl-ansible-1.6.1-2.el7rhgs.noarch.rpm, tendrl-api-1.6.1-1.el7rhgs.noarch.rpm, tendrl-commons-1.6.1-1.el7rhgs.noarch.rpm, tendrl-monitoring-integration-1.6.1-1.el7rhgs.noarch.rpm, tendrl-node-agent-1.6.1-1.el7, tendrl-ui-1.6.1-1.el7rhgs.noarch.rpm
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-15 12:50:35 UTC


Attachments
stale alert (deleted) 2017-11-24 11:17 UTC, Lubos Trilety


Links
System ID                                   Priority  Status  Summary  Last Updated
Red Hat Bugzilla 1611601                    None      None    None     Never
Github Tendrl commons issues 800            None      None    None     2018-01-09 09:34:15 UTC
Github Tendrl gluster-integration pull 543  None      None    None     2018-01-09 09:35:39 UTC

Internal Links: 1611601

Description Lubos Trilety 2017-11-24 11:17:02 UTC
Created attachment 1358621
stale alert

Description of problem:
The 'Service: glustershd is disconnected in cluster' warning alert is not cleared by the corresponding 'Service: glustershd is connected in cluster' info event. It stays in the Alerts drawer indefinitely; no later 'Service: glustershd is connected in cluster' event clears it.


Version-Release number of selected component (if applicable):
tendrl-ansible-1.5.4-1.el7rhgs.noarch
tendrl-ui-1.5.4-4.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-5.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
tendrl-commons-1.5.4-4.el7rhgs.noarch
tendrl-api-1.5.4-2.el7rhgs.noarch
tendrl-api-httpd-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-5.el7rhgs.noarch
tendrl-grafana-selinux-1.5.3-2.el7rhgs.noarch
tendrl-node-agent-1.5.4-5.el7rhgs.noarch
tendrl-notifier-1.5.4-3.el7rhgs.noarch

How reproducible:
30%

Steps to Reproduce:
1. Restart the glusterd service while 'another transaction is in progress' is reported for a volume. In my case this was caused by a stale lock.


Actual results:
In the first scenario RHGSWA generates two events with the same timestamp: the warning 'Service: glustershd is disconnected in cluster' and the clearing 'Service: glustershd is connected in cluster'. For the first one an alert is created (and a mail and SNMP trap are sent if configured). The second event is ignored, so no clearing alert is generated.
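To make the race concrete, here is a minimal sketch, not Tendrl code, assuming a hypothetical handler that deduplicates incoming events by (resource, timestamp) alone; all names (Event, handle_event, active_alerts) are made up for illustration. With such a key, a clearing event that shares its timestamp with the warning is silently dropped and the warning goes stale:

from typing import NamedTuple

class Event(NamedTuple):
    resource: str    # e.g. "glustershd"
    severity: str    # "WARNING" raises an alert, "INFO" clears it
    timestamp: float

seen = set()         # (resource, timestamp) pairs already processed
active_alerts = {}   # resource -> open warning Event

def handle_event(ev: Event) -> None:
    key = (ev.resource, ev.timestamp)   # buggy key: severity is ignored
    if key in seen:
        return                          # the clearing INFO event dies here
    seen.add(key)
    if ev.severity == "WARNING":
        active_alerts[ev.resource] = ev
    else:
        active_alerts.pop(ev.resource, None)

t = 1511522220.0  # both events carry the same timestamp
handle_event(Event("glustershd", "WARNING", t))  # warning alert created
handle_event(Event("glustershd", "INFO", t))     # dropped as a duplicate
assert "glustershd" in active_alerts             # stale alert remains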


Expected results:
A clearing event should be processed even when it arrives at almost the same time as the alert it clears.


Additional info:
Any clearing event that arrives at almost the same time as the original alert fails to clear it. I was able to keep 'Status of peer: <hostname> in cluster <cluster_ID> changed from Connected to Disconnected' in the Alerts drawer, because the 'Disconnected to Connected' event arrived at almost the same time.

Scenario used (it reproduces more reliably):
1. Switch off one gluster node
2. Load some data to the gluster volume
3. Start node
4. Restart glusterd service on some other node (if needed several times)

However, for this particular alert another stop and start of the glusterd service clears the warning.

Comment 1 Nishanth Thomas 2017-11-24 12:26:59 UTC
The probability of this occurring is very low under normal conditions, and I couldn't reproduce it in my setup. There is also a workaround mentioned above in case it does occur. I don't think it's a blocker for the current release. Moving this out.

Comment 2 gowtham 2018-01-09 08:36:02 UTC
The service clearing alert was not matched with the warning alert, so the warning was never cleared. This is fixed now: https://github.com/Tendrl/gluster-integration/pull/543 and https://github.com/Tendrl/commons/pull/801
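For illustration only (the actual fix lives in the pull requests linked above, which is not reproduced here): the matching has to key clearing events to open warnings by resource identity rather than by timestamp, so that a clear arriving in the same second still removes the alert. A hypothetical sketch, with all names invented:

from typing import NamedTuple

class Event(NamedTuple):
    resource: str
    cluster: str
    severity: str
    timestamp: float

active_alerts = {}   # (cluster, resource) -> open warning Event

def handle_event(ev: Event) -> None:
    key = (ev.cluster, ev.resource)     # identity key, timestamp excluded
    if ev.severity == "WARNING":
        active_alerts[key] = ev
    elif ev.severity == "INFO":
        active_alerts.pop(key, None)    # clears even at the same timestamp

t = 1511522220.0
handle_event(Event("glustershd", "c1", "WARNING", t))
handle_event(Event("glustershd", "c1", "INFO", t))
assert ("c1", "glustershd") not in active_alerts  # warning cleared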

Comment 4 Filip Balák 2018-08-14 06:29:29 UTC
I was not able to reproduce the issue with the original build, nor with the current version. With the new version of gluster it seems very difficult to get into the `Another transaction is in progress` state using the given reproducer.

I came up with a different scenario that leads to a similar error: repeatedly calling `gluster volume start <volume> force` from several nodes at once. However, when I restarted glusterd afterwards I didn't see any alert related to glustershd. Turning glustershd off does lead to the described behaviour: BZ 1611601.

I propose closing this BZ 1517233 and tracking progress in the new BZ 1611601.

Tested with:
glusterfs-3.12.2-15.el7rhgs.x86_64
tendrl-ansible-1.6.3-6.el7rhgs.noarch
tendrl-api-1.6.3-5.el7rhgs.noarch
tendrl-api-httpd-1.6.3-5.el7rhgs.noarch
tendrl-commons-1.6.3-11.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-8.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-8.el7rhgs.noarch
tendrl-node-agent-1.6.3-9.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-9.el7rhgs.noarch

Comment 5 Anand Paladugu 2018-08-15 03:01:23 UTC
PM ack is already set on this BZ, so it can be dropped.

Comment 6 Martin Bukatovic 2018-08-15 12:50:35 UTC
I'm closing this BZ (see comment 4 for details on why), as discussed at the program meeting on 2018-08-14. Both development (Nishant) and product management (Anand) agree.

