Bug 1517219 - When the tendrl server runs out of space, the tendrl-monitoring-integration service fails with a traceback and the web admin UI shows invalid entries for a volume
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: web-admin-tendrl-ui
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Shubhendu Tripathi
QA Contact: sds-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-11-24 10:49 UTC by Manisha Saini
Modified: 2019-04-11 08:24 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:


Attachments
Dashboard not available when ran out of disk space on tendrl server (deleted)
2017-11-24 10:49 UTC, Manisha Saini
Showing invalid entries for the volume deleted before running out of space (deleted)
2017-11-24 10:50 UTC, Manisha Saini
After running out of memory (deleted)
2017-11-24 10:51 UTC, Manisha Saini


Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1511826 None None None 2019-01-21 16:59:17 UTC
Github Tendrl monitoring-integration issues 261 None None None 2017-11-24 11:28:41 UTC

Description Manisha Saini 2017-11-24 10:49:17 UTC
Created attachment 1358613
Dashboard not available when ran out of disk space on tendrl server

Description of problem:
When the tendrl server runs out of space, the tendrl-monitoring-integration service fails with a traceback and the web admin UI shows invalid entries for a volume.


Version-Release number of selected component (if applicable):
# rpm -qa|egrep "tendrl|gluster|kernel"
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.3.x86_64
samba-vfs-glusterfs-4.6.3-6.el7rhgs.x86_64
glusterfs-fuse-3.8.4-52.el7rhgs.x86_64
python-gluster-3.8.4-52.el7rhgs.noarch
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
kernel-tools-libs-3.10.0-693.5.2.el7.x86_64
glusterfs-libs-3.8.4-52.el7rhgs.x86_64
tendrl-node-agent-1.5.4-5.el7rhgs.noarch
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.4-17.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-51.el7rhgs.x86_64
glusterfs-3.8.4-52.el7rhgs.x86_64
glusterfs-cli-3.8.4-52.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-52.el7rhgs.x86_64
tendrl-selinux-1.5.3-2.el7rhgs.noarch
tendrl-commons-1.5.4-4.el7rhgs.noarch
glusterfs-events-3.8.4-52.el7rhgs.x86_64
tendrl-gluster-integration-1.5.4-4.el7rhgs.noarch
kernel-tools-3.10.0-693.5.2.el7.x86_64
kernel-3.10.0-693.5.2.el7.x86_64
glusterfs-client-xlators-3.8.4-52.el7rhgs.x86_64
glusterfs-server-3.8.4-52.el7rhgs.x86_64
glusterfs-rdma-3.8.4-52.el7rhgs.x86_64
tendrl-collectd-selinux-1.5.3-2.el7rhgs.noarch
abrt-addon-kerneloops-2.1.11-48.el7.x86_64
kernel-3.10.0-693.2.2.el7.x86_64
glusterfs-geo-replication-3.8.4-52.el7rhgs.x86_64
kernel-3.10.0-693.2.1.el7.x86_64
glusterfs-api-3.8.4-52.el7rhgs.x86_64

How reproducible:


Steps to Reproduce:
1. Imported the gluster cluster into web admin successfully.
2. Created 2 volumes.
3. Deleted one volume.
4. Created 1 more volume.
After a while, the tendrl server ran out of disk space. The web admin UI showed a black screen because of this.
5. Attached a new disk to the tendrl server and extended the root partition to get more space.
6. Rebooted the tendrl server node.
7. Restarted the tendrl-node-agent and tendrl-monitoring-integration services on the tendrl server, and restarted the tendrl-node-agent and tendrl-gluster-integration services on the gluster cluster.

Actual results:
The tendrl-monitoring-integration and tendrl-gluster-integration services started with a traceback.

The web UI shows INVALID entries for the volume that was deleted before the disk space on the tendrl server was increased.

tendrl-gluster-integration:

[root@dhcp42-125 ~]# service tendrl-gluster-integration status
Redirecting to /bin/systemctl status tendrl-gluster-integration.service
● tendrl-gluster-integration.service - Tendrl Gluster Daemon to Manage gluster tasks
   Loaded: loaded (/usr/lib/systemd/system/tendrl-gluster-integration.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2017-11-24 15:04:17 IST; 48min ago
 Main PID: 32515 (tendrl-gluster-)
   CGroup: /system.slice/tendrl-gluster-integration.service
           └─32515 /usr/bin/python /usr/bin/tendrl-gluster-integration

Nov 24 15:52:49 dhcp42-125.lab.eng.blr.redhat.com tendrl-gluster-integration[32515]: TypeError: 'NoneType' object has no attribute '__getitem__'
Nov 24 15:53:01 dhcp42-125.lab.eng.blr.redhat.com tendrl-gluster-integration[32515]: Exception in thread Thread-16880:
Nov 24 15:53:01 dhcp42-125.lab.eng.blr.redhat.com tendrl-gluster-integration[32515]: Traceback (most recent call last):
Nov 24 15:53:01 dhcp42-125.lab.eng.blr.redhat.com tendrl-gluster-integration[32515]: File "/usr/lib64/python2.7/threading.py", line 812, in __b...nner
Nov 24 15:53:01 dhcp42-125.lab.eng.blr.redhat.com tendrl-gluster-integration[32515]: self.run()
Nov 24 15:53:01 dhcp42-125.lab.eng.blr.redhat.com tendrl-gluster-integration[32515]: File "/usr/lib64/python2.7/threading.py", line 765, in run
Nov 24 15:53:01 dhcp42-125.lab.eng.blr.redhat.com tendrl-gluster-integration[32515]: self.__target(*self.__args, **self.__kwargs)
Nov 24 15:53:01 dhcp42-125.lab.eng.blr.redhat.com tendrl-gluster-integration[32515]: File "/usr/lib/python2.7/site-packages/tendrl/commons/jobs..._job
Nov 24 15:53:01 dhcp42-125.lab.eng.blr.redhat.com tendrl-gluster-integration[32515]: if job.payload["type"] == NS.type and \
Nov 24 15:53:01 dhcp42-125.lab.eng.blr.redhat.com tendrl-gluster-integration[32515]: TypeError: 'NoneType' object has no attribute '__getitem__'
Hint: Some lines were ellipsized, use -l to show in full.

---------

# systemctl status tendrl-monitoring-integration
● tendrl-monitoring-integration.service - Monitoring Integration
   Loaded: loaded (/usr/lib/systemd/system/tendrl-monitoring-integration.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2017-11-24 05:08:13 EST; 21min ago
 Main PID: 14895 (tendrl-monitori)
   CGroup: /system.slice/tendrl-monitoring-integration.service
           └─14895 /usr/bin/python /usr/bin/tendrl-monitoring-integration

Nov 24 05:29:17 dhcp43-14.lab.eng.blr.redhat.com tendrl-monitoring-integration[14895]: TypeError: 'NoneType' object has no attribute '__getitem__'
Nov 24 05:29:46 dhcp43-14.lab.eng.blr.redhat.com tendrl-monitoring-integration[14895]: Exception in thread Thread-11746:
Nov 24 05:29:46 dhcp43-14.lab.eng.blr.redhat.com tendrl-monitoring-integration[14895]: Traceback (most recent call last):
Nov 24 05:29:46 dhcp43-14.lab.eng.blr.redhat.com tendrl-monitoring-integration[14895]: File "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
Nov 24 05:29:46 dhcp43-14.lab.eng.blr.redhat.com tendrl-monitoring-integration[14895]: self.run()
Nov 24 05:29:46 dhcp43-14.lab.eng.blr.redhat.com tendrl-monitoring-integration[14895]: File "/usr/lib64/python2.7/threading.py", line 765, in run
Nov 24 05:29:46 dhcp43-14.lab.eng.blr.redhat.com tendrl-monitoring-integration[14895]: self.__target(*self.__args, **self.__kwargs)
Nov 24 05:29:46 dhcp43-14.lab.eng.blr.redhat.com tendrl-monitoring-integration[14895]: File "/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py", line 139, in process_job
Nov 24 05:29:46 dhcp43-14.lab.eng.blr.redhat.com tendrl-monitoring-integration[14895]: if job.payload["type"] == NS.type and \
Nov 24 05:29:46 dhcp43-14.lab.eng.blr.redhat.com tendrl-monitoring-integration[14895]: TypeError: 'NoneType' object has no attribute '__getitem__'
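
Both tracebacks end at the same expression, job.payload["type"], and the Python 2 message "'NoneType' object has no attribute '__getitem__'" means job.payload was None when it was subscripted. Why the payload was None is not established in this report. The following is a minimal sketch of a guard that skips such a job instead of letting the worker thread raise. The names Job and is_processable are hypothetical stand-ins for illustration and are not the real tendrl.commons code.

```python
# Hypothetical sketch: Job and is_processable are illustrative stand-ins,
# not the actual tendrl.commons classes or functions.
import logging

logger = logging.getLogger("job_guard_sketch")


class Job(object):
    """Stand-in for a job record whose payload may be missing."""

    def __init__(self, job_id, payload=None):
        self.job_id = job_id
        self.payload = payload


def is_processable(job, expected_type):
    """Return True only if the job carries a dict payload of the expected type."""
    payload = getattr(job, "payload", None)
    if not isinstance(payload, dict):
        # A missing or malformed payload is logged and skipped
        # instead of raising TypeError in the worker thread.
        logger.warning(
            "Skipping job %s: payload is missing or malformed", job.job_id
        )
        return False
    return payload.get("type") == expected_type
```

With a check like this, a job whose payload could not be read is reported in the log once and ignored, rather than producing a traceback on every polling cycle.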



Expected results:

The tendrl-monitoring-integration and tendrl-gluster-integration services should start without any traceback.

The web UI should reflect the correct info when the tendrl server comes up.


Additional info:

Comment 2 Manisha Saini 2017-11-24 10:50:43 UTC
Created attachment 1358614
Showing invalid entries for the volume deleted before running out of space

Comment 3 Manisha Saini 2017-11-24 10:51:23 UTC
Created attachment 1358615
After running out of memory

Comment 8 Mrugesh Karnik 2017-11-27 09:21:40 UTC
Errant behaviour is expected when the server runs out of disk space. There's no software fix to be applied to this situation. However, the way to address this would be to monitor the tendrl server itself.
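
As a rough illustration of what monitoring the tendrl server itself could look like, here is a minimal sketch of a disk usage check. It assumes Python 3 (shutil.disk_usage is not available in Python 2.7), and the function names and the 80/90 percent thresholds are arbitrary choices for the example, not values taken from Tendrl.

```python
# Hypothetical sketch: function names and thresholds are assumptions
# chosen for illustration, not part of Tendrl.
import shutil


def disk_usage_percent(path="/"):
    """Return the used share of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def classify(percent, warn_at=80.0, crit_at=90.0):
    """Map a usage percentage to a coarse status string."""
    if percent >= crit_at:
        return "critical"
    if percent >= warn_at:
        return "warning"
    return "ok"


def check_disk(path="/", warn_at=80.0, crit_at=90.0):
    """Return (status, percent) for the filesystem holding `path`."""
    percent = disk_usage_percent(path)
    return classify(percent, warn_at, crit_at), percent
```

A check like this could be run periodically against the partition that holds the Graphite data, so that a warning is raised before the disk is full.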

Comment 10 Shubhendu Tripathi 2018-11-19 06:12:00 UTC
This is not an issue if proper disk space is allocated for the Graphite DB on the server.
Martin, I propose this to be closed as NOTABUG. Kindly provide your thoughts.

Comment 11 Martin Bukatovic 2019-01-23 19:58:27 UTC
(In reply to Shubhendu Tripathi from comment #10)
> This is not an issue if proper disk space is allocated for the Graphite DB on
> the server.
> Martin, I propose this to be closed as NOTABUG. Kindly provide your thoughts.

No. In that edge case (running out of space), WA should report the problem so that
the user is aware of it, rather than crash with a traceback in a log without any
useful end-user notification, breaking dashboard functionality for no obvious
(from the end user's perspective) reason.
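
One way to picture the behaviour requested here is a wrapper around background work that turns an unhandled exception into a user-visible notification while still keeping the traceback in the log. This is only a sketch under assumptions: NotificationLog and run_reporting_failures are hypothetical names, and how WA would actually surface such a message in the UI is not covered by this report.

```python
# Hypothetical sketch: NotificationLog and run_reporting_failures are
# illustrative names, not part of Tendrl or the web admin UI.
import threading
import traceback


class NotificationLog(object):
    """Thread-safe list of (severity, text) messages meant for end users."""

    def __init__(self):
        self._lock = threading.Lock()
        self.messages = []

    def add(self, severity, text):
        with self._lock:
            self.messages.append((severity, text))


def run_reporting_failures(target, notifications, *args, **kwargs):
    """Call target; on failure, record a notification and log the traceback."""
    try:
        return target(*args, **kwargs)
    except Exception as exc:
        notifications.add(
            "error",
            "Background job failed: %s: %s" % (type(exc).__name__, exc),
        )
        # Keep the full traceback in the log for debugging.
        traceback.print_exc()
        return None
```

The point of the design is that the failure stays visible in two places: the log keeps the technical detail, and the notification list gives the UI something to show the user.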

