Bug 1332542 - Tiering related core observed with "uuid_is_null () message".
Summary: Tiering related core observed with "uuid_is_null () message".
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: tier
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Target Milestone: ---
: RHGS 3.2.0
Assignee: Nithya Balachandran
QA Contact: Sweta Anandpara
Depends On:
Blocks: 1351522 1358196 1360122 1360125
Reported: 2016-05-03 12:43 UTC by Shashank Raj
Modified: 2017-03-23 05:29 UTC

Fixed In Version: glusterfs-3.8.4-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1358196 (view as bug list)
Last Closed: 2017-03-23 05:29:17 UTC
Target Upstream Version:


System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description Shashank Raj 2016-05-03 12:43:58 UTC
Description of problem:

Tiering related core observed with "uuid_is_null () message".

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

Not sure of the exact steps that generated this core; it was observed while tiering-related cases were being executed.

bt as below:

#0  0x00007fa21e1543fc in uuid_is_null () from /lib64/
#1  0x00007fa21070fc09 in ctr_delete_hard_link_from_db.isra.1.constprop.4 ()
   from /usr/lib64/glusterfs/3.7.9/xlator/features/
#2  0x00007fa210716bd1 in ctr_rename_cbk ()
   from /usr/lib64/glusterfs/3.7.9/xlator/features/
#3  0x00007fa21092cccd in trash_common_rename_cbk ()
   from /usr/lib64/glusterfs/3.7.9/xlator/features/
#4  0x00007fa211169024 in posix_rename ()
   from /usr/lib64/glusterfs/3.7.9/xlator/storage/
#5  0x00007fa210934c27 in trash_rename ()
   from /usr/lib64/glusterfs/3.7.9/xlator/features/
#6  0x00007fa21071219d in ctr_rename ()
   from /usr/lib64/glusterfs/3.7.9/xlator/features/
#7  0x00007fa21003449e in changelog_rename ()
   from /usr/lib64/glusterfs/3.7.9/xlator/features/
#8  0x00007fa21e9cbb1a in default_rename () from /lib64/
#9  0x00007fa20b9cd4fa in ?? ()
   from /usr/lib64/glusterfs/3.7.9/xlator/features/
#10 0x00007fa21ea3ef90 in graphyylex_destroy () from /lib64/
#11 0x00007fa21c4d7a54 in ?? ()
#12 0x00007fa21e9fa009 in __gf_calloc () from /lib64/
#13 0x00007fa20c011bd0 in ?? ()
#14 0x00007fa21c4d7a54 in ?? ()
#15 0x00007fa21e9cbb1a in default_rename () from /lib64/
#16 0x00007fa20b5a0dd8 in up_rename ()
   from /usr/lib64/glusterfs/3.7.9/xlator/features/
#17 0x00007fa21e9d8542 in default_rename_resume () from /lib64/
#18 0x00007fa21e9f73cd in call_resume () from /lib64/
#19 0x00007fa20c054c70 in ?? ()
#20 0x00007fa20c054c98 in ?? ()
#21 0x00007fa1ea4eae70 in ?? ()
#22 0x00007fa20c054c98 in ?? ()
#23 0x00007fa1ea4eae70 in ?? ()
#24 0x00007fa20b393363 in iot_worker ()
   from /usr/lib64/glusterfs/3.7.9/xlator/performance/
#25 0x00007fa21d82fdc5 in start_thread () from /lib64/
#26 0x00007fa21d1761cd in clone () from /lib64/

Actual results:

Tiering related core observed with "uuid_is_null () message".

Expected results:

No core should be generated.

Additional info:

Don't have the exact steps to reproduce; however, filing this bug so that we don't miss this issue.

Comment 3 Joseph Elwin Fernandes 2016-05-09 05:08:01 UTC
As we don't have exact steps to reproduce, we don't know the cause of a NULL uuid or GFID. But yes, handling the exception properly is important. Will send a patch with the necessary exception handling in the ctr code.
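For illustration only, a minimal sketch of the kind of NULL-GFID guard described above. The type and function names here (`gfid_t`, `gfid_is_null`, `delete_hard_link_from_db`) are hypothetical stand-ins, not the actual GlusterFS/ctr symbols; the real code uses GlusterFS's own uuid helpers:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for GlusterFS's GFID type and uuid helpers; these names
 * are hypothetical and chosen only to illustrate the guard. */
typedef unsigned char gfid_t[16];

/* Returns non-zero when every byte of the GFID is zero. */
static int gfid_is_null(const gfid_t gfid)
{
    static const gfid_t zero; /* all bytes zero */
    return memcmp(gfid, zero, sizeof(gfid_t)) == 0;
}

/* Sketch of the guard: reject a NULL GFID up front and report an
 * error, instead of passing it down to the database layer. */
static int delete_hard_link_from_db(const gfid_t gfid)
{
    if (gfid_is_null(gfid)) {
        fprintf(stderr, "ctr: skipping hard-link delete, NULL GFID\n");
        return -1;
    }
    /* ... the real code would issue the DB delete here ... */
    return 0;
}
```

With a guard like this, an operation on an inode whose GFID was never set would log an error and fail the ctr update rather than crash the brick process.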

Is there any chance of getting access to the brick logs?

Comment 4 Shashank Raj 2016-05-09 16:24:03 UTC
Since I was not sure of the exact scenario that generated it, and unfortunately a lot of tests have been run on the same setup since the core was generated, I don't think I will be able to provide the exact brick logs.

But I will surely keep a watch on this issue and update the bugzilla with details if I hit it again.

Comment 6 Shashank Raj 2016-05-16 09:04:31 UTC
Since the issue was seen on an nfs-ganesha + tiering setup, and we are no longer testing/supporting tiering with ganesha for 3.1.3, I guess we can move it out.

Comment 13 Nithya Balachandran 2016-08-03 07:11:10 UTC
Targeting this BZ for 3.2.0.

Comment 15 Atin Mukherjee 2016-09-17 14:49:09 UTC
Upstream mainline :
Upstream 3.8 :

And the fix is available in rhgs-3.2.0 as part of rebase to GlusterFS 3.8.4.

Comment 19 Sweta Anandpara 2017-02-28 10:12:16 UTC
Tried to reproduce the issue on 3.1.3 by creating a scenario of continuous directory renames and triggering a graph switch by changing one of the performance options of the (tiered) volume (as advised by Nithya). That did not result in any trace.

Followed the same steps on build 3.8.4-14 in a ganesha + tiered volume setup and, again, did not hit any trace. I do, however, see plenty of errors (pasted below) on 3.1.3 as well as 3.2:

[2017-02-28 09:59:28.840923] E [MSGID: 121023] [changetimerecorder.c:842:ctr_rename_cbk] 0-vol_tier-changetimerecorder: Failed to getting GF_RESPONSE_LINK_COUNT_XDATA
[2017-02-28 09:59:28.913263] E [MSGID: 121022] [changetimerecorder.c:950:ctr_rename] 0-vol_tier-changetimerecorder: Failed updating hard link in ctr inode context
[2017-02-28 09:59:28.913836] E [MSGID: 121023] [changetimerecorder.c:842:ctr_rename_cbk] 0-vol_tier-changetimerecorder: Failed to getting GF_RESPONSE_LINK_COUNT_XDATA
[2017-02-28 09:59:28.991517] E [MSGID: 121022] [changetimerecorder.c:950:ctr_rename] 0-vol_tier-changetimerecorder: Failed updating hard link in ctr inode context
[2017-02-28 09:59:28.991900] E [MSGID: 121023] [changetimerecorder.c:842:ctr_rename_cbk] 0-vol_tier-changetimerecorder: Failed to getting GF_RESPONSE_LINK_COUNT_XDATA
[2017-02-28 09:59:29.071659] E [MSGID: 121022] [changetimerecorder.c:950:ctr_rename] 0-vol_tier-changetimerecorder: Failed updating hard link in ctr inode context
[2017-02-28 09:59:29.072078] E [MSGID: 121023] [changetimerecorder.c:842:ctr_rename_cbk] 0-vol_tier-changetimerecorder: Failed to getting GF_RESPONSE_LINK_COUNT_XDATA
[2017-02-28 09:59:29.145699] E [MSGID: 121022] [changetimerecorder.c:950:ctr_rename] 0-vol_tier-changetimerecorder: Failed updating hard link in ctr inode context

I have been unsuccessful in reproducing this issue, and hence cannot confidently confirm the fix that this BZ is addressing. That said, repeated testing of the above scenario has not resulted in any crash, so I am moving this to verified for now. Will reopen this BZ if QE hits this crash again. Setup details below:

[root@dhcp46-111 ~]# gluster peer status
Number of Peers: 5

Uuid: 61964c73-d65d-45f5-8de6-2dfa1db76db7
State: Peer in Cluster (Connected)

Uuid: b0714c63-8dba-4922-9019-ac1ef9702076
State: Peer in Cluster (Connected)

Uuid: ffde978a-bb28-44ed-9c73-886d29d7fa19
State: Peer in Cluster (Connected)

Uuid: a0e14dcd-67ce-4b36-adeb-9b1be8e65b7f
State: Peer in Cluster (Connected)

Uuid: ce2bd89e-f047-4cd8-bd73-ec0c5a6d974c
State: Peer in Cluster (Connected)
[root@dhcp46-111 ~]# 
[root@dhcp46-111 ~]# 
[root@dhcp46-111 ~]# gluster v info vol_tier
Volume Name: vol_tier
Type: Tier
Volume ID: d544486f-c47e-420d-9b17-daad43058231
Status: Started
Snapshot Count: 0
Number of Bricks: 18
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 3 x 2 = 6
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Options Reconfigured:
performance.client-io-threads: on
performance.stat-prefetch: on
ganesha.enable: on
features.cache-invalidation: on
cluster.tier-mode: cache
features.ctr-enabled: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@dhcp46-111 ~]# 
[root@dhcp46-111 ~]# 
[root@dhcp46-111 ~]# 
[root@dhcp46-111 ~]# gluster v status vol_tier
Status of volume: vol_tier
Gluster process                             TCP Port  RDMA Port  Online  Pid
Hot Bricks:
hot5                                        49156     0          Y       29550
hot4                                        49156     0          Y       26907
hot3                                        49157     0          Y       21410
hot2                                        49158     0          Y       12868
hot1                                        49158     0          Y       18627
hot0                                        49160     0          Y       20041
Cold Bricks:
Brick 49156     0          Y       18443
Brick 49156     0          Y       17060
Brick 49156     0          Y       11256
Brick 49155     0          Y       19670
Brick 49154     0          Y       26794
Brick 49154     0          Y       29438
Brick 49159     0          Y       18462
Brick 49157     0          Y       17079
Brick 49157     0          Y       11275
Brick 49156     0          Y       19720
0                                           49155     0          Y       26813
1                                           49155     0          Y       29457
Self-heal Daemon on localhost               N/A       N/A        Y       20103
Self-heal Daemon on dhcp46-115.lab.eng.blr.                                  N/A       N/A        Y       18730
Self-heal Daemon on dhcp46-139.lab.eng.blr.                                  N/A       N/A        Y       12899
Self-heal Daemon on dhcp46-124.lab.eng.blr.                                  N/A       N/A        Y       21430
Self-heal Daemon on dhcp46-131.lab.eng.blr.                                  N/A       N/A        Y       26927
Self-heal Daemon on dhcp46-152.lab.eng.blr.                                  N/A       N/A        Y       29570
Task Status of Volume vol_tier
Task                 : Tier migration      
ID                   : 3be315db-1eca-4cdd-ae81-ad54442e69fc
Status               : in progress         
[root@dhcp46-111 ~]# 
[root@dhcp46-111 ~]# 
[root@dhcp46-111 ~]# rpm -qa | grep gluster
[root@dhcp46-111 ~]# 
[root@dhcp46-111 ~]# 
[root@dhcp46-111 ~]# 

[root@dhcp35-153 mnt]# 
[root@dhcp35-153 mnt]# mount -t nfs -o vers=4 /mnt/test
[root@dhcp35-153 mnt]# cd /mnt/test
[root@dhcp35-153 test]# ls -a
.  ..  .trashcan
[root@dhcp35-153 test]# 
[root@dhcp35-153 test]# 
[root@dhcp35-153 test]# 
[root@dhcp35-153 test]# df -k .
Filesystem           1K-blocks   Used Available Use% Mounted on
                     114554880 719872 113835008    1% /mnt/test
[root@dhcp35-153 test]# 
[root@dhcp35-153 test]# 
[root@dhcp35-153 test]# for i in {1..10}; do for j in {1..100}; do mkdir -p dir$i/dir$j;done; done
[root@dhcp35-153 test]# ls -a
.  ..  dir1  dir10  dir2  dir3  dir4  dir5  dir6  dir7  dir8  dir9  .trashcan
[root@dhcp35-153 test]# ls dir1/dir
Display all 100 possibilities? (y or n)
[root@dhcp35-153 test]# ls dir1/dir
ls: cannot access dir1/dir: No such file or directory
[root@dhcp35-153 test]# 
[root@dhcp35-153 test]# 
[root@dhcp35-153 test]# for i in {1..10}; do for j in {1..100}; do mv dir$i/dir$j dir$i/newdir$j;done; done
[root@dhcp35-153 test]# for i in {1..10}; do for j in {1..100}; do mv dir$i/newdir$j dir$i/olddir$j;done; done
[root@dhcp35-153 test]#

Comment 21 errata-xmlrpc 2017-03-23 05:29:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.
