Bug 984498 - nfs: gfid (zero) and we have EIO while "rm -rf *"
Summary: nfs: gfid (zero) and we have EIO while "rm -rf *"
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1286578
Reported: 2013-07-15 11:42 UTC by Saurabh
Modified: 2016-01-19 06:14 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1286578
Environment:
Last Closed: 2015-11-30 09:40:04 UTC
Target Upstream Version:



Description Saurabh 2013-07-15 11:42:54 UTC
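Description of problem:
After fs-sanity finishes on an NFS-mounted volume, "rm -rf *" fails with "Directory not empty" and "Input/output error" (EIO), and nfs.log shows a zero gfid for the failed lookup.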

Version-Release number of selected component (if applicable):
[root@nfs1 ~]# rpm -qa | grep glusterfs
glusterfs-3.4.0.12rhs.beta4-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.12rhs.beta4-1.el6rhs.x86_64
glusterfs-server-3.4.0.12rhs.beta4-1.el6rhs.x86_64
[root@nfs1 ~]# 

How reproducible:
Seen on the build listed above.

Steps to Reproduce:
1. Create a volume and start it.
2. Mount the volume over NFS.
3. Run fs-sanity on the volume and wait for it to finish.
4. cd <mount-point>
5. rm -rf *

Actual results:
[root@rhsauto030 ~]# cd /mnt/nfs-test/
[root@rhsauto030 nfs-test]# ls
dir  run1888
[root@rhsauto030 nfs-test]# cd run1888/
[root@rhsauto030 run1888]# ls
coverage     fileop_L1_1   fileop_L1_11  fileop_L1_2  fileop_L1_4  fileop_L1_6  fileop_L1_8  linux-2.6.31.1
fileop_L1_0  fileop_L1_10  fileop_L1_12  fileop_L1_3  fileop_L1_5  fileop_L1_7  fileop_L1_9  openssl-1.0.0d
[root@rhsauto030 run1888]# rm -rf *
rm: cannot remove `fileop_L1_0': Directory not empty
rm: cannot remove `fileop_L1_1': Directory not empty
rm: cannot remove `fileop_L1_10': Directory not empty
rm: cannot remove `fileop_L1_11': Directory not empty
rm: cannot remove `fileop_L1_12': Directory not empty
rm: cannot remove `fileop_L1_2': Directory not empty
rm: cannot remove `fileop_L1_3': Directory not empty
rm: cannot remove `fileop_L1_4': Directory not empty
rm: cannot remove `fileop_L1_5': Directory not empty
rm: cannot remove `fileop_L1_6': Directory not empty
rm: cannot remove `fileop_L1_7': Directory not empty
rm: cannot remove `fileop_L1_8': Directory not empty
rm: cannot remove `fileop_L1_9': Directory not empty
rm: cannot remove `linux-2.6.31.1/arch/arm/mach-ep93xx/include': Input/output error
rm: cannot remove `linux-2.6.31.1/arch/arm/mach-ep93xx/include': Input/output error
rm: cannot remove `linux-2.6.31.1/arch/arm/mach-mmp': Directory not empty
rm: cannot remove `linux-2.6.31.1/arch/arm/plat-pxa': Directory not empty
rm: cannot remove `linux-2.6.31.1/arch/frv': Directory not empty
rm: cannot remove `linux-2.6.31.1/arch/mips': Directory not empty
rm: cannot remove `linux-2.6.31.1/arch/mn10300': Directory not empty
rm: cannot remove `linux-2.6.31.1/arch/sh/include': Directory not empty
rm: cannot remove `linux-2.6.31.1/arch/x86': Directory not empty
rm: cannot remove `linux-2.6.31.1/drivers/staging': Directory not empty
rm: cannot remove `linux-2.6.31.1/drivers/acpi': Input/output error
rm: cannot remove `linux-2.6.31.1/drivers/acpi': Input/output error




From nfs.log:

[2013-07-15 00:32:24.647338] E [afr-self-heal-common.c:2685:afr_log_self_heal_completion_status] 0-dist-rep-replicate-3: background gfid or missing entry self heal  is not attempted, medatadata self heal  is not attempted, data self heal  is not attempted, entry self heal  failed on  /run1888/linux-2.6.31.1/drivers/acpi
[2013-07-15 00:32:24.650666] I [afr-self-heal-entry.c:1837:afr_sh_entry_common_lookup_done] 0-dist-rep-replicate-3: /run1888/linux-2.6.31.1/drivers/acpi/acpica: Skipping entry self-heal because of gfid absence
[2013-07-15 00:32:24.651098] E [afr-self-heal-common.c:2685:afr_log_self_heal_completion_status] 0-dist-rep-replicate-3: background gfid or missing entry self heal  is not attempted, medatadata self heal  is not attempted, data self heal  is not attempted, entry self heal  failed on  /run1888/linux-2.6.31.1/drivers/acpi
[2013-07-15 00:32:24.652271] W [dht-common.c:1040:dht_lookup_everywhere_cbk] 0-dist-rep-dht: multiple subvolumes (dist-rep-replicate-0 and dist-rep-replicate-4) have file /run1888/linux-2.6.31.1/drivers/acpi (preferably rename the file in the backend, and do a fresh lookup)
[2013-07-15 00:32:24.654499] I [afr-self-heal-entry.c:1837:afr_sh_entry_common_lookup_done] 0-dist-rep-replicate-3: /run1888/linux-2.6.31.1/drivers/acpi/acpica: Skipping entry self-heal because of gfid absence
[2013-07-15 00:32:24.655048] E [afr-self-heal-common.c:2685:afr_log_self_heal_completion_status] 0-dist-rep-replicate-3: background gfid or missing entry self heal  is not attempted, medatadata self heal  is not attempted, data self heal  is not attempted, entry self heal  failed on  /run1888/linux-2.6.31.1/drivers/acpi
[2013-07-15 00:32:24.655091] E [dht-common.c:827:dht_lookup_everywhere_done] 0-dist-rep-dht: path /run1888/linux-2.6.31.1/drivers/acpi exists as a file on one subvolume and directory on another. Please fix it manually
[2013-07-15 00:32:24.655115] W [nfs3.c:1226:nfs3svc_lookup_cbk] 0-nfs: fa63f964: /run1888/linux-2.6.31.1/drivers/acpi => -1 (Input/output error)
[2013-07-15 00:32:24.655140] W [nfs3-helpers.c:3460:nfs3_log_newfh_res] 0-nfs-nfsv3: XID: fa63f964, LOOKUP: NFS: 5(I/O error), POSIX: 5(Input/output error), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000
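
For a larger nfs.log, the two interesting message types above (a LOOKUP result with an all-zero gfid, and the DHT file/directory complaint) can be picked out with a small script. This is only a sketch: the function name scan_nfs_log and both regular expressions are assumptions derived from the message formats in this excerpt, not anything shipped with GlusterFS.

```python
import re
from uuid import UUID

# Assumed format: a "gfid <uuid>" field, as printed in the nfs3_log_newfh_res line above.
GFID_RE = re.compile(
    r"gfid ([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})"
)
# Assumed format: the dht_lookup_everywhere_done message quoted above.
MISMATCH_RE = re.compile(
    r"path (\S+) exists as a file on one subvolume and directory on another"
)


def scan_nfs_log(lines):
    """Return (zero_gfid_lines, mismatched_paths) for an iterable of log lines."""
    zero_gfid_lines = []
    mismatched_paths = set()
    for line in lines:
        gfid_match = GFID_RE.search(line)
        # An all-zero UUID has integer value 0.
        if gfid_match and UUID(gfid_match.group(1)).int == 0:
            zero_gfid_lines.append(line.rstrip("\n"))
        mismatch = MISMATCH_RE.search(line)
        if mismatch:
            mismatched_paths.add(mismatch.group(1))
    return zero_gfid_lines, sorted(mismatched_paths)
```

Passing an open file object for nfs.log as lines is enough, since the function only iterates over it once.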


Expected results:
GlusterFS should not return a zero gfid here. It is not clear why it does.

Additional info:

Comment 4 Vivek Agarwal 2013-07-22 14:31:52 UTC
Based on a preliminary investigation by Rajesh Joseph, this seems to be an AFR issue.

Comment 5 rjoseph 2013-07-23 05:16:58 UTC
The main issue in this bug is that some files have a NULL gfid. NFS generates a UUID for every create operation as well as for lookup operations, so from the NFS perspective it is very unlikely that any file would end up with a NULL gfid. It therefore looks like the issue is coming from AFR.

Another issue, which might be the cause of the problem, is:
[2013-07-15 00:32:24.655091] E [dht-common.c:827:dht_lookup_everywhere_done] 0-dist-rep-dht: path /run1888/linux-2.6.31.1/drivers/acpi exists as a file on one subvolume and directory on another. Please fix it manually
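
To confirm which entries really lack a gfid, the trusted.gfid extended attribute can be read directly on the bricks. The following is a minimal sketch, assuming it is run as root on a brick server under Linux (os.getxattr is Linux-only). The helper names read_gfid and is_null_gfid are made up for illustration.

```python
import errno
import os
import uuid

# Extended attribute in which GlusterFS stores an entry's gfid on the brick.
GFID_XATTR = "trusted.gfid"


def read_gfid(brick_path):
    """Return the raw 16-byte gfid of an entry on a brick, or None if the xattr is absent."""
    try:
        return os.getxattr(brick_path, GFID_XATTR)
    except OSError as exc:
        if exc.errno == errno.ENODATA:
            return None  # the xattr is not set on this entry
        raise


def is_null_gfid(raw):
    """True if the gfid is missing (None) or decodes to the all-zero UUID."""
    if raw is None:
        return True
    return uuid.UUID(bytes=raw).int == 0
```

Calling is_null_gfid(read_gfid(path)) for the same relative path on each brick of a replica pair would show whether one copy is missing its gfid.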

Comment 6 Pranith Kumar K 2013-09-26 13:35:56 UTC
According to the following log, DHT seems to have detected the same entry as a directory on one subvolume and as a file on another.
[2013-07-15 00:32:24.655091] E [dht-common.c:827:dht_lookup_everywhere_done] 0-dist-rep-dht: path /run1888/linux-2.6.31.1/drivers/acpi exists as a file on one subvolume and directory on another. Please fix it manually

We need to figure out why this happened.
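
As a starting point, the file/directory mismatch can be checked on the backend by comparing the type of the same relative path on every brick. This is a rough sketch only: the function names and the idea of passing brick roots as a list are assumptions, and the brick paths for this volume are not given in this report.

```python
import os
import stat


def entry_types(brick_roots, rel_path):
    """Map each brick root to 'dir', 'file', 'other' or 'missing' for rel_path."""
    result = {}
    for root in brick_roots:
        full = os.path.join(root, rel_path.lstrip("/"))
        try:
            # lstat so that symlinks are reported as 'other' rather than followed.
            mode = os.lstat(full).st_mode
        except (FileNotFoundError, NotADirectoryError):
            result[root] = "missing"
            continue
        if stat.S_ISDIR(mode):
            result[root] = "dir"
        elif stat.S_ISREG(mode):
            result[root] = "file"
        else:
            result[root] = "other"
    return result


def has_type_conflict(types):
    """True when the path is a directory on one brick and a regular file on another."""
    kinds = set(types.values())
    return "dir" in kinds and "file" in kinds
```

Running entry_types with the volume's brick directories and /run1888/linux-2.6.31.1/drivers/acpi would show which bricks hold the entry as a file and which as a directory.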

