Bug 1365626 - IO hang on ganesha mount during remove brick operation.
Summary: IO hang on ganesha mount during remove brick operation.
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: ganesha-nfs
Version: 3.8
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Niels de Vos
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1379662
Reported: 2016-08-09 17:31 UTC by Shashank Raj
Modified: 2017-11-07 10:39 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1379662
Environment:
Last Closed: 2017-11-07 10:39:47 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Shashank Raj 2016-08-09 17:31:46 UTC
Description of problem:

IO hang on ganesha mount during remove brick operation

Version-Release number of selected component (if applicable):

[root@dhcp43-133 ~]# rpm -qa|grep glusterfs
glusterfs-libs-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-fuse-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-api-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-cli-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-ganesha-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-client-xlators-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-server-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-geo-replication-3.8.1-0.4.git56fcf39.el7rhgs.x86_64

[root@dhcp43-133 ~]# rpm -qa|grep ganesha
nfs-ganesha-gluster-2.4-0.dev.26.el7rhgs.x86_64
nfs-ganesha-2.4-0.dev.26.el7rhgs.x86_64
glusterfs-ganesha-3.8.1-0.4.git56fcf39.el7rhgs.x86_64

How reproducible:

Once

Steps to Reproduce:
1. Create a 6x2 distributed-replicate volume and enable ganesha on the volume.

2. Do a subdirectory NFSv4 mount on the client:

mount -t nfs -o vers=4 10.70.40.192:/newvolume/subdir /mnt1470753422.46

3. Start creating nested directories and files:

for i in {1..30}; do mkdir /mnt1470753422.46/a$i;  for j in {1..50}; do mkdir /mnt1470753422.46/a$i/b$j; for k in {1..50}; do touch /mnt1470753422.46/a$i/b$j/c$k; done done done

4. Start the remove-brick operation:

gluster volume remove-brick newvolume replica 2 dhcp43-133.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick0 dhcp41-206.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick1 start

5. Once the remove-brick operation is complete, commit the brick removal:

gluster volume remove-brick newvolume replica 2 dhcp43-133.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick0 dhcp41-206.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick1 commit

6. Observe that the IO hangs on the client and the following messages are seen in /var/log/ganesha.log:

[root@dhcp46-206 ~]# ps -ef|grep mkdir
root      9288  9283  0 20:00 ?        00:00:02 bash -c cd /root && for i in {1..30}; do mkdir /mnt1470753422.46/a$i;  for j in {1..50}; do mkdir /mnt1470753422.46/a$i/b$j; for k in {1..50}; do touch /mnt1470753422.46/a$i/b$j/c$k; done done done

09/08/2016 19:29:53 : epoch 57a9cca6 : dhcp43-133.lab.eng.blr.redhat.com : ganesha.nfsd-26092[dbus_heartbeat] posix2fsal_error :FSAL :CRIT :Mapping 107(default) to ERR_FSAL_SERVERFAULT
09/08/2016 19:29:53 : epoch 57a9cca6 : dhcp43-133.lab.eng.blr.redhat.com : ganesha.nfsd-26092[dbus_heartbeat] glusterfs_close_my_fd :FSAL :CRIT :Error : close returns with Transport endpoint is not connected
09/08/2016 19:29:53 : epoch 57a9cca6 : dhcp43-133.lab.eng.blr.redhat.com : ganesha.nfsd-26092[dbus_heartbeat] mdcache_lru_clean :INODE LRU :CRIT :Error closing file in cleanup: Undefined server error
09/08/2016 19:29:53 : epoch 57a9cca6 : dhcp43-133.lab.eng.blr.redhat.com : ganesha.nfsd-26092[dbus_heartbeat] posix2fsal_error :FSAL :CRIT :Mapping 107(default) to ERR_FSAL_SERVERFAULT
09/08/2016 19:29:53 : epoch 57a9cca6 : dhcp43-133.lab.eng.blr.redhat.com : ganesha.nfsd-26092[dbus_heartbeat] glusterfs_close_my_fd :FSAL :CRIT :Error : close returns with Transport endpoint is not connected
09/08/2016 19:29:53 : epoch 57a9cca6 : dhcp43-133.lab.eng.blr.redhat.com : ganesha.nfsd-26092[dbus_heartbeat] mdcache_lru_clean :INODE LRU :CRIT :Error closing file in cleanup: Undefined server error
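
The CRIT messages quoted above repeat in groups of three, so tallying them per reporting function may help when comparing against the full log. On Linux, errno 107 is ENOTCONN, which matches the "Transport endpoint is not connected" text in the close error. The following is only a minimal sketch: the function name summarize_crit is hypothetical, and it assumes every line follows the layout quoted in this report, where the reporting function is the 10th whitespace-separated field.

```shell
#!/bin/sh
# Count CRIT messages per reporting function in a ganesha log file.
# Assumption: lines look like the ones quoted in this report, so the
# function name (e.g. posix2fsal_error) is the 10th whitespace-separated field.
# summarize_crit is a hypothetical helper name, not part of nfs-ganesha.
summarize_crit() {
    awk '/:CRIT :/ { count[$10]++ }
         END { for (f in count) print count[f], f }' "$1" | LC_ALL=C sort -k2
}

# Example call, using the log path named in step 6:
# summarize_crit /var/log/ganesha.log
```

Each output line is a count followed by a function name. A log with a different field layout would need a different field index.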

Actual results:

IO hang on ganesha mount during remove brick operation.

Expected results:

Additional info:

sosreport and logs will be attached

Comment 1 Shashank Raj 2016-08-09 17:37:01 UTC
sosreport and logs can be found under http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1365626

Comment 2 Niels de Vos 2016-09-12 05:39:46 UTC
All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html

Comment 4 Niels de Vos 2017-11-07 10:39:47 UTC
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.

