Bug 1691320 - glusterfs: write operations fail when the size is equal or greater than 1 GB
Summary: glusterfs: write operations fail when the size is equal or greater than 1 GB
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Rinku
QA Contact: Bala Konda Reddy M
URL:
Whiteboard:
Depends On:
Blocks: 1678575
Reported: 2019-03-21 11:57 UTC by Stefano Garzarella
Modified: 2019-04-11 07:56 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Attachments
glfs_write_bug.c useful to reproduce the bug (deleted)
2019-03-21 11:57 UTC, Stefano Garzarella

Description Stefano Garzarella 2019-03-21 11:57:55 UTC
Created attachment 1546452 [details]
glfs_write_bug.c useful to reproduce the bug

Description of problem:
While debugging BZ1678575, I found that the write APIs (e.g. glfs_write(), glfs_pwrite(), or the *async() variants) behave strangely when the size is >= 1 GB:
- an error is printed in the client log (no error in the server log)
- glfs_write() doesn't return any error, but the write operation is not executed
- subsequent operations fail with "Transport endpoint is not connected"
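
Since glfs_write() reports success even though nothing is written, a caller can only detect the problem by reading the data back. The attached reproducer is no longer available, so the following is only a minimal sketch of that write-then-read-back check. It is an assumption, not the original test: it uses POSIX pwrite()/pread() on a plain file descriptor as stand-ins for the glfs_* calls, and the helper names verify_pattern() and write_and_verify() are made up for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read `check_len` bytes (at most 1024) from offset 0
 * and return 0 only if every byte equals `pattern`; return -1 otherwise,
 * including on read errors or short reads. */
static int verify_pattern(int fd, unsigned char pattern, size_t check_len)
{
    unsigned char buf[1024];
    assert(check_len <= sizeof(buf));

    ssize_t n = pread(fd, buf, check_len, 0);
    if (n < 0 || (size_t)n != check_len)
        return -1;

    for (size_t i = 0; i < check_len; i++) {
        if (buf[i] != pattern)
            return -1;
    }
    return 0;
}

/* Hypothetical helper: write `size` bytes of `pattern` at offset 0, then
 * read the first bytes back instead of trusting the return value alone.
 * A short write is treated as a failure to keep the sketch small. */
static int write_and_verify(int fd, unsigned char pattern, size_t size)
{
    unsigned char *data = malloc(size ? size : 1);
    if (data == NULL)
        return -1;
    memset(data, pattern, size);

    ssize_t ret = pwrite(fd, data, size, 0);
    free(data);
    if (ret < 0 || (size_t)ret != size)
        return -1;

    return verify_pattern(fd, pattern, size < 1024 ? size : 1024);
}
```

With libgfapi, the same structure would presumably apply with a glfs_fd_t handle and the corresponding glfs_* calls in place of the POSIX ones.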

Version-Release number of selected component (if applicable):
glusterfs-server-3.12.2-40.el7rhgs.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Change server, volume and path in the glfs_write_bug.c
2. gcc `pkg-config --cflags glusterfs-api` `pkg-config --libs glusterfs-api` glfs_write_bug.c -o glfs_write_bug
3. ./glfs_write_bug

Actual results:
TEST glfs_write - size: 512 MiB pattern: 171
  glfs_write - size: 536870912 ret: 536870912
  glfs_read - size: 1024 ret: 1024
  PASS
TEST glfs_write - size: 1023 MiB pattern: 42
  glfs_write - size: 1072693248 ret: 1072693248
  glfs_read - size: 1024 ret: 1024
  PASS
TEST glfs_write - size: 1024 MiB pattern: 203
  glfs_write - size: 1073741824 ret: 1073741824
[2019-03-21 10:43:24.607883] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x131)[0x7f436bd83131] (--> /lib64/libgfrpc.so.0(+0xda01)[0x7f436bd48a01] (--> /lib64/libgfrpc.so.0(+0xdb22)[0x7f436bd48b22] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x94)[0x7f436bd4a0b4] (--> /lib64/libgfrpc.so.0(+0xfc50)[0x7f436bd4ac50] ))))) 0-gv33-client-0: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-03-21 10:43:24.607433 (xid=0x11)
  glfs_read - size: 1024 ret: -1
  glfs_read: Transport endpoint is not connected
END ret=-1
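
For reference, the byte counts in this output follow from plain MiB-to-byte arithmetic: 1023 MiB is still below 2^30 bytes, while the first failing size, 1024 MiB, is exactly 2^30. A small sketch of that arithmetic is below. The helper names are made up for illustration, and whether the 2^30 boundary is itself the cause of the failure is not established here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a size in MiB to bytes (1 MiB = 2^20 bytes). */
static uint64_t mib_to_bytes(uint64_t mib)
{
    assert(mib <= (UINT64_MAX >> 20)); /* guard against overflow in the shift */
    return mib << 20;
}

/* Hypothetical helper: 1 if `bytes` is strictly below 2^30 (1 GiB), else 0. */
static int below_one_gib(uint64_t bytes)
{
    return bytes < (UINT64_C(1) << 30);
}
```

Applied to the three test sizes, mib_to_bytes() gives 536870912, 1072693248 and 1073741824, and only the last one is not below_one_gib().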

Expected results:
TEST glfs_write - size: 512 MiB pattern: 171
  glfs_write - size: 536870912 ret: 536870912
  glfs_read - size: 1024 ret: 1024
  PASS
TEST glfs_write - size: 1023 MiB pattern: 42
  glfs_write - size: 1072693248 ret: 1072693248
  glfs_read - size: 1024 ret: 1024
  PASS
TEST glfs_write - size: 1024 MiB pattern: 203
  glfs_write - size: 1073741824 ret: 1073741824
  glfs_read - size: 1024 ret: 1024
END ret=0

Additional info:
As clients, I used both RHEL8 (glusterfs-api-3.12.2-40.2.el8.x86_64) and F29 (glusterfs-api-5.5-1.fc29.x86_64)

I also hit the same issue with a Fedora 29 server (glusterfs-server-5.5-1.fc29.x86_64):
TEST glfs_write - size: 1024 MiB pattern: 172
  glfs_write - size: 1073741824 ret: 1073741824
[2019-03-21 10:55:08.998979] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x131)[0x7fbf924ec131] (--> /lib64/libgfrpc.so.0(+0xda01)[0x7fbf924b1a01] (--> /lib64/libgfrpc.so.0(+0xdb22)[0x7fbf924b1b22] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x94)[0x7fbf924b30b4] (--> /lib64/libgfrpc.so.0(+0xfc50)[0x7fbf924b3c50] ))))) 0-gv0-client-0: forced unwinding frame type(GlusterFS 4.x v1) op(WRITE(13)) called at 2019-03-21 10:55:08.998126 (xid=0x11)
  glfs_read - size: 1024 ret: -1
  glfs_read: Transport endpoint is not connected

Comment 2 Stefano Garzarella 2019-03-25 12:46:21 UTC
Same issue also with client and server on Fedora 30 and GlusterFS 6 (glusterfs-server-6.0-1.fc30.x86_64, glusterfs-api-6.0-1.fc30.x86_64):
TEST glfs_write - size: 1024 MiB pattern: 131
  glfs_write - size: 1073741824 ret: 1073741824
[2019-03-25 12:45:13.013398] E [rpc-clnt.c:338:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7f61de8e05f6] (--> /lib64/libgfrpc.so.0(+0xe2f4)[0x7f61de8822f4] (--> /lib64/libgfrpc.so.0(+0xe412)[0x7f61de882412] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x97)[0x7f61de8833b7] (--> /lib64/libgfrpc.so.0(+0xfff8)[0x7f61de883ff8] ))))) 0-gv0-client-0: forced unwinding frame type(GlusterFS 4.x v1) op(WRITE(13)) called at 2019-03-25 12:45:13.013037 (xid=0x10)
  glfs_read - size: 1024 ret: -1
  glfs_read: Transport endpoint is not connected

Comment 3 Amar Tumballi 2019-03-25 13:13:13 UTC
Looks like a valid bug.

For devel reference:

Check iobuf_get2() in libglusterfs/src/iobuf.c, and notice that there is a check for '== -1' on a size_t value, which is an unsigned type.

Because of this check (at line 630), we should change the *get_pagesize() function to return ssize_t or int64_t, and also keep rounded_size as ssize_t or int64_t. With this, I guess we should be able to fix the issue.
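
To illustrate the kind of change suggested above, here is a minimal sketch of a page-size lookup that signals "no suitable page size" with -1, once through size_t and once through ssize_t. Everything in it is an assumption for illustration: the function names, the 128-byte starting size and the 1 MiB cap are hypothetical and are not taken from iobuf.c. In C, comparing a size_t against -1 does still compare equal, because -1 is converted to SIZE_MAX, so the sketch shows only why a signed return type makes the failure check more explicit. It does not establish the root cause of this bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical cap on the largest pooled page size (assumed, not from iobuf.c). */
#define MAX_PAGE_SIZE ((size_t)1 << 20)

/* Unsigned variant: failure is encoded as (size_t)-1, i.e. SIZE_MAX.
 * Callers must compare with == -1 (or SIZE_MAX); a "< 0" test can never fire. */
static size_t pagesize_unsigned(size_t requested)
{
    size_t page = 128;
    while (page < requested) {
        if (page >= MAX_PAGE_SIZE)
            return (size_t)-1; /* no pooled page is large enough */
        page <<= 1;
    }
    return page;
}

/* Signed variant: failure is a real negative value, so "< 0" works
 * and the sentinel cannot be mistaken for a huge valid size. */
static ssize_t pagesize_signed(size_t requested)
{
    size_t page = pagesize_unsigned(requested);
    if (page == (size_t)-1)
        return -1;
    assert(page <= (size_t)PTRDIFF_MAX); /* result must fit the signed type */
    return (ssize_t)page;
}
```

A caller of the signed variant can then fall back to a regular allocation whenever the result is negative, without relying on implicit conversion of -1.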

