Bug 160568 - aio to gfs filesystem fails with aio-stress and oracle10g tpcc
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: gfs
Version: 4
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Assignee: Wendy Cheng
QA Contact: GFS Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-06-15 20:24 UTC by John Shakshober
Modified: 2010-01-12 03:05 UTC (History)

Fixed In Version: RHBA-2006-0234
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-03-09 19:45:34 UTC


Attachments
Draft patch. (deleted)
2005-11-19 06:08 UTC, Wendy Cheng
gfs_aio.patch.v1 (deleted)
2005-11-20 06:03 UTC, Wendy Cheng
gfs_aio.patch.v2 (deleted)
2005-11-27 05:37 UTC, Wendy Cheng


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2006:0234 normal SHIPPED_LIVE GFS-kernel bug fix update 2006-03-09 05:00:00 UTC

Description John Shakshober 2005-06-15 20:24:59 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041215 Firefox/1.0 Red Hat/1.0-12.EL4

Description of problem:
I recently tested RHEL4 U1 with GFS 6.1 and am getting a process hang when attempting to use AIO and DIO with GFS-mounted file systems.

strace shows Oracle simply calling io_getevents every 30 seconds.

Linux bigbaddell2.lab.boston.redhat.com 2.6.9-11.ELsmp #1 SMP Fri May 20 18:25:30 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux

We could then reproduce the problem by running aio-stress against the GFS filesystem.




Version-Release number of selected component (if applicable):
 2.6.9-11.ELsmp, GFS-kernel-smp-2.6.9-35.4

How reproducible:
Always

Steps to Reproduce:
1. Run aio-stress with -O (the O_DIRECT option) against a GFS filesystem,
2. or try to start up an Oracle database with the DIO and AIO options.
  

Actual Results:  

[root@bigbaddell2 aio]#  ./aio-stress -s 1024 -r 64 -t 1 /oraclegfs/t1
file size 1024MB, record size 64KB, depth 64, ios per iteration 8
max io_submit 8, buffer alignment set to 4KB
threads 1 files 1 contexts 1 context offset 2MB verification off
adding file /oraclenfs/t1 thread 0
ret -22 (Invalid argument) on io_submit
error -1 on run_built

Alternatively, you can observe the hang in Oracle. Oracle should check the return value from io_submit; instead, it just hangs in io_getevents.

ps -ef | grep ora

oracle   13977 13956  0 09:35 pts/3    00:00:00 sqlplus   as sysdba
oracle   13983     1  0 09:35 ?        00:00:00 ora_pmon_tpcc
oracle   13985     1  0 09:35 ?        00:00:00 ora_mman_tpcc
oracle   13987     1  0 09:35 ?        00:00:00 ora_dbw0_tpcc
oracle   13989     1  0 09:35 ?        00:00:00 ora_lgwr_tpcc
oracle   13991     1  0 09:35 ?        00:00:00 ora_ckpt_tpcc
oracle   13993     1  0 09:35 ?        00:00:00 ora_smon_tpcc
oracle   13995     1  0 09:35 ?        00:00:00 ora_reco_tpcc
oracle   13999     1  0 09:35 ?        00:00:00 oracletpcc (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
root     14010  4997  0 09:41 pts/2    00:00:00 ps -ef

[root@bigbaddell2 oracle]# strace -p 13999
Process 13999 attached - interrupt to quit
io_getevents(0x2a96b9d000, 0x1, 0x400, 0x7fbffeee70, 0x7fbfff6f50) = 0
io_getevents(0x2a96b9d000, 0x1, 0x400, 0x7fbffeee70, 0x7fbfff6f50 <unfinished ...>
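To illustrate the point above about checking the result of io_submit before waiting in io_getevents, here is a minimal sketch. It assumes the libaio wrapper convention, where io_submit returns the number of requests queued or a negative errno value (as in the "ret -22" line from aio-stress). The helper name classify_submit and the enum are hypothetical and are not taken from Oracle or aio-stress.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical outcome codes, for illustration only. */
enum submit_outcome { SUBMIT_OK, SUBMIT_PARTIAL, SUBMIT_FAILED };

/*
 * Classify a libaio-style io_submit() return value.
 * Assumption: ret is the number of iocbs queued, or a negative errno.
 * A caller should only wait in io_getevents() for requests that were
 * actually queued; on SUBMIT_FAILED there is nothing to wait for.
 */
static enum submit_outcome classify_submit(long ret, long requested,
                                           char *msg, size_t msglen)
{
    assert(msg != NULL && msglen > 0);

    if (ret < 0) {
        /* e.g. ret == -22 decodes to EINVAL */
        snprintf(msg, msglen, "io_submit failed: %ld (%s)",
                 ret, strerror((int)-ret));
        return SUBMIT_FAILED;
    }
    if (ret < requested) {
        snprintf(msg, msglen, "io_submit queued %ld of %ld requests",
                 ret, requested);
        return SUBMIT_PARTIAL;
    }
    snprintf(msg, msglen, "io_submit queued all %ld requests", requested);
    return SUBMIT_OK;
}
```

With the value from the aio-stress run, classify_submit(-22, 8, ...) reports a failure, so a caller following this pattern would not go on to poll io_getevents for completions that can never arrive.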

Additional info:

Comment 1 Jeff Moyer 2005-06-15 20:35:23 UTC
From a brief look at the gfs source code, it seems gfs doesn't support AIO.  The
AIO subsystem is failing the submit operation here:

ssize_t aio_setup_iocb(struct kiocb *kiocb)
{
	struct file *file = kiocb->ki_filp;
	ssize_t ret = 0;

	switch (kiocb->ki_opcode) {
	case IOCB_CMD_PREAD:
	        ...
		ret = -EINVAL;
		if (file->f_op->aio_read)
			kiocb->ki_retry = aio_pread;
		break;
        ...
	if (!kiocb->ki_retry)
		return ret;
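
The check in the excerpt above can be modeled in user space as follows. This is only a simplified sketch of the IOCB_CMD_PREAD path: the model_* types and functions are hypothetical stand-ins, not the kernel's definitions, and the final "return 0" on the success path is an assumption about the elided part of the function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_kiocb;

/* Stand-in for struct file_operations: only the aio_read hook matters here. */
struct model_file_ops {
    long (*aio_read)(struct model_kiocb *);
};

/* Stand-in for struct kiocb. */
struct model_kiocb {
    const struct model_file_ops *f_op;
    long (*ki_retry)(struct model_kiocb *);
};

/* Placeholder for aio_pread(); it does no real I/O in this model. */
static long model_aio_pread(struct model_kiocb *k)
{
    (void)k;
    return 0;
}

/*
 * Model of the PREAD branch: ret starts as -EINVAL, and ki_retry is set
 * only when the filesystem provides an aio_read method. A filesystem
 * without that hook leaves ki_retry NULL, so the submit fails with
 * -EINVAL (the "ret -22" reported by aio-stress).
 */
static long model_setup_pread(struct model_kiocb *k)
{
    long ret;

    assert(k != NULL && k->f_op != NULL);
    k->ki_retry = NULL;

    ret = -EINVAL;
    if (k->f_op->aio_read)
        k->ki_retry = model_aio_pread;

    if (!k->ki_retry)
        return ret;
    return 0; /* assumed success path */
}
```

In this model, a file_operations table with a NULL aio_read makes model_setup_pread return -EINVAL, while a table that supplies the hook returns 0.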


Comment 8 Wendy Cheng 2005-11-19 06:08:07 UTC
Created attachment 121258
Draft patch. 

Draft patch - target completion date: Dec. 15, 2005.

Comment 9 Wendy Cheng 2005-11-20 06:03:15 UTC
Created attachment 121272
gfs_aio.patch.v1

Draft version 1: this version successfully runs through the default settings of
the aiocp.c test program (by Daniel McNeil, daniel@osdl.org).

Comment 10 Wendy Cheng 2005-11-27 05:37:51 UTC
Created attachment 121516
gfs_aio.patch.v2

Runs successfully with various options of aiocp.c 
and aio-stress.c on a single node, except for 
./aio-stress -S testfile (i.e. a 1024MB data file 
with the O_SYNC option), which eventually finishes but 
is *very* slow. 

So two test items for next week:

1. figure out why O_SYNC is so slow
2. run test on >= 2 nodes (need to re-write the 
   test cases though).

Comment 17 Red Hat Bugzilla 2006-03-09 19:45:34 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0234.html


