|Summary:||[NetApp 4.8 bug] online resize of filesystem does not work|
|Product:||Red Hat Enterprise Linux 4||Reporter:||Tanvi <tanvi>|
|Component:||kernel||Assignee:||Jeff Moyer <jmoyer>|
|Status:||CLOSED ERRATA||QA Contact:||Martin Jenner <mjenner>|
|Version:||4.8||CC:||ahecox, andriusb, bmarzins, coughlan, marting, mchristi, naveenr, rlerch, tanvi, tao, xdl-redhat-bugzilla|
|Fixed In Version:||Doc Type:||Bug Fix|
Red Hat Enterprise Linux 4.8 can detect online growing or shrinking of an underlying block device. However, there is no method to automatically detect that a device has changed size, so manual steps are required to recognize this and resize any file systems which reside on the given device(s).

When a resized block device is detected, a message like the following will appear in the system logs:

VFS: busy inodes on changed media or resized disk sdi

If the block device was grown, then this message can be safely ignored. However, if the block device was shrunk without shrinking any data set on the block device first, the data residing on the device may be corrupted.

It is only possible to do an online resize of a filesystem that was created on the entire LUN (or block device). If there is a partition table on the block device, then the file system will have to be unmounted to update the partition table.
|Last Closed:||2009-05-18 19:31:51 UTC||Type:||---|
|Bug Depends On:||444964, 480338|
|Bug Blocks:||458123, 458752, 461297, 479684|
Description Tanvi 2008-07-10 14:02:31 UTC
+++ This bug was initially created as a clone of Bug #444964 +++

Description of problem:

From the resize2fs manpage:

The resize2fs program will resize ext2 or ext3 file systems. It can be used to enlarge or shrink an unmounted file system located on device. If the filesystem is mounted, it can be used to expand the size of the mounted filesystem, assuming the kernel supports on-line resizing. (As of this writing, the Linux 2.6 kernel supports on-line resize for filesystems mounted using ext3 only.)

Online resize of a filesystem does not work. The resize2fs tool is supposed to resize ext2/ext3 filesystems while they are mounted and in use by the system. But if the filesystem is mounted (i.e. the device is in use) and the mounted device is resized on the target, the kernel is not able to detect the new size of the device. To reflect the new size, we need to unmount and then remount the filesystem.

Version-Release number of selected component (if applicable):

[root@lnx200-171 ~]# lsb_release -a
LSB Version: :core-3.0-amd64:core-3.0-ia32:core-3.0-noarch:graphics-3.0-amd64:graphics-3.0-ia32:graphics-3.0-noarch
Distributor ID: RedHatEnterpriseAS
Description: Red Hat Enterprise Linux AS release 4 (Nahant Update 7 Beta)
Release: 4
Codename: NahantUpdate7Beta

How reproducible: Always reproducible

Steps to Reproduce:
1. Mount an iSCSI LUN
2. Increase the LUN size on the target
3. Rescan the iSCSI sessions on the initiator
4. Resize the filesystem using resize2fs (resize2fs reports that the filesystem is already occupying all the blocks and that there is nothing to resize)
5. Unmount the file system
6. Mount it again
7. Resize the filesystem using resize2fs (now it works)

Actual results: The device size change is not reflected to the filesystem utilities and dm-multipath.

Expected results: The device size change should be reflected to the filesystem resize utilities so that the filesystem can be grown/expanded to the new size.
Additional info:

Note that after executing step 3, the SCSI subsystem reflects the new size. This can be verified with the value present in /sys/block/DEVICE/size.

If we resize the LUN while it is mounted, SCSI reflects the change after the rescan, but resize2fs does not. This defeats the whole idea of "Online Resize" because we can't see the new size on the _filesystem_.

The same applies when using multipath. When we try to expand a multipathed LUN, to reflect the size change in multipath we need to flush (release) the LUN (using `multipath -F`) and then re-discover it (using `multipath -v3`), which is a disruptive operation for the LUN.

-- Additional comment from email@example.com on 2008-05-19 11:06 EST --

I wonder whether this is related to the recent patch set: http://lkml.org/lkml/2008/5/8/40

-- Additional comment from firstname.lastname@example.org on 2008-05-29 10:10 EST --

(In reply to comment #2)
> I wonder whether this is related to the recent patch set:
>
> http://lkml.org/lkml/2008/5/8/40

I put together a test kernel with those patches applied. It can be found at: http://people.redhat.com/jmoyer/dio/rhel5/

I have not yet run the kernel through the reproducer described here. I'll do so first chance I get.

-- Additional comment from email@example.com on 2008-05-29 11:26 EST --

Ritesh, could you please try out the test kernel?

-- Additional comment from firstname.lastname@example.org on 2008-05-29 13:02 EST --

(In reply to comment #4)
> Ritesh, could you please try out the test kernel?

Ritesh, only do so if you have spare cycles. I'd rather have you test after I am convinced that the patches solve the problem. Thanks.

-- Additional comment from email@example.com on 2008-06-27 11:35 EST --

OK, I finally got the required hardware setup to test this, and it works for me. Please test out the kernels posted to my people page (see comment #3) and let me know if they resolve the issue for you. Thanks!
-- Additional comment from firstname.lastname@example.org on 2008-07-09 03:39 EST --

We verified it and here is the result:

In the case of SCSI devices, the file system can be resized online with the new kernel package.

But when using multipathed devices, the size is still not reflected unless the maps are flushed and rediscovered (which ultimately requires an unmount of the multipathed device).

-- Additional comment from email@example.com on 2008-07-09 03:44 EST --

(In reply to comment #8)
> We verified it and here is the result
>
> In case of scsi devices the file system can be resized online with the new
> kernel package.

That was always the case, right? I thought you were testing iSCSI.

> But when using multipathed devices the size is still not reflected unless maps
> are flushed and rediscovered ( which ultimately requires unmount of the
> multipathed device)

Can you provide your configuration, please?

-- Additional comment from firstname.lastname@example.org on 2008-07-09 05:31 EST --

(In reply to comment #9)
> That was always the case, right? I thought you were testing iSCSI.
>

Yes, it used to be. But it wasn't true in the actual test we did. Online resize wouldn't work on the standard kernels shipped with RHEL5.2. A umount was required.

> Can you provide your configuration, please?

The iSCSI LUN was mapped with two paths, with multipath enabled on top of it. If you need the exact command outputs, please let me know.
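The original report points out that, after the rescan, the SCSI layer already shows the new size in /sys/block/DEVICE/size. That value is a count of 512-byte sectors, so a quick comparison before and after the rescan can be sketched as below. The function names and the optional sysfs-root argument are illustrative assumptions, not part of any tool mentioned in this report.

```shell
# Convert a sector count, as reported in /sys/block/DEVICE/size, to bytes.
# sysfs reports this value in 512-byte sectors.
sectors_to_bytes() {
    echo $(( $1 * 512 ))
}

# Print the current size in bytes of a block device such as "sdi".
# The optional second argument (an alternate sysfs root) is a hypothetical
# convenience so the function can be exercised without real hardware.
blockdev_bytes() {
    bd_dev=$1
    bd_sysroot=${2:-/sys}
    bd_sectors=$(cat "$bd_sysroot/block/$bd_dev/size") || return 1
    sectors_to_bytes "$bd_sectors"
}
```

Running `blockdev_bytes sdi` before and after the rescan shows whether the kernel has picked up the new LUN size, independently of what resize2fs or ext2online reports.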
Comment 1 Andrius Benokraitis 2008-07-10 14:18:09 UTC
This is highly dependent on RHEL 5.3 inclusion, so we'll follow its lead on this. Furthermore this could be something we may not have capacity for in 4.8 since it will be a small release.
Comment 2 RHEL Product and Program Management 2008-09-03 13:08:05 UTC
Updating PM score.
Comment 3 RHEL Product and Program Management 2008-09-22 17:23:38 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Comment 4 Jeff Moyer 2008-09-22 17:36:19 UTC
Created attachment 317386 [details] wrapper for lower level revalidate_disk routines
Comment 5 Jeff Moyer 2008-09-22 17:37:00 UTC
Created attachment 317387 [details] adjust block device size after an online resize of a disk
Comment 6 Jeff Moyer 2008-09-22 17:37:31 UTC
Created attachment 317388 [details] check for device resize when rescanning partitions
Comment 7 Jeff Moyer 2008-09-22 17:38:04 UTC
Created attachment 317389 [details] scsi sd driver calls revalidate_disk wrapper
Comment 8 Jeff Moyer 2008-09-22 17:39:08 UTC
Created attachment 317390 [details] add flush_disk to factor out common buffer cache flushing code
Comment 9 Jeff Moyer 2008-09-22 17:39:37 UTC
Created attachment 317391 [details] call flush_disk after detecting an online resize
Comment 10 Jeff Moyer 2008-09-22 17:40:34 UTC
The above patches are backports of the upstream patch set from Andrew Patterson. They have not yet been through any sort of testing. I'll update the bug when I have testing results.
Comment 12 Vivek Goyal 2009-01-09 13:55:13 UTC
Committed in 78.26.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
Comment 13 Jeff Moyer 2009-01-09 15:21:35 UTC
Can we get customer testing on this kernel, please? Thanks!
Comment 14 Tanvi 2009-01-15 11:16:34 UTC
I tested it with 78.28.EL kernel. Things are not working. I had to unmount and then remount the SCSI device before ext2online could resize the filesystem.
Comment 15 Jeff Moyer 2009-01-15 14:29:35 UTC
(In reply to comment #14)
> I tested it with 78.28.EL kernel. Things are not working. I had to unmount and
> then remount the SCSI device before ext2online could resize the filesystem.

Thanks for the quick testing turn-around, Tanvi! I'll look into this immediately.
Comment 16 Jeff Moyer 2009-01-15 16:30:32 UTC
(In reply to comment #14)
> I tested it with 78.28.EL kernel. Things are not working. I had to unmount and
> then remount the SCSI device before ext2online could resize the filesystem.

Hi, Tanvi,

I just tried this with the 78.29.EL kernel, and it works for me. Could you provide more information on your test procedure so that I can try to reproduce the problem? These are the steps I took:

service iscsi start
mkfs -t ext3 /dev/sdi
mount /dev/sdi /mnt/equallogic/
cd /mnt/equallogic/
touch x
touch y
dd if=/dev/zero of=foo bs=1M count=100
sync
sync
# login to iscsi target and resize the lun
iscsi-rescan
ext2online /dev/sdi
df -h .

Thanks!
Comment 17 Tanvi 2009-01-16 06:45:13 UTC
Hi Jeffrey,

I did follow the same steps. I retested with the 78.29.EL kernel and was able to resize an online SCSI device. Thank you.

But to resize a file system that was created on top of a multipathed device, I had to follow these steps:

1. unmount the device
2. flush the map
3. create the map again
4. remount it
5. ext2online

Is this expected to be fixed in RHEL4.8?
Comment 18 Jeff Moyer 2009-01-16 14:49:34 UTC
Hi, Tanvi,

Sorry I didn't test with a multipath device! I just went ahead and did so, and I got it to work, but it's even worse than RHEL 5! For the most part, the procedure is the same as the RHEL 5 procedure. I'll spell it out here, though:

service iscsi start
service multipathd start
mkfs -t ext3 /dev/mpath/mpathX
mount /dev/mpath/mpathX /mnt/equallogic/
cd /mnt/equallogic/
touch x
touch y
dd if=/dev/zero of=foo bs=1M count=100
# resize the iscsi target
iscsi-rescan
dmsetup table mpathX > /root/newtab
# modify newtab to have the new end sector of the device in column 2
dmsetup suspend /dev/mpath/mpathX
dmsetup reload /dev/mpath/mpathX /root/newtab
dmsetup resume /dev/mpath/mpathX
# and now the stupid part. ext2online sees that /dev/mpath/mpathX is a
# symbolic link to /dev/dm-X, and so looks in /etc/mtab for /dev/dm-X.
# Of course, that doesn't exist, so it fails. So, I modified /etc/mtab to
# put the dm-X device in place of the mpath device, and:
ext2online /dev/mpath/mpath9

And that worked for me. Strangely enough, I couldn't use --force, nor could I just pass in /dev/dm-X. I'm quite puzzled by ext2online's reticence to actually do what you want. I'll file a bug on that.

Now, I know a bug was filed to update the user-space tools to allow this online resizing to be less painful using the multipath utilities in RHEL 5. Has the same bug been filed for RHEL 4? If not, we should dup the RHEL 5 bug to RHEL 4.

I'll get to work on the ext2online bug. Thanks again for your patience and your testing, Tanvi. It is much appreciated!
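The manual step in the procedure above, editing newtab so that column 2 holds the new size in sectors, can be scripted rather than done by hand. The following is a minimal sketch that assumes the table saved by `dmsetup table` is a single line whose second field is the length in 512-byte sectors; the function name is hypothetical, and the sample table line in the usage note is illustrative rather than taken from this report.

```shell
# Replace the length field (column 2) of a single-line device-mapper table
# with a new sector count, leaving every other field as it was.
# Reads the old table on stdin and writes the new table on stdout.
# Usage: set_table_length NEW_SECTORS < oldtab > newtab
set_table_length() {
    awk -v len="$1" '{ $2 = len; print }'
}
```

For example, piping an illustrative line such as `0 102400000 multipath 0 0 1 1 round-robin 0 1 1 8:112 1000` through `set_table_length 1228800000` yields the same line with 1228800000 in column 2. The new sector count can be read from /sys/block/DEVICE/size of an underlying path after the rescan.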
Comment 19 Jeff Moyer 2009-01-16 15:04:00 UTC
I filed bug 480338 to track the e2fsprogs (ext2online) issue.
Comment 20 Martin George 2009-01-16 15:09:18 UTC
Yes, the user-space bug for multipath utilities has been cloned for RHEL 4.8. It is tracked at bugzilla #479684.
Comment 21 Ben Marzinski 2009-01-21 23:45:20 UTC
Did you try just running

# multipath

after the underlying block device has been resized? In RHEL4, it should already do the same thing as the manual method in Comment #18 did.

Strangely, when I try this, ext2online fails for me:

[root@ask-06 mnt]# ext2online /dev/mapper/mpath7
ext2online v1.1.18 - 2001/03/18 for EXT2FS 0.5b
error: Input/output error: read -1 of 16384 bytes at 4096

However, it fails just the same using the method in Comment #18, so I'm not sure if this is a completely unrelated problem.
Comment 22 Jeff Moyer 2009-01-22 16:39:54 UTC
(In reply to comment #21)
> Did you try just running
> # multipath
> After the underlying block device has been resized. In RHEL4, it should
> already do the same thing as the manual method in Comment #18 did.

Hi, Ben. I'll assume your question is addressed to me. No, I didn't try that, and yes, it does work.

> Strangely, when I try this, ext2online fails for me
>
> [root@ask-06 mnt]# ext2online /dev/mapper/mpath7
> ext2online v1.1.18 - 2001/03/18 for EXT2FS 0.5b
> error: Input/output error: read -1 of 16384 bytes at 4096
>
> However, it fails just the same using the method in Comment #18, so I'm not
> sure if this is a completely unrelated problem.

I've never seen that. You'll need to provide a whole lot more information if we're to debug it, though.
Comment 23 Ben Marzinski 2009-01-22 22:02:38 UTC
I'm using an X86_64 box with the RHEL 4.8 1/4/09 nightly build installed. It's connected to a Winchestor storage array via FC using the qla2400 driver. I'm running the 2.6.9-78.30.ELsmp kernel. The multipath device I'm using looks like:

mpath7 (3600d0230000000000e13955cc3757806)
[size=48 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [prio=1][active]
 \_ 5:0:0:6 sdh 8:112 [active][ready]

The commands I'm running are:

# multipath
# mkfs -t ext3 /dev/mapper/mpath7
# mount /dev/mapper/mpath7 /mnt/test
# echo 1 > /sys/block/sdh/device/rescan
# multipath
# ext2online

After more testing, I've found that it works just fine if I start with a 51200000 block device and resize to a 307200000 block device. However, it fails when I try to go from a 51200000 block device to a 614400000 block device. I'm not sure exactly at what size it starts failing, but it does seem to be size dependent.
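The rescan in the command list above targets a single path (sdh). When a multipath map has several paths, each underlying sd device needs the same write to its rescan file before `multipath` is run. A minimal sketch follows, assuming the path names are already known (for instance from `multipath -ll`); the function name and the explicit sysfs-root argument are hypothetical and exist only so the loop can be tried against a scratch directory.

```shell
# Trigger a SCSI rescan on each listed path device by writing 1 to its
# sysfs rescan file, as in "echo 1 > /sys/block/sdh/device/rescan".
# Usage: rescan_paths SYSFS_ROOT DEV [DEV...]
#   e.g. rescan_paths /sys sdh sdi
rescan_paths() {
    rp_sysroot=$1
    shift
    for rp_dev in "$@"; do
        echo 1 > "$rp_sysroot/block/$rp_dev/device/rescan" || return 1
    done
}
```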
Comment 26 Ben Marzinski 2009-01-22 23:00:22 UTC
Some more information. I was wrong. It seems to happen randomly. It just looked like it was size dependent for a couple of runs. Also, when this happens, all IO to the device seems to fail. If you try to do a dd from the multipath device, it will fail as well. However unmounting the filesystem fixes this.
Comment 27 Jeff Moyer 2009-01-23 15:41:38 UTC
I found this in your system logs:

Jan 22 09:46:47 ask-06 kernel: kjournald starting. Commit interval 5 seconds
Jan 22 09:46:47 ask-06 kernel: EXT3 FS on dm-2, internal journal
Jan 22 09:46:47 ask-06 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jan 22 09:47:17 ask-06 kernel: end_request: I/O error, dev sdh, sector 8208
Jan 22 09:47:17 ask-06 kernel: device-mapper: dm-multipath: Failing path 8:112.
Jan 22 09:47:17 ask-06 kernel: Buffer I/O error on device dm-2, logical block 1027
Jan 22 09:47:17 ask-06 kernel: lost page write due to I/O error on dm-2
Jan 22 09:47:17 ask-06 kernel: end_request: I/O error, dev sdh, sector 12312
Jan 22 09:47:17 ask-06 kernel: Buffer I/O error on device dm-2, logical block 1539
Jan 22 09:47:17 ask-06 kernel: lost page write due to I/O error on dm-2
Jan 22 09:47:17 ask-06 kernel: end_request: I/O error, dev sdh, sector 8
Jan 22 09:47:17 ask-06 kernel: Buffer I/O error on device dm-2, logical block 1
Jan 22 09:47:17 ask-06 kernel: lost page write due to I/O error on dm-2
Jan 22 09:47:17 ask-06 kernel: Buffer I/O error on device dm-2, logical block 1026
Jan 22 09:47:17 ask-06 kernel: lost page write due to I/O error on dm-2
Jan 22 09:48:51 ask-06 kernel: SCSI device sdh: 1228800000 512-byte hdwr sectors (629146 MB)
Jan 22 09:48:51 ask-06 kernel: SCSI device sdh: drive cache: write back
Jan 22 09:48:51 ask-06 kernel: sdh: detected capacity change from 52428800000 to 629145600000

/dev/mpath/mpath7 is a symbolic link to /dev/dm-2. It looks like those I/O errors were present before any resizing was done. Is that right?
You can read the first 4k of the multipath disk just fine:

[root@ask-06 mnt]# dd if=/dev/mapper/mpath7 of=/dev/null bs=4k count=1
1+0 records in
1+0 records out

But try to read the next 4k and you get a failure:

[root@ask-06 mnt]# dd if=/dev/mapper/mpath7 of=/dev/null bs=4k count=2
dd: reading `/dev/mapper/mpath7': Input/output error
1+0 records in
1+0 records out

I/O to the underlying sd device works just fine:

[root@ask-06 mnt]# dd if=/dev/sdh of=/dev/null bs=4k count=2
2+0 records in
2+0 records out
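The last line of the log excerpt above ("detected capacity change from 52428800000 to 629145600000") gives the old and new sizes in bytes; the new value matches the 1228800000 512-byte sectors reported two lines earlier (1228800000 x 512 = 629145600000). Since the release note for this bug warns that a shrink without first shrinking the data can corrupt it, it may be useful to classify such lines when scanning logs. A minimal sketch, assuming the message format shown in the excerpt; the function name is hypothetical.

```shell
# Classify kernel "detected capacity change from OLD to NEW" log lines.
# Reads log lines on stdin and prints "grow", "shrink" or "same" for each
# matching line; lines that do not match are ignored.
capacity_change() {
    sed -n 's/.*detected capacity change from \([0-9][0-9]*\) to \([0-9][0-9]*\).*/\1 \2/p' |
    while read -r cc_old cc_new; do
        if [ "$cc_new" -gt "$cc_old" ]; then
            echo grow
        elif [ "$cc_new" -lt "$cc_old" ]; then
            echo shrink
        else
            echo same
        fi
    done
}
```

For instance, `grep 'capacity change' /var/log/messages | capacity_change` would print one word per resize event; the log file path is an assumption about where the syslog output lands on a given system.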
Comment 28 Jeff Moyer 2009-01-23 19:33:04 UTC
I'd like to know what caused the I/O errors in the first place. Ben mentioned that it might have happened during the online resize, as he has to unmap the LUN before growing it. At any rate, the problem is that, once you set the PG_error bit for the page cache page, a regular read on the device file will always and forever see an error. I proposed this patch upstream: http://lkml.org/lkml/2009/1/23/288 A simple way around the problem is to mmap the device and read from the locations that are giving I/O errors (but that's hardly acceptable!). So, if the I/O errors are indeed from taking the LUN offline, then we will have to update our documentation to perhaps suggest suspending the device mapper device before doing the resize of the storage device.
Comment 31 Jeff Moyer 2009-02-09 21:08:47 UTC
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:

Red Hat Enterprise Linux 4.8 can detect online growing or shrinking of an underlying block device. However, there is no method to automatically detect that a device has changed size, so manual steps are required to recognize this and resize any file systems which reside on the given device(s).

When a resized block device is detected, a message like the following will appear in the system logs:

VFS: busy inodes on changed media or resized disk sdi

If the block device was grown, then this message can be safely ignored. However, if the block device was shrunk without shrinking any data set on the block device first, the data residing on the device may be corrupted.

It is only possible to do an online resize of a filesystem that was created on the entire LUN (or block device). If there is a partition table on the block device, then the file system will have to be unmounted to update the partition table.
Comment 32 Chris Ward 2009-02-20 13:31:42 UTC
~~ Attention Partners! ~~ RHEL 4.8 Partner Alpha has been released on partners.redhat.com. There should be a fix present in the Beta, which addresses this bug. If you have already completed testing your other URGENT priority bugs, and you still haven't had a chance yet to test this bug, please do so at your earliest convenience, to ensure that only the highest possible quality bits are shipped in the upcoming public Beta drop. If you encounter any issues, please set the bug back to the ASSIGNED state and describe the issues you encountered. Further questions can be directed to your Red Hat Partner Manager. Thanks, more information about Beta testing to come. - Red Hat QE Partner Management
Comment 34 Naveen Reddy 2009-03-05 05:53:39 UTC
Verified it in RHEL4.8 successfully. Steps followed (taken from Comment 18):

1. Map a LUN from a NetApp Controller (Fibre Channel target)
2. Discover it on the host
3. mkfs -t ext3 /dev/mapper/mpath5
4. mount /dev/mapper/mpath5 mnt1/
5. touch x
   touch y
   dd if=/dev/zero of=foo bs=1M count=100
6. Resize the LUN and then rescan on the host
7. dmsetup table mpath5 > /root/newtab
8. Modify newtab to have the new end sector of the device.
9. dmsetup suspend /dev/mapper/mpath5
   dmsetup reload /dev/mapper/mpath5 /root/newtab
   dmsetup resume /dev/mapper/mpath5
   ext2online /dev/mapper/mpath5

Online resize happened successfully.
Comment 37 errata-xmlrpc 2009-05-18 19:31:51 UTC
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html