Bug 985401 - iscsi + lvm + fstab:_netdev delays boot then fails
Summary: iscsi + lvm + fstab:_netdev delays boot then fails
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: lvm2
Version: 18
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Peter Rajnoha
QA Contact: Fedora Extras Quality Assurance
 
Reported: 2013-07-17 12:07 UTC by Patrick Monnerat
Modified: 2014-01-15 13:16 UTC
CC: 22 users

Doc Type: Bug Fix
Last Closed: 2014-01-15 13:16:46 UTC


Attachments
/var/log/messages excerpts (deleted), 2013-07-18 09:33 UTC, Patrick Monnerat
sources with updated lvm2-activation-generator (deleted), 2013-07-19 08:38 UTC, Peter Rajnoha
systemd-analyze plot with attachment from comment 22 installed (deleted), 2013-07-19 09:32 UTC, Patrick Monnerat

Description Patrick Monnerat 2013-07-17 12:07:29 UTC
Description of problem:
Automounting of an ext4 FS on an LVM iscsi drive fails on boot.

Forgive me if systemd is not the appropriate component for this bug: too many packages are involved for me to determine precisely which one needs the update.

An external iscsi storage drive contains an lvm group with an ext4 logical volume. To automount it, there is the following line in /etc/fstab:

/dev/mapper/vg_lxbackup-lv_lxbackup /NASlxbackup         ext4    _netdev,defaults 1 2

Problems:
- Although the _netdev option is used, there is a blocking job during boot for this drive. This job fails by timeout and thus delays boot by more than a minute. Note that without the _netdev option, this job failure aborts the boot and falls back to a recovery shell.
- After boot, the volume group/logical volume is not configured (and thus not mounted). The physical scsi device has been properly created. To bring the volume up, I have to run "vgchange -ay" and mount manually, as shown below ---> the automount is not complete.
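
For reference, the manual workaround amounts to these two commands (the volume group name is taken from the fstab device path above):

vgchange -ay vg_lxbackup
mount /NASlxbackup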

I know this LVM vs. network dependency is a chicken-and-egg problem and similar issues have already been reported. But it's still not fixed, despite the _netdev option suggesting the feature is supported. This is why I insist on requesting a real fix rather than some quick-and-dirty configuration trick.

Version-Release number of selected component (if applicable):
systemd-201-2.fc18.7.x86_64
iscsi-initiator-utils-6.2.0.872-19.fc18.x86_64
lvm2-2.02.98-4.fc18.x86_64

How reproducible:
Always

Steps to Reproduce:
1. On an external device, create an iscsi volume. Configure it in the local iscsi client.
2. Locally, create volume group vgxxx with the iscsi volume above as the physical volume for that group. Create logical volume lvxxx in vgxxx. Format lvxxx as ext4 and mount it; it should be usable through the mount point by now. (The whole setup roughly corresponds to the shell sketch after this list.)
3. Insert the line "/dev/mapper/vgxxx-lvxxx /mnt ext4 _netdev,defaults 1 2" into /etc/fstab.
4. Reboot: at the Fedora logo progress image, press Esc and see the blocking job.
5. When up, login as root and check presence of:
- iscsi device (/dev/sdx): OK.
- logical volume device (/dev/mapper/vgxxx-lvxxx): KO
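
For illustration, the steps above roughly correspond to this shell sequence (the portal address and target IQN are placeholders, and /dev/sdx stands for whatever device the iscsi login produces):

# 1. discover and log in to the iscsi target
iscsiadm -m discovery -t sendtargets -p 192.0.2.10
iscsiadm -m node -T iqn.2000-01.com.example:target0 -p 192.0.2.10 --login
# 2. build the LVM stack on the new disk and mount it
pvcreate /dev/sdx
vgcreate vgxxx /dev/sdx
lvcreate -n lvxxx -l 100%FREE vgxxx
mkfs.ext4 /dev/vgxxx/lvxxx
mount /dev/mapper/vgxxx-lvxxx /mnt
# 3. make the mount persistent
echo "/dev/mapper/vgxxx-lvxxx /mnt ext4 _netdev,defaults 1 2" >> /etc/fstab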

Actual results:
See above.

Expected results:
No useless boot delay, and the volume mounted when the system is up and running.

Thanks for investigating.

Comment 1 Michal Schmidt 2013-07-17 12:27:16 UTC
(In reply to Patrick Monnerat from comment #0)
> 5. When up, login as root and check presence of:
> - iscsi device (/dev/sdx): OK.
> - logical volume device (/dev/mapper/vgxxx-lvxxx): KO

Please paste the output of:
udevadm info -q all -n /dev/sdX

Since the device with the PV exists, lvmetad should activate the VG and LV on it. Reassigning to lvm2 for now.

Comment 2 Peter Rajnoha 2013-07-17 12:33:23 UTC
(In reply to Michal Schmidt from comment #1)
> (In reply to Patrick Monnerat from comment #0)
> > 5. When up, login as root and check presence of:
> > - iscsi device (/dev/sdx): OK.
> > - logical volume device (/dev/mapper/vgxxx-lvxxx): KO
> 
> Please paste the output of:
> udevadm info -q all -n /dev/sdX
> 
> Since the device with the PV exists, lvmetad should activate the VG and LV
> on it. Reassigning to lvm2 for now.

lvmetad is not used by default in F18, only in F19+. So in F18 there still must be a script to run the LVM activation "manually". Formerly, the script used to activate the volumes after the network was set up and ready was the "netfs" init script. I'm not sure what the equivalent is in the systemd world; I need to check...
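
For context, whether LVM consults lvmetad is a single lvm.conf switch (a minimal excerpt; 0 is the F18 default):

# /etc/lvm/lvm.conf
global {
    # 0 = LVM scans devices itself (F18 default); 1 = use the lvmetad daemon
    use_lvmetad = 0
}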

Comment 3 Patrick Monnerat 2013-07-17 12:46:15 UTC
> udevadm info -q all -n /dev/sdX
# udevadm info -q all -n /dev/sda
P: /devices/platform/host2/session1/target2:0:0/2:0:0:0/block/sda
N: sda
S: disk/by-id/scsi-35005907ff190f282
S: disk/by-id/wwn-0x5005907ff190f282
S: disk/by-path/ip-172.25.0.64:3260-iscsi-iqn.2012-07.com.lenovoemc:storage.lxbackup.BackupStore-lun-0
E: DEVLINKS=/dev/disk/by-id/scsi-35005907ff190f282 /dev/disk/by-id/wwn-0x5005907ff190f282 /dev/disk/by-path/ip-172.25.0.64:3260-iscsi-iqn.2012-07.com.lenovoemc:storage.lxbackup.BackupStore-lun-0
E: DEVNAME=/dev/sda
E: DEVPATH=/devices/platform/host2/session1/target2:0:0/2:0:0:0/block/sda
E: DEVTYPE=disk
E: ID_BUS=scsi
E: ID_MODEL=LIFELINE-DISK
E: ID_MODEL_ENC=LIFELINE-DISK\x20\x20
E: ID_PART_TABLE_TYPE=gpt
E: ID_PATH=ip-172.25.0.64:3260-iscsi-iqn.2012-07.com.lenovoemc:storage.lxbackup.BackupStore-lun-0
E: ID_PATH_TAG=ip-172_25_0_64_3260-iscsi-iqn_2012-07_com_lenovoemc_storage_lxbackup_BackupStore-lun-0
E: ID_REVISION=2
E: ID_SCSI=1
E: ID_SCSI_SERIAL=f190f282
E: ID_SERIAL=35005907ff190f282
E: ID_SERIAL_SHORT=5005907ff190f282
E: ID_TYPE=disk
E: ID_VENDOR=LENOVO
E: ID_VENDOR_ENC=LENOVO\x20\x20
E: ID_WWN=0x5005907ff190f282
E: ID_WWN_WITH_EXTENSION=0x5005907ff190f282
E: MAJOR=8
E: MINOR=0
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: UDISKS_PARTITION_TABLE=1
E: UDISKS_PARTITION_TABLE_COUNT=1
E: UDISKS_PARTITION_TABLE_SCHEME=gpt
E: UDISKS_PRESENTATION_NOPOLICY=0
E: USEC_INITIALIZED=11663

lvmetad is not active, as is the default on F18. I tried to read the documentation in this direction, but I've read that lvm does not use it by default: maybe this is the solution... I still have to find the recipe!

Comment 4 Michal Schmidt 2013-07-17 13:02:27 UTC
(In reply to Peter Rajnoha from comment #2)
> lvmetad is not used by default in F18, only in F19+. So in F18 there
> still must be a script to run the LVM activation "manually".

Ah, right. I forgot that F18 was still using the fedora-storage-init{,-late} scripts.

> Formerly, the script used to activate the volumes after the network was
> set up and ready was the "netfs" init script. I'm not sure what the
> equivalent is in the systemd world; I need to check...

I am not aware of any. Was netfs removed prematurely from initscripts?
[Reassigning to initscripts]

Anyway, with lvmetad all this becomes so much nicer that I'd recommend going to F19 instead.

Comment 5 Peter Rajnoha 2013-07-17 13:06:04 UTC
Yes, I highly recommend using F19 as well, because lvmetad gained a lot of fixes there. It was not enabled by default in F18 as it was quite new and not yet mature in that version. And yes, enabling lvmetad should resolve the issue: the LVM volumes are then activated based on incoming udev events, so no direct activation call is needed elsewhere (though without lvmetad there is a regression, as netfs had this direct activation call and it got lost).

Comment 6 Peter Rajnoha 2013-07-17 13:16:38 UTC
Well, it seems that adding a separate service generated by our lvm2-activation-generator is the most straightforward solution here. Just like we generate lvm2-activation.service if lvmetad is not used, we'd generate a service that is run after the network is set up and then iscsi/fcoe...

Should be a one- or two-line patch, I'll do an update...
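
For illustration, the generated unit could look roughly like this; the ordering in [Unit] is an assumption for sketching purposes, and only the description, man pages, and vgchange invocation are taken from the systemctl status output in comment 13 below:

[Unit]
Description=Activation of LVM2 logical volumes
Documentation=man:lvm(8) man:vgchange(8)
# assumption: run only once networking and the iscsi login service are up
After=network.target iscsi.service

[Service]
Type=oneshot
ExecStart=/usr/sbin/lvm vgchange -aay --sysinit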

Comment 7 Patrick Monnerat 2013-07-17 13:40:41 UTC
> Anyway, with lvmetad all this becomes so much nicer that I'd recommend going to F19 instead.
@Michal: thanks for the advice, but it's impossible: the problem is with an interactive VNC server with several people working on it, depending on more than 200 private packages, with no resources to prepare them for F19 :-(( Have to wait for F20...
In addition, the external NAS volume is for the backup of all these users' files.

Is there a way to use lvmetad on F18? Or to backport the F19 version?

> Should be a one- or two-line patch, I'll do an update...
Thanks Peter, I'm waiting for it...

By the way: would it also fix the useless boot delay?

Thanks for the help,
Patrick

Comment 8 Michal Schmidt 2013-07-17 13:43:58 UTC
(In reply to Patrick Monnerat from comment #7)
> By the way: would it also fix the useless boot delay?

Yes, because the delay is the waiting for the device to appear.

Comment 9 Patrick Monnerat 2013-07-17 13:45:26 UTC
Thanks a lot :-)

Comment 10 Peter Rajnoha 2013-07-17 14:56:49 UTC
The patch:

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=8606bf316a683ebd14bb4cdf4eb1b3477a790b62

I'll do an F18 update shortly.

Comment 11 Patrick Monnerat 2013-07-18 09:33:49 UTC
Created attachment 775231 [details]
/var/log/messages excerpts

Many thanks for the patch Peter.
I applied it here and tried it in a VM: I'm afraid it does not work; same result as before.
Please find the /var/log/messages excerpts in the attachment.

Comment 12 Peter Rajnoha 2013-07-18 09:57:20 UTC
Was lvm2-activation-generator installed? (It should be in the /lib/systemd/system-generators directory.) If you're installing from sources, you need to call an extra make install_systemd_generators.

Also, I've modified the original patch for F18: since F18 uses the fedora-storage-init script to activate lvm2 volumes, I've removed lvm2-activation-early.service and lvm2-activation.service from the generator. I've only kept lvm2-activation-net.service (an equivalent of the former netfs script, with the line that was responsible for activating the LVM volumes).

Please, try this build: http://koji.fedoraproject.org/koji/taskinfo?taskID=5626261. This one seems to be working on my machine: without it, the LVM volumes on iscsi are not activated, and with this update they are...
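
To verify the pieces are in place after installing, something like this should do (the generator path is from the paragraph above; generated units land under /run/systemd/generator):

ls /lib/systemd/system-generators/lvm2-activation-generator
systemctl daemon-reload
systemctl status lvm2-activation-net.service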

Comment 13 Patrick Monnerat 2013-07-18 10:30:36 UTC
> ... If you're installing from sources, you need to call an extra make install_systemd_generators.

I've updated the spec file locally to include the patch and rpmbuild -ba ...

> Please, try this build...
Done, but it still fails, I'm afraid. Did I miss some command to enable it?

# systemctl status lvm2-activation-net.service
lvm2-activation-net.service - Activation of LVM2 logical volumes
   Loaded: loaded (/etc/lvm/lvm.conf)
   Active: inactive (dead) since Thu 2013-07-18 12:12:00 CEST; 10min ago
     Docs: man:lvm(8)
           man:vgchange(8)
  Process: 943 ExecStart=/usr/sbin/lvm vgchange -aay --sysinit (code=exited, status=0/SUCCESS)

Jul 18 12:12:00 rawhide.datasphere.ch systemd[1]: Starting Activation of LVM2 logical volumes...
Jul 18 12:12:00 rawhide.datasphere.ch lvm[943]: 2 logical volume(s) in volume group "rawhide_vg" now active
Jul 18 12:12:00 rawhide.datasphere.ch systemd[1]: Started Activation of LVM2 logical volumes.

If I issue "vgchange -aay" manually after login, the /dev/mapper/ device is created.

Thank you for your support.

Comment 14 Patrick Monnerat 2013-07-18 10:37:44 UTC
Additional info for timing:
---
Jul 18 12:11:59 rawhide avahi-daemon[460]: Registering new address record for fe80::5054:ff:fe74:6a4e on eth0.*.
Jul 18 12:12:00 rawhide kernel: [    5.604518] scsi 2:0:0:0: Direct-Access     LENOVO   LIFELINE-DISK    2    PQ: 0 ANSI: 5
Jul 18 12:12:00 rawhide kernel: [    5.604724] sd 2:0:0:0: Attached scsi generic sg0 type 0
Jul 18 12:12:00 rawhide kernel: [    5.605828] sd 2:0:0:0: [sda] 5368709120 512-byte logical blocks: (2.74 TB/2.50 TiB)
Jul 18 12:12:00 rawhide kernel: [    5.605831] sd 2:0:0:0: [sda] 4096-byte physical blocks
Jul 18 12:12:00 rawhide iscsi[851]: Starting iscsi: [  OK  ]
Jul 18 12:12:00 rawhide kernel: [    5.607040] sd 2:0:0:0: [sda] Write Protect is off
Jul 18 12:12:00 rawhide kernel: [    5.607710] sd 2:0:0:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
Jul 18 12:12:00 rawhide systemd[1]: Started LSB: Starts and stops login and scanning of iSCSI devices..
Jul 18 12:12:00 rawhide systemd[1]: Starting Activation of LVM2 logical volumes...
Jul 18 12:12:00 rawhide kernel: [    5.620540]  sda: sda1
Jul 18 12:12:00 rawhide kernel: [    5.624836] sd 2:0:0:0: [sda] Attached SCSI disk
Jul 18 12:12:00 rawhide lvm[943]: 2 logical volume(s) in volume group "rawhide_vg" now active
Jul 18 12:12:00 rawhide systemd[1]: Started Activation of LVM2 logical volumes.
Jul 18 12:12:00 rawhide avahi-daemon[460]: Registering new address record for 2002:c346:88:1:5054:ff:fe74:6a4e on eth0.*.
Jul 18 12:12:00 rawhide avahi-daemon[460]: Withdrawing address record for fe80::5054:ff:fe74:6a4e on eth0.
Jul 18 12:12:00 rawhide NetworkManager[553]: <info> Activation (eth0) Stage 5 of 5 (IPv6 Commit) scheduled...
Jul 18 12:12:00 rawhide NetworkManager[553]: <info> Activation (eth0) Stage 5 of 5 (IPv6 Commit) started...
Jul 18 12:12:01 rawhide iscsid: Connection1:0 to [target: iqn.2012-07.com.lenovoemc:storage.lxbackup.BackupStore, portal: 172.25.0.64,3260] through [iface: default] is operational now
Jul 18 12:12:01 rawhide NetworkManager[553]: <info> Policy set 'eth0' (eth0) as default for IPv6 routing and DNS.
Jul 18 12:12:01 rawhide NetworkManager[553]: <info> Activation (eth0) Stage 5 of 5 (IPv6 Commit) complete.
Jul 18 12:12:20 rawhide kernel: [   25.120453] nf_conntrack: automatic helper assignment is deprecated and it will be removed soon. Use the iptables CT target to attach helpers instead.
Jul 18 12:13:26 rawhide systemd[1]: Job dev-mapper-vg_lxbackup\x2dlv_lxbackup.device/start timed out.
---

Comment 15 Peter Rajnoha 2013-07-18 13:15:02 UTC
(In reply to Patrick Monnerat from comment #14)
> Jul 18 12:12:00 rawhide iscsi[851]: Starting iscsi: [  OK  ]
> Jul 18 12:12:00 rawhide kernel: [    5.607040] sd 2:0:0:0: [sda] Write
> Protect is off
> Jul 18 12:12:00 rawhide kernel: [    5.607710] sd 2:0:0:0: [sda] Write
> cache: disabled, read cache: enabled, supports DPO and FUA
> Jul 18 12:12:00 rawhide systemd[1]: Started LSB: Starts and stops login and
> scanning of iSCSI devices..
> Jul 18 12:12:00 rawhide systemd[1]: Starting Activation of LVM2 logical
> volumes...
> Jul 18 12:12:00 rawhide kernel: [    5.620540]  sda: sda1
> Jul 18 12:12:00 rawhide kernel: [    5.624836] sd 2:0:0:0: [sda] Attached
> SCSI disk

Oh, I'm afraid the point at which the device is actually added to the system is asynchronous with the iscsiadm --login call in the initscript/systemd unit. But I'm not 100% sure at the moment; I need to discuss with someone from the iscsi team whether there's any cure for this. If we can't make it synchronous, that's bad, as we can't hook the script to activate the LVM volumes properly (there is no point during the boot sequence at which iscsi is considered "prepared and ready"; actually the same happens for the scsi scan, for which there was a scsi_wait module once, but it was later removed. I'm afraid iscsi is the same here). If that's the case, the only thing you can do is use lvmetad. But let's see what the iscsi folks say about this...

Comment 16 Patrick Monnerat 2013-07-18 13:16:26 UTC
It seems your dynamic job does not start too early, but rather too fast: when it executes vgchange, the creation of the iscsi physical device and its subdevices (i.e. partitions) has been scheduled but is not complete (by the kernel or udev or whatever). As a result, vgchange does not take sda1 into account, because it does not exist yet, even though the iscsi job has terminated.

To check that, I've (quick and dirty) modified your dynamic job's ExecStart as:

ExecStart=/usr/bin/bash -c '/usr/bin sleep 1; /usr/sbin/lvm vgchange -aay --sysinit'

I know this is awful, but it works!!!
One might rather wait for the devices to be set up than for a fixed amount of time... maybe in the iscsi-initiator-utils package...

I hope it helps,
Patrick

Comment 17 Patrick Monnerat 2013-07-18 13:19:53 UTC
Correction: "/usr/bin sleep" --> "/usr/bin/sleep"

Comment 18 Peter Rajnoha 2013-07-18 13:22:43 UTC
(In reply to Patrick Monnerat from comment #16)
> It seems your dynamic job does not start too early, but rather too fast:
> when it executes vgchange, the creation of the iscsi physical device and
> its subdevices (i.e. partitions) has been scheduled but is not complete
> (by the kernel or udev or whatever). As a result, vgchange does not take
> sda1 into account, because it does not exist yet, even though the iscsi
> job has terminated.

Yes, exactly, that's the consequence of the asynchronicity I mentioned... Well, let's see if we can make it synchronous somehow. The sleep 1 is not deterministic, as you could still run into a problem where it takes more than 1 second... It would be fine if we could find a way to synchronize with the kernel actually attaching the devices so they're all accessible for sure.

Chris, any idea?

Comment 19 Peter Rajnoha 2013-07-18 14:06:51 UTC
Patrick, can you try adding scsi_mod.scan=sync to your kernel command line and see if it helps in any way? (also, remove the 1s sleep)
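
For reference, a persistent way to add the parameter on Fedora is via grubby (a sketch; adjust if you manage the kernel command line differently):

# append scsi_mod.scan=sync to every installed kernel's command line
grubby --update-kernel=ALL --args="scsi_mod.scan=sync"
# after the next reboot, confirm it took effect
cat /proc/cmdline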

Comment 20 Patrick Monnerat 2013-07-18 14:18:52 UTC
> ... try adding scsi_mod.scan=sync ...

still fails :-(

Comment 21 Chris Leech 2013-07-18 23:29:10 UTC
(In reply to Peter Rajnoha from comment #15)
> Oh, I'm afraid the point at which the device is actually added to the
> system is asynchronous with the iscsiadm --login call in the
> initscript/systemd unit. But I'm not 100% sure at the moment; I need to
> discuss with someone from the iscsi team whether there's any cure for
> this. If we can't make it synchronous, that's bad, as we can't hook the
> script to activate the LVM volumes properly (there is no point during the
> boot sequence at which iscsi is considered "prepared and ready"; actually
> the same happens for the scsi scan, for which there was a scsi_wait
> module once, but it was later removed. I'm afraid iscsi is the same
> here). If that's the case, the only thing you can do is use lvmetad. But
> let's see what the iscsi folks say about this...

When iscsiadm returns, all it claims is that the iscsi session has been established. At that point the system has a virtual scsi host with a single target; target and device scanning happens asynchronously in the scsi core. There's no difference with iscsi, it's just a transport.

Comment 22 Peter Rajnoha 2013-07-19 08:38:26 UTC
Created attachment 775680 [details]
sources with updated lvm2-activation-generator

(In reply to Chris Leech from comment #21)
> There's no difference with iscsi, it's just a transport.

So in that case I suppose "scsi_mod.scan=sync" on the kernel command line + a "udevadm settle" call before LVM activation should work. Patrick, I've added the extra udevadm settle call before the activation in the generated lvm2-activation-net.service (src.rpm attached); please, try it if you can. If even this does not work, the only way left is to use lvmetad with event-based autoactivation...
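
For reference, the added settle call would look roughly like this in the generated lvm2-activation-net.service (a sketch; the real change is in the attached sources):

[Service]
Type=oneshot
# wait until udev has processed all queued device events before activating
ExecStartPre=/usr/bin/udevadm settle
ExecStart=/usr/sbin/lvm vgchange -aay --sysinit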

Comment 23 Patrick Monnerat 2013-07-19 09:32:05 UTC
Created attachment 775710 [details]
systemd-analyze plot with attachment from comment 22 installed

> please, try it if you can...
No success: please find the chronology in the attachment.

I'm OK with working as a "guinea pig" if you need some assistance upgrading/installing lvmetad on F18 :-)

Comment 24 Patrick Monnerat 2013-07-19 14:42:34 UTC
For info:
I have rebuilt from lvm2-2.02.98-9.fc19.src.rpm for F18, installed and
systemctl enable lvm2-lvmetad
and it seems to work properly.
If you don't see any drawbacks, I think this solution is OK for me: I can override package in my private repository. Perhaps some F18 backport should be considered for other users (just a suggestion :-)
Many thanks for your prompt and efficient help,
Patrick

Comment 25 Peter Rajnoha 2013-07-22 14:18:50 UTC
(In reply to Patrick Monnerat from comment #24)
> For info:
> I have rebuilt from lvm2-2.02.98-9.fc19.src.rpm for F18, installed and
> systemctl enable lvm2-lvmetad

(...you don't need to enable lvm2-lvmetad.service; setting global/use_lvmetad=1 should be enough. The lvm2-lvmetad.service is run automatically on first lvmetad socket access, and lvm2-lvmetad.socket is enabled by default. Just for convenience :))
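
In short, moving to lvmetad comes down to one config change plus the already-enabled socket unit; a quick check could look like this (use_lvmetad and the socket name per this comment; the grep is just for verification):

# in /etc/lvm/lvm.conf, set: global { use_lvmetad = 1 }
grep use_lvmetad /etc/lvm/lvm.conf
# the socket is enabled by default; lvmetad starts on first access
systemctl status lvm2-lvmetad.socket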

> and it seems to work properly.
> If you don't see any drawbacks, I think this solution is OK for me: I can

OK, well, the F19 build has all the important fixes in, so yes, I can recommend that one...

> override package in my private repository. Perhaps some F18 backport should
> be considered for other users (just a suggestion :-)

Yes, though I'm still not sure why the scsi sync mode does not apply here; it should... I'm still discussing this with Chris. If there turns out to be no way to make the iscsi/scsi scanning synchronous, then yes, I'll just backport those patches from F19. But first I'd like to be sure about the scsi sync mode, as it interests me...

Comment 26 Patrick Monnerat 2013-07-22 14:22:55 UTC
If you'd like me to do some future testing, I'm OK with that (at least on my test VM)...

Comment 27 Fedora End Of Life 2013-12-21 15:39:35 UTC
This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be
able to fix it before Fedora 18 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

