Bug 1363664 - Fails to find VG when Sil3132 controller is present after cryptroot unlocks enclosing LUKS [NEEDINFO]
Summary: Fails to find VG when Sil3132 controller is present after cryptroot unlocks e...
Keywords:
Status: NEW
Alias: None
Product: LVM and device-mapper
Classification: Community
Component: lvm2
Version: 2.02.122
Hardware: x86_64
OS: Unspecified
unspecified
medium
Target Milestone: ---
: ---
Assignee: LVM and device-mapper development team
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-08-03 10:09 UTC by Xen
Modified: 2016-08-18 13:26 UTC
7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
rule-engine: lvm-technical-solution?
bugs: needinfo? (bugs)
rule-engine: lvm-test-coverage?



Description Xen 2016-08-03 10:09:24 UTC
I now realize this may be the wrong place to file this bug; I am sorry if so.

But I have to make progress on this somehow, or I won't get any further.

On Ubuntu and Mint, the system fails to boot when a Sil3132 controller is present and (apparently) the system uses LVM, which all my systems do. By contrast, an older Kubuntu Live DVD boots fine; I would guess the same holds for a 16.04 Live DVD.

In fact, LVM finds the PV just fine once (in my case) the crypt device has been opened, which is why I apologize: LVM has no reason to fail on a correctly mapped PV.

The system boots and the initrd unlocks the crypt container. After that, however, the pvscan invocation fails to find the PV, or something fails to find a certain volume group, which I assume happens when the system tries to load the root device (for the root filesystem).

So either the crypt is not getting unlocked (and mapped) correctly, or there is another reason why pvscan and/or vgchange -ay fails at its point in the boot sequence. The Live session confirms that opening the crypt manually results in the PV and VG being loaded automatically.

However, this failure happens whether or not a disk is attached to that controller.

I'm too hungry to write anything good.

The error at boot is: Volume group "name" not found

The controller is fully functional (as far as non-RAID use goes, anyway): I can in principle BIOS-boot off it (i.e. start Windows off it, for example), and I can access it quite easily from the Live DVD:

[    0.197785] pci 0000:02:00.0: [1095:3132] type 00 class 0x010400
[    0.197817] pci 0000:02:00.0: reg 0x10: [mem 0xfdcff000-0xfdcff07f 64bit]
[    0.197839] pci 0000:02:00.0: reg 0x18: [mem 0xfdcf8000-0xfdcfbfff 64bit]
[    0.197853] pci 0000:02:00.0: reg 0x20: [io  0xdf00-0xdf7f]
[    0.197881] pci 0000:02:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
[    0.197943] pci 0000:02:00.0: supports D1 D2
[    0.197989] pci 0000:02:00.0: disabling ASPM on pre-1.1 PCIe device.  You can enable it with 'pcie_aspm=force'

[    7.137213] sata_sil24 0000:02:00.0: version 1.1
[    7.148359] scsi host0: sata_sil24
[    7.156742] ahci 0000:00:11.0: version 3.0
[    7.157023] ahci 0000:00:11.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode
[    7.157027] ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part ccc 
[    7.162732] scsi host1: sata_sil24
[    7.162820] ata1: SATA max UDMA/100 host m128@0xfdcff000 port 0xfdcf8000 irq 16
[    7.162823] ata2: SATA max UDMA/100 host m128@0xfdcff000 port 0xfdcfa000 irq 16

[    9.372048] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[    9.375774] ata1.00: ATA-8: WDC WD7500BPKX-00HPJT0, 01.01A01, max UDMA/133
[    9.375779] ata1.00: 1465149168 sectors, multi 16: LBA48 NCQ (depth 31/32)
[    9.379773] ata1.00: configured for UDMA/100
[    9.379921] scsi 0:0:0:0: Direct-Access     ATA      WDC WD7500BPKX-0 1A01 PQ: 0 ANSI: 5
[    9.380261] sd 0:0:0:0: [sde] 1465149168 512-byte logical blocks: (750 GB/698 GiB)
[    9.380264] sd 0:0:0:0: [sde] 4096-byte physical blocks
[    9.380272] sd 0:0:0:0: Attached scsi generic sg5 type 0
[    9.380331] sd 0:0:0:0: [sde] Write Protect is off
[    9.380335] sd 0:0:0:0: [sde] Mode Sense: 00 3a 00 00
[    9.380413] sd 0:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    9.705192]  sde: sde1 sde2
[    9.705536] sd 0:0:0:0: [sde] Attached SCSI disk

02:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)

I am just posting this here for reference, I guess. I will have to check what happens in the boot sequence, but I am not sure I can find out.

This should be it for now.

Comment 1 Xen 2016-08-03 10:16:05 UTC
Oh, and the disks are detected just fine:

/dev/sde1: UUID="14DCC30ADCC2E558" TYPE="ntfs" PARTUUID="021460ee-01"
/dev/sde2: UUID="5866C79466C770F4" TYPE="ntfs" PARTUUID="021460ee-02"

sde                8:64   0 698.7G  0 disk  
├─sde1             8:65   0   450M  0 part  
└─sde2             8:66   0  58.6G  0 part  

Maybe Grub passes the wrong parameters to the initrd. The BIOS likes to shift any attached disks to the front, but Grub2 finds the devices just fine (using "ls"), and the boot failure also happens when no disks are present.
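To see what Grub actually hands to the kernel and initrd, the kernel command line can be inspected from the initramfs shell or the booted system. A minimal POSIX sh sketch, assuming the parameter of interest appears as a plain name=value word in /proc/cmdline; the helper name cmdline_param is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: print the value of kernel parameter NAME (e.g. root)
# from a command-line file, /proc/cmdline by default. If NAME appears more
# than once, the last occurrence wins. Returns 1 if NAME is absent.
cmdline_param() {
  name=$1
  file=${2:-/proc/cmdline}
  found=1
  # Split the single command-line string into words on whitespace.
  for word in $(cat "$file"); do
    case $word in
      "$name"=*)
        value=${word#*=}
        found=0
        ;;
    esac
  done
  if [ "$found" -eq 0 ]; then
    echo "$value"
  fi
  return "$found"
}
```

For example, `cmdline_param root` prints the root device Grub passed, which can then be compared with the VG/LV names the error message complains about.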

Comment 2 Xen 2016-08-18 13:21:38 UTC
I am sorry, I had to write this somewhere. Let me briefly restate the problem here:

- On this system, there is a typical LUKS container inside a partition.
- Inside the LUKS container is a typical PV containing typical LVs.
- When the Sil3132 controller is merely _present_ in the system at boot, the PV (VG) does not appear to get activated after the initrd unlocks the crypt container.

I am not entirely sure what exactly I meant when I wrote this report. Sorry for the confusion.

It appears that additional devices (such as /boot on the same VG as /) cannot be loaded. Something references the VG directly, which I assume is either the root device (/dev/vgname/root, as it were) or /boot (/dev/vgname/boot).

The error came pretty early, though, so I assume it was the root device that was not getting loaded, meaning the error would be in the initrd. All I know is that /usr/share/initramfs-tools/scripts/local-top/lvm2 makes only one call, "lvm lvchange -aay -y --sysinit <root device>", and nothing else. I also know that in the resulting initrd the ORDER file lists cryptroot _after_ lvm2. That means lvm2 is not run again after cryptroot has run, so the PV inside the unlocked container must get activated some other way (udev? I don't really know these things).
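Whether lvm2 is ever scheduled again after cryptroot can be checked directly against an unpacked copy of the initrd. A minimal sketch, assuming the ORDER file contains lines of the form /scripts/local-top/NAME "$@", possibly interleaved with other lines that should be ignored; both function names are hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: print the local-top script names in the order the
# given ORDER file runs them. Assumes lines of the form
#   /scripts/local-top/NAME "$@"
# and silently skips every other line.
order_names() {
  sed -n 's|^/scripts/local-top/\([^ ]*\) .*|\1|p' "$1"
}

# Hypothetical check: succeed only if lvm2 appears again at some point
# after cryptroot in the given ORDER file.
lvm2_after_cryptroot() {
  order_names "$1" | awk '
    $0 == "cryptroot" { seen = 1 }
    seen && $0 == "lvm2" { found = 1 }
    END { exit !found }'
}
```

Run against scripts/local-top/ORDER from the unpacked initrd: if `lvm2_after_cryptroot` fails, nothing in local-top re-runs the lvm2 activation once the container is open.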

The cryptroot script explicitly places itself last in the order by declaring all other scripts in its directory as prereqs. That's not really helpful if I want to force an activation at a later stage.
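The "run me last" trick can be sketched in the usual initramfs-tools style, where a script called with the argument prereqs prints the names of the scripts that must run before it. This is an illustration of the mechanism described above, not the actual cryptroot source; the helper all_but_self is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: list every entry in DIR except SELF. A script that
# declares all of these as prereqs forces the ordering logic to schedule
# it after everything else in that directory.
all_but_self() {
  dir=$1
  self=$2
  for req in "$dir"/*; do
    [ -e "$req" ] || continue   # glob matched nothing: empty directory
    script=${req##*/}
    if [ "$script" != "$self" ]; then
      echo "$script"
    fi
  done
}

# initramfs-tools entry point: when invoked as "SCRIPT prereqs", print the
# prerequisites and exit without doing any real work.
prereqs() {
  all_but_self "$(dirname "$0")" cryptroot
}

case ${1:-} in
  prereqs)
    prereqs
    exit 0
    ;;
esac
```

Because every other local-top script (including lvm2) is a prerequisite, nothing in that directory can be ordered after cryptroot, which is why a later activation cannot simply be added there as another script.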

All I can see is that cryptroot checks for LVM2_member using blkid and then calls vgscan and vgchange -a y --sysinit. The restraint in the lvm2 script, which deliberately does not activate everything, is absent here: cryptroot simply activates all VGs.
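The post-unlock behaviour described above can be sketched as a dry run. This assumes the opened mapping lives at /dev/mapper/NAME and that the LVM tools are invoked through the lvm binary; activate_after_unlock and the DRY_RUN switch are hypothetical names, not the real cryptroot code:

```sh
#!/bin/sh
# Print commands instead of executing them unless DRY_RUN=0, so the sketch
# can be tried outside an initramfs (DRY_RUN is a hypothetical switch).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: probe the freshly opened mapping /dev/mapper/NAME
# and, if it holds an LVM PV, rescan and activate every VG on the system.
# An optional second argument overrides the blkid probe (for testing).
activate_after_unlock() {
  dev=/dev/mapper/$1
  fstype=${2:-$(blkid -s TYPE -o value "$dev" 2>/dev/null)}
  if [ "$fstype" = LVM2_member ]; then
    run lvm vgscan
    run lvm vgchange -a y --sysinit
  fi
}
```

Note that the root device plays no part here: the same blanket activation is issued no matter which LV the system actually needs.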

Apparently something goes wrong there that doesn't go wrong when booting a system that doesn't use LUKS.

So I assume the difference lies in the ordering. On a system without LUKS, which boots, the lvm2 script handles activation of the root device; on my system, which does not boot, the cryptroot script does. The other difference is that the cryptroot script activates everything (and might fail), while the lvm2 script activates only root (and might succeed).

That's the only analysis I can make at this point. The weird thing is that opening the LUKS container from a running system causes no problems.

Then again, cryptsetup may rely not on a manual call but on dmeventd or udev instead. The question is what happens if I change the cryptroot script to use a root-only activation call such as "lvm lvchange -aay --sysinit $NEWROOT", or even skip the blkid test; I guess that is the next step in testing this.
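The proposed experiment can be sketched as a drop-in variant of the post-unlock step. This is a hypothetical sketch, not a tested patch: activate_root_after_unlock, DRY_RUN and SKIP_BLKID are made-up names, and ROOT stands for whatever root device path the script holds (the $NEWROOT mentioned above):

```sh
#!/bin/sh
# Print commands instead of executing them unless DRY_RUN=0
# (hypothetical switch so the sketch runs outside an initramfs).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical variant: activate only the root LV instead of every VG.
# DEV is the opened mapping, ROOT the root device path. SKIP_BLKID=1
# bypasses the LVM2_member probe (the second experiment).
activate_root_after_unlock() {
  dev=$1
  root=$2
  if [ "${SKIP_BLKID:-0}" != 1 ]; then
    fstype=$(blkid -s TYPE -o value "$dev" 2>/dev/null) || fstype=
    if [ "$fstype" != LVM2_member ]; then
      return 0   # not a PV: nothing to activate
    fi
  fi
  run lvm lvchange -aay --sysinit "$root"
}
```

If this variant boots while the activate-everything version does not, that would point at the blanket vgchange call rather than at the unlock itself.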

Regards.

