Bug 1511233 - NULL pointer dereference when attempting to start remnant superblock
Status: CLOSED DUPLICATE of bug 1509466
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kmod-kvdo
Version: 7.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: vdo-internal
QA Contact: Jakub Krysl
Reported: 2017-11-08 23:43 UTC by Corey Marthaler
Modified: 2017-11-09 21:14 UTC

Last Closed: 2017-11-09 21:14:28 UTC

Description Corey Marthaler 2017-11-08 23:43:52 UTC
Description of problem:
After reusing these same devices over and over, some part of an older superblock must still exist on them somewhere.

# PV2 should be long gone
[root@mckinley-04 ~]# vdo create --name PV2 --device /dev/mapper/mpathg
Creating VDO PV2
vdo: ERROR - VDO volume PV2 already exists

[root@mckinley-04 ~]# vdo list
PV1

# Let's try to start it anyway...
[root@mckinley-04 ~]# vdo start --name PV2
Starting VDO PV2



[ 5624.337587] kvdo27:dmsetup: starting device 'PV2' device instantiation 27 (ti=ffffb8964d111040) write policy sync
[ 5624.349154] kvdo27:dmsetup: underlying device, REQ_FLUSH: not supported, REQ_FUA: not supported
[ 5624.359058] kvdo27:dmsetup: zones: 1 logical, 1 physical, 1 hash; base threads: 5
[ 5624.369323] kvdo27:dmsetup: loadVolumeGeometry ID mismatch, expected 5, got 268482810: kvdo: Component id mismatch in decoder (2059)
[ 5624.388434] kvdo27:dmsetup: Could not parse geometry block; continuing assuming it's an archaic superblock
[ 5624.400846] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 5624.411116] IP: [<ffffffffc0837546>] makeUDSIndex+0xb6/0x330 [kvdo]
[ 5624.419709] PGD 0
[ 5624.421963] Oops: 0000 [#1] SMP
[ 5624.428695] Modules linked in: dm_cache_smq dm_cache dm_persistent_data dm_bio_prison dm_bufio kvdo(OE) uds(OE) sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel iTCO_wdt iTCO_vendor_support kvm irqbypass crc32_pclmul dcdbas ghash_clmulni_intel aesni_intel dm_service_time lrw ipmi_si gf128mul glue_helper ablk_helper cryptd ipmi_devintf mei_me joydev pcspkr lpc_ich ipmi_msghandler wmi mei acpi_pad shpchp acpi_power_meter dm_multipath sg nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit qla2xxx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci libahci nvme_fc(T) nvme_fabrics crct10dif_pclmul crct10dif_common drm tg3 libata nvme crc32c_intel scsi_transport_fc megaraid_sas ptp nvme_core
[ 5624.544365]  i2c_core scsi_tgt pps_core dm_mirror dm_region_hash dm_log dm_mod
[ 5624.551129] CPU: 7 PID: 25396 Comm: dmsetup Tainted: G           OE  ------------ T 3.10.0-776.el7.x86_64 #1
[ 5624.562099] Hardware name: Dell Inc. PowerEdge R820/0RN9TC, BIOS 2.0.20 01/16/2014
[ 5624.570552] task: ffff99bdba06cf10 ti: ffff99bdf5a04000 task.ti: ffff99bdf5a04000
[ 5624.578904] RIP: 0010:[<ffffffffc0837546>]  [<ffffffffc0837546>] makeUDSIndex+0xb6/0x330 [kvdo]
[ 5624.588653] RSP: 0018:ffff99bdf5a07a18  EFLAGS: 00010246
[ 5624.594581] RAX: ffff99acff038400 RBX: ffff99bdfc2ae800 RCX: 000000000000042b
[ 5624.602543] RDX: 0000000000000000 RSI: ffff99ad664983c0 RDI: ffffffffc0853052
[ 5624.610506] RBP: ffff99bdf5a07a58 R08: ffffffffc0853035 R09: ffff99ad10976d60
[ 5624.618472] R10: ffff999f7fc03a00 R11: 00000000000012a1 R12: 0000000000000000
[ 5624.626437] R13: ffff99bdfc2aebe0 R14: ffff99bdf5a07b38 R15: ffff99bdece51c18
[ 5624.634402] FS:  00007fd9afbe7840(0000) GS:ffff99bdfeac0000(0000) knlGS:0000000000000000
[ 5624.643431] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5624.649846] CR2: 0000000000000018 CR3: 0000001ffe420000 CR4: 00000000000407e0
[ 5624.657823] Call Trace:
[ 5624.660573]  [<ffffffffc0833ee9>] ? logInfo+0x69/0x90 [kvdo]
[ 5624.666913]  [<ffffffffc082f0fc>] makeDedupeIndex+0x3c/0x60 [kvdo]
[ 5624.673833]  [<ffffffffc083900b>] makeKernelLayer+0x74b/0xaf0 [kvdo]
[ 5624.680940]  [<ffffffffc082c195>] vdoInitialize+0x215/0x3c0 [kvdo]
[ 5624.687852]  [<ffffffffc082c62b>] vdoCtr+0x2eb/0x350 [kvdo]
[ 5624.694095]  [<ffffffffc008aad7>] dm_table_add_target+0x177/0x430 [dm_mod]
[ 5624.701778]  [<ffffffffc008e5e7>] table_load+0x157/0x390 [dm_mod]
[ 5624.708588]  [<ffffffffc008e490>] ? retrieve_status+0x1c0/0x1c0 [dm_mod]
[ 5624.716076]  [<ffffffffc008f240>] ctl_ioctl+0x210/0x590 [dm_mod]
[ 5624.722789]  [<ffffffffc008f5ce>] dm_ctl_ioctl+0xe/0x20 [dm_mod]
[ 5624.729502]  [<ffffffffa541ba9d>] do_vfs_ioctl+0x33d/0x540
[ 5624.735628]  [<ffffffffa54c03bf>] ? file_has_perm+0x9f/0xb0
[ 5624.741851]  [<ffffffffa541bd41>] SyS_ioctl+0xa1/0xc0
[ 5624.747503]  [<ffffffffa58d9e49>] system_call_fastpath+0x16/0x1b
[ 5624.754206] Code: 00 00 48 8b 93 c0 03 00 00 48 c7 c7 52 30 85 c0 49 c7 c0 35 30 85 c0 b9 2b 04 00 00 4c 8b 48 08 48 8b 45 d0 48 8b b0 f8 00 00 00 <48> 8b 42 18 48 2b 42 0c ba 00 10 00 00 48 c1 e0 0c 48 89 04 24
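
For reference, the oops above can be read straight from the register dump. The byte marked <48> on the Code: line starts the faulting instruction, and decoding it by hand gives a load from RDX+0x18. With RDX at 0, that matches the CR2 value of 0x18. The following is a minimal Python sketch of that decoding and is not part of the original report. The helper names are hypothetical, and it only handles the one instruction form seen here (REX.W + 8B /r with an 8-bit displacement).

```python
# Sketch only: decodes the single faulting instruction from the oops above.
# Helper names are hypothetical; this is not a general x86 disassembler.

CODE_LINE = (
    "00 00 48 8b 93 c0 03 00 00 48 c7 c7 52 30 85 c0 49 c7 c0 35 30 85 c0 "
    "b9 2b 04 00 00 4c 8b 48 08 48 8b 45 d0 48 8b b0 f8 00 00 00 "
    "<48> 8b 42 18 48 2b 42 0c ba 00 10 00 00 48 c1 e0 0c 48 89 04 24"
)

# 64-bit register names indexed by their 3-bit ModR/M encoding (no REX.R/B).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line, count=4):
    """Return `count` bytes starting at the <..> marker of an oops Code: line."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:start + count]]


def decode_mov_load(insn):
    """Decode 'REX.W 8B /r disp8' into (destination, base register, offset)."""
    rex, opcode, modrm, disp8 = insn
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W MOV r64, r/m64 instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the disp8 form without a SIB byte is handled")
    if disp8 >= 0x80:  # disp8 is a signed byte
        disp8 -= 0x100
    return REG64[reg], REG64[rm], disp8


if __name__ == "__main__":
    dest, base, offset = decode_mov_load(faulting_bytes(CODE_LINE))
    rdx = 0x0  # RDX from the register dump
    print("mov %s, [%s+0x%x]" % (dest, base, offset))
    print("fault address with RDX=0: 0x%x" % (rdx + offset))
```

The instruction just before it in the Code: line (48 8b 93 c0 03 00 00) decodes the same way as a load of RDX from RBX+0x3c0, which suggests the NULL value came from a structure field rather than from a function argument. That reading is an inference from the bytes, not something confirmed in this report.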



Version-Release number of selected component (if applicable):
3.10.0-776.el7.x86_64

lvm2-2.02.176-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
lvm2-libs-2.02.176-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
lvm2-cluster-2.02.176-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
lvm2-lockd-2.02.176-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
lvm2-python-boom-0.8-2.el7    BUILT: Fri Nov  3 07:48:54 CDT 2017
cmirror-2.02.176-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
device-mapper-1.02.145-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
device-mapper-libs-1.02.145-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
device-mapper-event-1.02.145-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
device-mapper-event-libs-1.02.145-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 10:15:46 CDT 2017
sanlock-3.5.0-1.el7    BUILT: Wed Apr 26 09:37:30 CDT 2017
sanlock-lib-3.5.0-1.el7    BUILT: Wed Apr 26 09:37:30 CDT 2017
vdo-6.1.0.34-8    BUILT: Fri Nov  3 06:58:45 CDT 2017
kmod-kvdo-6.1.0.34-7.el7    BUILT: Fri Nov  3 06:44:06 CDT 2017


How reproducible:
Twice so far

Comment 2 Bryan Gurney 2017-11-09 13:33:30 UTC
This sounds very similar to BZ 1509466; I just posted the vmcore-dmesg.txt output, covering everything from the creation of the VDO volume through the kernel oops.

One key difference is that in BZ 1509466, I had a virtual disk that was entirely zeroed:

kvdo0:dmsetup: loadVolumeGeometry ID mismatch, expected 5, got 0: kvdo: Component id mismatch in decoder (2059)

...and in this case, you see this message:

kvdo27:dmsetup: loadVolumeGeometry ID mismatch, expected 5, got 268482810: kvdo: Component id mismatch in decoder (2059)

268482810 is 0x1000b8fa, or the byte sequence "fa b8 00 10".  I can get that byte sequence if I run "parted /dev/sdc mklabel msdos":

[root@rhel75test-20171023 ~]# hexdump -C -n 16384 /dev/sdc
00000000  fa b8 00 10 8e d0 bc 00  b0 b8 00 00 8e d8 8e c0  |................|
00000010  fb be 00 7c bf 00 06 b9  00 02 f3 a4 ea 21 06 00  |...|.........!..|
00000020  00 be be 07 38 04 75 0b  83 c6 10 81 fe fe 07 75  |....8.u........u|
00000030  f3 eb 16 b4 02 b0 01 bb  00 7c b2 80 8a 74 01 8b  |.........|...t..|
00000040  4c 02 cd 13 ea 00 7c 00  00 eb fe 00 00 00 00 00  |L.....|.........|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00004000
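
The arithmetic above can be checked with a short Python sketch. It assumes the component ID is read as a 32-bit little-endian integer, which is what the match between the logged value and the hexdump suggests; the function names are hypothetical.

```python
import struct

OBSERVED_ID = 268482810  # "got" value from the loadVolumeGeometry message


def id_to_disk_bytes(value):
    """Pack an ID as it would sit on disk, assuming 32-bit little-endian."""
    return struct.pack("<I", value)


def format_hex(raw):
    """Render bytes the way hexdump -C shows them."""
    return " ".join("%02x" % b for b in raw)


if __name__ == "__main__":
    print(hex(OBSERVED_ID))                           # 0x1000b8fa
    print(format_hex(id_to_disk_bytes(OBSERVED_ID)))  # fa b8 00 10
    print(format_hex(id_to_disk_bytes(0)))            # 00 00 00 00 (zeroed disk)
```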


If you run this sequence of commands, you can reproduce this bug:

Starting from a zeroed block device (in this example, /dev/sdc):

1. # vdo create --name=vdo2 --device=/dev/sdc --verbose
2. # vdo stop --name=vdo2
3. # parted /dev/sdc mklabel msdos
4. # vdo start --name=vdo2

Keep in mind that in step 3, parted has overwritten (and corrupted) the first 512 bytes of the geometry block of VDO volume "vdo2".  Note that parted doesn't output a warning, because it doesn't recognize the VDO geometry block as an existing disk label.

If you run "parted /dev/sdc mklabel msdos" after having already written an msdos label, you see this warning:

[root@rhel75test-20171023 ~]# parted /dev/sdc mklabel msdos
Warning: The existing disk label on /dev/sdc will be destroyed and all data on this disk will be lost. Do you want to continue?
Yes/No? n
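
Since parted gives no warning in the reproduction steps, one way to spot the overwrite before running "vdo start" is to look at the leading bytes of the device. The sketch below is a hypothetical helper, not part of the vdo tooling. It assumes the ID checked by loadVolumeGeometry is the first four bytes of the device, little-endian, as the hexdump in this comment suggests, and it exercises the check against a scratch file rather than a real device.

```python
import struct
import tempfile

EXPECTED_ID = 5  # "expected 5" in the loadVolumeGeometry message
MSDOS_BOOT_PREFIX = bytes.fromhex("fab80010")  # first bytes in the hexdump


def read_leading_id(path):
    """Read the first 4 bytes of a device or image as a little-endian uint32."""
    with open(path, "rb") as dev:
        head = dev.read(4)
    if len(head) < 4:
        raise ValueError("device or image is shorter than 4 bytes")
    return struct.unpack("<I", head)[0]


def describe(path):
    """Classify the leading bytes into the cases discussed in this bug."""
    value = read_leading_id(path)
    if value == EXPECTED_ID:
        return "leading ID matches the expected value"
    if value == 0:
        return "leading bytes are zero (the BZ 1509466 case)"
    if struct.pack("<I", value) == MSDOS_BOOT_PREFIX:
        return "leading bytes match the msdos label boot code"
    return "unexpected leading ID %d" % value


if __name__ == "__main__":
    # Scratch file standing in for a device whose first sector was overwritten.
    with tempfile.NamedTemporaryFile() as image:
        image.write(MSDOS_BOOT_PREFIX + bytes(508))
        image.flush()
        print(describe(image.name))
```

A matching ID only shows that the first four bytes are intact; it does not prove that the rest of the geometry block is valid.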

Comment 3 sclafani 2017-11-09 21:11:32 UTC
This is indeed a duplicate of 1509466.

Comment 4 sclafani 2017-11-09 21:14:28 UTC

*** This bug has been marked as a duplicate of bug 1509466 ***

