Bug 1512631 - failing vdo status commands should mention vdoconf.yml as a possible solution
Summary: failing vdo status commands should mention vdoconf.yml as a possible solution
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kmod-kvdo
Version: 7.5
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Corey Marthaler
QA Contact: vdo-qe
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-11-13 17:42 UTC by Corey Marthaler
Modified: 2018-02-28 12:07 UTC (History)

Fixed In Version: 6.1.0.85
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-01-03 20:57:49 UTC


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2018:0900 normal SHIPPED_LIVE new packages: kmod-kvdo 2018-04-10 13:54:47 UTC

Description Corey Marthaler 2017-11-13 17:42:24 UTC
Description of problem:
I had to clean up from bug 1512624 by removing the underlying storage (vdo failed to allow me to remove the old device) and starting again (even remembering to zero out the new storage to avoid bug 1510558).

# New device
[root@mckinley-04 ~]# lvs -a -o +devices
  LV   VG               Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices               
  LV   snapper_thinp    -wi-a-----  10.00g                                                     /dev/mapper/mpatha1(0)

[root@mckinley-04 ~]# dd if=/dev/zero of=/dev/snapper_thinp/LV bs=4096 count=1000000
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB) copied, 15.1227 s, 271 MB/s

[root@mckinley-04 ~]# vdo create --name glarch --vdoLogicalSize 20G --device /dev/snapper_thinp/LV
Creating VDO glarch
Starting VDO glarch
Starting compression on VDO glarch
VDO instance 1 volume is ready at /dev/mapper/glarch

# /dev/snapper_thinp/origin was the old device, long gone now.
[root@mckinley-04 ~]# vdo status 
vdo: ERROR - vdodumpconfig: Failed to make FileLayer from '/dev/snapper_thinp/origin' with No such file or directory

Nov 13 11:30:22 mckinley-04 vdo: ERROR - vdodumpconfig: Failed to make FileLayer from '/dev/snapper_thinp/origin' with No such file or directory

[root@mckinley-04 ~]# vdo list
glarch

# I'll reboot and try again...


Version-Release number of selected component (if applicable):
vdo-6.1.0.46-9    BUILT: Fri Nov 10 15:47:57 CST 2017
kmod-kvdo-6.1.0.46-8.el7    BUILT: Fri Nov 10 16:03:57 CST 2017

Comment 2 Bryan Gurney 2017-11-13 17:58:43 UTC
There may have been a version mismatch here, due to BZ 1510176 causing the old module to not be unloaded after the "yum remove" and "yum install" phase.  Also, BZ 1511096 covers the module not displaying its version in modinfo, which would have otherwise identified the old module.

Let me know what the remove / create sequence looks like after the reboot.

Comment 3 Bryan Gurney 2017-11-13 20:47:06 UTC
Corey let me know that the issue survives after the reboot; however, the manual removal was incomplete, because there was still an entry in the /etc/vdoconf.yml file.  Since it was the only entry, he removed the /etc/vdoconf.yml file, and the "Failed to make FileLayer" error message for the nonexistent device no longer appeared.

Comment 4 Bryan Gurney 2017-11-13 20:58:16 UTC
The remaining question: is there a message that the "vdo" command could emit based on the vdodumpconfig "Failed to make FileLayer... No such file or directory" message?  It could convey that a configuration entry may exist for a VDO volume stored on a device that no longer exists.

Comment 5 bjohnsto 2017-11-14 18:52:13 UTC
I'm not sure we can do anything special here. vdodumpconfig really has no knowledge of vdo manager or its config file, nor should it.

Comment 6 Louis Imershein 2017-11-14 20:01:20 UTC
Could the message describe a probable cause for the error condition?  Would that be useful or misleading?

Comment 7 bjohnsto 2017-11-17 19:33:58 UTC
Rereading the question here. Let me rephrase my answer.

Is there something we could do? Sure. I'm just not sure it's the best idea. We could parse vdodumpconfig output to stderr for specific error messages and then try to relog them as more vdo-specific things. But we would have to be very sure about what mappings to create. Also, we don't really do this sort of thing now with other tools we use, like vdoformat for instance. We just let the tool display what error it gets.

This feels like it should be a PM or CEE decision.
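
A minimal Python sketch of the remapping idea discussed in this comment. The pattern table, the hint wording, and the name `annotate_error` are hypothetical and not part of the vdo manager; only the "Failed to make FileLayer" message text is taken from this bug. Unmatched messages pass through unchanged, which is why each mapping would need to be chosen carefully.

```python
import re

# Hypothetical table of (pattern, hint) pairs. The hint wording is
# illustrative only; vdo does not produce it.
HINTS = [
    (
        re.compile(
            r"Failed to make FileLayer from '(?P<dev>[^']+)' "
            r"with No such file or directory"
        ),
        "device {dev} does not exist; a configuration entry may refer "
        "to a device that was removed",
    ),
]


def annotate_error(stderr_line):
    """Return the line unchanged, or with a hint appended when a pattern matches."""
    for pattern, hint in HINTS:
        match = pattern.search(stderr_line)
        if match:
            # Fill the captured device path into the hint text.
            return "{} (hint: {})".format(
                stderr_line, hint.format(**match.groupdict())
            )
    return stderr_line
```

The original tool message is always kept intact, so nothing the underlying tool reported is lost if the hint turns out to be wrong.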

Comment 8 Louis Imershein 2017-11-29 19:06:50 UTC
At a minimum, we should make sure that our generic messages provide information about common potential causes of failures.

If we can give the customer more direction with a couple of days of engineering effort, I think we should.  If it's more than that, we should think about putting more planning into doing it for a future release.

Comment 9 Corey Marthaler 2017-11-29 21:31:27 UTC
Here's another 'vdo status' failure after a successful creation (but with a left over entry from a failed prior vdo creation) that again survives reboots.

[root@host-116 ~]# vdostats --human-readable
Device                    Size      Used Available Use% Space saving%
/dev/mapper/origin       20.0G      4.0G     16.0G  20%           94%

[root@host-116 ~]# vdo status
vdo: ERROR - VDO volume PV previous operation (create) is incomplete

Nov 29 15:10:28 host-116 vdo: ERROR - VDO volume PV previous operation (create) is incomplete

After removing the invalid entry (caused by a prior failed create), vdo status worked again.

If vdo status is failing, we need to educate users (I'd argue in the failure message itself) about the /etc/vdoconf.yml file, if manually editing or cleaning it is going to be the only way to get the status command working again.

Comment 10 bjohnsto 2017-12-05 00:17:30 UTC
(In reply to Corey Marthaler from comment #9)

If you have an entry in the config from a failed previous create, you shouldn't need to manually edit the config file (I would never suggest doing this ever). You should be able to run vdo remove with the --force method to clear it from the config file.

Comment 11 bjohnsto 2017-12-05 00:18:44 UTC
(In reply to bjohnsto from comment #10)

meant the --force option, not method.

Comment 13 Jakub Krysl 2017-12-15 09:44:49 UTC
I am not able to hit the error using vdo status. I reproduced it with vdo start by stopping the vdo, removing the lv under it and starting it again. At this point there is only the /etc/vdoconf.yml entry, which makes vdo think this particular volume still exists.
# vdo start --name vdo
Starting VDO vdo
vdo: ERROR - Could not set up device mapper for vdo
vdo: ERROR - vdodumpconfig: Failed to make FileLayer from '/dev/mapper/vg-lv' with No such file or directory

Using 'vdo remove --name vdo --force' resolves this. But the vdodumpconfig error has not been changed, as Louis suggested, to direct the customer to this solution.
Would it be possible to check whether the underlying device still exists when this error is triggered and, if not, suggest the --force option?
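
The check asked about here might look like the following Python sketch. The function name and the hint wording are hypothetical, not existing vdo behavior; the `vdo remove --name ... --force` command is the one shown in this comment. The sketch only tests whether the device path exists, so it cannot tell an intentional removal from an accidental one.

```python
import os


def missing_device_hint(name, device, exists=os.path.exists):
    """Return a hint string if the backing device path is absent, else None.

    `exists` is injectable so the check can be exercised without real devices.
    """
    if exists(device):
        return None
    return (
        "backing device {dev} for VDO volume {name} was not found; "
        "if the device was removed intentionally, "
        "'vdo remove --name {name} --force' clears the stale entry"
    ).format(dev=device, name=name)
```

For the volume in this comment, the call would be `missing_device_hint("vdo", "/dev/mapper/vg-lv")`.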

Comment 14 Joe Shimkus 2017-12-21 19:58:28 UTC
Stepping outside the boundaries of defined management practices, one can create scenarios which are (barring bugs) not possible within those boundaries. For any such scenario we can know what "correct" (meaning "what we want") response should occur.  This, though, is only because we know the totality of the specifically crafted scenario and the desired outcome.

This is not to say that such scenarios are impossible in the "real world."  Given human fallibility it is well within the realm of possibilities that an error (whether of oversight or deliberate action) can arise.  These real world occurrences do not provide the complete view of the constructed scenarios.  As a consequence, determinism as to the correct response is impossible to achieve.

Consider the situation described in Jakub's comment of 2017-12-15.  We know what the "correct" response is because the scenario was crafted to evoke that response.  In the case of a user erroneously removing the logical volume, that same response is incorrect.  The user, hopefully being able to non-destructively reconstruct the logical volume's description, will want the vdo instance to remain.

As far as is possible we should provide correct, precise information and advice to the user.  Unfortunately, not all possible scenarios can be handled this way; some require human intervention.

Comment 15 Joe Shimkus 2018-01-02 20:59:32 UTC
Corey,
I'm assigning the bug to you because it's marked ON_QA and you reported it.  If it should be assigned to someone else I would appreciate it if you would do so.

Thanks.

Comment 16 Corey Marthaler 2018-01-03 15:42:36 UTC
I'm moving this back to assigned for now as the move to modified appears to have been invalid w/o an actual fix for this issue. Please correct me if I'm wrong.

I think the best bet here is to have devel close this bug as either WONTFIX or NOTABUG.

Originally it was thought that editing the vdoconf file was the only way to remedy this situation, but then it was learned that a 'vdo remove --force' appears to work for these types of issues as well. If we come across a scenario in the future where the force doesn't work then we can reopen this bug.

Comment 17 Joe Shimkus 2018-01-03 20:57:49 UTC
As agreed yesterday (2018-01-02) in #vdo we're marking this as NOTABUG.
In any specific test scenario one should attempt 'vdo remove --force' for the particular vdo and if that fails open a new bug (reopen this one only if changing the description).
We are not specifically including a recommendation to use 'vdo remove --force' in the face of these scenarios, as differentiating between a test scenario and a failure/mistake in the field is not possible.
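
For a test scenario only, the first step of the procedure in this comment could be scripted roughly as below. This is a sketch: the helper name `force_remove` is hypothetical, and it assumes that `vdo` exits with a non-zero status when the removal fails. The command line itself is the one quoted earlier in this bug.

```python
import subprocess


def force_remove(name, run=subprocess.run):
    """Run 'vdo remove --name NAME --force' and report whether it succeeded.

    `run` is injectable so the helper can be tested without vdo installed.
    Assumption: a non-zero exit status means the removal failed.
    """
    cmd = ["vdo", "remove", "--name", name, "--force"]
    result = run(cmd)
    return result.returncode == 0
```

A `False` result corresponds to the case where a new bug should be opened.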

