Bug 1692230 - Snapshot creation seemed to succeed, but metadata not created:
Summary: Snapshot creation seemed to succeed, but metadata not created:
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.2.8-2
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.3.4
: ---
Assignee: Nir Soffer
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-03-25 05:43 UTC by Marcus West
Modified: 2019-04-14 13:36 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
oVirt Team: Storage
Target Upstream Version:



Description Marcus West 2019-03-25 05:43:22 UTC
## Description of problem:

Every evening, the customer snapshots all VMs as part of the Commvault backup process.  We noticed a one-off failure to delete a snapshot.  On closer inspection, the snapshot creation was initially reported as successful, but we saw the following error in the logs:

  2019-03-22 01:19:31,835+13 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [...] Failed building DiskImage: candidate can not be null please use static method createGuidFromString
  2019-03-22 01:19:31,836+13 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [...] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand' return value '
  VolumeInfoReturn:{status='Status [code=0, message=Done]'}
  status = INVALID
  truesize = 0
  apparentsize = 0
  children:
  []
  '
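
To check whether other snapshots hit the same error, the engine log can be scanned for this message. The following is a minimal sketch, not an oVirt tool: the regular expression is derived only from the two log lines quoted above, and the function name `find_failed_disk_image_builds` is hypothetical.

```python
import re

# Line layout inferred from the excerpt above: timestamp, level, [logger], rest.
# This is an assumption; other engine log lines may not follow it.
LOG_LINE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"(?P<rest>.*)$"
)


def find_failed_disk_image_builds(lines):
    """Return (timestamp, message) pairs for GetVolumeInfoVDSCommand errors
    that contain 'Failed building DiskImage'."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            # Continuation lines (e.g. 'status = INVALID') carry no timestamp.
            continue
        if (
            match.group("level") == "ERROR"
            and match.group("logger").endswith("GetVolumeInfoVDSCommand")
            and "Failed building DiskImage" in match.group("rest")
        ):
            hits.append((match.group("ts"), match.group("rest")))
    return hits
```

Passing an open log file (for example `find_failed_disk_image_builds(open(path))`, with `path` pointing at the engine log) yields the timestamps to correlate with the snapshot jobs.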

The snapshot delete was failing as the metadata slot was empty:

  NONE=####################################################################....
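
A slot like the one above can be recognised programmatically. This is a rough sketch under stated assumptions: the slot is text made of `KEY=VALUE` lines, an `EOF` terminator line may be present, and an empty slot contains only a `NONE` key padded with `#`. The helper names are hypothetical and are not vdsm functions.

```python
def parse_metadata_slot(raw):
    """Parse KEY=VALUE lines from a metadata slot into a dict.

    Assumes a plain-text slot; an 'EOF' terminator line, if present, is skipped.
    """
    fields = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line == "EOF" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields[key.strip()] = value.strip()
    return fields


def is_empty_slot(raw):
    """True if the slot holds only a NONE key whose value is all '#' padding."""
    fields = parse_metadata_slot(raw)
    if set(fields) != {"NONE"}:
        return False
    value = fields["NONE"]
    return value != "" and set(value) == {"#"}
```

A check of this kind right after volume creation would flag the volume before any snapshot is taken on top of it.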

## Version-Release number of selected component (if applicable):

ovirt-engine-4.2.8.2-0.1.el7ev.noarch  Tue Feb 12 09:58:45 2019
vdsm-4.20.46-1.el7ev.x86_64            Thu Jan 17 07:00:24 2019 (rhvh--4.2.8.0--0.20190116)


## How reproducible:

This environment takes multiple snapshots through the evening (Commvault), but we have noticed only one failure so far.

## Steps to Reproduce:
1. take a snapshot to do a backup (Commvault)
2. delete snapshot after backup is done
3.

## Actual results:

Snapshot 'succeeds' but metadata information is not there

## Expected results:

If snapshot volume create failed, snapshot should be rolled back

## Additional info:

Why did snapshot create go ahead even though the volume create failed?

What happened to the metadata?  Even if this is a one-off storage issue, engine should be robust enough to roll back the snapshot from a failed VolumeCreate.

Comment 2 Germano Veit Michel 2019-03-25 05:57:51 UTC
I was checking this with Marcus, we seem to have 2 problems here:

1) The metadata for the volume was empty ~20 seconds after volume creation (GetVolumeInfo at 01:19:31 shows empty metadata, volume just created at 01:19:07). So most likely this metadata was never written.

2) Engine went ahead with SnapshotVDSCommand even after seeing the empty metadata for the created volume. I don't think SnapshotVDSCommand should have been sent, because GetVolumeInfoVDSCommand showed a bad volume.

Regarding 1, there seems to be something wrong with the LVM metadata of the SD; we see some random and weird failures in the logs. It's a big SD: 50T and 1300 LVs, with a lot of fragmentation.
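
The behaviour asked for in point 2 could be sketched roughly as follows. This is an illustration only and not the engine's code (ovirt-engine is written in Java): `VolumeInfo`, `ensure_volume_usable`, `create_snapshot` and the three callbacks are hypothetical names, and the only status value taken from this report is `INVALID`.

```python
from dataclasses import dataclass


@dataclass
class VolumeInfo:
    """Subset of the fields shown in the GetVolumeInfo return value above."""
    status: str
    truesize: int
    apparentsize: int


class InvalidVolumeError(Exception):
    """Raised when the new volume cannot be used for a snapshot."""


def ensure_volume_usable(info):
    # 'INVALID' is the status seen in the log; other values are not assumed here.
    if info.status == "INVALID":
        raise InvalidVolumeError("volume info reports status INVALID")


def create_snapshot(get_volume_info, send_snapshot, rollback_volume):
    """Send the snapshot command only if the new volume looks usable.

    All three arguments are hypothetical callables standing in for the
    GetVolumeInfo call, the snapshot call, and the cleanup of the new volume.
    """
    info = get_volume_info()
    try:
        ensure_volume_usable(info)
    except InvalidVolumeError:
        rollback_volume()  # undo the volume creation instead of going ahead
        return False
    send_snapshot()
    return True
```

With a guard like this, the failure would surface at snapshot creation time rather than later, when the snapshot delete fails.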

