Bug 1685128 - VirtualDomain considers qemu 'paused' virtual as running
Summary: VirtualDomain considers qemu 'paused' virtual as running
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: resource-agents
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 8.1
Assignee: Oyvind Albrigtsen
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1682136
Blocks:
Reported: 2019-03-04 12:22 UTC by michal novacek
Modified: 2019-03-15 17:42 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:



Description michal novacek 2019-03-04 12:22:39 UTC
Description of problem:
The VirtualDomain resource agent treats 'paused' virtual machines as running.
This happened when the host systems ran out of space on /var/lib/libvirt and
the virtual machines were paused. The result was unresponsive virtual machines
that the resource agent did not notice.

Version-Release number of selected component (if applicable):
resource-agents-3.9.5-124.el7.x86_64
libvirt-3.9.0-14.el7_5.8.x86_64

How reproducible: always

Steps to Reproduce:
1. Have pacemaker cluster with a running VirtualDomain resource.
2. Pause the virtual machine using "virsh suspend <your vm>".

Actual results: monitor action happily returning 0

Expected results: monitor failing and pacemaker taking action

Additional info:

I know that I could use 'monitor_scripts' to handle this situation, but I
consider that a workaround. I believe the VirtualDomain resource agent should
handle this by default: it should check that virsh reports the virtual machine
as "running". 'monitor_scripts' should be reserved for situations where virsh
reports the machine as "running" but it is not working properly (stuck on boot,
for example).
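
To make the requested behaviour concrete, here is a minimal sketch of how a monitor action could map the state string printed by "virsh domstate" to an OCF return code. This is not the agent's actual code: the helper name domstate_to_ocf_rc is hypothetical, and treating "paused" as a generic failure is an assumption about the desired policy (a real agent would probably also need to tolerate the brief pause that occurs during live migration).

```shell
#!/bin/sh
# OCF return codes used by resource agent monitor actions.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical helper (not part of VirtualDomain): translate a
# `virsh domstate` string into an OCF monitor return code.
# Assumption: anything other than "running" should make monitor fail.
domstate_to_ocf_rc() {
    case "$1" in
        running)
            echo "$OCF_SUCCESS" ;;
        "shut off"|crashed)
            # Domain is cleanly not running from the cluster's point of view.
            echo "$OCF_NOT_RUNNING" ;;
        paused|pmsuspended)
            # Domain exists but is not executing: report a failure so the
            # cluster can recover it (assumed policy).
            echo "$OCF_ERR_GENERIC" ;;
        *)
            # Unknown or unexpected state: fail rather than report success.
            echo "$OCF_ERR_GENERIC" ;;
    esac
}
```

On a host with libvirt it could be called as domstate_to_ocf_rc "$(virsh domstate gitlab-runner-1)"; for the paused domain shown in the transcript below this would print 1 instead of 0.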

----

$ pcs resource
...
 gitlab-runner-1	(ocf::heartbeat:VirtualDomain):	Started zapp-02

$ pcs resource show gitlab-runner-1
 Resource: gitlab-runner-1 (class=ocf provider=heartbeat type=VirtualDomain)
  Attributes: config=/var/lib/libvirt/gitlab-runner-1.xml force_stop=yes migration_transport=ssh
  Meta Attrs: allow-migrate=true
  Utilization: cpu=1 hv_memory=1000
  Operations: migrate_from interval=0 timeout=120s (gitlab-runner-1-migrate_from-interval-0)
              migrate_to interval=0 timeout=120s (gitlab-runner-1-migrate_to-interval-0)
              monitor interval=10 timeout=30 (gitlab-runner-1-monitor-interval-10)
              start interval=0s timeout=90 (gitlab-runner-1-start-interval-0s)
              stop interval=0s timeout=90 (gitlab-runner-1-stop-interval-0s)

$ pcs resource debug-monitor gitlab-runner-1
Operation monitor for gitlab-runner-1 (ocf:heartbeat:VirtualDomain) returned: 'ok' (0)

$ virsh list
 Id    Name                           State
----------------------------------------------------
 10    gitlab-runner-1                running
 11    gitlab-runner-2                running

$ ssh gitlab-runner-1 uptime
 13:09:43 up  3:08,  0 users,  load average: 0.00, 0.01, 0.05

$ virsh suspend gitlab-runner-1
Domain gitlab-runner-1 suspended

$ virsh list
 Id    Name                           State
----------------------------------------------------
 10    gitlab-runner-1                paused
 11    gitlab-runner-2                running

$ ssh gitlab-runner-1 uptime
ssh: connect to host gitlab-runner-1 port 22: No route to host

$ sleep 60 && pcs resource debug-monitor gitlab-runner-1
Operation monitor for gitlab-runner-1 (ocf:heartbeat:VirtualDomain) returned: 'ok' (0)
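
Until the agent handles this, paused domains can at least be spotted from the "virsh list" table itself. The following is a rough sketch, not a supported tool: the function name list_not_running is hypothetical, it assumes the two-line header layout shown in the transcript above, and it assumes domain names contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: read `virsh list` output on stdin and print
# "name state" for every domain whose state is not "running".
# Assumes two header lines (column titles and a dashed rule) and
# domain names without spaces.
list_not_running() {
    awk 'NR > 2 && NF >= 3 {
        # The state may contain spaces (e.g. "shut off"), so join
        # every field from the third one onwards.
        state = $3
        for (i = 4; i <= NF; i++) state = state " " $i
        if (state != "running") print $2, state
    }'
}
```

For example, "virsh list --all | list_not_running" fed with the second table above would print "gitlab-runner-1 paused".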

