Bug 594476 - status check program for vm.sh & user-controlled error tolerance
Summary: status check program for vm.sh & user-controlled error tolerance
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: rgmanager
Version: 5.4
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: rc
: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On: 583788
Blocks:
 
Reported: 2010-05-20 19:27 UTC by Benjamin Kahn
Modified: 2016-04-26 15:24 UTC

Fixed In Version: rgmanager-2.0.52-6.el5_5.8
Doc Type: Enhancement
Doc Text:
Previously, vm.sh only checked the status of the VM itself, not the status of any services inside. With this update, administrators may now use a newly provided status check program which checks the availability of services within virtual machines running Red Hat Enterprise Virtualization Manager. Timeouts for starting and stopping virtual machines are now configurable in cluster.conf. The start timeout is based on the status check program.
Clone Of:
Environment:
Last Closed: 2010-08-25 06:33:37 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2010:0647 normal SHIPPED_LIVE rgmanager bug fix and enhancement update 2010-08-25 06:33:29 UTC

Description Benjamin Kahn 2010-05-20 19:27:56 UTC
This bug has been copied from bug #583788 and has been proposed
to be backported to 5.5 z-stream (EUS).

Comment 11 yeylon@redhat.com 2010-06-21 15:43:58 UTC
We need some further improvements to the rhevm-check validation.

1. In the current state we have only one timeout interval: rhev-check runs every X minutes.
Because a VM restart takes ~5 minutes, this is the minimum interval at which the test can run. That is unacceptable for the RHEVM node's downtime (a 5-minute interval plus a 5-minute boot time will cause 10 minutes of downtime).

We need a way to reduce this timeout to a more reasonable value.
One way to do this is to add two different types of interval:
a. interval = X - for regular testing
b. after_failure_interval = Y - time to wait after the VM is restarted before the initial test
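
To make the proposal concrete, here is a minimal sketch of how the two intervals could interact. It is only an illustration of the idea in this comment, not rgmanager code: the function names are hypothetical, and the 60-second value for the regular interval is an assumed example.

```python
def next_check_delay(restarted_recently: bool,
                     interval: float,
                     after_failure_interval: float) -> float:
    """Return how many seconds to wait before the next status check.

    interval               -- hypothetical regular check period (X)
    after_failure_interval -- hypothetical grace period after a restart (Y)
    """
    return after_failure_interval if restarted_recently else interval


def worst_case_downtime(check_interval: float, boot_time: float) -> float:
    """A failure right after a check goes unnoticed for one full interval,
    and the VM then needs boot_time seconds to come back."""
    return check_interval + boot_time


# Single interval, as today: the interval cannot be shorter than the boot time.
print(worst_case_downtime(check_interval=300, boot_time=300))
# Separate intervals: regular checks every 60 s (assumed value), 300 s boot.
print(worst_case_downtime(check_interval=60, boot_time=300))
```

With a single 5-minute interval the worst case is the 10 minutes described above; with a shorter regular interval only the boot time remains as the dominant cost.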

2. In the current state, after a single failure of rhev-check.sh the RHEVM node is rebooted, which is not the best way to go. We need to take into account scenarios in which the VM did not respond due to load or for other reasons.

We need a way to test a few times before determining that the RHEVM VM is dead. For example, if rhev-check.sh returns an error message once, keep retrying X times at Y intervals, and if all attempts have failed, migrate the VM.

__max_failures="5" __failure_expire_time="60"
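
The retry behaviour requested here can be sketched as a failure counter with an expiry window. This is one possible reading of the two attributes above, not a description of how rgmanager implements them: the class name, the sliding-window logic, and the choice to treat the VM as dead once the count reaches max_failures are all assumptions.

```python
from collections import deque


class FailureTracker:
    """Hypothetical tracker: tolerate isolated check failures and report the
    VM as dead only when max_failures occur within failure_expire_time."""

    def __init__(self, max_failures: int, failure_expire_time: float):
        self.max_failures = max_failures
        self.failure_expire_time = failure_expire_time
        self._failures = deque()  # timestamps of recent failed checks

    def record_failure(self, now: float) -> bool:
        """Record a failed check at time `now` (seconds).

        Returns True when the VM should be treated as dead."""
        self._failures.append(now)
        # Forget failures older than the expiry window.
        while now - self._failures[0] > self.failure_expire_time:
            self._failures.popleft()
        return len(self._failures) >= self.max_failures


tracker = FailureTracker(max_failures=5, failure_expire_time=60)
# Five failures 10 s apart: only the fifth one trips the limit.
print([tracker.record_failure(t) for t in (0, 10, 20, 30, 40)])
```

Failures that are spread out (for example one every 30 seconds) never accumulate to five inside the 60-second window, so a VM that is merely slow under load is not restarted.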

3. In the current state, the VM shutdown is executed using virsh shutdown, and after 15 seconds the KVM process is killed, so the VM does not have time to shut down properly, which can (and will) lead to corruption (I had one). We need to increase the timeout between the shutdown of the VM and the killing of the process (100~120 seconds should be fine).
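
The shutdown sequence in point 3 follows a common pattern: request a clean shutdown, poll until a deadline, and kill only if the deadline passes. The sketch below shows that pattern under stated assumptions: the three callables are hypothetical stand-ins for whatever vm.sh actually does (for example a shutdown request and a check on the KVM process), and the 120-second default is the value suggested in this comment.

```python
import time
from typing import Callable


def stop_vm(request_shutdown: Callable[[], None],
            is_running: Callable[[], bool],
            force_kill: Callable[[], None],
            timeout: float = 120.0,
            poll_interval: float = 1.0,
            clock: Callable[[], float] = time.monotonic,
            sleep: Callable[[float], None] = time.sleep) -> str:
    """Ask the VM to shut down and kill it only if it is still running
    after `timeout` seconds. Returns "clean" or "killed"."""
    request_shutdown()
    deadline = clock() + timeout
    while clock() < deadline:
        if not is_running():
            return "clean"
        sleep(poll_interval)
    # Final check at the deadline before resorting to a forced kill.
    if not is_running():
        return "clean"
    force_kill()
    return "killed"
```

Passing the clock and sleep functions in as parameters is a design choice for the sketch: it lets the timeout logic be exercised with a fake clock instead of waiting two real minutes.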

Comment 15 yeylon@redhat.com 2010-06-29 11:20:02 UTC
It looks like rhev-check.sh does not work as expected at this stage: the 5-minute timeout for starting a VM never ends.

1. Migrate the VM service.
2. As soon as the VM has been relocated, kill the KVM process on the server.
3. See that rhev-check keeps getting errors but does not try to migrate the service again after 5 minutes as expected, only after half an hour.

This will require a respin.

Comment 20 errata-xmlrpc 2010-08-25 06:33:37 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0647.html

Comment 21 Florian Nadge 2010-10-18 17:33:09 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
previously, vm.sh only checked the status of the VM itself, not the status of any services inside. With this update, administrators may now use a newly provided status check program which checks the availability of services within virtual machines running Red Hat Enterprise Virtualization Manager. Timeouts for starting and stopping virtual machines are now configurable in cluster.conf. The start timeout is based on the status check program.

Comment 22 Florian Nadge 2010-10-18 17:33:21 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-previously, vm.sh only checked the status of the VM itself, not the status of any services inside. With this update, administrators may now use a newly provided status check program which checks the availability of services within virtual machines running Red Hat Enterprise Virtualization Manager. Timeouts for starting and stopping virtual machines are now configurable in cluster.conf. The start timeout is based on the status check program.
+Previously, vm.sh only checked the status of the VM itself, not the status of any services inside. With this update, administrators may now use a newly provided status check program which checks the availability of services within virtual machines running Red Hat Enterprise Virtualization Manager. Timeouts for starting and stopping virtual machines are now configurable in cluster.conf. The start timeout is based on the status check program.

