Bug 236595 - Guest Reboot Fails, 30 Second Shutdown Timeout
Summary: Guest Reboot Fails, 30 Second Shutdown Timeout
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Daniel Berrange
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-04-16 17:41 UTC by Devan Goodwin
Modified: 2007-11-30 22:07 UTC

Fixed In Version: RHEA-2007-0635
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-07 17:10:18 UTC
Target Upstream Version:


Attachments
Don't destroy guests on reboot timeout (deleted)
2007-07-19 19:41 UTC, Daniel Berrange


Links
System: Red Hat Product Errata
ID: RHEA-2007:0635
Priority: normal
Status: SHIPPED_LIVE
Summary: xen enhancement update
Last Updated: 2007-10-30 15:49:02 UTC

Description Devan Goodwin 2007-04-16 17:41:19 UTC
When rebooting a guest on sufficiently slow hardware, the guest shuts down but
does not come back up.

Ticket filed with Xen: http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=967

It was suggested that we create this ticket as a blocker for an RHN ticket to
make sure we don't lose track of the issue.

Details from the Xen ticket:

Encountered a problem where attempting to reboot guests with xm or virsh
resulted in the guest being destroyed and not coming back up. Found the
following xend.log entries:

[2007-04-16 11:07:15 xend.XendDomainInfo 2129] DEBUG (XendDomainInfo:940) XendDomainInfo.handleShutdownWatch
[2007-04-16 11:07:15 xend.XendDomainInfo 2129] DEBUG (XendDomainInfo:940) XendDomainInfo.handleShutdownWatch
[2007-04-16 11:07:45 xend.XendDomainInfo 2129] INFO (XendDomainInfo:930) Domain shutdown timeout expired: name=sanjose id=5
[2007-04-16 11:07:45 xend.XendDomainInfo 2129] DEBUG (XendDomainInfo:1463) XendDomainInfo.destroy: domid=5
[2007-04-16 11:07:45 xend.XendDomainInfo 2129] DEBUG (XendDomainInfo:1471) XendDomainInfo.destroyDomain(5)

The shutdown timeout expires exactly 30 seconds after the first call to
handleShutdownWatch, and watching the guest console, it appears the guest needs
just slightly more than 30 seconds to shut down on the hardware in question.

Suspect a hard-coded 30-second timeout, which is likely too short.
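
As a quick check of the 30-second figure, the gap between the first handleShutdownWatch entry and the timeout entry can be computed from the log timestamps. This is a minimal sketch: the helper name parse_ts is hypothetical, and it assumes every xend.log line starts with a bracketed "YYYY-MM-DD HH:MM:SS" timestamp, as in the excerpt above.

```python
from datetime import datetime


def parse_ts(line):
    """Return the leading timestamp of an xend.log line (hypothetical helper).

    Assumes the line starts with '[' followed by a 19-character
    'YYYY-MM-DD HH:MM:SS' timestamp, as in the excerpt above.
    """
    return datetime.strptime(line[1:20], "%Y-%m-%d %H:%M:%S")


first_watch = (
    "[2007-04-16 11:07:15 xend.XendDomainInfo 2129] DEBUG "
    "(XendDomainInfo:940) XendDomainInfo.handleShutdownWatch"
)
expired = (
    "[2007-04-16 11:07:45 xend.XendDomainInfo 2129] INFO "
    "(XendDomainInfo:930) Domain shutdown timeout expired: name=sanjose id=5"
)

# Seconds between the first shutdown watch and the timeout firing.
elapsed = (parse_ts(expired) - parse_ts(first_watch)).total_seconds()
print(elapsed)  # 30.0
```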


How reproducible:

Depends on hardware; the system in question was rlx-0-04.rhndev.redhat.com.

Comment 1 Daniel Berrange 2007-04-17 13:02:14 UTC
I searched for the 'shutdown timeout expired' message and found it in

./python/xen/xend/XendDomainInfo.py

It checks to see if the domain has been shutting down for > SHUTDOWN_TIMEOUT,
and if so kills it.

                if self.shutdownStartTime:
                    timeout = (SHUTDOWN_TIMEOUT - time.time() +
                               self.shutdownStartTime)
                    if timeout < 0:
                        log.info(
                            "Domain shutdown timeout expired: name=%s id=%s",
                            self.info['name'], self.domid)
                        self.destroy()


SHUTDOWN_TIMEOUT is set to '30' at the top of the file. I reckon we need to bump
this up to 60 seconds at least.
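
To illustrate why a guest that needs just over 30 seconds gets destroyed, here is a standalone sketch of the same arithmetic. It is not the xend code: the functions remaining and would_destroy and the limit parameter are hypothetical names introduced for illustration, and only the SHUTDOWN_TIMEOUT value and the formula come from the excerpt above.

```python
import time

SHUTDOWN_TIMEOUT = 30  # seconds, the value reported at the top of XendDomainInfo.py


def remaining(shutdown_start_time, now=None, limit=SHUTDOWN_TIMEOUT):
    """Seconds left before the check would fire (same formula as the excerpt)."""
    if now is None:
        now = time.time()
    return limit - now + shutdown_start_time


def would_destroy(shutdown_start_time, now, limit=SHUTDOWN_TIMEOUT):
    """True if the check would treat the shutdown as timed out."""
    return remaining(shutdown_start_time, now, limit) < 0


# A guest that has been shutting down for 31 seconds:
print(would_destroy(0.0, 31.0))            # True with the 30 second limit
print(would_destroy(0.0, 31.0, limit=60))  # False with a 60 second limit
```

With a 60 second limit the same guest would be left alone, which is the effect the proposed bump is aiming for.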


Comment 2 Clifford Perry 2007-04-18 12:39:32 UTC
Flagging the bug as proposed for RHEL 5.1. Seems like an easy modification.

Comment 3 RHEL Product and Program Management 2007-04-18 12:45:13 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 4 Daniel Berrange 2007-07-19 18:52:38 UTC
*** Bug 248942 has been marked as a duplicate of this bug. ***

Comment 5 Daniel Berrange 2007-07-19 19:37:42 UTC
Upstream Xen has removed the shutdown timer completely, allowing the admin to
deal with non-responsive guests as they see fit. They can run a 'destroy'
manually if desirable, or take other action.

changeset:   15179:152dc0d812b2
user:        kfraser@localhost.localdomain
date:        Wed May 30 10:06:23 2007 +0100
summary:     xend: Don't destroy domains on shutdown timeout.


Comment 6 Daniel Berrange 2007-07-19 19:41:14 UTC
Created attachment 159606
Don't destroy guests on reboot timeout

This patch is a copy of the upstream code, ported to the RHEL-5 tree.

Comment 8 Daniel Berrange 2007-08-27 22:52:57 UTC
Patch applied in:

* Mon Aug 27 2007 Daniel P. Berrange <berrange@redhat.com> - 3.0.3-37.el5
- Don't destroy guest after shutdown timeout (rhbz #236595)


Comment 11 errata-xmlrpc 2007-11-07 17:10:18 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2007-0635.html


