Bug 155304 - gnbd_monitor doesn't correctly reset after an uncached gnbd has failed and been restored
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: gnbd
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Robert Peterson
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-04-18 21:21 UTC by Ben Marzinski
Modified: 2009-04-16 20:28 UTC (History)

Fixed In Version: RHBA-2006-0170
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-01-06 20:28:59 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2006:0170 normal SHIPPED_LIVE gnbd bug fix update 2006-01-06 05:00:00 UTC

Description Ben Marzinski 2005-04-18 21:21:39 UTC
When an uncached gnbd fails, gnbd_monitor fences the server if it is
nonresponsive. Then it waits for all current users of the device to close it.
Finally, it tries to contact the server at regular intervals. If the server
comes back up and reexports the device, gnbd_monitor is supposed to reimport it
and start the monitoring all over again.
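
The intended recovery sequence can be pictured as a small state machine. This is only a rough sketch: the state names, the monitor_events struct, and next_state() are hypothetical and are not taken from the gnbd source, and the sketch simplifies by always passing through a fencing step once the device has failed.

```c
#include <stdbool.h>

/* Hypothetical states for the recovery sequence; names are illustrative
 * and do not come from the gnbd source. */
enum monitor_state {
    MONITORING,      /* device healthy, normal monitoring */
    FENCING,         /* device failed: fence the nonresponsive server */
    WAIT_FOR_CLOSE,  /* wait for current users to close the device */
    POLL_SERVER      /* contact the server at regular intervals */
};

/* Hypothetical conditions observed on one pass of the monitor loop. */
struct monitor_events {
    bool device_failed;
    bool fence_done;
    bool all_users_closed;
    bool reimport_ok;  /* server is back and the device was reimported */
};

/* Return the next state; a successful reimport resets to MONITORING. */
enum monitor_state next_state(enum monitor_state s,
                              const struct monitor_events *ev)
{
    switch (s) {
    case MONITORING:
        return ev->device_failed ? FENCING : MONITORING;
    case FENCING:
        return ev->fence_done ? WAIT_FOR_CLOSE : FENCING;
    case WAIT_FOR_CLOSE:
        return ev->all_users_closed ? POLL_SERVER : WAIT_FOR_CLOSE;
    case POLL_SERVER:
        return ev->reimport_ok ? MONITORING : POLL_SERVER;
    }
    return s;
}
```

In this model, the reset is the POLL_SERVER to MONITORING transition, which only happens when the reimport is recognized as successful.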

Currently, the check to make sure that the reimport was successful is wrong, so
usually, after the device has been successfully reimported, gnbd_monitor will
not reset. The next time that the device fails, gnbd_monitor will skip the fence
steps and simply try to reimport the device. This means that in cases where
the gnbd server is nonresponsive, but the gnbd server node is still alive,
gnbd_monitor will not fence the server after the first time.
Fixing this problem involves changing the line
if (check_recvd(dev) == 1)
to
if (check_recvd(dev) >= 0)
which is obviously the correct thing to check for.
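
To make the difference between the two checks concrete, here is a minimal sketch. It assumes, based only on the proposed fix, that check_recvd() returns a negative value on failure and zero or more on success; the real return convention is not stated in this report. The helper names reimport_ok_old() and reimport_ok_new() are hypothetical.

```c
#include <stdbool.h>

/* Old check: only a return value of exactly 1 is treated as success.
 * 'result' stands in for the value returned by check_recvd(dev). */
bool reimport_ok_old(int result)
{
    return result == 1;
}

/* Proposed check: any non-negative return value is treated as success
 * (assumed convention: negative means failure). */
bool reimport_ok_new(int result)
{
    return result >= 0;
}
```

Under that assumption, a return value of 0 is rejected by the old check but accepted by the new one, which would explain why gnbd_monitor usually failed to reset after a successful reimport.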

A related issue is the requirement that gnbd_monitor waits until all users have
closed the device.  This is an unnecessary requirement, and it makes it much
harder to use dm-multipath, since dm-multipath keeps failed paths open.

Comment 1 Red Hat Bugzilla 2006-01-06 20:28:59 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0170.html


