Bug 162422 - Recovery problem when the gulm master node is fenced
Summary: Recovery problem when the gulm master node is fenced
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: gulm
Version: 3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: michael conrad tadpol tilstra
QA Contact: Cluster QE
URL: https://www.redhat.com/archives/linux...
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-07-04 12:28 UTC by Alban Crequy
Modified: 2009-04-16 20:25 UTC
CC List: 1 user

Fixed In Version: RHBA-2005-723
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-10-10 15:26:07 UTC




Links
System ID                              Priority  Status        Summary              Last Updated
Red Hat Product Errata RHBA-2005:723   normal    SHIPPED_LIVE  GFS bug fix update   2005-09-30 04:00:00 UTC
Red Hat Product Errata RHBA-2005:733   normal    SHIPPED_LIVE  gulm bug fix update  2005-10-07 04:00:00 UTC

Description Alban Crequy 2005-07-04 12:28:45 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.7.6) Gecko/20050322

Description of problem:
When the gulm master node also mounts a GFS filesystem, the fencing process does not run properly if the gulm master node has to be fenced.

I may lose data because the recovery process begins too early (before fencing has finished).


Version-Release number of selected component (if applicable):
GFS-6.0.2.20-2

How reproducible:
Always

Steps to Reproduce:
1. Get an 8-node cluster.
2. Choose 5 nodes as gulm servers.
3. Mount a GFS filesystem on your 8 nodes.
4. Unplug the network of the current gulm master.
5. Wait until another gulm server becomes the master.
6. Do NOT run fence_ack_manual and check whether the locks of the unplugged node are released.
  

Actual Results:  1. The locks are released immediately when another gulm server becomes the master.
2. The journal is recovered by another node immediately as well.

Expected Results:  The recovery process should wait until the user runs fence_ack_manual.
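
The following is a minimal Python sketch of the ordering the reporter expects (not gulm code; all names here are hypothetical): lock release and journal replay for a failed node are gated on that node's fence being acknowledged, which in this manual-fencing setup only happens when the operator runs fence_ack_manual.

import threading


class RecoveryCoordinator:
    """Toy model: release locks and replay the journal only after
    the failed node's fence has been acknowledged."""

    def __init__(self):
        self._fence_acked = threading.Event()

    def fence_ack(self, node):
        # Stands in for the operator running fence_ack_manual.
        print("fence of %s acknowledged" % node)
        self._fence_acked.set()

    def recover(self, node):
        # Expected behaviour: block here until fencing is confirmed.
        # The reported bug behaves as if this wait were skipped when the
        # fenced node was also the gulm master.
        self._fence_acked.wait()
        print("releasing locks and replaying journal of %s" % node)


if __name__ == "__main__":
    coord = RecoveryCoordinator()
    t = threading.Thread(target=coord.recover, args=("node1",))
    t.start()
    coord.fence_ack("node1")  # without this call, recovery must not proceed
    t.join()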

Additional info:

More explanations here:
https://www.redhat.com/archives/linux-cluster/2005-July/msg00000.html

I file this bugzilla as requested here:
https://www.redhat.com/archives/linux-cluster/2005-July/msg00006.html

Comment 1 michael conrad tadpol tilstra 2005-07-05 13:40:33 UTC
Have you tried this with only three nodes as gulm servers?  How does it behave
then?

Comment 2 michael conrad tadpol tilstra 2005-07-05 14:01:38 UTC
Just checked with three nodes; the bug is there too.

Comment 3 michael conrad tadpol tilstra 2005-07-05 15:08:36 UTC
check_for_stale_expires() is tripping on everyone.  It only runs if a jid
mapping is marked 1 (live mappings are marked 2).  The only time a jid mapping
is marked 1 is when a node other than the owner is replaying the journal.  Why
are the live mappings getting switched from 2 to 1?  I don't know, but I bet
that's the bug right there.  I'll look deeper.
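
A rough Python model of the states described above (hypothetical illustration; the real code is C inside gulm): jid mappings marked 2 are live, mappings marked 1 mean a node other than the owner is replaying that owner's journal, and check_for_stale_expires() only acts on mappings marked 1, so live mappings being flipped from 2 to 1 makes it trip on every node.

JID_REPLAY_BY_OTHER = 1  # journal being replayed by a node other than its owner
JID_LIVE = 2             # mapping belongs to a live node


def check_for_stale_expires(jid_map):
    """Return the owners whose mappings look stale (marked 1)."""
    return [owner for owner, mark in jid_map.items()
            if mark == JID_REPLAY_BY_OTHER]


if __name__ == "__main__":
    # Correct state: only the failed node's mapping is marked 1.
    jid_map = {"node1": JID_LIVE, "node2": JID_LIVE, "node3": JID_REPLAY_BY_OTHER}
    print(check_for_stale_expires(jid_map))  # ['node3']

    # Symptom from this comment: live mappings wrongly switched from 2 to 1,
    # so the check trips on everyone.
    buggy = dict((owner, JID_REPLAY_BY_OTHER) for owner in jid_map)
    print(check_for_stale_expires(buggy))  # all three nodes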

Comment 4 michael conrad tadpol tilstra 2005-07-05 18:00:22 UTC
Fixed the issue described in comment #3, but that didn't fix the bug.  Digging more.

Comment 5 michael conrad tadpol tilstra 2005-07-05 18:05:17 UTC
The bug only appears when the master lock server is also mounting GFS.
A workaround is therefore to put the lock servers on dedicated nodes.


Comment 7 michael conrad tadpol tilstra 2005-07-19 15:21:13 UTC
There was a kludge that tried to fix something, but I cannot find or figure out
what it was supposed to fix.  That kludge was causing this.  I'm betting this is
a bigger problem than whatever the kludge was trying to fix, so I am removing it.


I think what it tried to fix was some weird edge case where multiple clients and
lock servers failed in some way.

Comment 9 Red Hat Bugzilla 2005-09-30 14:56:29 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-723.html


Comment 10 Red Hat Bugzilla 2005-10-07 16:43:08 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-733.html


Comment 11 Red Hat Bugzilla 2005-10-10 15:26:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-723.html


