Bug 228101 - when a node is fenced it cannot rejoin the cluster
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: openais
Version: 5.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Steven Dake
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-02-09 22:57 UTC by Josef Bacik
Modified: 2016-04-26 13:48 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-02-14 18:58:37 UTC
Target Upstream Version:



Description Josef Bacik 2007-02-09 22:57:14 UTC
Description of problem:
While doing GFS2 testing, I found that if I start cman on one node before the
other node has started it, the first node fences the second, as expected. The
problem is that when the second node comes back up it cannot join the cluster,
and the node that is already running just loops, repeatedly writing the
following to /var/log/messages:

Feb  9 17:54:55 rh5cluster1 openais[3839]: [TOTEM] Sending initial ORF token
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] CLM CONFIGURATION CHANGE
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] New Configuration:
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ]      r(0) ip(10.10.1.13)
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] Members Left:
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] Members Joined:
Feb  9 17:54:55 rh5cluster1 openais[3839]: [SYNC ] This node is within the primary component and will provide service.
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] CLM CONFIGURATION CHANGE
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] New Configuration:
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ]      r(0) ip(10.10.1.13)
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] Members Left:
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] Members Joined:
Feb  9 17:54:55 rh5cluster1 openais[3839]: [SYNC ] This node is within the primary component and will provide service.
Feb  9 17:54:55 rh5cluster1 openais[3839]: [TOTEM] entering OPERATIONAL state.
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] got nodejoin message 10.10.1.13
Feb  9 17:54:55 rh5cluster1 openais[3839]: [TOTEM] entering GATHER state from 11.
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] entering GATHER state from 0.
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] Creating commit token because I am the rep.
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] Saving state aru 9 high seq received 9
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] entering COMMIT state.
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] entering RECOVERY state.
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] position [0] member 10.10.1.13:
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] previous ring seq 160 rep 10.10.1.13
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] aru 9 high delivered 8 received flag 0
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] Did not need to originate any messages in recovery.
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] Storing new sequence id for ring a4

I will look into this more next week, but I'm still in the process of reading 
the openais code so I'm not in a position to intelligently troubleshoot this 
yet.
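
For anyone triaging a similar loop, the sketch below counts how many CLM
configuration changes in a log end with an empty "Members Joined:" list, which
is the pattern in the excerpt above. It is a rough sketch and not part of the
original report: the function name count_empty_joins is hypothetical, and it
assumes that a joined member would appear as an r(N) ip(...) line immediately
after the "Members Joined:" header.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): reads openais log lines on stdin
# and prints the number of "Members Joined:" headers that list no members.
# Assumption: a joined member shows up as an "r(N) ip(...)" CLM line directly
# after the header, the same way "New Configuration:" entries are printed.
count_empty_joins() {
    awk '
        # A new header; if the previous header was still open, it was empty.
        /\[CLM +\] Members Joined:/ { if (pending) empty++; pending = 1; next }
        # A member line right after the header means somebody joined.
        pending && /\[CLM +\] +r\([0-9]+\) ip\(/ { pending = 0; next }
        # Any other line right after the header means the list was empty.
        pending { empty++; pending = 0 }
        END { if (pending) empty++; print empty + 0 }
    '
}

# Example: count_empty_joins < /var/log/messages
```

A count that keeps growing while the other node is trying to start suggests
the running node never sees the join request, which points at the network
path between the nodes rather than at the membership protocol itself.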

Version-Release number of selected component (if applicable):

[root@rh5cluster2 ~]# rpm -q openais
openais-0.80.2-1.el5

How reproducible:
Every time

Steps to Reproduce:
1. Bring both nodes up without starting cman.
2. Start cman on one node and let it fence the other node.
  
Actual results:
The fenced node isn't allowed to join the cluster and the node that is 
currently up just loops.

Expected results:
It should let the node join.

Comment 1 Josef Bacik 2007-02-14 18:58:37 UTC
OK, I'm an idiot: I had iptables enabled on the second node. Closing this.
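
Since the cause turned out to be a firewall on the joining node, a quick look
at the INPUT chain could catch this earlier. The sketch below is a rough
heuristic and not something from the report: the function name
totem_ports_allowed is hypothetical, and UDP ports 5404 and 5405 are assumed
to be the ones openais/cman uses (the report does not name them). It only
looks for an explicit ACCEPT rule, so a chain with an ACCEPT policy and no
rules at all would still be reported as not matching.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): reads `iptables -L INPUT -n`
# output on stdin and succeeds if some ACCEPT rule for UDP mentions the
# assumed cluster ports. Ports 5404/5405 are an assumption; adjust as needed.
totem_ports_allowed() {
    grep -E '^ACCEPT[[:space:]]+udp' | grep -Eq '5404|5405'
}

# Usage on a live node (requires root):
#   iptables -L INPUT -n | totem_ports_allowed \
#       && echo "explicit ACCEPT rule found" \
#       || echo "no explicit ACCEPT rule; cluster traffic may be dropped"
```

Running `service iptables status` on each node before starting cman is the
simpler first step; the function above only helps when the firewall has to
stay enabled.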

