Bug 232216 - IP failover ignoring restricted configuration
Summary: IP failover ignoring restricted configuration
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: rgmanager
Version: 4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2007-03-14 13:35 UTC by Dave Berry
Modified: 2009-04-16 20:22 UTC
CC: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-07-31 18:30:43 UTC



Description Dave Berry 2007-03-14 13:35:30 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.10) Gecko/20070226 Fedora/1.5.0.10-1.fc6 Firefox/1.5.0.10 pango-text

Description of problem:
3-node GFS cluster sharing 2 virtual IPs as 2 different services.
The IPs are listed as services in cluster.conf, and the failover domain is set to ordered/restricted.
The IP fails over when the box goes down, but it does not return to the correctly prioritized box when that box comes back up.

<failoverdomain name="ip_domain2" ordered="1" restricted="1">
        <failoverdomainnode name="fs102" priority="1"/>
        <failoverdomainnode name="fs101" priority="2"/>
        <failoverdomainnode name="fs02" priority="3"/>
</failoverdomain>
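For context, a minimal sketch of how such a domain is typically wired to an IP service in the <rm> section of cluster.conf. The domain and node names are taken from the fragment above; the service name and the IP address come from the logs in this report, though their pairing here, and the surrounding layout, are assumptions rather than the reporter's actual file:

```xml
<!-- Hypothetical sketch: an ordered/restricted failover domain plus an IP
     service bound to it. With ordered="1", rgmanager should relocate the
     service back to the highest-priority member when it rejoins. -->
<rm>
        <failoverdomains>
                <failoverdomain name="ip_domain2" ordered="1" restricted="1">
                        <failoverdomainnode name="fs102" priority="1"/>
                        <failoverdomainnode name="fs101" priority="2"/>
                        <failoverdomainnode name="fs02" priority="3"/>
                </failoverdomain>
        </failoverdomains>
        <service name="nfs_ip2" domain="ip_domain2" autostart="1">
                <ip address="172.16.1.224" monitor_link="1"/>
        </service>
</rm>
```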

Version-Release number of selected component (if applicable):
rgmanager-1.9.54-1

How reproducible:
Always


Steps to Reproduce:
1. Configure a failover domain with the ordered/restricted flags and a VIP service
2. Shut down the primary box so the IP fails over
3. Bring up the primary box and watch the logs to see whether the service returns

Actual Results:
Service does not fail back

Mar  8 11:03:26 fs101 clurgmgrd[5684]: <debug> Relocating group nfs_ip2 to better node fs102
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <debug> Event (0:2:1) Processed
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <notice> Stopping service nfs_ip2
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <err> #52: Failed changing RG status
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <debug> Handling failure request for RG nfs_ip2
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <err> #57: Failed changing RG status

Expected Results:
The IP should fail back to better node

Additional info:

Comment 1 Lon Hohberger 2007-03-20 19:52:28 UTC
This isn't actually a policy bug; the cause of error #52 is the key here - that
shouldn't happen.  Could you try with the 1.9.54-3.228823 packages available here: 

http://people.redhat.com/lhh/packages.html

Comment 2 Dave Berry 2007-03-21 12:58:16 UTC
Tried with the new rgmanager package and I get the same results:

Mar 20 16:49:03 fs102 clurgmgrd[5659]: <info> State change: fs101 UP
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <debug> Evaluating RG nfs_ip1, state started, owner fs102
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <debug> Relocating group nfs_ip1 to better node fs101
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <debug> Evaluating RG nfs_ip2, state started, owner fs102
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <debug> Event (0:3:1) Processed
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <notice> Stopping service nfs_ip1
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <err> #52: Failed changing RG status
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <debug> Handling failure request for RG nfs_ip1
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <err> #57: Failed changing RG status
Mar 20 16:49:19 fs102 clurgmgrd: [5659]: <debug> Checking 172.16.1.224, Level 0

Comment 3 Lon Hohberger 2007-03-26 15:13:19 UTC
Hi,

I tried to reproduce this several times and haven't been able to.  Could you
give me some hints about your systems?  It must be some sort of race condition.

Last Thursday, I received a patch from a community user of linux-cluster which
*may* address this if you're willing to try it (though, I must be clear, I
couldn't get it to happen with or without their patch).  The reason it *may*
address this is because it fixes two bugs in the view-formation (data
distribution) code and an error case in the rgmanager message code.

Comment 4 Lon Hohberger 2007-03-26 15:14:41 UTC
By hints, I mean things like RAM / processor speed / # of cores

Comment 5 Dave Berry 2007-04-18 14:31:43 UTC
Both boxes are identical (Dell 1950s)
2 dual-core Intel Xeon 2 GHz processors
2GB RAM
Qlogic QLA2432 fibre card 
Broadcom BCM5708 Gigabit Ethernet

Comment 6 Lon Hohberger 2007-05-02 12:56:52 UTC
Ok - I'll have to build using the patch from the community user.  The patch
addresses several things - including bugs in the vft subsystem (the part that's
throwing errors :) ).

Comment 7 Lon Hohberger 2007-05-16 15:49:31 UTC
This *should* be fixed in 4.5; could you retest on the current rgmanager package?

Comment 8 Lon Hohberger 2007-07-31 18:30:43 UTC
Closing per comment #3: the problem could not be reproduced.

