Summary:           clvmd can deadlock when starting up again
Product:           [Retired] Red Hat Cluster Suite
Component:         dlm
Version:           4
Reporter:          Corey Marthaler <cmarthal>
Assignee:          Christine Caulfield <ccaulfie>
QA Contact:        Cluster QE <mspqa-list>
CC:                agk, ccaulfie, cluster-maint, teigland
Status:            CLOSED NEXTRELEASE
Fixed In Version:
Doc Type:          Bug Fix
Doc Text:
Story Points:      ---
Type:              ---
Last Closed:       2005-06-07 14:57:05 UTC
Description Corey Marthaler 2005-04-22 16:07:14 UTC
Description of problem:
Both Dean and I have seen this multiple times on our respective clusters, but since it happens during init we have limited debugging ability and have been unable to pinpoint the exact state that causes it. We added a 'set -x' to the clvmd init script and captured this output up to the time of the hang. We then tried to add an strace to the vgchange but have yet to reproduce it since.

Apr 20 11:52:39 morph-02 clvmd: + start
Apr 20 11:52:39 morph-02 clvmd: + for rtrn in 0
Apr 20 11:52:39 morph-02 clvmd: + pidof clvmd
Apr 20 11:52:39 morph-02 clvmd: + echo -n 'Starting clvmd:'
Apr 20 11:52:39 morph-02 clvmd: + clvmd
Apr 20 11:52:41 morph-02 clvmd: Cluster LVM daemon started - connected to CMAN
Apr 20 11:52:41 morph-02 kernel: CMAN: WARNING no listener for port 11 on node morph-05
Apr 20 11:52:41 morph-02 kernel: CMAN: WARNING no listener for port 11 on node morph-01
Apr 20 11:52:41 morph-02 clvmd: + rtrn=0
Apr 20 11:52:41 morph-02 clvmd: + '[' 0 -eq 0 ']'
Apr 20 11:52:41 morph-02 clvmd: + success
Apr 20 11:52:41 morph-02 clvmd: + '[' -z '' ']'
Apr 20 11:52:41 morph-02 clvmd: + initlog -q -n /etc/rc3.d/S24clvmd -s '' -e 1
Apr 20 11:52:41 morph-02 clvmd: succeeded
Apr 20 11:52:41 morph-02 clvmd: + '[' serial '!=' verbose -a -z '' ']'
Apr 20 11:52:41 morph-02 clvmd: + echo_success
Apr 20 11:52:41 morph-02 clvmd: + '[' serial = color ']'
Apr 20 11:52:41 morph-02 clvmd: + echo -n '[ '
Apr 20 11:52:41 morph-02 clvmd: + '[' serial = color ']'
Apr 20 11:52:41 morph-02 clvmd: + echo -n OK
Apr 20 11:52:41 morph-02 clvmd: + '[' serial = color ']'
Apr 20 11:52:41 morph-02 clvmd: + echo -n ' ]'
Apr 20 11:52:41 morph-02 clvmd: + echo -ne '\r'
Apr 20 11:52:41 morph-02 clvmd: + return 0
Apr 20 11:52:41 morph-02 clvmd: + return 0
Apr 20 11:52:41 morph-02 clvmd:
Apr 20 11:52:41 morph-02 clvmd: + echo
Apr 20 11:52:41 morph-02 clvmd: + /usr/sbin/vgscan
Apr 20 11:52:41 morph-02 clvmd: + '[' -n '' ']'
Apr 20 11:52:41 morph-02 clvmd: Activating lvms:
Apr 20 11:52:41 morph-02 clvmd: + echo -n 'Activating lvms:'
Apr 20 11:52:41 morph-02 clvmd: + /usr/sbin/vgchange -ayl
[HANG]

This can cause quite a problem, since hanging in the init process prevents the machine from starting up. The only way we've found to get out of this once we hit it is to delete all the volumes on the other nodes in the cluster and attempt to boot once again. We have seen this when rebooting all nodes at once and when keeping one node up and rebooting all the others, although neither way reproduces it reliably. :(

Version-Release number of selected component (if applicable):
Cluster LVM daemon version: 2.01.09 (2005-04-04)

How reproducible:
Every so often
Comment 1 Alasdair Kergon 2005-04-22 16:12:15 UTC
More config info: What is your lv configuration? (e.g. the output of lvs, vgs, pvs, fstab, etc.) Are you using the intended RHEL4 U1 mkinitrd package?
Comment 2 Corey Marthaler 2005-04-22 18:43:06 UTC
I'm not sure the lv config matters, as we've seen it with a multitude of them, but the current one I have is:

[root@tank-02 ~]# rpm -qa | grep mkinitrd
mkinitrd-126.96.36.199-1

[root@tank-02 ~]# lvdisplay
  --- Logical volume ---
  LV Name                /dev/gfs/gfs0
  VG Name                gfs
  LV UUID                Ftjr1N-qmsn-fvoo-3ZDY-K70C-3cX0-CYVwqC
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                460.14 GB
  Current LE             117796
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:2

  --- Logical volume ---
  LV Name                /dev/gfs/gfs1
  VG Name                gfs
  LV UUID                zXITiV-n9hm-Iy2u-K97Q-zAE9-OZKU-TOlLMH
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                460.14 GB
  Current LE             117796
  Segments               2
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:3

  --- Logical volume ---
  LV Name                /dev/gfs/gfs2
  VG Name                gfs
  LV UUID                9XlzeF-g3HY-b6Y7-k7xD-QHqF-sz7a-IHGuWG
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                460.15 GB
  Current LE             117798
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:4

[root@tank-02 ~]# vgdisplay
  --- Volume group ---
  VG Name               gfs
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  Clustered             yes
  Shared                no
  MAX LV                0
  Cur LV                3
  Open LV               3
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               1.35 TB
  PE Size               4.00 MB
  Total PE              353390
  Alloc PE / Size       353390 / 1.35 TB
  Free  PE / Size       0 / 0
  VG UUID               nVr361-5iqu-lDqq-RSXF-XHfe-jfql-1Ne7lV

[root@tank-02 ~]# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sda1
  VG Name               gfs
  PV Size               690.21 GB / not usable 0
  Allocatable           yes (but full)
  PE Size (KByte)       4096
  Total PE              176695
  Free PE               0
  Allocated PE          176695
  PV UUID               mcecjV-f1Hi-31a9-5hN1-CZpI-5Ks8-ipjvEr

  --- Physical volume ---
  PV Name               /dev/sda2
  VG Name               gfs
  PV Size               690.21 GB / not usable 0
  Allocatable           yes (but full)
  PE Size (KByte)       4096
  Total PE              176695
  Free PE               0
  Allocated PE          176695
  PV UUID               qNLOfT-gdNy-Zrto-9dbx-981f-Zh5f-c6BlUu

[root@tank-02 ~]# cat /etc/fstab
# This file is edited by fstab-sync - see 'man fstab-sync' for details
/dev/VolGroup00/LogVol00  /              ext3    defaults                        1 1
LABEL=/boot               /boot          ext3    defaults                        1 2
none                      /dev/pts       devpts  gid=5,mode=620                  0 0
none                      /dev/shm       tmpfs   defaults                        0 0
none                      /proc          proc    defaults                        0 0
none                      /sys           sysfs   defaults                        0 0
/dev/VolGroup00/LogVol01  swap           swap    defaults                        0 0
/dev/gfs/gfs0             /mnt/gfs0      gfs     debug                           0 0
/dev/gfs/gfs1             /mnt/gfs1      gfs     debug                           0 0
/dev/gfs/gfs2             /mnt/gfs2      gfs     debug                           0 0
/dev/gfs/gfs3             /mnt/gfs3      gfs     debug                           0 0
/dev/hdc                  /media/cdrom   auto    pamconsole,exec,noauto,managed  0 0
/dev/fd0                  /media/floppy  auto    pamconsole,exec,noauto,managed  0 0
Comment 3 Christine Caulfield 2005-04-26 14:57:07 UTC
An strace of, or debugging output from, clvmd might also be handy if you can get them.
Comment 5 Christine Caulfield 2005-05-10 08:21:44 UTC
I'm now fairly sure that this is a DLM bug.
Comment 6 David Teigland 2005-05-24 14:20:10 UTC
Here's the test script Patrick and I have been running. "sleeprand" sleeps for a random number of seconds between 1 and N. The killall -9 on one of the nodes leaves the dlm lockspace running on that one node, so restarting clvmd will reuse that lockspace.

#!/bin/bash
set -x
while true
do
    ssh root@bench-21 killall -9 clvmd
    ./doall killall clvmd
    ~/sleeprand 40
    ./doall clvmd
    ./doall vgchange -aly
done
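The sleeprand helper itself isn't shown in the report; a minimal sketch of what it presumably does (the implementation below is a guess, only the 1-to-N behaviour is from the comment above):

#!/bin/bash
# sleeprand N - sleep for a random number of seconds between 1 and N.
# Hypothetical reconstruction of the unshown test helper.
max=${1:?usage: sleeprand N}
sleep $(( (RANDOM % max) + 1 ))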
Comment 7 David Teigland 2005-05-25 07:09:52 UTC
Another test I'm running on 8 nodes at once is a loop around:

service clvmd start
sleeprand 10
service clvmd stop
sleeprand 10
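Spelled out as a complete script (the enclosing loop is a sketch; only the four commands are from the comment above):

#!/bin/bash
# Start and stop clvmd repeatedly with random pauses to provoke the race.
while true
do
    service clvmd start
    sleeprand 10
    service clvmd stop
    sleeprand 10
done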
Comment 8 Corey Marthaler 2005-05-26 16:00:12 UTC
I finally saw this again last night after 46 iterations of running revolver with the initscripts flag. This was on a five node cluster (tank-01 - 05) with one gfs filesystem. On the 46th iteration, tank-02, 03, and 05 were shot, and when they came back up all three were stuck at the clvmd init start-up. After being stuck overnight, I power cycled tank-05, and that allowed the other two nodes to become free and continue the start-up process. tank-05 (after being power cycled) was also able to complete its start-up process. So this time it wasn't a case of: once this is hit, one has to boot single-user, clean up volumes, or boot a kernel without the init scripts.
Comment 9 Corey Marthaler 2005-05-26 16:04:30 UTC
I saw this message (on the two nodes which were not shot) once the deadlock was resolved:

May 26 08:41:57 tank-01 kernel: dlm: clvmd: nodes_reconfig failed -1
May 26 08:21:31 tank-04 kernel: dlm: clvmd: nodes_reconfig failed 1
Comment 10 Corey Marthaler 2005-05-26 19:47:12 UTC
Hit this again today; this time, however, power cycling the hung nodes individually didn't un-hang the others. Not until I power cycled every node in the cluster did they all come up just fine.
Comment 11 David Teigland 2005-05-27 15:52:13 UTC
I can reproduce this on my bench machines using the test in comment 7. We have collected a lot of information and have narrowed the problem down (it appears) to missing dlm socket connections in the dlm communications layer. We're hoping the latest tcpdump logs will explain why the connections are being lost or are not being reestablished.
Comment 12 Christine Caulfield 2005-05-31 15:58:27 UTC
The connect retry count was not getting cleared... ever. So a node only had 3 chances at a failed connect, and after that it would refuse to talk to a particular node. When two nodes get into this state with each other there is a rather obvious deadlock. Because lowcomms connections can be initiated by either node, if this happens on one node only then the DLM will run quite happily. That explains (I think) why this is fairly rare but can be provoked by lots of connects/disconnects.
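To make the mechanism concrete, here is an illustrative shell simulation of the failure mode, not the actual dlm lowcomms code (the names, the counter limit's placement, and the five-cycle driver are invented for illustration):

#!/bin/bash
# Simulates a per-peer connect-retry counter that, before the fix,
# was never cleared: three transient failures blacklist the peer.
MAX_CONNECT_RETRIES=3
retry_count=0

connect_to_node() {    # $1 = 1 if the peer is currently reachable
    if (( retry_count >= MAX_CONNECT_RETRIES )); then
        echo "    REFUSED (peer blacklisted)"
        return 1
    fi
    if (( $1 == 0 )); then
        (( retry_count++ ))        # count the transient failure
        echo "    connect failed (retry_count=$retry_count)"
        return 1
    fi
    retry_count=0                  # the fix: clear the count on success
    echo "    connected"
}

# Each start/stop cycle fails once (peer still booting), then succeeds.
# With the reset line present every cycle recovers; remove it and the
# stale failures accumulate until the peer is refused outright. Two
# nodes refusing each other is exactly the deadlock described above.
for cycle in 1 2 3 4 5; do
    echo "cycle $cycle:"
    connect_to_node 0
    connect_to_node 1
done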
Comment 13 Corey Marthaler 2005-06-01 18:51:31 UTC
There is still a hang in the clvmd init scripts, but it's no longer in 'vgchange'. Should I open a new bug, or is this the same problem, just moved to a different place?

Starting clvmd: + daemon clvmd
+ local gotbase= force=
+ local base= user= nice= bg= pid=
+ nicelevel=0
+ '[' clvmd '!=' clvmd ']'
+ '[' -z '' ']'
+ base=clvmd
+ '[' -f /var/run/clvmd.pid ']'
+ '[' -n '' -a -z '' ']'
+ ulimit -S -c 0
+ '[' -n '' ']'
+ '[' serial = verbose -a -z '' ']'
+ '[' -z '' ']'
+ initlog -q -c clvmd
[HANG]
Comment 14 Christine Caulfield 2005-06-02 07:57:24 UTC
Impossible to tell from that. Can you get the full debug output again please?

/proc/cluster/services
/proc/cluster/dlm_debug
netstat -ta

from all nodes.
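A quick way to pull all of that from every node at once (a sketch; the node names and root ssh access are assumptions, substitute whatever the cluster actually uses):

#!/bin/bash
# Collect the requested DLM debug state from each cluster node.
for node in tank-01 tank-02 tank-03 tank-04 tank-05; do
    echo "===== $node ====="
    ssh root@$node 'cat /proc/cluster/services; cat /proc/cluster/dlm_debug; netstat -ta'
done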
Comment 15 Christine Caulfield 2005-06-02 10:09:25 UTC
Is this on SMP? I've committed an SMP race fix to FC4 & CVS head that might be relevant.
Comment 16 Christine Caulfield 2005-06-02 16:10:16 UTC
OK, I've checked in that fix. See you again tomorrow ...
Comment 17 Adam "mantis" Manthei 2005-06-06 20:19:50 UTC
The summary of this bug is a little misleading. The bug is not in the init scripts but in clvmd itself, unless the implication is that clvmd is misbehaving with initlog(). Some rather basic cases ought to be able to narrow down where the bug is (e.g. remove initlog from the init script, or write a controlled test such as the sketch below). Please update the summary to something that more accurately reflects the nature of the bug.
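One such controlled test might look like this (my construction, not from the report: it starts clvmd bare and then under initlog, to see whether the hang follows initlog or clvmd itself):

#!/bin/bash
# Compare clvmd startup with and without initlog in the picture.
killall -9 clvmd 2>/dev/null

echo "starting clvmd directly:"
clvmd && echo "  started OK"

killall -9 clvmd
echo "starting clvmd under initlog:"
initlog -q -c clvmd && echo "  started OK"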
Comment 18 Corey Marthaler 2005-06-06 21:05:02 UTC
I ran revolver all weekend on the 4 node tank cluster (tank-01, 03, 04, 05) without allowing quorum to be lost (only shooting one node at a time) and never saw any issues. This morning I restarted revolver so that quorum gets lost (three nodes shot each time), and after 40 iterations I saw the hang on all three nodes that were shot. Two of the nodes were stuck starting clvmd and one was stuck doing a vgchange. CMAN on the node left up reported that everyone was a part of the cluster. I then killed one of the hung nodes, and that allowed the other two hung nodes to get past the deadlock and continue. The killed node then also came back up without problems.
Comment 19 Corey Marthaler 2005-06-07 14:00:47 UTC
Hit this last night on a three node gulm cluster. One slave was shot and got stuck coming back up while doing a vgchange. Apparently you do not need to lose quorum in order for this to happen.
Comment 20 Christine Caulfield 2005-06-07 14:03:43 UTC
gulm! When did gulm appear in this bug report?
Comment 21 Corey Marthaler 2005-06-07 14:57:05 UTC
Closing this bug and opening a new one for comments #18 and #19.