Bug 990623 - corosync does not notice network down
Summary: corosync does not notice network down
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: corosync
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
: ---
Assignee: Christine Caulfield
QA Contact: Cluster QE
Depends On:
Blocks: 991412
Reported: 2013-07-31 14:59 UTC by michal novacek
Modified: 2013-08-05 08:14 UTC (History)
5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 991412 (view as bug list)
Last Closed: 2013-08-05 08:14:46 UTC
Target Upstream Version:

Attachments (Terms of Use)
relevant part of corosync log. (deleted)
2013-07-31 14:59 UTC, michal novacek

Description michal novacek 2013-07-31 14:59:32 UTC
Created attachment 781194 [details]
relevant part of corosync log.

Description of problem:
I have a pacemaker+corosync cluster of 3 virtual nodes with no services. I run
the same script on all of them at the same time, dropping all communication on
the INPUT and OUTPUT chains except ssh. Pacemaker never notices that the
network is unreachable and reports all the nodes online and the cluster
quorate. Corosync does seem to notice, though.

Version-Release number of selected component (if applicable):

How reproducible: always

Steps to Reproduce:
1. setup corosync+pacemaker cluster
2. add DROP rule to INPUT and OUTPUT chains in iptables on all nodes at the
same time.
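
As a concrete illustration of step 2, rules like the following would drop everything except ssh when run on every node at once. This is an assumed reconstruction; the reporter's actual script is not attached, and only the resulting ACCEPT rules appear in the iptables-save output below:

```shell
# Assumed reconstruction of the isolation script (not from the original
# report). Keep ssh reachable, then drop all other traffic:
iptables -I INPUT 1 -p tcp --dport 22 -j ACCEPT
iptables -I OUTPUT 1 -p tcp --sport 22 -m state --state ESTABLISHED -j ACCEPT
iptables -A INPUT -j DROP
iptables -A OUTPUT -j DROP
```

These commands require root and cut all cluster traffic, so they should only be run on disposable test nodes.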

Actual results: pacemaker does not notice anything wrong.

Expected results: pacemaker notices the network failure and each node forms an inquorate island.

Additional info:
Node in the cluster:

# pcs status
Last updated: Wed Jul 31 16:43:25 2013
Last change: Wed Jul 31 15:09:27 2013 via crm_resource on virt-064
Stack: cman
Current DC: virt-070 - partition with quorum
Version: 1.1.8-7.el6-394e906
3 Nodes configured, unknown expected votes
1 Resources configured.

Online: [ virt-064 virt-065 virt-070 ]

Full list of resources:

 virt-fencing   (stonith:fence_xvm):    Stopped 

# iptables-save
# Generated by iptables-save v1.4.7 on Wed Jul 31 16:50:16 2013
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT 
-A OUTPUT -p tcp -m tcp --sport 22 -m state --state ESTABLISHED -j ACCEPT 
# Completed on Wed Jul 31 16:50:16 2013

# cat /etc/cluster/cluster.conf 
<?xml version="1.0"?>
<cluster config_version="1" name="STSRHTS7932">
        <totem token="3000"/>
        <fence_daemon clean_start="0" post_join_delay="20"/>
        <clusternodes>
                <clusternode name="virt-064" nodeid="1" votes="1">
                        <fence>
                                <method name="pcmk-redirect">
                                        <device name="pcmk" port="virt-064"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="virt-065" nodeid="2" votes="1">
                        <fence>
                                <method name="pcmk-redirect">
                                        <device name="pcmk" port="virt-065"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="virt-070" nodeid="3" votes="1">
                        <fence>
                                <method name="pcmk-redirect">
                                        <device name="pcmk" port="virt-070"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <fencedevices>
                <fencedevice agent="fence_pcmk" name="pcmk"/>
        </fencedevices>
</cluster>

There are many "Totem is unable to form a cluster because of an operating
system or network fault. The most common cause of this message is that the
local firewall is configured improperly." messages in corosync.log, so corosync
seems to know about the problem.

Comment 2 michal novacek 2013-08-02 11:38:45 UTC
The same behaviour happens with -j DROP on INPUT only.

Comment 3 Jan Friesse 2013-08-05 08:14:46 UTC
Filtering only INPUT (or OUTPUT) with iptables is unsupported. Also, at least localhost must not be filtered; otherwise corosync does not work and will be unable to create a single-node membership. This is a configuration error, not a bug.
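
To illustrate the point about localhost: when isolating a node for this kind of test, loopback traffic has to stay open so corosync can still bind and form a single-node membership. A hedged sketch of such rules (an assumption for illustration, not taken from this report):

```shell
# Allow loopback before any DROP rules so corosync can still talk to
# itself and form a single-node membership (assumed example; exact rules
# depend on the cluster's transport configuration):
iptables -I INPUT 1 -i lo -j ACCEPT
iptables -I OUTPUT 1 -o lo -j ACCEPT
```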
