Bug 1354439 - nfs client I/O stuck post IP failover
Summary: nfs client I/O stuck post IP failover
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: common-ha
Version: mainline
Hardware: All
OS: All
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Soumya Koduri
QA Contact:
URL:
Whiteboard:
Depends On: 1302545 1303037
Blocks: 1330218 1278336 1363722
Reported: 2016-07-11 10:19 UTC by Soumya Koduri
Modified: 2017-03-27 18:24 UTC
CC List: 11 users

Fixed In Version: glusterfs-3.9.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1278336
Clones: 1363722
Environment:
Last Closed: 2017-03-27 18:24:03 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
continuous-io.sh (deleted), 2016-08-02 06:21 UTC, Soumya Koduri
portblock_test.sh (deleted), 2016-08-02 06:21 UTC, Soumya Koduri
test_results_withfix (deleted), 2016-08-02 06:23 UTC, Soumya Koduri
test_results_withoutfix (deleted), 2016-08-02 06:53 UTC, Soumya Koduri

Description Soumya Koduri 2016-07-11 10:19:39 UTC
+++ This bug was initially created as a clone of Bug #1278336 +++

Description of problem:

While testing nfs-ganesha HA IP failover/failback cases, we have noticed that the client I/O gets stuck sometimes.

Version-Release number of selected component (if applicable):
RHGS 3.1

How reproducible:
Not always


Actual results:

Client I/O gets stuck

Expected results:

Client I/O should resume post IP failover.

Additional info:
I am attaching a packet trace taken on the client side. I see many TCP retransmissions post failover. Need to debug that.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-11-05 05:03:57 EST ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs‑3.1.z' to '?'. 

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Soumya Koduri on 2015-11-05 05:12:58 EST ---

The I/O resumes post failback though. Shall attach packet traces for both cases.

--- Additional comment from Soumya Koduri on 2015-11-05 05:13:23 EST ---

Setup details: 
dhcp3-238.gsslab.pnq.redhat.com, dhcp3-234.gsslab.pnq.redhat.com (root/root123)

--- Additional comment from Soumya Koduri on 2015-11-06 02:43:56 EST ---

Root-caused the problem. I can now consistently reproduce this issue. The problem happens during the second consecutive failover of the VIP to the same node.

Say:
* server1 has VIP1, server2 has VIP2
* client connected to VIP1/server1.
* Server1 has gone down, VIP1 moved to server2
* Client is now connected to VIP1/server2 
* Server1 comes back online. VIP1 moved back to server1
* Now suppose server1 goes down again, VIP1 is failed over back to server2.

This is when the client I/O gets stuck. The issue is with the TCP connection not being reset by server2 during VIP failback. Still finding out how/where to fix it. Shall update the bug.

--- Additional comment from Soumya Koduri on 2015-11-06 02:47:52 EST ---

The workaround for this issue is to restart the nfs-ganesha server on server2. That resets the TCP connections.

--- Additional comment from Soumya Koduri on 2015-11-06 04:42:47 EST ---

Correction to my comment#4 above. This issue seems to happen after a couple of failovers and failbacks to the same node. A couple of times I have seen the node which has taken over the VIP send PSH ACK or SYN ACK packets when the client tries to re-establish the TCP connection. But after a couple of failover scenarios, that doesn't happen.

--- Additional comment from Soumya Koduri on 2015-11-10 05:34:34 EST ---

Have posted the question to a few technical mailing lists to understand the TCP behaviour. Meanwhile, as suggested by Niels, I tried out the pacemaker portblock resource agent to tickle the client with a few invalid TCP packets from the server, which forces the client to reset its connection and thus allows I/O to continue.

Now need to check how we can plug this new resource agent into the existing scripts.

Meanwhile, as a workaround, whenever the client seems to be stuck post failover, create the below resource agent on the server machine hosting the VIP -

pcs resource create ganesha_portblock ocf:heartbeat:portblock protocol=tcp portno=2049 action=unblock ip=VIP reset_local_on_unblock_stop=on tickle_dir=/run/gluster/shared_storage/tickle_dir/

Once the I/O resumes, delete it -

pcs resource delete ganesha_portblock
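
For anyone scripting the workaround above, here is a minimal sketch that only assembles the two pcs command lines for a given VIP. The resource name, agent, and parameters are copied from the commands above. The function name `portblock_workaround`, the constants, and the example address 192.0.2.10 are assumptions made for illustration. The sketch prints the commands and does not run pcs itself.

```python
import ipaddress
import shlex

# Values copied from the workaround above; the names of these constants
# and of the function below are invented for this sketch.
RESOURCE_NAME = "ganesha_portblock"
TICKLE_DIR = "/run/gluster/shared_storage/tickle_dir/"


def portblock_workaround(vip, tickle_dir=TICKLE_DIR, name=RESOURCE_NAME):
    """Return (create_argv, delete_argv) for the temporary portblock resource."""
    # Fail early on a malformed address instead of handing it to pcs.
    ipaddress.ip_address(vip)
    create = [
        "pcs", "resource", "create", name, "ocf:heartbeat:portblock",
        "protocol=tcp", "portno=2049", "action=unblock", f"ip={vip}",
        "reset_local_on_unblock_stop=on", f"tickle_dir={tickle_dir}",
    ]
    delete = ["pcs", "resource", "delete", name]
    return create, delete


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address standing in for the real VIP.
    create, delete = portblock_workaround("192.0.2.10")
    print(shlex.join(create))  # for the node hosting the VIP while I/O is stuck
    print(shlex.join(delete))  # for use once I/O has resumed
```

Returning argument lists rather than one string keeps the VIP from being reinterpreted by a shell if the lists are later passed to a process launcher.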

--- Additional comment from Soumya Koduri on 2015-11-19 00:30:35 EST ---

We are checking with Networking experts internally on this peculiar TCP behaviour.

mail thread: http://post-office.corp.redhat.com/archives/tech-list/2015-November/msg00173.html

As mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=369991#c16, this seems to be a well-known issue with repeated failovers of NFS servers in a cluster. CTDB uses TCP tickle ACKs to work around it. As mentioned in the note above, we shall try to use pacemaker portblock to achieve similar behaviour.
Note: this resource agent is not yet packaged in RHEL downstream, so it may take some time to package it separately. We shall discuss this with the Cluster-suite team and update.

--- Additional comment from Niels de Vos on 2016-01-27 06:44:56 EST ---

Soumya, please open a bug against the resource-agents package to get portblock included.

--- Additional comment from Soumya Koduri on 2016-01-28 01:42:40 EST ---

Done. I have opened bug 1302545.

--- Additional comment from Jiffin on 2016-03-07 04:22:49 EST ---

The fix for https://bugzilla.redhat.com/show_bug.cgi?id=1302545 has been merged.

Comment 1 Vijay Bellur 2016-07-11 10:28:01 UTC
REVIEW: http://review.gluster.org/14878 (commn-HA: Add portblock resource agents to tickle packets post failover(/back)) posted (#3) for review on master by soumya k (skoduri@redhat.com)

Comment 2 Vijay Bellur 2016-07-12 07:14:54 UTC
REVIEW: http://review.gluster.org/14878 (commn-HA: Add portblock resource agents to tickle packets post failover(/back)) posted (#4) for review on master by soumya k (skoduri@redhat.com)

Comment 3 Vijay Bellur 2016-07-18 09:51:47 UTC
REVIEW: http://review.gluster.org/14878 (commn-HA: Add portblock RA to tickle packets post failover(/back)) posted (#5) for review on master by soumya k (skoduri@redhat.com)

Comment 4 Vijay Bellur 2016-07-31 09:48:48 UTC
REVIEW: http://review.gluster.org/14878 (commn-HA: Add portblock RA to tickle packets post failover(/back)) posted (#6) for review on master by soumya k (skoduri@redhat.com)

Comment 5 Soumya Koduri 2016-08-02 06:21:03 UTC
Created attachment 1186659 [details]
continuous-io.sh

Script to continuously generate I/O on a v3 mount point.

Comment 6 Soumya Koduri 2016-08-02 06:21:53 UTC
Created attachment 1186660 [details]
portblock_test.sh

Script to do failovers and failback in a loop (for about 100 iterations) between two servers.

Comment 7 Soumya Koduri 2016-08-02 06:23:35 UTC
Created attachment 1186661 [details]
test_results_withfix

Test results with fix.

Comment 8 Soumya Koduri 2016-08-02 06:53:45 UTC
Created attachment 1186668 [details]
test_results_withoutfix

Test results without fix applied.

Comment 9 Soumya Koduri 2016-08-02 06:54:20 UTC
The following tests were performed to verify the newly introduced portblock RA.

A 2-node nfs-ganesha HA setup is used.

On the client machine:
The attached 'continuous-io.sh' script is run; it continuously generates I/O on a v3 mount (since grace doesn't affect v3 clients) of VIPA configured on one of the servers.

portblock_test.sh - 
This script triggers failover and failback between two nodes for about 100 iterations. After the VIP has successfully failed over or failed back, the script sleeps for 10 seconds so that I/O can continue for some time.

So if no I/O is generated between two iterations, I/O has most likely got stuck.

As can be seen from the attached test results (test_results_withoutfix), without the fix I/O got stuck between a few iterations:

Tue Aug 2 11:28:11 IST 2016
43.7 Tue Aug  2 11:27:27 IST 2016 - Loop4
Starting Failover from 10.70.43.7 Tue Aug  2 11:27:38 IST 2016 - Loop5
Completed Failover from 10.70.43.7 Tue Aug  2 11:28:00 IST 2016 - Loop5
Starting Failback to 10.70.43.7 Tue Aug  2 11:28:10 IST 2016 - Loop6
Tue Aug 2 11:28:11 IST 2016


But that wasn't the case with the fix applied (test_results_withfix)
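
The check described in this comment can be sketched as below. The attachments are deleted, so the line formats are inferred only from the excerpt above: an I/O line is assumed to be a bare date stamp, and a test-script line is assumed to begin with "Starting" or "Completed" and end with "- LoopN". Treating "a Completed step followed directly by the next Starting step" as a stuck interval is also an interpretation, not something taken from the original scripts. The function name `loops_without_io` is hypothetical.

```python
import re

# A bare date stamp such as "Tue Aug 2 11:28:11 IST 2016" (assumed to be
# what the I/O script prints each time it makes progress).
IO_LINE = re.compile(
    r"^\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\w+\s+\d{4}$"
)
# A failover/failback step such as "Completed Failover from ... - Loop5".
EVENT_LINE = re.compile(
    r"^(Starting|Completed)\s+(?:Failover|Failback)\b.*-\s*Loop(\d+)$"
)


def loops_without_io(lines):
    """Return loop numbers whose completed step saw no I/O before the next step."""
    stalled = []
    pending = None  # loop whose "Completed" line is still waiting for I/O
    for raw in lines:
        line = raw.strip()
        if IO_LINE.match(line):
            pending = None  # I/O was seen, so the last step is fine
            continue
        event = EVENT_LINE.match(line)
        if event is None:
            continue  # blank, truncated, or unrecognised line
        phase, loop = event.group(1), int(event.group(2))
        if phase == "Completed":
            pending = loop
        elif pending is not None:
            # A new step started with no I/O since the previous one completed.
            stalled.append(pending)
            pending = None
    return stalled


if __name__ == "__main__":
    excerpt = """\
Tue Aug 2 11:28:11 IST 2016
43.7 Tue Aug  2 11:27:27 IST 2016 - Loop4
Starting Failover from 10.70.43.7 Tue Aug  2 11:27:38 IST 2016 - Loop5
Completed Failover from 10.70.43.7 Tue Aug  2 11:28:00 IST 2016 - Loop5
Starting Failback to 10.70.43.7 Tue Aug  2 11:28:10 IST 2016 - Loop6
Tue Aug 2 11:28:11 IST 2016
"""
    print(loops_without_io(excerpt.splitlines()))  # prints [5]
```

The truncated Loop4 line is skipped because it cannot be classified, so the sketch can under-report; it only flags intervals it can see in full.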

Comment 10 Vijay Bellur 2016-08-03 11:43:42 UTC
REVIEW: http://review.gluster.org/14878 (commn-HA: Add portblock RA to tickle packets post failover(/back)) posted (#7) for review on master by soumya k (skoduri@redhat.com)

Comment 11 Vijay Bellur 2016-08-03 12:02:14 UTC
COMMIT: http://review.gluster.org/14878 committed in master by Niels de Vos (ndevos@redhat.com) 
------
commit ea6a1ebe931e49464eb17205b94f5c87765cf696
Author: Soumya Koduri <skoduri@redhat.com>
Date:   Fri Jul 8 12:30:25 2016 +0530

    commn-HA: Add portblock RA to tickle packets post failover(/back)
    
    Portblock resource-agents are used to send tickle ACKs so as to
    reset the outstanding tcp connections. This can be used to reduce
    the time taken by the NFS clients to reconnect post IP
    failover/failback.
    
    Two new resource agents (nfs_block and nfs_unblock) of type
    ocf:portblock with action block & unblock are created for each
    Virtual-IP (cluster_ip-1). These resource agents along with cluster_ip-1
    RA are grouped in the order of block->IP->unblock and also the entire
    group maintains same colocation rules so that they reside on the same
    node at any given point of time.
    
    The contents of tickle_dir are of the following format -
    * A file is created for each of the VIPs used in the ganesha cluster.
    * Each of those files contain entries about clients connected
      as below:
    SourceIP:port_num       DestinationIP:port_num
    
    Hence when one server fails over, connections of the clients connected
    to other VIPs are not affected.
    
    Note: During testing I observed that tickle ACKs are sent during
    failback but not during failover, though I/O successfully
    resumed post failover.
    
    Also added a dependency on portblock RA for glusterfs-ganesha package
    as it may not be available (as part of resource-agents package) in
    all the distributions.
    
    Change-Id: Icad6169449535f210d9abe302c2a6971a0a96d6f
    BUG: 1354439
    Signed-off-by: Soumya Koduri <skoduri@redhat.com>
    Reviewed-on: http://review.gluster.org/14878
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Niels de Vos <ndevos@redhat.com>
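
To illustrate the tickle_dir entry format described in the commit message, here is a minimal parsing sketch. It assumes each entry is two whitespace-separated `IP:port` fields on one line, as shown in the commit message; the names `TickleEntry` and `parse_tickle_file` and the sample addresses are hypothetical, and this is not the portblock agent's own code.

```python
from typing import List, NamedTuple, Tuple


class TickleEntry(NamedTuple):
    """One 'SourceIP:port_num  DestinationIP:port_num' line."""
    source_ip: str
    source_port: int
    dest_ip: str
    dest_port: int


def _split_endpoint(endpoint: str) -> Tuple[str, int]:
    # Split on the last colon so the port is always the final component.
    ip, sep, port = endpoint.rpartition(":")
    if not sep or not ip or not port.isdigit():
        raise ValueError(f"malformed endpoint: {endpoint!r}")
    return ip, int(port)


def parse_tickle_file(text: str) -> List[TickleEntry]:
    """Parse the contents of one per-VIP tickle file into entries."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 2:
            raise ValueError(f"expected two endpoints, got: {line!r}")
        source_ip, source_port = _split_endpoint(fields[0])
        dest_ip, dest_port = _split_endpoint(fields[1])
        entries.append(TickleEntry(source_ip, source_port, dest_ip, dest_port))
    return entries


if __name__ == "__main__":
    # Documentation addresses standing in for real endpoints.
    sample = "192.0.2.50:867\t192.0.2.10:2049\n"
    for entry in parse_tickle_file(sample):
        print(entry)
```

Because there is one file per VIP, a failover only needs to read the file of the VIP that moved, which matches the commit message's point that connections to other VIPs are not affected.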

Comment 12 Shyamsundar 2017-03-27 18:24:03 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.9.0, please open a new bug report.

glusterfs-3.9.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2016-November/029281.html
[2] https://www.gluster.org/pipermail/gluster-users/
