Bug 593010 - [NetApp 4.9 bug] DM-Multipath fails to update maps after path state transition on ALUA enabled setups
Summary: [NetApp 4.9 bug] DM-Multipath fails to update maps after path state transition on ALUA enabled setups
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: device-mapper-multipath
Version: 4.9
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: beta
Target Release: 4.9
Assignee: Ben Marzinski
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On: 566685
Blocks:
 
Reported: 2010-05-17 15:31 UTC by Martin George
Modified: 2011-01-05 07:16 UTC
CC List: 18 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When dm-multipath is used on a storage device that implements ALUA, and group-by-prio is enabled, then the path groups are established when the device is configured. The paths with the same priority are grouped together, the group priority is calculated as the sum of the path priorities, and the path group with the highest sum is selected for I/O. If a path's priority changes, the group priority is re-calculated, and the active path group may change. The path grouping is not changed, even though some members of the group may now have different priorities. If you wish to re-establish the path grouping after a change, then you must enter the command multipathd -k"reconfigure"
Clone Of: 566685
Environment:
Last Closed: 2010-07-22 22:36:34 UTC
Target Upstream Version:


Attachments: Multipath.conf for the above scenario (id=395088)

Description Martin George 2010-05-17 15:31:09 UTC
+++ This bug was initially created as a clone of Bug #566685 +++

Description of problem:
On ALUA enabled setups, dm-multipath fails to update its maps after a path state transition (i.e. when an active/optimized path transitions to active/non-optimized, or vice versa).

Version-Release number of selected component (if applicable):
RHEL 5.4 GA (2.6.18-164.el5)
device-mapper-multipath-0.4.7-30.el5
iscsi-initiator-utils-6.2.0.871-0.10.el5 
ALUA settings are used in multipath.conf: the ALUA priority callout (/sbin/mpath_prio_alua) is used with group_by_prio enabled, along with the ALUA hardware handler (as per bug 562080).
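
For illustration, the relevant settings would look roughly like the device stanza below. This is a minimal sketch assuming RHEL 5 device-mapper-multipath-0.4.7 syntax; the reporter's actual multipath.conf is attached at the end of this description and may differ:

devices {
        device {
                vendor                  "NETAPP"
                product                 "LUN"
                path_grouping_policy    group_by_prio
                prio_callout            "/sbin/mpath_prio_alua /dev/%n"
                hardware_handler        "1 alua"
                features                "1 queue_if_no_path"
                failback                immediate
        }
}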

How reproducible:
Always

Steps to Reproduce:
1. Map an iSCSI lun (with ALUA enabled) to a RHEL 5.4 host. In this case, I have 1 active/optimized path + 4 active/non-optimized paths to the lun. Configure dm-multipath on it as follows:

# multipath -ll
mpath1 (360a98000572d42746b4a555039386553) dm-3 NETAPP,LUN
[size=2.0G][features=1 queue_if_no_path][hwhandler=1 alua][rw]
\_ round-robin 0 [prio=50][enabled]
 \_ 11:0:0:1 sdk 8:160 [active][ready]
\_ round-robin 0 [prio=40][enabled]
 \_ 7:0:0:1  sdg 8:96  [active][ready]
 \_ 8:0:0:1  sdh 8:112 [active][ready]
 \_ 9:0:0:1  sdi 8:128 [active][ready]
 \_ 10:0:0:1 sdj 8:144 [active][ready]

The individual path priority weights & RTPGs are as follows:
# /sbin/mpath_prio_alua -v /dev/sdk
Target port groups are implicitly supported.
Reported target port group is 4 [active/optimized]
50
# /sbin/mpath_prio_alua -v /dev/sdg
Target port groups are implicitly supported.
Reported target port group is 2 [active/non-optimized]
10
# /sbin/mpath_prio_alua -v /dev/sdh
Target port groups are implicitly supported.
Reported target port group is 1 [active/non-optimized]
10
# /sbin/mpath_prio_alua -v /dev/sdi
Target port groups are implicitly supported.
Reported target port group is 3 [active/non-optimized]
10
# /sbin/mpath_prio_alua -v /dev/sdj
Target port groups are implicitly supported.
Reported target port group is 1 [active/non-optimized]
10 
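(These per-path values match the group priorities in the multipath -ll output above: with group_by_prio, a group's priority is the sum of its member path priorities, so the single active/optimized path gives prio=50 and the four active/non-optimized paths give 4 x 10 = 40.)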

2. Now run I/O on the above multipath device. The iostat output shows:

Before path state transition:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
         48.25    0.00   51.75    0.00    0.00    0.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdg               0.00         0.00         0.00          0          0
sdi               0.00         0.00         0.00          0          0
sdh               0.00         0.00         0.00          0          0
sdk             614.50         0.00     17184.00          0      34368
sdj               0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
         28.32    0.00   43.86   17.29    0.00   10.53

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdg               0.00         0.00         0.00          0          0
sdi               0.00         0.00         0.00          0          0
sdh               0.00         0.00         0.00          0          0
sdk            5826.37         0.00    204573.13          0     411192
sdj               0.00         0.00         0.00          0          0

I/O is running fine up to this point.

3. Now trigger a path state transition on the target storage array. In this case, the active/optimized path transitions to RTPG 2 (i.e. sdg), and the original active/optimized path in RTPG 4 (i.e. sdk) transitions to an active/non-optimized path, as shown below:

# /sbin/mpath_prio_alua -v /dev/sdk
Target port groups are implicitly supported.
Reported target port group is 4 [active/non-optimized]
10
# /sbin/mpath_prio_alua -v /dev/sdg
Target port groups are implicitly supported.
Reported target port group is 2 [active/optimized]
50
# /sbin/mpath_prio_alua -v /dev/sdh
Target port groups are implicitly supported.
Reported target port group is 1 [active/non-optimized]
10
# /sbin/mpath_prio_alua -v /dev/sdi
Target port groups are implicitly supported.
Reported target port group is 3 [active/non-optimized]
10
# /sbin/mpath_prio_alua -v /dev/sdj
Target port groups are implicitly supported.
Reported target port group is 1 [active/non-optimized] 
10

But the iostat output now shows:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          3.99    0.00   12.22   83.29    0.00    0.50

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdg             725.50         0.00     23552.00          0      47104
sdi             736.00         0.00     25656.00          0      51312
sdh             639.50         0.00     23552.00          0      47104
sdk               0.00         0.00         0.00          0          0
sdj             752.00         0.00     26992.00          0      53984

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
         31.58    0.00   40.60   27.57    0.00    0.25

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdg             619.00         0.00     22016.00          0      44032
sdi             702.50         0.00     22016.00          0      44032
sdh             672.50         0.00     22016.00          0      44032
sdk               0.00         0.00         0.00          0          0
sdj             694.00         0.00     22168.00          0      44336 

And multipath -ll now shows:

# multipath -ll
mpath1 (360a98000572d42746b4a555039386553) dm-3 NETAPP,LUN
[size=2.0G][features=1 queue_if_no_path][hwhandler=1 alua][rw]
\_ round-robin 0 [prio=10][enabled]
 \_ 11:0:0:1 sdk 8:160 [active][ready]
\_ round-robin 0 [prio=80][active]
 \_ 7:0:0:1  sdg 8:96  [active][ready]
 \_ 8:0:0:1  sdh 8:112 [active][ready]
 \_ 9:0:0:1  sdi 8:128 [active][ready]
 \_ 10:0:0:1 sdj 8:144 [active][ready]

Clearly the multipath path groups are now stale. I/O is running through all the underlying devices of the 2nd path group (sdg, sdh, sdi & sdj), whereas it should actually be running on sdg alone, since that is now the only active/optimized path available.
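
(Note that the group priorities themselves were re-calculated correctly: the 2nd group now reports prio=80 because sdg's path priority rose to 50 while sdh, sdi & sdj stayed at 10 each, and 50 + 10 + 10 + 10 = 80; likewise the 1st group dropped to prio=10. It is only the grouping of paths that was never re-established, leaving the new optimized path sdg lumped together with three non-optimized ones.)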
  
Actual results:
I/O is running on all underlying paths of the 2nd path group after the path state transition.

Expected results:
I/O should be running on the active/optimized path alone after the path state transition.

Additional info:
Restarting the multipathd daemon or running multipathd -k"reconfigure" properly reconfigures the multipath maps, but this should be handled automatically by dm-multipath.
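
For reference, after the workaround the paths should regroup by their new priorities. An illustrative sketch of the expected result (not captured from the reporter's setup; device names and priorities reused from above):

# multipathd -k"reconfigure"
# multipath -ll
mpath1 (360a98000572d42746b4a555039386553) dm-3 NETAPP,LUN
[size=2.0G][features=1 queue_if_no_path][hwhandler=1 alua][rw]
\_ round-robin 0 [prio=50][active]
 \_ 7:0:0:1  sdg 8:96  [active][ready]
\_ round-robin 0 [prio=40][enabled]
 \_ 11:0:0:1 sdk 8:160 [active][ready]
 \_ 8:0:0:1  sdh 8:112 [active][ready]
 \_ 9:0:0:1  sdi 8:128 [active][ready]
 \_ 10:0:0:1 sdj 8:144 [active][ready]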

--- Additional comment from marting@netapp.com on 2010-02-19 07:27:11 EST ---

Created an attachment (id=395088)
Multipath.conf for the above scenario

Comment 1 Martin George 2010-05-17 15:31:35 UTC
Tracking this for RHEL 4.9.

Comment 2 Ben Marzinski 2010-05-25 23:32:54 UTC
I'm not sure how much more work this is to do in RHEL4 than RHEL5, but I'll take a look.

Comment 4 Ben Marzinski 2010-07-16 21:54:40 UTC
There's significantly more work necessary to get this working properly in RHEL 4 than in RHEL 5.

Comment 5 Tom Coughlan 2010-07-22 22:36:34 UTC
This BZ requests a significant change to the established RHEL 4 behavior. Currently in RHEL 4, the path groups are established when the device is configured, and they remain unchanged until an explicit reconfiguration is done. If we make this change, then path groups will change dynamically when the storage configuration changes. Although it is true that this is generally desirable, it is a change in behavior that may come as a surprise to existing users. This sort of change is not appropriate at this very advanced stage in the life of RHEL 4. (We are, by the way, still planning to deal with this in RHEL5.)  

The right way to handle this in RHEL 4 is to document the workaround (multipathd -k"reconfigure") in the Release Notes.

Comment 6 Tom Coughlan 2010-07-22 22:53:50 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Proposed RHEL 4.9 Release Note:

(Ben, please review.) 

When dm-multipath is used on a storage device that implements ALUA, and group-by-prio is enabled, then the path groups are established when the device is configured. The paths with the same priority are grouped together, the group priority is calculated as the sum of the path priorities, and the path group with the highest sum is selected for I/O. If a path's priority changes, the group priority is re-calculated, and the active path group may change. The path grouping is not changed, eventhough some members of the group may now have different priorities. If you wish to re-establish the path grouping after a change, then you must enter the command   

multipathd -k"reconfigure"

Comment 8 Ryan Lerch 2011-01-05 07:15:43 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,7 +1,3 @@
-Proposed RHEL 4.9 Release Note:
-
-(Ben, please review.) 
-
-When dm-multipath is used on a storage device that implements ALUA, and group-by-prio is enabled, then the path groups are established when the device is configured. The paths with the same priority are grouped together, the group priority is calculated as the sum of the path priorities, and the path group with the highest sum is selected for I/O. If a path's priority changes, the group priority is re-calculated, and the active path group may change. The path grouping is not changed, eventhough some members of the group may now have different priorities. If you wish to re-establish the path grouping after a change, then you must enter the command   
+When dm-multipath is used on a storage device that implements ALUA, and group-by-prio is enabled, then the path groups are established when the device is configured. The paths with the same priority are grouped together, the group priority is calculated as the sum of the path priorities, and the path group with the highest sum is selected for I/O. If a path's priority changes, the group priority is re-calculated, and the active path group may change. The path grouping is not changed, even though some members of the group may now have different priorities. If you wish to re-establish the path grouping after a change, then you must enter the command   
 
 multipathd -k"reconfigure"

