Bug 227409 - no_path_retry = # acts like no_path_retry = queue
Summary: no_path_retry = # acts like no_path_retry = queue
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: device-mapper-multipath
Version: 4.0
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Ben Marzinski
QA Contact: Corey Marthaler
Depends On:
Reported: 2007-02-05 20:47 UTC by Jonathan Earl Brassow
Modified: 2010-01-12 02:28 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2007-04-27 16:17:09 UTC
Target Upstream Version:


Description Jonathan Earl Brassow 2007-02-05 20:47:40 UTC
While testing mirroring on top of MP, we noticed that the I/Os were queued indefinitely by MP when we 
failed all the paths (took out the device).

For those who understand the MP device-mapper table: if the problem is in userspace, you should be 
able to tell from the table.  Unfortunately, I don't have those table lines to add to this bugzilla.

BTW, this was with device-mapper-multipath-0.4.5-20.RHEL4.x86_64.rpm

Comment 1 Ben Marzinski 2007-02-06 01:15:36 UTC
Do you know if the table ever changed from something like:

mpath10: 0 102400000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 8:48 1000 

to something like:

mpath10: 0 102400000 multipath 0 0 1 1 round-robin 0 1 1 8:48 1000 
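The difference is in the feature field right after "multipath": "1 queue_if_no_path" while queueing is active, "0" once it has been disabled. A quick way to check, sketched here against the sample table lines above (on a live system the input would come from `dmsetup table <map>` instead of the hard-coded strings):

```shell
# Simulated map table lines, copied from the examples above; on a real
# system, pipe `dmsetup table <map>` into the grep instead.
queued="0 102400000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 8:48 1000"
drained="0 102400000 multipath 0 0 1 1 round-robin 0 1 1 8:48 1000"

# grep -q exits 0 when the feature flag is present in the table line
echo "$queued"  | grep -q queue_if_no_path && echo "still queueing"
echo "$drained" | grep -q queue_if_no_path || echo "queueing disabled"
```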

When a device is created with

no_path_retry <some number>

you will initially see "1 queue_if_no_path" in the table.  Once the device has
been failed for more than <no_path_retry> * <polling_interval> seconds, the "1
queue_if_no_path" should be replaced by "0" in the table. This works correctly
on my setup. When I interactively watch the paths being checked with

# multipathd -k
> show paths
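For reference, a numeric no_path_retry is set in /etc/multipath.conf alongside the polling interval that scales it. This is only an illustrative sketch, not the reporter's actual configuration:

```
# /etc/multipath.conf -- illustrative values, not taken from this report
defaults {
        polling_interval 5      # seconds between path checker runs
        no_path_retry    10     # checker cycles to keep queueing after all paths fail
}
# With these values, queued IO should start failing roughly
# no_path_retry * polling_interval = 50 seconds after the last path drops.
```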

There appears to be a display error (it doesn't show the correct
polling_interval) on the path with queued IO, but the mechanism functions
correctly. After the correct amount of time, I get a message from multipathd
that says

Feb  5 15:18:21 cypher-05 multipathd: mpath13: Disable queueing

and dmsetup table shows the correct value.

Aside from the table, you can run multipath -v3. There should be a line like:

no_path_retry = 10 (config file default)

for each device. You can see this even if the devices are already created when
you run the command. You can also run multipathd with

# multipathd -v6

This should display lines that say

<map_name>: Retrying.. No active path

and then

<map_name>: Disable queueing

once the device finally fails the IO.  Unfortunately, these print statements
don't tell you how many retries you have left.

Comment 2 Jonathan Earl Brassow 2007-02-09 14:40:39 UTC
I didn't originally understand that it was no_path_retry * polling_interval.  
Our values are 5 and 30 respectively.  After waiting 5 minutes, the paths 
never returned errors on the I/O.
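With those values the expected cutoff works out to well under the five minutes waited, which is what makes this look like a bug:

```shell
# Reporter's values from this comment: no_path_retry=5, polling_interval=30
no_path_retry=5
polling_interval=30

# Queueing should be disabled after no_path_retry * polling_interval seconds
timeout=$(( no_path_retry * polling_interval ))
echo "expected queueing cutoff: ${timeout}s"   # 150s, i.e. 2.5 minutes
```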

[root@clx12ah01 ~]# rpm -q kernel device-mapper-multipath device-mapper udev

Comment 3 Jonathan Earl Brassow 2007-02-09 14:45:38 UTC

[root@clx12ah01 ~]# rpm -q kernel-smp device-mapper-multipath device-mapper 

Comment 4 Jonathan Earl Brassow 2007-04-18 19:56:21 UTC
I am hitting this now and it is causing pain for HA LVM.

I am able to reproduce at will.

Comment 7 Jonathan Earl Brassow 2007-04-27 16:17:09 UTC
multipathd was not running.  As a result, the device-mapper mapping table was
not being properly updated.
