Bug 1520008 - [downstream clone - 4.3.0] VDSM should recover from stale NFS storage domain
Summary: [downstream clone - 4.3.0] VDSM should recover from stale NFS storage domain
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.3.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ovirt-4.3.1
Target Release: 4.3.0
Assignee: Nir Soffer
QA Contact: Elad
URL:
Whiteboard:
Depends On: 1165632
Blocks:
 
Reported: 2017-12-01 22:32 UTC by rhev-integ
Modified: 2019-02-19 11:31 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1165632
Environment:
Last Closed: 2019-02-18 09:34:58 UTC
oVirt Team: Storage


Attachments: none

Description rhev-integ 2017-12-01 22:32:04 UTC
+++ This bug is an upstream to downstream clone. The original bug is: +++
+++   bug 1165632 +++
======================================================================

Description of problem:

Activation of a NFS export domain fails due to a stale file handle :

Thread-4374511::DEBUG::2014-11-19 06:46:10,161::BindingXMLRPC::177::vds::(wrapper) client [10.33.20.2] flowID [2e9c1f6]
Thread-4374511::DEBUG::2014-11-19 06:46:10,161::task::579::TaskManager.Task::(_updateState) Task=`f343a7bf-35a4-4a9b-b99c-028a37910b69`::moving from state init -> state preparing
Thread-4374511::INFO::2014-11-19 06:46:10,162::logUtils::44::dispatcher::(wrapper) Run and protect: activateStorageDomain(sdUUID='e5d713a1-1c28-46ea-b859-27db25929b1a', spUUID='dab6c34c-51a9-4e02-92de-4489a307ce17', options=None)
[..]
Thread-4374511::ERROR::2014-11-19 06:46:10,172::sdc::143::Storage.StorageDomainCache::(_findDomain) domain e5d713a1-1c28-46ea-b859-27db25929b1a not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/nfsSD.py", line 132, in findDomain
    return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomainPath
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('e5d713a1-1c28-46ea-b859-27db25929b1a',)
Thread-4374511::ERROR::2014-11-19 06:46:10,172::task::850::TaskManager.Task::(_setError) Task=`f343a7bf-35a4-4a9b-b99c-028a37910b69`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1242, in activateStorageDomain
    pool.activateSD(sdUUID)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1108, in activateSD
    dom = sdCache.produce(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
    domain.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/nfsSD.py", line 132, in findDomain
    return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomainPath
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('e5d713a1-1c28-46ea-b859-27db25929b1a',)

root@spm # ll /rhev/data-center/mnt/10.33.20.152:_mnt_export
ls: cannot access /rhev/data-center/mnt/10.33.20.152:_mnt_export: Stale file handle
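
For context, a stale NFS handle surfaces to a Python process as OSError with errno.ESTALE, which is presumably why findDomainPath never sees the domain directory and reports StorageDomainDoesNotExist instead of the underlying error. A minimal illustration (not VDSM code; the path is taken from this report):

import errno
import os

# Listing a mount point whose NFS file handle has gone stale raises
# OSError(ESTALE) rather than "file not found".
try:
    os.listdir("/rhev/data-center/mnt/10.33.20.152:_mnt_export")
except OSError as e:
    if e.errno == errno.ESTALE:
        print("Stale file handle - the mount must be refreshed")
    else:
        raise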

Version-Release number of selected component (if applicable):
vdsm-4.13.2-0.9.el6ev.x86_64
vdsm-python-4.13.2-0.9.el6ev.x86_64
vdsm-xmlrpc-4.13.2-0.9.el6ev.noarch
vdsm-cli-4.13.2-0.9.el6ev.noarch

How reproducible:
Always.

Steps to Reproduce:
1. Cause the NFS mount to become stale for a deactivated NFS domain (one possible way is sketched below).
2. Attempt to reactivate the domain.
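
One possible way to trigger the stale handle is to recreate the exported directory on the NFS server while the client still has it mounted; the exact behaviour depends on the server's export configuration, and the paths below are only examples from this report:

# Run on the NFS *server* (10.33.20.152 in this report). Recreating the
# exported directory gives it a new inode, so file handles cached by
# clients that still have it mounted become stale.
import os
import shutil
import subprocess

EXPORT_DIR = "/mnt/export"

shutil.rmtree(EXPORT_DIR)                     # remove the exported tree
os.makedirs(EXPORT_DIR)                       # recreate it with a new inode
subprocess.check_call(["exportfs", "-ra"])    # re-export the (new) directory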

Actual results:
Domain activation fails due to the stale file handle.

Expected results:
VDSM attempts to remount the domain, allowing activation to continue.
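
A minimal sketch of that recovery flow, assuming a hypothetical helper outside of VDSM (ensure_mount_usable and its arguments are illustrative, not actual VDSM API):

import errno
import os
import subprocess

def ensure_mount_usable(server_export, mount_point):
    """Remount an NFS mount point whose file handle has gone stale.

    Hypothetical helper illustrating the requested behaviour; not part
    of the actual VDSM code base.
    """
    try:
        os.stat(mount_point)
        return                      # mount is healthy, nothing to do
    except OSError as e:
        if e.errno != errno.ESTALE:
            raise                   # unrelated failure, let the caller decide

    # Stale file handle: drop the old mount and mount the export again,
    # mirroring the manual workaround shown under "Additional info" below.
    subprocess.check_call(["umount", mount_point])
    subprocess.check_call(["mount", "-t", "nfs", server_export, mount_point])

# Values from this report:
# ensure_mount_usable("10.33.20.152:/mnt/export",
#                     "/rhev/data-center/mnt/10.33.20.152:_mnt_export")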

Additional info:
Obviously, manually remounting the domain works around the issue:

# umount /rhev/data-center/mnt/10.33.20.152:_mnt_export
# mount 10.33.20.152:/mnt/export /rhev/data-center/mnt/10.33.20.152:_mnt_export

(Originally by Lee Yarwood)

Comment 5 rhev-integ 2017-12-01 22:32:23 UTC
Tentatively targeting 3.5.1 until we have an RCA. Once that's achieved, we can retarget.

(Originally by Allon Mureinik)

Comment 6 rhev-integ 2017-12-01 22:32:30 UTC
Removing from 3.6.0 since this doesn't seem urgent.
Allon, what is the plan to fix this issue? It seems like a logical failure.

(Originally by ylavi)

Comment 7 rhev-integ 2017-12-01 22:32:36 UTC
(In reply to Yaniv Dary from comment #5)
> Removing from 3.6.0 since this doesn't seem urgent.
> Allon, what is the plan to fix this issue? It seems like a logical failure.
No RCA, no plan.
Once we have one, we'll have the other too.

(Originally by Allon Mureinik)

Comment 10 rhev-integ 2017-12-01 22:32:55 UTC
Seems like a bug more than an RFE. Changing to reflect that.

(Originally by ylavi)

Comment 11 rhev-integ 2017-12-01 22:33:00 UTC
*** Bug 1411795 has been marked as a duplicate of this bug. ***

(Originally by Yaniv Kaul)

Comment 14 Sandro Bonazzola 2019-01-28 09:43:40 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 16 Martin Tessun 2019-02-18 09:34:58 UTC
Resolving the stale file handle should be done by the admin, who should check what the source of the issue is; it should not be done automatically by Vdsm.

