Bug 1512120 - Mountpoints remain on compute nodes after cinder volumes deleted.
Summary: Mountpoints remain on compute nodes after cinder volumes deleted.
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
Version: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: async
Target Release: 10.0 (Newton)
Assignee: Eric Harney
QA Contact: Avi Avraham
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-11-10 21:42 UTC by Siggy Sigwald
Modified: 2018-04-17 16:50 UTC
CC List: 22 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-17 16:50:27 UTC




Links:
OpenStack gerrit 383859 (priority: None, status: None, summary: None), last updated 2017-11-13 14:36:30 UTC

Description Siggy Sigwald 2017-11-10 21:42:31 UTC
Description of problem:
We have observed this behaviour on a RHOSP 10 system where we are using NexentaStor5 as the iSCSI and NFS storage backend.
We are running stability tests on the system, in which the following scenario is repeated continuously:
1. Create a volume (ISCSI or NFS) on NexentaStor backend
2. Create a VM
3. Attach the volume to the VM
4. Write-read data into the volume from the VM
5. Detach the volume from the VM
6. Delete the volume
7. Delete the VM

After running these tests, we observe that the mountpoints on the compute nodes are not removed when the volume is deleted. For example, in /var/lib/nova/mnt/ we have around 3000 mountpoints, some of which are stale.
The same is observed with the iSCSI mounts.

We expect a mountpoint to be removed once the volume is detached and deleted. If this does not happen, the number of mountpoints grows over time and becomes a problem.

Version-Release number of selected component (if applicable):
openstack-cinder-9.1.4-9.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a volume (ISCSI or NFS) on NexentaStor backend
2. Create a VM
3. Attach the volume to the VM
4. Write-read data into the volume from the VM
5. Detach the volume from the VM
6. Delete the volume
7. Delete the VM
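
For reference, a minimal scripted sketch of this loop using the standard OpenStack CLI. The volume type, flavor, image and network names below are placeholders for whatever the environment actually uses:

  # Assumes overcloud credentials have been sourced; names are illustrative.
  openstack volume create --size 1 --type nexenta-nfs testvol
  openstack server create --flavor m1.small --image cirros --network private --wait testvm
  openstack server add volume testvm testvol
  # ... write/read data inside the guest ...
  openstack server remove volume testvm testvol
  openstack volume delete testvol
  openstack server delete testvm
  # After each iteration, compare the number of entries under /var/lib/nova/mnt/
  # on the compute node; in the failing case it keeps growing.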

Actual results:

[root@overcloud-compute-0 nova]# ls -la /var/lib/nova/mnt/ | grep -c nova
ls: cannot access /var/lib/nova/mnt/faa5d9dea5474679e52e3f28d696dbe6: Stale file handle
ls: cannot access /var/lib/nova/mnt/8726c28bfaa49d97fa008fbe46b6e2f6: Stale file handle
ls: cannot access /var/lib/nova/mnt/569266f310a72e2b0379e725f75b1410: Stale file handle
ls: cannot access /var/lib/nova/mnt/5c0a19f4aa412ceb658a268e5e692cec: Stale file handle
ls: cannot access /var/lib/nova/mnt/66ca361d24ef9a111d1ad8031d211db7: Stale file handle
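
For reference, a rough way to separate live entries from stale ones on a compute node (a sketch; stat on a stale NFS mountpoint fails with "Stale file handle"):

  mount | grep -c /var/lib/nova/mnt        # total mounted entries under the Nova mount base
  for d in /var/lib/nova/mnt/*; do
      stat "$d" >/dev/null 2>&1 || echo "stale: $d"
  done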

Comment 7 vfalana 2017-11-27 15:04:06 UTC

Hello,

The customer has reached out to me requesting this to be treated with urgency as they have a hard timeline on this issue.

Best regards,
Victor Falana, RHCE
Technical Account Manager
Red Hat Inc.

Comment 8 Gorka Eguileor 2017-11-27 15:18:04 UTC
There's a possibility that this has been fixed in the latest packages; please try with the latest package versions of Nova and Cinder.

Comment 9 Siggy Sigwald 2017-11-27 19:26:41 UTC
The request was relayed to the customer and we're currently waiting for them to upgrade to the latest available versions of those packages.
Thanks, I'll let you know how it goes.

Comment 10 Siggy Sigwald 2017-11-29 00:03:49 UTC
From customer:
We have a fresh RHOSP 10 deployment built with Director last week, and I would like to test the NFS mount fix on that system, but I need to be sure that the required rpm versions are on that system.

Which component / rpm contains the fix for the NFS mount problem?

Here is the nova version from the new system:

openstack-nova-common-14.0.7-11.el7ost.noarch

Comment 11 Alan Bishop 2017-12-07 15:41:58 UTC
While mindful of Matthew's comment #4, where he questions whether the nova fix is relevant, I determined that the change Eric mentioned in comment #2 is available in openstack-nova-common-14.0.9-1.el7ost, which looks like it will be available in OSP-10 z7 (it is not in z6).
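
For reference, a quick check on a compute node to see whether that build is already installed (a sketch; the fix should be present in openstack-nova-common >= 14.0.9-1.el7ost):

  rpm -q openstack-nova-common
  # the version reported in comment #10 (14.0.7-11.el7ost) predates that build, so the fix is not yet present there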

Comment 17 Gorka Eguileor 2017-12-20 18:44:24 UTC
Looking at the iSCSI issues, it seems like the problem is that the backend is detaching the volumes.

We see the volume getting attached:

  Oct 13 15:28:15 overcloud-compute-0 kernel: sd 121:0:0:7: Attached scsi generic sg6 type 0
  Oct 13 15:28:15 overcloud-compute-0 kernel: sd 121:0:0:7: [sdg] 4194304 512-byte logical blocks: (2.14 GB/2.00 GiB)

And a little bit later the volume is detached from the storage array side:

  Oct 13 15:29:27 overcloud-compute-0 kernel: sd 121:0:0:7: Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatical
  Oct 13 15:29:27 overcloud-compute-0 kernel: sd 121:0:0:7: alua: Detached

Which generates an inotify failure:

  Oct 13 15:29:27 overcloud-compute-0 systemd-udevd: inotify_add_watch(7, /dev/sdg, 10) failed: No such file or directory

And then libvirt reports that the device is not present:

  Oct 13 15:29:28 overcloud-compute-0 libvirtd: 2017-10-13 15:29:28.643+0000: 4188: error : qemuMonitorJSONCheckError:389 : internal error: unable to execute QEMU command '__com.redhat_drive_del': Device 'drive-virtio-disk1' not found
  Oct 13 15:29:29 overcloud-compute-0 libvirtd: 2017-10-13 15:29:29.018+0000: 4188: error : qemuMonitorJSONCheckError:389 : internal error: unable to execute QEMU command '__com.redhat_drive_del': Device 'drive-virtio-disk2' not found


I would recommend setting up multipathing on all the nodes (installing and setting up the multipath device mapper, and configuring Cinder and Nova).
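
A minimal sketch of what that could look like, assuming the Newton-era option names; the backend section name is a placeholder:

  # cinder.conf, in the iSCSI backend section (e.g. [nexenta-iscsi]):
  use_multipath_for_image_xfer = True

  # nova.conf on the compute nodes (Newton still uses the iscsi_use_multipath name):
  [libvirt]
  iscsi_use_multipath = True

  # Install and enable the device-mapper multipath service:
  yum install -y device-mapper-multipath
  mpathconf --enable --with_multipathd y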

I would try attaching just one volume to an instance via multipathing to confirm it works, and I would confirm that the multipathing priority groups are automatically set to "alua"; if they are not, I would go into the multipathd configuration file and forcefully set them there.
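
If the priority groups do not come up as "alua" on their own, a device section along these lines could force it (a sketch; the vendor/product strings are placeholders and must match what the array actually reports, see the output of 'multipath -ll'):

  # /etc/multipath.conf (sketch)
  devices {
      device {
          vendor                 "NEXENTA"    # placeholder, verify with multipath -ll
          product                ".*"         # placeholder
          path_grouping_policy   group_by_prio
          prio                   alua
          hardware_handler       "1 alua"
          failback               immediate
          no_path_retry          30
      }
  }
  # Reload with: systemctl reload multipathd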

I believe that should help with the iSCSI case.

For the NFS case we are going to try to reproduce it in house, but it would be good to have new sos reports, taken with the suggested changes applied and with DEBUG log levels, after cleaning the /var/lib/cinder/mnt and /var/lib/nova/mnt directories of stale file handles.
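
A sketch of that preparation on the affected nodes; the lazy unmount should only be run against entries confirmed to be unused by any running instance:

  # Enable DEBUG logging in the [DEFAULT] section of cinder.conf and nova.conf, then restart the services:
  debug = True

  # Lazily unmount stale entries before collecting the new sosreports:
  for d in /var/lib/nova/mnt/* /var/lib/cinder/mnt/*; do
      stat "$d" >/dev/null 2>&1 || umount -l "$d"
  done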

As a side note, I would not recommend going to production with the default values of "nas_secure_file_operations" and "nas_secure_file_permissions".  For more information in this regard refer to https://access.redhat.com/solutions/3009341.
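
For illustration, those options live in the NFS backend section of cinder.conf; the values shown here are placeholders, and the linked article discusses which explicit setting is appropriate:

  # cinder.conf, NFS backend section (e.g. [nexenta-nfs]):
  nas_secure_file_operations = false    # explicit value instead of the 'auto' default; see the linked article
  nas_secure_file_permissions = false   # explicit value instead of the 'auto' default; see the linked article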

Comment 18 Paul Grist 2017-12-21 14:13:38 UTC
Just an update on the repro attempts: so far we have not been able to reproduce this on a different NFS backend; we are now moving on to an older Nexenta we have in house.

Comment 21 Paul Grist 2018-01-12 14:18:25 UTC
We haven't heard any status updates on this one, and we cannot reproduce or debug it in house.

Adding a Nexenta contact from a different issue we are working on to see if there are any known issues for NexentaStor5 NFS - this is OSP-10 which is Newton.

Alexy, based on some other driver updates we are working on with you, I just wanted to see if you were aware of any NFS issues for NexentaStor5 in Newton or had any feedback on this one.  I expect Nexenta support has already been contacted, but I thought it was worth checking here too.

Thanks,
Paul

