Bug 1362215 - unable to have multiple pods access an RBD volume
Summary: unable to have multiple pods access an RBD volume
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.5.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Bradley Childs
QA Contact: Liming Zhou
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-08-01 14:52 UTC by Qian Cai
Modified: 2017-03-20 02:49 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-20 02:49:40 UTC


Attachments
kubelet.log with log level 5 (deleted)
2016-08-02 14:12 UTC, Qian Cai
no flags

Description Qian Cai 2016-08-01 14:52:36 UTC
Description of problem:
For example, if a pod is started directly with an RBD volume, even in read-only mode, a second pod that uses the same RBD volume directly will be stuck in Pending status forever.

$ cat cephrbd-pod-direct.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: cephrbd-pod-direct-rhel
spec:
  nodeSelector:
    os: rhel
  containers:
    - image: fedora/nginx
      name: cephrbd-direct-rhel
      volumeMounts:
        - name: cephrbd-vol-direct
          mountPath: /mnt/cephrbd-direct
  volumes:
    - name: cephrbd-vol-direct
      rbd:
         monitors: ['xx.xx.xx.xx:6789']
         pool: rbd
         image: foo
         user: rbd
         secretRef:
           name: "cephrbd-secret"
         keyring: ''
         fsType: xfs
         readOnly: true

$ cat cephrbd-pod-direct-2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: cephrbd-pod-direct-2-rhel
spec:
  nodeSelector:
    os: rhel
  containers:
    - image: fedora/nginx
      name: cephrbd-direct-2-rhel
      volumeMounts:
        - name: cephrbd-vol-direct-2
          mountPath: /mnt/cephrbd-direct-2
  volumes:
    - name: cephrbd-vol-direct-2
      rbd:
         monitors: ['xx.xx.xx.xx:6789']
         pool: rbd
         image: foo
         user: rbd
         secretRef:
           name: "cephrbd-secret"
         keyring: ''
         fsType: xfs
         readOnly: true

$ watch kubectl get pods
Every 2.0s: kubectl get pods                            Mon Aug  1 10:51:17 2016

NAME                        READY     STATUS    RESTARTS   AGE
cephrbd-pod-direct-2-rhel   0/1       Pending   0          3m
cephrbd-pod-direct-rhel     1/1       Running   0          4m

Version-Release number of selected component (if applicable):
kubernetes-1.2.0-0.12.gita4463d9.el7.x86_64
docker-1.10.3-44.el7.x86_64

How reproducible:
always

Comment 2 Jan Safranek 2016-08-02 07:10:38 UTC
I am not sure RBD supports attaching one volume to multiple nodes; it's a pretty advanced feature.

Can you please post the output of kubectl describe pods, and kubelet.log with log level 5 from the nodes where both pods are scheduled?
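
For reference, something along these lines should collect the requested information (this assumes a systemd-managed kubelet; the unit name and where the kubelet arguments live may differ on your nodes). Raise the kubelet verbosity to 5 (--v=5), restart it, and then:

$ kubectl describe pod cephrbd-pod-direct-rhel cephrbd-pod-direct-2-rhel
$ journalctl -u kubelet > kubelet.log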

Comment 3 hchen 2016-08-02 13:25:04 UTC
For RBD, we only allow one read-write mount.

Comment 4 Qian Cai 2016-08-02 13:59:49 UTC
$ kubectl describe pod cephrbd-pod-direct-rhel
Name:		cephrbd-pod-direct-rhel
Namespace:	default
Node:		rhel-k8s-storage-node.os1.phx2.redhat.com/10.3.10.21
Start Time:	Mon, 01 Aug 2016 10:47:03 -0400
Labels:		<none>
Status:		Running
IP:		172.17.0.2
Controllers:	<none>
Containers:
  cephrbd-direct-rhel:
    Container ID:	docker://8b5d1e00ef52e72ea543e86d1b9e0f203d248f413a02e81901b9d3e99b722848
    Image:		fedora/nginx
    Image ID:		docker://sha256:ff0f232bb1e3236f6bd36564baf14bf726d48677edea569440c868316a528d9d
    Port:		
    QoS Tier:
      cpu:		BestEffort
      memory:		BestEffort
    State:		Running
      Started:		Mon, 01 Aug 2016 10:47:29 -0400
    Ready:		True
    Restart Count:	0
    Environment Variables:
Conditions:
  Type		Status
  Ready 	True 
Volumes:
  cephrbd-vol-direct:
    Type:		RBD (a Rados Block Device mount on the host that shares a pod's lifetime)
    CephMonitors:	[xx.xx.xx.xx:6789]
    RBDImage:		foo
    FSType:		xfs
    RBDPool:		rbd
    RadosUser:		rbd
    Keyring:		
    SecretRef:		&{cephrbd-secret}
    ReadOnly:		true
No events.

$ kubectl describe pod cephrbd-pod-direct-2-rhel
Name:		cephrbd-pod-direct-2-rhel
Namespace:	default
Node:		/
Labels:		<none>
Status:		Pending
IP:		
Controllers:	<none>
Containers:
  cephrbd-direct-2-rhel:
    Image:	fedora/nginx
    Port:	
    QoS Tier:
      cpu:	BestEffort
      memory:	BestEffort
    Environment Variables:
Volumes:
  cephrbd-vol-direct-2:
    Type:		RBD (a Rados Block Device mount on the host that shares a pod's lifetime)
    CephMonitors:	[10.18.41.160:6789]
    RBDImage:		foo
    FSType:		xfs
    RBDPool:		rbd
    RadosUser:		rbd
    Keyring:		
    SecretRef:		&{cephrbd-secret}
    ReadOnly:		true
Events:
  FirstSeen	LastSeen	Count	From			SubobjectPath	Type		Reason			Message
  ---------	--------	-----	----			-------------	--------	------			-------
  23h		3m		1141	{default-scheduler }			Warning		FailedScheduling	pod (cephrbd-pod-direct-2-rhel) failed to fit in any node
fit failure on node (rhel-k8s-storage-node.os1.phx2.redhat.com): NoDiskConflict
fit failure on node (atomic-k8s-storage-node.os1.phx2.redhat.com): MatchNodeSelector

  23h	51s	3605	{default-scheduler }		Warning	FailedScheduling	pod (cephrbd-pod-direct-2-rhel) failed to fit in any node
fit failure on node (atomic-k8s-storage-node.os1.phx2.redhat.com): MatchNodeSelector
fit failure on node (rhel-k8s-storage-node.os1.phx2.redhat.com): NoDiskConflict

Comment 5 Qian Cai 2016-08-02 14:08:01 UTC
Here are the kubelet journals with log level 5. It just keeps looping on the entries below.

Aug 02 10:07:20 rhel-k8s-storage-node.os1.phx2.redhat.com kubelet[18896]: I0802 10:07:20.401654   18896 generic.go:182] GenericPLEG: Relisting
Aug 02 10:07:20 rhel-k8s-storage-node.os1.phx2.redhat.com kubelet[18896]: I0802 10:07:20.512179   18896 kubelet.go:2465] SyncLoop (housekeeping)
Aug 02 10:07:20 rhel-k8s-storage-node.os1.phx2.redhat.com kubelet[18896]: I0802 10:07:20.519862   18896 volumes.go:234] Making a volume.Cleaner for volume kubernetes.io~rbd/cephrbd-vol-direct of pod 150b8956-58ba-11e6-9506-fa163e07a2df
Aug 02 10:07:20 rhel-k8s-storage-node.os1.phx2.redhat.com kubelet[18896]: I0802 10:07:20.519940   18896 volumes.go:316] Used volume plugin "kubernetes.io/rbd" to unmount 150b8956-58ba-11e6-9506-fa163e07a2df/kubernetes.io~rbd
Aug 02 10:07:21 rhel-k8s-storage-node.os1.phx2.redhat.com kubelet[18896]: I0802 10:07:21.405584   18896 generic.go:182] GenericPLEG: Relisting
Aug 02 10:07:22 rhel-k8s-storage-node.os1.phx2.redhat.com kubelet[18896]: I0802 10:07:22.408992   18896 generic.go:182] GenericPLEG: Relisting
Aug 02 10:07:22 rhel-k8s-storage-node.os1.phx2.redhat.com kubelet[18896]: I0802 10:07:22.512125   18896 kubelet.go:2465] SyncLoop (housekeeping)
Aug 02 10:07:22 rhel-k8s-storage-node.os1.phx2.redhat.com kubelet[18896]: I0802 10:07:22.522141   18896 volumes.go:234] Making a volume.Cleaner for volume kubernetes.io~rbd/cephrbd-vol-direct of pod 150b8956-58ba-11e6-9506-fa163e07a2df
Aug 02 10:07:22 rhel-k8s-storage-node.os1.phx2.redhat.com kubelet[18896]: I0802 10:07:22.522355   18896 volumes.go:316] Used volume plugin "kubernetes.io/rbd" to unmount 150b8956-58ba-11e6-9506-fa163e07a2df/kubernetes.io~rbd
Aug 02 10:07:23 rhel-k8s-storage-node.os1.phx2.redhat.com kubelet[18896]: I0802 10:07:23.414140   18896 generic.go:182] GenericPLEG: Relisting

Comment 6 Qian Cai 2016-08-02 14:12:49 UTC
Created attachment 1186849 [details]
kubelet.log with log level 5

Comment 7 Qian Cai 2016-08-02 18:15:08 UTC
(In reply to hchen from comment #3)
> For RBD, we only allow one read-write mount.
Well, all of the mounts above are read-only:
readOnly: true

Comment 8 hchen 2016-08-02 18:25:06 UTC
The fix has been proposed upstream:
https://github.com/kubernetes/kubernetes/pull/29622
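
Roughly, the idea is to make the scheduler's NoDiskConflict check aware of read-only RBD mounts. Below is a simplified sketch of that kind of check, not the actual PR diff; the type and function names are stand-ins for the real Kubernetes objects.

package main

import "fmt"

// rbdRef holds the fields that identify an RBD mount for conflict checking
// (a stand-in for the real RBD volume source).
type rbdRef struct {
	Monitors string
	Pool     string
	Image    string
	ReadOnly bool
}

// rbdConflict reports whether two pods' RBD references collide on a node.
// A conflict is only reported when the same image is shared and at least
// one of the two mounts is writable.
func rbdConflict(a, b rbdRef) bool {
	sameImage := a.Monitors == b.Monitors && a.Pool == b.Pool && a.Image == b.Image
	if !sameImage {
		return false
	}
	// Without the ReadOnly comparison, any shared image is a conflict, which
	// is what keeps the second readOnly pod stuck in Pending.
	return !(a.ReadOnly && b.ReadOnly)
}

func main() {
	v1 := rbdRef{Monitors: "xx.xx.xx.xx:6789", Pool: "rbd", Image: "foo", ReadOnly: true}
	v2 := v1 // the second pod mounts the same image read-only
	fmt.Println("conflict:", rbdConflict(v1, v2)) // prints: conflict: false
}

With a check like this, the second read-only pod from the description would pass NoDiskConflict instead of staying Pending.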

Comment 9 Qian Cai 2017-01-26 14:47:13 UTC
This one is now merged.

Comment 11 Liming Zhou 2017-03-20 02:49:40 UTC
Cannot reproduce the issue with the following test environment:
OCP 3.5.0.54
RHEL: 7.2.7
Storage: Ceph RBD
Containerized installation.
The two pods created during the test both run well with read-only access to the RBD volume. The bug can be closed.

