Deploying a Postgres database with CSI volumes
Hello, I am trying to create a Postgres database in its own namespace and attach a PersistentVolume to it.
I created the cluster with LKE, so the CSI driver is already installed.
The secret postgres-credentials has also been created.
This is my YAML file for the database:
# Persistent Volume Claim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: postgres
  name: postgres-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: linode-block-storage-retain
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: postgres
  name: postgres-deployment
spec:
  selector:
    matchLabels:
      app: postgres-container
  template:
    metadata:
      labels:
        app: postgres-container
    spec:
      containers:
        - name: postgres-container
          image: postgres:9.6.6
          env:
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: POSTGRES_USER
            - name: POSTGRES_DB
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: POSTGRES_DB
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: POSTGRES_PASSWORD
          ports:
            - containerPort: 5432
          volumeMounts:
            - mountPath: /var/lib/postgresql/data
              name: postgres-volume-mount
      volumes:
        - name: postgres-volume-mount
          persistentVolumeClaim:
            claimName: postgres-pvc
---
# Service
apiVersion: v1
kind: Service
metadata:
  namespace: postgres
  name: postgres-service
spec:
  selector:
    app: postgres-container
  ports:
    - port: 5432
      protocol: TCP
      targetPort: 5432
  type: NodePort
When I go to my Linode cloud dashboard, I see the volume is created and everything seems fine.
These are some events from the postgres pod showing the error:
MountVolume.MountDevice failed for volume "pvc-aa9e0765c2c74cb7" : rpc error: code = Internal desc = Unable to find device path out of attempted paths: [/dev/disk/by-id/linode-pvcaa9e0765c2c74cb7 /dev/disk/by-id/scsi-0Linode_Volume_pvcaa9e0765c2c74cb7]
Unable to attach or mount volumes: unmounted volumes=[postgres-volume-mount], unattached volumes=[postgres-volume-mount default-token-mgvtv]: timed out waiting for the condition
17 Replies
Hey there -
This is a tough one, but I've come across this situation before, so I want to give you a couple of things to look into.
One thing to look for is to make sure that your Volume is only being mounted on a single container. If there are multiple containers attempting to mount it, the job would fail.
I've also seen this happen as a result of a syntax error. My recommendation is to go through your manifest to make sure everything is formatted correctly (no extra spaces, tabs, or anything like that).
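If it helps, a quick way to catch formatting problems is to have the API server validate the manifest without creating anything (the file name here is just an example):

# validates the whole file server-side and reports syntax/field errors without applying it
kubectl apply --dry-run=server -f postgres.yaml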
I hope this helps!
Did you find a resolution to this? I'm having a similar problem at the moment:
AttachVolume.Attach succeeded for volume "pvc-a1b6aa5..."
MountVolume.MountDevice failed for volume "pvc-a1b6aa5..." : rpc error: code = Internal desc = Unable to find device path out of attempted paths: [/dev/disk/by-id/linode-pvca1b6aa53... /dev/disk/by-id/scsi-0Linode_Volume_pvca1b6aa53...]
Did you find a resolution to this? I'm having a similar problem at the moment.
MountVolume.MountDevice failed for volume "pvc-d23fbce33cee4fa7" : rpc error: code = Internal desc = Unable to find device path out of attempted paths: [/dev/disk/by-id/linode-pvcd23fbce33cee4fa7 /dev/disk/by-id/scsi-0Linode_Volume_pvcd23fbce33cee4fa7]
One thing to look for is to make sure that your Volume is only being mounted on a single container. If there are multiple containers attempting to mount it, the job would fail.
RWO volumes can be mounted by multiple pods, as long as they're on the same node, right?
I started seeing this error when migrating applications to a new node pool. In at least several cases, I was able to fix it by manually detaching and then reattaching the volume to the node via the https://cloud.linode.com/volumes UI. (Whether or not it was safe to do, I'm not sure.)
I have the same problem, using the exact same statements in the YAML files. Are there any solutions for this?
This will work, but you might need to wait for the first mount to fail, which can take 10 minutes.
Simply delete the VolumeAttachment object in Kubernetes, OR from the Linode Cloud Manager UI detach the volume. Then recreate the pod and be patient for around 10 minutes.
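Roughly, the kubectl side of that looks like this (the names are placeholders; use whatever kubectl shows for your PV, and note the delete may take a while if the detach is still in progress):

# find the attachment that references your PV, delete it, then delete the pod so it gets recreated
kubectl get volumeattachment
kubectl delete volumeattachment <csi-attachment-name>
kubectl delete pod <your-postgres-pod> -n <your-namespace>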
This is obviously not great if you're running a high volume production application, but in that case it's best not to run your database on Kubernetes.
Experiencing the same issue here too, same conditions, but not with postgres.
Same issue here. Re-installing/Upgrading/Redeploying the wordpress app results in the same error:
MountVolume.MountDevice failed for volume "pvc-db413a06bd404b84" : rpc error: code = Internal desc = Unable to find device path out of attempted paths: [/dev/disk/by-id/linode-pvcdb413a06bd404b84 /dev/disk/by-id/scsi-0Linode_Volume_pvcdb413a06bd404b84]
Getting tired of these Volume issues to be honest.
My Postgres app was redeployed and assigned to a new node. The volume got automatically detached and attached to that new node, but the container/pod failed to start with mounting errors. I redeployed the pod back onto its original node; the volume was successfully mounted back to the old node, but the pod/container still won't mount:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 30m (x9 over 50m) kubelet Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[dshm data kube-api-access-n6t5h]: timed out waiting for the condition
Warning FailedMount 16m (x2 over 25m) kubelet Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data kube-api-access-n6t5h dshm]: timed out waiting for the condition
Warning FailedMount 12m (x3 over 39m) kubelet Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[kube-api-access-n6t5h dshm data]: timed out waiting for the condition
Warning FailedMount 92s (x33 over 52m) kubelet MountVolume.MountDevice failed for volume "pvc-19d050b1a14040c6" : rpc error: code = Internal desc = Unable to find device path out of attempted paths: [/dev/disk/by-id/linode-pvc19d050b1a14040c6 /dev/disk/by-id/scsi-0Linode_Volume_pvc19d050b1a14040c6]
PVC description
Name: data-fanzy-postgresql-dev-0
Namespace: fanzy-dev
StorageClass: linode-block-storage-retain
Status: Bound
Volume: pvc-19d050b1a14040c6
Labels: app.kubernetes.io/component=primary
app.kubernetes.io/instance=fanzy-postgresql-dev
app.kubernetes.io/name=postgresql
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: linodebs.csi.linode.com
volume.kubernetes.io/storage-provisioner: linodebs.csi.linode.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 10Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: fanzy-postgresql-dev-0
Events: <none>
PV description
Name: pvc-19d050b1a14040c6
Labels: <none>
Annotations: pv.kubernetes.io/provisioned-by: linodebs.csi.linode.com
Finalizers: [kubernetes.io/pv-protection external-attacher/linodebs-csi-linode-com]
StorageClass: linode-block-storage-retain
Status: Bound
Claim: fanzy-dev/data-fanzy-postgresql-dev-0
Reclaim Policy: Retain
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 10Gi
Node Affinity: <none>
Message:
Source:
Type: CSI (a Container Storage Interface (CSI) volume source)
Driver: linodebs.csi.linode.com
FSType: ext4
VolumeHandle: 516140-pvc19d050b1a14040c6
ReadOnly: false
VolumeAttributes: storage.kubernetes.io/csiProvisionerIdentity=1662712251649-8081-linodebs.csi.linode.com
Events: <none>
Now I've noticed that even newly created PVCs are failing to attach to new pods/containers, with the same error.
I ran through this example (https://github.com/linode/linode-blockstorage-csi-driver#create-a-kubernetes-secret) and reinstalled the driver. The PVC gets created successfully but fails to mount.
kubectl get pvc/csi-example-pvc pods/csi-example-pod
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/csi-example-pvc Bound pvc-c0ea8df9e5684244 10Gi RWO linode-block-storage-retain 21m
NAME READY STATUS RESTARTS AGE
pod/csi-example-pod 0/1 ContainerCreating 0 21m
Here's the error description from the pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned default/csi-example-pod to lke71838-112699-635487f6efa8
Warning FailedMount 3m38s kubelet Unable to attach or mount volumes: unmounted volumes=[csi-example-volume], unattached volumes=[kube-api-access-zvksd csi-example-volume]: timed out waiting for the condition
Warning FailedMount 83s (x5 over 12m) kubelet Unable to attach or mount volumes: unmounted volumes=[csi-example-volume], unattached volumes=[csi-example-volume kube-api-access-zvksd]: timed out waiting for the condition
Warning FailedAttachVolume 14s (x7 over 12m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-c0ea8df9e5684244" : Attach timeout for volume 802990-pvcc0ea8df9e5684244
I think you are seeing this strange behavior because of the deployment strategy used for a deployment with a persistent volume and the linode-block-storage-retain storage class. You need to change the deployment's strategy to Recreate; by default, it uses RollingUpdate.
apiVersion: apps/v1
kind: Deployment
...
spec:
  strategy:
    type: Recreate
...
The difference between Recreate and RollingUpdate is that the Recreate strategy terminates the old pod before creating the new one, while RollingUpdate creates the new pod before terminating the old one. If you are not using persistent volumes, either strategy is fine. But with a persistent volume that is supposed to attach to only one pod, if the old pod is not terminated, the new one will fail to come up and will remain stuck in ContainerCreating, waiting for the storage to show up. This behavior can lead to all sorts of inconsistent results.
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy
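If you'd rather not edit the manifest, the same change can be applied with a patch; a sketch using the deployment name from the question above:

# switch the update strategy to Recreate; rollingUpdate has to be cleared at the same time
kubectl patch deployment postgres-deployment -n postgres --type merge \
  -p '{"spec":{"strategy":{"type":"Recreate","rollingUpdate":null}}}'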
For statefulsets, when using Rolling Updates with the default Pod Management Policy (OrderedReady), it's possible to get into a broken state that requires manual intervention to repair. Please check the limitations section of k8s docs for more information: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#limitations
Also, if you change something in the Cloud Manager UI that was auto-generated by k8s, you might end up with weird issues. Kubernetes might keep trying to use the name/label it assigned when provisioning the resource and fail to find it because the label was later changed from the UI. I once updated the label for my volume in the Cloud Manager UI, and k8s failed to identify it because the PV within k8s was still referring to the old auto-generated label. I had to clean things up to get it fixed.
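One way to check for that kind of mismatch is to compare the volume handle Kubernetes stores on the PV with the volume label shown in Cloud Manager; a sketch using the PV name from the description above:

# prints something like 516140-pvc19d050b1a14040c6; the part after the dash appears to be
# the Linode volume label, so it should match what Cloud Manager shows
kubectl get pv pvc-19d050b1a14040c6 -o jsonpath='{.spec.csi.volumeHandle}'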
I got a similar problem when StatefulSets were rescheduled onto a different node after the cluster size was reduced…
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 37m default-scheduler Successfully assigned default/mongo-0 to lkeABC-DEF-XYZ
Warning FailedAttachVolume 37m attachdetach-controller Multi-Attach error for volume "pvc-XYZ" Volume is already exclusively attached to one node and can't be attached to another
Normal SuccessfulAttachVolume 36m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-XYZ"
Warning FailedMount 16m (x2 over 21m) kubelet Unable to attach or mount volumes: unmounted volumes=[mongo-data3], unattached volumes=[kube-api-access-5r75k mongo-data3]: timed out waiting for the condition
Warning FailedMount 6m16s (x23 over 36m) kubelet MountVolume.MountDevice failed for volume "pvc-XYZ" : rpc error: code = Internal desc = Unable to find device path out of attempted paths: [/dev/disk/by-id/linode-pvcXYZ /dev/disk/by-id/scsi-0Linode_Volume_pvcXYZ]
Warning FailedMount 66s (x13 over 35m) kubelet Unable to attach or mount volumes: unmounted volumes=[mongo-data3], unattached volumes=[mongo-data3 kube-api-access-5r75k]: timed out waiting for the condition
Is there a recommended configuration for StatefulSets to avoid this multi-attach error followed by repeated FailedMount events and the pod never starting?
The problem is that /var/lib/postgresql/data is used as the mount point, so it will never be empty. You can use the PGDATA environment variable to point to /var/lib/postgresql/data/pgdata instead. It worked for me.
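A minimal sketch of the container env for the deployment above, assuming the same mount path:

env:
  - name: PGDATA
    value: /var/lib/postgresql/data/pgdata
# the volume mount itself stays the same
volumeMounts:
  - mountPath: /var/lib/postgresql/data
    name: postgres-volume-mount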
I've just come over from: https://www.linode.com/community/questions/20185/mounting-an-existing-volume-to-lke
However, I also receive the following error message a lot:
MountVolume.MountDevice failed for volume "ecs-database-volume" : rpc error: code = Internal desc = Unable to find device path out of attempted paths: [/dev/disk/by-id/linode-ecs-database-volume /dev/disk/by-id/scsi-0Linode_Volume_ecs-database-volume]
I also get other errors, but I'm just trying to tackle one at a time.
0/2 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.
AttachVolume.Attach failed for volume "ecs-database-volume" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
So I was getting many different errors and a multitude of different issues; just a few are listed below as examples.
AttachVolume.Attach failed for volume "ecs-database-volume" : rpc error: code = Internal desc = [403] Unauthorized
AttachVolume.Attach failed for volume "ecs-database-volume" : volume attachment is being deleted
0/2 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.
AttachVolume.Attach failed for volume "ecs-database-volume" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
MountVolume.MountDevice failed for volume "ecs-database-volume" : rpc error: code = Internal desc = Unable to find device path out of attempted paths: [/dev/disk/by-id/linode-ecs-database-volume /dev/disk/by-id/scsi-0Linode_Volume_ecs-database-volume]
In the end this is what I did to resolve the issue.
First, make sure the secret.yml was created correctly:
kubectl create secret generic linode-secret \
--namespace kube-system \
--from-literal=token=<redacted /> \
--from-literal=region=eu-central \
--from-literal=apiurl=https://api.linode.com \
--dry-run=client -o yaml
Update the ../secret.yml file using the following manifest template:
---
apiVersion: v1
kind: Secret
metadata:
  name: linode
  namespace: kube-system
data:
  apiurl: aHR0cHM6Ly9hcGkubGlub2RlLmNvbQ==
  region: ZXUtY2VudHJhbA==
  token: <redacted - but should be a base64 encoded string />
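The data values must be base64-encoded; for example:

echo -n 'https://api.linode.com' | base64   # aHR0cHM6Ly9hcGkubGlub2RlLmNvbQ==
echo -n 'eu-central' | base64               # ZXUtY2VudHJhbA==
echo -n '<your-api-token>' | base64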
Note: For some strange reason, after a period of time, I noticed that the secret/linode data would just change on its own?! WTF!!
This also happens to the string data values if you are following the instructions here: https://github.com/linode/linode-blockstorage-csi-driver#create-a-kubernetes-secret
So each time, check the secret and re-apply it to be sure that it is correct.
Apply, check, and restart to pick up the new secret:
kubectl apply -f secret.yml
kubectl get secret/linode -n kube-system
kubectl rollout restart statefulset csi-linode-controller -n kube-system
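To spot the drift mentioned above, you can decode the stored values and confirm they still look right; for example:

# should print the API URL and region in plain text
kubectl get secret linode -n kube-system -o jsonpath='{.data.apiurl}' | base64 -d; echo
kubectl get secret linode -n kube-system -o jsonpath='{.data.region}' | base64 -d; echo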
The next steps are a little more involved, from collecting information from the logs to finally removing the volume attachment each time the nodes are recycled or rebooted.
$ kubectl logs csi-linode-controller-0 -n kube-system -c csi-attacher
I0910 03:24:16.467814 1 main.go:99] Version: v3.3.0
W0910 03:24:26.470804 1 connection.go:173] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock
W0910 03:24:36.470866 1 connection.go:173] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock
W0910 03:24:46.471010 1 connection.go:173] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock
I0910 03:24:50.093835 1 common.go:111] Probing CSI driver for readiness
I0910 03:24:50.095089 1 main.go:155] CSI driver name: "linodebs.csi.linode.com"
I0910 03:24:50.095802 1 main.go:230] CSI driver supports ControllerPublishUnpublish, using real CSI handler
I0910 03:24:50.095968 1 controller.go:128] Starting CSI attacher
I0910 14:48:00.412919 1 csi_handler.go:279] Detaching "csi-e191e89bba9cef08e865165487e47b2ac6c6e96dbbc5cf848947d500209cb8ab "
I0910 14:48:00.419327 1 csi_handler.go:231] Error processing "csi-e191e89bba9cef08e865165487e47b2ac6c6e96dbbc5cf848947d500209cb8ab ": failed to detach: persistentvolume "ecs-database-volume" not found
I0910 14:48:00.419383 1 csi_handler.go:279] Detaching "csi-e191e89bba9cef08e865165487e47b2ac6c6e96dbbc5cf848947d500209cb8ab "
I0910 14:48:00.425119 1 csi_handler.go:231] Error processing "csi-e191e89bba9cef08e865165487e47b2ac6c6e96dbbc5cf848947d500209cb8ab ": failed to detach: persistentvolume "ecs-database-volume" not found
I0910 14:53:00.420129 1 csi_handler.go:279] Detaching "csi-e191e89bba9cef08e865165487e47b2ac6c6e96dbbc5cf848947d500209cb8ab "
I0910 14:53:00.426732 1 csi_handler.go:231] Error processing "csi-e191e89bba9cef08e865165487e47b2ac6c6e96dbbc5cf848947d500209cb8ab ": failed to detach: persistentvolume "ecs-database-volume" not found
I0910 14:53:00.427654 1 csi_handler.go:279] Detaching "csi-e191e89bba9cef08e865165487e47b2ac6c6e96dbbc5cf848947d500209cb8ab "
I0910 14:53:00.432560 1 csi_handler.go:231] Error processing "csi-e191e89bba9cef08e865165487e47b2ac6c6e96dbbc5cf848947d500209cb8ab ": failed to detach: persistentvolume "ecs-database-volume" not found
I0910 14:54:50.163947 1 csi_handler.go:279] Detaching "csi-e191e89bba9cef08e865165487e47b2ac6c6e96dbbc5cf848947d500209cb8ab "
I0910 14:54:50.172132 1 csi_handler.go:231] Error processing "csi-e191e89bba9cef08e865165487e47b2ac6c6e96dbbc5cf848947d500209cb8ab ": failed to detach: persistentvolume "ecs-database-volume" not found
I0910 14:54:50.172476 1 csi_handler.go:279] Detaching "csi-e191e89bba9cef08e865165487e47b2ac6c6e96dbbc5cf848947d500209cb8ab "
I0910 14:54:50.177722 1 csi_handler.go:231] Error processing "csi-e191e89bba9cef08e865165487e47b2ac6c6e96dbbc5cf848947d500209cb8ab ": failed to detach: persistentvolume "ecs-database-volume" not found
$ kubectl logs csi-linode-controller-0 -n kube-system -c csi-provisioner
I0910 03:24:14.309663 1 feature_gate.go:245] feature gates: &{map[]}
I0910 03:24:14.309741 1 csi-provisioner.go:138] Version: v3.0.0
I0910 03:24:14.309758 1 csi-provisioner.go:161] Building kube configs for running in cluster...
W0910 03:24:24.311202 1 connection.go:173] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock
W0910 03:24:34.311069 1 connection.go:173] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock
W0910 03:24:44.311980 1 connection.go:173] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock
I0910 03:24:49.596248 1 common.go:111] Probing CSI driver for readiness
I0910 03:24:49.599807 1 csi-provisioner.go:205] Detected CSI driver linodebs.csi.linode.com
I0910 03:24:49.601427 1 csi-provisioner.go:274] CSI driver supports PUBLISH_UNPUBLISH_VOLUME, watching VolumeAttachments
I0910 03:24:49.601998 1 controller.go:731] Using saving PVs to API server in background
I0910 03:24:49.702391 1 controller.go:810] Starting provisioner controller linodebs.csi.linode.com_csi-linode-controller-0_0029dacd-c739-4bd2-844d-e058ac7dc7c3!
I0910 03:24:49.702430 1 clone_controller.go:66] Starting CloningProtection controller
I0910 03:24:49.702870 1 clone_controller.go:82] Started CloningProtection controller
I0910 03:24:49.702621 1 volume_store.go:97] Starting save volume queue
I0910 03:24:49.802972 1 controller.go:859] Started provisioner controller linodebs.csi.linode.com_csi-linode-controller-0_0029dacd-c739-4bd2-844d-e058ac7dc7c3!
$ kubectl get volumeattachment
NAME ATTACHER PV NODE ATTACHED AGE
csi-e191e89bba9cef08e865165487e47b2ac6c6e96dbbc5cf848947d500209cb8ab linodebs.csi.linode.com ecs-database-volume lke214902-125418-59ea0a3b0000 false 23h
csi-2bd01efbf86ad15076dc55c614013405450c8e450219095d7eaa58e9dd904a51 linodebs.csi.linode.com byteloch-chat-volume lke214902-125419-02c3c3fc0000 true 46h
csi-ae9bf0763ef434807b44a8bf7fcdb7ecc5bf2d42c741922a78ef5efe15a991d0 linodebs.csi.linode.com byteloch-identity-volume lke214902-125419-02c3c3fc0000 true 46h
csi-be665973616ebd199e3ae66ba732d7e96243b22a7aba5781c2fb270db73dbd8b linodebs.csi.linode.com byteloch-vault-volume lke214902-125419-02c3c3fc0000 true 46h
$ kubectl get volumeattachment csi-e191e89bba9cef08e865165487e47b2ac6c6e96dbbc5cf848947d500209cb8ab -o yaml
apiVersion: storage.k8s.io/v1
kind: VolumeAttachment
metadata:
  annotations:
    csi.alpha.kubernetes.io/node-id: "63695965"
  finalizers:
  - external-attacher/linodebs-csi-linode-com
  name: csi-e191e89bba9cef08e865165487e47b2ac6c6e96dbbc5cf848947d500209cb8ab
  resourceVersion: "310661"
  uid: ab691c68-0b04-4d29-a72b-52a6c2f34c72
spec:
  attacher: linodebs.csi.linode.com
  nodeName: lke225001-325519-59ea0a3b0000
  source:
    persistentVolumeName: ecs-database-volume
status:
  attached: true
  attachmentMetadata:
    devicePath: /dev/disk/by-id/scsi-0Linode_Volume_ecs-database-volume
$ kubectl delete volumeattachment csi-e191e89bba9cef08e865165487e47b2ac6c6e96dbbc5cf848947d500209cb8ab
# Ctrl+C to exit, as the command will hang; then continue with the following command(s)
$ kubectl patch volumeattachment csi-e191e89bba9cef08e865165487e47b2ac6c6e96dbbc5cf848947d500209cb8ab -p '{"metadata":{"finalizers":null}}'
volumeattachment.storage.k8s.io/csi-e191e89bba9cef08e865165487e47b2ac6c6e96dbbc5cf848947d500209cb8ab patched
$ kubectl get volumeattachment
NAME ATTACHER PV NODE ATTACHED AGE
csi-2bd01efbf86ad15076dc55c614013405450c8e450219095d7eaa58e9dd904a51 linodebs.csi.linode.com byteloch-chat-volume lke214902-125419-02c3c3fc0000 true 46h
csi-ae9bf0763ef434807b44a8bf7fcdb7ecc5bf2d42c741922a78ef5efe15a991d0 linodebs.csi.linode.com byteloch-identity-volume lke214902-125419-02c3c3fc0000 true 46h
csi-be665973616ebd199e3ae66ba732d7e96243b22a7aba5781c2fb270db73dbd8b linodebs.csi.linode.com byteloch-vault-volume lke214902-125419-02c3c3fc0000 true 46h
After this, double-check that the label of the storage volume set up in Linode Cloud Manager really is correct and matches what you have in the manifests.
I would then re-apply the PV, the namespace for the PVC, the PVC itself, and then fire up my deployment/pod/whatever, with no need to change the manifest spec.strategy.type to Recreate (which I'm not sure is even valid for plain pods, so it would not have worked in my case).
After that, it started working consistently. I have to do this each time I reboot/recycle a node, but at least I have a consistent solution instead of just hoping for the best each time I need to change anything storage-related, which was extremely frustrating to deal with.
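Since this has to be repeated after every node recycle, here is a rough sketch of scripting the same steps (the PV name is just the example from my logs; --wait=false avoids the hang mentioned above):

# find the stale attachment for the PV, delete it without waiting on finalizers, then clear the finalizer
VA=$(kubectl get volumeattachment \
  -o jsonpath='{.items[?(@.spec.source.persistentVolumeName=="ecs-database-volume")].metadata.name}')
kubectl delete volumeattachment "$VA" --wait=false
kubectl patch volumeattachment "$VA" -p '{"metadata":{"finalizers":null}}'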
Related URLs:
https://www.linode.com/community/questions/20185/mounting-an-existing-volume-to-lke
https://www.linode.com/community/questions/20010/deploying-postgres-databasa-with-csi-volumes
https://www.linode.com/community/questions/22548/volume-failedmounted
https://github.com/bitnami/charts/issues/9020