Problem while creating large (100+ GB) volume #371

Open
duckhawk opened this issue Oct 7, 2023 · 0 comments
duckhawk commented Oct 7, 2023

I'm just trying to create a 100+ GB volume:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: largepvc
  namespace: default
spec:
  storageClassName: "linstor-store-r2"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  strategy:
    type: Recreate
  replicas: 1
  selector:
    matchLabels:
      component: nginx
  template:
    metadata:
      labels:
        component: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        command: ["/usr/sbin/nginx"]
        args:
        - -g
        - daemon off;
        volumeMounts:
        - mountPath: "/app/media"
          name: largepvc 
        ports:
          - containerPort: 80
            protocol: TCP
      volumes:
        - name: largepvc
          persistentVolumeClaim:
            claimName: largepvc

LINSTOR creates the volume, but it looks like mkfs fails because of a timeout:

Events:
  Type     Reason                  Age                  From                     Message
  ----     ------                  ----                 ----                     -------
  Normal   Scheduled               18m                  linstor                  Successfully assigned default/nginx-c49998c79-lvnlx to node0
  Normal   SuccessfulAttachVolume  18m                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-c5dda542-a0ac-4336-882a-1724f98664b0"
  Warning  FailedMount             101s (x16 over 18m)  kubelet                  MountVolume.SetUp failed for volume "pvc-c5dda542-a0ac-4336-882a-1724f98664b0" : rpc error: code = Internal desc = NodePublishVolume failed for pvc-c5dda542-a0ac-4336-882a-1724f98664b0: mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o _netdev /dev/drbd1020 /var/lib/kubelet/pods/59315b6d-3f90-4bad-b831-4af54963d3cb/volumes/kubernetes.io~csi/pvc-c5dda542-a0ac-4336-882a-1724f98664b0/mount
Output: mount: /var/lib/kubelet/pods/59315b6d-3f90-4bad-b831-4af54963d3cb/volumes/kubernetes.io~csi/pvc-c5dda542-a0ac-4336-882a-1724f98664b0/mount: wrong fs type, bad option, bad superblock on /dev/drbd1020, missing codepage or helper program, or other error.
  Warning  FailedMount  23s (x8 over 16m)  kubelet  Unable to attach or mount volumes: unmounted volumes=[largepvc], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition
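
A quick way to confirm that the device really was left unformatted (a sketch of the check, using the device path from the events above; blkid exits non-zero and file -s reports just "data" when there is no filesystem signature):

root@node0:/# blkid /dev/drbd1020 || echo "no filesystem signature"
root@node0:/# file -s /dev/drbd1020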

Also, the resource is stuck in the InUse state on one of the diskful replicas:

root@master1:~# linstor r l -r pvc-c5dda542-a0ac-4336-882a-1724f98664b0
+---------------------------------------------------------------------------------------------------------------+
| ResourceName                             | Node    | Port | Usage  | Conns |      State | CreatedOn           |
|===============================================================================================================|
| pvc-c5dda542-a0ac-4336-882a-1724f98664b0 | node0   | 7017 | InUse  | Ok    |   UpToDate | 2023-10-07 18:03:57 |
| pvc-c5dda542-a0ac-4336-882a-1724f98664b0 | node4   | 7017 | Unused | Ok    |   UpToDate | 2023-10-07 18:04:57 |
| pvc-c5dda542-a0ac-4336-882a-1724f98664b0 | system1 | 7017 | Unused | Ok    | TieBreaker | 2023-10-07 18:04:52 |
+---------------------------------------------------------------------------------------------------------------+
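
The InUse here apparently just means DRBD is still Primary on node0 even though nothing is mounted. A sketch to confirm that from the satellite container on the node (drbdadm status is standard drbd-utils tooling):

root@node0:/# drbdadm status pvc-c5dda542-a0ac-4336-882a-1724f98664b0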

This can be worked around by manually creating the filesystem and then running drbdadm down/up on the node where the resource is stuck InUse:

root@master1:~# kubectl -n d8-linstor exec -ti linstor-node-8rmmv bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Defaulted container "linstor-satellite" out of: linstor-satellite, kube-rbac-proxy, drbd-prometheus-exporter
root@node0:/# linstor v l -r pvc-c5dda542-a0ac-4336-882a-1724f98664b0
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node    ┊ Resource                                 ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊  Allocated ┊ InUse  ┊      State ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ node0   ┊ pvc-c5dda542-a0ac-4336-882a-1724f98664b0 ┊ store                ┊     0 ┊    1020 ┊ /dev/drbd1020 ┊   1.28 GiB ┊ InUse  ┊   UpToDate ┊
┊ node4   ┊ pvc-c5dda542-a0ac-4336-882a-1724f98664b0 ┊ store                ┊     0 ┊    1020 ┊ /dev/drbd1020 ┊ 286.78 MiB ┊ Unused ┊   UpToDate ┊
┊ system1 ┊ pvc-c5dda542-a0ac-4336-882a-1724f98664b0 ┊ DfltDisklessStorPool ┊     0 ┊    1020 ┊ /dev/drbd1020 ┊            ┊ Unused ┊ TieBreaker ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
root@node0:/# mkfs.ext4 -E lazy_itable_init=1 -E lazy_journal_init=1 /dev/drbd1020
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done                            
Creating filesystem with 52428800 4k blocks and 13132800 inodes
Filesystem UUID: 8e8784d7-2b96-4d50-92ac-1a9ad8074637
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000, 7962624, 11239424, 20480000, 23887872

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done     

root@node0:/#
root@node0:/# drbdadm down pvc-c5dda542-a0ac-4336-882a-1724f98664b0
root@node0:/# drbdadm up pvc-c5dda542-a0ac-4336-882a-1724f98664b0
root@node0:/# linstor v l -r pvc-c5dda542-a0ac-4336-882a-1724f98664b0
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node    ┊ Resource                                 ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊  Allocated ┊ InUse  ┊      State ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ node0   ┊ pvc-c5dda542-a0ac-4336-882a-1724f98664b0 ┊ store                ┊     0 ┊    1020 ┊ /dev/drbd1020 ┊ 286.78 MiB ┊ Unused ┊   UpToDate ┊
┊ node4   ┊ pvc-c5dda542-a0ac-4336-882a-1724f98664b0 ┊ store                ┊     0 ┊    1020 ┊ /dev/drbd1020 ┊ 286.78 MiB ┊ Unused ┊   UpToDate ┊
┊ system1 ┊ pvc-c5dda542-a0ac-4336-882a-1724f98664b0 ┊ DfltDisklessStorPool ┊     0 ┊    1020 ┊ /dev/drbd1020 ┊            ┊ Unused ┊ TieBreaker ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
root@node0:/# exit
root@master1:~# kubectl get pods
NAME                    READY   STATUS              RESTARTS   AGE
nginx-c49998c79-lvnlx   0/1     ContainerCreating   0          28m
root@master1:~# kubectl delete pod nginx-c49998c79-lvnlx 
pod "nginx-c49998c79-lvnlx" deleted
root@master1:~# kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
nginx-c49998c79-rpp82   1/1     Running   0          4m37s
root@master1:~# kubectl get pvc
NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
largepvc   Bound    pvc-c5dda542-a0ac-4336-882a-1724f98664b0   200Gi      RWO            linstor-store-r2   33m

I tried to use the lazy-init mkfs parameters in the StorageClass, but it didn't help:

root@master1:~# kubectl get sc linstor-store-r2 -oyaml | grep fsOpts
  linstor.csi.linbit.com/fsOpts: -E lazy_itable_init=1 -E lazy_journal_init=1
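
For context, the StorageClass looks roughly like this (a sketch: only the fsOpts parameter is confirmed by the grep above; the storagePool and placementCount values are guesses inferred from the 'linstor v l' output and the '-r2' class name):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-store-r2
provisioner: linstor.csi.linbit.com
parameters:
  # confirmed above; passed to mkfs when the volume is formatted
  linstor.csi.linbit.com/fsOpts: "-E lazy_itable_init=1 -E lazy_journal_init=1"
  # assumed values, inferred from the outputs above
  linstor.csi.linbit.com/storagePool: store
  linstor.csi.linbit.com/placementCount: "2"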

It looks like there is some timeout during volume provisioning that the mkfs of a large volume exceeds.
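
If it helps to reproduce: timing the format by hand on a node shows how long a 200Gi ext4 mkfs actually takes, to compare against whatever deadline the CSI call runs under (a sketch; note the mke2fs man page passes extended options comma-separated in a single -E, which may matter if a given version only honors the last -E flag):

root@node0:/# time mkfs.ext4 -E lazy_itable_init=1,lazy_journal_init=1 /dev/drbd1020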
