[BUG] removal of container k3d-tools is already in progress when adding nodes with replicas #1410

Open
ligfx opened this issue Feb 21, 2024 · 0 comments · May be fixed by #1411
Labels
bug Something isn't working

ligfx commented Feb 21, 2024

Looks similar to #924

What did you do

k3d node create my-new-nodes --cluster my-k3d-cluster --replicas 3 --trace --verbose

Screenshots or terminal output

DEBU[0000] DOCKER_SOCK=/var/run/docker.sock             
DEBU[0000] Runtime Info:
&{Name:docker Endpoint:/var/run/docker.sock Version:25.0.3 OSType:linux OS:Docker Desktop Arch:aarch64 CgroupVersion:2 CgroupDriver:cgroupfs Filesystem:extfs InfoName:docker-desktop} 
INFO[0000] Adding 3 node(s) to the runtime local cluster 'my-k3d-cluster'... 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-test-hello-boo-2 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-test-hello-boo-2 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-test-hello-boo-2 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-test-hello-boo-1 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-test-hello-boo-1 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-test-hello-boo-1 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-test-hello-boo-0 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-test-hello-boo-0 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-test-hello-boo-0 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-tools 
DEBU[0000] no netlabel present on container /k3d-my-k3d-cluster-tools 
DEBU[0000] failed to get IP for container /k3d-my-k3d-cluster-tools as we couldn't find the cluster network 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-tools 
DEBU[0000] no netlabel present on container /k3d-my-k3d-cluster-tools 
DEBU[0000] failed to get IP for container /k3d-my-k3d-cluster-tools as we couldn't find the cluster network 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-tools 
DEBU[0000] no netlabel present on container /k3d-my-k3d-cluster-tools 
DEBU[0000] failed to get IP for container /k3d-my-k3d-cluster-tools as we couldn't find the cluster network 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-test-hello-2 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-test-hello-2 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-test-hello-2 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-test-hello-0 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-test-hello-0 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-test-hello-0 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-test-hello-1 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-test-hello-1 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-test-hello-1 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-serverlb 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-serverlb 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-serverlb 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-agent-2 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-agent-2 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-agent-2 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-agent-1 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-agent-1 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-agent-1 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-agent-0 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-agent-0 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-agent-0 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-server-0 
TRAC[0000] Reading path /etc/confd/values.yaml from node k3d-my-k3d-cluster-serverlb... 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-server-0 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-server-0 
DEBU[0000] DOCKER_SOCK=/var/run/docker.sock             
DEBU[0000] DOCKER_SOCK=/var/run/docker.sock             
INFO[0000] Using the k3d-tools node to gather environment information 
INFO[0000] Using the k3d-tools node to gather environment information 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-tools 
DEBU[0000] no netlabel present on container /k3d-my-k3d-cluster-tools 
DEBU[0000] failed to get IP for container /k3d-my-k3d-cluster-tools as we couldn't find the cluster network 
DEBU[0000] Deleting node k3d-my-k3d-cluster-tools ... 
DEBU[0000] DOCKER_SOCK=/var/run/docker.sock             
INFO[0000] Using the k3d-tools node to gather environment information 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-tools 
DEBU[0000] no netlabel present on container /k3d-my-k3d-cluster-tools 
DEBU[0000] failed to get IP for container /k3d-my-k3d-cluster-tools as we couldn't find the cluster network 
DEBU[0000] Deleting node k3d-my-k3d-cluster-tools ... 
ERRO[0000] docker failed to remove the container 'k3d-my-k3d-cluster-tools': Error response from daemon: removal of container k3d-my-k3d-cluster-tools is already in progress 
TRAC[0000] TranslateContainerDetailsToNode: Checking for default object label app=k3d on container /k3d-my-k3d-cluster-tools 
DEBU[0000] no netlabel present on container /k3d-my-k3d-cluster-tools 
DEBU[0000] failed to get IP for container /k3d-my-k3d-cluster-tools as we couldn't find the cluster network 
INFO[0000] Starting existing tools node k3d-my-k3d-cluster-tools... 
TRAC[0000] [Docker] Deleted Container k3d-my-k3d-cluster-tools 
DEBU[0000] DOCKER_SOCK=/var/run/docker.sock             
DEBU[0000] DOCKER_SOCK=/var/run/docker.sock             
TRAC[0000] GOOS: darwin / Runtime OS: linux (Docker Desktop) 
TRAC[0000] GOOS: darwin / Runtime OS: linux (Docker Desktop) 
INFO[0000] Starting new tools node...                   
DEBU[0000] DOCKER_SOCK=/var/run/docker.sock             
DEBU[0000] DOCKER_SOCK=/var/run/docker.sock             
DEBU[0000] DOCKER_SOCK=/var/run/docker.sock             
DEBU[0000] Detected CgroupV2, enabling custom entrypoint (disable by setting K3D_FIX_CGROUPV2=false) 
TRAC[0000] Creating node from spec
&{Name:k3d-my-k3d-cluster-tools Role:noRole Image:ghcr.io/k3d-io/k3d-tools:5.6.0 Volumes:[k3d-my-k3d-cluster-images:/k3d/images /var/run/docker.sock:/var/run/docker.sock] Env:[] Cmd:[] Args:[noop] Ports:map[] Restart:false Created: HostPidMode:false RuntimeLabels:map[app:k3d k3d.cluster:my-k3d-cluster k3d.version:v5.6.0] RuntimeUlimits:[] K3sNodeLabels:map[] Networks:[k3d-my-k3d-cluster] ExtraHosts:[host.k3d.internal:host-gateway] ServerOpts:{IsInit:false KubeAPI:<nil>} AgentOpts:{} GPURequest: Memory: State:{Running:false Status: Started:} IP:{IP:invalid IP Static:false} HookActions:[]} 
TRAC[0000] Creating docker container with translated config
&{ContainerConfig:{Hostname:k3d-my-k3d-cluster-tools Domainname: User: AttachStdin:false AttachStdout:false AttachStderr:false ExposedPorts:map[] Tty:false OpenStdin:false StdinOnce:false Env:[K3S_KUBECONFIG_OUTPUT=/output/kubeconfig.yaml] Cmd:[noop] Healthcheck:<nil> ArgsEscaped:false Image:ghcr.io/k3d-io/k3d-tools:5.6.0 Volumes:map[] WorkingDir: Entrypoint:[] NetworkDisabled:false MacAddress: OnBuild:[] Labels:map[app:k3d k3d.cluster:my-k3d-cluster k3d.role:noRole k3d.version:v5.6.0] StopSignal: StopTimeout:<nil> Shell:[]} HostConfig:{Binds:[k3d-my-k3d-cluster-images:/k3d/images /var/run/docker.sock:/var/run/docker.sock] ContainerIDFile: LogConfig:{Type: Config:map[]} NetworkMode:bridge PortBindings:map[] RestartPolicy:{Name: MaximumRetryCount:0} AutoRemove:false VolumeDriver: VolumesFrom:[] ConsoleSize:[0 0] Annotations:map[] CapAdd:[] CapDrop:[] CgroupnsMode: DNS:[] DNSOptions:[] DNSSearch:[] ExtraHosts:[host.k3d.internal:host-gateway] GroupAdd:[] IpcMode: Cgroup: Links:[] OomScoreAdj:0 PidMode: Privileged:true PublishAllPorts:false ReadonlyRootfs:false SecurityOpt:[] StorageOpt:map[] Tmpfs:map[/run: /var/run:] UTSMode: UsernsMode: ShmSize:0 Sysctls:map[] Runtime: Isolation: Resources:{CPUShares:0 Memory:0 NanoCPUs:0 CgroupParent: BlkioWeight:0 BlkioWeightDevice:[] BlkioDeviceReadBps:[] BlkioDeviceWriteBps:[] BlkioDeviceReadIOps:[] BlkioDeviceWriteIOps:[] CPUPeriod:0 CPUQuota:0 CPURealtimePeriod:0 CPURealtimeRuntime:0 CpusetCpus: CpusetMems: Devices:[] DeviceCgroupRules:[] DeviceRequests:[] KernelMemory:0 KernelMemoryTCP:0 MemoryReservation:0 MemorySwap:0 MemorySwappiness:<nil> OomKillDisable:<nil> PidsLimit:<nil> Ulimits:[] CPUCount:0 CPUPercent:0 IOMaximumIOps:0 IOMaximumBandwidth:0} Mounts:[] MaskedPaths:[] ReadonlyPaths:[] Init:0x140001f0d4f} NetworkingConfig:{EndpointsConfig:map[k3d-my-k3d-cluster:0x1400040a180]}} 
ERRO[0000] Failed to run tools container for cluster 'my-k3d-cluster' 
INFO[0000] Starting new tools node...                   
DEBU[0000] DOCKER_SOCK=/var/run/docker.sock             
DEBU[0000] DOCKER_SOCK=/var/run/docker.sock             
TRAC[0000] Creating node from spec
&{Name:k3d-my-k3d-cluster-tools Role:noRole Image:ghcr.io/k3d-io/k3d-tools:5.6.0 Volumes:[k3d-my-k3d-cluster-images:/k3d/images /var/run/docker.sock:/var/run/docker.sock] Env:[] Cmd:[] Args:[noop] Ports:map[] Restart:false Created: HostPidMode:false RuntimeLabels:map[app:k3d k3d.cluster:my-k3d-cluster k3d.version:v5.6.0] RuntimeUlimits:[] K3sNodeLabels:map[] Networks:[k3d-my-k3d-cluster] ExtraHosts:[host.k3d.internal:host-gateway] ServerOpts:{IsInit:false KubeAPI:<nil>} AgentOpts:{} GPURequest: Memory: State:{Running:false Status: Started:} IP:{IP:invalid IP Static:false} HookActions:[]} 
TRAC[0000] Creating docker container with translated config
&{ContainerConfig:{Hostname:k3d-my-k3d-cluster-tools Domainname: User: AttachStdin:false AttachStdout:false AttachStderr:false ExposedPorts:map[] Tty:false OpenStdin:false StdinOnce:false Env:[K3S_KUBECONFIG_OUTPUT=/output/kubeconfig.yaml] Cmd:[noop] Healthcheck:<nil> ArgsEscaped:false Image:ghcr.io/k3d-io/k3d-tools:5.6.0 Volumes:map[] WorkingDir: Entrypoint:[] NetworkDisabled:false MacAddress: OnBuild:[] Labels:map[app:k3d k3d.cluster:my-k3d-cluster k3d.role:noRole k3d.version:v5.6.0] StopSignal: StopTimeout:<nil> Shell:[]} HostConfig:{Binds:[k3d-my-k3d-cluster-images:/k3d/images /var/run/docker.sock:/var/run/docker.sock] ContainerIDFile: LogConfig:{Type: Config:map[]} NetworkMode:bridge PortBindings:map[] RestartPolicy:{Name: MaximumRetryCount:0} AutoRemove:false VolumeDriver: VolumesFrom:[] ConsoleSize:[0 0] Annotations:map[] CapAdd:[] CapDrop:[] CgroupnsMode: DNS:[] DNSOptions:[] DNSSearch:[] ExtraHosts:[host.k3d.internal:host-gateway] GroupAdd:[] IpcMode: Cgroup: Links:[] OomScoreAdj:0 PidMode: Privileged:true PublishAllPorts:false ReadonlyRootfs:false SecurityOpt:[] StorageOpt:map[] Tmpfs:map[/run: /var/run:] UTSMode: UsernsMode: ShmSize:0 Sysctls:map[] Runtime: Isolation: Resources:{CPUShares:0 Memory:0 NanoCPUs:0 CgroupParent: BlkioWeight:0 BlkioWeightDevice:[] BlkioDeviceReadBps:[] BlkioDeviceWriteBps:[] BlkioDeviceReadIOps:[] BlkioDeviceWriteIOps:[] CPUPeriod:0 CPUQuota:0 CPURealtimePeriod:0 CPURealtimeRuntime:0 CpusetCpus: CpusetMems: Devices:[] DeviceCgroupRules:[] DeviceRequests:[] KernelMemory:0 KernelMemoryTCP:0 MemoryReservation:0 MemorySwap:0 MemorySwappiness:<nil> OomKillDisable:<nil> PidsLimit:<nil> Ulimits:[] CPUCount:0 CPUPercent:0 IOMaximumIOps:0 IOMaximumBandwidth:0} Mounts:[] MaskedPaths:[] ReadonlyPaths:[] Init:0x1400037e88a} NetworkingConfig:{EndpointsConfig:map[k3d-my-k3d-cluster:0x14000520000]}} 
ERRO[0000] Failed to run tools container for cluster 'my-k3d-cluster' 
FATA[0000] failed to add 3 node(s) to the runtime local cluster 'my-k3d-cluster': failed to add one or more nodes: error gathering cluster environment info required to properly create the node: error starting existing tools node k3d-my-k3d-cluster-tools: failed to get container for node 'k3d-my-k3d-cluster-tools': Didn't find container for node 'k3d-my-k3d-cluster-tools' 


Which OS & Architecture

arch: aarch64
cgroupdriver: cgroupfs
cgroupversion: "2"
endpoint: /var/run/docker.sock
filesystem: extfs
infoname: docker-desktop
name: docker
os: Docker Desktop
ostype: linux
version: 25.0.3

Which version of k3d

k3d version v5.6.0
k3s version v1.27.5-k3s1 (default)

and

k3d version v5-dev
k3s version v1.21.7-k3s1 (default)

Which version of docker

$ docker version
Client:
 Cloud integration: v1.0.35+desktop.10
 Version:           25.0.3
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        4debf41
 Built:             Tue Feb  6 21:13:26 2024
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.27.2 (137060)
 Engine:
  Version:          25.0.3
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       f417435
  Built:            Tue Feb  6 21:14:22 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
ligfx added the bug label Feb 21, 2024
ligfx added a commit to ligfx/k3d that referenced this issue Feb 21, 2024
…d race condition causing error

fixes k3d-io#1410

When multiple nodes are created, each call to NodeAddToCluster called GatherEnvironmentInfo,
which in turn called EnsureToolsNode() and then deleted the tools node. Because those calls run
concurrently, they race on the shared tools node: one call can try to remove it while another
removal is still in progress. The work doesn't need to be done more than once anyway, so move it
up to NodeAddToClusterMulti and do it once, before kicking off the goroutines.
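
The pattern this commit message describes can be sketched in Go roughly as follows. This is a minimal illustration, not k3d's actual code: the `cluster` type, `gatherEnvironmentInfo`, `addNode`, `addNodesMulti`, the node names, and the gateway address are all hypothetical stand-ins. The point is only the shape of the fix: run the shared, non-reentrant step once, then fan out to goroutines that reuse its result.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// envInfo is a hypothetical stand-in for the environment details
// that every new node needs before it can be created.
type envInfo struct {
	hostGateway string
}

// cluster is a toy model; it only counts probes and records node names.
type cluster struct {
	mu      sync.Mutex
	gathers int      // how many times the environment probe ran
	nodes   []string // names of nodes added so far
}

// gatherEnvironmentInfo stands in for the step that starts a shared helper
// container, reads from it, and deletes it. It is the part that must not
// run concurrently against the same helper.
func (c *cluster) gatherEnvironmentInfo() (envInfo, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.gathers++
	return envInfo{hostGateway: "192.0.2.1"}, nil // placeholder address
}

// addNode creates a single node from environment info supplied by the caller,
// so it never touches the shared helper itself.
func (c *cluster) addNode(name string, env envInfo) error {
	if env.hostGateway == "" {
		return errors.New("missing environment info")
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.nodes = append(c.nodes, name)
	return nil
}

// addNodesMulti probes the environment once, then adds all nodes concurrently.
func (c *cluster) addNodesMulti(names []string) error {
	env, err := c.gatherEnvironmentInfo()
	if err != nil {
		return fmt.Errorf("gathering environment info: %w", err)
	}

	errs := make([]error, len(names)) // one slot per goroutine, no sharing
	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		go func(i int, name string) {
			defer wg.Done()
			errs[i] = c.addNode(name, env)
		}(i, name)
	}
	wg.Wait()

	for i, err := range errs {
		if err != nil {
			return fmt.Errorf("adding node %q: %w", names[i], err)
		}
	}
	return nil
}

func main() {
	c := &cluster{}
	names := []string{"node-0", "node-1", "node-2"} // hypothetical names
	if err := c.addNodesMulti(names); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("added %d nodes with %d environment probe(s)\n", len(c.nodes), c.gathers)
}
```

With three replicas the sketch reports three nodes and a single probe. In the layout the commit message describes as the previous behaviour, the probe would instead sit inside the per-node function and run once per goroutine against the same helper container.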
iwilltry42 pushed a commit to ligfx/k3d that referenced this issue Apr 9, 2024