Lab 11.1 pod/linkerd-destination-7b996475cf-tx69s container sp-validator is not ready
I'm running the lab on AWS and also tried the older 2.11 release, without success. I always get stuck at the setup step, where two pods never become ready:
```
k8s-cp:~$ k get pod
NAME                                      READY   STATUS             RESTARTS       AGE
linkerd-destination-7b996475cf-tx69s      0/4     CrashLoopBackOff   5 (111s ago)   9m42s
linkerd-heartbeat-27766481-f9slk          1/1     Running            0              24s
linkerd-identity-6f49866669-pm44p         2/2     Running            0              9m42s
linkerd-proxy-injector-66894488cc-prrqk   0/2     CrashLoopBackOff   4 (106s ago)   9m42s
```
```
ubuntu@k8s-cp:~$ k describe pod linkerd-destination-7b996475cf-tx69s
...
Events:
  Type     Reason               Age                    From               Message
  ----     ------               ----                   ----               -------
  Normal   Scheduled            10m                    default-scheduler  Successfully assigned linkerd/linkerd-destination-7b996475cf-tx69s to k8s-worker
  Normal   Pulled               10m                    kubelet            Container image "cr.l5d.io/linkerd/proxy-init:v2.0.0" already present on machine
  Normal   Created              10m                    kubelet            Created container linkerd-init
  Normal   Started              10m                    kubelet            Started container linkerd-init
  Normal   Killing              8m31s                  kubelet            FailedPostStartHook
  Warning  FailedPostStartHook  8m31s                  kubelet            Exec lifecycle hook ([/usr/lib/linkerd/linkerd-await --timeout=2m]) for Container "linkerd-proxy" in Pod "linkerd-destination-7b996475cf-tx69s_linkerd(c8809acb-32a2-43ab-9f31-9d6486d290aa)" failed - error: command '/usr/lib/linkerd/linkerd-await --timeout=2m' exited with 69: linkerd-proxy failed to become ready within 120s timeout , message: "linkerd-proxy failed to become ready within 120s timeout\n"
  Normal   Created              8m1s                   kubelet            Created container destination
  Normal   Pulled               8m1s                   kubelet            Container image "cr.l5d.io/linkerd/controller:stable-2.12.1" already present on machine
  Normal   Started              8m                     kubelet            Started container sp-validator
  Normal   Pulled               8m                     kubelet            Container image "cr.l5d.io/linkerd/policy-controller:stable-2.12.1" already present on machine
  Normal   Created              8m                     kubelet            Created container policy
  Normal   Started              8m                     kubelet            Started container policy
  Normal   Pulled               8m                     kubelet            Container image "cr.l5d.io/linkerd/controller:stable-2.12.1" already present on machine
  Normal   Created              8m                     kubelet            Created container sp-validator
  Normal   Started              8m                     kubelet            Started container destination
  Normal   Started              7m59s (x2 over 10m)    kubelet            Started container linkerd-proxy
  Normal   Created              7m59s (x2 over 10m)    kubelet            Created container linkerd-proxy
  Normal   Pulled               7m59s (x2 over 10m)    kubelet            Container image "cr.l5d.io/linkerd/proxy:stable-2.12.1" already present on machine
  Warning  Unhealthy            7m59s                  kubelet            Readiness probe failed: Get "http://192.168.254.189:9996/ready": dial tcp 192.168.254.189:9996: connect: connection refused
  Warning  Unhealthy            7m54s                  kubelet            Readiness probe failed: Get "http://192.168.254.189:9996/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy            7m54s                  kubelet            Liveness probe failed: Get "http://192.168.254.189:9990/live": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy            5m29s (x2 over 7m59s)  kubelet            Readiness probe failed: Get "http://192.168.254.189:9997/ready": dial tcp 192.168.254.189:9997: connect: connection refused
  Warning  Unhealthy            24s (x41 over 7m54s)   kubelet            Readiness probe failed: Get "http://192.168.254.189:9997/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
```
```
ubuntu@k8s-cp:~$ k logs linkerd-destination-7b996475cf-tx69s
Defaulted container "linkerd-proxy" out of: linkerd-proxy, destination, sp-validator, policy, linkerd-init (init)
time="2022-10-17T06:39:54Z" level=info msg="Found pre-existing key: /var/run/linkerd/identity/end-entity/key.p8"
time="2022-10-17T06:39:54Z" level=info msg="Found pre-existing CSR: /var/run/linkerd/identity/end-entity/csr.der"
[ 0.003239s] INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[ 0.003925s] INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
[ 0.003933s] INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[ 0.003936s] INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[ 0.003939s] INFO ThreadId(01) linkerd2_proxy: Tap DISABLED
[ 0.003941s] INFO ThreadId(01) linkerd2_proxy: Local identity is linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
[ 0.003943s] INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.003946s] INFO ThreadId(01) linkerd2_proxy: Destinations resolved via localhost:8086
[ 15.005765s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[ 30.007001s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[ 45.008002s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[ 45.014336s] WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_app_core::control: Failed to resolve control-plane component error=failed SRV and A record lookups: failed to resolve SRV record: request timed out; failed to resolve A record: request timed out error.sources=[failed to resolve A record: request timed out, request timed out]
[ 60.009044s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[ 75.009629s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[ 90.011173s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[ 90.130492s] WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_app_core::control: Failed to resolve control-plane component error=failed SRV and A record lookups: failed to resolve SRV record: request timed out; failed to resolve A record: request timed out error.sources=[failed to resolve A record: request timed out, request timed out]
[ 105.011519s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[ 120.013446s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[ 120.106268s] INFO ThreadId(01) linkerd_proxy::signal: received SIGTERM, starting shutdown
[ 120.116819s] INFO ThreadId(01) linkerd2_proxy: Received shutdown signal
[ 135.015350s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[ 135.353951s] WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_app_core::control: Failed to resolve control-plane component error=failed SRV and A record lookups: failed to resolve SRV record: request timed out; failed to resolve A record: request timed out error.sources=[failed to resolve A record: request timed out, request timed out]
[ 150.016557s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
```
The errors in the proxy-injector pods are the same. Since everything else is working, I'm having a hard time figuring out what could be wrong. CoreDNS must be working, since the other labs that used service names also worked. Can anyone give me a hint where to look?
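The proxy log above shows where the two minutes go: the proxy keeps logging "Waiting for identity to be initialized..." because its SRV and A lookups for linkerd-identity-headless.linkerd.svc.cluster.local:8080 time out, and after 120s the linkerd-await post-start hook gives up and the container receives SIGTERM. That suggests DNS from inside this pod is failing, even if service names worked in earlier labs. As a rough aid (not part of the lab or of Linkerd's tooling), here is a minimal sketch that counts those resolution failures per control-plane address in saved proxy log text; the regular expression is an assumption based only on the log lines quoted in this thread.

```python
import re
from collections import Counter

# Assumed log shape, taken from the proxy output quoted in this thread:
#   ... identity:controller{addr=HOST:PORT}: linkerd_app_core::control:
#   Failed to resolve control-plane component error=...
RESOLVE_FAILURE = re.compile(
    r"controller\{addr=(?P<addr>[^}]+)\}: \S+: "
    r"Failed to resolve control-plane component"
)
IDENTITY_WAIT = "Waiting for identity to be initialized"


def dns_failures(log_text: str) -> Counter:
    """Count 'Failed to resolve control-plane component' warnings per address.

    Uses finditer over the whole text, so it also works on logs whose
    line breaks were lost (for example, when pasted into a forum post).
    """
    return Counter(m.group("addr") for m in RESOLVE_FAILURE.finditer(log_text))


def identity_waits(log_text: str) -> int:
    """Count how often the proxy reported it was still waiting for identity."""
    return log_text.count(IDENTITY_WAIT)
```

The input could be the text of `kubectl logs -n linkerd <pod> -c linkerd-proxy` saved to a file. The policy-controller "Failed to connect ... Connection refused" lines that appear at startup are deliberately not matched, since they are not DNS failures.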
Comments
I just stood up an environment on AWS to try to reproduce the reported issue, but everything seems to work fine.
Would you be able to provide the outputs of:
```
kubectl get nodes -o wide
kubectl get pods -A -o wide
```
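When reading the output of those commands, the useful part is which pods are not fully ready and which node each one runs on. A minimal sketch, assuming kubectl is on the PATH and configured for the lab cluster, that fetches `kubectl get pods -A -o wide` and filters it. The column handling is an assumption about the default wide layout: RESTARTS can contain spaces (e.g. "9 (34m ago)"), so NODE is located by counting from the end of each row.

```python
import subprocess
from typing import NamedTuple


class Pod(NamedTuple):
    namespace: str
    name: str
    ready: str   # e.g. "0/4"
    status: str  # e.g. "CrashLoopBackOff"
    node: str


def fetch_pods_wide() -> str:
    """Run `kubectl get pods -A -o wide` (needs kubectl and a working kubeconfig)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-A", "-o", "wide"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def parse_pods_wide(text: str) -> list[Pod]:
    """Parse the table, skipping the header row."""
    pods = []
    for line in text.splitlines()[1:]:
        cols = line.split()
        if len(cols) < 10:  # not a full pod row
            continue
        # Columns counted from the end: ... IP NODE NOMINATED-NODE READINESS-GATES
        pods.append(Pod(cols[0], cols[1], cols[2], cols[3], cols[-3]))
    return pods


def not_ready(pods: list[Pod]) -> list[Pod]:
    """Pods whose containers are not all ready (completed Job pods are ignored)."""
    unhealthy = []
    for pod in pods:
        if pod.status == "Completed":
            continue
        ready, total = pod.ready.split("/")
        if ready != total or pod.status != "Running":
            unhealthy.append(pod)
    return unhealthy
```

For example, `for p in not_ready(parse_pods_wide(fetch_pods_wide())): print(p.namespace, p.name, p.ready, p.status, p.node)` lists only the problem pods together with their nodes.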
While setting up the AWS EC2 instances, VPC and SG, did you happen to follow the video guide from the introductory chapter?
Regards,
-Chris
Hi @chrispokorni, I did follow the guide and just double-checked. All the other labs worked without issues, so I assume the setup itself is fine.
So I tried this again with an older version as another thread suggested:
```
export LINKERD2_VERSION=stable-2.11.4
```
followed by the commands in the lab, leaving out `linkerd install --crds | kubectl apply -f -`, since that step seems to be new in 2.12.
The result is `linkerd check` failing again:
```
Linkerd core checks
===================

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
× control plane pods are ready
    pod/linkerd-destination-5c56887457-sghqx container sp-validator is not ready
    see https://linkerd.io/2.11/checks/#l5d-api-control-ready for hints

Status check results are ×
```
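When comparing `linkerd check` runs across versions (stable-2.11.4 here, stable-2.12.1 in the first post), it can help to pull out only the failed checks and their hint lines. A small sketch, assuming the usual plain-text layout of `linkerd check` output, where a failed check starts with "×" at the beginning of a line and its details follow as indented lines; the function name is a hypothetical helper, not Linkerd tooling.

```python
def failed_checks(output: str) -> dict[str, list[str]]:
    """Map each failed check to the indented detail lines printed under it.

    Assumes failed checks start with "×" at the beginning of a line and
    their details (pod name, docs link) follow as indented lines.
    """
    failures: dict[str, list[str]] = {}
    current = None
    for line in output.splitlines():
        if line.startswith("×"):
            current = line[1:].strip()
            failures[current] = []
        elif current is not None and line[:1].isspace() and line.strip():
            failures[current].append(line.strip())
        else:
            current = None  # any other line ends the current detail block
    return failures
```

Feeding it the captured output of two runs and comparing the resulting dictionaries shows whether a version change altered what fails or only the version in the docs link.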
```
ubuntu@k8s-cp:~$ kubectl get nodes -o wide
NAME         STATUS   ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
k8s-cp       Ready    control-plane   25d   v1.24.1   172.31.40.134   <none>        Ubuntu 20.04.5 LTS   5.15.0-1020-aws   containerd://1.5.9
k8s-worker   Ready    <none>          25d   v1.24.1   172.31.46.207   <none>        Ubuntu 20.04.5 LTS   5.15.0-1020-aws   containerd://1.5.9

ubuntu@k8s-cp:~$ kubectl get pods -A -o wide
NAMESPACE     NAME                                       READY   STATUS             RESTARTS        AGE    IP                NODE         NOMINATED NODE   READINESS GATES
kube-system   calico-kube-controllers-66966888c4-qjmsd   1/1     Running            9 (34m ago)     22d    192.168.62.182    k8s-cp       <none>           <none>
kube-system   calico-node-6sjbd                          1/1     Running            11 (34m ago)    25d    172.31.40.134     k8s-cp       <none>           <none>
kube-system   calico-node-k9pl4                          1/1     Running            10 (34m ago)    25d    172.31.46.207     k8s-worker   <none>           <none>
kube-system   coredns-64897985d-d49sk                    1/1     Running            9 (34m ago)     22d    192.168.62.180    k8s-cp       <none>           <none>
kube-system   coredns-64897985d-n4qb8                    1/1     Running            9 (34m ago)     22d    192.168.62.181    k8s-cp       <none>           <none>
kube-system   etcd-k8s-cp                                1/1     Running            10 (34m ago)    22d    172.31.40.134     k8s-cp       <none>           <none>
kube-system   kube-apiserver-k8s-cp                      1/1     Running            10 (34m ago)    22d    172.31.40.134     k8s-cp       <none>           <none>
kube-system   kube-controller-manager-k8s-cp             1/1     Running            9 (34m ago)     22d    172.31.40.134     k8s-cp       <none>           <none>
kube-system   kube-proxy-4jsnx                           1/1     Running            9 (34m ago)     22d    172.31.40.134     k8s-cp       <none>           <none>
kube-system   kube-proxy-v4ktk                           1/1     Running            8 (34m ago)     22d    172.31.46.207     k8s-worker   <none>           <none>
kube-system   kube-scheduler-k8s-cp                      1/1     Running            9 (34m ago)     22d    172.31.40.134     k8s-cp       <none>           <none>
linkerd       linkerd-destination-5c56887457-sghqx       0/4     CrashLoopBackOff   19 (10s ago)    14m    192.168.254.137   k8s-worker   <none>           <none>
linkerd       linkerd-heartbeat-27772259-tnw5d           1/1     Running            0               5m3s   192.168.254.139   k8s-worker   <none>           <none>
linkerd       linkerd-identity-5f4dbf785d-c227v          2/2     Running            0               14m    192.168.254.136   k8s-worker   <none>           <none>
linkerd       linkerd-proxy-injector-5f7877455f-x5lkw    0/2     CrashLoopBackOff   6 (3m32s ago)   14m    192.168.254.140   k8s-worker   <none>           <none>
```
The restart counts on the other pods are there because I pause the machines when I don't need them, to save money.
Any ideas?
Regards,
Michael
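One detail in the `kubectl get pods -A -o wide` output above: both CoreDNS replicas run on k8s-cp, while every linkerd pod runs on k8s-worker, so each DNS query from the linkerd proxies has to travel between the two nodes. The thread does not confirm that this is the cause, but if pod-to-pod traffic between the EC2 instances were being dropped (for example by the security group), timeouts like those in the proxy log would be an expected symptom. A minimal sketch for listing where selected pods are scheduled, assuming the same wide-output text as input; the name prefixes are simply the ones relevant to this thread.

```python
from collections import defaultdict


def pods_by_node(
    text: str, prefixes: tuple[str, ...] = ("coredns-", "linkerd-")
) -> dict[str, list[str]]:
    """Map node -> names of pods whose name starts with one of `prefixes`.

    `text` is `kubectl get pods -A -o wide` output. NODE is taken as the
    third column from the end because RESTARTS may contain spaces.
    """
    placement = defaultdict(list)
    for line in text.splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) < 10:  # not a full pod row
            continue
        name, node = cols[1], cols[-3]
        if name.startswith(prefixes):
            placement[node].append(name)
    return dict(placement)
```

If CoreDNS and the failing pods end up on different nodes, as here, a cross-node connectivity check is a cheap next step before digging into Linkerd itself.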
I have a very similar problem, also in an AWS environment.
```
ubuntu@k8scp:~$ linkerd check

Linkerd core checks
===================

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
× control plane pods are ready
    pod/linkerd-destination-f99c6cf45-ghcw2 container sp-validator is not ready
    see https://linkerd.io/2.12/checks/#l5d-api-control-ready for hints

Status check results are ×

ubuntu@k8scp:~$ kubectl -n linkerd get po
NAME                                      READY   STATUS             RESTARTS      AGE
linkerd-destination-f99c6cf45-ghcw2       0/4     CrashLoopBackOff   4 (29s ago)   5m45s
linkerd-identity-75b9ddf5f9-vmw5s         2/2     Running            0             5m45s
linkerd-proxy-injector-5d69d7c8c6-b6nrr   0/2     CrashLoopBackOff   2 (32s ago)   5m44s
```
```
kubectl logs -n linkerd linkerd-destination-f99c6cf45-ghcw2
Defaulted container "linkerd-proxy" out of: linkerd-proxy, destination, sp-validator, policy, linkerd-init (init)
time="2023-01-15T09:40:27Z" level=info msg="Found pre-existing key: /var/run/linkerd/identity/end-entity/key.p8"
time="2023-01-15T09:40:27Z" level=info msg="Found pre-existing CSR: /var/run/linkerd/identity/end-entity/csr.der"
[ 0.003663s] INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[ 0.004313s] INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
[ 0.004324s] INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[ 0.004327s] INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[ 0.004330s] INFO ThreadId(01) linkerd2_proxy: Tap DISABLED
[ 0.004334s] INFO ThreadId(01) linkerd2_proxy: Local identity is linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
[ 0.004337s] INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.004340s] INFO ThreadId(01) linkerd2_proxy: Destinations resolved via localhost:8086
[ 0.004797s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 0.110403s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 0.329098s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 0.744807s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 1.246795s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 1.748823s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 2.250794s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 2.752781s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 3.254786s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 3.756668s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 4.257726s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 4.759729s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 5.261736s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 5.763715s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
```
Hi @rafjaro,
When setting up your EC2 instances, VPC, and SG, did you follow the recommendations from the video guide from the introductory chapter?
I notice that the control plane node's hostname appears to be "k8scp". That name was not intended to be a hostname but an alias, to be reassigned to another host in a later lab.
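To check both points on a node (what the k8scp alias maps to, and whether the machine's own hostname was set to the alias instead), a small sketch. It assumes the alias is defined in /etc/hosts, in the usual "IP name [name ...]" format with "#" comments; both function names are hypothetical helpers, not lab tooling.

```python
import socket


def hosts_lookup(hosts_text: str, alias: str) -> list[str]:
    """Return every IP that maps `alias` in /etc/hosts-formatted text."""
    ips = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        if alias in names:
            ips.append(ip)
    return ips


def hostname_is_alias(alias: str = "k8scp") -> bool:
    """True if this machine's hostname was set to the alias itself."""
    return socket.gethostname() == alias
```

On each node, `hosts_lookup(Path("/etc/hosts").read_text(), "k8scp")` (with `from pathlib import Path`) would ideally return exactly one address, the control plane's, and `hostname_is_alias()` would return False.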
Are there other deviations from the lab guide's installation instructions that may have affected the outcome of the linkerd installation?
I tried to reproduce both of the issues reported above, but the linkerd installation worked both times on EC2 instances.
Regards,
-Chris
I had the same issue, where I could not complete the "linkerd check" command. I found that it was caused by a CoreDNS misconfiguration I had introduced in the previous assignment. After fixing the DNS misconfiguration, I can now run "linkerd check" successfully.
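Before re-running `linkerd check`, one way to confirm that cluster DNS is healthy again is to resolve the same name the linkerd proxy was timing out on, linkerd-identity-headless.linkerd.svc.cluster.local, from inside a pod. A minimal sketch, assuming a pod whose image has Python 3 (the lab images are not assumed to have it) and the default cluster.local domain seen in the proxy logs; it uses only the system resolver, so it exercises whatever DNS the pod is configured with.

```python
import socket

# Names to check from inside a cluster pod; "cluster.local" matches
# the cluster domain in the proxy logs quoted in this thread.
CLUSTER_NAMES = (
    "kubernetes.default.svc.cluster.local",
    "linkerd-identity-headless.linkerd.svc.cluster.local",
)


def resolves(name: str) -> bool:
    """True if the system resolver returns at least one address for `name`.

    When DNS is broken this can take several seconds before returning
    False, because getaddrinfo waits for the resolver timeout.
    """
    try:
        socket.getaddrinfo(name, None)
    except socket.gaierror:
        return False
    return True


def report(names: tuple[str, ...] = CLUSTER_NAMES) -> dict[str, bool]:
    """Resolve each name and return a name -> success mapping."""
    return {name: resolves(name) for name in names}
```

A slow `False` for kubernetes.default.svc.cluster.local points at CoreDNS or pod networking rather than at Linkerd.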
Do you have more details on how you fixed the coredns misconfiguration? I am guessing I'm seeing the same issue, because `kubectl get pods -A -o wide` shows me:
```
NAMESPACE     NAME                        READY   STATUS             RESTARTS           AGE   IP           NODE   NOMINATED NODE   READINESS GATES
..
kube-system   coredns-7c65d6cfc9-lkh2c    0/1     CrashLoopBackOff   3976 (3m37s ago)   11d   10.0.0.10    cp     <none>           <none>
kube-system   coredns-7c65d6cfc9-tdj2x    0/1     CrashLoopBackOff   3975 (2m27s ago)   11d   10.0.0.209   cp     <none>           <none>
```
I'm getting the same 'pod/linkerd-destination-5c8475f9c5-ncgjt container sp-validator is not ready' error when running "linkerd check".
I reapplied the CoreDNS ConfigMap backup I took in lab 10.4, confirmed that CoreDNS was running, and then uninstalled and reinstalled my linkerd build. That seems to have done the trick.
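The recovery described above, written out as a sequence of commands. This is a sketch under stated assumptions rather than the poster's exact commands: the backup is assumed to be a YAML manifest of the coredns ConfigMap in kube-system (for example one saved with `kubectl -n kube-system get configmap coredns -o yaml`), the file name coredns-backup.yaml is hypothetical, the CoreDNS rollout restart is an added precaution, and `linkerd install --crds` only applies to 2.12 and later, as noted earlier in the thread. The runner uses `bash -o pipefail` so that a failure on either side of a pipe stops the sequence.

```python
import shlex
import subprocess


def recovery_steps(backup_file: str) -> list[str]:
    """Shell commands mirroring the fix described above, in order."""
    return [
        # Restore the CoreDNS ConfigMap from the saved backup (file name is hypothetical).
        f"kubectl -n kube-system apply -f {shlex.quote(backup_file)}",
        # Restart CoreDNS and wait until the new pods are ready.
        "kubectl -n kube-system rollout restart deployment coredns",
        "kubectl -n kube-system rollout status deployment coredns",
        # Remove the broken Linkerd control plane, then install it again.
        "linkerd uninstall | kubectl delete -f -",
        "linkerd install --crds | kubectl apply -f -",  # 2.12+ only
        "linkerd install | kubectl apply -f -",
        "linkerd check",
    ]


def run_steps(steps: list[str]) -> None:
    """Run each command, stopping at the first failure (including inside pipes)."""
    for cmd in steps:
        print(f"$ {cmd}", flush=True)
        subprocess.run(["bash", "-o", "pipefail", "-c", cmd], check=True)
```

On the control plane node this would be invoked as `run_steps(recovery_steps("coredns-backup.yaml"))`; printing `recovery_steps(...)` first and running the commands by hand is just as reasonable for a lab environment.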