Lab 11.1 pod/linkerd-destination-7b996475cf-tx69s container sp-validator is not ready
I'm on an AWS setup and also tried the lab with the older version 2.11, but didn't get anywhere. I always get stuck at the setup part, where two pods fail to become ready:
k8s-cp:~$ k get pod
NAME                                      READY   STATUS             RESTARTS       AGE
linkerd-destination-7b996475cf-tx69s      0/4     CrashLoopBackOff   5 (111s ago)   9m42s
linkerd-heartbeat-27766481-f9slk          1/1     Running            0              24s
linkerd-identity-6f49866669-pm44p         2/2     Running            0              9m42s
linkerd-proxy-injector-66894488cc-prrqk   0/2     CrashLoopBackOff   4 (106s ago)   9m42s
ubuntu@k8s-cp:~$ k describe pod linkerd-destination-7b996475cf-tx69s
...
Events:
  Type     Reason               Age                     From               Message
  ----     ------               ----                    ----               -------
  Normal   Scheduled            10m                     default-scheduler  Successfully assigned linkerd/linkerd-destination-7b996475cf-tx69s to k8s-worker
  Normal   Pulled               10m                     kubelet            Container image "cr.l5d.io/linkerd/proxy-init:v2.0.0" already present on machine
  Normal   Created              10m                     kubelet            Created container linkerd-init
  Normal   Started              10m                     kubelet            Started container linkerd-init
  Normal   Killing              8m31s                   kubelet            FailedPostStartHook
  Warning  FailedPostStartHook  8m31s                   kubelet            Exec lifecycle hook ([/usr/lib/linkerd/linkerd-await --timeout=2m]) for Container "linkerd-proxy" in Pod "linkerd-destination-7b996475cf-tx69s_linkerd(c8809acb-32a2-43ab-9f31-9d6486d290aa)" failed - error: command '/usr/lib/linkerd/linkerd-await --timeout=2m' exited with 69: linkerd-proxy failed to become ready within 120s timeout , message: "linkerd-proxy failed to become ready within 120s timeout\n"
  Normal   Created              8m1s                    kubelet            Created container destination
  Normal   Pulled               8m1s                    kubelet            Container image "cr.l5d.io/linkerd/controller:stable-2.12.1" already present on machine
  Normal   Started              8m                      kubelet            Started container sp-validator
  Normal   Pulled               8m                      kubelet            Container image "cr.l5d.io/linkerd/policy-controller:stable-2.12.1" already present on machine
  Normal   Created              8m                      kubelet            Created container policy
  Normal   Started              8m                      kubelet            Started container policy
  Normal   Pulled               8m                      kubelet            Container image "cr.l5d.io/linkerd/controller:stable-2.12.1" already present on machine
  Normal   Created              8m                      kubelet            Created container sp-validator
  Normal   Started              8m                      kubelet            Started container destination
  Normal   Started              7m59s (x2 over 10m)     kubelet            Started container linkerd-proxy
  Normal   Created              7m59s (x2 over 10m)     kubelet            Created container linkerd-proxy
  Normal   Pulled               7m59s (x2 over 10m)     kubelet            Container image "cr.l5d.io/linkerd/proxy:stable-2.12.1" already present on machine
  Warning  Unhealthy            7m59s                   kubelet            Readiness probe failed: Get "http://192.168.254.189:9996/ready": dial tcp 192.168.254.189:9996: connect: connection refused
  Warning  Unhealthy            7m54s                   kubelet            Readiness probe failed: Get "http://192.168.254.189:9996/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy            7m54s                   kubelet            Liveness probe failed: Get "http://192.168.254.189:9990/live": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy            5m29s (x2 over 7m59s)   kubelet            Readiness probe failed: Get "http://192.168.254.189:9997/ready": dial tcp 192.168.254.189:9997: connect: connection refused
  Warning  Unhealthy            24s (x41 over 7m54s)    kubelet            Readiness probe failed: Get "http://192.168.254.189:9997/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
ubuntu@k8s-cp:~$ k logs linkerd-destination-7b996475cf-tx69s
Defaulted container "linkerd-proxy" out of: linkerd-proxy, destination, sp-validator, policy, linkerd-init (init)
time="2022-10-17T06:39:54Z" level=info msg="Found pre-existing key: /var/run/linkerd/identity/end-entity/key.p8"
time="2022-10-17T06:39:54Z" level=info msg="Found pre-existing CSR: /var/run/linkerd/identity/end-entity/csr.der"
[ 0.003239s] INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[ 0.003925s] INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
[ 0.003933s] INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[ 0.003936s] INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[ 0.003939s] INFO ThreadId(01) linkerd2_proxy: Tap DISABLED
[ 0.003941s] INFO ThreadId(01) linkerd2_proxy: Local identity is linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
[ 0.003943s] INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.003946s] INFO ThreadId(01) linkerd2_proxy: Destinations resolved via localhost:8086
[ 15.005765s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[ 30.007001s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[ 45.008002s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[ 45.014336s] WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_app_core::control: Failed to resolve control-plane component error=failed SRV and A record lookups: failed to resolve SRV record: request timed out; failed to resolve A record: request timed out error.sources=[failed to resolve A record: request timed out, request timed out]
[ 60.009044s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[ 75.009629s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[ 90.011173s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[ 90.130492s] WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_app_core::control: Failed to resolve control-plane component error=failed SRV and A record lookups: failed to resolve SRV record: request timed out; failed to resolve A record: request timed out error.sources=[failed to resolve A record: request timed out, request timed out]
[ 105.011519s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[ 120.013446s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[ 120.106268s] INFO ThreadId(01) linkerd_proxy::signal: received SIGTERM, starting shutdown
[ 120.116819s] INFO ThreadId(01) linkerd2_proxy: Received shutdown signal
[ 135.015350s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[ 135.353951s] WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_app_core::control: Failed to resolve control-plane component error=failed SRV and A record lookups: failed to resolve SRV record: request timed out; failed to resolve A record: request timed out error.sources=[failed to resolve A record: request timed out, request timed out]
[ 150.016557s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
The errors in the proxy-injector pod are the same. Since everything else is working, I'm having a hard time figuring out what could be wrong. CoreDNS must be working, since the other labs using service names also worked. Can anyone give me a hint where to look?
Comments
-
I just stood up an environment on AWS to try to reproduce the issue reported, but it all seems to work just fine.
Would you be able to provide the outputs of:
kubectl get nodes -o wide
kubectl get pods -A -o wide
While setting up the AWS EC2 instances, VPC and SG, did you happen to follow the video guide from the introductory chapter?
Regards,
-Chris
-
Hi @chrispokorni, I did follow the guide and just double checked. All the other labs worked without issues so I assume the setup itself is fine.
So I tried this again with an older version as another thread suggested:
export LINKERD2_VERSION=stable-2.11.4
followed by the commands in the lab, without
linkerd install --crds | kubectl apply -f -
as this seems to be new in 2.12.
Result is linkerd check failing again:
Linkerd core checks
===================

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
× control plane pods are ready
    pod/linkerd-destination-5c56887457-sghqx container sp-validator is not ready
    see https://linkerd.io/2.11/checks/#l5d-api-control-ready for hints

Status check results are ×

ubuntu@k8s-cp:~$ kubectl get nodes -o wide
NAME         STATUS   ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
k8s-cp       Ready    control-plane   25d   v1.24.1   172.31.40.134   <none>        Ubuntu 20.04.5 LTS   5.15.0-1020-aws   containerd://1.5.9
k8s-worker   Ready    <none>          25d   v1.24.1   172.31.46.207   <none>        Ubuntu 20.04.5 LTS   5.15.0-1020-aws   containerd://1.5.9

ubuntu@k8s-cp:~$ kubectl get pods -A -o wide
NAMESPACE     NAME                                       READY   STATUS             RESTARTS        AGE    IP                NODE         NOMINATED NODE   READINESS GATES
kube-system   calico-kube-controllers-66966888c4-qjmsd   1/1     Running            9 (34m ago)     22d    192.168.62.182    k8s-cp       <none>           <none>
kube-system   calico-node-6sjbd                          1/1     Running            11 (34m ago)    25d    172.31.40.134     k8s-cp       <none>           <none>
kube-system   calico-node-k9pl4                          1/1     Running            10 (34m ago)    25d    172.31.46.207     k8s-worker   <none>           <none>
kube-system   coredns-64897985d-d49sk                    1/1     Running            9 (34m ago)     22d    192.168.62.180    k8s-cp       <none>           <none>
kube-system   coredns-64897985d-n4qb8                    1/1     Running            9 (34m ago)     22d    192.168.62.181    k8s-cp       <none>           <none>
kube-system   etcd-k8s-cp                                1/1     Running            10 (34m ago)    22d    172.31.40.134     k8s-cp       <none>           <none>
kube-system   kube-apiserver-k8s-cp                      1/1     Running            10 (34m ago)    22d    172.31.40.134     k8s-cp       <none>           <none>
kube-system   kube-controller-manager-k8s-cp             1/1     Running            9 (34m ago)     22d    172.31.40.134     k8s-cp       <none>           <none>
kube-system   kube-proxy-4jsnx                           1/1     Running            9 (34m ago)     22d    172.31.40.134     k8s-cp       <none>           <none>
kube-system   kube-proxy-v4ktk                           1/1     Running            8 (34m ago)     22d    172.31.46.207     k8s-worker   <none>           <none>
kube-system   kube-scheduler-k8s-cp                      1/1     Running            9 (34m ago)     22d    172.31.40.134     k8s-cp       <none>           <none>
linkerd       linkerd-destination-5c56887457-sghqx       0/4     CrashLoopBackOff   19 (10s ago)    14m    192.168.254.137   k8s-worker   <none>           <none>
linkerd       linkerd-heartbeat-27772259-tnw5d           1/1     Running            0               5m3s   192.168.254.139   k8s-worker   <none>           <none>
linkerd       linkerd-identity-5f4dbf785d-c227v          2/2     Running            0               14m    192.168.254.136   k8s-worker   <none>           <none>
linkerd       linkerd-proxy-injector-5f7877455f-x5lkw    0/2     CrashLoopBackOff   6 (3m32s ago)   14m    192.168.254.140   k8s-worker   <none>           <none>
The restarts of the other pods are there as I pause the machines when I don't need them to save money.
Any ideas?
Regards,
Michael
-
I have a very similar problem, also in an AWS environment.
ubuntu@k8scp:~$ linkerd check
Linkerd core checks
===================

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
× control plane pods are ready
    pod/linkerd-destination-f99c6cf45-ghcw2 container sp-validator is not ready
    see https://linkerd.io/2.12/checks/#l5d-api-control-ready for hints

Status check results are ×

ubuntu@k8scp:~$ kubectl -n linkerd get po
NAME                                      READY   STATUS             RESTARTS      AGE
linkerd-destination-f99c6cf45-ghcw2       0/4     CrashLoopBackOff   4 (29s ago)   5m45s
linkerd-identity-75b9ddf5f9-vmw5s         2/2     Running            0             5m45s
linkerd-proxy-injector-5d69d7c8c6-b6nrr   0/2     CrashLoopBackOff   2 (32s ago)   5m44s

kubectl logs -n linkerd linkerd-destination-f99c6cf45-ghcw2
Defaulted container "linkerd-proxy" out of: linkerd-proxy, destination, sp-validator, policy, linkerd-init (init)
time="2023-01-15T09:40:27Z" level=info msg="Found pre-existing key: /var/run/linkerd/identity/end-entity/key.p8"
time="2023-01-15T09:40:27Z" level=info msg="Found pre-existing CSR: /var/run/linkerd/identity/end-entity/csr.der"
[ 0.003663s] INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[ 0.004313s] INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
[ 0.004324s] INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[ 0.004327s] INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[ 0.004330s] INFO ThreadId(01) linkerd2_proxy: Tap DISABLED
[ 0.004334s] INFO ThreadId(01) linkerd2_proxy: Local identity is linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
[ 0.004337s] INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.004340s] INFO ThreadId(01) linkerd2_proxy: Destinations resolved via localhost:8086
[ 0.004797s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 0.110403s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 0.329098s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 0.744807s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 1.246795s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 1.748823s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 2.250794s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 2.752781s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 3.254786s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 3.756668s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 4.257726s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 4.759729s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 5.261736s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 5.763715s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
-
Hi @rafjaro,
When setting up your EC2 instances, VPC, and SG, did you follow the recommendations from the video guide from the introductory chapter?
I notice the control plane node hostname appears to be "k8scp". It was not intended to be a hostname, but an alias instead, to be re-assigned to another host in a later lab.
Are there other deviations from the installation instructions of the lab guide, which may have possibly impacted the outcome of the linkerd installation?
I tried reproducing the issues reported on both occasions above, but the linkerd installation worked both times on EC2 instances.
Regards,
-Chris
-
I had the same issue where I could not complete the "linkerd check" command. I found that the issue was related to a CoreDNS misconfiguration that I made in the previous assignment. I fixed the DNS misconfiguration and can now run the "linkerd check" command successfully.
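For anyone landing here with the same symptom, a quick way to rule CoreDNS in or out might look like the sketch below. It assumes the CoreDNS pods carry the kubeadm default label k8s-app=kube-dns; the function name not_ready_pods is a hypothetical helper and not part of the lab guide.

```shell
# Hypothetical helper: print the names of pods whose READY column is not n/n.
# Arguments: namespace, label selector.
not_ready_pods() {
  kubectl -n "$1" get pods -l "$2" --no-headers |
    awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Example calls on the cluster (label is the kubeadm default, an assumption here):
#   not_ready_pods kube-system k8s-app=kube-dns
#   kubectl -n kube-system logs -l k8s-app=kube-dns --tail=20
```

If the helper prints any CoreDNS pod names, DNS is the first thing to fix before reinstalling Linkerd, since the proxy logs above show the identity lookup timing out on DNS.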
-
Do you have more details on how you fixed the coredns misconfiguration? I am guessing I'm seeing the same issue because
kubectl get pods -A -o wide
shows me
NAMESPACE     NAME                       READY   STATUS             RESTARTS            AGE   IP           NODE   NOMINATED NODE   READINESS GATES
..
kube-system   coredns-7c65d6cfc9-lkh2c   0/1     CrashLoopBackOff   3976 (3m37s ago)    11d   10.0.0.10    cp     <none>           <none>
kube-system   coredns-7c65d6cfc9-tdj2x   0/1     CrashLoopBackOff   3975 (2m27s ago)    11d   10.0.0.209   cp     <none>           <none>
I'm getting the same 'pod/linkerd-destination-5c8475f9c5-ncgjt container sp-validator is not ready' when running "linkerd check"
-
I reapplied the backup of the configmap I took in lab 10.4 to coredns. Confirmed coredns was running and then uninstalled/reinstalled my linkerd build and that seems to have done the trick.
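The recovery described above might look roughly like this. It is only a sketch: the backup filename coredns-backup.yaml and the function name are assumptions (use whatever file the lab 10.4 backup was saved as), and the --crds step applies to Linkerd 2.12 and later only.

```shell
# Sketch of the recovery sequence; coredns-backup.yaml is a hypothetical filename.
restore_coredns_and_reinstall_linkerd() {
  backup="${1:-coredns-backup.yaml}"

  # Re-apply the saved CoreDNS ConfigMap and restart CoreDNS so it picks it up.
  kubectl -n kube-system apply -f "$backup" || return 1
  kubectl -n kube-system rollout restart deployment coredns || return 1
  kubectl -n kube-system rollout status deployment coredns --timeout=120s || return 1

  # Remove the broken Linkerd control plane, then install it again.
  linkerd uninstall | kubectl delete -f - || return 1
  linkerd install --crds | kubectl apply -f - || return 1   # 2.12+ only
  linkerd install | kubectl apply -f - || return 1
  linkerd check
}

# Example call on the cluster:
#   restore_coredns_and_reinstall_linkerd coredns-backup.yaml
```

Waiting for the CoreDNS rollout to finish before reinstalling matters here, because the Linkerd proxies resolve the identity service through DNS at startup.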
