Lab 11.1 pod/linkerd-destination-7b996475cf-tx69s container sp-validator is not ready

I'm on an AWS setup and also tried the lab with the older 2.11 release, but didn't get anywhere. I always get stuck at the setup step, where two pods end up crash-looping:
```
k8s-cp:~$ k get pod
NAME                                      READY   STATUS             RESTARTS       AGE
linkerd-destination-7b996475cf-tx69s      0/4     CrashLoopBackOff   5 (111s ago)   9m42s
linkerd-heartbeat-27766481-f9slk          1/1     Running            0              24s
linkerd-identity-6f49866669-pm44p         2/2     Running            0              9m42s
linkerd-proxy-injector-66894488cc-prrqk   0/2     CrashLoopBackOff   4 (106s ago)   9m42s
```
```
[email protected]:~$ k describe pod linkerd-destination-7b996475cf-tx69s
...
Events:
  Type     Reason               Age                    From               Message
  ----     ------               ---                    ----               -------
  Normal   Scheduled            10m                    default-scheduler  Successfully assigned linkerd/linkerd-destination-7b996475cf-tx69s to k8s-worker
  Normal   Pulled               10m                    kubelet            Container image "cr.l5d.io/linkerd/proxy-init:v2.0.0" already present on machine
  Normal   Created              10m                    kubelet            Created container linkerd-init
  Normal   Started              10m                    kubelet            Started container linkerd-init
  Normal   Killing              8m31s                  kubelet            FailedPostStartHook
  Warning  FailedPostStartHook  8m31s                  kubelet            Exec lifecycle hook ([/usr/lib/linkerd/linkerd-await --timeout=2m]) for Container "linkerd-proxy" in Pod "linkerd-destination-7b996475cf-tx69s_linkerd(c8809acb-32a2-43ab-9f31-9d6486d290aa)" failed - error: command '/usr/lib/linkerd/linkerd-await --timeout=2m' exited with 69: linkerd-proxy failed to become ready within 120s timeout, message: "linkerd-proxy failed to become ready within 120s timeout\n"
  Normal   Created              8m1s                   kubelet            Created container destination
  Normal   Pulled               8m1s                   kubelet            Container image "cr.l5d.io/linkerd/controller:stable-2.12.1" already present on machine
  Normal   Started              8m                     kubelet            Started container sp-validator
  Normal   Pulled               8m                     kubelet            Container image "cr.l5d.io/linkerd/policy-controller:stable-2.12.1" already present on machine
  Normal   Created              8m                     kubelet            Created container policy
  Normal   Started              8m                     kubelet            Started container policy
  Normal   Pulled               8m                     kubelet            Container image "cr.l5d.io/linkerd/controller:stable-2.12.1" already present on machine
  Normal   Created              8m                     kubelet            Created container sp-validator
  Normal   Started              8m                     kubelet            Started container destination
  Normal   Started              7m59s (x2 over 10m)    kubelet            Started container linkerd-proxy
  Normal   Created              7m59s (x2 over 10m)    kubelet            Created container linkerd-proxy
  Normal   Pulled               7m59s (x2 over 10m)    kubelet            Container image "cr.l5d.io/linkerd/proxy:stable-2.12.1" already present on machine
  Warning  Unhealthy            7m59s                  kubelet            Readiness probe failed: Get "http://192.168.254.189:9996/ready": dial tcp 192.168.254.189:9996: connect: connection refused
  Warning  Unhealthy            7m54s                  kubelet            Readiness probe failed: Get "http://192.168.254.189:9996/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy            7m54s                  kubelet            Liveness probe failed: Get "http://192.168.254.189:9990/live": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy            5m29s (x2 over 7m59s)  kubelet            Readiness probe failed: Get "http://192.168.254.189:9997/ready": dial tcp 192.168.254.189:9997: connect: connection refused
  Warning  Unhealthy            24s (x41 over 7m54s)   kubelet            Readiness probe failed: Get "http://192.168.254.189:9997/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
```
```
[email protected]:~$ k logs linkerd-destination-7b996475cf-tx69s
Defaulted container "linkerd-proxy" out of: linkerd-proxy, destination, sp-validator, policy, linkerd-init (init)
time="2022-10-17T06:39:54Z" level=info msg="Found pre-existing key: /var/run/linkerd/identity/end-entity/key.p8"
time="2022-10-17T06:39:54Z" level=info msg="Found pre-existing CSR: /var/run/linkerd/identity/end-entity/csr.der"
[     0.003239s]  INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[     0.003925s]  INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
[     0.003933s]  INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[     0.003936s]  INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[     0.003939s]  INFO ThreadId(01) linkerd2_proxy: Tap DISABLED
[     0.003941s]  INFO ThreadId(01) linkerd2_proxy: Local identity is linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
[     0.003943s]  INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[     0.003946s]  INFO ThreadId(01) linkerd2_proxy: Destinations resolved via localhost:8086
[    15.005765s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[    30.007001s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[    45.008002s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[    45.014336s]  WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_app_core::control: Failed to resolve control-plane component error=failed SRV and A record lookups: failed to resolve SRV record: request timed out; failed to resolve A record: request timed out error.sources=[failed to resolve A record: request timed out, request timed out]
[    60.009044s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[    75.009629s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[    90.011173s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[    90.130492s]  WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_app_core::control: Failed to resolve control-plane component error=failed SRV and A record lookups: failed to resolve SRV record: request timed out; failed to resolve A record: request timed out error.sources=[failed to resolve A record: request timed out, request timed out]
[   105.011519s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[   120.013446s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[   120.106268s]  INFO ThreadId(01) linkerd_proxy::signal: received SIGTERM, starting shutdown
[   120.116819s]  INFO ThreadId(01) linkerd2_proxy: Received shutdown signal
[   135.015350s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[   135.353951s]  WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_app_core::control: Failed to resolve control-plane component error=failed SRV and A record lookups: failed to resolve SRV record: request timed out; failed to resolve A record: request timed out error.sources=[failed to resolve A record: request timed out, request timed out]
[   150.016557s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
```
The errors in the proxy-injector pods are the same. Since everything else is working, I'm having a hard time figuring out what could be wrong. CoreDNS must be working, since the other labs that use service names also worked. Can anyone give me a hint where to look?
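Since the proxy's failure is specifically "failed SRV and A record lookups" while CoreDNS itself looks healthy, one thing worth checking is whether DNS actually works from a pod on the worker node (where the linkerd pods landed). A minimal sketch, not from the lab guide; the pod name, image tag, and node name are illustrative:

```shell
# Run a throwaway pod pinned to the worker node and attempt the exact lookup the
# proxy is timing out on. busybox ships a small nslookup applet; any image with
# dig/nslookup works.
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 \
  --overrides='{"spec":{"nodeName":"k8s-worker"}}' \
  -- nslookup linkerd-identity-headless.linkerd.svc.cluster.local

# If this times out on the worker but succeeds from a pod on the control-plane
# node, pod-to-pod traffic across nodes (e.g. UDP 53 to the CoreDNS pod IPs, or
# the CNI overlay itself) is likely being dropped somewhere along the path.
```

The CoreDNS pods in the output above all run on the control-plane node, so worker-node pods depend on cross-node traffic for every lookup; that is why CoreDNS can look fine while DNS still fails on the worker.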
Comments
I just stood up an environment on AWS to try to reproduce the issue reported, but it all seems to work just fine.
Would you be able to provide the outputs of:

```
kubectl get nodes -o wide
kubectl get pods -A -o wide
```
While setting up the AWS EC2 instances, VPC and SG, did you happen to follow the video guide from the introductory chapter?
Regards,
-Chris
Hi @chrispokorni, I did follow the guide and just double checked. All the other labs worked without issues so I assume the setup itself is fine.
So I tried this again with an older version as another thread suggested:
export LINKERD2_VERSION=stable-2.11.4
followed by the commands in the lab, but without

```
linkerd install --crds | kubectl apply -f -
```

as this step seems to be new in 2.12.
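That observation is correct: as of 2.12, Linkerd installs its CRDs in a separate step, so the two versions need different sequences. A sketch of both flows (the `curl | sh` line is the usual CLI fetch, which honors `LINKERD2_VERSION`):

```shell
# Linkerd 2.12+ splits CRD installation into its own step:
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -

# On 2.11.x the CRDs ship with the main manifest, so after fetching the older
# CLI a single install step suffices:
export LINKERD2_VERSION=stable-2.11.4
curl -fsL https://run.linkerd.io/install | sh
linkerd install | kubectl apply -f -
```

Dropping only the `--crds` step while keeping a 2.12 CLI would leave the CRDs missing, so mixing the two flows is worth double-checking when downgrading.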
The result is `linkerd check` failing again:

```
Linkerd core checks
===================

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
× control plane pods are ready
    pod/linkerd-destination-5c56887457-sghqx container sp-validator is not ready
    see https://linkerd.io/2.11/checks/#l5d-api-control-ready for hints

Status check results are ×
```
```
[email protected]:~$ kubectl get nodes -o wide
NAME         STATUS   ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
k8s-cp       Ready    control-plane   25d   v1.24.1   172.31.40.134   <none>        Ubuntu 20.04.5 LTS   5.15.0-1020-aws   containerd://1.5.9
k8s-worker   Ready    <none>          25d   v1.24.1   172.31.46.207   <none>        Ubuntu 20.04.5 LTS   5.15.0-1020-aws   containerd://1.5.9

[email protected]:~$ kubectl get pods -A -o wide
NAMESPACE     NAME                                       READY   STATUS             RESTARTS        AGE    IP                NODE         NOMINATED NODE   READINESS GATES
kube-system   calico-kube-controllers-66966888c4-qjmsd   1/1     Running            9 (34m ago)     22d    192.168.62.182    k8s-cp       <none>           <none>
kube-system   calico-node-6sjbd                          1/1     Running            11 (34m ago)    25d    172.31.40.134     k8s-cp       <none>           <none>
kube-system   calico-node-k9pl4                          1/1     Running            10 (34m ago)    25d    172.31.46.207     k8s-worker   <none>           <none>
kube-system   coredns-64897985d-d49sk                    1/1     Running            9 (34m ago)     22d    192.168.62.180    k8s-cp       <none>           <none>
kube-system   coredns-64897985d-n4qb8                    1/1     Running            9 (34m ago)     22d    192.168.62.181    k8s-cp       <none>           <none>
kube-system   etcd-k8s-cp                                1/1     Running            10 (34m ago)    22d    172.31.40.134     k8s-cp       <none>           <none>
kube-system   kube-apiserver-k8s-cp                      1/1     Running            10 (34m ago)    22d    172.31.40.134     k8s-cp       <none>           <none>
kube-system   kube-controller-manager-k8s-cp             1/1     Running            9 (34m ago)     22d    172.31.40.134     k8s-cp       <none>           <none>
kube-system   kube-proxy-4jsnx                           1/1     Running            9 (34m ago)     22d    172.31.40.134     k8s-cp       <none>           <none>
kube-system   kube-proxy-v4ktk                           1/1     Running            8 (34m ago)     22d    172.31.46.207     k8s-worker   <none>           <none>
kube-system   kube-scheduler-k8s-cp                      1/1     Running            9 (34m ago)     22d    172.31.40.134     k8s-cp       <none>           <none>
linkerd       linkerd-destination-5c56887457-sghqx       0/4     CrashLoopBackOff   19 (10s ago)    14m    192.168.254.137   k8s-worker   <none>           <none>
linkerd       linkerd-heartbeat-27772259-tnw5d           1/1     Running            0               5m3s   192.168.254.139   k8s-worker   <none>           <none>
linkerd       linkerd-identity-5f4dbf785d-c227v          2/2     Running            0               14m    192.168.254.136   k8s-worker   <none>           <none>
linkerd       linkerd-proxy-injector-5f7877455f-x5lkw    0/2     CrashLoopBackOff   6 (3m32s ago)   14m    192.168.254.140   k8s-worker   <none>           <none>
```
The restarts on the other pods are expected, because I pause the machines to save money when I don't need them.

Any ideas?
Regards,
Michael
I have very similar problem also in AWS environment.
```
[email protected]:~$ linkerd check
Linkerd core checks
===================

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
× control plane pods are ready
    pod/linkerd-destination-f99c6cf45-ghcw2 container sp-validator is not ready
    see https://linkerd.io/2.12/checks/#l5d-api-control-ready for hints

Status check results are ×
```
```
[email protected]:~$ kubectl -n linkerd get po
NAME                                      READY   STATUS             RESTARTS      AGE
linkerd-destination-f99c6cf45-ghcw2       0/4     CrashLoopBackOff   4 (29s ago)   5m45s
linkerd-identity-75b9ddf5f9-vmw5s         2/2     Running            0             5m45s
linkerd-proxy-injector-5d69d7c8c6-b6nrr   0/2     CrashLoopBackOff   2 (32s ago)   5m44s
```
```
kubectl logs -n linkerd linkerd-destination-f99c6cf45-ghcw2
Defaulted container "linkerd-proxy" out of: linkerd-proxy, destination, sp-validator, policy, linkerd-init (init)
time="2023-01-15T09:40:27Z" level=info msg="Found pre-existing key: /var/run/linkerd/identity/end-entity/key.p8"
time="2023-01-15T09:40:27Z" level=info msg="Found pre-existing CSR: /var/run/linkerd/identity/end-entity/csr.der"
[     0.003663s]  INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[     0.004313s]  INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
[     0.004324s]  INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[     0.004327s]  INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[     0.004330s]  INFO ThreadId(01) linkerd2_proxy: Tap DISABLED
[     0.004334s]  INFO ThreadId(01) linkerd2_proxy: Local identity is linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
[     0.004337s]  INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[     0.004340s]  INFO ThreadId(01) linkerd2_proxy: Destinations resolved via localhost:8086
[     0.004797s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     0.110403s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     0.329098s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     0.744807s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     1.246795s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     1.748823s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     2.250794s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     2.752781s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     3.254786s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     3.756668s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     4.257726s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     4.759729s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     5.261736s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     5.763715s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
```
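One note on reading that output: `kubectl logs` defaulted to the `linkerd-proxy` container, and the repeated "Connection refused" on 127.0.0.1:8090 only says the proxy cannot reach the policy controller running in the same pod. Inspecting that container directly may show the real failure (a sketch; the pod name is the one from the output above):

```shell
# The proxy can't reach the policy controller on localhost:8090, so look at the
# policy container itself rather than the default (linkerd-proxy) container:
kubectl -n linkerd logs linkerd-destination-f99c6cf45-ghcw2 -c policy

# If that container has already crashed and restarted, the previous instance's
# logs are usually more telling:
kubectl -n linkerd logs linkerd-destination-f99c6cf45-ghcw2 -c policy --previous
```

The same `-c` trick applies to the `destination` and `sp-validator` containers, since `linkerd check` reports `sp-validator` as the one that is not ready.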
Hi @rafjaro,
When setting up your EC2 instances, VPC, and SG, did you follow the recommendations from the video guide from the introductory chapter?
I notice the control plane node's hostname appears to be "k8scp". That name was not intended to be a hostname but an alias, to be re-assigned to another host in a later lab.
Are there any other deviations from the lab guide's installation instructions that may have impacted the outcome of the linkerd installation?
I tried reproducing the issues reported on both occasions above, but the linkerd installation worked both times on EC2 instances.
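For anyone comparing setups, one way to rule the Security Group in or out is to dump its rules and confirm the instances can reach each other on all protocols, as the course setup expects. A sketch with the AWS CLI, assuming both nodes share one SG; the group ID is a placeholder:

```shell
# Dump the ingress rules of the (placeholder) security group shared by both nodes:
aws ec2 describe-security-groups --group-ids sg-0123456789abcdef0 \
  --query 'SecurityGroups[0].IpPermissions'

# If inter-node traffic is restricted, allowing all protocols from the SG to
# itself is a tighter alternative to opening everything to the world:
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol -1 --source-group sg-0123456789abcdef0
```

Protocol `-1` means "all protocols", which covers the UDP and overlay traffic that the DNS timeouts in this thread point at.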
Regards,
-Chris