
Lab 11.1 pod/linkerd-destination-7b996475cf-tx69s container sp-validator is not ready

I'm on an AWS setup and also tried the lab with the older 2.11 release, but didn't get anywhere. I always get stuck at the setup step, where two pods end up in CrashLoopBackOff:

  k8s-cp:~$ k get pod
  NAME                                      READY   STATUS             RESTARTS       AGE
  linkerd-destination-7b996475cf-tx69s      0/4     CrashLoopBackOff   5 (111s ago)   9m42s
  linkerd-heartbeat-27766481-f9slk          1/1     Running            0              24s
  linkerd-identity-6f49866669-pm44p         2/2     Running            0              9m42s
  linkerd-proxy-injector-66894488cc-prrqk   0/2     CrashLoopBackOff   4 (106s ago)   9m42s
  ubuntu@k8s-cp:~$ k describe pod linkerd-destination-7b996475cf-tx69s
  ...
  Events:
  Type     Reason               Age                     From               Message
  ----     ------               ----                    ----               -------
  Normal   Scheduled            10m                     default-scheduler  Successfully assigned linkerd/linkerd-destination-7b996475cf-tx69s to k8s-worker
  Normal   Pulled               10m                     kubelet            Container image "cr.l5d.io/linkerd/proxy-init:v2.0.0" already present on machine
  Normal   Created              10m                     kubelet            Created container linkerd-init
  Normal   Started              10m                     kubelet            Started container linkerd-init
  Normal   Killing              8m31s                   kubelet            FailedPostStartHook
  Warning  FailedPostStartHook  8m31s                   kubelet            Exec lifecycle hook ([/usr/lib/linkerd/linkerd-await --timeout=2m]) for Container "linkerd-proxy" in Pod "linkerd-destination-7b996475cf-tx69s_linkerd(c8809acb-32a2-43ab-9f31-9d6486d290aa)" failed - error: command '/usr/lib/linkerd/linkerd-await --timeout=2m' exited with 69: linkerd-proxy failed to become ready within 120s timeout
  , message: "linkerd-proxy failed to become ready within 120s timeout\n"
  Normal   Created              8m1s                    kubelet            Created container destination
  Normal   Pulled               8m1s                    kubelet            Container image "cr.l5d.io/linkerd/controller:stable-2.12.1" already present on machine
  Normal   Started              8m                      kubelet            Started container sp-validator
  Normal   Pulled               8m                      kubelet            Container image "cr.l5d.io/linkerd/policy-controller:stable-2.12.1" already present on machine
  Normal   Created              8m                      kubelet            Created container policy
  Normal   Started              8m                      kubelet            Started container policy
  Normal   Pulled               8m                      kubelet            Container image "cr.l5d.io/linkerd/controller:stable-2.12.1" already present on machine
  Normal   Created              8m                      kubelet            Created container sp-validator
  Normal   Started              8m                      kubelet            Started container destination
  Normal   Started              7m59s (x2 over 10m)     kubelet            Started container linkerd-proxy
  Normal   Created              7m59s (x2 over 10m)     kubelet            Created container linkerd-proxy
  Normal   Pulled               7m59s (x2 over 10m)     kubelet            Container image "cr.l5d.io/linkerd/proxy:stable-2.12.1" already present on machine
  Warning  Unhealthy            7m59s                   kubelet            Readiness probe failed: Get "http://192.168.254.189:9996/ready": dial tcp 192.168.254.189:9996: connect: connection refused
  Warning  Unhealthy            7m54s                   kubelet            Readiness probe failed: Get "http://192.168.254.189:9996/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy            7m54s                   kubelet            Liveness probe failed: Get "http://192.168.254.189:9990/live": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy            5m29s (x2 over 7m59s)   kubelet            Readiness probe failed: Get "http://192.168.254.189:9997/ready": dial tcp 192.168.254.189:9997: connect: connection refused
  Warning  Unhealthy            24s (x41 over 7m54s)    kubelet            Readiness probe failed: Get "http://192.168.254.189:9997/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  ubuntu@k8s-cp:~$ k logs linkerd-destination-7b996475cf-tx69s
  Defaulted container "linkerd-proxy" out of: linkerd-proxy, destination, sp-validator, policy, linkerd-init (init)
  time="2022-10-17T06:39:54Z" level=info msg="Found pre-existing key: /var/run/linkerd/identity/end-entity/key.p8"
  time="2022-10-17T06:39:54Z" level=info msg="Found pre-existing CSR: /var/run/linkerd/identity/end-entity/csr.der"
  [ 0.003239s] INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
  [ 0.003925s] INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
  [ 0.003933s] INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
  [ 0.003936s] INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
  [ 0.003939s] INFO ThreadId(01) linkerd2_proxy: Tap DISABLED
  [ 0.003941s] INFO ThreadId(01) linkerd2_proxy: Local identity is linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
  [ 0.003943s] INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
  [ 0.003946s] INFO ThreadId(01) linkerd2_proxy: Destinations resolved via localhost:8086
  [ 15.005765s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
  [ 30.007001s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
  [ 45.008002s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
  [ 45.014336s] WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_app_core::control: Failed to resolve control-plane component error=failed SRV and A record lookups: failed to resolve SRV record: request timed out; failed to resolve A record: request timed out error.sources=[failed to resolve A record: request timed out, request timed out]
  [ 60.009044s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
  [ 75.009629s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
  [ 90.011173s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
  [ 90.130492s] WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_app_core::control: Failed to resolve control-plane component error=failed SRV and A record lookups: failed to resolve SRV record: request timed out; failed to resolve A record: request timed out error.sources=[failed to resolve A record: request timed out, request timed out]
  [ 105.011519s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
  [ 120.013446s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
  [ 120.106268s] INFO ThreadId(01) linkerd_proxy::signal: received SIGTERM, starting shutdown
  [ 120.116819s] INFO ThreadId(01) linkerd2_proxy: Received shutdown signal
  [ 135.015350s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
  [ 135.353951s] WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_app_core::control: Failed to resolve control-plane component error=failed SRV and A record lookups: failed to resolve SRV record: request timed out; failed to resolve A record: request timed out error.sources=[failed to resolve A record: request timed out, request timed out]
  [ 150.016557s] WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...

The errors in the proxy-injector pods are the same. Since everything else is working, I'm having a hard time figuring out what could be wrong. CoreDNS must be working, since the other labs using service names also worked. Can anyone give me a hint where to look?
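
Since the proxy log above shows the SRV/A lookups for linkerd-identity-headless timing out, I assume a one-off pod like this would be a quick way to test in-cluster DNS directly (the busybox image tag is just an example):

  # throwaway pod to test cluster DNS resolution of the identity service
  kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
    nslookup linkerd-identity-headless.linkerd.svc.cluster.local

Is that a reasonable way to verify, or is there a better place to look?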


Comments

  • Hi @michael.weinrich,

    I just stood up an environment on AWS to try to reproduce the issue reported, but it all seems to work just fine.
    Would you be able to provide the outputs of:

    kubectl get nodes -o wide

    kubectl get pods -A -o wide

While setting up the AWS EC2 instances, VPC, and SG, did you happen to follow the video guide from the introductory chapter?

    Regards,
    -Chris

  • Hi @chrispokorni, I did follow the guide and just double-checked. All the other labs worked without issues, so I assume the setup itself is fine.

So, as another thread suggested, I tried this again with an older version: export LINKERD2_VERSION=stable-2.11.4, followed by the commands in the lab but without linkerd install --crds | kubectl apply -f -, as the --crds step is new in 2.12.
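
For reference, the full sequence I ran was roughly this (the CLI install script is the one referenced in the lab guide, and it pins the version from the environment variable):

    # sketch of the 2.11 install sequence, per the lab guide minus the --crds step
    export LINKERD2_VERSION=stable-2.11.4
    curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install | sh   # installs the pinned CLI
    export PATH=$PATH:$HOME/.linkerd2/bin
    linkerd check --pre                     # pre-install checks
    linkerd install | kubectl apply -f -    # 2.11 has no separate --crds step
    linkerd check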
The result is linkerd check failing again:

    Linkerd core checks
    ===================

    kubernetes-api
    --------------
    √ can initialize the client
    √ can query the Kubernetes API

    kubernetes-version
    ------------------
    √ is running the minimum Kubernetes API version
    √ is running the minimum kubectl version

    linkerd-existence
    -----------------
    √ 'linkerd-config' config map exists
    √ heartbeat ServiceAccount exist
    √ control plane replica sets are ready
    √ no unschedulable pods
    × control plane pods are ready
        pod/linkerd-destination-5c56887457-sghqx container sp-validator is not ready
        see https://linkerd.io/2.11/checks/#l5d-api-control-ready for hints

    Status check results are ×
    ubuntu@k8s-cp:~$ kubectl get nodes -o wide
    NAME         STATUS   ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
    k8s-cp       Ready    control-plane   25d   v1.24.1   172.31.40.134   <none>        Ubuntu 20.04.5 LTS   5.15.0-1020-aws   containerd://1.5.9
    k8s-worker   Ready    <none>          25d   v1.24.1   172.31.46.207   <none>        Ubuntu 20.04.5 LTS   5.15.0-1020-aws   containerd://1.5.9

    ubuntu@k8s-cp:~$ kubectl get pods -A -o wide
    NAMESPACE     NAME                                       READY   STATUS             RESTARTS        AGE    IP                NODE         NOMINATED NODE   READINESS GATES
    kube-system   calico-kube-controllers-66966888c4-qjmsd   1/1     Running            9 (34m ago)     22d    192.168.62.182    k8s-cp       <none>           <none>
    kube-system   calico-node-6sjbd                          1/1     Running            11 (34m ago)    25d    172.31.40.134     k8s-cp       <none>           <none>
    kube-system   calico-node-k9pl4                          1/1     Running            10 (34m ago)    25d    172.31.46.207     k8s-worker   <none>           <none>
    kube-system   coredns-64897985d-d49sk                    1/1     Running            9 (34m ago)     22d    192.168.62.180    k8s-cp       <none>           <none>
    kube-system   coredns-64897985d-n4qb8                    1/1     Running            9 (34m ago)     22d    192.168.62.181    k8s-cp       <none>           <none>
    kube-system   etcd-k8s-cp                                1/1     Running            10 (34m ago)    22d    172.31.40.134     k8s-cp       <none>           <none>
    kube-system   kube-apiserver-k8s-cp                      1/1     Running            10 (34m ago)    22d    172.31.40.134     k8s-cp       <none>           <none>
    kube-system   kube-controller-manager-k8s-cp             1/1     Running            9 (34m ago)     22d    172.31.40.134     k8s-cp       <none>           <none>
    kube-system   kube-proxy-4jsnx                           1/1     Running            9 (34m ago)     22d    172.31.40.134     k8s-cp       <none>           <none>
    kube-system   kube-proxy-v4ktk                           1/1     Running            8 (34m ago)     22d    172.31.46.207     k8s-worker   <none>           <none>
    kube-system   kube-scheduler-k8s-cp                      1/1     Running            9 (34m ago)     22d    172.31.40.134     k8s-cp       <none>           <none>
    linkerd       linkerd-destination-5c56887457-sghqx       0/4     CrashLoopBackOff   19 (10s ago)    14m    192.168.254.137   k8s-worker   <none>           <none>
    linkerd       linkerd-heartbeat-27772259-tnw5d           1/1     Running            0               5m3s   192.168.254.139   k8s-worker   <none>           <none>
    linkerd       linkerd-identity-5f4dbf785d-c227v          2/2     Running            0               14m    192.168.254.136   k8s-worker   <none>           <none>
    linkerd       linkerd-proxy-injector-5f7877455f-x5lkw    0/2     CrashLoopBackOff   6 (3m32s ago)   14m    192.168.254.140   k8s-worker   <none>           <none>

The restarts on the other pods are there because I pause the machines when I don't need them, to save money.

Any ideas?

    Regards,
    Michael

  • I have a very similar problem, also in an AWS environment.
    ubuntu@k8scp:~$ linkerd check

    Linkerd core checks
    ===================

    kubernetes-api
    --------------
    √ can initialize the client
    √ can query the Kubernetes API

    kubernetes-version
    ------------------
    √ is running the minimum Kubernetes API version
    √ is running the minimum kubectl version

    linkerd-existence
    -----------------
    √ 'linkerd-config' config map exists
    √ heartbeat ServiceAccount exist
    √ control plane replica sets are ready
    √ no unschedulable pods
    × control plane pods are ready
        pod/linkerd-destination-f99c6cf45-ghcw2 container sp-validator is not ready
        see https://linkerd.io/2.12/checks/#l5d-api-control-ready for hints

    Status check results are ×

    ubuntu@k8scp:~$ kubectl -n linkerd get po
    NAME                                      READY   STATUS             RESTARTS      AGE
    linkerd-destination-f99c6cf45-ghcw2       0/4     CrashLoopBackOff   4 (29s ago)   5m45s
    linkerd-identity-75b9ddf5f9-vmw5s         2/2     Running            0             5m45s
    linkerd-proxy-injector-5d69d7c8c6-b6nrr   0/2     CrashLoopBackOff   2 (32s ago)   5m44s

    ubuntu@k8scp:~$ kubectl logs -n linkerd linkerd-destination-f99c6cf45-ghcw2
    Defaulted container "linkerd-proxy" out of: linkerd-proxy, destination, sp-validator, policy, linkerd-init (init)
    time="2023-01-15T09:40:27Z" level=info msg="Found pre-existing key: /var/run/linkerd/identity/end-entity/key.p8"
    time="2023-01-15T09:40:27Z" level=info msg="Found pre-existing CSR: /var/run/linkerd/identity/end-entity/csr.der"
    [ 0.003663s] INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
    [ 0.004313s] INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
    [ 0.004324s] INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
    [ 0.004327s] INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
    [ 0.004330s] INFO ThreadId(01) linkerd2_proxy: Tap DISABLED
    [ 0.004334s] INFO ThreadId(01) linkerd2_proxy: Local identity is linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
    [ 0.004337s] INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
    [ 0.004340s] INFO ThreadId(01) linkerd2_proxy: Destinations resolved via localhost:8086
    [ 0.004797s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [ 0.110403s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [ 0.329098s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [ 0.744807s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [ 1.246795s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [ 1.748823s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [ 2.250794s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [ 2.752781s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [ 3.254786s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [ 3.756668s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [ 4.257726s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [ 4.759729s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [ 5.261736s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [ 5.763715s] WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)

  • Hi @rafjaro,

    When setting up your EC2 instances, VPC, and SG, did you follow the recommendations from the video guide from the introductory chapter?

I notice the control plane node's hostname appears to be "k8scp". In the lab guide, k8scp was not intended to be a hostname but an alias, to be re-assigned to another host in a later lab.
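
For reference, the alias is normally defined in /etc/hosts on each node, along these lines (the IP here is just an example; use your control plane's private IP):

    # on every node: point the k8scp alias at the control plane's private IP
    echo "172.31.40.134 k8scp" | sudo tee -a /etc/hosts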

Are there any other deviations from the lab guide's installation instructions that may have impacted the outcome of the linkerd installation?

I tried to reproduce both issues reported above, but the linkerd installation worked each time on EC2 instances.

    Regards,
    -Chris

  • I had the same issue, where I could not complete the "linkerd check" command. I found that it was caused by a CoreDNS misconfiguration I had introduced in the previous assignment. I fixed the DNS misconfiguration and can now run "linkerd check" successfully.

  • Do you have more details on how you fixed the CoreDNS misconfiguration? I'm guessing I'm seeing the same issue, because

    kubectl get pods -A -o wide

    shows me

    NAMESPACE     NAME                       READY   STATUS             RESTARTS           AGE   IP           NODE   NOMINATED NODE   READINESS GATES
    ..
    kube-system   coredns-7c65d6cfc9-lkh2c   0/1     CrashLoopBackOff   3976 (3m37s ago)   11d   10.0.0.10    cp     <none>           <none>
    kube-system   coredns-7c65d6cfc9-tdj2x   0/1     CrashLoopBackOff   3975 (2m27s ago)   11d   10.0.0.209   cp     <none>           <none>

I'm getting the same 'pod/linkerd-destination-5c8475f9c5-ncgjt container sp-validator is not ready' when running "linkerd check".
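
My plan is to start with the CoreDNS logs and the Corefile to find the bad edit; I assume something like this should show it:

    # logs of the crashed CoreDNS containers
    kubectl -n kube-system logs -l k8s-app=kube-dns --previous
    # and the Corefile, to compare against the edit made in the earlier lab
    kubectl -n kube-system get configmap coredns -o yaml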

  • I reapplied the backup of the CoreDNS ConfigMap I took in Lab 10.4, confirmed CoreDNS was running, and then uninstalled and reinstalled my linkerd build; that seems to have done the trick.
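
    In case it helps others, the sequence was roughly this (assuming the Lab 10.4 backup was saved as coredns-backup.yaml; use whatever filename you chose, and the --crds step only applies on 2.12):

    kubectl -n kube-system apply -f coredns-backup.yaml          # restore the saved Corefile
    kubectl -n kube-system rollout restart deployment coredns    # restart CoreDNS with the restored config
    kubectl -n kube-system get pods -l k8s-app=kube-dns          # confirm CoreDNS is Running
    linkerd uninstall | kubectl delete -f -                      # remove the broken linkerd install
    linkerd install --crds | kubectl apply -f -                  # reinstall per the lab (2.12)
    linkerd install | kubectl apply -f -
    linkerd check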
