Lab 11.1 pod/linkerd-destination-7b996475cf-tx69s container sp-validator is not ready

I'm on an AWS setup and also tried the lab with the older 2.11 version, but didn't get anywhere. I always get stuck during setup, where two pods never become ready:

k8s-cp:~$ k get pod
NAME                                      READY   STATUS             RESTARTS       AGE
linkerd-destination-7b996475cf-tx69s      0/4     CrashLoopBackOff   5 (111s ago)   9m42s
linkerd-heartbeat-27766481-f9slk          1/1     Running            0              24s
linkerd-identity-6f49866669-pm44p         2/2     Running            0              9m42s
linkerd-proxy-injector-66894488cc-prrqk   0/2     CrashLoopBackOff   4 (106s ago)   9m42s
ubuntu@k8s-cp:~$ k describe pod linkerd-destination-7b996475cf-tx69s
...
Events:
  Type     Reason               Age    From               Message
  ----     ------               ----   ----               -------
  Normal   Scheduled            10m    default-scheduler  Successfully assigned linkerd/linkerd-destination-7b996475cf-tx69s to k8s-worker
  Normal   Pulled               10m    kubelet            Container image "cr.l5d.io/linkerd/proxy-init:v2.0.0" already present on machine
  Normal   Created              10m    kubelet            Created container linkerd-init
  Normal   Started              10m    kubelet            Started container linkerd-init
  Normal   Killing              8m31s  kubelet            FailedPostStartHook
  Warning  FailedPostStartHook  8m31s  kubelet            Exec lifecycle hook ([/usr/lib/linkerd/linkerd-await --timeout=2m]) for Container "linkerd-proxy" in Pod "linkerd-destination-7b996475cf-tx69s_linkerd(c8809acb-32a2-43ab-9f31-9d6486d290aa)" failed - error: command '/usr/lib/linkerd/linkerd-await --timeout=2m' exited with 69: linkerd-proxy failed to become ready within 120s timeout
, message: "linkerd-proxy failed to become ready within 120s timeout\n"
  Normal   Created    8m1s                   kubelet  Created container destination
  Normal   Pulled     8m1s                   kubelet  Container image "cr.l5d.io/linkerd/controller:stable-2.12.1" already present on machine
  Normal   Started    8m                     kubelet  Started container sp-validator
  Normal   Pulled     8m                     kubelet  Container image "cr.l5d.io/linkerd/policy-controller:stable-2.12.1" already present on machine
  Normal   Created    8m                     kubelet  Created container policy
  Normal   Started    8m                     kubelet  Started container policy
  Normal   Pulled     8m                     kubelet  Container image "cr.l5d.io/linkerd/controller:stable-2.12.1" already present on machine
  Normal   Created    8m                     kubelet  Created container sp-validator
  Normal   Started    8m                     kubelet  Started container destination
  Normal   Started    7m59s (x2 over 10m)    kubelet  Started container linkerd-proxy
  Normal   Created    7m59s (x2 over 10m)    kubelet  Created container linkerd-proxy
  Normal   Pulled     7m59s (x2 over 10m)    kubelet  Container image "cr.l5d.io/linkerd/proxy:stable-2.12.1" already present on machine
  Warning  Unhealthy  7m59s                  kubelet  Readiness probe failed: Get "http://192.168.254.189:9996/ready": dial tcp 192.168.254.189:9996: connect: connection refused
  Warning  Unhealthy  7m54s                  kubelet  Readiness probe failed: Get "http://192.168.254.189:9996/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  7m54s                  kubelet  Liveness probe failed: Get "http://192.168.254.189:9990/live": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  5m29s (x2 over 7m59s)  kubelet  Readiness probe failed: Get "http://192.168.254.189:9997/ready": dial tcp 192.168.254.189:9997: connect: connection refused
  Warning  Unhealthy  24s (x41 over 7m54s)   kubelet  Readiness probe failed: Get "http://192.168.254.189:9997/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
ubuntu@k8s-cp:~$ k logs linkerd-destination-7b996475cf-tx69s
Defaulted container "linkerd-proxy" out of: linkerd-proxy, destination, sp-validator, policy, linkerd-init (init)
time="2022-10-17T06:39:54Z" level=info msg="Found pre-existing key: /var/run/linkerd/identity/end-entity/key.p8"
time="2022-10-17T06:39:54Z" level=info msg="Found pre-existing CSR: /var/run/linkerd/identity/end-entity/csr.der"
[     0.003239s]  INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[     0.003925s]  INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
[     0.003933s]  INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[     0.003936s]  INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[     0.003939s]  INFO ThreadId(01) linkerd2_proxy: Tap DISABLED
[     0.003941s]  INFO ThreadId(01) linkerd2_proxy: Local identity is linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
[     0.003943s]  INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[     0.003946s]  INFO ThreadId(01) linkerd2_proxy: Destinations resolved via localhost:8086
[    15.005765s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[    30.007001s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[    45.008002s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[    45.014336s]  WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_app_core::control: Failed to resolve control-plane component error=failed SRV and A record lookups: failed to resolve SRV record: request timed out; failed to resolve A record: request timed out error.sources=[failed to resolve A record: request timed out, request timed out]
[    60.009044s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[    75.009629s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[    90.011173s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[    90.130492s]  WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_app_core::control: Failed to resolve control-plane component error=failed SRV and A record lookups: failed to resolve SRV record: request timed out; failed to resolve A record: request timed out error.sources=[failed to resolve A record: request timed out, request timed out]
[   105.011519s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[   120.013446s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[   120.106268s]  INFO ThreadId(01) linkerd_proxy::signal: received SIGTERM, starting shutdown
[   120.116819s]  INFO ThreadId(01) linkerd2_proxy: Received shutdown signal
[   135.015350s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
[   135.353951s]  WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_app_core::control: Failed to resolve control-plane component error=failed SRV and A record lookups: failed to resolve SRV record: request timed out; failed to resolve A record: request timed out error.sources=[failed to resolve A record: request timed out, request timed out]
[   150.016557s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...

The errors in the proxy-injector pod are the same. Since everything else is working, I'm having a hard time figuring out what could be wrong. CoreDNS must be working, since the other labs that use service names also worked. Can anyone give me a hint where to look?
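
The repeated "failed to resolve SRV record: request timed out" warnings suggest the proxy cannot reach cluster DNS from the node it runs on, even if DNS works elsewhere. One way to check this, as a rough sketch, is to resolve the same name from a throwaway pod pinned to that node (k8s-worker in the events above). The pod name dnstest, the busybox:1.28 image, and the KUBECTL variable are assumptions for illustration and do not come from the lab guide.

```shell
#!/bin/sh
# Sketch only: resolve a DNS name from a short-lived pod pinned to one node.
# Assumptions: pod name "dnstest", image busybox:1.28, and an overridable
# KUBECTL variable (defaults to kubectl).
KUBECTL="${KUBECTL:-kubectl}"

dns_check() {
    node="$1"   # node to test from, e.g. k8s-worker
    name="$2"   # DNS name the failing proxy could not resolve
    # Setting spec.nodeName places the pod on the chosen node directly.
    overrides="{\"apiVersion\":\"v1\",\"spec\":{\"nodeName\":\"$node\"}}"
    $KUBECTL run dnstest --rm -i --restart=Never \
        --image=busybox:1.28 --overrides="$overrides" \
        -- nslookup "$name"
}
```

For example: dns_check k8s-worker linkerd-identity-headless.linkerd.svc.cluster.local. If the lookup times out from one node but succeeds from another, traffic between the nodes (for instance the security group rules) may be worth checking.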

Comments

  • chrispokorni

    Hi @michael.weinrich,

    I just stood up an environment on AWS to try to reproduce the issue reported, but it all seems to work just fine.
    Would you be able to provide the outputs of:

    kubectl get nodes -o wide

    kubectl get pods -A -o wide

    While setting up the AWS EC2 instances, VPC and SG, did you happen to follow the video guide from the introductory chapter?

    Regards,
    -Chris

  • michael.weinrich
    edited October 2022

    Hi @chrispokorni, I did follow the guide and just double-checked. All the other labs worked without issues, so I assume the setup itself is fine.

    I tried this again with an older version, as another thread suggested: I ran export LINKERD2_VERSION=stable-2.11.4 followed by the commands in the lab, skipping linkerd install --crds | kubectl apply -f - since that step seems to be new in 2.12.
    The result is that linkerd check fails again:

    Linkerd core checks
    ===================
    
    kubernetes-api
    --------------
    √ can initialize the client
    √ can query the Kubernetes API
    
    kubernetes-version
    ------------------
    √ is running the minimum Kubernetes API version
    √ is running the minimum kubectl version
    
    linkerd-existence
    -----------------
    √ 'linkerd-config' config map exists
    √ heartbeat ServiceAccount exist
    √ control plane replica sets are ready
    √ no unschedulable pods
    × control plane pods are ready
        pod/linkerd-destination-5c56887457-sghqx container sp-validator is not ready
        see https://linkerd.io/2.11/checks/#l5d-api-control-ready for hints
    
    Status check results are ×
    
    ubuntu@k8s-cp:~$ kubectl get nodes -o wide
    NAME         STATUS   ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
    k8s-cp       Ready    control-plane   25d   v1.24.1   172.31.40.134   <none>        Ubuntu 20.04.5 LTS   5.15.0-1020-aws   containerd://1.5.9
    k8s-worker   Ready    <none>          25d   v1.24.1   172.31.46.207   <none>        Ubuntu 20.04.5 LTS   5.15.0-1020-aws   containerd://1.5.9
    
    ubuntu@k8s-cp:~$ kubectl get pods -A -o wide
    NAMESPACE     NAME                                       READY   STATUS             RESTARTS        AGE    IP                NODE         NOMINATED NODE   READINESS GATES
    kube-system   calico-kube-controllers-66966888c4-qjmsd   1/1     Running            9 (34m ago)     22d    192.168.62.182    k8s-cp       <none>           <none>
    kube-system   calico-node-6sjbd                          1/1     Running            11 (34m ago)    25d    172.31.40.134     k8s-cp       <none>           <none>
    kube-system   calico-node-k9pl4                          1/1     Running            10 (34m ago)    25d    172.31.46.207     k8s-worker   <none>           <none>
    kube-system   coredns-64897985d-d49sk                    1/1     Running            9 (34m ago)     22d    192.168.62.180    k8s-cp       <none>           <none>
    kube-system   coredns-64897985d-n4qb8                    1/1     Running            9 (34m ago)     22d    192.168.62.181    k8s-cp       <none>           <none>
    kube-system   etcd-k8s-cp                                1/1     Running            10 (34m ago)    22d    172.31.40.134     k8s-cp       <none>           <none>
    kube-system   kube-apiserver-k8s-cp                      1/1     Running            10 (34m ago)    22d    172.31.40.134     k8s-cp       <none>           <none>
    kube-system   kube-controller-manager-k8s-cp             1/1     Running            9 (34m ago)     22d    172.31.40.134     k8s-cp       <none>           <none>
    kube-system   kube-proxy-4jsnx                           1/1     Running            9 (34m ago)     22d    172.31.40.134     k8s-cp       <none>           <none>
    kube-system   kube-proxy-v4ktk                           1/1     Running            8 (34m ago)     22d    172.31.46.207     k8s-worker   <none>           <none>
    kube-system   kube-scheduler-k8s-cp                      1/1     Running            9 (34m ago)     22d    172.31.40.134     k8s-cp       <none>           <none>
    linkerd       linkerd-destination-5c56887457-sghqx       0/4     CrashLoopBackOff   19 (10s ago)    14m    192.168.254.137   k8s-worker   <none>           <none>
    linkerd       linkerd-heartbeat-27772259-tnw5d           1/1     Running            0               5m3s   192.168.254.139   k8s-worker   <none>           <none>
    linkerd       linkerd-identity-5f4dbf785d-c227v          2/2     Running            0               14m    192.168.254.136   k8s-worker   <none>           <none>
    linkerd       linkerd-proxy-injector-5f7877455f-x5lkw    0/2     CrashLoopBackOff   6 (3m32s ago)   14m    192.168.254.140   k8s-worker   <none>           <none>
    

    The other pods show restarts because I stop the machines when I don't need them, to save money.

    Any ideas?

    Regards,
    Michael

  • I have a very similar problem, also in an AWS environment.
    ubuntu@k8scp:~$ linkerd check
    Linkerd core checks
    ===================

    kubernetes-api
    --------------
    √ can initialize the client
    √ can query the Kubernetes API

    kubernetes-version
    ------------------
    √ is running the minimum Kubernetes API version
    √ is running the minimum kubectl version

    linkerd-existence
    -----------------
    √ 'linkerd-config' config map exists
    √ heartbeat ServiceAccount exist
    √ control plane replica sets are ready
    √ no unschedulable pods
    × control plane pods are ready
        pod/linkerd-destination-f99c6cf45-ghcw2 container sp-validator is not ready
        see https://linkerd.io/2.12/checks/#l5d-api-control-ready for hints

    Status check results are ×
    ubuntu@k8scp:~$ kubectl -n linkerd get po
    NAME                                      READY   STATUS             RESTARTS      AGE
    linkerd-destination-f99c6cf45-ghcw2       0/4     CrashLoopBackOff   4 (29s ago)   5m45s
    linkerd-identity-75b9ddf5f9-vmw5s         2/2     Running            0             5m45s
    linkerd-proxy-injector-5d69d7c8c6-b6nrr   0/2     CrashLoopBackOff   2 (32s ago)   5m44s

    kubectl logs -n linkerd linkerd-destination-f99c6cf45-ghcw2
    Defaulted container "linkerd-proxy" out of: linkerd-proxy, destination, sp-validator, policy, linkerd-init (init)
    time="2023-01-15T09:40:27Z" level=info msg="Found pre-existing key: /var/run/linkerd/identity/end-entity/key.p8"
    time="2023-01-15T09:40:27Z" level=info msg="Found pre-existing CSR: /var/run/linkerd/identity/end-entity/csr.der"
    [     0.003663s]  INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
    [     0.004313s]  INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
    [     0.004324s]  INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
    [     0.004327s]  INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
    [     0.004330s]  INFO ThreadId(01) linkerd2_proxy: Tap DISABLED
    [     0.004334s]  INFO ThreadId(01) linkerd2_proxy: Local identity is linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
    [     0.004337s]  INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
    [     0.004340s]  INFO ThreadId(01) linkerd2_proxy: Destinations resolved via localhost:8086
    [     0.004797s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [     0.110403s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [     0.329098s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [     0.744807s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [     1.246795s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [     1.748823s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [     2.250794s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [     2.752781s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [     3.254786s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [     3.756668s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [     4.257726s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [     4.759729s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [     5.261736s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
    [     5.763715s]  WARN ThreadId(01) policy:watch{port=8443}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)

  • chrispokorni

    Hi @rafjaro,

    When setting up your EC2 instances, VPC, and SG, did you follow the recommendations from the video guide from the introductory chapter?

    I notice the control plane node hostname appears to be "k8scp". It was not intended to be a hostname, but an alias instead, to be re-assigned to another host in a later lab.

    Are there other deviations from the installation instructions of the lab guide, which may have possibly impacted the outcome of the linkerd installation?

    I tried reproducing the issues reported on both occasions above, but the linkerd installation worked both times on EC2 instances.

    Regards,
    -Chris

  • I had the same issue: I could not get the "linkerd check" command to complete. It turned out to be caused by a CoreDNS misconfiguration I had introduced in the previous assignment. After fixing the DNS misconfiguration, "linkerd check" now runs successfully.

  • jsauerhaft
    edited December 2024

    Do you have more details on how you fixed the coredns misconfiguration? I am guessing I'm seeing the same issue because

    kubectl get pods -A -o wide

    shows me

    NAMESPACE     NAME                                               READY   STATUS             RESTARTS           AGE     IP           NODE     NOMINATED NODE   READINESS GATES
    ..
    kube-system   coredns-7c65d6cfc9-lkh2c                           0/1     CrashLoopBackOff   3976 (3m37s ago)   11d     10.0.0.10    cp       <none>           <none>
    kube-system   coredns-7c65d6cfc9-tdj2x                           0/1     CrashLoopBackOff   3975 (2m27s ago)   11d     10.0.0.209   cp       <none>           <none>
    

    I'm getting the same 'pod/linkerd-destination-5c8475f9c5-ncgjt container sp-validator is not ready' when running "linkerd check"
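
With CoreDNS itself in CrashLoopBackOff, a first step could be to look at its pods, the output of the last crashed container, and the Corefile it was started with. The helpers below are only a sketch: they assume the k8s-app=kube-dns label and the coredns ConfigMap name that kubeadm-built clusters normally use, and the function names and KUBECTL variable are made up for illustration.

```shell
#!/bin/sh
# Sketch only: inspect a crashing CoreDNS in the kube-system namespace.
# Assumptions: label k8s-app=kube-dns and ConfigMap "coredns" (kubeadm defaults).
KUBECTL="${KUBECTL:-kubectl}"

coredns_status() {
    # List the CoreDNS pods with their node and restart count.
    $KUBECTL get pods -n kube-system -l k8s-app=kube-dns -o wide
}

coredns_logs() {
    # $1: pod name; $2: number of lines (default 20).
    # --previous prints the log of the last terminated container instance.
    $KUBECTL logs -n kube-system "$1" --previous --tail="${2:-20}"
}

coredns_config() {
    # Print the ConfigMap that holds the Corefile.
    $KUBECTL get configmap -n kube-system coredns -o yaml
}
```

For example: coredns_status, then coredns_logs coredns-7c65d6cfc9-lkh2c, then coredns_config to compare the Corefile against a known-good copy.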

  • I reapplied the backup of the CoreDNS ConfigMap that I took in Lab 10.4, confirmed that CoreDNS was running, and then uninstalled and reinstalled Linkerd. That seems to have done the trick.
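
The steps described above might look roughly like the following. This is a sketch under assumptions, not the poster's exact commands: the backup file name is hypothetical, the Deployment is assumed to be named coredns, and the --crds step applies to Linkerd 2.12 and later only.

```shell
#!/bin/sh
# Sketch only: restore a saved CoreDNS ConfigMap, then reinstall Linkerd.
# Assumptions: KUBECTL/LINKERD variables and function names are illustrative.
KUBECTL="${KUBECTL:-kubectl}"
LINKERD="${LINKERD:-linkerd}"

restore_coredns() {
    backup="$1"   # saved ConfigMap manifest (hypothetical file name)
    # Note: a stale metadata.resourceVersion in the backup may make apply fail.
    $KUBECTL apply -n kube-system -f "$backup" &&
    # Restart so the pods start from the restored Corefile, then wait for them.
    $KUBECTL rollout restart -n kube-system deployment coredns &&
    $KUBECTL rollout status -n kube-system deployment coredns
}

reinstall_linkerd() {
    # Remove the existing control plane, then install again and verify.
    $LINKERD uninstall | $KUBECTL delete -f -
    $LINKERD install --crds | $KUBECTL apply -f - &&   # 2.12+ only
    $LINKERD install | $KUBECTL apply -f - &&
    $LINKERD check
}
```

For example: restore_coredns coredns-backup.yaml followed by reinstall_linkerd, once the CoreDNS pods report Running.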
