Issue with linkerd

aditya03
aditya03 Posts: 15
edited February 13 in LFS258 Class Forum

I am deploying the Linkerd service mesh, but I am stuck at "linkerd check".
How can I solve this?

ubuntu@ip:~$ linkerd check

kubernetes-api

√ can initialize the client
√ can query the Kubernetes API

kubernetes-version

√ is running the minimum Kubernetes API version

linkerd-existence

√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
× control plane pods are ready
pod/linkerd-destination-b98b5c974-ppvk5 container sp-validator is not ready
see https://linkerd.io/2.14/checks/#l5d-api-control-ready for hints

Status check results are ×

Comments

  • chrispokorni
    chrispokorni Posts: 2,178
    edited February 13

    Hi @aditya03,

    This discussion was moved to the LFS258 Forum.

    When opening a new discussion topic, please do so in its dedicated forum; otherwise the instructors who may be able to assist will not be notified, as was the case with this discussion.

    Linkerd installation and check may fail for many reasons. It could be related to infrastructure networking, or cluster networking misconfiguration, resources, or compatibility.

    What is your cluster infrastructure: cloud or local hypervisor? What type of VMs are hosting your lab environment (size and OS)? How is your network configured at the infrastructure level, and how are the firewalls configured? What CNI plugin is installed in your cluster? What is the Kubernetes version of all components? (It should be the same, unlike earlier when they were mismatched.)

    What output is produced by the following commands?

    kubectl get nodes -o wide

    kubectl get pods -A -o wide
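
    To pick out the failing pods from the second command, a filter along these lines may help. This is only a sketch: not_ready_pods is a hypothetical helper name, and it assumes the default column layout of "kubectl get pods -A" (NAMESPACE, NAME, READY, STATUS, ...).

    ```shell
    # Hypothetical helper: reads "kubectl get pods -A" output on stdin and
    # prints only the pods whose READY column (ready/total) is not complete.
    not_ready_pods() {
      awk 'NR > 1 { split($3, r, "/"); if (r[1] != r[2]) print $1 "/" $2, $3, $4 }'
    }

    # Usage (requires access to the cluster):
    #   kubectl get pods -A -o wide | not_ready_pods
    ```

    Each printed line has the form namespace/name, READY, STATUS, which is usually enough to see which components to look at first.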

    Regards,
    -Chris

  • aditya03
    aditya03 Posts: 15
    edited March 4

    Hi @chrispokorni ,

    Thanks, point noted.

    Here is my Kubernetes configuration:

    Currently I have a cluster with 1 master and 1 worker hosted on AWS EC2, instance size t2.large.
    OS - Ubuntu 20.04.6 LTS
    Container runtime - containerd; CNI plugin - Cilium

    Versions -
    Master
    Kubernetes v1.28.1
    kubeadm version: &version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.1", GitCommit:"8dc49c4b984b897d423aab4971090e1879eb4f23", GitTreeState:"clean", BuildDate:"2023-08-24T11:21:51Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}

    Worker
    Kubernetes v1.28.1
    kubectl version
    Client Version: v1.28.1

    kubeadm version: &version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.1", GitCommit:"4c9411232e10168d7b050c49a1b59f6df9d7ea4b", GitTreeState:"clean", BuildDate:"2023-04-14T13:20:04Z", GoVersion:"go1.20.3", Compiler:"gc", Platform:"linux/amd64"}

    I noticed that the kubeadm version on the worker doesn't match the master.
    I tried to upgrade kubeadm on the worker node, but updating the repositories there failed with the same error that I posted in another discussion:

    Get:14 https://download.docker.com/linux/ubuntu focal/stable amd64 Packages [38.0 kB]
    Err:15 https://packages.cloud.google.com/apt kubernetes-xenial Release
    404 Not Found [IP: 142.251.42.78 443]
    Get:16 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [2752 kB]
    Get:17 http://security.ubuntu.com/ubuntu focal-security/main Translation-en [418 kB]
    Get:18 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [2606 kB]
    Get:19 http://security.ubuntu.com/ubuntu focal-security/restricted Translation-en [363 kB]
    Get:20 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [944 kB]
    Get:21 http://security.ubuntu.com/ubuntu focal-security/universe Translation-en [198 kB]
    Get:22 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [23.9 kB]
    Reading package lists... Done
    E: The repository 'http://apt.kubernetes.io kubernetes-xenial Release' no longer has a Release file.
    N: Updating from such a repository can't be done securely, and is therefore disabled by default.
    N: See apt-secure(8) manpage for repository creation and user configuration details.

  • aditya03
    aditya03 Posts: 15

    These are the outputs of the commands you asked for above:

    ubuntu@ip-172-31-33-185:~$ kubectl get nodes -o wide
    NAME              STATUS  ROLES          AGE  VERSION  INTERNAL-IP    EXTERNAL-IP  OS-IMAGE            KERNEL-VERSION   CONTAINER-RUNTIME
    ip-172-31-33-185  Ready   control-plane  47d  v1.28.1  172.31.33.185  <none>       Ubuntu 20.04.6 LTS  5.15.0-1053-aws  containerd://1.6.27
    ip-172-31-39-104  Ready   <none>         47d  v1.28.1  172.31.39.104  <none>       Ubuntu 20.04.6 LTS  5.15.0-1052-aws  containerd://1.6.27

    ubuntu@ip-172-31-33-185:~$ kubectl get pods -A -o wide
    NAMESPACE        NAME                                      READY  STATUS            RESTARTS         AGE    IP             NODE              NOMINATED NODE  READINESS GATES
    default          nginx-7854ff8877-x4s64                    1/1    Running           10 (121m ago)    26d    192.168.0.105  ip-172-31-33-185  <none>          <none>
    kube-system      cilium-k6nx7                              1/1    Running           24 (121m ago)    47d    172.31.39.104  ip-172-31-39-104  <none>          <none>
    kube-system      cilium-operator-788c7d7585-9xkqh          1/1    Running           23 (121m ago)    45d    172.31.33.185  ip-172-31-33-185  <none>          <none>
    kube-system      cilium-operator-788c7d7585-ws48h          1/1    Running           22 (121m ago)    45d    172.31.39.104  ip-172-31-39-104  <none>          <none>
    kube-system      cilium-q9cmw                              1/1    Running           24 (121m ago)    47d    172.31.33.185  ip-172-31-33-185  <none>          <none>
    kube-system      coredns-5d78c9869d-dxfwv                  0/1    CrashLoopBackOff  44 (2m51s ago)   24h    192.168.1.212  ip-172-31-39-104  <none>          <none>
    kube-system      coredns-5d78c9869d-tcnf8                  0/1    CrashLoopBackOff  70 (2m3s ago)    20d    192.168.0.122  ip-172-31-33-185  <none>          <none>
    kube-system      etcd-ip-172-31-33-185                     1/1    Running           23 (121m ago)    45d    172.31.33.185  ip-172-31-33-185  <none>          <none>
    kube-system      kube-apiserver-ip-172-31-33-185           1/1    Running           23 (121m ago)    45d    172.31.33.185  ip-172-31-33-185  <none>          <none>
    kube-system      kube-controller-manager-ip-172-31-33-185  1/1    Running           23 (121m ago)    45d    172.31.33.185  ip-172-31-33-185  <none>          <none>
    kube-system      kube-proxy-4wqg7                          1/1    Running           22 (121m ago)    45d    172.31.33.185  ip-172-31-33-185  <none>          <none>
    kube-system      kube-proxy-cl6b6                          1/1    Running           22 (121m ago)    45d    172.31.39.104  ip-172-31-39-104  <none>          <none>
    kube-system      kube-scheduler-ip-172-31-33-185           1/1    Running           22 (121m ago)    45d    172.31.33.185  ip-172-31-33-185  <none>          <none>
    linkerd          linkerd-destination-b98b5c974-ppvk5       0/4    CrashLoopBackOff  336 (2s ago)     23d    192.168.0.32   ip-172-31-33-185  <none>          <none>
    linkerd          linkerd-heartbeat-28492501-btvnr          0/1    Error             0                109m   192.168.1.144  ip-172-31-39-104  <none>          <none>
    linkerd          linkerd-heartbeat-28492501-gppfz          0/1    Error             0                77m    192.168.1.235  ip-172-31-39-104  <none>          <none>
    linkerd          linkerd-heartbeat-28492501-qpmpq          0/1    Error             0                102m   192.168.1.53   ip-172-31-39-104  <none>          <none>
    linkerd          linkerd-heartbeat-28492501-t2bht          0/1    Error             0                120m   192.168.1.252  ip-172-31-39-104  <none>          <none>
    linkerd          linkerd-heartbeat-28492501-td2m2          0/1    Error             0                114m   192.168.1.214  ip-172-31-39-104  <none>          <none>
    linkerd          linkerd-heartbeat-28492501-vlc26          0/1    Error             0                96m    192.168.1.226  ip-172-31-39-104  <none>          <none>
    linkerd          linkerd-heartbeat-28492501-x5lxm          0/1    Error             0                87m    192.168.1.165  ip-172-31-39-104  <none>          <none>
    linkerd          linkerd-identity-5ddf68fd7b-bkv7l         2/2    Running           10 (121m ago)    20d    192.168.0.19   ip-172-31-33-185  <none>          <none>
    linkerd          linkerd-proxy-injector-7775b95864-qpmqw   0/2    CrashLoopBackOff  107 (30s ago)    20d    192.168.0.82   ip-172-31-33-185  <none>          <none>
    low-usage-limit  limited-hog-66d5cd76bc-r2tdc              1/1    Running           5 (121m ago)     20d    192.168.0.206  ip-172-31-33-185  <none>          <none>
    small            nginx-nfs-7cfd6b85bf-shf5n                0/1    Pending           0                27d    <none>         <none>            <none>          <none>

  • aditya03
    aditya03 Posts: 15

    Hi @chrispokorni

    I am still stuck with the same linkerd error as above.

    My workload runs on EC2; all instances are t2.large with a 20 GB EBS volume attached.
    Container runtime - containerd; CNI plugin - Cilium

    These are the versions throughout the cluster:
    kubectl version
    Client Version: v1.28.1
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3

    kubeadm version: &version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.1", GitCommit:"8dc49c4b984b897d423aab4971090e1879eb4f23", GitTreeState:"clean", BuildDate:"2023-08-24T11:21:51Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}

    kubelet --version
    Kubernetes v1.28.1

    OS Version

    Ubuntu 20.04.6 LTS

    I have a single security group applied to all instances, and it is fully open (inbound and outbound).

    Command Outputs

    kubectl get nodes -o wide
    NAME              STATUS  ROLES          AGE  VERSION  INTERNAL-IP    EXTERNAL-IP  OS-IMAGE            KERNEL-VERSION   CONTAINER-RUNTIME
    ip-172-31-33-185  Ready   control-plane  49d  v1.28.1  172.31.33.185  <none>       Ubuntu 20.04.6 LTS  5.15.0-1055-aws  containerd://1.6.27
    ip-172-31-37-153  Ready   control-plane  24h  v1.28.1  172.31.37.153  <none>       Ubuntu 20.04.6 LTS  5.15.0-1055-aws  containerd://1.6.28
    ip-172-31-39-104  Ready   <none>         48d  v1.28.1  172.31.39.104  <none>       Ubuntu 20.04.6 LTS  5.15.0-1055-aws  containerd://1.6.28
    ip-172-31-42-14   Ready   control-plane  24h  v1.28.1  172.31.42.14   <none>       Ubuntu 20.04.6 LTS  5.15.0-1055-aws  containerd://1.6.28

    kubectl get pods -A -o wide
    NAMESPACE        NAME                                      READY  STATUS            RESTARTS         AGE    IP             NODE              NOMINATED NODE  READINESS GATES
    default          nginx-7854ff8877-x4s64                    1/1    Running           14 (70m ago)     28d    192.168.0.32   ip-172-31-33-185  <none>          <none>
    kube-system      cilium-9vgj2                              1/1    Running           1 (70m ago)      24h    172.31.42.14   ip-172-31-42-14   <none>          <none>
    kube-system      cilium-db2rl                              1/1    Running           1 (70m ago)      24h    172.31.37.153  ip-172-31-37-153  <none>          <none>
    kube-system      cilium-k6nx7                              1/1    Running           27 (70m ago)     48d    172.31.39.104  ip-172-31-39-104  <none>          <none>
    kube-system      cilium-operator-788c7d7585-9xkqh          1/1    Running           27 (70m ago)     47d    172.31.33.185  ip-172-31-33-185  <none>          <none>
    kube-system      cilium-operator-788c7d7585-ws48h          1/1    Running           27 (70m ago)     47d    172.31.39.104  ip-172-31-39-104  <none>          <none>
    kube-system      cilium-q9cmw                              1/1    Running           28 (70m ago)     48d    172.31.33.185  ip-172-31-33-185  <none>          <none>
    kube-system      coredns-5d78c9869d-hjxr4                  0/1    CrashLoopBackOff  38 (2m44s ago)   24h    192.168.0.71   ip-172-31-33-185  <none>          <none>
    kube-system      coredns-5d78c9869d-tcnf8                  0/1    CrashLoopBackOff  121 (2m57s ago)  22d    192.168.0.253  ip-172-31-33-185  <none>          <none>
    kube-system      etcd-ip-172-31-33-185                     1/1    Running           27 (70m ago)     47d    172.31.33.185  ip-172-31-33-185  <none>          <none>
    kube-system      etcd-ip-172-31-37-153                     1/1    Running           1 (70m ago)      24h    172.31.37.153  ip-172-31-37-153  <none>          <none>
    kube-system      etcd-ip-172-31-42-14                      1/1    Running           1 (70m ago)      24h    172.31.42.14   ip-172-31-42-14   <none>          <none>
    kube-system      kube-apiserver-ip-172-31-33-185           1/1    Running           27 (70m ago)     47d    172.31.33.185  ip-172-31-33-185  <none>          <none>
    kube-system      kube-apiserver-ip-172-31-37-153           1/1    Running           1 (70m ago)      24h    172.31.37.153  ip-172-31-37-153  <none>          <none>
    kube-system      kube-apiserver-ip-172-31-42-14            1/1    Running           1 (70m ago)      24h    172.31.42.14   ip-172-31-42-14   <none>          <none>
    kube-system      kube-controller-manager-ip-172-31-33-185  1/1    Running           28 (70m ago)     47d    172.31.33.185  ip-172-31-33-185  <none>          <none>
    kube-system      kube-controller-manager-ip-172-31-37-153  1/1    Running           1 (70m ago)      24h    172.31.37.153  ip-172-31-37-153  <none>          <none>
    kube-system      kube-controller-manager-ip-172-31-42-14   1/1    Running           1 (70m ago)      24h    172.31.42.14   ip-172-31-42-14   <none>          <none>
    kube-system      kube-proxy-4wqg7                          1/1    Running           26 (70m ago)     47d    172.31.33.185  ip-172-31-33-185  <none>          <none>
    kube-system      kube-proxy-cl6b6                          1/1    Running           25 (70m ago)     47d    172.31.39.104  ip-172-31-39-104  <none>          <none>
    kube-system      kube-proxy-nj78s                          1/1    Running           1 (70m ago)      24h    172.31.37.153  ip-172-31-37-153  <none>          <none>
    kube-system      kube-proxy-zcsqk                          1/1    Running           1 (70m ago)      24h    172.31.42.14   ip-172-31-42-14   <none>          <none>
    kube-system      kube-scheduler-ip-172-31-33-185           1/1    Running           27 (70m ago)     47d    172.31.33.185  ip-172-31-33-185  <none>          <none>
    kube-system      kube-scheduler-ip-172-31-37-153           1/1    Running           1 (70m ago)      24h    172.31.37.153  ip-172-31-37-153  <none>          <none>
    kube-system      kube-scheduler-ip-172-31-42-14            1/1    Running           1 (70m ago)      24h    172.31.42.14   ip-172-31-42-14   <none>          <none>
    linkerd          linkerd-destination-f9b45794b-6rlvh       0/4    CrashLoopBackOff  31 (23s ago)     24m    192.168.1.24   ip-172-31-39-104  <none>          <none>
    linkerd          linkerd-heartbeat-28495741-7ntz9          0/1    Error             0                15m    192.168.1.17   ip-172-31-39-104  <none>          <none>
    linkerd          linkerd-heartbeat-28495741-h9lgp          0/1    Error             0                9m51s  192.168.1.54   ip-172-31-39-104  <none>          <none>
    linkerd          linkerd-heartbeat-28495741-qzcg8          1/1    Running           0                4m1s   192.168.1.43   ip-172-31-39-104  <none>          <none>
    linkerd          linkerd-identity-6b54bfdbff-c4s47         2/2    Running           0                24m    192.168.1.168  ip-172-31-39-104  <none>          <none>
    linkerd          linkerd-proxy-injector-5b47db6c88-vsczq   0/2    CrashLoopBackOff  14 (89s ago)     24m    192.168.1.154  ip-172-31-39-104  <none>          <none>
    low-usage-limit  limited-hog-66d5cd76bc-r2tdc              1/1    Running           9 (70m ago)      22d    192.168.0.206  ip-172-31-33-185  <none>          <none>
    small            nginx-nfs-7cfd6b85bf-shf5n                0/1    Pending           0                29d    <none>         <none>            <none>          <none>

  • chrispokorni
    chrispokorni Posts: 2,178

    Hi @aditya03,

    Prior to installing the service mesh I would attempt to fix the coredns pods that are in CrashLoopBackOff state. Uninstall the service mesh and try to delete the coredns pods, wait for the controller to re-create them and if they return to Running state then try to install the service mesh again.
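
    The steps above could be scripted roughly as follows. This is a sketch under assumptions, not a tested procedure for this cluster: the function names are hypothetical, the coredns pods are assumed to carry the k8s-app=kube-dns label and to be managed by a Deployment named coredns (the kubeadm defaults), and "linkerd uninstall" is assumed to be available in the installed CLI version.

    ```shell
    # KUBECTL and LINKERD default to the real binaries; pointing them at a
    # stub command turns the functions below into a dry run.
    KUBECTL="${KUBECTL:-kubectl}"
    LINKERD="${LINKERD:-linkerd}"

    # Remove the Linkerd control plane resources from the cluster.
    uninstall_mesh() {
      $LINKERD uninstall | $KUBECTL delete -f -
    }

    # Delete the coredns pods by label, then wait for the Deployment
    # to report its replacement pods as available.
    restart_coredns() {
      $KUBECTL -n kube-system delete pod -l k8s-app=kube-dns
      $KUBECTL -n kube-system rollout status deployment/coredns --timeout=120s
    }

    # Print recent log lines from the coredns containers.
    coredns_logs() {
      $KUBECTL -n kube-system logs -l k8s-app=kube-dns --tail=20
    }

    # Usage (requires access to the cluster):
    #   uninstall_mesh && restart_coredns || coredns_logs
    ```

    If the rollout does not complete within the timeout, the coredns logs are usually the quickest place to find the reason for the crash.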

    Regards,
    -Chris

  • aditya03
    aditya03 Posts: 15

    Hi @chrispokorni

    I uninstalled the service mesh and deleted the coredns pod, but it came back in CrashLoopBackOff.
    I tried a couple of different things, such as adding the following to /var/lib/kubelet/config.yaml and restarting the kubelet, but it did not help:
    evictionHard:
      imagefs.available: 1%
      memory.available: 100Mi
      nodefs.available: 1%
      nodefs.inodesFree: 1%

    I also removed the NoSchedule taint from the nodes; that didn't work either.

    Here is the description of the coredns pod, should you need it to debug. Kindly help. Thanks

    kubectl -n kube-system describe pod coredns-5d78c9869d-rwq8h
    Name:                 coredns-5d78c9869d-rwq8h
    Namespace:            kube-system
    Priority:             2000000000
    Priority Class Name:  system-cluster-critical
    Service Account:      coredns
    Node:                 ip-172-31-33-185/172.31.33.185
    Start Time:           Thu, 07 Mar 2024 16:03:32 +0000
    Labels:               k8s-app=kube-dns
                          pod-template-hash=5d78c9869d
    Annotations:          <none>
    Status:               Running
    IP:                   192.168.0.40
    IPs:
      IP:           192.168.0.40
    Controlled By:  ReplicaSet/coredns-5d78c9869d
    Containers:
      coredns:
        Container ID:  containerd://9bdb4e2a9cd7d3720545c4cd540888604f255194fc2ff5aeede54e07d619d32e
        Image:         registry.k8s.io/coredns/coredns:v1.10.1
        Image ID:      registry.k8s.io/coredns/coredns@sha256:a0ead06651cf580044aeb0a0feba63591858fb2e43ade8c9dea45a6a89ae7e5e
        Ports:         53/UDP, 53/TCP, 9153/TCP
        Host Ports:    0/UDP, 0/TCP, 0/TCP
        Args:
          -conf
          /etc/coredns/Corefile
        State:          Waiting
          Reason:       CrashLoopBackOff
        Last State:     Terminated
          Reason:       Error
          Exit Code:    1
          Started:      Thu, 07 Mar 2024 16:27:57 +0000
          Finished:     Thu, 07 Mar 2024 16:27:57 +0000
        Ready:          False
        Restart Count:  18
        Limits:
          memory:  170Mi
        Requests:
          cpu:        100m
          memory:     70Mi
        Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
        Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
        Environment:  <none>
        Mounts:
          /etc/coredns from config-volume (ro)
          /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bhj4m (ro)
    Conditions:
      Type              Status
      Initialized       True
      Ready             False
      ContainersReady   False
      PodScheduled      True
    Volumes:
      config-volume:
        Type:      ConfigMap (a volume populated by a ConfigMap)
        Name:      coredns
        Optional:  false
      kube-api-access-bhj4m:
        Type:                    Projected (a volume that contains injected data from multiple sources)
        TokenExpirationSeconds:  3607
        ConfigMapName:           kube-root-ca.crt
        ConfigMapOptional:       <nil>
        DownwardAPI:             true
    QoS Class:                   Burstable
    Node-Selectors:              kubernetes.io/os=linux
    Tolerations:                 CriticalAddonsOnly op=Exists
                                 node-role.kubernetes.io/control-plane:NoSchedule
                                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
    Events:
      Type     Reason     Age                    From               Message
      ----     ------     ----                   ----               -------
      Normal   Scheduled  24m                    default-scheduler  Successfully assigned kube-system/coredns-5d78c9869d-rwq8h to ip-172-31-33-185
      Normal   Started    23m (x4 over 24m)      kubelet            Started container coredns
      Warning  Unhealthy  23m                    kubelet            Readiness probe failed: Get "http://192.168.0.40:8181/ready": dial tcp 192.168.0.40:8181: connect: connection refused
      Normal   Pulled     23m (x5 over 24m)      kubelet            Container image "registry.k8s.io/coredns/coredns:v1.10.1" already present on machine
      Normal   Created    23m (x5 over 24m)      kubelet            Created container coredns
      Warning  BackOff    14m (x54 over 24m)     kubelet            Back-off restarting failed container coredns in pod coredns-5d78c9869d-rwq8h_kube-system(36ef06b7-f6db-4e6f-8395-bb225d2cc460)
      Normal   Pulled     9m2s (x4 over 10m)     kubelet            Container image "registry.k8s.io/coredns/coredns:v1.10.1" already present on machine
      Normal   Created    9m2s (x4 over 10m)     kubelet            Created container coredns
      Normal   Started    9m1s (x4 over 10m)     kubelet            Started container coredns
      Warning  BackOff    5m23s (x34 over 10m)   kubelet            Back-off restarting failed container coredns in pod coredns-5d78c9869d-rwq8h_kube-system(36ef06b7-f6db-4e6f-8395-bb225d2cc460)
      Warning  Unhealthy  2m55s                  kubelet            Readiness probe failed: Get "http://192.168.0.40:8181/ready": dial tcp 192.168.0.40:8181: connect: connection refused
      Warning  BackOff    117s (x12 over 3m14s)  kubelet            Back-off restarting failed container coredns in pod coredns-5d78c9869d-rwq8h_kube-system(36ef06b7-f6db-4e6f-8395-bb225d2cc460)
      Normal   Pulled     103s (x4 over 3m14s)   kubelet            Container image "registry.k8s.io/coredns/coredns:v1.10.1" already present on machine
      Normal   Created    103s (x4 over 3m14s)   kubelet            Created container coredns
      Normal   Started    103s (x4 over 3m14s)   kubelet            Started container coredns

  • chrispokorni
    chrispokorni Posts: 2,178

    Hi @aditya03,

    Please format your outputs, otherwise they are almost impossible to read...

    One of my students this week faced a similar issue in class, and it seems it was the coredns config map, more precisely the Corefile that was modified in lab 9, that was breaking the coredns pods. There is a slight chance that an accidental typo invalidated the Corefile.
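
    One cheap check for such a typo is to compare the number of opening and closing braces in the Corefile. The helper below is a hypothetical sketch: corefile_braces_balanced and the file name Corefile.current are made up for illustration, and a balanced result does not prove the Corefile is valid, since it only catches one class of typo.

    ```shell
    # Hypothetical helper: exits 0 when the file named by $1 contains as many
    # "{" as "}" characters, and non-zero otherwise.
    corefile_braces_balanced() {
      awk '
        { opens += gsub(/[{]/, "{"); closes += gsub(/[}]/, "}") }
        END { exit (opens == closes ? 0 : 1) }
      ' "$1"
    }

    # Usage (requires access to the cluster; Corefile.current is an assumed name):
    #   kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}' > Corefile.current
    #   corefile_braces_balanced Corefile.current && echo "braces balanced"
    ```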

    Delete the current coredns configmap:

    kubectl -n kube-system delete configmaps coredns

    Create a new configmap manifest coredns-cm.yaml with the following content:

    apiVersion: v1
    data:
      Corefile: |
        .:53 {
            errors
            health {
               lameduck 5s
            }
            ready
            kubernetes cluster.local in-addr.arpa ip6.arpa {
               pods insecure
               fallthrough in-addr.arpa ip6.arpa
               ttl 30
            }
            prometheus :9153
            forward . /etc/resolv.conf {
               max_concurrent 1000
            }
            cache 30
            loop
            reload
            loadbalance
        }
    kind: ConfigMap
    metadata:
      name: coredns
      namespace: kube-system
    

    Then re-create the configmap:

    kubectl -n kube-system apply -f coredns-cm.yaml

    Then delete the two coredns pods. The new pods should reach Running state.
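
    Because the first step deletes the existing configmap, it may be worth saving a copy beforehand and checking the pod status afterwards. A minimal sketch, assuming the kubeadm default label k8s-app=kube-dns on the coredns pods; the function names and the backup file name are hypothetical:

    ```shell
    # KUBECTL defaults to the real binary; pointing it at a stub command
    # turns the functions below into a dry run.
    KUBECTL="${KUBECTL:-kubectl}"

    # Save the current coredns ConfigMap as YAML into the file named by $1.
    backup_coredns_cm() {
      $KUBECTL -n kube-system get configmap coredns -o yaml > "$1"
    }

    # List the coredns pods with their READY and STATUS columns.
    coredns_pod_status() {
      $KUBECTL -n kube-system get pods -l k8s-app=kube-dns -o wide
    }

    # Usage (requires access to the cluster):
    #   backup_coredns_cm coredns-cm.backup.yaml   # before the delete step
    #   coredns_pod_status                         # after the pods are re-created
    ```

    Keeping the backup also makes it possible to compare the old Corefile with the new one and spot what was different.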

    Regards,
    -Chris
