Issue with linkerd

aditya03 · February 2024

I am deploying the Linkerd service mesh, but I am stuck at "linkerd check".
How can I solve this?

ubuntu@ip:~$ linkerd check

kubernetes-api

√ can initialize the client
√ can query the Kubernetes API

kubernetes-version

√ is running the minimum Kubernetes API version

linkerd-existence

√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
× control plane pods are ready
pod/linkerd-destination-b98b5c974-ppvk5 container sp-validator is not ready
see https://linkerd.io/2.14/checks/#l5d-api-control-ready for hints

Status check results are ×
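The failing check names the sp-validator container of the linkerd-destination pod, so that container's logs and the pod's events are a reasonable first place to look. A minimal sketch, assuming Linkerd lives in the default linkerd namespace; the helper name diagnose_linkerd_pod is hypothetical, while the kubectl subcommands are standard.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print events and container logs for one Linkerd pod.
# Assumes Linkerd was installed into the default "linkerd" namespace.
diagnose_linkerd_pod() {
  local pod="$1" container="${2:-sp-validator}" ns="${3:-linkerd}"
  # The Events section at the end of "describe" often explains readiness failures.
  kubectl -n "$ns" describe pod "$pod"
  # Logs of the currently running container instance.
  kubectl -n "$ns" logs "$pod" -c "$container"
  # Logs of the previous (crashed) instance; ignore the error if there is none.
  kubectl -n "$ns" logs "$pod" -c "$container" --previous || true
}
```

Usage, with the pod name from the check output: diagnose_linkerd_pod linkerd-destination-b98b5c974-ppvk5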

aditya03 · February 2024

@chrispokorni ++

chrispokorni · February 2024

Hi @aditya03,

This discussion was moved to the LFS258 Forum.

When opening a new discussion topic, please do so in its dedicated forum. Otherwise, the instructors who may be able to help will not be notified, as happened with this discussion.

Linkerd installation and check may fail for many reasons: infrastructure networking, cluster networking misconfiguration, insufficient resources, or version incompatibility.

What is your cluster infrastructure: cloud or a local hypervisor? What type of VMs host your lab environment (size and OS)? How is your network configured at the infrastructure level, and how are the firewalls configured? Which CNI plugin is installed in your cluster? What is the Kubernetes version of each component? They should all be the same, unlike earlier when they were mismatched.
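On the version question, the API server reports each node's kubelet version, so skew between nodes can be spotted from the control plane. A minimal sketch; the helper names node_kubelet_versions and versions_match are hypothetical, while .status.nodeInfo.kubeletVersion is a standard Node status field. kubeadm is not reported through the API, so "kubeadm version -o short" still has to be run on each host.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting kubelet version skew between nodes.

# Print "<node-name><TAB><kubelet-version>" for every node.
node_kubelet_versions() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kubeletVersion}{"\n"}{end}'
}

# Read one version string per line on stdin; succeed only if
# at most one distinct non-empty value appears.
versions_match() {
  local distinct
  distinct=$(sort -u | grep -c .)
  [ "$distinct" -le 1 ]
}
```

Usage: node_kubelet_versions | cut -f2 | versions_match && echo "kubelet versions match" || echo "kubelet version skew"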

What output is produced by the following commands?

kubectl get nodes -o wide

kubectl get pods -A -o wide
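Saving both outputs to files before posting them makes it easier to paste them with their column layout intact. A small sketch; the function name collect_cluster_info and the default directory name cluster-info are arbitrary choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the two requested outputs to separate files.
collect_cluster_info() {
  local dir="${1:-cluster-info}"
  mkdir -p "$dir"
  kubectl get nodes -o wide > "$dir/nodes.txt"
  kubectl get pods -A -o wide > "$dir/pods.txt"
  # Print the directory so the caller knows where the files went.
  echo "$dir"
}
```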

Regards,
-Chris

aditya03 · March 2024

Hi @chrispokorni ,

Thanks, point noted.

Here is my Kubernetes configuration:

I currently have a cluster with 1 master and 1 worker hosted on AWS EC2, instance size t2.large.
OS - Ubuntu 20.04.6 LTS
Container runtime - containerd (CNI plugin: Cilium)

Versions -
Master
Kubernetes v1.28.1
kubeadm version: &version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.1", GitCommit:"8dc49c4b984b897d423aab4971090e1879eb4f23", GitTreeState:"clean", BuildDate:"2023-08-24T11:21:51Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}

Worker
Kubernetes v1.28.1
kubectl version
Client Version: v1.28.1

kubeadm version: &version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.1", GitCommit:"4c9411232e10168d7b050c49a1b59f6df9d7ea4b", GitTreeState:"clean", BuildDate:"2023-04-14T13:20:04Z", GoVersion:"go1.20.3", Compiler:"gc", Platform:"linux/amd64"}

I noticed that the kubeadm version on the worker doesn't match (v1.27.1 vs v1.28.1).
So I tried to upgrade kubeadm on the worker node, but updating the package repositories on the worker failed with the same error I posted in another discussion:

Get:14 https://download.docker.com/linux/ubuntu focal/stable amd64 Packages [38.0 kB]
Err:15 https://packages.cloud.google.com/apt kubernetes-xenial Release
404 Not Found [IP: 142.251.42.78 443]
Get:16 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [2752 kB]
Get:17 http://security.ubuntu.com/ubuntu focal-security/main Translation-en [418 kB]
Get:18 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [2606 kB]
Get:19 http://security.ubuntu.com/ubuntu focal-security/restricted Translation-en [363 kB]
Get:20 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [944 kB]
Get:21 http://security.ubuntu.com/ubuntu focal-security/universe Translation-en [198 kB]
Get:22 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [23.9 kB]
Reading package lists... Done
E: The repository 'http://apt.kubernetes.io kubernetes-xenial Release' no longer has a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
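The 404 comes from the legacy Google-hosted package repository (apt.kubernetes.io / packages.cloud.google.com), which the Kubernetes project deprecated in 2023 in favour of the community-owned pkgs.k8s.io repositories, one per minor version. A sketch of the switch, assuming Ubuntu with apt, sudo access, and the v1.28 minor release used in this cluster; the function names are hypothetical, and the URL and keyring path follow the layout of the official install instructions.

```shell
#!/usr/bin/env bash
# Sketch: replace the retired apt.kubernetes.io source with pkgs.k8s.io.
# Assumptions: Ubuntu/apt, sudo access, Kubernetes minor version v1.28.
KEYRING=/etc/apt/keyrings/kubernetes-apt-keyring.gpg

# Build the apt source line for one minor version (e.g. v1.28).
k8s_apt_source_line() {
  printf 'deb [signed-by=%s] https://pkgs.k8s.io/core:/stable:/%s/deb/ /\n' "$KEYRING" "$1"
}

switch_to_pkgs_k8s_io() {
  local minor="${1:-v1.28}"
  sudo mkdir -p -m 755 /etc/apt/keyrings
  # Download the repository signing key and convert it to binary form.
  curl -fsSL "https://pkgs.k8s.io/core:/stable:/${minor}/deb/Release.key" \
    | sudo gpg --yes --dearmor -o "$KEYRING"
  # Overwrite kubernetes.list, where the old entry usually lives.
  k8s_apt_source_line "$minor" | sudo tee /etc/apt/sources.list.d/kubernetes.list >/dev/null
  # List any other apt source files that still reference the old repository.
  grep -rl 'apt.kubernetes.io' /etc/apt/sources.list /etc/apt/sources.list.d/ || true
  sudo apt-get update
}
```

After switch_to_pkgs_k8s_io v1.28, "apt-cache madison kubeadm" lists the versions the new repository offers. Their revision suffixes differ from the old repository (for example -1.1 instead of -00), so check them before pinning. If kubeadm was held with apt-mark hold, run "sudo apt-mark unhold kubeadm" before installing the matching version.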

aditya03 · March 2024

Here are the outputs of the commands you asked for:

ubuntu@ip-172-31-33-185:~$ kubectl get nodes -o wide
NAME               STATUS   ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
ip-172-31-33-185   Ready    control-plane   47d   v1.28.1   172.31.33.185   <none>        Ubuntu 20.04.6 LTS   5.15.0-1053-aws   containerd://1.6.27
ip-172-31-39-104   Ready    <none>          47d   v1.28.1   172.31.39.104   <none>        Ubuntu 20.04.6 LTS   5.15.0-1052-aws   containerd://1.6.27

ubuntu@ip-172-31-33-185:~$ kubectl get pods -A -o wide
NAMESPACE         NAME                                       READY   STATUS             RESTARTS          AGE     IP              NODE               NOMINATED NODE   READINESS GATES
default           nginx-7854ff8877-x4s64                     1/1     Running            10 (121m ago)     26d     192.168.0.105   ip-172-31-33-185   <none>           <none>
kube-system       cilium-k6nx7                               1/1     Running            24 (121m ago)     47d     172.31.39.104   ip-172-31-39-104   <none>           <none>
kube-system       cilium-operator-788c7d7585-9xkqh           1/1     Running            23 (121m ago)     45d     172.31.33.185   ip-172-31-33-185   <none>           <none>
kube-system       cilium-operator-788c7d7585-ws48h           1/1     Running            22 (121m ago)     45d     172.31.39.104   ip-172-31-39-104   <none>           <none>
kube-system       cilium-q9cmw                               1/1     Running            24 (121m ago)     47d     172.31.33.185   ip-172-31-33-185   <none>           <none>
kube-system       coredns-5d78c9869d-dxfwv                   0/1     CrashLoopBackOff   44 (2m51s ago)    24h     192.168.1.212   ip-172-31-39-104   <none>           <none>
kube-system       coredns-5d78c9869d-tcnf8                   0/1     CrashLoopBackOff   70 (2m3s ago)     20d     192.168.0.122   ip-172-31-33-185   <none>           <none>
kube-system       etcd-ip-172-31-33-185                      1/1     Running            23 (121m ago)     45d     172.31.33.185   ip-172-31-33-185   <none>           <none>
kube-system       kube-apiserver-ip-172-31-33-185            1/1     Running            23 (121m ago)     45d     172.31.33.185   ip-172-31-33-185   <none>           <none>
kube-system       kube-controller-manager-ip-172-31-33-185   1/1     Running            23 (121m ago)     45d     172.31.33.185   ip-172-31-33-185   <none>           <none>
kube-system       kube-proxy-4wqg7                           1/1     Running            22 (121m ago)     45d     172.31.33.185   ip-172-31-33-185   <none>           <none>
kube-system       kube-proxy-cl6b6                           1/1     Running            22 (121m ago)     45d     172.31.39.104   ip-172-31-39-104   <none>           <none>
kube-system       kube-scheduler-ip-172-31-33-185            1/1     Running            22 (121m ago)     45d     172.31.33.185   ip-172-31-33-185   <none>           <none>
linkerd           linkerd-destination-b98b5c974-ppvk5        0/4     CrashLoopBackOff   336 (2s ago)      23d     192.168.0.32    ip-172-31-33-185   <none>           <none>
linkerd           linkerd-heartbeat-28492501-btvnr           0/1     Error              0                 109m    192.168.1.144   ip-172-31-39-104   <none>           <none>
linkerd           linkerd-heartbeat-28492501-gppfz           0/1     Error              0                 77m     192.168.1.235   ip-172-31-39-104   <none>           <none>
linkerd           linkerd-heartbeat-28492501-qpmpq           0/1     Error              0                 102m    192.168.1.53    ip-172-31-39-104   <none>           <none>
linkerd           linkerd-heartbeat-28492501-t2bht           0/1     Error              0                 120m    192.168.1.252   ip-172-31-39-104   <none>           <none>
linkerd           linkerd-heartbeat-28492501-td2m2           0/1     Error              0                 114m    192.168.1.214   ip-172-31-39-104   <none>           <none>
linkerd           linkerd-heartbeat-28492501-vlc26           0/1     Error              0                 96m     192.168.1.226   ip-172-31-39-104   <none>           <none>
linkerd           linkerd-heartbeat-28492501-x5lxm           0/1     Error              0                 87m     192.168.1.165   ip-172-31-39-104   <none>           <none>
linkerd           linkerd-identity-5ddf68fd7b-bkv7l          2/2     Running            10 (121m ago)     20d     192.168.0.19    ip-172-31-33-185   <none>           <none>
linkerd           linkerd-proxy-injector-7775b95864-qpmqw    0/2     CrashLoopBackOff   107 (30s ago)     20d     192.168.0.82    ip-172-31-33-185   <none>           <none>
low-usage-limit   limited-hog-66d5cd76bc-r2tdc               1/1     Running            5 (121m ago)      20d     192.168.0.206   ip-172-31-33-185   <none>           <none>
small             nginx-nfs-7cfd6b85bf-shf5n                 0/1     Pending            0                 27d     <none>          <none>             <none>           <none>
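With this many pods, filtering the listing down to the unhealthy ones makes the problem areas easier to see. A rough awk sketch, assuming the default column order of kubectl get pods -A (NAMESPACE, NAME, READY, STATUS, ...); the function name unhealthy_pods is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read "kubectl get pods -A --no-headers" output on stdin
# and print namespace/name, READY and STATUS for pods that are not fully ready
# or not Running. Completed pods (finished Jobs) are skipped.
unhealthy_pods() {
  awk '$4 == "Completed" { next }
       {
         split($3, r, "/")   # READY is "<ready>/<total>"
         if (r[1] != r[2] || $4 != "Running") print $1 "/" $2, $3, $4
       }'
}
```

Usage: kubectl get pods -A --no-headers | unhealthy_pods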

aditya03 · March 2024

NAME               STATUS   ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
ip-172-31-33-185   Ready    control-plane   47d   v1.28.1   172.31.33.185   <none>        Ubuntu 20.04.6 LTS   5.15.0-1053-aws   containerd://1.6.27
ip-172-31-39-104   Ready    <none>          47d   v1.28.1   172.31.39.104   <none>        Ubuntu 20.04.6 LTS   5.15.0-1052-aws   containerd://1.6.27

aditya03 · March 2024

Hi @chrispokorni

I am still stuck with the same Linkerd error as above.

My workload is on EC2; all instances are t2.large with a 20 GB EBS volume attached.
Container runtime: containerd (CNI plugin: Cilium)

These are the versions across the cluster:
kubectl version
Client Version: v1.28.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3

kubeadm version: &version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.1", GitCommit:"8dc49c4b984b897d423aab4971090e1879eb4f23", GitTreeState:"clean", BuildDate:"2023-08-24T11:21:51Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}

kubelet --version
Kubernetes v1.28.1

OS Version

Ubuntu 20.04.6 LTS

I have a single security group applied to all instances, and it is fully open (inbound and outbound).

Command outputs:

kubectl get nodes -o wide
NAME               STATUS   ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
ip-172-31-33-185   Ready    control-plane   49d   v1.28.1   172.31.33.185   <none>        Ubuntu 20.04.6 LTS   5.15.0-1055-aws   containerd://1.6.27
ip-172-31-37-153   Ready    control-plane   24h   v1.28.1   172.31.37.153   <none>        Ubuntu 20.04.6 LTS   5.15.0-1055-aws   containerd://1.6.28
ip-172-31-39-104   Ready    <none>          48d   v1.28.1   172.31.39.104   <none>        Ubuntu 20.04.6 LTS   5.15.0-1055-aws   containerd://1.6.28
ip-172-31-42-14    Ready    control-plane   24h   v1.28.1   172.31.42.14    <none>        Ubuntu 20.04.6 LTS   5.15.0-1055-aws   containerd://1.6.28

kubectl get pods -A -o wide
NAMESPACE         NAME                                       READY   STATUS             RESTARTS          AGE     IP              NODE               NOMINATED NODE   READINESS GATES
default           nginx-7854ff8877-x4s64                     1/1     Running            14 (70m ago)      28d     192.168.0.32    ip-172-31-33-185   <none>           <none>
kube-system       cilium-9vgj2                               1/1     Running            1 (70m ago)       24h     172.31.42.14    ip-172-31-42-14    <none>           <none>
kube-system       cilium-db2rl                               1/1     Running            1 (70m ago)       24h     172.31.37.153   ip-172-31-37-153   <none>           <none>
kube-system       cilium-k6nx7                               1/1     Running            27 (70m ago)      48d     172.31.39.104   ip-172-31-39-104   <none>           <none>
kube-system       cilium-operator-788c7d7585-9xkqh           1/1     Running            27 (70m ago)      47d     172.31.33.185   ip-172-31-33-185   <none>           <none>
kube-system       cilium-operator-788c7d7585-ws48h           1/1     Running            27 (70m ago)      47d     172.31.39.104   ip-172-31-39-104   <none>           <none>
kube-system       cilium-q9cmw                               1/1     Running            28 (70m ago)      48d     172.31.33.185   ip-172-31-33-185   <none>           <none>
kube-system       coredns-5d78c9869d-hjxr4                   0/1     CrashLoopBackOff   38 (2m44s ago)    24h     192.168.0.71    ip-172-31-33-185   <none>           <none>
kube-system       coredns-5d78c9869d-tcnf8                   0/1     CrashLoopBackOff   121 (2m57s ago)   22d     192.168.0.253   ip-172-31-33-185   <none>           <none>
kube-system       etcd-ip-172-31-33-185                      1/1     Running            27 (70m ago)      47d     172.31.33.185   ip-172-31-33-185   <none>           <none>
kube-system       etcd-ip-172-31-37-153                      1/1     Running            1 (70m ago)       24h     172.31.37.153   ip-172-31-37-153   <none>           <none>
kube-system       etcd-ip-172-31-42-14                       1/1     Running            1 (70m ago)       24h     172.31.42.14    ip-172-31-42-14    <none>           <none>
kube-system       kube-apiserver-ip-172-31-33-185            1/1     Running            27 (70m ago)      47d     172.31.33.185   ip-172-31-33-185   <none>           <none>
kube-system       kube-apiserver-ip-172-31-37-153            1/1     Running            1 (70m ago)       24h     172.31.37.153   ip-172-31-37-153   <none>           <none>
kube-system       kube-apiserver-ip-172-31-42-14             1/1     Running            1 (70m ago)       24h     172.31.42.14    ip-172-31-42-14    <none>           <none>
kube-system       kube-controller-manager-ip-172-31-33-185   1/1     Running            28 (70m ago)      47d     172.31.33.185   ip-172-31-33-185   <none>           <none>
kube-system       kube-controller-manager-ip-172-31-37-153   1/1     Running            1 (70m ago)       24h     172.31.37.153   ip-172-31-37-153   <none>           <none>
kube-system       kube-controller-manager-ip-172-31-42-14    1/1     Running            1 (70m ago)       24h     172.31.42.14    ip-172-31-42-14    <none>           <none>
kube-system       kube-proxy-4wqg7                           1/1     Running            26 (70m ago)      47d     172.31.33.185   ip-172-31-33-185   <none>           <none>
kube-system       kube-proxy-cl6b6                           1/1     Running            25 (70m ago)      47d     172.31.39.104   ip-172-31-39-104   <none>           <none>
kube-system       kube-proxy-nj78s                           1/1     Running            1 (70m ago)       24h     172.31.37.153   ip-172-31-37-153   <none>           <none>
kube-system       kube-proxy-zcsqk                           1/1     Running            1 (70m ago)       24h     172.31.42.14    ip-172-31-42-14    <none>           <none>
kube-system       kube-scheduler-ip-172-31-33-185            1/1     Running            27 (70m ago)      47d     172.31.33.185   ip-172-31-33-185   <none>           <none>
kube-system       kube-scheduler-ip-172-31-37-153            1/1     Running            1 (70m ago)       24h     172.31.37.153   ip-172-31-37-153   <none>           <none>
kube-system       kube-scheduler-ip-172-31-42-14             1/1     Running            1 (70m ago)       24h     172.31.42.14    ip-172-31-42-14    <none>           <none>
linkerd           linkerd-destination-f9b45794b-6rlvh        0/4     CrashLoopBackOff   31 (23s ago)      24m     192.168.1.24    ip-172-31-39-104   <none>           <none>
linkerd           linkerd-heartbeat-28495741-7ntz9           0/1     Error              0                 15m     192.168.1.17    ip-172-31-39-104   <none>           <none>
linkerd           linkerd-heartbeat-28495741-h9lgp           0/1     Error              0                 9m51s   192.168.1.54    ip-172-31-39-104   <none>           <none>
linkerd           linkerd-heartbeat-28495741-qzcg8           1/1     Running            0                 4m1s    192.168.1.43    ip-172-31-39-104   <none>           <none>
linkerd           linkerd-identity-6b54bfdbff-c4s47          2/2     Running            0                 24m     192.168.1.168   ip-172-31-39-104   <none>           <none>
linkerd           linkerd-proxy-injector-5b47db6c88-vsczq    0/2     CrashLoopBackOff   14 (89s ago)      24m     192.168.1.154   ip-172-31-39-104   <none>           <none>
low-usage-limit   limited-hog-66d5cd76bc-r2tdc               1/1     Running            9 (70m ago)       22d     192.168.0.206   ip-172-31-33-185   <none>           <none>
small             nginx-nfs-7cfd6b85bf-shf5n                 0/1     Pending            0                 29d     <none>          <none>             <none>           <none>

chrispokorni · March 2024

Hi @aditya03,

Prior to installing the service mesh, I would attempt to fix the coredns pods that are in CrashLoopBackOff state. Uninstall the service mesh, delete the coredns pods, and wait for the controller to re-create them. If they return to Running state, try to install the service mesh again.
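The uninstall-and-recreate sequence could look like the following sketch. "linkerd uninstall" prints the manifests of the installed control plane, which are piped to "kubectl delete"; the CoreDNS pods carry the k8s-app=kube-dns label, as the pod description later in this thread shows. Function names and timeouts are illustrative, and extensions such as viz, if installed, would need to be removed first.

```shell
#!/usr/bin/env bash
# Sketch: remove the Linkerd control plane, then let CoreDNS be re-created.

remove_linkerd() {
  # "linkerd uninstall" emits the manifests of everything it installed.
  linkerd uninstall | kubectl delete -f -
  # Wait until the namespace is actually gone before any reinstall.
  kubectl wait --for=delete namespace/linkerd --timeout=180s
}

recreate_coredns_pods() {
  # The Deployment controller replaces the deleted pods automatically.
  kubectl -n kube-system delete pod -l k8s-app=kube-dns
  kubectl -n kube-system rollout status deployment/coredns --timeout=120s
}
```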

Regards,
-Chris

aditya03 · March 2024

Hi @chrispokorni

I deleted the service mesh and deleted the coredns pods, but they came back in CrashLoopBackOff.
I tried a couple of other things, such as adding the following to /var/lib/kubelet/config.yaml and restarting the kubelet, but with no luck:

evictionHard:
  imagefs.available: 1%
  memory.available: 100Mi
  nodefs.available: 1%
  nodefs.inodesFree: 1%

I also removed the NoSchedule taint from the nodes; that didn't work either.

Here is the description of the coredns pod in case you need it for debugging. Kindly help. Thanks

kubectl -n kube-system describe pod coredns-5d78c9869d-rwq8h
Name:                 coredns-5d78c9869d-rwq8h
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      coredns
Node:                 ip-172-31-33-185/172.31.33.185
Start Time:           Thu, 07 Mar 2024 16:03:32 +0000
Labels:               k8s-app=kube-dns
                      pod-template-hash=5d78c9869d
Annotations:          <none>
Status:               Running
IP:                   192.168.0.40
IPs:
  IP:           192.168.0.40
Controlled By:  ReplicaSet/coredns-5d78c9869d
Containers:
  coredns:
    Container ID:  containerd://9bdb4e2a9cd7d3720545c4cd540888604f255194fc2ff5aeede54e07d619d32e
    Image:         registry.k8s.io/coredns/coredns:v1.10.1
    Image ID:      registry.k8s.io/coredns/coredns@sha256:a0ead06651cf580044aeb0a0feba63591858fb2e43ade8c9dea45a6a89ae7e5e
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 07 Mar 2024 16:27:57 +0000
      Finished:     Thu, 07 Mar 2024 16:27:57 +0000
    Ready:          False
    Restart Count:  18
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bhj4m (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  kube-api-access-bhj4m:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  24m                    default-scheduler  Successfully assigned kube-system/coredns-5d78c9869d-rwq8h to ip-172-31-33-185
  Normal   Started    23m (x4 over 24m)      kubelet            Started container coredns
  Warning  Unhealthy  23m                    kubelet            Readiness probe failed: Get "http://192.168.0.40:8181/ready": dial tcp 192.168.0.40:8181: connect: connection refused
  Normal   Pulled     23m (x5 over 24m)      kubelet            Container image "registry.k8s.io/coredns/coredns:v1.10.1" already present on machine
  Normal   Created    23m (x5 over 24m)      kubelet            Created container coredns
  Warning  BackOff    14m (x54 over 24m)     kubelet            Back-off restarting failed container coredns in pod coredns-5d78c9869d-rwq8h_kube-system(36ef06b7-f6db-4e6f-8395-bb225d2cc460)
  Normal   Pulled     9m2s (x4 over 10m)     kubelet            Container image "registry.k8s.io/coredns/coredns:v1.10.1" already present on machine
  Normal   Created    9m2s (x4 over 10m)     kubelet            Created container coredns
  Normal   Started    9m1s (x4 over 10m)     kubelet            Started container coredns
  Warning  BackOff    5m23s (x34 over 10m)   kubelet            Back-off restarting failed container coredns in pod coredns-5d78c9869d-rwq8h_kube-system(36ef06b7-f6db-4e6f-8395-bb225d2cc460)
  Warning  Unhealthy  2m55s                  kubelet            Readiness probe failed: Get "http://192.168.0.40:8181/ready": dial tcp 192.168.0.40:8181: connect: connection refused
  Warning  BackOff    117s (x12 over 3m14s)  kubelet            Back-off restarting failed container coredns in pod coredns-5d78c9869d-rwq8h_kube-system(36ef06b7-f6db-4e6f-8395-bb225d2cc460)
  Normal   Pulled     103s (x4 over 3m14s)   kubelet            Container image "registry.k8s.io/coredns/coredns:v1.10.1" already present on machine
  Normal   Created    103s (x4 over 3m14s)   kubelet            Created container coredns
  Normal   Started    103s (x4 over 3m14s)   kubelet            Started container coredns

chrispokorni · March 2024

Hi @aditya03,

Please format your outputs; otherwise they are almost impossible to read.

One of my students faced a similar issue in class this week. The cause was the coredns ConfigMap, more precisely the Corefile modified in lab 9, which was breaking the coredns pods. An accidental typo may have invalidated the Corefile.
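Before deleting anything, it may be worth keeping a copy of the current ConfigMap and looking at what CoreDNS reports when it exits with code 1. A sketch; the function name inspect_coredns and the backup file name are arbitrary choices.

```shell
#!/usr/bin/env bash
# Sketch: back up the coredns ConfigMap and show why CoreDNS is exiting.
inspect_coredns() {
  local backup="${1:-coredns-cm-backup.yaml}"
  # Keep a copy so the lab 9 changes can be compared or restored later.
  kubectl -n kube-system get configmap coredns -o yaml > "$backup"
  # Print only the Corefile, where a typo would make CoreDNS fail to start.
  kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}'
  echo
  # Logs from the last crashed instances typically contain the parse error.
  kubectl -n kube-system logs -l k8s-app=kube-dns --previous --tail=20
}
```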

Delete the current coredns configmap:

kubectl -n kube-system delete configmaps coredns

Create a new configmap manifest coredns-cm.yaml with the following content:

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
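Because a single typo is enough to make CoreDNS exit, a quick check before applying the manifest can save a round trip. The sketch below only verifies that "{" and "}" are balanced; it ignores comments and cannot validate plugin names or arguments, and the function name is hypothetical. The manifest above has braces only inside the Corefile, so the whole file can be checked.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check: succeed only if braces read from stdin are balanced.
# Catches a missing or extra brace in a Corefile, nothing more.
corefile_braces_balanced() {
  awk '
    {
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (c == "{") depth++
        else if (c == "}") {
          depth--
          if (depth < 0) { print "unexpected } on line " NR; bad = 1; exit }
        }
      }
    }
    END {
      if (bad) exit 1
      if (depth != 0) { print "unclosed { (depth " depth ")"; exit 1 }
    }'
}
```

Usage: corefile_braces_balanced < coredns-cm.yaml && echo "braces balanced"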

Then re-create the configmap:

kubectl -n kube-system apply -f coredns-cm.yaml

Then delete the two coredns pods. The new pods should reach Running state.
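To confirm the fix, one option is to wait for the new pods to become Ready and then resolve a Service name from a throwaway pod. A sketch; the pod name dns-test and the busybox:1.36 image are illustrative choices, and cluster.local matches the domain in the Corefile above.

```shell
#!/usr/bin/env bash
# Sketch: check that CoreDNS is Ready and that in-cluster DNS resolves.
verify_coredns() {
  kubectl -n kube-system wait --for=condition=Ready pod -l k8s-app=kube-dns --timeout=120s
  # Resolve the API server Service from a temporary pod that is removed afterwards.
  kubectl run dns-test --image=busybox:1.36 --restart=Never --rm -i -- \
    nslookup kubernetes.default.svc.cluster.local
}
```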

Regards,
-Chris

hemal.hughes · June 24

Thanks @chrispokorni, your solution fixed my issue.
