
LAB 3.4 - Failed to pull image when Master and Worker are ready

The deployment does not become ready when both the master and the worker are Ready; however, when only the master is working, the error does not appear.

USE CASE OK:

[email protected]:~$ date
Wed 23 Dec 2020 06:09:17 PM UTC

[email protected]:~$ kubectl get nodes
NAME    STATUS     ROLES                  AGE     VERSION
k8sm0   Ready      control-plane,master   6h22m   v1.20.1
k8sw0   NotReady   <none>                 4h58m   v1.20.1

[email protected]:~$ kubectl create deployment nginx --image=nginx
deployment.apps/nginx created

[email protected]:~$ kubectl get deployment
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   1/1     1            1           11s   <--- READY 1/1

[email protected]:~$ kubectl get events --sort-by='.lastTimestamp'
36s Normal ScalingReplicaSet deployment/nginx Scaled up replica set nginx-6799fc88d8 to 1
36s Normal SuccessfulCreate replicaset/nginx-6799fc88d8 Created pod: nginx-6799fc88d8-vr2mr
35s Normal Pulling pod/nginx-6799fc88d8-vr2mr Pulling image "nginx"
32s Normal Pulled pod/nginx-6799fc88d8-vr2mr Successfully pulled image "nginx" in 2.797267818s
32s Normal Created pod/nginx-6799fc88d8-vr2mr Created container nginx
32s Normal Started pod/nginx-6799fc88d8-vr2mr Started container nginx

[email protected]:~$ kubectl delete deployment nginx
deployment.apps "nginx" deleted
[email protected]:~$ kubectl get deployment
No resources found in default namespace.

USE CASE NOK:

[email protected]:~$ date
Wed 23 Dec 2020 06:13:49 PM UTC
[email protected]:~$ kubectl get nodes
NAME    STATUS   ROLES                  AGE    VERSION
k8sm0   Ready    control-plane,master   6h26m  v1.20.1
k8sw0   Ready    <none>                 5h2m   v1.20.1

[email protected]:~$ kubectl get deployment
No resources found in default namespace.

[email protected]:~$ kubectl create deployment nginx --image=nginx
deployment.apps/nginx created

[email protected]:~$ kubectl get deployment
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   0/1     1            0           16s
[email protected]:~$ kubectl get deployment
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   0/1     1            0           41s

[email protected]:~$ kubectl get events --sort-by='.lastTimestamp'
2m23s Normal NodeAllocatableEnforced node/k8sw0 Updated Node Allocatable limit across pods
2m23s Normal NodeHasSufficientPID node/k8sw0 Node k8sw0 status is now: NodeHasSufficientPID
2m23s Normal NodeHasNoDiskPressure node/k8sw0 Node k8sw0 status is now: NodeHasNoDiskPressure
2m23s Normal NodeHasSufficientMemory node/k8sw0 Node k8sw0 status is now: NodeHasSufficientMemory
2m23s Normal Starting node/k8sw0 Starting kubelet.
2m23s Warning Rebooted node/k8sw0 Node k8sw0 has been rebooted, boot id: 3148704a-d187-451c-b603-43b3a30be807
2m23s Normal NodeReady node/k8sw0 Node k8sw0 status is now: NodeReady
2m12s Normal Starting node/k8sw0 Starting kube-proxy.
117s Normal SuccessfulCreate replicaset/nginx-6799fc88d8 Created pod: nginx-6799fc88d8-z8hpq
117s Normal ScalingReplicaSet deployment/nginx Scaled up replica set nginx-6799fc88d8 to 1
116s Normal Pulling pod/nginx-6799fc88d8-z8hpq Pulling image "nginx"
0s Warning Failed pod/nginx-6799fc88d8-z8hpq Failed to pull image "nginx": rpc error: code = Unknown desc = dial tcp: lookup registry-1.docker.io: Temporary failure in name resolution
0s Warning Failed pod/nginx-6799fc88d8-z8hpq Error: ErrImagePull
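Before digging into Kubernetes itself, the name-resolution failure in that last event can be cross-checked from the node's own shell; a small sketch (not part of the lab, `getent` goes through the host's standard NSS/resolv.conf path):

```shell
# If the registry hostname cannot be resolved from the node's own
# shell, the ErrImagePull is a host-level DNS problem rather than a
# Kubernetes one.
getent hosts registry-1.docker.io || echo "host-level DNS lookup failed"
```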

Comments

  • Resolution fails when both nodes are up and running:

    [email protected]:/tmp$ kubectl get nodes
    NAME    STATUS   ROLES                  AGE     VERSION
    k8sm0   Ready    control-plane,master   6h47m   v1.20.1
    k8sw0   Ready    <none>                 5h23m   v1.20.1

    [email protected]:/tmp$ ping google.es
    ping: google.es: Temporary failure in name resolution


    [email protected]:/tmp$ kubectl get nodes
    NAME    STATUS     ROLES                  AGE     VERSION
    k8sm0   Ready      control-plane,master   6h51m   v1.20.1
    k8sw0   NotReady   <none>                 5h26m   v1.20.1

    [email protected]:/tmp$ ping google.es
    PING google.es (172.217.17.3) 56(84) bytes of data.
    64 bytes from mad07s09-in-f3.1e100.net (172.217.17.3): icmp_seq=1 ttl=128 time=15.9 ms
    64 bytes from mad07s09-in-f3.1e100.net (172.217.17.3): icmp_seq=2 ttl=128 time=21.0 ms
    64 bytes from mad07s09-in-f3.1e100.net (172.217.17.3): icmp_seq=3 ttl=128 time=18.0 ms
    64 bytes from mad07s09-in-f3.1e100.net (172.217.17.3): icmp_seq=4 ttl=128 time=19.8 ms
    64 bytes from mad07s09-in-f3.1e100.net (172.217.17.3): icmp_seq=5 ttl=128 time=20.4 ms
    ^C
    --- google.es ping statistics ---

  • serewicz Posts: 919

    Hello,

    I notice that your output is dissimilar to what one would expect. For example, this is what I see when I run kubectl get nodes:
    [email protected]:~$ kubectl get nodes
    NAME     STATUS   ROLES    AGE     VERSION
    master   Ready    master   4d19h   v1.19.0
    worker   Ready    <none>   4d19h   v1.19.0
    What are you using for your lab environment, GCE, AWS, VirtualBox, bare metal etc...?
    What version of Ubuntu are you using?
    Please follow the lab guide and use kubeadm to build your cluster. As you are using a different version of Kubernetes than the current course, it is clear you have deviated from the lab guide; how far you deviated could explain why other steps are not working.

    Regards,

  • My Lab:

    • VMWare
    • Ubuntu 20.04 LTS
    • K8S 1.20.1

    Latest versions of all components, as the Kubernetes documentation recommends.

    This is a strange case: is there any dependency between node status and the network?

    I believe this case is not related to the versions.

    Regards

  • To rule out a version issue, I have re-deployed my lab according to the current course documentation.

    The issue is still reproduced; I have reproduced it on my laptop and in another environment.

    ###### VERSIONS #######

    [email protected]:~$ lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description: Ubuntu 18.04.5 LTS
    Release: 18.04
    Codename: bionic

    [email protected]:~$ apt-show-versions | grep -i kube
    cri-tools:amd64/kubernetes-xenial 1.13.0-01 uptodate
    kubeadm:amd64/kubernetes-xenial 1.18.1-00 upgradeable to 1.20.1-00
    kubectl:amd64/kubernetes-xenial 1.18.1-00 upgradeable to 1.20.1-00
    kubelet:amd64/kubernetes-xenial 1.18.1-00 upgradeable to 1.20.1-00
    kubernetes-cni:amd64/kubernetes-xenial 0.8.7-00 uptodate

    ###### NOK USE CASE #####

    [email protected]:~$ kubectl get nodes
    NAME     STATUS   ROLES    AGE     VERSION
    master   Ready    master   19m     v1.18.1
    worker   Ready    <none>   4m11s   v1.18.1

    [email protected]:~$ kubectl get deployment
    NAME    READY   UP-TO-DATE   AVAILABLE   AGE
    nginx   0/1     1            0           2m2s

    [email protected]:~$ kubectl get events --sort-by='.lastTimestamp'
    13m Normal NodeReady node/worker Node worker status is now: NodeReady
    6m9s Normal SuccessfulCreate replicaset/nginx-6799fc88d8 Created pod: nginx-6799fc88d8-9xd8t
    6m9s Normal ScalingReplicaSet deployment/nginx Scaled up replica set nginx-6799fc88d8 to 1
    6m9s Normal Scheduled pod/nginx-6799fc88d8-9xd8t Successfully assigned default/nginx-6799fc88d8-9xd8t to worker
    4m4s Normal Pulling pod/nginx-6799fc88d8-9xd8t Pulling image "nginx"
    3m49s Warning Failed pod/nginx-6799fc88d8-9xd8t Error: ErrImagePull
    3m49s Warning Failed pod/nginx-6799fc88d8-9xd8t Failed to pull image "nginx": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    3m22s Warning Failed pod/nginx-6799fc88d8-9xd8t Error: ImagePullBackOff
    3m22s Normal BackOff pod/nginx-6799fc88d8-9xd8t Back-off pulling image "nginx"

    ###### OK USE CASE #####

    [email protected]:~$ kubectl get nodes
    NAME     STATUS     ROLES    AGE   VERSION
    master   Ready      master   33m   v1.18.1
    worker   NotReady   <none>   17m   v1.18.1

    [email protected]:~$ kubectl get deployment
    NAME    READY   UP-TO-DATE   AVAILABLE   AGE
    nginx   1/1     1            1           20s

    [email protected]:~$ kubectl get events --sort-by='.lastTimestamp'
    5m41s Normal NodeNotReady node/worker Node worker status is now: NodeNotReady
    89s Normal ScalingReplicaSet deployment/nginx Scaled up replica set nginx-6799fc88d8 to 1
    89s Normal SuccessfulCreate replicaset/nginx-6799fc88d8 Created pod: nginx-6799fc88d8-wn9m4
    89s Normal Scheduled pod/nginx-6799fc88d8-wn9m4 Successfully assigned default/nginx-6799fc88d8-wn9m4 to master
    88s Normal Pulling pod/nginx-6799fc88d8-wn9m4 Pulling image "nginx"
    74s Normal Started pod/nginx-6799fc88d8-wn9m4 Started container nginx
    74s Normal Created pod/nginx-6799fc88d8-wn9m4 Created container nginx
    74s Normal Pulled pod/nginx-6799fc88d8-wn9m4 Successfully pulled image "nginx"

    ###### OTHER TESTS ######

    When both nodes in the cluster are Ready, all name resolution is affected:

    [email protected]:~$ sudo apt install apt-show-versions
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    The following additional packages will be installed:
    libapt-pkg-perl
    The following NEW packages will be installed:
    apt-show-versions libapt-pkg-perl
    0 upgraded, 2 newly installed, 0 to remove and 5 not upgraded.
    Need to get 96.6 kB of archives.
    After this operation, 312 kB of additional disk space will be used.
    Do you want to continue? [Y/n] y
    Err:1 http://es.archive.ubuntu.com/ubuntu bionic/main amd64 libapt-pkg-perl amd64 0.1.33build1
    Temporary failure resolving 'es.archive.ubuntu.com'
    0% [Working]^C

    [email protected]:~$ ping google.es
    ping: google.es: Temporary failure in name resolution

    When only the MASTER is Ready in the cluster, name resolution works fine:

    [email protected]:~$ sudo apt install apt-show-versions
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    The following additional packages will be installed:
    libapt-pkg-perl
    The following NEW packages will be installed:
    apt-show-versions libapt-pkg-perl
    0 upgraded, 2 newly installed, 0 to remove and 5 not upgraded.
    Need to get 96.6 kB of archives.
    After this operation, 312 kB of additional disk space will be used.
    Do you want to continue? [Y/n] y
    Get:1 http://es.archive.ubuntu.com/ubuntu bionic/main amd64 libapt-pkg-perl amd64 0.1.33build1 [68.0 kB]
    Get:2 http://es.archive.ubuntu.com/ubuntu bionic/universe amd64 apt-show-versions all 0.22.7ubuntu1 [28.6 kB]
    Fetched 96.6 kB in 16s (6,140 B/s)
    Selecting previously unselected package libapt-pkg-perl.
    (Reading database ... 67530 files and directories currently installed.)
    Preparing to unpack .../libapt-pkg-perl_0.1.33build1_amd64.deb ...
    Unpacking libapt-pkg-perl (0.1.33build1) ...
    Selecting previously unselected package apt-show-versions.
    Preparing to unpack .../apt-show-versions_0.22.7ubuntu1_all.deb ...
    Unpacking apt-show-versions (0.22.7ubuntu1) ...
    Setting up libapt-pkg-perl (0.1.33build1) ...
    Setting up apt-show-versions (0.22.7ubuntu1) ...
    ** initializing cache. This may take a while **
    Processing triggers for man-db (2.8.3-2ubuntu0.1) ...

    [email protected]:~$ ping google.es
    PING google.es (216.58.209.67) 56(84) bytes of data.
    64 bytes from mad07s22-in-f3.1e100.net (216.58.209.67): icmp_seq=1 ttl=128 time=38.2 ms
    64 bytes from mad07s22-in-f3.1e100.net (216.58.209.67): icmp_seq=2 ttl=128 time=20.6 ms
    64 bytes from mad07s22-in-f3.1e100.net (216.58.209.67): icmp_seq=3 ttl=128 time=42.9 ms
    64 bytes from mad07s22-in-f3.1e100.net (216.58.209.67): icmp_seq=4 ttl=128 time=17.5 ms

  • Check and compare the route and DNS configuration changes between the OK and NOK use cases.
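One way to act on this suggestion is to snapshot both states and diff them; a hypothetical helper (the file names are my own, not from the lab):

```shell
# Capture routing and DNS state so the two scenarios can be diffed.
# Run with "ok" while only the master is Ready, and with "nok" once
# the worker joins, then:
#   diff /tmp/netstate-ok.txt /tmp/netstate-nok.txt
STATE=${1:-ok}
OUT=/tmp/netstate-$STATE.txt
{
  echo "== ip route =="
  ip route 2>/dev/null || true
  echo "== /etc/resolv.conf =="
  cat /etc/resolv.conf 2>/dev/null || true
  echo "== systemd-resolved upstreams =="
  cat /run/systemd/resolve/resolv.conf 2>/dev/null || true
} > "$OUT"
echo "saved $OUT"
```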

  • serewicz Posts: 919

    Hello,

    When you have two nodes, the output indicates that the network stops working. This would mean that the issue is Linux networking, not Kubernetes.

    Please make sure that the VMware virtual machines do not have any firewall between each other or the outside world.

    If you are using the default calico.yaml file, then you would be using a combination of 10. addresses and 192.168. addresses for your cluster. You can verify this by looking at the pools mentioned in the calico.yaml file. Are you using an overlapping network for your VMWare VMs, or the laptop it is running on? If so then the routing becomes an issue. You can view the before and after routes using ip route. As you compare and contrast, what is different between them?
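The overlap being described can also be checked mechanically; a small pure-shell sketch (the helper functions are my own, not from the lab; Calico's default manifest pool is 192.168.0.0/16):

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Succeed when IP falls inside CIDR (IPv4 only, no input validation).
ip_in_cidr() {
  ip=$1; cidr=$2
  net=${cidr%/*}; bits=${cidr#*/}
  mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# The node subnet seen above (192.168.10.0/24) sits inside Calico's
# default 192.168.0.0/16 pool -- exactly the overlap in question:
if ip_in_cidr 192.168.10.133 192.168.0.0/16; then
  echo "node IP overlaps the Calico pool"
fi
```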

    Regards,

  • There is no firewall.

    The routes appear OK:

    # Use case OK

    [email protected]:~$ route -n
    Kernel IP routing table
    Destination Gateway Genmask Flags Metric Ref Use Iface
    0.0.0.0 192.168.10.2 0.0.0.0 UG 100 0 0 ens33
    172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
    192.168.10.0 0.0.0.0 255.255.255.0 U 0 0 0 ens33
    192.168.10.2 0.0.0.0 255.255.255.255 UH 100 0 0 ens33
    192.168.219.64 0.0.0.0 255.255.255.192 U 0 0 0 *
    192.168.219.76 0.0.0.0 255.255.255.255 UH 0 0 0 cali4f2dae3ae57
    192.168.219.77 0.0.0.0 255.255.255.255 UH 0 0 0 cali3b44909318d
    192.168.219.78 0.0.0.0 255.255.255.255 UH 0 0 0 calif48570d0d2e

    [email protected]:~$ ip route
    default via 192.168.10.2 dev ens33 proto dhcp src 192.168.10.133 metric 100
    172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
    192.168.10.0/24 dev ens33 proto kernel scope link src 192.168.10.133
    192.168.10.2 dev ens33 proto dhcp scope link src 192.168.10.133 metric 100
    192.168.219.76 dev cali4f2dae3ae57 scope link
    192.168.219.77 dev cali3b44909318d scope link
    192.168.219.78 dev calif48570d0d2e scope link

    # Use case NOK

    [email protected]:~$ route -n
    Kernel IP routing table
    Destination Gateway Genmask Flags Metric Ref Use Iface
    0.0.0.0 192.168.10.2 0.0.0.0 UG 100 0 0 ens33
    172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
    192.168.10.0 0.0.0.0 255.255.255.0 U 0 0 0 ens33
    192.168.10.2 192.168.10.134 255.255.255.255 UGH 0 0 0 tunl0
    192.168.10.2 0.0.0.0 255.255.255.255 UH 100 0 0 ens33
    192.168.171.64 192.168.10.134 255.255.255.192 UG 0 0 0 tunl0
    192.168.219.64 0.0.0.0 255.255.255.192 U 0 0 0 *
    192.168.219.76 0.0.0.0 255.255.255.255 UH 0 0 0 cali4f2dae3ae57
    192.168.219.77 0.0.0.0 255.255.255.255 UH 0 0 0 cali3b44909318d
    192.168.219.78 0.0.0.0 255.255.255.255 UH 0 0 0 calif48570d0d2e

    [email protected]:~$ ip route
    default via 192.168.10.2 dev ens33 proto dhcp src 192.168.10.133 metric 100
    172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
    192.168.10.0/24 dev ens33 proto kernel scope link src 192.168.10.133
    192.168.10.2 via 192.168.10.134 dev tunl0 proto bird onlink
    192.168.10.2 dev ens33 proto dhcp scope link src 192.168.10.133 metric 100
    192.168.171.64/26 via 192.168.10.134 dev tunl0 proto bird onlink
    blackhole 192.168.219.64/26 proto bird
    192.168.219.73 dev cali3b44909318d scope link
    192.168.219.74 dev cali4f2dae3ae57 scope link
    192.168.219.75 dev calif48570d0d2e scope link

  • fjlozano Posts: 9
    edited December 2020

    The problem is with the DNS resolution.

    When the second node is Ready, DNS resolution fails.

    By default in Ubuntu 18.04, name resolution is managed by the systemd-resolved service (standard installation using the official server ISO).
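A quick way to confirm which resolver configuration is in effect (standard systemd-resolved paths, shown for illustration):

```shell
# On a systemd-resolved host, /etc/resolv.conf is normally a symlink
# into /run/systemd/resolve/, pointing lookups at the local
# 127.0.0.53 stub resolver.
readlink -f /etc/resolv.conf

# Stub file handed to applications:
grep '^nameserver' /etc/resolv.conf 2>/dev/null || true

# Real upstream servers (e.g. learned via DHCP) kept by systemd-resolved:
grep '^nameserver' /run/systemd/resolve/resolv.conf 2>/dev/null || true
```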

    #### Use Case OK

    [email protected]:/run/systemd/resolve$ dig google.es

    ; <<>> DiG 9.11.3-1ubuntu1.13-Ubuntu <<>> google.es
    ;; global options: +cmd
    ;; connection timed out; no servers could be reached
    [email protected]:/run/systemd/resolve$ dig google.es

    ; <<>> DiG 9.11.3-1ubuntu1.13-Ubuntu <<>> google.es
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27420
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 65494
    ;; QUESTION SECTION:
    ;google.es. IN A

    ;; ANSWER SECTION:
    google.es. 5 IN A 172.217.17.3

    ;; Query time: 24 msec
    ;; SERVER: 127.0.0.53#53(127.0.0.53)
    ;; WHEN: Sat Dec 26 12:30:10 UTC 2020
    ;; MSG SIZE rcvd: 54

    [email protected]:/run/systemd/resolve$ netstat -anp | grep -i ":53"
    (Not all processes could be identified, non-owned process info
    will not be shown, you would have to be root to see it all.)
    tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN -
    udp 0 0 127.0.0.53:53 0.0.0.0:* -

    #### Use Case NOK

    [email protected]:~$ dig google.es

    ; <<>> DiG 9.11.3-1ubuntu1.13-Ubuntu <<>> google.es
    ;; global options: +cmd
    ;; connection timed out; no servers could be reached

    [email protected]:~$ ping 8.8.8.8
    PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
    64 bytes from 8.8.8.8: icmp_seq=1 ttl=128 time=62.9 ms
    64 bytes from 8.8.8.8: icmp_seq=2 ttl=128 time=23.0 ms
    64 bytes from 8.8.8.8: icmp_seq=3 ttl=128 time=24.0 ms

    [email protected]:/run/systemd/resolve$ netstat -anp | grep -i ":53"
    (Not all processes could be identified, non-owned process info
    will not be shown, you would have to be root to see it all.)
    tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN -
    udp 0 0 127.0.0.53:53 0.0.0.0:* -
    udp 0 0 192.168.219.64:35529 192.168.10.2:53 ESTABLISHED -

    I would like to understand why this behaviour is modified by Kubernetes, Calico, or some other component.
    As a workaround I could configure 8.8.8.8 as the DNS server, but I would prefer to understand this case. Note that in the NOK netstat the query leaves from 192.168.219.64, an address that the NOK route table blackholes (blackhole 192.168.219.64/26), which may explain why replies never come back.

    Regards

  • The issue could be related to the following, quoted from the Kubernetes DNS-debugging documentation:

    Some Linux distributions (e.g. Ubuntu) use a local DNS resolver by default (systemd-resolved). Systemd-resolved moves and replaces /etc/resolv.conf with a stub file that can cause a fatal forwarding loop when resolving names in upstream servers. This can be fixed manually by using kubelet's --resolv-conf flag to point to the correct resolv.conf (With systemd-resolved, this is /run/systemd/resolve/resolv.conf). kubeadm automatically detects systemd-resolved, and adjusts the kubelet flags accordingly.
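On a kubeadm-built node this setting ends up in the kubelet configuration; a fragment of what it should look like (path and surrounding fields assumed from a default kubeadm install):

```yaml
# /var/lib/kubelet/config.yaml (excerpt, assumed default kubeadm layout)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Point the kubelet past systemd-resolved's 127.0.0.53 stub so that
# pod DNS (via CoreDNS) forwards to the real upstream servers:
resolvConf: /run/systemd/resolve/resolv.conf
```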

    https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/

  • serewicz Posts: 919

    Hello,

    Are all pods running without error? If you look at the logs for your calico and coredns pods, are there any errors? The command would look something like kubectl -n kube-system logs coredns-<TAB>

    It looks as if you are using a 192.168. network for the nodes, which overlaps with the default Calico setup. Try changing the pool range to something else, such as a 172.16.0.0/12 network, when you build the cluster, and check whether it works.

    Regards,

  • fjlozano Posts: 9

    There are no errors in coredns.

    I moved my network to 172.16.0.0/12.

    Finally, I solved the issue by re-installing the platform completely, but I disabled systemd-resolved first.
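For anyone hitting the same thing, a rough sketch of that workaround (hedged, Ubuntu defaults; TARGET defaults to a scratch file so the sketch can be dry-run, on a real node run it as root with TARGET=/etc/resolv.conf):

```shell
# Replace the systemd-resolved stub with a static resolv.conf.
TARGET=${TARGET:-/tmp/resolv.conf.new}

# 1. Stop and disable the stub resolver (needs root; "|| true" so a
#    dry-run outside systemd does not abort):
if [ "$(id -u)" -eq 0 ]; then
  systemctl disable --now systemd-resolved 2>/dev/null || true
fi

# 2. Write a static resolv.conf with a reachable upstream
#    (8.8.8.8 is only an example; prefer your site's DNS server):
printf 'nameserver 8.8.8.8\n' > "$TARGET"
cat "$TARGET"
```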
