Back-off restarting failed container

neirkate · October 2019

Hello all. I need help with this problem.

Lab Book Version 2019-04-26

This is about Ingress, Lab 10.1 Advanced Service Exposure.
I have done these steps (total 10 steps):
1. kubectl create deployment secondapp --image=nginx
2. kubectl get deployments secondapp -o yaml | grep label -A2
3. kubectl expose deployment secondapp --type=NodePort --port=80
4. vi ingress.rbac.yaml

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller
rules:
  - apiGroups:
      - ""
    resources:
      - services
      - endpoints
      - secrets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: traefik-ingress-controller
subjects:
  - kind: ServiceAccount
    name: traefik-ingress-controller
    namespace: kube-system

kubectl create -f ingress.rbac.yaml
wget https://bit.ly/2VCSz3s -O traefik-ds.yaml
vi traefik-ds.yaml
I edited it as instructed in the PDF.

diff traefik-ds.yaml traefik-ds.yaml.1

kubectl create -f traefik-ds.yaml
vi ingress.rule.yaml

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-test
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - host: www.example.com
    http:
      paths:
      - backend:
          serviceName: secondapp
          servicePort: 80
        path: /

kubectl create -f ingress.rule.yaml

Before continuing with the remaining steps (17 in total), I checked all of my pods in the kube-system namespace.
In total I have 12 pods; 2 of the 12 are the traefik-ingress-controller pods that I just created.
Originally, both traefik pods were in Error/CrashLoopBackOff status.
So I checked them by describing the pods and looking at their logs. Both of them had the same errors.
These were the errors I got when I described the pod:

And these were the errors from the logs:

Because the logs said listen tcp 80 address already in use, I checked on my server:
sudo lsof -i -P -n | grep 80

I found that apache2 was using port 80, so I stopped the service:
sudo systemctl stop apache2
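
One caveat with the check above: `grep 80` matches any line that contains the characters 80, such as port 8080 or a PID like 1805. A minimal sketch of a stricter filter follows. The function name `listeners_on_port` is hypothetical, and it assumes the usual `lsof -i -P -n` column layout, where the last two fields of a listening socket are the address and the `(LISTEN)` state.

```shell
# Hypothetical helper: read `lsof -i -P -n` output on stdin and print
# COMMAND, PID and address for sockets listening on exactly the given port.
listeners_on_port() {
  awk -v port="$1" '
    # Keep rows in LISTEN state whose address field ends in ":<port>".
    $NF == "(LISTEN)" && $(NF-1) ~ (":" port "$") { print $1, $2, $(NF-1) }
  '
}

# Usage on a node (root is needed to see sockets owned by other users):
#   sudo lsof -i -P -n | listeners_on_port 80
```

Since a DaemonSet places one pod on every node, it is worth running this on each node of the cluster, not only on the one where the first conflict was found.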

Then, when I checked my pods, 1 of the 2 traefik pods still had the same errors (1 Running, 1 Error/CrashLoopBackOff).

I have tried deleting the pod and rebooting the server, but neither solved the problem.
When I ran sudo lsof -i -P -n | grep 80

I am wondering what is wrong. I don't understand why this happened or how to fix it.
Please help me solve this. If the information is not enough, please tell me and I will post more.
Thank you.

Best Regards,
Neirkate

chrispokorni · October 2019

Hi @neirkate,
What are the errors for the second traefik pod (-ppjn4) after you stopped the apache service? Did you try deleting the problem pod, and then allow the controller to re-create it?

How many nodes do you have in your cluster?
What type of controller manages your traefik pods?
How are the traefik pods distributed in your cluster?
Where is the problem pod running?
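
Each of the four questions above can be answered with a read-only `kubectl` query. A rough sketch, assuming the traefik pods live in the `kube-system` namespace as described in the original post. The function name `show_traefik_placement` is hypothetical, and nothing runs until it is called on a machine with cluster access.

```shell
# Hypothetical helper wrapping read-only queries; call it on a host
# where kubectl is configured for the cluster.
show_traefik_placement() {
  # How many nodes are there, and what are their addresses?
  kubectl get nodes -o wide
  # Which controllers of type DaemonSet exist in kube-system?
  kubectl -n kube-system get daemonsets
  # Which node does each pod run on? See the NODE column.
  kubectl -n kube-system get pods -o wide
}

# Usage: show_traefik_placement
```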

From your descriptions and outputs above, it seems to me you may have missed a few additional steps. I am hoping these questions will point you in the right direction to continue your troubleshooting.

Good luck!
-Chris

PS: the latest lab book is versioned 2019-08-12 (download it if you still have access to the course materials)

neirkate · October 2019

Hello @chrispokorni
Thanks for helping me solve this problem. I'm quite lost since I'm new to Kubernetes, so I'm sorry if the information I gave before was not enough.

The error is still the same. I have tried deleting the problem pod, but the error is still not solved. Yes, the controller re-creates the deleted pod automatically.

There are 2 nodes in my cluster: one master and one worker (ubuntu).
The controller that manages my traefik pods is a DaemonSet.
Both traefik pods were scheduled to my ubuntu (worker) node.
The problem pod is running on ubuntu (worker).

What additional steps do I need to do?
Can 2 pods on one node use the same port?

I still have access and will download it. Thank you for letting me know.

Best Regards,
Neirkate

neirkate · October 2019

Sorry, @chrispokorni

I just checked, and I was actually wrong: one traefik pod goes to the master and one goes to the worker.
I just stopped apache2 on the worker, and all of my traefik pods are running now.

But I have some more questions.
This is still the same lab book, Lab 10.1, step 11.

The value in the red rectangle, is that the external IP? And where can I get it?
Is it on the service? I ran kubectl get svc, but did not find the external IP there.
Is the cluster IP meant to be accessed? I tried to curl it to get the nginx homepage, but it did not work.
My nginx pod is on the worker node. Is it normal that curling my master IP does not reach the nginx homepage, while curling my worker IP does?

Best Regards,
Neirkate

chrispokorni · October 2019

Hi,
I am glad you were able to determine the cause of the second traefik pod misbehaving. In such cases, when you discover a problem with one of your nodes, chances are that other nodes in the cluster will cause the same issues.

The external IP = public IP of the node. It is not a service IP.
Cluster IP is accessible only from within the cluster, and it is not accessible from anywhere outside the cluster, such as the internet.
Curling the pod's IP should display the nginx page, regardless of where you are curling from in the cluster, whether a master node or a worker node. The pod IP is not accessible from outside the cluster. That is why we need a service to expose the pod to the outside world.
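
Each of the addresses mentioned above can be looked up with `kubectl`. A sketch, assuming the `secondapp` deployment and service from the original post. The function names are hypothetical, the `app=secondapp` label is the one `kubectl create deployment` sets by default, and the curl with a Host header is an assumption about how the ingress rule for www.example.com is meant to be tested.

```shell
# Hypothetical helpers; run them on a host with kubectl access.
show_addresses() {
  # Node addresses: INTERNAL-IP and EXTERNAL-IP columns.
  kubectl get nodes -o wide
  # Service addresses: CLUSTER-IP, plus the NodePort in the PORT(S) column.
  kubectl get service secondapp
  # Pod addresses: the IP column.
  kubectl get pods -l app=secondapp -o wide
}

# Send a request to a node address, setting the Host header that the
# ingress rule matches on.
curl_via_ingress() {
  curl -H "Host: www.example.com" "http://$1/"
}

# Usage:
#   show_addresses
#   curl_via_ingress <node-ip>
```

On clusters where the nodes do not report a public address, the EXTERNAL-IP column shows `<none>`; in that case the public IP has to come from the cloud provider or from the machine itself.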

Regards,
-Chris

neirkate · October 2019

Hello @chrispokorni

Thanks for helping me troubleshoot the problems I encountered. Sorry, I have many questions now because I need to understand how Kubernetes works, especially the networking part.

So the external IP is my node's IP?
May I ask on what occasions the ClusterIP is used?
Is the Pod IP the one assigned by Calico, if I use Calico as my CNI? I can't reach the nginx homepage using any IP except the IP of the node where the Pod is deployed (worker). So if nginx is deployed on the worker, I must use the worker IP to curl and reach the nginx homepage.
Do I need a service to expose my nginx container if I just want to curl it from my master or worker node? I created a deployment using the nginx image and saw the pod running, but I did not expose the nginx container. Then I curled the nginx container from my master node using the worker node IP, since it was deployed on my worker node, but it did not work; using the Pod's IP also did not work.

Best Regards,
Neirkate.

chrispokorni · October 2019

Hi @neirkate,

YES
In addition to Chapter 8 on Services, I suggest some reading material from the kubernetes official documentation:

https://kubernetes.io/docs/concepts/services-networking/connect-applications-service/

https://kubernetes.io/docs/concepts/services-networking/service/

https://kubernetes.io/docs/tutorials/kubernetes-basics/expose/expose-intro/

When a pod is created, it receives an IP address - accessible only from within the cluster. So individual pods may be accessed from within the cluster, from either node. What you seem to be experiencing is a network issue between your nodes, if you cannot access an individual pod from any node. On this networking/communication issue, I suggest revisiting page 4.23.
Without a service, you (admin or developer) can only access an individual pod by its individual IP address, only from within the cluster. What happens when this pod's IP address changes? What agent would help in this case, to ensure seamless communication between internal/external clients and the pod? With multiple instances of the same pod, how would you ensure that each pod equally shares the client workload?
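
One way to explore these questions hands-on is to watch how a Service's endpoint list follows the pods behind it. A sketch, assuming the `secondapp` deployment and service from the original post. The function name is hypothetical, and note that calling it changes the replica count of the deployment.

```shell
# Hypothetical helper; it scales the deployment, so use it on a lab cluster.
watch_service_follow_pods() {
  # Pod IPs currently backing the service.
  kubectl get endpoints secondapp
  # Add replicas; each new pod gets its own IP.
  kubectl scale deployment secondapp --replicas=3
  # Wait until the new pods are ready.
  kubectl rollout status deployment secondapp
  # Compare: the endpoint list and the service's CLUSTER-IP after scaling.
  kubectl get endpoints secondapp
  kubectl get service secondapp
}

# Usage: watch_service_follow_pods
```

Comparing the two endpoint listings, and checking whether the CLUSTER-IP changed, gives a concrete starting point for the questions above.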

Curl is a good testing and troubleshooting tool, but it is not necessarily how a client will ultimately access the applications running in pods on your Kubernetes cluster.

Regards,
-Chris

neirkate · October 2019

Hello @chrispokorni

I see. The external IP is the node IP, so that means the internal IPs are divided into pod IPs and cluster IPs, am I right? Correct me if I'm wrong.
I will read the links you gave me. Thank you for the suggestions.
Which page are you referring to? From the lab book, the Kubernetes website, or other references?
Actually, I don't know how to answer those correctly. I have some guesses, but I'm not sure they are right. Can you tell me where I can find the answers to your questions?

Best Regards,
Neirkate

chrispokorni · October 2019

Hi @neirkate,

I was suggesting page/slide 4.23 of the course, for Kubernetes communication requirements.

I am not sure I follow your statement about the IPs, but for clarification: the external IP is the public IP of the node, internal IP is the private IP of the node, the services have their own internal/private network (or subnet) of IPs (the Cluster IPs), and the pods have their own internal/private IP subnet. Was this what you meant?

Take the time to carefully read the course lecture materials, the lab exercises, and the references provided. Once the information sets in, it will be quite easy to answer those questions.

Regards,
-Chris
