Back-off restarting failed container
Hello all. I need help with this problem.
Lab Book Version 2019-04-26
This is about Ingress, Lab 10.1 Advanced Service Exposure.
I have done these steps (total 10 steps):
1. kubectl create deployment secondapp --image=nginx
2. kubectl get deployments secondapp -o yaml | grep label -A2
3. kubectl expose deployment secondapp --type=NodePort --port=80
4. vi ingress.rbac.yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller
rules:
  - apiGroups:
      - ""
    resources:
      - services
      - endpoints
      - secrets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: traefik-ingress-controller
subjects:
  - kind: ServiceAccount
    name: traefik-ingress-controller
    namespace: kube-system
5. kubectl create -f ingress.rbac.yaml
6. wget https://bit.ly/2VCSz3s -O traefik-ds.yaml
7. vi traefik-ds.yaml
I edited it as instructed in the PDF.
diff traefik-ds.yaml traefik-ds.yaml.1
8. kubectl create -f traefik-ds.yaml
9. vi ingress.rule.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-test
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
    - host: www.example.com
      http:
        paths:
          - backend:
              serviceName: secondapp
              servicePort: 80
            path: /
10. kubectl create -f ingress.rule.yaml
Before continuing with the remaining steps (17 in total), I checked all of my pods in the kube-system namespace.
In total, I have 12 pods; 2 of the 12 are the traefik-ingress-controller pods that I just created.
Originally, both traefik pods were in Error/CrashLoopBackOff status.
So I checked them by describing the pods and looking at their logs. Both of them have the same errors.
These were the errors I got when I described the pod:
And these were the errors from the logs:
Because the logs reported listen tcp 80 address already in use, I checked what was using that port on my server:
sudo lsof -i -P -n | grep 80
I found that my apache2 was using port 80, so I stopped the service:
sudo systemctl stop apache2
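To illustrate the "address already in use" error from the logs, here is a minimal Python sketch. It assumes the traefik container tries to listen on port 80 of the node itself, which is what the log message suggests but is not confirmed by the lab file in this thread. The helper `try_bind` is hypothetical, and the sketch uses an OS-assigned port instead of port 80 so it runs without root.

```python
import errno
import socket


def try_bind(port, host="127.0.0.1"):
    """Try to listen on host:port; return (socket, None) or (None, errno)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock, None
    except OSError as exc:
        sock.close()
        return None, exc.errno


# The first listener stands in for apache2. Port 0 asks the OS for any
# free port (an assumption made so the sketch needs no privileges).
first, first_err = try_bind(0)
port = first.getsockname()[1]

# The second listener stands in for the traefik container wanting the same port.
second, second_err = try_bind(port)
conflict = second_err == errno.EADDRINUSE
print("second bind failed with EADDRINUSE:", conflict)

# Closing the first listener is the analogue of `systemctl stop apache2`:
# the port becomes free and the next bind succeeds.
first.close()
third, third_err = try_bind(port)
print("bind after first listener closed:", "ok" if third else third_err)
if third:
    third.close()
```

Only one process per node can listen on a given address and port, so any node where another service already holds the port would show the same failure.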
Then, when I checked my pods, 1 of the 2 traefik pods still had the same errors (1 Running, 1 Error/CrashLoopBackOff).
I have tried deleting the pod and rebooting the server, but neither solved the problem.
When I ran sudo lsof -i -P -n | grep 80:
I am wondering what's wrong. I don't understand why this happened or how to solve this error.
Please help me solve this. If the information is not enough, please tell me and I will post more.
Thank you.
Best Regards,
Neirkate
Comments
-
Hi @neirkate,
What are the errors for the second traefik pod (-ppjn4) after you stopped the apache service? Did you try deleting the problem pod, and then allow the controller to re-create it?
- How many nodes do you have in your cluster?
- What type of controller manages your traefik pods?
- How are the traefik pods distributed in your cluster?
- Where is the problem pod running?
From your descriptions and outputs above, it seems to me you may have missed a few additional steps. I am hoping these questions will point you in the right direction to continue your troubleshooting.
Good luck!
-Chris
PS: the latest lab book is versioned 2019-08-12 (download it if you still have access to the course materials)
-
Hello @chrispokorni
Thanks for helping me solve this problem. I'm quite lost since I'm new to Kubernetes, so I'm sorry if the information was not enough before.
The error is still the same. I have tried deleting the problem pod, but the error is still not solved. Yes, the controller re-creates the deleted pod automatically.
- There are 2 nodes in my cluster, one master and one worker (ubuntu).
- The controller that manages my traefik pods is a DaemonSet.
- Both traefik pods are scheduled on my worker (ubuntu) node.
- The problem pod is running on the worker (ubuntu) node.
What kind of additional steps do I need to do?
Can 2 pods on one node use the same port?
I still have access and will download it. Thank you for the news.
Best Regards,
Neirkate
-
Sorry, @chrispokorni
I just checked, and I was actually wrong: one traefik pod goes to the master and one goes to the worker.
I just stopped apache2 on the worker and all of my traefik pods are running now.
But I have some more questions, still from the same Lab Book, Lab 10.1, step 11.
- The one in the red rectangle, is that the external IP? Where can I get it? Is it on the service? I ran kubectl get svc, but did not find the external IP there.
- Is the cluster IP meant to be accessed? I tried to curl it to get to the nginx homepage, but it did not work.
- My nginx pod is on the worker node. Is it normal that if I curl using my master IP I don't get the nginx homepage, but if I use my worker IP I do?
Best Regards,
Neirkate
-
Hi,
I am glad you were able to determine the cause of the second traefik pod misbehaving. In such cases, when you discover a problem with one of your nodes, chances are that other nodes in the cluster have the same issue.
- The external IP = public IP of the node. It is not a service IP.
- Cluster IP is accessible only from within the cluster, and it is not accessible from anywhere outside the cluster, such as the internet.
- Curling the pod's IP should display the nginx page, regardless of where you are curling from in the cluster, from either a master node or a worker node. Pod IP is not accessible from outside the cluster. That is why we need a service to expose the pod to the outside world.
Regards,
-Chris
-
Hello @chrispokorni
Thanks for helping me troubleshoot the problems I encountered. Sorry, it seems I have many questions now because I need to understand how Kubernetes works, especially the networking part.
- So the external IP is my node's IP?
- May I ask on what occasions the ClusterIP is used?
- Is the Pod IP the one assigned by Calico if I use Calico as my CNI? I can't get to the nginx homepage using any IP except the IP of the node where the Pod is deployed (worker). So, if nginx is deployed on the worker, I must use the worker IP to curl and get to the nginx homepage.
- Do I need a service to expose my nginx container if I just want to curl it from my master or worker node? I created a deployment using the nginx image and saw the pod was running, but I did not expose the nginx container. Then I curled the nginx container from my master node using the worker node IP, since it was deployed on my worker node, but it did not work. Using the Pod's IP did not work either.
Best Regards,
Neirkate
-
Hi @neirkate,
- YES
In addition to Chapter 8 on Services, I suggest some reading material from the kubernetes official documentation:
https://kubernetes.io/docs/concepts/services-networking/connect-applications-service/
https://kubernetes.io/docs/concepts/services-networking/service/
https://kubernetes.io/docs/tutorials/kubernetes-basics/expose/expose-intro/
When a pod is created, it receives an IP address - accessible only from within the cluster. So individual pods may be accessed from within the cluster, from either node. What you seem to be experiencing is a network issue between your nodes, if you cannot access an individual pod from any node. On this networking/communication issue, I suggest revisiting page 4.23.
Without a service, you (admin or developer) can only access an individual pod by its individual IP address, only from within the cluster. What happens when this pod's IP address changes? What agent would help in this case, to ensure seamless communication between internal/external clients and the pod? With multiple instances of the same pod, how would you ensure that each pod equally shares the client workload?
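As a rough illustration of the questions above, the following Python sketch models a service as a stable name plus a label selector that is re-evaluated against the current pods. The `Service` class, the pod names, and the IP addresses are all hypothetical. The random backend choice is only a stand-in, since how traffic is actually spread depends on the cluster's proxy mode.

```python
import random


class Service:
    """Toy model of a service: a stable name plus a label selector."""

    def __init__(self, name, selector):
        self.name = name
        self.selector = selector

    def endpoints(self, pods):
        """Return the IPs of all pods whose labels match the selector."""
        return sorted(
            pod["ip"]
            for pod in pods
            if all(pod["labels"].get(k) == v for k, v in self.selector.items())
        )

    def pick_backend(self, pods, rng=random):
        """Pick one matching pod IP for a request (random choice as a stand-in)."""
        return rng.choice(self.endpoints(pods))


# Hypothetical pods; the label app=secondapp mirrors the one that
# `kubectl create deployment secondapp` puts on its pods.
pods = [
    {"name": "secondapp-a", "ip": "192.168.1.10", "labels": {"app": "secondapp"}},
    {"name": "secondapp-b", "ip": "192.168.1.11", "labels": {"app": "secondapp"}},
    {"name": "other", "ip": "192.168.1.12", "labels": {"app": "other"}},
]
svc = Service("secondapp", {"app": "secondapp"})
before = svc.endpoints(pods)

# One pod is replaced and comes back with a new IP; clients still use the
# service name, and the endpoint list follows the pods.
pods[0] = {"name": "secondapp-c", "ip": "192.168.1.20", "labels": {"app": "secondapp"}}
after = svc.endpoints(pods)
print(before, "->", after)
```

The point of the sketch is that the client never needs to know an individual pod IP: it addresses the service, and the set of backends is derived from labels each time.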
Curl is a good testing and troubleshooting tool, but it is not necessarily how a client will ultimately access the applications running in pods on your Kubernetes cluster.
Regards,
-Chris
-
Hello @chrispokorni
I see. So the external IP is the node IP, which means the internal IP is divided into pod IP and cluster IP. Am I right? Correct me if I'm wrong.
I will do some reading on the links you gave me. Thank you for the suggestions.
Which page are you referring to? Is it from the lab book, the Kubernetes website, or another reference?
Actually, I don't know how to answer your questions correctly. I have some guesses, but I'm not sure they are correct. Can you tell me where I can find the answers to your questions?
Best Regards,
Neirkate
-
Hi @neirkate,
I was suggesting page/slide 4.23 of the course, for Kubernetes communication requirements.
I am not sure I follow your statement about the IPs, but for clarification: the external IP is the public IP of the node, internal IP is the private IP of the node, the services have their own internal/private network (or subnet) of IPs (the Cluster IPs), and the pods have their own internal/private IP subnet. Was this what you meant?
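The four kinds of addresses described above can be sketched in Python with the standard `ipaddress` module. The two CIDR ranges below are assumptions for illustration (kubeadm's default service range and Calico's default pod pool), not values taken from this cluster, and the sample addresses are made up.

```python
import ipaddress

# Assumed example ranges; check the actual values configured in your cluster.
SERVICE_CIDR = ipaddress.ip_network("10.96.0.0/12")   # service (Cluster IP) network
POD_CIDR = ipaddress.ip_network("192.168.0.0/16")     # pod network


def classify(ip):
    """Label an address according to the categories described in the post."""
    addr = ipaddress.ip_address(ip)
    if addr in SERVICE_CIDR:
        return "cluster IP"
    if addr in POD_CIDR:
        return "pod IP"
    if addr.is_private:
        return "node internal IP"
    return "node external IP"


# Hypothetical sample addresses, one per category.
for sample in ["10.100.0.5", "192.168.1.5", "172.31.4.10", "35.200.10.20"]:
    print(sample, "->", classify(sample))
```

Cluster IPs and pod IPs are only routable inside the cluster, which is why only the node addresses are candidates for reaching a NodePort or an ingress controller from outside.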
Take the time to read carefully the course lecture materials, the lab exercises, and the references provided. Once the information sets in, it will be quite easy to answer those questions.
Regards,
-Chris