LAB 12.3 Unable to fully collect metrics
When I deploy the metrics server, no metrics show up. The metrics-server pod logs errors like these:
reststorage.go:144] unable to fetch pod metrics for pod ....
manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:kube-master: unable to fetch metrics from kubelet:kube-master: Get https://kube-master:10250/stats/summary: dial tcp: lookup kube-master on 10.96.0.10:53: server misbehaving
Is there some additional configuration needed in the RBAC definition?
Answers
Hi @ccamachofg,
From the limited information provided above, it seems that your metrics-server has trouble finding the kubelet agent on kube-master. This can happen depending on how your cluster DNS is configured. Check the metrics-server GitHub repo; it may list additional options for the metrics-server command used in exercise 12.3 step 4. Additional values you may try for --kubelet-preferred-address-types are "InternalDNS" and "ExternalDNS".
https://github.com/kubernetes-incubator/metrics-server
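To illustrate why the address type matters, here is a minimal Go sketch, assuming a simplified model in which each node reports a list of typed addresses and the scraper uses the first address whose type appears earliest in a preference list (the idea behind metrics-server's --kubelet-preferred-address-types flag). Hostname, InternalIP, ExternalIP, InternalDNS and ExternalDNS are the standard Kubernetes node address types; the NodeAddress struct and pickAddress function are hypothetical and are not metrics-server's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeAddress is a hypothetical, simplified stand-in for a node status address.
type NodeAddress struct {
	Type    string // e.g. "Hostname", "InternalIP", "ExternalIP", "InternalDNS", "ExternalDNS"
	Address string
}

// pickAddress returns the first address whose type matches the earliest
// entry in prefs. It illustrates a preferred-address-types setting; it is
// not the metrics-server implementation.
func pickAddress(addrs []NodeAddress, prefs []string) (string, error) {
	for _, want := range prefs {
		for _, a := range addrs {
			if a.Type == want {
				return a.Address, nil
			}
		}
	}
	return "", errors.New("no node address matches the preferred types")
}

func main() {
	// Addresses for kube-master as posted later in this thread.
	master := []NodeAddress{
		{Type: "Hostname", Address: "kube-master"},
		{Type: "InternalIP", Address: "10.0.2.8"},
	}

	// Preferring Hostname means the name must resolve through cluster DNS.
	addr, _ := pickAddress(master, []string{"Hostname", "InternalIP"})
	fmt.Println(addr) // kube-master

	// Preferring InternalIP skips the DNS lookup entirely.
	addr, _ = pickAddress(master, []string{"InternalIP", "Hostname"})
	fmt.Println(addr) // 10.0.2.8
}
```

In this model, putting InternalIP first is what sidesteps the "lookup kube-master on 10.96.0.10:53" failure, because no name ever has to be resolved.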
Regards,
-Chris
Thanks @chrispokorni,
I did some research and found a solution to my issue. Since I am running all the labs in VMs on a VirtualBox NAT network, my master and worker hostnames did not resolve through DNS.
So I added static entries to the CoreDNS ConfigMap, like this:
hosts {
    10.0.2.9 kube-worker
    10.0.2.8 kube-master
    fallthrough
}
With this configuration the metrics server was able to resolve and reach the nodes, and everything worked after that.
I'm facing the same problem. How did you solve it by editing the ConfigMap? Can you post the ConfigMap?
I realized that, following the lab, the metrics-server only works when it runs on the Kubernetes master. When the pod runs on any worker node, it cannot reach the kubernetes service ClusterIP (10.96.0.1, port 443 in my case); the connection times out.
Hi @MarceloSales,
That is strange behavior. A service should be accessible on the assigned ClusterIP and exposed port from any node. When it is not, it may be due to a firewall blocking traffic to some ports between the nodes.
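To make the ClusterIP behavior concrete, here is a minimal Go sketch of the idea behind kube-proxy's Service rules, assuming a simplified model: every node holds a table from ClusterIP:port to backend endpoints and rewrites the destination to one of them, after which the packet still has to travel over the node network to wherever that endpoint lives. For the kubernetes service the endpoint is the API server on the master; 192.168.1.200 is the master address posted later in this thread and 6443 is kubeadm's default API server port. The serviceTable type and dnat function are hypothetical, not kube-proxy code.

```go
package main

import (
	"fmt"
	"math/rand"
)

// serviceTable maps a "ClusterIP:port" virtual address to real endpoints.
// Hypothetical simplification of the per-node rules kube-proxy maintains.
type serviceTable map[string][]string

// dnat rewrites a destination the way a Service rule would. The bool is
// false when the address is a Service with no endpoints to send it to.
func dnat(t serviceTable, dst string) (string, bool) {
	eps, ok := t[dst]
	if !ok {
		return dst, true // not a Service address: forwarded unchanged
	}
	if len(eps) == 0 {
		return "", false // Service without endpoints: nowhere to send it
	}
	return eps[rand.Intn(len(eps))], true // pick one endpoint at random
}

func main() {
	// Assumed values: the master IP from the thread and kubeadm's default
	// API server port 6443.
	rules := serviceTable{
		"10.96.0.1:443": {"192.168.1.200:6443"},
	}
	dst, ok := dnat(rules, "10.96.0.1:443")
	fmt.Println(dst, ok) // 192.168.1.200:6443 true

	dst, ok = dnat(rules, "10.0.2.8:10250")
	fmt.Println(dst, ok) // 10.0.2.8:10250 true
}
```

The point of the model: the rewrite happens locally on each node, but reaching 192.168.1.200:6443 afterwards depends on node-to-node connectivity for the pod's traffic.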
Regards,
-Chris
The dashboard-metrics-scraper pod works on the worker nodes after I insert some iptables rules, but it does not collect metrics. The kubernetes-dashboard pod does not work even after inserting the iptables rules.
The dashboard is dependent on the metrics-server to display metrics. Without it, the dashboard cannot display any metrics but still allows you to interact with your cluster. You can find out more from the official documentation:
https://kubernetes.io/docs/tasks/debug-application-cluster/resource-metrics-pipeline/
https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/
iptables rules are local to each node, so the rules on a particular node only affect traffic handled by that node. Kubernetes has a dedicated agent, kube-proxy, in charge of maintaining the Service routing rules in iptables. Your issue, however, is with node-to-node communication, which those iptables rules do not manage. Depending on how your infrastructure is set up (cloud or local), there may be a firewall blocking traffic between your nodes (outside of any specific node).
Regards,
-Chris
@MarceloSales said:
I'm facing the same problem. How do you solve the problem editing the configmap? Can you post the configmap?
Here is the configuration I made:
[student@kube-master ~]$ kubectl -n kube-system get configmap coredns -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        hosts {
           10.0.2.9 kube-worker
           10.0.2.8 kube-master
           fallthrough
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2019-08-15T09:35:48Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "115068"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: eb8297d2-d440-4e5a-8e15-0ac2c6437704
And here is my /etc/hosts file:
[student@kube-master ~]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.2.8 kube-master
10.0.2.9 kube-worker
Hope this helps.
Regards,
Camilo
Thanks @chrispokorni.
I have three hosts:
192.168.1.200 k8smaster
192.168.1.201 k8sworker1
192.168.1.202 k8sworker2
I can ping every node from every other node by IP without problems. The ClusterIP address of my kubernetes service is 10.96.0.1, but only the master can reach it. This is odd because I have not added any firewall or iptables rules. Maybe it has something to do with the hosts having two network interfaces. I'll test @ccamachofg's configuration and see if it works.
Thanks @ccamachofg for your help.
A successful ping does not mean that all ports are open. To check specific ports, you need a different tool, such as netcat (nc), that lets you target individual ports during testing.
What exactly are you trying to accomplish by accessing the kubernetes service? I don't remember any step in the lab exercises working with this particular service.
Are you on VirtualBox? Have you enabled promiscuous mode for the node networking? Is your nodes' subnet overlapping the pod subnet?
Regards,
-Chris
Hi @chrispokorni , thanks again for helping.
Well, this is during the metrics-server lab, and I'm using VirtualBox. The kubernetes service (10.96.0.1, port 443) points to the API server running on the master node; that is the IP the metrics pod tries to connect to, and it gets a timeout whenever the pod runs on a worker node. When I use a nodeSelector to force the metrics pod onto the master, it starts without problems.
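The nodeSelector workaround works because the scheduler only places a pod on nodes whose labels contain every key/value pair in the pod's nodeSelector. Here is a minimal Go sketch of that matching rule, assuming plain string label maps; the matches function and the node label values are hypothetical, while kubernetes.io/hostname is the standard per-node hostname label. Pinning the pod this way only avoids the worker nodes; it does not fix the underlying network problem there.

```go
package main

import "fmt"

// matches reports whether a node's labels satisfy a nodeSelector:
// every selector key must be present with exactly the same value.
// An empty selector matches every node.
func matches(nodeLabels, selector map[string]string) bool {
	for k, v := range selector {
		if got, ok := nodeLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical node labels based on the host names in the thread.
	nodes := map[string]map[string]string{
		"k8smaster":  {"kubernetes.io/hostname": "k8smaster"},
		"k8sworker1": {"kubernetes.io/hostname": "k8sworker1"},
		"k8sworker2": {"kubernetes.io/hostname": "k8sworker2"},
	}
	selector := map[string]string{"kubernetes.io/hostname": "k8smaster"}
	for name, labels := range nodes {
		// Map iteration order is random, so the print order varies.
		fmt.Printf("%s eligible: %v\n", name, matches(labels, selector))
	}
}
```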
Thanks for the netcat hint. This is driving me crazy; look at the output from a worker node:
nc -vv 10.96.0.1 443
Connection to 10.96.0.1 443 port [tcp/https] succeeded!
This is what happens when I try to start the metrics pod on any worker:
```
kubectl -n kube-system logs metrics-server-XXXXXX
Error: Get https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.96.0.1:443: i/o timeout
Usage:
  [flags]

Flags:
      --alsologtostderr                      log to standard error as well as files
      --authentication-kubeconfig string     kubeconfig file pointing at the 'core' kubernetes server with enough rights to
....
A LOT OF HELP FLAGS
....
panic: Get https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.96.0.1:443: i/o timeout

goroutine 1 [running]:
main.main()
	/go/src/github.com/kubernetes-incubator/metrics-server/cmd/metrics-server/metrics-server.go:39 +0x13b
```
I have not enabled promiscuous mode for networking.
Thanks again @chrispokorni for your attention.
Thanks @serewicz for your attention. I found it. The problem was an overlap in my network configuration, as @chrispokorni suggested: "Is your nodes' subnet overlapping the pod subnet?"
I tried to change my CIDR by following this guide https://docs.projectcalico.org/networking/migrate-pools, but my CoreDNS pods stopped working. So I decided to reinstall my cluster (following the exercises, I created an Ansible playbook; it takes about 5 minutes to get a cluster with Vagrant and kubeadm on VirtualBox), but this time I chose a CIDR range that does not conflict with my 192.168.x.x network: 172.16.0.0/16. Don't forget to edit calico.yaml and set the CALICO_IPV4POOL_CIDR variable to your new IP range. Everything is working fine now. When installing the cluster, pay attention to the network ranges to avoid conflicts and overlaps. I hope this information helps someone. Thanks to everyone who helped me.
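Checking for this kind of overlap before running the install is straightforward. Here is a minimal Go sketch using the standard net package, assuming the nodes sit in 192.168.1.0/24 (inferred from the 192.168.1.200-202 addresses above) and comparing it with the 192.168.0.0/16 range from the lab and the 172.16.0.0/16 range chosen here. The overlaps helper is hypothetical; it relies on the fact that two CIDR blocks are either nested or disjoint, so they overlap exactly when one contains the other's network address.

```go
package main

import (
	"fmt"
	"net"
)

// overlaps reports whether two CIDR blocks share any address.
// CIDR blocks are either nested or disjoint, so checking whether either
// block contains the other's network address is sufficient.
func overlaps(a, b string) (bool, error) {
	_, na, err := net.ParseCIDR(a)
	if err != nil {
		return false, err
	}
	_, nb, err := net.ParseCIDR(b)
	if err != nil {
		return false, err
	}
	return na.Contains(nb.IP) || nb.Contains(na.IP), nil
}

func main() {
	nodes := "192.168.1.0/24" // assumed subnet of the hosts in this thread
	for _, pods := range []string{"192.168.0.0/16", "172.16.0.0/16"} {
		ov, err := overlaps(nodes, pods)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("nodes %s vs pods %s: overlap=%v\n", nodes, pods, ov)
	}
	// nodes 192.168.1.0/24 vs pods 192.168.0.0/16: overlap=true
	// nodes 192.168.1.0/24 vs pods 172.16.0.0/16: overlap=false
}
```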
@serewicz said:
Thanks for the feedback. I think if you check out Exercise 3.1, step 10, it speaks to your issue specifically. It is important to read each step, not just the command being run.
Changing the IP pools after initialization is nearly impossible, and most people rebuild the cluster rather than track down every place the information is used.
Hi @serewicz, thanks. You're right. I read everything, and step 10 shows exactly the 192.168.0.0/16 range, but I did not know about the overlap risk and the exercise does not warn about it. Maybe a warning about this would help others in the future. My fault; network configuration has a lot of pitfalls for me.