LAB 12.3 Unable to fully collect metrics
When deploying the metrics server, I cannot get any metrics to show. I get errors on the metrics-server pod like:
reststorage.go:144] unable to fetch pod metrics for pod ....
manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:kube-master: unable to fetch metrics from kubelet:kube-master: Get https://kube-master:10250/stats/summary: dial tcp: lookup kube-master on 10.96.0.10:53: server misbehaving
Not sure if there is some additional configuration to be made on the RBAC definition?
Answers
-
Hi @ccamachofg,
From the limited information provided above, it seems that your metrics-server has trouble finding the kube-master kubelet agent. This may happen based on how your cluster DNS is configured.Check the metrics-server's github repo. It may provide additional options to the metrics-server command for exercise 12.3 step 4. Additional values you may try are: "InternalDNS" and "ExternalDNS".
https://github.com/kubernetes-incubator/metrics-server
Regards,
-Chris0 -
Thanks @chrispokorni,
I did some research and found a solution to my issue. Since I am doing all the labs inside VMs in a VirtualBox Nat network I was not able to have dns resolution of my master and worker servers.
So I added static resolution on the coredns configmap like:hosts {
10.0.2.8 kube-worker
10.0.2.8 kube-master
fallthrough
}With this configuration the metrics server was able to resolve and reach the nodes. Everything was fine after that
1 -
I'm facing the same problem. How do you solve the problem editing the configmap? Can you post the configmap?
0 -
I realized that following the lab the metrics-server only works when deployed inside kubernetes master. When the pod is on any worker node it does not reach the kubernetes service ClusterIP, in my case is 10.96.0.1 and port 443. Timeout occurs.
0 -
Hi @MarceloSales ,
That is strange behavior. A service should be accessible on the assigned ClusterIP and exposed port from any node. When it is not, it may be due to a firewall blocking traffic to some ports between the nodes.
Regards,
-Chris1 -
@chrispokorni said:
Hi @MarceloSales ,That is strange behavior. A service should be accessible on the assigned ClusterIP and exposed port from any node. When it is not, it may be due to a firewall blocking traffic to some ports between the nodes.
Regards,
-ChrisThe pod dashboard-metrics-scraper works in worker nodes after insert some iptables rules but does not collect metrics. The pod for kubernetes-dashboard does not works even after insert iptables rules.
0 -
The dashboard is dependent on the metrics-server to display metrics. Without it, the dashboard cannot display any metrics but still allows you to interact with your cluster. You can find out more from the official documentation:
https://kubernetes.io/docs/tasks/debug-application-cluster/resource-metrics-pipeline/
https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/
IPtables are used for intra-node traffic routing, therefore rules in a particular IPtable will only affect the internal traffic of that node. Kubernetes has a dedicated agent
kube-proxy
in charge of maintaining all routing rules in the IPtables. Your issue, however, is with node-to-node communication, not managed by IPtables. Depending on how your infrastructure is setup (cloud or local) there may be some sort of firewall blocking traffic between your nodes (not internal to any specific node).Regards,
-Chris0 -
@MarceloSales said:
I'm facing the same problem. How do you solve the problem editing the configmap? Can you post the configmap?Here is the configuration I made:
[student@kube-master ~]$ kubectl -n kube-system get configmap coredns -o yaml apiVersion: v1 data: Corefile: | .:53 { errors health kubernetes cluster.local in-addr.arpa ip6.arpa { pods insecure upstream fallthrough in-addr.arpa ip6.arpa ttl 30 } hosts { 10.0.2.9 kube-worker 10.0.2.8 kube-master fallthrough } prometheus :9153 forward . /etc/resolv.conf cache 30 loop reload loadbalance } kind: ConfigMap metadata: creationTimestamp: "2019-08-15T09:35:48Z" name: coredns namespace: kube-system resourceVersion: "115068" selfLink: /api/v1/namespaces/kube-system/configmaps/coredns uid: eb8297d2-d440-4e5a-8e15-0ac2c6437704
And here is my /etc/hosts file
[student@kube-master ~]$ cat /etc/hosts 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 10.0.2.8 kube-master 10.0.2.9 kube-worker
Hope this helps
Regards
Camilo0 -
@chrispokorni said:
The dashboard is dependent on the metrics-server to display metrics. Without it, the dashboard cannot display any metrics but still allows you to interact with your cluster. You can find out more from the official documentation:https://kubernetes.io/docs/tasks/debug-application-cluster/resource-metrics-pipeline/
https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/
IPtables are used for intra-node traffic routing, therefore rules in a particular IPtable will only affect the internal traffic of that node. Kubernetes has a dedicated agent
kube-proxy
in charge of maintaining all routing rules in the IPtables. Your issue, however, is with node-to-node communication, not managed by IPtables. Depending on how your infrastructure is setup (cloud or local) there may be some sort of firewall blocking traffic between your nodes (not internal to any specific node).Regards,
-ChrisThanks @chrispokorni .
I have three hosts:
192.168.1.200 k8smaster
192.168.1.201 k8sworker1
192.168.1.202 k8sworker2I can ping from every node to every node without problem using his IPs. The clusterip address for my Kubernetes services is 10.96.0.1 but no one except the master can reach this address. This is odd because I have no firewall or iptables rules. Maybe something has to do about the hosts that has two networks interfaces. I'll test the @ccamachofg configuration and see if it works.
Thanks @ccamachofg for your help.
0 -
A ping response is not an indication that all ports are open. For that, you would need to use a different tool (netcat), that allows you to target specific ports during your testing.
What exactly are you trying to accomplish by accessing the kubernetes service? I don't remember any step in the lab exercises working with this particular service.
Are you on Virtualbox? Have you enabled promiscuous mode for the node networking? Is your nodes' subnet overlapping the pod subnet?
Regards,
-Chris0 -
Hi @chrispokorni , thanks again for helping.
Well, this is during metrics-server lab. I'm using virtualbox. The kubernetes service 10.96.0.1 port 443 is running on the master node and that is the ip that the metrics pod trying to connect and receives timeout when this pod is running on any worker node. When I use nodeSelector to force metrics to run inside master the pod starts without problem.
Thanks for the hint with netcat. I'm gonna be crazy, look the output from a worker node:
nc -vv 10.96.0.1 443 Connection to 10.96.0.1 443 port [tcp/https] succeeded!
This is whats happens when I try to start the metric pod in any worker.
```
kubeclt -n kube-system logs metrics-server-XXXXXXOUTPUT BEGIN
Error: Get https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.96.0.1:443: i/o timeout
Usage:
[flags]Flags:
--alsologtostderr log to standard error as well as files
--authentication-kubeconfig string kubeconfig file pointing at the 'core' kubernetes server with enough rights to
....
A LOT OF HELP FLAGS
....panic: Get https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.96.0.1:443: i/o timeout
goroutine 1 [running]:
main.main()
/go/src/github.com/kubernetes-incubator/metrics-server/cmd/metrics-server/metrics-server.go:39 +0x13bOUTPUT END
I have not enabled promiscuous mode for networking.
Thanks again @chrispokorni for your attention.
0 -
Hello,
A couple of possible issues. If I understood your earlier posts you have multiple interfaces in use. Use wireshark, or some other tool, to determine which interface the request is being made. In the past I've had similar issues, which did not happen on single interface systetms. A previous work-around was to initialize the cluster and then add the second interface. If I did that it worked. Somewhere there a request is being mis-routed, is my guess.
The next issue may be promiscuous mode. You said you did not enable it, but the last time I worked with VB it is enabled by default. You may want to double check each interface on each VM to ensure that all traffic is being allowed. This caused me all sorts of issues until I disabled it.
Regards,
0 -
Thanks @serewicz for your attention. I found it. The problem was related for overlap in my network configuration as @chrispokorni has suggested _ "Is your nodes' subnet overlapping the pod subnet?"_.
I have tried to change my CIDR following this guide https://docs.projectcalico.org/networking/migrate-pools but my core-dns pods did not worked anymore. So I have decided reinstall my cluster (Following the exercises I have created a ansible playbook, it's about 5 minutes to have a cluster with vagrant and kubeadm on virtualbox) but at this time I changed the CIDR with a range that does not conflicts with my network 192.168.x.x, I choosed 172.16.0.0/16. Does not forget to edit calico.yaml and adjust the variable CALICO_IPV4POOL_CIDR to your new IP Range. Everything is working fine now. Pay attention when you are installing the cluster to network range to avoid conflicts and overlapping. Hope that this information can help someone. Thanks to everyone that helped me.0 -
Thanks for the feedback. I think if you check out Exercise 3.1, step 10 it speaks to your issue specifically. It is important to read each step, more than just the command being run.
Changing the IP pools after initialization are near impossible, and most rebuild the cluster rather then track down every possible place the information is used.
0 -
@serewicz said:
Thanks for the feedback. I think if you check out Exercise 3.1, step 10 it speaks to your issue specifically. It is important to read each step, more than just the command being run.Changing the IP pools after initialization are near impossible, and most rebuild the cluster rather then track down every possible place the information is used.
Hi @serewicz , thanks. You're right. I read everything and in the step 10 shows exactly the IP 192.168.0.0/16 but I did not knew about the overlapping risk and the exercise does not warn us about it. Maybe some warning about this can help other in the future. My fault, network configurations has a lot of pitfalls for me.
0
Categories
- All Categories
- 206 LFX Mentorship
- 206 LFX Mentorship: Linux Kernel
- 733 Linux Foundation IT Professional Programs
- 339 Cloud Engineer IT Professional Program
- 165 Advanced Cloud Engineer IT Professional Program
- 66 DevOps Engineer IT Professional Program
- 132 Cloud Native Developer IT Professional Program
- 119 Express Training Courses
- 119 Express Courses - Discussion Forum
- 5.9K Training Courses
- 40 LFC110 Class Forum - Discontinued
- 66 LFC131 Class Forum
- 39 LFD102 Class Forum
- 219 LFD103 Class Forum
- 17 LFD110 Class Forum
- 32 LFD121 Class Forum
- 17 LFD133 Class Forum
- 6 LFD134 Class Forum
- 17 LFD137 Class Forum
- 70 LFD201 Class Forum
- 3 LFD210 Class Forum
- 2 LFD210-CN Class Forum
- 2 LFD213 Class Forum - Discontinued
- 128 LFD232 Class Forum - Discontinued
- 1 LFD233 Class Forum
- 2 LFD237 Class Forum
- 23 LFD254 Class Forum
- 684 LFD259 Class Forum
- 109 LFD272 Class Forum
- 3 LFD272-JP クラス フォーラム
- 10 LFD273 Class Forum
- 96 LFS101 Class Forum
- LFS111 Class Forum
- 2 LFS112 Class Forum
- 1 LFS116 Class Forum
- 3 LFS118 Class Forum
- 2 LFS142 Class Forum
- 3 LFS144 Class Forum
- 3 LFS145 Class Forum
- 1 LFS146 Class Forum
- 2 LFS147 Class Forum
- 8 LFS151 Class Forum
- 1 LFS157 Class Forum
- 10 LFS158 Class Forum
- 4 LFS162 Class Forum
- 1 LFS166 Class Forum
- 3 LFS167 Class Forum
- 1 LFS170 Class Forum
- 1 LFS171 Class Forum
- 2 LFS178 Class Forum
- 2 LFS180 Class Forum
- 1 LFS182 Class Forum
- 4 LFS183 Class Forum
- 30 LFS200 Class Forum
- 737 LFS201 Class Forum - Discontinued
- 2 LFS201-JP クラス フォーラム
- 17 LFS203 Class Forum
- 112 LFS207 Class Forum
- 1 LFS207-DE-Klassenforum
- LFS207-JP クラス フォーラム
- 301 LFS211 Class Forum
- 55 LFS216 Class Forum
- 49 LFS241 Class Forum
- 43 LFS242 Class Forum
- 37 LFS243 Class Forum
- 13 LFS244 Class Forum
- 1 LFS245 Class Forum
- 45 LFS250 Class Forum
- 1 LFS250-JP クラス フォーラム
- LFS251 Class Forum
- 143 LFS253 Class Forum
- LFS254 Class Forum
- LFS255 Class Forum
- 6 LFS256 Class Forum
- LFS257 Class Forum
- 1.2K LFS258 Class Forum
- 9 LFS258-JP クラス フォーラム
- 114 LFS260 Class Forum
- 152 LFS261 Class Forum
- 41 LFS262 Class Forum
- 82 LFS263 Class Forum - Discontinued
- 15 LFS264 Class Forum - Discontinued
- 11 LFS266 Class Forum - Discontinued
- 23 LFS267 Class Forum
- 18 LFS268 Class Forum
- 29 LFS269 Class Forum
- 199 LFS272 Class Forum
- 1 LFS272-JP クラス フォーラム
- LFS274 Class Forum
- 3 LFS281 Class Forum
- 2 LFW111 Class Forum
- 257 LFW211 Class Forum
- 176 LFW212 Class Forum
- 12 SKF100 Class Forum
- SKF200 Class Forum
- 791 Hardware
- 199 Drivers
- 68 I/O Devices
- 37 Monitors
- 98 Multimedia
- 174 Networking
- 91 Printers & Scanners
- 85 Storage
- 754 Linux Distributions
- 82 Debian
- 67 Fedora
- 16 Linux Mint
- 13 Mageia
- 23 openSUSE
- 147 Red Hat Enterprise
- 31 Slackware
- 13 SUSE Enterprise
- 351 Ubuntu
- 464 Linux System Administration
- 39 Cloud Computing
- 70 Command Line/Scripting
- Github systems admin projects
- 91 Linux Security
- 78 Network Management
- 101 System Management
- 47 Web Management
- 56 Mobile Computing
- 17 Android
- 28 Development
- 1.2K New to Linux
- 1K Getting Started with Linux
- 365 Off Topic
- 113 Introductions
- 171 Small Talk
- 20 Study Material
- 522 Programming and Development
- 291 Kernel Development
- 213 Software Development
- 1.1K Software
- 212 Applications
- 180 Command Line
- 3 Compiling/Installing
- 405 Games
- 311 Installation
- 79 All In Program
- 79 All In Forum
Upcoming Training
-
August 20, 2018
Kubernetes Administration (LFS458)
-
August 20, 2018
Linux System Administration (LFS301)
-
August 27, 2018
Open Source Virtualization (LFS462)
-
August 27, 2018
Linux Kernel Debugging and Security (LFD440)