Lab 12.7: Timeout problems on dashboard
Hi, I'm having problems with Lab 12.7. I also had problems in other 12.x labs, but I fixed those; this one I can't get past. I'm getting timeouts. I think it is a network problem, but what is the best way to fix it?
docker@k8s1:~/k8s$ kubectl -n kubernetes-dashboard logs kubernetes-dashboard-b65488c4-2cp6s
2020/02/05 05:23:28 Using namespace: kubernetes-dashboard
2020/02/05 05:23:28 Using in-cluster config to connect to apiserver
2020/02/05 05:23:28 Starting overwatch
2020/02/05 05:23:28 Using secret token for csrf signing
2020/02/05 05:23:28 Initializing csrf token from kubernetes-dashboard-csrf secret
panic: Get https://10.96.0.1:443/api/v1/namespaces/kubernetes-dashboard/secrets/kubernetes-dashboard-csrf: dial tcp 10.96.0.1:443: i/o timeout
goroutine 1 [running]:
github.com/kubernetes/dashboard/src/app/backend/client/csrf.(*csrfTokenManager).init(0xc00050f740)
/home/travis/build/kubernetes/dashboard/src/app/backend/client/csrf/manager.go:40 +0x3b4
github.com/kubernetes/dashboard/src/app/backend/client/csrf.NewCsrfTokenManager(...)
/home/travis/build/kubernetes/dashboard/src/app/backend/client/csrf/manager.go:65
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).initCSRFKey(0xc000381b80)
/home/travis/build/kubernetes/dashboard/src/app/backend/client/manager.go:487 +0xc7
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).init(0xc000381b80)
/home/travis/build/kubernetes/dashboard/src/app/backend/client/manager.go:455 +0x47
github.com/kubernetes/dashboard/src/app/backend/client.NewClientManager(...)
/home/travis/build/kubernetes/dashboard/src/app/backend/client/manager.go:536
main.main()
/home/travis/build/kubernetes/dashboard/src/app/backend/dashboard.go:105 +0x212
Thanks
Comments
-
Hi @CharcoGreen,
From your output, it seems you are experiencing a timeout on port 443. Is it in use by another application, or is it blocked by a firewall of your OS or a firewall at the infrastructure level?
The first step would be to determine why your traffic is blocked, and after that, come up with an action plan to fix the issue.
Regards,
-Chris
-
Thanks for your help,
I have renewed my cluster and my firewall rules.
-
I have the same issue. Running nodes on VMWare Fusion. The metrics-server pod logs show:
Error: Get https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.96.0.1:443: i/o timeout
If I utilise nodeSelector to force it to the master it works fine.
But, trying to run it on a worker I always get that error.
I have the extra args:
- args:
  - --cert-dir=/tmp
  - --secure-port=4443
  - --kubelet-insecure-tls
  - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
  image: k8s.gcr.io/metrics-server-amd64:v0.3.6
From the worker node I can curl https://10.96.0.1:443/ just fine, and also from within a pod on the same node (I used the kube-proxy pod container to test from).
# curl -k https://10.96.0.1
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": { },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": { },
  "code": 403
}
No promiscuous mode on any of my interfaces:
netstat -i #can see no P
Kernel Interface table
Iface       MTU   RX-OK RX-ERR RX-DRP RX-OVR   TX-OK TX-ERR TX-DRP TX-OVR Flg
calib8f1   1440    1109      0      0      0    1074      0      0      0 BMRU
docker0    1500       0      0      0      0       0      0      0      0 BMU
eth0       1500  199755      0      0      0   86684      0      0      0 BMRU
eth1       1500  263085      0      0      0  219869      0      0      0 BMRU
lo        65536  208914      0      0      0  208914      0      0      0 LRU
tunl0      1440   12756      0      0      0   12719      0      0      0 ORU
Here are my interfaces on the worker:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:9d:50:59 brd ff:ff:ff:ff:ff:ff
    inet 192.168.134.131/24 brd 192.168.134.255 scope global dynamic eth0
       valid_lft 1624sec preferred_lft 1624sec
    inet6 fe80::20c:29ff:fe9d:5059/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:9d:50:63 brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.3/24 brd 192.168.10.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe9d:5063/64 scope link
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:cc:78:79:00 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
7: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 192.168.230.192/32 brd 192.168.230.192 scope global tunl0
       valid_lft forever preferred_lft forever
74: cali810f9f98cd8@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
As I have multiple interfaces I've edited the calico daemonset to ensure the vmware private network interface is being used (btw does anyone know how to set this based on specific nodes? If I wanted eth0 on one node, but eth1 on another?):
containers:
- env:
  - name: IP_AUTODETECTION_METHOD
    value: interface=eth1
Running tshark on the worker when the metrics-server pod is starting I see:
# tshark -i any 'port 443'
Running as user "root" and group "root". This could be dangerous.
Capturing on 'any'
    1 0.000000000 192.168.230.250 → 10.96.0.1 TCP 76 55514 → 443 [SYN] Seq=0 Win=28000 Len=0 MSS=1400 SACK_PERM=1 TSval=2513108884 TSecr=0 WS=128
    2 1.018008286 192.168.230.250 → 10.96.0.1 TCP 76 [TCP Retransmission] 55514 → 443 [SYN] Seq=0 Win=28000 Len=0 MSS=1400 SACK_PERM=1 TSval=2513109902 TSecr=0 WS=128
    3 3.033443940 192.168.230.250 → 10.96.0.1 TCP 76 [TCP Retransmission] 55514 → 443 [SYN] Seq=0 Win=28000 Len=0 MSS=1400 SACK_PERM=1 TSval=2513111918 TSecr=0 WS=128
    4 7.192949413 192.168.230.250 → 10.96.0.1 TCP 76 [TCP Retransmission] 55514 → 443 [SYN] Seq=0 Win=28000 Len=0 MSS=1400 SACK_PERM=1 TSval=2513116077 TSecr=0 WS=128
    5 15.385327465 192.168.230.250 → 10.96.0.1 TCP 76 [TCP Retransmission] 55514 → 443 [SYN] Seq=0 Win=28000 Len=0 MSS=1400 SACK_PERM=1 TSval=2513124269 TSecr=0 WS=128
Which shows the traffic via tunl0 I suppose, but I'm now a little lost as to where to go from here.
I've looked through the iptables rules and can't see anything. I wondered if maybe there was something with nftables in there, but I checked and there are only iptables modules loaded, no nft.
Any more ideas? I feel pretty lost now.
-
I've actually got it to 'work' but I neither like nor understand the solution, which irks me.
I edited the metrics-server deployment and added hostNetwork: true. The new pod starts up on the worker and all is fine. I don't actually understand what this is doing, however, or why it works. I also then see nothing from tshark.
So, I wonder why this is working, and does it indicate where I can fix the problem properly?
-
Hi @dnx,
The hostNetwork setting is a feature borrowed from container runtimes, where a container can share the host's network namespace and hence expose itself directly under the host's IP address. While a convenient feature, it also poses security concerns, since the container now has access to the node's network stack, which would otherwise not be allowed given the resource isolation a container is meant to provide.
The Kubernetes pod operates the same way when the hostNetwork attribute is set to true. The pod is exposed directly under the node's IP address, sharing the host's network namespace. Easy to implement and use, yet not the most secure. In this case, the pod no longer receives its IP address from the CNI network plugin (calico), as it is exposed directly via the node's IP address, thus eliminating a level of traffic routing and network abstraction. What seems to be an easy fix is not how things were intended to work in Kubernetes. If a pod does not operate as expected over the pod network implemented by the CNI network plugin, there may be several issues with your setup. Several aspects could play a role in why your pod does not behave as expected: the infrastructure networking overall, (in)compatibility between your infra and the CNI plugin, or just a missed configuration option specific to the mix of technologies in your setup.
Part of being a Kubernetes admin is to figure out compatibilities and incompatibilities between your infrastructure and cluster components, and to discover specifics about certain configuration options in order to overcome such issues (where such specific options are available). Unfortunately, Kubernetes does not fix misconfigured networks, infrastructure, or incompatibilities for us.
Regards,
-Chris
-
Thanks for the explanation of hostNetwork @chrispokorni . Given all the things that I've checked and listed above do you have any tips or ideas as to where to check next? I've spent a whole day so far on this and feel like I've run to the end of my abilities thus far.
-
I went back to basics and checked my cluster init. After destroying the cluster and recreating with some changes it now works fine.
The two things I changed:
- added --apiserver-advertise-address to kubeadm, set to the IP of eth1 (VMware private network)
- changed calico.yaml and --pod-network-cidr to 172.16.0.0/16, as I was using 192.168 ranges for eth0 and eth1
-
I am glad it all works now.
I was going to suggest exploring the networking section of your hypervisor's documentation, cross-referenced with the calico network plugin documentation, to find the missing link. It seems that you found it in the meantime. Great work!
Regards,
-Chris
-
@dnx said:
I went back to basics and checked my cluster init. After destroying the cluster and recreating with some changes it now works fine.
The two things I changed:
- added --apiserver-advertise-address to kubeadm, set to the IP of eth1 (VMware private network)
- changed calico.yaml and --pod-network-cidr to 172.16.0.0/16, as I was using 192.168 ranges for eth0 and eth1
Thanks, this really helped!
My conclusion is: the IP range from which the control plane and worker nodes get their IP addresses MUST be different from, and must not overlap with, the IP range used by the network plugin (CNI) for the pod network.
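For anyone landing on this thread later, here is a minimal sketch of how that overlap could be checked before running kubeadm init. It uses Python's standard ipaddress module with the node subnets from the ip addr output posted above. The helper name find_overlaps is made up for illustration. The original pod CIDR of 192.168.0.0/16 is an assumption: it is Calico's default and matches the 192.168.230.x addresses seen on tunl0 and in the tshark capture, but it is not stated outright in the thread.

```python
import ipaddress


def find_overlaps(node_cidrs, pod_cidr):
    """Return the node subnets that overlap the pod network CIDR."""
    pod_net = ipaddress.ip_network(pod_cidr)
    return [
        cidr for cidr in node_cidrs
        if ipaddress.ip_network(cidr).overlaps(pod_net)
    ]


# Node subnets taken from eth0 and eth1 in the `ip addr` output above.
node_cidrs = ["192.168.134.0/24", "192.168.10.0/24"]

# Assumed original pod CIDR (Calico's default): both node subnets fall inside it.
print(find_overlaps(node_cidrs, "192.168.0.0/16"))
# → ['192.168.134.0/24', '192.168.10.0/24']

# Pod CIDR after the fix described in the thread: no overlap reported.
print(find_overlaps(node_cidrs, "172.16.0.0/16"))
# → []
```

An empty list means the pod network is disjoint from the node networks, which is the condition described above. The same check could be extended to the service range (10.96.0.1 in the logs suggests kubeadm's default service CIDR) if that is also a concern.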