Welcome to the Linux Foundation Forum!

Lab 12.7 Problem timeouts on dashboard

Posts: 5
edited February 2020 in LFS258 Class Forum

Hi, I'm having problems with Lab 12.7. I also had issues in other 12.x labs, but I was able to fix those; this one I cannot. I'm hitting timeouts, which I believe is a network problem, but what is the best way to fix it?

docker@k8s1:~/k8s$ kubectl -n kubernetes-dashboard logs kubernetes-dashboard-b65488c4-2cp6s
2020/02/05 05:23:28 Using namespace: kubernetes-dashboard
2020/02/05 05:23:28 Using in-cluster config to connect to apiserver
2020/02/05 05:23:28 Starting overwatch
2020/02/05 05:23:28 Using secret token for csrf signing
2020/02/05 05:23:28 Initializing csrf token from kubernetes-dashboard-csrf secret
panic: Get https://10.96.0.1:443/api/v1/namespaces/kubernetes-dashboard/secrets/kubernetes-dashboard-csrf: dial tcp 10.96.0.1:443: i/o timeout

goroutine 1 [running]:
github.com/kubernetes/dashboard/src/app/backend/client/csrf.(*csrfTokenManager).init(0xc00050f740)
/home/travis/build/kubernetes/dashboard/src/app/backend/client/csrf/manager.go:40 +0x3b4
github.com/kubernetes/dashboard/src/app/backend/client/csrf.NewCsrfTokenManager(...)
/home/travis/build/kubernetes/dashboard/src/app/backend/client/csrf/manager.go:65
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).initCSRFKey(0xc000381b80)
/home/travis/build/kubernetes/dashboard/src/app/backend/client/manager.go:487 +0xc7
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).init(0xc000381b80)
/home/travis/build/kubernetes/dashboard/src/app/backend/client/manager.go:455 +0x47
github.com/kubernetes/dashboard/src/app/backend/client.NewClientManager(...)
/home/travis/build/kubernetes/dashboard/src/app/backend/client/manager.go:536
main.main()
/home/travis/build/kubernetes/dashboard/src/app/backend/dashboard.go:105 +0x212

Thanks


Comments

  • Hi @CharcoGreen,

    From your output, it seems you are experiencing a timeout on port 443. Is it in use by another application, or is it blocked by a firewall of your OS or a firewall at the infrastructure level?

    The first step would be to determine why your traffic is blocked, and after that, come up with an action plan to fix the issue.
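    A few diagnostic commands can help narrow down where traffic to the service IP is being dropped (a sketch, assuming a kubeadm-style cluster; 10.96.0.1 is taken from the panic message above, and the control-plane address is a placeholder):

    ```shell
    # From the node running the failing pod: can the node reach the
    # in-cluster API service IP at all?
    nc -vz -w 3 10.96.0.1 443

    # Has kube-proxy programmed NAT rules for the kubernetes service?
    sudo iptables-save | grep 10.96.0.1

    # Is the API server port reachable between nodes?
    nc -vz -w 3 <control-plane-ip> 6443
    ```

    If the first command times out only on worker nodes, that points at inter-node traffic being blocked rather than at the pod itself.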

    Regards,
    -Chris

  • Thanks for your help!
    I recreated my cluster and updated my firewall rules.

  • Posts: 32
    edited April 2020

    I have the same issue. I'm running my nodes on VMware Fusion. The metrics-server pod logs show:

    Error: Get https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.96.0.1:443: i/o timeout

    If I use nodeSelector to force it onto the master, it works fine.

    But, trying to run it on a worker I always get that error.
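    For reference, the nodeSelector workaround looks roughly like this in the Deployment's pod template (a sketch; the exact master label and toleration depend on your cluster version and setup):

    ```yaml
    spec:
      template:
        spec:
          nodeSelector:
            node-role.kubernetes.io/master: ""
          tolerations:
          - key: node-role.kubernetes.io/master
            effect: NoSchedule
    ```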

    I have the extra args:

    - args:
      - --cert-dir=/tmp
      - --secure-port=4443
      - --kubelet-insecure-tls
      - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
      image: k8s.gcr.io/metrics-server-amd64:v0.3.6

    From the worker node I can curl https://10.96.0.1:443/ just fine, and also from within a pod on the same node (I used the kube-proxy pod container to test from).

    # curl -k https://10.96.0.1
    {
      "kind": "Status",
      "apiVersion": "v1",
      "metadata": {},
      "status": "Failure",
      "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
      "reason": "Forbidden",
      "details": {},
      "code": 403
    }

    No promiscuous mode on any of my interfaces:

    netstat -i   # no P (promiscuous) flag visible
    Kernel Interface table
    Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
    calib8f1 1440 1109 0 0 0 1074 0 0 0 BMRU
    docker0 1500 0 0 0 0 0 0 0 0 BMU
    eth0 1500 199755 0 0 0 86684 0 0 0 BMRU
    eth1 1500 263085 0 0 0 219869 0 0 0 BMRU
    lo 65536 208914 0 0 0 208914 0 0 0 LRU
    tunl0 1440 12756 0 0 0 12719 0 0 0 ORU

    Here are my interfaces on the worker:

    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
        link/ether 00:0c:29:9d:50:59 brd ff:ff:ff:ff:ff:ff
        inet 192.168.134.131/24 brd 192.168.134.255 scope global dynamic eth0
           valid_lft 1624sec preferred_lft 1624sec
        inet6 fe80::20c:29ff:fe9d:5059/64 scope link
           valid_lft forever preferred_lft forever
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
        link/ether 00:0c:29:9d:50:63 brd ff:ff:ff:ff:ff:ff
        inet 192.168.10.3/24 brd 192.168.10.255 scope global eth1
           valid_lft forever preferred_lft forever
        inet6 fe80::20c:29ff:fe9d:5063/64 scope link
           valid_lft forever preferred_lft forever
    4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
        link/ether 02:42:cc:78:79:00 brd ff:ff:ff:ff:ff:ff
        inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
           valid_lft forever preferred_lft forever
    7: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1000
        link/ipip 0.0.0.0 brd 0.0.0.0
        inet 192.168.230.192/32 brd 192.168.230.192 scope global tunl0
           valid_lft forever preferred_lft forever
    74: cali810f9f98cd8@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
        link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0

    As I have multiple interfaces, I've edited the Calico DaemonSet to ensure the VMware private network interface is used (by the way, does anyone know how to set this per node? Say, if I wanted eth0 on one node but eth1 on another?):

    containers:
    - env:
      - name: IP_AUTODETECTION_METHOD
        value: interface=eth1
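    For what it's worth, Calico's autodetection method is not limited to a single interface name: per Calico's documentation it also accepts an interface regex and a can-reach probe, which can cover nodes whose interface names differ (verify the exact syntax against your Calico version; the names and address below are examples):

    ```yaml
    - name: IP_AUTODETECTION_METHOD
      # regex: matches eth1 on some nodes, ens33 on others (example names)
      value: interface=eth1|ens33
      # alternatively, pick whichever interface can reach a given address:
      # value: can-reach=192.168.10.1
    ```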

    Running tshark on the worker when the metrics-server pod is starting I see:

    # tshark -i any 'port 443'
    Running as user "root" and group "root". This could be dangerous.
    Capturing on 'any'
    1 0.000000000 192.168.230.250 10.96.0.1 TCP 76 55514 443 [SYN] Seq=0 Win=28000 Len=0 MSS=1400 SACK_PERM=1 TSval=2513108884 TSecr=0 WS=128
    2 1.018008286 192.168.230.250 10.96.0.1 TCP 76 [TCP Retransmission] 55514 443 [SYN] Seq=0 Win=28000 Len=0 MSS=1400 SACK_PERM=1 TSval=2513109902 TSecr=0 WS=128
    3 3.033443940 192.168.230.250 10.96.0.1 TCP 76 [TCP Retransmission] 55514 443 [SYN] Seq=0 Win=28000 Len=0 MSS=1400 SACK_PERM=1 TSval=2513111918 TSecr=0 WS=128
    4 7.192949413 192.168.230.250 10.96.0.1 TCP 76 [TCP Retransmission] 55514 443 [SYN] Seq=0 Win=28000 Len=0 MSS=1400 SACK_PERM=1 TSval=2513116077 TSecr=0 WS=128
    5 15.385327465 192.168.230.250 10.96.0.1 TCP 76 [TCP Retransmission] 55514 443 [SYN] Seq=0 Win=28000 Len=0 MSS=1400 SACK_PERM=1 TSval=2513124269 TSecr=0 WS=128

    Which shows the traffic via tunl0 I suppose, but I'm now a little lost as to where to go from here.

    I've looked through iptables and can't see anything. I wondered whether nftables might be involved, but I checked and only iptables modules are loaded; no nft.

    Any more ideas? I feel pretty lost now.

  • Posts: 32
    edited April 2020

    I've actually got it to 'work' but I neither like nor understand the solution, which irks me.

    I edited the metrics-server deployment and added hostNetwork: true. New pod starts up on the worker and all is fine. I don't actually understand what this is doing however, and why it works. I also then see nothing from tshark.

    So, I wonder why is this working, and does it indicate where I can fix the problem properly?

  • Posts: 2,438

    Hi @dnx,

    hostNetwork is a feature borrowed from container runtimes, where a container can share the host's network namespace and hence expose itself directly under the host's IP address. While convenient, it also poses security concerns, since the container now has access to the node's network stack, which would otherwise not be allowed given the resource isolation containers are meant to provide.

    A Kubernetes pod operates the same way when the hostNetwork attribute is set to true: the pod is exposed directly under the node's IP address, sharing the host's network namespace. This is easy to implement and use, yet not the most secure. In this case, the pod no longer receives its IP address from the CNI network plugin (Calico), since it is exposed directly via the node's IP address, eliminating a level of traffic routing and network abstraction. While it may seem an easy fix, it is not how things are intended to work in Kubernetes. If a pod does not operate as expected over the pod network implemented by the CNI plugin, there may be several issues with your setup: the infrastructure networking overall, (in)compatibility between your infrastructure and the CNI plugin, or simply a missed configuration option specific to the mix of technologies in your setup.
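    To illustrate, a minimal pod spec with the attribute set looks like this (a sketch; the pod name and image are arbitrary). The pod's IP shown by `kubectl get pod -o wide` will simply be the node's IP:

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: hostnet-demo     # hypothetical name
    spec:
      hostNetwork: true      # share the node's network namespace
      containers:
      - name: web
        image: nginx
    ```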

    Part of being a Kubernetes admin is figuring out compatibilities and incompatibilities between your infrastructure and cluster components, and discovering the specific configuration options needed to overcome such issues (where such options exist). Unfortunately, Kubernetes does not fix misconfigured networks, infrastructure, or incompatibilities for us.

    Regards,
    -Chris

  • Posts: 32

    Thanks for the explanation of hostNetwork @chrispokorni . Given all the things that I've checked and listed above do you have any tips or ideas as to where to check next? I've spent a whole day so far on this and feel like I've run to the end of my abilities thus far.

  • Posts: 32

    I went back to basics and checked my cluster init. After destroying the cluster and recreating with some changes it now works fine.

    The two things I changed:

    • added --apiserver-advertise-address to kubeadm, set to the IP of eth1(vmware private network)
    • changed calico.yaml and --pod-network-cidr to 172.16.0.0/16 as I was using 192.168 ranges for eth0 and eth1
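    Put together, the fix amounts to advertising the API server on the private interface and picking a pod CIDR that does not overlap the node networks. As a sketch (the control-plane eth1 address is a placeholder; verify the flags against your kubeadm version):

    ```shell
    # Control-plane node: advertise on eth1 and use a non-overlapping
    # pod network (nodes are on 192.168.x.x, so pods get 172.16.0.0/16).
    kubeadm init \
      --apiserver-advertise-address=<eth1-ip-of-control-plane> \
      --pod-network-cidr=172.16.0.0/16

    # And in calico.yaml, set the matching pool before applying it:
    #   - name: CALICO_IPV4POOL_CIDR
    #     value: "172.16.0.0/16"
    ```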
  • Posts: 2,438

    I am glad it all works now.
    I was going to suggest exploring the networking section of your hypervisor's documentation, cross-referenced with the calico network plugin documentation to find the missing link. It seems that you found it in the meantime :smile: Great work!

    Regards,
    -Chris

  • @dnx said:
    I went back to basics and checked my cluster init. After destroying the cluster and recreating with some changes it now works fine.

    The two things I changed:

    • added --apiserver-advertise-address to kubeadm, set to the IP of eth1(vmware private network)
    • changed calico.yaml and --pod-network-cidr to 172.16.0.0/16 as I was using 192.168 ranges for eth0 and eth1

    Thanks this really helped!
    My conclusion is: the IP range from which the controlplane and worker nodes get their IP addresses MUST be different from the IP range which is used for the network plugin (CNI) for the pod network.
