
Lab 10.1

promagnoli
promagnoli Posts: 14
edited June 2020 in LFS258 Class Forum

When I run curl -H "Host: www.example.com" http://k8smaster/ I get a timeout. I can curl the nginx welcome page using the NodePort and the ClusterIP. I tried using the master IP rather than the alias, but with no success.

When I create the ingress rule I notice that the response is different from those reported in the lab doc.

My Lab:
$ kubectl create -f ingress.rule.yaml
ingress.networking.k8s.io/ingress-test created

Lab Docs:
$ kubectl create -f ingress.rule.yaml
ingress.extensions "ingress-test" created

Here is a description of the ingress that is created:
$ kubectl describe ingresses ingress-test
Name: ingress-test
Namespace: default
Address:
Default backend: default-http-backend:80 ()
Rules:
Host Path Backends
---- ---- --------
www.example.com / secondapp:80 (192.168.131.30:80)

Annotations:
kubernetes.io/ingress.class: traefik
Events:

Since we create a NodePort service for secondapp, I would expect to see the service ClusterIP as the backend; instead it is showing the pod endpoint.

Any idea?

Comments

  • chrispokorni
    chrispokorni Posts: 2,349

    Hi @promagnoli,

    A similar issue was reported earlier in the forum. Take a look at the earlier post, as your issue may be related.

    https://forum.linuxfoundation.org/discussion/comment/24095#Comment_24095

    Regards,
    -Chris

  • promagnoli
    promagnoli Posts: 14

    Hi @chrispokorni,

It took me some time to follow up on your feedback as I had no time to work on this.

I reviewed the post you mentioned; however, the issue there is related to the traefik version, which is not my case. Still, I used the thread as a guide for troubleshooting. This led me to a warning message in the traefik pod logs: time="2020-07-17T10:30:54Z" level=warning msg="Endpoints not available for default/secondapp".

    My current situation is as follows:
    1. nginx pod is running
    2. nginx pod is running on the worker node and I can get nginx welcome page from worker node localhost
3. I cannot get the nginx page from the endpoint when issuing the command from the master (see below), but I can from the worker node
4. I cannot get the nginx page from the ClusterIP when issuing the command from either the master or the worker node
    5. traefik pods are running
    6. nginx pod is correctly selected by the service secondapp
7. the service secondapp has an endpoint
    8. the ingress is using the service as backend
    9. the cluster role is in place
    10. the role binding is in place

I don't know how to further troubleshoot that warning message. Any idea?

In the following posts I am going to share details of my env and the 10 checks mentioned above.

    Thanks
    Paolo

  • promagnoli
    promagnoli Posts: 14

    Here is my env summary:

    $ kubectl get svc
    NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 119d
secondapp NodePort 10.109.105.48 <none> 80:30382/TCP 17d

    $ kubectl get ingress ingress-test
    NAME HOSTS ADDRESS PORTS AGE
    ingress-test www.example.com 80 17d

    $ kubectl get ep
    NAME ENDPOINTS AGE
    kubernetes 10.0.0.141:6443,10.0.0.18:6443 119d
    secondapp 192.168.131.27:80 17d

    1. nginx pod is running

    $ kubectl get po
    NAME READY STATUS RESTARTS AGE
    secondapp-5cf87c9f48-f9krw 1/1 Running 1 17d

    2. nginx pod is running on the worker node and I can get nginx welcome page from worker node localhost

    $ curl -H "Host: www.example.com" http://localhost/
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

3. I cannot get the nginx page from the endpoint when issuing the command from the master (see below), but I can from the worker node

    $ curl -H "Host: www.example.com" http://192.168.131.27/ -v

    • Trying 192.168.131.27...
    • TCP_NODELAY set
    • connect to 192.168.131.27 port 80 failed: Connection timed out
    • Failed to connect to 192.168.131.27 port 80: Connection timed out
    • Closing connection 0
      curl: (7) Failed to connect to 192.168.131.27 port 80: Connection timed out

4. I cannot get the nginx page from the ClusterIP when issuing the command from either the master or the worker node

    $ curl -H "Host: www.example.com" http://10.109.105.48:30382/ -v

    • Trying 10.109.105.48...
    • TCP_NODELAY set
    • connect to 10.109.105.48 port 30382 failed: Connection timed out
    • Failed to connect to 10.109.105.48 port 30382: Connection timed out
    • Closing connection 0
      curl: (7) Failed to connect to 10.109.105.48 port 30382: Connection timed out

    5. traefik pods are running
    $ kubectl get po -n kube-system | grep traefik
    traefik-ingress-controller-6tffr 1/1 Running 1 17d
    traefik-ingress-controller-8xrg5 1/1 Running 1 17d

I can see a warning message in the traefik pod logs (the same warnings appear in both pods; reporting the log for just one pod):

    $ kubectl -n kube-system logs traefik-ingress-controller-8xrg5
    time="2020-07-17T10:30:22Z" level=info msg="Traefik version v1.7.13 built on 2019-08-08_04:46:14PM"
    time="2020-07-17T10:30:22Z" level=info msg="\nStats collection is disabled.\nHelp us improve Traefik by turning this feature on :)\nMore details on: https://docs.traefik.io/basics/#collected-data\n"
    time="2020-07-17T10:30:22Z" level=info msg="Preparing server traefik &{Address::8080 TLS: Redirect: Auth: WhitelistSourceRange:[] WhiteList: Compress:false ProxyProtocol: ForwardedHeaders:0xc00031d120} with readTimeout=0s writeTimeout=0s idleTimeout=3m0s"
    time="2020-07-17T10:30:22Z" level=info msg="Preparing server http &{Address::80 TLS: Redirect: Auth: WhitelistSourceRange:[] WhiteList: Compress:false ProxyProtocol: ForwardedHeaders:0xc00031d100} with readTimeout=0s writeTimeout=0s idleTimeout=3m0s"
    time="2020-07-17T10:30:22Z" level=info msg="Starting provider configuration.ProviderAggregator {}"
    time="2020-07-17T10:30:22Z" level=info msg="Starting server on :8080"
    time="2020-07-17T10:30:22Z" level=info msg="Starting server on :80"
    time="2020-07-17T10:30:22Z" level=info msg="Starting provider *kubernetes.Provider {\"Watch\":true,\"Filename\":\"\",\"Constraints\":[],\"Trace\":false,\"TemplateVersion\":0,\"DebugLogGeneratedTemplate\":false,\"Endpoint\":\"\",\"Token\":\"\",\"CertAuthFilePath\":\"\",\"DisablePassHostHeaders\":false,\"EnablePassTLSCert\":false,\"Namespaces\":null,\"LabelSelector\":\"\",\"IngressClass\":\"\",\"IngressEndpoint\":null}"
    time="2020-07-17T10:30:22Z" level=info msg="ingress label selector is: \"\""
    time="2020-07-17T10:30:22Z" level=info msg="Creating in-cluster Provider client"
    E0717 10:30:52.998496 1 reflector.go:205] github.com/containous/traefik/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    E0717 10:30:52.998606 1 reflector.go:205] github.com/containous/traefik/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    E0717 10:30:53.026236 1 reflector.go:205] github.com/containous/traefik/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1beta1.Ingress: Get https://10.96.0.1:443/apis/extensions/v1beta1/ingresses?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    time="2020-07-17T10:30:54Z" level=warning msg="Endpoints not available for default/secondapp"
    time="2020-07-17T10:30:54Z" level=info msg="Server configuration reloaded on :8080"
    time="2020-07-17T10:30:54Z" level=info msg="Server configuration reloaded on :80"
    time="2020-07-17T10:30:54Z" level=warning msg="Endpoints not available for default/secondapp"
    time="2020-07-17T10:30:54Z" level=warning msg="Endpoints not available for default/secondapp"
    time="2020-07-17T10:30:54Z" level=warning msg="Endpoints not available for default/secondapp"
    time="2020-07-17T10:30:56Z" level=warning msg="Endpoints not available for default/secondapp"
    time="2020-07-17T10:30:56Z" level=warning msg="Endpoints not available for default/secondapp"
    time="2020-07-17T10:30:58Z" level=warning msg="Endpoints not available for default/secondapp"

    6. nginx pod is correctly selected by the service secondapp

    $ kubectl describe po secondapp-5cf87c9f48-f9krw
    Name: secondapp-5cf87c9f48-f9krw
    Namespace: default
    Priority: 0
    Node: ip-10-0-0-18/10.0.0.18
    Start Time: Mon, 29 Jun 2020 13:55:18 +0000
Labels: app=secondapp <<<---
        pod-template-hash=5cf87c9f48
Annotations: cni.projectcalico.org/podIP: 192.168.131.27/32
             cni.projectcalico.org/podIPs: 192.168.131.27/32
Status: Running
IP: 192.168.131.27
IPs:
  IP: 192.168.131.27
Controlled By: ReplicaSet/secondapp-5cf87c9f48
Containers:
  nginx:
    Container ID: docker://d7ccf81a9c7c2a906393a6e21a3c9f2f1f74a57b358a7965cf31c8ed66c90ef5
    Image: nginx
    Image ID: docker-pullable://nginx@sha256:a93c8a0b0974c967aebe868a186e5c205f4d3bcb5423a56559f2f9599074bbcd
    Port: <none>
    Host Port:
    State: Running
    Started: Fri, 17 Jul 2020 10:31:07 +0000
    Last State: Terminated
    Reason: Completed
    Exit Code: 0
    Started: Mon, 29 Jun 2020 13:55:19 +0000
    Finished: Mon, 29 Jun 2020 16:30:47 +0000
    Ready: True
    Restart Count: 1
    Environment:
    Mounts:
    /var/run/secrets/kubernetes.io/serviceaccount from default-token-s2vxn (ro)
    Conditions:
    Type Status
    Initialized True
    Ready True
    ContainersReady True
    PodScheduled True
    Volumes:
    default-token-s2vxn:
    Type: Secret (a volume populated by a Secret)
    SecretName: default-token-s2vxn
    Optional: false
    QoS Class: BestEffort
    Node-Selectors:
    Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
    node.kubernetes.io/unreachable:NoExecute for 300s
    Events:

  • serewicz
    serewicz Posts: 1,000

    Hello Paolo,

This sounds like a networking issue between your KVM instances and the IPv4 pool. Could you verify that your pod IPv4 pool (192.168.0.0/16 if you followed the lab) does not overlap your VMs' IP range or one of your host's ranges?

The troubleshooting path I would follow, some of which you have already done, is:
- Test curl from the worker node to the pod IP. You indicated this worked.
- Test curl from the master node to the pod IP. I think you indicated this did not work. This indicates to me that the issue is your VM networking.

If/once I fixed that and the ingress still did not work, I would next:
- Test curl from the worker node to the ClusterIP. You indicated this did not work. If the ClusterIP does not work from the worker, then it is probably a mismatch in the label/selector being used. Double-check that the service and the pod have the same label - remember it is case sensitive.

If I had issues once I got the ClusterIP to work on the worker, I would next:
- Test curl from the master to the NodePort IP and port.
- Test curl from the host to the worker node using the NodePort IP and port.
- Test curl from the worker to the ingress IP using -H host.
- Test curl from the master to the ingress IP using -H host.
- Test curl from the host to the ingress IP using -H host.
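    The checks above can be sketched as a short script; the IPs and port below are placeholders taken from this thread's output, so substitute your own values:

    ```shell
    # Placeholder values from this thread - substitute your own.
    POD_IP=192.168.131.27      # pod IP (kubectl get pod -o wide)
    CLUSTER_IP=10.109.105.48   # service ClusterIP (kubectl get svc)
    WORKER_IP=10.0.0.18        # worker node private IP
    NODE_PORT=30382            # service NodePort

    curl -m 5 http://$POD_IP/                                  # run on worker, then on master
    curl -m 5 http://$CLUSTER_IP:80/                           # ClusterIP on the service port
    curl -m 5 http://$WORKER_IP:$NODE_PORT/                    # NodePort on the node IP
    curl -m 5 -H "Host: www.example.com" http://$WORKER_IP/    # ingress, with the Host header
    ```

    Running the same curl from different nodes isolates which hop (pod network, kube-proxy rules, or the ingress controller) breaks first.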

    Regards,

  • promagnoli
    promagnoli Posts: 14

    Hi @serewicz,

I am on AWS EC2 nodes, not KVM. The K8s node IPs are 10.0.0.141 and 10.0.0.18, so they do not overlap with the pod's IP (192.168.131.35).

I definitely agree with you that I am facing a network issue; however, the firewall is down on both K8s nodes and I am using a super broad security group.

    No idea.

    Regards

  • promagnoli
    promagnoli Posts: 14

Let me add that curl from the master node to the worker node IP works, so I have the feeling that the issue could be related to Calico, as the trouble starts when using the pod network. In the first calico-node pod I can see the following warning messages:

    229-2020-07-17 10:30:26.477 [INFO][55] feature_detect.go 234: Looked up iptables command backendMode="legacy" candidates=[]string{"iptables-legacy-save", "iptables-save"} command="iptables-legacy-save" ipVersion=0x4 saveOrRestore="save"
    230-2020-07-17 10:30:26.477 [INFO][55] int_dataplane.go 390: Checking if we need to clean up the VXLAN device

    231:2020-07-17 10:30:26.477 [WARNING][55] int_dataplane.go 392: Failed to query VXLAN device error=Link not found

    330-2020-07-17 10:30:26.638 [INFO][55] hostip_mgr.go 84: Interface addrs changed. update=&intdataplane.ifaceAddrsUpdate{Name:"tunl0", Addrs:set.mapSet{}}
    331-2020-07-17 10:30:26.638 [INFO][55] ipsets.go 119: Queueing IP set for creation family="inet" setID="this-host" setType="hash:ip"
    332:2020-07-17 10:30:26.639 [WARNING][55] ipip_mgr.go 112: Failed to add IPIP tunnel device error=exit status 1
    333:2020-07-17 10:30:26.639 [WARNING][55] ipip_mgr.go 89: Failed configure IPIP tunnel device, retrying... error=exit status 1

In the other calico-node pod I can see the following errors:

    203-2020-07-17 10:30:25.421 [INFO][45] client.go 352: Calico Syncer has indicated it is in sync
    204-2020-07-17 10:30:25.433 [INFO][45] resource.go 220: Target config /etc/calico/confd/config/bird6.cfg out of sync
    205:2020-07-17 10:30:25.436 [ERROR][45] resource.go 288: Error from checkcmd "bird6 -p -c /etc/calico/confd/config/.bird6.cfg072659676": "bird: /etc/calico/confd/config/.bird6.cfg072659676:2:1 Unable to open included file /etc/calico/confd/config/bird6_aggr.cfg: No such file or directory\n

    and the following warning messages:

    225-2020-07-17 10:30:25.470 [INFO][46] feature_detect.go 234: Looked up iptables command backendMode="legacy" candidates=[]string{"iptables-legacy-save", "iptables-save"} command="iptables-legacy-save" ipVersion=0x4 saveOrRestore="save"
    226-2020-07-17 10:30:25.470 [INFO][46] int_dataplane.go 390: Checking if we need to clean up the VXLAN device

    227:2020-07-17 10:30:25.470 [WARNING][46] int_dataplane.go 392: Failed to query VXLAN device error=Link not found

    313-2020-07-17 10:30:25.618 [INFO][46] async_calc_graph.go 135: AsyncCalcGraph running
    314-2020-07-17 10:30:25.631 [INFO][46] daemon.go 631: No driver process to monitor
    315:2020-07-17 10:30:25.634 [WARNING][46] ipip_mgr.go 112: Failed to add IPIP tunnel device error=exit status 1
    316:2020-07-17 10:30:25.634 [WARNING][46] ipip_mgr.go 89: Failed configure IPIP tunnel device, retrying... error=exit status 1

Could this be connected to the fact that my calico.yaml has the IPv4 pool section commented out?

    # The default IPv4 pool to create on startup if none exists. Pod IPs will be
    # chosen from this range. Changing this value after installation will have
    # no effect. This should fall within --cluster-cidr.
    # - name: CALICO_IPV4POOL_CIDR
    # value: "192.168.0.0/16"
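    For comparison, the uncommented form of that section would pin the pool explicitly (this is the lab's default range; note the file's own warning that changing the value after installation has no effect):

    ```
    - name: CALICO_IPV4POOL_CIDR
      value: "192.168.0.0/16"
    ```

    Since your pods did receive 192.168.x.x addresses anyway, Calico most likely fell back to its built-in default pool, which is the same 192.168.0.0/16 range.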

  • serewicz
    serewicz Posts: 1,000

    Hello,

    From the errors it does seem to be node network related. As you are using AWS, are you using the standard Ubuntu 18.04 LTS - Bionic?

    This is what my EC2 Security Group looks like: All traffic All All 0.0.0.0/0

    This is what my network looks like prior to deploying Kubernetes:
    ubuntu@ip-172-31-62-227:~$ ip link
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 0e:b0:86:3a:82:97 brd ff:ff:ff:ff:ff:ff

Do your instances have only one interface configured? That could also create an issue.

There is a video which can be used to set up the AWS environment. Did you follow that, or are you using a pre-existing environment?

    Regards,


  • promagnoli
    promagnoli Posts: 14

7. the service secondapp has an endpoint

    $ kubectl describe svc secondapp
    Name: secondapp
    Namespace: default
Labels: app=secondapp <<<---
Annotations: <none>
Selector: app=secondapp <<<---
Type: NodePort
IP: 10.109.105.48
Port: <unset> 80/TCP
    TargetPort: 80/TCP
    NodePort: 30382/TCP
    Endpoints: 192.168.131.27:80
    Session Affinity: None
    External Traffic Policy: Cluster
    Events:

    8. the ingress is using the service secondapp as backend

    $ kubectl describe ingress ingress-test
    Name: ingress-test
    Namespace: default
    Address:
    Default backend: default-http-backend:80 ()
    Rules:
    Host Path Backends
    ---- ---- --------
    www.example.com
    / secondapp:80 (192.168.131.27:80)
    Annotations:
    kubernetes.io/ingress.class: traefik
    Events:

    9. the cluster role is in place

    $ kubectl describe clusterroles.rbac.authorization.k8s.io traefik-ingress-controller
    Name: traefik-ingress-controller
    Labels:
    Annotations:
    PolicyRule:
    Resources Non-Resource URLs Resource Names Verbs
    --------- ----------------- -------------- -----
    endpoints [] [] [get list watch]
    secrets [] [] [get list watch]
    services [] [] [get list watch]
    ingresses.extensions [] [] [get list watch]

    10. the role binding is in place
    $ kubectl describe clusterrolebindings.rbac.authorization.k8s.io traefik-ingress-controller
    Name: traefik-ingress-controller
    Labels:
    Annotations:
    Role:
    Kind: ClusterRole
    Name: traefik-ingress-controller
    Subjects:
    Kind Name Namespace
    ---- ---- ---------
    ServiceAccount traefik-ingress-controller kube-system
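    A quick way to double-check that those grants actually reach the controller's ServiceAccount is kubectl auth can-i with impersonation (a sketch, assuming the kube-system/traefik-ingress-controller ServiceAccount from the lab manifests):

    ```shell
    # Impersonate the traefik ServiceAccount and probe each resource the
    # ClusterRole grants; each command prints "yes" or "no".
    SA=system:serviceaccount:kube-system:traefik-ingress-controller
    for res in endpoints secrets services ingresses.extensions; do
      echo -n "$res: "
      kubectl auth can-i list "$res" --as="$SA"
    done
    ```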

  • serewicz
    serewicz Posts: 1,000

    Please review my previous post as it seems to be an issue with networking of the instance not Kubernetes.

  • chrispokorni
    chrispokorni Posts: 2,349

    Hi @promagnoli,

    The issue you are seeing is related to node networking, which has to be fixed on your AWS EC2 infrastructure. Kubernetes is sensitive to misconfigured node networking, but it has no control over that configuration. AWS VPCs may have default rules to block some traffic. The SG rule suggested by @serewicz, "All traffic All All 0.0.0.0/0" should fix your issue. SG rules that allow only specific traffic may not be enough for all the protocols that are used by Kubernetes and all plugins it uses.

    Regards,
    -Chris

  • promagnoli
    promagnoli Posts: 14

@serewicz my posts were queued, as I had to split my original large post into several, and it seems this platform doesn't allow multiple posts in quick succession.

By the way, changing the SG inbound rule to "All traffic All All 0.0.0.0/0" fixed the issue. Thanks to both @serewicz and @chrispokorni for your help and patience. It is really appreciated.

  • serewicz
    serewicz Posts: 1,000

    Glad the issue got fixed! Cheers

• Hi @serewicz, I am facing exactly the same problem described in this discussion. I am using EC2 instances and have confirmed that the SG rules allow all traffic from 0.0.0.0/0, so hopefully networking is not the issue.

    The details of the setup follow:

    On the master node

    ubuntu@ip-172-31-32-211:~$ kubectl describe ingress ingress-test
    Name:             ingress-test
    Namespace:        default
    Address:
    Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
    Rules:
      Host          Path  Backends
      ----          ----  --------
      www.gani.com
                    /   secondapp:80 (192.168.89.15:80)
    Annotations:    kubernetes.io/ingress.class: traefik
    Events:         <none>
    ubuntu@ip-172-31-32-211:~$ kubectl describe svc secondapp
    Name:                     secondapp
    Namespace:                default
    Labels:                   app=secondapp
    Annotations:              <none>
    Selector:                 app=secondapp
    Type:                     NodePort
    IP:                       10.105.176.35
    Port:                     <unset>  80/TCP
    TargetPort:               80/TCP
    NodePort:                 <unset>  31595/TCP
    Endpoints:                192.168.89.15:80
    Session Affinity:         None
    External Traffic Policy:  Cluster
    Events:                   <none>
    ubuntu@ip-172-31-32-211:~$ kubectl get ep
    NAME         ENDPOINTS            AGE
    kubernetes   172.31.32.211:6443   15d
    nginx        <none>               9d
    secondapp    192.168.89.15:80     132m
    
    ubuntu@ip-172-31-32-211:~$ kubectl logs traefik-ingress-controller-xv5pl -n kube-system
    time="2020-08-28T16:45:13Z" level=info msg="Traefik version v1.7.13 built on 2019-08-08_04:46:14PM"
    time="2020-08-28T16:45:13Z" level=info msg="\nStats collection is disabled.\nHelp us improve Traefik by turning this feature on :)\nMore details on: https://docs.traefik.io/basics/#collected-data\n"
    time="2020-08-28T16:45:13Z" level=info msg="Preparing server traefik &{Address::8080 TLS:<nil> Redirect:<nil> Auth:<nil> WhitelistSourceRange:[] WhiteList:<nil> Compress:false ProxyProtocol:<nil> ForwardedHeaders:0xc00089f000} with readTimeout=0s writeTimeout=0s idleTimeout=3m0s"
    time="2020-08-28T16:45:13Z" level=info msg="Preparing server http &{Address::80 TLS:<nil> Redirect:<nil> Auth:<nil> WhitelistSourceRange:[] WhiteList:<nil> Compress:false ProxyProtocol:<nil> ForwardedHeaders:0xc00089efe0} with readTimeout=0s writeTimeout=0s idleTimeout=3m0s"
    time="2020-08-28T16:45:13Z" level=info msg="Starting provider configuration.ProviderAggregator {}"
    time="2020-08-28T16:45:13Z" level=info msg="Starting server on :8080"
    time="2020-08-28T16:45:13Z" level=info msg="Starting server on :80"
    time="2020-08-28T16:45:13Z" level=info msg="Starting provider *kubernetes.Provider {\"Watch\":true,\"Filename\":\"\",\"Constraints\":[],\"Trace\":false,\"TemplateVersion\":0,\"DebugLogGeneratedTemplate\":false,\"Endpoint\":\"\",\"Token\":\"\",\"CertAuthFilePath\":\"\",\"DisablePassHostHeaders\":false,\"EnablePassTLSCert\":false,\"Namespaces\":null,\"LabelSelector\":\"\",\"IngressClass\":\"\",\"IngressEndpoint\":null}"
    time="2020-08-28T16:45:13Z" level=info msg="ingress label selector is: \"\""
    time="2020-08-28T16:45:13Z" level=info msg="Creating in-cluster Provider client"
    time="2020-08-28T16:45:13Z" level=info msg="Server configuration reloaded on :80"
    time="2020-08-28T16:45:13Z" level=info msg="Server configuration reloaded on :8080"
    time="2020-08-28T16:55:15Z" level=warning msg="A new release has been found: 2.2.8. Please consider updating."
    time="2020-08-28T17:01:09Z" level=info msg="Server configuration reloaded on :80"
    time="2020-08-28T17:01:09Z" level=info msg="Server configuration reloaded on :8080"
    time="2020-08-28T17:25:08Z" level=info msg="Server configuration reloaded on :80"
    time="2020-08-28T17:25:08Z" level=info msg="Server configuration reloaded on :8080"
    

    It works when I use the ClusterIP

    ```
    ubuntu@ip-172-31-32-211:~$ curl -H "Host: www.gani.com" http://10.105.176.35:80
    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
    <style>
        body {
            width: 35em;
            margin: 0 auto;
            font-family: Tahoma, Verdana, Arial, sans-serif;
        }
    </style>
    </head>
    <body>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
    working. Further configuration is required.</p>

    <p>For online documentation and support please refer to
    <a href="http://nginx.org/">nginx.org</a>.<br/>
    Commercial support is available at
    <a href="http://nginx.com/">nginx.com</a>.</p>

    <p><em>Thank you for using nginx.</em></p>
    </body>
    </html>
    ubuntu@ip-172-31-32-211:~$ curl -H "Host: www.gani.com" http://k8smaster
    curl: (7) Failed to connect to k8smaster port 80: Connection refused
    ubuntu@ip-172-31-32-211:~$ curl -H "Host: www.gani.com" http://172.31.32.211
    curl: (7) Failed to connect to 172.31.32.211 port 80: Connection refused
    ubuntu@ip-172-31-32-211:~$


    ubuntu@ip-172-31-38-235:~$ curl -H "Host: www.gani.com" http://172.31.32.211/
    curl: (7) Failed to connect to 172.31.32.211 port 80: Connection refused
    ubuntu@ip-172-31-38-235:~$ curl -H "Host: www.gani.com" http://172.31.38.235/
    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
    <style>
        body {
            width: 35em;
            margin: 0 auto;
            font-family: Tahoma, Verdana, Arial, sans-serif;
        }
    </style>
    </head>
    <body>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
    working. Further configuration is required.</p>

    <p>For online documentation and support please refer to
    <a href="http://nginx.org/">nginx.org</a>.<br/>
    Commercial support is available at
    <a href="http://nginx.com/">nginx.com</a>.</p>

    <p><em>Thank you for using nginx.</em></p>
    </body>
    </html>
    ubuntu@ip-172-31-38-235:~$
    ```

    Please help.

• As a follow-up to the earlier post, I see that curl gets a 200 OK when I use the worker node's public IP, but not the master node's.

    ubuntu@ip-172-31-32-211:~$ curl  -H "Host: www.giri.com" http://<master_node_public_IP> 
    curl: (7) Failed to connect to 13.235.214.225 port 80: Connection refused
    ubuntu@ip-172-31-32-211:~$ curl  -H "Host: www.giri.com" http://<worker_node_public_IP>
    <!DOCTYPE html>
    <html>
    <head>
    <title>Third Page</title>
    <style>
        body {
            width: 35em;
            margin: 0 auto;
            font-family: Tahoma, Verdana, Arial, sans-serif;
        }
    </style>
    </head>
    <body>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
    working. Further configuration is required.</p>
    
    <p>For online documentation and support please refer to
    <a href="http://nginx.org/">nginx.org</a>.<br/>
    Commercial support is available at
    <a href="http://nginx.com/">nginx.com</a>.</p>
    
    <p><em>Thank you for using nginx.</em></p>
    </body>
    </html>
    ubuntu@ip-172-31-32-211:~$
    
  • chrispokorni
    chrispokorni Posts: 2,349

    Hi @ganeshahv,

    The Kubernetes cluster behavior you are describing is consistent with networking issues between your cluster nodes, which is managed at the cloud infrastructure level. Unfortunately, Kubernetes has no control over its infrastructure.

    The following are normal/expected Kubernetes cluster behaviors:
1. The ingress should be accessible through both nodes' public and private IP addresses.
2. NodePort type Services should be accessible through both nodes' public and private IP addresses, regardless of the Pods' location in the cluster.
3. ClusterIP type Services should be accessible from both nodes, regardless of the Pods' location in the cluster.

    On AWS I would recommend starting with a new VPC with a new all-open/allow-all SG, then the two EC2 instances provisioned in that VPC.
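    As a sketch, an all-open SG can be created with the aws CLI roughly like this (the VPC and group IDs below are placeholders):

    ```shell
    # Placeholder IDs - substitute your own VPC ID, and use the group ID
    # returned by create-security-group in the second command.
    aws ec2 create-security-group \
      --group-name k8s-lab-allow-all \
      --description "LFS258 lab: allow all traffic" \
      --vpc-id vpc-0123456789abcdef0

    # Protocol -1 means "all protocols"; no port range is needed with it.
    aws ec2 authorize-security-group-ingress \
      --group-id sg-0123456789abcdef0 \
      --protocol -1 \
      --cidr 0.0.0.0/0
    ```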

    Regards,
    -Chris

  • Hi @chrispokorni,

    Thank you for the response.

I think I have nailed down the issue to this, but I am unable to continue.

    Master

    ubuntu@ip-172-31-32-211:~$ curl http://127.0.0.1
    curl: (7) Failed to connect to 127.0.0.1 port 80: Connection refused
    

    Worker

    ubuntu@ip-172-31-38-235:~$ curl http://127.0.0.1
    404 page not found
    
• Hi @chrispokorni, @serewicz, I was able to resolve the issue.

    The master node was tainted and hence I could not launch the ingress controller pod on it.

    ubuntu@ip-172-31-32-211:~$     kubectl describe nodes | grep -i taint
    Taints:             node-role.kubernetes.io/master:NoSchedule
    

I resolved it by removing the taint, and things started working fine.
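    For anyone hitting the same thing, the taint can be removed with the trailing-dash form of kubectl taint (alternatively, the traefik DaemonSet could be given a matching toleration so the master stays tainted):

    ```shell
    # The trailing "-" removes the taint; --all applies to every node that has it.
    kubectl taint nodes --all node-role.kubernetes.io/master:NoSchedule-
    ```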

    ubuntu@ip-172-31-32-211:~$ curl  -H "Host: www.shourya.com" http://k8smaster
    <!DOCTYPE html>
<Output omitted>
    

    Thanks to @dzhigalin for his inputs.
