Lab 7.2 <error: endpoints "default-http-backend" not found>

chrischia · January 2021

Hi,
I was working through exercise 7.2 and it failed at step 8:
curl -H "Host: www.example.com" http://10.128.0.7/

I did my check by running:
kubectl describe ing ingress-test -n default

Warning: extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
Name:             ingress-test
Namespace:        default
Address:          
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
  Host             Path  Backends
  ----             ----  --------
  www.example.com  
                   /   secondapp:80 (192.168.235.153:80)
Annotations:       kubernetes.io/ingress.class: traefik
Events:            <none>

I also checked the services in all namespaces:
kubectl get svc --all-namespaces

NAMESPACE     NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
default       kubernetes                ClusterIP      10.96.0.1        <none>        443/TCP                  19d
default       nginx                     ClusterIP      10.108.189.140   <none>        443/TCP                  17d
default       registry                  ClusterIP      10.96.74.244     <none>        5000/TCP                 17d
default       secondapp                 LoadBalancer   10.100.192.227   <pending>     80:32000/TCP             96m
kube-system   kube-dns                  ClusterIP      10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   19d
kube-system   traefik-ingress-service   ClusterIP      10.110.163.198   <none>        80/TCP,8080/TCP          35m
multitenant   shopping                  NodePort       10.110.156.71    <none>        80:30381/TCP             4d4h

How do I create the default-http-backend service? Is it just a simple service definition on port 80?
FYI, I am using GCP.

chrischia · January 2021

OK, I solved the problem by installing a fresh Traefik and configuring the ingress controller again, following this guide:
https://doc.traefik.io/traefik/v1.7/user-guide/kubernetes/

The endpoints "default-http-backend" not found error is still there, but the command
curl -H "Host: www.example.com" http://10.2.0.6/

now returns "404 page not found".

serewicz · January 2021

Hello,

Glad you were able to get it working. The 404 error indicates that the ingress controller is working but cannot associate the traffic with an existing service. The most common cause I find is a typo in the service or the pod labels. If you revisit the lab, I would work outward: first use the pod IP to view the default web page, then the service IP, and finally the ingress controller.

Regards,
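
The inside-out order suggested above (pod IP, then service IP, then ingress controller) can be sketched as a small script. This is only an illustration, not part of the lab: the function names are hypothetical, and the addresses in the usage note are the ones that appear elsewhere in this thread.

```python
import urllib.error
import urllib.request


def layered_checks(pod_ip, service_ip, ingress_ip, host):
    """Return (label, url, headers) tuples ordered from innermost to outermost layer."""
    return [
        ("pod", f"http://{pod_ip}/", {}),
        ("service", f"http://{service_ip}/", {}),
        # Only the ingress hop needs the Host header, because the rule matches on it.
        ("ingress", f"http://{ingress_ip}/", {"Host": host}),
    ]


def probe(url, headers, timeout=5.0):
    """Return the HTTP status code, or None if no HTTP response arrives."""
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. a 404 answered by the ingress controller
    except OSError:
        return None  # timeout, refused connection, unreachable network


def first_failure(checks, probe_fn=probe):
    """Return the label of the first layer that does not answer 200, else None."""
    for label, url, headers in checks:
        if probe_fn(url, headers) != 200:
            return label
    return None
```

For example, first_failure(layered_checks("192.168.56.9", "10.104.151.51", "10.2.0.8", "www.example.com")) names the first layer that does not return 200, which narrows down where to look next.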

chrischia · January 2021

NAME            READY   STATUS    RESTARTS   AGE     IP             NODE         NOMINATED NODE   READINESS GATES
pod/secondapp   2/2     Running   3          3h48m   192.168.56.9   instance-2   <none>           <none>

NAME                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE   SELECTOR
service/kubernetes   ClusterIP      10.96.0.1       <none>        443/TCP        47h   <none>
service/secondapp    LoadBalancer   10.104.151.51   <pending>     80:32000/TCP   94m   example=second

Yes, the 404 does come from the ingress controller, and I am happy that it works on GCE.
The strange part is that after the ingress was activated, the individual pod and service IPs are no longer accessible.

For example, both
curl 10.104.151.51 and curl 192.168.56.9
fail to return the nginx default HTML response.

This might be the reason it returns 404. However, I do not know why this happens after activating the Traefik ingress. Any advice?

chrischia · January 2021

Also, the terminal hangs when I run curl 192.168.56.9 and curl 10.104.151.51.

I dug deeper to find out why, and the container gives a strange response:

kubectl exec -it secondapp -c busy -- sh
/ $ nslookup secondapp
;; connection timed out; no servers could be reached

chrischia · January 2021

Any clue? I have been stuck for 3 days already and can't proceed with the lab exercise.

chrispokorni · January 2021

Hi @chrischia,

Can you try running the same two curl commands from the instance-2 node?

Regards,
-Chris

chrischia · January 2021

@chrispokorni said:
Hi @chrischia,

Can you try running the same two curl commands from the instance-2 node?

Regards,
-Chris

Hi Chris, they both work on instance-2 (worker node). Both return the HTML content.

Also, from my local machine:
curl -H "Host: www.example.com" http://[instance2-ip] returns the HTML content too,
but
curl -H "Host: www.example.com" http://[instance1-ip] fails as usual.

chrispokorni · January 2021

Hi @chrischia,

Thank you for checking. This behavior indicates that the issue is not with the Kubernetes cluster or any of its components (Pods, Services, Ingress), but with the way networking was configured on the underlying infrastructure.

Where did you provision your nodes? In the cloud? Local VMs (which hypervisor)?
For your nodes, did you allow traffic to all ports, all protocols, from all sources - either through a custom VPC and firewall rule or through local hypervisor config options?

Regards,
-Chris

chrischia · January 2021

Thanks @chrispokorni for your prompt reply.
I am using GCE and have been following the tutorial and guide suggested in LFD259.
I was able to curl both the secondapp pod and service on instance-1.
But the problem arose after I set up Traefik 1.7.13 on instance-1.

I followed the traefik setup using the link below:
https://doc.traefik.io/traefik/v1.7/

I am wondering if I should use AWS instead...
And for the sake of completing the course, can I continue Lab 7.2 step 8 using instance-2 (for the steps that use curl commands)?

serewicz · January 2021

Hello,

Could you look at the output of kubectl get pods --all-namespaces and ensure that all pods on both systems are running? You should see an ingress controller pod running on both nodes.

Also please check that both nodes are running properly and have enough resources. If you're using 2cpu/8G nodes you should be fine with what the exercises ask you to run.

If the ingress controller is working on one node but not the other, it indicates the issue is not with the ingress controller but with some other configuration. If the issue were with Kubernetes, an improperly set rule, or the ingress controller itself, it would not work anywhere.

Regards,

chrischia · January 2021

Hi,
yes, I now see that there are issues with the calico-node pods.

NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
default       secondapp                                  2/2     Running   21         21h
kube-system   calico-kube-controllers-744cfdf676-mm5wq   1/1     Running   0          2d17h
kube-system   calico-node-9b6zs                          0/1     Running   0          2d17h
kube-system   calico-node-lpfg2                          0/1     Running   0          2d17h
kube-system   coredns-f9fd979d6-2frqp                    1/1     Running   0          2d17h
kube-system   coredns-f9fd979d6-vm5rt                    1/1     Running   0          2d17h
kube-system   etcd-instance-1                            1/1     Running   0          2d17h
kube-system   kube-apiserver-instance-1                  1/1     Running   0          2d17h
kube-system   kube-controller-manager-instance-1         1/1     Running   0          2d17h
kube-system   kube-proxy-6cqjp                           1/1     Running   0          2d17h
kube-system   kube-proxy-7kv7v                           1/1     Running   0          2d17h
kube-system   kube-scheduler-instance-1                  1/1     Running   0          2d17h
kube-system   traefik-ingress-controller-w6sdr           1/1     Running   0          18h
multitenant   mainapp-64f7bb4cc6-lghj7                   1/1     Running   0          18h
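
In the listing above every pod shows STATUS Running, so the problem is only visible in the READY column (0/1 for both calico-node pods). A minimal sketch, assuming the text layout that kubectl get pods --all-namespaces prints in this thread, that flags pods whose ready count is below the total; the function name and the shortened sample are illustrative only.

```python
SAMPLE_PODS = """\
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
default       secondapp                                  2/2     Running   21         21h
kube-system   calico-node-9b6zs                          0/1     Running   0          2d17h
kube-system   calico-node-lpfg2                          0/1     Running   0          2d17h
kube-system   traefik-ingress-controller-w6sdr           1/1     Running   0          18h
"""


def not_ready_pods(kubectl_output):
    """Return (namespace, name, ready) for pods whose READY column is not n/n."""
    lines = [line for line in kubectl_output.splitlines() if line.strip()]
    header = lines[0].split()
    # Locate columns by header name instead of hard-coding positions.
    ns_col = header.index("NAMESPACE")
    name_col = header.index("NAME")
    ready_col = header.index("READY")
    result = []
    for line in lines[1:]:
        fields = line.split()
        ready, total = fields[ready_col].split("/")
        if ready != total:
            result.append((fields[ns_col], fields[name_col], fields[ready_col]))
    return result


if __name__ == "__main__":
    for namespace, name, ready in not_ready_pods(SAMPLE_PODS):
        print(f"{namespace}/{name} is {ready}")
```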

When I run describe on the individual node, I see one peculiar event:

Events:
  Type     Reason     Age                   From                 Message
  ----     ------     ----                  ----                 -------
  Warning  Unhealthy  31s (x6713 over 18h)  kubelet, instance-1  (combined from similar events): Readiness probe failed: 2021-01-22 02:28:18.334 [INFO][31504] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.2.0.9
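
BGP peers talk over TCP port 179, so one quick thing to rule out is whether that port is reachable between the nodes at all (for example, whether a firewall rule blocks it). The sketch below is a generic TCP reachability check, not a Calico tool; the function name is hypothetical, and a successful connection only shows the port is reachable, not that a BGP session is established.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

Run on instance-1, tcp_port_open("10.2.0.9", 179) would test the peer named in the readiness probe message above.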

chrischia · January 2021

OK, I have resolved the calico-node issue with the help of the following:
https://stackoverflow.com/questions/54465963/calico-node-is-not-ready-bird-is-not-ready-bgp-not-established

Now the direct curl commands to the pod and service IPs work!

chrischia · January 2021

Almost all issues are fixed, except for one weird behaviour I have noticed.

On [instance-1] (master node), ip a gives the following output for ens4 (the only ens interface):

2: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP group default qlen 1000
    link/ether 42:01:0a:02:00:08 brd ff:ff:ff:ff:ff:ff
    inet 10.2.0.8/32 scope global dynamic ens4
       valid_lft 1930sec preferred_lft 1930sec
    inet6 fe80::4001:aff:fe02:8/64 scope link 
       valid_lft forever preferred_lft forever
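
When comparing addresses across nodes, it can help to pull the IPv4 address out of the ip a text mechanically instead of reading it by eye. A rough sketch, assuming the output layout shown above; the function name is hypothetical and the parser ignores inet6 lines.

```python
import re

SAMPLE_IP_A = """\
2: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP group default qlen 1000
    link/ether 42:01:0a:02:00:08 brd ff:ff:ff:ff:ff:ff
    inet 10.2.0.8/32 scope global dynamic ens4
       valid_lft 1930sec preferred_lft 1930sec
    inet6 fe80::4001:aff:fe02:8/64 scope link
       valid_lft forever preferred_lft forever
"""


def ipv4_addresses(ip_a_output):
    """Map interface name to its list of IPv4 addresses (with prefix length)."""
    result = {}
    current = None
    for line in ip_a_output.splitlines():
        # Interface header lines look like "2: ens4: <...>".
        header = re.match(r"^\d+:\s+([^:@\s]+)", line)
        if header:
            current = header.group(1)
            result[current] = []
            continue
        # "inet" followed by whitespace matches IPv4 only; "inet6" does not match.
        inet = re.match(r"^\s+inet\s+(\S+)", line)
        if inet and current is not None:
            result[current].append(inet.group(1))
    return result


if __name__ == "__main__":
    print(ipv4_addresses(SAMPLE_IP_A))
```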

The ens4 output clearly shows that I need to use 10.2.0.8 for step 8 of Lab 7.2:
curl -H "Host: www.example.com" http://10.2.0.8/
This command fails with 404.

From the calico-node issue I ran into this week, we saw that the BGP connection failed at 10.2.0.9.
So I tried
curl -H "Host: www.example.com" http://10.2.0.9/
This command works perfectly and returns the HTML response.

serewicz · January 2021

Hello,

Correct. When the network configuration is not proper, as happens if IP ranges overlap, the traffic may be sent across the tunnel interface to the other node, which has the 10.2.0.9 IP address. It would be interesting to see whether a curl from the worker node to .8 works instead.

Regards,
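
To check for overlapping IP ranges of the kind mentioned above, the ranges can be compared with Python's ipaddress module. The CIDRs in this sketch are assumptions, since the thread never states them: 10.2.0.0/24 is a guess at the node subnet based on the 10.2.0.8 and 10.2.0.9 addresses, 192.168.0.0/16 is a guess at the pod range based on the pod IPs shown, and 10.96.0.0/12 is a guess at the service range based on the ClusterIPs shown. Substitute the real values from your own cluster.

```python
import ipaddress


def overlapping_pairs(named_cidrs):
    """Return (name, name) pairs, in sorted name order, whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in named_cidrs.items()}
    names = sorted(nets)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if nets[a].overlaps(nets[b])
    ]


# Assumed values, not taken from the thread's configuration.
ASSUMED_RANGES = {
    "nodes": "10.2.0.0/24",
    "pods": "192.168.0.0/16",
    "services": "10.96.0.0/12",
}

if __name__ == "__main__":
    print(overlapping_pairs(ASSUMED_RANGES))
```

An empty list means none of the supplied ranges overlap; any pair that is returned points at two ranges that need to be separated.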

chrischia · January 2021

On [instance-2] (worker node), ip a gives the following output for ens4 (the only ens interface):

2: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP group default qlen 1000
    link/ether 42:01:0a:02:00:09 brd ff:ff:ff:ff:ff:ff
    inet 10.2.0.9/32 scope global dynamic ens4
       valid_lft 2560sec preferred_lft 2560sec
    inet6 fe80::4001:aff:fe02:9/64 scope link 
       valid_lft forever preferred_lft forever

The ens4 address is different on the worker: it is 10.2.0.9.
So,
curl -H "Host: www.example.com" http://10.2.0.9/ works fine
curl -H "Host: www.example.com" http://10.2.0.8/ does not work here, as expected.

Is there any way to change ens4 on my master to point to 10.2.0.9, the same as the worker?

chrischia · January 2021

It may be obvious, but I would like to mention that
user@laptop:~$ curl -H "Host: www.example.com" http://[worker ip] works perfectly fine, since the curl in the example above works (to .9, its ens4 IP).

chrispokorni · January 2021

Hi @chrischia,

This would imply that both nodes share the same private IP address, which would introduce even more conflicts into the cluster.

Regards,
-Chris
