Nodes Cannot Reach ClusterIP(s) and Endpoint(s) If Not Running Them Directly

gcorradini Posts: 2
edited October 2022 in LFS258 Class Forum

I've set up the control-plane node and multiple worker nodes in GCP as detailed in the Chapter 3 exercises 3.1 and 3.2. That part works fine.

Problem Context:
When walking through exercise 3.3 we start setting up deployments and services with --image=nginx, but only the nodes that are actually running those pods can curl the private ClusterIP or the Endpoint (if a service is set up). This seems to contradict both the Kubernetes documentation for these resources and what we're learning in this class, given that any node and any pod should be able to communicate with each other.

There's a note in exercise 3.3, step 20 that says:

Test access to the Cluster IP, port 80. You should see the generic nginx installed and working page. The output should be the same when you look at the ENDPOINTS IP address. If the curl command times out the pod may be running on the other node. Run the same command on that node and it should work

So this must be a known issue for our setup.

Questions:
What about our setup is so different that not just ANY node in the cluster can query the ClusterIP or Endpoints (for a service)? Is there any way to rectify this?

It also seems to happen when I ssh into a particular pod and try to curl the ClusterIP or Endpoint of a pod running on a different worker node.

Any debugging advice to make sure my cluster is set up correctly would be valuable, thank you :smile:
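
For anyone hitting the same symptom, here is a minimal sketch of a reachability check that could be run from each node (or from inside a pod that has Python) to see which ClusterIP and Endpoint addresses accept TCP connections. The helper names can_connect and probe_all are made up for this illustration and are not part of the course material; it only tests whether the port opens, not whether nginx returns the right page.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_all(targets, timeout: float = 3.0) -> dict:
    """Probe each (host, port) pair and map "host:port" to True/False."""
    return {
        f"{host}:{port}": can_connect(host, port, timeout)
        for host, port in targets
    }
```

To use it, pass the ClusterIP and Endpoint addresses reported by kubectl get svc,ep, for example probe_all([("<ClusterIP>", 80), ("<Endpoint IP>", 80)]), and compare the results across nodes. If an address is reachable only from the node hosting the pod, that points at cross-node pod networking (firewall rules or the network plugin) rather than at the service itself.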

Comments

  • chrispokorni
    chrispokorni Posts: 2,208

    Hi @gcorradini,

    Please watch the "IMPORTANT: Using GCE to Set Up the Lab Environment" video from the introductory chapter, for tips on how to correctly configure the GCP VPC firewall rule for the lab.

    Regards,
    -Chris

  • gcorradini

    Thanks Chris,
    Yeah, it turns out it wasn't my firewall. I think what happened is that my initial kubeadm join commands failed on the workers, and instead of running kubeadm reset I manually stopped the relevant systemd services and removed config files to get the workers running again. That seemed to leave the cluster in a really weird state where most things worked except for the networking.

    I rebuilt the whole cluster and it's working now :blush:
