Welcome to the Linux Foundation Forum!

Lab 9.2 curl

I'm following lab 9.2. I have the nginx deployment, and I set up the NodePort service the way the lab told me to. However, the lab says I should be able to successfully curl my control plane endpoint on the NodePort, and that does not work for me. I'm logged into my control plane node. The pods are running on my worker node. From my CP node, I can successfully curl the following:

  1. The worker node's private IP and nodeport
  2. The worker node's public IP and nodeport

However, the following curl commands will time out:

  3. Control plane endpoint (public Azure Load Balancer which is forwarding the nodeport to the backend pool)
  4. The private IP of the control plane node and the nodeport
  5. The public IP of the control plane node and the nodeport.
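
For reference, the five checks above can be scripted so that each target is tried with the same timeout. This is only a minimal sketch: the addresses, the `cp-endpoint.example.com` hostname, and port 32000 are placeholders that do not come from this thread, so substitute the values from your own cluster (the NodePort is shown by `kubectl get svc`).

```python
import urllib.error
import urllib.request

NODE_PORT = 32000  # placeholder; use the NodePort reported by `kubectl get svc`

# Placeholder addresses (not from the thread); replace with your own.
TARGETS = {
    "worker private IP": "10.0.0.5",
    "worker public IP": "192.0.2.5",
    "control plane endpoint (load balancer)": "cp-endpoint.example.com",
    "control plane private IP": "10.0.0.4",
    "control plane public IP": "192.0.2.4",
}


def build_url(host: str, port: int) -> str:
    """Build the plain-HTTP URL that curl would be pointed at."""
    return f"http://{host}:{port}/"


def probe(url: str, timeout: float = 5.0) -> str:
    """Try one HTTP GET and return a short status string instead of raising."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, just not with a 2xx status.
        return f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, refused connections, and timeouts.
        return f"failed: {exc}"


def report(targets=TARGETS, port=NODE_PORT, timeout=5.0) -> None:
    """Probe every target and print one result line per target."""
    for label, host in targets.items():
        url = build_url(host, port)
        print(f"{label:40s} {url:45s} {probe(url, timeout)}")
```

Calling `report()` from the control plane node, and again from the worker, prints one line per target, which makes it easy to compare which paths answer and which time out.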

Running the commands from the worker node doesn't make a difference either.

I tried allowing all traffic in my Azure Network Security Group attached to my subnet and that didn't help.

This feels like a problem with my cluster, but I'm not sure what it would be. Any guidance or thoughts?

Best Answer

  • serewicz
    serewicz Posts: 1,000
    Answer ✓

    Hello,

    Indeed, on a properly configured network you can connect to a NodePort on any node in the cluster. The network plugin and kube-proxy ensure the traffic is sent to a node running a pod with the matching label, and is delivered to the right port on that pod.
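
The behaviour described above can be illustrated with a toy model. This is a sketch, not how kube-proxy is implemented: it assumes the Service default `externalTrafficPolicy: Cluster`, and the pod names, the `app: nginx` label, and port 32000 are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    node: str
    labels: dict = field(default_factory=dict)


def endpoints(pods, selector):
    """Pods whose labels contain every key/value pair of the Service selector."""
    return [p for p in pods if all(p.labels.get(k) == v for k, v in selector.items())]


def route(entry_node, port, node_port, pods, selector):
    """Model one request arriving at entry_node:port.

    Returns the pod that would serve it, or None if nothing would answer.
    """
    if port != node_port:
        return None  # the Service is not exposed on this port
    backends = endpoints(pods, selector)
    if not backends:
        return None  # no pod matches the selector
    # entry_node plays no part in the choice: under the assumed policy every
    # node accepts traffic on the NodePort and forwards it to a backend,
    # even when that backend runs on a different node. A real cluster picks
    # among the backends; taking the first keeps this sketch deterministic.
    return backends[0]


# Hypothetical cluster state: both nginx pods run on the worker.
pods = [
    Pod("nginx-a", "worker", {"app": "nginx"}),
    Pod("nginx-b", "worker", {"app": "nginx"}),
]

for entry in ("cp", "worker"):
    pod = route(entry, 32000, 32000, pods, {"app": "nginx"})
    print(f"{entry}:32000 -> {pod.name} on {pod.node}")
```

In this model a request that enters through the `cp` node is still served by a pod on `worker`, which is the outcome one would expect on a network where node-to-node pod traffic works.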

    There is a reason we do not support labs that are running on Azure. Also, if you add or remove steps your lab outcomes will probably be different.

    Regards,

Answers

  • chrispokorni
    chrispokorni Posts: 2,165

    Hi @brianmoore,

    Since Azure networking is less seamless than AWS, GCP, and even local type 2 hypervisors, the Azure cloud infrastructure is neither recommended nor supported for this class. However, you may be able to find a few forum discussions where students have documented possible solutions to the challenges of running these lab exercises on Azure.

    Still, to help isolate the root cause, let's determine whether your cluster is properly configured. Is the CP node tainted? Was the workload ever deployed to the CP node? If you scale the nginx deployment to 10 or 20 replicas, are any of the replicas assigned to the CP node? Are all the control plane pods running in the cluster?
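
One possible way to answer these questions from the command line is sketched below. It only assembles standard `kubectl` invocations and, by default, prints them rather than running them. The deployment name `nginx` and the replica count are assumptions; adjust them to match your cluster.

```python
import shlex
import subprocess


def diagnostic_commands(deployment="nginx", replicas=20):
    """Return kubectl commands (as argv lists) matching the checks above."""
    return [
        # Is the CP node tainted? Look for the "Taints:" line of each node.
        ["kubectl", "describe", "nodes"],
        # Are all control plane pods running, and on which node?
        ["kubectl", "get", "pods", "-n", "kube-system", "-o", "wide"],
        # Scale the workload up to see where new replicas land.
        ["kubectl", "scale", "deployment", deployment, f"--replicas={replicas}"],
        # The NODE column shows which node each replica was assigned to.
        ["kubectl", "get", "pods", "-o", "wide"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("$ " + shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=False)


run_all(diagnostic_commands())
```

Passing `dry_run=False` to `run_all` executes the commands against whichever cluster the current kubeconfig points to.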

    Regards,
    -Chris

  • brianmoore

    Thank you for your help! The CP is tainted, and all its infrastructure pods are running. The nginx workload was never deployed to the CP node. 2 pods are running on my worker. I scaled the deployment to 20 and all 20 pods still run on the worker.

    I searched the forums like you recommended and it does seem like Calico won't work on Azure. Ugh.

    I do plan to convert my Terraform to AWS at some point, but for the CKA exam my voucher and course are going to expire in 30 days, so it's a tight window for me (and I already was granted a complimentary extension, so I guess this is my last chance).

    Maybe I could do the conversion to AWS without too much trouble, I just hope it doesn't interfere with me getting ready for the exam in time.

  • chrispokorni
    chrispokorni Posts: 2,165

    Hi @brianmoore,

    The taint is the reason why your pods only run on the worker node. In case it was missed earlier, Step 3 of Lab 3.3 describes the taint removal process. Once the taint is removed, your pods should be distributed evenly between the two nodes (cp and worker).
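
The effect of the taint on placement can be shown with a simplified model. This is a sketch under stated assumptions, not the real scheduler: the taint key `node-role.kubernetes.io/control-plane` is assumed (the thread does not name it), the pods carry no tolerations, and replicas are spread round-robin, which only approximates real scheduling.

```python
# Taint effects that keep a pod without a matching toleration off a node.
BLOCKING_EFFECTS = {"NoSchedule", "NoExecute"}


def schedulable_nodes(node_taints, pod_tolerations=frozenset()):
    """Return nodes with no blocking taint that the pod fails to tolerate.

    node_taints maps node name -> set of (key, effect) tuples.
    pod_tolerations is a set of (key, effect) tuples the pod tolerates.
    """
    eligible = []
    for node, taints in node_taints.items():
        blocking = {
            t for t in taints
            if t[1] in BLOCKING_EFFECTS and t not in pod_tolerations
        }
        if not blocking:
            eligible.append(node)
    return eligible


def place(replicas, nodes):
    """Spread replicas round-robin over the eligible nodes (a simplification)."""
    if not nodes:
        return {}
    counts = {n: 0 for n in nodes}
    for i in range(replicas):
        counts[nodes[i % len(nodes)]] += 1
    return counts


# Assumed taint key; check `kubectl describe nodes` for the real one.
CP_TAINT = ("node-role.kubernetes.io/control-plane", "NoSchedule")

tainted = {"cp": {CP_TAINT}, "worker": set()}
untainted = {"cp": set(), "worker": set()}

print(place(20, schedulable_nodes(tainted)))    # {'worker': 20}
print(place(20, schedulable_nodes(untainted)))  # {'cp': 10, 'worker': 10}
```

With the taint in place, the `cp` node is filtered out before any replica is placed, which matches all 20 replicas landing on the worker.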

    However, the curl issues may still persist on Azure. On EC2 with fully open VPC SG, or on GCE with fully open VPC Firewall rule, curl should work as expected.

    As you made progress through the course, did you notice any discrepancies or similar connectivity issues in any of the prior chapters?

    Regards,
    -Chris

  • brianmoore

    Ah, I see. I do remember that taint removal process. I didn't realize I was supposed to permanently remove my CP's taint, because I figured that the pods should generally just run on the worker and not the CP. And yes, I did get the same confusion in Lab 3.4 when I created a ClusterIP but I wasn't able to curl it from the CP.

    Theoretically, though (and disregarding the crappy Azure networking)... if all my pods are deployed to the worker, and I have a nodeport service, then should I be able to curl my control plane node's public IP address and the node port, and then the nodeport service would forward the traffic to the worker node's endpoint? Maybe I'm just misunderstanding how the NodePort service works. I thought that every node listens on that same high-number random port and then forwards the traffic to the nginx pod on the worker. Please correct me if I'm wrong.
