Pod network across nodes does not work
I followed the installation procedure of labs 3.1 to 3.3 closely. Everything looks fine, but a network connection between a pod on one node and a pod on another node does not work. The calico-node pods are up and running, and I don't see any error messages in their logs. calicoctl node status for the cp node results in:
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 10.0.0.7     | node-to-node mesh | up    | 13:15:03 | Established |
+--------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
For the worker node, I get:

Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 10.0.0.6     | node-to-node mesh | up    | 13:15:03 | Established |
+--------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
On the cp node, ip route returns:

default via 10.0.0.1 dev eth0 proto dhcp src 10.0.0.6 metric 100
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.6
168.63.129.16 via 10.0.0.1 dev eth0 proto dhcp src 10.0.0.6 metric 100
169.254.169.254 via 10.0.0.1 dev eth0 proto dhcp src 10.0.0.6 metric 100
blackhole 192.168.74.128/26 proto bird
192.168.74.136 dev calie739583d8fa scope link
192.168.74.137 dev cali9270933bb0b scope link
192.168.74.138 dev cali73bd7dd6478 scope link
192.168.74.139 dev cali3344860a0ad scope link
192.168.189.64/26 via 10.0.0.7 dev tunl0 proto bird onlink
On the worker node I see:
default via 10.0.0.1 dev eth0 proto dhcp src 10.0.0.7 metric 100
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.7
168.63.129.16 via 10.0.0.1 dev eth0 proto dhcp src 10.0.0.7 metric 100
169.254.169.254 via 10.0.0.1 dev eth0 proto dhcp src 10.0.0.7 metric 100
192.168.74.128/26 via 10.0.0.6 dev tunl0 proto bird onlink
blackhole 192.168.189.64/26 proto bird
192.168.189.76 dev cali3140fc1dafd scope link
192.168.189.77 dev calia35b901ce89 scope link
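To read the routing tables above, it can help to work out which route the cp node would pick for a given pod address. The following is a minimal sketch that models only longest-prefix matching; it ignores metrics, policy routing, and anything else the kernel may take into account. The route list is copied from the cp node's ip route output, and the helper name lookup is hypothetical.

```python
import ipaddress

# Routes copied from the `ip route` output of the cp node (k8scp).
# Each entry: (destination prefix, shortened next-hop description).
CP_ROUTES = [
    ("0.0.0.0/0", "via 10.0.0.1 dev eth0"),
    ("10.0.0.0/24", "dev eth0"),
    ("168.63.129.16/32", "via 10.0.0.1 dev eth0"),
    ("169.254.169.254/32", "via 10.0.0.1 dev eth0"),
    ("192.168.74.128/26", "blackhole"),
    ("192.168.74.136/32", "dev calie739583d8fa"),
    ("192.168.74.137/32", "dev cali9270933bb0b"),
    ("192.168.74.138/32", "dev cali73bd7dd6478"),
    ("192.168.74.139/32", "dev cali3344860a0ad"),
    ("192.168.189.64/26", "via 10.0.0.7 dev tunl0"),
]


def lookup(routes, destination):
    """Return (prefix, next hop) of the most specific route containing destination.

    Simplified model: longest prefix wins; nothing else is considered.
    """
    dest = ipaddress.ip_address(destination)
    matches = []
    for prefix, hop in routes:
        network = ipaddress.ip_network(prefix)
        if dest in network:
            matches.append((network, hop))
    # The default route (0.0.0.0/0) always matches, so matches is never empty here.
    network, hop = max(matches, key=lambda m: m[0].prefixlen)
    return str(network), hop


if __name__ == "__main__":
    # An nginx pod on worker2 and the local busybox pod bb2.
    for ip in ("192.168.189.77", "192.168.74.137"):
        print(ip, "->", lookup(CP_ROUTES, ip))
```

Under this model, traffic from the cp node to the nginx pod 192.168.189.77 matches 192.168.189.64/26 and leaves through tunl0 towards 10.0.0.7, while traffic to the local pod 192.168.74.137 stays on its cali interface. So the routes themselves look consistent, which suggests checking whether the tunnelled packets actually arrive on the other node.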
calicoctl get workloadendpoints -A returns:
NAMESPACE     WORKLOAD                                   NODE      NETWORKS            INTERFACE
accounting    nginx-one-575f648647-j2rwh                 worker2   192.168.189.77/32   calia35b901ce89
accounting    nginx-one-575f648647-x5c5c                 worker2   192.168.189.76/32   cali3140fc1dafd
default       bb2                                        k8scp     192.168.74.137/32   cali9270933bb0b
kube-system   calico-kube-controllers-5f6cfd688c-h29qd   k8scp     192.168.74.136/32   calie739583d8fa
kube-system   coredns-74ff55c5b-69n8g                    k8scp     192.168.74.139/32   cali3344860a0ad
kube-system   coredns-74ff55c5b-bngtf                    k8scp     192.168.74.138/32   cali73bd7dd6478
The example from lab 9.1 is deployed. In addition, I used the pod bb2, which contains busybox, for debugging. The problem became obvious to me when I tried to curl the nginx pods: this only works when I am logged into the worker node.
This is my second cluster. I named the cp node k8scp and the worker worker2; in my first cluster they are still called master and worker. The issue occurs in both clusters. The first one was set up with docker, the second one with cri-o.
The whole setup runs on VMs on Azure.
Is there anything obvious I missed?
One thing that appears odd to me is that the pods do not get addresses from the PodCIDR range of their node. If I run kubectl describe node k8scp | grep PodCIDR, I get:
PodCIDR: 192.168.0.0/24
PodCIDRs: 192.168.0.0/24
The pods on that node are in 192.168.74.128/26, though, as ip route shows. Is that normal?
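The mismatch can be checked with a little address arithmetic. The sketch below uses Python's ipaddress module to test which of the ranges mentioned in this post contain each pod IP. The /16 pool is an assumption (192.168.0.0/16 is the default in the Calico manifest, but the pool actually configured in this cluster would need to be confirmed, for example with calicoctl get ippool). The helper name containing is hypothetical.

```python
import ipaddress

# Ranges taken from the post, plus one labeled assumption.
RANGES = {
    "node PodCIDR (k8scp)": ipaddress.ip_network("192.168.0.0/24"),
    "Calico block on k8scp": ipaddress.ip_network("192.168.74.128/26"),
    "Calico block on worker2": ipaddress.ip_network("192.168.189.64/26"),
    # ASSUMPTION: the Calico IP pool was left at its default value.
    "assumed Calico pool": ipaddress.ip_network("192.168.0.0/16"),
}

# Pod IPs from the calicoctl get workloadendpoints output.
POD_IPS = {
    "bb2 (k8scp)": "192.168.74.137",
    "nginx-one-...-j2rwh (worker2)": "192.168.189.77",
}


def containing(ip):
    """Return the names of all ranges in RANGES that contain ip."""
    address = ipaddress.ip_address(ip)
    return [name for name, network in RANGES.items() if address in network]


if __name__ == "__main__":
    for pod, ip in POD_IPS.items():
        print(pod, ip, "->", containing(ip))
```

Neither pod IP falls inside 192.168.0.0/24, but each one falls inside the /26 block routed to its node and inside the assumed /16 pool. That would be consistent with Calico handing out addresses from its own per-node blocks rather than from the node's PodCIDR, although nothing in this thread confirms how IPAM is configured here.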
Comments
-
Hi @deissnerk,
Azure is not a recommended or supported environment for labs in this course. However, there are learners who ran lab exercises on Azure and shared their findings in the forum. You may use the search option of the forum to locate them for reference.
Regards,
Chris
-
Thanks for the quick response @chrispokorni. I suppose I'm running into similar issues as @luis-garza has been describing here.
At the beginning of lab 3.1 it is stated: "The labs were written using Ubuntu instances running on GoogleCloudPlatform (GCP). They have been written to be vendor-agnostic so could run on AWS, local hardware, or inside of virtualization to give you the most flexibility and options."
I didn't read this as a clear recommendation. After all, it should just be about two Ubuntu VMs in an IP subnet. I was prepared to figure out some Azure specifics on my own, but an incompatibility at this level comes as a surprise to me. A warning in section 3.1 that the components used in the lab might have compatibility issues with other cloud providers would be helpful.
Regards,
Klaus
-
I have the same problem on AWS:
- all firewalls on cp and worker node disabled
- all input / output traffic enabled
Any help?
-
Hi @joov,
On AWS the VPC and Security Group configurations directly impact the cluster networking. If you have not done so already, I would invite you to watch the video "Using AWS to set up labs" found in the introductory chapter of this course. The video outlines important settings needed to enable the networking of your cluster.
Also, when provisioning the second EC2 instance, make sure it is placed in the same VPC subnet, and under the same SG as the first instance.
Regards,
-Chris
-
I followed the video and got it working already. Thank you.