Connection timeout connecting from the worker to the local registry running on the master via ClusterIP

I'm stuck at lab 3.1 - 28.
The worker node successfully joined the cluster using kubeadm join..., and its status is Ready.
However, I'm not able to connect from the worker to the registry:
curl http://10.107.88.26:5000/v2/ -> curl: (7) Failed to connect to 10.107.88.26 port 5000: Connection timed out
The same command works without problems on the master node.
Are there any network utilities/tools/commands I could use to provide you details for troubleshooting?
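[Editor's note: a couple of hedged first checks, not from the original post — standard Linux tooling, with the node IPs taken from the output below:]
- # On the worker: which route/interface would be used to reach the Service IP?
- ip route get 10.107.88.26
- # On the master (tcpdump may need installing): do the worker's packets arrive at all while the curl runs?
- sudo tcpdump -ni eth0 host 10.0.0.5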
Here is the output of "kubectl describe nodes":
- Name: k8s-master1
- Roles: master
- Labels: beta.kubernetes.io/arch=amd64
- beta.kubernetes.io/os=linux
- kubernetes.io/hostname=k8s-master1
- node-role.kubernetes.io/master=
- Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
- node.alpha.kubernetes.io/ttl: 0
- volumes.kubernetes.io/controller-managed-attach-detach: true
- CreationTimestamp: Sat, 05 Jan 2019 14:54:13 +0000
- Taints: <none>
- Unschedulable: false
- Conditions:
- ___omitted___
- ready status. AppArmor enabled
- Addresses:
- InternalIP: 10.0.0.4
- Hostname: k8s-master1
- Capacity:
- attachable-volumes-azure-disk: 16
- cpu: 2
- ephemeral-storage: 30428648Ki
- hugepages-1Gi: 0
- hugepages-2Mi: 0
- memory: 4040536Ki
- pods: 110
- Allocatable:
- attachable-volumes-azure-disk: 16
- cpu: 2
- ephemeral-storage: 28043041951
- hugepages-1Gi: 0
- hugepages-2Mi: 0
- memory: 3938136Ki
- pods: 110
- System Info:
- Machine ID: 39f567ff6e2f4b29bd860fd7228c1322
- System UUID: AE723471-2D04-1B41-A684-3FB8B12C8C31
- Boot ID: bf187457-5b0e-43ca-83c2-0e200a7572c9
- Kernel Version: 4.15.0-1036-azure
- OS Image: Ubuntu 16.04.5 LTS
- Operating System: linux
- Architecture: amd64
- Container Runtime Version: docker://18.6.1
- Kubelet Version: v1.12.1
- Kube-Proxy Version: v1.12.1
- PodCIDR: 192.168.0.0/24
- Non-terminated Pods: (10 in total)
- Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
- --------- ---- ------------ ---------- --------------- -------------
- ___omitted___
- Allocated resources:
- (Total limits may be over 100 percent, i.e., overcommitted.)
- Resource Requests Limits
- -------- -------- ------
- cpu 900m (45%) 0 (0%)
- memory 70Mi (1%) 170Mi (4%)
- attachable-volumes-azure-disk 0 0
- Events:
- Type Reason Age From Message
- ---- ------ ---- ---- -------
- ___omitted___
- Name: k8s-worker1
- Roles: <none>
- Labels: beta.kubernetes.io/arch=amd64
- beta.kubernetes.io/os=linux
- kubernetes.io/hostname=k8s-worker1
- Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
- node.alpha.kubernetes.io/ttl: 0
- volumes.kubernetes.io/controller-managed-attach-detach: true
- CreationTimestamp: Sat, 05 Jan 2019 14:55:07 +0000
- Taints: <none>
- Unschedulable: false
- Conditions:
- Type Status LastHeartbeatTime LastTransitionTime Reason Message
- ---- ------ ----------------- ------------------ ------ -------
- ___omitted___
- ready status. AppArmor enabled
- Addresses:
- InternalIP: 10.0.0.5
- Hostname: k8s-worker1
- Capacity:
- attachable-volumes-azure-disk: 16
- cpu: 1
- ephemeral-storage: 30428648Ki
- hugepages-1Gi: 0
- hugepages-2Mi: 0
- memory: 944136Ki
- pods: 110
- Allocatable:
- attachable-volumes-azure-disk: 16
- cpu: 1
- ephemeral-storage: 28043041951
- hugepages-1Gi: 0
- hugepages-2Mi: 0
- memory: 841736Ki
- pods: 110
- System Info:
- Machine ID: 8977d6e96cdd47ef9fd20f9496ab84f2
- System UUID: 92429DB1-12B5-3342-8C66-5A5119371B50
- Boot ID: 23371f94-ebaa-4186-b27c-7a756f327aa6
- Kernel Version: 4.15.0-1036-azure
- OS Image: Ubuntu 16.04.5 LTS
- Operating System: linux
- Architecture: amd64
- Container Runtime Version: docker://18.6.1
- Kubelet Version: v1.12.1
- Kube-Proxy Version: v1.12.1
- PodCIDR: 192.168.1.0/24
- Non-terminated Pods: (4 in total)
- Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
- --------- ---- ------------ ---------- --------------- -------------
- ___omitted___
- Allocated resources:
- (Total limits may be over 100 percent, i.e., overcommitted.)
- Resource Requests Limits
- -------- -------- ------
- cpu 350m (35%) 0 (0%)
- memory 70Mi (8%) 170Mi (20%)
- attachable-volumes-azure-disk 0 0
- Events:
- Type Reason Age From Message
- ---- ------ ---- ---- -------
- ___omitted___
Comments
BTW: both Azure nodes/VMs are on the same vnet, and both have the default NSG rule:
- Priority: 65000
- Name: AllowVnetInBound
- Port: Any
- Protocol: Any
- Source: VirtualNetwork
- Destination: VirtualNetwork
- Action: Allow
ping 10.107.88.26 succeeded
- ping 10.107.88.26
- PING 10.107.88.26 (10.107.88.26) 56(84) bytes of data.
- ^C
- --- 10.107.88.26 ping statistics ---
- 22 packets transmitted, 0 received, 100% packet loss, time 21484ms
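[Editor's note: a hedged way to double-check which NSG rules are actually applied; $RESOURCE_GROUP is from the provisioning script below, the NSG name is a placeholder:]
- # List the network security groups in the resource group
- az network nsg list -g $RESOURCE_GROUP -o table
- # Show all rules of a given NSG, including the default ones
- az network nsg rule list -g $RESOURCE_GROUP --nsg-name <nsg-name> --include-default -o table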
More details
PS: ping did NOT succeed (cannot edit the previous comment).
My VMs provisioning script:
- RESOURCE_GROUP='<my-rg-group>'
- LOCATION='westeurope'
- IMAGE='UbuntuLTS'
- #MASTER_SKU='Standard_D1_v2'
- MASTER_SKU='Standard_B2s'
- AGENT_SKU='Standard_B1s'
- MASTER_NAME='k8s-master1'
- az group create -g $RESOURCE_GROUP -l $LOCATION
- az vm create -g $RESOURCE_GROUP -n $MASTER_NAME \
- --size $MASTER_SKU \
- --image $IMAGE \
- --public-ip-address-dns-name 'master1-'$RESOURCE_GROUP \
- --vnet-name vnet1 \
- --subnet subnet1 \
- --custom-data k8sMaster.sh \
- --ssh-key-value @/Users/cristiano/.ssh/azure-vm_rsa.pub
- ##--generate-ssh-keys
- az vm create -g $RESOURCE_GROUP -n 'k8s-worker1' \
- --size $AGENT_SKU \
- --image $IMAGE \
- --public-ip-address-dns-name 'worker1-'$RESOURCE_GROUP \
- --vnet-name vnet1 \
- --subnet subnet1 \
- --custom-data k8sSecond.sh \
- --ssh-key-value @/Users/cristiano/.ssh/azure-vm_rsa.pub
- #--generate-ssh-keys
- #https://docs.microsoft.com/en-us/cli/azure/vm?view=azure-cli-latest#az-vm-open-port
- az vm open-port -g $RESOURCE_GROUP -n $MASTER_NAME --port 30000-33000 --priority 1010
The script was inspired by this article:
https://www.aaronmsft.com/posts/azure-vmss-kubernetes-kubeadm/
Network settings in Azure:
https://docs.microsoft.com/en-us/azure/virtual-network/security-overview#denyallinbound
Do you think I have to add additional rules beyond the default one? I'm not sure whether ClusterIPs require extra rules or the vnet/subnet settings should be enough.
Thanks a lot for the support, and I apologize for my poor networking skills.
The only ClusterIP reachable from the worker1 node is service/kubernetes.
The following curl request:
curl --insecure https://10.96.0.1/api
succeeds on both nodes. Same for endpoints: the only one working on both nodes is endpoints/kubernetes.
On the master1 node, all endpoints and ClusterIPs are working as expected.
ClusterIPs should work on all nodes, regardless of where the pod(s) are running. Does the same rule apply to endpoints as well?
I mean, should endpoints be reachable from all nodes? I'm providing below the output of "route -n" on both nodes. Should the endpoint IPs appear on both nodes, or is the current output correct?
Where are the mappings between ClusterIPs and endpoints stored?
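[Editor's note: assuming kube-proxy runs in its default iptables mode, the ClusterIP-to-endpoint mapping is materialized as NAT rules on every node, so it can be inspected there — a sketch using the registry ClusterIP from the output below:]
- # List the Service entry kube-proxy programmed (run on either node)
- sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.104.96.176
- # The matching KUBE-SVC-... chain (name varies; copy it from the output above) holds the DNAT to the endpoint
- sudo iptables -t nat -L <KUBE-SVC-chain> -n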
cristiano@k8s-master1:~$ kubectl get pods,svc,pvc,pv,deploy,endpoints
- NAME READY STATUS RESTARTS AGE
- pod/basicpod 1/1 Running 0 55m
- pod/nginx-67f8fb575f-r8kh5 1/1 Running 0 179m
- pod/registry-56cffc98d6-jkzj5 1/1 Running 0 179m
- NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
- service/basicservice ClusterIP 10.100.134.88 <none> 80/TCP 55m
- service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 3h17m
- service/nginx ClusterIP 10.111.131.30 <none> 443/TCP 70m
- service/registry ClusterIP 10.104.96.176 <none> 5000/TCP 70m
- NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
- persistentvolumeclaim/nginx-claim0 Bound task-pv-volume 200Mi RWO 179m
- persistentvolumeclaim/registry-claim0 Bound registryvm 200Mi RWO 179m
- NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
- persistentvolume/registryvm 200Mi RWO Retain Bound default/registry-claim0 3h3m
- persistentvolume/task-pv-volume 200Mi RWO Retain Bound default/nginx-claim0 3h3m
- NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
- deployment.extensions/nginx 1 1 1 1 179m
- deployment.extensions/registry 1 1 1 1 179m
- NAME ENDPOINTS AGE
- endpoints/basicservice 192.168.159.131:80 55m
- endpoints/kubernetes 10.0.0.4:6443 3h17m
- endpoints/nginx 192.168.159.129:443 70m
- endpoints/registry 192.168.159.130:5000 70m
cristiano@k8s-master1:~$ curl http://10.100.134.88 -> OK
cristiano@k8s-worker1:~$ curl http://10.100.134.88 -> Timeout
cristiano@k8s-master1:~$ route -n
- Kernel IP routing table
- Destination Gateway Genmask Flags Metric Ref Use Iface
- 0.0.0.0 10.0.0.1 0.0.0.0 UG 0 0 0 eth0
- 10.0.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
- 168.63.129.16 10.0.0.1 255.255.255.255 UGH 0 0 0 eth0
- 169.254.169.254 10.0.0.1 255.255.255.255 UGH 0 0 0 eth0
- 172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
- 192.168.159.128 0.0.0.0 255.255.255.192 U 0 0 0 *
- 192.168.159.129 0.0.0.0 255.255.255.255 UH 0 0 0 cali0968f46c6a2
- 192.168.159.130 0.0.0.0 255.255.255.255 UH 0 0 0 calib0effd1e9a6
- 192.168.159.131 0.0.0.0 255.255.255.255 UH 0 0 0 cali3fc1b4ac805
- 192.168.159.132 0.0.0.0 255.255.255.255 UH 0 0 0 calif49f354fce4
- 192.168.159.133 0.0.0.0 255.255.255.255 UH 0 0 0 calie2160929446
- 192.168.194.128 10.0.0.5 255.255.255.192 UG 0 0 0 tunl0
cristiano@k8s-worker1:~$ route -n
- Kernel IP routing table
- Destination Gateway Genmask Flags Metric Ref Use Iface
- 0.0.0.0 10.0.0.1 0.0.0.0 UG 0 0 0 eth0
- 10.0.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
- 168.63.129.16 10.0.0.1 255.255.255.255 UGH 0 0 0 eth0
- 169.254.169.254 10.0.0.1 255.255.255.255 UGH 0 0 0 eth0
- 172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
- 192.168.159.128 10.0.0.4 255.255.255.192 UG 0 0 0 tunl0
- 192.168.194.128 0.0.0.0 255.255.255.192 U 0 0 0 *
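[Editor's note, hedged: in both tables the route to the other node's pod subnet goes via tunl0, Calico's IP-in-IP tunnel device; its encapsulation can be confirmed with:]
- ip -d link show tunl0   # 'ipip' in the output means cross-node pod traffic travels as IP protocol 4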
Hi @crixo ,
From your first output, it seems the firewall (AppArmor) is enabled on each node, and it may be blocking traffic to some of the ports - which would explain why the curls and pings are timing out.
See the instructions in the Lab 2.1 - Overview section.
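[Editor's note: to see whether AppArmor is actually confining anything on a node, one hedged check is:]
- sudo apparmor_status   # or: sudo aa-status; lists loaded profiles and whether they are in enforce or complain mode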
Regards,
-Chris
This is the exact error I am facing: not able to reach the registry from the worker node. My cluster is in Azure, and the virtual network has traffic enabled on all ports within the cluster (1 master, 2 nodes).
Hi @tanwarsatya ,
Also make sure the firewalls at the OS level are disabled, as they may be blocking some traffic.
Regards,
-Chris
"ufw status" is showing inactive on both master and worker nodes.
Hi @chrispokorni,
If I understood correctly, AppArmor is a sort of firewall at the kernel level, while the network settings in Azure are something on top of it, at the Azure platform level.
I tried on AWS as well and did not have these issues; the Azure and AWS VMs are both based on Ubuntu 16.04. Does that mean the Azure and AWS VMs have different kernel settings - i.e., Azure Ubuntu has AppArmor enabled while AWS Ubuntu does not?
I'll try to disable AppArmor on Azure and let you know. I assume you are referring to this activity: http://www.techytalk.info/disable-and-remove-apparmor-on-ubuntu-based-linux-distributions/
@chrispokorni, do you have any scripts/tools to verify that AppArmor is blocking some ports/IPs? I'd like to troubleshoot/verify the root cause of the problem with some tool, rather than simply fix it.
@crixo
You can use different tools on your nodes to troubleshoot networking: netcat (nc) or wireshark.
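[Editor's note: a hedged netcat sketch using IPs from this thread; testing TCP connectivity to a specific port separates routing/firewall problems from application-level ones:]
- # From the worker: does anything answer on the registry's ClusterIP and port?
- nc -zv -w 5 10.107.88.26 5000
- # Compare with plain node-to-node traffic, e.g. the API server on the master:
- nc -zv -w 5 10.0.0.4 6443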
Each cloud provider makes available images for you to choose from, and these images are customized to work best with the provider's infrastructure.
Also, each provider's networking features will work slightly differently.
Regards,
-Chris
@tanwarsatya
You may have a different firewall enabled. The output provided by @crixo for Ubuntu 16 on Azure VMs shows AppArmor as being enabled.
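[Editor's note, hedged: besides ufw and AppArmor, it may be worth checking whether firewalld or raw iptables rules are active:]
- sudo systemctl is-active firewalld   # 'inactive'/'unknown' means it is not running
- sudo iptables -L INPUT -n --line-numbers | head   # look for DROP/REJECT rules not added by kube-proxy/Calico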
-Chris
@chrispokorni I disabled AppArmor and am still not able to work with the registry. I will be trying a couple more options to see what may be causing this.
I fully removed AppArmor: https://webhostinggeeks.com/howto/how-to-disable-and-remove-apparmor-on-ubuntu-14-04/
Now "AppArmor enabled" has disappeared from "kubectl describe nodes", but I still get the same timeout error.
On the Azure VM firewall I also added a rule to allow all traffic in and out on both nodes, but same issue: the ClusterIPs are reachable only on the node where the pod has been deployed.
ufw is disabled as well:
- sudo ufw status verbose
- Status: inactive
Hi @chrispokorni,
According to my previous post, all firewalls should now be disabled/removed (ufw and AppArmor), as well as the one provided by Azure in front of the VM (the network settings in Azure).
I tried to use some of the tools you and @serewicz suggested in other posts, but I'm lost due to my lack of networking skills... I'm not sure I ran the right tools/scripts, or maybe I'm not able to evaluate the results.
I'm not sure whether the problem is related to a firewall restriction or to some k8s network misconfiguration: could Calico have an issue on Azure VMs? I tried to inspect iptables and saw a lot of settings related to Calico... is there a way/script to check ClusterIPs against the iptables settings?
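[Editor's note, hedged: one way to get Calico's own view of cross-node connectivity, assuming the calicoctl binary is installed on the node:]
- sudo calicoctl node status   # shows whether the peering to the other node is established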
Do you have a chance to try the cluster deployment on Azure? I posted the VM provisioning script above. I'd like to understand what the issue is, how to identify it, and of course how to solve it so I can resume the lab.
Thanks a lot for your support.
Hi @crixo ,
Kubernetes' networking is not as complex as we may think, but it doesn't fix networking for us either; it relies on well-configured network infrastructure. So unless our VMs'/nodes' networking is set up properly, Kubernetes will not work as expected.
I have not used Azure as of yet, and I am not familiar with its networking configurations. In my research, however, I did find lots of posts around Kubernetes on Azure, with tips on how to fix certain configuration issues.
Is there a networking configuration in Azure similar to a VPC on GCP and AWS? A VPC allows you to create a custom virtual network (not a VPN) in which to run your nodes for Kubernetes. This configuration resolves similar networking issues on GCP and AWS.
Regards,
-Chris
I guess I have found the issue, and I am afraid it's not possible to simply create a Kubernetes cluster with a vnet and a couple of nodes:
https://docs.projectcalico.org/v3.3/reference/public-cloud/azure
I will still try some more options by installing the Azure CNI plugin; if that doesn't work, I am moving to Google for these labs.
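[Editor's note, hedged: the Calico doc linked above explains that Azure vnets do not forward IP-in-IP (protocol 4) traffic, which matches the tunl0 routes shown earlier in the thread. Assuming Calico v3.x tooling, the IP pool's encapsulation mode can be checked with:]
- calicoctl get ippool -o yaml | grep -i ipip   # 'ipipMode: Always' means cross-node pod traffic is tunneled as protocol 4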
Hi @tanwarsatya,
Thanks a lot for your update; I had the same feeling as well.
I assume you'd like to try the Azure CNI plugin instead of the Calico plugin included in our lab setup script. Please keep me posted on the results.
I have been misled by the article I shared in my previous post
https://www.aaronmsft.com/posts/azure-vmss-kubernetes-kubeadm/
In that article the setup script is pretty much equal to the one provided by our lab: it uses the same Calico plugin.
I did not test the suggested approach in full, but I assume it would fail as well.
I have the same problem in my Kubernetes cluster running on VMs through Hyper-V. I have tried all the options mentioned above (disabling AppArmor and ufw), but no luck. Did anyone manage to resolve this issue? Please help me resolve it.
Hello,
The issue remains the environment's network. If anything is blocking traffic between nodes, you will have issues. While you may have disabled ufw in the instance, you also need to ensure the environment allows all traffic.
From what others have found, this is not allowed there. You may consider a more accessible environment.
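[Editor's note, hedged: on GCE, for instance, an allow-internal-traffic firewall rule along these lines is the usual fix - the rule name and CIDR are illustrative assumptions:]
- gcloud compute firewall-rules create allow-cluster-internal \
-     --network default --allow tcp,udp,icmp \
-     --source-ranges 10.128.0.0/9   # adjust to your VPC's internal range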
Regards,
Hello @serewicz,
I am able to ssh between the master and worker, even from the host machine, which indicates network connectivity is fine. Are you suggesting setting up the environment in Google Cloud in order to complete the exercise? I would certainly be glad to attempt that, as long as it works there. Please advise.
SSH uses port 22; there are many more ports in use. Ensure no ports are blocked between the instances.
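[Editor's note, hedged sketch: the kubeadm documentation lists the control-plane ports - 6443 for the API server, 2379-2380 for etcd, 10250 for the kubelet, plus the NodePort range 30000-32767 - and a quick netcat loop from the worker can verify them, using the master's InternalIP from this thread:]
- for port in 6443 2379 2380 10250; do nc -zv -w 3 10.0.0.4 $port; done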
Should you decide to use GCE, there is a setup video I would encourage you to follow. I believe the video is in the resources URL mentioned in the class setup information.
Regards,