Connection timeout when connecting from the worker to the local registry running on the master via ClusterIP
I'm stuck at lab 3.1 - 28.
The worker node successfully joined the cluster using kubeadm join, and its status is Ready.
I'm not able to connect from the worker to the registry:
curl http://10.107.88.26:5000/v2/
curl: (7) Failed to connect to 10.107.88.26 port 5000: Connection timed out
The same command works without problems on the master node.
Are there network utilities/tools/commands I could use to provide you with details for troubleshooting?
This is the output of "kubectl describe nodes":
Name:               k8s-master1
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=k8s-master1
                    node-role.kubernetes.io/master=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sat, 05 Jan 2019 14:54:13 +0000
Taints:             <none>
Unschedulable:      false
Conditions:
  ___omitted___ ready status. AppArmor enabled
Addresses:
  InternalIP:  10.0.0.4
  Hostname:    k8s-master1
Capacity:
 attachable-volumes-azure-disk:  16
 cpu:                            2
 ephemeral-storage:              30428648Ki
 hugepages-1Gi:                  0
 hugepages-2Mi:                  0
 memory:                         4040536Ki
 pods:                           110
Allocatable:
 attachable-volumes-azure-disk:  16
 cpu:                            2
 ephemeral-storage:              28043041951
 hugepages-1Gi:                  0
 hugepages-2Mi:                  0
 memory:                         3938136Ki
 pods:                           110
System Info:
 Machine ID:                 39f567ff6e2f4b29bd860fd7228c1322
 System UUID:                AE723471-2D04-1B41-A684-3FB8B12C8C31
 Boot ID:                    bf187457-5b0e-43ca-83c2-0e200a7572c9
 Kernel Version:             4.15.0-1036-azure
 OS Image:                   Ubuntu 16.04.5 LTS
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://18.6.1
 Kubelet Version:            v1.12.1
 Kube-Proxy Version:         v1.12.1
PodCIDR:                     192.168.0.0/24
Non-terminated Pods:         (10 in total)
  Namespace  Name  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------  ----  ------------  ----------  ---------------  -------------
  ___omitted___
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                       Requests    Limits
  --------                       --------    ------
  cpu                            900m (45%)  0 (0%)
  memory                         70Mi (1%)   170Mi (4%)
  attachable-volumes-azure-disk  0           0
Events:
  Type  Reason  Age  From  Message
  ----  ------  ---  ----  -------
  ___omitted___

Name:               k8s-worker1
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=k8s-worker1
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sat, 05 Jan 2019 14:55:07 +0000
Taints:             <none>
Unschedulable:      false
Conditions:
  Type  Status  LastHeartbeatTime  LastTransitionTime  Reason  Message
  ----  ------  -----------------  ------------------  ------  -------
  ___omitted___ ready status. AppArmor enabled
Addresses:
  InternalIP:  10.0.0.5
  Hostname:    k8s-worker1
Capacity:
 attachable-volumes-azure-disk:  16
 cpu:                            1
 ephemeral-storage:              30428648Ki
 hugepages-1Gi:                  0
 hugepages-2Mi:                  0
 memory:                         944136Ki
 pods:                           110
Allocatable:
 attachable-volumes-azure-disk:  16
 cpu:                            1
 ephemeral-storage:              28043041951
 hugepages-1Gi:                  0
 hugepages-2Mi:                  0
 memory:                         841736Ki
 pods:                           110
System Info:
 Machine ID:                 8977d6e96cdd47ef9fd20f9496ab84f2
 System UUID:                92429DB1-12B5-3342-8C66-5A5119371B50
 Boot ID:                    23371f94-ebaa-4186-b27c-7a756f327aa6
 Kernel Version:             4.15.0-1036-azure
 OS Image:                   Ubuntu 16.04.5 LTS
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://18.6.1
 Kubelet Version:            v1.12.1
 Kube-Proxy Version:         v1.12.1
PodCIDR:                     192.168.1.0/24
Non-terminated Pods:         (4 in total)
  Namespace  Name  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------  ----  ------------  ----------  ---------------  -------------
  ___omitted___
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                       Requests    Limits
  --------                       --------    ------
  cpu                            350m (35%)  0 (0%)
  memory                         70Mi (8%)   170Mi (20%)
  attachable-volumes-azure-disk  0           0
Events:
  Type  Reason  Age  From  Message
  ----  ------  ---  ----  -------
  ___omitted___
BTW: both Azure nodes/VMs are on the same VNet, and both have the default rule:
Priority  Name              Port  Protocol  Source          Destination     Action
65000     AllowVnetInBound  Any   Any       VirtualNetwork  VirtualNetwork  Allow
ping 10.107.88.26 succeeded
ping 10.107.88.26
PING 10.107.88.26 (10.107.88.26) 56(84) bytes of data.
^C
--- 10.107.88.26 ping statistics ---
22 packets transmitted, 0 received, 100% packet loss, time 21484ms
More details
PS: ping did NOT succeed (I cannot edit my previous comment).
My VM provisioning script:
RESOURCE_GROUP='<my-rg-group>'
LOCATION='westeurope'
IMAGE='UbuntuLTS'
#MASTER_SKU='Standard_D1_v2'
MASTER_SKU='Standard_B2s'
AGENT_SKU='Standard_B1s'
MASTER_NAME='k8s-master1'

az group create -g $RESOURCE_GROUP -l $LOCATION

az vm create -g $RESOURCE_GROUP -n $MASTER_NAME \
    --size $MASTER_SKU \
    --image $IMAGE \
    --public-ip-address-dns-name 'master1-'$RESOURCE_GROUP \
    --vnet-name vnet1 \
    --subnet subnet1 \
    --custom-data k8sMaster.sh \
    --ssh-key-value @/Users/cristiano/.ssh/azure-vm_rsa.pub ##--generate-ssh-keys

az vm create -g $RESOURCE_GROUP -n 'k8s-worker1' \
    --size $AGENT_SKU \
    --image $IMAGE \
    --public-ip-address-dns-name 'worker1-'$RESOURCE_GROUP \
    --vnet-name vnet1 \
    --subnet subnet1 \
    --custom-data k8sSecond.sh \
    --ssh-key-value @/Users/cristiano/.ssh/azure-vm_rsa.pub #--generate-ssh-keys

#https://docs.microsoft.com/en-us/cli/azure/vm?view=azure-cli-latest#az-vm-open-port
az vm open-port -g $RESOURCE_GROUP -n $MASTER_NAME --port 30000-33000 --priority 1010
The script was inspired by this article:
https://www.aaronmsft.com/posts/azure-vmss-kubernetes-kubeadm/
Network settings in Azure:
https://docs.microsoft.com/en-us/azure/virtual-network/security-overview#denyallinbound
Do you think I have to add rules beyond the default ones? I'm not sure whether ClusterIPs require extra rules or whether the VNet/subnet settings should be enough.
Thanks a lot for your support, and I apologize for my poor networking skills.
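Since the default NSG rule above is expressed in terms of the VirtualNetwork tag, one way to reason about the question is to check which addresses in this thread actually belong to the VNet's address space. The sketch below does that with Python's standard `ipaddress` module. The VNet prefix 10.0.0.0/16 is an assumption (the az CLI default when `az vm create` creates the VNet); only the 10.0.0.0/24 subnet is confirmed by the `route -n` output later in the thread. Keep in mind that kube-proxy normally rewrites (DNATs) a ClusterIP to a pod IP on the node that sends the request, so what actually travels between the VMs is node-IP and Calico pod-IP traffic, not the ClusterIP itself.

```python
import ipaddress

# Assumption: VNet address space is 10.0.0.0/16 (az CLI default).
# Only the 10.0.0.0/24 subnet is confirmed by `route -n` in this thread.
VNET_PREFIX = ipaddress.ip_network("10.0.0.0/16")

# Addresses that appear in this thread.
ADDRESSES = {
    "master1 InternalIP": "10.0.0.4",
    "worker1 InternalIP": "10.0.0.5",
    "registry ClusterIP": "10.107.88.26",
    "registry pod IP": "192.168.159.130",
}


def in_vnet(address, prefix=VNET_PREFIX):
    """Return True if `address` lies inside the (assumed) VNet prefix."""
    return ipaddress.ip_address(address) in prefix


if __name__ == "__main__":
    for label, addr in ADDRESSES.items():
        print(f"{label:20} {addr:16} inside VNet: {in_vnet(addr)}")
```

Only the node addresses fall inside the VNet range; the ClusterIP and the Calico pod IP do not. This is a hint about where to look, not a diagnosis: whether Azure delivers the pod traffic depends on how Calico carries it between the nodes.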
The only ClusterIP reachable from the worker1 node is service/kubernetes.
The following curl request
curl --insecure https://10.96.0.1/api
succeeds on both nodes. The same goes for endpoints: the only one working on both nodes is endpoints/kubernetes.
On the master1 node, all endpoints and ClusterIPs work as expected.
ClusterIPs should work on all nodes, regardless of where the pod(s) are running. Does the same rule apply to endpoints as well?
I mean, should endpoints be reachable from all nodes? Below is the output of "route -n" on both nodes. Should the endpoint IPs be included on both nodes, or is the current output correct?
Where are the mappings between ClusterIPs and endpoints stored?
cristiano@k8s-master1:~$ kubectl get pods,svc,pvc,pv,deploy,endpoints
NAME                            READY   STATUS    RESTARTS   AGE
pod/basicpod                    1/1     Running   0          55m
pod/nginx-67f8fb575f-r8kh5      1/1     Running   0          179m
pod/registry-56cffc98d6-jkzj5   1/1     Running   0          179m

NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/basicservice   ClusterIP   10.100.134.88   <none>        80/TCP     55m
service/kubernetes     ClusterIP   10.96.0.1       <none>        443/TCP    3h17m
service/nginx          ClusterIP   10.111.131.30   <none>        443/TCP    70m
service/registry       ClusterIP   10.104.96.176   <none>        5000/TCP   70m

NAME                                    STATUS   VOLUME           CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/nginx-claim0      Bound    task-pv-volume   200Mi      RWO                           179m
persistentvolumeclaim/registry-claim0   Bound    registryvm       200Mi      RWO                           179m

NAME                              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                     STORAGECLASS   REASON   AGE
persistentvolume/registryvm       200Mi      RWO            Retain           Bound    default/registry-claim0                           3h3m
persistentvolume/task-pv-volume   200Mi      RWO            Retain           Bound    default/nginx-claim0                              3h3m

NAME                             DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.extensions/nginx      1         1         1            1           179m
deployment.extensions/registry   1         1         1            1           179m

NAME                     ENDPOINTS              AGE
endpoints/basicservice   192.168.159.131:80     55m
endpoints/kubernetes     10.0.0.4:6443          3h17m
endpoints/nginx          192.168.159.129:443    70m
endpoints/registry       192.168.159.130:5000   70m
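On the question of where the ClusterIP-to-endpoint mapping lives: it is stored in the API server as the Service and Endpoints objects listed above, and kube-proxy on every node watches those objects and programs the node's iptables rules from them. The sketch below is a simplified, hypothetical model of that join, built from the rows in the output above. It is not kube-proxy code, and it assumes single-port services and a plain comma-separated endpoint list.

```python
# Rows copied from the `kubectl get svc` and `kubectl get endpoints` output above.
SERVICES = """\
service/basicservice ClusterIP 10.100.134.88 <none> 80/TCP 55m
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 3h17m
service/nginx ClusterIP 10.111.131.30 <none> 443/TCP 70m
service/registry ClusterIP 10.104.96.176 <none> 5000/TCP 70m
"""

ENDPOINTS = """\
endpoints/basicservice 192.168.159.131:80 55m
endpoints/kubernetes 10.0.0.4:6443 3h17m
endpoints/nginx 192.168.159.129:443 70m
endpoints/registry 192.168.159.130:5000 70m
"""


def parse_services(text):
    """Map service name -> 'clusterIP:port' (assumes one port per service)."""
    result = {}
    for line in text.splitlines():
        name, _type, cluster_ip, _external, ports, _age = line.split()
        port = ports.split("/")[0]  # "5000/TCP" -> "5000"
        result[name.split("/", 1)[1]] = f"{cluster_ip}:{port}"
    return result


def parse_endpoints(text):
    """Map service name -> list of 'podIP:port' endpoints."""
    result = {}
    for line in text.splitlines():
        name, addresses, _age = line.split()
        result[name.split("/", 1)[1]] = addresses.split(",")
    return result


def clusterip_table(services, endpoints):
    """Join Services and Endpoints by name: 'clusterIP:port' -> endpoints."""
    return {services[name]: endpoints.get(name, []) for name in services}


if __name__ == "__main__":
    table = clusterip_table(parse_services(SERVICES), parse_endpoints(ENDPOINTS))
    for vip, targets in table.items():
        print(f"{vip:22} -> {', '.join(targets)}")
```

The join shows that the registry ClusterIP resolves to a pod IP in master1's pod range, so a request from worker1 still has to leave worker1 and reach master1 after the address rewrite.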
cristiano@k8s-master1:~$ curl http://10.100.134.88 -> OK
cristiano@k8s-worker1:~$ curl http://10.100.134.88 -> Timeout

cristiano@k8s-master1:~$ route -n
Kernel IP routing table
Destination      Gateway    Genmask          Flags  Metric  Ref  Use  Iface
0.0.0.0          10.0.0.1   0.0.0.0          UG     0       0    0    eth0
10.0.0.0         0.0.0.0    255.255.255.0    U      0       0    0    eth0
168.63.129.16    10.0.0.1   255.255.255.255  UGH    0       0    0    eth0
169.254.169.254  10.0.0.1   255.255.255.255  UGH    0       0    0    eth0
172.17.0.0       0.0.0.0    255.255.0.0      U      0       0    0    docker0
192.168.159.128  0.0.0.0    255.255.255.192  U      0       0    0    *
192.168.159.129  0.0.0.0    255.255.255.255  UH     0       0    0    cali0968f46c6a2
192.168.159.130  0.0.0.0    255.255.255.255  UH     0       0    0    calib0effd1e9a6
192.168.159.131  0.0.0.0    255.255.255.255  UH     0       0    0    cali3fc1b4ac805
192.168.159.132  0.0.0.0    255.255.255.255  UH     0       0    0    calif49f354fce4
192.168.159.133  0.0.0.0    255.255.255.255  UH     0       0    0    calie2160929446
192.168.194.128  10.0.0.5   255.255.255.192  UG     0       0    0    tunl0
cristiano@k8s-worker1:~$ route -n
Kernel IP routing table
Destination      Gateway    Genmask          Flags  Metric  Ref  Use  Iface
0.0.0.0          10.0.0.1   0.0.0.0          UG     0       0    0    eth0
10.0.0.0         0.0.0.0    255.255.255.0    U      0       0    0    eth0
168.63.129.16    10.0.0.1   255.255.255.255  UGH    0       0    0    eth0
169.254.169.254  10.0.0.1   255.255.255.255  UGH    0       0    0    eth0
172.17.0.0       0.0.0.0    255.255.0.0      U      0       0    0    docker0
192.168.159.128  10.0.0.4   255.255.255.192  UG     0       0    0    tunl0
192.168.194.128  0.0.0.0    255.255.255.192  U      0       0    0    *
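Once kube-proxy has rewritten the registry ClusterIP to the pod IP 192.168.159.130 (endpoints/registry above), the kernel picks a route by longest-prefix match. The sketch below replays that lookup over abridged copies of the two `route -n` tables above (the Azure host routes and the other per-pod /32 routes are left out). On master1 the pod is reached through its local cali interface; on worker1 the /26 that contains it points through `tunl0` to 10.0.0.4, so the packet has to cross the Azure network encapsulated. That is consistent with curl working on master1 and timing out on worker1. The tables themselves appear to be ordinary Calico output: each node needs a route to the other node's pod block, not to every individual endpoint.

```python
import ipaddress

# Abridged copies of the `route -n` tables above: (Destination, Gateway, Genmask, Iface)
MASTER1_ROUTES = [
    ("0.0.0.0", "10.0.0.1", "0.0.0.0", "eth0"),
    ("10.0.0.0", "0.0.0.0", "255.255.255.0", "eth0"),
    ("172.17.0.0", "0.0.0.0", "255.255.0.0", "docker0"),
    ("192.168.159.128", "0.0.0.0", "255.255.255.192", "*"),
    ("192.168.159.130", "0.0.0.0", "255.255.255.255", "calib0effd1e9a6"),
    ("192.168.194.128", "10.0.0.5", "255.255.255.192", "tunl0"),
]

WORKER1_ROUTES = [
    ("0.0.0.0", "10.0.0.1", "0.0.0.0", "eth0"),
    ("10.0.0.0", "0.0.0.0", "255.255.255.0", "eth0"),
    ("172.17.0.0", "0.0.0.0", "255.255.0.0", "docker0"),
    ("192.168.159.128", "10.0.0.4", "255.255.255.192", "tunl0"),
    ("192.168.194.128", "0.0.0.0", "255.255.255.192", "*"),
]


def lookup(routes, destination):
    """Longest-prefix match: return (network, gateway, iface), or None."""
    dst = ipaddress.ip_address(destination)
    best = None
    for dest, gateway, mask, iface in routes:
        # ip_network accepts "address/netmask" as well as "address/prefixlen"
        net = ipaddress.ip_network(f"{dest}/{mask}")
        if dst in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway, iface)
    return best


if __name__ == "__main__":
    registry_pod = "192.168.159.130"
    for node, routes in (("master1", MASTER1_ROUTES), ("worker1", WORKER1_ROUTES)):
        net, gateway, iface = lookup(routes, registry_pod)
        print(f"{node}: {registry_pod} matches {net} gw {gateway} dev {iface}")
```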
Hi @crixo ,
From your first output, it seems the firewall (AppArmor) is enabled on each node, and it may be blocking traffic to some of the ports, which would explain why curl and ping time out.
See the instructions from Lab 2.1 - Overview section.
Regards,
-Chris
This is the exact error I am facing: I'm not able to reach the registry from the worker node. My cluster is in Azure, and the virtual network allows traffic on all ports within the cluster (1 master, 2 nodes).
Hi @tanwarsatya ,
Also make sure the firewalls at OS level are disabled, as they may be blocking some traffic.
Regards,
-Chris
ufw status shows inactive on both the master and worker nodes.
Hi @chrispokorni,
If I understood correctly, AppArmor is a sort of kernel-level firewall, while the network settings in Azure are something on top of it, at the Azure platform level.
I tried on AWS as well and did not have these issues. The Azure and AWS VMs are both based on Ubuntu 16.04; does that mean they have different kernel settings? I mean, does the Azure Ubuntu image have AppArmor enabled while the AWS one does not?
I'll try to disable AppArmor on Azure and let you know. I assume you are referring to this procedure: http://www.techytalk.info/disable-and-remove-apparmor-on-ubuntu-based-linux-distributions/
@chrispokorni, do you have any scripts/tools to verify that AppArmor is blocking some ports/IPs? I'd like to be able to troubleshoot and verify the root cause with some tool rather than simply fix the problem.
@crixo
You can use different tools on your nodes to troubleshoot networking: netcat (nc) or wireshark.
Each cloud provider makes images available for you to choose from, and these images are customized to work best with the provider's infrastructure.
Also, each provider's networking features will work slightly differently.
Regards,
-Chris
@tanwarsatya
You may have a different firewall enabled. The output provided by @crixo for Ubuntu 16 on Azure VMs shows AppArmor as being enabled.
-Chris
@chrispokorni I disabled AppArmor and am still not able to work with the registry. I will try a couple more options to see what may be causing this.
I fully removed AppArmor: https://webhostinggeeks.com/howto/how-to-disable-and-remove-apparmor-on-ubuntu-14-04/
Now "AppArmor enabled" has disappeared from "kubectl describe nodes", but I still get the same timeout error.
On the Azure VM firewall I also added a rule allowing all traffic in and out on both nodes, but the issue remains: the ClusterIPs are reachable only on the node where the pod has been deployed.
ufw is disabled as well:
sudo ufw status verbose
Status: inactive
Hi @chrispokorni,
according to my previous post, all firewalls should be disabled/removed (ufw and AppArmor), including the one provided by Azure in front of the VM (network settings in Azure).
I tried some of the tools you and @serewicz suggested in another post, but I'm lost due to my lack of networking skills... I'm not sure whether I ran the right tools/scripts or whether I'm just unable to evaluate the results.
I'm not sure if the problem is related to a firewall restriction or to some Kubernetes network misconfiguration: could Calico have an issue on Azure VMs? I inspected iptables and saw a lot of settings related to Calico... is there a way/script to check ClusterIPs against the iptables settings?
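One way to check a ClusterIP against the iptables settings is to dump the NAT table on a node with `sudo iptables-save -t nat` and follow kube-proxy's chains: `KUBE-SERVICES` matches the ClusterIP and port and jumps to a `KUBE-SVC-*` chain, which jumps to one or more `KUBE-SEP-*` chains that DNAT to a pod IP:port. The sketch below follows that path. The sample lines are illustrative only: they mimic the shape of kube-proxy's iptables-mode rules, but the chain suffixes are made up (real ones are hashes) and the exact comment text may differ between versions.

```python
import shlex

# Illustrative `iptables-save -t nat` excerpt in the shape kube-proxy writes.
# Hypothetical chain suffixes; real KUBE-SVC-/KUBE-SEP- names are hashes.
SAMPLE = '''\
-A KUBE-SERVICES -d 10.104.96.176/32 -p tcp -m comment --comment "default/registry: cluster IP" -m tcp --dport 5000 -j KUBE-SVC-EXAMPLE1
-A KUBE-SVC-EXAMPLE1 -m comment --comment "default/registry:" -j KUBE-SEP-EXAMPLE2
-A KUBE-SEP-EXAMPLE2 -p tcp -m comment --comment "default/registry:" -m tcp -j DNAT --to-destination 192.168.159.130:5000
'''

WANTED = ("-d", "--dport", "-j", "--to-destination")


def parse_rules(text):
    """Return {chain: [rule options]} for every '-A' line of iptables-save output."""
    chains = {}
    for line in text.splitlines():
        if not line.startswith("-A "):
            continue
        tokens = shlex.split(line)  # handles the quoted --comment values
        opts = {}
        for i, tok in enumerate(tokens):
            if tok in WANTED and i + 1 < len(tokens):
                opts[tok] = tokens[i + 1]
        chains.setdefault(tokens[1], []).append(opts)
    return chains


def dnat_targets(text, cluster_ip, port):
    """Follow KUBE-SERVICES -> KUBE-SVC-* -> KUBE-SEP-* down to DNAT targets."""
    chains = parse_rules(text)
    targets = []
    for rule in chains.get("KUBE-SERVICES", []):
        if rule.get("-d") == f"{cluster_ip}/32" and rule.get("--dport") == str(port):
            for svc_rule in chains.get(rule.get("-j"), []):
                for sep_rule in chains.get(svc_rule.get("-j"), []):
                    if "--to-destination" in sep_rule:
                        targets.append(sep_rule["--to-destination"])
    return targets


if __name__ == "__main__":
    print(dnat_targets(SAMPLE, "10.104.96.176", 5000))
```

To use it on a real node, save the output of `sudo iptables-save -t nat` to a file and pass its contents instead of `SAMPLE`. If the ClusterIP resolves to the same endpoint on both nodes, the iptables side is probably fine, and the problem more likely lies in how the pod traffic travels between the VMs.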
Have you had a chance to try the cluster deployment on Azure? I posted the VM provisioning scripts above. I'd like to understand what the issue is, how to identify it and, of course, how to solve it so I can resume the lab.
Thanks a lot for your support
Hi @crixo ,
Kubernetes networking is not as complex as we may think, but it doesn't fix networking for us either. It relies on a well-configured network infrastructure, so unless our VMs'/nodes' networking is set up properly, Kubernetes will not work as expected.
I have not used Azure yet, and I am not familiar with its networking configuration. In my research, however, I did find many posts about Kubernetes on Azure, with tips on how to fix certain configuration issues.
Is there a networking configuration in Azure similar to a VPC on GCP and AWS? A VPC allows you to create a custom virtual network (not a VPN) in which to run your Kubernetes nodes. This configuration resolves similar networking issues on GCP and AWS.
Regards,
-Chris
I think I have found the issue, and I'm afraid it's not possible to simply create a Kubernetes cluster with a VNet and a couple of nodes.
https://docs.projectcalico.org/v3.3/reference/public-cloud/azure
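The worker's route table earlier in the thread sends master1's pod block (192.168.159.128/26) through `tunl0`, which is Calico's IP-in-IP tunnel: the original packet addressed to the pod is wrapped in a second IPv4 header between the node addresses, carrying IP protocol number 4 instead of TCP (6). The Calico page linked above discusses Azure's networking constraints; if the Azure fabric does not carry this protocol between VMs, the outer packets are dropped even though TCP between the node IPs (SSH, for example) works fine. The sketch below only builds the two headers to show what goes on the wire. The inner source address 192.168.194.128 is an assumption (an address in worker1's pod block), and the inner packet carries no TCP segment.

```python
import socket
import struct

IPPROTO_TCP = 6
IPPROTO_IPIP = 4  # IP-in-IP, the encapsulation behind Calico's tunl0 device


def checksum(header):
    """16-bit one's-complement sum used by the IPv4 header checksum."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src, dst, proto, payload_len):
    """Build a minimal 20-byte IPv4 header (no options, TTL 64)."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,              # version 4, header length 5 x 32-bit words
        0,                 # DSCP/ECN
        20 + payload_len,  # total length in bytes
        0,                 # identification
        0,                 # flags / fragment offset
        64,                # TTL
        proto,             # protocol number
        0,                 # checksum placeholder, filled in below
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )
    csum = struct.pack("!H", checksum(header))
    return header[:10] + csum + header[12:]


def ipip_encapsulate(inner, outer_src, outer_dst):
    """Prefix an inner IPv4 packet with an outer header using protocol 4."""
    return ipv4_header(outer_src, outer_dst, IPPROTO_IPIP, len(inner)) + inner


# Inner packet: worker1 -> registry pod (inner source address is assumed).
inner = ipv4_header("192.168.194.128", "192.168.159.130", IPPROTO_TCP, 0)
# Outer packet: what actually crosses the Azure network, node IP to node IP.
outer = ipip_encapsulate(inner, "10.0.0.5", "10.0.0.4")
print(f"outer protocol byte: {outer[9]}, total length: {len(outer)} bytes")
```

Opening TCP ports in the NSG does not change this picture, because the outer packet is not TCP at all.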
I will still try some more options by installing the Azure CNI plugin; if this doesn't work, I'll move to Google for these labs.
Hi @tanwarsatya,
thanks a lot for your update. I had the same feeling.
I assume you'd like to try the Azure CNI plugin instead of the Calico plugin included in our lab setup script. Please keep me posted on the result.
I was misled by the article I shared in my previous post:
https://www.aaronmsft.com/posts/azure-vmss-kubernetes-kubeadm/
In that article, the setup script is pretty much the same as the one provided by our lab: it uses the same Calico plugin.
I did not fully test the suggested approach, but I assume it would fail as well.
I have the same problem with my Kubernetes cluster running on VMs in Hyper-V. I have tried all the options mentioned above (disabling AppArmor and ufw), but it didn't help. Has anyone managed to resolve this issue? Please help me resolve it.
Hello,
The issue remains the environment's network. If anything is blocking traffic between nodes, you will have issues. While you may have disabled UFW in the instance, you also need to ensure the environment allows all traffic.
From what others have found this is not allowed. You may consider a more accessible environment.
Regards,
Hello Serewic,
I am able to SSH between the master and the worker, and even from the host machine, which indicates that network connectivity/traffic is fine. Are you suggesting I set up the environment in Google Cloud to complete the exercise? I would certainly be glad to set it up in Google Cloud, as long as it works fine there at least. Please advise.
SSH uses port 22; there are many more ports in use. Ensure no ports are blocked between instances.
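A rough way to check for blocked ports between instances, beyond SSH, is to attempt TCP connections from one node to the other and look at how they fail. A refused connection means the packet arrived and the host answered with a reset; a timeout usually means something dropped it silently. This is a minimal Python sketch of what `nc -zv host port` does. The port list is an assumption taken from the kubeadm required-ports list plus TCP 179 for Calico's BGP, and it only tests TCP: it cannot tell you whether the IP-in-IP (protocol 4) traffic used by Calico's `tunl0` gets through.

```python
import socket

# Assumed port list: kubeadm control-plane ports plus Calico BGP (TCP 179).
PORTS = [6443, 2379, 2380, 10250, 10251, 10252, 179]


def probe(host, port, timeout=3.0):
    """Classify one TCP connect attempt as 'open', 'refused' or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all: typical of a firewall that silently drops packets.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but nothing is listening.
        return "refused"
    except OSError as exc:
        # Anything else, e.g. "No route to host".
        return f"error: {exc}"


def scan(host, ports=PORTS, timeout=3.0):
    """Probe each TCP port on `host` and return {port: result}."""
    return {port: probe(host, port, timeout) for port in ports}
```

From worker1 you could run `print(scan("10.0.0.4"))` against master1's InternalIP. Some of these ports may be bound only to localhost on a kubeadm master (the scheduler and controller-manager, for example), so "refused" on those is not necessarily a firewall problem, whereas "timeout" points to traffic being dropped.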
Should you decide to use GCE there is a setup video I would encourage you to follow. I believe the video is in the resources URL mentioned in the class setup information.
Regards,