Connection timeout when connecting from the worker to the local registry running on the master via ClusterIP

I'm stuck at lab 3.1, step 28.
The worker node successfully joined the cluster using kubeadm join, and its status is Ready.
However, I'm not able to connect from the worker to the registry:
curl http://10.107.88.26:5000/v2/ -> curl: (7) Failed to connect to 10.107.88.26 port 5000: Connection timed out
The same command works without problems on the master node.
Are there network utilities/tools/commands I could use to provide you with details for troubleshooting?
Here is the output of "kubectl describe nodes":
Name: k8s-master1
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=k8s-master1
node-role.kubernetes.io/master=
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sat, 05 Jan 2019 14:54:13 +0000
Taints: <none>
Unschedulable: false
Conditions:
___omitted___
ready status. AppArmor enabled
Addresses:
InternalIP: 10.0.0.4
Hostname: k8s-master1
Capacity:
attachable-volumes-azure-disk: 16
cpu: 2
ephemeral-storage: 30428648Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 4040536Ki
pods: 110
Allocatable:
attachable-volumes-azure-disk: 16
cpu: 2
ephemeral-storage: 28043041951
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3938136Ki
pods: 110
System Info:
Machine ID: 39f567ff6e2f4b29bd860fd7228c1322
System UUID: AE723471-2D04-1B41-A684-3FB8B12C8C31
Boot ID: bf187457-5b0e-43ca-83c2-0e200a7572c9
Kernel Version: 4.15.0-1036-azure
OS Image: Ubuntu 16.04.5 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://18.6.1
Kubelet Version: v1.12.1
Kube-Proxy Version: v1.12.1
PodCIDR: 192.168.0.0/24
Non-terminated Pods: (10 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
___omitted___
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 900m (45%) 0 (0%)
memory 70Mi (1%) 170Mi (4%)
attachable-volumes-azure-disk 0 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
___omitted___
Name: k8s-worker1
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=k8s-worker1
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sat, 05 Jan 2019 14:55:07 +0000
Taints: <none>
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
___omitted___
ready status. AppArmor enabled
Addresses:
InternalIP: 10.0.0.5
Hostname: k8s-worker1
Capacity:
attachable-volumes-azure-disk: 16
cpu: 1
ephemeral-storage: 30428648Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 944136Ki
pods: 110
Allocatable:
attachable-volumes-azure-disk: 16
cpu: 1
ephemeral-storage: 28043041951
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 841736Ki
pods: 110
System Info:
Machine ID: 8977d6e96cdd47ef9fd20f9496ab84f2
System UUID: 92429DB1-12B5-3342-8C66-5A5119371B50
Boot ID: 23371f94-ebaa-4186-b27c-7a756f327aa6
Kernel Version: 4.15.0-1036-azure
OS Image: Ubuntu 16.04.5 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://18.6.1
Kubelet Version: v1.12.1
Kube-Proxy Version: v1.12.1
PodCIDR: 192.168.1.0/24
Non-terminated Pods: (4 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
___omitted___
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 350m (35%) 0 (0%)
memory 70Mi (8%) 170Mi (20%)
attachable-volumes-azure-disk 0 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
___omitted___
BTW: both Azure nodes/VMs are on the same VNet, and both have the default inbound rule: priority 65000, name AllowVnetInBound, port Any, protocol Any, source VirtualNetwork, destination VirtualNetwork, action Allow.
ping 10.107.88.26 succeeded
More details
PS: the ping did NOT succeed (I cannot edit my previous comment).
My VM provisioning script:
The script was inspired by this article:
https://www.aaronmsft.com/posts/azure-vmss-kubernetes-kubeadm/
Network settings in Azure:
https://docs.microsoft.com/en-us/azure/virtual-network/security-overview#denyallinbound
Do you think I need to add rules besides the default one? I'm not sure whether ClusterIPs require extra rules or whether the VNet/subnet settings should be enough.
Thanks a lot for your support, and apologies for my poor networking skills.
The only ClusterIP reachable from the worker1 node is service/kubernetes.
The following curl request:
curl --insecure https://10.96.0.1/api
succeeds on both nodes.
Same for endpoints: the only one working on both nodes is endpoints/kubernetes.
On the master1 node, all endpoints and ClusterIPs work as expected.
ClusterIPs should work on all nodes, regardless of where the pod(s) are running. Does the same rule apply to endpoints as well?
I mean, should endpoints be reachable on all nodes?
Below is the output of "route -n" on both nodes. Should the endpoint IPs be included on both nodes, or is the current output correct?
Where is the mapping between ClusterIPs and endpoints stored?
$ kubectl get pods,svc,pvc,pv,deploy,endpoints
$ curl http://10.100.134.88 -> OK
$ curl http://10.100.134.88 -> Timeout
$ route -n
$ route -n
Hi @crixo,
From your first output, it seems that the firewall (AppArmor) is enabled on each node, and it may be blocking traffic to some of the ports, which would explain why the curls and pings are timing out.
See the instructions in the Overview section of Lab 2.1.
Regards,
-Chris
This is the exact error I am facing: I'm not able to reach the registry from the worker node. My cluster is in Azure, and the virtual network has traffic enabled on all ports within the cluster (1 master, 2 nodes).
Hi @tanwarsatya,
Also make sure the firewalls at OS level are disabled, as they may be blocking some traffic.
Regards,
-Chris
ufw status shows inactive on both the master and worker nodes.
Hi @chrispokorni,
If I understand correctly, AppArmor is a sort of kernel-level firewall, while the network settings in Azure are something on top of it at the Azure platform level.
I tried on AWS as well and did not have these issues. The Azure and AWS VMs are both based on Ubuntu 16.04; does that mean they have different kernel settings? I mean, does the Azure Ubuntu image have AppArmor enabled while the AWS one does not?
I'll try disabling AppArmor on Azure and let you know. I assume you are referring to this procedure: http://www.techytalk.info/disable-and-remove-apparmor-on-ubuntu-based-linux-distributions/
@chrispokorni, do you have any scripts/tools to verify that AppArmor is blocking some ports/IPs? I'd like to be able to troubleshoot and verify the root cause of the problem with some tool, rather than simply fix it.
@crixo
You can use various tools on your nodes to troubleshoot networking, such as netcat (nc) or Wireshark.
Each cloud provider makes images available for you to choose from, and these images are customized to work best with the provider's infrastructure.
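For example, a minimal netcat check like the one below could be run on both nodes and the results compared. It is only a sketch: it assumes the OpenBSD netcat that Ubuntu 16.04 ships as nc (which supports -z and -w), and the helper name probe is made up for illustration.

```shell
#!/usr/bin/env bash
# probe is a hypothetical helper: report whether HOST:PORT accepts a TCP connection.
# Assumes OpenBSD netcat (Ubuntu's default nc), which supports -z and -w.
probe() {
  local host="$1" port="$2"
  # -z: only test that the port accepts a connection, send no data
  # -w 3: give up after 3 seconds instead of waiting for the kernel's TCP timeout
  if nc -z -w 3 "$host" "$port" 2>/dev/null; then
    echo "${host}:${port} open"
  else
    echo "${host}:${port} unreachable"
  fi
}

# Run on both master and worker and compare (addresses taken from this thread):
#   probe 10.107.88.26 5000   # registry ClusterIP
#   probe 10.96.0.1 443       # kubernetes service ClusterIP
```

If the registry reports open on the master but unreachable on the worker, while the kubernetes service is open on both, that matches the symptom described earlier in the thread.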
Also, each provider's networking features will work slightly differently.
Regards,
-Chris
@tanwarsatya
You may have a different firewall enabled. The output provided by @crixo for Ubuntu 16 on Azure VMs shows AppArmor as being enabled.
-Chris
@chrispokorni I disabled AppArmor and am still not able to work with the registry. I will try a couple more options to see what may be causing this.
I fully removed AppArmor: https://webhostinggeeks.com/howto/how-to-disable-and-remove-apparmor-on-ubuntu-14-04/
Now "AppArmor enabled" has disappeared from "kubectl describe nodes", but I still get the same timeout error.
On the Azure VM firewall I also added a rule to allow all traffic in and out on both nodes, but the issue is the same: the ClusterIPs are reachable only from the node where the pod has been deployed.
ufw is disabled as well.
Hi @chrispokorni,
As per my previous post, all firewalls should be disabled or removed (ufw and AppArmor), as well as the one provided by Azure in front of the VM (network settings in Azure).
I tried to use some of the tools you and @serewicz suggested to me in another post, but I'm lost due to my lack of networking skills: I'm not sure I ran the right tools/scripts, or maybe I'm not able to evaluate the results.
I'm not sure whether the problem is related to a firewall restriction or to some k8s network misconfiguration: could Calico have an issue on Azure VMs? I tried to inspect iptables and saw a lot of settings related to Calico. Is there a way/script to check ClusterIPs against the iptables settings?
Do you have a chance to try the cluster deployment on Azure? I posted the VM provisioning scripts above. I'd like to understand what the issue is, how to identify it, and of course how to solve it so I can resume the lab.
Thanks a lot for your support
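One way to check a ClusterIP against the iptables settings is to follow kube-proxy's NAT chains (KUBE-SERVICES -> KUBE-SVC-* -> KUBE-SEP-*) down to the DNAT rule that holds the real pod address. The sketch below assumes kube-proxy runs in its default iptables mode (as with v1.12 here); the function name clusterip_to_endpoints is hypothetical.

```shell
#!/usr/bin/env bash
# clusterip_to_endpoints is a hypothetical helper: print the pod IP:port(s) that
# kube-proxy (iptables mode) DNATs a given ClusterIP to on this node.
# Reads `iptables-save -t nat` output on stdin.
clusterip_to_endpoints() {
  local ip="$1" rules svc sep
  rules="$(cat)"
  # KUBE-SERVICES matches the ClusterIP as "-d <ip>/32" and jumps to a KUBE-SVC-* chain
  for svc in $(printf '%s\n' "$rules" | grep -- '^-A KUBE-SERVICES ' \
      | grep -F -- "-d ${ip}/32 " | grep -o 'KUBE-SVC-[A-Z0-9]*' | sort -u); do
    # each KUBE-SVC-* chain jumps to one KUBE-SEP-* chain per endpoint
    for sep in $(printf '%s\n' "$rules" | grep -- "^-A ${svc} " \
        | grep -o 'KUBE-SEP-[A-Z0-9]*' | sort -u); do
      # the KUBE-SEP-* chain holds the DNAT rule with the real pod address
      printf '%s\n' "$rules" | grep -- "^-A ${sep} " \
        | grep -o -- '--to-destination [0-9.:]*' | awk '{ print $2 }'
    done
  done
}

# Usage on each node:
#   sudo iptables-save -t nat | clusterip_to_endpoints 10.107.88.26
```

If both nodes print the same pod IP:port, kube-proxy has programmed the mapping on every node, and the timeout is more likely in the pod-to-pod path between nodes (the CNI plugin or the cloud network) than in the service mapping itself.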
Hi @crixo,
Kubernetes' networking is not as complex as we may think, and it doesn't fix networking for us either. It relies on well-configured network infrastructure. So unless our VMs'/nodes' networking is set up properly, Kubernetes will not work as expected.
I have not used Azure yet, and I am not familiar with its networking configurations. In my research, however, I did find lots of posts about Kubernetes on Azure, with tips on how to fix certain configuration issues.
Is there a networking configuration in Azure similar to a VPC on GCP and AWS? It allows you to create a custom virtual network (not a VPN) in which to run your nodes for Kubernetes. This configuration resolves similar networking issues on GCP and AWS.
Regards,
-Chris
I think I have found the issue, and I'm afraid it's not possible to simply create a Kubernetes cluster with a VNet and a couple of nodes.
https://docs.projectcalico.org/v3.3/reference/public-cloud/azure
I will still try some more options by installing the Azure CNI plugin; if this doesn't work, I am moving to Google for these labs.
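To see whether cross-node pod traffic on these nodes goes through Calico's IP-in-IP tunnel device (tunl0), which the linked Calico page indicates Azure does not support, the "route -n" output can be filtered. A minimal sketch: the function name ipip_routes is hypothetical, and the column positions assume the standard net-tools "route -n" layout.

```shell
#!/usr/bin/env bash
# ipip_routes is a hypothetical helper: list routes that use Calico's IP-in-IP device.
# Reads `route -n` output on stdin; columns are
# Destination Gateway Genmask Flags Metric Ref Use Iface.
ipip_routes() {
  # NR > 2 skips the two header lines; $8 is the Iface column
  awk 'NR > 2 && $8 == "tunl0" { print $1 "/" $3 " via " $2 }'
}

# Usage on each node:
#   route -n | ipip_routes
```

A route to the other node's pod block via tunl0 means pod traffic between the nodes is encapsulated as IP-in-IP (IP protocol 4), so it depends on the underlying network forwarding that protocol.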
Hi @tanwarsatya,
Thanks a lot for your update. I had the same feeling.
I assume you'd like to try the Azure CNI plug-in instead of the Calico plugin included in our lab setup script. Please keep me posted on the result.
I was misled by the article I shared in my previous post:
https://www.aaronmsft.com/posts/azure-vmss-kubernetes-kubeadm/
In that article, the setup script is pretty much the same as the one provided by our lab: it uses the same Calico plugin.
I did not fully test the suggested approach, but I assume it would fail as well.