Welcome to the Linux Foundation Forum!

Unable to start pod/container in lab 2.3 - Error message is "Error from server (BadRequest) .."

Hi,

I have completed lab 2.2 to set up the Kubernetes cluster. When I try to create a pod/container in lab 2.3, I keep seeing this error:

Error from server (BadRequest): container "nginx" in pod "nginx" is waiting to start: ContainerCreating

When I run kubectl describe pod nginx, I see these errors:

.....
Warning FailedCreatePodSandBox 95s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to mount container k8s_POD_nginx_default_e1106b28-303c-4e75-afc2-d6d14bd67913_0 in pod sandbox k8s_nginx_default_e1106b28-303c-4e75-afc2-d6d14bd67913_0(507780f27bf6a769b6e7178ebe52a032e8967f2af9d720f1931933e0a202c917): error creating overlay mount to /var/lib/containers/storage/overlay/0c9cccedaee7f6a42d1546dc06d3100072fb4ac860040aeb7b58d85d3e39c9ac/merged, mount_data="nodev,metacopy=on,lowerdir=/var/lib/containers/storage/overlay/l/4NMXH7DOMDNBJNKEOKA65YCBZS,upperdir=/var/lib/containers/storage/overlay/0c9cccedaee7f6a42d1546dc06d3100072fb4ac860040aeb7b58d85d3e39c9ac/diff,workdir=/var/lib/containers/storage/overlay/0c9cccedaee7f6a42d1546dc06d3100072fb4ac860040aeb7b58d85d3e39c9ac/work": invalid argument
.....

What am I doing wrong?

Thanks

TW

Best Answers

  • serewicz

    Hello,

    I notice you have AppArmor enabled. That could be the cause of some headaches. Does the problem persist when you disable it?

    As all the failed pods are on your worker, I would suspect either AppArmor, a GCE VPC firewall issue, or a networking issue where the nodes are using IP addresses that overlap with the host's.

    Could you disable AppArmor on all nodes, ensure your VPC allows all traffic, show the IP ranges used by the primary interface (something like ens4) on both nodes, and then share the results?
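    To collect the interface ranges requested above, here is a small sketch. It assumes the iproute2 ip tool is installed and that the interface is named ens4, as in the GCE example; the function name primary_ipv4_cidr is hypothetical. It prints the address and prefix length configured on the interface.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IPv4 address/prefix configured on an interface.
# Assumes iproute2's "ip" tool; the default name ens4 is an assumption based on
# the GCE example in this thread.
primary_ipv4_cidr() {
  local dev=${1:-ens4}
  # With -o, each address is printed on a single line, for example
  # "2: ens4    inet 10.2.0.5/32 brd ... scope global ...", so field 4 is the CIDR.
  ip -o -4 addr show dev "$dev" | awk '{print $4}'
}

# Usage, on each node:
#   primary_ipv4_cidr ens4
```

    Running it on both nodes gives the values to compare against the pod network range used by Calico.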

    Regards,

  • tanwee

    Hi,

    Thanks for your help. I recreated the VMs in GCE and it is working now. I must have misconfigured the VPC the first time. I managed to pass the exam after going through the labs.

    Rgds

    TW

Answers

  • Hi @tanwee,

    Is this a generic symptom observed on multiple pods or just this one? Please provide the output of the following command:

    kubectl get pods -A -o wide

    Regards,
    -Chris

  • Hi Chris,

    This error is occurring for any pod that I try to create on the cluster. The output of the command is:

    NAMESPACE     NAME                                       READY   STATUS              RESTARTS   AGE     IP               NODE     NOMINATED NODE   READINESS GATES
    default       nginx                                      0/1     ContainerCreating   0          52s                      worker
    kube-system   calico-kube-controllers-5d995d45d6-6mk6b   1/1     Running             1          2d23h   192.168.242.65   cp
    kube-system   calico-node-s824n                          0/1     Init:0/3            0          2d23h   10.2.0.5         worker
    kube-system   calico-node-zkxrn                          1/1     Running             1          2d23h   10.2.0.4         cp
    kube-system   coredns-78fcd69978-4svtg                   1/1     Running             1          2d23h   192.168.242.66   cp
    kube-system   coredns-78fcd69978-m4nsp                   1/1     Running             1          2d23h   192.168.242.67   cp
    kube-system   etcd-cp                                    1/1     Running             1          2d23h   10.2.0.4         cp
    kube-system   kube-apiserver-cp                          1/1     Running             1          2d23h   10.2.0.4         cp
    kube-system   kube-controller-manager-cp                 1/1     Running             1          2d23h   10.2.0.4         cp
    kube-system   kube-proxy-fn5xm                           0/1     ContainerCreating   0          2d23h   10.2.0.5         worker
    kube-system   kube-proxy-trxxb                           1/1     Running             1          2d23h   10.2.0.4         cp
    kube-system   kube-scheduler-cp                          1/1     Running             1          2d23h   10.2.0.4         cp

    Rgds

    Tan Wee

  • serewicz

    Hello,

    From the look of things, Calico is not running on your worker. There are a few reasons this could happen, but chances are it has to do with a networking configuration error or a firewall between your instances.
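    One way to confirm which pods are stuck, and on which node, is to filter the pod listing for entries whose READY count is below the total. This is a sketch that assumes the default column layout of kubectl get pods -A -o wide, where NODE is followed by the NOMINATED NODE and READINESS GATES columns (usually <none>), so it is the third-from-last field; the function name not_ready_pods is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "kubectl get pods -A -o wide" output on stdin and
# print namespace, pod, status and node for every pod that is not fully ready.
# Assumption: NODE is the third-from-last column (NOMINATED NODE and
# READINESS GATES follow it, normally as "<none> <none>").
not_ready_pods() {
  awk 'NR > 1 {
    n = split($3, r, "/")          # READY looks like "ready/total", e.g. "0/1"
    if (n == 2 && r[1] != r[2])    # fewer containers ready than declared
      print $1, $2, $4, $(NF-2)    # namespace, pod, status, node
  }'
}

# Usage against a live cluster:
#   kubectl get pods -A -o wide | not_ready_pods
```

    Reading NODE from the end of the line, rather than by position, keeps the sketch working if the RESTARTS column contains extra words such as a "(5m ago)" suffix.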

    What are you using to run the lab exercises: GCE, AWS, DigitalOcean, VMware, VirtualBox, or two Linux laptops?

    Regards,

  • Hi,

    I created the two VMs in GCE, following the GCE lab setup video.

    Thanks

    TW

  • Hi @tanwee,

    Thank you for the provided output. It seems that none of the containers scheduled to the worker node are able to start. The node itself may not be ready.
    What may help are the outputs of the following two commands:

    kubectl get nodes

    kubectl describe node worker

    Regards,
    -Chris

  • serewicz

    I would also double-check that the VPC is allowing all traffic between your VMs.

    Regards,

  • Hi,

    This is the output of kubectl describe node worker. You can see the last message:

    Node worker status is now: NodeReady

    Rgds

    TW


    Name: worker
    Roles:
    Labels: beta.kubernetes.io/arch=amd64
            beta.kubernetes.io/os=linux
            kubernetes.io/arch=amd64
            kubernetes.io/hostname=worker
            kubernetes.io/os=linux
    Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/crio/crio.sock
                 node.alpha.kubernetes.io/ttl: 0
                 volumes.kubernetes.io/controller-managed-attach-detach: true
    CreationTimestamp: Sat, 20 Nov 2021 12:45:36 +0000
    Taints:
    Unschedulable: false
    Lease:
      HolderIdentity: worker
      AcquireTime:
      RenewTime: Wed, 24 Nov 2021 12:52:02 +0000
    Conditions:
      Type Status LastHeartbeatTime LastTransitionTime Reason Message
      ---- ------ ----------------- ------------------ ------ -------
      MemoryPressure False Wed, 24 Nov 2021 12:48:48 +0000 Wed, 24 Nov 2021 12:48:38 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
      DiskPressure False Wed, 24 Nov 2021 12:48:48 +0000 Wed, 24 Nov 2021 12:48:38 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
      PIDPressure False Wed, 24 Nov 2021 12:48:48 +0000 Wed, 24 Nov 2021 12:48:38 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
      Ready True Wed, 24 Nov 2021 12:48:48 +0000 Wed, 24 Nov 2021 12:48:48 +0000 KubeletReady kubelet is posting ready status. AppArmor enabled
    Addresses:
      InternalIP: 10.2.0.5
      Hostname: worker
    Capacity:
      cpu: 2
      ephemeral-storage: 20145724Ki
      hugepages-1Gi: 0
      hugepages-2Mi: 0
      memory: 7977Mi
      pods: 110
    Allocatable:
      cpu: 2
      ephemeral-storage: 18566299208
      hugepages-1Gi: 0
      hugepages-2Mi: 0
      memory: 7877Mi
      pods: 110
    System Info:
      Machine ID: 9df579dd8f5e6ed7ca105568417ac070
      System UUID: 9DF579DD-8F5E-6ED7-CA10-5568417AC070
      Boot ID: e3e3b6a9-26d9-4c48-bd0a-c0233a067a5c
      Kernel Version: 4.15.0-1006-gcp
      OS Image: Ubuntu 18.04 LTS
      Operating System: linux
      Architecture: amd64
      Container Runtime Version: cri-o://1.22.1
      Kubelet Version: v1.22.1
      Kube-Proxy Version: v1.22.1
    PodCIDR: 192.168.1.0/24
    PodCIDRs: 192.168.1.0/24
    Non-terminated Pods: (3 in total)
      Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
      --------- ---- ------------ ---------- --------------- ------------- ---
      default nginx 0 (0%) 0 (0%) 0 (0%) 0 (0%) 24h
      kube-system calico-node-s824n 250m (12%) 0 (0%) 0 (0%) 0 (0%) 4d
      kube-system kube-proxy-fn5xm 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4d
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource Requests Limits
      -------- -------- ------
      cpu 250m (12%) 0 (0%)
      memory 0 (0%) 0 (0%)
      ephemeral-storage 0 (0%) 0 (0%)
      hugepages-1Gi 0 (0%) 0 (0%)
      hugepages-2Mi 0 (0%) 0 (0%)
    Events:
      Type Reason Age From Message
      ---- ------ ---- ---- -------
      Normal NodeHasSufficientMemory 4d (x2 over 4d) kubelet Node worker status is now: NodeHasSufficientMemory
      Normal NodeHasNoDiskPressure 4d (x2 over 4d) kubelet Node worker status is now: NodeHasNoDiskPressure
      Normal NodeHasSufficientPID 4d (x2 over 4d) kubelet Node worker status is now: NodeHasSufficientPID
      Normal NodeAllocatableEnforced 4d kubelet Updated Node Allocatable limit across pods
      Normal Starting 4d kubelet Starting kubelet.
      Normal NodeReady 4d kubelet Node worker status is now: NodeReady
      Normal NodeAllocatableEnforced 24h kubelet Updated Node Allocatable limit across pods
      Normal Starting 24h kubelet Starting kubelet.
      Normal NodeHasSufficientMemory 24h (x2 over 24h) kubelet Node worker status is now: NodeHasSufficientMemory
      Normal NodeHasNoDiskPressure 24h (x2 over 24h) kubelet Node worker status is now: NodeHasNoDiskPressure
      Normal NodeHasSufficientPID 24h (x2 over 24h) kubelet Node worker status is now: NodeHasSufficientPID
      Warning Rebooted 24h kubelet Node worker has been rebooted, boot id: 603e4fdf-67f1-4f49-8986-5e9f749a6d95
      Normal NodeNotReady 24h kubelet Node worker status is now: NodeNotReady
      Normal NodeReady 24h kubelet Node worker status is now: NodeReady
      Normal Starting 3m27s kubelet Starting kubelet.
      Normal NodeHasSufficientMemory 3m26s (x2 over 3m26s) kubelet Node worker status is now: NodeHasSufficientMemory
      Normal NodeHasNoDiskPressure 3m26s (x2 over 3m26s) kubelet Node worker status is now: NodeHasNoDiskPressure
      Normal NodeHasSufficientPID 3m26s (x2 over 3m26s) kubelet Node worker status is now: NodeHasSufficientPID
      Warning Rebooted 3m26s kubelet Node worker has been rebooted, boot id: e3e3b6a9-26d9-4c48-bd0a-c0233a067a5c
      Normal NodeNotReady 3m26s kubelet Node worker status is now: NodeNotReady
      Normal NodeAllocatableEnforced 3m26s kubelet Updated Node Allocatable limit across pods
      Normal NodeReady 3m16s kubelet Node worker status is now: NodeReady

  • Glad it all worked out and congratulations on passing the exam @tanwee!

    Regards,
    -Chris

  • Hello, I have the same problem using GCE:

    kubectl get pods -A -o wide

  • Any advice on how I can debug why that calico-node-82fl pod is not running?
    I checked the firewall and it has an Ingress allow-all rule, but I am not sure how to debug this further.

  • serewicz

    Hello,

    When you say you added an Ingress allow all, are you talking about the GCE VPC? Be sure to allow all traffic between nodes from the Google perspective.

    Other than that, what IP ranges did you choose? Did you assign 192.168 addresses to your nodes by any chance?
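    The overlap question can be checked directly by comparing the node subnet with the pod network. The sketch below is plain bash; the function names are hypothetical, and it assumes the pod network is 192.168.0.0/16, which is consistent with the 192.168.x.x pod addresses and PodCIDR values earlier in this thread. Two CIDR ranges overlap exactly when they agree on the bits of the shorter prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing IPv4 CIDR ranges in pure bash.
# Assumes well-formed "a.b.c.d/len" input without leading zeros in octets.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeeds (exit status 0) when CIDR $1 and CIDR $2 share at least one address.
cidr_overlap() {
  local a_ip=${1%/*} a_len=${1#*/} b_ip=${2%/*} b_len=${2#*/}
  # Compare both addresses under the mask of the shorter (wider) prefix.
  local len=$(( a_len < b_len ? a_len : b_len ))
  local mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$a_ip") & mask) == ($(ip_to_int "$b_ip") & mask) ))
}

# Usage:
#   cidr_overlap 10.2.0.0/16 192.168.0.0/16 && echo overlap || echo "no overlap"
```

    For example, cidr_overlap 10.2.0.0/16 192.168.0.0/16 fails (no overlap), while cidr_overlap 192.168.0.0/24 192.168.0.0/16 succeeds, which is the situation to avoid when giving nodes 192.168 addresses.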

    Any deviation from the lab setup and exercise?

    Regards,

  • Yes, I created a new GCE VPC network with a 10.2.0.0/16 subnet, added a firewall rule allowing ingress on all ports, and made sure my instances used that network.
    I am able to ping one node from the other. It feels like a network issue, but lacking the experience, I am not able to find the root cause. The current GCE UI is a little different from the one in the lab setup video, but I tried to follow it as closely as possible.

    I am now trying this guide: https://projectcalico.docs.tigera.io/getting-started/kubernetes/self-managed-public-cloud/gce. It uses a slightly different network setup. I will let you know whether it works.

  • Using that guide and the gcloud SDK, I was able to create a VPC network and two instances, and then ran the provisioning scripts from the LFD259 solutions. Calico now starts successfully:

  • Hi @sgurenkov,

    The guide from tigera.io/project-calico/ may still be using Docker as the container runtime, which differs from the CRI-O runtime recommended for this class. Docker will soon no longer be supported as a runtime for Kubernetes, so the labs have migrated to a different container runtime.

    The 10.8x.0.0 network connecting the calico-kube-controllers and the coredns pods was most likely initiated by Podman. During the init phase, the pods were somehow assigned IP addresses from Podman's bridge network instead of being exposed over their respective nodes' IP addresses as expected.

    A simple delete of these pods typically allows for the IP address assignment to be corrected.
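    Deleting the pods lets their Deployments recreate them with fresh addresses. A minimal sketch, assuming the label selectors k8s-app=calico-kube-controllers and k8s-app=kube-dns used by the standard Calico manifest and by kubeadm's CoreDNS Deployment (verify them with kubectl get pods -n kube-system --show-labels before relying on them); the function name recreate_network_pods is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete the calico-kube-controllers and CoreDNS pods so
# that their Deployments recreate them and new IP addresses are assigned.
# ASSUMPTION: the label selectors below match your manifests.
recreate_network_pods() {
  local selector
  for selector in k8s-app=calico-kube-controllers k8s-app=kube-dns; do
    kubectl -n kube-system delete pod -l "$selector" || return 1
  done
}

# Afterwards, check that the new pods received the expected addresses:
#   kubectl -n kube-system get pods -o wide
```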

    Regards,
    -Chris

  • Hi everyone, I am also facing the same issue. Can you please help?

    I am using two VirtualBox VMs connected with a NAT network. AppArmor is uninstalled on both instances. All installation steps and requirements are exactly as in the lab exercises. I removed the VMs and started fresh multiple times.

    Thanks in advance!

    Result for kubectl get nodes:

    Result for kubectl get pods -A -o wide:

    Result for kubectl describe node wn1:

    Name: wn1
    Roles:
    Labels: beta.kubernetes.io/arch=amd64
            beta.kubernetes.io/os=linux
            kubernetes.io/arch=amd64
            kubernetes.io/hostname=wn1
            kubernetes.io/os=linux
    Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/crio/crio.sock
                 node.alpha.kubernetes.io/ttl: 0
                 volumes.kubernetes.io/controller-managed-attach-detach: true
    CreationTimestamp: Thu, 13 Jan 2022 15:56:16 +0000
    Taints:
    Unschedulable: false
    Lease:
      HolderIdentity: wn1
      AcquireTime:
      RenewTime: Thu, 13 Jan 2022 16:32:22 +0000
    Conditions:
      Type Status LastHeartbeatTime LastTransitionTime Reason Message
      ---- ------ ----------------- ------------------ ------ -------
      MemoryPressure False Thu, 13 Jan 2022 16:31:59 +0000 Thu, 13 Jan 2022 15:56:16 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
      DiskPressure False Thu, 13 Jan 2022 16:31:59 +0000 Thu, 13 Jan 2022 15:56:16 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
      PIDPressure False Thu, 13 Jan 2022 16:31:59 +0000 Thu, 13 Jan 2022 15:56:16 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
      Ready True Thu, 13 Jan 2022 16:31:59 +0000 Thu, 13 Jan 2022 15:56:26 +0000 KubeletReady kubelet is posting ready status. AppArmor enabled
    Addresses:
      InternalIP: 10.0.2.5
      Hostname: wn1
    Capacity:
      cpu: 2
      ephemeral-storage: 25668836Ki
      hugepages-2Mi: 0
      memory: 4039160Ki
      pods: 110
    Allocatable:
      cpu: 2
      ephemeral-storage: 23656399219
      hugepages-2Mi: 0
      memory: 3936760Ki
      pods: 110
    System Info:
      Machine ID: 05291b4c92144126979989eab08c9a58
      System UUID: 7CDF5996-062F-4049-B6C0-087A3C62288F
      Boot ID: a43595d1-f2b7-40dc-bef2-3063db315ff0
      Kernel Version: 4.15.0-166-generic
      OS Image: Ubuntu 18.04.6 LTS
      Operating System: linux
      Architecture: amd64
      Container Runtime Version: cri-o://1.22.1
      Kubelet Version: v1.22.1
      Kube-Proxy Version: v1.22.1
    PodCIDR: 192.168.1.0/24
    PodCIDRs: 192.168.1.0/24
    Non-terminated Pods: (2 in total)
      Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
      --------- ---- ------------ ---------- --------------- ------------- ---
      kube-system calico-node-zph59 250m (12%) 0 (0%) 0 (0%) 0 (0%) 36m
      kube-system kube-proxy-82vv9 0 (0%) 0 (0%) 0 (0%) 0 (0%) 36m
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource Requests Limits
      -------- -------- ------
      cpu 250m (12%) 0 (0%)
      memory 0 (0%) 0 (0%)
      ephemeral-storage 0 (0%) 0 (0%)
      hugepages-2Mi 0 (0%) 0 (0%)
    Events:
      Type Reason Age From Message
      ---- ------ ---- ---- -------
      Normal Starting 36m kubelet Starting kubelet.
      Normal NodeHasSufficientMemory 36m (x2 over 36m) kubelet Node wn1 status is now: NodeHasSufficientMemory
      Normal NodeHasNoDiskPressure 36m (x2 over 36m) kubelet Node wn1 status is now: NodeHasNoDiskPressure
      Normal NodeHasSufficientPID 36m (x2 over 36m) kubelet Node wn1 status is now: NodeHasSufficientPID
      Normal NodeAllocatableEnforced 36m kubelet Updated Node Allocatable limit across pods
      Normal NodeReady 35m kubelet Node wn1 status is now: NodeReady

    Last lines of the output of kubectl describe pod calico-node-zph59 --namespace=kube-system:

    Warning FailedCreatePodSandBox 2m5s (x151 over 35m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to mount container k8s_POD_calico-node-zph59_kube-system_e8d117b7-aa0a-432f-8376-fe45ce85a4fe_0 in pod sandbox k8s_calico-node-zph59_kube-system_e8d117b7-aa0a-432f-8376-fe45ce85a4fe_0(d7ebf9a863e5dfa8018a6e1a233f0726a4ec9f16c73596e6d183f29313f3cd0c): error creating overlay mount to /var/lib/containers/storage/overlay/71684365cde8ce52493971416573fd46038082aaa807b64f55690d1744e53a78/merged, mount_data="nodev,metacopy=on,lowerdir=/var/lib/containers/storage/overlay/l/N6RBQYMDVUFBOBV74KB3GTYZF7,upperdir=/var/lib/containers/storage/overlay/71684365cde8ce52493971416573fd46038082aaa807b64f55690d1744e53a78/diff,workdir=/var/lib/containers/storage/overlay/71684365cde8ce52493971416573fd46038082aaa807b64f55690d1744e53a78/work": invalid argument

  • Hi everyone, I solved the issue by adding

    sudo sed -i 's/,metacopy=on//g' /etc/containers/storage.conf

    to k8sSecond.sh after the line sudo apt-get install -y cri-o cri-o-runc podman buildah

    This was an issue related to Ubuntu 18.04.
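    The sed fix can be tried on a scratch file before touching the real configuration. This sketch assumes GNU sed (as on Ubuntu) and wraps the same sed expression in a hypothetical strip_metacopy function that also keeps a .bak copy. The sample mountopt line mirrors the mount_data shown in the error above ("nodev,metacopy=on"); the real storage.conf may lay it out differently. On a node where CRI-O is already installed, the crio service would presumably need a restart (sudo systemctl restart crio) to pick up the change.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the sed fix from the post to a given file,
# keeping a backup copy next to it. Assumes GNU sed ("-i" without a suffix).
strip_metacopy() {
  local conf=$1
  cp "$conf" "$conf.bak"                 # keep the original for comparison
  sed -i 's/,metacopy=on//g' "$conf"     # same expression as in the post
}

# Demo on a scratch file that mimics the mount options from the error.
demo=$(mktemp)
printf '%s\n' '[storage.options.overlay]' 'mountopt = "nodev,metacopy=on"' > "$demo"
strip_metacopy "$demo"
grep '^mountopt' "$demo"                 # prints: mountopt = "nodev"
rm -f "$demo" "$demo.bak"
```

    On the real node, the equivalent would be running strip_metacopy /etc/containers/storage.conf with sufficient privileges.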

  • chrispokorni

    Hi @elmoussaoui,

    What type of VMs are you using, and which Ubuntu 18.04 edition (desktop or server) did you install?

    Regards,
    -Chris

  • Hi @chrispokorni,

    Two VirtualBox VMs connected with a NAT network, running ubuntu-18.04.6-live-server-amd64.

    Regards,

  • chrispokorni

    Hi @elmoussaoui,

    This issue seems to occur on local installs, while cloud Ubuntu 18.04 server images do not show the same behavior.

    Regards,
    -Chris
