
Lab 3.2. Deploy simpleapp + error on workernode

Fafa0803 Posts: 11
edited September 2020 in LFD259 Class Forum

Hi,

I am stuck on Lab 3.2, from step 29 to step 35.

What I have done so far:

  • I have pushed the simpleapp image to the local registry.
  • I created the daemon.json file with the line { "insecure-registries":["IP-address:5000"] } (see the sketch after this list).
  • I restarted the local registry
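
For reference, the file looks roughly like this with the real address filled in (assuming 10.103.255.115:5000 is the registry endpoint, as in the pull attempt below), and Docker needs a restart to pick up the change:

{ "insecure-registries":["10.103.255.115:5000"] }

sudo systemctl restart docker.service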

My Problem is:
1) I cannot pull the image from the worker node.
2) I cannot deploy the whole application (6 pods / 6 replicas) from the master node. Three pods (I assume the ones scheduled to the worker node) are not starting, and I get an error message for these pods.

Error Messages for:
1) sudo docker pull 10.103.255.115:5000/tagtest
Using default tag: latest
Error response from daemon: Get https://10.103.255.115:5000/v2/: http: server gave HTTP response to HTTPS client
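
From what I found online, this error usually means the Docker daemon on that node has not picked up the insecure-registries entry. As a generic check (not a lab step), the daemon's effective setting can be inspected with:

sudo docker info | grep -A2 "Insecure Registries"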

2) Please have a look at the screenshot:

I assumed that something was wrong with the worker node, but as you can see, the worker node is joined to the K8s cluster.

I am using the AWS cloud and configured the two virtual machines as described in the video.
I have no clue how to fix this problem.

Could you help please?
Thank you very much.

Comments

  • Hi @Fafa0803,

    In step 30, what image are you pulling from the private registry? From your notes, you are attempting to pull the tagtest image. According to step 30, you should be pulling the simpleapp image.

    Also, what image are you using for the try1 deployment in step 31?

    As an overall verification, try to run step 21 from both nodes, your ckad1 and ckad2 (assuming that 10.103.255.115 is the ClusterIP of your registry service):

    curl http://10.103.255.115:5000/v2/

    It may have been successful from ckad1, but I would like to see the output it produces when attempted from ckad2.
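
    For reference, a healthy registry answers that curl with an empty JSON body:

    {}

    Anything else (connection refused, a timeout, or the HTTP-vs-HTTPS error from your pull attempt) helps narrow down where the problem is.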

    Regards,
    -Chris

  • serewicz Posts: 1,000

    Hello,

    As well as what Chris suggested, if you run kubectl get pods -o wide and find that all the failed pods are on the worker, this usually indicates the steps to set up the registry on the worker were skipped or done incorrectly. Make sure the /etc/docker/daemon.json file on the worker is the same as on the master, and remember you have two services to restart.
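
    For example (the Docker restart shown here is generic; the lab lists the exact services to restart):

    kubectl get pods -o wide

    # on each node:
    cat /etc/docker/daemon.json
    sudo systemctl restart docker.service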

    Regards,

  • Hi,

    Thank you for your quick response :).

    1) I am sorry. This was my mistake. It works with the simpleapp.

    2) For step 31 I used simpleapp - it's saved in my command line. But I tried a second deployment with the name try2, and it also does not work:

    The command curl http://10.103.255.115:5000/v2/ produces the following results:

    Masternode:

    Workernode:

    Thank you for help.

    Best regards,
    Fabian

  • Fafa0803 Posts: 11
    edited September 2020

    I just saw the comment from serewicz. Thank you. I will try this also.

    Update:
    I checked the daemon.json on the master and worker node. Both were identical, and I also restarted the service.
    I still cannot scale the application to 6 replicas. I still get the errors for the 3 pods.

    Best regards,
    Fabian

  • Fafa0803 Posts: 11
    edited September 2020

    Good morning,

    To further test the issue, I created a new deployment file, which I found in the Kubernetes documentation: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/ , and I changed the replicas to 6.

    When I created a new deployment using this file, all pods were deployed across both the master and worker nodes.
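
    For reference, the manifest I used is essentially the example from that page with only the replica count changed (names and image are exactly as in the documentation):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment
      labels:
        app: nginx
    spec:
      replicas: 6
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.14.2
            ports:
            - containerPort: 80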

  • chrispokorni Posts: 2,349
    edited September 2020

    Hi @Fafa0803,

    The nginx deployment does not help with your original issue reported above, because it does not make use of the private registry we are setting up as part of Lab 3.

    Would you be able to provide the output to the command requested by @serewicz earlier?

    In addition, picking the name of a troubled pod (one with ImagePullBackOff or ErrImagePull), can you show the Events (displayed at the bottom of the output) from the command:

    kubectl describe pod <troubled-pod-name>
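
    If the describe output is long, the same events can also be listed directly (assuming the pod is in the default namespace):

    kubectl get events --field-selector involvedObject.name=<troubled-pod-name>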

    Regards,
    -Chris

  • Now I am really confused, because after starting my machines today, I cannot get the pods running.

    I get this error message:

    So I started again from step 29. At step 30, I get the following error message on the worker node:

    When I continue with steps 31 and 32 on the master and look at my pods, I get this:

    I would have thought that I could at least get the pods on the master running.
    When I describe one pod, I get this:

    Is the "simpleapp" image no longer in the registry?
    How can that be?

    Now I am totally lost.

  • serewicz Posts: 1,000

    Hello,

    Many things can happen when you reboot a node, such as a change in IP addresses due to DHCP. As everything is an API call, this can be problematic. That the nginx pod continues to run would indicate that the issue is only with the repository configuration, not the Kubernetes cluster itself.

    First, let us see the condition of the node, infrastructure pods, and services. Please run:
    kubectl get node
    kubectl -n kube-system get pod
    kubectl get svc --all-namespaces

    Then show the contents, on both nodes, of /etc/docker/daemon.json.

    Finally, what are the errors when you run kubectl describe pod try2-some-long-name, where you use one of your pods' actual names instead? Near the end of the output you should see several errors, probably having to do with not finding the repository. Please paste all of them.

    Hopefully from this output we'll know where to look next.

    Regards,

  • Hello @serewicz ,

    thank you so much for your help.

    Please find the output for the following commands below:

    kubectl get node

    kubectl -n kube-system get pod

    kubectl get svc --all-namespaces

    /etc/docker/daemon.json

    Master:

    Worker:

    kubectl describe pod try2-some-long-name

    Again, thank you very much for your help :).

    Best regards,
    Fabian

  • Hi Fabian,

    As you can see from your output above, the daemon.json file on your worker node is incomplete. Edit the file to add the missing port number, then restart the service.

    Regards,
    -Chris

  • Hi Chris,

    thank you for your help.

    I fixed this issue:

    I also restarted the try2 deployment:

    Pod description:

    The registry is still running:

    How can I check if a Docker image is still in the local registry?
    When I want to pull the simpleapp image, I get the following error message:
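
    From what I have read, the registry's v2 HTTP API can list its contents, so I will try something like this (assuming the same registry endpoint as before):

    curl http://10.103.255.115:5000/v2/_catalog
    curl http://10.103.255.115:5000/v2/simpleapp/tags/list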

    Thank you for your help.

  • Fafa0803 Posts: 11
    edited September 2020

    Hi,

    I checked the Docker registry again and found several images with the name simpleapp.
    One has a wrong port number.

    This command works also:

    Should I delete some Docker images and push a new image to the registry?

    Thank you very much for your help.

    Best regards,
    Fabian

  • chrispokorni Posts: 2,349
    edited September 2020

    Hi Fabian,

    You can remove any unwanted image from the registry, if needed.

    Revisit the steps on the master node to create the image and push it to the private registry. Then continue with the verification steps for the worker node, to ensure you can pull the new image from there as well.
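
    Roughly, the sequence should look like this (image name and registry address as used earlier in this thread; substitute your own values):

    # on the master:
    sudo docker tag simpleapp 10.103.255.115:5000/simpleapp
    sudo docker push 10.103.255.115:5000/simpleapp

    # then verify from the worker:
    sudo docker pull 10.103.255.115:5000/simpleapp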

    Moving forward, please double-check your work, because such typos can cause serious issues with your cluster.

    Regards,
    -Chris

  • Hi @chrispokorni, @serewicz,

    thank you very much for your help.
    I repeated Labs 3.1 and 3.2, and now it all works.
    I am not sure where the problem was, but nevertheless I learned a lot.

    Thank you very much for your help and patience.

    Best regards,
    Fabian

  • Glad it all works!

    Keep in mind that slight typos and omissions, introduced early on, may not immediately cause errors. They may reveal issues in later exercises, making it so much more difficult to troubleshoot and fix (if possible).

    Regards,
    -Chris

  • Hello,

    I am having problems with the local Docker registry again.

    When I wanted to continue with Lab 3.3 on September 29, I had the same problems as before (as described above).
    My deployment files could not pull the image from the local Docker registry.
    So I learned all about the probes from the official documentation website instead.

    But now, in Lab 5.1, we are using simpleapp and the local Docker registry again, and I have the same problems.

    Can I find the same image on the official Docker registry, so I can circumvent the local Docker registry?
    I want to continue my labs :).

    Best regards,
    Fabian

  • I set the imagePullPolicy to Never. Now this works :).
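
    For anyone hitting the same issue, the relevant part of my container spec now looks roughly like this (the container name here is just illustrative):

    spec:
      containers:
      - name: simpleapp
        image: 10.103.255.115:5000/simpleapp
        imagePullPolicy: Never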

  • chrispokorni Posts: 2,349

    @Fafa0803, that implies that you will manually pull the image from the private registry into the local cache of the ckad-2 node whenever you update the container image in the registry.
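
    Something along these lines, run on ckad-2 after every push (using the registry address from this thread):

    sudo docker pull 10.103.255.115:5000/simpleapp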
