Welcome to the Linux Foundation Forum!

k8scp.sh install script issues

serewicz Posts: 1,000
edited June 2021 in LFD259 Class Forum

Hello,

The install script stopped working this morning. Initial research suggests an update to some supporting software; kubeadm init is failing with a strange node error that has some interesting output. Working on it. It may be a bug that will be fixed soon. Updates as possible.

Regards,

Comments

  • serewicz Posts: 1,000

    The runc and cri-o packages were recently updated. I am investigating a workaround, and may include a script to run the cluster with Docker until this issue is fixed upstream.

    These are exactly the issues you will find in production, which is why we chose to download code live rather than use a sanitized, near-worthless demo environment. Better to learn here than to be surprised and seem clueless when you're in production. The downside is that sometimes things break. With such an active community, the fix usually comes quickly as well.

    More updates as I find them.

  • ioef Posts: 4
    edited June 2021

    Hello tutor!

    I was wondering if the issue we are facing is related to the "metacopy=on" setting:
    https://github.com/cri-o/cri-o/issues/4574

    On my side, after applying this change, "sed -i 's/,metacopy=on//g' /etc/containers/storage.conf", which removes the metacopy=on setting from the storage.conf file, I managed to execute the "k8scp.sh" script successfully on the master/cp node.

    Best regards,
    Efthimis

    PS: Info from the logs after the change

    [etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
    [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
    [apiclient] All control plane components are healthy after 22.002670 seconds

    [upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
    [kubelet] Creating a ConfigMap "kubelet-config-1.21" in namespace kube-system with the configuration for the kubelets in the cluster
    [upload-certs] Skipping phase. Please see --upload-certs
    [mark-control-plane] Marking the node cp as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
    [mark-control-plane] Marking the node cp as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
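
    A minimal sketch of what that sed does, run against a sample mountopt line rather than the real file (on an actual node you would run it with sudo against /etc/containers/storage.conf and restart cri-o afterwards):

    ```shell
    # Create a sample fragment resembling the overlay section of storage.conf
    cat > /tmp/storage.conf.sample <<'EOF'
    [storage.options.overlay]
    mountopt = "nodev,metacopy=on"
    EOF

    # Strip the ",metacopy=on" mount option, per cri-o issue #4574
    sed -i 's/,metacopy=on//g' /tmp/storage.conf.sample

    grep mountopt /tmp/storage.conf.sample
    # mountopt = "nodev"
    ```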

  • serewicz Posts: 1,000

    Thank you! I had been tracking down a lot of other dead ends, and adding Docker as an alternate engine in case the bug persists. I appreciate you letting me know. I'll test this and add it to the course. It could be fixed upstream soon, but good troubleshooting in any case.

    Thanks again!

  • ioef Posts: 4

    Hello Tutor!

    On my side, after removing the "metacopy=on" option from "/etc/containers/storage.conf", I successfully executed the "k8scp.sh" script and installed the software on the master/cp node.

    The workaround was found here:
    https://github.com/cri-o/cri-o/issues/4574

    "sed -i 's/,metacopy=on//g' /etc/containers/storage.conf"

    Best regards,
    Efthimis

    PS: The info from the terminal after applying it is as follows:

    [kubeconfig] Using kubeconfig folder "/etc/kubernetes"
    [kubeconfig] Writing "admin.conf" kubeconfig file
    [kubeconfig] Writing "kubelet.conf" kubeconfig file
    [kubeconfig] Writing "controller-manager.conf" kubeconfig file
    [kubeconfig] Writing "scheduler.conf" kubeconfig file
    [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
    [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    [kubelet-start] Starting the kubelet
    [control-plane] Using manifest folder "/etc/kubernetes/manifests"
    [control-plane] Creating static Pod manifest for "kube-apiserver"
    [control-plane] Creating static Pod manifest for "kube-controller-manager"
    [control-plane] Creating static Pod manifest for "kube-scheduler"
    [etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
    [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
    [apiclient] All control plane components are healthy after 22.002670 seconds

    [upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
    [kubelet] Creating a ConfigMap "kubelet-config-1.21" in namespace kube-system with the configuration for the kubelets in the cluster
    [upload-certs] Skipping phase. Please see --upload-certs
    [mark-control-plane] Marking the node cp as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
    [mark-control-plane] Marking the node cp as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]

  • serewicz Posts: 1,000

    Hello again,

    I added the sed command to the script, and it worked! I'll update the script and have the updated labs out soon.

    Thanks!

  • Hi,

    @serewicz said:
    Hello again,

    I added the sed command to the script, and it worked!! I'll update the script and have the updated labs out soon.

    Thanks!

    It seems that the change is still missing from the k8sSecond.sh script. While I can get the control-plane node up successfully, the worker node still gives me a "failed to mount overlay for metacopy check with \"nodev,metacopy=on\" options: invalid argument" message. And indeed, the storage.conf has not been updated on the worker node.

    I guess a workaround would be to recommend a newer kernel in the course documentation instead of the fairly old 4.15.x that is installed on the Ubuntu image by default.
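
    To check whether the running kernel's overlayfs actually supports metacopy (the feature landed in kernel 4.19, which would explain the "invalid argument" on a stock 4.15 kernel), a hedged sketch; the sysfs file only exists when the overlay module is loaded on a new-enough kernel:

    ```shell
    # Print the running kernel version (overlayfs metacopy needs >= 4.19)
    uname -r

    # On kernels that support it, this file reports Y or N;
    # on older kernels it simply does not exist.
    cat /sys/module/overlay/parameters/metacopy 2>/dev/null \
      || echo "metacopy parameter not present"
    ```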

  • Hi,

    Same here. Two Ubuntu 18.04 VMs on a laptop. The scripts ran fine, but because the metacopy fix is missing from the k8sSecond.sh script, I could not get pods running on the worker node. After removing metacopy=on from the mountopt line in /etc/containers/storage.conf, I was able to get my worker node working.

    Found the issue by comparing the scripts. Should have checked the forum first :-)

  • This sounds like the same issue I reported back in October. More information that might be helpful here: https://forum.linuxfoundation.org/discussion/comment/32059
