LAB 13.3: Adding tools for monitoring and metrics - Metrics API not available after 10+ minutes
I'm working through Lab 13.3. Everything up to and including step 6 went without issues (and I've already checked every step twice), but at step 7, after waiting 15 minutes for a different output from "kubectl top pod" or "kubectl top nodes", I'm still getting the same:
error: Metrics API not available
Can anybody help by telling me whether something is missing from the instructions?
Thank you in advance.
Comments
Hi @juanalmaraz,
From your metrics-server deployment, can you provide the code snippet representing the container args and the image, similar to the snippet shown in Lab 13.3 step 5 of the lab guide? Typically, typos in this section can cause issues with the metrics-server.
Regards,
-Chris
@chrispokorni I've got the same problem. This is the relevant output of kubectl -n kube-system describe deployment metrics-server:
Containers:
  metrics-server:
    Image: k8s.gcr.io/metrics-server/metrics-server:v0.3.7
    Port: 4443/TCP
    Host Port: 0/TCP
    Args:
      --cert-dir=/tmp
      --secure-port=4443
      --kubelet-insecure-tls
      --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
Hi @zmicier0k,
Since Kubernetes v1.22, metrics-server v0.3.x may no longer be compatible with the latest Kubernetes releases. I would suggest installing the latest metrics-server release (v0.6.x) and, at step 5, providing the following arguments when editing the metrics-server Deployment resource:
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP
Regards,
-Chris
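As a quick way to compare a deployment against the suggestion above, here is a minimal sketch in Python. The helper name review_metrics_server is hypothetical, the v0.6.x threshold is simply the version recommended in this thread, and the code assumes the image reference ends in a tag such as ":v0.3.7". It only inspects strings copied from the describe output; it does not talk to a cluster.

```python
# Address-type order suggested earlier in this thread.
SUGGESTED_ADDRESS_TYPES = "Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP"


def review_metrics_server(image, args):
    """Return a list of human-readable findings (hypothetical helper)."""
    findings = []

    # Assumes the image reference ends in a tag such as ":v0.3.7".
    tag = image.rsplit(":", 1)[-1]
    major, minor = (int(part) for part in tag.lstrip("v").split(".")[:2])
    if (major, minor) < (0, 6):
        findings.append(f"image tag {tag} is older than v0.6.x")

    # Map each flag name to its value ("" for flags without "=").
    flags = {}
    for arg in args:
        name, _, value = arg.partition("=")
        flags[name] = value

    if "--kubelet-insecure-tls" not in flags:
        findings.append("missing --kubelet-insecure-tls")
    if flags.get("--kubelet-preferred-address-types") != SUGGESTED_ADDRESS_TYPES:
        findings.append(
            "--kubelet-preferred-address-types differs from the suggested list"
        )
    return findings


# Values copied from the describe output posted above.
old_args = [
    "--cert-dir=/tmp",
    "--secure-port=4443",
    "--kubelet-insecure-tls",
    "--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname",
]
for finding in review_metrics_server(
    "k8s.gcr.io/metrics-server/metrics-server:v0.3.7", old_args
):
    print(finding)
```

An empty result only means the image tag and the two flags match what was suggested here; it says nothing about whether the metrics-server pod can actually reach the kubelets.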
Hi @chrispokorni, I am facing the same issue. After reading the documentation, I am not sure whether I have to add an additional node. It is still not working for me, even after applying the new version.
Hi @lzambra,
Please ensure that the metrics-server (latest release) installation command from step 3 runs successfully and all necessary artifacts are created. The following step 4 should display the metrics-server pod in a running state. If the metrics-server pod is not listed, the previous step may have failed.
Once the pod is visible, only then proceed to step 5 and edit the metrics-server deployment, as described in the lab guide and my comment above. These steps should ensure the installation and proper configuration of your metrics-server deployment.
When installing, do you see any errors?
When listing pods, what is the state of the metrics-server pod?
Regards,
-Chris
I've added the configuration previously mentioned in this thread, and it is still not working. These are the logs of one pod:
I0103 01:17:10.298497 1 serving.go:342] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0103 01:17:11.417949 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0103 01:17:11.418056 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0103 01:17:11.418124 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0103 01:17:11.418199 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0103 01:17:11.418247 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0103 01:17:11.418317 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0103 01:17:11.418679 1 secure_serving.go:267] Serving securely on [::]:4443
I0103 01:17:11.418808 1 dynamic_serving_content.go:131] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
I0103 01:17:11.419381 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
W0103 01:17:11.419697 1 shared_informer.go:372] The sharedIndexInformer has started, run more than once is not allowed
I0103 01:17:11.519010 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0103 01:17:11.519024 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0103 01:17:11.519052 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
E0103 01:17:24.912919 1 scraper.go:140] "Failed to scrape node" err="Get \"https://worker:10250/metrics/resource\": context deadline exceeded" node="worker"
E0103 01:17:24.912964 1 scraper.go:140] "Failed to scrape node" err="Get \"https://cp:10250/metrics/resource\": context deadline exceeded" node="cp"
I0103 01:17:29.229201 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0103 01:17:39.233714 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0103 01:17:39.913491 1 scraper.go:140] "Failed to scrape node" err="Get \"https://cp:10250/metrics/resource\": context deadline exceeded" node="cp"
E0103 01:17:39.913492 1 scraper.go:140] "Failed to scrape node" err="Get \"https://worker:10250/metrics/resource\": context deadline exceeded" node="worker"
I0103 01:17:49.230115 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
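To make the pattern in the log above easier to see, here is a minimal Python sketch that counts the "Failed to scrape node" errors per node. The function name scrape_failures and the regular expression are illustrative assumptions about this log format, not part of metrics-server; the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches the trailing node="..." field of a scrape error line.
NODE_RE = re.compile(r'node="([^"]+)"\s*$')

# A few lines copied from the pod log above (raw string keeps the backslashes).
SAMPLE_LOG = r'''
E0103 01:17:24.912919 1 scraper.go:140] "Failed to scrape node" err="Get \"https://worker:10250/metrics/resource\": context deadline exceeded" node="worker"
E0103 01:17:24.912964 1 scraper.go:140] "Failed to scrape node" err="Get \"https://cp:10250/metrics/resource\": context deadline exceeded" node="cp"
I0103 01:17:29.229201 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0103 01:17:39.913491 1 scraper.go:140] "Failed to scrape node" err="Get \"https://cp:10250/metrics/resource\": context deadline exceeded" node="cp"
'''


def scrape_failures(log_text):
    """Count 'Failed to scrape node' lines per node name."""
    counts = Counter()
    for line in log_text.splitlines():
        if '"Failed to scrape node"' not in line:
            continue  # skip probe messages and informational lines
        match = NODE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


for node, count in sorted(scrape_failures(SAMPLE_LOG).items()):
    print(f"{node}: {count} failed scrape(s)")
```

In this log every scrape of both cp and worker fails with "context deadline exceeded", meaning the requests to port 10250 timed out. That usually points toward the metrics-server pod not reaching the kubelets in time (for example because of a firewall or pod-network problem) rather than toward a typo in the arguments, although the log alone does not prove the cause.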
Hi @lzambra,
What is the output of kubectl -n kube-system describe deployment metrics-server?
Regards,
-Chris
Name: metrics-server
Namespace: kube-system
CreationTimestamp: Wed, 03 Jan 2024 01:08:57 +0000
Labels: k8s-app=metrics-server
Annotations: deployment.kubernetes.io/revision: 3
Selector: k8s-app=metrics-server
Replicas: 1 desired | 1 updated | 2 total | 0 available | 2 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 0 max unavailable, 25% max surge
Pod Template:
Labels: k8s-app=metrics-server
Service Account: metrics-server
Containers:
metrics-server:
Image: registry.k8s.io/metrics-server/metrics-server:v0.6.4
Port: 4443/TCP
Host Port: 0/TCP
Args:
--cert-dir=/tmp
--secure-port=4443
--kubelet-insecure-tls
--kubelet-preferred-address-types=Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP
--kubelet-use-node-status-port
--metric-resolution=15s
Requests:
cpu: 100m
memory: 200Mi
Liveness: http-get http://:http/livez delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http/readyz delay=20s timeout=1s period=10s #success=1 #failure=3
Environment:
Mounts:
/tmp from tmp-dir (rw)
Volumes:
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
Priority Class Name: system-cluster-critical
Conditions:
Type Status Reason
---- ------ ------
Available False MinimumReplicasUnavailable
Progressing False ProgressDeadlineExceeded
OldReplicaSets: metrics-server-fbb469ccc (0/0 replicas created), metrics-server-67865f7db4 (1/1 replicas created)
NewReplicaSet: metrics-server-b58456f69 (1/1 replicas created)
Events:
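The Replicas line in the output above carries most of the story, so here is a minimal Python sketch that turns it into numbers. The helper name parse_replicas is hypothetical, and the parsing assumes the "N label | N label" layout shown in this describe output.

```python
def parse_replicas(line):
    """Parse a 'Replicas: 1 desired | 1 updated | ...' line into a dict."""
    _, _, rest = line.partition(":")
    counts = {}
    for field in rest.split("|"):
        number, label = field.split()  # e.g. "1", "desired"
        counts[label] = int(number)
    return counts


# Line copied from the describe output above.
status = parse_replicas(
    "Replicas: 1 desired | 1 updated | 2 total | 0 available | 2 unavailable"
)
if status["available"] < status["desired"]:
    print(
        f"{status['available']} of {status['desired']} desired replicas available, "
        f"{status['total']} pods exist in total"
    )
```

Zero available replicas with two pods in total, together with the ProgressDeadlineExceeded condition, suggests that the rolling update is stuck: the new pod never becomes ready, so the old one is kept around. This matches the failing metric-storage-ready probe in the pod log, since a metrics-server that cannot scrape any node has no metrics to serve.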