Kubernetes Metrics

Introduction

The master agent deployment collects the k8s-apiserver, k8s-controller, k8s-scheduler, k8s-kube-state, k8s-metrics-server, and k8s-coreDNS / kubeDNS metrics required to monitor Kubernetes.

Metrics for Docker

Metrics	Display Name	Description	Units
docker.containers.running_total	Docker Container Running Total	The total number of containers running on the host machine	-
docker.containers.stopped_total	Total Containers Stopped	The total number of containers stopped (not running) on the host machine	-
docker.container.states	Docker Container states	The state of the container	-
docker.containers.running	Containers Running by Image	The number of containers running on host plotted with image as instance	-
docker.containers.stopped	Containers Stopped by Image	The number of containers stopped on host plotted with image as instance	-
docker.image.size	Image Size	The amount of data (on disk) that is used for the writable layer of each container	megabytes
docker.image.virtual_size	Image Virtual Size	The total amount of disk-space used for the read-only image data used (shared) by each container and the writable layer of each container	megabytes
docker.images.available	Images Available	The number of top-level images	-
docker.images.intermediate	Images intermediate	The number of intermediate images, which are intermediate layers that make up other images	-
docker.container.size_rootfs	Root Filesystem Size	The total size of all the files in the container	megabytes
docker.container.size_rw	Total Files Size	The total size of the files that have been changed or newly created compared with the container's base image. The size should be zero just after the container is created and increases as files are modified or created	megabytes
docker.cpu.usage	CPU Usage	The percentage of CPU time obtained by the container with regard to all CPUs	percent
docker.cpu.usage.overlimit	CPU Usage Over Limit	The percentage of CPU time obtained by the container over its CPU limit (if no limit is set, this metric is not monitored and no graph is plotted)	percent
docker.cpu.usage.percpu	CPU Usage per CPU	The percentage of CPU time obtained by the container with regard to each CPU	percent
docker.cpu.shares	Shares of CPU	Shares of CPU usage allocated to the container	-
docker.cpu.system	CPU System	The percentage of time the CPU is executing system calls on behalf of processes of this container, unnormalized	percent
docker.cpu.throttled	CPU Throttled	Number of times the cgroup is throttled	-
docker.cpu.user	CPU User	The percentage of time the CPU is under direct control of processes of this container, unnormalized	percent
docker.mem.usage	Memory Usage	The percentage of used memory out of total node memory	percent
docker.mem.usage.overlimit	Memory Usage Over Limit	The percentage of used memory out of the memory limit (if no limit is set, this metric is not monitored and no metric value or graph is plotted)	percent
docker.mem.in_use	Memory In Use	The fraction of used memory to available memory limit if the limit is set. Otherwise, it is against the node memory	-
docker.mem.limit	Memory Limit	The memory limit for the container, if set	megabytes
docker.io.read_bytes	IO Read Bytes	Bytes read per second from disk by the processes of the container	bytes/second
docker.io.write_bytes	IO Write Bytes	Bytes written per second to disk by the processes of the container	bytes/second
docker.mem.active_anon	Active RSS Memory	The amount of active RSS memory. Active memory is not swapped to disk	megabytes
docker.mem.active_file	Active Cache Memory	The amount of active cache memory. Active memory is reclaimed by the system only after inactive is reclaimed	megabytes
docker.mem.cache	Cache Size	The amount of memory that is being used to cache data from disk (For example, memory content that can be associated precisely with a block on a block device)	megabytes
docker.mem.inactive_anon	Inactive RSS Memory	The amount of inactive RSS memory. Inactive memory is swapped to disk when necessary	megabytes
docker.mem.inactive_file	Inactive Cache Memory	The amount of inactive cache memory. Inactive memory may be reclaimed first when the system needs memory	megabytes
docker.mem.mapped_file	Memory Mapped by Process	The amount of memory mapped by the processes in the control group	megabytes
docker.mem.pgfault	Memory Page Faults	The rate that processes in the container trigger page faults by accessing a non-existent or protected part of its virtual address space. Usually a page fault of this type results in a segmentation fault	per second
docker.mem.pgmajfault	Memory Page Faults Virtual	The rate at which processes in the container trigger page faults by accessing a part of their virtual address space that was swapped out or corresponded to a mapped file. Usually, a page fault of this type results in fetching the data from disk instead of memory	per second
docker.mem.pgpgin	Pages Charged Rate	The rate at which pages are charged (added to the accounting) of a cgroup	per second
docker.mem.pgpgout	Pages Uncharged Rate	The rate at which pages are uncharged (removed from the accounting) of a cgroup	per second
docker.mem.rss	RSS Memory	The amount of non-cache memory that belongs to the container's processes. For example, used for stacks and heaps	megabytes
docker.mem.soft_limit	Memory Reservation Limit	The memory reservation limit for the container, when set	megabytes
docker.mem.sw_in_use	Swap Memory In Use	The fraction of used swap + memory to available swap + memory if the limit is set	-
docker.mem.sw_limit	Swap Memory Limit	The swap + memory limit for the container, when set	megabytes
docker.container.interface.traffic.in	Network Rx Bytes per Sec	Network Rx Bytes per Second	bytes/second
docker.container.interface.traffic.out	Network Tx Bytes per Sec	Network Tx Bytes per Second	bytes/second
docker.container.interface.packets.in	Network Rx Packets per Sec	Network Rx Packets per Second	per second
docker.container.interface.packets.out	Network Tx Packets per Sec	Network Tx Packets per Second	per second
docker.container.interface.errors.in	Network Rx Errors per Sec	Network Rx Errors per Second	per second
docker.container.interface.errors.out	Network Tx Errors per Sec	Network Tx Errors per Second	per second
docker.container.interface.discards.in	Network Rx Drops per Sec	Network Rx Drops per Second	per second
docker.container.interface.discards.out	Network Tx Drops per Sec	Network Tx Drops per Second	per second
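
The description of docker.mem.in_use implies a simple rule: divide used memory by the container's limit when a limit is set, and by the node's memory otherwise. The following is a minimal sketch of that rule, not the agent's actual implementation; the function name mem_in_use and its parameters are hypothetical.

```python
from typing import Optional


def mem_in_use(used_mb: float, node_total_mb: float,
               limit_mb: Optional[float] = None) -> float:
    """Fraction of memory in use, relative to the limit when one is set."""
    # Assumption: a missing (None) or zero limit means "no limit set",
    # in which case the node's total memory is the denominator.
    denominator = limit_mb if limit_mb else node_total_mb
    if denominator <= 0:
        raise ValueError("memory denominator must be positive")
    return used_mb / denominator


print(mem_in_use(256.0, 8192.0, limit_mb=512.0))  # limit set
print(mem_in_use(256.0, 8192.0))                  # no limit, node memory used
```

Because the denominator changes when a limit is added or removed, values of this metric are only comparable between containers that are configured the same way.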

Metrics for ContainerD

Metrics	Display Name	Description	Units
containerd_hugetlb_failcnt	ContainerD HugeTLB fail Rate	Rate of allocation failure due to HugeTLB limit	-
containerd_hugetlb_max	ContainerD HugeTLB max usage	max hugepagesize hugetlb usage recorded	bytes
containerd_hugetlb_usage	ContainerD HugeTLB usage	Current usage for hugepagesize hugetlb	bytes
containerd_memory_usage	ContainerD Memory Usage	Memory Usage	bytes
containerd_memory_usage_failcnt	ContainerD Memory Usage fail Rate	Rate at which the cgroup memory limit is exceeded	-
containerd_memory_usage_limit	ContainerD Memory Usage Limit	limit of memory usage	bytes
containerd_memory_usage_max	ContainerD Memory Usage Max	show maximum memory usage recorded	bytes
containerd_memory_cache	ContainerD Memory Cache	bytes of page cache memory	bytes
containerd_memory_rss	ContainerD Memory RSS	bytes of anonymous and swap cache memory (includes transparent huge pages)	bytes
containerd_memory_rss_huge	ContainerD Memory RSS Huge	bytes of anonymous transparent huge pages	bytes
containerd_memory_dirty	ContainerD Memory Dirty	bytes that are waiting to get written back to the disk	bytes
containerd_memory_swap_usage	ContainerD Swap Usage	swap Usage	bytes
containerd_memory_swap_failcnt	ContainerD Swap Usage fail Rate	Rate of number of times the cgroup swap limit exceeded	-
containerd_memory_swap_limit	ContainerD Swap Usage Limit	limit of swap usage	bytes
containerd_memory_swap_max	ContainerD Swap Usage Max	show maximum swap usage recorded	bytes
containerd_memory_kernel_usage	ContainerD Kernel Usage	current kernel memory allocation	bytes
containerd_memory_kernel_failcnt	ContainerD Kernel fail count	rate at which kernel memory usage hits the limit	-
containerd_memory_kernel_limit	ContainerD Kernel Limit	hard limit for kernel memory	bytes
containerd_memory_kernel_max	ContainerD Kernel Max	max kernel memory usage recorded	bytes
containerd_memory_kernel_tcp_usage	ContainerD Kernel TCP Usage	current TCP buffer memory allocation	bytes
containerd_memory_kernel_tcp_failcnt	ContainerD Kernel TCP fail rate	rate at which TCP buffer memory usage hits the limit	-
containerd_memory_kernel_tcp_limit	ContainerD Kernel TCP Limit	show hard limit for TCP buffer memory	bytes
containerd_memory_kernel_tcp_max	ContainerD Kernel TCP Max	maximum TCP buffer memory usage recorded	bytes
containerd_cpu_throttling_throttledTime	ContainerD CPU Throttled Time	CPU throttled time	percent
containerd_cpu_usage_system	ContainerD CPU System Usage	system CPU usage of container with respect to host system	percent
containerd_cpu_usage_total	ContainerD CPU Total Usage	total CPU usage of container with respect to host system	percent
containerd_cpu_usage_user	ContainerD CPU User Usage	user CPU usage of container with respect to host system	percent
containerd_blkio_service_bytes_recursive	ContainerD BlkIO Service Bytes	Number of bytes transferred to/from the disk	bytes
containerd_blkio_serviced_recursive	ContainerD BlkIO Serviced	Number of IOs (bio) issued to the disk by the group	bytes
containerd_blkio_queued_recursive	ContainerD BlkIO Queued	Total number of requests queued up at any given instant for the cgroup	bytes
containerd_blkio_service_time_recursive	ContainerD BlkIO Service Time	Total amount of time between request dispatch and request completion for the IOs	bytes
containerd_blkio_wait_time_recursive	ContainerD BlkIO Wait Time	Total amount of time the IOs for this cgroup spent waiting in the scheduler queues for service	bytes
containerd_blkio_merged_recursive	ContainerD BlkIO Merged	Total number of bios/requests merged into requests belonging to this cgroup	bytes
containerd_blkio_time_recursive	ContainerD BlkIO Time	disk time allocated to cgroup per device in milliseconds	bytes
containerd_blkio_sectors_recursive	ContainerD BlkIO Sectors	number of sectors transferred to/from disk by the group	bytes
containerd_proc_open_fds	ContainerD number of open fd	Number of open file descriptors	-
containerd_container_uptime	ContainerD Container Uptime	Uptime of the Current Container	second
containerd_containers_running	ContainerD Running Containers	Total number of running containers	-
containerd_containers_stopped	ContainerD Stopped Containers	Total number of Stopped Containers	-
containerd_image_size	ContainerD Image Size	Image sizes of different container images	bytes

Metrics for CRI-O

Metrics	Display Name	Description	Units
crio_operations	Operations Count	Cumulative number of CRI-O operations by operation type	-
crio_operations_latency_microseconds	Operations Latency Microseconds	Latency of CRI-O operations. Broken down by operation type	microseconds
crio_operations_latency_microseconds_sum	Operations Latency Microseconds Sum	Latency of CRI-O operations. Broken down by operation type. sum value	microseconds
crio_operations_latency_microseconds_count	Operations Latency Microseconds Count	Latency of CRI-O operations. Broken down by operation type. count value	microseconds
crio_operations_errors	Operations Errors	Cumulative number of CRI-O operation errors by operation type	-
crio_image_pulls_by_digest	Image Pulls by Digest	Bytes transferred by CRI-O image pulls by digest	-
crio_image_pulls_by_name	Image Pulls by Name	Bytes transferred by CRI-O image pulls by name	-
crio_image_pulls_by_name_skipped	Image Pulls by Name Skipped	Bytes skipped by CRI-O image pulls by name	-
crio_image_pulls_successes	Image Pulls Successes	Successful image pulls by image name	-
crio_image_pulls_failures	Image Pulls Failures	Failed image pulls by image name and their error category	-
crio_image_layer_reuse	Image Layer Reuse	Reused (not pulled) local image layer count by name	-
crio_cpu_time	CPU Time	Total user and system CPU time spent	seconds
crio_mem_resident	Mem Resident	Resident memory size	bytes
crio_mem_virtual	Mem Virtual	Virtual memory size	bytes
crio_process_open_fds	Process Open Fds	Number of open file descriptors	-
crio_cpu_usage_core	CPU Usage	Cumulative CPU usage (sum across all cores) since object creation	nanoseconds
crio_memory_working_set	Memory Working Set	Amount of working set memory	bytes
crio_filesystem_used	Filesystem Used	Represents the bytes used for images on the filesystem. (This may differ from the total bytes used on the filesystem and may not equal CapacityBytes - AvailableBytes)	bytes
crio_inodes_used	Inodes Used	Represents the inodes used by the images. (This may not equal InodesCapacity - InodesAvailable because the underlying filesystem may also be used for purposes other than storing images)	-
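
crio_cpu_usage_core is listed as cumulative CPU usage in nanoseconds, so the raw value only ever grows; what is usually of interest is the difference between two samples. The sketch below converts two samples into an average number of cores used over the interval. It assumes samples taken a known number of seconds apart; the function name cores_used and the counter-reset handling are illustrative assumptions.

```python
def cores_used(prev_ns: int, curr_ns: int, interval_s: float) -> float:
    """Average number of CPU cores used between two cumulative samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta_ns = curr_ns - prev_ns
    if delta_ns < 0:
        # Assumption: a smaller current value means the counter restarted
        # from zero, so the current value is treated as the whole delta.
        delta_ns = curr_ns
    # One core fully busy for one second accumulates 1e9 nanoseconds.
    return delta_ns / (interval_s * 1e9)


# 30 CPU-seconds consumed over a 60-second interval.
print(cores_used(10_000_000_000, 40_000_000_000, 60.0))
```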

Metrics for Kubelet

Metrics	Display Name	Description	Units
kube_pods_running	Pods Running	The number of running pods	-
kube_containers_running	Containers Running	The number of running containers	-
kube_containers_restarts	Containers Restarts	The number of times the container is restarted	-
kube_cpu_load_10s_avg	Cpu Load 10S Avg	Container CPU load average over the last 10 seconds	-
kube_cpu_system_total	Cpu System Total	System CPU time consumed in seconds	per second
kube_cpu_user_total	Cpu User Total	User cpu time consumed in seconds	per second
kube_cpu_cfs_periods	Cpu Cfs Periods	Number of elapsed enforcement period intervals	per second
kube_cpu_cfs_throttled_periods	Cpu Cfs Throttled Periods	Number of throttled period intervals	per second
kube_cpu_cfs_throttled_seconds	Cpu Cfs Throttled Seconds	Total duration of the container being throttled	per second
kube_node_cpu_capacity	Node Cpu Capacity	CPU capacity of Node (Plotted in Millicores)	millicores
kube_node_memory_capacity	Node Memory Capacity	Memory capacity of node (Plotted in Megabytes)	megabytes
kube_node_cpu_usage_percentage	Node Cpu Usage Percentage	CPU usage percentage of node	percent
kube_node_memory_usage_percentage	Node Memory Usage Percentage	Memory usage percentage of node	percent
kube_node_cpu_allocatable	Node Cpu Allocatable	CPU allocatable of node	millicores
kube_node_memory_allocatable	Node Memory Allocatable	Memory allocatable of node	megabytes
kube_node_cpu_usage	Node Cpu Usage	CPU usage of node (Plotted in Millicores)	millicores
kube_node_memory_usage	Node Memory Usage	Memory usage of node (Plotted in Megabytes)	megabytes
kube_cpu_usage_total	Cpu Usage Total	CPU time consumed in seconds	per second
kube_cpu_limits	Cpu Limits	The limit of CPU cores set	millicores
kube_cpu_requests	Cpu Requests	The requested CPU cores	millicores
kube_filesystem_usage	Filesystem Usage	Number of megabytes that are consumed by the container on this filesystem	megabytes
kube_filesystem_usage_pct	Filesystem Usage Pct	Fraction of this filesystem's capacity that is consumed by the container	Fraction
kube_io_read_bytes	Io Read Bytes	The amount of bytes read from the disk	bytes/second
kube_io_write_bytes	Io Write Bytes	The amount of bytes written to the disk	bytes/second
kube_memory_limits	Memory Limits	Memory limit for the container	megabytes
kube_memory_sw_limit	Memory Sw Limit	Memory swap limit for the container	bytes
kube_memory_requests	Memory Requests	The requested memory	megabytes
kube_memory_usage	Memory Usage	Current memory usage in bytes including all memory regardless of when it was accessed	bytes
kube_memory_working_set	Memory Working Set	Current working set in megabytes. This is the value the OOM killer watches	megabytes
kube_memory_cache	Memory Cache	Number of bytes of page cache memory	bytes
kube_memory_rss	Memory Rss	Size of RSS in bytes	bytes
kube_memory_swap	Memory Swap	Container swap usage in bytes	bytes
kube_network_rx_bytes	Network Rx Bytes	The amount of bytes received per second	bytes/second
kube_network_rx_dropped	Network Rx Dropped	The amount of Rx packets dropped per second	packets/second
kube_network_rx_errors	Network Rx Errors	The amount of Rx errors per second	errors/second
kube_network_tx_bytes	Network Tx Bytes	The number of bytes transmitted per second	bytes/second
kube_network_tx_dropped	Network Tx Dropped	The amount of tx packets dropped per second	packets/second
kube_network_tx_errors	Network Tx Errors	The amount of tx errors per second	errors/second
kube_apiserver_certificate_expiration	Apiserver Certificate Expiration	Average distribution of the remaining lifetime on the certificate used to authenticate a request since the last poll	seconds
kube_rest_client_requests	Rest Client Requests	The number of HTTP requests	operations/second
kube_rest_client_latency	Rest Client Latency	Average request latency in seconds since the last poll. Broken down by verb and URL	seconds
kube_kubelet_runtime_operations	Kubelet Runtime Operations	The number of runtime operations	operations/second
kube_kubelet_runtime_errors	Kubelet Runtime Errors	The number of runtime operations errors	operations/second
kube_kubelet_network_plugin_latency	Kubelet Network Plugin Latency	Average latency in seconds of network plugin operations since the last poll. Broken down by operation type	seconds
kube_kubelet_volume_stats_available_bytes	Kubelet Volume Stats Available Bytes	The number of available bytes in the volume	bytes
kube_kubelet_volume_stats_capacity_bytes	Kubelet Volume Stats Capacity Bytes	The capacity in bytes of the volume	bytes
kube_kubelet_volume_stats_used_bytes	Kubelet Volume Stats Used Bytes	The number of used bytes in the volume	bytes
kube_kubelet_volume_stats_inodes	Kubelet Volume Stats Inodes	The maximum number of inodes in the volume	Inode
kube_kubelet_volume_stats_inodes_free	Kubelet Volume Stats Inodes Free	The number of free inodes in the volume	Inode
kube_kubelet_volume_stats_inodes_used	Kubelet Volume Stats Inodes Used	The number of used inodes in the volume	Inode
kube_ephemeral_storage_usage	Ephemeral Storage Usage	Ephemeral storage usage of the POD	megabytes
kube_kubelet_evictions	Kubelet Evictions	The number of pods that have been evicted from the kubelet (ALPHA in kubernetes v1.16)	-
kube_kubelet_cpu_usage	Kubelet Cpu Usage	The number of cores used by kubelet	millicores
kube_kubelet_memory_rss	Kubelet Memory Rss	Size of kubelet RSS in megabytes	megabytes
kube_runtime_cpu_usage	Runtime Cpu Usage	The number of cores used by the runtime	millicores
kube_runtime_memory_rss	Runtime Memory Rss	Size of runtime RSS	megabytes
kube_kubelet_container_log_filesystem_used_bytes	Kubelet Container Log Filesystem Used Bytes	Bytes used by the container's logs on the filesystem (requires kubernetes 1.14+)	bytes
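
kube_cpu_cfs_periods and kube_cpu_cfs_throttled_periods are most informative when read together: the share of enforcement periods in which a container was throttled. The sketch below computes that ratio from two values sampled over the same interval. The function name throttle_ratio is hypothetical, and returning 0.0 when no periods elapsed is an assumption made for convenience.

```python
def throttle_ratio(throttled_periods: float, periods: float) -> float:
    """Fraction of CFS enforcement periods in which the container was throttled."""
    if periods <= 0:
        # Assumption: no elapsed periods (for example, no CPU limit set)
        # is reported as "not throttled" rather than as an error.
        return 0.0
    return throttled_periods / periods


# 25 throttled periods out of 100 elapsed periods in the same interval.
print(throttle_ratio(25.0, 100.0))
```

A ratio that stays high over time suggests the container's CPU limit is too low for its workload, even when average CPU usage looks moderate.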

Metrics for Kube State

Metrics	Display Name	Description	Units
kubernetes_state.container.cpu_limit	Container Cpu Limit	The limit on CPU cores to be used by a container	cpu
kubernetes_state.container.cpu_requested	Container Cpu Requested	The number of requested CPU cores by a container	cpu
kubernetes_state.container.memory_limit	Container Memory Limit	The limit on memory to be used by a container	bytes
kubernetes_state.container.memory_requested	Container Memory Requested	The number of requested memory bytes by a container	bytes
kubernetes_state.container.ready	Container Ready	Describes whether the container's readiness check succeeded	-
kubernetes_state.container.ready.total	Total Containers Ready	Total containers whose readiness check succeeded	-
kubernetes_state.container.restarts	Container Restarts	The number of restarts per container	-
kubernetes_state.container.restarts.total	Total Containers Restarts Count	Total containers restarts count	-
kubernetes_state.container.running	Container Running	Describes whether the container is currently in running state	-
kubernetes_state.container.running.total	Total Containers Running	Total containers currently in running state	-
kubernetes_state.container.terminated	Container Terminated	Describes whether the container is currently in terminated state	-
kubernetes_state.container.terminated.total	Total Containers Terminated	Total containers currently in terminated state	-
kubernetes_state.container.waiting	Container Waiting	Whether the container is currently in waiting state	-
kubernetes_state.container.waiting.total	Total Containers Waiting	Total containers currently in waiting state	-
kubernetes_state.daemonset.desired	Daemonset Desired	The number of nodes that should be running the daemon pod	-
kubernetes_state.daemonset.misscheduled	Daemonset Misscheduled	The number of nodes that are running a daemon pod but are not supposed to	-
kubernetes_state.daemonset.ready	Daemonset Ready	The number of nodes that should be running the daemon pod and have one or more of the daemon pods running and ready	-
kubernetes_state.daemonset.scheduled	Daemonset Scheduled	The number of nodes running at least one daemon pod as expected	-
kubernetes_state.deployment.paused	Deployment Paused	The deployment is paused and will not be processed by the deployment controller	-
kubernetes_state.deployment.replicas	Deployment Replicas	The number of replicas per deployment	-
kubernetes_state.deployment.replicas_available	Deployment Replicas Available	The number of available replicas per deployment	-
kubernetes_state.deployment.replicas_desired	Deployment Replicas Desired	The number of desired replicas per deployment	-
kubernetes_state.deployment.replicas_unavailable	Deployment Replicas Unavailable	The number of unavailable replicas per deployment	-
kubernetes_state.deployment.replicas_updated	Deployment Replicas Updated	The number of updated replicas per deployment	-
kubernetes_state.deployment.rollingupdate.max_unavailable	Deployment Rollingupdate Max Unavailable	Maximum number of unavailable replicas during a rolling update of a deployment	-
kubernetes_state.node.cpu_allocatable	Node Cpu Allocatable	The CPU resources of a node that are available for scheduling	-
kubernetes_state.node.cpu_capacity	Node Cpu Capacity	The total CPU resources of the node	cpu
kubernetes_state.node.memory_allocatable	Node Memory Allocatable	The memory resources of a node that are available for scheduling	bytes
kubernetes_state.node.memory_capacity	Node Memory Capacity	The total memory resources of the node	bytes
kubernetes_state.node.pods_allocatable	Node Pods Allocatable	The pod resources of a node that are available for scheduling	-
kubernetes_state.node.pods_capacity	Node Pods Capacity	The total pod resources of the node	-
kubernetes_state.node.status	Node Status	The condition of a cluster node plotted with node as an instance. This metric gives status of each node with values either 0 or 1.	-
kubernetes_state.pod.ready	Pod Ready	Describes whether the pod is ready to serve requests. Use it together with the condition tag; for example, condition:true keeps the pods that are in a ready state	-
kubernetes_state.pod.scheduled	Pod Scheduled	Describes the status of the scheduling process for the pod	-
kubernetes_state.replicaset.fully_labeled_replicas	Replicaset Fully Labeled Replicas	The number of fully labeled replicas per ReplicaSet	-
kubernetes_state.replicaset.replicas	Replicaset Replicas	The number of replicas per ReplicaSet	-
kubernetes_state.replicaset.replicas_desired	Replicaset Replicas Desired	Number of desired pods for a ReplicaSet	-
kubernetes_state.replicaset.replicas_ready	Replicaset Replicas Ready	The number of ready replicas per ReplicaSet	-
kubernetes_state.resourcequota.limits.cpu.limit	Resourcequota Limits Cpu Limit	Hard limit on the sum of CPU core limits for a resource quota	cpu
kubernetes_state.resourcequota.limits.cpu.used	Resourcequota Limits Cpu Used	Observed sum of limits for CPU cores for a resource quota	cpu
kubernetes_state.resourcequota.limits.memory.limit	Resourcequota Limits Memory Limit	Hard limit on the sum of memory bytes limits for a resource quota	bytes
kubernetes_state.resourcequota.limits.memory.used	Resourcequota Limits Memory Used	Observed sum of limits for memory bytes for a resource quota	bytes
kubernetes_state.resourcequota.persistentvolumeclaims.limit	Resourcequota Persistentvolumeclaims Limit	Hard limit of the number of PVC for a resource quota	-
kubernetes_state.resourcequota.persistentvolumeclaims.used	Resourcequota Persistentvolumeclaims Used	Observed number of persistent volume claims used for a resource quota	-
kubernetes_state.resourcequota.pods.limit	Resourcequota Pods Limit	Hard limit of the number of pods for a resource quota	-
kubernetes_state.resourcequota.pods.used	Resourcequota Pods Used	Observed number of pods used for a resource quota	-
kubernetes_state.resourcequota.requests.cpu.limit	Resourcequota Requests Cpu Limit	Hard limit on the total of CPU core requested for a resource quota	cpu
kubernetes_state.resourcequota.requests.cpu.used	Resourcequota Requests Cpu Used	Observed sum of CPU cores requested for a resource quota	cpu
kubernetes_state.resourcequota.requests.memory.limit	Resourcequota Requests Memory Limit	Hard limit on the total of memory bytes requested for a resource quota	bytes
kubernetes_state.resourcequota.requests.memory.used	Resourcequota Requests Memory Used	Observed sum of memory bytes requested for a resource quota	bytes
kubernetes_state.resourcequota.requests.storage.limit	Resourcequota Requests Storage Limit	Hard limit on the total of storage bytes requested for a resource quota	bytes
kubernetes_state.resourcequota.requests.storage.used	Resourcequota Requests Storage Used	Observed sum of storage bytes requested for a resource quota	bytes
kubernetes_state.resourcequota.services.limit	Resourcequota Services Limit	Hard limit of the number of services for a resource quota	-
kubernetes_state.resourcequota.services.loadbalancers.limit	Resourcequota Services Loadbalancers Limit	Hard limit of the number of load balancers for a resource quota	-
kubernetes_state.resourcequota.services.loadbalancers.used	Resourcequota Services Loadbalancers Used	Observed number of load balancers used for a resource quota	-
kubernetes_state.resourcequota.services.nodeports.limit	Resourcequota Services Nodeports Limit	Hard limit of the number of node ports for a resource quota	-
kubernetes_state.resourcequota.services.nodeports.used	Resourcequota Services Nodeports Used	Observed number of node ports used for a resource quota	-
kubernetes_state.resourcequota.services.used	Resourcequota Services Used	Observed number of services used for a resource quota	-
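
Each resource quota metric above comes as a .used / .limit pair, so quota utilization can be derived by dividing one by the other. The sketch below does this for a dictionary of samples keyed by metric name. The samples dictionary and the function name quota_utilization are hypothetical; how the values are fetched from the monitoring backend is out of scope here.

```python
from typing import Dict, Optional


def quota_utilization(samples: Dict[str, float], resource: str) -> Optional[float]:
    """Percentage of a resource quota in use, or None if it cannot be computed."""
    prefix = "kubernetes_state.resourcequota." + resource
    used = samples.get(prefix + ".used")
    limit = samples.get(prefix + ".limit")
    # A missing sample or a zero limit gives no meaningful percentage.
    if used is None or not limit:
        return None
    return 100.0 * used / limit


# Illustrative values only.
samples = {
    "kubernetes_state.resourcequota.pods.used": 15.0,
    "kubernetes_state.resourcequota.pods.limit": 20.0,
}
print(quota_utilization(samples, "pods"))          # 15 of 20 pods
print(quota_utilization(samples, "requests.cpu"))  # no samples available
```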

Metrics for CoreDNS

Metrics	Display Name	Description	Units
coredns.panics	Total Panics	Total number of panics	-
coredns.query.count	Query count	Total query count	-
coredns.request_duration.seconds.sum	Request Duration Seconds Sum	Duration to process each query	-
coredns.request_duration.seconds.count	Request Duration Seconds Count	Duration per upstream interaction	-
coredns.response_size.bytes.sum	Response Size Bytes Sum	Size of the returned response	bytes

Metrics for KubeDNS

Metrics	Display Name	Description	Units
kubedns.cachemiss_count	Cachemiss Count	Number of DNS cache misses (from start of process)	-
kubedns.error_count	Error Count	Number of DNS requests resulting in an error	-
kubedns.request_count	Request Count	Total number of DNS requests made	-
kubedns.request_duration.seconds.count	Request Duration Seconds Count	Number of requests on which the kubedns.request_duration.seconds.sum metric is evaluated	-
kubedns.request_duration.seconds.sum	Request Duration Seconds Sum	Time (in seconds) taken to resolve each request	-
kubedns.response_size.bytes.count	Response Size Bytes Count	Number of responses on which the kubedns.response_size.bytes.sum metric is evaluated	-
kubedns.response_size.bytes.sum	Response Size Bytes Sum	Size of the returned response in bytes	bytes
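
The .sum and .count pairs above are meant to be combined: dividing the change in the sum by the change in the count gives the average per request over an interval. The sketch below shows this for kubedns.request_duration.seconds. It assumes both metrics are cumulative values sampled at two points in time; the function name average_duration is hypothetical.

```python
from typing import Optional


def average_duration(prev_sum: float, prev_count: float,
                     curr_sum: float, curr_count: float) -> Optional[float]:
    """Average seconds per request between two samples of a sum/count pair."""
    delta_count = curr_count - prev_count
    if delta_count <= 0:
        # No new requests (or a restart) in the interval: no average to report.
        return None
    return (curr_sum - prev_sum) / delta_count


# 300 new requests took 1.5 seconds in total during the interval.
print(average_duration(12.0, 400, 13.5, 700))
```

The same calculation applies to the response size pair, where the result is the average response size in bytes.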

Metrics for Kube Controller

Metrics	Display Name	Description	Units
controller.workqueue.work_duration.sum	Kube Controller Workqueue Work Duration Seconds Sum	Duration taken in seconds to process an item from workqueue	seconds
controller.workqueue.work_duration.count	Kube Controller Workqueue Work Duration Seconds Count	Total time taken in seconds to process an item from workqueue	seconds
controller.workqueue.work_unfinished_duration	Kube Controller Workqueue Unfinished Work Seconds	Time in seconds that in-progress work has been running without yet being observed by work_duration. Large values indicate stuck threads	seconds
controller.workqueue.work_longest_duration	Kube Controller Workqueue Longest Running Processor Seconds	Time in seconds for which the longest running processor for workqueue is running	-
controller.workqueue.queue_duration.sum	Kube Controller Workqueue Queue Duration Seconds Sum	Duration in seconds for which an item remains in workqueue before being requested	-
controller.workqueue.queue_duration.count	Kube Controller Workqueue Queue Duration Seconds Count	Total duration in seconds for which an item remains in workqueue before being requested	-
controller.workqueue.nodes.count	Kube Controller Registered Nodes	Number of registered Nodes per zone	-
controller.workqueue.nodes.unhealthy	Kube Controller Node Collector Unhealthy Nodes in Zone	Number of Nodes not ready per zone	-
controller.workqueue.nodes.evictions	Kube Controller Node Collector Evictions Number	Number of Node evictions that happened since current instance of NodeController started	-
controller.workqueue.depth	Kube Controller Workqueue Depth	Current depth of workqueue	-
controller.workqueue.adds	Kube Controller Workqueue Adds Total	Total number of additions/insertions handled by workqueue	-
controller.workqueue.retries	Kube Controller Workqueue Retries Total	Total number of retries handled by workqueue	-
controller.rate_limiter.use	Kube Controller Node Lifecycle Controller Rate Limiter Use	A metric measuring the saturation of the rate limiter for node_lifecycle_controller	-
controller.go.goroutines	Kube Controller Go Goroutines	Number of goroutines that currently exist	-
controller.threads	Kube Controller Os Threads	Number of OS threads created	-
controller.process.max_fds	Kube Controller Process Max Fds	Maximum number of open file descriptors	-
controller.process.open_fds	Kube Controller Process Open Fds	Number of open file descriptors	-

Metrics for Kube Scheduler

Metrics	Display Name	Description	Units
scheduler.binding.duration.count	Kube Scheduler Binding Duration Seconds Count	Total Binding duration in seconds	seconds
scheduler.binding.duration.seconds	Kube Scheduler Binding Duration Seconds Sum	Binding duration	seconds
scheduler.binding.latency.count	Kube Scheduler Binding Latency Microseconds Count	Total Binding latency	microseconds
scheduler.binding.latency.sum	Kube Scheduler Binding Latency Microseconds	Binding latency sum	microseconds
scheduler.cache.lookups	Kube Scheduler Equiv Cache Lookups Total	Total number of equivalent cache lookups, by whether a cache entry was found	-
scheduler.client.http.requests	Kube Scheduler Rest Client Requests Total	Number of HTTP requests, partitioned by status code, method, and host	-
scheduler.client.http.requests_duration.count	Kube Scheduler Rest Client Request Latency Seconds Count	Total request latency. Broken down by verb and URL	seconds
scheduler.client.http.requests_duration.sum	Kube Scheduler Rest Client Request Latency Seconds Sum	Request latency. Broken down by verb and URL	seconds
scheduler.gc_duration_seconds.count	Kube Scheduler Go GC Duration Seconds Count	A summary of the GC invocation durations	-
scheduler.gc_duration_seconds.quantile	Kube Scheduler Go GC Duration Seconds	A summary of the GC invocation durations	-
scheduler.gc_duration_seconds.sum	Kube Scheduler Go GC Duration Seconds Sum	A summary of the GC invocation durations	-
scheduler.go.goroutines	Kube Scheduler Go Goroutines	Number of goroutines that currently exist	-
scheduler.process.max_fds	Kube Scheduler Process Max Fds	Maximum number of open file descriptors	-
scheduler.process.open_fds	Kube Scheduler Process Open Fds	Number of open file descriptors	-
scheduler.pod_preemption.victims	Kube Scheduler Pod Preemption Victims	Number of selected preemption victims	-
scheduler.pod_preemption.attempts	Kube Scheduler Total Preemption Attempts	Total number of preemption attempts in the cluster so far	-
scheduler.schedule_attempts.total	Kube Scheduler Schedule Attempts Total	Number of attempts to schedule pods, by result. "unschedulable" means a pod could not be scheduled, and "error" means an internal scheduler problem	-
scheduler.scheduling.algorithm_duration.count	Kube Scheduler Scheduling Algorithm Duration Seconds Count	Total Scheduling algorithm latency	seconds
scheduler.scheduling.algorithm_duration.sum	Kube Scheduler Scheduling Algorithm Duration Seconds Sum	Scheduling algorithm latency	seconds
scheduler.scheduling.algorithm_latency.count	Kube Scheduler Scheduling Algorithm Latency Microseconds Count	Total Scheduling algorithm latency	microseconds
scheduler.scheduling.algorithm_latency.sum	Kube Scheduler Scheduling Algorithm Latency Microseconds Sum	Scheduling algorithm latency	microseconds
scheduler.scheduling.algorithm.predicate_duration.count	Kube Scheduler Scheduling Algorithm Predicate Evaluation Count	Scheduling algorithm predicate evaluation duration	-
scheduler.scheduling.algorithm.predicate_duration.sum	Kube Scheduler Scheduling Algorithm Predicate Evaluation Sum	Scheduling algorithm predicate evaluation duration	-
scheduler.scheduling.algorithm.preemption_duration.count	Kube Scheduler Scheduling Algorithm Preemption Evaluation Count	Scheduling algorithm preemption evaluation duration	-
scheduler.scheduling.algorithm.preemption_duration.sum	Kube Scheduler Scheduling Algorithm Preemption Evaluation Sum	Scheduling algorithm preemption evaluation duration	-
scheduler.scheduling.algorithm.priority_duration.count	Kube Scheduler Scheduling Algorithm Priority Evaluation Count	Scheduling algorithm priority evaluation duration	-
scheduler.scheduling.algorithm.priority_duration.sum	Kube Scheduler Scheduling Algorithm Priority Evaluation Sum	Scheduling algorithm priority evaluation duration	-
scheduler.e2e.scheduling_duration.count	Kube Scheduler E2E Scheduling Duration Seconds Count	Total E2E scheduling latency (scheduling algorithm + binding)	seconds
scheduler.e2e.scheduling_duration.sum	Kube Scheduler E2E Scheduling Duration Seconds Sum	E2E scheduling latency (scheduling algorithm + binding)	seconds
scheduler.e2e.scheduling_latency.count	Kube Scheduler E2E Scheduling Latency Microseconds Count	Total E2E scheduling latency (scheduling algorithm + binding)	microseconds
scheduler.e2e.scheduling_latency.sum	Kube Scheduler E2E Scheduling Latency Microseconds Sum	E2E scheduling latency (scheduling algorithm + binding)	microseconds
scheduler.scheduling.scheduling_duration.count	Kube Scheduler Scheduling Duration Seconds Count	Scheduling latency split by sub-parts of the scheduling operation	seconds
scheduler.scheduling.scheduling_duration.quantile	Kube Scheduler Scheduling Duration Seconds	Scheduling latency split by sub-parts of the scheduling operation	seconds
scheduler.scheduling.scheduling_duration.sum	Kube Scheduler Scheduling Duration Seconds Sum	Scheduling latency split by sub-parts of the scheduling operation	seconds
scheduler.scheduling.scheduling_latency.count	Kube Scheduler Scheduling Latency Seconds Count	Scheduling latency split by sub-parts of the scheduling operation	seconds
scheduler.scheduling.scheduling_latency.quantile	Kube Scheduler Scheduling Latency Seconds	Scheduling latency split by sub-parts of the scheduling operation	seconds
scheduler.scheduling.scheduling_latency.sum	Kube Scheduler Scheduling Latency Seconds Sum	Scheduling latency split by sub-parts of the scheduling operation	seconds
scheduler.threads	Kube Scheduler OS Threads	Number of OS threads created	-
scheduler.volume_scheduling_duration.sum	Kube Scheduler Volume Scheduling Duration Seconds Sum	Volume scheduling stage latency sum	-
scheduler.volume_scheduling_duration.count	Kube Scheduler Volume Scheduling Duration Seconds Count	Volume scheduling stage latency count	-
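
Many scheduler metrics above come as .sum and .count pairs. A common way to read such a pair is to divide the change in the sum by the change in the count between two samples, which gives the mean duration per observation in that interval. The sketch below assumes both values are reported as cumulative totals, which this document does not state; the function name mean_duration and the sample values are hypothetical.

```python
from typing import Optional


def mean_duration(prev_sum: float, prev_count: float,
                  curr_sum: float, curr_count: float) -> Optional[float]:
    """Mean duration per observation between two samples.

    Assumes sum and count are cumulative totals (an assumption, not
    confirmed by this document). Returns None when no new observations
    arrived or when a counter appears to have been reset.
    """
    delta_sum = curr_sum - prev_sum
    delta_count = curr_count - prev_count
    if delta_count <= 0 or delta_sum < 0:
        return None
    return delta_sum / delta_count


# Hypothetical samples of scheduler.binding.duration.seconds (sum) and
# scheduler.binding.duration.count taken one collection interval apart.
average = mean_duration(10.0, 100, 12.5, 150)
```

If the agent already reports per-interval values rather than cumulative totals, the subtraction step is unnecessary and the current sum can be divided by the current count directly.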

Metrics for Metrics Server

Metrics	Display Name	Description	Units
metrics_server.go_gc_duration_seconds_sum	Go GC Duration Seconds Sum	A summary of the GC invocation durations	seconds
metrics_server.authenticated_user_requests	Authenticated User Requests	Counter of authenticated requests broken out by username	-
metrics_server.go_goroutines	Go Goroutines	Number of goroutines that currently exist	-
metrics_server.manager_tick_duration_sum	Manager Tick Duration Sum	The total time spent collecting and storing metrics	seconds
metrics_server.scraper_duration_count	Scraper Duration Count	Time spent scraping sources	seconds
metrics_server.scraper_duration_sum	Scraper Duration Sum	Time spent scraping sources	seconds
metrics_server.scraper_last_time	Scraper Last Time	Last time metrics-server performed a scrape since unix epoch	seconds
metrics_server.go_gc_duration_seconds_quantile	Go GC Duration Seconds Quantile	A summary of the GC invocation durations	seconds
metrics_server.kubelet_summary_request_duration_sum	Kubelet Summary Request Duration Sum	The Kubelet summary request latencies	seconds
metrics_server.kubelet_summary_scrapes_total	Kubelet Summary Scrapes Total	Total number of attempted Summary API scrapes done by Metrics Server	-
metrics_server.manager_tick_duration_count	Manager Tick Duration Count	The total time spent collecting and storing metrics	seconds
metrics_server.process_max_fds	Process Max Fds	Maximum number of open file descriptors	-
metrics_server.process_open_fds	Process Open Fds	Number of open file descriptors	-
metrics_server.go_gc_duration_seconds_count	Go GC Duration Seconds Count	A summary of the GC invocation durations	-
metrics_server.kubelet_summary_request_duration_count	Kubelet Summary Request Duration Count	The Kubelet summary request latencies	seconds
metrics_server.process_cpu_seconds_total	Process Cpu Seconds Total	Total user and system CPU time spent	seconds
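
The process_open_fds and process_max_fds metrics are most useful when read together as a utilization ratio. The following is a minimal sketch of that calculation; the function name fd_utilization_percent and the sample values are hypothetical and not part of the agent.

```python
from typing import Optional


def fd_utilization_percent(open_fds: float, max_fds: float) -> Optional[float]:
    """Percentage of the file descriptor limit currently in use.

    Returns None when the limit is missing or not positive, since the
    ratio is undefined in that case.
    """
    if max_fds <= 0:
        return None
    return 100.0 * open_fds / max_fds


# Hypothetical readings of metrics_server.process_open_fds and
# metrics_server.process_max_fds.
usage = fd_utilization_percent(256, 1024)
```

The same calculation applies to the controller and scheduler file descriptor metrics listed earlier.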

Metrics for Kube API server

Metrics	Display Name	Description	Units
apiserver.go.threads.total	Kube apiserver Go Threads Total	Number of OS threads created	-
apiserver.authenticated.user.requests	Kube apiserver Authenticated User Requests	Counter of authenticated requests broken out by username	-
apiserver.http.requests.total.count	Kube apiserver HTTP Requests Total Count	Total number of HTTP requests made	-
apiserver.authenticated.user.requests.count	Kube apiserver Authenticated User Requests Count	Counter of authenticated requests broken out by username	-
apiserver.dropped.requests.total	Kube apiserver Dropped Requests Total	Accumulated number of requests dropped with Try-again-later response	-
apiserver.http.requests.total	Kube apiserver HTTP Requests Total	Total number of HTTP requests made	-
apiserver.audit.event.total	Kube apiserver Audit Event Total	Counter of audit events generated and sent to the audit back end	-
apiserver.rest.client.requests.total	Kube apiserver Rest Client Requests Total	Number of HTTP requests, partitioned by status code, method, and host	-
apiserver.request.count	Kube apiserver Request Count	Counter of API server requests broken out for each verb, group, version, resource, scope, component, client, and HTTP response contentType and code	-
apiserver.request.count.count	Kube apiserver Request Count Count	Counter of API server requests broken out for each verb, group, version, resource, scope, component, client, and HTTP response contentType and code	-
apiserver.dropped.requests.total.count	Kube apiserver Dropped Requests Total Count	Monotonic count of requests dropped with Try-again-later response	-
apiserver.inflight.requests	Kube apiserver Inflight Requests	Maximum number of in-flight requests counted against this API server's in-flight request limit, per request kind, in the last second	-
apiserver.go.goroutines	Kube apiserver Goroutines	Number of goroutines that currently exist	-
apiserver.APIServiceRegistrationController.depth	Kube apiserver APIService Registration Controller Depth	Current depth of workqueue: APIServiceRegistrationController	-
apiserver.etcd.object.counts	Kube apiserver ETCD Object Counts	Number of stored objects at the time of last check split by kind	-
apiserver.rest.client.requests.total.count	Kube apiserver Rest Client Requests Total Count	Number of HTTP requests, partitioned by status code, method, and host	-