Introduction
Tensor Processing Units (TPUs) are Google’s custom-developed application-specific integrated circuits (ASICs) used to accelerate machine learning workloads. TPUs are designed from the ground up with the benefit of Google’s deep experience and leadership in machine learning.
Cloud TPU runs your machine learning workloads on Google’s TPU accelerator hardware using TensorFlow. It is designed for maximum performance and flexibility, helping researchers, developers, and businesses build TensorFlow compute clusters that can use CPUs, GPUs, and TPUs. High-level TensorFlow APIs help you get models running on Cloud TPU hardware.
Setup
To set up the OpsRamp Google integration and discover the Google service, go to Google Integration Discovery Profile and select Tpu.
Metrics
OpsRamp Metric | Metric Display Name | Unit | Aggregation Type | Description |
---|---|---|---|---|
google_tpu_cpu_utilization | CPU utilization | Percent | Average | Utilization of CPUs on the TPU Worker as a percent. Sampled every 60 seconds. After sampling, data is not visible for up to 180 seconds. |
google_tpu_memory_usage | Memory usage | Bytes | Average | Memory usage in bytes. Sampled every 60 seconds. After sampling, data is not visible for up to 180 seconds. |
google_tpu_network_received_bytes_count | Network bytes received | Bytes | Count | Cumulative bytes of data this server has received over the network. Sampled every 60 seconds. After sampling, data is not visible for up to 180 seconds. |
google_tpu_network_sent_bytes_count | Network bytes sent | Bytes | Count | Cumulative bytes of data this server has sent over the network. Sampled every 60 seconds. After sampling, data is not visible for up to 180 seconds. |
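
The network metrics above are described as cumulative byte counts, so a throughput figure has to be derived from consecutive samples. The sketch below shows one way to do that in Python. It is illustrative only: the function names and the `(timestamp, value)` sample format are hypothetical and not part of any OpsRamp or Google API. It assumes the values arrive as a running total and that samples fall on 60-second boundaries. If your integration reports per-interval deltas instead, divide each value by the interval rather than differencing.

```python
from typing import List, Tuple

SAMPLE_INTERVAL_S = 60        # sampling period stated in the metrics table
MAX_VISIBILITY_DELAY_S = 180  # worst-case delay before a sample becomes visible


def latest_visible_sample_time(now_s: int) -> int:
    """Newest sample timestamp that should already be visible at now_s.

    Assumption: samples are aligned to multiples of SAMPLE_INTERVAL_S.
    """
    cutoff = now_s - MAX_VISIBILITY_DELAY_S
    return cutoff - (cutoff % SAMPLE_INTERVAL_S)


def byte_rates(samples: List[Tuple[int, int]]) -> List[Tuple[int, float]]:
    """Convert (timestamp_s, cumulative_bytes) samples into bytes per second.

    Samples must be sorted by timestamp. A decrease in the counter is treated
    as a reset (for example, a worker restart) and that interval is skipped.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        delta = v1 - v0
        if dt <= 0 or delta < 0:
            continue  # duplicate timestamp or counter reset
        rates.append((t1, delta / dt))
    return rates


if __name__ == "__main__":
    # Hypothetical samples for google_tpu_network_received_bytes_count.
    received = [(0, 1000), (60, 7000), (120, 7000), (180, 500), (240, 6500)]
    for ts, rate in byte_rates(received):
        print(f"t={ts}s  {rate:.1f} B/s")
```

Because data can lag by up to 180 seconds, a query window that ends at `latest_visible_sample_time(now)` rather than at the current time avoids treating not-yet-visible samples as missing data.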
Event support
- Not Supported