Apache Spark is an open-source distributed general-purpose cluster-computing framework. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Prerequisites
- Configure the following endpoints to collect the respective metrics:
  - `stats-metrics`: `http://<ip_addr>:<port>/json/`
  - `app-url`: `http://<ip_addr>:<port>/app/?appId=<app-id>`
  - `job-metrics`: `http://<ip_addr>:<port>/api/v1/applications/<app-id>/jobs`
  - `stage-metrics`: `http://<ip_addr>:<port>/api/v1/applications/<app-id>/stages`
  - `storage-metrics`: `http://<ip_addr>:<port>/api/v1/applications/<app-id>/storage/rdd`
  - `executor-metrics`: `http://<ip_addr>:<port>/api/v1/applications/<app-id>/executors`
  - `streaming-metrics`: `http://<ip_addr>:<port>/api/v1/applications/<app-id>/streaming/statistics`
- For Virtual Machines, install the Linux Agent.
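All of the endpoint patterns above share one host and port and differ only in path and application ID. The Python sketch below fills in the placeholders so the URLs can be checked by hand. It assumes a single `<ip_addr>`, `<port>`, and `<app-id>`; the helper `build_spark_endpoints` and the sample values are hypothetical, and only the endpoint names and paths come from the list.

```python
from urllib.parse import quote

# Endpoint paths copied from the prerequisites list; "{app}" marks where <app-id> goes.
ENDPOINT_PATHS = {
    "stats-metrics": "/json/",
    "app-url": "/app/?appId={app}",
    "job-metrics": "/api/v1/applications/{app}/jobs",
    "stage-metrics": "/api/v1/applications/{app}/stages",
    "storage-metrics": "/api/v1/applications/{app}/storage/rdd",
    "executor-metrics": "/api/v1/applications/{app}/executors",
    "streaming-metrics": "/api/v1/applications/{app}/streaming/statistics",
}


def build_spark_endpoints(ip_addr: str, port: int, app_id: str) -> dict[str, str]:
    """Return a mapping of endpoint name to full URL (hypothetical helper)."""
    base = f"http://{ip_addr}:{port}"
    # Percent-encode the application ID in case it contains reserved characters.
    app = quote(app_id, safe="")
    return {name: base + path.format(app=app) for name, path in ENDPOINT_PATHS.items()}


if __name__ == "__main__":
    # Example values only; substitute your own master address and application ID.
    for name, url in build_spark_endpoints("10.0.0.5", 8080, "app-20240101120000-0000").items():
        print(f"{name}: {url}")
```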
Configuring the credentials
Configure the credentials in the `/opt/opsramp/agent/conf/app.d/creds.yaml` file:
```yaml
spark:
  - name: spark
    user: <username>
    pwd: <Password>
    encoding-type: plain
    labels:
      key1: val1
      key2: val2
```
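If the Spark UI sits behind HTTP Basic authentication (for example, through a reverse proxy), the plain-text `user` and `pwd` values can be tried against an endpoint by hand before the agent uses them. This is a minimal Python sketch under that assumption only; how the agent itself presents these credentials is not described here, and `basic_auth_header` and `fetch_with_credentials` are hypothetical helpers.

```python
import base64
import json
import urllib.request


def basic_auth_header(user: str, pwd: str) -> str:
    """Build an HTTP Basic Authorization header value from plain-text credentials."""
    token = base64.b64encode(f"{user}:{pwd}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def fetch_with_credentials(url: str, user: str, pwd: str, timeout: float = 10.0):
    """GET a Spark endpoint with Basic auth and decode the JSON body.

    Assumes the endpoint accepts Basic auth; urllib raises HTTPError on 401/403.
    """
    request = urllib.request.Request(url, headers={"Authorization": basic_auth_header(user, pwd)})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_with_credentials("http://<ip_addr>:8080/json/", "<username>", "<Password>")` returns the decoded JSON when the credentials are accepted.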
Configuring the application
Virtual machine
Configure the application in the `/opt/opsramp/agent/conf/app/discovery/auto-detection.yaml` file:
```yaml
- name: spark
  instance-checks:
    process-check:
      - spark
    port-check:
      - 8080
```
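The `process-check` and `port-check` entries describe a process whose command line mentions `spark` and a listener on port 8080. To confirm both conditions from the VM before relying on auto-detection, here is a minimal Python sketch. It approximates the two checks and is not the agent's detection logic; the process lookup reads Linux `/proc`, and `is_port_open` and `find_processes` are hypothetical helpers.

```python
import os
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_processes(pattern: str) -> list[int]:
    """Return PIDs whose command line contains pattern (Linux /proc only)."""
    if not os.path.isdir("/proc"):
        return []
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as fh:
                # Arguments are NUL-separated; join them with spaces for matching.
                cmdline = fh.read().replace(b"\0", b" ").decode("utf-8", "replace")
        except OSError:
            continue  # process exited or is not readable
        if pattern in cmdline:
            pids.append(int(entry))
    return pids
```

On the Spark host, `is_port_open("127.0.0.1", 8080)` and `find_processes("spark")` should return `True` and a non-empty list respectively.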
Docker environment
Configure the application in the `/opt/opsramp/agent/conf/app/discovery/auto-container-detection.yaml` file:
```yaml
- name: spark
  container-checks:
    image-check:
      - spark
    port-check:
      - 8080
```
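For containers, `image-check` matches on the container image instead of a process name. Exactly how the agent compares the image string is not stated here; the Python sketch below assumes a substring match against the repository name, ignoring the registry host, tag, and digest. `repository_name` and `image_matches` are hypothetical helpers.

```python
def repository_name(image: str) -> str:
    """Strip registry, namespace, tag, and digest from an image reference."""
    name = image.split("@", 1)[0]  # drop any @sha256:... digest
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    # A colon after the last slash is a tag; one before it is a registry port.
    if colon > last_slash:
        name = name[:colon]
    return name.rsplit("/", 1)[-1]


def image_matches(image: str, pattern: str) -> bool:
    """Assumed matching rule: pattern appears in the repository name."""
    return pattern in repository_name(image)
```

Under this assumption, `bitnami/spark:3.5.0` and `localhost:5000/apache/spark:3.5` both match `spark`, while `nginx:latest` does not.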
Kubernetes environment
Configure the application in the `config.yaml` file:
```yaml
- name: spark
  container-checks:
    image-check:
      - spark
    port-check:
      - 8080
```
Note
The specified port is used to fetch all the endpoint URLs for each application.
Validate
Go to Resources under the Infrastructure tab to check if your resources are onboarded and the metrics are collected.
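If resources appear but metrics are missing, it can help to confirm that the Spark master answers on the `stats-metrics` endpoint. This is a minimal Python sketch. The field names it reads (`status`, `aliveworkers`, `cores`, `coresused`, `memory`, `memoryused`, `activeapps`, `completedapps`) are those the Spark standalone master usually returns from `/json/`, but treat them as an assumption and compare them with your own output. `fetch_json` and `summarize_master` are hypothetical helpers.

```python
import json
import urllib.request


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize_master(state: dict) -> dict:
    """Pick out the master-level values behind the Workers, Cores, Memory, and Status metrics."""
    return {
        "status": state.get("status"),
        "alive_workers": state.get("aliveworkers"),
        "cores": state.get("cores"),
        "cores_used": state.get("coresused"),
        "memory_mb": state.get("memory"),
        "memory_used_mb": state.get("memoryused"),
        # Active and completed applications are returned as lists of app objects.
        "active_apps": len(state.get("activeapps", [])),
        "completed_apps": len(state.get("completedapps", [])),
    }
```

Usage: `summarize_master(fetch_json("http://<ip_addr>:8080/json/"))`. A missing or `None` value suggests the endpoint is not returning what the integration expects.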
Metrics
OpsRamp Metric | Metric Display Name | Unit | Description |
---|---|---|---|
spark_workers | Workers | | Number of workers connected to the master |
spark_cores | Cores | | Number of CPUs available across all workers |
spark_cores_used | Cores Used | | Number of CPUs used by all applications |
spark_applications_active | Applications Active | | Number of applications waiting or running |
spark_applications_completed | Applications Completed | | Number of completed applications |
spark_drivers_active | Drivers Active | | Number of drivers available |
spark_status | Status | | Status of the Spark master (for example, alive) |
spark_memory | Memory | megabytes | Total memory available on the Spark master |
spark_memory_used | Memory Used | megabytes | Memory used by applications on the Spark master |
spark_job_count | Jobs | | Number of jobs |
spark_job_num_tasks | Tasks | | Number of tasks in the application (different instances are shown using AppID_jobID) |
spark_job_num_active_tasks | Active Tasks | | Number of active tasks in the application (different instances are shown using AppID_jobID) |
spark_job_num_skipped_tasks | Skipped Tasks | | Number of skipped tasks in the application (different instances are shown using AppID_jobID) |
spark_job_num_failed_tasks | Failed Tasks | | Number of failed tasks in the application (different instances are shown using AppID_jobID) |
spark_job_num_completed_tasks | Completed Tasks | | Number of completed tasks in the application (different instances are shown using AppID_jobID) |
spark_job_num_active_stages | Active Stages | | Number of active stages in the application (different instances are shown using AppID_jobID) |
spark_job_num_completed_stages | Completed Stages | | Number of completed stages in the application (different instances are shown using AppID_jobID) |
spark_job_num_skipped_stages | Skipped Stages | | Number of skipped stages in the application (different instances are shown using AppID_jobID) |
spark_job_num_failed_stages | Failed Stages | | Number of failed stages in the application (different instances are shown using AppID_jobID) |
spark_stage_count | Stage Count | | Number of stages (different instances are shown using AppID_stageID) |
spark_stage_num_active_tasks | Stage Num Active Tasks | | Number of active tasks in the application's stages (different instances are shown using AppID_stageID) |
spark_stage_num_complete_tasks | Stage Num Complete Tasks | | Number of completed tasks in the application's stages (different instances are shown using AppID_stageID) |
spark_stage_num_failed_tasks | Stage Num Failed Tasks | | Number of failed tasks in the application's stages (different instances are shown using AppID_stageID) |
spark_stage_executor_run_time | Stage Executor Run Time | | Time spent by the executor in the application's stages (different instances are shown using AppID_stageID) |
spark_stage_input_bytes | Stage Input Bytes | bytes | Input bytes in the application's stages (different instances are shown using AppID_stageID) |
spark_stage_input_records | Stage Input Records | | Input records in the application's stages (different instances are shown using AppID_stageID) |
spark_stage_output_bytes | Stage Output Bytes | bytes | Output bytes in the application's stages (different instances are shown using AppID_stageID) |
spark_stage_output_records | Stage Output Records | | Output records in the application's stages (different instances are shown using AppID_stageID) |
spark_stage_shuffle_read_bytes | Stage Shuffle Read Bytes | bytes | Number of bytes read during a shuffle in the application's stages (different instances are shown using AppID_stageID) |
spark_stage_shuffle_read_records | Stage Shuffle Read Records | | Number of records read during a shuffle in the application's stages (different instances are shown using AppID_stageID) |
spark_stage_shuffle_write_bytes | Stage Shuffle Write Bytes | bytes | Number of shuffled bytes in the application's stages (different instances are shown using AppID_stageID) |
spark_stage_shuffle_write_records | Stage Shuffle Write Records | | Number of shuffled records in the application's stages (different instances are shown using AppID_stageID) |
spark_stage_memory_bytes_spilled | Stage Memory Bytes Spilled | bytes | Number of bytes spilled to disk in the application's stages (different instances are shown using AppID_stageID) |
spark_stage_disk_bytes_spilled | Stage Disk Bytes Spilled | bytes | Maximum size on disk of the spilled bytes in the application's stages (different instances are shown using AppID_stageID) |
spark_driver_rdd_blocks | Driver Rdd Blocks | | Number of RDD blocks in the driver |
spark_driver_memory_used | Driver Memory Used | | Amount of memory used in the driver |
spark_driver_disk_used | Driver Disk Used | | Amount of disk used in the driver |
spark_driver_active_tasks | Driver Active Tasks | | Number of active tasks in the driver |
spark_driver_failed_tasks | Driver Failed Tasks | | Number of failed tasks in the driver |
spark_driver_completed_tasks | Driver Completed Tasks | | Number of completed tasks in the driver |
spark_driver_total_tasks | Driver Total Tasks | | Total number of tasks in the driver |
spark_driver_total_duration | Driver Total Duration | | Time spent in the driver |
spark_driver_total_input_bytes | Driver Total Input Bytes | bytes | Number of input bytes in the driver |
spark_driver_total_shuffle_read | Driver Total Shuffle Read | | Number of bytes read during a shuffle in the driver |
spark_driver_total_shuffle_write | Driver Total Shuffle Write | | Number of shuffled bytes in the driver |
spark_driver_max_memory | Driver Max Memory | | Maximum memory used in the driver |
spark_executor_count | Executor Count | | Number of executors |
spark_executor_rdd_blocks | Executor Rdd Blocks | | Number of persisted RDD blocks in the application's executors |
spark_executor_memory_used | Executor Memory Used | | Amount of memory used for cached RDDs in the application's executors |
spark_executor_max_memory | Executor Max Memory | | Maximum memory across all executors working for a particular application |
spark_executor_disk_used | Executor Disk Used | | Amount of disk space used by persisted RDDs in the application's executors |
spark_executor_active_tasks | Executor Active Tasks | | Number of active tasks in the application's executors |
spark_executor_failed_tasks | Executor Failed Tasks | | Number of failed tasks in the application's executors |
spark_executor_completed_tasks | Executor Completed Tasks | | Number of completed tasks in the application's executors |
spark_executor_total_tasks | Executor Total Tasks | | Total number of tasks in the application's executors |
spark_executor_total_duration | Executor Total Duration | | Time spent by the application's executors executing tasks |
spark_executor_total_input_bytes | Executor Total Input Bytes | bytes | Total number of input bytes in the application's executors |
spark_executor_total_shuffle_read | Executor Total Shuffle Read | | Total number of bytes read during a shuffle in the application's executors |
spark_executor_total_shuffle_write | Executor Total Shuffle Write | | Total number of shuffled bytes in the application's executors |
spark_rdd_count | Rdd Count | | Number of RDDs |
spark_rdd_num_partitions | Rdd Num Partitions | | Number of persisted RDD partitions in the application |
spark_rdd_num_cached_partitions | Rdd Num Cached Partitions | | Number of in-memory cached RDD partitions in the application |
spark_rdd_memory_used | Rdd Memory Used | | Amount of memory used in the application's persisted RDDs |
spark_rdd_disk_used | Rdd Disk Used | | Amount of disk space used by persisted RDDs in the application |
spark_streaming_statistics_avg_input_rate | Streaming Avg Input Rate | | Average streaming input data rate |
spark_streaming_statistics_avg_processing_time | Streaming Avg Processing Time | | Average processing time of the application's streaming batches |
spark_streaming_statistics_avg_scheduling_delay | Streaming Avg Scheduling Delay | | Average scheduling delay of the application's streaming batches |
spark_streaming_statistics_avg_total_delay | Streaming Avg Total Delay | | Average total delay of the application's streaming batches |
spark_streaming_statistics_batch_duration | Streaming Batch Duration | | Streaming batch duration of the application |
spark_streaming_statistics_num_active_batches | Streaming Num Active Batches | | Number of active streaming batches |
spark_streaming_statistics_num_active_receivers | Streaming Num Active Receivers | | Number of active streaming receivers |
spark_streaming_statistics_num_inactive_receivers | Streaming Num Inactive Receivers | | Number of inactive streaming receivers |
spark_streaming_statistics_num_processed_records | Streaming Num Processed Records | | Number of processed streaming records |
spark_streaming_statistics_num_received_records | Streaming Num Received Records | | Number of received streaming records |
spark_streaming_statistics_num_receivers | Streaming Num Receivers | | Number of receivers in the streaming application |
spark_streaming_statistics_num_retained_completed_batches | Streaming Num Retained Completed Batches | | Number of completed streaming batches retained by the application |
spark_streaming_statistics_num_total_completed_batches | Streaming Num Total Completed Batches | | Total number of completed streaming batches of the application |
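The `AppID_jobID` instance naming used by the job metrics can be illustrated with the output of the `job-metrics` endpoint. This is a minimal Python sketch, assuming the response is the list of job objects returned by the Spark monitoring REST API (with fields such as `jobId`, `numTasks`, and `numActiveTasks`). It is not the OpsRamp agent's implementation, and the `JOB_FIELDS` mapping and helper names are hypothetical.

```python
# Hypothetical mapping from Spark REST API job fields to the OpsRamp job metric names.
JOB_FIELDS = {
    "numTasks": "spark_job_num_tasks",
    "numActiveTasks": "spark_job_num_active_tasks",
    "numSkippedTasks": "spark_job_num_skipped_tasks",
    "numFailedTasks": "spark_job_num_failed_tasks",
    "numCompletedTasks": "spark_job_num_completed_tasks",
    "numActiveStages": "spark_job_num_active_stages",
    "numCompletedStages": "spark_job_num_completed_stages",
    "numSkippedStages": "spark_job_num_skipped_stages",
    "numFailedStages": "spark_job_num_failed_stages",
}


def job_metrics(app_id: str, jobs: list[dict]) -> dict[str, dict[str, int]]:
    """Group per-job counters under an AppID_jobID instance key."""
    instances = {}
    for job in jobs:
        key = f"{app_id}_{job['jobId']}"
        # Missing counters default to 0 in this sketch.
        instances[key] = {metric: job.get(field, 0) for field, metric in JOB_FIELDS.items()}
    return instances


def job_count(jobs: list[dict]) -> int:
    """Value for spark_job_count: number of jobs returned for the application."""
    return len(jobs)
```

The same pattern would apply to the stage metrics, keyed by `AppID_stageID` instead.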