Apache Spark is an open-source distributed general-purpose cluster-computing framework. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Prerequisites

  1. Configure the following endpoints to collect the respective metrics:

     stats-metrics : http://<ip_addr>:<port>/json/
     app-url : http://<ip_addr>:<port>/app/?appId=<app-id>
     job-metrics : http://<ip_addr>:<port>/api/v1/applications/<app-id>/jobs
     stage-metrics : http://<ip_addr>:<port>/api/v1/applications/<app-id>/stages
     storage-metrics : http://<ip_addr>:<port>/api/v1/applications/<app-id>/storage/rdd
     executor-metrics : http://<ip_addr>:<port>/api/v1/applications/<app-id>/executors
     streaming-metrics : http://<ip_addr>:<port>/api/v1/applications/<app-id>/streaming/statistics

  2. For Virtual Machines, install the Linux Agent.
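
The endpoint patterns above can be expanded for a concrete host and application with a few lines of Python. This is only an illustrative sketch: the function name `build_endpoints`, the sample IP address, and the sample application ID are hypothetical and not part of the agent.

```python
from urllib.parse import quote

# Path templates copied from the endpoint list above; the keys mirror
# the endpoint labels used in this document.
ENDPOINT_PATHS = {
    "stats-metrics": "/json/",
    "app-url": "/app/?appId={app_id}",
    "job-metrics": "/api/v1/applications/{app_id}/jobs",
    "stage-metrics": "/api/v1/applications/{app_id}/stages",
    "storage-metrics": "/api/v1/applications/{app_id}/storage/rdd",
    "executor-metrics": "/api/v1/applications/{app_id}/executors",
    "streaming-metrics": "/api/v1/applications/{app_id}/streaming/statistics",
}


def build_endpoints(ip_addr, port, app_id):
    """Return a dict mapping each endpoint label to its full URL."""
    base = f"http://{ip_addr}:{port}"
    safe_id = quote(app_id, safe="")  # escape characters that are unsafe in a URL
    return {
        label: base + path.format(app_id=safe_id)
        for label, path in ENDPOINT_PATHS.items()
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    endpoints = build_endpoints("192.0.2.10", 8080, "app-20240101000000-0000")
    for label, url in endpoints.items():
        print(f"{label} : {url}")
```

Depending on the deployment, the /api/v1 endpoints may be served on a different port than the master UI (for example, by the application's own UI), so the port may need to be set per endpoint.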

Configuring the credentials

Configure the credentials in the file /opt/opsramp/agent/conf/app.d/creds.yaml:

spark:
- name: spark
  user: <username>
  pwd: <Password>
  encoding-type: plain
  labels:
    key1: val1
    key2: val2

Configuring the application

Virtual machine

Configure the application in the file /opt/opsramp/agent/conf/app/discovery/auto-detection.yaml:

- name: spark
  instance-checks:
    process-check:
      - spark
    port-check:
      - 8080
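
Before relying on auto-detection, it can help to confirm that something is actually listening on the port named under port-check. The following is a minimal sketch of such a check, not the agent's detection logic; the helper name `port_is_open` and the use of 127.0.0.1 are assumptions.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # 8080 is the port listed under port-check above; the host is an assumption.
    print("port 8080 reachable:", port_is_open("127.0.0.1", 8080))
```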

Docker environment

Configure the application in the file /opt/opsramp/agent/conf/app/discovery/auto-container-detection.yaml:

- name: spark
  container-checks:
    image-check:
      - spark
    port-check:
      - 8080

Kubernetes environment

Configure the application in config.yaml

- name: spark
  container-checks:
    image-check:
      - spark
    port-check:
      - 8080

Validate

Go to Resources under the Infrastructure tab to check if your resources are onboarded and the metrics are collected.
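
If metrics do not appear, one way to narrow down the cause is to query the stats-metrics endpoint directly and compare its values with what the UI shows. The sketch below assumes the JSON field names produced by the Spark standalone master (workers, cores, coresused, memory, memoryused, activeapps, completedapps, activedrivers, status). The mapping onto the metric names in the Metrics section is a guess for illustration, not the agent's documented behavior, and both function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_master_state(ip_addr, port, timeout=5.0):
    """Fetch and parse the master's /json/ endpoint (requires network access)."""
    with urlopen(f"http://{ip_addr}:{port}/json/", timeout=timeout) as resp:
        return json.load(resp)


def summarize_master_state(state):
    """Derive master-level values from a parsed /json/ response.

    Field names are assumed from the Spark standalone master's JSON output;
    missing fields fall back to neutral defaults.
    """
    return {
        "spark_workers": len(state.get("workers", [])),
        "spark_cores": state.get("cores", 0),
        "spark_cores_used": state.get("coresused", 0),
        "spark_memory": state.get("memory", 0),
        "spark_memory_used": state.get("memoryused", 0),
        "spark_applications_active": len(state.get("activeapps", [])),
        "spark_applications_completed": len(state.get("completedapps", [])),
        "spark_drivers_active": len(state.get("activedrivers", [])),
        "spark_status": str(state.get("status", "unknown")).lower(),
    }


if __name__ == "__main__":
    # Hand-written sample response, used here instead of a live request.
    sample = {
        "workers": [{"id": "worker-1"}, {"id": "worker-2"}],
        "cores": 8, "coresused": 2,
        "memory": 15360, "memoryused": 1024,
        "activeapps": [{"id": "app-1"}], "completedapps": [],
        "activedrivers": [], "status": "ALIVE",
    }
    for name, value in summarize_master_state(sample).items():
        print(name, "=", value)
```

Against a live master, `summarize_master_state(fetch_master_state("<ip_addr>", <port>))` would produce the same kind of summary.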

Metrics

| OpsRamp Metric | Metric Display Name | Unit | Description |
| --- | --- | --- | --- |
| spark_workers | Workers | | Number of workers connected to the master |
| spark_cores | Cores | | Number of CPUs available for all workers |
| spark_cores_used | Cores Used | | Number of CPUs used for all applications |
| spark_applications_active | Applications Active | | Number of applications waiting or running |
| spark_applications_completed | Applications Completed | | Number of applications completed |
| spark_drivers_active | Drivers Active | | Number of drivers available |
| spark_status | Status | | Availability status of the Spark master (for example, alive) |
| spark_memory | Memory | megabytes | Total memory available on the Spark master |
| spark_memory_used | Memory Used | megabytes | Memory used by the applications on the Spark master |
| spark_job_count | Jobs | | Number of jobs |
| spark_job_num_tasks | Tasks | | Number of tasks in the application (different instances are shown using AppID_jobID) |
| spark_job_num_active_tasks | Active Tasks | | Number of active tasks in the application (different instances are shown using AppID_jobID) |
| spark_job_num_skipped_tasks | Skipped Tasks | | Number of skipped tasks in the application (different instances are shown using AppID_jobID) |
| spark_job_num_failed_tasks | Failed Tasks | | Number of failed tasks in the application (different instances are shown using AppID_jobID) |
| spark_job_num_completed_tasks | Completed Tasks | | Number of completed tasks in the application (different instances are shown using AppID_jobID) |
| spark_job_num_active_stages | Active Stages | | Number of active stages in the application (different instances are shown using AppID_jobID) |
| spark_job_num_completed_stages | Completed Stages | | Number of completed stages in the application (different instances are shown using AppID_jobID) |
| spark_job_num_skipped_stages | Skipped Stages | | Number of skipped stages in the application (different instances are shown using AppID_jobID) |
| spark_job_num_failed_stages | Failed Stages | | Number of failed stages in the application (different instances are shown using AppID_jobID) |
| spark_stage_count | Stage Count | | Number of stages (different instances are shown using AppID_stageID) |
| spark_stage_num_active_tasks | Stage Num Active Tasks | | Number of active tasks in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_num_complete_tasks | Stage Num Complete Tasks | | Number of complete tasks in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_num_failed_tasks | Stage Num Failed Tasks | | Number of failed tasks in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_executor_run_time | Stage Executor Run Time | | Time spent by the executor in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_input_bytes | Stage Input Bytes | bytes | Input bytes in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_input_records | Stage Input Records | | Input records in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_output_bytes | Stage Output Bytes | bytes | Output bytes in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_output_records | Stage Output Records | | Output records in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_shuffle_read_bytes | Stage Shuffle Read Bytes | bytes | Number of bytes read during a shuffle in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_shuffle_read_records | Stage Shuffle Read Records | | Number of records read during a shuffle in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_shuffle_write_bytes | Stage Shuffle Write Bytes | bytes | Number of shuffled bytes in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_shuffle_write_records | Stage Shuffle Write Records | | Number of shuffled records in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_memory_bytes_spilled | Stage Memory Bytes Spilled | bytes | Number of bytes spilled to disk in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_disk_bytes_spilled | Stage Disk Bytes Spilled | bytes | Maximum size on disk of the spilled bytes in the application's stages (different instances are shown using AppID_stageID) |
| spark_driver_rdd_blocks | Driver Rdd Blocks | | Number of RDD blocks in the driver |
| spark_driver_memory_used | Driver Memory Used | | Amount of memory used in the driver |
| spark_driver_disk_used | Driver Disk Used | | Amount of disk used in the driver |
| spark_driver_active_tasks | Driver Active Tasks | | Number of active tasks in the driver |
| spark_driver_failed_tasks | Driver Failed Tasks | | Number of failed tasks in the driver |
| spark_driver_completed_tasks | Driver Completed Tasks | | Number of completed tasks in the driver |
| spark_driver_total_tasks | Driver Total Tasks | | Number of total tasks in the driver |
| spark_driver_total_duration | Driver Total Duration | | Time spent in the driver |
| spark_driver_total_input_bytes | Driver Total Input Bytes | bytes | Number of input bytes in the driver |
| spark_driver_total_shuffle_read | Driver Total Shuffle Read | | Number of bytes read during a shuffle in the driver |
| spark_driver_total_shuffle_write | Driver Total Shuffle Write | | Number of shuffled bytes in the driver |
| spark_driver_max_memory | Driver Max Memory | | Maximum memory used in the driver |
| spark_executor_count | Executor Count | | Number of executors |
| spark_executor_rdd_blocks | Executor Rdd Blocks | | Number of persisted RDD blocks in the application's executors |
| spark_executor_memory_used | Executor Memory Used | | Amount of memory used for cached RDDs in the application's executors |
| spark_executor_max_memory | Executor Max Memory | | Maximum memory across all executors working for a particular application |
| spark_executor_disk_used | Executor Disk Used | | Amount of disk space used by persisted RDDs in the application's executors |
| spark_executor_active_tasks | Executor Active Tasks | | Number of active tasks in the application's executors |
| spark_executor_failed_tasks | Executor Failed Tasks | | Number of failed tasks in the application's executors |
| spark_executor_completed_tasks | Executor Completed Tasks | | Number of completed tasks in the application's executors |
| spark_executor_total_tasks | Executor Total Tasks | | Total number of tasks in the application's executors |
| spark_executor_total_duration | Executor Total Duration | | Time spent by the application's executors executing tasks |
| spark_executor_total_input_bytes | Executor Total Input Bytes | bytes | Total number of input bytes in the application's executors |
| spark_executor_total_shuffle_read | Executor Total Shuffle Read | | Total number of bytes read during a shuffle in the application's executors |
| spark_executor_total_shuffle_write | Executor Total Shuffle Write | | Total number of shuffled bytes in the application's executors |
| spark_rdd_count | Rdd Count | | Number of RDDs |
| spark_rdd_num_partitions | Rdd Num Partitions | | Number of persisted RDD partitions in the application |
| spark_rdd_num_cached_partitions | Rdd Num Cached Partitions | | Number of in-memory cached RDD partitions in the application |
| spark_rdd_memory_used | Rdd Memory Used | | Amount of memory used in the application's persisted RDDs |
| spark_rdd_disk_used | Rdd Disk Used | | Amount of disk space used by persisted RDDs in the application |
| spark_streaming_statistics_avg_input_rate | Streaming Avg Input Rate | | Average streaming input data rate |
| spark_streaming_statistics_avg_processing_time | Streaming Avg Processing Time | | Average processing time of the application's streaming batches |
| spark_streaming_statistics_avg_scheduling_delay | Streaming Avg Scheduling Delay | | Average scheduling delay of the application's streaming batches |
| spark_streaming_statistics_avg_total_delay | Streaming Avg Total Delay | | Average total delay of the application's streaming batches |
| spark_streaming_statistics_batch_duration | Streaming Batch Duration | | Application's streaming batch duration |
| spark_streaming_statistics_num_active_batches | Streaming Num Active Batches | | Number of active streaming batches |
| spark_streaming_statistics_num_active_receivers | Streaming Num Active Receivers | | Number of active streaming receivers |
| spark_streaming_statistics_num_inactive_receivers | Streaming Num Inactive Receivers | | Number of inactive streaming receivers |
| spark_streaming_statistics_num_processed_records | Streaming Num Processed Records | | Number of processed streaming records |
| spark_streaming_statistics_num_received_records | Streaming Num Received Records | | Number of received streaming records |
| spark_streaming_statistics_num_receivers | Streaming Num Receivers | | Number of receivers in the streaming application |
| spark_streaming_statistics_num_retained_completed_batches | Streaming Num Retained Completed Batches | | Number of retained completed streaming batches in the application |
| spark_streaming_statistics_num_total_completed_batches | Streaming Num Total Completed Batches | | Total number of completed streaming batches in the application |
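
To make the AppID_jobID instance naming used by the job metrics listed above more concrete, the sketch below groups the per-job counters returned by the job-metrics endpoint under one instance key per job. The JSON field names (jobId, numTasks, numActiveTasks, and so on) are assumed from Spark's REST API job listing. The function name, the output layout, and the field-to-metric mapping are hypothetical and may differ from what the agent does.

```python
# Assumed mapping from the metric names in this document to the fields of
# one entry in the /api/v1/applications/<app-id>/jobs response.
JOB_FIELDS = {
    "spark_job_num_tasks": "numTasks",
    "spark_job_num_active_tasks": "numActiveTasks",
    "spark_job_num_skipped_tasks": "numSkippedTasks",
    "spark_job_num_failed_tasks": "numFailedTasks",
    "spark_job_num_completed_tasks": "numCompletedTasks",
    "spark_job_num_active_stages": "numActiveStages",
    "spark_job_num_completed_stages": "numCompletedStages",
    "spark_job_num_skipped_stages": "numSkippedStages",
    "spark_job_num_failed_stages": "numFailedStages",
}


def job_metrics(app_id, jobs):
    """Return the job count and per-job counters keyed by '<app_id>_<jobId>'."""
    instances = {}
    for job in jobs:
        instance = f"{app_id}_{job['jobId']}"
        # Missing counters default to 0 so every instance has the same keys.
        instances[instance] = {
            metric: job.get(field, 0) for metric, field in JOB_FIELDS.items()
        }
    return {"spark_job_count": len(jobs), "instances": instances}


if __name__ == "__main__":
    # Hand-written sample data standing in for a real endpoint response.
    sample_jobs = [
        {"jobId": 0, "numTasks": 4, "numCompletedTasks": 4, "numCompletedStages": 1},
        {"jobId": 1, "numTasks": 8, "numActiveTasks": 3, "numActiveStages": 1},
    ]
    result = job_metrics("app-1", sample_jobs)
    print("spark_job_count =", result["spark_job_count"])
    for instance, values in result["instances"].items():
        print(instance, values["spark_job_num_tasks"], values["spark_job_num_active_tasks"])
```

The stage metrics follow the same pattern with AppID_stageID as the instance key.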