Using WFM Prometheus metrics for monitoring & troubleshooting
To support additional resiliency and observability capabilities for cloud-based (and other) environments and deployments, the backend components of the Genesys Workforce Management solution now support Prometheus-based metrics, which are available via HTTP endpoints on the Engage on-premises platform.
Use the following URL to access WFM Prometheus-based metrics:
http://<server-host>:<port>/metrics
Where:
<server-host> - Host on which the WFM backend component (WFM Server, Builder, Data Aggregator, or Daemon) is running
<port> - Port on which the WFM backend component (WFM Server, Builder, Data Aggregator, or Daemon) accepts client requests. This <port> can be either the default server listening port or a dedicated management port, which must be enabled with the option:
management-port = <port>
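The URL pattern above can be exercised with a small script. The following is a minimal sketch, not part of WFM: the helper names, the host name wfm-host-1, the port 7000, and the sample payload are all hypothetical, and the parser handles only the common `name{labels} value` lines of the Prometheus text format.

```python
import re
import urllib.request

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def metrics_url(host, port):
    """Build the metrics URL for one WFM backend component."""
    return f"http://{host}:{port}/metrics"


def fetch_metrics(host, port, timeout=5.0):
    """Download the raw metrics text (requires a reachable component)."""
    with urllib.request.urlopen(metrics_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical payload for illustration only; real label values will differ.
SAMPLE = '''\
# TYPE wfm_system_uptime_seconds gauge
wfm_system_uptime_seconds{component="server",host="wfm-host-1"} 3600
wfm_session_count{component="server",host="wfm-host-1",scope="agent"} 12
'''

if __name__ == "__main__":
    print(metrics_url("wfm-host-1", 7000))
    for sample in parse_samples(SAMPLE):
        print(sample)
```

Against a live component, `parse_samples(fetch_metrics(host, port))` would return the same kind of tuples for whatever metrics that component exposes.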
For more information, see the following URLs:
Prometheus data model: https://prometheus.io/docs/concepts/data_model/
Prometheus supported metric types: https://prometheus.io/docs/concepts/metric_types/
Grafana dashboards: https://prometheus.io/docs/visualization/grafana/ (WFM backend components now support a wide range of metrics, which are defined later in this chapter. These metrics can be queried and used to build Grafana-like dashboards for solution monitoring.)
The following tables describe all supported and available metrics, which you can use to build dashboards, reports, and alerts, and to monitor solution health.
System
Name | Type | Description | Labels
wfm_system_start_time_seconds | Gauge | Start time as epoch time, in seconds | [app_name, component, host, version]
wfm_system_uptime_seconds | Gauge | System uptime, in seconds | [component, host]
wfm_system_leader | Gauge | Leader indicator 0/1 | [component, host]
wfm_system_cpu_count | Gauge | System CPU count | [component, host]
wfm_system_process_private_bytes | Gauge | Process private bytes | [component, host]
wfm_system_process_virtual_bytes | Gauge | Process virtual bytes | [component, host]
wfm_system_process_cpu_time_ratio | Gauge | Process CPU time % | [component, host]
wfm_system_total_cpu_time_ratio | Gauge | Total system CPU time % | [component, host]
wfm_system_total_committed_bytes | Gauge | Total system committed bytes | [component, host]
wfm_system_total_commit_limit | Gauge | Total system memory limit, in bytes | [component, host]
wfm_system_total_physical_memory_bytes | Gauge | Total system physical memory, in bytes | [component, host]
wfm_system_total_virtual_memory_bytes | Gauge | Total system virtual memory, in bytes | [component, host]
wfm_system_available_physical_memory_bytes | Gauge | Available physical memory, in bytes | [component, host]
wfm_system_physical_memory_load_ratio | Gauge | Physical memory load % | [component, host]
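As a rough illustration of how the System gauges can be combined, the sketch below derives a used-memory fraction from wfm_system_total_physical_memory_bytes and wfm_system_available_physical_memory_bytes, and formats a wfm_system_uptime_seconds value for display. The function names are hypothetical and not part of WFM.

```python
def physical_memory_used_ratio(total_bytes, available_bytes):
    """Fraction of physical memory in use, or None if the total is unknown."""
    if total_bytes <= 0:
        return None
    return (total_bytes - available_bytes) / total_bytes


def format_uptime(uptime_seconds):
    """Render an uptime given in seconds as 'Dd Hh Mm Ss'."""
    seconds = int(uptime_seconds)
    days, seconds = divmod(seconds, 86400)
    hours, seconds = divmod(seconds, 3600)
    minutes, seconds = divmod(seconds, 60)
    return f"{days}d {hours}h {minutes}m {seconds}s"
```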
Session
Name | Type | Description | Labels
wfm_session_count | Gauge | Current session count labeled by the session scope, which can be ‘agent’, ‘user’, ‘user agent’ or ‘system’ | [component, host, scope]
Socket Connections
Name | Type | Description | Labels
wfm_connection_total | Counter | Total connections | [component, host]
wfm_connection_refused_total | Counter | Refused connections | [component, host]
wfm_connection_open | Gauge | Open connections | [component, host]
wfm_connection_idle | Gauge | Idle connections | [component, host]
wfm_connection_queued | Gauge | Queued connections | [component, host, direction]
wfm_connection_threads | Gauge | Connection thread count | [component, host, direction]
wfm_connection_threads_limit | Gauge | Connection thread count limit | [component, host, direction]
HTTP
Name | Type | Description | Labels
wfm_http_request_total | Counter | Total requests | [component, host]
wfm_http_request_failed_total | Counter | Total failed requests | [component, host]
wfm_http_request_duration_seconds | Histogram | Successful requests duration, in seconds | [component, host]
wfm_http_request_failed_duration_seconds | Histogram | Failed requests duration, in seconds | [component, host]
wfm_http_request_latency_seconds | Summary | Successful requests latency over the rolling time window, in seconds | [component, host]
wfm_http_request_failed_latency_seconds | Summary | Failed requests latency over the rolling time window, in seconds | [component, host]
wfm_http_request_failed_ratio | Summary | Failed requests ratio over the rolling time window | [component, host]
wfm_http_request_rps | Summary | Requests per second (RPS) over the rolling time window | [component, host]
wfm_http_request_active | Gauge | Active requests | [component, host, operation, uri]
wfm_http_request_read_time_seconds | Histogram | Request read time, in seconds | [component, host, operation, uri]
wfm_http_request_read_bytes | Counter | Request read bytes | [component, host, operation, uri]
wfm_http_request_write_time_seconds | Histogram | Request write time, in seconds | [component, host, operation, uri]
wfm_http_request_write_bytes | Counter | Request written bytes | [component, host, operation, uri]
wfm_http_response_total | Counter | Total responses | [component, host, code, operation, error, uri]
wfm_http_response_time_seconds | Histogram | Response time, in seconds | [component, host, code, operation, error, uri]
wfm_http_response_latency_seconds | Summary | Successful response latency over the rolling time window, in seconds | [component, host, code, operation, error, uri]
wfm_http_response_failed_latency_seconds | Summary | Failed response latency over the rolling time window, in seconds | [component, host, code, operation, error, uri]
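Histogram metrics such as wfm_http_request_duration_seconds are normally exposed in Prometheus as cumulative `_bucket` series with an `le` (upper bound) label. The sketch below estimates a quantile from such buckets by linear interpolation inside the matching bucket, which is the same general idea PromQL's histogram_quantile() uses. It is an illustration only: the function name is hypothetical, and the bucket boundaries and counts in the example are invented because the source does not list the boundaries WFM uses.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate quantile q (0..1) from cumulative (upper_bound, count) buckets.

    The list is expected to include the +Inf bucket, whose count is the total.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # lowest bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile lies beyond the last finite bound; report that bound.
                return prev_bound
            if count == prev_count:
                return bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Invented cumulative counts for le="0.1", "0.5", "1.0", "+Inf".
    example = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
    print(estimate_quantile(0.95, example))
```

The estimate is only as precise as the bucket layout; a quantile that falls in a wide bucket can be off by a large part of that bucket's width.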
Task
Name | Type | Description | Labels
wfm_task_total | Counter | Total tasks | [component, host, task]
wfm_task_refused_total | Counter | Total refused tasks | [component, host, task]
wfm_task_cancelled_total | Counter | Total cancelled tasks | [component, host, task]
wfm_task_failed_total | Counter | Total failed tasks | [component, host, task]
wfm_task_active | Gauge | Active tasks | [component, host, task]
wfm_task_active_max | Gauge | Maximum active tasks over the rolling time window | [component, host, task]
wfm_task_active_limit | Gauge | Active tasks limit | [component, host, task]
wfm_task_queued | Gauge | Queued tasks | [component, host, task]
wfm_task_queued_max | Gauge | Maximum queued tasks over the rolling time window | [component, host, task]
wfm_task_queued_limit | Gauge | Queued tasks limit | [component, host, task]
wfm_task_queued_time_seconds | Histogram | Task time in the queue, in seconds | [component, host, task]
wfm_task_handle_time_seconds | Histogram | Task handle time, in seconds | [component, host, task]
wfm_task_duration_seconds | Histogram | Task duration, in seconds | [component, host, task]
wfm_task_latency_seconds | Summary | Task latency over the rolling time window, in seconds | [component, host, task]
wfm_task_all_threads | Gauge | Task thread pool size | [component, host]
wfm_task_all_active | Gauge | Active tasks | [component, host]
wfm_task_all_active_max | Gauge | Maximum number of active tasks since last restart | [component, host]
wfm_task_all_active_limit | Gauge | Active task limit | [component, host]
wfm_task_all_queued | Gauge | Queued tasks | [component, host]
wfm_task_all_queued_max | Gauge | Maximum number of queued tasks since last restart | [component, host]
wfm_task_all_queued_limit | Gauge | Queued task limit | [component, host]
wfm_task_all_throttled | Gauge | Throttled tasks | [component, host]
wfm_task_all_throttled_max | Gauge | Maximum number of throttled tasks since last restart | [component, host]
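One possible use of the paired gauges above (for example, wfm_task_all_active with wfm_task_all_active_limit, and wfm_task_all_queued with wfm_task_all_queued_limit) is a simple saturation check. The sketch below is hypothetical: the function names and the 0.8 warning threshold are illustrative choices, and it assumes that a limit of zero or less means no usable ratio, which the source does not specify.

```python
def utilization(current, limit):
    """Ratio of a gauge to its limit, or None when the limit is not positive."""
    if limit <= 0:
        return None
    return current / limit


def task_pressure(active, active_limit, queued, queued_limit, warn_at=0.8):
    """Return the names of the task dimensions at or above the warning threshold."""
    ratios = {
        "active": utilization(active, active_limit),
        "queued": utilization(queued, queued_limit),
    }
    return sorted(
        name for name, ratio in ratios.items()
        if ratio is not None and ratio >= warn_at
    )
```

For example, `task_pressure(8, 10, 1, 100)` flags only the active dimension, because 8 of 10 active slots are in use while the queue is nearly empty.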
Database
Name | Type | Description | Labels
wfm_db_connection_total | Counter | Total database connections | [component, host]
wfm_db_connection_failed_total | Counter | Total failed database connections | [component, host]
wfm_db_connections | Gauge | Current database connections | [component, host]
wfm_db_connection_time_seconds | Histogram | Time to establish database connection, in seconds | [component, host]
wfm_db_command_total | Counter | Total number of database commands executed | [component, host, task]
wfm_db_command_failed_total | Counter | Total number of failed database commands | [component, host, task]
wfm_db_command_duration_seconds | Histogram | Database command duration, in seconds | [component, host, task]
wfm_db_fetch_total | Counter | Total number of database fetches | [component, host, task]
wfm_db_fetch_duration_seconds | Histogram | Database fetch duration, in seconds | [component, host, task]
wfm_db_deadlock_total | Counter | Total number of database deadlocks detected | [component, host, task]
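Counters such as wfm_db_command_total and wfm_db_command_failed_total only increase until the component restarts, so they are usually compared across two scrapes rather than read as absolute values. The following sketch shows one way to do that. The function names are hypothetical, and it assumes that wfm_db_command_total includes failed commands, which the source does not state.

```python
def counter_increase(previous, current):
    """Increase of a counter between two scrapes.

    A current value below the previous one is treated as a restart,
    in which case the counter is assumed to have started again from zero.
    """
    if current < previous:
        return current
    return current - previous


def failure_ratio(prev_total, curr_total, prev_failed, curr_failed):
    """Share of failed operations between two scrapes, or None if nothing ran."""
    total = counter_increase(prev_total, curr_total)
    failed = counter_increase(prev_failed, curr_failed)
    if total == 0:
        return None
    return failed / total
```

The same pattern applies to any total/failed counter pair in these tables, such as wfm_http_request_total and wfm_http_request_failed_total.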
Cache
Name | Type | Description | Labels
wfm_cache_size_bytes | Gauge | Cache size, in bytes, labeled by cache type | [component, host, cache]
wfm_cache_hit_count | Counter | Cache hit count, labeled by cache type | [component, host, cache]
wfm_cache_miss_count | Counter | Cache miss count, labeled by cache type | [component, host, cache]
wfm_cache_hit_ratio | Summary | Cache hit ratio over the rolling time window | [component, host, cache]
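While wfm_cache_hit_ratio reports the ratio over a rolling window, a ratio over the whole lifetime of the counters can be derived from wfm_cache_hit_count and wfm_cache_miss_count. A minimal sketch, with a hypothetical function name:

```python
def cache_hit_ratio(hits, misses):
    """Hits divided by all lookups, or None when there have been no lookups."""
    lookups = hits + misses
    if lookups == 0:
        return None
    return hits / lookups
```

Because both inputs are counters, this value covers everything since the counters last started from zero, so it reacts more slowly to recent changes than the rolling-window summary.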
Memory Allocations
Name | Type | Description | Labels
wfm_alloc_objects | Gauge | Allocated object count, labeled by object type | [component, host, object]
wfm_alloc_object_size_bytes | Gauge | Object allocation size, in bytes, labeled by object type | [component, host, object]
ETL
Name | Type | Description | Labels
wfm_etl_run_total | Counter | Total ETL runs | [component, host]
wfm_etl_run_failed_total | Counter | Total failed ETL runs | [component, host]
wfm_etl_run_cancelled_total | Counter | Total cancelled ETL runs | [component, host]
wfm_etl_run_progress_perc | Gauge | Last ETL run progress % | [component, host]
wfm_etl_run_start_time_seconds | Gauge | Last ETL run start time as epoch time, in seconds | [component, host]
wfm_etl_run_end_time_seconds | Gauge | Last ETL run end time as epoch time, in seconds | [component, host]
wfm_etl_run_outcome | Gauge | Last ETL run outcome: 0 - complete, 1 - cancelled, 2 - failed | [component, host]
wfm_etl_record_total | Counter | Total ETL records transferred by subsystem: ‘configuration’, ‘adherence’, ‘schedule’, ‘performance’ | [component, host, subsystem]
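The numeric wfm_etl_run_outcome codes and the start and end timestamps can be turned into a readable summary of the last ETL run. The sketch below is illustrative only: the function name is hypothetical, and it assumes that an end time earlier than the start time means the run has not finished, which the source does not confirm.

```python
# Outcome codes as listed in the ETL table above.
ETL_OUTCOMES = {0: "complete", 1: "cancelled", 2: "failed"}


def describe_last_etl_run(outcome, start_time, end_time):
    """Summarize the last ETL run from its outcome code and epoch timestamps."""
    name = ETL_OUTCOMES.get(int(outcome), "unknown")
    # Assumption: an end time before the start time means no end was recorded yet.
    duration = end_time - start_time if end_time >= start_time else None
    return {"outcome": name, "duration_seconds": duration}
```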
Data Aggregator (DA)
Name | Type | Description | Labels
wfm_da_writes_db_total | Counter | Total number of DA database record writes | [component, host, record_type]
wfm_da_writes_db_failed_total | Counter | Total number of failed DA database record writes | [component, host, record_type]
wfm_da_writes_db_retried_total | Counter | Total number of retried DA database record writes | [component, host, record_type]
wfm_da_writes_db_queued_time_seconds | Histogram | DA database record time in queue, in seconds | [component, host, record_type]
wfm_da_writes_db_write_time_seconds | Histogram | DA database record write time, in seconds | [component, host, record_type]
wfm_da_writes_db_duration_seconds | Histogram | DA database record write duration, in seconds | [component, host, record_type]
wfm_da_writes_file_total | Counter | Total number of DA dump file data writes | [component, host]
wfm_da_writes_file_failed_total | Counter | Total number of failed DA dump file data writes | [component, host]
wfm_da_writes_queue_size | Gauge | DA database writer queue size | [component, host]
wfm_da_statserver_event_total | Counter | Total number of events received from StatServer, labeled by event type | [component, host, event]
wfm_da_statserver_error_total | Counter | Total number of errors received from StatServer, labeled by event type | [component, host, event]
Builder
Name | Type | Description | Labels
wfm_builder_job_total | Counter | Total schedule build jobs | [component, host]
wfm_builder_job_failed_total | Counter | Total failed schedule build jobs labeled by error type. Possible ‘error’ label values: ‘internal’, ‘data’, ‘network’, ‘wfmserver’, ‘cfgserver’, ‘system’. | [component, host, error]
wfm_builder_job_cancelled_total | Counter | Total cancelled schedule build jobs | [component, host]
wfm_builder_job_active | Gauge | Active schedule build jobs | [component, host]
wfm_builder_job_active_limit | Gauge | Maximum allowed number of active concurrent schedule build jobs | [component, host]
wfm_builder_job_queued | Gauge | Queued schedule build jobs | [component, host]
wfm_builder_job_reading | Gauge | Schedule build jobs reading input data | [component, host]
wfm_builder_job_writing | Gauge | Schedule build jobs saving the results | [component, host]
wfm_builder_job_queue_time_seconds | Histogram | Schedule build jobs time in queue, in seconds | [component, host]
wfm_builder_job_queued_latency | Summary | Job time in queue over the rolling time window, in seconds | [component, host]
wfm_builder_job_read_time_seconds | Histogram | Schedule build jobs reading input data time, in seconds | [component, host]
wfm_builder_job_build_time_seconds | Histogram | Schedule build jobs scheduling time, in seconds | [component, host]
wfm_builder_job_write_time_seconds | Histogram | Schedule build results saving time, in seconds | [component, host]
wfm_builder_job_duration_seconds | Histogram | Schedule build jobs duration, in seconds | [component, host]
wfm_builder_job_sites | Histogram | Schedule build site count | [component, host]
wfm_builder_job_agents | Histogram | Schedule build agent count | [component, host]
wfm_builder_job_days | Histogram | Schedule build day count | [component, host]
wfm_builder_task_active | Gauge | Active scheduling tasks | [component, host]
wfm_builder_task_active_limit | Gauge | Maximum allowed number of active concurrent scheduling tasks | [component, host]
wfm_builder_task_active_ratio | Summary | Active task ratio (task_active / task_active_limit) over the rolling time window | [component, host]
wfm_builder_task_queued | Gauge | Queued scheduling tasks | [component, host]
Golden Metrics
Name | Type | Description | Labels
golden_signals:traffic | Gauge | Traffic normalized in the range from 0 to 1 | [component, host]
golden_signals:latency | Gauge | Latency normalized in the range from 0 to 1 | [component, host]
golden_signals:errors | Gauge | Errors ratio | [component, host]
golden_signals:saturation | Gauge | Saturation normalized in the range from 0 to 1 | [component, host]
Health
Name | Type | Description | Labels
wfm_health_status | Gauge | Component health status: 0 - green, 1 - yellow, 2 - red. Includes the component’s dependencies and their health statuses | [component, host, dependency]
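Because wfm_health_status is reported per dependency, a dashboard or alert typically needs a single overall value. One simple approach, sketched below, takes the worst (highest) code among the reported series. The function name and the dependency names in the example are hypothetical; the code-to-color mapping comes from the table above.

```python
# Status codes as listed in the Health table above.
HEALTH_NAMES = {0: "green", 1: "yellow", 2: "red"}


def overall_health(status_by_dependency):
    """Return (code, name) for the worst status, or None if nothing is reported."""
    if not status_by_dependency:
        return None
    worst = max(int(value) for value in status_by_dependency.values())
    return worst, HEALTH_NAMES.get(worst, "unknown")


if __name__ == "__main__":
    # Hypothetical dependency names for illustration.
    print(overall_health({"database": 0, "cfgserver": 1}))
```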