Monitoring and Alerting

Internal Monitoring

System monitoring backend

The platform uses the Prometheus monitoring backend on new installations and on upgraded systems that have been migrated.

The platform uses various monitoring backend services to monitor many aspects of the system, including CPU, memory, swap, disk, filesystem, network, processes, NTP, Nginx, Redis and MySQL.

The gathered information is stored in VictoriaMetrics, a long-term storage backend for Prometheus.

NOTE: Either VictoriaMetrics or Prometheus can act as the Prometheus server implementation, but the two are mutually exclusive: only one of them runs at a time.

Sipwise C5 specific monitoring via ngcp-witnessd

The platform uses the internal ngcp-witnessd service to monitor Sipwise C5 specific metrics or system metrics currently not tracked by the monitoring backend (via Prometheus exporters), including HA status, MTA, Kamailio, SIP and MySQL.

The gathered information is stored in the VictoriaMetrics time-series database under the ngcp namespace.

Some of the data gathering can be disabled through the config.yml file (most of it is enabled by default). The affected data points will then either be missing from the database or be initialized with a stub value. This in turn affects other subsystems that use this monitoring information, such as the Grafana dashboards. The enable/disable flags can be found in the witnessd.gather section.

Monitoring data in the monitoring backend

The platform uses VictoriaMetrics as a long-term Prometheus time series database to store most of the metrics collected in the system.

The monitoring data is used by the statistics dashboard powered by Grafana.

The monitoring data can also be accessed directly by various means. On new installations, it can be queried with the promtool command-line tool, through the HTTP API with curl (or another HTTP client), or with the NGCP::Prometheus::HTTP Perl module.
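As an illustration of the HTTP API route, the following is a minimal Python sketch of an instant query against the standard Prometheus querying endpoint /api/v1/query, which VictoriaMetrics also implements. The base URL (host and port) is an assumption and must be replaced with the address the monitoring backend listens on in your deployment; the helper names build_query_url, extract_samples and instant_query are hypothetical and not part of the platform.

```python
import json
import urllib.parse
import urllib.request

# Assumption: address of the Prometheus-compatible HTTP API.
# Replace host and port with the values used on your system.
BASE_URL = "http://localhost:9090"


def build_query_url(base_url, promql, time=None):
    """Return the URL of an instant query against /api/v1/query."""
    params = {"query": promql}
    if time is not None:
        # Optional evaluation timestamp (Unix seconds or RFC 3339).
        params["time"] = str(time)
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode(params)


def extract_samples(payload):
    """Convert an instant-vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector, got " + data["resultType"])
    # Each result carries its labels and a [timestamp, "value"] pair;
    # the value is transmitted as a string.
    return [(r["metric"], float(r["value"][1])) for r in data["result"]]


def instant_query(promql, base_url=BASE_URL, timeout=10):
    """Run the query over HTTP and return the decoded samples."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_samples(json.load(resp))
```

Calling instant_query with any valid PromQL expression returns a list of label sets and their current values; the same URL produced by build_query_url can also be passed to curl.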

Monitoring metrics

See appendices-main:appendices-main.adoc#prometheus-monitoring-metrics for detailed information about the list of ngcp namespaced metrics stored in the Prometheus monitoring database.

PromQL

See https://prometheus.io/docs/prometheus/latest/querying/basics/ for information about PromQL, the query language used by Prometheus.

To get the list of all metrics for a specific namespace, the following query can be used: {__name__=~"^namespace_.+"}, where namespace is replaced with the namespace of interest (for example, ngcp).
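The namespace selector above can be generated and checked locally. The following is a small Python sketch; the helper names namespace_selector and matches_namespace are hypothetical, and the metric names used in the usage example are purely illustrative. Prometheus anchors regular-expression matchers at both ends, which re.fullmatch mirrors here.

```python
import re

# Valid characters for a Prometheus metric name prefix.
_NAMESPACE_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def namespace_selector(namespace):
    """Build the PromQL selector matching every metric named <namespace>_*."""
    if not _NAMESPACE_RE.fullmatch(namespace):
        raise ValueError("invalid namespace: %r" % namespace)
    return '{__name__=~"^%s_.+"}' % namespace


def matches_namespace(metric_name, namespace):
    """Check locally whether a metric name would be selected."""
    if not _NAMESPACE_RE.fullmatch(namespace):
        raise ValueError("invalid namespace: %r" % namespace)
    # Prometheus regex matchers are fully anchored, hence fullmatch.
    return re.fullmatch("^%s_.+" % namespace, metric_name) is not None


if __name__ == "__main__":
    print(namespace_selector("ngcp"))
    # Illustrative names only, not taken from the metrics appendix.
    for name in ("ngcp_example_metric", "node_load1"):
        print(name, matches_namespace(name, "ngcp"))
```

The string returned by namespace_selector can be used as the query expression with promtool or the HTTP API.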

Statistics Dashboard

The platform’s administration interface (described in basicconfiguration:basicconfiguration.adoc#administrative-configuration) provides a Grafana-based graphical overview of the most important system health indicators, such as memory usage, load averages and disk usage. VoIP statistics, such as the number of concurrent active calls and the number of provisioned and registered subscribers, are also available.