Monitoring and Alerting

Internal Monitoring

System monitoring backend

The platform uses the Prometheus monitoring backend on new installations and on upgraded systems that have been migrated. On older systems the monitoring backend was InfluxDB, which is now deprecated.

The platform uses various monitoring backend services to monitor many aspects of the system, including CPU, memory, swap, disk, filesystem, network, processes, NTP, Nginx, Redis and MySQL.

The gathered information is stored in VictoriaMetrics, a long-term storage backend for Prometheus. NOTE: Either VictoriaMetrics or Prometheus can act as the Prometheus server implementation, but the two are mutually exclusive: only one of them runs at a time. On systems still using InfluxDB, the information is stored in the telegraf database.

Sipwise C5 specific monitoring via ngcp-witnessd

The platform uses the internal ngcp-witnessd service to monitor Sipwise C5 specific metrics or system metrics currently not tracked by the monitoring backend (either Prometheus exporters or the telegraf service when using the deprecated InfluxDB), including HA status, MTA, Kamailio, SIP and MySQL.

The gathered information is stored in VictoriaMetrics in the ngcp namespace, or in InfluxDB in the ngcp database.

Most data gathering is enabled by default, but parts of it can be disabled through the config.yml file. The affected data points will then either be missing from the database or be initialized with a stub value. This cascades into other subsystems that use the monitoring information, such as the Grafana dashboards. The enable/disable flags can be found in the witnessd.gather section.

Monitoring data in the monitoring backend

The platform uses VictoriaMetrics as a long-term Prometheus time series database to store most of the metrics collected in the system. On systems still using InfluxDB, the time series database role is filled by InfluxDB itself.

The monitoring data is used by the statistics dashboard powered by Grafana.

The monitoring data can also be accessed directly by various means. On new installations, use the promtool command-line tool, or the HTTP API with curl (or another HTTP client) or with the NGCP::Prometheus::HTTP Perl module. On old installations, use the influx command-line tool in CLI or TUI mode; the ngcp-influxdb-extract wrapper, which provides two convenience commands to run arbitrary queries or to fetch the last value of a measurement’s field; or the HTTP API with curl (or another HTTP client) or with the NGCP::InfluxDB::HTTP Perl module.

Monitoring metrics

See appendices-main:appendices-main.adoc#prometheus-monitoring-metrics for detailed information about the list of ngcp namespaced metrics stored in the Prometheus monitoring database.

See appendices-main:appendices-main.adoc#influxdb-monitoring-keys for detailed information about the list of data stored in the InfluxDB ngcp monitoring database.

PromQL

See https://prometheus.io/docs/prometheus/latest/querying/basics/ for information about PromQL, the query language used by Prometheus.

To get the list of all metrics in a specific namespace, the following query can be used, with namespace replaced by the namespace name (for example ngcp): {__name__=~"^namespace_.+"}
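The namespace query above can also be sent through the HTTP API. The following Python sketch is only an illustration under stated assumptions: the server address in BASE_URL is a guess that must be adapted to the local setup, the helper names are hypothetical, and no authentication is assumed. It relies on the standard Prometheus instant-query endpoint /api/v1/query, which VictoriaMetrics also serves.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: address of the Prometheus-compatible server; adapt to the local setup.
BASE_URL = "http://localhost:9090"


def namespace_selector(namespace):
    """Build the PromQL selector matching every metric named <namespace>_*."""
    return '{__name__=~"^%s_.+"}' % namespace


def query_url(base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


def metric_names(response):
    """Extract the sorted, de-duplicated metric names from a decoded API response."""
    if response.get("status") != "success":
        raise RuntimeError("query failed: %s" % response.get("error"))
    return sorted({sample["metric"]["__name__"] for sample in response["data"]["result"]})


def list_namespace_metrics(namespace, base_url=BASE_URL):
    """Query the server and return the metric names found in the namespace."""
    with urlopen(query_url(base_url, namespace_selector(namespace)), timeout=10) as resp:
        return metric_names(json.load(resp))
```

Calling list_namespace_metrics("ngcp") on a host that can reach the server would return the names of the ngcp namespaced metrics that currently have samples.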

InfluxQL

See https://docs.influxdata.com/influxdb/v1.1/query_language/spec/ for information about InfluxQL, the query language used by InfluxDB.

To get the list of all measurements in a specific database, the following query can be used: SHOW MEASUREMENTS
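On systems still using InfluxDB, the same statement can be sent to the InfluxDB 1.x HTTP API through its /query endpoint. The Python sketch below is illustrative only: the address in BASE_URL is an assumption, the helper names are hypothetical, and it assumes the endpoint is reachable without authentication.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: address of the InfluxDB 1.x server; adapt to the local setup.
BASE_URL = "http://localhost:8086"


def show_measurements_url(database, base_url=BASE_URL):
    """Build the /query URL that runs SHOW MEASUREMENTS against a database."""
    params = urlencode({"db": database, "q": "SHOW MEASUREMENTS"})
    return base_url.rstrip("/") + "/query?" + params


def measurement_names(response):
    """Extract the measurement names from a decoded /query response."""
    names = []
    for result in response.get("results", []):
        if "error" in result:
            raise RuntimeError(result["error"])
        for series in result.get("series", []):
            # Each row of a SHOW MEASUREMENTS result holds a single column: the name.
            names.extend(row[0] for row in series.get("values", []))
    return names


def list_measurements(database, base_url=BASE_URL):
    """Query the server and return the measurement names in the database."""
    with urlopen(show_measurements_url(database, base_url), timeout=10) as resp:
        return measurement_names(json.load(resp))
```

For example, list_measurements("ngcp") would list the measurements in the ngcp database, and list_measurements("telegraf") those in the telegraf database.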

Statistics Dashboard

The platform’s administration interface (described in basicconfiguration:basicconfiguration.adoc#administrative-configuration) provides a Grafana-based graphical overview of the most important system health indicators, such as memory usage, load averages and disk usage. VoIP statistics, such as the number of concurrent active calls and the number of provisioned and registered subscribers, are also present.