The platform uses the monit daemon internally to monitor all essential
services. Since the sip:carrier runs in an active/standby mode, not
all services are always running on both nodes; some of them run only
on the active node and are stopped on the standby node. At any time, you
can use the command `monit summary` to get a list of all services and
their current status, or `monit status` for the same list with more detail.
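As a rough illustration of how the `monit summary` output could be consumed by a script, the sketch below parses a captured summary into a name/status mapping. The sample text, the service names in it and the line layout (`Type 'name'  Status`, as printed by older monit releases) are assumptions; newer monit versions print a table instead, so the pattern would need adjusting.

```python
import re

# Hypothetical sample in the older plain-text `monit summary` layout.
# The service names are placeholders, not the platform's actual list.
SAMPLE = """\
The Monit daemon 5.9 uptime: 3d 2h 1m

Process 'kamailio-proxy'            Running
Process 'asterisk'                  Not monitored
System 'sp1'                        Running
"""

# Matches lines such as: Process 'name'   Some status text
LINE_RE = re.compile(r"^(\w+) '([^']+)'\s+(.+?)\s*$")


def parse_summary(text):
    """Return {service name: status} for every service line in the text."""
    services = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            _kind, name, status = match.groups()
            services[name] = status
    return services


def not_running(services):
    """Return the sorted names of services whose status is not 'Running'."""
    return sorted(name for name, status in services.items() if status != "Running")


if __name__ == "__main__":
    print(not_running(parse_summary(SAMPLE)))
```

On the standby node some services are stopped by design, so a non-running service reported by such a script is not necessarily a fault.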
sip:carrier has supported monit service dependencies since mr3.5.1.
Services specified in a depend statement are checked during
stop/start/monitor/unmonitor operations. If a service is stopped or
unmonitored, monit also stops/unmonitors any services that depend on it.
This means that kamailio/sbc/asterisk/prosody/… will be stopped on
The monit daemon takes care of quickly restarting a service should
it ever fail for whatever reason. When that happens, the daemon will
send a notification email to the address specified in the `config.yml`
file under the key `general.adminmail`. It will also send warning emails
to this address under certain abnormal conditions, such as when the system
is low on memory (more than 75% used) or under high-load conditions.
In order for monit to be able to send email to the specified
address, the local MTA (exim4) must be configured correctly. If you
haven’t done so already, run
The platform’s administration interface (described in Section 4, “Administrative Configuration”) provides a simple graphical overview of the most important system health data points, such as memory usage, load averages and disk usage, as well as statistics about the VoIP system itself, such as the number of concurrent active calls, number of provisioned and registered subscribers, etc.
The sip:carrier exports a variety of cluster health data and statistics
over standard SNMP. By default, the SNMP interface can only be accessed
locally. To make it possible to poll the SNMP data from an external
system, the `config.yml` file needs to be edited and the list of
allowed community names and allowed hosts/IP ranges must be populated.
This list can be found under the `checktools.snmpd.communities` key
and consists of one or more `community`/`source` value pairs. The
`community` is the SNMP community string to be allowed, while
`source` is the IP address or IP block to allow this community from.
A `source` of `default` is equivalent to the IP address `127.0.0.1`. Other
legal values are single IP addresses or IP blocks in IP/prefix notation,
for example `192.168.115.0/24`. It is recommended that you leave the
default entry (`public` and `default`) in place for local testing
of SNMP functionality.
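The meaning of the `community`/`source` pairs can be illustrated with a small sketch that decides whether a given client would match any configured pair. This is only a model of the semantics described above, not the way snmpd evaluates its access rules; the `monitoring` community and the in-memory list layout are hypothetical.

```python
import ipaddress

# Hypothetical entries mirroring the community/source pairs described above.
COMMUNITIES = [
    {"community": "public", "source": "default"},
    {"community": "monitoring", "source": "192.168.115.0/24"},
]


def source_network(source):
    """Translate a source value into a network; 'default' means 127.0.0.1."""
    if source == "default":
        source = "127.0.0.1"
    # A single IP address becomes a /32 network.
    return ipaddress.ip_network(source, strict=False)


def is_allowed(community, client_ip, entries=COMMUNITIES):
    """Return True if some pair matches both the community and the client IP."""
    client = ipaddress.ip_address(client_ip)
    return any(
        entry["community"] == community
        and client in source_network(entry["source"])
        for entry in entries
    )


if __name__ == "__main__":
    print(is_allowed("public", "127.0.0.1"))
    print(is_allowed("monitoring", "192.168.115.7"))
```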
To locally check if SNMP is working correctly, execute the command
SNMP version 1 and version 2c are supported.
There are two types of information that can be retrieved from SNMP. The first one is the native NGCP cluster overview from the Sipwise MIBs. The second is the legacy ad-hoc information using the Net-SNMP extension OIDs, and detailed information for the node running the SNMP daemon using standard OIDs.
The entire NGCP cluster can be monitored by using the `SIPWISE-NGCP-MIB`,
`SIPWISE-NGCP-MONITOR-MIB` and `SIPWISE-NGCP-STATS-MIB`. These OIDs are
rooted at the Sipwise NGCP slot `.1.3.6.1.4.1.34274.1.*`.
The MIBs are self-documented and can be found in the ngcp-snmp-mibs package. The NGCP SNMP Agent is provided by the ngcp-snmp-agent package; once installed, it should work out of the box as long as snmpd has been properly configured.
The `SIPWISE-NGCP-MIB` acts as the root MIB and exposes information
about the cluster licensing and layout (which is mostly static data about
each node, such as node name, its IP address, its roles, etc.), information
required to access the OIDs from the other MIBs.
The `SIPWISE-NGCP-MONITOR-MIB` exposes current monitoring information,
global health conditions, number of provisioned and registered subscribers
and devices; and per node information (independently of the number of nodes
or their names) of their filesystem, processes, databases, system load,
memory, heartbeat status, MTA queues, etc.
The `SIPWISE-NGCP-STATS-MIB` exposes statistics on billing, performance,
and message activity over time.
NOTICE: Traps and some of the OIDs are not yet implemented, namely anything under the following trees: ngcpMonitorPeering, ngcpMonitorFraud, ngcpMonitorPerformance.perfCAPSCurTable, and ngcpStats.
The following OIDs have been largely superseded by the Sipwise NGCP OIDs, but are still provided for backwards compatibility.
All basic system health variables (such as memory, disk, swap, CPU usage,
network statistics, process lists, etc.) for the mgmt node can be found
in standard OID slots from standard MIBs. For example, memory statistics
can be found through the `UCD-SNMP-MIB` in OIDs such as `memTotalSwap.0`,
`memAvailSwap.0`, `memTotalReal.0`, `memAvailReal.0`, etc., which
translate to numeric OIDs `.1.3.6.1.4.1.2021.4.*`. In fact,
`UCD-SNMP-MIB` is the most useful MIB for overall system health checks.
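To make the mapping between the symbolic names and the numeric OIDs concrete, the following sketch lists the four memory scalars mentioned above and derives a used-memory percentage from two of them. The helper function is an illustration only; the values are assumed to have been fetched already by whatever SNMP client you use.

```python
# Numeric OIDs of the UCD-SNMP-MIB memory scalars named in the text.
MEMORY_OIDS = {
    "memTotalSwap.0": ".1.3.6.1.4.1.2021.4.3.0",
    "memAvailSwap.0": ".1.3.6.1.4.1.2021.4.4.0",
    "memTotalReal.0": ".1.3.6.1.4.1.2021.4.5.0",
    "memAvailReal.0": ".1.3.6.1.4.1.2021.4.6.0",
}


def used_percent(total_kb, avail_kb):
    """Return the used share in percent from a total/available pair (in kB)."""
    if total_kb <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (total_kb - avail_kb) / total_kb


if __name__ == "__main__":
    # Hypothetical readings of memTotalReal.0 and memAvailReal.0.
    print(used_percent(8000000, 2000000))
```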
Additionally, there’s a list of specially monitored processes, also
found through the `UCD-SNMP-MIB`. `UCD-SNMP-MIB::prNames`
(`.1.3.6.1.4.1.2021.2.1.2`) gives the list of monitored processes,
`prCount` (`.1.3.6.1.4.1.2021.2.1.5`) is how many of each process are
running and `prErrorFlag` (`.1.3.6.1.4.1.2021.2.1.100`) gives a 0/1
error indication (with `prErrMessage` (`.1.3.6.1.4.1.2021.2.1.101`)
providing an explanation of any error).
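The four process columns share the same row index, so a monitoring script can join them per process. A minimal sketch, assuming the table has already been walked into a dictionary of numeric OID to value (the sample rows and process names are hypothetical):

```python
PR_NAMES = ".1.3.6.1.4.1.2021.2.1.2"
PR_COUNT = ".1.3.6.1.4.1.2021.2.1.5"
PR_ERROR_FLAG = ".1.3.6.1.4.1.2021.2.1.100"
PR_ERR_MESSAGE = ".1.3.6.1.4.1.2021.2.1.101"


def failing_processes(walk):
    """Return (name, count, message) for every row whose error flag is set.

    `walk` maps full numeric OIDs to their values.
    """
    failures = []
    prefix = PR_NAMES + "."
    for oid, name in walk.items():
        if not oid.startswith(prefix):
            continue
        index = oid[len(prefix):]  # row index shared by all columns
        flag = int(walk.get(f"{PR_ERROR_FLAG}.{index}", 0))
        if flag != 0:
            count = int(walk.get(f"{PR_COUNT}.{index}", 0))
            message = walk.get(f"{PR_ERR_MESSAGE}.{index}", "")
            failures.append((name, count, message))
    return failures


# Hypothetical walk result with two rows; only the second one is failing.
SAMPLE_WALK = {
    PR_NAMES + ".1": "process-a",
    PR_COUNT + ".1": 1,
    PR_ERROR_FLAG + ".1": 0,
    PR_ERR_MESSAGE + ".1": "",
    PR_NAMES + ".2": "process-b",
    PR_COUNT + ".2": 0,
    PR_ERROR_FLAG + ".2": 1,
    PR_ERR_MESSAGE + ".2": "hypothetical error text",
}

if __name__ == "__main__":
    print(failing_processes(SAMPLE_WALK))
```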
Some of these processes are not supposed to be running on the standby node, so you’ll see the error flag raised there. A possible solution is to run these SNMP checks against the shared service IP of the cluster.
Furthermore, `UCD-SNMP-MIB` provides a list of custom, external checks.
The names of these can be found under the `UCD-SNMP-MIB::extNames`
(`.2`) tree, with `extOutput` (`.101`) providing the output (one
line) from each check and `extResult` (`.100`) the exit code from
each check.
The first of these external checks, `collective_check`, provides
a combined, overall system health status indicator. It gathers
information from both nodes and returns 0 in `extResult.1`
(`.100.1`) if everything is OK and running as it should. If it finds
a problem somewhere, but with the system still operational (e.g. a
service is stopped on the inactive node), `extResult.1` will return
1 and `extOutput.1` will be set to a string that can be used to
diagnose the problem. In case the system is found in a critical and
non-operational state, `extResult.1` will return 2, again with
an error message set. If you want to keep it really simple, you can
just monitor this one OID and raise an alarm whenever it is non-zero.
The 0/1/2 status codes allow for easy integration with Nagios.
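The 0/1/2 values of `collective_check` line up with the Nagios plugin exit codes for OK, WARNING and CRITICAL. The sketch below shows one way a plugin wrapper might turn an already fetched `extResult.1`/`extOutput.1` pair into an exit code and status line; the function name and the fallback to UNKNOWN (3) for unexpected values are assumptions of this example.

```python
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL"}
NAGIOS_UNKNOWN = 3


def nagios_result(ext_result, ext_output):
    """Map an extResult/extOutput pair to (exit code, status line)."""
    if ext_result in NAGIOS_STATES:
        code = ext_result
        label = NAGIOS_STATES[code]
    else:
        # Anything outside 0/1/2 is reported as UNKNOWN.
        code = NAGIOS_UNKNOWN
        label = "UNKNOWN"
    message = ext_output.strip() or "no output"
    return code, f"{label}: {message}"


if __name__ == "__main__":
    # Hypothetical values; a real plugin would read them over SNMP
    # and pass the code to sys.exit().
    code, line = nagios_result(1, "hypothetical diagnostic string\n")
    print(code, line)
```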
The remaining external checks simply return statistics about the system.
They all return a number in `extOutput` and have `extResult` always
set to zero.
The full list of such checks is below. All of these checks exist in three
flavors: the first returns the statistics from `sp1` (the first node in
the sip:carrier pair), the second from `sp2`, and the third from
whichever node is being queried (which is useful when querying the shared
service IP). For example, the local SIP response time from `sp1` is
in `sip_check_sp1`, from `sp2` is in `sip_check_sp2` and from the
host itself in `sip_check_self`.
The base OID of the Result and Output OIDs is always `.1.3.6.1.4.1.2021.8.1`,
so if you read `.100.1`, the full OID is `.1.3.6.1.4.1.2021.8.1.100.1`.
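The base-plus-suffix rule can be written down as a small helper that expands the short forms used in this section into full numeric OIDs. The function names are made up for this example.

```python
EXT_BASE_OID = ".1.3.6.1.4.1.2021.8.1"


def full_oid(suffix):
    """Expand a short suffix such as '.100.1' into the full numeric OID."""
    return EXT_BASE_OID + "." + suffix.strip(".")


def check_oids(index):
    """Return the name, result and output OIDs for the check at `index`."""
    return {
        "name": full_oid(f"2.{index}"),
        "result": full_oid(f"100.{index}"),
        "output": full_oid(f"101.{index}"),
    }


if __name__ == "__main__":
    print(full_oid(".100.1"))
    print(check_oids(32)["output"])
```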
Name in MIB | Result OID | Output OID | Check name | Description |
---|---|---|---|---|
UCD-SNMP-MIB::extNames.1 | .100.1 | .101.1 | collective_check | Summarized platform check |
UCD-SNMP-MIB::extNames.2 | .100.2 | .101.2 | sip_check_sp1 | SIP response time in seconds on sp1 |
UCD-SNMP-MIB::extNames.3 | .100.3 | .101.3 | sip_check_sp2 | SIP response time in seconds on sp2 |
UCD-SNMP-MIB::extNames.4 | .100.4 | .101.4 | mysql_check_sp1 | Average number of MySQL queries per second on sp1 |
UCD-SNMP-MIB::extNames.5 | .100.5 | .101.5 | mysql_check_sp2 | Average number of MySQL queries per second on sp2 |
UCD-SNMP-MIB::extNames.6 | .100.6 | .101.6 | mysql_replication_check_sp1 | MySQL replication delay in seconds on sp1 |
UCD-SNMP-MIB::extNames.7 | .100.7 | .101.7 | mysql_replication_check_sp2 | MySQL replication delay in seconds on sp2 |
UCD-SNMP-MIB::extNames.8 | .100.8 | .101.8 | mpt_check_sp1 | RAID status on sp1 |
UCD-SNMP-MIB::extNames.9 | .100.9 | .101.9 | mpt_check_sp2 | RAID status on sp2 |
UCD-SNMP-MIB::extNames.10 | .100.10 | .101.10 | exim_queue_check_sp1 | Number of mails undelivered in MTA queue on sp1 |
UCD-SNMP-MIB::extNames.11 | .100.11 | .101.11 | exim_queue_check_sp2 | Number of mails undelivered in MTA queue on sp2 |
UCD-SNMP-MIB::extNames.12 | .100.12 | .101.12 | provisioned_subscribers_check_sp1 | Number of subscribers provisioned on sp1 |
UCD-SNMP-MIB::extNames.13 | .100.13 | .101.13 | provisioned_subscribers_check_sp2 | Number of subscribers provisioned on sp2 |
UCD-SNMP-MIB::extNames.14 | .100.14 | .101.14 | kam_dialog_active_check_sp1 | Number of active calls on sp1 |
UCD-SNMP-MIB::extNames.15 | .100.15 | .101.15 | kam_dialog_active_check_sp2 | Number of active calls on sp2 |
UCD-SNMP-MIB::extNames.16 | .100.16 | .101.16 | kam_dialog_early_check_sp1 | Number of calls in Early Media state on sp1 |
UCD-SNMP-MIB::extNames.17 | .100.17 | .101.17 | kam_dialog_early_check_sp2 | Number of calls in Early Media state on sp2 |
UCD-SNMP-MIB::extNames.18 | .100.18 | .101.18 | kam_dialog_type_local_check_sp1 | Number of active local calls on sp1 |
UCD-SNMP-MIB::extNames.19 | .100.19 | .101.19 | kam_dialog_type_local_check_sp2 | Number of active local calls on sp2 |
UCD-SNMP-MIB::extNames.20 | .100.20 | .101.20 | kam_dialog_type_relay_check_sp1 | Number of active calls routed via peers on sp1 |
UCD-SNMP-MIB::extNames.21 | .100.21 | .101.21 | kam_dialog_type_relay_check_sp2 | Number of active calls routed via peers on sp2 |
UCD-SNMP-MIB::extNames.22 | .100.22 | .101.22 | kam_dialog_type_incoming_check_sp1 | Number of incoming calls on sp1 |
UCD-SNMP-MIB::extNames.23 | .100.23 | .101.23 | kam_dialog_type_incoming_check_sp2 | Number of incoming calls on sp2 |
UCD-SNMP-MIB::extNames.24 | .100.24 | .101.24 | kam_dialog_type_outgoing_check_sp1 | Number of outgoing calls on sp1 |
UCD-SNMP-MIB::extNames.25 | .100.25 | .101.25 | kam_dialog_type_outgoing_check_sp2 | Number of outgoing calls on sp2 |
UCD-SNMP-MIB::extNames.26 | .100.26 | .101.26 | kam_usrloc_regusers_check_sp1 | Number of subscribers with at least one active registration on sp1 |
UCD-SNMP-MIB::extNames.27 | .100.27 | .101.27 | kam_usrloc_regusers_check_sp2 | Number of subscribers with at least one active registration on sp2 |
UCD-SNMP-MIB::extNames.28 | .100.28 | .101.28 | kam_usrloc_regdevices_check_sp1 | Total number of registered end devices on sp1 |
UCD-SNMP-MIB::extNames.29 | .100.29 | .101.29 | kam_usrloc_regdevices_check_sp2 | Total number of registered end devices on sp2 |
UCD-SNMP-MIB::extNames.30 | .100.30 | .101.30 | mysql_replication_discrepancies_check_sp1 | Number of MySQL tables not in sync between sp1 and sp2 |
UCD-SNMP-MIB::extNames.31 | .100.31 | .101.31 | mysql_replication_discrepancies_check_sp2 | Number of MySQL tables not in sync between sp1 and sp2 |
UCD-SNMP-MIB::extNames.32 | .100.32 | .101.32 | sip_check_self | SIP response time in seconds on active node |
UCD-SNMP-MIB::extNames.33 | .100.33 | .101.33 | mysql_check_self | Average number of MySQL queries per second on active node |
UCD-SNMP-MIB::extNames.34 | .100.34 | .101.34 | mysql_replication_check_self | MySQL replication delay in seconds on active node |
UCD-SNMP-MIB::extNames.35 | .100.35 | .101.35 | mpt_check_self | RAID status on active node |
UCD-SNMP-MIB::extNames.36 | .100.36 | .101.36 | exim_queue_check_self | Number of mails undelivered in MTA queue on active node |
UCD-SNMP-MIB::extNames.37 | .100.37 | .101.37 | provisioned_subscribers_check_self | Number of subscribers provisioned on active node |
UCD-SNMP-MIB::extNames.44 | .100.44 | .101.44 | kam_usrloc_regusers_check_self | Number of subscribers with at least one active registration on active node |
UCD-SNMP-MIB::extNames.45 | .100.45 | .101.45 | kam_usrloc_regdevices_check_self | Total number of registered end devices on active node |
UCD-SNMP-MIB::extNames.46 | .100.46 | .101.46 | mysql_replication_discrepancies_check_self | Number of MySQL tables not in sync between sp1 and sp2 |
UCD-SNMP-MIB::extNames.47 | .100.47 | .101.47 | kam_dialog_type_local_check_prx0X | Number of active local calls on active proxy X |
UCD-SNMP-MIB::extNames.48 | .100.48 | .101.48 | kam_dialog_type_relay_check_prx0X | Number of active calls routed via peers on active proxy X |
UCD-SNMP-MIB::extNames.49 | .100.49 | .101.49 | kam_dialog_type_incoming_check_prx0X | Number of incoming calls on active proxy X |
UCD-SNMP-MIB::extNames.50 | .100.50 | .101.50 | kam_dialog_type_outgoing_check_prx0X | Number of outgoing calls on active proxy X |
UCD-SNMP-MIB::extNames.51 | .100.51 | .101.51 | kam_dialog_active_check_prx0X | Number of active calls on active proxy X |
UCD-SNMP-MIB::extNames.52 | .100.52 | .101.52 | kam_dialog_early_check_prx0X | Number of calls in Early Media state on active proxy X |
Some of the checks can be disabled (most are enabled by default)
through the