ClustrixDB constantly self-monitors to ensure your cluster is healthy and operating optimally. When it detects conditions that require attention, ClustrixDB attempts to notify you via email using its Alerter. The alerts are of different severities (INFO, WARNING, ERROR, and CRITICAL) and ClustrixDB is preconfigured with default thresholds for each.

The contacts and communication details that control how alerts are sent must be configured for your cluster.

Configuring Alerts

Each cluster must be configured to manage how ClustrixDB should send its alerts and to whom. To configure the alerts for your system, follow these steps.

These steps are required for ClustrixDB to properly send email alerts.

Step 1. Set Identifying Global Variables

Set these identifying global variables for your database. These are especially important to aid Clustrix Support in troubleshooting.

sql> SET GLOBAL customer_name = 'customer name';
sql> SET GLOBAL cluster_name = 'cluster identifier';
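
To confirm that both values were stored, they can be read back. This is a minimal sketch that assumes your ClustrixDB version accepts the MySQL-style SHOW GLOBAL VARIABLES statement; this check is not part of the documented steps.

```sql
-- Read back the identifying variables set in Step 1.
-- Assumption: MySQL-style SHOW GLOBAL VARIABLES is available on your version.
SHOW GLOBAL VARIABLES LIKE 'customer_name';
SHOW GLOBAL VARIABLES LIKE 'cluster_name';
```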

Step 2. Configure alerts_parameters for smtp Server 

The parameters defined in the system.alerts_parameters table control how alerts are formatted and sent. 

ClustrixDB requires an smtp server to send the alert messages. These instructions presume that an smtp server has already been set up correctly for your environment.

For specifics on establishing an smtp server in AWS, see Setting up an SMTP Server.

Set the following smtp parameters as they apply to your cluster. Clustrix Support can provide assistance if needed.

Parameter Name | What's Needed? | Required?
smtp_server | Provide the identification for your smtp server. | Yes
smtp_port | Provide the smtp port for your environment, if different from the default of TCP port 25. | Yes
smtp_username | Provide the smtp username of your smtp server. | No
smtp_password | Provide the smtp password of your smtp server. | No
smtp_security | Provide the smtp security code of your smtp server. | No

Follow this syntax to update the parameters shown:  

UPDATE  system.alerts_parameters
  SET   value = 'your smtp-specific value'
  WHERE name = 'parameter name';
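
As an illustration, the statements below set the two required parameters and then read the smtp settings back. The host name smtp.example.com and the port 587 are placeholder values chosen for this sketch, not defaults; the name and value columns are taken from the syntax above.

```sql
-- Placeholder values: replace with the smtp host and port for your environment.
UPDATE system.alerts_parameters
   SET value = 'smtp.example.com'
 WHERE name = 'smtp_server';

UPDATE system.alerts_parameters
   SET value = '587'
 WHERE name = 'smtp_port';

-- Review every parameter whose name starts with "smtp".
SELECT name, value
  FROM system.alerts_parameters
 WHERE name LIKE 'smtp%';
```

As described in Step 4, these values do not take effect until the alerter has been reset.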

Step 3. Configure alerts_subscriptions

Add email addresses of the individual(s) or group(s) who are to receive the alerts to the system.alerts_subscriptions table. You can insert, update, and delete from this table using standard SQL commands.

Use this SQL to list the subscriptions currently defined in system.alerts_subscriptions:

sql> SELECT * FROM system.alerts_subscriptions;

Insert a new email address per this sample:

sql> INSERT INTO system.alerts_subscriptions VALUES ('desired_email@domain_name.com');
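
If several recipients need to be added at once, a multi-row INSERT is one option. This sketch assumes the table accepts the standard multi-row VALUES form; both addresses are placeholders.

```sql
-- Placeholder addresses: replace with real recipients or distribution lists.
INSERT INTO system.alerts_subscriptions
  VALUES ('dba-team@example.com'),
         ('oncall@example.com');

-- Confirm that the new rows are present.
SELECT * FROM system.alerts_subscriptions;
```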

Step 4. RESET Alerter

Any time that changes are made to the system.alerts_parameters or system.alerts_subscriptions table(s), the alerter must be RESET.

Your changes will not take effect until this is done.

To reset the alerter, run the following sql:

sql> ALTER CLUSTER RESET ALERTER;

Unlike some other ALTER CLUSTER commands, this will not cause a group change on your cluster.

If invalid information is provided, you may encounter the following error:

sql> ALTER CLUSTER RESET ALERTER;
ERROR 1 (HY000): [64512] Bad configuration for alerts:

Check clustrix.log for more information. Here is an example where the smtp_server parameter was not specified:

2016-10-11 21:07:51.068524 UTC karma068.colo.sproutsys.com clxnode: ERROR cluster/alerter.ct:219 prepare_write(): Couldn't write alerter config: Bad configuration for alerts: No smtp_server specified

Step 5. Test Alerts

To verify that the configuration works properly, send a test alert following this syntax:

SELECT alert(severity, 'alert text')

This sample uses severity code 3 (INFO). The alert text can be anything you wish.

sql> SELECT alert(3,'Testing alert configuration');
     +----------------------------------------+
     | alert(3,'Testing alert configuration') |
     +----------------------------------------+
     |                                      0 |
     +----------------------------------------+
     1 row in set (0.00 sec)

That SQL statement will cause an informational alert to be sent, thereby testing the configuration for your cluster. If you do not receive the expected alert email, the alert configuration is incorrect. Review your setup beginning with Step 1.
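
The same function can be used to check delivery at other severity levels. The sketch below assumes that alert() accepts the other severity codes listed under Metadata Used in alerts_parameters (0 - CRITICAL, 1 - ERROR, 2 - WARNING, 3 - INFO); only code 3 is demonstrated in the steps above.

```sql
-- Assumption: alert() accepts any documented severity code, not only 3.
SELECT alert(2, 'Testing WARNING alert delivery');
SELECT alert(1, 'Testing ERROR alert delivery');
```

Marking the alert text clearly as a test helps recipients distinguish these messages from real incidents.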

Sample Emailed Alerts

The following sample alert emails are similar to those you may receive from your cluster. These alerts will also appear in the query.log.

Sample 1: Database Space WARNING

This alert is a WARNING for a cluster with a device1 file that is at least 80% full. If you receive a similar warning, see “Issue Resolution” in Managing File Space and Database Capacity.

Severity: WARNING
Date: 2016-10-02 18:49:24.177250 UTC
Host: clxdb003
HWID: b8:ca:3a:6b:7b:d0
Cluster: Clustrix-Dogfood
Version: 5.0.45-clustrix-7.5.1
Image Version: CentOS release 6.7 (Final)
Message: Database space is 80% used. Soon user queries will fail. path=/data/clustrix/device1 device_total=4,247,830,372,352 wal_total=1,073,741,824 device_free=327,733,190,656 temp_total_space=161,061,273,600 system_avail=758,480,666,624 system_total=3,757,962,166,272 total_used=2,999,481,499,648 %=80 user_avail=382,684,449,996 user_total=3,382,165,949,644 cont_type=USER trx_type=USER

Sample 2: Backup INFO

This INFO alert shows that the backup has failed. If you receive similar errors during backup processing, please see List of Errors for Backup and Restore.

Severity: INFO
Date: 2016-09-25 23:42:59.798249 UTC
Host: clxdb005
HWID: 00:25:90:8e:e3:0a
Cluster: Clustrix-Dogfood
Version: 5.0.45-clustrix-7.5.1
Image Version: CentOS release 6.7 (Final)
Message: [SQL] backup-25-09-2016 ERROR 2016-09-25 22:52:02

Sample 3: Read ERROR

This ERROR alert indicates that your system’s HD/SSD is experiencing hardware failures. Contact Clustrix Support for suggestions.

Severity: ERROR
Date: 2016-09-09 13:18:25.769801 UTC
Host: clxdb001
HWID: b8:ca:3a:6b:7b:d0
Cluster: Clustrix-Dogfood
Version: 5.0.45-clustrix-7.5.1
Image Version: CentOS release 6.7 (Final)
Message: Error reading 32768 bytes at offset 0x1d7367d0000 of "/data/clustrix/device1": Input/output error

Additional Information

Alerting Conditions

These are the conditions that ClustrixDB monitors and for which it issues alerts. The alerts are predefined within the database (system.alerts_messages) and may not be changed. Their severity ranges from critical to simply informational.

If you need help resolving an alert, contact Clustrix Support.

Name | Summary | Message
ACTIVATION_FAILED | Activation Failed | Activation of device &device1 failed
DATABASE_SPACE_CRITICAL | Database space critical | Database space is &percent used. User queries will fail, and soon system queries will fail.
DATABASE_SPACE_EXHAUSTED | Database space exhausted | Database space is &percent used. User queries and system queries will now fail.
DATABASE_SPACE_EXTREME | Database space extreme | Database space is &percent used. User queries will now fail.
DATABASE_SPACE_LOW | Database space low | Database space is &percent used. Soon user queries will fail.
DATABASE_SPACE_OKAY | Database space okay | Database space is &percent used.
DBSTART_SPACE_PAUSE | Pausing dbstart due to space exhaustion | No space left for system transactions; not resulting continuation, awaiting cp command
DDL_TOO_LONG | DDL lock has been held for too long | The DDL lock has been held for too long. While it is held, all new DDL transactions will block.
DEVICE_DEACTIVATED | Device Deactivated | Deactivating device &device1
DM_READ_ERROR | Device Manager Read Error | Error reading &bytes bytes at offset &offset
EXCESSIVE_CLOCK_SKEW | Excessive Clock Skew | Clock skew from nid &node_id to &node_id is &seconds seconds. Is NTP set up and working?
HOST_FILE_ERROR | Error writing host files | &error
INACCESSIBLE_TABLES | Inaccessible Tables | The following is/are not fully accessible in this cluster: &table_name, &table_name...
INSUFFICIENT_REPROTECT_MEMORY | Insufficient memory for reprotection | Not enough memory to reprotect if another node is lost: &percent memory table usage (without softfailed nodes) is greater than max &percent
INSUFFICIENT_REPROTECT_NODES | Insufficient nodes for reprotection | Not enough nodes to reprotect if another node is lost
INSUFFICIENT_REPROTECT_SPACE | Insufficient space for reprotection | Not enough space to reprotect if another node is lost: &percent usage (without softfailed nodes) is greater than max &percent
LICENSE_INVALID | License is invalid | Invalid license installed
LICENSE_NEAR_EXPIRATION | License is nearing expiration | License will expire at: (&expiration)
LOST_QUORUM | Lost Quorum | Node &node_id lost quorum for group &group_id
MEMORY_TABLE_SPACE_EXHAUSTED | Memory table space exhausted | Memory table space is &percent used. User queries will now fail.
MEMORY_TABLE_SPACE_LOW | Memory table space low | Memory table space is &percent used. Soon user queries may fail.
MONITORED_WAL_SYNC_EXCESSIVE_TIME | Slow sync | Node &node_id is slow to sync (took &synch_miliseconds ms, cluster avg &avg_miliseconds ms, hard threshold &threshold_miliseconds ms)
NEW_GROUP | New Group | Node &node_id has new group &group_id
PROTECTION_LOST | Protection Lost | Full protection lost for some data; queueing writes for down node; reprotection will begin in &seconds seconds if node has not recovered
PROTECTION_RESTORED | Protection Restored | Full protection restored for all data after &seconds seconds
SLAVE_RESTART | Slave Restart | Restarting mysqlslave &slave_name
SLAVE_STOP | Slave Stopped | Stopped mysqlslave &slave_name on non-transient error: &Error
USER | User Invoked From SQL | &SQL_error
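
The predefined conditions can also be read directly from the database. This sketch assumes the system.alerts_messages table exposes columns named name, summary, and message; the summary and message columns are referenced elsewhere on this page, while the name column is inferred from the table heading.

```sql
-- List the space-related alert definitions.
-- Assumption: the identifier column is called "name".
SELECT name, summary, message
  FROM system.alerts_messages
 WHERE name LIKE 'DATABASE_SPACE%'
 ORDER BY name;
```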

Preconfigured alerts_parameters

These additional entries from the system.alerts_parameters table are preconfigured and shown here for information only.

Some of these parameters include “meta tags” to denote that metadata contents will be substituted in the alert content when that parameter is used. The meta tags are explained in the next section.

parameter_name | Value
body_max_chars | 50000
email_body | (multi-line template, shown after this table)
email_encoding | quoted-printable
email_subject | ${alerts_name} [${severity}] ${summary}
smtp_sender | ${alerts_name} CLX Log Alert
subject_max_chars | 100

Value of email_body:

Severity: ${severity}
Date: ${date} ${tz}
Host: ${host}
HWID: ${hwid}
Cluster: ${cluster_name}
Version: ${version}
Image Version: ${image_version}
Message: ${message}
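
To see the values currently in effect on a cluster, the preconfigured entries can be queried by name. This is a minimal sketch that reuses the name and value columns shown in Step 2.

```sql
-- Show the preconfigured formatting parameters described on this page.
SELECT name, value
  FROM system.alerts_parameters
 WHERE name IN ('body_max_chars', 'email_body', 'email_encoding',
                'email_subject', 'smtp_sender', 'subject_max_chars')
 ORDER BY name;
```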

Metadata Used in alerts_parameters  

The alert parameters sometimes contain metadata that is identified by “meta tags”. These meta tags cause real-time information to be substituted within a generated alert.

The following chart shows how each meta tag will be resolved whenever it is used.

Parameter (meta tag) | Description
{alerts_name} | Concatenation of cluster name and customer name.
{cluster_name} | Name for the cluster from the global "cluster_name".
{customer_name} | Name of the customer as identified in the global "customer_name".
{date} | The system's current_timestamp.
{group} | ID of the current cluster group.
{host} | Name of host sending the alert.
{hwid} | The cluster hardware ID.
{image_version} | Operating system version.
{message} | Text of the error message from system.alerts_messages.message.
{severity} | Severity level of the alert: 0 - CRITICAL, 1 - ERROR, 2 - WARNING, 3 - INFO.
{summary} | Short form of the error message from system.alerts_messages.summary.
{tz} | System time zone from global variable "system_time_zone".
{version} | Software version from global variable "version".