ALTER CLUSTER ADD 'ip' [, 'ip'] ... [COORDINATE]
    
ALTER CLUSTER SOFTFAIL nodeid [, nodeid] ...
ALTER CLUSTER UNSOFTFAIL nodeid [, nodeid] ...  
ALTER CLUSTER REFORM

ALTER CLUSTER DROP nodeid

ALTER CLUSTER RESIZE DEVICES size 

ALTER CLUSTER SET MAX_FAILURES = number of simultaneous node failures 

ALTER CLUSTER changes independent characteristics of a ClustrixDB cluster. Changes made with ALTER CLUSTER are automatically logged in the query.log. Use ALTER CLUSTER to perform the operations described in the sections below.

Only one change may be made at a time. A user must have root access to use ALTER CLUSTER.

ALTER CLUSTER ADD (Flex Up)

ALTER CLUSTER ADD 'ip' [, 'ip'] ... [COORDINATE]

Use ALTER CLUSTER ADD to add new node(s) to your cluster, also known as Flex Up. If adding multiple nodes at one time, COORDINATE (the default) ensures that all nodes are prepared before the cluster automatically forms a new group. For additional information, see Expanding Your Cluster's Capacity - Flex Up.
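
For example, to add two nodes in a single coordinated group change (the IP addresses shown are placeholders; substitute the addresses of your new nodes):

sql> ALTER CLUSTER ADD '10.2.0.14', '10.2.0.15';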

ALTER CLUSTER SOFTFAIL (Flex Down)

ALTER CLUSTER SOFTFAIL nodeid [, nodeid] ... 
ALTER CLUSTER UNSOFTFAIL nodeid [, nodeid] ...  
ALTER CLUSTER REFORM

ALTER CLUSTER SOFTFAIL causes the Rebalancer to move all data off the designated node(s). Once all of the data has been relocated, the cluster must be reformed with ALTER CLUSTER REFORM.

ALTER CLUSTER UNSOFTFAIL abandons a previous SOFTFAIL request before it completes. The node(s) are again made available for use and the Rebalancer will work to relocate data to the node(s). 

ALTER CLUSTER REFORM forces a group change for the cluster. This is the final step of the Flex Down process. For detailed instructions see Reducing Your Cluster's Capacity - Flex Down.
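
A minimal Flex Down sequence looks like the following sketch (node ids 4 and 5 are placeholders; substitute the nodeids you intend to remove):

sql> ALTER CLUSTER SOFTFAIL 4, 5;
-- wait for the Rebalancer to finish relocating data off nodes 4 and 5
sql> ALTER CLUSTER REFORM;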

ALTER CLUSTER DROP

ALTER CLUSTER DROP nodeid  

Use ALTER CLUSTER DROP to immediately remove a node from the cluster. This should only be used in emergency situations such as when there are hardware failures. Additional information can be found in Administering Failure and Recovery.

ALTER CLUSTER DROP should be used with caution as this operation cannot be undone.

Dropping more than one node before reprotect is finished will result in permanent data loss.

Using this command to remove a functioning node should be avoided, if possible. Use Flex Down instead.
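
For example, to drop a single failed node (nodeid 3 is a placeholder):

sql> ALTER CLUSTER DROP 3;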

ALTER CLUSTER RESIZE DEVICES  

ALTER CLUSTER RESIZE DEVICES size 

Use ALTER CLUSTER RESIZE DEVICES to immediately expand the device1 file on all online nodes. The device1 files should be the same size cluster-wide. (This command does not affect the device1-temp file that is used for sorting and grouping large query results.)

In special circumstances, it may be necessary to reduce the size of the device1 file cluster-wide. See Decreasing device1 Size.  

size is either the total size in bytes or a whole integer suffixed with k, m, or g for kilobytes, megabytes, or gigabytes. For example:

sql> ALTER CLUSTER RESIZE DEVICES 50g;

See Managing File Space and Database Capacity for additional information. 

Clustrix recommends resizing devices during off-peak hours.

ALTER CLUSTER SET MAX_FAILURES

MAX_FAILURES is the number of nodes that can become permanently unavailable simultaneously while ensuring that no data is lost. If there are N nodes in a cluster, MAX_FAILURES must be less than N/2+1. MAX_FAILURES is also called nResiliency.

By default, all database tables and indexes are defined with REPLICAS = 2, which ensures that each slice of data has two copies (a slice and a replica) and MAX_FAILURES = 1.

A cluster can tolerate more than MAX_FAILURES node failures as long as after each failure, the cluster has time and disk space to complete the reprotect process.

The value of MAX_FAILURES is displayed as a read-only global variable:

Name          Description                                                                                      Default Value
max_failures  Number of nodes that can fail simultaneously without losing the ability to resolve transactions  1
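
Since max_failures is exposed as a read-only global, its current value can be checked with the standard MySQL-compatible SHOW syntax (shown as a sketch):

sql> SHOW GLOBAL VARIABLES LIKE 'max_failures';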

Change the value of MAX_FAILURES

To change the value of MAX_FAILURES to k (in this example, we will use k=2), perform the following:

Step 1: Alter Individual Tables

Alter tables to have REPLICAS = k+1:
sql> ALTER TABLE foo LAZY PROTECT REPLICAS = 3; 

The LAZY PROTECT option tells the Rebalancer to queue the work of creating additional replicas.

Use this query to generate a list of ALTER statements for any user tables that have fewer than k+1 replicas (3 in this example):

sql> SELECT concat('ALTER TABLE ', `database`, '.', `table`, ' LAZY PROTECT REPLICAS = 3;')
     FROM system.table_replicas
     WHERE `database` NOT IN ('system', 'clustrix_dbi', 'clustrix_statd')
     GROUP BY `table`, `database`
     HAVING (count(1) / count(distinct slice)) < 3;

Wait for the Rebalancer to finish creating additional replicas. For more on this, see Managing the Rebalancer.
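
To confirm that no rebalancer work remains before proceeding, a query along the following lines can help (this assumes the system.rebalancer_activity_log table described in Managing the Rebalancer; verify the table name against your version):

sql> SELECT * FROM system.rebalancer_activity_log WHERE finished IS NULL;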

Once the Rebalancer has finished creating the additional replicas, verify that all tables have at least k+1 replicas:
sql> SELECT `database`, `table`, (count(1) / count(distinct slice)) num_replicas
     FROM system.table_replicas
     WHERE `database` != 'system'
     GROUP BY `table`, `database`;
This query intentionally excludes the SYSTEM database, whose replicas are managed by internal ClustrixDB processes.


Step 2: Set the Global That Specifies the Number of Replicas for New Tables

Set the global variable for default_replicas so new tables have sufficient replicas (k+1) by default:
sql> SET GLOBAL DEFAULT_REPLICAS = 3;

Step 3: Set the Cluster-Wide Failure Threshold

Update MAX_FAILURES to the desired value:
sql>  ALTER CLUSTER SET MAX_FAILURES = 2;

Running this command will result in a group change.

Log Messages

When the value for MAX_FAILURES is modified, you will see an entry in clustrix.log that notes the number of node failures that are configured:

INFO tm/gtm_resolve.c:168 gtm_r_validate_paxos_f(): group 1cfffe supports 2 simultaneous failures

If the value for MAX_FAILURES exceeds (N/2-1), the logs will indicate what is supported. In this example, a 5-node cluster was updated with MAX_FAILURES = 2.

INFO tm/gtm_resolve.c:168 gtm_r_validate_paxos_f(): group 2bfffe supports 2 simultaneous failures (2 configured)