ALTER CLUSTER ADD 'ip' [, 'ip'] ...
    
ALTER CLUSTER SOFTFAIL nodeid [, nodeid] ...
ALTER CLUSTER UNSOFTFAIL nodeid [, nodeid] ...  
ALTER CLUSTER REFORM

ALTER CLUSTER DROP nodeid

ALTER CLUSTER RESIZE DEVICES size 

ALTER CLUSTER nodeid [, nodeid] ... ZONE zoneid  

ALTER CLUSTER SET MAX_FAILURES = number of simultaneous failures 

Use ALTER CLUSTER to:

- Add node(s) to the cluster (Flex Up)
- Softfail and remove node(s) from the cluster (Flex Down)
- Immediately drop a node from the cluster
- Resize the device1 files on all nodes
- Assign nodes to zones
- Set the value of MAX_FAILURES

One change may be made at a time. Users must have the SUPER privilege to use ALTER CLUSTER.

ALTER CLUSTER ADD (Flex Up)

ALTER CLUSTER ADD 'ip' [, 'ip'] ... 

Use ALTER CLUSTER ADD to add new node(s) to your cluster. For full instructions, see Expanding Your Cluster's Capacity - Flex Up.
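For example, to add two nodes by IP address (the addresses shown are hypothetical):

sql> ALTER CLUSTER ADD '10.2.13.26', '10.2.13.27';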

ALTER CLUSTER SOFTFAIL (Flex Down)

ALTER CLUSTER SOFTFAIL nodeid [, nodeid] ... 
ALTER CLUSTER UNSOFTFAIL nodeid [, nodeid] ...  
ALTER CLUSTER REFORM

ALTER CLUSTER SOFTFAIL directs the Rebalancer to move all data off the designated node(s) but does not remove them from the cluster.

ALTER CLUSTER UNSOFTFAIL cancels a previous SOFTFAIL request. The node(s) are again made available for use and the Rebalancer will work to relocate data to the node(s). 

ALTER CLUSTER REFORM removes softfailed nodes from the cluster and performs a group change. For full instructions, see Reducing Your Cluster's Capacity - Flex Down.
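For example, to softfail two nodes and then remove them once the Rebalancer has finished relocating their data (the node ids are hypothetical):

sql> ALTER CLUSTER SOFTFAIL 4, 5;
sql> ALTER CLUSTER REFORM;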

Softfailing a Zone

To softfail a zone, mark all the nodes in the zone as softfailed.  
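For example, assuming nodes 4, 5, and 6 make up the zone being removed:

sql> ALTER CLUSTER SOFTFAIL 4, 5, 6;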

ALTER CLUSTER DROP

ALTER CLUSTER DROP nodeid  

Use ALTER CLUSTER DROP to immediately remove a node from the cluster without ensuring all data has sufficient replicas. This should only be used in emergency situations. Additional information can be found in Administering Failure and Recovery.

ALTER CLUSTER DROP should be used with caution as this operation cannot be undone. Dropping more than one node before reprotect has finished can result in permanent data loss. When possible, use the Flex Down procedure instead of DROP.
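For example, to immediately remove a single node (the node id is hypothetical):

sql> ALTER CLUSTER DROP 6;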

ALTER CLUSTER RESIZE DEVICES  

ALTER CLUSTER RESIZE DEVICES size 

Use ALTER CLUSTER RESIZE DEVICES to expand the device1 file on all online nodes. size is the new total size, given either in bytes or as a whole integer suffixed with k, m, or g for kilobytes, megabytes, or gigabytes.

sql> ALTER CLUSTER RESIZE DEVICES 50g;
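The size may also be specified in bytes. Assuming the k/m/g suffixes denote binary multiples (1024-based), the following is equivalent to 50g:

sql> ALTER CLUSTER RESIZE DEVICES 53687091200;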

All device1 files should be the same size cluster-wide. This command does not affect the device1-temp file that is used for sorting and grouping large query results.

ALTER CLUSTER RESIZE DEVICES does not support reducing the size of the device1 file. See Decreasing device1 Size for how to perform this operation. 

See Managing File Space and Database Capacity for additional information. 

Clustrix recommends resizing devices during off-peak hours.

ALTER CLUSTER ZONE  

ClustrixDB allows nodes to be grouped into zones to improve fault tolerance. A zone can correspond to an availability zone within an AWS Region, a server rack, or a separate server in a different data center. Once you have determined your target zone configuration, use ALTER CLUSTER ZONE to assign the nodes of your cluster to zones.

ALTER CLUSTER nodeid [, nodeid] ... ZONE zoneid 

Running this command will result in a group change.

Assigning nodes to zones makes ClustrixDB aware of fault-tolerance boundaries outside the cluster. Because replicas are then placed across zones, no data is lost if a single zone becomes unavailable.
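For example, to assign a nine-node cluster to three zones (the node ids and zone ids are hypothetical):

sql> ALTER CLUSTER 1, 2, 3 ZONE 1;
sql> ALTER CLUSTER 4, 5, 6 ZONE 2;
sql> ALTER CLUSTER 7, 8, 9 ZONE 3;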

After all nodes are assigned to a zone, verify that there is an equal number of nodes in each zone and that no nodes are assigned to zone 0. Clustrix supports a maximum of 3 zones. 

sql> SELECT * FROM system.nodeinfo ORDER BY zone;

If you no longer wish to use zones, assign all nodes to zoneid 0 using ALTER CLUSTER ... ZONE.
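For example, using the hypothetical nine-node cluster from the previous example:

sql> ALTER CLUSTER 1, 2, 3, 4, 5, 6, 7, 8, 9 ZONE 0;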

Clustrix recommends allocating enough disk space so that should a zone fail, there is sufficient space to reprotect. See Allocating Disk Space for Fault Tolerance and Availability.

ALTER CLUSTER SET MAX_FAILURES

The max_failures global determines the number of failures that can occur simultaneously while ensuring that no data is lost. By default, this is the number of node failures that can be tolerated.  If zones are in use, this is the number of node or zone failures tolerated. For example, if MAX_FAILURES = 1 (the default), the cluster can lose one node or one zone, regardless of the number of nodes in that zone. The value of max_failures is also used to determine the number of replicas created by default for a table or index.  For example, if MAX_FAILURES = 1, new database entities are created with REPLICAS = 2. See MAX_FAILURES for additional information. 

max_failures is a read-only global variable that can only be set using ALTER CLUSTER.

Name: max_failures
Description: Number of simultaneous failures that the cluster can withstand while maintaining transaction resolution and without suffering data loss.
Default Value: 1
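To check the current setting, you can read the global variable; a minimal check, assuming the MySQL-compatible SHOW VARIABLES syntax is available:

sql> SHOW GLOBAL VARIABLES LIKE 'max_failures';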

Change the Value of MAX_FAILURES

Increasing the value for max_failures increases the number of replicas required, which can significantly impact write performance and requires additional disk space. Due to this overhead, Clustrix does not recommend exceeding MAX_FAILURES = 2.

To change the value of the global max_failures, perform the steps outlined below. In this sample, the number of allowable failures is modified from the default of 1 to 2. This will cause all new tables and indexes to be created with REPLICAS = 3.

Step 1: Ensure there is sufficient disk space 

Ensure that the cluster has sufficient disk space for additional replicas. 

Step 2: Set the Cluster-Wide Failure Threshold

sql>  ALTER CLUSTER SET MAX_FAILURES = 2;

Running this command will result in a group change.

Step 3: Alter Existing Tables

Tables created after the value for max_failures has been updated will automatically have sufficient replicas. However, tables created before max_failures was updated may not have sufficient replicas and must be altered. The following query generates ALTER statements for all representations that are under-protected.

sql> SELECT concat('ALTER TABLE ', `database`, '.', `Table`, ' LAZY PROTECT REPLICAS = MAX_FAILURES + 1;')
     FROM system.table_replicas
     WHERE database NOT IN ('system', 'clustrix_dbi', 'clustrix_statd', '_replication')
     GROUP BY `table`, `database`
     HAVING (count(1) / count(DISTINCT slice)) < MAX_FAILURES + 1;

The resulting SQL will look like: 

sql> ALTER TABLE foo LAZY PROTECT REPLICAS = 3;

The LAZY PROTECT option tells the Rebalancer to queue the work of creating additional replicas rather than performing it immediately. Run the generated script and monitor the Rebalancer as it creates additional replicas. See Managing the Rebalancer.

Log Messages

When the value for max_failures is modified, you will see an entry in clustrix.log that notes the number of failures that are configured:

INFO tm/gtm_resolve.c:168 gtm_r_validate_paxos_f(): group 1cfffe supports 2 simultaneous failures