On occasion, you may need to reduce your cluster's capacity:

  • To reduce operating costs following a peak event (e.g., after Cyber Monday).
  • To allocate servers for other purposes.
  • To remove failing hardware. (See ALTER CLUSTER DROP to drop a permanently failed node.)

The process to downsize your cluster within Xpand is simple.

Xpand recommends running this process while logged on to a node other than the one(s) you wish to drop.

Review target cluster configuration

  • Xpand requires a minimum of three nodes to support production systems. Going from three or more nodes to a single node is not supported via the steps outlined on this page.
  • When zones are configured, Xpand requires a minimum of three zones.
  • For clusters deployed in zones, Xpand requires an equal number of nodes in each zone.
  • Ensure that the target cluster configuration has sufficient space. See Allocating Disk Space for Fault Tolerance and Availability.
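The constraints above can be sketched as a pre-flight check. This is a minimal illustration of the rules, not part of any Xpand API; the function name and arguments are assumptions.

```python
def validate_target(node_count, nodes_per_zone=None):
    """Return a list of constraint violations for a target cluster
    configuration (an empty list means the configuration is acceptable).

    node_count     -- total number of nodes after the flex down
    nodes_per_zone -- optional mapping of zone name -> node count,
                      for clusters deployed in zones
    """
    problems = []
    # Production systems require at least three nodes.
    if node_count < 3:
        problems.append("production clusters require at least three nodes")
    if nodes_per_zone is not None:
        # Zoned clusters require at least three zones...
        if len(nodes_per_zone) < 3:
            problems.append("zoned clusters require at least three zones")
        # ...with an equal number of nodes in each zone.
        if len(set(nodes_per_zone.values())) > 1:
            problems.append("zoned clusters require an equal number of nodes per zone")
    return problems
```

Disk-space sufficiency is not modeled here; see Allocating Disk Space for Fault Tolerance and Availability for how to size the target configuration.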

Flex Down

Step 1: Initiate SOFTFAIL

Marking a node as softfailed directs the Xpand Rebalancer to move all data from the node(s) specified to others within the cluster. The Rebalancer proceeds in the background while the database continues to serve your ongoing production needs.

If necessary, determine the nodeid assigned to a given IP or hostname by running the following query:

sql> SELECT * FROM system.nodeinfo ORDER BY nodeid; 

To initiate a SOFTFAIL, use ALTER CLUSTER:

ALTER CLUSTER SOFTFAIL nodeid [, nodeid] ...

The SOFTFAIL operation issues an error if there is insufficient space to complete the softfail, or if the softfail would leave the cluster unable to protect your data should an additional node be lost.

To cancel a SOFTFAIL process before it completes, use the following syntax. Your system will be restored to its prior configuration.

ALTER CLUSTER UNSOFTFAIL nodeid [, nodeid] ...  
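As an illustration of the syntax above, a client script might assemble the statement from a list of node IDs. The helper below is a hypothetical sketch, not an Xpand tool; coercing each ID to an integer guards against interpolating anything other than nodeids into the SQL text.

```python
def cluster_statement(action, nodeids):
    """Build an ALTER CLUSTER SOFTFAIL / UNSOFTFAIL statement.

    action  -- "SOFTFAIL" or "UNSOFTFAIL"
    nodeids -- non-empty list of integer node IDs
    """
    if action not in ("SOFTFAIL", "UNSOFTFAIL"):
        raise ValueError("action must be SOFTFAIL or UNSOFTFAIL")
    if not nodeids:
        raise ValueError("at least one nodeid is required")
    # int() rejects anything that is not a plain integer value.
    ids = ", ".join(str(int(n)) for n in nodeids)
    return "ALTER CLUSTER {} {}".format(action, ids)
```

For example, cluster_statement("SOFTFAIL", [4, 5]) produces the statement ALTER CLUSTER SOFTFAIL 4, 5 (the nodeids 4 and 5 are illustrative).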

Step 2: Monitor the SOFTFAIL Process

Once marked as softfailed, the Rebalancer moves data from the softfailed node(s). The Rebalancer process runs in the background while foreground processing continues to serve your production workload. 

To monitor the progress of the SOFTFAIL: 

Verify that the node(s) you specified are indeed marked for removal:

sql> SELECT * FROM system.softfailed_nodes;

The system.softfailing_containers table lists the containers that are slated to be moved as part of the SOFTFAIL operation. When the following query returns a count of 0, the data migration is complete.

sql> SELECT count(1) FROM system.softfailing_containers;

The following query lists the softfailed node(s) that are ready for removal:

sql> SELECT * FROM system.softfailed_nodes 
     WHERE nodeid NOT IN 
        (SELECT DISTINCT nodeid 
         FROM system.softfailing_containers); 
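Because the rebalance can take a while, the monitoring query above lends itself to a polling loop. The sketch below is an assumption about how client code might wrap it: count_fn stands in for whatever function runs SELECT count(1) FROM system.softfailing_containers through your database driver and returns the count; it is not an Xpand API.

```python
import time

def wait_for_rebalance(count_fn, interval=5.0, timeout=3600.0):
    """Poll count_fn until it returns 0, i.e. until no containers
    remain to be moved off the softfailed node(s).

    count_fn -- callable returning the current softfailing container count
    interval -- seconds to sleep between polls
    timeout  -- give up after this many seconds
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if count_fn() == 0:
            return
        time.sleep(interval)
    raise TimeoutError("SOFTFAIL rebalance did not finish within timeout")
```

Once the loop returns, the softfailed node(s) should also appear in the ready-for-removal query shown above.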

Step 3: ALTER CLUSTER REFORM 

Once data has been moved off the nodes and there are no more entries in system.softfailing_containers, run an ALTER CLUSTER REFORM:

sql> ALTER CLUSTER REFORM; 

This will initiate a brief interruption of service while the cluster is re-formed. If you do not have any binlogs, the softfailed node(s) will be removed from the cluster and the flex down operation is complete. If you have binlogs, continue with the steps that follow. 

Step 4: Wait for binlog softfail

If your cluster has binlogs, the previous ALTER CLUSTER REFORM leaves the softfailed node(s) in the cluster, but designates them as being in the LEAVING state, so they will not be chosen as acceptors:

 INFO dbcore/dbstate.c:292 dbprepare_done(): Running dbstart for membership afffe { 1-3 leaving: 2}

In the meantime, the binlog_commits table is being rebalanced across the non-softfailed nodes. You can monitor that process with this query:

sql> SELECT count(1) FROM system.binlog_commits_segments WHERE softfailed_replicas > 0;

Once the query returns a count of 0, the binlog_commits table is done being rebalanced and the following log message will appear on all nodes:

INFO dbcore/softfail.ct:27 softfail_node_msg_signal(): softfailing nodes are ready to be removed: 2

Step 5: ALTER CLUSTER REFORM

Once the binlog rebalance is complete and the log message above has appeared on all nodes, run another ALTER CLUSTER REFORM:

sql> ALTER CLUSTER REFORM; 

This will remove the softfailed nodes from the cluster. 
