This is documentation for a previous version of ClustrixDB. Documentation for the latest version can be found here

Skip to end of metadata
Go to start of metadata

ClustrixDB maintains multiple copies of each slice of data to provide fault-tolerance, high-availability, and balanced reads called replicas. There are at least two physical replicas of each logical slice, stored on separate nodes. While writes are applied to all replicas simultaneously, one replica is designated as the "ranked replica" and is used for reads. This helps to keep the distribution of reads even across the cluster. The rebalancer rerank process continuously assesses cluster load balance for reads, and as necessary, may re-designate the ranked replica of a given slice to keep load even.

The following rules determine replica placement within the cluster:

  • Each slice is replicated two or more times. 

  • Replicas are distributed across the cluster for redundancy and to balance reads, writes, and disk usage.

  • No two replicas for the same slice will exist on the same node.

  • ClustrixDB will make new replicas while the database remains online, without suspending or blocking writes to the slice.

Specifying the Number of Replicas

The number of replicas is configurable by representation. For example, a user may require three replicas for the base representation of a table and only two replicas for the other representations of that table.

When creating a table, you can explicitly specify the number of replicas. If no value is specified, the value of the global default_replicas is used (default = 2).

For most workloads, the default of 2 replicas is optimal for balancing fault tolerance with performance. Setting REPLICAS = 1 will remove fault tolerance and is not recommended.

To configure the number of replicas for a specific representation at table creation:

CREATE TABLE tbl_name col_names [REPLICAS = n]

Once a table has been created, you can use an ALTER statement to modify the number of replicas:

ALTER TABLE tbl_name col_names [REPLICAS = n]

This will automatically copy your sliced data to multiple nodes until the desired number of replicas are created.

Configuring additional replicas is not sufficient to ensure availability in the event of multiple simultaneous node failures. For more information on configuring your cluster for multi-node failures, see MAX_FAILURES.

ALLNODES

To optimize performance, REPLICAS = ALLNODES can be used in lieu of a number to indicate that every node should maintain a full copy of a table. A complete copy of the table is maintained on every node, so writes to the table are far more expensive. By using ALLNODES, ClustrixDB is able to utilize the local copy of the table to better optimize queries. Changing small tables with very few writes to ALLNODES will reduce the need to broadcast to other nodes in the cluster to get table data. This especially helps when small tables are joined with larger tables.

ALLNODES is best used for tables that meet the following criteria:

  • Relatively small (Table Size < 10MiB)

  • Written to infrequently (Write Frequency < 1K)

  • Read from frequently (Read Frequency > 1M)

  • Used frequently in joins to other, larger tables (e.g. metadata, lookup tables)

ALLNODES should not be used for:

  • Tables that take frequent writes

  • Partitioned tables

  • In conjunction with LAZY PROTECT

REPLICAS = ALLNODES Syntax
CREATE TABLE tbl_name (col_names) [REPLICAS = ALLNODES]
ALTER  TABLE tbl_name             [REPLICAS = ALLNODES]

LAZY PROTECT

You can delay replica creation by using the LAZY PROTECT option of ALTER TABLE. When used, one replica will initially be restored to your system and additional replicas will be created in the background by the Rebalancer while other processing continues on your cluster.

For large tables, LAZY PROTECT could result in longer re-protection time, but it will have less overall impact on your cluster's performance. As before, defaulted number of replicas will be created if no specific value is supplied.

LAZY PROTECT Syntax
ALTER TABLE tbl_name  [LAZY PROTECT]    [REPLICAS = n]

While LAZY PROTECT is in progress, there may be warnings in the logs such as ALERT PROTECTION_LOST. This indicates that until all replicas have been created, the system could suffer data loss in the event of a node failure. The warnings will persist until the reprotect process is complete.

LAZY PROTECT can be particularly helpful to optimize performance during bulk data loads:

  1. Create tables with REPLICAS = 1 (and no secondary indexes)

  2. Perform the data import

  3. Create secondary indexes

  4. ALTER TABLE tbl_name LAZY PROTECT REPLICAS = n

To track the progress of a LAZY PROTECT operation, see Managing the Rebalancer.