This is documentation for a previous version of ClustrixDB.


ClustrixDB breaks each representation (primary key + table or other index) into smaller, more manageable segments called “slices”, each of which is assigned a portion of the representation’s rows. There are multiple copies of each slice, called replicas.

  • Slices are distributed throughout the cluster to facilitate evenly distributed query processing.
  • The number of slices specified for the table itself applies to the table’s data as well as to its primary key.
  • A different number of slices may be specified for each representation.
  • The number of slices for a given representation should never be less than the number of nodes in the cluster. (Tables distributed to ALLNODES are an exception.)
  • When a slice becomes too large, the Rebalancer will split the slice into new slices and distribute the original slice's rows among them. The larger a slice becomes, the more expensive it is to move or copy it across the system. 
  • A slice must physically fit in its entirety on the storage device to which it is assigned. One slice may not span multiple devices. 

To modify the number of slices for an existing table or index, follow this syntax:

ALTER TABLE tbl_name [SLICES = n] [, INDEX index_name [SLICES = n]]
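
As an illustration of the syntax above, the statements below set the slice count for a hypothetical table named orders and, in the second form, for one of its indexes as well. The table name, the index name idx_customer_id, and the slice count of 12 are assumptions chosen for the example, not values taken from this documentation.

```sql
-- Hypothetical table and index names; 12 is an arbitrary example slice count.

-- Change the slice count of the table itself (its data and primary key).
ALTER TABLE orders SLICES = 12;

-- Change the slice count of the table and of one index in a single statement.
ALTER TABLE orders SLICES = 12, INDEX idx_customer_id SLICES = 12;
```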

Variable Definitions                 

The following global variables impact ClustrixDB slicing.

| Name | Description | Default Value | Session Variable |
| hash_dist_min_slices | Controls how data is sliced. If set to 0, (the default), ClustrixDB will create new representations with the number of slices equal to the number of nodes currently in the cluster. At least one slice of the table or index is placed on each node. If set to a specific integer, that number of slices will be created for each new table and index instead. | 0 | Yes |
| rebalancer_split_threshold_kb | Controls the maximum slice size. By default, this is set to split slices greater than 1 GB. | 1048576 | |
| task_rebalancer_reprotect_interval_ms | Defines how frequently the Rebalancer will assess if additional slices are needed. Specify 0 to disable slice splitting. | 15000 | |
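
A minimal sketch of inspecting and changing these variables follows. It assumes the MySQL-style SHOW VARIABLES and SET GLOBAL statements are available on the cluster; the value 2097152 (2 GB expressed in KB) is only an example and not a recommendation from this page.

```sql
-- Show the current values of two of the slicing-related variables.
SHOW VARIABLES LIKE 'hash_dist_min_slices';
SHOW VARIABLES LIKE 'rebalancer_split_threshold_kb';

-- Example only: raise the split threshold from 1 GB to 2 GB
-- (2 * 1024 * 1024 KB = 2097152 KB).
SET GLOBAL rebalancer_split_threshold_kb = 2097152;
```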

Best Practices

Generally, table and index slices should be 1-2 GB. The default size of 1 GB (rebalancer_split_threshold_kb = 1048576) is optimal for most use cases. For clusters with very large tables (over 2 TB in size), increasing the maximum slice size may be recommended.

Tables and indexes should have at least as many slices as there are nodes in the cluster, with ALLNODES tables being an exception. When adding nodes to your cluster, consider reslicing tables, because representations are not automatically resliced to match the number of nodes in the expanded cluster.

Use the following query to identify tables that contain fewer slices than the current number of nodes:

Tables with fewer slices than the total number of nodes
sql> SELECT   fd.name  `Database`,
              f.name   `Table`,
              Count(*) Slices
     FROM     system.slices
         JOIN system.representations fp USING (representation)
         JOIN system.relations f
                ON ( relation = `table` )
         JOIN system.DATABASES fd USING (db)
     GROUP BY `Database`,
              `Table`,
              fp.name
     HAVING   fd.name NOT IN ( 'system', 'clustrix_statd','clustrix_ui','_replication' )
        AND   slices < (SELECT Count(*)
                        FROM   system.membership
                        WHERE  status = 'quorum') 
        AND  (fp.name LIKE '%__PRIMARY%' OR fp.name LIKE '__base%')
     ORDER BY `Database`, `Table`, Slices;

Pre-slicing Tables 

During normal operation, relations are resliced on demand. However, it can be advantageous to pre-slice tables for which large data growth is anticipated. Creating or altering a representation to have a slice count commensurate with its expected size allows the cluster to add data to the representation at maximum speed, because no slice splitting is needed during the load. For additional information, see Loading Data onto ClustrixDB.

Use the following equation to determine the optimal number of slices for a table, with the expected table size expressed in KB to match the units of the threshold: (expected table size + 10%) / rebalancer_split_threshold_kb
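
As a worked example of the equation, assume a table expected to grow to 500 GB and the default rebalancer_split_threshold_kb of 1048576. The 500 GB figure and the table name events are hypothetical. With those inputs the calculation comes to 550 slices, since 500 GB plus 10% is 550 GB and each slice holds up to 1 GB.

```sql
-- Expected size: 500 GB = 500 * 1024 * 1024 KB. Add 10% headroom,
-- divide by the split threshold (in KB), and round up to a whole slice.
SELECT CEIL((500 * 1024 * 1024 * 1.10) / 1048576) AS target_slices;

-- Apply the result to the hypothetical table before loading data.
ALTER TABLE events SLICES = 550;
```

Rounding up with CEIL keeps the average slice size at or below the split threshold when the result is not a whole number.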