Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space ML1 and version 9.2

ClustrixDB uses a has to a hash to determine where a given row of data or a table’s index (representationsrepresentation) should reside in a cluster. The columns selected for hashing are referred to as the “distribution key” for that representation. Every index requires guidance relative to which columns should comprise the distribution key.

...

The DISTRIBUTE = clause may be used to override the single column default and define which columns of an index are to be hashed instead.

Table of Contents

Using Default Distribution 

In this example, the primary key of post_id was hashed and data for the table was distributed using that value and the default distribution setting of DISTRIBUTE = 1.

sql> CREATE TABLE user_posts (
     post_id int AUTO_INCREMENT, 
     user_id int,  
     posted_on timestamp, 
     data blob, 
     PRIMARY KEY (`post_id`) /*$ DISTRIBUTE=1 */,
     KEY `user_id_posted_on_idx` (`user_id`,`posted_on`) /*$ DISTRIBUTE=1 */
     );

In some cases, data and index distribution based on a single column can result in poor or "lumpy" distribution. To resolve this, you can expand the basis of we recommend putting the most unique (selective) column into the first column of composite index, or expanding the distribution key from a single column to multiple columns. There are two ways to do this.

...

The example below shows the multi-column alternate key user_id_posted_on_idx distributed using both columns in the index instead of just the first.

sql> CREATE TABLE user_posts (
     post_id int AUTO_INCREMENT, 
     user_id int,  
     posted_on timestamp, 
     data blob, 
     PRIMARY KEY (`post_id`) /*$ DISTRIBUTE=1 */,
     KEY `user_id_posted_on_idx` (`user_id`,`posted_on`) /*$ DISTRIBUTE=2 */
     );

2. Expand the distribution key to also include column(s) of the primary key

The example below shows the alternate key user_id_posted_on_idx with DISTRIBUTE = 3. This means that the index will be distributed on both of its columns (user_id, posted_on) as well as the primary key (post_id). If the primary key is a compound key, you could further expand the distribution to include additional columns of the primary key.

sql> CREATE TABLE user_posts (
     post_id int AUTO_INCREMENT, 
     user_id int,  
     posted_on timestamp, 
     data blob, 
     PRIMARY KEY (`post_id`) /*$ DISTRIBUTE=1 */,
     KEY `user_id_posted_on_idx` (`user_id`,`posted_on`) /*$ DISTRIBUTE=3 */
     );

Modifying Distribution

Modifying Distribution - Primary Key

...