Skip to end of metadata
Go to start of metadata

Xpand uses a hash to determine where a given row of data or a table’s index (representation) should reside in a cluster. The columns selected for hashing are referred to as the “distribution key” for that representation. Every index requires guidance relative to which columns should comprise the distribution key.

By default, the distribution key uses the first column of an index (DISTRIBUTE=1), regardless of how many columns comprise the index. This is true for all indices including the primary key. In some cases (usually when the first column in a key is not unique enough) increasing distribution can help to more evenly distribute data, reads and writes across the cluster. 

The information below describes ways to identify tables and indexes that could benefit from various redistribute actions. A redistribute action is a table re-write, so this may take a long time on many of your larger tables.

Modifying Distribution

Modifying Distribution - Primary Key

To modify the distribution of a composite primary key after a table has been created, follow this ALTER TABLE syntax:

ALTER TABLE tbl_name  PRIMARY KEY   [DISTRIBUTE = n] 

The distribution count for a primary key cannot exceed the number of columns in the primary key.

Modifying Distribution - Alternate Keys

To modify the distribution of other indexes (non- primary key) after a table has been created, specify the index_name in an ALTER TABLE per this syntax:

ALTER TABLE tbl_name  [ ,INDEX index_name   [DISTRIBUTE = n]]  [ ,INDEX  index_name   [DISTRIBUTE = n ]] 

The maximum distribution value for non-primary keys is the combined number of columns in both the alternate and primary key.

Addressing "Lumpy" Distribution"

In this example, the primary key of post_id was hashed and data for the table was distributed using that value and the default distribution setting of DISTRIBUTE = 1.

sql> CREATE TABLE user_posts (
     post_id int AUTO_INCREMENT, 
     user_id int,  
     posted_on timestamp, 
     data blob, 
     PRIMARY KEY (`post_id`) /*$ DISTRIBUTE=1 */,
     KEY `user_id_posted_on_idx` (`user_id`,`posted_on`) /*$ DISTRIBUTE=1 */
     );

In some cases, data and index distribution based on a single column can result in poor or "lumpy" distribution. To resolve this, we recommend putting the most unique (selective) column into the first column of composite index, or expanding the distribution key from a single column to multiple columns. There are two ways to do this.

1. Expand the distribution key to include more columns of the index

The example below shows the multi-column alternate key user_id_posted_on_idx distributed using both columns in the index instead of just the first.

sql> CREATE TABLE user_posts (
     post_id int AUTO_INCREMENT, 
     user_id int,  
     posted_on timestamp, 
     data blob, 
     PRIMARY KEY (`post_id`) /*$ DISTRIBUTE=1 */,
     KEY `user_id_posted_on_idx` (`user_id`,`posted_on`) /*$ DISTRIBUTE=2 */
     );

2. Expand the distribution key to also include column(s) of the primary key

The example below shows the alternate key user_id_posted_on_idx with DISTRIBUTE = 3. This means that the index will be distributed on both of its columns (user_id, posted_on) as well as the primary key (post_id). If the primary key is a compound key, you could further expand the distribution to include additional columns of the primary key.

sql> CREATE TABLE user_posts (
     post_id int AUTO_INCREMENT, 
     user_id int,  
     posted_on timestamp, 
     data blob, 
     PRIMARY KEY (`post_id`) /*$ DISTRIBUTE=1 */,
     KEY `user_id_posted_on_idx` (`user_id`,`posted_on`) /*$ DISTRIBUTE=3 */
     );

Reducing the value of DISTRIBUTE

When multi-column indexes have DISTRIBUTE greater than 1, there can be a benefit to reducing the distribution in some instances. 

  • Distribution is great than the number of columns - Change distribution to the number of columns (or to 1)
  • Indexes that have mostly unique initial columns - Consider reducing distribution if unnecessary broadcasts are being performed.
  • EXPLAIN queries that use an index and look for stream_combine, which is usually necessary following a broadcast. 



  • No labels