Page tree
Skip to end of metadata
Go to start of metadata

 

Before partitioning your tables, please contact Clustrix Support to review your partitioning scheme, including the maximum number of planned partitions.

Why Partition a Table?

ClustrixDB’s distributed database architecture automatically handles many use cases for which legacy RDBMS applications required table partitions. Let’s look at some typical scenarios for partitioned tables and see if and how they are handled by ClustrixDB.

Historical usage of partitioned tables

How this translates to ClustrixDB

Partition purging. Partitioned tables are often used to expedite loading, purging, or archiving of dated information. For example, one might partition a table by year and month and drop the oldest month's data at the beginning of each new month.

Dropping partitions is more efficient than deleting data and ClustrixDB supports table partitioning by ranges to optimize removal of aging data.

 

The above is the only legitimate use case for partitioning tables within ClustrixDB. Here's why:

Legacy usage of partitioned tables (other RDBMS)Why this is unnecessary in ClustrixDB

For better query performance from partition elimination or partition pruning. Only partitions that contain potential matching values for a query are accessed.

ClustrixDB distributes data and indexes into slices and uses slices to ensure that only relevant data is accessed. Table partitioning provides no additional query performance benefit for ClustrixDB.

Table partitioning to utilize multiple disks. Large tables can be spread across multiple partitions over multiple disks to reduce disk contention.

ClustrixDB automatically distributes data slices across multiple nodes of the cluster using its Rebalancer. The Rebalancer automatically arranges data throughout the cluster to lessen disk contention. Explicit table partitioning is not required to utilize multiple disks and does not provide any improvement to disk utilization.

Partitioning tables to provide parallel execution.

ClustrixDB leverages distributed data slices and distributed query fragments for parallel execution on multiple nodes. Partitioning tables within ClustrixDB will provide no additional parallelism.

Table partitioning to expedite data management. Processes such as table indexing, backups, restores, etc. are streamlined when performed on subsets of partitioned tables.

ClustrixDB leverages data slicing to provide parallelism for fast backup and restore. There is no additional benefit to backup or restore performance by partitioning a table.

ClustrixDB Table Partitioning Syntax

Range Partitioning 

> CREATE TABLE <table_definition> PARTITION BY RANGE <partition_function> (<partition_definition>);

> ALTER TABLE <table_name> ADD | DROP | TRUNCATE PARTITION  <partition_name>;
 
> ALTER TABLE <table_name> REORGANIZE PARTITION <partition_name> into (<partition_definition>);

> ALTER TABLE <table_name> PARTITION BY RANGE   <partition_function>  (<partition_definition>);

> ALTER TABLE <table_name> REMOVE PARTITIONING;

<partition_function>

The partition_function defines the column(s) and process by which a table will be partitioned. It is generally a function that returns an integer such as: YEAR(column_name).

Functions supported to derive an integer result are:

  • CEILING()

  • FLOOR()

  • TO_DAYS()

  • UNIX_TIMESTAMP()

  • YEAR() (See Caveat regarding YEAR datatype) 

  • YEARWEEK()

<partition_definition>

The partition_definition defines the uppermost value for a table’s partition. Data is placed in a table partition when the result of the partition_function (above) falls between two table partition boundaries. Data must qualify for one and only one partition. It is always advisable to define the last partition of a table using the maximum integer value allowable by specifying the MAXVALUE.

The format for defining partition ranges follows this syntax:

 PARTITION <partition_name> VALUES LESS THAN (integer),
 [PARTITION <partition_name> VALUES LESS THAN (integer), ...]
 PARTITION <partition_name> VALUES LESS THAN MAXVALUE

<partition_name>

Table partition names must conform to the standard conventions for naming database objects and must be unique for a given table. It is advisable to also make them meaningful.

Important <partition_function> Rules

Include <partition_function> Columns in Unique Indexes

Note: The columns used by the <partition_function> must be part of each unique key defined for a table, including the primary key.

 Click here to view/hide errors that occur if this rule is not followed. (ERROR #12314 and #12298)
Column used for partitioning is not within primary key ...
> CREATE TABLE sample (
  col1 INT NOT NULL,
  col2 DATE NOT NULL,
  col3 INT NOT NULL,
  col4 INT NOT NULL,
  PRIMARY KEY(col1, col3),
  UNIQUE KEY(col2, col4)
 )
 PARTITION BY RANGE (YEAR(col2)) (
 PARTITION p2000 VALUES LESS THAN (2000),
 PARTITION p2010 VALUES LESS THAN (2010),
 PARTITION pmax VALUES LESS THAN MAXVALUE);
 ERROR 1 (HY000): [12314] Invalid partitioning scheme: Partition function column "col2" is not part of primary key. All partition function columns must be part of the primary key.

Column used for partitioning is not within unique key ...
> CREATE TABLE sample (
  col1 INT NOT NULL,
  col2 DATE NOT NULL,
  col3 INT NOT NULL,
  col4 INT NOT NULL,
  PRIMARY KEY(col1, col2),
  UNIQUE KEY sample_UK (col3, col4)
 )
 PARTITION BY RANGE (YEAR(col2)) (
 PARTITION p2000 VALUES LESS THAN (2000),
 PARTITION p2010 VALUES LESS THAN (2010),
 PARTITION pmax VALUES LESS THAN MAXVALUE);
 ERROR 1 (HY000): [12298] Invalid constraint specification: Index sample.sample_UK on a partitioned table must contain the partition column

User Privileges Needed for Partitioning Tables

A user must have the following privileges to manage partitions for a table:

  • CREATE TABLE
  • ALTER TABLE
  • DROP TABLE

SHOW Partition Specifics for a Table

Use the following SQL to display the partitioning characteristics defined for a table. Use the FULL parameter to view more extensive information. Additional database specifics regarding partitions can also be seen by querying the information_schema.partitions view.

> SHOW [FULL] CREATE TABLE <table_name>;

Examples

Create Table with Partitions

> CREATE TABLE sample (
  col1 INT NOT NULL,
  col2 DATE NOT NULL,
  col3 INT NOT NULL,
  col4 INT NOT NULL,
  PRIMARY KEY(col1, col2),
  UNIQUE KEY(col2, col4)
 )
 PARTITION BY RANGE (YEAR(col2)) (
 PARTITION p2000 VALUES LESS THAN (2000),
 PARTITION p2010 VALUES LESS THAN (2010),
 PARTITION pmax VALUES LESS THAN MAXVALUE);

Adding More Partitions to a Table

Before the Last Partition

The ranges specified for table partitions cannot overlap. Data must qualify for one and only one table partition. If your table has been defined as the sample shown above with a catch-all partition defined for the upper limit (using MAXVALUE), you must reorganize the top level table partition to enable a new partition to be inserted beneath it.

> ALTER TABLE sample REORGANIZE PARTITION pmax into (
      PARTITION p2015 VALUES LESS THAN (2015),
      PARTITION pmax VALUES LESS THAN MAXVALUE);

Attempting to add a table partition that conflicts with another partition will result in this error:

> ALTER TABLE sample ADD PARTITION (PARTITION p2015 VALUES LESS THAN (2015));
ERROR 1 (HY000): [12314] Invalid partitioning scheme: New partition "p2015" (2015) overlaps with existing partition "pmax" (MAXVALUE)

After the Last Partition

If your table is partitioned such that it can be expanded beyond the current upper limit, then SQL similar to the following can be used.

> ALTER TABLE diff_sample ADD PARTITION (PARTITION p2015 VALUES LESS THAN (2015));

Implement Partitioning for an Unpartitioned Table

> ALTER TABLE sample_unpartitioned
 PARTITION BY RANGE (YEAR(col2)) (
 PARTITION p2000 VALUES LESS THAN (2000),
 PARTITION p2010 VALUES LESS THAN (2010),
 PARTITION p2015 VALUES LESS THAN (2015),
 PARTITION pmax VALUES LESS THAN MAXVALUE);

Drop Specific Table Partition(s)

Dropping a table partition removes both the data and the physical partition. This is particularly useful to purge historical data if you are partitioning your table by dates. Note that the DROP PARTITION command differs from TRUNCATE PARTITION which purges the data but leaves the partition available for reuse. Multiple table partitions can be dropped at one time.

> ALTER TABLE sample DROP PARTITION p2000, p2010;

Remove Table Partitioning Completely

> ALTER TABLE sample REMOVE PARTITIONING;

Modify Table Partition Ranges

To Expand Table Partitions (More Granularity)

To increase the number of partitions presently used for a table, emulate this SQL:

> ALTER TABLE sample REORGANIZE PARTITION p2015 into (
      PARTITION p2011 VALUES LESS THAN (2011),
      PARTITION p2012 VALUES LESS THAN (2012),
      PARTITION p2013 VALUES LESS THAN (2013),
      PARTITION p2014 VALUES LESS THAN (2014),
      PARTITION p2015 VALUES LESS THAN (2015));

To Consolidate Table Partitions (Less Granularity)

Table partitions can be consolidated provided they are consecutive and contiguous.

> ALTER TABLE sample REORGANIZE PARTITION 
 p2011, p2012, p2013, p2014, p2015 into (
      PARTITION p2015 VALUES LESS THAN (2015));

Rename Table Partitions

> ALTER TABLE sample REORGANIZE PARTITION pmax into (
        PARTITION new_name VALUES LESS THAN MAXVALUE);

Truncate Table Partitions

Truncating removes all data from a table partition resulting in an empty partition that can be reused. TRUNCATE PARTITION differs from DROP PARTITION which removes both a table partition and its contents. TRUNCATE is a DDL command and cannot be rolled back. DELETE can be used to remove the same data, but as a DML command, DELETE involves both row locking and logging for rollback and is slower as a result. TRUNCATE and DROP PARTITION are more efficient than DELETE.

ClustrixDB supports truncating a single table partition at a time.

ALTER TABLE sample TRUNCATE PARTITION P2000;

Unsupported in Partitioned Tables

This is the list of specific MySQL Partitioned Table features that are unsupported in ClustrixDB.

  • LIST, HASH, KEY, LINEAR KEY partitions

  • SUBPARTITIONS

  • Partitioning across functions such as: ABS(), DAYOFMONTH(), DAYOFWEEK(), DAYOFYEAR(), DATEDIFF(), EXTRACT(), MICROSECOND(), MINUTE(), MOD(), MONTH(), HOUR(), SECOND(), QUARTER(), TIME_TO_SEC(), WEEKDAY()

  • Temporary tables may not be partitioned

  • COALESCE PARTITION number

  • ANALYZE PARTITION {partition_names | ALL }

  • CHECK PARTITION {partition_names | ALL }

  • OPTIMIZE PARTITION {partition_names | ALL }

  • REBUILD PARTITION {partition_names | ALL }

  • REPAIR PARTITION {partition_names | ALL }

  • EXPLAIN PARTITION

  • SELECT FROM table_name PARTITION {partition_name | ALL}

Caveats of Using Partitioned Tables

  • To determine the number of slices utilized by a partition, SHOW FULL CREATE TABLE and count the number of HASHES.

  • A partition’s ordinal position (the consecutive sequence in which a partition was added to the database) is unavailable in ClustrixDB.

  • Adding data to a table that does not fit within any partition will cause this error - "ERROR 1 (HY000): [6144] No distribution for representation: No partition found for write".
  • Altering a partitioned table and modifying characteristics of a column used for partitioning requires that you respecify partitioning ranges for the table. For example:

    Given this table definition:
    > CREATE TABLE test (id INT KEY)
     PARTITION BY RANGE (id)
     (PARTITION p0 VALUES LESS THAN (1000),
     PARTITION p1 VALUES LESS THAN (2000));
     Query OK, 0 rows affected (0.19 sec)
    
    Changing characteristics of the partitioned column causes this ALTER to fail:
    > ALTER TABLE test
     MODIFY COLUMN id INT AUTO_INCREMENT;
     ERROR 1 (HY000): [12298] Invalid constraint specification: Index
     test.__idx_test__PRIMARY on a partitioned table must contain the partition column
    
    But respecifying the partitioning details allows this ALTER to succeed:
    > ALTER TABLE test
     MODIFY COLUMN id INT AUTO_INCREMENT
     PARTITION BY RANGE (id)
     (PARTITION p0 VALUES LESS THAN (1000),
     PARTITION p1 VALUES LESS THAN (2000));
     Query OK, 0 rows affected (3.98 sec)
  • If partitioning using a partition key that includes the YEAR datatype, you must also use the YEAR function as shown in this example:

     CREATE TABLE sample_year_partitioning (year_col YEAR PRIMARY KEY)
     PARTITION BY RANGE (YEAR(year_col))
     (PARTITION p0 VALUES LESS THAN (5),
     PARTITION p1 VALUES LESS THAN (10),
     PARTITION p2 VALUES LESS THAN MAXVALUE);

 

  • No labels