Page tree
Skip to end of metadata
Go to start of metadata

 

Before partitioning your tables, please contact Clustrix Support to review your partitioning scheme, including the maximum number of planned partitions.

Why Partition a Table?

There are numerous reasons to partition a table within an RDBMS system. Unlike many single-instance RDBMS, ClustrixDB’s distributed database architecture eliminates the need for table partitioning for most of those reasons. Let’s look at some typical scenarios for partitioned tables and see if and how they are alternatively accommodated by ClustrixDB.

Historical usage of partitioned tables

How this translates to ClustrixDB

Partition purging. Partitioned tables are often used to expedite loading, purging, or archiving of dated information. For example, one might partition a table by year and month and drop the oldest month's data at the beginning of each new month.

As with other systems, dropping partitions is more efficient than deleting data within ClustrixDB. ClustrixDB supports table partitioning by ranges and supports partitioning tables to optimize removal of aging data.

 

The above is the only legitimate use case for partitioning tables within ClustrixDB. Here's why:

Historical usage of partitioned tables

Why this is unnecessary in ClustrixDB

For better query performance from partition elimination or partition pruning. The principle behind partition pruning is simple: Only those partitions that contain potential matching values for a query are accessed and performance is improved as a result. 

ClustrixDB distributes data into slices and then optimizing queries relative to those slices. Slices and indexes are distributed throughout the cluster and data is located using information found within data maps. This is similar to other system's partition pruning, in that only those slices that contain relevant data are accessed. Table partitioning provides no additional query performance benefit for ClustrixDB.

Table partitioning to utilize multiple disks. By breaking tables into multiple partitions, it is possible to span disks to accommodate large sized tables and reduce disk contention.

ClustrixDB automatically distributes data onto multiple disks across multiple nodes of the cluster using its Rebalancer. The Rebalancer automatically arranges data throughout the cluster to lessen disk contention, enable optimized performance, and additionally provide fault-tolerance. Explicit table partitioning is not required to utilize multiple disks and does not provide any improvement to ClustrixDB's disk utilization.

Partitioning tables to provide parallel execution. While MySQL presently does not provide parallel execution on partitioned tables, other RDBMS products do and MySQL may in the future.

ClustrixDB accesses data that has already been distributed throughout the cluster utilizing query fragments. Such query fragments intelligently target relevant slices of data and are processed in parallel on multiple nodes of the cluster. Partitioning tables within ClustrixDB will provide no additional parallelism.

Table partitioning to expedite data management. Processes such as table indexing, backups, restores, etc. are streamlined when performed on subsets of partitioned tables.ClustrixDB performs schema changes while the database remains online. ClustrixDB also provides a built-in parallel fast backup and restore capability. The ClustrixDB’s Rebalancer continuously assesses the health of the cluster and moves data about while the database remains online and operational. As a result, operator involvement for routine data management tasks is significantly reduced and partitioning tables within ClustrixDB provides no additional performance benefits for these use cases.

ClustrixDB Table Partitioning Syntax

Range Partitioning 

> CREATE TABLE <table_definition> PARTITION BY RANGE <partition_function> (<partition_definition>);

> ALTER TABLE <table_name> ADD | DROP | TRUNCATE PARTITION  <partition_name>;
 
> ALTER TABLE <table_name> REORGANIZE PARTITION <partition_name> into (<partition_definition>);

> ALTER TABLE <table_name> PARTITION BY RANGE   <partition_function>  (<partition_definition>);

> ALTER TABLE <table_name> REMOVE PARTITIONING;

<partition_function>

The partition_function defines the column(s) and process by which a table will be partitioned. It is generally a function that returns an integer such as: YEAR(column_name).

Functions supported to derive an integer result are:

  • CEILING()

  • FLOOR()

  • TO_DAYS()

  • UNIX_TIMESTAMP()

  • YEAR() (See Caveat regarding YEAR datatype) 

  • YEARWEEK()

<partition_definition>

The partition_definition defines the uppermost value for a table’s partition. Data is placed in a table partition when the result of the partition_function (above) falls between two table partition boundaries. Data must qualify for one and only one partition. It is always advisable to define the last partition of a table using the maximum integer value allowable by specifying the MAXVALUE.

The format for defining partition ranges follows this syntax:

 PARTITION <partition_name> VALUES LESS THAN (integer),
 [PARTITION <partition_name> VALUES LESS THAN (integer), ...]
 PARTITION <partition_name> VALUES LESS THAN MAXVALUE

<partition_name>

Table partition names must conform to the standard conventions for naming database objects and must be unique for a given table. It is advisable to also make them meaningful.

Important <partition_function> Rules

Include <partition_function> Columns in Unique Indexes

Note: The columns used by the <partition_function> must be part of each unique key defined for a table, including the primary key.

 Click here to view/hide errors that occur if this rule is not followed. (ERROR #12314 and #12298)
Column used for partitioning is not within primary key ...
> CREATE TABLE sample (
  col1 INT NOT NULL,
  col2 DATE NOT NULL,
  col3 INT NOT NULL,
  col4 INT NOT NULL,
  PRIMARY KEY(col1, col3),
  UNIQUE KEY(col2, col4)
 )
 PARTITION BY RANGE (YEAR(col2)) (
 PARTITION p2000 VALUES LESS THAN (2000),
 PARTITION p2010 VALUES LESS THAN (2010),
 PARTITION pmax VALUES LESS THAN MAXVALUE);
 ERROR 1 (HY000): [12314] Invalid partitioning scheme: Partition function column "col2" is not part of primary key. All partition function columns must be part of the primary key.

Column used for partitioning is not within unique key ...
> CREATE TABLE sample (
  col1 INT NOT NULL,
  col2 DATE NOT NULL,
  col3 INT NOT NULL,
  col4 INT NOT NULL,
  PRIMARY KEY(col1, col2),
  UNIQUE KEY sample_UK (col3, col4)
 )
 PARTITION BY RANGE (YEAR(col2)) (
 PARTITION p2000 VALUES LESS THAN (2000),
 PARTITION p2010 VALUES LESS THAN (2010),
 PARTITION pmax VALUES LESS THAN MAXVALUE);
 ERROR 1 (HY000): [12298] Invalid constraint specification: Index sample.sample_UK on a partitioned table must contain the partition column

User Privileges Needed for Partitioning Tables

A user must have the following privileges to manage partitions for a table:

  • CREATE TABLE
  • ALTER TABLE
  • DROP TABLE

SHOW Partition Specifics for a Table

Use the following SQL to display the partitioning characteristics defined for a table. Use the FULL parameter to view more extensive information. Additional database specifics regarding partitions can also be seen by querying the information_schema.partitions view.

> SHOW [FULL] CREATE TABLE <table_name>;

Examples

Create Table with Partitions

> CREATE TABLE sample (
  col1 INT NOT NULL,
  col2 DATE NOT NULL,
  col3 INT NOT NULL,
  col4 INT NOT NULL,
  PRIMARY KEY(col1, col2),
  UNIQUE KEY(col2, col4)
 )
 PARTITION BY RANGE (YEAR(col2)) (
 PARTITION p2000 VALUES LESS THAN (2000),
 PARTITION p2010 VALUES LESS THAN (2010),
 PARTITION pmax VALUES LESS THAN MAXVALUE);

Adding More Partitions to a Table

Before the Last Partition

The ranges specified for table partitions cannot overlap. Data must qualify for one and only one table partition. If your table has been defined as the sample shown above with a catch-all partition defined for the upper limit (using MAXVALUE), you must reorganize the top level table partition to enable a new partition to be inserted beneath it.

> ALTER TABLE sample REORGANIZE PARTITION pmax into (
      PARTITION p2015 VALUES LESS THAN (2015),
      PARTITION pmax VALUES LESS THAN MAXVALUE);

Attempting to add a table partition that conflicts with another partition will result in this error:

> ALTER TABLE sample ADD PARTITION (PARTITION p2015 VALUES LESS THAN (2015));
ERROR 1 (HY000): [12314] Invalid partitioning scheme: New partition "p2015" (2015) overlaps with existing partition "pmax" (MAXVALUE)

After the Last Partition

If your table is partitioned such that it can be expanded beyond the current upper limit, then SQL similar to the following can be used.

> ALTER TABLE diff_sample ADD PARTITION (PARTITION p2015 VALUES LESS THAN (2015));

Implement Partitioning for an Unpartitioned Table

> ALTER TABLE sample_unpartitioned
 PARTITION BY RANGE (YEAR(col2)) (
 PARTITION p2000 VALUES LESS THAN (2000),
 PARTITION p2010 VALUES LESS THAN (2010),
 PARTITION p2015 VALUES LESS THAN (2015),
 PARTITION pmax VALUES LESS THAN MAXVALUE);

Drop Specific Table Partition(s)

Dropping a table partition removes both the data and the physical partition. This is particularly useful to purge historical data if you are partitioning your table by dates. Note that the DROP PARTITION command differs from TRUNCATE PARTITION which purges the data but leaves the partition available for reuse. Multiple table partitions can be dropped at one time.

> ALTER TABLE sample DROP PARTITION p2000, p2010;

Remove Table Partitioning Completely

> ALTER TABLE sample REMOVE PARTITIONING;

Modify Table Partition Ranges

To Expand Table Partitions (More Granularity)

To increase the number of partitions presently used for a table, emulate this SQL:

> ALTER TABLE sample REORGANIZE PARTITION p2015 into (
      PARTITION p2011 VALUES LESS THAN (2011),
      PARTITION p2012 VALUES LESS THAN (2012),
      PARTITION p2013 VALUES LESS THAN (2013),
      PARTITION p2014 VALUES LESS THAN (2014),
      PARTITION p2015 VALUES LESS THAN (2015));

To Consolidate Table Partitions (Less Granularity)

Table partitions can be consolidated provided they are consecutive and contiguous.

> ALTER TABLE sample REORGANIZE PARTITION 
 p2011, p2012, p2013, p2014, p2015 into (
      PARTITION p2015 VALUES LESS THAN (2015));

Rename Table Partitions

> ALTER TABLE sample REORGANIZE PARTITION pmax into (
        PARTITION new_name VALUES LESS THAN MAXVALUE);

Truncate Table Partitions

Truncating removes all data from a table partition resulting in an empty partition that can be reused. TRUNCATE PARTITION differs from DROP PARTITION which removes both a table partition and its contents. TRUNCATE is a DDL command and cannot be rolled back. DELETE can be used to remove the same data, but as a DML command, DELETE involves both row locking and logging for rollback and is slower as a result. TRUNCATE and DROP PARTITION are more efficient than DELETE.

ClustrixDB supports truncating a single table partition at a time.

ALTER TABLE sample TRUNCATE PARTITION P2000;

Unsupported in Partitioned Tables

This is the list of specific MySQL Partitioned Table features that are unsupported in ClustrixDB.

  • LIST, HASH, KEY, LINEAR KEY partitions

  • SUBPARTITIONS

  • Partitioning across functions such as: ABS(), DAYOFMONTH(), DAYOFWEEK(), DAYOFYEAR(), DATEDIFF(), EXTRACT(), MICROSECOND(), MINUTE(), MOD(), MONTH(), HOUR(), SECOND(), QUARTER(), TIME_TO_SEC(), WEEKDAY()

  • Temporary tables may not be partitioned

  • COALESCE PARTITION number

  • ANALYZE PARTITION {partition_names | ALL }

  • CHECK PARTITION {partition_names | ALL }

  • OPTIMIZE PARTITION {partition_names | ALL }

  • REBUILD PARTITION {partition_names | ALL }

  • REPAIR PARTITION {partition_names | ALL }

  • EXPLAIN PARTITION

  • SELECT FROM table_name PARTITION {partition_name | ALL}

Caveats of Using Partitioned Tables

  • To determine the number of slices utilized by a partition, SHOW FULL CREATE TABLE and count the number of HASHES.

  • A partition’s ordinal position (the consecutive sequence in which a partition was added to the database) is unavailable in ClustrixDB.

  • Adding data to a table that does not fit within any partition will cause this error - "ERROR 1 (HY000): [6144] No distribution for representation: No partition found for write".
  • Altering a partitioned table and modifying characteristics of a column used for partitioning requires that you respecify partitioning ranges for the table. For example:

    Given this table definition:
    > CREATE TABLE test (id INT KEY)
     PARTITION BY RANGE (id)
     (PARTITION p0 VALUES LESS THAN (1000),
     PARTITION p1 VALUES LESS THAN (2000));
     Query OK, 0 rows affected (0.19 sec)
    
    Changing characteristics of the partitioned column causes this ALTER to fail:
    > ALTER TABLE test
     MODIFY COLUMN id INT AUTO_INCREMENT;
     ERROR 1 (HY000): [12298] Invalid constraint specification: Index
     test.__idx_test__PRIMARY on a partitioned table must contain the partition column
    
    But respecifying the partitioning details allows this ALTER to succeed:
    > ALTER TABLE test
     MODIFY COLUMN id INT AUTO_INCREMENT
     PARTITION BY RANGE (id)
     (PARTITION p0 VALUES LESS THAN (1000),
     PARTITION p1 VALUES LESS THAN (2000));
     Query OK, 0 rows affected (3.98 sec)
  • If partitioning using a partition key that includes the YEAR datatype, you must also use the YEAR function as shown in this example:

     CREATE TABLE sample_year_partitioning (year_col YEAR PRIMARY KEY)
     PARTITION BY RANGE (YEAR(year_col))
     (PARTITION p0 VALUES LESS THAN (5),
     PARTITION p1 VALUES LESS THAN (10),
     PARTITION p2 VALUES LESS THAN MAXVALUE);

 

  • No labels