
ClustrixDB performs online schema changes without blocking reads or writes to a table. These changes can range from simply adding an index or column to completely redesigning the schema. Many other database systems require partial or full table locks while schema changes process each row and transform the data structure. ClustrixDB avoids the locking problem by creating a new temporary container (or set of containers) and copying the data into its new structure. If there are any DML operations against the table while it is being modified, they are captured and applied to the new containers, as described below.

ClustrixDB uses MVCC (multi-version concurrency control) to avoid locking tables undergoing a schema change. New temporary container(s) are created, and while data is being copied, a temporary transaction log records any writes run against the old containers, then applies them to the new containers. Once all the original records have been copied and logged transactions have been processed, the ALTER transaction commits, the new table goes live, and the old table is dropped. All of these mechanics are performed automatically when you execute your ALTER SQL.

ClustrixDB maintains read consistency during the ALTER – read and write queries that run against the table before the ALTER commits see the original table schema. Reads and writes after the ALTER commits see the new schema. From the perspective of any single query or user, the ALTER is instantaneous.
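For illustration, here is a hedged two-session sketch (the table and column names are placeholders borrowed from the example later in this document):

sql> -- Session A: run the online schema change
sql> ALTER TABLE database.hugetable ADD COLUMN foo varchar(255);

sql> -- Session B, while the ALTER is still running: queries see the original schema
sql> SELECT * FROM database.hugetable LIMIT 1;   -- no column foo yet

sql> -- Session B, after the ALTER commits: queries see the new schema
sql> SELECT foo FROM database.hugetable LIMIT 1;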


Best Practices

While ClustrixDB is designed to easily support online schema changes for tables of any size, the following best practices help minimize negative and unforeseen impacts to your application:

  1. Back up your data. Since an ALTER on a large table can take a while to complete, we suggest running it from screen or some other detachable terminal.

  2. Test the new schema with your application's queries in a non-production environment. Compare the output of EXPLAIN before and after the change to ensure that query plans are not negatively impacted (see the example after this list).

  3. Perform schema change(s) during off-peak hours or during a maintenance window.

  4. Ensure there is adequate disk space (see below).

  5. Understand the impacts to replication and revise your plan accordingly.
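For item 2, a before-and-after plan comparison might look like the following sketch (the query and table are hypothetical):

sql> -- Before the change, in the non-production environment
sql> EXPLAIN SELECT * FROM database.hugetable WHERE id = 42;

sql> -- After applying the schema change, re-run and compare the plans
sql> EXPLAIN SELECT * FROM database.hugetable WHERE id = 42;

If the plan changes (for example, a different index is chosen), investigate before running the ALTER in production.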

Estimating Disk Space Requirements

To understand how much free space you need before running an ALTER, you first need an understanding of ClustrixDB’s garbage collection process, which we refer to as BigC.

BigC is essentially a checkpoint that constantly moves forward through time as transactions complete. The cluster's BigC value (shown in system.stats) marks the point in time at which the oldest currently running transaction started. Any explicit transaction will "pin" BigC. While garbage collection is pinned, the cluster logs all write, delete, update, and DDL transactions into the undo log so that any long-running transaction that fails can be rolled back.
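To inspect the current value, you can query system.stats; the exact stat name may vary by version, so a pattern match is a safe sketch:

sql> -- Hedged lookup of the BigC-related counters
sql> SELECT name, value FROM system.stats WHERE name LIKE '%bigc%';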

Because ALTER operations are DDL transactions, they pin BigC, which means all garbage collection is suspended and the undo log is not cleaned up until the ALTER completes. For the ALTER to complete, there must be sufficient free space. To calculate how much free space is required for an ALTER, review the following:

  • The current amount of space used.
  • The minimum amount of free space desired.
    • ClustrixDB will kill long-running transactions (including the ALTER) if free space falls under 5%, which will cause your ALTER to roll back.
    • Plan to reserve a minimum of 10% free space throughout your schema changes.
  • The size of the table to be altered. Ensure there is sufficient space to store a full copy of the table, including replicas. The size of a table can be estimated from system.table_sizes (see the example query after this list).
  • The estimated amount of time it will take to complete the ALTER.
    • This depends on the size of your cluster (e.g., it's faster with more servers).
    • This can be extrapolated from the speed of previous, smaller ALTER operations.
  • New data growth rate.
  • Undo log growth. The undo log for all INSERT, UPDATE, and DELETE operations will accumulate during the ALTER.
  • If replicating, binlog size and growth rate. During the ALTER, binlog trims will still run, but the disk space will not be freed until BigC is unpinned.
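For example, to estimate the size of the table being altered:

sql> -- Per-table size estimates; filter for the table you plan to alter
sql> SELECT * FROM system.table_sizes;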

When you add up all the planned growth plus the size of the table copy and the current space consumed, you should still have at least 10% of disk remaining. If not, consider trimming back binlogs, pruning tables, or expanding the cluster with additional servers before initiating the ALTER.
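As a quick sketch for checking free space on every node (clx cmd runs a shell command cluster-wide, as shown later in this document; the data directory path is an assumption for your installation):

shell> clx cmd 'df -h /data/clustrix'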


Monitoring the Progress of an ALTER

To view the status of an in-progress ALTER, use this SQL:

sql> select * from system.alter_progress;

Replication Concerns

Performing ALTER operations in replicated environments requires additional planning. The ALTER can be performed inline with replication or independently, depending on whether the schema structure changes. Whenever possible, it is highly recommended to let the ALTER run over the replication stream to avoid replication errors where transactions expect either the old or new schema when the other is present.

Running the ALTER over replication means that the ALTER must execute sequentially, first on the master, then on the slave. Furthermore, the slave must process the ALTER sequentially within the replication stream, and all other writes are paused while the slave processes through the ALTER. For large tables, ALTER operations can cause the slave to fall behind the master. Plan to adjust binlog retention on the master and monitor replication regularly to ensure the slave does not exceed your master's binlog retention during this process.
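Since ClustrixDB replication speaks the MySQL dialect, one hedged way to watch slave lag during the ALTER is the standard status command:

sql> -- Check Seconds_Behind_Master periodically on the slave
sql> SHOW SLAVE STATUS;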

...

Chaining these commands together in the shell as command && command && command is recommended to avoid delays between the completion of the ALTER and flushing of the caches.


Example of an Online Schema Change

In the following example, a one terabyte table will be altered to include an additional column. Estimating the number of hours required is more art than science and also depends on the size of the cluster performing the ALTER. In this example, previous ALTER operations on smaller tables led us to believe the one terabyte table would require ~25 hours for the ALTER to complete.
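For instance, if a prior ALTER on a 100 GB table in this cluster took roughly 2.5 hours (a hypothetical data point), scaling linearly gives 1 TB * (2.5 hours / 100 GB) ≈ 25 hours.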

Let's consider the free space plan for our 10 TB cluster:

Description of Space Requirement | TB Required
Our cluster currently has 10 TB of usable space, and the system reserves 5% for internal operations. If we exceed this, writes can be blocked and the ALTER will roll back. | 0.5 TB
Our users have requested we leave at least 10% of space free at all times to handle surges in load. | 1.0 TB
The new table will consume a bit over 1 TB due to the additional column. We could calculate rows times bytes, but let's assume 1.1 TB. Because this is a copy, we have to reserve this space until the ALTER is complete. | 1.1 TB
The entire cluster receives approximately 1.5 TB per day of write transactions (including writes to the binlogs), so (1.5 / 24) * 25 hours = 1.5625 TB. | 1.6 TB
Total free space required to safely begin the ALTER | 4.2 TB

A conservative estimate would be that there should be at least 4.5 TB available space on the cluster to allow the ALTER to complete.

Performing the ALTER

If your session is lost while connected to the cluster remotely, long-running transactions such as an ALTER may fail and roll back. We suggest using screen or some other detachable terminal.
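For example, using screen (the session name is arbitrary):

shell> screen -S alter_session    # start a detachable session, then connect with mysql inside it
shell> # Detach with Ctrl-a d; reattach later with:
shell> screen -r alter_session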

Using one terminal connection, we initiate Session A and start a transaction to pin BigC:

sql> BEGIN;

In a separate terminal, we initiate Session B and start our ALTER. Because the following three commands need to run in close succession, we chain them together as follows:

shell> mysql -e "ALTER TABLE database.hugetable ADD COLUMN foo varchar(255)" && \
       clx cmd "mysql -e \"call system.task_run('pdcache_update')\"" && \
       clx cmd "mysql -e \"call system.qpc_flush()\""

Many hours later, after the ALTER and subsequent cache flushes are complete, we can release the transaction held by Session A. You may want to wait until peak load has passed, as garbage collection can impact performance.

From the Session A terminal:

sql> COMMIT;