ClustrixDB has the ability to perform online schema changes without blocking reads or writes to a table. These changes can range from simply adding an index or column to completely redesigning the schema. Many other database systems require partial or full table locks while schema changes process each row and transform the data structure.
ClustrixDB avoids the locking problem by creating a new temporary container (or set of containers) and copying the data into its new structure. If there are any DML operations to a table that is being modified, a temporary transaction log records any write or update transactions run against the old containers, then applies them to the new container. Once all the original records have been copied and logged transactions have been processed, the ALTER transaction commits, the new table goes live, and the old table is dropped.
ClustrixDB maintains read consistency during the ALTER – read and write queries that run against the table before the ALTER commits see the original table schema. Reads and writes after the ALTER commits see the new schema. From the perspective of any single query or user, the ALTER is instantaneous.
While ClustrixDB is designed to easily support online schema changes for tables of any size, the following best practices help to minimize negative and unforeseen impacts to your application. This includes:
Back up your data.
Test the new schema with your application's queries in an isolated environment. Compare the output of the EXPLAIN plan before and after the change to ensure that query plans are not negatively impacted.
Perform schema change(s) during off-peak hours or during a maintenance window.
Ensure there is adequate disk space (see below).
Understand impacts to replication and revise plan accordingly.
Estimating Disk Space Requirements
To understand how much free space you need before running an ALTER, you first need an understanding of ClustrixDB’s garbage collection process, which we refer to as BigC.
BigC is essentially a checkpoint that constantly moves forward through time as transactions complete. The BigC value of the cluster (shown in system.stats) will always show the point in time at which the longest currently running transaction has started. Any explicit transaction will “pin” BigC. While garbage collection is pinned, the cluster will log any write, delete, update, and DDL transactions into the undo-log in the event any long-running transaction fails and needs to be rolled back.
Because ALTER operations are DDL transactions, they pin BigC, which means that all garbage collection is suspended during the ALTER.
In order to calculate how much free space is required for an ALTER you should evaluate each of the following:
- The current amount of space used.
- The minimum amount of free space desired.
- ClustrixDB will kill long-running transactions if free space falls under 5%. This will cause your ALTER to rollback.
- Plan to reserve a minimum of at least 10% free space throughout your schema changes.
- The size of the table to be altered.
- Ensure there is sufficient space to store a copy of the table
- The size of a table can be estimated from system.table_sizes
- The estimated amount of time it will take to complete the ALTER.
- This depends on the size of your cluster (e.g. it's faster with more servers).
- This can be extrapolated based on the speed of previous, smaller ALTER operations.
- New data growth rate.
- Undo-Log growth rate. The undo-log for all INSERT, UPDATE, and DELETE operations will accumulate during the ALTER.
- If replicating, the binlog growth rate.
Binlog trims will still run, but the disk space will not be freed until BigC is unpinned.
When you add up all the planned growth and size of the table copy plus the current space consumed, you should still have at least 10% of disk remaining. If not, trimming back binlogs, pruning tables, or expanding the cluster with additional servers is recommended before initiating the ALTER.
Performing ALTER operations in replicated environments requires additional planning. The ALTER can be performed in-line with replication, or independently, depending on whether schema structure is changed. Whenever possible, it is highly recommended to allow the ALTER to run over the replication stream in order to avoid errors in replication where transactions are looking for either the old or new schema when the other is present.
Running the ALTER over replication means that the ALTER must execute sequentially, first on the master, then on the slave. Furthermore, the slave must process the ALTER sequentially within the replication stream, and all other writes are paused while the slave processes through the ALTER. For large tables, ALTER operations can cause the slave to fall behind the master. Plan to adjust binlog retention on the master and monitor replication regularly to ensure the slave does not exceed your master's binlog retention during this process.
When replicating via row-based binlogs, column synchronization is critical. RBR (row-based replication) sends both the old and new rows to the slave and the old must match before the new is applied. ALTER operations that modify columns must be executed within replication and in sequence to avoid slave errors trying to applying writes for one schema to the other.
When replicating via statement-based (SBR) binlogs, more flexibility is allowed, provided that the statements execute equally well on the old and new schema. However, for environments with high concurrency, it is extremely difficult to assess whether all statements are equivalent on the old and new schemas.
Exceptions to these column matching concerns include ALTER operations that add an index, change the storage type, or modify the number of slices. Since these do not affect the insertion or update of rows, they may be executed in parallel outside the replication stream.
Scale-Out Concerns for Online Schema Changes
The performance of ALTER on user data will vary depending on your particular environment and workload. Take the following as guidance and adapt based on your own experience and testing.
Modifying a Simple, Small Table
For tables around N*10^6 rows or with light concurrency, ALTER can be done live and should complete within minutes depending on the cluster size, row size, and overall load. Cache and query plan refreshes occur regularly and the automatic refresh should handle any performance issues.
Medium Scale-Out Issues with Online Schema Changes
Tables that are being accessed with high concurrency or have more than N*10^7 rows may experience degraded cluster performance immediately after the ALTER completes. ClustrixDB stores statistics and query plans for each table, and these values are cached. ClustrixDB's periodic refresh and flush of that information may not occur quickly enough and may consequently impact performance. To avoid this, immediately flush the cache once the ALTER is complete by using the following commands:
sql> ALTER TABLE ...
shell> clx cmd 'mysql system -e "call pdcache_update()"';
shell> clx cmd 'mysql system -e "call qpc_flush()"';
Note that qpc_flush and pdcache_update are done on a per-node basis. Therefore, these should be executed using the clx cmd utility to ensure they are run on all nodes.
Chaining these commands together in the shell as command && command && command is recommended to avoid delays between the completion of the ALTER and flushing of the caches.
Monitoring the Progress of an ALTER
To view the status of an ALTER in-process, use this SQL.
sql> select * from system.alter_progress;
Example of an Online Schema Change
In the following example, a one terabyte table will be altered to include an additional column. Estimating the number of hours required is more art than science and also depends on the size of the cluster performing the ALTER. In this example, previous ALTER operations on smaller tables led us to believe the one terabyte table would require ~25 hours for the ALTER to complete.
Let's consider the free space plan for our 10 TB cluster:
|Description of Space Requirement||TB Required|
Our cluster is currently 10 TB of usable space and we know the system reserves 5% for internal operations. If we exceed this, writes can be blocked and the ALTER will roll-back.
|Our users have requested we leave at least 10% of space free at all times to handle surges in load.||1.0 TB|
|The new table will consume a bit over 1 TB due to the additional column. We could calculate rows times bytes, but let's assume 1.1 TB. Because this is a copy, we have to reserve this space until the ALTER is complete.||1.1 TB|
The entire cluster receives approximately ~1.5 TB per day of write transactions (including writes to the binlogs) so (1.5 / 24) * 25 hours = 1.5625 TB
|Total free space required to safely begin the ALTER:||4.2 TB|
A conservative estimate would be that there should be at least 4.5 TB available space on the cluster to allow the ALTER to complete.
Performing the ALTER
If your session is lost while connected to the cluster remotely, long-running transactions such as an ALTER may fail and roll-back. We suggest using screen or some other detachable terminal.
Using one terminal connection, we initiate Session A and start a transaction to pin BigC:
In a separate terminal, we initiate Session B and start our ALTER. Because the following three commands need to run in close succession, we chain them together as follows:
shell> mysql -e "ALTER TABLE database.hugetable ADD COLUMN foo varchar(255)"; \
shell> clx cmd 'mysql -e "call system.task_run(\'pdcache_update\')"'; \
shell> clx cmd 'mysql -e "call system.qpc_flush()"'
Many hours later, after the ALTER and subsequent cache flush is complete, we can release the transaction held by Session A. You may want to wait until peak load has passed as garbage collection can impact performance.
From the Session A terminal: