ClustrixDB supports parallel replication, which distributes replication overhead across the nodes in the cluster and allows for more parallelism.

At this time, ClustrixDB only provides Beta support for Parallel Replication.

Configuring Parallel Replication

Configuring binlogs on the master is the same as for non-parallel replication:

CREATE BINLOG 'binlog_name' [LOG (target1, target2, ...),] [IGNORE (target3, target4, ...),] [FORMAT='STATEMENT'|'ROW']

Configuring a Parallel Replication Slave:

CREATE SLAVE slave_name PARALLEL_LOG = master_log_name, PARALLEL_POS = position, SLICES = num_slices, BATCH_SIZE_MS = batch_size 
          [, MASTER_HOST = master_host] 
          [, MASTER_USER = master_user] 
          [, MASTER_PASSWORD = master_password] 
          [, MASTER_PORT = master_port];

Parallel Replication Specific Options

PARALLEL_LOG

PARALLEL_LOG is used in place of MASTER_LOG_FILE when configuring Parallel Replication. Unlike MASTER_LOG_FILE, PARALLEL_LOG does not require the file number, only the name of the binlog.

For example: if your binlog file is binlog-bin.000001, the PARALLEL_LOG is binlog-bin.

PARALLEL_POS

This replaces the file-based replication argument MASTER_LOG_POS. The value can be found by running SHOW MASTER STATUS PARALLEL or, when restoring from a backup, extracted from the metadata of the backup. If you are restoring from a backup for the purposes of replication, use the value from the backup with "0x" prepended.

For example: if the value from your backup file is 5e23e65e9342f802 then your PARALLEL_POS is 0x5e23e65e9342f802.

SLICES

Determines the number of parallel threads the slave uses to connect to the master. This value should be N or N*2, where N is the number of nodes in the master cluster. For example, if the master cluster contains 6 nodes, the value for SLICES should be 6 or 12.

BATCH_SIZE_MS

The amount of time, in milliseconds, used to determine the transaction batch size on the master. The default is 3000 ms.

Because the master batches transactions by time, the slave may report 0 seconds behind the master while its data actually lags by anywhere between 0 and BATCH_SIZE_MS milliseconds. You can reduce this latency by lowering BATCH_SIZE_MS, for example to 1000 ms or even 500 ms; however, this can decrease replication performance, depending on the workload.

For optimal performance MASTER_HOST should point to the master cluster's load balancer.
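
Putting these options together, a slave for a hypothetical six-node master cluster might be created as sketched below. The slave name, host name, user, password, and port are placeholders invented for illustration. The log name and position reuse the example values from this page, quoted the same way as in the CHANGE SLAVE example, and the quoting of the MASTER_* values is an assumption based on MySQL conventions.

```sql
-- Sketch only: my_slave, the host name, and the credentials are placeholders.
-- SLICES = 6 matches N for a six-node master (12, i.e. N*2, is the other option).
-- BATCH_SIZE_MS = 3000 is the documented default.
CREATE SLAVE my_slave
    PARALLEL_LOG = 'binlog01',
    PARALLEL_POS = '0x5e23e65e9342f802',
    SLICES = 6,
    BATCH_SIZE_MS = 3000,
    MASTER_HOST = 'master-lb.example.com',  -- assumed load balancer address
    MASTER_USER = 'repl_user',
    MASTER_PASSWORD = 'repl_password',
    MASTER_PORT = 3306;
```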

Using the Values in Backup to Configure Parallel Replication 

When restoring from a backup for the purposes of replication, you can find the values for PARALLEL_LOG and PARALLEL_POS in the binlogs and xid files, respectively. Both files are located in the metadata directory of the Clustrix backup directory.

Note that the value in the xid file is hexadecimal but has no 0x prefix. When setting PARALLEL_POS to a hexadecimal value, the value must start with 0x.

# cat backup_all-2020-01-10/metadata/binlogs
binlog01.001662:18570355
# cat backup-2020-01-10/metadata/xid
5e23e65e9342f802

Using the output above, the values for PARALLEL_LOG and PARALLEL_POS are binlog01 and 0x5e23e65e9342f802, respectively.

The corresponding command is:

CHANGE SLAVE slave_name TO PARALLEL_LOG='binlog01', PARALLEL_POS='0x5e23e65e9342f802';

Viewing Parallel Replication Status

To display the status of a parallel master: 

SHOW MASTER STATUS PARALLEL;
SHOW ALL MASTER STATUS PARALLEL; 

Displaying the status of a parallel slave is the same as with a serial slave: 

SHOW SLAVE STATUS slave_name;
SHOW SLAVE STATUS; 

To view additional information about the parallel slave, such as the value of SLICES or BATCH_SIZE_MS, query the table system.mysql_repconfig:

SELECT * FROM system.mysql_repconfig WHERE slave_name = 'slave_name';
SELECT slave_name, protocol, slices, batch_size / POW(2, 32) * 1000 AS batch_size_ms FROM system.mysql_repconfig WHERE slave_name = 'slave_name';
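
The conversion in the second query divides batch_size by 2^32 and multiplies by 1000 to obtain milliseconds. That suggests the column holds seconds scaled by 2^32, although this is an inference from the query rather than a documented guarantee. Under that assumption, the 3000 ms default would be stored as 3 × 2^32 = 12884901888, and the same arithmetic can be checked on a literal:

```sql
-- Worked example of the conversion used above, applied to a literal value.
-- 12884901888 = 3 * POW(2, 32), i.e. 3 seconds under the assumed scaling.
-- 12884901888 / 4294967296 = 3, and 3 * 1000 = 3000 milliseconds.
SELECT 12884901888 / POW(2, 32) * 1000 AS batch_size_ms;
```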

How it Works

The ClustrixDB parallel slave parallelizes processing of the replication stream and applies events in batches. Row events that belong to the same transaction are applied in the same transaction on the slave. Both serial and parallel replication use the same Replication Master to generate binlogs.

Caveats for Parallel Replication

ClustrixDB Parallel Replication:

  • Can only be used with a ClustrixDB slave. 
  • Is currently offered with beta support only.
  • Can only be used for RBR (row-based replication).
  • Does not support foreign keys.
  • Is only recommended for replicating tables where each unique key has at least one column that is also part of the primary key. If any unique key shares no columns with the primary key, serial replication is recommended.
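
The unique key caveat can be illustrated with two table definitions. Both tables, including their names and columns, are hypothetical and exist only to show the difference. Neither uses foreign keys.

```sql
-- Hypothetical tables for illustration only.

-- Candidate for parallel replication: the unique key (tenant_id, email)
-- shares the column tenant_id with the primary key (tenant_id, id).
CREATE TABLE accounts_shared_key (
    tenant_id INT NOT NULL,
    id        INT NOT NULL,
    email     VARCHAR(255) NOT NULL,
    PRIMARY KEY (tenant_id, id),
    UNIQUE KEY uk_tenant_email (tenant_id, email)
);

-- Better suited to serial replication: the unique key (email) has no
-- column in common with the primary key (id).
CREATE TABLE accounts_disjoint_key (
    id    INT NOT NULL,
    email VARCHAR(255) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uk_email (email)
);
```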