ClustrixDB supports parallel replication, which distributes replication overhead across the nodes in the cluster and so allows more of the replication work to run in parallel.
Parallel Replication is currently supported only as a Beta feature.
Configuring binlogs on the master is the same as with non-parallel replication:
CREATE BINLOG 'binlog_name' [LOG (target1, target2, ...),] [IGNORE (target3, target4, ...),] [FORMAT='STATEMENT'|'ROW']
To configure a slave for parallel replication, use CREATE SLAVE with the parallel options:
CREATE SLAVE slave_name PARALLEL_LOG = master_log_name, PARALLEL_POS = position, SLICES = num_slices, BATCH_SIZE_MS = batch_size [, MASTER_HOST = master_host] [, MASTER_USER = master_user] [, MASTER_PASSWORD = master_password] [, MASTER_PORT = master_port];
PARALLEL_LOG is used in place of MASTER_LOG_FILE when configuring Parallel Replication. Unlike MASTER_LOG_FILE, PARALLEL_LOG does not require the file number, only the name of the binlog.
For example: if your binlog file is binlog-bin.000001, the PARALLEL_LOG is binlog-bin.
PARALLEL_POS replaces the file-based replication argument MASTER_LOG_POS. Its value can be found by running SHOW MASTER STATUS PARALLEL, or extracted from the metadata of the backup file when restoring from backup. When restoring from backup for the purposes of replication, use the value stored in the backup with "0x" prepended.
For example: if the value from your backup file is 5e23e65e9342f802 then your PARALLEL_POS is 0x5e23e65e9342f802.
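The prefixing rule can be sketched as a small helper. This is only an illustration: the function name parallel_pos_from_backup is hypothetical and not part of ClustrixDB, and it assumes the backup value is a bare hexadecimal string as in the example above.

```python
import string


def parallel_pos_from_backup(xid: str) -> str:
    """Return a PARALLEL_POS literal for a hex value taken from backup metadata.

    Hypothetical helper for illustration; not a ClustrixDB API.
    """
    value = xid.strip()
    # Tolerate a value that already carries the prefix, so the helper is idempotent.
    if value.lower().startswith("0x"):
        value = value[2:]
    # Reject empty or non-hexadecimal input instead of emitting a bad position.
    if not value or any(ch not in string.hexdigits for ch in value):
        raise ValueError(f"not a hexadecimal value: {xid!r}")
    return "0x" + value


print(parallel_pos_from_backup("5e23e65e9342f802"))  # prints 0x5e23e65e9342f802
```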
SLICES determines the number of parallel threads the slave uses to connect to the master. This value should be N or N*2, where N is the number of nodes in the master cluster. For example, if the master cluster contains 6 nodes, the value for SLICES would be 6 or 12.
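As a worked instance of the N or N*2 guideline, the sketch below lists the two suggested SLICES values for a given master node count. The function name slices_candidates is hypothetical and exists only for this illustration.

```python
from typing import Tuple


def slices_candidates(master_nodes: int) -> Tuple[int, int]:
    """Return the two suggested SLICES values (N, N*2) for a master with N nodes.

    Hypothetical helper for illustration; not a ClustrixDB API.
    """
    if master_nodes < 1:
        raise ValueError("the master cluster must have at least one node")
    return master_nodes, master_nodes * 2


print(slices_candidates(6))  # prints (6, 12)
```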
BATCH_SIZE_MS is the amount of time, in milliseconds, that the master uses to determine the size of each transaction batch. The default is 3000 ms.
Because the master batches transactions by time, the slave may report 0 seconds behind master while its data actually lags by anywhere between 0 and BATCH_SIZE_MS milliseconds. You can reduce this latency by lowering BATCH_SIZE_MS, for example to 1000 ms or even 500 ms; however, depending on the workload, this can decrease replication performance.
For optimal performance MASTER_HOST should point to the master cluster's load balancer.
When restoring from a backup for the purposes of replication you can find the values to set PARALLEL_LOG and PARALLEL_POS in the binlogs and xid files. The two files are located inside the metadata directory of the Clustrix backup directory.
Please note that the value in the xid file appears in hexadecimal but without an 0x prefix. When setting PARALLEL_POS with a hexadecimal value, it must start with 0x.
$ cat backup_all-2020-01-10/metadata/binlogs
$ cat backup-2020-01-10/metadata/xid
If the binlogs file names binlog01 and the xid file contains 5e23e65e9342f802, the values to use for PARALLEL_LOG and PARALLEL_POS would be binlog01 and 0x5e23e65e9342f802 respectively.
Using those values as an example, the command would be:
CHANGE SLAVE slave_name TO PARALLEL_LOG='binlog01', PARALLEL_POS='0x5e23e65e9342f802';
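A minimal sketch of assembling that statement from the two backup values follows. The function change_slave_statement and its validation rules are assumptions made for this example; it only formats text in the shape shown above and does not connect to a cluster or read the metadata files.

```python
import re
import string

# Assumption for this sketch: names consist of letters, digits, '_' and '-'.
_NAME = re.compile(r"[A-Za-z0-9_-]+")


def change_slave_statement(slave_name: str, binlog_name: str, xid: str) -> str:
    """Format a CHANGE SLAVE statement from values found in backup metadata.

    Hypothetical helper for illustration; not a ClustrixDB API.
    """
    for name in (slave_name, binlog_name):
        if not _NAME.fullmatch(name):
            raise ValueError(f"unexpected characters in name: {name!r}")
    pos = xid.strip()
    # The xid file holds bare hexadecimal, so drop any prefix before checking it.
    if pos.lower().startswith("0x"):
        pos = pos[2:]
    if not pos or any(ch not in string.hexdigits for ch in pos):
        raise ValueError(f"not a hexadecimal value: {xid!r}")
    return (
        f"CHANGE SLAVE {slave_name} TO "
        f"PARALLEL_LOG='{binlog_name}', PARALLEL_POS='0x{pos}';"
    )


print(change_slave_statement("slave_name", "binlog01", "5e23e65e9342f802"))
```

With the example values, the printed text matches the CHANGE SLAVE command shown above.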
To display the status of a parallel master:
SHOW MASTER STATUS PARALLEL;
SHOW ALL MASTER STATUS PARALLEL;
Displaying the status of a parallel slave is the same as with a serial slave:
SHOW SLAVE STATUS slave_name;
SHOW SLAVE STATUS;
To view additional information about the Parallel slave, such as the value of SLICES or BATCH_SIZE_MS, use the table system.mysql_repconfig:
SELECT * FROM system.mysql_repconfig WHERE slave_name = 'slave_name';
SELECT slave_name, protocol, slices, batch_size / POW(2, 32) * 1000 AS batch_size_ms FROM system.mysql_repconfig WHERE slave_name = 'slave_name';
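The arithmetic in the second query suggests that batch_size is stored in units of 1/2^32 of a second. The sketch below reproduces that conversion in Python; the storage format is inferred from the query rather than documented here, the function names are hypothetical, and the reverse conversion is an assumption included only to show the round trip.

```python
FIXED_POINT_SCALE = 2 ** 32  # divisor used by the query: batch_size / POW(2, 32)


def stored_batch_size_to_ms(batch_size: int) -> float:
    """Mirror the query: batch_size / POW(2, 32) * 1000."""
    return batch_size / FIXED_POINT_SCALE * 1000


def ms_to_stored_batch_size(batch_size_ms: float) -> int:
    """Inverse of the query's arithmetic (an assumption, not a documented API)."""
    return round(batch_size_ms / 1000 * FIXED_POINT_SCALE)


stored = ms_to_stored_batch_size(3000)   # the documented default of 3000 ms
print(stored)                            # prints 12884901888
print(stored_batch_size_to_ms(stored))   # prints 3000.0
```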
The ClustrixDB Parallel slave is able to parallelize processing of the replication stream and applies events in batches. Row events from within the same transaction are applied in the same transaction on the slave. Both serial and parallel replication use the same Replication Master for generating binlog(s).
ClustrixDB Parallel Replication: