clustrix_import is highly configurable, but the only required input parameter is the -i input file. The tool should already be available on your cluster, located in /opt/clustrix/bin. If it is not available there, please contact Clustrix Support to get the appropriate version. Note that it may be necessary to add execute permissions:
shell> chmod a+x clustrix_import
Sample Usage of clustrix_import
Once you have generated a dump file and obtained clustrix_import, simply run it; the tool is highly configurable, as described in the options below.
Local Invocation on Node
shell> clustrix_import -u importuser -p sekrit -i /clustrix/bigdump.sql
The simplest use case is to run clustrix_import directly on a node, specifying just the dump file. In this case clustrix_import connects through the MySQL socket and obtains the IP of each node automatically.
You should first have created a user with the proper permissions for importing, e.g.:
mysql> GRANT ALL ON *.* TO '<importuser>'@'%' IDENTIFIED BY '<importuserpasswd>';
A compressed dump can be streamed directly into clustrix_import by passing -i - to read from stdin:
shell> zcat bigdb.mysqldump.sql.gz | clustrix_import -i - -H clx -u importuser -p sekrit \
         --scan-for-newlines=0 --initial-slices=12 --slice-down-threshold=2
shell> mysqldump -h mysqlhost -u root -pmysekrit somedb | tee /tmp/somedb.sql | clustrix_import -i - -H clx -D somedb -u importuser -p sekrit
This example demonstrates piping a mysqldump directly into clustrix_import. Using tee to copy the dump output into a file allows us to restart the import if it fails for some reason, without needing to completely redo the dump (we'd need to make sure /tmp is sufficiently large to accommodate the dump). If the import needed to be restarted, we could then use the -S option to start at the table where the failure had occurred:
shell> clustrix_import -i /tmp/somedb.sql -D somedb -H clx -u importuser -p sekrit -S employees
Using the Dump Map Options
When importing the same dump into more than one cluster, the dump map options save repeated work: the first run writes a map of the dump, and subsequent runs read it rather than re-scanning the file.
shell> clustrix_import -u importuser -p sekrit -i hugedump.sql --write-dump-map dump.map -H master_vip
shell> clustrix_import -u importuser -p sekrit -i hugedump.sql --read-dump-map dump.map -H slave_vip
|Option||Description||Default|
|--host/-H||hostname or IP to reach the cluster; may also specify a comma-delimited list to override automatic detection of IPs through the database (use if the cluster is NAT'd)||localhost (mysql socket)|
|--database/-D||database to import into; must already exist, and assumes that the dump does not specify a database with a USE statement||none|
|--port/-P||MySQL port to connect to||3306|
|--user/-u||username with which to connect to the database; must have sufficient privileges to CREATE and INSERT data (or just INSERT if --inserts-only)||logged-in Linux user|
|--passwd/-p||password for the above user||none|
|--threads-per-node/-T||number of parallel threads to connect, per node||8|
|--retry-limit||number of times to retry a failed statement||30|
Note that a mysqldump taken from a ClustrixDB system will include SLICES=n within /*$ ... */ comments. clustrix_import will process these comments and set slicing accordingly. This overrides --initial-slices; unless -A is specified, scan-ahead and slice-down may still occur.
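As a rough illustration, such a hint might look like the following (hypothetical table; the exact placement of the comment within a real dump may differ):
CREATE TABLE `orders` (
  `id` bigint NOT NULL,
  PRIMARY KEY (`id`)
) /*$ SLICES=8 */;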
The size of a slice must be taken into consideration when setting the number of slices for import. By default a slice is 1GB (more precisely, 1048576 KB). This is the size at which the rebalancer splits slices, and it is set by the global variable rebalancer_split_threshold_kb. The recommended size is 8GB.
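For example, to check the current threshold and raise it to the recommended 8GB (assuming the standard MySQL-compatible SET GLOBAL syntax; 8GB = 8 x 1048576 KB):
mysql> SHOW VARIABLES LIKE 'rebalancer_split_threshold_kb';
mysql> SET GLOBAL rebalancer_split_threshold_kb = 8388608;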
Multi-row inserts are significantly more efficient for ClustrixDB to process. By default, mysqldump generates multi-row inserts of approximately 1MB each. However, in some cases (e.g. --skip-extended-insert), there is only one row per insert. This might also be the case if the import file was generated by a means other than mysqldump, such as pg_dump as part of a Postgres migration.
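To illustrate the difference (using a hypothetical table t):
-- one row per statement, as produced with --skip-extended-insert (slow to process)
INSERT INTO t VALUES (1, 'a');
INSERT INTO t VALUES (2, 'b');
-- multi-row insert, mysqldump's default extended-insert format (preferred)
INSERT INTO t VALUES (1, 'a'), (2, 'b');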