clx_import is currently offered as a beta
clx_import is a tool for quickly migrating data from various data sources into a ClustrixDB database. It has 3 modes of operation:
(fastest) Import a SQL file created by mysqldump (similar to cat file | mysql, but much faster)
import single delimiter-separated file, similar to mysql’s LOAD DATA INFILE functionality
Import several delimiter-separated files in parallel
clx_import will examine the source file and estimate how tables should be sliced. However, this is just an estimate that does not fully account for the different ways in which data is represented. In most cases, the Rebalancer will further optimize the data balance after the import, which may impact initial performance. See Managing the Rebalancer for more information on monitoring the rebalancer's progress.
If you are unsure which format to export, use mysqldump with no special options other than the user, password, and tables or DBs to dump, and import with clx_import --sql-file.
If you have a character set other than utf8 set via my.cnf, invoke mysqldump with --default-character-set utf8 . This will not change the character set of the data, or future connections by any application.
To import a database exported using mysqldump:
shell> clx_import -u username -H hostname_or_ip -p --sql-file filename |
Depending on if the dump file contains USE directives , you may need to specify the target database using the option for --db.
To import a delimiter-separated value into an existing table:
shell> clx_import -u username -H hostname_or_ip -p --ldi ldi_definition |
Depending on if the dump file contains USE directives , you may need to specify the target database using the option for --db.
Where ldi_definition is a string matching the following syntax:
--ldi “'file_name' |
Because the whole ldi definition must be a single shell arguments. Since it contains several quote values, it is recommended that it be enclosed in double quotes. Because of this, if tbl_name is enclosed in `backtics`, they must be \`escaped\` to avoid shell expansion.
If no values for FIELDS
or LINES
are specified, the behavior will be as though the following was specified:
FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\' LINES TERMINATED BY '\n' STARTING BY ''
To import several delimiter-separated values at once, create an empty directory containing one file called ddl.sql containing CREATE TABLE statements for the tables to be imported. Then create one file per table in said directory with the same name as the table it contains data for. Once this is arranged, invoke clx_import as:
shell> clx_import -u username -H hostname_or_ip -db database -p --csv-dir path_to_data --csv-options csv_options |
Where path specified contains ddl.sql and the data files. Tbe csv_options is very similar to the ldi_definition defined above, but should not include a filename or the INTO TABLE clause.
Note: When importing multiple files at once, they all must use the same delimiter.
clx_import options
clx_import supports several other options that can be used in various circumstances, but the common use case shouldn’t require these:
--no-auto-slice | clx_import attempts to estimate the size of tables and set the number of slices optimally before importing. This helps prevent performance degradation from reslicing tables after import is finished. However, as it is impossible to estimate on-disk data size from text format data size, this may be undesirable. If ddl.sql or the sqldump file contains SLICES directives on CREATE TABLE statements, they will be overridden unless this is specified. |
-error-file | Failed INSERT statements will be added to this file. Defaults to a temp file that will be printed at the end of the import |
--log-file | Logs from the import itself will be put here (i.e. the information shown on the screen during import), defaults to a temp file |
--no-binlog | Statements run by the importer will not be binlogged |
--insert-ignore | Issues INSERT IGNORE statements instead of INSERT statements. This overrides whatever kind of statements are actually in the dump (and without this, the reverse is true) |
--no-globals | Normally clx_import will set various global variables to expedite import performance at the cost of general performance (and unsets them when finished). This skips that step. |
--presql | A .sql file to be executed after tables are created and autoslicing has been done, but before any inserts. This file does not support DELIMITER changes (, statements must end with ‘;\n’), and should be the same charset as the file being imported. |
Notable differences between this and standard LOAD DATA INFILE is that
setting individual columns (via a SET clause) is not supported
The field terminator must be a single character (rather than an arbitrary string)
The line terminating character must be some sort of newline (\r, \n, or \r\n) but multiple newlines is not supported
LOW PRIORITY and CONCURRENT are not supported