
In ClustrixDB, the default character set is utf8 and the default collation is utf8_general_ci.

Supported Character Sets

ClustrixDB supports a subset of the MySQL character sets. To list the supported character sets, issue the following query:

sql> select * from system.mysql_character_sets;
+---------+-----------------------+--------------------+--------+
| Charset | Description           | Default collation  | Maxlen |
+---------+-----------------------+--------------------+--------+
| binary  | Binary pseudo charset | binary             |      1 |
| latin1  | CP1252 West European  | latin1_swedish_ci  |      1 |
| utf8    | UTF-8 Unicode         | utf8_general_ci    |      3 |
| utf8mb4 | UTF-8 Unicode         | utf8mb4_general_ci |      4 |
| koi8r   | KOI8-R Relcom Russian | koi8r_general_ci   |      1 |
| euckr   | EUC-KR Korean         | euckr_korean_ci    |      2 |
+---------+-----------------------+--------------------+--------+
6 rows in set (0.00 sec)
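
As an illustration of how these character sets might be used, the sketch below declares a table-level default and a column-level override. It assumes ClustrixDB accepts the MySQL-style CHARACTER SET clauses shown; the table and column names are hypothetical.

```sql
-- Hypothetical table; all names are illustrative only.
-- Assumes MySQL-compatible DDL syntax for character set clauses.
CREATE TABLE customer_notes (
    id    INT PRIMARY KEY,
    -- Column-level override: utf8mb4 allows up to 4 bytes per character (Maxlen = 4)
    note  VARCHAR(255) CHARACTER SET utf8mb4,
    -- No explicit character set: falls back to the table default declared below
    code  VARCHAR(16)
) DEFAULT CHARACTER SET latin1;
```

The Maxlen column in the listing above gives the maximum number of bytes per character, so utf8 (Maxlen 3) cannot hold characters that need a 4-byte encoding, while utf8mb4 can.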

Supported Collations

To list supported collations, issue the following query:

sql> show collation;  
+--------------------------+---------+------+---------+----------+---------+
| Collation                | Charset | Id   | Default | Compiled | Sortlen |
+--------------------------+---------+------+---------+----------+---------+
| binary                   | binary  |   63 | Yes     | Yes      |       1 |
| latin1_swedish_ci        | latin1  |    8 | Yes     | Yes      |       1 |
| latin1_bin               | latin1  |   47 | No      | Yes      |       1 |
| latin1_general_ci        | latin1  |   48 | No      | Yes      |       1 |
| latin1_general_cs        | latin1  |   49 | No      | Yes      |       1 |
| utf8_general_ci          | utf8    |   33 | Yes     | Yes      |       1 |
| utf8_bin                 | utf8    |   83 | No      | Yes      |       1 |
| utf8_unicode_ci          | utf8    |  192 | No      | Yes      |       1 |
| utf8mb4_general_ci       | utf8mb4 |   45 | Yes     | Yes      |       1 |
| utf8mb4_bin              | utf8mb4 |   46 | No      | Yes      |       1 |
| utf8mb4_unicode_ci       | utf8mb4 |  224 | No      | Yes      |       1 |
| koi8r_general_ci         | koi8r   |    7 | Yes     | Yes      |       1 |
| koi8r_bin                | koi8r   |   74 | No      | Yes      |       1 |
| euckr_korean_ci          | euckr   |   19 | Yes     | Yes      |       1 |
| euckr_bin                | euckr   |   85 | No      | Yes      |       1 |
| latin1_swedish_ci_legacy | latin1  |  264 | No      | Yes      |       1 |
| latin1_general_ci_legacy | latin1  |  304 | No      | Yes      |       1 |
| latin1_general_cs_legacy | latin1  |  305 | No      | Yes      |       1 |
| utf8_general_ci_legacy   | utf8    |  289 | No      | Yes      |       1 |
+--------------------------+---------+------+---------+----------+---------+
19 rows in set (0.00 sec)
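
To show what the choice of collation changes in practice, the following sketch compares the same two strings under a case-insensitive and a binary collation. It assumes the connection character set is utf8 (the default noted above) and that ClustrixDB accepts the MySQL-style COLLATE clause on an expression.

```sql
-- Assumes the string literals are interpreted as utf8.
-- utf8_general_ci is case-insensitive; utf8_bin compares by byte value.
SELECT 'a' = 'A' COLLATE utf8_general_ci AS equal_general_ci,
       'a' = 'A' COLLATE utf8_bin        AS equal_bin;
```

In MySQL the first comparison is true and the second is false, because the _ci suffix marks a case-insensitive collation and _bin a binary one.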

Caveats for Characters and Collations

  • The collation_database variable cannot be modified.
  • The set of UTF-8 code points that ClustrixDB accepts as valid is not identical to MySQL's, because ClustrixDB's internal implementation of UTF-8 code point validity differs from MySQL's.
  • Control codes, spaces, and empty strings collate differently in ClustrixDB than in MySQL. Both MySQL and ClustrixDB trim spaces at the end of strings, but ClustrixDB assumes that a shorter string always collates before a longer string. MySQL, however, allows a shorter string to collate after a longer string if the additional characters of the longer string sort before the space character.
  • ClustrixDB displays column-level character set information as part of SHOW CREATE TABLE output, even if it does not differ from the character set defined for the table.
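
The collation caveat about shorter and longer strings can be probed with a pair of comparisons. This is a minimal sketch, assuming a utf8 connection with the default utf8_general_ci collation and that backslash escapes such as \t are enabled in string literals; the ClustrixDB result noted in the comments is inferred from the description above and not from a recorded run.

```sql
-- Trailing spaces: both systems are described as ignoring them,
-- so this comparison is expected to return 1 on either one.
SELECT 'abc' = 'abc   ' AS trailing_spaces_ignored;

-- 'a' versus 'a' followed by a TAB (0x09, which sorts before space, 0x20).
-- MySQL with a pad-space collation compares as if 'a' were padded with a
-- space, so 'a' sorts after the longer string and this returns 0.
-- Per the caveat above, ClustrixDB is expected to sort the shorter string
-- first, so this would return 1.
SELECT 'a' < 'a\t' AS shorter_sorts_first;
```

Applications that rely on the ordering of strings containing tabs or other control codes may want to run a check like this on both systems before migrating.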