Child pages
  • Optimizing Performance Based on Query Execution Plans

This is documentation for a previous version of ClustrixDB. Documentation for the latest version can be found here

Skip to end of metadata
Go to start of metadata

A query can have multiple valid execution plans. For example, there might be several indexes to choose from when reading rows out of a table. The ClustrixDB query optimizer examines these options to determine the most efficient execution plan.

The optimizer transforms an SQL query into a set of building blocks called operators . Each operator performs a basic operation such as scanning an index or filtering rows based on a predicate. The operators are arranged into a query execution plan in the form of a tree, where the root of the tree represents any rows possibly returned by the query.

To display the execution plan that was chosen for a query, issue the EXPLAIN command. 

Explain command example
mysql> CREATE TABLE foo (a int, b int);
Query OK, 0 rows affected (0.17 sec)

mysql> INSERT INTO foo VALUES (5,10);
Query OK, 1 row affected (0.04 sec)

mysql> EXPLAIN SELECT * FROM foo WHERE a = 5;
+------------------------------------+-----------+-----------+
| Operation                          | Est. Cost | Est. Rows |
+------------------------------------+-----------+-----------+
| stream_combine                     |     31.72 |      0.91 | 
|   filter (1.a = param(0))          |     10.21 |      0.30 | 
|     index_scan 1 := foo.__base_foo |     10.20 |      0.34 | 
+------------------------------------+-----------+-----------+
3 rows in set (0.00 sec)


The Operation column lists two operators: filter and index_scan. The index_scan entry is indented to indicate that it is a child of the filter, which means that the rows created by the index scan are passed through the filter. The line "index_scan 1 := foo._base_foo" means that the index base_foo is read from the table foo, and the unique ID of 1 is assigned to this operation. This operator returns all rows found in base_foo. Because no primary key was specified when foo was created, the ClustrixDB created a hidden primary key for foo's primary index, _base_foo.
The filter returns only rows that pass the predicate "a = 5" .

The ClustrixDB query optimizer converts the value of 5 into a parameter, enabling it to use same execution plan for similar queries without recompiling the query (for example, 5 can be replaced with 7 or 100 without requiring recompilation). The predicate in the filter is represented as (1.a eq param((0 0))), and 1 is the unique ID assigned to _base_foo, so 1.a represents the column a from index _base_foo. The last part, param((0 0)), represents parameter 0 (plus a second argument for internal use). The plan returns rows where column foo.a equals parameter 0.

To improve query efficiency, add an index to column a and examine its effect:

mysql> CREATE INDEX idx_a ON foo (a);
Query OK, 0 rows affected (0.73 sec)


mysql> EXPLAIN SELECT a FROM foo WHERE a = 5;
+-----------------------------------------+-----------+-----------+
| Operation                               | Est. Cost | Est. Rows |
+-----------------------------------------+-----------+-----------+
| index_scan 1 := foo.idx_a, a = param(0) |     10.61 |      1.01 | 
+-----------------------------------------+-----------+-----------+
1 row in set (0.01 sec)


The query now reads from the new index, so the optimizer pushed the predicate down from the filter into the index scan. The filter is still displayed but no longer performs any processing.
The cost and rows columns in the EXPLAIN output are estimates for the associated operators in the execution plan. Est. Cost is a relative cost, and therefore is not displayed in units such as milliseconds or CPU cycles. Est. Rows is an estimate of the number of rows returned by the operator. These estimates are based on the properties of the operator and the number of rows that are returned by any children of that operator.

For example, consider the above plan for SELECT * FROM foo WHERE a = 5.

The estimate for rows returned from the index scan is based on statistics that ClustrixDB maintains for the table foo and the index __base_foo. These statistics are automatically refreshed periodically. (You cannot manually force ClustrixDB to refresh statistics.)


Here's a more complex example:

mysql> CREATE TABLE bar (x int PRIMARY KEY, y int);
Query OK, 0 rows affected (0.15 sec)


mysql> INSERT INTO bar VALUES (1,10), (2,20), (3,20), (4,20);
Query OK, 4 rows affected (0.02 sec)


mysql> CREATE TABLE baz (r int);
Query OK, 0 rows affected (0.14 sec)
 
mysql> INSERT INTO baz VALUES (1), (1), (2), (9);
Query OK, 4 rows affected (0.02 sec)


mysql> EXPLAIN SELECT x, SUM(y) FROM bar JOIN baz ON x = r GROUP BY x;
+-----------------------------------------------------------------------+-----------+-----------+
| Operation                                                             | Est. Cost | Est. Rows |
+-----------------------------------------------------------------------+-----------+-----------+
| hash_aggregate_combine GROUPBY((1 . "x")) expr0 := sum((0 . "expr0")) |     81.80 |      4.00 | 
|   hash_aggregate_partial GROUPBY((1 . "x")) expr0 := sum((1 . "y"))   |     76.20 |      4.00 | 
|     nljoin                                                            |     76.20 |      4.00 | 
|       stream_combine                                                  |     33.80 |      4.00 | 
|         index_scan 2 := baz.__base_baz                                |     10.80 |      1.33 | 
|       index_scan 1 := bar.__idx_bar__PRIMARY, x = 2.r                 |     10.60 |      1.00 | 
+-----------------------------------------------------------------------+-----------+-----------+
6 rows in set (0.02 sec)


The operator nljoin takes each row from its first input and joins it with each row from its second input. dual is an internal operator that enforces good join ordering. The inner nljoin is equivalent to the index scan on baz._base_baz. The outer nljoin reads from baz.base_baz first, and then joins with bar.idx_bar_PRIMARY. The presence of x = 2.r in the latter's index scan indicates a seek operation, meaning a full table scan is avoided.