Reducing Broadcasts
Broadcasts
Broadcasts occur when a node requests additional information from another node. This adds parallelism (a good thing) but can also add noise/congestion. Removing unnecessary or unhelpful broadcasts via schema or query changes can be beneficial to the system as a whole.
Index distribution that is not optimal can decrease performance by introducing unwanted broadcasts. Unnecessary broadcasts increase network traffic and resource contention, both of which degrade performance.
Detecting Broadcasts
Xpand's Query Plan Cache (QPC) stores statistics for recently run queries. Use this query to identify statements that recently used extensive broadcasts.
sql> SELECT statement, broadcasts, exec_ms, counts, rows_read FROM system.qpc_queries ORDER BY broadcasts DESC LIMIT 5\G; |
Using ALLNODES to Reduce Broadcasts
Tables that have replicas on all nodes will not require broadcasts to other nodes to obtain data. This is especially helpful when small tables are joined to larger tables.
ALLNODES is best used for tables that meet the following criteria:
Relatively small (Table Size < 10MiB)
Written to infrequently (Write Frequency < 1K)
Read from frequently (Read Frequency > 1M)
Used frequently in joins to other, larger tables (e.g. metadata, lookup tables)
ALLNODES should not be used for:
Tables that take frequent writes
Partitioned tables
Identifying Potential ALLNODES Tables
sql> SELECT `database`, `table`, reads, writes, bytes FROM (SELECT `database`, `table`, Sum(reads) AS reads, Sum(inserts) + Sum(deletes) + Sum(replaces) AS writes, Sum(bytes) AS bytes, Count(DISTINCT( `index` )) AS paths, Count(DISTINCT( slice )) AS slices, Count(DISTINCT( replica )) AS replicas FROM system.container_stats cs JOIN system.table_replicas tr ON cs.replica = tr.replica GROUP BY 1, 2) AS a WHERE bytes < 10 * 1024 * 1024 AND writes < 1000 AND reads > 1000000 AND slices > paths ORDER BY 1, 2; |
Modifying a Table to be ALLNODES
sql> ALTER TABLE foo.bar REPLICAS=ALLNODES; |
Changing Index Distribution to Reduce Broadcasts
With multi-column indexes having distribution greater than one, there can be a benefit to reducing the distribution in some instances:
- Distribution is > the number of columns - Change distribution to the number of columns (or to one)
- Indexes that have mostly unique initial columns - Consider reducing distribution if unnecessary broadcasts are being performed.
- EXPLAIN queries that use an index and look for stream_combine, which is usually necessary following a broadcast.
Clustrix recommends making these changes during off-peak hours or during a maintenance window.