Broadcasts

Broadcasts occur when a node requests additional information from another node. This adds parallelism (a good thing) but can also add noise/congestion. Removing unnecessary or unhelpful broadcasts via schema or query changes can be beneficial to the system as a whole.

Index distribution that is not optimal can decrease performance by introducing unwanted broadcasts. Unnecessary broadcasts increase network traffic and resource contention, both of which degrade performance.

Detecting Broadcasts

Xpand's Query Plan Cache (QPC) stores statistics for recently run queries. Use this query to identify statements that recently used extensive broadcasts.

sql> SELECT statement, 
            broadcasts, 
            exec_ms, 
            counts, 
            rows_read 
     FROM   system.qpc_queries 
     ORDER  BY broadcasts DESC 
     LIMIT  5\G;

Using ALLNODES to Reduce Broadcasts

Tables that have replicas on all nodes will not require broadcasts to other nodes to obtain data. This is especially helpful when small tables are joined to larger tables. 

ALLNODES is best used for tables that meet the following criteria:

  • Relatively small (Table Size < 10MiB)

  • Written to infrequently (Write Frequency < 1K)

  • Read from frequently (Read Frequency > 1M)

  • Used frequently in joins to other, larger tables (e.g. metadata, lookup tables)

ALLNODES should not be used for:

  • Tables that take frequent writes

  • Partitioned tables

Identifying Potential ALLNODES Tables

sql> SELECT `database`, 
            `table`, 
            reads, 
            writes, 
            bytes 
     FROM   (SELECT `database`, 
                    `table`, 
                    Sum(reads)                                  AS reads, 
                    Sum(inserts) + Sum(deletes) + Sum(replaces) AS writes, 
                    Sum(bytes)                                  AS bytes, 
                    Count(DISTINCT( `index` ))                  AS paths, 
                    Count(DISTINCT( slice ))                    AS slices, 
                    Count(DISTINCT( replica ))                  AS replicas 
             FROM   system.container_stats cs 
             JOIN   system.table_replicas tr 
                 ON cs.replica = tr.replica 
             GROUP  BY 1, 
                       2) AS a 
     WHERE  bytes < 10 * 1024 * 1024 
     AND    writes < 1000 
     AND    reads  > 1000000 
     AND    slices > paths 
     ORDER  BY 1, 2;

Modifying a Table to be ALLNODES

sql> ALTER TABLE foo.bar REPLICAS=ALLNODES;

Changing Index Distribution to Reduce Broadcasts

With multi-column indexes having distribution greater than one, there can be a benefit to reducing the distribution in some instances:

  • Distribution is > the number of columns - Change distribution to the number of columns (or to one)
  • Indexes that have mostly unique initial columns - Consider reducing distribution if unnecessary broadcasts are being performed.
  • EXPLAIN queries that use an index and look for stream_combine, which is usually necessary following a broadcast. 

Clustrix recommends making these changes during off-peak hours or during a maintenance window.

  • No labels