This section describes the platform factors that can limit cluster performance, how to measure whether a cluster is approaching or exceeding those limits, and the options available to remedy such conditions. "Platform factors" refers to hardware resources such as the CPU, memory, disk, and network I/O subsystems. For potential software-related factors, please see Managing Data Distribution, Load Balancing ClustrixDB with HAProxy, or Understanding the ClustrixDB Explain Output.
A common cause of overall degraded performance within ClustrixDB is CPU contention. In the straightforward case, this simply means the cluster has reached its maximum TPS for the current workload and number of nodes: all CPU cores are busy, and additional load only increases query latency. The remedy is to add more nodes to the cluster, which provides additional compute, memory, and storage capacity.
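Before adding nodes, it is worth confirming that the CPUs really are saturated cluster-wide rather than on only a few nodes. A minimal sketch using the standard sysstat mpstat utility follows; running it on every node and the 5-second interval are illustrative choices, not a prescribed procedure.

# Run on each node while the workload is active (e.g., over ssh).
# %idle near zero on all cores of all nodes indicates genuine CPU
# saturation; substantial idle time on some nodes points to an
# imbalance instead (see below).
mpstat -P ALL 5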
...
There are other cases where CPU contention becomes a bottleneck even though the cluster as a whole is not fully utilized; that is, load is not evenly balanced across the cluster. This can be due to external factors such as an inefficient query, or client connections that are poorly distributed across nodes (for example, when clients do not connect through the VIP). A suboptimal configuration can also be the culprit, such as a table that is not distributed evenly across the cluster, although the system goes to great lengths to manage data distribution automatically. A quick way to spot-check connection distribution is sketched below.
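The following is a minimal sketch for checking how client sessions are spread across nodes. It assumes sessions are exposed through a system.sessions table with a nodeid column; the table and column names here are assumptions, so adjust them to match the system tables in your version.

# Hypothetical spot-check: count client sessions per node.
# A heavily skewed distribution suggests clients are not connecting
# through the VIP, or that the load balancer is misconfigured.
mysql -e "SELECT nodeid, COUNT(*) AS sessions
          FROM system.sessions
          GROUP BY nodeid
          ORDER BY nodeid;"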
...
The following are the most useful metrics collected by statd related to disk latency and throughput; a sketch of how to sample them from the statd tables follows the list:
...
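These metrics can be sampled directly with SQL. The sketch below assumes the statd data lives in a clustrix_statd database with statd_history and statd_metadata tables joined on a metric id, with name, timestamp, and value columns; these schema details are assumptions and should be checked against your version.

# Hedged sketch: pull recent samples of the disk I/O metrics collected
# by statd. Table and column names are assumed, not authoritative.
mysql -e "SELECT m.name, h.timestamp, h.value
          FROM clustrix_statd.statd_history h
          JOIN clustrix_statd.statd_metadata m ON m.id = h.id
          WHERE m.name LIKE 'clustrix.io.disk%'
          ORDER BY h.timestamp DESC
          LIMIT 20;"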
To investigate disk I/O subsystem performance more deeply, you can use a tool such as sar. The clustrix.io.disk.* metrics provided by statd expose much of the same information; however, sar makes it easy to poll that information more frequently.
sar -b provides a global view of buffer reads and writes to and from all disks in the system. It gives a gross indicator of disk utilization, similar to the clustrix.io.disks stats, but on a per-node basis:
[email protected]:~$ sar -b 5
Linux 2.6.32-358.14.1.el6.x86_64 (ip-10-76-3-87)   09/25/2013   _x86_64_   (4 CPU)

07:06:13 PM       tps      rtps      wtps   bread/s   bwrtn/s
07:06:18 PM   3143.40    374.40   2769.00  22281.60  19230.40
07:06:23 PM   3861.28    671.86   3189.42  41255.09  22692.22
07:06:28 PM   2556.43    375.10   2181.33  22207.23  14547.79
07:06:33 PM   3208.38    526.15   2682.24  32175.65  15326.15
07:06:38 PM   2202.00    502.00   1700.00  31121.76   9654.29
07:06:43 PM   2572.40    402.20   2170.20  24441.60  17152.00
07:06:48 PM   1290.18    285.37   1004.81  17590.38   5861.32
07:06:53 PM   3287.82    553.69   2734.13  34430.34  20011.18
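For per-device detail, such as average request wait time and queue length, sar -d complements the global view above. A brief illustrative invocation follows; the 5-second interval is arbitrary.

# Per-device statistics (reads/writes per second, average wait time,
# queue length), reported every 5 seconds; -p prints readable device names.
sar -d -p 5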
...