What is ClustrixDB?

ClustrixDB is a shared-nothing, clustered, scalable database built on commodity hardware and parallel software. Parallelism throughout the system integrates the nodes of the cluster into one very large database, from both programming and management perspectives. There are no bottlenecks and no single points of failure. All processors are enlisted in support of query processing. Queries are parallelized and distributed across the cluster to the relevant data. New nodes are automatically recognized and incorporated into the cluster. Workloads and data are automatically balanced across all nodes in the cluster. Cluster-wide SQL relational calculus and ACID properties eliminate multi-node complexity from the development and management of multi-tiered applications. The complexity commonly required to scale existing database models to handle large volumes of data is eliminated. As your database grows, just add nodes.

What are the main features of ClustrixDB?

Does ClustrixDB use any MySQL code?

No MySQL code is used. ClustrixDB is entirely original, based on decades of experience in the development of scalable parallel file systems, very large time-series databases, and some of the world's fastest super-computers.

Is ClustrixDB available as open source?

No, ClustrixDB is available as licensed, downloadable software. 

On which platforms is ClustrixDB supported?

ClustrixDB is supported on RHEL or CentOS 7.4.

What makes ClustrixDB scalable?

There are several things that affect scalability and performance:

This is very different from other systems, which routinely move large amounts of data to the node that is processing the query, then discard all the data that doesn't match the query (typically most of it). By moving only qualified data across the network to the requesting node, ClustrixDB significantly reduces the network traffic bottleneck. In addition, more processors participate in the data selection process. By selecting data on multiple nodes in parallel, the system produces results more quickly than if all data were selected by a single node, which would first have to collect all the required data from the other nodes in the system.
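The difference in network traffic can be illustrated with a small simulation. This is only a sketch under stated assumptions, not ClustrixDB code: the node names, the rows, and the functions `ship_all_then_filter` and `filter_then_ship` are all hypothetical, and a plain Python dictionary stands in for the cluster.

```python
# Hypothetical illustration only: three "nodes", each holding part of a table.
NODES = {
    "node1": [{"id": 1, "amount": 50}, {"id": 2, "amount": 900}],
    "node2": [{"id": 3, "amount": 20}, {"id": 4, "amount": 30}],
    "node3": [{"id": 5, "amount": 700}, {"id": 6, "amount": 10}],
}


def matches(row):
    """The query predicate (assumed for this example): amount > 500."""
    return row["amount"] > 500


def ship_all_then_filter(nodes):
    """Move every row to the requesting node, then filter there."""
    shipped = [row for rows in nodes.values() for row in rows]
    result = [row for row in shipped if matches(row)]
    return result, len(shipped)  # (query result, rows sent over the network)


def filter_then_ship(nodes):
    """Filter on the node that holds the data; move only qualifying rows."""
    shipped = [row for rows in nodes.values() for row in rows if matches(row)]
    return shipped, len(shipped)


if __name__ == "__main__":
    print(ship_all_then_filter(NODES)[1], "rows moved when filtering centrally")
    print(filter_then_ship(NODES)[1], "rows moved when filtering at the data")
```

Both functions return the same rows; only the number of rows that cross the network differs. In a real cluster the per-node filtering would also run in parallel, which this single-process sketch does not model.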

How does a client know with which node of the cluster to connect?

It doesn't matter. Clients can connect to any node in the cluster. The ClustrixDB parallel database software will route the queries to the appropriate nodes, that is, the ones that have the relevant data. Clustrix recommends using an external load balancer.
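Because any node can accept a connection, spreading clients across nodes can be as simple as rotating through a list of addresses. The following is a minimal sketch of that idea, assuming a round-robin policy; the host names and the `make_node_picker` helper are hypothetical, and a real deployment would normally rely on a dedicated load balancer rather than application code.

```python
import itertools


def make_node_picker(hosts):
    """Return a function that yields hosts in round-robin order."""
    if not hosts:
        raise ValueError("at least one host is required")
    rotation = itertools.cycle(list(hosts))
    return lambda: next(rotation)


# Hypothetical node addresses, used only for illustration.
CLUSTER_HOSTS = ["node1.example.com", "node2.example.com", "node3.example.com"]

if __name__ == "__main__":
    pick = make_node_picker(CLUSTER_HOSTS)
    for _ in range(4):
        print("connect to", pick())
```

Each call to the picker returns the next host, wrapping around to the first one after the last.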

How does ClustrixDB compare with the master-slave replication approach to scalability?

Replication only scales reads. In a master-slave configuration, all writes are done to the master, then replicated to the various slaves. This causes two problems:

How does ClustrixDB compare to application-level horizontal federation (a.k.a. sharding)?

Essentially, ClustrixDB is doing horizontal federation. The key is making the federation invisible to applications and to administrators. In addition, ClustrixDB provides:

By making the federation invisible to applications, ClustrixDB eliminates the need for custom programming and administration for partitioning. This makes it easier for customers to run queries and transactional updates that span partitions, ultimately providing greater functionality at lower cost.

What are data replicas?

All data in ClustrixDB is replicated on a per-table or per-index basis. Customers may prefer to maintain more replicas of base representations (data tables) and fewer replicas of indexes, since indexes can be reconstructed.

How does ClustrixDB optimize joins?

The query planner is cluster-aware, and it knows which nodes of the cluster contain which indexed rows. Here's how it works:

Note: if the join is on columns that have no indexes, then table scans are required, but the scans can be done in parallel on multiple nodes, so the operation, while not optimal, is still accelerated.

What steps are required to start a ClustrixDB database?

See ClustrixDB Installation Guide Bare OS Instructions.

What steps are required to add more nodes to an existing ClustrixDB database?

The short answer is: just add nodes. For guidance, see Expanding Your Cluster's Capacity - Flex Up.

What happens to the system if a component fails?

The system is designed to continue operating through inevitable component failures, as follows:

What levels of redundancy are provided?

The node is the fundamental redundant unit. Multiple nodes can fail without a system outage. In addition, all data paths and all data are redundant. Administrators can specify the desired level of redundancy (number of data replicas) and can specify priorities for the re-creation of additional replicas when storage or nodes fail.
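The relationship between the number of replicas and the number of node failures that can be tolerated can be sketched as follows. This is an illustrative model only, not ClustrixDB's actual placement logic: the `placement` mapping, the chunk and node names, and both functions are assumptions made for the example.

```python
def surviving_replicas(placement, failed_nodes):
    """Map each chunk of data to the nodes that still hold a replica of it."""
    failed = set(failed_nodes)
    return {chunk: holders - failed for chunk, holders in placement.items()}


def is_available(placement, failed_nodes):
    """Data stays available if every chunk keeps at least one replica."""
    return all(surviving_replicas(placement, failed_nodes).values())


# Hypothetical placement: each chunk has two replicas on different nodes.
TWO_REPLICAS = {
    "chunk1": {"node1", "node2"},
    "chunk2": {"node2", "node3"},
    "chunk3": {"node3", "node1"},
}

# Hypothetical placement: each chunk has three replicas.
THREE_REPLICAS = {chunk: {"node1", "node2", "node3"} for chunk in TWO_REPLICAS}

if __name__ == "__main__":
    print(is_available(TWO_REPLICAS, ["node1"]))
    print(is_available(TWO_REPLICAS, ["node1", "node2"]))
    print(is_available(THREE_REPLICAS, ["node1", "node2"]))
```

In this model, two replicas tolerate the loss of any one node, while three replicas tolerate the loss of two. The sketch does not model the re-creation of replicas after a failure.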

Is ClustrixDB a new storage engine for MySQL?

No, it's a complete database, built from the ground up for high-performance, clustered OLTP. It is wire-compatible with MySQL, but is implemented without any MySQL code.

Does the product support online backup operations?

Yes. For complete details, please see ClustrixDB Fast Backup and Restore. ClustrixDB also supports MySQL operations such as mysqldump.