This is a high-level glossary of terms. To get more detail on a particular term, click on one of the related links. To find the information you are looking for, you can also try searching this site. If you would like to see additional terms defined, please email email@example.com.
Refers to the ability to set replicas = allnodes, which keeps a replica on every node in the cluster.
Refers to the characteristic of a transaction that is all or nothing.
A standard computer science data structure that keeps keys sorted in a balanced tree, providing fast lookups, inserts, and range scans. See B-Tree at Wikipedia.
A barrier is a synchronization method used to control message flow within ClustrixDB. A barrier delineates a group of messages and all nodes must reach that barrier before proceeding.
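ClustrixDB's barrier operates on messages between nodes. As a single-process analogy only, Python's `threading.Barrier` enforces the same rule: no participant proceeds until every participant has reached the barrier. The thread count and the event list below are hypothetical stand-ins for nodes and message groups.

```python
import threading

NUM_NODES = 3                      # hypothetical participants standing in for nodes
barrier = threading.Barrier(NUM_NODES)
events = []
lock = threading.Lock()

def node(node_id):
    # Phase 1: work done before the barrier (e.g., sending one group of messages)
    with lock:
        events.append(("before", node_id))
    # Block here until all NUM_NODES participants have called wait()
    barrier.wait()
    # Phase 2: runs only after every participant has reached the barrier
    with lock:
        events.append(("after", node_id))

threads = [threading.Thread(target=node, args=(i,)) for i in range(NUM_NODES)]
for t in threads:
    t.start()
for t in threads:
    t.join()

phases = [phase for phase, _ in events]
print(phases)  # every "before" precedes every "after"
```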
The representation that contains all the table data and that is indexed by the primary key is the “Base Representation” or baserep. If no primary key is defined, ClustrixDB assigns a unique rowid key.
BigC is Clustrix's garbage collection process, which cleans up the undo logs used to roll back running transactions. ClustrixDB must retain a transaction's undo state for the entire time the transaction is open. Once a transaction commits, BigC removes its entries from the undo logs because they are no longer needed.
Long-running transactions can cause BigC to become "pinned". An old transaction's undo state must be preserved because it may still be needed for recovery, so BigC cannot reclaim that space. Meanwhile, subsequent activity on the cluster keeps filling the undo logs.
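Pinning can be illustrated with a toy undo log. This sketch assumes, hypothetically, an append-only log that can only be truncated from its oldest end. Under that assumption, one old open transaction at the head blocks reclamation of everything behind it, even entries from transactions that have already committed. This is not ClustrixDB's actual undo log format or BigC algorithm.

```python
from collections import deque

class UndoLog:
    """Hypothetical append-only undo log; not ClustrixDB's real format."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()   # (xid, undo_record) pairs in append order
        self.open_xids = set()   # transactions that have not finished yet

    def begin(self, xid):
        self.open_xids.add(xid)

    def record(self, xid, undo_record):
        if len(self.entries) >= self.capacity:
            raise RuntimeError("undo log full")
        self.entries.append((xid, undo_record))

    def commit(self, xid):
        self.open_xids.discard(xid)

    def collect(self):
        """Reclaim entries from the head until reaching one owned by an open transaction."""
        freed = 0
        while self.entries and self.entries[0][0] not in self.open_xids:
            self.entries.popleft()
            freed += 1
        return freed

log = UndoLog(capacity=4)
log.begin(1)                         # long-running transaction
log.record(1, "undo-1")
for xid in (2, 3):                   # short transactions that commit quickly
    log.begin(xid)
    log.record(xid, f"undo-{xid}")
    log.commit(xid)
freed_while_pinned = log.collect()   # xid 1 pins the head of the log
log.commit(1)
freed_after_commit = log.collect()   # everything can now be reclaimed
print(freed_while_pinned, freed_after_commit)  # 0 3
```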
Broadcasting refers to a method of transferring a message to all recipients simultaneously. ClustrixDB leverages distributed computing to avoid broadcasts. See how ClustrixDB scales joins.
Commit Identifier that marks when transactional changes become visible to other transactions.
A group of ClustrixDB nodes connected to provide a redundant, scalable RDBMS.
A consistent transaction takes the database from one valid state to another without violating referential integrity or other integrity constraints.
Containers are the base storage unit used by ClustrixDB. They define how representations are stored and retrieved using an access method such as B-trees, layered trees, or skip lists. Each slice and each replica of a representation has its own container, whether or not that container is written to disk.
ClustrixDB's query optimizer (Sierra) uses a cost-based model in which each plan's cost is estimated from I/O, CPU usage, and latency.
ClustrixDB leverages fine-grained data distribution and a shared-nothing architecture to provide scalability.
A collection of tables, sometimes also referred to as a schema. (The ClustrixDB term for a table is "relation.")
This file represents ClustrixDB's permanent storage and is used for all database data, undo logs, temporary tables, binlogs, and ClustrixDB system usage.
The write-ahead log (WAL) is stored in this file.
ClustrixDB uses this temporary storage for sorting and grouping of large query results.
Refers to the ability for ClustrixDB to perform aggregate queries (e.g. OLAP) in a distributed manner.
The distribution key is some prefix of a representation's key columns and is used to distribute data across the cluster.
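The idea of hashing a key prefix can be sketched as follows. Only the first `dist_key_len` key columns feed the hash, so rows that share that prefix always land on the same slice. SHA-256, the column names, and the slice count are illustrative assumptions, not ClustrixDB's actual hash function or schema.

```python
import hashlib

def slice_for_row(row, key_columns, dist_key_len, num_slices):
    """Return a slice number by hashing the first dist_key_len key columns.

    SHA-256 is a stand-in; ClustrixDB's actual hash function is not specified here.
    """
    prefix = tuple(row[col] for col in key_columns[:dist_key_len])
    digest = hashlib.sha256(repr(prefix).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices

key = ["customer_id", "order_id"]   # hypothetical key columns
a = slice_for_row({"customer_id": 7, "order_id": 1}, key, dist_key_len=1, num_slices=8)
b = slice_for_row({"customer_id": 7, "order_id": 2}, key, dist_key_len=1, num_slices=8)
print(a == b)  # True: same distribution-key prefix, same slice
```

A shorter prefix keeps related rows together, which can help joins on that prefix. A longer prefix spreads the rows more evenly.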
A database is durable if it provides a guarantee that transactions that have been committed will survive permanently, even through unexpected power loss or hardware failure.
This describes how a query is evaluated in ClustrixDB.
ClustrixDB component that prevents long-running queries from monopolizing CPU resources by giving priority to any waiting small queries.
The GTM and other subsystems use flow control to prevent message senders from outpacing receivers and to prevent receiving nodes' memory from filling up with unprocessed messages.
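One common way to implement flow control is with credits. The receiver grants one credit per free buffer slot, and the sender stops when it runs out. This sketch assumes that credit-based scheme for illustration. The glossary does not say which mechanism ClustrixDB uses.

```python
from collections import deque

class Receiver:
    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffer = deque()

    def deliver(self, msg):
        # The sender's credit accounting keeps this from overflowing
        if len(self.buffer) >= self.buffer_size:
            raise OverflowError("receiver buffer overflow")
        self.buffer.append(msg)

    def process_one(self):
        """Handle one buffered message and hand one credit back to the sender."""
        self.buffer.popleft()
        return 1

class Sender:
    def __init__(self, receiver):
        self.receiver = receiver
        self.credits = receiver.buffer_size  # may send this many before waiting

    def try_send(self, msg):
        if self.credits == 0:
            return False  # receiver has not caught up; sender must wait
        self.credits -= 1
        self.receiver.deliver(msg)
        return True

rx = Receiver(buffer_size=2)
tx = Sender(rx)
sent = [tx.try_send(i) for i in range(3)]   # third send is refused
tx.credits += rx.process_one()              # receiver frees a slot
resumed = tx.try_send(3)
print(sent, resumed)  # [True, True, False] True
```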
Sending a row or rows to another node for further processing.
A pre-compiled part of a query usually sent to another node for processing.
Global Transaction Manager (GTM)
A subsystem that manages the atomic commitment of transactions across the cluster, and ensures that all nodes involved come to the same decision every time.
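The glossary does not name the GTM's commit protocol. Two-phase commit is the textbook way to make every participant reach the same decision, so this sketch uses it as an assumption. In the first phase each node votes. In the second phase every node applies the single decision: commit only if all voted yes. `Participant` and `two_phase_commit` are hypothetical names.

```python
class Participant:
    """Hypothetical node taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.outcome = None

    def prepare(self, txn_id):
        # Phase 1: vote yes only if this node can make its changes durable
        return self.can_commit

    def finish(self, txn_id, decision):
        # Phase 2: every node applies the same, single decision
        self.outcome = decision

def two_phase_commit(txn_id, participants):
    votes = [p.prepare(txn_id) for p in participants]
    decision = "commit" if all(votes) else "abort"
    for p in participants:
        p.finish(txn_id, decision)
    return decision

healthy = [Participant("n1"), Participant("n2")]
mixed = [Participant("n1"), Participant("n2"), Participant("n3", can_commit=False)]
r1 = two_phase_commit(1, healthy)
r2 = two_phase_commit(2, mixed)
print(r1, r2)  # commit abort
```

A production protocol must also survive a coordinator or participant failing between the two phases. This sketch ignores that case.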
A Group Change is the event that occurs when a cluster forms a new group of nodes. This occurs when a node joins or leaves the cluster group.
ClustrixDB regularly sends heartbeat signals to each node of the cluster to ensure each one remains a viable member of the cluster. This check pings core 0 of each node every 2 seconds. If a node fails the heartbeat check, it is temporarily removed from the group until it becomes available again.
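Group membership can be derived from heartbeat timestamps. The 2-second interval comes from the text above. How many missed heartbeats ClustrixDB tolerates before removing a node is not stated, so `MISSED_LIMIT` is a hypothetical parameter. Any change in the returned set corresponds to a group change.

```python
HEARTBEAT_INTERVAL = 2.0  # seconds, as described above
MISSED_LIMIT = 3          # hypothetical: missed heartbeats tolerated before removal

def current_group(last_seen, now, interval=HEARTBEAT_INTERVAL, missed_limit=MISSED_LIMIT):
    """Return the nodes whose most recent heartbeat reply is recent enough."""
    timeout = interval * missed_limit
    return sorted(node for node, seen in last_seen.items() if now - seen <= timeout)

last_seen = {"node1": 100.0, "node2": 99.5, "node3": 92.0}  # reply timestamps
before = current_group(last_seen, now=100.0)   # node3 has been silent for 8 s
last_seen["node3"] = 101.0                     # node3 starts answering again
after = current_group(last_seen, now=102.0)
print(before, after)  # ['node1', 'node2'] ['node1', 'node2', 'node3']
```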
Invocation ID that marks the beginning of a statement within a transaction.
ClustrixDB distributes indexes based on hashing the first column of an index unless advised otherwise. See Distribute.
An invocation represents a single use of the query engine. Typically, queries use a single invocation, but DDL queries and those that call a stored procedure or function can use multiple invocations.
Isolation is a property that defines how and when changes made by one transaction become visible to other concurrent transactions.
Generally refers to a poor data distribution.
Refers to the ability to leverage a large number of processors to perform a set of coordinated computations in parallel.
A ClustrixDB feature that accommodates multiple, simultaneous node failures without data loss. Also known as nResiliency.
A method used to implement concurrency and consistency in a distributed database environment. One of the original papers on this topic is Concurrency Control in Distributed Database Systems. ClustrixDB implements a modified version of this algorithm that provides optimizations for modern database workloads.
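The core of MVCC is a visibility check on row versions. A reader with a snapshot commit ID sees a version only if it was committed at or before the snapshot and not deleted by then. This is the generic textbook rule, not ClustrixDB's modified algorithm. `RowVersion`, `created_cid`, and `deleted_cid` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RowVersion:
    value: str
    created_cid: Optional[int]         # None while the writing transaction is uncommitted
    deleted_cid: Optional[int] = None  # commit ID of the deleting transaction, if any

def visible(version, snapshot_cid):
    """Committed at or before the snapshot, and not deleted as of the snapshot."""
    if version.created_cid is None or version.created_cid > snapshot_cid:
        return False
    return version.deleted_cid is None or version.deleted_cid > snapshot_cid

def read(versions, snapshot_cid):
    return [v.value for v in versions if visible(v, snapshot_cid)]

versions = [
    RowVersion("v1", created_cid=10, deleted_cid=20),  # replaced at CID 20
    RowVersion("v2", created_cid=20),                   # current committed version
    RowVersion("v3", created_cid=None),                 # still uncommitted
]
print(read(versions, 15), read(versions, 25))  # ['v1'] ['v2']
```

Readers never block writers here. Each reader picks the version matching its snapshot, while newer versions are added alongside.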
A single server running the ClustrixDB software. Multiple nodes connect to form a cluster.
Another term for Max_Failures.
Internal Object Identifier used by ClustrixDB to identify a database object. Types, relations, and rows are all examples of objects that have OIDs.
Online Analytical Processing.
Online Transaction Processing.
PD (Probability Distribution)
Probability Distributions are tracked for values in each relation to aid in query planning.
Refers to the status of the cluster when at least two replicas of every slice are available. See also Reprotect.
The job of the optimizer is to determine which execution plan uses the least amount of resources. Typically this is done by assigning costs to a query's plan and then choosing the plan with the lowest cost.
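Choosing a plan by cost can be sketched as follows. The cost combines the I/O, CPU, and latency factors that Sierra is described as using. The weights, units, and plan estimates are hypothetical, since Sierra's actual cost formula is not given here.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    io: float       # estimated I/O units
    cpu: float      # estimated CPU units
    latency: float  # estimated latency units (e.g., network round trips)

# Hypothetical weights for illustration only
WEIGHTS = {"io": 1.0, "cpu": 0.5, "latency": 2.0}

def cost(plan, weights=WEIGHTS):
    return weights["io"] * plan.io + weights["cpu"] * plan.cpu + weights["latency"] * plan.latency

def choose_plan(plans):
    """Return the candidate plan with the lowest estimated cost."""
    return min(plans, key=cost)

plans = [
    Plan("full scan", io=100, cpu=50, latency=1),    # cost 127.0
    Plan("index lookup", io=5, cpu=10, latency=3),   # cost 16.0
]
best = choose_plan(plans)
print(best.name)  # index lookup
```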
ClustrixDB requires a minimum number of the nodes configured for a cluster to be operational at any one time in order to operate as configured. That minimum is called a quorum. It is calculated as one more than half of all the nodes configured for the cluster, with the half rounded down: floor(Total Nodes / 2) + 1. ClustrixDB cannot form a cluster without a quorum.
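The quorum rule is majority arithmetic. This sketch assumes the half is rounded down (integer division), so a 4-node cluster needs 3 nodes and a 5-node cluster also needs 3.

```python
def quorum_size(total_nodes):
    """One more than half the configured nodes, with the half rounded down."""
    return total_nodes // 2 + 1

def has_quorum(available_nodes, total_nodes):
    return available_nodes >= quorum_size(total_nodes)

for n in (3, 4, 5):
    print(n, quorum_size(n))   # 3 2, 4 3, 5 3
print(has_quorum(2, 4))        # False: 2 of 4 nodes is not a majority
```

Requiring a strict majority means two disjoint halves of a partitioned cluster can never both form a cluster.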
A condition that occurs when data is being added to slices while they are actively being split. This increases the amount of work required of the I/O subsystem and potentially impacts performance.
The ClustrixDB Rebalancer automatically moves, copies, redistributes, and re-ranks data across the cluster.
A table in ClustrixDB.
ClustrixDB maintains multiple copies of data for fault tolerance and availability. Such copies or replicas are stored on different nodes.
Every index, including the primary key, is called a “Representation” in ClustrixDB. Each representation is made up of a series of slices. The table's data is stored within the Base Representation.
When a slice has fewer replicas than desired, the Rebalancer will create a new copy of the slice on a different node.
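Reprotect can be sketched as follows: for every under-replicated slice, add a copy on a node that does not already hold one. Picking the least-loaded node, with ties broken by name, is a hypothetical placement policy. The Rebalancer's real heuristics are not described here.

```python
def reprotect(placement, nodes, desired_replicas):
    """Add copies of under-replicated slices on nodes that do not already hold them.

    placement maps slice -> set of live nodes holding a replica; it is updated in place.
    Choosing the least-loaded node is a hypothetical policy for illustration.
    """
    load = {n: 0 for n in nodes}
    for holders in placement.values():
        for n in holders:
            load[n] += 1
    for slice_name in sorted(placement):
        holders = placement[slice_name]
        while len(holders) < desired_replicas:
            candidates = [n for n in nodes if n not in holders]
            if not candidates:
                break  # too few nodes to reach the desired replica count
            target = min(candidates, key=lambda n: (load[n], n))
            holders.add(target)
            load[target] += 1
    return placement

nodes = ["n1", "n2", "n3"]
placement = {"s1": {"n1"}, "s2": {"n2", "n3"}}  # s1 lost a replica on a failed node
reprotect(placement, nodes, desired_replicas=2)
print(sorted(placement["s1"]))  # ['n1', 'n2']
```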
As the data set grows, ClustrixDB will automatically and incrementally redistribute the dataset one or more slices at a time.
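One way to picture splitting a single slice is to assume, as an illustration, that each slice owns a contiguous range of hash values. Under that assumption, a slice that grows past a threshold splits its range in half, and each row moves to the half its hash falls in. The threshold and the range model are hypothetical, not ClustrixDB's documented split rule.

```python
from dataclasses import dataclass, field

@dataclass
class Slice:
    lo: int                                   # inclusive start of the hash range
    hi: int                                   # exclusive end of the hash range
    rows: list = field(default_factory=list)  # (hash_value, row) pairs

MAX_ROWS = 4  # hypothetical split threshold

def maybe_split(s, max_rows=MAX_ROWS):
    """Split a slice's hash range in half once it holds too many rows."""
    if len(s.rows) <= max_rows or s.hi - s.lo < 2:
        return [s]
    mid = (s.lo + s.hi) // 2
    left = Slice(s.lo, mid, [r for r in s.rows if r[0] < mid])
    right = Slice(mid, s.hi, [r for r in s.rows if r[0] >= mid])
    return [left, right]

s = Slice(0, 100, [(h, f"row{h}") for h in (5, 20, 40, 60, 80)])
parts = maybe_split(s)
print([(p.lo, p.hi, len(p.rows)) for p in parts])  # [(0, 50, 3), (50, 100, 2)]
```

Splitting one slice at a time keeps each step small, which matches the incremental behavior described above.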
Relational Intermediate Language, a version of SQL used by the ClustrixDB internals.
Sierra is the name given to the ClustrixDB Query Optimizer.
ClustrixDB breaks up each representation into a collection of logical slices. Rows are assigned to slices according to the results of a hashing function. See also Re-slicing.
An operation that removes a node from a cluster.
Refers to the state of the cluster when it does not have at least two copies (replicas) of each slice.
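The protected and unprotected states defined above reduce to one check: does every slice have at least two available replicas? In this sketch the placement map of slices to the nodes holding them is hypothetical input.

```python
def protection_status(placement, min_replicas=2):
    """Return ('protected', []) or ('unprotected', [slices below min_replicas])."""
    short = sorted(s for s, holders in placement.items() if len(holders) < min_replicas)
    return ("unprotected", short) if short else ("protected", [])

print(protection_status({"s1": {"n1", "n2"}, "s2": {"n2", "n3"}}))  # ('protected', [])
print(protection_status({"s1": {"n1", "n2"}, "s2": {"n3"}}))        # ('unprotected', ['s2'])
```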
Virtual relation, often used to represent system information.
The Write-Ahead Log (WAL) records each change before it is applied to permanent storage, so that committed work can be recovered after a crash.
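The write-ahead rule can be shown with a toy key-value store. Each change is appended to the log before it is applied, and recovery rebuilds state by replaying the log. Here an in-memory list stands in for the durable log file, and the JSON record format is hypothetical. A real WAL must also force each record to stable storage before acknowledging the write.

```python
import json

class WalStore:
    """Toy key-value store: changes are logged before they are applied."""

    def __init__(self, log=None):
        self.log = log if log is not None else []  # stands in for the durable log file
        self.data = {}

    def put(self, key, value):
        # 1. Append the change to the log first...
        self.log.append(json.dumps({"op": "put", "key": key, "value": value}))
        # 2. ...and only then apply it to the in-memory data
        self.data[key] = value

    @classmethod
    def recover(cls, log):
        """Rebuild state after a crash by replaying the surviving log."""
        store = cls(log)
        for line in log:
            rec = json.loads(line)
            if rec["op"] == "put":
                store.data[rec["key"]] = rec["value"]
        return store

store = WalStore()
store.put("a", 1)
store.put("b", 2)
recovered = WalStore.recover(store.log)  # simulate losing store.data in a crash
print(recovered.data)  # {'a': 1, 'b': 2}
```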
xid (Transaction ID)
An identifier used by ClustrixDB internals to denote the logical start of a transaction.