This is a high-level glossary of terms. To get more detail on a particular term, click on one of the related links. To find the information you are looking for, you can also try searching this site. If you would like to see additional terms defined, please email docs@clustrix.com.

Jump to:

A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z

A

ACID

ACID (Atomicity, Consistency, Isolation, Durability) refers to the characteristics of a database that guarantee that transactions are processed reliably.

allnodes

Refers to the ability to set replicas = allnodes.

Atomic

Refers to the characteristic of a transaction that is all or nothing. 

B

B-Tree

Standard computer science data structure used for fast access. See B-Tree at Wikipedia.

Barrier

A barrier is a synchronization method used to control message flow within ClustrixDB. A barrier delineates a group of messages and all nodes must reach that barrier before proceeding.
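As a general illustration of barrier synchronization (not ClustrixDB's internal implementation), the following Python sketch uses the standard library's threading.Barrier. The worker names, the NUM_WORKERS constant, and the events list are hypothetical. No worker records its "after" event until every worker has recorded its "before" event.

```python
import threading

NUM_WORKERS = 3  # hypothetical stand-in for the nodes in a cluster
barrier = threading.Barrier(NUM_WORKERS)
events = []
events_lock = threading.Lock()


def worker(name):
    # Phase 1: work done before the barrier.
    with events_lock:
        events.append(("before", name))
    # Block until all workers have reached this point.
    barrier.wait()
    # Phase 2: starts only after every worker has finished phase 1.
    with events_lock:
        events.append(("after", name))


def run_workers():
    """Run all workers once and return the recorded events in order."""
    events.clear()
    threads = [
        threading.Thread(target=worker, args=(f"node{i}",))
        for i in range(NUM_WORKERS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(events)
```

Calling run_workers() returns a list in which all "before" entries precede all "after" entries, although the order of workers within each phase may vary from run to run.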

Base Representation

The representation that contains all the table data and that is indexed by the primary key is the “Base Representation” or baserep. If no primary key is defined, ClustrixDB assigns a unique rowid key.

BigC

BigC is Clustrix's garbage collection process, which cleans up the undo logs needed to roll back running transactions. ClustrixDB must keep a transaction's state for the entire time the transaction is open. Once a transaction is committed, BigC removes it from the various undo logs, as it is no longer needed.

Long-running transactions can cause BigC to become "pinned". That means that an old transaction must be preserved as it is potentially needed for recovery, yet subsequent activity on the cluster is causing the undo logs to become full.

Broadcast

Broadcasting refers to a method of transferring a message to all recipients simultaneously. ClustrixDB leverages distributed computing to avoid broadcasts. See how ClustrixDB scales joins.

C

cid

Commit Identifier that marks when transactional changes become visible to other transactions.

Cluster

A group of ClustrixDB nodes connected to provide a redundant, scalable RDBMS.

Consistent

A consistent transaction does not violate any referential integrity constraints during its execution.

Container

Containers are the base storage unit used by ClustrixDB. They define how representations are stored and retrieved using an access method such as btrees, layered trees, skiplists, etc. Each slice and replica of a representation will have its own container, regardless of whether or not that container is written to disk.

Cost

ClustrixDB's query optimizer (Sierra) uses a cost-based model in which the cost factor is based on I/O, CPU usage, and latency.

D

Data Distribution  

ClustrixDB leverages fine-grained data distribution and a shared-nothing architecture to provide scalability.

Database

A collection of tables, or relations. This is also sometimes referred to as a schema. (The ClustrixDB term for tables is "relations.")

device1

This file represents ClustrixDB's permanent storage and is used for all database data, undo logs, temporary tables, binlogs, and ClustrixDB system usage.

device1-redo

The write-ahead log (WAL) is stored in this file. 

device1-temp

ClustrixDB uses this temporary storage for sorting and grouping of large query results.

Distributed Aggregates

Refers to the ability of ClustrixDB to perform aggregate queries (e.g. OLAP) in a distributed manner.

Distribution Key

The distribution key is some prefix of a representation's key columns and is used to distribute data across the cluster. 

Durable

A database is durable if it provides a guarantee that transactions that have been committed will survive permanently, even through unexpected power loss or hardware failure. 

E

Evaluation Model 

This describes how a query is evaluated in ClustrixDB. 

F

Fair Scheduler

ClustrixDB component that prevents long-running queries from monopolizing CPU resources by giving priority to any waiting small queries.

Flow Control

The GTM and other subsystems use flow control to prevent message senders from outpacing receivers and to prevent receiving nodes' memory from filling up with unprocessed messages.

Forward

Sending a row or rows to another node for further processing.

Fragment

A pre-compiled part of a query usually sent to another node for processing.

G

Global Transaction Manager (GTM)

A subsystem that manages the atomic commitment of transactions across the cluster, and ensures that all nodes involved come to the same decision every time.

Group Change

A Group Change is the event that occurs when a cluster forms a new group of nodes. This occurs when a node joins or leaves the cluster group.

H

Heartbeat

ClustrixDB regularly sends signals to each node of the cluster to ensure each continues to be a viable component of the cluster. This check pings core 0 of each node every 2 seconds. If a node fails the heartbeat check, it will be temporarily removed from the group until it becomes available again.

I

iid

Invocation Id that marks the beginning of a statement within a transaction.

Index Distribution 

ClustrixDB distributes indexes based on hashing the first column of an index unless advised otherwise. See Distribute.

Invocation

An invocation represents a single use of the query engine. Typically, queries use a single invocation, but DDL queries and those that call a stored procedure or function can use multiple invocations.

Isolation

Isolation is a property that defines how and when changes made by one transaction are visible to other concurrent transactions.

J

K

L

Lumpy

Generally refers to a poor data distribution.

M

Massively Parallel Processing

Refers to the ability to leverage a large number of processors to perform a set of coordinated computations in parallel. 

Max_Failures

A ClustrixDB feature that accommodates multiple, simultaneous node failures without data loss. Also known as nResiliency. 

Multi-Version Concurrency Control (MVCC) 

A method used to implement concurrency and consistency in a distributed database environment. One of the original papers on this topic is Concurrency Control in Distributed Database Systems. ClustrixDB implements a modified version of this algorithm that provides optimizations for modern database workloads. 

N

Node

A single server running the ClustrixDB software. Multiple nodes connect to form a cluster.

nResiliency 

Another term for Max_Failures.

O

OID 

Internal Object Identifier used by ClustrixDB to identify a database object. Types, relations, and rows are all examples of objects that have OIDs. 

OLAP

Online Analytical Processing.

OLTP

Online Transaction Processing.

P

PD (Probability Distribution)

Probability Distributions are tracked for values in each relation to aid in query planning. 

Protected

Refers to the status of the cluster when at least two replicas of every slice are available. See also Reprotect.
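The "at least two replicas of every slice" condition can be expressed as a simple check. The following is a minimal sketch under stated assumptions, not ClustrixDB code: the function name is_protected, the replicas_by_slice mapping, and the node names are all hypothetical.

```python
def is_protected(replicas_by_slice, available_nodes, min_replicas=2):
    """Return True if every slice has at least min_replicas replicas
    located on currently available nodes.

    replicas_by_slice: hypothetical mapping of slice id -> list of node
    names holding a replica of that slice.
    """
    available = set(available_nodes)
    return all(
        len(set(nodes) & available) >= min_replicas
        for nodes in replicas_by_slice.values()
    )


# Hypothetical layout: two slices, each with two replicas.
layout = {"slice_a": ["n1", "n2"], "slice_b": ["n2", "n3"]}
```

With all three nodes available, is_protected(layout, ["n1", "n2", "n3"]) is True. If n3 becomes unavailable, slice_b has only one reachable replica, so the same check returns False and the cluster in this sketch would be considered under-protected.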

Q

Query Optimizer

The job of the optimizer is to determine which execution plan uses the least amount of resources. Typically this is done by assigning costs to a query's plan and then choosing the plan with the lowest cost.
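The "assign costs, then choose the lowest" step can be sketched as follows. This is an illustrative model only: the Plan class, the weighted-sum cost formula, the weights, and the example numbers are assumptions, and Sierra's actual cost model is not described here. The three cost inputs (I/O, CPU, latency) are the factors named under Cost in this glossary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical candidate execution plan with estimated resource use."""
    name: str
    io: float
    cpu: float
    latency: float


def plan_cost(plan, io_weight=1.0, cpu_weight=1.0, latency_weight=1.0):
    # Assumed cost formula: a weighted sum of the three factors.
    return (io_weight * plan.io
            + cpu_weight * plan.cpu
            + latency_weight * plan.latency)


def choose_plan(plans):
    """Return the candidate plan with the lowest estimated cost."""
    return min(plans, key=plan_cost)


candidates = [
    Plan("full_scan", io=50.0, cpu=5.0, latency=2.0),
    Plan("index_lookup", io=3.0, cpu=1.0, latency=2.0),
]
```

Here choose_plan(candidates) selects the index_lookup plan, because its summed cost (6.0) is lower than that of full_scan (57.0).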

Queue (Recovery)

Queues are used to track changes to data that may have occurred for a given node while it was unavailable to the cluster. See also Queue Replay and Queue Flip.

Quorum

ClustrixDB requires a minimum number of the nodes configured for a cluster to be operational at any one time. That minimum is called a quorum, and it is calculated as one more than half of all the nodes configured for the cluster, or (Total Nodes/2 + 1). ClustrixDB cannot form a cluster without a quorum.
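The quorum formula can be worked through in a few lines. This sketch assumes that "half" is rounded down (integer division) when the node count is odd, which the glossary does not state explicitly; the function names are hypothetical.

```python
def quorum_size(total_nodes):
    """Return the quorum for a cluster: Total Nodes / 2 + 1.

    Assumption: the division rounds down for odd node counts.
    """
    if total_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return total_nodes // 2 + 1


def has_quorum(total_nodes, operational_nodes):
    """Return True if enough nodes are operational to form a cluster."""
    return operational_nodes >= quorum_size(total_nodes)
```

Under that assumption, a 5-node cluster has a quorum of 3, so it can continue with 3 operational nodes but not with 2. A 4-node cluster also has a quorum of 3.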

R

Race Condition

A condition that occurs when data is being added to slices while they are actively being split. This increases the amount of work required of the I/O subsystem and potentially impacts performance.

Rebalancer

The ClustrixDB Rebalancer automatically moves, copies, redistributes, and re-ranks data across the cluster. 

Relation

A table in ClustrixDB. 

Replica

ClustrixDB maintains multiple copies of data for fault tolerance and availability. Such copies or replicas are stored on different nodes. 

Representation

Every index, including the primary key, is called a “Representation” in ClustrixDB. Each representation is made up of a series of slices. The table's data is stored within the Base Representation.

Reprotect

When a slice has fewer replicas than desired, the Rebalancer will create a new copy of the slice on a different node.
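The reprotect decision can be sketched as: find slices with too few replicas, then pick a node that does not already hold one. This is a simplified illustration, not the Rebalancer's algorithm. The function plan_reprotect, its arguments, and the "first eligible node" choice are assumptions; a real placement policy would likely also weigh load and capacity.

```python
def plan_reprotect(replicas_by_slice, nodes, desired_replicas=2):
    """Return (slice_id, target_node) pairs describing new copies to make.

    replicas_by_slice: hypothetical mapping of slice id -> nodes that
    currently hold a replica of that slice.
    nodes: all nodes eligible to receive a new replica.
    """
    copies = []
    for slice_id, holders in sorted(replicas_by_slice.items()):
        missing = max(desired_replicas - len(holders), 0)
        # A new replica must go to a node that does not already have one.
        candidates = [n for n in nodes if n not in holders]
        # Assumption: simply take the first eligible nodes.
        for target in candidates[:missing]:
            copies.append((slice_id, target))
    return copies
```

For example, with replicas_by_slice = {"s1": ["n1"], "s2": ["n1", "n2"]} and nodes n1 through n3, only s1 is short of two replicas, and the sketch proposes copying it to n2.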

Reslicing

As the data set grows, ClustrixDB will automatically and incrementally redistribute the dataset one or more slices at a time.

RIGR

Relational Intermediate Language, a version of SQL used by the ClustrixDB internals. 

S

Sierra

Sierra is the name given to the ClustrixDB Query Optimizer.

Slice

ClustrixDB breaks up each representation into a collection of logical slices. Rows are assigned to slices according to the results of a hashing function. See also Reslicing.
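Hash-based slice assignment can be illustrated with a short sketch. The hash function and the modulo mapping below are stand-ins chosen for the example; ClustrixDB's actual hashing function and slice mapping are not specified in this glossary. The name slice_for_key and the key values are hypothetical.

```python
import hashlib


def slice_for_key(distribution_key, num_slices):
    """Map a row's distribution key (a tuple of column values) to a slice.

    Assumption: SHA-256 of the key's repr, reduced modulo the number of
    slices. This only demonstrates the idea of hashing rows to slices.
    """
    encoded = repr(tuple(distribution_key)).encode("utf-8")
    digest = hashlib.sha256(encoded).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


# Rows with the same distribution key always land in the same slice.
first = slice_for_key((42,), 4)
second = slice_for_key((42,), 4)
```

Because the mapping depends only on the distribution key, any node can compute which slice a row belongs to without consulting the others.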

Soft Fail

An operation that removes a node from a cluster. 

T

U

Under-Protected

Refers to the state of the cluster when it does not have at least two copies (replicas) of each slice. 

V

vrel

Virtual relation, often used to represent system information. 

W

WAL 

The Write-Ahead Log (WAL) is used to log every command that the user executes. It is stored in the device1-redo file.

X

xid (Transaction ID) 

An identifier used by ClustrixDB internals to denote the logical start of a transaction.

Y

Z