MemSQL Architecture Reviews

MemSQL’s architecture has three distinct components that contribute to its speed and dual functionality. These three distinct components are:

MemSQL’s architecture has three distinct components that contribute to its speed and dual functionality. These three distinct components are:

  • Distributed, shared-nothing architecture on commodity hardware
  • Hybrid OLTP & OLAP Architecture
  • Lock-free Data Structures

Distributed Shared Nothing Architecture

MemSQL’s architecture is distributed across many individual machines, with each machine possessing its own CPU, memory, and disk; this is commonly known as a shared-nothing architecture, because storage and compute resources aren’t shared across the system.

MemSQL is also designed to run on commodity hardware, which means that the hardware needed to run a MemSQL system is not proprietary and therefore is relatively cheap. This also allows MemSQL administrators to scale the instance cheaply and easily.

Each node within MemSQL will fall under one of two types:

  • Aggregator nodes store metadata such as indexes across the entire system, and are responsible for querying the storage nodes (known as leaves), aggregating the results, and sending them back to the client.
  • Leaf nodes each contain a portion of the larger table. They’re responsible for storing and computing the data located on the node.



Application workloads can be divided amongst aggregators, where one aggregator serves application A, and another serves application B.

Hybrid OLTP & OLAP Architecture

MemSQL contains both an in-memory rowstore for fast, high availability queries and an on-disk columnstore for analytical workloads that involve aggregating data, or where the size of table exceeds memory constraints.

MemSQL is able to join between its rowstore and columnstore, which means that a user querying the MemSQL database will be able surface both real-time and historical data with one query.

Lock-Free Data Structures

Many relational databases impose a “lock” on stored tables when writing data. This is because if a read occurs at the same time data is being written to the table, the data can be output inconsistently. For example if an UPDATE statement is issued on the table at the same time as a SELECT statement, you could potentially read out half written data.

Table locks increase latency for database systems because they necessitate a dedicated database or data warehouse specifically for reading data or performing analytical queries. Data will then be moved in batches into that database at dedicated ETL intervals.

MemSQL implements lock-free data structures, skiplists, and Multiversion Concurrency Control (MVCC) to eliminate the need to lock the database when updating tables, which means that writes can happen at very high throughput while also maintaining a high level of concurrent reads.

Types of data

MemSQL stores data in either row or columnar format, depending on the table DDL*. It can ingest both tabular and nested data structures originating from text (delimited) files, JSON, Avro, and geospatial data.

Range of row (quantity)

MemSQL is capable of billion-row, petabyte-scale tables given enough compute and storage. Keep in mind that costs will increase if you need substantial space in the in-memory rowstore portion of MemSQL.