Merged
Commits
61 commits
5fbf9cb
draft
dbsid Dec 27, 2025
f0f39a3
draft
dbsid Dec 29, 2025
5873dc1
update the picures
dbsid Dec 29, 2025
6cbbdee
Update tidb-x-architecture.md
dbsid Dec 29, 2025
6c313cc
Update tidb-x-architecture.md
dbsid Dec 29, 2025
3e45363
Update tidb-x-architecture.md
dbsid Dec 30, 2025
68e520e
Update tidb-x-architecture.md
dbsid Dec 30, 2025
17d9a4a
Apply suggestions from code review
dbsid Dec 30, 2025
5b9cdd5
Update tidb-x-architecture.md
dbsid Jan 2, 2026
d00eb0c
fix typo
dbsid Jan 3, 2026
1024dfe
add RCU
dbsid Jan 4, 2026
399cbd0
Update tidb-x-architecture.md
dbsid Jan 4, 2026
eb684ab
Update tidb-x-architecture.md
dbsid Jan 5, 2026
5cb8894
Update tidb-x-architecture.md
dbsid Jan 5, 2026
cb9f005
add explanation for RF engine and dotted lines
dbsid Jan 6, 2026
0efabae
Update tidb-x-architecture.md
dbsid Jan 6, 2026
bb40adb
update the images
dbsid Jan 7, 2026
019f316
Update tidb-x-architecture.md
dbsid Jan 7, 2026
26f4096
update the images
dbsid Jan 8, 2026
d19bb47
Update tidb-classic-vs-tidb-x-1.png
dbsid Jan 8, 2026
2296d9d
Update tidb-x-architecture.md
dbsid Jan 8, 2026
fa4e9c6
toc: update wording and format
lilin90 Jan 9, 2026
feac79d
Remove unnecessary aliases
lilin90 Jan 9, 2026
4850853
Update tidb-x-architecture.md
dbsid Jan 10, 2026
54d9ed2
Merge remote-tracking branch 'upstream/master' into pr/22245
lilin90 Jan 12, 2026
0203846
Update file placement
lilin90 Jan 12, 2026
40d1fec
Update tidb-cloud/tidb-x-architecture.md
dbsid Jan 13, 2026
b978364
Update tidb-x-architecture.md
dbsid Jan 13, 2026
3bb9a02
Fix body heading levels to pass ci check
lilin90 Jan 19, 2026
4df6e61
Make capitalization style consistent, add a missing para between head…
lilin90 Jan 19, 2026
4154ae9
Update wording
lilin90 Jan 19, 2026
a1ff3c9
Update wording
lilin90 Jan 19, 2026
862dc0a
Fix list format and make capitalization style consistent
lilin90 Jan 19, 2026
94ac35e
Refine wording for body heading
lilin90 Jan 20, 2026
d39c5bd
Apply suggestions from code review
dbsid Jan 22, 2026
4d75283
Apply suggestions from code review
dbsid Jan 22, 2026
7b15faa
Apply suggestions from code review
dbsid Jan 22, 2026
b40c677
media: update three diagrams
lilin90 Jan 23, 2026
ee76f78
Update list format
lilin90 Jan 23, 2026
96358ae
Update format and add anchor link
lilin90 Jan 23, 2026
9230cc5
Mention TiDB X in classic TiDB intro page
lilin90 Jan 23, 2026
4918d7f
Update tidb-x-architecture.md
dbsid Jan 29, 2026
7e8a008
Refine format and wording
lilin90 Jan 29, 2026
6478edb
Add available plans with TiDB X
lilin90 Jan 29, 2026
4f1b978
Update an anchor link
lilin90 Jan 29, 2026
bee7262
Update customcontent plan
lilin90 Jan 29, 2026
9d9703d
Refine wording and format
lilin90 Jan 29, 2026
4ac33f9
Add alibaba cloud
lilin90 Jan 29, 2026
c57be3c
Merge branch 'tidb-x-architecture' of https://github.com/dbsid/docs i…
lilin90 Jan 29, 2026
7377841
Remove an inaccurate billing sentence
lilin90 Jan 29, 2026
bf95cd2
Update tidb-x-architecture.md
dbsid Jan 29, 2026
d9b3135
Merge branch 'tidb-x-architecture' of https://github.com/dbsid/docs i…
dbsid Jan 29, 2026
072766a
Refine wording
lilin90 Jan 29, 2026
7e1d29d
Update format
lilin90 Jan 29, 2026
fa83467
Refine format in table
lilin90 Jan 29, 2026
9f5947f
Update tidb-x-architecture.md
dbsid Jan 29, 2026
2a45889
Remove extra periods
lilin90 Jan 29, 2026
faadb5f
Merge branch 'tidb-x-architecture' of https://github.com/dbsid/docs i…
dbsid Jan 29, 2026
515a9b5
Update link to pass ci
lilin90 Jan 29, 2026
73b4561
Merge branch 'tidb-x-architecture' of https://github.com/dbsid/docs i…
lilin90 Jan 29, 2026
5a12827
Merge branch 'tidb-x-architecture' of https://github.com/dbsid/docs i…
lilin90 Jan 29, 2026
2 changes: 2 additions & 0 deletions TOC-tidb-cloud.md
@@ -802,6 +802,8 @@
- [Computing](/tidb-computing.md)
- [Scheduling](/tidb-scheduling.md)
- [TSO](/tso.md)
- TiDB X Cluster Architecture
- [Overview](/tidb-x-architecture.md)
- Storage Engines
- TiKV
- [TiKV Overview](/tikv-overview.md)
Binary file added media/tidb-x/tidb-classic-vs-tidb-x-1.png
Note: Use sentence case capitalization for diagrams. I’ve highlighted what needs to be updated in bold.

| Original | Suggested | Note |
| --- | --- | --- |
| Classic (Share-Nothing) | Classic TiDB (shared-nothing) | |
| Raft Log | Raft log | In both left and right sides |
| Cloud-Native | cloud-native | |
| Shared-Storage | shared-storage | |
| RF Engine | RF engine | |
| WAL Chunk | WAL chunk | |
| Elastic Compute Pool TiKV Workers | Elastic compute pool TiKV workers | |
| Compact | Compaction | Use a consistent noun style |
| Load data | Data loading | Use a consistent noun style |
| LSM-Tree Engine | LSM-tree engine | |

Ref: https://en.wikipedia.org/wiki/Shared-nothing_architecture

Binary file added media/tidb-x/tidb-classic-vs-tidb-x-2.png

Note: Use sentence case capitalization for diagrams. I’ve highlighted what needs to be updated in bold.

| Original | Suggested | Note |
| --- | --- | --- |
| Classic (Single LSM-tree) | Classic TiDB (single LSM-tree) | |
| Raft Log | Raft log | In both left and right sides |
| LSM-Forest | LSM-forest | |
| LSM-Tree Engine | LSM-tree engine | |

Ref: https://en.wikipedia.org/wiki/Log-structured_merge-tree

Binary file added media/tidb-x/tidb-x-architecture.png

@lilin90 lilin90 Jan 22, 2026


Note: Use sentence case capitalization for diagrams. I’ve highlighted what needs to be updated in bold.

| Original | Suggested | Note |
| --- | --- | --- |
| Isolated SQL Layer | Isolated SQL layer | |
| Shared Cache Layer | Shared cache layer | |
| Object Storage | Shared storage layer | Keep wording consistent with other layers |
| Shared Services | Shared services layer | |
| Row Engine shared resource | Row engine shared resource | |
| Columnar Engine shared resource | Columnar engine shared resource | |
| Shared Storage(object storage) | Object storage | Omit "shared" since it's used in the left |
| Analyze | Statistics collection | To make the operation clear and easy to understand |
111 changes: 111 additions & 0 deletions tidb-x-architecture.md
@@ -0,0 +1,111 @@
---
title: TiDB X Architecture
summary: An introduction to the TiDB X architecture
---

# TiDB X Introduction

TiDB X represents a fundamental architectural shift in TiDB's evolution, transitioning from a classic "shared-nothing" distributed database to a modern "shared-everything" Service-Oriented Architecture (SOA). Designed for the AI era and massive cloud scalability, TiDB X uses object storage (for example, Amazon S3) as the single source of truth.

While the TiDB Classic architecture already decouples storage from compute, TiDB X goes further with a novel "separation of compute and compute" design that isolates online transactional workloads from heavy maintenance tasks. The result is a system that offers elastic scalability, predictable performance, and an optimized Total Cost of Ownership (TCO).

This document details the challenges of TiDB Classic, and the architecture and key innovations of TiDB X.

## Challenges of TiDB Classic

The motivation for TiDB X is documented in the blog post [The Making of TiDB X: Origins, Architecture, and What’s to Come](https://www.pingcap.com/blog/tidbx-origins-architecture/).

TiDB Classic has faced several challenges in large-scale production environments, primarily stemming from its "shared-nothing" architecture.

### Scalability Limitations

In TiDB Classic, scaling out (adding nodes) or scaling in (removing nodes) requires physically copying massive amounts of data (SST files) between nodes. This process is time-consuming for large datasets and can impact online traffic due to the heavy CPU and I/O required to move data.

The underlying storage engine (RocksDB) in TiDB Classic uses a single LSM-tree protected by a global mutex. This creates a scalability ceiling where the system struggles to handle large datasets (e.g., 3TB+ data per tikv node or 100k+ SST files), preventing it from utilizing the full capacity of the hardware.
low

Component names like TiKV should be enclosed in backticks for consistency and to adhere to the style guide. Also, "LSM-tree" is the more common capitalization.

Suggested change
The underlying storage engine (RocksDB) in TiDB Classic uses a single LSM-tree protected by a global mutex. This creates a scalability ceiling where the system struggles to handle large datasets (e.g., 3TB+ data per tikv node or 100k+ SST files), preventing it from utilizing the full capacity of the hardware.
The underlying storage engine (RocksDB) in TiDB Classic uses a single LSM-tree protected by a global mutex. This creates a scalability ceiling where the system struggles to handle large datasets (e.g., 3TB+ data per `TiKV` node or 100k+ SST files), preventing it from utilizing the full capacity of the hardware.


### Stability and Performance Challenges

Heavy write traffic triggers massive local compaction jobs to merge SST files. In the Classic architecture, these compaction jobs run on the same TiKV nodes serving online traffic, consuming significant CPU and I/O resources and can impact the online traffic.
low

The phrase "and can impact the online traffic" is repetitive. I suggest rephrasing to avoid this.

Suggested change
Heavy write traffic triggers massive local compaction jobs to merge SST files. In the Classic architecture, these compaction jobs run on the same TiKV nodes serving online traffic, consuming significant CPU and I/O resources and can impact the online traffic.
Heavy write traffic triggers massive local compaction jobs to merge SST files. In the Classic architecture, these compaction jobs run on the same TiKV nodes serving online traffic, consuming significant CPU and I/O resources, which can impact online traffic.


There is no physical isolation between logical Regions and physical SST files. Operations like adding an index or moving a Region (balancing) create compaction overhead that competes directly with user queries, leading to performance jitter. Under heavy write pressure, if background compaction cannot keep up with foreground write traffic, the system triggers flow control to protect the storage engine, which throttles write throughput and causes latency spikes for the application.
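The flow-control behavior can be pictured as a simple slowdown curve over compaction debt. The following Python sketch is purely illustrative; the function name and thresholds are hypothetical, not actual TiKV configuration:

```python
# Illustrative model of write flow control under compaction debt.
# Thresholds and names are hypothetical, not real TiKV parameters.

SOFT_LIMIT = 64 * 2**30   # start throttling writes above 64 GiB of debt
HARD_LIMIT = 256 * 2**30  # stall writes entirely above 256 GiB

def write_delay_factor(pending_compaction_bytes: int) -> float:
    """Return a slowdown factor in [0, 1]: 1.0 = full speed, 0.0 = stalled."""
    if pending_compaction_bytes <= SOFT_LIMIT:
        return 1.0
    if pending_compaction_bytes >= HARD_LIMIT:
        return 0.0
    # Linearly reduce the allowed write rate between the soft and hard limits.
    span = HARD_LIMIT - SOFT_LIMIT
    return 1.0 - (pending_compaction_bytes - SOFT_LIMIT) / span

# Example: a node with 160 GiB of un-compacted data gets half throughput.
print(write_delay_factor(160 * 2**30))  # 0.5
```

In the classic architecture this throttling hits the same node that serves user queries, which is exactly the jitter described above.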

### Lack of Cost Effectiveness

To keep the production system stable and ensure good performance during peak traffic, customers are forced to over-provision hardware resources. Resources must be planned for the "high water mark" of both online traffic and heavy background tasks. Besides, data size on single tikv nodes is limited, users often have to add more expensive compute nodes just to get more storage capacity, even if they don't need the extra CPU power.
low

Component names like TiKV should be enclosed in backticks. Also, using "In addition" is slightly more formal than "Besides".

Suggested change
To keep the production system stable and ensure good performance during peak traffic, customers are forced to over-provision hardware resources. Resources must be planned for the "high water mark" of both online traffic and heavy background tasks. Besides, data size on single tikv nodes is limited, users often have to add more expensive compute nodes just to get more storage capacity, even if they don't need the extra CPU power.
To keep the production system stable and ensure good performance during peak traffic, customers are forced to over-provision hardware resources. Resources must be planned for the "high water mark" of both online traffic and heavy background tasks. In addition, the data size on single `TiKV` nodes is limited, so users often have to add more expensive compute nodes just to get more storage capacity, even if they don't need the extra CPU power.


### Heavy Background Job Interference

- Resource conflict: Heavy background jobs—such as scale operations, backup, compaction, analyze, and data import (Load Data)—run on the same nodes as foreground OLTP traffic.
- Performance degradation: These tasks are resource-intensive and often interfere with online traffic, causing latency increases or throughput drops.
- Maintenance windows: Due to this interference, administrators often have to schedule maintenance operations (such as index creation or backups) during off-peak hours to avoid impacting the business, reducing operational agility.


## TiDB X Architecture

This architecture represents a modern, cloud-native Share-Everything design that decouples storage from compute and further separates foreground transaction processing from background maintenance tasks.

![TiDB X Architecture](/media/tidb-x/tidb-x-architecture.png)

### Object Storage Support

As depicted in the "Object storage" layer of the diagram, TiDB X uses object storage (such as Amazon S3) as the single source of truth for all data. Unlike the classic architecture, where data resides on local disks, the persistent copy of all data is stored in the shared object storage layer. The "Shared Cache Layer" above it (the Row Engine and Columnar Engine) acts as a high-performance cache to ensure low latency. Because the authoritative data is already in durable object storage, a backup simply relies on incremental Raft logs and metadata stored in S3, so backups finish in seconds regardless of data volume. New compute or cache nodes can come online almost instantly because they do not need to physically copy data from other nodes: they simply connect to the object storage and load the necessary data, making scale-out operations significantly faster.
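The "backups finish in seconds" property follows from backups being metadata-only. The following sketch illustrates the idea with hypothetical object names and fields; it is not the actual TiDB X backup format:

```python
import json
import time

# Hypothetical illustration: a "backup" records which objects (SST files and
# Raft WAL chunks) in object storage make up a consistent snapshot, instead
# of copying the data itself. Names and fields are illustrative only.

def create_backup(manifest: dict) -> str:
    """Snapshot = the current manifest plus a timestamp; no data is copied."""
    backup = {
        "created_at": time.time(),
        "sst_objects": list(manifest["sst_objects"]),
        "wal_chunks": list(manifest["wal_chunks"]),
    }
    return json.dumps(backup)  # tiny, so it completes in seconds

manifest = {
    "sst_objects": ["s3://bucket/region-1/000042.sst"],
    "wal_chunks": ["s3://bucket/wal/chunk-0007"],
}
snapshot = create_backup(manifest)
print(len(json.loads(snapshot)["sst_objects"]))  # 1 object referenced, 0 bytes copied
```

The cost of such a backup scales with the number of object references, not with the data volume, which is why it stays fast as the cluster grows.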

### Auto-Scaling Mechanism

The architecture is designed for elasticity, facilitated by the "Load balancer" and the stateless nature of the "Isolated SQL Layer".

- Scaling within seconds: Since compute nodes (in the SQL layer) are decoupled from the data (in object storage), the system can auto-scale by adding or removing compute pods in seconds to match real-time workload demands.
- Pay-as-you-go model: This elasticity enables a true consumption-based pricing model. Users no longer need to provision for peak load 24/7; the system automatically provisions resources during traffic spikes and scales down during quiet periods to minimize costs.
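A target-tracking scaling rule of the kind described above can be sketched as follows. The target utilization and pod bounds are illustrative assumptions, not TiDB X defaults:

```python
import math

def desired_pods(current_pods: int, cpu_percent: int,
                 target: int = 60, min_pods: int = 1, max_pods: int = 32) -> int:
    """Target-tracking rule (same shape as the Kubernetes HPA formula):
    scale the pod count proportionally to observed/target utilization."""
    raw = current_pods * (cpu_percent / target)
    return max(min_pods, min(max_pods, math.ceil(raw)))

print(desired_pods(4, 90))  # traffic spike: scale out to 6 pods
print(desired_pods(6, 20))  # quiet period: scale in to 2 pods
```

Because compute is stateless with respect to the data, adding or removing a pod here is cheap; in a shared-nothing design the same decision would trigger data movement.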

### Microservice and Workload Isolation

The architecture diagram highlights a sophisticated separation of duties, ensuring that different types of work do not interfere with each other.

- Isolated SQL layer: The top "Isolated SQL Layer" shows separate groups of compute nodes. This allows for multi-tenancy or workload isolation, where different applications or users can have dedicated compute resources while sharing the same underlying data.
- Shared services (microservices): The bottom layer, "Shared Services", breaks down heavy database tasks into independent microservices such as Compaction, Analyze, and DDL.
- Zero impact from heavy tasks: Expensive background operations—such as adding an index, online DDL, or massive data imports—are offloaded to the Shared Services layer. This ensures that heavy jobs never compete for CPU or memory with the "Compute" nodes serving online user traffic, guaranteeing predictable performance for critical applications.
- Independent scaling: Each component (Gateway, SQL Compute, Cache, Background Services) can be scaled independently based on specific bottlenecks, and failovers are smoother because services are loosely coupled.

## TiDB X Key Innovations

The following figure shows the key architectural differences between TiDB Classic and TiDB X.

![TiDB Classic vs TiDB X](/media/tidb-x/tidb-classic-vs-tidb-x-1.png)

### Separation of Compute and Compute

While TiDB Classic already separated compute (SQL) from storage (TiKV), TiDB X introduces a secondary layer of separation within the compute layer itself:

- Lightweight Compute: Dedicated resources for lightweight OLTP workloads (user queries).
- Heavy Compute: A separate "Elastic Compute Pool" for heavy jobs (e.g., compaction, backups, scale operations, analyze, load data, and slow queries).

By offloading heavy tasks to the elastic compute pool, TiDB X ensures that maintenance tasks and heavy background jobs do not interfere with online transaction performance. The system delivers stable, predictable latency for OLTP workloads even during heavy operations.
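The split described above can be pictured as a dispatcher that routes each task to the right pool. Everything in this sketch is hypothetical: the task kinds, the cost threshold, and the pool names are illustrative, not TiDB X internals:

```python
from dataclasses import dataclass

# Hypothetical dispatcher illustrating "separation of compute and compute":
# lightweight OLTP work stays on dedicated nodes, heavy jobs go to the
# elastic pool. Task names and the cost cutoff are illustrative only.

HEAVY_KINDS = {"compaction", "backup", "analyze", "load_data", "scale"}
SLOW_QUERY_COST = 1_000_000  # estimated rows scanned; hypothetical cutoff

@dataclass
class Task:
    kind: str
    estimated_cost: int = 0

def route(task: Task) -> str:
    """Heavy maintenance jobs and slow queries go to the elastic pool."""
    if task.kind in HEAVY_KINDS or task.estimated_cost >= SLOW_QUERY_COST:
        return "elastic-compute-pool"
    return "oltp-nodes"

print(route(Task("point_get", estimated_cost=1)))      # oltp-nodes
print(route(Task("compaction")))                       # elastic-compute-pool
print(route(Task("query", estimated_cost=5_000_000)))  # elastic-compute-pool
```

The key property is that nothing on the right-hand branch ever consumes CPU on the OLTP nodes, which is what keeps foreground latency predictable.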

### Transition to "Share-Everything" Architecture

TiDB X moves away from the classic "shared-nothing" architecture (where data is copied between TiKV nodes) to a modern "shared-everything" model, with object storage as the single source of truth: all persistent data resides in object storage (such as S3) rather than on local disks. This removes the need for physical data copying during scaling, enabling rapid elasticity.

The introduction of object storage does not impact the performance of foreground read and write operations. For reads, only heavy read requests are offloaded to the remote elastic coprocessor workers. For writes, the interaction with object storage is asynchronous and does not affect write performance: the Raft log is persisted on local disk first, and the Raft WAL (Write-Ahead Log) chunks are uploaded to object storage in the background later. When a MemTable is full and flushed to local disk, the Region leader uploads the SST file to object storage. After a remote compaction finishes on the elastic compaction workers, the TiKV nodes are notified to load the compacted SST files from object storage.
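The asynchronous write path above can be sketched step by step. This is a toy model; the function names, queue, and in-memory stores are illustrative, and the real logic lives inside TiKV:

```python
from collections import deque

# Toy model of the write path: foreground work touches only local disk;
# object-storage uploads happen later in the background. All names here
# are hypothetical illustrations, not TiKV internals.

local_disk = []          # Raft log persisted synchronously (foreground)
upload_queue = deque()   # background uploads to object storage
object_storage = {}

def append_raft_log(entry: str) -> None:
    """Foreground: durable on local disk before the write is acknowledged."""
    local_disk.append(entry)
    upload_queue.append(("wal-chunk", entry))  # uploaded later, asynchronously

def flush_memtable(region: str, sst_name: str) -> None:
    """When a MemTable fills, the Region leader enqueues the SST upload."""
    upload_queue.append(("sst", f"{region}/{sst_name}"))

def background_uploader() -> None:
    """Background: drain the queue; foreground latency is unaffected."""
    while upload_queue:
        kind, name = upload_queue.popleft()
        object_storage[name] = kind

append_raft_log("put k1=v1")
flush_memtable("region-1", "000001.sst")
print(len(local_disk), len(object_storage))  # 1 0  (ack happens before any upload)
background_uploader()
print(sorted(object_storage))                # ['put k1=v1', 'region-1/000001.sst']
```

The point of the sketch is the ordering: the write is acknowledged as soon as the Raft log hits local disk, while all object-storage traffic drains off the critical path.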


### Elastic Scalability (5x-10x Faster)

Because data resides in shared object storage, adding or removing nodes no longer requires massive data migration between machines. Scale-in and scale-out operations are 5 to 10 times faster than TiDB Classic and have zero impact on live traffic.

### Elastic TCO (Pay-As-You-Go)

TiDB Classic required over-provisioning hardware to handle peak traffic and background tasks (like compaction overhead) simultaneously. TiDB X enables auto-scaling, allowing users to pay only for the resources they use (Pay-As-You-Go). Background resources for heavy jobs spin up on demand and spin down when finished, eliminating wasted cost.

### From LSM-Tree to LSM-Forest

In the classic architecture, every TiKV node runs a single, massive RocksDB instance. This means data from thousands of different Regions (logical data shards) is mixed together into one giant single-LSM-tree structure. Because data is mixed, operations like moving a Region, scaling in or out, or importing data require rewriting massive amounts of existing data (compaction) to separate or merge it. This consumes huge CPU and I/O resources and impacts online traffic. The single LSM-tree is also protected by a global mutex: as data size grows (3 TB+) or file count increases (100k+ SST files), contention on this global lock impacts both read and write operations.

TiDB X completely redesigns the storage engine by moving from a single tree to an LSM-forest. Instead of one giant tree for all data, TiDB X assigns each Region its own separate, independent LSM-tree. The most critical benefit of this physical isolation is the elimination of compaction overhead during cluster operations (scale-in, scale-out, Region movement, data loading). Operations on one Region (such as a heavy write or a split) are isolated to its specific tree, and there is no global mutex contention.

![TiDB Classic vs TiDB X](/media/tidb-x/tidb-classic-vs-tidb-x-2.png)
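The contrast can be made concrete with a toy model, where moving a Region in the forest design is a metadata handoff rather than a per-key rewrite. The class, method names, and counts below are illustrative only:

```python
# Toy contrast sketch: a per-Region "forest" instead of one shared tree.
# Plain dicts stand in for real LSM-trees; everything here is illustrative.

class LsmForest:
    def __init__(self):
        self.trees = {}  # Region id -> its own independent "tree"

    def put(self, region: str, key: str, value: str) -> None:
        self.trees.setdefault(region, {})[key] = value

    def move_region(self, region: str, target: "LsmForest") -> int:
        """Hand the whole tree to another node: no per-key rewrite needed."""
        target.trees[region] = self.trees.pop(region)
        return 0  # number of keys rewritten

node_a, node_b = LsmForest(), LsmForest()
for i in range(1000):
    node_a.put("region-7", f"k{i}", "v")

rewritten = node_a.move_region("region-7", node_b)
print(rewritten, len(node_b.trees["region-7"]))  # 0 1000
```

In the single-tree model, the same move would have to filter Region 7's keys out of the shared structure and compact them back in on the target, which is exactly the overhead the LSM-forest removes.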