LSM Tree vs B+ Tree Notes

How They Work

sequential writes Alt text for the image

Write Process:
- New data is first written to an in-memory buffer called MemTable.
- When MemTable reaches a size threshold, it's flushed to disk as an immutable SST (Sorted String Table) file.
- Periodically, multiple SST files are merged and sorted in a process called compaction.
Read Process:
- Reads check the MemTable first.
- If not found, it checks the most recent SST files, then proceeds through older SSTs.
- Bloom filters are often used to quickly determine which SSTs might contain the data.
Levels:
- Data is organized into multiple levels, with each level containing larger and less frequently updated data.
- Lower levels (closer to memory) are smaller and more frequently updated.

Structure:
- Balanced tree with a root, internal nodes, and leaf nodes.
- Internal nodes contain keys and pointers to child nodes.
- Leaf nodes contain keys and either data or pointers to data.
- Leaf nodes are typically linked for efficient range queries.
Write Process:
- Tree is traversed from root to appropriate leaf.
- If leaf has space, data is inserted.
- If leaf is full, it splits into two nodes, and the split propagates up the tree if necessary.
Read Process:
- Tree is traversed from root to leaf, using keys to guide the search.
- Once the leaf is reached, data can be retrieved directly or via a pointer.

Excellent Write Performance (O(1) amortized):
- Why: Writes are initially to in-memory MemTable, which is very fast.
- Flushing to disk happens in large sequential writes, which are efficient.
- Amortized O(1) because occasional larger operations (flushes and compactions) are spread over many writes.
Good but Potentially Degrading Read Performance (O(log N) worst case):
- Why: May need to check multiple SST files, increasing with the number of levels.
- Bloom filters help, but their effectiveness decreases as data size grows.
- Worst case occurs when data is in the oldest/largest SST level.
High Space Efficiency:
- Why: Compaction process merges and removes obsolete data.
- Sequential storage in SSTs allows for better compression.
Write Amplification:
- Why: Data may be rewritten multiple times during compaction.
- Each level typically involves rewriting the entire level's worth of data.
Read Amplification:
- Why: Might need to check multiple SST files for a single read.
- Can be mitigated by caching and Bloom filters, but still present.

random writes

Good Write Performance (O(log N)):
- Why: Needs to traverse the tree for each write, hence logarithmic.
- Splits and rebalances can cause additional writes, but are infrequent.
Excellent Read Performance (O(log N)):
- Why: Tree structure allows efficient traversal to locate data.
- Height of the tree grows logarithmically with the amount of data.
- Once leaf level is reached, data access is direct and fast.
Moderate Space Efficiency:
- Why: Maintains structure for fast access, which uses some extra space.
- Internal nodes contain only keys and pointers, saving some space.
- Can have some internal fragmentation due to partially filled nodes.
Lower Write Amplification (compared to LSM):
- Why: Updates are often in-place or require minimal restructuring.
- No equivalent to the LSM compaction process.
Minimal Read Amplification:
- Why: Each read operation typically only traverses the tree once.
- No need to check multiple files or structures for a single read.

Ideal for write-heavy workloads due to fast initial writes.
Good for scenarios with high insert rates but less frequent updates or deletes.
Suitable for time-series data, log data, and some types of analytical workloads.