Block Depth Threading
Overview
Block depth threading is a work distribution algorithm that balances scanner thread workload based on the amount of blockchain data each address needs to process, rather than simply distributing addresses evenly across threads. This approach addresses the performance inefficiency where threads with fewer blocks to scan finish early and become idle while other threads continue processing.
Motivation
The default threading algorithm distributes addresses evenly across threads, treating each address as an equal unit of work. However, addresses have vastly different synchronization requirements: - A fully synced address may only need to process a few recent blocks - A newly added address may need to scan hundreds of thousands of blocks
This imbalance causes threads with mostly synced addresses to finish quickly and remain idle while threads with unsynced addresses continue working, resulting in poor CPU utilization and longer overall sync times.
Configuration
--block-depth-threading: Enable block depth threading algorithm (default: false)--min-block-depth: Minimum block depth value for workload calculations (default: 16)
Usage
Enable block depth threading with default settings:
monero-lws-daemon --block-depth-threading [other options]
Enable with custom minimum block depth:
monero-lws-daemon --block-depth-threading --min-block-depth=32 [other options]
Or in config file:
block-depth-threading=true
min-block-depth=32
Algorithm
Block Depth Calculation
For each address, the block depth is calculated as the number of blocks remaining to be scanned:
blockdepth = max(current_blockchain_height - address_scan_height, min_block_depth)
Where min_block_depth is the value specified by --min-block-depth (default: 16).
Addresses with blockdepth less than the minimum are assigned the minimum value. This prevents edge cases where fully synced addresses would have zero blockdepth, which could cause: - Division by zero or near-zero values in workload calculations - Degenerate cases where many fully-synced accounts get assigned together - Poor workload distribution when most accounts are fully synced
Minimum Block Depth
The --min-block-depth flag sets the minimum block depth value used in workload calculations. This ensures that even fully synced accounts contribute meaningfully to workload balancing.
Default value: 16 blocks
Example: With --min-block-depth=16, an account at the current blockchain height (0 blocks remaining) is treated as having 16 blocks remaining for workload distribution purposes.
When to adjust: You may want to increase this value if you have many fully synced accounts and want them to contribute more to workload balancing, or decrease it if you want more precise distribution for nearly-synced accounts.
Thread Assignment
- Calculate total work: Sum all address blockdepths to get
total_blockdepth - Calculate target per thread:
blockdepth_per_thread = total_blockdepth / thread_count - Sort addresses: Order by blockdepth (smallest first)
- Distribute to threads:
- Addresses are assigned sequentially to threads
- Accounts are added to the current thread until the cumulative depth reaches or exceeds the target
- When target is reached, move to the next thread
- Final thread receives any remaining addresses
This overallocation strategy ensures more balanced workload distribution and better thread utilization throughout the scanning process.
Example
With 4 threads and 20 accounts with varying sync states: - Accounts A-H: 16 blocks each (synced, at minimum) = 128 blocks - Accounts I-L: 100 blocks each = 400 blocks - Accounts M-P: 300 blocks each = 1,200 blocks - Accounts Q-T: 500 blocks each = 2,000 blocks
Old Algorithm (by count, evenly distributed - 5 accounts per thread): - Thread 0: A, B, C, D, E (80 blocks) ✓ finishes immediately - Thread 1: F, G, H, I, J (228 blocks) ⏱ - Thread 2: K, L, M, N, O (1,028 blocks) ⏱⏱ - Thread 3: P, Q, R, S, T (2,100 blocks) ⏱⏱⏱⏱ takes much longer - Problem: Despite equal account count (5 per thread), massive workload imbalance - thread 3 has 26x more work than thread 0
New Algorithm (by depth, balanced workload with alternating over/under allocation): - Total: 3,728 blocks, target: 932 blocks/thread - Thread 0: (even, over-allocate): A, B, C, D, E, F, G, H, I, J, K, L, M, N (1,128 blocks) - Thread 1: (odd, under-allocate): O, P (600 blocks) - Thread 2: (even, over-allocate): Q, R (1,000 blocks) - Thread 3: (odd, under-allocate): S, T (1,000 blocks) - Result: All 4 threads utilized with better balance (600-1,128 vs 80-2,100 blocks), synced accounts efficiently grouped
Benefits
- Improved parallelization: All threads remain active longer
- Reduced sync time: More efficient CPU utilization
- Better resource usage: Eliminates idle threads waiting for others to complete
- Predictable performance: Workload is distributed based on actual work required