# SpaceTime Benchmark Suite

Standardized benchmarks for measuring and comparing space-time tradeoffs across algorithms and systems.
## Features
- Standard Benchmarks: Sorting, searching, graph algorithms, matrix operations
- Real-World Workloads: Database queries, ML training, distributed computing
- Accurate Measurement: Time, memory (peak/average), cache misses, throughput
- Statistical Analysis: Compare strategies with confidence
- Reproducible Results: Controlled environment, result validation
- Visualization: Automatic plots and analysis
## Installation

```bash
# From the sqrtspace-tools root directory
pip install numpy matplotlib psutil
```

The database benchmarks use the `sqlite3` module, which ships with the Python standard library; no separate install is needed.
## Quick Start

```bash
# Run the quick benchmark suite
python spacetime_benchmarks.py --quick

# Run all benchmarks
python spacetime_benchmarks.py

# Run a specific suite
python spacetime_benchmarks.py --suite sorting

# Analyze saved results
python spacetime_benchmarks.py --analyze results_20240315_143022.json
```
## Benchmark Categories
### 1. Sorting Algorithms

Compare memory-time tradeoffs in sorting:

```text
# Strategies benchmarked:
- standard: In-memory quicksort/mergesort (O(n) space)
- sqrt_n:   External sort with √n buffer (O(√n) space)
- constant: Streaming sort (O(1) space)

# Example results for n=1,000,000:
Standard:  0.125s, 8.0MB memory
√n buffer: 0.187s, 0.3MB memory  (96% less memory, 50% slower)
Streaming: 0.543s, 0.01MB memory (99.9% less memory, 4.3x slower)
```
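The `sqrt_n` strategy can be sketched as an external merge sort: sort √n-sized chunks, spill each to a temporary file, then lazily k-way merge the sorted runs. This is an illustrative sketch under those assumptions, not the suite's actual implementation:

```python
import heapq
import math
import tempfile


def sqrt_n_sort(data):
    """Sort `data` while holding only ~sqrt(n) elements in RAM at a time."""
    n = len(data)
    buffer_size = max(1, math.isqrt(n))
    runs = []
    # Phase 1: sort sqrt(n)-sized chunks and spill each to a temp file.
    for start in range(0, n, buffer_size):
        chunk = sorted(data[start:start + buffer_size])
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(f"{x}\n" for x in chunk)
        f.seek(0)
        runs.append(f)
    # Phase 2: lazy k-way merge of the ~sqrt(n) runs; heapq.merge keeps
    # only one element per run in memory.
    merged = heapq.merge(*((int(line) for line in f) for f in runs))
    result = list(merged)
    for f in runs:
        f.close()
    return result
```

Beyond the output list itself, the working set is the √n-element chunk buffer plus one element per run, matching the O(√n) space bound.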
### 2. Search Data Structures

Compare different index structures:

```text
# Strategies benchmarked:
- hash:     Standard hash table (O(n) space)
- btree:    B-tree index (O(n) space, cache-friendly)
- external: External index with √n cache

# Example results for n=1,000,000:
Hash table: 0.003s per query, 40MB memory
B-tree:     0.008s per query, 35MB memory
External:   0.025s per query, 2MB memory (95% less)
```
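A minimal sketch of the `external` strategy's √n cache, assuming a simple LRU policy over a stand-in backing store (the class and attribute names here are illustrative, not the suite's):

```python
import math
from collections import OrderedDict


class ExternalIndex:
    """Index whose hot entries live in a sqrt(n)-sized LRU cache."""

    def __init__(self, items):
        self.backing_store = dict(items)  # stand-in for an on-disk index
        self.cache = OrderedDict()
        self.cache_size = max(1, math.isqrt(len(self.backing_store)))
        self.disk_reads = 0

    def lookup(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)   # mark as most recently used
            return self.cache[key]
        self.disk_reads += 1              # simulated disk access
        value = self.backing_store[key]
        self.cache[key] = value
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least-recently used
        return value
```

Repeated lookups of hot keys hit the √n cache, which is where the 8x query slowdown buys the 95% memory reduction shown above.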
### 3. Database Operations

Real SQLite database with different cache configurations:

```text
# Strategies benchmarked:
- standard: Default cache size (2000 pages)
- sqrt_n:   √n cache pages
- minimal:  Minimal cache (10 pages)

# Example results for n=100,000 rows:
Standard: 1000 queries in 0.45s, 16MB cache
√n cache: 1000 queries in 0.52s, 1.2MB cache
Minimal:  1000 queries in 1.83s, 0.08MB cache
```
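These cache configurations map directly onto SQLite's `PRAGMA cache_size`, which takes a page count. A minimal sketch of the `sqrt_n` configuration (in-memory database for brevity):

```python
import math
import sqlite3

n = 100_000
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    ((i, f"row-{i}") for i in range(n)),
)

# sqrt(n) pages approximates the suite's sqrt_n strategy; the resulting
# cache size in bytes depends on the page size of your SQLite build.
conn.execute(f"PRAGMA cache_size = {math.isqrt(n)}")

row = conn.execute("SELECT val FROM t WHERE id = ?", (12345,)).fetchone()
```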
### 4. ML Training

Neural network training with memory optimizations:

```text
# Strategies benchmarked:
- standard:            Keep all activations for backprop
- gradient_checkpoint: Recompute activations (√n checkpoints)
- mixed_precision:     FP16 compute, FP32 master weights

# Example results for 50,000 samples:
Standard:        2.3s, 195MB peak memory
Checkpointing:   2.8s, 42MB peak memory (78% less)
Mixed precision: 2.1s, 98MB peak memory (50% less)
```
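The √n tradeoff behind gradient checkpointing can be illustrated with a toy forward pass that stores an activation only every ~√L layers and recomputes the rest on demand. No real gradients are computed here, and the helper names are ours, not the suite's:

```python
import math


def forward(x, layers):
    """Plain forward pass through a list of layer functions."""
    for f in layers:
        x = f(x)
    return x


def checkpointed_activations(x, layers):
    """Store activations only every ~sqrt(L) layers (O(sqrt(L)) memory);
    return a function that recomputes any intermediate on demand."""
    L = len(layers)
    stride = max(1, math.isqrt(L))
    checkpoints = {0: x}
    for i, f in enumerate(layers):
        x = f(x)
        if (i + 1) % stride == 0:
            checkpoints[i + 1] = x  # one of ~sqrt(L) stored activations

    def activation(i):
        """Recompute the activation after layer i-1 from the nearest
        preceding checkpoint (at most sqrt(L) layers of recompute)."""
        j = max(k for k in checkpoints if k <= i)
        a = checkpoints[j]
        for f in layers[j:i]:
            a = f(a)
        return a

    return activation
```

During backprop, each recomputation costs at most √L extra layer evaluations, which is the roughly 20% time overhead seen in the figures above.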
### 5. Graph Algorithms

Graph traversal with memory constraints:

```text
# Strategies benchmarked:
- bfs:            Standard breadth-first search
- dfs_iterative:  Depth-first with explicit stack
- memory_bounded: Limited queue size (like IDA*)

# Example results for n=50,000 nodes:
BFS:     0.18s, 12MB memory (full frontier)
DFS:     0.15s, 4MB memory (stack only)
Bounded: 0.31s, 0.8MB memory (√n queue)
```
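The `memory_bounded` idea of trading recomputation for frontier memory, as in IDA*, can be sketched with plain iterative deepening, which keeps only the current path (O(depth) memory) instead of BFS's full frontier. This is an illustrative variant, not the suite's exact implementation:

```python
def iddfs(graph, start, goal):
    """Iterative-deepening DFS: BFS-like shortest paths in unweighted
    graphs while storing only the current path.

    `graph` maps node -> list of neighbours (illustrative structure).
    """
    def dls(node, depth, path):
        # Depth-limited search along a single path.
        if node == goal:
            return path
        if depth == 0:
            return None
        for nxt in graph.get(node, []):
            if nxt not in path:  # avoid cycles along this path
                found = dls(nxt, depth - 1, path + [nxt])
                if found:
                    return found
        return None

    depth = 0
    while depth <= len(graph):
        found = dls(start, depth, [start])
        if found:
            return found
        depth += 1  # redo shallow work; pay time to save memory
    return None
```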
### 6. Matrix Operations

Cache-aware matrix multiplication:

```text
# Strategies benchmarked:
- standard:  Naive multiplication
- blocked:   Cache-blocked multiplication
- streaming: Row-by-row streaming

# Example results for 2000×2000 matrices:
Standard:  1.2s, 32MB memory
Blocked:   0.8s, 32MB memory (33% faster)
Streaming: 3.5s, 0.5MB memory (98% less memory)
```
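A sketch of the `blocked` strategy: the multiplication is decomposed into tile updates so each tile of A and B is reused while it is still resident in cache. Using NumPy's `@` per tile keeps the sketch short; a real kernel would loop over scalars in C or Fortran:

```python
import numpy as np


def blocked_matmul(A, B, block=64):
    """Cache-blocked matrix multiply. Same O(n^3) arithmetic as the
    naive version; only the memory access order changes."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(0, n, block):
        for j in range(0, m, block):
            for p in range(0, k, block):
                # Tile update: C[i:i+b, j:j+b] += A-tile @ B-tile
                C[i:i + block, j:j + block] += (
                    A[i:i + block, p:p + block] @ B[p:p + block, j:j + block]
                )
    return C
```

Choosing `block` so that three tiles fit in L1/L2 cache is what produces the speedup despite identical operation counts.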
## Running Benchmarks

### Command Line Options

```bash
# Run all benchmarks
python spacetime_benchmarks.py

# Quick benchmarks (subset for testing)
python spacetime_benchmarks.py --quick

# Specific suite only
python spacetime_benchmarks.py --suite sorting
python spacetime_benchmarks.py --suite database
python spacetime_benchmarks.py --suite ml

# With automatic plotting
python spacetime_benchmarks.py --plot

# Analyze previous results
python spacetime_benchmarks.py --analyze results_20240315_143022.json
```
### Programmatic Usage

```python
from spacetime_benchmarks import BenchmarkCategory, BenchmarkRunner, benchmark_sorting

runner = BenchmarkRunner()

# Run a single benchmark
result = runner.run_benchmark(
    name="Custom Sort",
    category=BenchmarkCategory.SORTING,
    strategy="sqrt_n",
    benchmark_func=benchmark_sorting,
    data_size=1000000
)

print(f"Time: {result.time_seconds:.3f}s")
print(f"Memory: {result.memory_peak_mb:.1f}MB")
print(f"Space-Time Product: {result.space_time_product:.1f}")

# Compare strategies
comparisons = runner.compare_strategies(
    name="Sort Comparison",
    category=BenchmarkCategory.SORTING,
    benchmark_func=benchmark_sorting,
    strategies=["standard", "sqrt_n", "constant"],
    data_sizes=[10000, 100000, 1000000]
)

for comp in comparisons:
    print(f"\n{comp.baseline.strategy} vs {comp.optimized.strategy}:")
    print(f"  Memory reduction: {comp.memory_reduction:.1f}%")
    print(f"  Time overhead: {comp.time_overhead:.1f}%")
    print(f"  Recommendation: {comp.recommendation}")
```
### Custom Benchmarks

Add your own benchmarks:

```python
import numpy as np

from spacetime_benchmarks import BenchmarkCategory, BenchmarkRunner


def benchmark_custom_algorithm(n: int, strategy: str = 'standard', **kwargs) -> int:
    """Custom algorithm with space-time tradeoffs."""
    if strategy == 'standard':
        # O(n) space implementation
        data = list(range(n))
        # ... algorithm ...
        return n  # Return operation count
    elif strategy == 'memory_efficient':
        # O(√n) space implementation
        buffer_size = int(np.sqrt(n))
        # ... algorithm ...
        return n

# Register and run
runner = BenchmarkRunner()
runner.compare_strategies(
    "Custom Algorithm",
    BenchmarkCategory.CUSTOM,
    benchmark_custom_algorithm,
    ["standard", "memory_efficient"],
    [1000, 10000, 100000]
)
```
## Understanding Results

### Key Metrics

- Time (seconds): Wall-clock execution time
- Peak Memory (MB): Maximum memory usage during execution
- Average Memory (MB): Average memory usage over the run
- Throughput (ops/sec): Operations completed per second
- Space-Time Product: Memory × Time (lower is better)
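The space-time product is peak memory multiplied by wall time, which is consistent with the sample results file later in this README (8.2 MB × 0.187 s ≈ 1.534). A one-line helper makes comparisons concrete (the helper name is ours, for illustration):

```python
def space_time_product(memory_peak_mb, time_seconds):
    """Memory x Time in MB*s: lower means a better overall tradeoff."""
    return memory_peak_mb * time_seconds


# Using the sorting figures above: sqrt_n is slower but wins overall.
standard = space_time_product(8.0, 0.125)  # 1.0 MB*s
sqrt_n = space_time_product(0.3, 0.187)    # ~0.056 MB*s
```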
### Interpreting Comparisons

```text
Comparison standard vs sqrt_n:
  Memory reduction: 94.3%         # How much less memory
  Time overhead: 47.2%            # How much slower
  Space-time improvement: 91.8%   # Overall efficiency gain
  Recommendation: Use sqrt_n for 94% memory savings
```
### When to Use Each Strategy
| Strategy | Use When | Avoid When |
|---|---|---|
| Standard | Memory abundant, Speed critical | Memory constrained |
| √n Optimized | Memory limited, Moderate slowdown OK | Real-time systems |
| O(log n) | Extreme memory constraints | Random access needed |
| O(1) Space | Streaming data, Minimal memory | Need multiple passes |
## Benchmark Output

### Results File Format

```json
{
  "system_info": {
    "cpu_count": 8,
    "memory_gb": 32.0,
    "l3_cache_mb": 12.0
  },
  "results": [
    {
      "name": "Sorting",
      "category": "sorting",
      "strategy": "sqrt_n",
      "data_size": 1000000,
      "time_seconds": 0.187,
      "memory_peak_mb": 8.2,
      "memory_avg_mb": 6.5,
      "throughput": 5347593.5,
      "space_time_product": 1.534,
      "metadata": {
        "success": true,
        "operations": 1000000
      }
    }
  ],
  "timestamp": 1710512345.678
}
```
### Visualization
Automatic plots show:
- Time complexity curves
- Memory usage scaling
- Space-time product comparison
- Throughput vs data size
## Performance Tips

1. System Preparation:

   ```bash
   # Disable CPU frequency scaling
   sudo cpupower frequency-set -g performance

   # Clear caches
   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
   ```

2. Accurate Memory Measurement:
   - Results include Python interpreter overhead
   - Use `memory_peak_mb` for maximum usage; `memory_avg_mb` shows typical usage

3. Reproducibility:
   - Run multiple times and average
   - Control background processes
   - Use consistent data sizes
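The "run multiple times and average" advice can be wrapped in a small harness (`timed_runs` is a hypothetical helper, not part of the suite):

```python
import statistics
import time


def timed_runs(fn, repeats=5):
    """Run a zero-argument callable several times; report the mean and
    standard deviation of wall-clock seconds (time.perf_counter)."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


mean_s, stdev_s = timed_runs(lambda: sorted(range(100_000)))
```

A large standard deviation relative to the mean usually signals background interference rather than genuine algorithmic variance.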
## Extending the Suite

### Adding New Categories

```python
class BenchmarkCategory(Enum):
    # ... existing categories ...
    CUSTOM = "custom"


def custom_suite(runner: BenchmarkRunner):
    """Run custom benchmarks"""
    strategies = ['approach1', 'approach2']
    data_sizes = [1000, 10000, 100000]

    runner.compare_strategies(
        "Custom Workload",
        BenchmarkCategory.CUSTOM,
        benchmark_custom,
        strategies,
        data_sizes
    )
```
### Platform-Specific Metrics

```python
import platform


def get_cache_misses():
    """Get L3 cache misses (Linux perf)"""
    if platform.system() == 'Linux':
        # Use perf_event_open or read from perf
        pass
    return None
```
## Real-World Insights

From our benchmarks:

- √n strategies typically save 90-99% memory with 20-100% time overhead
- Cache-aware algorithms can be faster in practice even when their asymptotic complexity is no better
- Memory bandwidth often dominates over computational complexity
- The optimal strategy depends on:
  - Data size vs available memory
  - Latency requirements
  - Power/cost constraints
## Troubleshooting

### Memory Measurements Seem Low

- Python may not release memory immediately
- Call `gc.collect()` before benchmarks
- Check for lazy evaluation

### High Variance in Results

- Disable CPU throttling
- Close other applications
- Increase data sizes for stability

### Database Benchmarks Fail

- Ensure write permissions in the output directory
- Check the SQLite installation
- Verify available disk space
## Contributing

Add new benchmarks following this pattern:

1. Implement a `benchmark_*` function
2. Return the operation count
3. Handle the different strategies
4. Add a suite function
5. Update the documentation
## See Also
- SpaceTimeCore: Core calculations
- Profiler: Profile your applications
- Visual Explorer: Visualize tradeoffs