# SpaceTime Benchmark Suite
Standardized benchmarks for measuring and comparing space-time tradeoffs across algorithms and systems.
## Features
- **Standard Benchmarks**: Sorting, searching, graph algorithms, matrix operations
- **Real-World Workloads**: Database queries, ML training, distributed computing
- **Accurate Measurement**: Time, memory (peak/average), cache misses, throughput
- **Statistical Analysis**: Compare strategies with confidence
- **Reproducible Results**: Controlled environment, result validation
- **Visualization**: Automatic plots and analysis
## Installation
```bash
# From sqrtspace-tools root directory
pip install numpy matplotlib psutil

# Database benchmarks use sqlite3, which ships with the Python standard library
```
## Quick Start
```bash
# Run quick benchmark suite
python spacetime_benchmarks.py --quick
# Run all benchmarks
python spacetime_benchmarks.py
# Run specific suite
python spacetime_benchmarks.py --suite sorting
# Analyze saved results
python spacetime_benchmarks.py --analyze results_20240315_143022.json
```
## Benchmark Categories
### 1. Sorting Algorithms
Compare memory-time tradeoffs in sorting:
```python
# Strategies benchmarked:
- standard: In-memory quicksort/mergesort (O(n) space)
- sqrt_n: External sort with √n buffer (O(√n) space)
- constant: Streaming sort (O(1) space)
# Example results for n=1,000,000:
Standard: 0.125s, 8.0MB memory
√n buffer: 0.187s, 0.3MB memory (96% less memory, 50% slower)
Streaming: 0.543s, 0.01MB memory (99.9% less memory, 4.3x slower)
```
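A minimal sketch of the idea behind the `sqrt_n` strategy (illustrative only, not the suite's implementation in `benchmark_sorting`): sort chunks of roughly √n items, spill each sorted run to disk, then stream a k-way merge over the runs.
```python
import heapq
import math
import os
import tempfile

def sqrt_buffer_sort(items):
    """Yield items in sorted order while holding only ~√n of them in memory."""
    n = len(items)
    buffer_size = max(1, int(math.sqrt(n)))
    run_paths = []

    # Phase 1: sort chunks of ~√n items and spill each sorted run to disk
    for start in range(0, n, buffer_size):
        chunk = sorted(items[start:start + buffer_size])
        fd, path = tempfile.mkstemp()
        with os.fdopen(fd, 'w') as f:
            f.writelines(f"{x}\n" for x in chunk)
        run_paths.append(path)

    # Phase 2: stream a k-way merge over the runs, one item per run in RAM
    def read_run(path):
        with open(path) as f:
            for line in f:
                yield int(line)

    try:
        yield from heapq.merge(*(read_run(p) for p in run_paths))
    finally:
        for path in run_paths:
            os.remove(path)

print(list(sqrt_buffer_sort([5, 3, 8, 1, 9, 2, 7])))  # [1, 2, 3, 5, 7, 8, 9]
```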
### 2. Search Data Structures
Compare different index structures:
```python
# Strategies benchmarked:
- hash: Standard hash table (O(n) space)
- btree: B-tree index (O(n) space, cache-friendly)
- external: External index with √n cache
# Example results for n=1,000,000:
Hash table: 0.003s per query, 40MB memory
B-tree: 0.008s per query, 35MB memory
External: 0.025s per query, 2MB memory (95% less)
```
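The `external` strategy keeps the full index on disk and caches roughly √n hot entries in memory. A hypothetical sketch of that pattern (the `ExternalIndex` class below is illustrative, not part of the suite):
```python
import math
import shelve
from collections import OrderedDict

class ExternalIndex:
    """Disk-backed key/value index with a √n-entry LRU cache (illustrative)."""

    def __init__(self, path, expected_n):
        self._store = shelve.open(path)      # on-disk index
        self._cache = OrderedDict()          # hot entries kept in RAM
        self._cache_size = max(1, int(math.sqrt(expected_n)))

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)     # mark as recently used
            return self._cache[key]
        value = self._store[key]             # disk lookup on cache miss
        self._cache[key] = value
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)  # evict least recently used
        return value

    def close(self):
        self._store.close()

idx = ExternalIndex('demo_index', expected_n=1_000_000)
idx.put('user:42', {'name': 'Ada'})
print(idx.get('user:42'))
idx.close()
```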
### 3. Database Operations
Real SQLite database with different cache configurations:
```python
# Strategies benchmarked:
- standard: Default cache size (2000 pages)
- sqrt_n: √n cache pages
- minimal: Minimal cache (10 pages)
# Example results for n=100,000 rows:
Standard: 1000 queries in 0.45s, 16MB cache
√n cache: 1000 queries in 0.52s, 1.2MB cache
Minimal: 1000 queries in 1.83s, 0.08MB cache
```
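The `sqrt_n` configuration amounts to setting SQLite's page cache to about √n pages. A small sketch of how that can be done (the helper function is illustrative; the benchmark's exact PRAGMA settings may differ):
```python
import math
import sqlite3

def open_with_sqrt_cache(db_path, expected_rows):
    """Open a SQLite connection whose page cache holds ~√n pages (illustrative)."""
    conn = sqlite3.connect(db_path)
    cache_pages = max(10, int(math.sqrt(expected_rows)))
    # A positive value sets the cache size in pages (page size is typically 4 KiB)
    conn.execute(f"PRAGMA cache_size = {cache_pages}")
    return conn

conn = open_with_sqrt_cache("benchmark.db", expected_rows=100_000)
print(conn.execute("PRAGMA cache_size").fetchone())  # e.g. (316,)
conn.close()
```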
### 4. ML Training
Neural network training with memory optimizations:
```python
# Strategies benchmarked:
- standard: Keep all activations for backprop
- gradient_checkpoint: Recompute activations (√n checkpoints)
- mixed_precision: FP16 compute, FP32 master weights
# Example results for 50,000 samples:
Standard: 2.3s, 195MB peak memory
Checkpointing: 2.8s, 42MB peak memory (78% less)
Mixed precision: 2.1s, 98MB peak memory (50% less)
```
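Conceptually, gradient checkpointing stores only every √L-th activation and recomputes the rest during the backward pass. A toy sketch with generic layer callables (illustrative only; the benchmark runs a real training loop):
```python
import math

def checkpointed_forward(layers, x):
    """Forward pass that keeps only ~√L activations as checkpoints."""
    stride = max(1, int(math.sqrt(len(layers))))
    checkpoints = {0: x}
    for i, layer in enumerate(layers):
        x = layer(x)
        if (i + 1) % stride == 0:
            checkpoints[i + 1] = x    # keep every √L-th activation only
    return x, checkpoints

def recompute_segment(layers, checkpoints, target):
    """Rebuild the activation at `target` from the nearest stored checkpoint."""
    start = max(i for i in checkpoints if i <= target)
    x = checkpoints[start]
    for layer in layers[start:target]:
        x = layer(x)
    return x

# Toy "layers": each lambda captures its own k via the default argument
layers = [lambda v, k=k: v * 2 + k for k in range(9)]
out, ckpts = checkpointed_forward(layers, 1.0)
print(out, recompute_segment(layers, ckpts, 5))
```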
### 5. Graph Algorithms
Graph traversal with memory constraints:
```python
# Strategies benchmarked:
- bfs: Standard breadth-first search
- dfs_iterative: Depth-first with explicit stack
- memory_bounded: Limited queue size (like IDA*)
# Example results for n=50,000 nodes:
BFS: 0.18s, 12MB memory (full frontier)
DFS: 0.15s, 4MB memory (stack only)
Bounded: 0.31s, 0.8MB memory (√n queue)
```
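A sketch of the bounded-frontier idea behind `memory_bounded` (illustrative; the suite's traversal may differ): cap the active queue at about √n nodes and defer overflow neighbours to later passes. The visited set here is still O(n); the point is limiting the live frontier.
```python
import math
from collections import deque

def memory_bounded_bfs(graph, start):
    """BFS whose active queue is capped at ~√n nodes (illustrative)."""
    n = len(graph)
    limit = max(1, int(math.sqrt(n)))
    visited = {start}
    frontier = deque([start])
    deferred = []            # overflow nodes, processed in later passes
    order = []

    while frontier or deferred:
        if not frontier:
            # Refill the frontier from the overflow list, √n nodes at a time
            frontier.extend(deferred[:limit])
            deferred = deferred[limit:]
        node = frontier.popleft()
        order.append(node)
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                if len(frontier) < limit:
                    frontier.append(nb)
                else:
                    deferred.append(nb)
    return order

graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(memory_bounded_bfs(graph, 0))   # [0, 1, 2, 3]
```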
### 6. Matrix Operations
Cache-aware matrix multiplication:
```python
# Strategies benchmarked:
- standard: Naive multiplication
- blocked: Cache-blocked multiplication
- streaming: Row-by-row streaming
# Example results for 2000×2000 matrices:
Standard: 1.2s, 32MB memory
Blocked: 0.8s, 32MB memory (33% faster)
Streaming: 3.5s, 0.5MB memory (98% less memory)
```
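The `blocked` strategy multiplies tile by tile so each working set fits in cache. A sketch of the tiling pattern in NumPy (the block size of 256 is an assumed tuning value; NumPy's `@` is already heavily optimized, so the pattern matters most in compiled inner loops):
```python
import numpy as np

def blocked_matmul(a, b, block=256):
    """Cache-blocked matrix multiply over block × block tiles (illustrative)."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2
    c = np.zeros((n, m), dtype=a.dtype)
    for i in range(0, n, block):
        for j in range(0, m, block):
            for p in range(0, k, block):
                # Accumulate the contribution of one tile pair
                c[i:i+block, j:j+block] += (
                    a[i:i+block, p:p+block] @ b[p:p+block, j:j+block]
                )
    return c

a = np.random.rand(512, 512)
b = np.random.rand(512, 512)
print(np.allclose(blocked_matmul(a, b), a @ b))   # True
```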
## Running Benchmarks
### Command Line Options
```bash
# Run all benchmarks
python spacetime_benchmarks.py
# Quick benchmarks (subset for testing)
python spacetime_benchmarks.py --quick
# Specific suite only
python spacetime_benchmarks.py --suite sorting
python spacetime_benchmarks.py --suite database
python spacetime_benchmarks.py --suite ml
# With automatic plotting
python spacetime_benchmarks.py --plot
# Analyze previous results
python spacetime_benchmarks.py --analyze results_20240315_143022.json
```
### Programmatic Usage
```python
from spacetime_benchmarks import (
    BenchmarkCategory,
    BenchmarkRunner,
    benchmark_sorting,
)

runner = BenchmarkRunner()

# Run single benchmark
result = runner.run_benchmark(
    name="Custom Sort",
    category=BenchmarkCategory.SORTING,
    strategy="sqrt_n",
    benchmark_func=benchmark_sorting,
    data_size=1000000
)

print(f"Time: {result.time_seconds:.3f}s")
print(f"Memory: {result.memory_peak_mb:.1f}MB")
print(f"Space-Time Product: {result.space_time_product:.1f}")

# Compare strategies
comparisons = runner.compare_strategies(
    name="Sort Comparison",
    category=BenchmarkCategory.SORTING,
    benchmark_func=benchmark_sorting,
    strategies=["standard", "sqrt_n", "constant"],
    data_sizes=[10000, 100000, 1000000]
)

for comp in comparisons:
    print(f"\n{comp.baseline.strategy} vs {comp.optimized.strategy}:")
    print(f"  Memory reduction: {comp.memory_reduction:.1f}%")
    print(f"  Time overhead: {comp.time_overhead:.1f}%")
    print(f"  Recommendation: {comp.recommendation}")
```
## Custom Benchmarks
Add your own benchmarks:
```python
import numpy as np

from spacetime_benchmarks import BenchmarkCategory, BenchmarkRunner

def benchmark_custom_algorithm(n: int, strategy: str = 'standard', **kwargs) -> int:
    """Custom algorithm with space-time tradeoffs"""
    if strategy == 'standard':
        # O(n) space implementation
        data = list(range(n))
        # ... algorithm ...
        return n  # Return operation count
    elif strategy == 'memory_efficient':
        # O(√n) space implementation
        buffer_size = int(np.sqrt(n))
        # ... algorithm ...
        return n

# Register and run
runner = BenchmarkRunner()
runner.compare_strategies(
    "Custom Algorithm",
    BenchmarkCategory.CUSTOM,
    benchmark_custom_algorithm,
    ["standard", "memory_efficient"],
    [1000, 10000, 100000]
)
```
## Understanding Results
### Key Metrics
1. **Time (seconds)**: Wall-clock execution time
2. **Peak Memory (MB)**: Maximum memory usage during execution
3. **Average Memory (MB)**: Average memory over execution
4. **Throughput (ops/sec)**: Operations completed per second
5. **Space-Time Product**: Memory × Time (lower is better)
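As an illustration, the derived metrics can be reproduced from the raw ones (numbers adapted from the example results file below; the MB·s formula follows the Memory × Time definition above, and small rounding differences come from the stored values being truncated):
```python
# Illustrative: reproducing the derived metrics from the raw ones
time_seconds = 0.187        # from the example results file below
memory_peak_mb = 8.2
operations = 1_000_000

throughput = operations / time_seconds               # ops/sec
space_time_product = memory_peak_mb * time_seconds   # MB * s, lower is better

print(f"{throughput:.1f} ops/s, {space_time_product:.3f} MB*s")
```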
### Interpreting Comparisons
```
Comparison standard vs sqrt_n:
Memory reduction: 94.3% # How much less memory
Time overhead: 47.2% # How much slower
Space-time improvement: 91.8% # Overall efficiency gain
Recommendation: Use sqrt_n for 94% memory savings
```
### When to Use Each Strategy
| Strategy | Use When | Avoid When |
|----------|----------|------------|
| Standard | Memory abundant, Speed critical | Memory constrained |
| √n Optimized | Memory limited, Moderate slowdown OK | Real-time systems |
| O(log n) | Extreme memory constraints | Random access needed |
| O(1) Space | Streaming data, Minimal memory | Need multiple passes |
## Benchmark Output
### Results File Format
```json
{
  "system_info": {
    "cpu_count": 8,
    "memory_gb": 32.0,
    "l3_cache_mb": 12.0
  },
  "results": [
    {
      "name": "Sorting",
      "category": "sorting",
      "strategy": "sqrt_n",
      "data_size": 1000000,
      "time_seconds": 0.187,
      "memory_peak_mb": 8.2,
      "memory_avg_mb": 6.5,
      "throughput": 5347593.5,
      "space_time_product": 1.534,
      "metadata": {
        "success": true,
        "operations": 1000000
      }
    }
  ],
  "timestamp": 1710512345.678
}
```
### Visualization
Automatic plots show:
- Time complexity curves
- Memory usage scaling
- Space-time product comparison
- Throughput vs data size
## Performance Tips
1. **System Preparation**:
   ```bash
   # Disable CPU frequency scaling
   sudo cpupower frequency-set -g performance

   # Clear caches
   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
   ```
2. **Accurate Memory Measurement** (see the sketch after this list):
   - Results include Python overhead
   - Use `memory_peak_mb` for maximum usage
   - `memory_avg_mb` shows typical usage
3. **Reproducibility**:
   - Run multiple times and average
   - Control background processes
   - Use consistent data sizes
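A simplified sketch of a before/after RSS measurement with `psutil` (illustrative, not a true peak; the suite's `BenchmarkRunner` samples memory during execution to report `memory_peak_mb` and `memory_avg_mb`):
```python
import gc
import psutil

def measure_rss_delta(func, *args, **kwargs):
    """Report how much the process RSS grew across one call (rough estimate)."""
    gc.collect()                      # drop garbage before measuring
    proc = psutil.Process()
    before_mb = proc.memory_info().rss / 1024 / 1024
    result = func(*args, **kwargs)
    after_mb = proc.memory_info().rss / 1024 / 1024
    return result, after_mb - before_mb

_, delta = measure_rss_delta(sorted, list(range(1_000_000)))
print(f"RSS grew by ~{delta:.1f} MB")
```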
## Extending the Suite
### Adding New Categories
```python
class BenchmarkCategory(Enum):
    # ... existing categories ...
    CUSTOM = "custom"

def custom_suite(runner: BenchmarkRunner):
    """Run custom benchmarks"""
    strategies = ['approach1', 'approach2']
    data_sizes = [1000, 10000, 100000]
    runner.compare_strategies(
        "Custom Workload",
        BenchmarkCategory.CUSTOM,
        benchmark_custom,
        strategies,
        data_sizes
    )
```
### Platform-Specific Metrics
```python
import platform

def get_cache_misses():
    """Get L3 cache misses (Linux perf)"""
    if platform.system() == 'Linux':
        # Use perf_event_open or read from perf
        pass
    return None
```
## Real-World Insights
From our benchmarks:
1. **√n strategies typically save 90-99% memory** with 20-100% time overhead
2. **Cache-aware algorithms can be faster** despite theoretical complexity
3. **Memory bandwidth often dominates** over computational complexity
4. **Optimal strategy depends on**:
   - Data size vs available memory
   - Latency requirements
   - Power/cost constraints
## Troubleshooting
### Memory Measurements Seem Low
- Python may not release memory immediately
- Use `gc.collect()` before benchmarks
- Check for lazy evaluation
### High Variance in Results
- Disable CPU throttling
- Close other applications
- Increase data sizes for stability
### Database Benchmarks Fail
- Ensure write permissions in output directory
- Check SQLite installation
- Verify disk space available
## Contributing
Add new benchmarks following the pattern:
1. Implement `benchmark_*` function
2. Return operation count
3. Handle different strategies
4. Add suite function
5. Update documentation
## See Also
- [SpaceTimeCore](../core/spacetime_core.py): Core calculations
- [Profiler](../profiler/): Profile your applications
- [Visual Explorer](../explorer/): Visualize tradeoffs