| .. | ||
| checkpointed_sort.py | ||
| experiment_environment.json | ||
| fast_checkpoint_sort.py | ||
| final_experiment_results.json | ||
| memory_usage_analysis.png | ||
| paper_sorting_figure.png | ||
| README.md | ||
| rigorous_experiment.py | ||
| rigorous_sorting_analysis.png | ||
| run_final_experiment.py | ||
| run_reduced.py | ||
| simple_sort_demo.py | ||
| sorting_memory.png | ||
| sorting_tradeoff.png | ||
| test_quick.py | ||
| test_rigorous.py | ||
Checkpointed Sorting Experiment
Overview
This experiment demonstrates how external merge sort with limited memory exhibits the space-time tradeoff predicted by Williams' 2025 result.
Key Concepts
Standard In-Memory Sort
- Space: O(n) - entire array in memory
- Time: O(n log n) - optimal comparison-based sorting
- Example: Python's built-in sort, quicksort
Checkpointed External Sort
- Space: O(√n) - only √n elements in memory at once
- Time: O(n√n) - due to disk I/O and recomputation
- Technique: Sort chunks that fit in memory, merge with limited buffers
Extreme Space-Limited Sort
- Space: O(log n) - minimal memory usage
- Time: O(n²) - extensive recomputation required
- Technique: Iterative merging with frequent checkpointing
Running the Experiments
Quick Test
python test_quick.py
Runs with small input sizes (100-1000) to verify correctness.
Full Experiment
python run_final_experiment.py
Runs complete experiment with:
- Input sizes: 1000, 2000, 5000, 10000, 20000
- 10 trials per size for statistical significance
- RAM disk comparison to isolate I/O overhead
- Generates publication-quality plots
Rigorous Analysis
python rigorous_experiment.py
Comprehensive experiment with:
- 20 trials per size
- Detailed memory profiling
- Environment logging
- Statistical analysis with confidence intervals
Actual Results (Apple M3 Max, 64GB RAM)
| Input Size | In-Memory Time | Checkpointed Time | Slowdown | Memory Reduction |
|---|---|---|---|---|
| 1,000 | 0.022 ms | 8.2 ms | 375× | 0.1× (overhead) |
| 5,000 | 0.045 ms | 23.4 ms | 516× | 0.2× |
| 10,000 | 0.091 ms | 40.5 ms | 444× | 0.2× |
| 20,000 | 0.191 ms | 71.4 ms | 375× | 0.2× |
Note: Memory shows algorithmic overhead due to Python's memory management.
Key Findings
- Massive Constant Factors: 375-627× slowdown instead of theoretical √n
- I/O Not Dominant: Fast NVMe SSDs show only 1.0-1.1× I/O overhead
- Scaling Confirmed: Power law fits show n^1.0 for in-memory, n^1.4 for checkpointed
Real-World Applications
- Database Systems: External sorting for large datasets
- MapReduce: Shuffle phase with limited memory
- Video Processing: Frame-by-frame processing with checkpoints
- Scientific Computing: Out-of-core algorithms
Visualization
The experiment generates:
paper_sorting_figure.png- Clean figure for publicationrigorous_sorting_analysis.png- Detailed analysis with error barsmemory_usage_analysis.png- Memory scaling comparisonexperiment_environment.json- Hardware/software configurationfinal_experiment_results.json- Raw experimental data
Dependencies
pip install numpy scipy matplotlib psutil
Reproducing Results
To reproduce our results exactly:
- Ensure CPU frequency scaling is disabled
- Close all other applications
- Run on a machine with fast SSD (>3GB/s read)
- Use Python 3.10+ with NumPy 2.0+