sqrtspace-experiments/experiments/checkpointed_sorting/README.md
2025-07-20 03:56:21 -04:00

3.1 KiB
Raw Blame History

Checkpointed Sorting Experiment

Overview

This experiment demonstrates how external merge sort with limited memory exhibits the space-time tradeoff predicted by Williams' 2025 result.

Key Concepts

Standard In-Memory Sort

  • Space: O(n) - entire array in memory
  • Time: O(n log n) - optimal comparison-based sorting
  • Example: Python's built-in sort, quicksort

Checkpointed External Sort

  • Space: O(√n) - only √n elements in memory at once
  • Time: O(n√n) - due to disk I/O and recomputation
  • Technique: Sort chunks that fit in memory, merge with limited buffers

Extreme Space-Limited Sort

  • Space: O(log n) - minimal memory usage
  • Time: O(n²) - extensive recomputation required
  • Technique: Iterative merging with frequent checkpointing

Running the Experiments

Quick Test

python test_quick.py

Runs with small input sizes (100-1000) to verify correctness.

Full Experiment

python run_final_experiment.py

Runs complete experiment with:

  • Input sizes: 1000, 2000, 5000, 10000, 20000
  • 10 trials per size for statistical significance
  • RAM disk comparison to isolate I/O overhead
  • Generates publication-quality plots

Rigorous Analysis

python rigorous_experiment.py

Comprehensive experiment with:

  • 20 trials per size
  • Detailed memory profiling
  • Environment logging
  • Statistical analysis with confidence intervals

Actual Results (Apple M3 Max, 64GB RAM)

Input Size In-Memory Time Checkpointed Time Slowdown Memory Reduction
1,000 0.022 ms 8.2 ms 375× 0.1× (overhead)
5,000 0.045 ms 23.4 ms 516× 0.2×
10,000 0.091 ms 40.5 ms 444× 0.2×
20,000 0.191 ms 71.4 ms 375× 0.2×

Note: Memory shows algorithmic overhead due to Python's memory management.

Key Findings

  1. Massive Constant Factors: 375-627× slowdown instead of theoretical √n
  2. I/O Not Dominant: Fast NVMe SSDs show only 1.0-1.1× I/O overhead
  3. Scaling Confirmed: Power law fits show n^1.0 for in-memory, n^1.4 for checkpointed

Real-World Applications

  • Database Systems: External sorting for large datasets
  • MapReduce: Shuffle phase with limited memory
  • Video Processing: Frame-by-frame processing with checkpoints
  • Scientific Computing: Out-of-core algorithms

Visualization

The experiment generates:

  1. paper_sorting_figure.png - Clean figure for publication
  2. rigorous_sorting_analysis.png - Detailed analysis with error bars
  3. memory_usage_analysis.png - Memory scaling comparison
  4. experiment_environment.json - Hardware/software configuration
  5. final_experiment_results.json - Raw experimental data

Dependencies

pip install numpy scipy matplotlib psutil

Reproducing Results

To reproduce our results exactly:

  1. Ensure CPU frequency scaling is disabled
  2. Close all other applications
  3. Run on a machine with fast SSD (>3GB/s read)
  4. Use Python 3.10+ with NumPy 2.0+