History

Dave Friedel 59539f4daa Initial		2025-07-20 03:56:21 -04:00
..
checkpointed_sort.py	Initial	2025-07-20 03:56:21 -04:00
experiment_environment.json	Initial	2025-07-20 03:56:21 -04:00
fast_checkpoint_sort.py	Initial	2025-07-20 03:56:21 -04:00
final_experiment_results.json	Initial	2025-07-20 03:56:21 -04:00
memory_usage_analysis.png	Initial	2025-07-20 03:56:21 -04:00
paper_sorting_figure.png	Initial	2025-07-20 03:56:21 -04:00
README.md	Initial	2025-07-20 03:56:21 -04:00
rigorous_experiment.py	Initial	2025-07-20 03:56:21 -04:00
rigorous_sorting_analysis.png	Initial	2025-07-20 03:56:21 -04:00
run_final_experiment.py	Initial	2025-07-20 03:56:21 -04:00
run_reduced.py	Initial	2025-07-20 03:56:21 -04:00
simple_sort_demo.py	Initial	2025-07-20 03:56:21 -04:00
sorting_memory.png	Initial	2025-07-20 03:56:21 -04:00
sorting_tradeoff.png	Initial	2025-07-20 03:56:21 -04:00
test_quick.py	Initial	2025-07-20 03:56:21 -04:00
test_rigorous.py	Initial	2025-07-20 03:56:21 -04:00

README.md

Checkpointed Sorting Experiment

Overview

This experiment demonstrates how external merge sort with limited memory exhibits the space-time tradeoff predicted by Williams' 2025 result.

Key Concepts

Standard In-Memory Sort

Space: O(n) - entire array in memory
Time: O(n log n) - optimal comparison-based sorting
Example: Python's built-in sort, quicksort

Checkpointed External Sort

Space: O(√n) - only √n elements in memory at once
Time: O(n√n) - due to disk I/O and recomputation
Technique: Sort chunks that fit in memory, merge with limited buffers

Extreme Space-Limited Sort

Space: O(log n) - minimal memory usage
Time: O(n²) - extensive recomputation required
Technique: Iterative merging with frequent checkpointing

Running the Experiments

Quick Test

python test_quick.py

Runs with small input sizes (100-1000) to verify correctness.

Full Experiment

python run_final_experiment.py

Runs complete experiment with:

Input sizes: 1000, 2000, 5000, 10000, 20000
10 trials per size for statistical significance
RAM disk comparison to isolate I/O overhead
Generates publication-quality plots

Rigorous Analysis

python rigorous_experiment.py

Comprehensive experiment with:

20 trials per size
Detailed memory profiling
Environment logging
Statistical analysis with confidence intervals

Actual Results (Apple M3 Max, 64GB RAM)

Input Size	In-Memory Time	Checkpointed Time	Slowdown	Memory Reduction
1,000	0.022 ms	8.2 ms	375×	0.1× (overhead)
5,000	0.045 ms	23.4 ms	516×	0.2×
10,000	0.091 ms	40.5 ms	444×	0.2×
20,000	0.191 ms	71.4 ms	375×	0.2×

Note: Memory shows algorithmic overhead due to Python's memory management.

Key Findings

Massive Constant Factors: 375-627× slowdown instead of theoretical √n
I/O Not Dominant: Fast NVMe SSDs show only 1.0-1.1× I/O overhead
Scaling Confirmed: Power law fits show n^1.0 for in-memory, n^1.4 for checkpointed

Real-World Applications

Database Systems: External sorting for large datasets
MapReduce: Shuffle phase with limited memory
Video Processing: Frame-by-frame processing with checkpoints
Scientific Computing: Out-of-core algorithms

Visualization

The experiment generates:

paper_sorting_figure.png - Clean figure for publication
rigorous_sorting_analysis.png - Detailed analysis with error bars
memory_usage_analysis.png - Memory scaling comparison
experiment_environment.json - Hardware/software configuration
final_experiment_results.json - Raw experimental data

Dependencies

pip install numpy scipy matplotlib psutil

Reproducing Results

To reproduce our results exactly:

Ensure CPU frequency scaling is disabled
Close all other applications
Run on a machine with fast SSD (>3GB/s read)
Use Python 3.10+ with NumPy 2.0+

README.md Unescape Escape