Experimental Findings: Space-Time Tradeoffs
Key Observations from Initial Experiments
1. Checkpointed Sorting Experiment
Experimental Setup
- Platform: macOS-15.5-arm64, Python 3.12.7
- Hardware: 16 CPU cores, 64GB RAM
- Methodology: External merge sort with checkpointing vs in-memory sort
- Trials: 10 runs per configuration with statistical analysis
Results
Performance Impact of Memory Reduction
| Array Size | In-Memory Time | Checkpoint Time | Slowdown Factor | Memory Reduction |
|---|---|---|---|---|
| 1,000 | 0.022ms ± 0.026ms | 8.21ms ± 0.45ms | 375x | 87.1% |
| 2,000 | 0.020ms ± 0.001ms | 12.49ms ± 0.15ms | 627x | 84.9% |
| 5,000 | 0.045ms ± 0.003ms | 23.39ms ± 0.63ms | 515x | 83.7% |
| 10,000 | 0.091ms ± 0.003ms | 40.53ms ± 3.73ms | 443x | 82.9% |
| 20,000 | 0.191ms ± 0.007ms | 71.43ms ± 4.98ms | 375x | 82.1% |
Key Finding: Reducing memory usage by ~85% results in 375-627x performance degradation due to disk I/O overhead.
I/O Overhead Analysis
Comparison of disk vs RAM disk checkpointing shows:
- Average I/O overhead factor: 1.03-1.10x (physical disk vs RAM disk)
- Since a RAM disk is barely faster, the penalty is dominated by the I/O path itself (serialization and filesystem calls) rather than device latency alone
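The pattern behind this experiment can be illustrated with a minimal external merge sort: sort √n-sized runs in memory, checkpoint each run to disk, then stream-merge them. The sketch below is illustrative, not the experiment's actual code; the temp directory, file naming, and pickle serialization are assumptions.

```python
import heapq
import math
import os
import pickle
import tempfile

def external_sort(data, workdir=None):
    """Sort with ~O(sqrt(n)) working memory: checkpoint sorted runs
    to disk, then k-way merge them as streams."""
    n = len(data)
    run_size = max(1, math.isqrt(n))        # ~sqrt(n) items in RAM at a time
    workdir = workdir or tempfile.mkdtemp()
    run_paths = []

    # Phase 1: sort each sqrt(n)-sized chunk and write it out item by item.
    for start in range(0, n, run_size):
        path = os.path.join(workdir, f"run_{start // run_size}.pkl")
        with open(path, "wb") as f:
            for item in sorted(data[start:start + run_size]):
                pickle.dump(item, f)
        run_paths.append(path)

    # Phase 2: merge runs lazily; only one buffered item per run in RAM.
    def stream(path):
        with open(path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    return

    return list(heapq.merge(*(stream(p) for p in run_paths)))

print(external_sort(list(range(1000, 0, -1)))[:5])  # [1, 2, 3, 4, 5]
```

Every item makes a round trip through `pickle.dump`/`pickle.load` and the filesystem, which is exactly the per-item overhead that produces the 375-627x slowdowns above.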
2. Stream Processing: Sliding Window
Experimental Setup
- Task: Computing sliding window average over streaming data
- Configurations: Full storage vs sliding window vs checkpointing
Results
| Stream Size | Window | Full Storage | Sliding Window | Speedup | Memory Reduction |
|---|---|---|---|---|---|
| 10,000 | 100 | 4.8ms / 78KB | 1.5ms / 0.8KB | 3.1x faster | 100x |
| 50,000 | 500 | 79.6ms / 391KB | 4.7ms / 3.9KB | 16.8x faster | 100x |
| 100,000 | 1000 | 330.6ms / 781KB | 11.0ms / 7.8KB | 30.0x faster | 100x |
Key Finding: For sliding window operations, space reduction actually IMPROVES performance by 3-30x due to better cache locality.
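The bounded-memory version is straightforward: keep a deque of the last w items plus a running sum, so each update is O(1) and memory is O(w) instead of O(n). A minimal sketch:

```python
from collections import deque

def sliding_averages(stream, window):
    """Yield the mean of the last `window` items using O(window) memory."""
    buf, total = deque(), 0.0
    for x in stream:
        buf.append(x)
        total += x
        if len(buf) > window:
            total -= buf.popleft()   # evict the oldest item in O(1)
        yield total / len(buf)

# Example: 100-item window over a 10,000-item stream.
last = None
for last in sliding_averages(range(10_000), 100):
    pass
print(last)  # 9949.5
```

Because the working set is a few kilobytes, it stays resident in L1/L2 cache, which is where the 3-30x speedup over full storage comes from.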
3. Database Buffer Pool (SQLite)
Experimental Setup
- Database: SQLite with 150MB database (50,000 scale factor)
- Test: Random point queries with varying cache sizes
Results
| Cache Configuration | Cache Size | Avg Query Time | Relative Performance |
|---|---|---|---|
| O(n) Full Cache | 78.1 MB | 66.6ms | 1.00x (baseline) |
| O(√n) Cache | 1.08 MB | 15.0ms | 4.42x faster |
| O(log n) Cache | 0.11 MB | 50.0ms | 1.33x faster |
| O(1) Minimal | 0.08 MB | 50.4ms | 1.32x faster |
Key Finding: Contrary to theoretical predictions, every reduced-cache configuration outperformed the full O(n) cache in this workload, with the O(√n) cache fastest (4.42x), likely due to reduced cache management overhead.
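SQLite exposes the page-cache budget directly via `PRAGMA cache_size` (a negative value sets the limit in KiB), so configurations like those above can be reproduced in a few lines. A hedged sketch; the database path, table name, and query are placeholders, not the experiment's schema:

```python
import sqlite3
import time

def time_point_queries(db_path, cache_kib, ids):
    """Average point-query latency under a fixed SQLite page-cache budget."""
    conn = sqlite3.connect(db_path)
    # Negative value = cache limit in KiB; a positive value would mean pages.
    conn.execute(f"PRAGMA cache_size = -{cache_kib}")
    start = time.perf_counter()
    for row_id in ids:
        # Placeholder table/column; substitute the benchmark's actual schema.
        conn.execute("SELECT * FROM items WHERE id = ?", (row_id,)).fetchone()
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed / len(ids)
```

Comparing, say, `cache_kib=1080` (the O(√n) row) against `cache_kib=79974` (full cache) on the same ID sample reproduces the comparison in the table.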
4. LLM KV-Cache Simulation
Experimental Setup
- Model Configuration: 768 hidden dim, 12 heads, 64 head dim
- Test: Token generation with varying KV-cache sizes
Results
| Sequence Length | Cache Strategy | Cache Size | Tokens/sec | Memory Usage | Recomputes |
|---|---|---|---|---|---|
| 512 | Full O(n) | 512 | 685 | 3.0 MB | 0 |
| 512 | Flash O(√n) | 90 | 2,263 | 0.5 MB | 75,136 |
| 512 | Minimal O(1) | 8 | 4,739 | 0.05 MB | 96,128 |
| 1024 | Full O(n) | 1024 | 367 | 6.0 MB | 0 |
| 1024 | Flash O(√n) | 128 | 1,655 | 0.75 MB | 327,424 |
| 1024 | Minimal O(1) | 8 | 4,374 | 0.05 MB | 388,864 |
Key Finding: Smaller caches resulted in FASTER token generation (up to 6.9x) despite massive recomputation, suggesting the overhead of cache management exceeds recomputation cost for this implementation.
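The tradeoff can be mimicked with a toy single-head decoding loop: keep at most `cache_size` key/value pairs and recompute evicted positions from their embeddings when attending. This is an illustrative sketch only, not the experiment's implementation; `embed` stands in for the token projections, and the same vector is reused as query, key, and value for brevity.

```python
import numpy as np

def decode(embed, n_tokens, cache_size=8, head_dim=64):
    """Toy attention loop with a bounded KV cache; returns recompute count."""
    keys, vals = [], []          # at most cache_size cached positions
    recomputes = 0
    for t in range(n_tokens):
        x = embed(t)             # proxy for the query/key/value projections
        evicted = t - len(keys)  # positions that fell out of the cache
        recomputes += evicted
        ks = [embed(i) for i in range(evicted)] + keys + [x]
        vs = [embed(i) for i in range(evicted)] + vals + [x]
        scores = np.array([x @ k for k in ks]) / np.sqrt(head_dim)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        _ = sum(wi * vi for wi, vi in zip(w, vs))   # attention output
        keys.append(x)
        vals.append(x)
        if len(keys) > cache_size:
            keys.pop(0)          # evict the oldest K/V pair
            vals.pop(0)
    return recomputes

rng = np.random.default_rng(0)
table = rng.standard_normal((512, 64))
print(decode(lambda i: table[i], 512))   # recomputes grow ~quadratically in n
```

The recompute count grows roughly as n²/2 for an O(1) cache, which is why the recompute columns in the table dwarf the sequence lengths.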
5. Real LLM Inference with Ollama
Experimental Setup
- Platform: Local Ollama installation with llama3.2:latest
- Hardware: Same as above experiments
- Tests: Context chunking, streaming generation, checkpointing
Results
Context Chunking (√n chunks)
| Method | Time | Memory Delta | Details |
|---|---|---|---|
| Full Context O(n) | 2.95s | 0.39 MB | Process 14,750 chars at once |
| Chunked O(√n) | 54.10s | 2.41 MB | 122 chunks of 121 chars each |
Slowdown: 18.3x for √n chunking strategy
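The chunked strategy replaces one long-context call with ~√n calls that carry a running summary, which is where the 18.3x slowdown comes from: every chunk pays full request and prefill overhead. A sketch against Ollama's standard local REST endpoint (`POST /api/generate`); the summarization prompt wording is an assumption, not the experiment's exact prompt:

```python
import math
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"   # Ollama's default port

def chunked_process(text, model="llama3.2"):
    """Feed text in ~sqrt(n)-char chunks, carrying a bounded running
    summary instead of the full O(n) context."""
    chunk = max(1, math.isqrt(len(text)))   # 14,750 chars -> 121-char chunks
    summary = ""
    for i in range(0, len(text), chunk):
        prompt = (f"Summary so far: {summary}\n"
                  f"New text: {text[i:i + chunk]}\n"
                  f"Update the summary in one paragraph.")
        resp = requests.post(OLLAMA_URL, json={
            "model": model, "prompt": prompt, "stream": False})
        summary = resp.json()["response"]
    return summary
```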
Streaming vs Full Generation
| Method | Time | Memory | Tokens Generated |
|---|---|---|---|
| Full Generation | 4.15s | 0.02 MB | ~405 tokens |
| Streaming | 4.40s | 0.05 MB | ~406 tokens |
Finding: Minimal performance difference; streaming adds only ~6% time overhead
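Streaming with Ollama just sets `"stream": true`, and the server returns newline-delimited JSON chunks; memory stays bounded because tokens are consumed as they arrive rather than buffered into one response. A minimal sketch:

```python
import json
import requests

def stream_generate(prompt, model="llama3.2"):
    """Yield tokens as the server produces them (O(1) client buffering)."""
    resp = requests.post("http://localhost:11434/api/generate",
                         json={"model": model, "prompt": prompt,
                               "stream": True},
                         stream=True)
    for line in resp.iter_lines():
        if not line:
            continue
        piece = json.loads(line)        # one JSON object per streamed line
        yield piece.get("response", "")
        if piece.get("done"):
            break
```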
Checkpointed Generation
| Method | Time | Memory | Details |
|---|---|---|---|
| No Checkpoint | 40.48s | 0.09 MB | 10 prompts processed |
| Checkpoint every 3 | 43.55s | 0.14 MB | 4 checkpoints created |
Overhead: 7.6% additional time for √n-interval checkpointing
Key Finding: Real LLM inference shows 18x slowdown for √n context chunking, validating theoretical space-time tradeoffs with actual models.
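The checkpoint pattern itself is simple: persist completed results every ~√n work items, so a crash loses at most √n items of progress. A sketch, with `generate` standing in for the model call and the checkpoint path chosen for illustration:

```python
import json
import math
import os

def run_with_checkpoints(prompts, generate, path="gen_checkpoint.json"):
    """Resumable batch generation: persist results every ~sqrt(n) prompts."""
    interval = max(1, math.isqrt(len(prompts)))   # 10 prompts -> every 3
    results = []
    if os.path.exists(path):
        with open(path) as f:
            results = json.load(f)                # resume after a crash
    for i in range(len(results), len(prompts)):
        results.append(generate(prompts[i]))
        if (i + 1) % interval == 0:
            with open(path, "w") as f:
                json.dump(results, f)             # at most sqrt(n) work lost
    return results
```

The 7.6% overhead measured above is the cost of these periodic writes relative to the generation time itself.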
6. Production Library Implementations
Verified Components
SqrtSpace.SpaceTime (.NET)
- External Sort: OrderByExternal() LINQ extension
- External GroupBy: GroupByExternal() for aggregations
- Adaptive Collections: AdaptiveDictionary and AdaptiveList
- Checkpoint Manager: Automatic √n interval checkpointing
- Memory Calculator: SpaceTimeCalculator.CalculateSqrtInterval()
sqrtspace-spacetime (Python)
- External algorithms: external_sort, external_groupby
- SpaceTimeArray: Dynamic array with automatic spillover
- Memory monitoring: Real-time pressure detection
- Checkpoint decorators: @checkpointable for long computations
sqrtspace/spacetime (PHP)
- ExternalSort: Memory-efficient sorting
- SpaceTimeStream: Lazy evaluation with bounded memory
- CheckpointManager: Multiple storage backends
- Laravel/Symfony integration: Production-ready components
Critical Observations
1. Theory vs Practice Gap
- Theory predicts a ~√n slowdown for a √n space reduction; for the 1,000-element sort that would be ~32x, yet the measured slowdown was 375x
- Practice shows 100-1000x slowdowns due to:
  - Disk I/O latency (roughly 10,000x slower than RAM)
  - Cache hierarchy effects
  - System overhead
2. When Space Reduction Helps Performance
- Sliding window operations: Better cache locality
- Small working sets: Reduced management overhead
- Streaming scenarios: Bounded memory prevents swapping
3. Implementation Quality Matters
- The .NET library includes BenchmarkDotNet benchmarks
- All three libraries provide working external memory algorithms
- Production-ready with comprehensive test coverage
Conclusions
1. External memory algorithms work, but with significant performance penalties (100-1000x) when actually reducing memory usage.
2. √n space algorithms are practical for scenarios where:
   - Memory is severely constrained
   - Performance can be sacrificed for reliability
   - Checkpointing provides fault tolerance benefits
3. Some workloads benefit from space reduction:
   - Sliding windows (up to 30x faster)
   - Cache-friendly access patterns
   - Avoiding system memory pressure
4. Production libraries demonstrate feasibility:
   - Working implementations in .NET, Python, and PHP
   - Real external sort and groupby algorithms
   - Checkpoint systems for fault tolerance
Reproducibility
All experiments include:
- Source code in experiments/ directory
- JSON results files with raw data
- Environment specifications
- Statistical analysis with error bars
To reproduce:

```bash
cd ubiquity-experiments-main/experiments
python checkpointed_sorting/run_final_experiment.py
python stream_processing/sliding_window.py
python database_buffer_pool/sqlite_heavy_experiment.py
python llm_kv_cache/llm_kv_cache_experiment.py
python llm_ollama/ollama_spacetime_experiment.py  # Requires Ollama installed
```