# SqrtSpace SpaceTime for PHP

Memory-efficient algorithms and data structures for PHP using Williams' √n space-time tradeoffs.

Paper repository: git.marketally.com/sqrtspace/sqrtspace-paper

## Installation

```bash
composer require sqrtspace/spacetime
```
## Core Concepts

SpaceTime implements theoretical computer science results showing that many algorithms can achieve better memory usage by accepting slightly slower runtime. The key insight is using √n memory instead of n memory, where n is the input size.
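The arithmetic behind the tradeoff can be sketched in a few lines of plain PHP (no library calls; the numbers here are purely illustrative):

```php
<?php
// The √n idea in plain PHP: for n items, a chunk size of ceil(sqrt(n))
// bounds the in-memory working set at O(√n) while still visiting every
// item exactly once (about √n chunks of about √n items each).
$n = 1_000_000;
$chunkSize = (int) ceil(sqrt($n));        // 1000 items held in memory at a time
$numChunks = (int) ceil($n / $chunkSize); // 1000 chunks processed sequentially

echo "n = $n: $numChunks chunks of up to $chunkSize items\n";
```

So a million-item workload drops from a million-item working set to a thousand-item one, at the cost of roughly a thousand sequential passes over disk-resident chunks.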
## Key Features
- External Sorting: Sort large datasets that don't fit in memory
- External Grouping: Group and aggregate data with minimal memory usage
- Streaming Operations: Process files and data streams efficiently
- Memory Pressure Handling: Automatic response to low memory conditions
- Checkpoint/Resume: Save progress and resume long-running operations
- Laravel Integration: Deep integration with Laravel collections and queries
## Quick Start

```php
use SqrtSpace\SpaceTime\Collections\SpaceTimeArray;
use SqrtSpace\SpaceTime\Algorithms\ExternalSort;

// Handle large arrays with automatic memory management
$array = new SpaceTimeArray();
for ($i = 0; $i < 10000000; $i++) {
    $array[] = random_int(1, 1000000);
}

// Sort large datasets using only √n memory
$sorted = ExternalSort::sort($array);

// Process in optimal chunks
foreach ($array->chunkBySqrtN() as $chunk) {
    processChunk($chunk);
}
```
## Examples

### Basic Examples

See `examples/comprehensive_example.php` for a complete demonstration of all features, including:
- Memory-efficient arrays and dictionaries
- External sorting and grouping
- Stream processing
- CSV import/export
- Batch processing with checkpoints
- Memory pressure monitoring
### Laravel Application

Check out `examples/laravel-app/` for a complete Laravel application demonstrating:
- Streaming API endpoints
- Memory-efficient CSV exports
- Background job processing with checkpoints
- Real-time analytics with SSE
- Production-ready configurations
See the Laravel example README for setup instructions and detailed usage.
## Features

### 1. Memory-Efficient Collections

```php
use SqrtSpace\SpaceTime\Collections\SpaceTimeArray;
use SqrtSpace\SpaceTime\Collections\AdaptiveDictionary;

// Adaptive array: automatically switches between memory and disk
$array = new SpaceTimeArray();
$array->setThreshold(10000); // Switch to external storage after 10k items

// Adaptive dictionary with optimal memory usage
$dict = new AdaptiveDictionary();
for ($i = 0; $i < 1000000; $i++) {
    $dict["key_$i"] = "value_$i";
}
```
### 2. External Algorithms

```php
use SqrtSpace\SpaceTime\Algorithms\ExternalSort;
use SqrtSpace\SpaceTime\Algorithms\ExternalGroupBy;

// Sort millions of records using minimal memory
$data = getData(); // Large dataset
$sorted = ExternalSort::sort($data, fn($a, $b) => $a['date'] <=> $b['date']);

// Group by with external storage
$grouped = ExternalGroupBy::groupBy($data, fn($item) => $item['category']);
```
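For readers curious about the mechanism, external sorting is classically implemented as an external merge sort: sort bounded-size runs, spill each run to disk, then k-way merge the runs. A self-contained sketch in plain PHP (an illustration of the general technique, not the library's actual implementation; `externalSort` is a hypothetical helper):

```php
<?php
// Sketch of an external merge sort over integers. Only one run
// (plus one head element per run during the merge) is ever in memory.
function externalSort(array $data, int $runSize): array
{
    // Phase 1: sort runs of $runSize items and spill each to a temp file.
    $runFiles = [];
    foreach (array_chunk($data, $runSize) as $run) {
        sort($run);
        $file = tempnam(sys_get_temp_dir(), 'run_');
        file_put_contents($file, implode("\n", $run));
        $runFiles[] = $file;
    }

    // Phase 2: k-way merge with a min-heap keyed on each run's head value.
    $handles = array_map(fn($f) => fopen($f, 'r'), $runFiles);
    $heap = new SplMinHeap();
    foreach ($handles as $i => $h) {
        $line = fgets($h);
        if ($line !== false) {
            $heap->insert([(int) trim($line), $i]); // [value, run index]
        }
    }

    $sorted = [];
    while (!$heap->isEmpty()) {
        [$value, $i] = $heap->extract();
        $sorted[] = $value;
        $line = fgets($handles[$i]); // refill from the run we just consumed
        if ($line !== false) {
            $heap->insert([(int) trim($line), $i]);
        }
    }

    foreach ($handles as $h) fclose($h);
    foreach ($runFiles as $f) unlink($f);
    return $sorted;
}

print_r(externalSort([5, 3, 8, 1, 9, 2, 7, 4, 6, 0], 3)); // 0 through 9, in order
```

With a run size of √n, phase 1 keeps O(√n) items in memory and phase 2 holds only one head element per run, which is the tradeoff the library packages up behind `ExternalSort::sort()`.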
### 3. Streaming Operations

```php
use SqrtSpace\SpaceTime\Streams\SpaceTimeStream;

// Process large files with bounded memory
$stream = SpaceTimeStream::fromFile('large_file.csv')
    ->map(fn($line) => str_getcsv($line))
    ->filter(fn($row) => $row[2] > 100)
    ->chunkBySqrtN()
    ->each(function ($chunk) {
        processBatch($chunk);
    });
```
### 4. Database Integration

```php
use SqrtSpace\SpaceTime\Database\SpaceTimeQueryBuilder;

// Process large result sets efficiently
$query = new SpaceTimeQueryBuilder($pdo);
$query->from('orders')
    ->where('status', '=', 'pending')
    ->orderByExternal('created_at', 'desc')
    ->chunkBySqrtN(function ($orders) {
        foreach ($orders as $order) {
            processOrder($order);
        }
    });

// Stream results for minimal memory usage
$stream = $query->from('logs')
    ->where('level', '=', 'error')
    ->stream();

$stream->filter(fn($log) => str_contains($log['message'], 'critical'))
    ->each(fn($log) => alertAdmin($log));
```
### 5. Laravel Integration

```php
// In AppServiceProvider
use SqrtSpace\SpaceTime\Laravel\SpaceTimeServiceProvider;

public function register()
{
    $this->app->register(SpaceTimeServiceProvider::class);
}
```

```php
// Collection macros
$collection = collect($largeArray);

// Sort using external memory
$sorted = $collection->sortByExternal('price');

// Group by with external storage
$grouped = $collection->groupByExternal('category');

// Process in √n chunks
$collection->chunkBySqrtN()->each(function ($chunk) {
    processBatch($chunk);
});

// Query builder extensions
DB::table('orders')
    ->chunkBySqrtN(function ($orders) {
        foreach ($orders as $order) {
            processOrder($order);
        }
    });
```
### 6. Memory Pressure Handling

```php
use SqrtSpace\SpaceTime\Memory\MemoryPressureMonitor;
use SqrtSpace\SpaceTime\Memory\MemoryPressureLevel;
use SqrtSpace\SpaceTime\Memory\Handlers\LoggingHandler;
use SqrtSpace\SpaceTime\Memory\Handlers\CacheEvictionHandler;
use SqrtSpace\SpaceTime\Memory\Handlers\GarbageCollectionHandler;

$monitor = new MemoryPressureMonitor('512M');

// Add handlers
$monitor->registerHandler(new LoggingHandler($logger));
$monitor->registerHandler(new CacheEvictionHandler());
$monitor->registerHandler(new GarbageCollectionHandler());

// Check pressure in your operations
if ($monitor->check() === MemoryPressureLevel::HIGH) {
    // Switch to more aggressive memory saving
    $processor->useExternalStorage();
}
```
### 7. Checkpointing for Fault Tolerance

```php
use SqrtSpace\SpaceTime\Checkpoint\CheckpointManager;

$checkpoint = new CheckpointManager('import_job_123');

foreach ($largeDataset->chunkBySqrtN() as $chunk) {
    processChunk($chunk);

    // Save progress every √n items
    if ($checkpoint->shouldCheckpoint()) {
        $checkpoint->save([
            'processed' => $processedCount,
            'last_id' => $lastId,
        ]);
    }
}
```
## Real-World Examples

### Processing Large CSV Files

```php
use SqrtSpace\SpaceTime\File\CsvReader;

$reader = new CsvReader('sales_data.csv');

// Get column statistics
$stats = $reader->getColumnStats('amount');
echo "Average order: $" . $stats['avg'];

// Process with type conversion
$totals = $reader->readWithTypes([
    'amount' => 'float',
    'quantity' => 'int',
    'date' => 'date',
])->reduce(function ($totals, $row) {
    $month = $row['date']->format('Y-m');
    $totals[$month] = ($totals[$month] ?? 0) + $row['amount'];
    return $totals;
}, []);
```
### Large Data Export

```php
use SqrtSpace\SpaceTime\File\CsvExporter;
use SqrtSpace\SpaceTime\Database\SpaceTimeQueryBuilder;

$exporter = new CsvExporter('users_export.csv');
$query = new SpaceTimeQueryBuilder($pdo);

// Export with headers
$exporter->writeHeaders(['ID', 'Name', 'Email', 'Created At']);

// Stream data directly to CSV
$query->from('users')
    ->orderBy('created_at', 'desc')
    ->chunkBySqrtN(function ($users) use ($exporter) {
        $exporter->writeRows(array_map(function ($user) {
            return [
                $user['id'],
                $user['name'],
                $user['email'],
                $user['created_at'],
            ];
        }, $users));
    });

echo "Exported " . number_format($exporter->getBytesWritten()) . " bytes\n";
```
### Batch Processing with Memory Limits

```php
use SqrtSpace\SpaceTime\Batch\BatchProcessor;

$processor = new BatchProcessor([
    'memory_threshold' => 0.8,
    'checkpoint_enabled' => true,
    'progress_callback' => function ($batch, $size, $result) {
        echo "Processed batch $batch ($size items)\n";
    },
]);

$result = $processor->process($millionItems, function ($batch) {
    $processed = [];
    foreach ($batch as $key => $item) {
        $processed[$key] = expensiveOperation($item);
    }
    return $processed;
}, 'job_123');

echo "Success: " . $result->getSuccessCount() . "\n";
echo "Errors: " . $result->getErrorCount() . "\n";
echo "Time: " . $result->getExecutionTime() . "s\n";
```
## Configuration

```php
use SqrtSpace\SpaceTime\SpaceTimeConfig;

// Global configuration
SpaceTimeConfig::configure([
    'memory_limit' => '512M',
    'external_storage_path' => '/tmp/spacetime',
    'chunk_strategy' => 'sqrt_n', // or 'memory_based', 'fixed'
    'enable_checkpointing' => true,
    'compression' => true,
    'compression_level' => 6,
]);

// Per-operation configuration
$array = new SpaceTimeArray(10000); // External storage threshold

// Check configuration
echo "Chunk size for 1M items: " . SpaceTimeConfig::calculateSqrtN(1000000) . "\n";
echo "Storage path: " . SpaceTimeConfig::getStoragePath() . "\n";
```
## Advanced Usage

### JSON Lines Processing

```php
use SqrtSpace\SpaceTime\File\JsonLinesProcessor;

// Process large JSONL files
JsonLinesProcessor::processInChunks('events.jsonl', function ($events) {
    foreach ($events as $event) {
        if ($event['type'] === 'error') {
            logError($event);
        }
    }
});

// Split a large file
$files = JsonLinesProcessor::split('huge.jsonl', 100000, 'output/chunk');
echo "Split into " . count($files) . " files\n";

// Merge multiple files
$count = JsonLinesProcessor::merge($files, 'merged.jsonl');
echo "Merged $count records\n";
```
### Streaming Operations

```php
use SqrtSpace\SpaceTime\Streams\SpaceTimeStream;

// Chain operations efficiently
SpaceTimeStream::fromCsv('sales.csv')
    ->filter(fn($row) => $row['region'] === 'US')
    ->map(fn($row) => [
        'product' => $row['product'],
        'revenue' => $row['quantity'] * $row['price'],
    ])
    ->chunkBySqrtN()
    ->each(function ($chunk) {
        $total = array_sum(array_column($chunk, 'revenue'));
        echo "Chunk revenue: \$$total\n";
    });
```
### Custom Batch Jobs

```php
use SqrtSpace\SpaceTime\Batch\BatchJob;
use SqrtSpace\SpaceTime\Streams\SpaceTimeStream;

class ImportJob extends BatchJob
{
    private string $filename;

    public function __construct(string $filename)
    {
        parent::__construct();
        $this->filename = $filename;
    }

    protected function getItems(): iterable
    {
        return SpaceTimeStream::fromCsv($this->filename);
    }

    public function processItem(array $batch): array
    {
        $results = [];
        foreach ($batch as $key => $row) {
            $user = User::create([
                'name' => $row['name'],
                'email' => $row['email'],
            ]);
            $results[$key] = $user->id;
        }
        return $results;
    }

    protected function getUniqueId(): string
    {
        return md5($this->filename);
    }
}

// Run the job with automatic checkpointing
$job = new ImportJob('users.csv');
$result = $job->execute();

// Or resume if interrupted
if ($job->canResume()) {
    $result = $job->resume();
}
```
## Testing

```bash
# Run all tests
vendor/bin/phpunit

# Run a specific test suite
vendor/bin/phpunit tests/Algorithms

# With coverage
vendor/bin/phpunit --coverage-html coverage
```
## Performance Considerations

- Chunk Size: The default √n chunk size is optimal for most cases, but you can tune it:

  ```php
  SpaceTimeConfig::configure(['chunk_strategy' => 'fixed', 'fixed_chunk_size' => 5000]);
  ```

- Compression: Enable for text-heavy data, disable for already-compressed data:

  ```php
  SpaceTimeConfig::configure(['compression' => false]);
  ```

- Storage Location: Use fast local SSDs for external storage:

  ```php
  SpaceTimeConfig::configure(['external_storage_path' => '/mnt/fast-ssd/spacetime']);
  ```
## Framework Integration

### Laravel

```php
// config/spacetime.php
return [
    'memory_limit' => env('SPACETIME_MEMORY_LIMIT', '256M'),
    'storage_driver' => env('SPACETIME_STORAGE', 'file'),
    'redis_connection' => env('SPACETIME_REDIS', 'default'),
];
```

```php
// In a controller
public function exportOrders()
{
    return SpaceTimeResponse::stream(function () {
        Order::orderByExternal('created_at')
            ->chunkBySqrtN(function ($orders) {
                foreach ($orders as $order) {
                    echo $order->toCsv() . "\n";
                }
            });
    });
}
```
### Symfony

For a complete Symfony integration example, see our Symfony bundle documentation.

```php
// config/bundles.php
return [
    // ...
    SqrtSpace\SpaceTime\Symfony\SpaceTimeBundle::class => ['all' => true],
];
```

```yaml
# config/packages/spacetime.yaml
spacetime:
    memory_limit: '%env(SPACETIME_MEMORY_LIMIT)%'
    storage_path: '%kernel.project_dir%/var/spacetime'
    chunk_strategy: 'sqrt_n'
    enable_checkpointing: true
    compression: true
```

```php
// In a controller
use SqrtSpace\SpaceTime\Batch\BatchProcessor;
use SqrtSpace\SpaceTime\File\CsvReader;

#[Route('/import')]
public function import(BatchProcessor $processor): Response
{
    $reader = new CsvReader($this->getParameter('import_file'));

    $result = $processor->process(
        $reader->stream(),
        fn($batch) => $this->importBatch($batch)
    );

    return $this->json([
        'imported' => $result->getSuccessCount(),
        'errors' => $result->getErrorCount(),
    ]);
}
```

```bash
# Console command
php bin/console spacetime:process-file input.csv output.csv --format=csv --checkpoint
```
## Troubleshooting

### Out of Memory Errors

- Reduce the chunk size:

  ```php
  SpaceTimeConfig::configure(['chunk_strategy' => 'fixed', 'fixed_chunk_size' => 1000]);
  ```

- Enable more aggressive memory handling:

  ```php
  $monitor = new MemoryPressureMonitor('128M'); // Lower threshold
  ```

- Use external storage earlier:

  ```php
  $array = new SpaceTimeArray(100); // Smaller threshold
  ```
### Performance Issues

- Check disk I/O speed
- Enable compression for text data
- Use memory-backed external storage:

  ```php
  SpaceTimeConfig::configure(['external_storage_path' => '/dev/shm/spacetime']);
  ```
### Checkpoint Recovery

```php
$checkpoint = new CheckpointManager('job_id');

if ($checkpoint->exists()) {
    $state = $checkpoint->load();
    echo "Resuming from: " . json_encode($state) . "\n";
}
```
## Requirements
- PHP 8.1 or higher
- ext-json
- ext-mbstring
### Optional Extensions
- ext-apcu for faster caching
- ext-redis for distributed operations
- ext-zlib for compression
## Contributing

Please see CONTRIBUTING.md for details.

## License

Licensed under the Apache License 2.0. Please see LICENSE for details.