# SqrtSpace SpaceTime for PHP

[![Latest Stable Version](https://poser.pugx.org/sqrtspace/spacetime/v)](https://packagist.org/packages/sqrtspace/spacetime)
[![Total Downloads](https://poser.pugx.org/sqrtspace/spacetime/downloads)](https://packagist.org/packages/sqrtspace/spacetime)
[![License](https://poser.pugx.org/sqrtspace/spacetime/license)](https://packagist.org/packages/sqrtspace/spacetime)
[![PHP Version Require](https://poser.pugx.org/sqrtspace/spacetime/require/php)](https://packagist.org/packages/sqrtspace/spacetime)

Memory-efficient algorithms and data structures for PHP using Williams' √n space-time tradeoffs.

## Installation

```bash
composer require sqrtspace/spacetime
```

## Core Concepts

SpaceTime implements theoretical computer science results showing that many algorithms can achieve much lower memory usage by accepting a slightly slower runtime. The key insight is using √n memory instead of n memory, where n is the input size.

### Key Features

- **External Sorting**: Sort large datasets that don't fit in memory
- **External Grouping**: Group and aggregate data with minimal memory usage
- **Streaming Operations**: Process files and data streams efficiently
- **Memory Pressure Handling**: Automatic response to low memory conditions
- **Checkpoint/Resume**: Save progress and resume long-running operations
- **Laravel Integration**: Deep integration with Laravel collections and queries

## Quick Start

```php
use SqrtSpace\SpaceTime\Collections\SpaceTimeArray;
use SqrtSpace\SpaceTime\Algorithms\ExternalSort;

// Handle large arrays with automatic memory management
$array = new SpaceTimeArray();
for ($i = 0; $i < 10000000; $i++) {
    $array[] = random_int(1, 1000000);
}

// Sort large datasets using only √n memory
$sorted = ExternalSort::sort($array);

// Process in optimal chunks
foreach ($array->chunkBySqrtN() as $chunk) {
    processChunk($chunk);
}
```

## Examples

### Basic Examples

See [`examples/comprehensive_example.php`](examples/comprehensive_example.php) for a
complete demonstration of all features, including:

- Memory-efficient arrays and dictionaries
- External sorting and grouping
- Stream processing
- CSV import/export
- Batch processing with checkpoints
- Memory pressure monitoring

### Laravel Application

Check out [`examples/laravel-app/`](examples/laravel-app/) for a complete Laravel application demonstrating:

- Streaming API endpoints
- Memory-efficient CSV exports
- Background job processing with checkpoints
- Real-time analytics with SSE
- Production-ready configurations

See the [Laravel example README](examples/laravel-app/README.md) for setup instructions and detailed usage.

## Features

### 1. Memory-Efficient Collections

```php
use SqrtSpace\SpaceTime\Collections\SpaceTimeArray;
use SqrtSpace\SpaceTime\Collections\AdaptiveDictionary;

// Adaptive array - automatically switches between memory and disk
$array = new SpaceTimeArray();
$array->setThreshold(10000); // Switch to external storage after 10k items

// Adaptive dictionary with optimal memory usage
$dict = new AdaptiveDictionary();
for ($i = 0; $i < 1000000; $i++) {
    $dict["key_$i"] = "value_$i";
}
```

### 2. External Algorithms

```php
use SqrtSpace\SpaceTime\Algorithms\ExternalSort;
use SqrtSpace\SpaceTime\Algorithms\ExternalGroupBy;

// Sort millions of records using minimal memory
$data = getData(); // Large dataset
$sorted = ExternalSort::sort($data, fn($a, $b) => $a['date'] <=> $b['date']);

// Group by with external storage
$grouped = ExternalGroupBy::groupBy($data, fn($item) => $item['category']);
```

### 3. Streaming Operations

```php
use SqrtSpace\SpaceTime\Streams\SpaceTimeStream;

// Process large files with bounded memory
$stream = SpaceTimeStream::fromFile('large_file.csv')
    ->map(fn($line) => str_getcsv($line))
    ->filter(fn($row) => $row[2] > 100)
    ->chunkBySqrtN()
    ->each(function ($chunk) {
        processBatch($chunk);
    });
```
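The √n chunking used by `chunkBySqrtN()` in the examples above can be sketched in a few lines of plain PHP. The `sqrt_chunks` helper below is purely illustrative and not part of the library's API:

```php
<?php
// Illustrative sketch of √n chunking: with chunks of ceil(sqrt(n)) items,
// both the in-memory buffer and the number of chunk boundaries stay
// around sqrt(n) rather than n.
function sqrt_chunks(array $items): array
{
    $n = count($items);
    if ($n === 0) {
        return [];
    }
    return array_chunk($items, (int) ceil(sqrt($n)));
}

$chunks = sqrt_chunks(range(1, 10000));
echo count($chunks) . " chunks of " . count($chunks[0]) . " items\n"; // 100 chunks of 100 items
```

With 10,000 items, each chunk holds only 100 of them, so a full pass keeps roughly 1% of the data in memory at any time.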
### 4. Database Integration

```php
use SqrtSpace\SpaceTime\Database\SpaceTimeQueryBuilder;

// Process large result sets efficiently
$query = new SpaceTimeQueryBuilder($pdo);
$query->from('orders')
    ->where('status', '=', 'pending')
    ->orderByExternal('created_at', 'desc')
    ->chunkBySqrtN(function ($orders) {
        foreach ($orders as $order) {
            processOrder($order);
        }
    });

// Stream results for minimal memory usage
$stream = $query->from('logs')
    ->where('level', '=', 'error')
    ->stream();

$stream->filter(fn($log) => str_contains($log['message'], 'critical'))
    ->each(fn($log) => alertAdmin($log));
```

### 5. Laravel Integration

```php
// In AppServiceProvider
use SqrtSpace\SpaceTime\Laravel\SpaceTimeServiceProvider;

public function register()
{
    $this->app->register(SpaceTimeServiceProvider::class);
}

// Collection macros
$collection = collect($largeArray);

// Sort using external memory
$sorted = $collection->sortByExternal('price');

// Group by with external storage
$grouped = $collection->groupByExternal('category');

// Process in √n chunks
$collection->chunkBySqrtN()->each(function ($chunk) {
    processBatch($chunk);
});

// Query builder extensions
DB::table('orders')
    ->chunkBySqrtN(function ($orders) {
        foreach ($orders as $order) {
            processOrder($order);
        }
    });
```

### 6. Memory Pressure Handling

```php
use SqrtSpace\SpaceTime\Memory\MemoryPressureMonitor;
use SqrtSpace\SpaceTime\Memory\Handlers\LoggingHandler;
use SqrtSpace\SpaceTime\Memory\Handlers\CacheEvictionHandler;
use SqrtSpace\SpaceTime\Memory\Handlers\GarbageCollectionHandler;

$monitor = new MemoryPressureMonitor('512M');

// Add handlers
$monitor->registerHandler(new LoggingHandler($logger));
$monitor->registerHandler(new CacheEvictionHandler());
$monitor->registerHandler(new GarbageCollectionHandler());

// Check pressure in your operations
if ($monitor->check() === MemoryPressureLevel::HIGH) {
    // Switch to more aggressive memory saving
    $processor->useExternalStorage();
}
```
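The memory pressure check shown above boils down to comparing current usage against a limit. Here is a standalone sketch of that idea; the `pressure_level` helper and its thresholds are illustrative assumptions, not the library's actual `MemoryPressureMonitor` implementation:

```php
<?php
// Standalone sketch of pressure detection: map the usage/limit ratio
// to a coarse pressure level. Thresholds here are illustrative only.
function pressure_level(int $usedBytes, int $limitBytes): string
{
    $ratio = $usedBytes / $limitBytes;
    return match (true) {
        $ratio >= 0.90 => 'critical',
        $ratio >= 0.75 => 'high',
        $ratio >= 0.50 => 'medium',
        default        => 'low',
    };
}

$limit = 512 * 1024 * 1024; // 512M, matching the monitor above
echo pressure_level(memory_get_usage(true), $limit) . "\n";
echo pressure_level(490 * 1024 * 1024, $limit) . "\n"; // prints "critical"
```

Checking a cheap ratio like this on every chunk boundary is what lets handlers react before the PHP memory limit is actually hit.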
### 7. Checkpointing for Fault Tolerance

```php
use SqrtSpace\SpaceTime\Checkpoint\CheckpointManager;

$checkpoint = new CheckpointManager('import_job_123');

foreach ($largeDataset->chunkBySqrtN() as $chunk) {
    processChunk($chunk);

    // Save progress every √n items
    if ($checkpoint->shouldCheckpoint()) {
        $checkpoint->save([
            'processed' => $processedCount,
            'last_id' => $lastId
        ]);
    }
}
```

## Real-World Examples

### Processing Large CSV Files

```php
use SqrtSpace\SpaceTime\File\CsvReader;
use SqrtSpace\SpaceTime\Algorithms\ExternalGroupBy;

$reader = new CsvReader('sales_data.csv');

// Get column statistics
$stats = $reader->getColumnStats('amount');
echo "Average order: $" . $stats['avg'];

// Process with type conversion
$totals = $reader->readWithTypes([
    'amount' => 'float',
    'quantity' => 'int',
    'date' => 'date'
])->reduce(function ($totals, $row) {
    $month = $row['date']->format('Y-m');
    $totals[$month] = ($totals[$month] ?? 0) + $row['amount'];
    return $totals;
}, []);
```

### Large Data Export

```php
use SqrtSpace\SpaceTime\File\CsvExporter;
use SqrtSpace\SpaceTime\Database\SpaceTimeQueryBuilder;

$exporter = new CsvExporter('users_export.csv');
$query = new SpaceTimeQueryBuilder($pdo);

// Export with headers
$exporter->writeHeaders(['ID', 'Name', 'Email', 'Created At']);

// Stream data directly to CSV
$query->from('users')
    ->orderBy('created_at', 'desc')
    ->chunkBySqrtN(function ($users) use ($exporter) {
        $exporter->writeRows(array_map(function ($user) {
            return [
                $user['id'],
                $user['name'],
                $user['email'],
                $user['created_at']
            ];
        }, $users));
    });

echo "Exported " . number_format($exporter->getBytesWritten()) .
" bytes\n"; ``` ### Batch Processing with Memory Limits ```php use SqrtSpace\SpaceTime\Batch\BatchProcessor; $processor = new BatchProcessor([ 'memory_threshold' => 0.8, 'checkpoint_enabled' => true, 'progress_callback' => function($batch, $size, $result) { echo "Processed batch $batch ($size items)\n"; } ]); $result = $processor->process($millionItems, function($batch) { $processed = []; foreach ($batch as $key => $item) { $processed[$key] = expensiveOperation($item); } return $processed; }, 'job_123'); echo "Success: " . $result->getSuccessCount() . "\n"; echo "Errors: " . $result->getErrorCount() . "\n"; echo "Time: " . $result->getExecutionTime() . "s\n"; ``` ## Configuration ```php use SqrtSpace\SpaceTime\SpaceTimeConfig; // Global configuration SpaceTimeConfig::configure([ 'memory_limit' => '512M', 'external_storage_path' => '/tmp/spacetime', 'chunk_strategy' => 'sqrt_n', // or 'memory_based', 'fixed' 'enable_checkpointing' => true, 'compression' => true, 'compression_level' => 6 ]); // Per-operation configuration $array = new SpaceTimeArray(10000); // threshold // Check configuration echo "Chunk size for 1M items: " . SpaceTimeConfig::calculateSqrtN(1000000) . "\n"; echo "Storage path: " . SpaceTimeConfig::getStoragePath() . "\n"; ``` ## Advanced Usage ### JSON Lines Processing ```php use SqrtSpace\SpaceTime\File\JsonLinesProcessor; // Process large JSONL files JsonLinesProcessor::processInChunks('events.jsonl', function($events) { foreach ($events as $event) { if ($event['type'] === 'error') { logError($event); } } }); // Split large file $files = JsonLinesProcessor::split('huge.jsonl', 100000, 'output/chunk'); echo "Split into " . count($files) . 
" files\n"; // Merge multiple files $count = JsonLinesProcessor::merge($files, 'merged.jsonl'); echo "Merged $count records\n"; ``` ### Streaming Operations ```php use SqrtSpace\SpaceTime\Streams\SpaceTimeStream; // Chain operations efficiently SpaceTimeStream::fromCsv('sales.csv') ->filter(fn($row) => $row['region'] === 'US') ->map(fn($row) => [ 'product' => $row['product'], 'revenue' => $row['quantity'] * $row['price'] ]) ->chunkBySqrtN() ->each(function($chunk) { $total = array_sum(array_column($chunk, 'revenue')); echo "Chunk revenue: \$$total\n"; }); ``` ### Custom Batch Jobs ```php use SqrtSpace\SpaceTime\Batch\BatchJob; class ImportJob extends BatchJob { private string $filename; public function __construct(string $filename) { parent::__construct(); $this->filename = $filename; } protected function getItems(): iterable { return SpaceTimeStream::fromCsv($this->filename); } public function processItem(array $batch): array { $results = []; foreach ($batch as $key => $row) { $user = User::create([ 'name' => $row['name'], 'email' => $row['email'] ]); $results[$key] = $user->id; } return $results; } protected function getUniqueId(): string { return md5($this->filename); } } // Run job with automatic checkpointing $job = new ImportJob('users.csv'); $result = $job->execute(); // Or resume if interrupted if ($job->canResume()) { $result = $job->resume(); } ``` ## Testing ```bash # Run all tests vendor/bin/phpunit # Run specific test suite vendor/bin/phpunit tests/Algorithms # With coverage vendor/bin/phpunit --coverage-html coverage ``` ## Performance Considerations 1. **Chunk Size**: The default √n chunk size is optimal for most cases, but you can tune it: ```php SpaceTimeConfig::configure(['chunk_strategy' => 'fixed', 'fixed_chunk_size' => 5000]); ``` 2. **Compression**: Enable for text-heavy data, disable for already compressed data: ```php SpaceTimeConfig::configure(['compression' => false]); ``` 3. 
3. **Storage Location**: Use fast local SSDs for external storage:

   ```php
   SpaceTimeConfig::configure(['external_storage_path' => '/mnt/fast-ssd/spacetime']);
   ```

## Framework Integration

### Laravel

```php
// config/spacetime.php
return [
    'memory_limit' => env('SPACETIME_MEMORY_LIMIT', '256M'),
    'storage_driver' => env('SPACETIME_STORAGE', 'file'),
    'redis_connection' => env('SPACETIME_REDIS', 'default'),
];

// In controller
public function exportOrders()
{
    return SpaceTimeResponse::stream(function () {
        Order::orderByExternal('created_at')
            ->chunkBySqrtN(function ($orders) {
                foreach ($orders as $order) {
                    echo $order->toCsv() . "\n";
                }
            });
    });
}
```

### Symfony

For a complete Symfony integration example, see our [Symfony bundle documentation](https://github.com/MarketAlly/Ubiquity/wiki/Symfony-Integration).

```php
// config/bundles.php
return [
    // ...
    SqrtSpace\SpaceTime\Symfony\SpaceTimeBundle::class => ['all' => true],
];
```

```yaml
# config/packages/spacetime.yaml
spacetime:
    memory_limit: '%env(SPACETIME_MEMORY_LIMIT)%'
    storage_path: '%kernel.project_dir%/var/spacetime'
    chunk_strategy: 'sqrt_n'
    enable_checkpointing: true
    compression: true
```

```php
// In controller
use SqrtSpace\SpaceTime\Batch\BatchProcessor;
use SqrtSpace\SpaceTime\File\CsvReader;

#[Route('/import')]
public function import(BatchProcessor $processor): Response
{
    $reader = new CsvReader($this->getParameter('import_file'));

    $result = $processor->process(
        $reader->stream(),
        fn($batch) => $this->importBatch($batch)
    );

    return $this->json([
        'imported' => $result->getSuccessCount(),
        'errors' => $result->getErrorCount()
    ]);
}
```

```bash
# Console command
php bin/console spacetime:process-file input.csv output.csv --format=csv --checkpoint
```

## Troubleshooting

### Out of Memory Errors

1. Reduce chunk size:

   ```php
   SpaceTimeConfig::configure(['chunk_strategy' => 'fixed', 'fixed_chunk_size' => 1000]);
   ```
2. Enable more aggressive memory handling:

   ```php
   $monitor = new MemoryPressureMonitor('128M'); // Lower threshold
   ```

3. Use external storage earlier:

   ```php
   $array = new SpaceTimeArray(100); // Smaller threshold
   ```

### Performance Issues

1. Check disk I/O speed
2. Enable compression for text data
3. Use memory-based external storage:

   ```php
   SpaceTimeConfig::configure(['external_storage_path' => '/dev/shm/spacetime']);
   ```

### Checkpoint Recovery

```php
$checkpoint = new CheckpointManager('job_id');

if ($checkpoint->exists()) {
    $state = $checkpoint->load();
    echo "Resuming from: " . json_encode($state) . "\n";
}
```

## Requirements

- PHP 8.1 or higher
- ext-json
- ext-mbstring

## Optional Extensions

- ext-apcu for faster caching
- ext-redis for distributed operations
- ext-zlib for compression

## Contributing

Please see [CONTRIBUTING.md](CONTRIBUTING.md) for details.

## License

The Apache 2.0 License. Please see [LICENSE](LICENSE) for details.