# SqrtSpace SpaceTime Best Practices
This project demonstrates best practices for building production-ready applications using the SqrtSpace SpaceTime library. It showcases advanced patterns and configurations for optimal memory efficiency and performance.
## Key Concepts Demonstrated
### 1. **Comprehensive Service Configuration**
The application demonstrates proper configuration of all SpaceTime services:
```csharp
// Environment-aware memory configuration
builder.Services.Configure<SpaceTimeConfiguration>(options =>
{
    options.Memory.MaxMemory = environment.IsDevelopment()
        ? 256 * 1024 * 1024   // 256MB for dev
        : 1024 * 1024 * 1024; // 1GB for production

    // Respect container limits
    var memoryLimit = Environment.GetEnvironmentVariable("MEMORY_LIMIT");
    if (long.TryParse(memoryLimit, out var limit))
    {
        options.Memory.MaxMemory = (long)(limit * 0.8); // Use 80% of container limit
    }
});
```
### 2. **Layered Caching Strategy**
Implements hot/cold tiered caching with automatic spill-to-disk:
```csharp
builder.Services.AddSpaceTimeCaching(options =>
{
    options.MaxHotMemory = 50 * 1024 * 1024; // 50MB hot cache
    options.EnableColdStorage = true;
    options.ColdStoragePath = Path.Combine(Path.GetTempPath(), "spacetime-cache");
});
```
### 3. **Production-Ready Diagnostics**
Comprehensive monitoring with OpenTelemetry integration:
```csharp
builder.Services.AddSpaceTimeDiagnostics(options =>
{
    options.EnableMetrics = true;
    options.EnableTracing = true;
    options.SamplingRate = builder.Environment.IsDevelopment() ? 1.0 : 0.1;
});
```
### 4. **Entity Framework Integration**
Shows how to configure EF Core with SpaceTime optimizations:
```csharp
options.UseSqlServer(connectionString)
    .UseSpaceTimeOptimizer(opt =>
    {
        opt.EnableSqrtNChangeTracking = true;
        opt.BufferPoolStrategy = BufferPoolStrategy.SqrtN;
    });
```
### 5. **Memory-Aware Background Processing**
Background services that respond to memory pressure:
```csharp
_memoryMonitor.PressureEvents
    .Where(e => e.CurrentLevel >= MemoryPressureLevel.High)
    .Subscribe(e =>
    {
        _logger.LogWarning("High memory pressure detected, pausing processing");
        // Implement backpressure
    });
```
### 6. **Pipeline Pattern for Complex Processing**
Multi-stage processing with checkpointing:
```csharp
var pipeline = _pipelineFactory.CreatePipeline<Order, ProcessedOrder>("OrderProcessing")
    .Configure(config =>
    {
        config.ExpectedItemCount = orders.Count();
        config.EnableCheckpointing = true;
    })
    .AddTransform("Validate", ValidateOrder)
    .AddBatch("EnrichCustomerData", EnrichWithCustomerData)
    .AddParallel("CalculateTax", CalculateTax, maxConcurrency: 4)
    .AddCheckpoint("SaveProgress")
    .Build();
```
### 7. **Distributed Processing Coordination**
Shows how to partition work across multiple nodes:
```csharp
var partition = await _coordinator.RequestPartitionAsync(
    request.WorkloadId,
    request.EstimatedSize);

// Process only this node's portion
var filter = new OrderFilter
{
    StartDate = partition.StartRange,
    EndDate = partition.EndRange
};
```
### 8. **Streaming API Endpoints**
Demonstrates memory-efficient streaming with automatic chunking:
```csharp
[HttpGet("export")]
[SpaceTimeStreaming(ChunkStrategy = ChunkStrategy.SqrtN)]
public async IAsyncEnumerable<OrderExportDto> ExportOrders([FromQuery] OrderFilter filter)
{
await foreach (var batch in orders.BatchBySqrtNAsync())
{
foreach (var order in batch)
{
yield return MapToDto(order);
}
}
}
```
## Architecture Patterns
### Service Layer Pattern
The `OrderService` demonstrates (a condensed sketch follows this list):
- Dependency injection of SpaceTime services
- Operation tracking with diagnostics
- External sorting for large datasets
- Proper error handling and logging
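A condensed sketch of that shape is shown below. Only `OrderByExternal` and `ToListWithSqrtNMemoryAsync` come from the examples in this README; `OrderDbContext`, `GetRecentOrdersAsync`, and the `ISpaceTimeDiagnostics`/`TrackOperation` names are illustrative placeholders rather than confirmed library APIs:
```csharp
public class OrderService
{
    private readonly OrderDbContext _context;            // application DbContext (placeholder name)
    private readonly ILogger<OrderService> _logger;
    private readonly ISpaceTimeDiagnostics _diagnostics; // hypothetical diagnostics handle

    public OrderService(
        OrderDbContext context,
        ILogger<OrderService> logger,
        ISpaceTimeDiagnostics diagnostics)
    {
        _context = context;
        _logger = logger;
        _diagnostics = diagnostics;
    }

    public async Task<List<Order>> GetRecentOrdersAsync(DateTime since)
    {
        // Hypothetical operation-tracking scope feeding metrics/tracing
        using var operation = _diagnostics.TrackOperation("GetRecentOrders");
        try
        {
            // External sorting keeps memory bounded even for large result sets
            return await _context.Orders
                .Where(o => o.CreatedDate >= since)
                .OrderByExternal(o => o.CreatedDate)
                .ToListWithSqrtNMemoryAsync();
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to load orders since {Since}", since);
            throw;
        }
    }
}
```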
### Memory-Aware Queries
```csharp
// Automatically switches to external sorting for large results
var orders = await query
    .OrderByExternal(o => o.CreatedDate)
    .ToListWithSqrtNMemoryAsync();
```
### Batch Processing
```csharp
// Process data in memory-efficient batches
await foreach (var batch in context.Orders
    .Where(o => o.Status == "Pending")
    .BatchBySqrtNAsync())
{
    // Process batch
}
```
### Task Scheduling
```csharp
// Schedule work based on memory availability
await _scheduler.ScheduleAsync(
    async () => await ProcessNextBatchAsync(stoppingToken),
    estimatedMemory: 50 * 1024 * 1024, // 50MB
    priority: TaskPriority.Low);
```
## Configuration Best Practices
### 1. **Environment-Based Configuration**
- Development: Lower memory limits, full diagnostics
- Production: Higher limits, sampled diagnostics
- Container: Respect container memory limits
### 2. **Conditional Service Registration**
```csharp
// Only add distributed coordination if Redis is available
var redisConnection = builder.Configuration.GetConnectionString("Redis");
if (!string.IsNullOrEmpty(redisConnection))
{
    builder.Services.AddSpaceTimeDistributed(options =>
    {
        options.NodeId = Environment.MachineName;
        options.CoordinationEndpoint = redisConnection;
    });
}
```
### 3. **Health Monitoring**
```csharp
app.MapGet("/health", async (IMemoryPressureMonitor monitor) =>
{
var stats = monitor.CurrentStatistics;
return Results.Ok(new
{
Status = "Healthy",
MemoryPressure = monitor.CurrentPressureLevel.ToString(),
MemoryUsage = new
{
ManagedMemoryMB = stats.ManagedMemory / (1024.0 * 1024.0),
WorkingSetMB = stats.WorkingSet / (1024.0 * 1024.0),
AvailablePhysicalMemoryMB = stats.AvailablePhysicalMemory / (1024.0 * 1024.0)
}
});
});
```
## Production Considerations
### 1. **Memory Limits**
Always configure memory limits based on your deployment environment:
- Container deployments: Use 80% of container limit
- VMs: Consider other processes running
- Serverless: Respect function memory limits
### 2. **Checkpointing Strategy**
Enable checkpointing for the following (a short sketch appears after this list):
- Long-running operations
- Operations that process large datasets
- Critical business processes that must be resumable
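As a rough sketch, the pipeline API shown earlier can gate checkpointing on those criteria; `orderCount` and `isCriticalRun` are illustrative inputs, and the 100,000-item threshold is an assumption to tune for your workload:
```csharp
// Sketch: enable checkpointing only when the run is long/large or must be resumable
var pipeline = _pipelineFactory.CreatePipeline<Order, ProcessedOrder>("NightlySettlement")
    .Configure(config =>
    {
        config.ExpectedItemCount = orderCount;
        config.EnableCheckpointing = orderCount > 100_000 || isCriticalRun;
    })
    .AddTransform("Validate", ValidateOrder)
    .AddCheckpoint("SaveProgress") // persists progress so the run can resume after a crash
    .Build();
```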
### 3. **Monitoring and Alerting**
Monitor these key metrics:
- Memory pressure levels
- External sort operations
- Checkpoint frequency
- Cache hit rates
- Pipeline processing times
### 4. **Error Handling**
Implement proper error handling (a minimal retry sketch follows this list):
- Use diagnostics to track operations
- Log errors with context
- Implement retry logic for transient failures
- Clean up resources on failure
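A minimal retry sketch using only standard .NET primitives; `ProcessBatchAsync`, `IsTransient`, and the surrounding variables are illustrative helpers, not library APIs:
```csharp
const int maxAttempts = 3;
for (var attempt = 1; ; attempt++)
{
    try
    {
        await ProcessBatchAsync(batch, cancellationToken);
        break;
    }
    catch (Exception ex) when (IsTransient(ex) && attempt < maxAttempts)
    {
        // Transient failure: log with context and retry with exponential backoff
        _logger.LogWarning(ex, "Attempt {Attempt}/{Max} failed, retrying", attempt, maxAttempts);
        await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)), cancellationToken);
    }
    catch (Exception ex)
    {
        // Non-transient: log with context, then let the caller clean up or resume from a checkpoint
        _logger.LogError(ex, "Batch {BatchId} failed permanently", batch.Id);
        throw;
    }
}
```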
### 5. **Performance Tuning**
- Adjust batch sizes based on workload
- Configure parallelism based on CPU cores (see the tuning sketch after this list)
- Set appropriate cache sizes
- Monitor and adjust memory thresholds
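A sketch of these knobs using the APIs shown above; `memoryBudget` and the specific ratios are assumptions to adapt to your workload:
```csharp
// Leave one core free for the runtime and background work
var maxConcurrency = Math.Max(1, Environment.ProcessorCount - 1);

var pipeline = _pipelineFactory.CreatePipeline<Order, ProcessedOrder>("OrderProcessing")
    .AddParallel("CalculateTax", CalculateTax, maxConcurrency: maxConcurrency)
    .Build();

builder.Services.AddSpaceTimeCaching(options =>
{
    // Size the hot cache from the overall memory budget instead of a fixed constant
    options.MaxHotMemory = memoryBudget / 8; // assumption: ~12.5% of the budget
});
```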
## Testing Recommendations
### 1. **Load Testing**
Test with datasets that exceed memory limits to ensure (a test sketch follows this list):
- External processing activates correctly
- Memory pressure is handled gracefully
- Checkpointing works under load
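A hedged xUnit-style sketch of such a test. Only `BatchBySqrtNAsync` comes from the README examples; `GenerateOrders`, `ProcessBatchAsync`, and the size thresholds are illustrative:
```csharp
[Fact]
public async Task Processes_dataset_larger_than_memory_budget()
{
    const long memoryBudget = 64L * 1024 * 1024;   // 64MB budget for the test run
    var orders = GenerateOrders(count: 1_000_000); // async sequence larger than the budget

    long peakWorkingSet = 0;
    await foreach (var batch in orders.BatchBySqrtNAsync())
    {
        await ProcessBatchAsync(batch);
        peakWorkingSet = Math.Max(peakWorkingSet,
            Process.GetCurrentProcess().WorkingSet64);
    }

    // Generous headroom for the runtime itself; tune for your environment
    Assert.True(peakWorkingSet < memoryBudget * 4);
}
```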
### 2. **Failure Testing**
Test recovery scenarios:
- Process crashes during batch processing
- Memory pressure during operations
- Network failures in distributed scenarios
### 3. **Performance Testing**
Measure:
- Response times under various memory conditions
- Throughput with different batch sizes
- Resource utilization patterns
## Deployment Checklist
- [ ] Configure memory limits based on deployment environment
- [ ] Set up monitoring and alerting
- [ ] Configure persistent storage for checkpoints and cold cache
- [ ] Test failover and recovery procedures
- [ ] Document memory requirements and scaling limits
- [ ] Configure appropriate logging levels
- [ ] Set up distributed coordination (if using multiple nodes)
- [ ] Verify health check endpoints
- [ ] Test under expected production load
## Advanced Scenarios
### Multi-Node Deployment
For distributed deployments:
1. Configure Redis for coordination
2. Set unique node IDs
3. Implement partition-aware processing
4. Monitor cross-node communication
### High-Availability Setup
1. Use persistent checkpoint storage
2. Implement automatic failover
3. Configure redundant cache storage
4. Monitor node health
### Performance Optimization
1. Profile memory usage patterns
2. Adjust algorithm selection thresholds
3. Optimize batch sizes for your workload
4. Configure appropriate parallelism levels
## Summary
This project demonstrates how to build robust, memory-efficient applications with SqrtSpace SpaceTime. Applications that follow these patterns can:
- Scale gracefully under memory pressure
- Process large datasets efficiently
- Recover from failures automatically
- Provide predictable performance
- Optimize resource utilization
The key is to embrace the √n space-time tradeoff philosophy throughout your application architecture, letting the library handle the complexity of memory management while you focus on business logic.