# SqrtSpace SpaceTime Sample Web API

This sample demonstrates how to build a memory-efficient Web API using the SqrtSpace SpaceTime library. It showcases real-world scenarios where √n space-time tradeoffs can significantly improve application performance and scalability.

## Features Demonstrated

### 1. **Memory-Efficient Data Processing**

- Streaming large datasets without loading everything into memory
- Automatic batching using √n-sized chunks (sketched below)
- External sorting and aggregation for datasets that exceed memory limits

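To make the chunking concrete, here is a minimal conceptual sketch of √n batching (illustrative only, not the library's actual implementation):

```csharp
using System;
using System.Collections.Generic;

static class SqrtBatching
{
    // Split n items into batches of roughly √n, so at most O(√n) items
    // are buffered in memory at any one time.
    public static IEnumerable<IReadOnlyList<T>> BatchBySqrtN<T>(IReadOnlyList<T> source)
    {
        int batchSize = Math.Max(1, (int)Math.Sqrt(source.Count));
        for (int i = 0; i < source.Count; i += batchSize)
        {
            int size = Math.Min(batchSize, source.Count - i);
            var batch = new T[size];
            for (int j = 0; j < size; j++)
                batch[j] = source[i + j];
            yield return batch;
        }
    }
}
```

For the sample's 10,000 products this yields batches of 100 items each.
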
### 2. **Checkpoint-Enabled Operations**

- Resumable bulk operations that can recover from failures
- Progress tracking for long-running tasks
- Automatic state persistence at optimal intervals (see the sketch after this list)

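The library automates this pattern; as a rough illustration of what a resumable loop looks like (the file-based checkpoint store and `processItemAsync` callback are placeholders, not the sample's API):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Persist the cursor every ~√n items so a crash loses at most one interval of work.
static async Task RunWithCheckpointsAsync(
    IReadOnlyList<int> ids,
    string checkpointPath,
    Func<int, Task> processItemAsync)
{
    // Resume from the last saved position, if any.
    int start = File.Exists(checkpointPath)
        ? JsonSerializer.Deserialize<int>(await File.ReadAllTextAsync(checkpointPath))
        : 0;

    int interval = Math.Max(1, (int)Math.Sqrt(ids.Count));

    for (int i = start; i < ids.Count; i++)
    {
        await processItemAsync(ids[i]);

        if ((i + 1) % interval == 0)
            await File.WriteAllTextAsync(checkpointPath, JsonSerializer.Serialize(i + 1));
    }

    File.Delete(checkpointPath); // finished: clear the saved state
}
```
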
### 3. **Real-World API Patterns**

#### Products Controller (`/api/products`)

- **Paginated queries** - Basic memory control through pagination
- **Streaming endpoints** - Stream millions of products using NDJSON format (sketched below)
- **Smart search** - Automatically switches to external sorting for large result sets
- **Bulk updates** - Checkpoint-enabled price updates that can resume after failures
- **CSV export** - Stream large exports without memory bloat
- **Statistics** - Calculate aggregates over large datasets efficiently

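As an example of the streaming pattern, an NDJSON endpoint can be built on `IAsyncEnumerable` (a sketch; `AppDbContext` and `Product` stand in for the sample's actual types):

```csharp
using System.Text.Json;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.EntityFrameworkCore;

[ApiController]
[Route("api/products")]
public class ProductsController : ControllerBase
{
    private readonly AppDbContext _db; // hypothetical EF Core context
    public ProductsController(AppDbContext db) => _db = db;

    // Each product is written as one JSON line; memory stays flat because
    // rows are serialized as they arrive from the database.
    [HttpGet("stream")]
    public async Task Stream()
    {
        Response.ContentType = "application/x-ndjson";
        await foreach (var product in _db.Products.AsNoTracking().AsAsyncEnumerable())
        {
            await JsonSerializer.SerializeAsync(Response.Body, product);
            await Response.WriteAsync("\n");
            await Response.Body.FlushAsync();
        }
    }
}
```
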
#### Analytics Controller (`/api/analytics`)

- **Revenue analysis** - External grouping for large-scale aggregations
- **Top customers** - Find the top N using external sorting when needed
- **Real-time streaming** - Server-Sent Events for continuous analytics (sketched below)
- **Complex reports** - Multi-stage report generation with checkpointing
- **Pattern analysis** - ML-ready data processing with memory constraints
- **Memory monitoring** - Track how the system manages memory

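The real-time endpoint follows the standard Server-Sent Events wire format: one `data:` frame per tick on a `text/event-stream` response. A simplified sketch (the `_analytics` service and its snapshot method are assumptions):

```csharp
// Inside the analytics controller; DI fields omitted for brevity.
[HttpGet("real-time/orders")]
public async Task StreamOrderAnalytics(CancellationToken ct)
{
    Response.ContentType = "text/event-stream";
    while (!ct.IsCancellationRequested)
    {
        var snapshot = await _analytics.GetOrderSnapshotAsync(ct); // hypothetical call
        await Response.WriteAsync($"data: {JsonSerializer.Serialize(snapshot)}\n\n", ct);
        await Response.Body.FlushAsync(ct);
        await Task.Delay(TimeSpan.FromSeconds(1), ct); // the 1-second cadence noted above
    }
}
```
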
### 4. **Automatic Memory Management**

- Adapts the processing strategy based on data size
- Spills to disk when memory pressure is detected (see the pressure-check sketch below)
- Provides memory usage statistics for monitoring

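Conceptually, the strategy switch reduces to a pressure check before each batch. A rough sketch using plain GC counters (the library's real heuristics are more involved):

```csharp
using System;

// Compare managed-heap usage against the configured limit
// (cf. MaxMemoryMB / WarningThresholdPercent in the Configuration section).
static bool IsUnderMemoryPressure(int maxMemoryMB, int warningThresholdPercent)
{
    long usedBytes = GC.GetTotalMemory(forceFullCollection: false);
    long limitBytes = (long)maxMemoryMB * 1024 * 1024;
    return usedBytes > limitBytes * warningThresholdPercent / 100;
}

// A pipeline can consult this before each batch:
// if (IsUnderMemoryPressure(512, 80)) { /* spill to disk / go external */ }
```
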
## Running the Sample

1. **Start the API:**

   ```bash
   dotnet run
   ```

2. **Access Swagger UI:**

   Navigate to `https://localhost:5001/swagger` to explore the API.

3. **Generate Test Data:**

   The application automatically seeds the database with:

   - 1,000 customers
   - 10,000 products
   - 50,000 orders

   A background service continuously generates new orders to simulate real-time data.

## Key Scenarios to Try

### 1. Stream Large Dataset

```bash
# Stream all products (10,000+) without loading into memory
curl -N https://localhost:5001/api/products/stream

# The response is newline-delimited JSON (NDJSON)
```


### 2. Bulk Update with Checkpointing

```bash
# Start a bulk price update
curl -X POST https://localhost:5001/api/products/bulk-update-prices \
  -H "Content-Type: application/json" \
  -H "X-Operation-Id: price-update-123" \
  -d '{"categoryFilter": "Electronics", "priceMultiplier": 1.1}'

# If it fails, resume by repeating the request with the same Operation ID
```


### 3. Generate Complex Report

```bash
# Generate a report with automatic checkpointing
curl -X POST https://localhost:5001/api/analytics/reports/generate \
  -H "Content-Type: application/json" \
  -d '{
    "startDate": "2024-01-01",
    "endDate": "2024-12-31",
    "metricsToInclude": ["revenue", "categories", "customers", "products"],
    "includeDetailedBreakdown": true
  }'
```


### 4. Real-Time Analytics Stream

```bash
# Connect to real-time analytics stream
curl -N https://localhost:5001/api/analytics/real-time/orders

# Streams analytics data every second using Server-Sent Events
```


### 5. Export Large Dataset

```bash
# Export all products to CSV (streams the file)
curl https://localhost:5001/api/products/export/csv > products.csv
```


## Memory Efficiency Examples

### Small Dataset (In-Memory Processing)

When working with small datasets (<10,000 items), the API uses standard in-memory processing:

```csharp
// Standard LINQ operations
var results = await query
    .Where(p => p.Category == "Books")
    .OrderBy(p => p.Price)
    .ToListAsync();
```


### Large Dataset (External Processing)

For large datasets (>10,000 items), the API automatically switches to external processing:

```csharp
// Automatic external sorting
if (count > 10000)
{
    query = query.UseExternalSorting();
}

// Process in √n-sized batches
await foreach (var batch in query.BatchBySqrtNAsync())
{
    // Process batch
}
```


## Configuration

The sample's memory limits are configurable in `appsettings.json`:

```json
{
  "MemoryOptions": {
    "MaxMemoryMB": 512,
    "WarningThresholdPercent": 80
  }
}
```

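These settings can be bound with the standard ASP.NET Core options pattern (illustrative; `MemoryOptions` is assumed to be the sample's options class):

```csharp
// Program.cs: bind the "MemoryOptions" section to a POCO.
builder.Services.Configure<MemoryOptions>(
    builder.Configuration.GetSection("MemoryOptions"));

// Consumers then receive it through DI:
// public ProductService(IOptions<MemoryOptions> options) { ... }
```
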

## Monitoring

Check memory usage statistics:

```bash
curl https://localhost:5001/api/analytics/memory-stats
```

Response:

```json
{
  "currentMemoryUsageMB": 245,
  "peakMemoryUsageMB": 412,
  "externalSortOperations": 3,
  "checkpointsSaved": 15,
  "dataSpilledToDiskMB": 89,
  "cacheHitRate": 0.87,
  "currentMemoryPressure": "Medium"
}
```


## Architecture Highlights

1. **Service Layer**: Encapsulates business logic and SpaceTime optimizations
2. **Entity Framework Integration**: Seamless integration with EF Core queries
3. **Middleware**: Automatic checkpoint and streaming support
4. **Background Services**: Continuous data generation for testing
5. **Memory Monitoring**: Real-time tracking of memory usage

## Best Practices Demonstrated

1. **Know Your Data Size**: Check the count before choosing a processing strategy
2. **Stream When Possible**: Use `IAsyncEnumerable` for large results
3. **Checkpoint Long Operations**: Enable recovery from failures
4. **Monitor Memory Usage**: Track and respond to memory pressure
5. **Use External Processing**: Let the library handle large datasets efficiently

## Next Steps

- Modify the memory limits and observe how behavior changes
- Add your own endpoints using SpaceTime patterns
- Connect to a real database for production scenarios
- Implement caching with hot/cold storage tiers
- Add distributed processing with Redis coordination
- Add distributed processing with Redis coordination |