# SqrtSpace SpaceTime Sample Web API

This sample demonstrates how to build a memory-efficient Web API using the SqrtSpace SpaceTime library. It showcases real-world scenarios where √n space-time tradeoffs can significantly improve application performance and scalability.

## Features Demonstrated

### 1. **Memory-Efficient Data Processing**

- Streaming large datasets without loading everything into memory
- Automatic batching using √n-sized chunks (sketched below)
- External sorting and aggregation for datasets that exceed memory limits

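To make the chunking concrete, here is a minimal conceptual sketch of √n batching (illustrative only, not the library's actual implementation):

```csharp
using System;
using System.Collections.Generic;

static class SqrtBatching
{
    // Split n items into batches of roughly √n, so at most O(√n) items
    // are buffered in memory at any one time.
    public static IEnumerable<IReadOnlyList<T>> BatchBySqrtN<T>(IReadOnlyList<T> source)
    {
        int batchSize = Math.Max(1, (int)Math.Sqrt(source.Count));
        for (int i = 0; i < source.Count; i += batchSize)
        {
            int size = Math.Min(batchSize, source.Count - i);
            var batch = new T[size];
            for (int j = 0; j < size; j++)
                batch[j] = source[i + j];
            yield return batch;
        }
    }
}
```

For the sample's 10,000 products this yields batches of 100 items each.
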
### 2. **Checkpoint-Enabled Operations**

- Resumable bulk operations that can recover from failures
- Progress tracking for long-running tasks
- Automatic state persistence at optimal intervals (see the sketch after this list)

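The library automates this pattern; as a rough illustration of what a resumable loop looks like (the file-based checkpoint store and `processItemAsync` callback are placeholders, not the sample's API):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Persist the cursor every ~√n items so a crash loses at most one interval of work.
static async Task RunWithCheckpointsAsync(
    IReadOnlyList<int> ids,
    string checkpointPath,
    Func<int, Task> processItemAsync)
{
    // Resume from the last saved position, if any.
    int start = File.Exists(checkpointPath)
        ? JsonSerializer.Deserialize<int>(await File.ReadAllTextAsync(checkpointPath))
        : 0;

    int interval = Math.Max(1, (int)Math.Sqrt(ids.Count));

    for (int i = start; i < ids.Count; i++)
    {
        await processItemAsync(ids[i]);

        if ((i + 1) % interval == 0)
            await File.WriteAllTextAsync(checkpointPath, JsonSerializer.Serialize(i + 1));
    }

    File.Delete(checkpointPath); // finished: clear the saved state
}
```
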
### 3. **Real-World API Patterns**

#### Products Controller (`/api/products`)

- **Paginated queries** - Basic memory control through pagination
- **Streaming endpoints** - Stream millions of products using NDJSON format (sketched below)
- **Smart search** - Automatically switches to external sorting for large result sets
- **Bulk updates** - Checkpoint-enabled price updates that can resume after failures
- **CSV export** - Stream large exports without memory bloat
- **Statistics** - Calculate aggregates over large datasets efficiently

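As an example of the streaming pattern, an NDJSON endpoint can be built on `IAsyncEnumerable` (a sketch; `AppDbContext` and `Product` stand in for the sample's actual types):

```csharp
using System.Text.Json;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.EntityFrameworkCore;

[ApiController]
[Route("api/products")]
public class ProductsController : ControllerBase
{
    private readonly AppDbContext _db; // hypothetical EF Core context
    public ProductsController(AppDbContext db) => _db = db;

    // Each product is written as one JSON line; memory stays flat because
    // rows are serialized as they arrive from the database.
    [HttpGet("stream")]
    public async Task Stream()
    {
        Response.ContentType = "application/x-ndjson";
        await foreach (var product in _db.Products.AsNoTracking().AsAsyncEnumerable())
        {
            await JsonSerializer.SerializeAsync(Response.Body, product);
            await Response.WriteAsync("\n");
            await Response.Body.FlushAsync();
        }
    }
}
```
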
#### Analytics Controller (`/api/analytics`)

- **Revenue analysis** - External grouping for large-scale aggregations
- **Top customers** - Find the top N using external sorting when needed
- **Real-time streaming** - Server-Sent Events for continuous analytics (sketched below)
- **Complex reports** - Multi-stage report generation with checkpointing
- **Pattern analysis** - ML-ready data processing with memory constraints
- **Memory monitoring** - Track how the system manages memory

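The real-time endpoint follows the standard Server-Sent Events wire format: one `data:` frame per tick on a `text/event-stream` response. A simplified sketch (the `_analytics` service and its snapshot method are assumptions):

```csharp
// Inside the analytics controller; DI fields omitted for brevity.
[HttpGet("real-time/orders")]
public async Task StreamOrderAnalytics(CancellationToken ct)
{
    Response.ContentType = "text/event-stream";
    while (!ct.IsCancellationRequested)
    {
        var snapshot = await _analytics.GetOrderSnapshotAsync(ct); // hypothetical call
        await Response.WriteAsync($"data: {JsonSerializer.Serialize(snapshot)}\n\n", ct);
        await Response.Body.FlushAsync(ct);
        await Task.Delay(TimeSpan.FromSeconds(1), ct); // the 1-second cadence noted above
    }
}
```
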
### 4. **Automatic Memory Management**

- Adapts the processing strategy based on data size
- Spills to disk when memory pressure is detected (see the pressure-check sketch below)
- Provides memory usage statistics for monitoring

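Conceptually, the strategy switch reduces to a pressure check before each batch. A rough sketch using plain GC counters (the library's real heuristics are more involved):

```csharp
using System;

// Compare managed-heap usage against the configured limit
// (cf. MaxMemoryMB / WarningThresholdPercent in the Configuration section).
static bool IsUnderMemoryPressure(int maxMemoryMB, int warningThresholdPercent)
{
    long usedBytes = GC.GetTotalMemory(forceFullCollection: false);
    long limitBytes = (long)maxMemoryMB * 1024 * 1024;
    return usedBytes > limitBytes * warningThresholdPercent / 100;
}

// A pipeline can consult this before each batch:
// if (IsUnderMemoryPressure(512, 80)) { /* spill to disk / go external */ }
```
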
## Running the Sample

1. **Start the API:**

   ```bash
   dotnet run
   ```

2. **Access Swagger UI:**

   Navigate to `https://localhost:5001/swagger` to explore the API.

3. **Generate Test Data:**

   The application automatically seeds the database with:

   - 1,000 customers
   - 10,000 products
   - 50,000 orders

   A background service continuously generates new orders to simulate real-time data.

## Key Scenarios to Try

### 1. Stream Large Dataset

```bash
# Stream all products (10,000+) without loading into memory
curl -N https://localhost:5001/api/products/stream

# The response is newline-delimited JSON (NDJSON)
```


### 2. Bulk Update with Checkpointing

```bash
# Start a bulk price update
curl -X POST https://localhost:5001/api/products/bulk-update-prices \
  -H "Content-Type: application/json" \
  -H "X-Operation-Id: price-update-123" \
  -d '{"categoryFilter": "Electronics", "priceMultiplier": 1.1}'

# If it fails, resume by repeating the request with the same Operation ID
```


### 3. Generate Complex Report

```bash
# Generate a report with automatic checkpointing
curl -X POST https://localhost:5001/api/analytics/reports/generate \
  -H "Content-Type: application/json" \
  -d '{
    "startDate": "2024-01-01",
    "endDate": "2024-12-31",
    "metricsToInclude": ["revenue", "categories", "customers", "products"],
    "includeDetailedBreakdown": true
  }'
```


### 4. Real-Time Analytics Stream

```bash
# Connect to real-time analytics stream
curl -N https://localhost:5001/api/analytics/real-time/orders

# Streams analytics data every second using Server-Sent Events
```


### 5. Export Large Dataset

```bash
# Export all products to CSV (streams the file)
curl https://localhost:5001/api/products/export/csv > products.csv
```


## Memory Efficiency Examples

### Small Dataset (In-Memory Processing)

When working with small datasets (<10,000 items), the API uses standard in-memory processing:

```csharp
// Standard LINQ operations
var results = await query
    .Where(p => p.Category == "Books")
    .OrderBy(p => p.Price)
    .ToListAsync();
```


### Large Dataset (External Processing)

For large datasets (>10,000 items), the API automatically switches to external processing:

```csharp
// Automatic external sorting
if (count > 10000)
{
    query = query.UseExternalSorting();
}

// Process in √n-sized batches
await foreach (var batch in query.BatchBySqrtNAsync())
{
    // Process batch
}
```


## Configuration

The sample's memory limits are configurable in `appsettings.json`:

```json
{
  "MemoryOptions": {
    "MaxMemoryMB": 512,
    "WarningThresholdPercent": 80
  }
}
```

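These settings can be bound with the standard ASP.NET Core options pattern (illustrative; `MemoryOptions` is assumed to be the sample's options class):

```csharp
// Program.cs: bind the "MemoryOptions" section to a POCO.
builder.Services.Configure<MemoryOptions>(
    builder.Configuration.GetSection("MemoryOptions"));

// Consumers then receive it through DI:
// public ProductService(IOptions<MemoryOptions> options) { ... }
```
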

## Monitoring

Check memory usage statistics:

```bash
curl https://localhost:5001/api/analytics/memory-stats
```

Response:

```json
{
  "currentMemoryUsageMB": 245,
  "peakMemoryUsageMB": 412,
  "externalSortOperations": 3,
  "checkpointsSaved": 15,
  "dataSpilledToDiskMB": 89,
  "cacheHitRate": 0.87,
  "currentMemoryPressure": "Medium"
}
```


## Architecture Highlights

1. **Service Layer**: Encapsulates business logic and SpaceTime optimizations
2. **Entity Framework Integration**: Seamless integration with EF Core queries
3. **Middleware**: Automatic checkpoint and streaming support
4. **Background Services**: Continuous data generation for testing
5. **Memory Monitoring**: Real-time tracking of memory usage

## Best Practices Demonstrated

1. **Know Your Data Size**: Check the count before choosing a processing strategy
2. **Stream When Possible**: Use `IAsyncEnumerable` for large results
3. **Checkpoint Long Operations**: Enable recovery from failures
4. **Monitor Memory Usage**: Track and respond to memory pressure
5. **Use External Processing**: Let the library handle large datasets efficiently

## Next Steps

- Modify the memory limits and observe how behavior changes
- Add your own endpoints using SpaceTime patterns
- Connect to a real database for production scenarios
- Implement caching with hot/cold storage tiers
- Add distributed processing with Redis coordination
- Add distributed processing with Redis coordination |