# SqrtSpace SpaceTime Sample Web API
This sample demonstrates how to build a memory-efficient Web API using the SqrtSpace SpaceTime library. It showcases real-world scenarios where √n space-time tradeoffs can significantly improve application performance and scalability.
## Features Demonstrated

### 1. Memory-Efficient Data Processing

- Streaming large datasets without loading everything into memory
- Automatic batching using √n-sized chunks
- External sorting and aggregation for datasets that exceed memory limits
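The √n chunking idea can be sketched in a few lines of plain C#. This is an illustration of the tradeoff only, not the library's actual API: holding one chunk of ~√n items at a time bounds resident memory at O(√n) instead of O(n).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SqrtChunking
{
    // Illustrative sketch (not the SpaceTime API): split a materialized
    // list into chunks of ~√n items each, so only one chunk needs to be
    // resident in memory at a time.
    public static IEnumerable<List<T>> ChunkBySqrtN<T>(IReadOnlyList<T> source)
    {
        int chunkSize = Math.Max(1, (int)Math.Sqrt(source.Count));
        for (int i = 0; i < source.Count; i += chunkSize)
        {
            yield return source.Skip(i).Take(chunkSize).ToList();
        }
    }
}
```

For n = 10,000 items this produces chunks of about 100 items each, which is the balance point between number of passes and per-pass memory.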
### 2. Checkpoint-Enabled Operations

- Resumable bulk operations that can recover from failures
- Progress tracking for long-running tasks
- Automatic state persistence at optimal intervals
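The shape of a resumable operation can be sketched as follows. `ICheckpointStore` and its methods are hypothetical names for illustration, not the library's real API; the point is the loop structure: load the last saved position, process forward, and persist progress at √n-spaced intervals so a crash loses at most ~√n items of work.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical checkpoint abstraction, named for this sketch only.
public interface ICheckpointStore
{
    Task<int?> LoadAsync(string operationId);
    Task SaveAsync(string operationId, int position);
}

static class ResumableProcessing
{
    public static async Task RunAsync<T>(
        IReadOnlyList<T> items,
        string operationId,
        Func<T, Task> processAsync,
        ICheckpointStore store)
    {
        int start = await store.LoadAsync(operationId) ?? 0;     // resume point, 0 on first run
        int interval = Math.Max(1, (int)Math.Sqrt(items.Count)); // √n checkpoint spacing
        for (int i = start; i < items.Count; i++)
        {
            await processAsync(items[i]);
            if ((i + 1) % interval == 0)
                await store.SaveAsync(operationId, i + 1);       // persist completed count
        }
    }
}
```

Re-running with the same operation ID skips everything before the last saved position, which is the behavior the bulk-update endpoint below relies on.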
### 3. Real-World API Patterns

#### Products Controller (`/api/products`)

- **Paginated queries** - Basic memory control through pagination
- **Streaming endpoints** - Stream millions of products using NDJSON format
- **Smart search** - Automatically switches to external sorting for large result sets
- **Bulk updates** - Checkpoint-enabled price updates that can resume after failures
- **CSV export** - Stream large exports without memory bloat
- **Statistics** - Calculate aggregates over large datasets efficiently

#### Analytics Controller (`/api/analytics`)

- **Revenue analysis** - External grouping for large-scale aggregations
- **Top customers** - Find the top N using external sorting when needed
- **Real-time streaming** - Server-Sent Events for continuous analytics
- **Complex reports** - Multi-stage report generation with checkpointing
- **Pattern analysis** - ML-ready data processing with memory constraints
- **Memory monitoring** - Track how the system manages memory
### 4. Automatic Memory Management

- Adapts the processing strategy based on data size
- Spills to disk when memory pressure is detected
- Provides memory usage statistics for monitoring
## Running the Sample

1. Start the API:

   ```bash
   dotnet run
   ```

2. Access Swagger UI: navigate to `https://localhost:5001/swagger` to explore the API.

3. Generate test data: the application automatically seeds the database with:
   - 1,000 customers
   - 10,000 products
   - 50,000 orders

A background service continuously generates new orders to simulate real-time data.
## Key Scenarios to Try

### 1. Stream Large Dataset

```bash
# Stream all products (10,000+) without loading them into memory
curl -N https://localhost:5001/api/products/stream

# The response is newline-delimited JSON (NDJSON)
```
### 2. Bulk Update with Checkpointing

```bash
# Start a bulk price update
curl -X POST https://localhost:5001/api/products/bulk-update-prices \
  -H "Content-Type: application/json" \
  -H "X-Operation-Id: price-update-123" \
  -d '{"categoryFilter": "Electronics", "priceMultiplier": 1.1}'

# If it fails, resume by repeating the request with the same X-Operation-Id
```
### 3. Generate Complex Report

```bash
# Generate a report with automatic checkpointing
curl -X POST https://localhost:5001/api/analytics/reports/generate \
  -H "Content-Type: application/json" \
  -d '{
    "startDate": "2024-01-01",
    "endDate": "2024-12-31",
    "metricsToInclude": ["revenue", "categories", "customers", "products"],
    "includeDetailedBreakdown": true
  }'
```
### 4. Real-Time Analytics Stream

```bash
# Connect to the real-time analytics stream
curl -N https://localhost:5001/api/analytics/real-time/orders

# Streams analytics data every second using Server-Sent Events
```
### 5. Export Large Dataset

```bash
# Export all products to CSV (streams the file)
curl https://localhost:5001/api/products/export/csv > products.csv
```
## Memory Efficiency Examples

### Small Dataset (In-Memory Processing)

When working with small datasets (fewer than 10,000 items), the API uses standard in-memory processing:

```csharp
// Standard LINQ operations
var results = await query
    .Where(p => p.Category == "Books")
    .OrderBy(p => p.Price)
    .ToListAsync();
```
### Large Dataset (External Processing)

For large datasets (more than 10,000 items), the API automatically switches to external processing:

```csharp
// Automatic external sorting
if (count > 10000)
{
    query = query.UseExternalSorting();
}

// Process in √n-sized batches
await foreach (var batch in query.BatchBySqrtNAsync())
{
    // Process each batch here
}
```
## Configuration

The sample includes configurable memory limits in `appsettings.json`:

```json
{
  "MemoryOptions": {
    "MaxMemoryMB": 512,
    "WarningThresholdPercent": 80
  }
}
```
## Monitoring

Check memory usage statistics:

```bash
curl https://localhost:5001/api/analytics/memory-stats
```

Response:

```json
{
  "currentMemoryUsageMB": 245,
  "peakMemoryUsageMB": 412,
  "externalSortOperations": 3,
  "checkpointsSaved": 15,
  "dataSpilledToDiskMB": 89,
  "cacheHitRate": 0.87,
  "currentMemoryPressure": "Medium"
}
```
## Architecture Highlights

- **Service layer** - Encapsulates business logic and SpaceTime optimizations
- **Entity Framework integration** - Seamless integration with EF Core queries
- **Middleware** - Automatic checkpoint and streaming support
- **Background services** - Continuous data generation for testing
- **Memory monitoring** - Real-time tracking of memory usage
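A minimal `Program.cs` wiring for these layers might look like the following. The `Configure<MemoryOptions>` binding and `AddHostedService` call are standard ASP.NET Core; the `MemoryOptions` and `OrderGeneratorService` type names are placeholders for this sample's own classes, and any SpaceTime-specific registration the library may require is omitted because its API is not shown here.

```csharp
// Sketch of composing the sample's layers in Program.cs (ASP.NET Core).
// MemoryOptions and OrderGeneratorService are illustrative placeholder names.
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddControllers();
builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSwaggerGen();

// Bind the memory limits from the "MemoryOptions" section of appsettings.json.
builder.Services.Configure<MemoryOptions>(
    builder.Configuration.GetSection("MemoryOptions"));

// Background service that continuously generates test orders.
builder.Services.AddHostedService<OrderGeneratorService>();

var app = builder.Build();
app.UseSwagger();
app.UseSwaggerUI();
app.MapControllers();
app.Run();
```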
## Best Practices Demonstrated

- **Know your data size** - Check the item count before choosing a processing strategy
- **Stream when possible** - Use `IAsyncEnumerable` for large results
- **Checkpoint long operations** - Enable recovery from failures
- **Monitor memory usage** - Track and respond to memory pressure
- **Use external processing** - Let the library handle large datasets efficiently
Next Steps
- Modify the memory limits and observe behavior changes
- Add your own endpoints using SpaceTime patterns
- Connect to a real database for production scenarios
- Implement caching with hot/cold storage tiers
- Add distributed processing with Redis coordination