SpaceTime Configuration Advisor

Intelligent system configuration advisor that applies Williams' √n space-time tradeoffs to optimize database, JVM, kernel, container, and application settings.

Features

  • System Analysis: Comprehensive hardware profiling (CPU, memory, storage, network)
  • Workload Characterization: Analyze access patterns and resource requirements
  • Multi-System Support: Database, JVM, kernel, container, and application configs
  • √n Optimization: Apply theoretical bounds to real-world settings
  • A/B Testing: Compare configurations with statistical confidence
  • AI Explanations: Clear reasoning for each recommendation

Installation

# From sqrtspace-tools root directory
pip install -r requirements-minimal.txt

Quick Start

from advisor import ConfigurationAdvisor, SystemType

advisor = ConfigurationAdvisor()

# Analyze for database workload
config = advisor.analyze(
    workload_data={
        'read_ratio': 0.8,
        'working_set_gb': 50,
        'total_data_gb': 500,
        'qps': 10000
    },
    target=SystemType.DATABASE
)

print(config.explanation)
# "Database configured with 12.5GB buffer pool (√n sizing), 
#  128MB work memory per operation, and standard checkpointing."

System Types

1. Database Configuration

Optimizes PostgreSQL/MySQL settings:

# E-commerce OLTP workload
config = advisor.analyze(
    workload_data={
        'read_ratio': 0.9,
        'working_set_gb': 20,
        'total_data_gb': 200,
        'qps': 5000,
        'connections': 300,
        'latency_sla_ms': 50
    },
    target=SystemType.DATABASE
)

# Generated PostgreSQL config:
# shared_buffers = 5120MB      # √n sized if data > memory
# work_mem = 21MB              # Per-operation memory
# checkpoint_segments = 16      # Based on write ratio
# max_connections = 600        # 2x concurrent users

2. JVM Configuration

Tunes heap size, GC, and thread settings:

# Low-latency trading system
config = advisor.analyze(
    workload_data={
        'latency_sla_ms': 10,
        'working_set_gb': 8,
        'connections': 100
    },
    target=SystemType.JVM
)

# Generated JVM flags:
# -Xmx16g -Xms16g              # 50% of system memory
# -Xmn512m                     # √n young generation
# -XX:+UseG1GC                # Low-latency GC
# -XX:MaxGCPauseMillis=10     # Match SLA

3. Kernel Configuration

Optimizes Linux kernel parameters:

# High-throughput web server
config = advisor.analyze(
    workload_data={
        'request_rate': 50000,
        'connections': 10000,
        'working_set_gb': 32
    },
    target=SystemType.KERNEL
)

# Generated sysctl settings:
# vm.dirty_ratio = 20
# vm.swappiness = 60
# net.core.somaxconn = 65535
# net.ipv4.tcp_max_syn_backlog = 65535

4. Container Configuration

Sets Docker/Kubernetes resource limits:

# Microservice API
config = advisor.analyze(
    workload_data={
        'working_set_gb': 2,
        'connections': 100,
        'qps': 1000
    },
    target=SystemType.CONTAINER
)

# Generated Docker command:
# docker run --memory=3.0g --cpus=100

5. Application Configuration

Tunes thread pools, caches, and batch sizes:

# Data processing application
config = advisor.analyze(
    workload_data={
        'working_set_gb': 50,
        'connections': 200,
        'batch_size': 10000
    },
    target=SystemType.APPLICATION
)

# Generated settings:
# thread_pool_size: 16         # Based on CPU cores
# connection_pool_size: 200    # Match concurrency
# cache_size: 229,739          # √n entries
# batch_size: 10,000           # Optimized for memory

System Analysis

The advisor automatically profiles your system:

from advisor import SystemAnalyzer

analyzer = SystemAnalyzer()
profile = analyzer.analyze_system()

print(f"CPU: {profile.cpu_count} cores ({profile.cpu_model})")
print(f"Memory: {profile.memory_gb:.1f}GB")
print(f"Storage: {profile.storage_type} ({profile.storage_iops} IOPS)")
print(f"L3 Cache: {profile.l3_cache_mb:.1f}MB")

Workload Analysis

Characterize workloads from metrics or logs:

from advisor import WorkloadAnalyzer

analyzer = WorkloadAnalyzer()

# From metrics
workload = analyzer.analyze_workload(metrics={
    'read_ratio': 0.8,
    'working_set_gb': 100,
    'qps': 10000,
    'connections': 500
})

# From logs
workload = analyzer.analyze_workload(logs=[
    "SELECT * FROM users WHERE id = 123",
    "UPDATE orders SET status = 'shipped'",
    # ... more log entries
])

A/B Testing

Compare configurations scientifically:

# Create two configurations
config_a = advisor.analyze(workload_a, target=SystemType.DATABASE)
config_b = advisor.analyze(workload_b, target=SystemType.DATABASE)

# Run A/B test
results = advisor.compare_configs(
    [config_a, config_b],
    test_duration=300  # 5 minutes
)

for result in results:
    print(f"{result.config_name}:")
    print(f"  Throughput: {result.metrics['throughput']} QPS")
    print(f"  Latency: {result.metrics['latency']} ms")
    print(f"  Winner: {'Yes' if result.winner else 'No'}")

Export Configurations

Save configurations in appropriate formats:

# PostgreSQL config file
advisor.export_config(db_config, "postgresql.conf")

# JVM startup script
advisor.export_config(jvm_config, "jvm_startup.sh")

# JSON for other systems
advisor.export_config(app_config, "app_config.json")

√n Optimization Examples

The advisor applies Williams' space-time tradeoffs:

Database Buffer Pool

For data larger than memory:

  • Traditional: Try to cache everything (thrashing)
  • √n approach: Cache √(data_size) for optimal performance
  • Example: 1TB data → 32GB buffer pool (not 1TB!)

JVM Young Generation

Balance GC frequency vs pause time:

  • Traditional: Fixed percentage (25% of heap)
  • √n approach: √(heap_size) for optimal GC
  • Example: 64GB heap → 8GB young gen

Application Cache

Limited memory for caching:

  • Traditional: LRU with fixed size
  • √n approach: √(total_items) cache entries
  • Example: 1B items → 31,622 cache entries
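
The arithmetic behind these examples is simple. A minimal illustration in plain Python (not the advisor's internal API; sizes follow the unit convention used above, i.e. the square root of the numeric value):

import math

# Database buffer pool: √(data size in GB), e.g. 1TB of data
buffer_pool_gb = math.sqrt(1024)               # ≈ 32GB, not 1TB

# JVM young generation: √(heap size in GB), e.g. a 64GB heap
young_gen_gb = math.sqrt(64)                   # = 8GB

# Application cache: √(total item count) entries, e.g. 1B items
cache_entries = int(math.sqrt(1_000_000_000))  # ≈ 31,622 entries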

Real-World Impact

Similar square-root sizing heuristics appear in widely deployed systems:

  • Google: Bigtable uses √n buffer sizes
  • Facebook: RocksDB applies similar concepts
  • PostgreSQL: Shared buffers tuning
  • JVM: G1GC uses √n heuristics
  • Linux: Page cache management

Advanced Usage

Custom System Types

class CustomConfigGenerator(ConfigurationGenerator):
    def generate_custom_config(self, system, workload):
        # Apply √n principles to your system
        buffer_size = self.sqrt_calc.calculate_optimal_buffer(
            workload.total_data_size_gb * 1024
        )
        return Configuration(...)

Continuous Optimization

import time

# Monitor and adapt over time (collect_metrics, significant_change, and
# apply_config are placeholders for your own monitoring/deployment hooks)
last_metrics = None
while True:
    current_metrics = collect_metrics()

    if last_metrics is None or significant_change(current_metrics, last_metrics):
        new_config = advisor.analyze(
            workload_data=current_metrics,
            target=SystemType.DATABASE
        )
        apply_config(new_config)
        last_metrics = current_metrics

    time.sleep(3600)  # Check hourly

Examples

See example_advisor.py for comprehensive examples:

  • PostgreSQL tuning for OLTP vs OLAP
  • JVM configuration for latency vs throughput
  • Container resource allocation
  • Kernel tuning for different workloads
  • A/B testing configurations
  • Adaptive configuration over time

Troubleshooting

Memory Calculations

  • Buffer sizes are capped at available memory
  • √n sizing is applied only when the data size exceeds available memory
  • Allow for OS overhead (typically ~20% of memory is reserved); the interaction is sketched below
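
As an illustration of how these rules combine, here is a sketch with an assumed helper name and assumed numbers, not the advisor's exact logic:

import math

def suggest_buffer_gb(total_data_gb, working_set_gb, system_memory_gb,
                      os_overhead=0.20):
    """Illustrative sizing: √n when data exceeds memory, capped below available RAM."""
    usable_gb = system_memory_gb * (1 - os_overhead)  # reserve ~20% for the OS
    if total_data_gb > system_memory_gb:
        candidate = math.sqrt(total_data_gb)          # data > memory: √n sizing
    else:
        candidate = working_set_gb                    # everything fits: cache the working set
    return min(candidate, usable_gb)                  # never exceed available memory

# 500GB of data on a 64GB machine → ≈22.4GB buffer, well under the 51.2GB usable
print(suggest_buffer_gb(total_data_gb=500, working_set_gb=50, system_memory_gb=64))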

Performance Testing

  • A/B tests use simulated load; validate with real-world tests before adopting a configuration
  • Confidence intervals require sufficient samples (a minimal check is sketched below)
  • Network conditions affect results for distributed systems
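
For a quick sanity check on sample sufficiency, a minimal confidence-interval comparison in plain Python (independent of the advisor; the throughput numbers are made up and measurement noise is assumed to be roughly normal):

import math
import statistics

def confidence_interval(samples, z=1.96):
    """Approximate 95% confidence interval for the mean of the samples."""
    mean = statistics.mean(samples)
    stderr = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean - z * stderr, mean + z * stderr

throughput_a = [9800, 10050, 9900, 10120, 9950]     # QPS samples, config A
throughput_b = [10400, 10550, 10300, 10620, 10480]  # QPS samples, config B

# Non-overlapping intervals suggest a real difference; overlapping intervals
# mean you need more samples or a longer test window.
print(confidence_interval(throughput_a))
print(confidence_interval(throughput_b))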

Future Enhancements

  • Cloud provider specific configs (AWS, GCP, Azure)
  • Kubernetes operator for automatic tuning
  • Machine learning workload detection
  • Integration with monitoring systems
  • Automated rollback on regression

See Also