# SpaceTime Configuration Advisor
An intelligent system configuration advisor that applies Williams' √n space-time tradeoff to optimize database, JVM, kernel, container, and application settings.
## Features
- **System Analysis**: Comprehensive hardware profiling (CPU, memory, storage, network)
- **Workload Characterization**: Analyze access patterns and resource requirements
- **Multi-System Support**: Database, JVM, kernel, container, and application configs
- **√n Optimization**: Apply theoretical bounds to real-world settings
- **A/B Testing**: Compare configurations with statistical confidence
- **AI Explanations**: Clear reasoning for each recommendation
## Installation
```bash
# From sqrtspace-tools root directory
pip install -r requirements-minimal.txt
```
## Quick Start
```python
from advisor import ConfigurationAdvisor, SystemType

advisor = ConfigurationAdvisor()

# Analyze for database workload
config = advisor.analyze(
    workload_data={
        'read_ratio': 0.8,
        'working_set_gb': 50,
        'total_data_gb': 500,
        'qps': 10000
    },
    target=SystemType.DATABASE
)

print(config.explanation)
# "Database configured with 12.5GB buffer pool (√n sizing),
#  128MB work memory per operation, and standard checkpointing."
```
## System Types
### 1. Database Configuration
Optimizes PostgreSQL/MySQL settings:
```python
# E-commerce OLTP workload
config = advisor.analyze(
    workload_data={
        'read_ratio': 0.9,
        'working_set_gb': 20,
        'total_data_gb': 200,
        'qps': 5000,
        'connections': 300,
        'latency_sla_ms': 50
    },
    target=SystemType.DATABASE
)
# Generated PostgreSQL config:
# shared_buffers = 5120MB # √n sized if data > memory
# work_mem = 21MB # Per-operation memory
# checkpoint_segments = 16 # Based on write ratio
# max_connections = 600 # 2x concurrent users
```
### 2. JVM Configuration
Tunes heap size, GC, and thread settings:
```python
# Low-latency trading system
config = advisor.analyze(
    workload_data={
        'latency_sla_ms': 10,
        'working_set_gb': 8,
        'connections': 100
    },
    target=SystemType.JVM
)
# Generated JVM flags:
# -Xmx16g -Xms16g # 50% of system memory
# -Xmn512m # √n young generation
# -XX:+UseG1GC # Low-latency GC
# -XX:MaxGCPauseMillis=10 # Match SLA
```
### 3. Kernel Configuration
Optimizes Linux kernel parameters:
```python
# High-throughput web server
config = advisor.analyze(
    workload_data={
        'request_rate': 50000,
        'connections': 10000,
        'working_set_gb': 32
    },
    target=SystemType.KERNEL
)
# Generated sysctl settings:
# vm.dirty_ratio = 20
# vm.swappiness = 60
# net.core.somaxconn = 65535
# net.ipv4.tcp_max_syn_backlog = 65535
```
### 4. Container Configuration
Sets Docker/Kubernetes resource limits:
```python
# Microservice API
config = advisor.analyze(
    workload_data={
        'working_set_gb': 2,
        'connections': 100,
        'qps': 1000
    },
    target=SystemType.CONTAINER
)
# Generated Docker command:
# docker run --memory=3.0g --cpus=100
```
### 5. Application Configuration
Tunes thread pools, caches, and batch sizes:
```python
# Data processing application
config = advisor.analyze(
    workload_data={
        'working_set_gb': 50,
        'connections': 200,
        'batch_size': 10000
    },
    target=SystemType.APPLICATION
)
# Generated settings:
# thread_pool_size: 16 # Based on CPU cores
# connection_pool_size: 200 # Match concurrency
# cache_size: 229,739 # √n entries
# batch_size: 10,000 # Optimized for memory
```
## System Analysis
The advisor automatically profiles your system:
```python
from advisor import SystemAnalyzer
analyzer = SystemAnalyzer()
profile = analyzer.analyze_system()
print(f"CPU: {profile.cpu_count} cores ({profile.cpu_model})")
print(f"Memory: {profile.memory_gb:.1f}GB")
print(f"Storage: {profile.storage_type} ({profile.storage_iops} IOPS)")
print(f"L3 Cache: {profile.l3_cache_mb:.1f}MB")
```
## Workload Analysis
Characterize workloads from metrics or logs:
```python
from advisor import WorkloadAnalyzer
analyzer = WorkloadAnalyzer()
# From metrics
workload = analyzer.analyze_workload(metrics={
    'read_ratio': 0.8,
    'working_set_gb': 100,
    'qps': 10000,
    'connections': 500
})

# From logs
workload = analyzer.analyze_workload(logs=[
    "SELECT * FROM users WHERE id = 123",
    "UPDATE orders SET status = 'shipped'",
    # ... more log entries
])
```
## A/B Testing
Compare configurations scientifically:
```python
# Create two configurations
config_a = advisor.analyze(workload_a, target=SystemType.DATABASE)
config_b = advisor.analyze(workload_b, target=SystemType.DATABASE)
# Run A/B test
results = advisor.compare_configs(
    [config_a, config_b],
    test_duration=300  # 5 minutes
)

for result in results:
    print(f"{result.config_name}:")
    print(f"  Throughput: {result.metrics['throughput']} QPS")
    print(f"  Latency: {result.metrics['latency']} ms")
    print(f"  Winner: {'Yes' if result.winner else 'No'}")
```
## Export Configurations
Save configurations in appropriate formats:
```python
# PostgreSQL config file
advisor.export_config(db_config, "postgresql.conf")
# JVM startup script
advisor.export_config(jvm_config, "jvm_startup.sh")
# JSON for other systems
advisor.export_config(app_config, "app_config.json")
```
## √n Optimization Examples
The advisor applies Williams' space-time tradeoffs:
### Database Buffer Pool
For data larger than memory:
- Traditional: Try to cache everything (thrashing)
- √n approach: Cache √(data_size) for optimal performance
- Example: 1TB data → 32GB buffer pool (not 1TB!)
### JVM Young Generation
Balance GC frequency vs pause time:
- Traditional: Fixed percentage (25% of heap)
- √n approach: √(heap_size) for optimal GC
- Example: 64GB heap → 8GB young gen
### Application Cache
Limited memory for caching:
- Traditional: LRU with fixed size
- √n approach: √(total_items) cache entries
- Example: 1B items → 31,622 cache entries
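The three examples above share one unit convention: take the square root of the resource count in its own unit (GB in, GB out; entries in, entries out). They can be checked in a few lines; `sqrt_sizing` is an illustrative helper, not part of the advisor API:

```python
import math

def sqrt_sizing(n: float) -> float:
    """Return the √n-sized budget for a resource of size n (same unit in and out)."""
    return math.sqrt(n)

print(sqrt_sizing(1024))                # 1TB data in GB -> 32.0 GB buffer pool
print(sqrt_sizing(64))                  # 64GB heap -> 8.0 GB young generation
print(int(sqrt_sizing(1_000_000_000)))  # 1B items -> 31622 cache entries
```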
## Real-World Impact
Organizations using these principles:
- **Google**: Bigtable uses √n buffer sizes
- **Facebook**: RocksDB applies similar concepts
- **PostgreSQL**: Shared buffers tuning
- **JVM**: G1GC uses √n heuristics
- **Linux**: Page cache management
## Advanced Usage
### Custom System Types
```python
class CustomConfigGenerator(ConfigurationGenerator):
    def generate_custom_config(self, system, workload):
        # Apply √n principles to your system
        buffer_size = self.sqrt_calc.calculate_optimal_buffer(
            workload.total_data_size_gb * 1024
        )
        return Configuration(...)
```
### Continuous Optimization
```python
import time

# Monitor and adapt over time
last_metrics = collect_metrics()
while True:
    current_metrics = collect_metrics()
    if significant_change(current_metrics, last_metrics):
        new_config = advisor.analyze(
            workload_data=current_metrics,
            target=SystemType.DATABASE
        )
        apply_config(new_config)
        last_metrics = current_metrics
    time.sleep(3600)  # Check hourly
```
## Examples
See [example_advisor.py](example_advisor.py) for comprehensive examples:
- PostgreSQL tuning for OLTP vs OLAP
- JVM configuration for latency vs throughput
- Container resource allocation
- Kernel tuning for different workloads
- A/B testing configurations
- Adaptive configuration over time
## Troubleshooting
### Memory Calculations
- Buffer sizes are capped at available memory
- √n sizing only applied when data > memory
- Consider OS overhead (typically 20% reserved)
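The rules above combine into a single sizing decision; the sketch below is illustrative (the 20% OS reserve and the `effective_buffer_gb` name are assumptions, not advisor internals):

```python
import math

def effective_buffer_gb(data_gb: float, system_memory_gb: float,
                        os_reserve: float = 0.20) -> float:
    """√n sizing, applied only when data exceeds usable memory, capped at usable RAM."""
    usable_gb = system_memory_gb * (1 - os_reserve)  # leave headroom for the OS
    if data_gb <= usable_gb:
        return data_gb                               # data fits: cache it all
    return min(math.sqrt(data_gb), usable_gb)        # √n sizing, capped at usable RAM

print(effective_buffer_gb(1024, 64))  # 1TB data, 64GB RAM -> 32.0
print(effective_buffer_gb(1024, 16))  # small host: capped at 12.8
```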
### Performance Testing
- A/B tests simulate load; validate winners against real traffic before rollout
- Confidence intervals require sufficient samples
- Network conditions affect distributed systems
## Future Enhancements
- Cloud provider specific configs (AWS, GCP, Azure)
- Kubernetes operator for automatic tuning
- Machine learning workload detection
- Integration with monitoring systems
- Automated rollback on regression
## See Also
- [SpaceTimeCore](../core/spacetime_core.py): √n calculations
- [Memory Profiler](../profiler/): Identify bottlenecks