# SpaceTime Configuration Advisor

An intelligent system configuration advisor that applies Williams' √n space-time tradeoffs to optimize database, JVM, kernel, container, and application settings.

## Features

- **System Analysis**: Comprehensive hardware profiling (CPU, memory, storage, network)
- **Workload Characterization**: Analyze access patterns and resource requirements
- **Multi-System Support**: Database, JVM, kernel, container, and application configs
- **√n Optimization**: Apply theoretical bounds to real-world settings
- **A/B Testing**: Compare configurations with statistical confidence
- **AI Explanations**: Clear reasoning for each recommendation

## Installation

```bash
# From sqrtspace-tools root directory
pip install -r requirements-minimal.txt
```

## Quick Start

```python
from advisor import ConfigurationAdvisor, SystemType

advisor = ConfigurationAdvisor()

# Analyze for database workload
config = advisor.analyze(
    workload_data={
        'read_ratio': 0.8,
        'working_set_gb': 50,
        'total_data_gb': 500,
        'qps': 10000
    },
    target=SystemType.DATABASE
)

print(config.explanation)
# "Database configured with 12.5GB buffer pool (√n sizing),
#  128MB work memory per operation, and standard checkpointing."
```

## System Types

### 1. Database Configuration
Optimizes PostgreSQL/MySQL settings:

```python
# E-commerce OLTP workload
config = advisor.analyze(
    workload_data={
        'read_ratio': 0.9,
        'working_set_gb': 20,
        'total_data_gb': 200,
        'qps': 5000,
        'connections': 300,
        'latency_sla_ms': 50
    },
    target=SystemType.DATABASE
)

# Generated PostgreSQL config:
# shared_buffers = 5120MB      # √n sized if data > memory
# work_mem = 21MB              # Per-operation memory
# checkpoint_segments = 16     # Based on write ratio
# max_connections = 600        # 2x concurrent users
```

Note: `checkpoint_segments` exists only in PostgreSQL 9.4 and earlier. PostgreSQL 9.5 and later use `max_wal_size` instead.

### 2. JVM Configuration
Tunes heap size, GC, and thread settings:

```python
# Low-latency trading system
config = advisor.analyze(
    workload_data={
        'latency_sla_ms': 10,
        'working_set_gb': 8,
        'connections': 100
    },
    target=SystemType.JVM
)

# Generated JVM flags:
# -Xmx16g -Xms16g              # 50% of system memory
# -Xmn512m                     # √n young generation
# -XX:+UseG1GC                 # Low-latency GC
# -XX:MaxGCPauseMillis=10      # Match SLA
```

### 3. Kernel Configuration
Optimizes Linux kernel parameters:

```python
# High-throughput web server
config = advisor.analyze(
    workload_data={
        'request_rate': 50000,
        'connections': 10000,
        'working_set_gb': 32
    },
    target=SystemType.KERNEL
)

# Generated sysctl settings:
# vm.dirty_ratio = 20
# vm.swappiness = 60
# net.core.somaxconn = 65535
# net.ipv4.tcp_max_syn_backlog = 65535
```

### 4. Container Configuration
Sets Docker/Kubernetes resource limits:

```python
# Microservice API
config = advisor.analyze(
    workload_data={
        'working_set_gb': 2,
        'connections': 100,
        'qps': 1000
    },
    target=SystemType.CONTAINER
)

# Generated Docker command:
# docker run --memory=3.0g --cpus=100
```

### 5. Application Configuration
Tunes thread pools, caches, and batch sizes:

```python
# Data processing application
config = advisor.analyze(
    workload_data={
        'working_set_gb': 50,
        'connections': 200,
        'batch_size': 10000
    },
    target=SystemType.APPLICATION
)

# Generated settings:
# thread_pool_size: 16         # Based on CPU cores
# connection_pool_size: 200    # Match concurrency
# cache_size: 229,739          # √n entries
# batch_size: 10,000           # Optimized for memory
```

## System Analysis

The advisor automatically profiles your system:

```python
from advisor import SystemAnalyzer

analyzer = SystemAnalyzer()
profile = analyzer.analyze_system()

print(f"CPU: {profile.cpu_count} cores ({profile.cpu_model})")
print(f"Memory: {profile.memory_gb:.1f}GB")
print(f"Storage: {profile.storage_type} ({profile.storage_iops} IOPS)")
print(f"L3 Cache: {profile.l3_cache_mb:.1f}MB")
```

## Workload Analysis

Characterize workloads from metrics or logs:

```python
from advisor import WorkloadAnalyzer

analyzer = WorkloadAnalyzer()

# From metrics
workload = analyzer.analyze_workload(metrics={
    'read_ratio': 0.8,
    'working_set_gb': 100,
    'qps': 10000,
    'connections': 500
})

# From logs
workload = analyzer.analyze_workload(logs=[
    "SELECT * FROM users WHERE id = 123",
    "UPDATE orders SET status = 'shipped'",
    # ... more log entries
])
```

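
To give a feel for what log-based characterization involves, here is a minimal sketch that estimates `read_ratio` from SQL log lines. It is an illustration only: `estimate_read_ratio` is a hypothetical helper, not part of the `advisor` package, and the README does not say how `WorkloadAnalyzer` actually parses logs.

```python
import re

# Hypothetical helper for illustration; not part of the advisor package.
READ_RE = re.compile(r"^\s*SELECT\b", re.IGNORECASE)
WRITE_RE = re.compile(r"^\s*(INSERT|UPDATE|DELETE)\b", re.IGNORECASE)


def estimate_read_ratio(log_lines):
    """Return reads / (reads + writes), or None if no statement is recognized."""
    reads = sum(1 for line in log_lines if READ_RE.match(line))
    writes = sum(1 for line in log_lines if WRITE_RE.match(line))
    total = reads + writes
    return reads / total if total else None


logs = [
    "SELECT * FROM users WHERE id = 123",
    "UPDATE orders SET status = 'shipped'",
    "select name from products",
    "INSERT INTO audit VALUES (1)",
]
print(estimate_read_ratio(logs))  # 0.5 (2 reads, 2 writes)
```

The resulting ratio could then be passed as the `read_ratio` metric shown above.
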
## A/B Testing

Compare configurations against each other and pick a winner:

```python
# workload_a and workload_b are workload_data dicts like those shown above

# Create two configurations
config_a = advisor.analyze(workload_data=workload_a, target=SystemType.DATABASE)
config_b = advisor.analyze(workload_data=workload_b, target=SystemType.DATABASE)

# Run A/B test
results = advisor.compare_configs(
    [config_a, config_b],
    test_duration=300  # 5 minutes
)

for result in results:
    print(f"{result.config_name}:")
    print(f"  Throughput: {result.metrics['throughput']} QPS")
    print(f"  Latency: {result.metrics['latency']} ms")
    print(f"  Winner: {'Yes' if result.winner else 'No'}")
```

## Export Configurations

Save configurations in a format that matches the target system:

```python
# db_config, jvm_config, and app_config are results of advisor.analyze(...)
# for the matching SystemType

# PostgreSQL config file
advisor.export_config(db_config, "postgresql.conf")

# JVM startup script
advisor.export_config(jvm_config, "jvm_startup.sh")

# JSON for other systems
advisor.export_config(app_config, "app_config.json")
```

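
As a rough picture of what the exported files contain, the sketch below renders a settings dictionary in two of the formats named above. `render_conf` and the `settings` dictionary are hypothetical, and how `export_config` formats its output internally is an assumption, not something this README documents.

```python
import json


def render_conf(settings):
    """Render settings as 'key = value' lines, in the style of postgresql.conf."""
    return "".join(f"{key} = {value}\n" for key, value in settings.items())


# Values taken from the database example earlier in this README.
settings = {"shared_buffers": "5120MB", "work_mem": "21MB", "max_connections": 600}

print(render_conf(settings))
print(json.dumps(settings, indent=2))  # JSON form for other systems
```
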
## √n Optimization Examples

The advisor applies Williams' space-time tradeoffs:

### Database Buffer Pool
For data larger than memory:
- Traditional: try to cache everything, which leads to thrashing
- √n approach: cache √(data_size) for optimal performance
- Example: 1TB (1024GB) data → 32GB buffer pool, since √1024 = 32 (not 1TB!)

### JVM Young Generation
Balance GC frequency against pause time:
- Traditional: fixed percentage (25% of heap)
- √n approach: √(heap_size) for optimal GC
- Example: 64GB heap → 8GB young gen

### Application Cache
Limited memory for caching:
- Traditional: LRU with fixed size
- √n approach: √(total_items) cache entries
- Example: 1B items → 31,622 cache entries

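
The arithmetic behind these three examples can be checked in a few lines. This is only a worked calculation, not the advisor's internal code. Note that the sizing examples take the square root of the size expressed in GB, so the result depends on the unit chosen.

```python
import math

# Buffer pool: 1 TB expressed as 1024 GB
print(math.sqrt(1024))             # 32.0 -> 32 GB

# Young generation: 64 GB heap
print(math.sqrt(64))               # 8.0 -> 8 GB

# Application cache: 1 billion items (integer square root, rounded down)
print(math.isqrt(1_000_000_000))   # 31622 entries

# Unit sensitivity: the same 1 TB expressed in MB gives a different answer
print(math.sqrt(1024 * 1024))      # 1024.0 -> 1024 MB, i.e. 1 GB
```
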
## Real-World Impact

Organizations and systems where these principles appear:
- **Google**: Bigtable uses √n buffer sizes
- **Facebook**: RocksDB applies similar concepts
- **PostgreSQL**: Shared buffers tuning
- **JVM**: G1GC uses √n heuristics
- **Linux**: Page cache management

## Advanced Usage

### Custom System Types

```python
# Assumes ConfigurationGenerator and Configuration are importable from
# advisor, like ConfigurationAdvisor in the examples above.
from advisor import Configuration, ConfigurationGenerator

class CustomConfigGenerator(ConfigurationGenerator):
    def generate_custom_config(self, system, workload):
        # Apply √n principles to your system
        buffer_size = self.sqrt_calc.calculate_optimal_buffer(
            workload.total_data_size_gb * 1024
        )
        return Configuration(...)
```

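### Continuous Optimization

```python
import time

# collect_metrics, significant_change, and apply_config are placeholders
# for your own monitoring and deployment hooks.

# Monitor and adapt over time
last_metrics = collect_metrics()  # Baseline to compare against
while True:
    current_metrics = collect_metrics()

    if significant_change(current_metrics, last_metrics):
        new_config = advisor.analyze(
            workload_data=current_metrics,
            target=SystemType.DATABASE
        )
        apply_config(new_config)
        last_metrics = current_metrics  # New baseline after reconfiguring

    time.sleep(3600)  # Check hourly
```
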
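
The loop above leaves `significant_change` undefined. One possible definition, shown here as a sketch, flags a change when any numeric metric moves by more than a relative threshold. The 20% default and the handling of missing or zero values are assumptions chosen for illustration, not behavior taken from the advisor.

```python
def significant_change(current, previous, threshold=0.2):
    """Return True if any shared numeric metric moved by more than `threshold`.

    The change is relative to the previous value. A missing baseline counts
    as a change so that the first configuration is always generated.
    """
    if previous is None:
        return True
    for key, old in previous.items():
        new = current.get(key)
        if not isinstance(old, (int, float)) or not isinstance(new, (int, float)):
            continue  # Skip metrics that are missing or non-numeric
        if old == 0:
            if new != 0:
                return True
            continue
        if abs(new - old) / abs(old) > threshold:
            return True
    return False


baseline = {"qps": 10000, "read_ratio": 0.8}
print(significant_change({"qps": 10500, "read_ratio": 0.8}, baseline))  # False
print(significant_change({"qps": 13000, "read_ratio": 0.8}, baseline))  # True
```

A threshold that is too low would reconfigure on ordinary noise, so it is worth tuning against the normal variation in your own metrics.
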
## Examples

See [example_advisor.py](example_advisor.py) for comprehensive examples:
- PostgreSQL tuning for OLTP vs OLAP
- JVM configuration for latency vs throughput
- Container resource allocation
- Kernel tuning for different workloads
- A/B testing configurations
- Adaptive configuration over time

## Troubleshooting

### Memory Calculations
- Buffer sizes are capped at available memory
- √n sizing is applied only when the data is larger than memory
- Account for OS overhead (typically 20% of memory is reserved)

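
The three rules above can be combined into one small function. This is a sketch under stated assumptions rather than the advisor's actual logic: `buffer_size_gb` is a hypothetical name, sizes are in GB as in the √n examples in this README, and data that fits in memory is assumed to be cached in full.

```python
import math


def buffer_size_gb(data_gb, memory_gb, os_reserve=0.20):
    """Pick a buffer size in GB following the three rules above."""
    usable_gb = memory_gb * (1 - os_reserve)   # Leave room for the OS
    if data_gb > memory_gb:
        wanted_gb = math.sqrt(data_gb)         # √n sizing when data exceeds memory
    else:
        wanted_gb = data_gb                    # Assumption: cache everything that fits
    return min(wanted_gb, usable_gb)           # Never exceed usable memory


print(buffer_size_gb(1024, 64))   # 32.0: √n sizing applies
print(buffer_size_gb(10, 64))     # 10: data fits, no √n sizing
print(buffer_size_gb(1024, 16))   # capped at 80% of 16 GB
```
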
### Performance Testing
- A/B tests run against simulated load; validate the winner under real traffic
- Confidence intervals require sufficient samples
- Network conditions affect results on distributed systems

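
To see why sample count matters, the sketch below computes a 95% confidence interval for mean latency using a normal approximation. The function names and latency values are made up for illustration, and the advisor's own statistics may differ. With only a handful of samples a t-distribution would be more appropriate, which is one more reason to collect plenty of measurements.

```python
import math
import statistics


def mean_ci95(samples):
    """Normal-approximation 95% confidence interval for the mean."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(samples)
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(n)
    return mean - half_width, mean + half_width


def intervals_overlap(a, b):
    """True if two (low, high) intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Made-up latency samples in milliseconds for two configurations
latencies_a = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2]
latencies_b = [10.2, 10.5, 9.9, 10.1, 10.4, 10.0]

ci_a, ci_b = mean_ci95(latencies_a), mean_ci95(latencies_b)
print(ci_a, ci_b)
print("Clear difference" if not intervals_overlap(ci_a, ci_b) else "Inconclusive")
```

The interval width shrinks in proportion to 1/√n, so quadrupling the number of samples roughly halves the uncertainty.
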
## Future Enhancements

- Cloud provider specific configs (AWS, GCP, Azure)
- Kubernetes operator for automatic tuning
- Machine learning workload detection
- Integration with monitoring systems
- Automated rollback on regression

## See Also

- [SpaceTimeCore](../core/spacetime_core.py): √n calculations
- [Memory Profiler](../profiler/): Identify bottlenecks