mirror of
https://github.com/johndoe6345789/MetalOS.git
synced 2026-04-25 14:15:24 +00:00
278 lines
6.5 KiB
Markdown
# MetalOS - Simple Multicore Support

## Overview

MetalOS now includes basic SMP (Symmetric Multi-Processing) support that detects and starts all available CPU cores. This lays the groundwork for better performance on modern multi-core processors.

## Features

### Supported Hardware
- **CPU Cores**: Up to 16 logical processors
- **Tested on**: 6-core, 12-thread systems (Intel/AMD)
- **Architecture**: x86_64 with APIC support

### Components

#### 1. APIC (Advanced Programmable Interrupt Controller)
- **Files**: `kernel/src/apic.c`, `kernel/include/kernel/apic.h`
- **Purpose**: Per-CPU interrupt handling
- **Features**:
  - Local APIC initialization
  - Inter-Processor Interrupts (IPI)
  - APIC ID detection
  - EOI (End of Interrupt) handling

#### 2. SMP Initialization
- **Files**: `kernel/src/smp.c`, `kernel/include/kernel/smp.h`
- **Purpose**: Detect and start secondary CPUs
- **Features**:
  - CPU detection (up to 16 logical processors)
  - AP (Application Processor) startup via SIPI
  - Per-CPU data structures
  - CPU online/offline tracking

#### 3. AP Trampoline
- **File**: `kernel/src/ap_trampoline.asm`
- **Purpose**: Real-mode startup code for secondary CPUs
- **Features**:
  - 16-bit to 64-bit mode transition
  - GDT setup for APs
  - Long mode activation

#### 4. Spinlocks
- **Files**: `kernel/src/spinlock.c`, `kernel/include/kernel/spinlock.h`
- **Purpose**: Multicore synchronization
- **Features**:
  - Atomic lock/unlock operations
  - `pause` instruction in spin loops for efficiency
  - Try-lock support

## Usage

### Initialization

The SMP system is automatically initialized in `kernel_main()`:

```c
void kernel_main(BootInfo* boot_info) {
    // ... other initialization ...

    // Initialize SMP - starts all CPU cores
    smp_init();

    // Check how many cores are online
    uint8_t num_cpus = smp_get_cpu_count();

    // ... continue ...
}
```

### Getting Current CPU

```c
uint8_t cpu_id = smp_get_current_cpu();
```

### Using Spinlocks

```c
spinlock_t my_lock;

// Initialize
spinlock_init(&my_lock);

// Critical section
spinlock_acquire(&my_lock);
// ... protected code ...
spinlock_release(&my_lock);
```

### Checking SMP Status

```c
if (smp_is_enabled()) {
    // Multicore mode
} else {
    // Single core fallback
}
```

## Architecture

### Boot Sequence

1. **BSP (Bootstrap Processor)** boots normally
2. **smp_init()** is called by the BSP
3. **APIC detection** - check whether the hardware supports APIC
4. **AP discovery** - detect additional CPU cores
5. **For each AP**:
   - Copy trampoline code to low memory (0x8000)
   - Send INIT IPI
   - Send SIPI (Startup IPI) twice
   - Wait for AP to come online
6. **APs enter 64-bit mode** and mark themselves online

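The SIPI carries the 4 KiB page number of the trampoline rather than a full address, so the 0x8000 copy location above corresponds to startup vector 0x08. A minimal sketch of that conversion follows; `sipi_vector_for` is a hypothetical helper name and is not taken from the MetalOS sources.

```c
#include <stdint.h>

// Hypothetical helper: derive the SIPI vector for a trampoline address.
// Returns -1 if the address cannot be expressed as a SIPI vector.
int sipi_vector_for(uint32_t trampoline_phys) {
    if (trampoline_phys & 0xFFFu) {
        return -1;  // must start on a 4 KiB page boundary
    }
    if (trampoline_phys >= 0x100000u) {
        return -1;  // must lie below 1 MiB so real-mode code can run there
    }
    return (int)(trampoline_phys >> 12);  // the AP starts at vector * 0x1000
}
```

Checking the address once, before any IPI is sent, turns a misplaced trampoline into an early error instead of an AP that silently never comes online.
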
### Memory Layout

```
Low Memory:
  0x8000 - 0x8FFF : AP trampoline code (real mode)

High Memory:
  Per-CPU stacks (future enhancement)
  Shared kernel code and data
```

### Interrupt Handling

- **Legacy PIC**: Used in single-core fallback mode
- **APIC**: Used when SMP is enabled
- **Auto-detection**: The kernel selects between the two automatically based on APIC availability

## Performance

### Improvements
- **Parallel Processing**: All cores are brought online and available for work distribution
- **Better Throughput**: Multiple tasks can run simultaneously once work is distributed to the APs
- **Future-Ready**: Foundation for parallel Qt6 rendering

### Current Limitations
- **Single Application**: Only the BSP runs the main application
- **No Work Distribution**: APs idle after initialization (future: work stealing)
- **Simple Synchronization**: Basic spinlocks only

## Future Enhancements

### Planned Features
- [ ] Per-CPU timer interrupts
- [ ] Work queue for distributing tasks to APs
- [ ] Parallel framebuffer rendering
- [ ] Load balancing for Qt6 event processing
- [ ] Per-CPU kernel stacks

### Potential Optimizations
- [ ] MWAIT/MONITOR for power-efficient idle
- [ ] CPU affinity for specific tasks
- [ ] NUMA awareness (if needed)

## Configuration

### Build Options

All SMP features are enabled by default. The system automatically falls back to single-core mode if:
- APIC is not available
- No additional CPUs detected
- SMP initialization fails

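The three fallback conditions can be read as a single decision. The sketch below is illustrative only: `irq_mode_t` and `choose_irq_mode` are hypothetical names, and the actual check inside `smp_init()` may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { IRQ_MODE_PIC, IRQ_MODE_APIC } irq_mode_t;

// Hypothetical helper mirroring the three fallback conditions listed above.
irq_mode_t choose_irq_mode(bool apic_available, uint8_t cpus_detected,
                           bool smp_init_ok) {
    if (!apic_available) {
        return IRQ_MODE_PIC;   // no APIC: legacy PIC, single core
    }
    if (cpus_detected < 2) {
        return IRQ_MODE_PIC;   // only the BSP was found
    }
    if (!smp_init_ok) {
        return IRQ_MODE_PIC;   // AP startup failed
    }
    return IRQ_MODE_APIC;      // multicore mode
}
```
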
### Maximum CPUs

To change the limit, edit `MAX_CPUS` in `kernel/include/kernel/smp.h`:

```c
#define MAX_CPUS 16  // Change to support more CPUs
```

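The functions shown in this document return CPU counts and IDs as `uint8_t`, so a larger `MAX_CPUS` still has to fit in that type. The following sketch shows one way to guard the limit at compile time and at detection time. The `apic_ids` table and `record_cpu` are hypothetical and only illustrate the idea.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS 16

// Counts and IDs are uint8_t in the API shown in this document,
// so a limit above 255 would need wider types throughout.
_Static_assert(MAX_CPUS >= 1 && MAX_CPUS <= 255,
               "MAX_CPUS must fit in uint8_t");

static uint8_t apic_ids[MAX_CPUS];  // hypothetical table of detected APIC IDs
static uint8_t cpu_count;           // number of entries in use

// Record one detected CPU. Returns false once the table is full,
// so extra processors are ignored instead of overflowing the array.
bool record_cpu(uint8_t apic_id) {
    if (cpu_count >= MAX_CPUS) {
        return false;
    }
    apic_ids[cpu_count++] = apic_id;
    return true;
}
```
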
## Debugging

### Check CPU Count

After `smp_init()` returns, the kernel has detected and initialized all cores. To check how many are online:

```c
uint8_t count = smp_get_cpu_count();
// count = number of online CPUs (e.g. 12 on a 6-core/12-thread system with all threads online)
```

### Per-CPU Information

```c
cpu_info_t* info = smp_get_cpu_info(cpu_id);
if (info) {
    // info->cpu_id
    // info->apic_id
    // info->online
}
```

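For debugging it can help to turn one entry into a line of text. The sketch below assumes a `cpu_info_t` that contains only the three fields named above (the real structure may hold more), and it uses the hosted `snprintf` purely for illustration. A freestanding kernel would substitute its own formatter. `format_cpu_info` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

// Assumed layout: only these three field names come from the document.
typedef struct {
    uint8_t cpu_id;
    uint8_t apic_id;
    bool    online;
} cpu_info_t;

// Hypothetical helper: render one CPU entry as text for a debug console.
// Returns the number of characters that the full line requires.
int format_cpu_info(const cpu_info_t *info, char *buf, size_t len) {
    if (info == NULL) {
        return snprintf(buf, len, "cpu ?: not present");
    }
    return snprintf(buf, len, "cpu %u: apic_id=%u %s",
                    (unsigned)info->cpu_id, (unsigned)info->apic_id,
                    info->online ? "online" : "offline");
}
```
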
## Technical Details

### APIC Registers
- **Base Address**: 0xFEE00000 (default)
- **Register Access**: Memory-mapped I/O
- **Key Registers**:
  - `0x020`: APIC ID
  - `0x0B0`: EOI register
  - `0x300/0x310`: ICR low/high (Interrupt Command Register, used to send IPIs)

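Memory-mapped access to these registers can be sketched as below. The accessor names are hypothetical and not taken from `apic.c`. Each function takes the mapped base as a parameter so the logic can be exercised against an ordinary array, whereas a kernel would pass a pointer to its mapping of the APIC base address.

```c
#include <stdint.h>

#define LAPIC_DEFAULT_BASE 0xFEE00000u  // default physical base address
#define LAPIC_REG_ID       0x020u
#define LAPIC_REG_EOI      0x0B0u

// Registers are 32 bits wide and addressed by byte offset from the base.
static inline uint32_t lapic_read(volatile uint32_t *base, uint32_t reg) {
    return base[reg / 4];
}

static inline void lapic_write(volatile uint32_t *base, uint32_t reg,
                               uint32_t value) {
    base[reg / 4] = value;
}

// In xAPIC mode the APIC ID occupies bits 31:24 of the ID register.
static inline uint8_t lapic_id(volatile uint32_t *base) {
    return (uint8_t)(lapic_read(base, LAPIC_REG_ID) >> 24);
}

// Writing 0 to the EOI register signals the end of the current interrupt.
static inline void lapic_eoi(volatile uint32_t *base) {
    lapic_write(base, LAPIC_REG_EOI, 0);
}
```

The `volatile` qualifier keeps the compiler from caching or eliding register accesses, which matters because reads and writes here have hardware side effects.
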
### IPI Protocol
1. **INIT IPI**: Reset the AP to a known state
2. **Wait**: 10 ms delay
3. **SIPI #1**: Send the startup vector (page number of the trampoline)
4. **Wait**: 200 μs delay
5. **SIPI #2**: Send the startup vector again (per Intel spec)
6. **Wait**: Poll for the AP to come online (up to 1 second timeout)

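The six steps above can be written down as data. The sketch below builds a table of ICR values and delays for one AP. It is an illustration under stated assumptions and not the code in `smp.c`: the type and function names are hypothetical, the IPIs are addressed to a single APIC ID in xAPIC physical destination mode, and waiting for the ICR delivery-status bit is left out.

```c
#include <stdint.h>

#define ICR_DELIVERY_INIT    0x00000500u  // delivery mode 101b (INIT)
#define ICR_DELIVERY_STARTUP 0x00000600u  // delivery mode 110b (Start-up)
#define ICR_LEVEL_ASSERT     0x00004000u  // level = assert

typedef enum { STEP_SEND_IPI, STEP_DELAY, STEP_POLL_ONLINE } step_kind_t;

typedef struct {
    step_kind_t kind;
    uint32_t icr_high;  // value for register 0x310 (STEP_SEND_IPI only)
    uint32_t icr_low;   // value for register 0x300 (STEP_SEND_IPI only)
    uint32_t usec;      // delay length or poll timeout in microseconds
} ipi_step_t;

#define AP_STARTUP_STEPS 6

// Fill `plan` with the INIT, SIPI, SIPI sequence for one AP.
void build_ap_startup_plan(uint8_t apic_id, uint8_t vector,
                           ipi_step_t plan[AP_STARTUP_STEPS]) {
    uint32_t dest = (uint32_t)apic_id << 24;  // destination in bits 31:24
    uint32_t init = ICR_DELIVERY_INIT | ICR_LEVEL_ASSERT;
    uint32_t sipi = ICR_DELIVERY_STARTUP | ICR_LEVEL_ASSERT | vector;

    plan[0] = (ipi_step_t){ STEP_SEND_IPI, dest, init, 0 };
    plan[1] = (ipi_step_t){ STEP_DELAY, 0, 0, 10000 };           // 10 ms
    plan[2] = (ipi_step_t){ STEP_SEND_IPI, dest, sipi, 0 };
    plan[3] = (ipi_step_t){ STEP_DELAY, 0, 0, 200 };             // 200 us
    plan[4] = (ipi_step_t){ STEP_SEND_IPI, dest, sipi, 0 };
    plan[5] = (ipi_step_t){ STEP_POLL_ONLINE, 0, 0, 1000000 };   // 1 s
}
```

An executor for this table would write `icr_high` before `icr_low` for each send step, because the write to the low half of the ICR is what triggers the IPI.
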
### Synchronization
- **Spinlocks**: Using x86 `xchg` instruction (atomic)
- **Memory Barriers**: Compiler barriers for ordering
- **Pause**: `pause` instruction in spin loops for efficiency

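A minimal sketch of a spinlock with these properties is shown below. It is not the code in `spinlock.c`: it uses C11 atomics, whose exchange operation compiles to `xchg` on x86, in place of hand-written assembly and compiler barriers. `spinlock_try_acquire` and `cpu_relax` are assumed names, since the document does not name the try-lock function.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_uint locked;  // 0 = free, 1 = held
} spinlock_t;

// Hint to the CPU that this is a spin-wait loop (no-op on other targets).
static inline void cpu_relax(void) {
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("pause");
#endif
}

void spinlock_init(spinlock_t *lock) {
    atomic_init(&lock->locked, 0);
}

// Single attempt: atomically swap in 1 and succeed if the old value was 0.
bool spinlock_try_acquire(spinlock_t *lock) {
    return atomic_exchange_explicit(&lock->locked, 1,
                                    memory_order_acquire) == 0;
}

void spinlock_acquire(spinlock_t *lock) {
    while (!spinlock_try_acquire(lock)) {
        // Wait on plain loads so contending CPUs do not repeatedly
        // issue locked exchanges on the same cache line.
        while (atomic_load_explicit(&lock->locked, memory_order_relaxed)) {
            cpu_relax();
        }
    }
}

void spinlock_release(spinlock_t *lock) {
    atomic_store_explicit(&lock->locked, 0, memory_order_release);
}
```

The acquire/release orderings keep accesses inside the critical section from being reordered across the lock operations, which is the role the compiler barriers play in the description above.
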
## Examples

### Parallel Work Distribution (Future)

```c
// Not yet implemented - shows intended usage
typedef void (*work_func_t)(void* data);

void distribute_work(work_func_t func, void* data) {
    uint8_t num_cpus = smp_get_cpu_count();

    // Divide work among available CPUs
    for (uint8_t i = 1; i < num_cpus; i++) {
        // Queue work for CPU i
        schedule_on_cpu(i, func, data);
    }

    // BSP does its share
    func(data);
}
```

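The example above hands the same `data` to every CPU. For workloads such as the planned parallel framebuffer rendering, each CPU would need its own slice of the work. A hedged sketch of an even split follows; `work_range_t` and `range_for_cpu` are hypothetical and not part of MetalOS.

```c
#include <stdint.h>

typedef struct {
    uint32_t start;  // first item (inclusive)
    uint32_t end;    // one past the last item (exclusive)
} work_range_t;

// Hypothetical helper: split `total` items across `num_cpus` CPUs.
// The first (total % num_cpus) CPUs each take one extra item.
work_range_t range_for_cpu(uint32_t total, uint8_t num_cpus, uint8_t cpu) {
    work_range_t r = { 0, 0 };
    if (num_cpus == 0 || cpu >= num_cpus) {
        return r;  // empty range for an invalid CPU index
    }
    uint32_t base  = total / num_cpus;
    uint32_t extra = total % num_cpus;
    uint32_t lead  = cpu < extra ? cpu : extra;  // extra items before this CPU
    r.start = cpu * base + lead;
    r.end   = r.start + base + (cpu < extra ? 1u : 0u);
    return r;
}
```

For example, 10 rows on 3 CPUs split into the ranges [0, 4), [4, 7) and [7, 10).
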
### Per-CPU Data Access

```c
// Get data for current CPU
uint8_t cpu = smp_get_current_cpu();
per_cpu_data_t* data = &per_cpu_array[cpu];
```

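One common use of a per-CPU array is lock-free statistics, where each CPU updates only its own slot. The sketch below assumes a `per_cpu_data_t` with a single counter padded to a 64-byte cache line. The field name, the padding, and both helper functions are assumptions and do not reflect the MetalOS definition.

```c
#include <stdint.h>

#define MAX_CPUS   16
#define CACHE_LINE 64  // typical x86_64 cache line size (assumption)

// Hypothetical layout: one counter per CPU, aligned so that two CPUs
// never write to the same cache line (avoids false sharing).
typedef struct {
    _Alignas(CACHE_LINE) uint64_t events;
} per_cpu_data_t;

static per_cpu_data_t per_cpu_array[MAX_CPUS];

// Called with the caller's own CPU index, so no lock is required.
void count_event(uint8_t cpu) {
    if (cpu < MAX_CPUS) {
        per_cpu_array[cpu].events++;
    }
}

// Sum all slots. Without further synchronization the result is only
// approximate while other CPUs are still counting.
uint64_t total_events(void) {
    uint64_t sum = 0;
    for (uint8_t i = 0; i < MAX_CPUS; i++) {
        sum += per_cpu_array[i].events;
    }
    return sum;
}
```

A caller would pass `smp_get_current_cpu()` as the index, as in the snippet above.
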
## Compatibility

### Single-Core Systems
- Automatically detected and handled
- Falls back to legacy PIC mode
- No performance penalty

### Hyper-Threading
- Treats logical processors as separate CPUs
- All threads initialized and available
- Works on 6-core/12-thread systems

### Virtual Machines
- Works in QEMU, VirtualBox, VMware
- May need to enable APIC in VM settings
- Performance varies by hypervisor

## Binary Size Impact

- **Additional Code**: ~8 KB (SMP + APIC + spinlocks)
- **Total Kernel**: 22 KB (was 16 KB)
- **Still Well Under Target**: < 150 KB goal

## References

- Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3
- AMD64 Architecture Programmer's Manual, Volume 2
- OSDev Wiki: SMP, APIC, Trampoline