# Oxidizer — Rust Specialist LoRA
Adapter codename: `oxidizer`
Agent: `build-rust`
Base model: Qwen/Qwen3.5-27B
## Objective
Teach the model to generate idiomatic Rust that follows the constraints in the
build-rust agent's system prompt. The adapter should internalize:
- `Result<T, E>` everywhere, no `unwrap()` in library code
- `thiserror` for libs, `anyhow` for binaries
- `tokio` 1.x async, `tracing` for logging
- Edition 2024, no `Box<dyn Error>`, no unjustified `unsafe`
- cargo fmt → clippy → test verification cycle
- Workspace-aware Cargo.toml patterns
## Data Sources
### Session extraction (~150-250 examples)
Classify from opencode `build` sessions by:
- File paths containing `.rs`, `Cargo.toml`, `Cargo.lock`
- Bash commands: `cargo build`, `cargo test`, `cargo clippy`, `cargo fmt`, `cargo add`
- Compiler output patterns: `error[E`, `warning:`, `rustc`
- Tool calls editing `.rs` files
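
The signals above could be combined into a simple filter. The following is a minimal sketch, not the actual extraction code: the session dict layout (`file_paths`, `bash_commands`, `tool_outputs`, `edited_files`) and the `min_signals` threshold are assumptions made for illustration, and paths are matched by suffix rather than by substring so that files such as `index.rst` do not count.

```python
import re

# Signals taken from the list above; the session dict layout is hypothetical.
RUST_PATH_RE = re.compile(r"(\.rs|(^|/)Cargo\.(toml|lock))$")
CARGO_CMD_RE = re.compile(r"\bcargo\s+(build|test|clippy|fmt|add)\b")
COMPILER_MARKERS = ("error[E", "warning:", "rustc")


def rust_signals(session: dict) -> dict:
    """Count how many items in one session match each Rust signal."""
    paths = session.get("file_paths", [])
    commands = session.get("bash_commands", [])
    outputs = session.get("tool_outputs", [])
    edits = session.get("edited_files", [])
    return {
        "paths": sum(1 for p in paths if RUST_PATH_RE.search(p)),
        "cargo_commands": sum(1 for c in commands if CARGO_CMD_RE.search(c)),
        "compiler_output": sum(
            1 for o in outputs if any(m in o for m in COMPILER_MARKERS)
        ),
        "rs_edits": sum(1 for f in edits if f.endswith(".rs")),
    }


def is_rust_session(session: dict, min_signals: int = 2) -> bool:
    """Treat a session as Rust work if at least `min_signals` signal types fire."""
    counts = rust_signals(session)
    return sum(1 for n in counts.values() if n > 0) >= min_signals
```

Requiring two distinct signal types is one way to keep a weak marker such as `warning:` from pulling in non-Rust sessions on its own.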
### Git repo mining (~100-200 examples)
Target repos:
- `madcat-os/tengu` — Rust mesh daemon, async networking, MQTT
- `madcat-os/madcat-core` — core library, data structures
- `madcat-os/madcat-tts` — TTS pipeline, audio processing
- Any other `Cargo.toml`-rooted repos in madcat-os org
Extract patterns:
- Bug fixes with before/after (error diagnosis training)
- Refactors (code improvement patterns)
- New feature implementations (scaffold → impl → test)
- Dependency additions with rationale
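
A first-pass sort of commits into these pattern buckets might look like the sketch below. The subject-line prefixes and the rule that a manifest-only commit counts as a dependency change are assumptions, not conventions confirmed for the madcat-os repos; a real pass would still need manual review.

```python
from typing import Optional


def classify_commit(message: str, changed_files: list[str]) -> Optional[str]:
    """Map one commit to a mining bucket, or None if it touches no Rust files."""
    touches_rust = any(f.endswith(".rs") for f in changed_files)
    touches_manifest = any(f.endswith("Cargo.toml") for f in changed_files)
    if not (touches_rust or touches_manifest):
        return None

    # Only the subject line is inspected; the body may hold the rationale.
    stripped = message.strip()
    subject = stripped.splitlines()[0].lower() if stripped else ""

    if touches_manifest and not touches_rust:
        return "dependency"  # assumption: manifest-only commits are dependency changes
    if subject.startswith(("fix", "bugfix", "hotfix")):
        return "bugfix"
    if subject.startswith(("refactor", "cleanup", "simplify")):
        return "refactor"
    if subject.startswith(("feat", "add", "implement")):
        return "feature"
    return "other"
```

Commits that land in `other` are kept rather than dropped, so they can be inspected by hand before deciding whether they fit one of the four patterns.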
### Synthetic (~50-100 examples)
Generate for gaps:
- Lifetime annotation scenarios
- Trait object vs generics decisions
- `Pin<Box<dyn Future>>` patterns
- Cross-crate error propagation
- Workspace dependency management
- `#[derive]` and proc macro usage
## Example Format
```json
{
  "messages": [
    {
      "role": "system",
      "content": "<build-rust agent system prompt>"
    },
    {
      "role": "user",
      "content": "Add error handling to the MQTT connection retry loop in src/mqtt.rs"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "read",
            "arguments": {"filePath": "/path/to/src/mqtt.rs"}
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "<file contents>"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "edit",
            "arguments": {
              "filePath": "/path/to/src/mqtt.rs",
              "oldString": "<original code>",
              "newString": "<fixed code with Result<T, E>>"
            }
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "Edit applied successfully"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "bash",
            "arguments": {"command": "cargo clippy --all-targets -- -D warnings"}
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "<clippy output: clean>"
    },
    {
      "role": "assistant",
      "content": "Error handling added. Connection retry now returns `Result<MqttClient, ConnectionError>` with exponential backoff. Clippy clean."
    }
  ]
}
```
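
Before training, each record can be checked against the shape shown above. The sketch below is an assumption about what "well-formed" means here (system prompt first, every assistant tool call followed by a `tool` message, a final text reply); `validate_example` is a hypothetical helper, and it only handles one tool call per assistant turn, as in the example.

```python
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def validate_example(example: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks well-formed."""
    messages = example.get("messages", [])
    if not messages:
        return ["no messages"]

    problems = []
    if messages[0].get("role") != "system":
        problems.append("first message is not the system prompt")

    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unknown role {role!r}")
        # An assistant turn that calls a tool must be answered by a tool message.
        if role == "assistant" and msg.get("tool_calls"):
            nxt = messages[i + 1] if i + 1 < len(messages) else None
            if nxt is None or nxt.get("role") != "tool":
                problems.append(f"message {i}: tool call has no tool result")

    last = messages[-1]
    if last.get("role") != "assistant" or not last.get("content"):
        problems.append("last message is not a text reply from the assistant")
    return problems
```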
## Evaluation Criteria
1. Generated Rust code compiles with `cargo build`
2. No `unwrap()` in non-test code
3. Uses `?` propagation, not manual match-on-error
4. Correct lifetime annotations (no unnecessary `'static`)
5. `cargo clippy -- -D warnings` passes
6. Appropriate crate recommendations (tokio, serde, tracing, etc.)
7. Tool call sequence: read → edit → verify (fmt/clippy/test)
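
Criteria 2 and 7 lend themselves to cheap automated checks. The sketch below is one possible approach under stated assumptions: tool `arguments` are JSON objects as in the example format, the helper names are hypothetical, and the `unwrap()` scan is a line-based heuristic that assumes the `#[cfg(test)]` module sits at the end of the file. Criteria 1 and 5 still require running `cargo` itself.

```python
import re

VERIFY_RE = re.compile(r"\bcargo\s+(fmt|clippy|test)\b")


def tool_steps(messages: list[dict]) -> list[str]:
    """Flatten assistant tool calls into step labels: read, edit, verify, other."""
    steps = []
    for msg in messages:
        for call in msg.get("tool_calls") or []:
            fn = call.get("function", {})
            name = fn.get("name")
            args = fn.get("arguments") or {}
            if name == "bash" and VERIFY_RE.search(args.get("command", "")):
                steps.append("verify")
            elif name in ("read", "edit"):
                steps.append(name)
            else:
                steps.append("other")
    return steps


def follows_read_edit_verify(messages: list[dict]) -> bool:
    """True if a read precedes the first edit and a verify follows the last edit."""
    steps = tool_steps(messages)
    if "edit" not in steps:
        return False
    first_edit = steps.index("edit")
    last_edit = len(steps) - 1 - steps[::-1].index("edit")
    return "read" in steps[:first_edit] and "verify" in steps[last_edit + 1:]


def unwrap_lines(rust_source: str) -> list[int]:
    """1-based line numbers of `.unwrap()` calls before the first `#[cfg(test)]`."""
    hits = []
    for lineno, line in enumerate(rust_source.splitlines(), start=1):
        if "#[cfg(test)]" in line:
            break  # heuristic: everything after this marker is test code
        code = line.split("//", 1)[0]  # ignore trailing line comments
        if ".unwrap()" in code:
            hits.append(lineno)
    return hits
```

A regex scan cannot see through macros or block comments, so a hit list from `unwrap_lines` is a prompt for review rather than a verdict.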
## Training Config Overrides
```python
MAX_SEQ = 8192 # Rust files can be long
LR = 5e-5 # Lower LR for code — less style drift from base
```
## Estimated Size
- 300-500 examples total
- Up to ~2M tokens at avg 4K tokens/example
- Training time: ~2-3 hrs on H100
- Adapter size: ~305 MB
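
The token figure follows from simple multiplication. As a quick check, assuming the example count is a range of 300 to 500 and using the 4K average stated above (`token_budget` is a hypothetical helper):

```python
AVG_TOKENS_PER_EXAMPLE = 4_000  # "avg 4K tokens/example" from the estimate


def token_budget(n_examples: int, avg_tokens: int = AVG_TOKENS_PER_EXAMPLE) -> int:
    """Total training tokens for a dataset of `n_examples` examples."""
    return n_examples * avg_tokens


low, high = token_budget(300), token_budget(500)
print(f"{low / 1e6:.1f}M to {high / 1e6:.1f}M tokens")  # 1.2M to 2.0M tokens
```

The ~2M figure therefore corresponds to the upper end of the range; the lower end is closer to 1.2M tokens.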