# Oxidizer — Rust Specialist LoRA

- Adapter codename: `oxidizer`
- Agent: `build-rust`
- Base model: Qwen/Qwen3.5-27B

## Objective

Teach the model to generate idiomatic Rust that follows the constraints in the `build-rust` agent's system prompt. The adapter should internalize:

- `Result<T, E>` for every fallible operation, no `unwrap()` in library code
- `thiserror` for libraries, `anyhow` for binaries
- `tokio` 1.x for async, `tracing` for logging
- Edition 2024, no `Box<dyn Error>`, no unjustified `unsafe`
- The `cargo fmt` → `cargo clippy` → `cargo test` verification cycle
- Workspace-aware `Cargo.toml` patterns

## Data Sources

### Session extraction (~150–250 examples)

Select Rust sessions from opencode `build` sessions using these signals:

- File paths ending in `.rs`, or named `Cargo.toml` or `Cargo.lock`
- Bash commands: `cargo build`, `cargo test`, `cargo clippy`, `cargo fmt`, `cargo add`
- Compiler output patterns: `error[E`, `warning:`, `rustc`
- Tool calls that edit `.rs` files

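The signals above could be applied with a small filter like the one below. This is a minimal sketch, not the project's extraction code: the event shape (`tool`, `args`, `output`), the function names, and the `min_signals` threshold are assumptions made for illustration. Only the `filePath` and `command` argument names and the signal patterns come from this document. Note that `warning:` on its own is a weak signal, so a real filter may want to require more than one hit.

```python
import re

# Patterns mirror the signal list above. The event layout is hypothetical.
RUST_PATH = re.compile(r"(\.rs|(^|/)Cargo\.(toml|lock))$")
CARGO_CMD = re.compile(r"\bcargo\s+(build|test|clippy|fmt|add)\b")
COMPILER_OUTPUT = re.compile(r"error\[E|warning:|\brustc\b")


def rust_signals(events):
    """Return the set of signal names found in a list of event dicts.

    Each event is assumed to look like
    {"tool": "read" | "edit" | "bash", "args": {...}, "output": "..."}.
    """
    found = set()
    for event in events:
        args = event.get("args", {})
        path = args.get("filePath", "")
        if RUST_PATH.search(path):
            found.add("path")
            if event.get("tool") == "edit" and path.endswith(".rs"):
                found.add("rs_edit")
        if CARGO_CMD.search(args.get("command", "")):
            found.add("cargo_command")
        if COMPILER_OUTPUT.search(event.get("output", "")):
            found.add("compiler_output")
    return found


def is_rust_session(events, min_signals=1):
    """Treat a session as Rust work if enough distinct signals appear."""
    return len(rust_signals(events)) >= min_signals
```

Raising `min_signals` to 2 would trade recall for precision, for example by rejecting a session whose only hit is a generic `warning:` line.
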
### Git repo mining (~100–200 examples)

Target repos:

- `madcat-os/tengu`: Rust mesh daemon, async networking, MQTT
- `madcat-os/madcat-core`: core library, data structures
- `madcat-os/madcat-tts`: TTS pipeline, audio processing
- Any other `Cargo.toml`-rooted repos in the `madcat-os` org

Extract patterns:

- Bug fixes with before/after (error diagnosis training)
- Refactors (code improvement patterns)
- New feature implementations (scaffold → impl → test)
- Dependency additions with rationale

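One way to sort mined commits into the four patterns above is a keyword pass over the commit subject plus a look at the changed paths. The sketch below assumes the commit subject and file list have already been read from `git log`. The keyword lists, the category names, and the rule that a manifest-only commit counts as a dependency change are illustrative assumptions, not part of this plan.

```python
import re

# Keyword heuristics are illustrative assumptions. Order matters: first match wins.
CATEGORY_PATTERNS = [
    ("bug_fix", re.compile(r"\b(fix|fixes|fixed|bug|regression)\b", re.I)),
    ("refactor", re.compile(r"\b(refactor|cleanup|simplify|rename)\b", re.I)),
    ("feature", re.compile(r"\b(add|adds|implement|feat|support)\b", re.I)),
]


def classify_commit(subject, changed_files):
    """Map a commit to one extraction pattern, or None to skip it.

    subject: first line of the commit message.
    changed_files: paths touched by the commit.
    """
    manifest_only = bool(changed_files) and all(
        path.endswith(("Cargo.toml", "Cargo.lock")) for path in changed_files
    )
    if manifest_only:
        return "dependency"
    if not any(path.endswith(".rs") for path in changed_files):
        return None  # no Rust source touched, nothing to learn from
    for category, pattern in CATEGORY_PATTERNS:
        if pattern.search(subject):
            return category
    return None
```

A manifest-only commit can also be a plain version bump, so commits labeled `dependency` would still need a check that the message carries a rationale.
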
### Synthetic (~50–100 examples)

Generate examples to cover gaps left by the other sources:

- Lifetime annotation scenarios
- Trait object vs generics decisions
- `Pin<Box<dyn Future>>` patterns
- Cross-crate error propagation
- Workspace dependency management
- `#[derive]` and proc macro usage

## Example Format

```json
{
  "messages": [
    {
      "role": "system",
      "content": "<build-rust agent system prompt>"
    },
    {
      "role": "user",
      "content": "Add error handling to the MQTT connection retry loop in src/mqtt.rs"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "read",
            "arguments": {"filePath": "/path/to/src/mqtt.rs"}
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "<file contents>"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "edit",
            "arguments": {
              "filePath": "/path/to/src/mqtt.rs",
              "oldString": "<original code>",
              "newString": "<fixed code with Result<T, E>>"
            }
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "Edit applied successfully"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "bash",
            "arguments": {"command": "cargo clippy --all-targets -- -D warnings"}
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "<clean clippy output>"
    },
    {
      "role": "assistant",
      "content": "Error handling added. Connection retry now returns `Result<MqttClient, ConnectionError>` with exponential backoff. Clippy clean."
    }
  ]
}
```

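Before training, each example could be checked against the shape shown above: system prompt, user request, alternating tool calls and tool results, then a final assistant summary. The validator below is a sketch under that reading of the format. The function name and the specific rules are assumptions, since the plan does not define a schema.

```python
def validate_example(example):
    """Return a list of problems found in one training example (empty if none)."""
    problems = []
    messages = example.get("messages", [])
    if not messages or messages[0].get("role") != "system":
        problems.append("first message must be the system prompt")
    if len(messages) < 2 or messages[1].get("role") != "user":
        problems.append("second message must be the user request")

    pending_calls = 0  # tool calls still waiting for a tool result
    for i, msg in enumerate(messages[2:], start=2):
        role = msg.get("role")
        if role == "assistant":
            if pending_calls:
                problems.append(f"message {i}: assistant spoke before tool result")
            calls = msg.get("tool_calls") or []
            if not calls and not msg.get("content"):
                problems.append(f"message {i}: assistant has neither content nor tool_calls")
            pending_calls = len(calls)
        elif role == "tool":
            if pending_calls == 0:
                problems.append(f"message {i}: tool result without a preceding call")
            else:
                pending_calls -= 1
        else:
            problems.append(f"message {i}: unexpected role {role!r}")

    if pending_calls:
        problems.append("conversation ends with an unanswered tool call")
    elif messages and (
        messages[-1].get("role") != "assistant" or not messages[-1].get("content")
    ):
        problems.append("last message must be an assistant summary with content")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to report everything wrong with a mined session in a single pass.
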
## Evaluation Criteria

1. Generated Rust code compiles with `cargo build`
2. No `unwrap()` in non-test code
3. Uses `?` to propagate errors rather than manually matching on them
4. Correct lifetime annotations (no unnecessary `'static`)
5. `cargo clippy -- -D warnings` passes
6. Appropriate crate recommendations (`tokio`, `serde`, `tracing`, etc.)
7. Tool call sequence: read → edit → verify (fmt/clippy/test)

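Criteria 1 and 5 are checked by running cargo itself. Criteria 2 and 7 can be approximated with plain text checks, as sketched below. Both functions are heuristics written for illustration: the names are hypothetical, the `unwrap()` scan treats everything from the first `#[cfg(test)]` line onward as test code, and the ordering check only confirms that a read, an edit, and a cargo verification command appear in that order.

```python
import re

UNWRAP = re.compile(r"\.unwrap\(\)")
VERIFY = re.compile(r"\bcargo\s+(fmt|clippy|test)\b")


def unwrap_lines(rust_source):
    """Return 1-based line numbers of `.unwrap()` calls outside test code.

    Heuristic: stops at the first `#[cfg(test)]` line, which fits the common
    "tests module at the bottom" layout but not every file. Lines that start
    with `//` are skipped.
    """
    hits = []
    for number, line in enumerate(rust_source.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#[cfg(test)]"):
            break
        if stripped.startswith("//"):
            continue
        if UNWRAP.search(line):
            hits.append(number)
    return hits


def follows_read_edit_verify(tool_calls):
    """Check that read, edit, and a cargo verify step occur in that order.

    tool_calls: list of (name, arguments) tuples in call order, where a
    verify step is a `bash` call running cargo fmt, clippy, or test.
    """
    stages = ["read", "edit", "verify"]
    stage = 0
    for name, args in tool_calls:
        is_verify = name == "bash" and VERIFY.search(args.get("command", ""))
        label = "verify" if is_verify else name
        if stage < len(stages) and label == stages[stage]:
            stage += 1
    return stage == len(stages)
```

A syntax-aware check (for example a Clippy lint such as `clippy::unwrap_used`) would be more reliable than the regex scan, which is only meant as a cheap first filter.
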
## Training Config Overrides

```python
MAX_SEQ = 8192  # Rust files can be long
LR = 5e-5       # Lower LR for code: less style drift from base
```

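The plan does not say what happens to examples longer than `MAX_SEQ`. One option, sketched here as an assumption, is to set them aside before training so they are not silently truncated mid tool call. `split_by_length` and `count_tokens` are hypothetical names. In practice `count_tokens` would wrap the base model's tokenizer and chat template, and counting over the JSON dump only approximates the templated length.

```python
import json

MAX_SEQ = 8192  # value from the overrides above


def split_by_length(examples, count_tokens, max_seq=MAX_SEQ):
    """Partition examples into (kept, dropped) by approximate token count.

    count_tokens: caller-supplied function mapping text to a token count.
    The JSON dump of the messages stands in for the templated conversation,
    so tool call arguments are counted too.
    """
    kept, dropped = [], []
    for example in examples:
        text = json.dumps(example["messages"])
        if count_tokens(text) <= max_seq:
            kept.append(example)
        else:
            dropped.append(example)
    return kept, dropped
```

Keeping the dropped list makes it possible to review long sessions and split them by hand instead of losing them.
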
## Estimated Size

- 300–500 examples total
- ~1.2–2M tokens at an average of 4K tokens per example
- Training time: ~2–3 hrs on H100
- Adapter size: ~305 MB

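The token estimate is the example count multiplied by the average example length, so the 2M figure corresponds to the upper end of the example range. The short calculation below reproduces it and also sums the per-source ranges from the Data Sources section. The function name is hypothetical and the 4K average is taken as exactly 4,000 tokens.

```python
def token_budget(min_examples, max_examples, avg_tokens_per_example):
    """Return (low, high) total token estimates for a range of example counts."""
    return (
        min_examples * avg_tokens_per_example,
        max_examples * avg_tokens_per_example,
    )


# Per-source example ranges from the Data Sources section.
SOURCES = {"sessions": (150, 250), "git": (100, 200), "synthetic": (50, 100)}
source_low = sum(low for low, _ in SOURCES.values())
source_high = sum(high for _, high in SOURCES.values())
print(f"raw examples: {source_low} to {source_high}")  # raw examples: 300 to 550

low, high = token_budget(300, 500, 4_000)
print(f"{low / 1e6:.1f}M to {high / 1e6:.1f}M tokens")  # 1.2M to 2.0M tokens
```

The per-source ranges add up to slightly more than the 300–500 total, which leaves some room for filtering and deduplication.
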