Oxidizer — Rust Specialist LoRA

Adapter codename: oxidizer
Agent: build-rust
Base model: Qwen/Qwen3.5-27B

Objective

Teach the model Rust-idiomatic code generation aligned with the build-rust agent's system prompt constraints. The adapter should internalize:

  • Result<T, E> everywhere, no unwrap() in library code
  • thiserror for libs, anyhow for binaries
  • tokio 1.x async, tracing for logging
  • Edition 2024, no Box<dyn Error>, no unjustified unsafe
  • cargo fmt → clippy → test verification cycle
  • Workspace-aware Cargo.toml patterns

Data Sources

Session extraction (~150-250 examples)

Select Rust-related examples from opencode build sessions by matching any of:

  • File paths containing .rs, Cargo.toml, Cargo.lock
  • Bash commands: cargo build, cargo test, cargo clippy, cargo fmt, cargo add
  • Compiler output patterns: error[E, warning:, rustc
  • Tool calls editing .rs files
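The signals above could be combined into a simple filter. The following is a minimal sketch, assuming each session has already been reduced to lists of touched file paths, bash commands, and tool outputs; the function name `is_rust_session` and that input shape are hypothetical, not part of opencode.

```python
import re

# Patterns mirror the classification signals listed above.
RUST_PATH = re.compile(r"(\.rs|Cargo\.toml|Cargo\.lock)$")
CARGO_CMD = re.compile(r"\bcargo\s+(build|test|clippy|fmt|add)\b")
COMPILER_OUT = re.compile(r"error\[E\d{4}\]|^warning:|\brustc\b", re.MULTILINE)


def is_rust_session(paths, commands, outputs):
    """Return True if any Rust signal appears in the session (hypothetical helper)."""
    if any(RUST_PATH.search(p) for p in paths):
        return True  # covers both file paths and tool calls editing .rs files
    if any(CARGO_CMD.search(c) for c in commands):
        return True
    return any(COMPILER_OUT.search(o) for o in outputs)
```

Treating any single signal as sufficient favors recall; a stricter filter might require two signals to keep mixed-language sessions out of the set.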

Git repo mining (~100-200 examples)

Target repos:

  • madcat-os/tengu — Rust mesh daemon, async networking, MQTT
  • madcat-os/madcat-core — core library, data structures
  • madcat-os/madcat-tts — TTS pipeline, audio processing
  • Any other Cargo.toml-rooted repos in the madcat-os org

Extract patterns:

  • Bug fixes with before/after (error diagnosis training)
  • Refactors (code improvement patterns)
  • New feature implementations (scaffold → impl → test)
  • Dependency additions with rationale

Synthetic (~50-100 examples)

Generate for gaps:

  • Lifetime annotation scenarios
  • Trait object vs generics decisions
  • Pin<Box<dyn Future>> patterns
  • Cross-crate error propagation
  • Workspace dependency management
  • #[derive] and proc macro usage

Example Format

{
  "messages": [
    {
      "role": "system",
      "content": "<build-rust agent system prompt>"
    },
    {
      "role": "user",
      "content": "Add error handling to the MQTT connection retry loop in src/mqtt.rs"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "read",
            "arguments": {"filePath": "/path/to/src/mqtt.rs"}
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "<file contents>"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "edit",
            "arguments": {
              "filePath": "/path/to/src/mqtt.rs",
              "oldString": "<original code>",
              "newString": "<fixed code with Result<T, E>>"
            }
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "Edit applied successfully"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "bash",
            "arguments": {"command": "cargo clippy --all-targets -- -D warnings"}
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "<clippy output — clean>"
    },
    {
      "role": "assistant",
      "content": "Error handling added. Connection retry now returns `Result<MqttClient, ConnectionError>` with exponential backoff. Clippy clean."
    }
  ]
}
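Before training, each extracted example could be checked against this shape. The sketch below assumes the structure shown above (a system turn, a user turn, assistant tool calls each answered by a tool turn, and a closing plain-text assistant summary); the helper names `tool_call_names` and `is_well_formed` are hypothetical.

```python
def tool_call_names(example):
    """List the tool names called by assistant turns, in order."""
    names = []
    for msg in example["messages"]:
        if msg["role"] == "assistant":
            for call in msg.get("tool_calls") or []:
                names.append(call["function"]["name"])
    return names


def is_well_formed(example):
    """Check the opening turns, tool-call/tool-result pairing, and the final summary."""
    msgs = example["messages"]
    if len(msgs) < 3 or msgs[0]["role"] != "system" or msgs[1]["role"] != "user":
        return False
    for i, msg in enumerate(msgs):
        if msg["role"] == "assistant" and msg.get("tool_calls"):
            # Every tool-calling turn must be followed by a tool result.
            if i + 1 >= len(msgs) or msgs[i + 1]["role"] != "tool":
                return False
    last = msgs[-1]
    # The conversation should end with a plain-text assistant summary.
    return (
        last["role"] == "assistant"
        and bool(last.get("content"))
        and not last.get("tool_calls")
    )
```

For the example above, `tool_call_names` would return `["read", "edit", "bash"]`.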

Evaluation Criteria

  1. Generated Rust code compiles with cargo build
  2. No unwrap() in non-test code
  3. Uses ? propagation, not manual match-on-error
  4. Correct lifetime annotations (no unnecessary 'static)
  5. cargo clippy -- -D warnings passes
  6. Appropriate crate recommendations (tokio, serde, tracing, etc.)
  7. Tool call sequence: read → edit → verify (fmt/clippy/test)
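Criteria 2 and 7 lend themselves to cheap automated checks. The sketch below is one possible approximation, not the project's evaluation harness: `follows_read_edit_verify` and `unwrap_outside_tests` are hypothetical helpers, and the unwrap scan is a crude line-based heuristic that assumes the `#[cfg(test)]` module sits at the end of the file.

```python
import re

VERIFY_CMD = re.compile(r"\bcargo\s+(fmt|clippy|test)\b")


def follows_read_edit_verify(calls):
    """calls: ordered list of (tool_name, bash_command_or_None).

    True if a read precedes the first edit and a cargo fmt/clippy/test
    run follows the last edit.
    """
    names = [name for name, _ in calls]
    if "read" not in names or "edit" not in names:
        return False
    first_edit = names.index("edit")
    last_edit = len(names) - 1 - names[::-1].index("edit")
    if names.index("read") > first_edit:
        return False
    return any(
        name == "bash" and cmd and VERIFY_CMD.search(cmd)
        for name, cmd in calls[last_edit + 1:]
    )


def unwrap_outside_tests(source):
    """Return 1-based line numbers with .unwrap() before the first #[cfg(test)]."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if "#[cfg(test)]" in line:
            break  # assumption: everything after this marker is test code
        code = line.split("//", 1)[0]  # drop line comments (also cuts strings containing //)
        if ".unwrap()" in code:
            hits.append(lineno)
    return hits
```

Criteria 1 and 5 still need a real `cargo build` and `cargo clippy -- -D warnings` run; text checks like these only pre-filter obvious failures.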

Training Config Overrides

MAX_SEQ = 8192   # Rust files can be long
LR      = 5e-5   # Lower LR for code — less style drift from base

Estimated Size

  • 300-500 examples total
  • ~2M tokens at avg 4K tokens/example
  • Training time: ~2-3 hrs on H100
  • Adapter size: ~305 MB
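The token estimate can be reproduced with a line of arithmetic. This sketch assumes the example count is a range of 300 to 500 and uses the 4K average from the list above; `token_budget` is a hypothetical helper.

```python
def token_budget(num_examples, avg_tokens=4_000):
    """Total training tokens for a dataset of the given size."""
    return num_examples * avg_tokens


low, high = token_budget(300), token_budget(500)
print(f"{low / 1e6:.1f}M to {high / 1e6:.1f}M tokens")  # prints "1.2M to 2.0M tokens"
```

Under these assumptions the ~2M figure corresponds to the upper end of the range.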