Forge — Ruby Specialist LoRA

Adapter codename: forge
Agent: build-ruby
Base model: Qwen/Qwen3.5-27B

Objective

Teach the model Ruby/Rails-idiomatic code generation aligned with the build-ruby agent's system prompt. The adapter should internalize:

  • # frozen_string_literal: true on all new files
  • Symbols over strings for hash keys
  • Guard clauses, Result pattern in service objects
  • No monkey-patching unless project already does it
  • ViewComponent, concerns, service objects, scopes
  • standardrb/rubocop → rspec/minitest verification cycle
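
As a minimal sketch of the conventions listed above, a plain-Ruby service object might look like the following. It assumes Ruby 3.2+ for Data.define. The class name, the success? helper, and the validation rules are hypothetical illustrations, not taken from the build-ruby system prompt.

```ruby
# frozen_string_literal: true

module Users
  # Hypothetical example; names and validation rules are assumptions.
  class Register
    # Value object returned for both outcomes instead of raising.
    Result = Data.define(:user, :errors) do
      def success?
        errors.empty?
      end
    end

    def self.call(params)
      new(params).call
    end

    def initialize(params)
      @email = params.fetch(:email, "").to_s.strip
    end

    def call
      # Guard clauses: return early on invalid input.
      return failure("email is required") if @email.empty?
      return failure("email is invalid") unless @email.include?("@")

      # Symbol keys in the hash that stands in for a persisted user.
      Result.new(user: {email: @email, verified: false}, errors: [])
    end

    private

    def failure(message)
      Result.new(user: nil, errors: [message])
    end
  end
end
```

A caller branches on the result instead of rescuing, for example by checking Users::Register.call(email: "a@example.com").success? before using the user.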

Data Sources

Session extraction (~30-60 examples)

Ruby sessions are sparse. Classify sessions as Ruby by:

  • File paths: .rb, .erb, .haml, Gemfile, Rakefile, .ruby-version
  • Bash commands: bundle, rails, rake, rspec, rubocop, standardrb
  • Error patterns: NoMethodError, NameError, Rails backtrace format
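
The signals above could be combined into a simple classifier along these lines. This is a sketch only: the session hash shape (:paths, :commands, :output) and the module name are assumptions, and the Rails backtrace format is not covered here.

```ruby
# frozen_string_literal: true

# Hypothetical session classifier; the session hash shape is an assumption.
module RubySessionClassifier
  RUBY_PATH = %r{(\.rb|\.erb|\.haml|\.ruby-version)\z|(\A|/)(Gemfile|Rakefile)\z}
  RUBY_COMMAND = /\A(bundle|rails|rake|rspec|rubocop|standardrb)\b/
  RUBY_ERROR = /\b(?:NoMethodError|NameError)\b/

  module_function

  # Counts how many times each kind of signal appears in the session.
  def signals(session)
    {
      paths: session.fetch(:paths, []).count { |path| path.match?(RUBY_PATH) },
      commands: session.fetch(:commands, []).count { |cmd| cmd.strip.match?(RUBY_COMMAND) },
      errors: session.fetch(:output, "").scan(RUBY_ERROR).size
    }
  end

  # A session counts as Ruby when at least one signal is present.
  def ruby?(session)
    signals(session).values.any?(&:positive?)
  end
end
```

Treating a single signal as sufficient is a deliberately loose choice for sparse data; a stricter threshold would trade recall for precision.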

Git repo mining (~30-50 examples)

Target repos:

  • Any Rails projects in madcat-os org or personal repos
  • Gem libraries if any exist

Extract patterns:

  • ActiveRecord model definitions (validations, associations, scopes)
  • Controller implementations (strong params, error handling)
  • Service object patterns
  • Migration files
  • RSpec test patterns
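
One way to bucket mined files into the categories above is by conventional Rails paths. The mapping below is a hypothetical sketch: it assumes the target repos follow the default layout plus app/services, and it would file concerns under app/models as models.

```ruby
# frozen_string_literal: true

# Hypothetical mapping from conventional Rails paths to pattern categories.
PATTERN_BY_PATH = {
  model: %r{\Aapp/models/.+\.rb\z},
  controller: %r{\Aapp/controllers/.+_controller\.rb\z},
  service: %r{\Aapp/services/.+\.rb\z},
  migration: %r{\Adb/migrate/\d+_.+\.rb\z},
  spec: %r{\Aspec/.+_spec\.rb\z}
}.freeze

# Returns the first matching category for a repo-relative path, or nil.
def pattern_for(path)
  PATTERN_BY_PATH.find { |_category, regex| path.match?(regex) }&.first
end

# Groups paths by category and drops files that match no category.
def group_by_pattern(paths)
  paths.group_by { |path| pattern_for(path) }.reject { |category, _| category.nil? }
end
```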

Synthetic (~60-120 examples)

Ruby/Rails is the most synthetic-heavy adapter because session data is sparse. Synthetic examples cover:

  • Rails 8 + Hotwire/Turbo patterns
  • Service object with Result pattern
  • ActiveRecord query optimization (N+1, eager loading)
  • Concern extraction and composition
  • API-only Rails setup
  • Background job patterns (Sidekiq, Solid Queue)
  • ViewComponent implementations
  • RSpec: let/subject/shared_examples patterns

Example Format

{
  "messages": [
    {
      "role": "system",
      "content": "<build-ruby agent system prompt>"
    },
    {
      "role": "user",
      "content": "Create a service object for user registration with email verification"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "read",
            "arguments": {"filePath": "app/services"}
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "<directory listing showing existing service pattern>"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "write",
            "arguments": {
              "filePath": "app/services/users/register.rb",
              "content": "# frozen_string_literal: true\n\nmodule Users\n  class Register\n    Result = Data.define(:user, :errors)\n    ..."
            }
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "File written"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "bash",
            "arguments": {"command": "bundle exec standardrb app/services/users/register.rb"}
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "0 offenses"
    },
    {
      "role": "assistant",
      "content": "Service object created. Uses Result pattern (Data.define), guard clauses, frozen string literal. Standardrb clean."
    }
  ]
}
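
Before training, each example in this format could be checked for basic structure. The sketch below is an assumption about what such a check might cover (system prompt first, a text summary last, every tool call followed by a tool result); the function name and error messages are hypothetical.

```ruby
# frozen_string_literal: true

require "json"

# Hypothetical structural check for one training example; the rules are assumptions.
def example_errors(json)
  messages = JSON.parse(json, symbolize_names: true).fetch(:messages, [])
  errors = []

  errors << "first message must be the system prompt" unless messages.first&.dig(:role) == "system"
  unless messages.last&.dig(:role) == "assistant" && messages.last[:content]
    errors << "last message must be an assistant summary"
  end

  # Every assistant message carrying tool calls needs a tool result right after it.
  messages.each_cons(2) do |current, following|
    next unless current[:tool_calls]

    errors << "tool call must be followed by a tool result" unless following[:role] == "tool"
  end

  errors
end
```

An empty array means the example passed these checks; it says nothing about the quality of the Ruby code inside the tool call arguments.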

Evaluation Criteria

  1. All files have # frozen_string_literal: true
  2. Symbol keys in hashes (not strings)
  3. Guard clauses for early returns
  4. Service objects use Result/value objects, not raised exceptions
  5. bundle exec standardrb or rubocop passes
  6. Rails conventions: scopes over class methods, concerns for shared behavior
  7. Tool call sequence: explore → read → implement → lint → test
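
Criteria 1 and 7 lend themselves to automated checks. The following is a rough sketch, assuming the tool names from the example format (read, write, bash) and a hypothetical mapping from bash commands to phases. It rejects any step backwards through the phases, so a legitimate fix-and-relint loop would need a looser rule.

```ruby
# frozen_string_literal: true

PHASE_ORDER = %i[explore read implement lint test].freeze

# Hypothetical mapping from a tool call to a phase; unknown tools map to nil.
def phase_for(name, arguments)
  case name
  when "read" then :read
  when "write" then :implement
  when "bash"
    command = arguments.fetch(:command, "")
    return :lint if command.match?(/\b(standardrb|rubocop)\b/)
    return :test if command.match?(/\b(rspec|rails test|rake test)\b/)

    :explore
  end
end

# True when the phases never move backwards, e.g. no lint before the write.
def ordered_phases?(tool_calls)
  indexes = tool_calls.filter_map do |call|
    PHASE_ORDER.index(phase_for(call[:name], call[:arguments]))
  end
  indexes.each_cons(2).all? { |earlier, later| earlier <= later }
end

# Criterion 1: the magic comment must open the file.
def frozen_string_literal?(source)
  source.start_with?("# frozen_string_literal: true")
end
```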

Training Config Overrides

MAX_SEQ = 8192
LR      = 5e-5

Estimated Size

  • 100-200 examples (synthetic-heavy)
  • ~0.8M tokens
  • Training time: ~1 hr on H100
  • Adapter size: ~305 MB

Risk: Synthetic Quality

With more than 50% synthetic data, there is a risk of hallucinated gem names and outdated Rails patterns. Mitigation: curate synthetic examples manually, verify that every referenced gem exists, and test generated code against a real Rails 8 scaffold.
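
Part of the gem verification could be scripted. The sketch below is one possible approach, not a prescribed one: it extracts gem declarations from Gemfile-style text and compares them against a curated allowlist. The allowlist and the function name are hypothetical; the list would have to be built by hand or from a real lockfile.

```ruby
# frozen_string_literal: true

require "set"

# Matches `gem "name"` or `gem 'name'` declarations as written in a Gemfile.
GEM_DECLARATION = /^\s*gem\s+["']([^"']+)["']/

# Returns declared gem names that are missing from the curated allowlist.
def unknown_gems(source, allowlist)
  source.scan(GEM_DECLARATION).flatten.uniq.reject { |name| allowlist.include?(name) }
end
```

Any name this returns still needs a manual look, since a gem missing from the allowlist may be real but simply not yet reviewed.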