# Forge — Ruby Specialist LoRA
- Adapter codename: `forge`
- Agent: `build-ruby`
- Base model: Qwen/Qwen3.5-27B
## Objective
Teach the model Ruby/Rails-idiomatic code generation aligned with the build-ruby
agent's system prompt. The adapter should internalize:
- `# frozen_string_literal: true` on all new files
- Symbols over strings for hash keys
- Guard clauses, Result pattern in service objects
- No monkey-patching unless the project already does it
- ViewComponent, concerns, service objects, scopes
- standardrb/rubocop → rspec/minitest verification cycle
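
As a rough illustration of several conventions from the list above in one place, here is a minimal sketch of a service object. It is plain Ruby (3.2+ for `Data.define`) so it runs without Rails. The validation rules, the `success?` helper, and the hash standing in for a persisted user are assumptions made for the example, not part of the build-ruby prompt.

```ruby
# frozen_string_literal: true

# Hypothetical service object illustrating the conventions listed above.
module Users
  class Register
    # Value object returned instead of raising; errors is an array of strings.
    Result = Data.define(:user, :errors) do
      def success?
        errors.empty?
      end
    end

    def self.call(params)
      new(params).call
    end

    def initialize(params)
      @params = params
    end

    def call
      # Guard clauses: return early on each failure case.
      return failure("email is required") if email.empty?
      return failure("email is invalid") unless email.include?("@")

      # Symbol keys for the hash that stands in for a persisted user.
      Result.new(user: {email: email, verified: false}, errors: [])
    end

    private

    def email
      @params.fetch(:email, "").to_s.strip
    end

    def failure(message)
      Result.new(user: nil, errors: [message])
    end
  end
end
```

Callers branch on `result.success?` rather than rescuing exceptions, which is the behavior the Result pattern bullet is after.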
## Data Sources
### Session extraction (~30-60 examples)
Ruby sessions are sparse. Identify them by:
- File paths: `.rb`, `.erb`, `.haml`, `Gemfile`, `Rakefile`, `.ruby-version`
- Bash commands: `bundle`, `rails`, `rake`, `rspec`, `rubocop`, `standardrb`
- Error patterns: `NoMethodError`, `NameError`, Rails backtrace format
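
The signals above could be combined into a simple classifier along these lines. This is a sketch only: the session shape (a hash with `:files`, `:commands`, and `:output`) is an assumption, a session counts as Ruby if any one signal matches, and Rails backtrace detection is left out because the exact format to match is not specified here.

```ruby
# frozen_string_literal: true

# Hypothetical classifier for the signals listed above.
module SessionClassifier
  RUBY_EXTENSIONS = %w[.rb .erb .haml].freeze
  RUBY_FILENAMES = %w[Gemfile Rakefile .ruby-version].freeze
  RUBY_COMMANDS = %w[bundle rails rake rspec rubocop standardrb].freeze
  ERROR_PATTERN = /\b(NoMethodError|NameError)\b/

  module_function

  def ruby_file?(path)
    name = File.basename(path)
    RUBY_FILENAMES.include?(name) || RUBY_EXTENSIONS.include?(File.extname(name))
  end

  def ruby_command?(command)
    # Only the first word is checked, so "bundle exec rspec" matches on "bundle".
    RUBY_COMMANDS.include?(command.to_s.split.first)
  end

  def ruby_session?(session)
    session.fetch(:files, []).any? { |path| ruby_file?(path) } ||
      session.fetch(:commands, []).any? { |command| ruby_command?(command) } ||
      session.fetch(:output, "").match?(ERROR_PATTERN)
  end
end
```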
### Git repo mining (~30-50 examples)
Target repos:
- Any Rails projects in the madcat-os org or personal repos
- Gem libraries, if any exist
Extract patterns:
- ActiveRecord model definitions (validations, associations, scopes)
- Controller implementations (strong params, error handling)
- Service object patterns
- Migration files
- RSpec test patterns
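
One way to bucket mined files into the pattern types above is by conventional Rails paths. The sketch below assumes the target repos follow the usual layout (`app/models`, `app/controllers`, `app/services`, `db/migrate`, `spec`); the category names and regular expressions are illustrative, not an established mining pipeline.

```ruby
# frozen_string_literal: true

# Hypothetical mapping from conventional Rails paths to pattern categories.
module RepoMiner
  CATEGORIES = {
    model: %r{\Aapp/models/.+\.rb\z},
    controller: %r{\Aapp/controllers/.+_controller\.rb\z},
    service: %r{\Aapp/services/.+\.rb\z},
    migration: %r{\Adb/migrate/.+\.rb\z},
    spec: %r{\Aspec/.+_spec\.rb\z}
  }.freeze

  module_function

  # Returns the first matching category symbol, or nil for unrecognized paths.
  def categorize(path)
    CATEGORIES.find { |_category, pattern| path.match?(pattern) }&.first
  end

  # Groups paths by category and drops the ones that matched nothing.
  def group(paths)
    paths.group_by { |path| categorize(path) }.except(nil)
  end
end
```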
### Synthetic (~60-120 examples)
Ruby/Rails is the most synthetic-heavy adapter because session data is sparse. Synthetic examples should cover:
- Rails 8 + Hotwire/Turbo patterns
- Service object with Result pattern
- ActiveRecord query optimization (N+1, eager loading)
- Concern extraction and composition
- API-only Rails setup
- Background job patterns (Sidekiq, Solid Queue)
- ViewComponent implementations
- RSpec: let/subject/shared_examples patterns
## Example Format
```json
{
  "messages": [
    {
      "role": "system",
      "content": "<build-ruby agent system prompt>"
    },
    {
      "role": "user",
      "content": "Create a service object for user registration with email verification"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "read",
            "arguments": {"filePath": "app/services"}
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "<directory listing showing existing service pattern>"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "write",
            "arguments": {
              "filePath": "app/services/users/register.rb",
              "content": "# frozen_string_literal: true\n\nmodule Users\n class Register\n Result = Data.define(:user, :errors)\n ..."
            }
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "File written"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "bash",
            "arguments": {"command": "bundle exec standardrb app/services/users/register.rb"}
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "0 offenses"
    },
    {
      "role": "assistant",
      "content": "Service object created. Uses Result pattern (Data.define), guard clauses, frozen string literal. Standardrb clean."
    }
  ]
}
```
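
Before an example in this format goes into the training set, a structural check can catch malformed conversations. The sketch below is an assumption about what "well-formed" means here: the system prompt comes first, every assistant tool call is followed by a tool result, and the conversation ends with a plain-text assistant summary. The module name and the rules are hypothetical.

```ruby
# frozen_string_literal: true

require "json"

# Hypothetical structural check for one training example in the format above.
module ExampleValidator
  module_function

  # Returns an array of problem descriptions; empty means the example passed.
  def problems(json)
    messages = JSON.parse(json).fetch("messages", [])
    issues = []
    issues << "first message must be the system prompt" unless messages.first&.dig("role") == "system"
    issues << "last message must be an assistant summary" unless final_summary?(messages.last)

    messages.each_cons(2) do |current, following|
      next unless tool_call?(current)

      issues << "tool call is not followed by a tool result" unless following["role"] == "tool"
    end
    issues << "tool call is not followed by a tool result" if tool_call?(messages.last)
    issues
  end

  def tool_call?(message)
    message.is_a?(Hash) && message["role"] == "assistant" && !Array(message["tool_calls"]).empty?
  end

  def final_summary?(message)
    message.is_a?(Hash) && message["role"] == "assistant" && message["content"].is_a?(String)
  end
end
```

Running `ExampleValidator.problems(File.read(path))` over each candidate file and rejecting any non-empty result would be one way to use it.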
## Evaluation Criteria
1. All files have `# frozen_string_literal: true`
2. Symbol keys in hashes (not strings)
3. Guard clauses for early returns
4. Service objects use Result/value objects, not raised exceptions
5. `bundle exec standardrb` or `rubocop` passes
6. Rails conventions: scopes over class methods, concerns for shared behavior
7. Tool call sequence: explore → read → implement → lint → test
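
Criteria 1 and 7 lend themselves to automated checks. The following sketch is deliberately strict and rests on assumptions: the magic comment must be the first line (or the second, after a shebang), and tool-call stages are labelled upstream with the hypothetical symbols `:explore`, `:read`, `:implement`, `:lint`, and `:test`. Stages may repeat or be skipped but may not go backwards, so a lint, fix, lint loop would need a looser rule.

```ruby
# frozen_string_literal: true

# Hypothetical checks for criteria 1 and 7.
module EvalChecks
  MAGIC_COMMENT = "# frozen_string_literal: true"
  STAGE_ORDER = %i[explore read implement lint test].freeze

  module_function

  # Criterion 1: the magic comment is the first line, or the second line
  # when the file opens with a shebang.
  def frozen_string_literal?(source)
    lines = source.lines.map(&:chomp)
    lines.shift if lines.first&.start_with?("#!")
    lines.first == MAGIC_COMMENT
  end

  # Criterion 7: every stage is known and the sequence never moves backwards.
  def ordered_stages?(stages)
    ranks = stages.map { |stage| STAGE_ORDER.index(stage) }
    return false if ranks.any?(&:nil?)

    ranks.each_cons(2).all? { |earlier, later| earlier <= later }
  end
end
```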
## Training Config Overrides
```python
MAX_SEQ = 8192  # maximum sequence length, in tokens
LR = 5e-5       # learning rate
```
## Estimated Size
- 100-200 examples (synthetic-heavy)
- ~0.8M tokens
- Training time: ~1 hr on H100
- Adapter size: ~305 MB
## Risk: Synthetic Quality
With more than 50% synthetic data, there is a risk of hallucinated gem names or outdated Rails patterns.
Mitigation: curate synthetic examples manually, verify that all gem references exist,
and test generated code against a real Rails 8 scaffold.
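
The gem-reference check could be partly automated. The sketch below pulls gem names out of generated Gemfile text and reports any that are missing from a curated allowlist. The allowlist approach is an assumption; this document does not say how existence is verified, and a lookup against a gem index would be an alternative.

```ruby
# frozen_string_literal: true

# Hypothetical check that every gem declared in generated Gemfile text is known.
module GemReferenceCheck
  # Matches lines such as: gem "rails", "~> 8.0"
  GEM_LINE = /^\s*gem\s+["']([^"']+)["']/

  module_function

  def declared_gems(gemfile_source)
    gemfile_source.scan(GEM_LINE).flatten.uniq
  end

  # Returns the declared gems that are not on the allowlist.
  def unknown_gems(gemfile_source, allowlist)
    declared_gems(gemfile_source) - allowlist
  end
end
```

Any synthetic example for which `unknown_gems` returns a non-empty list would then go back for manual review.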