add docs: system lora plan, specialist specs, training review

marauder-actual
2026-05-31 11:38:46 +02:00
parent 4678816795
commit 4cef9386b1
23 changed files with 62713 additions and 0 deletions
@@ -0,0 +1,156 @@
# Forge — Ruby Specialist LoRA
Adapter codename: `forge`
Agent: `build-ruby`
Base model: Qwen/Qwen3.5-27B
## Objective
Teach the model Ruby/Rails-idiomatic code generation aligned with the build-ruby
agent's system prompt. The adapter should internalize:
- `# frozen_string_literal: true` on all new files
- Symbols over strings for hash keys
- Guard clauses, Result pattern in service objects
- No monkey-patching unless the project already does it
- ViewComponent, concerns, service objects, scopes
- standardrb/rubocop → rspec/minitest verification cycle
## Data Sources
### Session extraction (~30-60 examples)
Ruby sessions are sparse. Classify sessions as Ruby by:
- File paths: `.rb`, `.erb`, `.haml`, `Gemfile`, `Rakefile`, `.ruby-version`
- Bash commands: `bundle`, `rails`, `rake`, `rspec`, `rubocop`, `standardrb`
- Error patterns: `NoMethodError`, `NameError`, Rails backtrace format
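The three signals above could be combined into a simple classifier. The following is a minimal sketch, not the project's actual extraction code: the function names and the session shape (lists of paths, commands, and tool outputs) are assumptions made for illustration, and the backtrace regex is only a rough stand-in for "Rails backtrace format".

```python
import re
from pathlib import PurePosixPath

# Signals taken from the list above.
RUBY_SUFFIXES = {".rb", ".erb", ".haml"}
RUBY_FILENAMES = {"Gemfile", "Rakefile", ".ruby-version"}
RUBY_COMMANDS = {"bundle", "rails", "rake", "rspec", "rubocop", "standardrb"}

# Assumption: a Ruby backtrace frame looks like "path/file.rb:12:in ...".
RUBY_ERROR = re.compile(r"\b(NoMethodError|NameError)\b|\.rb:\d+:in ")


def is_ruby_path(path: str) -> bool:
    """True if the path has a Ruby-related suffix or well-known file name."""
    p = PurePosixPath(path)
    return p.suffix in RUBY_SUFFIXES or p.name in RUBY_FILENAMES


def is_ruby_command(command: str) -> bool:
    """True if any token of the shell command is a Ruby tool (bin/rails counts)."""
    return any(tok.rsplit("/", 1)[-1] in RUBY_COMMANDS for tok in command.split())


def classify_session(paths, commands, outputs) -> bool:
    """Hypothetical entry point: a session is Ruby if any signal fires."""
    return (
        any(is_ruby_path(p) for p in paths)
        or any(is_ruby_command(c) for c in commands)
        or any(RUBY_ERROR.search(o) for o in outputs)
    )
```

Treating any single signal as sufficient favors recall, which seems reasonable when sessions are sparse; a stricter variant could require two signals.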
### Git repo mining (~30-50 examples)
Target repos:
- Any Rails projects in the madcat-os org or personal repos
- Gem libraries, if any exist
Extract patterns:
- ActiveRecord model definitions (validations, associations, scopes)
- Controller implementations (strong params, error handling)
- Service object patterns
- Migration files
- RSpec test patterns
### Synthetic (~60-120 examples)
Forge is the most synthetic-heavy adapter because Ruby session data is sparse. Synthetic examples cover:
- Rails 8 + Hotwire/Turbo patterns
- Service object with Result pattern
- ActiveRecord query optimization (N+1, eager loading)
- Concern extraction and composition
- API-only Rails setup
- Background job patterns (Sidekiq, Solid Queue)
- ViewComponent implementations
- RSpec: let/subject/shared_examples patterns
## Example Format
```json
{
  "messages": [
    {
      "role": "system",
      "content": "<build-ruby agent system prompt>"
    },
    {
      "role": "user",
      "content": "Create a service object for user registration with email verification"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "read",
            "arguments": {"filePath": "app/services"}
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "<directory listing showing existing service pattern>"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "write",
            "arguments": {
              "filePath": "app/services/users/register.rb",
              "content": "# frozen_string_literal: true\n\nmodule Users\n class Register\n Result = Data.define(:user, :errors)\n ..."
            }
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "File written"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "bash",
            "arguments": {"command": "bundle exec standardrb app/services/users/register.rb"}
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "0 offenses"
    },
    {
      "role": "assistant",
      "content": "Service object created. Uses Result pattern (Data.define), guard clauses, frozen string literal. Standardrb clean."
    }
  ]
}
```
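Before an example in this format enters the training set, its structure can be checked mechanically. The sketch below is one possible validator, assuming the `messages` list shape shown above; `tool_call_names` and `validate_example` are hypothetical helper names, and the rules (system first, user second, every tool call followed by a tool result, final assistant text) are inferred from the sample rather than from a stated schema.

```python
def tool_call_names(messages):
    """Return the tool names in call order, e.g. ["read", "write", "bash"]."""
    names = []
    for message in messages:
        for call in message.get("tool_calls") or []:
            names.append(call["function"]["name"])
    return names


def validate_example(messages):
    """Return a list of structural problems; an empty list means the example passes."""
    problems = []
    if not messages or messages[0].get("role") != "system":
        problems.append("first message must be the system prompt")
    if len(messages) < 2 or messages[1].get("role") != "user":
        problems.append("second message must be the user request")
    for i, message in enumerate(messages):
        if message.get("role") == "assistant" and message.get("tool_calls"):
            following = messages[i + 1] if i + 1 < len(messages) else None
            if following is None or following.get("role") != "tool":
                problems.append(f"message {i}: tool call without a following tool result")
    last = messages[-1] if messages else {}
    if last.get("role") != "assistant" or not last.get("content"):
        problems.append("final message must be an assistant message with text")
    return problems
```

Run against the sample above, `tool_call_names` would yield the sequence read, write, bash, which makes it easy to compare an example's tool usage with the sequence the evaluation expects.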
## Evaluation Criteria
1. All files have `# frozen_string_literal: true`
2. Symbol keys in hashes (not strings)
3. Guard clauses for early returns
4. Service objects use Result/value objects, not raised exceptions
5. `bundle exec standardrb` or `rubocop` passes
6. Rails conventions: scopes over class methods, concerns for shared behavior
7. Tool call sequence: explore → read → implement → lint → test
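Some of these criteria can be scored automatically from a generated transcript. The following sketch covers criterion 1 and the "implement → lint" part of criterion 7 only; it assumes transcripts use the message format from the Example Format section, and the function names are hypothetical. Criteria such as guard clauses or scope usage would need a Ruby parser or manual review and are not attempted here.

```python
MAGIC_COMMENT = "# frozen_string_literal: true"
LINTERS = ("standardrb", "rubocop")


def iter_tool_calls(messages):
    """Yield (tool_name, arguments) pairs in the order the assistant issued them."""
    for message in messages:
        for call in message.get("tool_calls") or []:
            function = call["function"]
            yield function["name"], function["arguments"]


def missing_magic_comment(messages):
    """Criterion 1: list written .rb files whose content lacks the magic comment.

    Simplification: the comment is expected at the very start of the file.
    """
    return [
        args["filePath"]
        for name, args in iter_tool_calls(messages)
        if name == "write"
        and args["filePath"].endswith(".rb")
        and not args["content"].startswith(MAGIC_COMMENT)
    ]


def lint_follows_last_write(messages):
    """Criterion 7 (partial): a linter must run after the final write call."""
    calls = list(iter_tool_calls(messages))
    write_positions = [i for i, (name, _) in enumerate(calls) if name == "write"]
    if not write_positions:
        return True  # nothing was written, so there is nothing to lint
    later_calls = calls[write_positions[-1] + 1:]
    return any(
        name == "bash" and any(linter in args.get("command", "") for linter in LINTERS)
        for name, args in later_calls
    )
```

Checking that the linter was invoked does not show that it passed; the tool result following the lint call would still need to be inspected for offenses.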
## Training Config Overrides
```python
MAX_SEQ = 8192  # maximum sequence length per example, in tokens
LR = 5e-5  # learning rate
```
## Estimated Size
- 100-200 examples (synthetic-heavy)
- ~0.8M tokens
- Training time: ~1 hr on H100
- Adapter size: ~305 MB
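As a rough sanity check, the token estimate can be compared with the sequence cap from the training config. This is back-of-the-envelope arithmetic, reading the example count as a range of 100 to 200 and taking `MAX_SEQ = 8192` from the overrides; `avg_tokens_per_example` is a helper introduced here only for illustration.

```python
TOTAL_TOKENS = 800_000  # ~0.8M tokens, from the estimate above
MAX_SEQ = 8192  # sequence cap from the training config overrides


def avg_tokens_per_example(total_tokens: int, example_count: int) -> float:
    """Mean tokens per example for a given dataset size."""
    return total_tokens / example_count


low = avg_tokens_per_example(TOTAL_TOKENS, 200)   # 4000.0 tokens per example
high = avg_tokens_per_example(TOTAL_TOKENS, 100)  # 8000.0 tokens per example
fits_under_cap = high <= MAX_SEQ  # True: even the larger average stays below 8192
```

At the low end of the example count the average sits close to the cap, so individual multi-turn examples longer than the mean may still be truncated.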
## Risk: Synthetic Quality
With more than 50% synthetic data, there is a risk of hallucinated gem names or outdated Rails patterns.
Mitigation: curate synthetic examples manually, verify all gem references exist,
test generated code against a real Rails 8 scaffold.
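The "verify all gem references exist" step lends itself to partial automation. The sketch below assumes gem references appear as `gem "name"` lines in Gemfile-style content; `gem_names` and `unknown_gems` are hypothetical helpers, and the `exists` callback is deliberately left to the caller (a curated allowlist, or a lookup against the gem registry) because the source does not specify how existence is checked.

```python
import re

# Matches lines such as: gem "rails", "~> 8.0"  or  gem 'sidekiq'
# Commented-out lines do not match because the line must start with "gem".
GEM_LINE = re.compile(r"""^[ \t]*gem\s+["']([A-Za-z0-9_.\-]+)["']""", re.MULTILINE)


def gem_names(gemfile_text: str) -> list:
    """Extract gem names from Gemfile-style text, in order of appearance."""
    return GEM_LINE.findall(gemfile_text)


def unknown_gems(gemfile_text: str, exists) -> list:
    """Return gem names for which the caller-supplied exists(name) is falsy."""
    return [name for name in gem_names(gemfile_text) if not exists(name)]
```

For example, passing `{"rails", "sidekiq"}.__contains__` as `exists` would flag any other gem name in a synthetic Gemfile for manual review.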