add docs: system lora plan, specialist specs, training review
@@ -0,0 +1,156 @@
# Forge — Ruby Specialist LoRA

Adapter codename: `forge`
Agent: `build-ruby`
Base model: Qwen/Qwen3.5-27B

## Objective

Teach the model Ruby/Rails-idiomatic code generation aligned with the build-ruby
agent's system prompt. The adapter should internalize:

- `# frozen_string_literal: true` on all new files
- Symbols over strings for hash keys
- Guard clauses, Result pattern in service objects
- No monkey-patching unless the project already does it
- ViewComponent, concerns, service objects, scopes
- standardrb/rubocop → rspec/minitest verification cycle

## Data Sources

### Session extraction (~30–60 examples)

Ruby sessions are sparse. Classify sessions by:
- File paths: `.rb`, `.erb`, `.haml`, `Gemfile`, `Rakefile`, `.ruby-version`
- Bash commands: `bundle`, `rails`, `rake`, `rspec`, `rubocop`, `standardrb`
- Error patterns: `NoMethodError`, `NameError`, Rails backtrace format

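The three signal types above could be combined into a simple heuristic classifier. The following is a minimal sketch, not the pipeline's actual code: the session layout (`file_paths`, `bash_commands`, `tool_outputs`) and the `min_signals` threshold are assumptions made for illustration, and Rails backtrace matching is left out because its exact format is not specified here.

```python
import re
from pathlib import PurePosixPath

# Signals copied from the list above; the session dict layout is hypothetical.
RUBY_SUFFIXES = {".rb", ".erb", ".haml"}
RUBY_FILENAMES = {"Gemfile", "Rakefile", ".ruby-version"}
RUBY_COMMANDS = {"bundle", "rails", "rake", "rspec", "rubocop", "standardrb"}
RUBY_ERRORS = re.compile(r"\b(NoMethodError|NameError)\b")


def is_ruby_path(path: str) -> bool:
    """True if the path has a Ruby suffix or is a well-known Ruby file name."""
    p = PurePosixPath(path)
    return p.suffix in RUBY_SUFFIXES or p.name in RUBY_FILENAMES


def is_ruby_command(command: str) -> bool:
    """True if any shell token is a Ruby tool (so `bundle exec rspec` counts)."""
    tokens = re.split(r"[\s;&|]+", command.strip())
    return any(tok in RUBY_COMMANDS for tok in tokens)


def ruby_signals(session: dict) -> dict:
    """Count how many paths, commands, and tool outputs look Ruby-related."""
    return {
        "paths": sum(is_ruby_path(p) for p in session.get("file_paths", [])),
        "commands": sum(is_ruby_command(c) for c in session.get("bash_commands", [])),
        "errors": sum(bool(RUBY_ERRORS.search(o)) for o in session.get("tool_outputs", [])),
    }


def is_ruby_session(session: dict, min_signals: int = 2) -> bool:
    """Assumed rule: a session is Ruby if it shows at least `min_signals` hits."""
    return sum(ruby_signals(session).values()) >= min_signals
```

Token matching is deliberately loose, so a directory named `rails` passed as an argument would count as a hit; with sparse data, borderline sessions are probably better reviewed by hand than dropped.
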
### Git repo mining (~30–50 examples)

Target repos:
- Any Rails projects in the madcat-os org or personal repos
- Gem libraries, if any exist

Extract patterns:
- ActiveRecord model definitions (validations, associations, scopes)
- Controller implementations (strong params, error handling)
- Service object patterns
- Migration files
- RSpec test patterns

### Synthetic (~60–120 examples)

Ruby/Rails is the most synthetic-heavy adapter because session data is sparse. Topics to cover:
- Rails 8 + Hotwire/Turbo patterns
- Service object with Result pattern
- ActiveRecord query optimization (N+1, eager loading)
- Concern extraction and composition
- API-only Rails setup
- Background job patterns (Sidekiq, Solid Queue)
- ViewComponent implementations
- RSpec: let/subject/shared_examples patterns

## Example Format

```json
{
  "messages": [
    {
      "role": "system",
      "content": "<build-ruby agent system prompt>"
    },
    {
      "role": "user",
      "content": "Create a service object for user registration with email verification"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "read",
            "arguments": {"filePath": "app/services"}
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "<directory listing showing existing service pattern>"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "write",
            "arguments": {
              "filePath": "app/services/users/register.rb",
              "content": "# frozen_string_literal: true\n\nmodule Users\n  class Register\n    Result = Data.define(:user, :errors)\n    ..."
            }
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "File written"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "bash",
            "arguments": {"command": "bundle exec standardrb app/services/users/register.rb"}
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "0 offenses"
    },
    {
      "role": "assistant",
      "content": "Service object created. Uses Result pattern (Data.define), guard clauses, frozen string literal. Standardrb clean."
    }
  ]
}
```

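Before training, each example can be checked for the conversational shape shown above. This is a hedged sketch: the rules (system prompt first, every tool-calling assistant turn followed by a tool result, a final plain-text assistant summary) are inferred from this single example, and `validate_example` is a hypothetical helper rather than part of any existing tooling.

```python
def validate_example(example: dict) -> list[str]:
    """Return a list of problems; an empty list means the example passed."""
    messages = example.get("messages") or []
    if not messages:
        return ["example has no messages"]

    problems = []
    if messages[0].get("role") != "system":
        problems.append("first message is not the system prompt")

    # Every assistant turn that issues tool calls should be answered by a tool turn.
    for i, msg in enumerate(messages):
        if msg.get("role") == "assistant" and msg.get("tool_calls"):
            has_result = i + 1 < len(messages) and messages[i + 1].get("role") == "tool"
            if not has_result:
                problems.append(f"message {i}: tool call without a tool result")

    # The conversation should end with the assistant's plain-text summary.
    last = messages[-1]
    if last.get("role") != "assistant" or not isinstance(last.get("content"), str):
        problems.append("last message is not a text assistant summary")

    return problems
```

Running this over a JSONL file and rejecting any line with a non-empty result is one way to catch truncated sessions early.
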
## Evaluation Criteria

1. All files have `# frozen_string_literal: true`
2. Symbol keys in hashes (not strings)
3. Guard clauses for early returns
4. Service objects use Result/value objects, not raised exceptions
5. `bundle exec standardrb` or `rubocop` passes
6. Rails conventions: scopes over class methods, concerns for shared behavior
7. Tool call sequence: explore → read → implement → lint → test

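Criterion 1 and part of criterion 7 are mechanical enough to check automatically. The following is a rough sketch, assuming tool-call `arguments` are already parsed into dicts as in the example format (a JSON-string encoding would need `json.loads` first). The helper names are hypothetical, and "a lint command runs after the last write" is only a proxy for the full explore → read → implement → lint → test sequence.

```python
MAGIC_COMMENT = "# frozen_string_literal: true"
LINT_TOOLS = ("standardrb", "rubocop")


def tool_calls(example: dict):
    """Yield (tool_name, arguments) for every tool call, in conversation order."""
    for msg in example["messages"]:
        for call in msg.get("tool_calls") or []:
            fn = call["function"]
            yield fn["name"], fn["arguments"]


def missing_magic_comment(example: dict) -> list[str]:
    """Paths of written .rb files that do not start with the magic comment."""
    return [
        args["filePath"]
        for name, args in tool_calls(example)
        if name == "write"
        and args["filePath"].endswith(".rb")
        and not args["content"].startswith(MAGIC_COMMENT)
    ]


def lints_after_last_write(example: dict) -> bool:
    """True if no file was written, or a lint command runs after the last write."""
    calls = list(tool_calls(example))
    writes = [i for i, (name, _) in enumerate(calls) if name == "write"]
    lints = [
        i
        for i, (name, args) in enumerate(calls)
        if name == "bash" and any(tool in args["command"] for tool in LINT_TOOLS)
    ]
    return not writes or (bool(lints) and max(lints) > max(writes))
```

Criteria 2 to 4 and 6 depend on code structure and would likely need RuboCop cops or manual review rather than string checks.
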
## Training Config Overrides

```python
MAX_SEQ = 8192  # maximum sequence length, in tokens
LR = 5e-5       # learning rate
```

## Estimated Size

- 100–200 examples (synthetic-heavy)
- ~0.8M tokens
- Training time: ~1 hr on H100
- Adapter size: ~305 MB

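As a quick sanity check, the token estimate can be compared against the `MAX_SEQ` override. The sketch below assumes the ~0.8M figure applies across the whole 100–200 example range, which the estimate does not state explicitly; `avg_tokens_per_example` is a hypothetical helper.

```python
MAX_SEQ = 8192          # from the training config overrides
TOTAL_TOKENS = 800_000  # the ~0.8M token estimate


def avg_tokens_per_example(total_tokens: int, n_examples: int) -> float:
    """Average example length implied by a total token budget."""
    return total_tokens / n_examples


for n in (100, 200):
    avg = avg_tokens_per_example(TOTAL_TOKENS, n)
    print(f"{n} examples: ~{avg:.0f} tokens each, {avg / MAX_SEQ:.0%} of MAX_SEQ")
# 100 examples: ~8000 tokens each, 98% of MAX_SEQ
# 200 examples: ~4000 tokens each, 49% of MAX_SEQ
```

At the low end of the example count the implied average sits close to the 8192-token cap, so it may be worth measuring how many examples would be truncated once real data is tokenized.
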
## Risk: Synthetic Quality

With more than 50% synthetic data, there is a risk of hallucinated gem names or outdated Rails patterns.
Mitigation: curate synthetic examples manually, verify that all gem references exist,
and test generated code against a real Rails 8 scaffold.

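The gem-reference check in the mitigation could be partly automated. This is a minimal sketch under stated assumptions: it only looks at `gem "name"` lines in Gemfile-style text, and the `known` set is a hypothetical allowlist (for example, names taken from the `Gemfile.lock` of a real Rails 8 scaffold); it does not query any package registry.

```python
import re

# Matches Gemfile-style declarations such as: gem "rails", "~> 8.0"
GEM_LINE = re.compile(r"""^\s*gem\s+["']([A-Za-z0-9_.\-]+)["']""", re.MULTILINE)


def referenced_gems(text: str) -> set[str]:
    """Gem names declared in the text; commented-out lines are ignored."""
    return set(GEM_LINE.findall(text))


def unknown_gems(text: str, known: set[str]) -> set[str]:
    """Declared gems that are absent from the (hypothetical) allowlist."""
    return referenced_gems(text) - set(known)
```

Any name this flags still needs a manual look, since a gem missing from the allowlist may be real but simply unused in the scaffold.
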