3.7 KiB
3.7 KiB
Serpent — Python Specialist LoRA
Adapter codename: serpent
Agent: build-python
Base model: Qwen/Qwen3.5-27B
Objective
Teach the model Python-idiomatic code generation aligned with the build-python agent's system prompt. The adapter should internalize:
- Type hints on all function signatures (params + return)
pathlib.Pathoveros.pathuvfor dependency management,pyproject.tomlas source of truth- Pydantic v2 patterns (
.model_dump(), not.dict()) pytest+pytest-asynciofor testingrufffor lint + format- ruff → pytest → mypy verification cycle
Data Sources
Session extraction (~100–200 examples)
Classify from opencode build sessions by:
- File paths:
.py,pyproject.toml,requirements.txt,setup.py - Bash commands:
python,pip,uv,pytest,ruff,mypy - Error patterns:
SyntaxError,TypeError,ImportError, traceback format - Framework detection: FastAPI, Django, Flask imports
Git repo mining (~50–100 examples)
Target repos:
madcat-os/lora— training scripts (this repo)- Automation scripts in
~/.config/opencode/scripts/ - Any Python projects in madcat-os org
Extract patterns:
- Data processing scripts (JSONL, datasets)
- ML training pipelines (unsloth, torch, datasets)
- CLI tools and automation
- FastAPI endpoint implementations
Synthetic (~50–100 examples)
- async/await with
asynciopatterns - Pydantic v2 model definitions with validators
pathlibfile operations- Type-narrowing with
TypeGuard,assert_never - Dataclass vs Pydantic decision points
uvproject setup and dependency resolution
Example Format
{
"messages": [
{
"role": "system",
"content": "<build-python agent system prompt>"
},
{
"role": "user",
"content": "Add a CLI argument to specify output format (json/csv) in the extraction script"
},
{
"role": "assistant",
"content": null,
"tool_calls": [
{
"type": "function",
"function": {
"name": "read",
"arguments": {"filePath": "extract.py"}
}
}
]
},
{
"role": "tool",
"content": "<existing script>"
},
{
"role": "assistant",
"content": null,
"tool_calls": [
{
"type": "function",
"function": {
"name": "edit",
"arguments": {
"filePath": "extract.py",
"oldString": "def main():",
"newString": "def main(output_format: Literal[\"json\", \"csv\"] = \"json\") -> None:"
}
}
}
]
},
{
"role": "tool",
"content": "Edit applied"
},
{
"role": "assistant",
"content": null,
"tool_calls": [
{
"type": "function",
"function": {
"name": "bash",
"arguments": {"command": "ruff check extract.py && ruff format --check extract.py"}
}
}
]
},
{
"role": "tool",
"content": "All checks passed"
},
{
"role": "assistant",
"content": "Added `--format` CLI arg via `argparse`. Type-hinted with `Literal[\"json\", \"csv\"]`. Ruff clean."
}
]
}
Evaluation Criteria
- All functions have type hints (params and return)
- Uses
pathlib.Path, notos.path ruff checkandruff format --checkpasspytesttests pass- Pydantic v2 patterns (no v1
.dict(),.json()) - No
requirements.txt— usespyproject.toml+uv - Tool call sequence: read → edit → lint → test
Training Config Overrides
MAX_SEQ = 8192
LR = 5e-5
Estimated Size
- 200–400 examples
- ~1.5M tokens
- Training time: ~1.5 hrs on H100
- Adapter size: ~305 MB