# Serpent — Python Specialist LoRA
Adapter codename: `serpent`
Agent: `build-python`
Base model: Qwen/Qwen3.5-27B
## Objective
Teach the model Python-idiomatic code generation aligned with the build-python
agent's system prompt. The adapter should internalize:
- Type hints on all function signatures (params + return)
- `pathlib.Path` over `os.path`
- `uv` for dependency management, `pyproject.toml` as source of truth
- Pydantic v2 patterns (`.model_dump()`, not `.dict()`)
- `pytest` + `pytest-asyncio` for testing
- `ruff` for lint + format
- ruff → pytest → mypy verification cycle
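
As a small illustration of the first two idioms (full type hints and `pathlib.Path` over `os.path`), here is a minimal sketch. The helper name `output_path_for` and the `.jsonl` default are hypothetical and chosen only for this example.

```python
from pathlib import Path


def output_path_for(source: Path, output_dir: Path, suffix: str = ".jsonl") -> Path:
    """Map a source file to a sibling-named file inside output_dir.

    Hypothetical helper: the os.path equivalent would chain
    os.path.join, os.path.basename and os.path.splitext.
    """
    return output_dir / source.with_suffix(suffix).name


if __name__ == "__main__":
    print(output_path_for(Path("data/raw/session.txt"), Path("out")))
```

Every parameter and the return value carry an annotation, and path arithmetic goes through the `/` operator and `with_suffix` rather than string manipulation.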
## Data Sources
### Session extraction (~100-200 examples)
Classify from opencode `build` sessions by:
- File paths: `.py`, `pyproject.toml`, `requirements.txt`, `setup.py`
- Bash commands: `python`, `pip`, `uv`, `pytest`, `ruff`, `mypy`
- Error patterns: `SyntaxError`, `TypeError`, `ImportError`, traceback format
- Framework detection: FastAPI, Django, Flask imports
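
The four classification signals above could be sketched as follows. The `Session` dataclass is a hypothetical flattened view of a session and is not the actual opencode session schema. How many signals a session needs before it counts as Python is left open here, because the plan does not specify a threshold.

```python
import re
from dataclasses import dataclass, field

# Signal definitions taken from the list above.
PY_FILE_SUFFIXES = (".py", "pyproject.toml", "requirements.txt", "setup.py")
PY_COMMANDS = {"python", "pip", "uv", "pytest", "ruff", "mypy"}
PY_ERROR_RE = re.compile(
    r"\b(?:SyntaxError|TypeError|ImportError)\b|Traceback \(most recent call last\)"
)
PY_FRAMEWORK_RE = re.compile(
    r"^\s*(?:from|import)\s+(?:fastapi|django|flask)\b", re.MULTILINE
)


@dataclass
class Session:
    """Hypothetical flattened session; field names are assumptions."""

    file_paths: list[str] = field(default_factory=list)
    bash_commands: list[str] = field(default_factory=list)
    tool_output: str = ""
    file_contents: str = ""


def python_signals(session: Session) -> set[str]:
    """Return the names of the Python signals that fire for a session."""
    signals: set[str] = set()
    if any(path.endswith(PY_FILE_SUFFIXES) for path in session.file_paths):
        signals.add("file_path")
    # Heuristic: any whitespace-separated token matching a known command.
    if any(tok in PY_COMMANDS for cmd in session.bash_commands for tok in cmd.split()):
        signals.add("bash_command")
    if PY_ERROR_RE.search(session.tool_output):
        signals.add("error_pattern")
    if PY_FRAMEWORK_RE.search(session.file_contents):
        signals.add("framework")
    return signals
```

Returning the set of fired signals rather than a boolean keeps the threshold decision (one signal, or several) outside the matcher.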
### Git repo mining (~50-100 examples)
Target repos:
- `madcat-os/lora` — training scripts (this repo)
- Automation scripts in `~/.config/opencode/scripts/`
- Any Python projects in madcat-os org
Extract patterns:
- Data processing scripts (JSONL, datasets)
- ML training pipelines (unsloth, torch, datasets)
- CLI tools and automation
- FastAPI endpoint implementations
### Synthetic (~50-100 examples)
- async/await with `asyncio` patterns
- Pydantic v2 model definitions with validators
- `pathlib` file operations
- Type-narrowing with `TypeGuard`, `assert_never`
- Dataclass vs Pydantic decision points
- `uv` project setup and dependency resolution
## Example Format
```json
{
  "messages": [
    {
      "role": "system",
      "content": "<build-python agent system prompt>"
    },
    {
      "role": "user",
      "content": "Add a CLI argument to specify output format (json/csv) in the extraction script"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "read",
            "arguments": {"filePath": "extract.py"}
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "<existing script>"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "edit",
            "arguments": {
              "filePath": "extract.py",
              "oldString": "def main():",
              "newString": "def main(output_format: Literal[\"json\", \"csv\"] = \"json\") -> None:"
            }
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "Edit applied"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "bash",
            "arguments": {"command": "ruff check extract.py && ruff format --check extract.py"}
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "All checks passed"
    },
    {
      "role": "assistant",
      "content": "Added `--format` CLI arg via `argparse`. Type-hinted with `Literal[\"json\", \"csv\"]`. Ruff clean."
    }
  ]
}
```
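
Before training, each example could be checked against the structure shown above. The following is a minimal sketch, not the project's actual validator. The function name `validate_example` and the specific rules (system prompt first, tool results only after a tool call) are assumptions inferred from the sample.

```python
from typing import Any

ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def validate_example(example: dict[str, Any]) -> list[str]:
    """Return a list of structural problems; an empty list means it looks OK."""
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]

    problems: list[str] = []
    if messages[0].get("role") != "system":
        problems.append("message 0: expected the system prompt first")

    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unknown role {role!r}")
        elif role == "assistant":
            calls = msg.get("tool_calls") or []
            # An assistant turn needs either text or at least one tool call.
            if not calls and not isinstance(msg.get("content"), str):
                problems.append(f"message {i}: neither text nor tool calls")
            for call in calls:
                function = call.get("function", {})
                if "name" not in function or "arguments" not in function:
                    problems.append(f"message {i}: tool call needs name and arguments")
        elif role == "tool":
            # A tool result should follow a tool call or another tool result.
            prev = messages[i - 1] if i > 0 else {}
            if not (prev.get("tool_calls") or prev.get("role") == "tool"):
                problems.append(f"message {i}: tool result without a tool call")
    return problems
```

With JSONL input, each line would be passed through `json.loads` first and rejected when the returned list is non-empty.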
## Evaluation Criteria
1. All functions have type hints (params and return)
2. Uses `pathlib.Path`, not `os.path`
3. `ruff check` and `ruff format --check` pass
4. `pytest` tests pass
5. Pydantic v2 patterns (no v1 `.dict()`, `.json()`)
6. No `requirements.txt` — uses `pyproject.toml` + `uv`
7. Tool call sequence: read → edit → lint → test
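
Criteria 1, 2 and 7 lend themselves to static checks. The sketch below uses the standard-library `ast` module. The function names are hypothetical, and the step labels passed to `follows_sequence` (`read`, `edit`, `lint`, `test`) are assumed to have been derived from the tool calls by the caller.

```python
import ast


def functions_missing_hints(source: str) -> list[str]:
    """Names of functions lacking a parameter or return annotation."""
    missing: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        args = node.args
        params = [*args.posonlyargs, *args.args, *args.kwonlyargs]
        params += [a for a in (args.vararg, args.kwarg) if a is not None]
        # self/cls are conventionally left unannotated.
        unannotated = [
            p.arg for p in params
            if p.annotation is None and p.arg not in {"self", "cls"}
        ]
        if unannotated or node.returns is None:
            missing.append(node.name)
    return missing


def uses_os_path(source: str) -> bool:
    """True if the source imports or references os.path."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            if any(alias.name == "os.path" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            if node.module == "os.path":
                return True
            if node.module == "os" and any(a.name == "path" for a in node.names):
                return True
        elif isinstance(node, ast.Attribute) and node.attr == "path":
            if isinstance(node.value, ast.Name) and node.value.id == "os":
                return True
    return False


def follows_sequence(
    steps: list[str],
    expected: tuple[str, ...] = ("read", "edit", "lint", "test"),
) -> bool:
    """True if expected appears in steps in order (extra steps are allowed)."""
    remaining = iter(steps)
    return all(any(step == want for step in remaining) for want in expected)
```

The sequence check is a subsequence match, so a trajectory that repeats the edit and lint steps before testing still passes. Criteria 3 and 4 would be checked by running `ruff` and `pytest` themselves rather than by static inspection.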
## Training Config Overrides
```python
MAX_SEQ = 8192
LR = 5e-5
```
## Estimated Size
- 200-400 examples
- ~1.5M tokens
- Training time: ~1.5 hrs on H100
- Adapter size: ~305 MB
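
As a rough sanity check on these figures, the implied average example length can be compared with `MAX_SEQ`. This is back-of-the-envelope arithmetic on the estimates above, not a measurement, and the function name is illustrative.

```python
TOTAL_TOKENS = 1_500_000  # "~1.5M tokens" from the estimate above
MAX_SEQ = 8192  # from Training Config Overrides


def average_tokens_per_example(total_tokens: int, example_count: int) -> float:
    """Mean tokens per example implied by a token budget and example count."""
    if example_count <= 0:
        raise ValueError("example_count must be positive")
    return total_tokens / example_count


if __name__ == "__main__":
    for count in (200, 400):
        avg = average_tokens_per_example(TOTAL_TOKENS, count)
        print(f"{count} examples: ~{avg:,.0f} tokens each (MAX_SEQ = {MAX_SEQ})")
```

The implied average is 7,500 tokens at 200 examples and 3,750 at 400. The low end of the example range therefore sits close to the 8192-token limit, so longer-than-average trajectories there would likely need truncation or filtering.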