# Serpent — Python Specialist LoRA

- Adapter codename: `serpent`
- Agent: `build-python`
- Base model: Qwen/Qwen3.5-27B

## Objective

Teach the model Python-idiomatic code generation aligned with the build-python agent's system prompt. The adapter should internalize:

- Type hints on all function signatures (params + return)
- `pathlib.Path` over `os.path`
- `uv` for dependency management, `pyproject.toml` as source of truth
- Pydantic v2 patterns (`.model_dump()`, not `.dict()`)
- `pytest` + `pytest-asyncio` for testing
- `ruff` for lint + format
- ruff → pytest → mypy verification cycle

## Data Sources

### Session extraction (~100–200 examples)

Classify from opencode `build` sessions by:

- File paths: `.py`, `pyproject.toml`, `requirements.txt`, `setup.py`
- Bash commands: `python`, `pip`, `uv`, `pytest`, `ruff`, `mypy`
- Error patterns: `SyntaxError`, `TypeError`, `ImportError`, traceback format
- Framework detection: FastAPI, Django, Flask imports

### Git repo mining (~50–100 examples)

Target repos:

- `madcat-os/lora` — training scripts (this repo)
- Automation scripts in `~/.config/opencode/scripts/`
- Any Python projects in madcat-os org

Extract patterns:

- Data processing scripts (JSONL, datasets)
- ML training pipelines (unsloth, torch, datasets)
- CLI tools and automation
- FastAPI endpoint implementations

### Synthetic (~50–100 examples)

- async/await with `asyncio` patterns
- Pydantic v2 model definitions with validators
- `pathlib` file operations
- Type-narrowing with `TypeGuard`, `assert_never`
- Dataclass vs Pydantic decision points
- `uv` project setup and dependency resolution

## Example Format

```json
{
  "messages": [
    { "role": "system", "content": "" },
    {
      "role": "user",
      "content": "Add a CLI argument to specify output format (json/csv) in the extraction script"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "read",
            "arguments": { "filePath": "extract.py" }
          }
        }
      ]
    },
    { "role": "tool", "content": "" },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "edit",
            "arguments": {
              "filePath": "extract.py",
              "oldString": "def main():",
              "newString": "def main(output_format: Literal[\"json\", \"csv\"] = \"json\") -> None:"
            }
          }
        }
      ]
    },
    { "role": "tool", "content": "Edit applied" },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "bash",
            "arguments": {
              "command": "ruff check extract.py && ruff format --check extract.py"
            }
          }
        }
      ]
    },
    { "role": "tool", "content": "All checks passed" },
    {
      "role": "assistant",
      "content": "Added `--format` CLI arg via `argparse`. Type-hinted with `Literal[\"json\", \"csv\"]`. Ruff clean."
    }
  ]
}
```

## Evaluation Criteria

1. All functions have type hints (params and return)
2. Uses `pathlib.Path`, not `os.path`
3. `ruff check` and `ruff format --check` pass
4. `pytest` tests pass
5. Pydantic v2 patterns (no v1 `.dict()`, `.json()`)
6. No `requirements.txt` — uses `pyproject.toml` + `uv`
7. Tool call sequence: read → edit → lint → test

## Training Config Overrides

```python
MAX_SEQ = 8192
LR = 5e-5
```

## Estimated Size

- 200–400 examples
- ~1.5M tokens
- Training time: ~1.5 hrs on H100
- Adapter size: ~305 MB
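## Sketch: Session Classification

The signals listed under "Session extraction" could be combined roughly as follows. This is a minimal sketch, not the repo's actual extractor: the function names, the regular expressions, and the assumption that a session is available as plain lists of file paths, shell commands, and tool outputs are all hypothetical.

```python
import re
from pathlib import Path

# Signals taken from the "Session extraction" list in this document.
PYTHON_FILE_NAMES = {"pyproject.toml", "requirements.txt", "setup.py"}
PYTHON_COMMANDS = {"python", "pip", "uv", "pytest", "ruff", "mypy"}
ERROR_PATTERN = re.compile(
    r"\b(?:SyntaxError|TypeError|ImportError)\b|Traceback \(most recent call last\)"
)
FRAMEWORK_PATTERN = re.compile(
    r"^\s*(?:from|import)\s+(?:fastapi|django|flask)\b", re.MULTILINE
)


def is_python_path(file_path: str) -> bool:
    """True for `.py` files and the Python project files named above."""
    path = Path(file_path)
    return path.suffix == ".py" or path.name in PYTHON_FILE_NAMES


def is_python_command(command: str) -> bool:
    """True if any chained shell segment starts with a Python tool."""
    for segment in re.split(r"&&|\|\||[;|]", command):
        tokens = segment.split()
        if tokens and tokens[0] in PYTHON_COMMANDS:
            return True
    return False


def is_python_session(
    file_paths: list[str], commands: list[str], outputs: list[str]
) -> bool:
    """Hypothetical rule: a single matching signal marks the session as Python."""
    return (
        any(is_python_path(p) for p in file_paths)
        or any(is_python_command(c) for c in commands)
        or any(ERROR_PATTERN.search(o) or FRAMEWORK_PATTERN.search(o) for o in outputs)
    )
```

A single-signal rule is deliberately permissive. A stricter extractor might require two or more signals before labeling a session.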
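## Sketch: Static Checks for Criteria 1 and 2

Evaluation criteria 1 (type hints) and 2 (`pathlib.Path` over `os.path`) can be approximated with the standard-library `ast` module. The sketch below is an assumption about how such a check might look, not the project's evaluation harness. The helper names `untyped_functions` and `uses_os_path` are hypothetical, and skipping `self` and `cls` is a convention chosen here.

```python
import ast


def untyped_functions(source: str) -> list[str]:
    """Return names of functions missing a parameter or return annotation."""
    missing: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        args = node.args
        params = [*args.posonlyargs, *args.args, *args.kwonlyargs]
        if args.vararg is not None:
            params.append(args.vararg)
        if args.kwarg is not None:
            params.append(args.kwarg)
        # Assumption: `self` and `cls` are exempt from the annotation rule.
        unannotated = [
            p.arg
            for p in params
            if p.annotation is None and p.arg not in {"self", "cls"}
        ]
        if unannotated or node.returns is None:
            missing.append(node.name)
    return missing


def uses_os_path(source: str) -> bool:
    """Detect `os.path` attribute access or imports of `os.path`."""
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Attribute)
            and node.attr == "path"
            and isinstance(node.value, ast.Name)
            and node.value.id == "os"
        ):
            return True
        if isinstance(node, ast.ImportFrom):
            if node.module == "os.path":
                return True
            if node.module == "os" and any(a.name == "path" for a in node.names):
                return True
        if isinstance(node, ast.Import) and any(
            a.name == "os.path" for a in node.names
        ):
            return True
    return False
```

Criteria 3 and 4 are better checked by running `ruff` and `pytest` directly. This static pass only covers what can be read off the syntax tree.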
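## Sketch: Tool Call Sequence Check

Criterion 7 (read → edit → lint → test) can be checked against the `messages` structure shown in "Example Format". The sketch below assumes `arguments` is a JSON object as in that example, that a `bash` call counts as "lint" when its command mentions `ruff` and as "test" when it mentions `pytest`, and that other steps may be interleaved. The function names and these labeling rules are hypothetical.

```python
from typing import Any


def tool_steps(messages: list[dict[str, Any]]) -> list[str]:
    """Map each tool call to a coarse step label: read, edit, lint, or test."""
    steps: list[str] = []
    for message in messages:
        for call in message.get("tool_calls") or []:
            function = call["function"]
            name = function["name"]
            if name in {"read", "edit"}:
                steps.append(name)
            elif name == "bash":
                command = function["arguments"].get("command", "")
                # Assumption: ruff means lint, pytest means test.
                if "ruff" in command:
                    steps.append("lint")
                if "pytest" in command:
                    steps.append("test")
    return steps


def follows_sequence(
    steps: list[str],
    expected: tuple[str, ...] = ("read", "edit", "lint", "test"),
) -> bool:
    """True if `expected` appears in `steps` in order (as a subsequence)."""
    remaining = iter(steps)
    # `step in remaining` consumes the iterator up to the first match,
    # so each expected step must occur after the previous one.
    return all(step in remaining for step in expected)
```

Under these rules, the example in "Example Format" yields `read`, `edit`, `lint` and would fail the full sequence because it has no `pytest` call.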