lora/docs/specialists/serpent.md

# Serpent — Python Specialist LoRA

Adapter codename: `serpent`
Agent: `build-python`
Base model: Qwen/Qwen3.5-27B

## Objective

Teach the model Python-idiomatic code generation aligned with the build-python
agent's system prompt. The adapter should internalize:

- Type hints on all function signatures (params + return)
- `pathlib.Path` over `os.path`
- `uv` for dependency management, `pyproject.toml` as source of truth
- Pydantic v2 patterns (`.model_dump()`, not `.dict()`)
- `pytest` + `pytest-asyncio` for testing
- `ruff` for lint + format
- ruff → pytest → mypy verification cycle
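
As a rough illustration of the first two idioms in the list above, here is a minimal sketch using only the standard library. The function names `load_records` and `output_path_for` are hypothetical and do not come from the build-python system prompt.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_records(path: Path) -> list[dict[str, object]]:
    """Read a JSONL file into a list of records, skipping blank lines."""
    # Path.open() replaces open(os.path.join(...)) style code.
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def output_path_for(source: Path, suffix: str = ".json") -> Path:
    """Derive a sibling output path by swapping the final suffix."""
    return source.with_suffix(suffix)
```

Both functions annotate every parameter and the return value, which is the shape the adapter is meant to produce by default.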

## Data Sources

### Session extraction (~100–200 examples)

Select Python-related sessions from opencode `build` sessions using these signals:
- File paths: `.py`, `pyproject.toml`, `requirements.txt`, `setup.py`
- Bash commands: `python`, `pip`, `uv`, `pytest`, `ruff`, `mypy`
- Error patterns: `SyntaxError`, `TypeError`, `ImportError`, traceback format
- Framework detection: FastAPI, Django, Flask imports
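
The signals above could be combined roughly as follows. This is a sketch under assumptions: the function name `python_signals`, its inputs (lists of touched file paths and bash commands plus the concatenated session text), and the signal labels are all hypothetical, since the real opencode session schema is not described here.

```python
import re
from pathlib import PurePosixPath

# Signal sets copied from the list above.
PYTHON_FILES = {"pyproject.toml", "requirements.txt", "setup.py"}
PYTHON_COMMANDS = {"python", "pip", "uv", "pytest", "ruff", "mypy"}
ERROR_PATTERN = re.compile(
    r"\b(SyntaxError|TypeError|ImportError)\b|Traceback \(most recent call last\)"
)
FRAMEWORK_PATTERN = re.compile(
    r"^\s*(?:from|import)\s+(fastapi|django|flask)\b", re.MULTILINE
)


def python_signals(file_paths: list[str], commands: list[str], text: str) -> set[str]:
    """Return which of the four signal families fire for one session (hypothetical API)."""
    signals: set[str] = set()
    for raw in file_paths:
        path = PurePosixPath(raw)
        if path.suffix == ".py" or path.name in PYTHON_FILES:
            signals.add("file")
            break
    for command in commands:
        # Only the first token is inspected, so chained commands after `&&` are ignored.
        tokens = command.split()
        if tokens and tokens[0] in PYTHON_COMMANDS:
            signals.add("command")
            break
    if ERROR_PATTERN.search(text):
        signals.add("error")
    if FRAMEWORK_PATTERN.search(text):
        signals.add("framework")
    return signals
```

A session could then be kept when the returned set is non-empty, or when it contains at least two signals if a stricter filter is wanted. That threshold is a design choice this document does not specify.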

### Git repo mining (~50–100 examples)

Target repos:
- `madcat-os/lora` — training scripts (this repo)
- Automation scripts in `~/.config/opencode/scripts/`
- Any Python projects in the `madcat-os` org

Extract patterns:
- Data processing scripts (JSONL, datasets)
- ML training pipelines (unsloth, torch, datasets)
- CLI tools and automation
- FastAPI endpoint implementations
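
One possible way to bucket mined files into the pattern categories above is to inspect their imports with the standard `ast` module. The tag names and the `PATTERN_IMPORTS` mapping below are assumptions for illustration (`click` and `typer` in particular are not mentioned in this document), and import-based tagging is only a coarse first pass.

```python
import ast

# Hypothetical mapping from pattern tag to modules that suggest it.
PATTERN_IMPORTS: dict[str, set[str]] = {
    "ml_training": {"unsloth", "torch", "datasets"},
    "cli": {"argparse", "click", "typer"},  # click/typer are assumed additions
    "fastapi": {"fastapi"},
}


def imported_modules(source: str) -> set[str]:
    """Collect top-level package names from absolute imports anywhere in the file."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 excludes relative imports such as `from . import x`.
            modules.add(node.module.split(".")[0])
    return modules


def tag_patterns(source: str) -> set[str]:
    """Return every pattern tag whose marker modules appear in the source."""
    imports = imported_modules(source)
    return {tag for tag, marker in PATTERN_IMPORTS.items() if imports & marker}
```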

### Synthetic (~50–100 examples)

- async/await with `asyncio` patterns
- Pydantic v2 model definitions with validators
- `pathlib` file operations
- Type-narrowing with `TypeGuard`, `assert_never`
- Dataclass vs Pydantic decision points
- `uv` project setup and dependency resolution

## Example Format

```json
{
  "messages": [
    {
      "role": "system",
      "content": "<build-python agent system prompt>"
    },
    {
      "role": "user",
      "content": "Add a CLI argument to specify output format (json/csv) in the extraction script"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "read",
            "arguments": {"filePath": "extract.py"}
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "<existing script>"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "edit",
            "arguments": {
              "filePath": "extract.py",
              "oldString": "def main():",
              "newString": "def main(output_format: Literal[\"json\", \"csv\"] = \"json\") -> None:"
            }
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "Edit applied"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "bash",
            "arguments": {"command": "ruff check extract.py && ruff format --check extract.py"}
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "All checks passed"
    },
    {
      "role": "assistant",
      "content": "Added an `output_format` parameter to `main()`, type-hinted with `Literal[\"json\", \"csv\"]`. Ruff clean."
    }
  ]
}
```
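
Before training, examples in this shape could be sanity-checked with a small validator. The sketch below encodes three rules inferred from the sample above, not a documented schema: the conversation opens with a system message, every assistant tool call is followed by a tool result, and the conversation ends with an assistant text message. The function name `validate_example` is hypothetical.

```python
from typing import Any


def validate_example(example: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the example looks well-formed."""
    problems: list[str] = []
    messages = example.get("messages", [])
    if not messages or messages[0].get("role") != "system":
        problems.append("first message must have role 'system'")
    for index, message in enumerate(messages):
        if message.get("role") != "assistant" or not message.get("tool_calls"):
            continue
        # Assumed rule: each assistant tool call is answered by the next message.
        following = messages[index + 1] if index + 1 < len(messages) else None
        if following is None or following.get("role") != "tool":
            problems.append(f"message {index}: tool call is not followed by a tool result")
    if messages and (
        messages[-1].get("role") != "assistant" or not messages[-1].get("content")
    ):
        problems.append("last message must be an assistant message with text content")
    return problems
```

Running this over each line of a JSONL dataset and dropping or logging examples with a non-empty result is one simple way to catch truncated sessions.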

## Evaluation Criteria

1. All functions have type hints (params and return)
2. Uses `pathlib.Path`, not `os.path`
3. `ruff check` and `ruff format --check` pass
4. `pytest` tests pass
5. Pydantic v2 patterns (`.model_dump()`, `.model_dump_json()`; no v1 `.dict()`, `.json()`)
6. No `requirements.txt` — uses `pyproject.toml` + `uv`
7. Tool call sequence: read → edit → lint → test

## Training Config Overrides

```python
MAX_SEQ = 8192
LR      = 5e-5
```

## Estimated Size

- 200–400 examples
- ~1.5M tokens
- Training time: ~1.5 hrs on H100
- Adapter size: ~305 MB
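
As a quick consistency check on these figures, the per-source counts from the Data Sources section can be summed and compared against the token estimate and `MAX_SEQ`. This is back-of-the-envelope arithmetic only, and the helper names are hypothetical.

```python
MAX_SEQ = 8192
TOTAL_TOKENS = 1_500_000  # "~1.5M tokens" from the estimate above

# (low, high) example counts per source, from the Data Sources section.
SOURCE_RANGES: dict[str, tuple[int, int]] = {
    "session": (100, 200),
    "git": (50, 100),
    "synthetic": (50, 100),
}


def total_examples() -> tuple[int, int]:
    """Sum the low and high ends of every source range."""
    low = sum(lo for lo, _ in SOURCE_RANGES.values())
    high = sum(hi for _, hi in SOURCE_RANGES.values())
    return low, high


def average_tokens_per_example() -> tuple[float, float]:
    """Average example length at the high and low example counts."""
    low, high = total_examples()
    return TOTAL_TOKENS / high, TOTAL_TOKENS / low
```

The sources sum to 200–400 examples, which matches the estimate, and the implied average of 3,750–7,500 tokens per example sits below `MAX_SEQ`. Individual multi-turn sessions may still exceed 8,192 tokens, so a length filter would likely still be needed.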