Serpent — Python Specialist LoRA

Adapter codename: serpent Agent: build-python Base model: Qwen/Qwen3.5-27B

Objective

Teach the model Python-idiomatic code generation aligned with the build-python agent's system prompt. The adapter should internalize:

Type hints on all function signatures (params + return)
pathlib.Path over os.path
uv for dependency management, pyproject.toml as source of truth
Pydantic v2 patterns (.model_dump(), not .dict())
pytest + pytest-asyncio for testing
ruff for lint + format
ruff → pytest → mypy verification cycle

Data Sources

Session extraction (~100–200 examples)

Classify from opencode build sessions by:

File paths: .py, pyproject.toml, requirements.txt, setup.py
Bash commands: python, pip, uv, pytest, ruff, mypy
Error patterns: SyntaxError, TypeError, ImportError, traceback format
Framework detection: FastAPI, Django, Flask imports

Git repo mining (~50–100 examples)

Target repos:

madcat-os/lora — training scripts (this repo)
Automation scripts in ~/.config/opencode/scripts/
Any Python projects in madcat-os org

Extract patterns:

Data processing scripts (JSONL, datasets)
ML training pipelines (unsloth, torch, datasets)
CLI tools and automation
FastAPI endpoint implementations

Synthetic (~50–100 examples)

async/await with asyncio patterns
Pydantic v2 model definitions with validators
pathlib file operations
Type-narrowing with TypeGuard, assert_never
Dataclass vs Pydantic decision points
uv project setup and dependency resolution

Example Format

{
  "messages": [
    {
      "role": "system",
      "content": "<build-python agent system prompt>"
    },
    {
      "role": "user",
      "content": "Add a CLI argument to specify output format (json/csv) in the extraction script"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "read",
            "arguments": {"filePath": "extract.py"}
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "<existing script>"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "edit",
            "arguments": {
              "filePath": "extract.py",
              "oldString": "def main():",
              "newString": "def main(output_format: Literal[\"json\", \"csv\"] = \"json\") -> None:"
            }
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "Edit applied"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "bash",
            "arguments": {"command": "ruff check extract.py && ruff format --check extract.py"}
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "All checks passed"
    },
    {
      "role": "assistant",
      "content": "Added `--format` CLI arg via `argparse`. Type-hinted with `Literal[\"json\", \"csv\"]`. Ruff clean."
    }
  ]
}

Evaluation Criteria

All functions have type hints (params and return)
Uses pathlib.Path, not os.path
ruff check and ruff format --check pass
pytest tests pass
Pydantic v2 patterns (no v1 .dict(), .json())
No requirements.txt — uses pyproject.toml + uv
Tool call sequence: read → edit → lint → test

Training Config Overrides

MAX_SEQ = 8192
LR      = 5e-5

Estimated Size

200–400 examples
~1.5M tokens
Training time: ~1.5 hrs on H100
Adapter size: ~305 MB

3.7 KiB Raw Permalink Blame History Unescape Escape