add v5 substrate training data (40 examples)

Manual hand-crafted examples for generic substrate LoRA:
- 01: identity/self-model
- 02-18: tool categories (159 tools grouped by function)
- 19-26: tricky tool disambiguation
- 27-32: framing (substrate vs persona, HMT, psychoacoustic trust)
- 33-40: EEMS deep-dive (architecture, recall, store, graph, legacy)

Pretty-printed JSON, compiled to JSONL via build pipeline. System prompt uses generic assistant, not BT persona.
2026-05-31 14:32:19 +02:00
parent 175d28a5aa
commit 0e88c3c2ae
40 changed files with 1033 additions and 0 deletions
@@ -0,0 +1,16 @@
+{
+  "messages": [
+    {
+      "role": "system",
+      "content": "You are a helpful AI assistant with access to tools. Use tools when they help answer the user's question. Be concise and accurate."
+    },
+    {
+      "role": "user",
+      "content": "What should I store in memory and what shouldn't I?"
+    },
+    {
+      "role": "assistant",
+      "content": "<think>\nStore patterns — what makes a good memory, subject naming, what doesn't belong.\n</think>\n\n**What to store:**\n\n- **Decisions** — architectural choices, tool selections, why X was chosen over Y. These prevent re-deriving the same conclusion.\n- **Corrections** — when something was wrong and got fixed. Use `correction.*` subject prefix. Link to the superseded memory with a `supersedes` edge.\n- **Doctrines** — patterns and principles that should persist. Locked decisions about how to operate.\n- **Procedures** — step-by-step operational knowledge that would be lost between sessions.\n- **Session handovers** — what was accomplished, what's pending, key findings. These let the next session pick up where the last left off.\n- **Operator preferences** — how the user likes things done, communication style, tool preferences.\n- **Research findings** — results of investigation that would be expensive to re-derive.\n\n**What NOT to store:**\n\n- **Host topology** — IP addresses, port numbers, service locations change. Always query live state instead of memorizing it.\n- **Ephemeral state** — what's currently running, what branch is checked out, temporary file paths. This goes stale immediately.\n- **Duplicates** — always `memory_recall` before storing to check if the knowledge already exists. If it does, update or supersede rather than duplicate.\n- **Raw tool output** — don't store the output of a command. Store the *insight* derived from it.\n- **Trivial facts** — don't store what can be trivially re-derived (file exists at path X, command Y produces output Z).\n\n**Subject naming conventions:**\n\n- Use dot-namespacing: `category.subcategory.specific`\n- The category is the *kind* of knowledge: doctrine, infra, project, user, research, session, progress, correction\n- Be specific enough to filter on but general enough to group: `doctrine.training.lora` not `doctrine.training.lora.qwen35.bf16.r16.v4`\n- Check `memory_list` to see existing conventions before inventing new ones\n\n**Content quality:**\n\nA good memory is self-contained — someone reading it cold should understand what it says without needing the conversation that produced it. Include the *why*, not just the *what*. Date-stamp significant decisions. Reference related memory IDs when linking context."
+    }
+  ]
+}
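
The commit message says these pretty-printed JSON files are compiled to JSONL by a build pipeline, but that pipeline is not part of this diff. The following is only a minimal sketch of what such a step might look like. The function name `compile_to_jsonl`, the one-file-per-example directory layout, and the `messages` check are assumptions for illustration, not the repository's actual build code.

```python
import json
from pathlib import Path


def compile_to_jsonl(src_dir: str, out_path: str) -> int:
    """Collapse each pretty-printed *.json example into one JSONL line.

    Hypothetical helper: the real pipeline is not shown in this commit.
    Returns the number of examples written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        # Sorted by file name so numeric prefixes (01, 02, ...) keep their order.
        for path in sorted(Path(src_dir).glob("*.json")):
            # json.loads rejects malformed files, e.g. a raw line break
            # inside a string value.
            example = json.loads(path.read_text(encoding="utf-8"))
            if "messages" not in example:
                raise ValueError(f"{path.name}: missing 'messages' key")
            # Compact single-line form; ensure_ascii=False keeps
            # non-ASCII punctuation readable instead of \u-escaped.
            out.write(json.dumps(example, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Newlines inside `content` strings stay escaped as `\n` in the output, so each example still occupies exactly one line of the JSONL file.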