add v5 substrate training data (40 examples)

Manual hand-crafted examples for generic substrate LoRA:
- 01: identity/self-model
- 02-18: tool categories (159 tools grouped by function)
- 19-26: tricky tool disambiguation
- 27-32: framing (substrate vs persona, HMT, psychoacoustic trust)
- 33-40: EEMS deep-dive (architecture, recall, store, graph, legacy)

Pretty-printed JSON, compiled to JSONL via build pipeline. System prompt uses generic assistant, not BT persona.
2026-05-31 14:32:19 +02:00
parent 175d28a5aa
commit 0e88c3c2ae
40 changed files with 1033 additions and 0 deletions
@@ -0,0 +1,16 @@
+{
+  "messages": [
+    {
+      "role": "system",
+      "content": "You are a helpful AI assistant with access to tools. Use tools when they help answer the user's question. Be concise and accurate."
+    },
+    {
+      "role": "user",
+      "content": "Why are there two memory stores?"
+    },
+    {
+      "role": "assistant",
+      "content": "<think>\nTwo stores: primary EEMS and legacy. Explain why both exist, what each contains, and how to use them.\n</think>\n\nThere are two memory stores because the system evolved over time:\n\n**Primary EEMS** (read-write, ~6,500+ entries)\n- The current, active memory store\n- Backed by `eems-v2.db` on each host\n- Full read-write access: store, recall, classify, forget, link, graph\n- Contains curated, structured knowledge with dot-namespaced subjects\n- Memories have embeddings for semantic search, graph edges for relationships\n- This is where new knowledge goes\n\n**Legacy store** (read-only, ~5,500 entries)\n- An older memory store from earlier system generations\n- Frozen: no writes, no deletes, no updates\n- Contains historical context: session logs, architectural decisions, operator conversations, project state from earlier work\n- Accessed exclusively via `legacy-memory_*` tools (recall, list, search, core)\n- Never access it via the CLI; use only the built-in tools\n\n**Why both matter:**\n\nThe primary store has recent, well-structured knowledge. But many foundational decisions, origin stories, and historical context only exist in the legacy store. When searching for information:\n\n1. Always search both stores in parallel\n2. The legacy store often has the *first* time something was decided or discovered\n3. The primary store has the *current* state of that knowledge\n4. If they conflict, the primary store is authoritative (it was written more recently with more context)\n\n**Why not merge them?**\n\nThe legacy store uses a different schema, different embedding dimensions, and contains raw session transcripts alongside curated memories. A mechanical merge would pollute the primary store with noise. The valuable entries are being migrated manually: when a legacy memory is referenced and confirmed still relevant, it gets stored properly in the primary EEMS with correct subject namespacing and classification. This is deliberate, gradual curation rather than bulk import."
+    }
+  ]
+}
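The commit message says these pretty-printed files are compiled to JSONL by a build pipeline that is not part of this diff. As a rough sketch only, such a step might look like the following. The function name `compile_to_jsonl`, the directory layout, and the role-order check are assumptions made for illustration, not the project's actual pipeline.

```python
import json
from pathlib import Path


def compile_to_jsonl(src_dir: Path, out_path: Path) -> int:
    """Compile pretty-printed JSON examples into one JSONL file.

    Hypothetical helper: the real build pipeline is not shown in this commit.
    Returns the number of examples written.
    """
    count = 0
    with out_path.open("w", encoding="utf-8") as out:
        # Sort so examples are emitted in filename order (01, 02, ...).
        for path in sorted(src_dir.glob("*.json")):
            # json.loads is strict by default: a raw newline inside a string
            # raises JSONDecodeError, so a malformed example fails the build.
            example = json.loads(path.read_text(encoding="utf-8"))
            roles = [m["role"] for m in example["messages"]]
            # Assumed convention: system prompt first, assistant turn last.
            if roles[0] != "system" or roles[-1] != "assistant":
                raise ValueError(f"{path.name}: unexpected role order {roles}")
            # json.dumps without indent emits one line; newlines in content
            # stay escaped as \n, which keeps the JSONL one record per line.
            out.write(json.dumps(example, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Parsing each file before writing means an example whose `content` string contains an unescaped line break is rejected at build time rather than silently producing a corrupt JSONL record.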