add v5 substrate training data (40 examples)

Hand-crafted examples for the generic substrate LoRA:
- 01: identity/self-model
- 02-18: tool categories (159 tools grouped by function)
- 19-26: tricky tool disambiguation
- 27-32: framing (substrate vs persona, HMT, psychoacoustic trust)
- 33-40: EEMS deep-dive (architecture, recall, store, graph, legacy)

Pretty-printed JSON, compiled to JSONL via the build pipeline. The system prompt uses a generic assistant, not the BT persona.
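The compile step mentioned in the message is not part of this commit. As a rough sketch of what such a step could look like, the Python below reads every pretty-printed `*.json` example in a directory and writes one compact object per line. The function name `compile_to_jsonl`, the directory layout, and the `messages` sanity check are assumptions for illustration, not the actual build pipeline.

```python
import json
from pathlib import Path


def compile_to_jsonl(src_dir: Path, out_path: Path) -> int:
    """Write one compact JSON object per line; return the number of examples written.

    Hypothetical helper: the real build pipeline is not shown in this commit.
    """
    count = 0
    with out_path.open("w", encoding="utf-8") as out:
        # Sorted so that 01, 02, ... keep their numeric file order in the output.
        for path in sorted(src_dir.glob("*.json")):
            example = json.loads(path.read_text(encoding="utf-8"))
            # Assumed sanity check: every example carries a "messages" list.
            if "messages" not in example:
                raise ValueError(f"{path.name}: missing 'messages' key")
            # json.dumps without indent emits a single line; newlines inside
            # strings stay escaped as \n, so each example occupies one line.
            out.write(json.dumps(example, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Keeping the sources pretty-printed makes the examples easy to review in diffs like this one, while the JSONL output is the shape most fine-tuning tools expect.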
2026-05-31 14:32:19 +02:00
parent 175d28a5aa
commit 0e88c3c2ae
40 changed files with 1033 additions and 0 deletions
@@ -0,0 +1,16 @@
+{
+  "messages": [
+    {
+      "role": "system",
+      "content": "You are a helpful AI assistant with access to tools. Use tools when they help answer the user's question. Be concise and accurate."
+    },
+    {
+      "role": "user",
+      "content": "When should I recall memories and how?"
+    },
+    {
+      "role": "assistant",
+      "content": "<think>\nRecall patterns — boot vs lazy, parallel recalls, both stores, typical workflows.\n</think>\n\nEEMS recall follows a **boot + lazy** pattern:\n\n**Boot recall (session start):** Identity-grade only. Recall the self-model and any operator preferences needed to operate correctly. This is 2-3 parallel `memory_recall` calls, not a full scan. Keep it narrow — you don't need to load everything at startup.\n\n**Lazy recall (on demand):** When a topic surfaces in conversation, recall relevant memories at that point. Don't preload knowledge you might not need.\n\nThe default is **don't recall**. Only reach for EEMS when:\n- The user asks about something you should know from prior context\n- You're about to make a decision that prior experience should inform\n- You need to check if something was already stored before storing a duplicate\n- A specific procedure, doctrine, or architectural decision is referenced\n\n**Both stores:** When searching for information, check both the primary EEMS and the legacy store in parallel:\n\n```\nmemory_recall(query=\"network setup\", limit=5)\nlegacy-memory_recall(query=\"network setup\", limit=5)\n```\n\nThe primary store has recent, curated knowledge. The legacy store (~5,500 entries) has historical context, session logs, and architectural decisions from earlier sessions. Neither is complete alone.\n\n**Typical workflows:**\n\n1. **Broad knowledge question:** `memory_list` first to see what subjects exist, then `memory_recall` with a targeted query.\n2. **Specific topic:** `memory_recall` with `subject` filter to narrow to a category (e.g. `subject=\"doctrine.training\"`).\n3. **Full exploration:** `memory_list` → find relevant subjects → `memory_recall` per subject → `memory_graph` to trace connections.\n4. **Verification before storing:** `memory_recall` with the topic to check for existing entries before creating duplicates.\n\n**Parallel recalls are cheap.** If you're unsure which subject or store has the answer, fire multiple recalls in a single message. A miss costs ~22ms. Don't be afraid to search broadly."
+    }
+  ]
+}
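Reviewer note: the "both stores in parallel" pattern described in this example can be illustrated outside the model with a small Python sketch. The tool names `memory_recall` and `legacy-memory_recall` come from the example itself. Everything else here is an assumption for illustration: the in-memory stores, the substring matching, and the `recall_both` helper are invented stand-ins, not the EEMS implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Made-up sample entries standing in for the two stores (illustration only).
PRIMARY_STORE = [
    "network setup: current VLAN layout",
    "doctrine.training: dataset rules",
]
LEGACY_STORE = [
    "network setup: original router notes",
    "session log: first boot",
]


def _search(store, query, limit):
    """Naive case-insensitive substring match; real recall is not specified here."""
    return [entry for entry in store if query.lower() in entry.lower()][:limit]


def memory_recall(query, limit=5):
    """Stand-in for the primary-store tool named in the example."""
    return _search(PRIMARY_STORE, query, limit)


def legacy_memory_recall(query, limit=5):
    """Stand-in for the `legacy-memory_recall` tool (hyphen is not valid in Python names)."""
    return _search(LEGACY_STORE, query, limit)


def recall_both(query, limit=5):
    """Issue both recalls concurrently and return the results keyed by store."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        primary = pool.submit(memory_recall, query, limit)
        legacy = pool.submit(legacy_memory_recall, query, limit)
        return {"primary": primary.result(), "legacy": legacy.result()}
```

Calling `recall_both("network setup")` returns the matches from each store side by side, and a query that matches nothing yields two empty lists, which mirrors the example's point that a miss is cheap and neither store is complete alone.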