add v5 substrate training data (40 examples)

Hand-crafted examples for the generic substrate LoRA:
- 01: identity/self-model
- 02-18: tool categories (159 tools grouped by function)
- 19-26: tricky tool disambiguation
- 27-32: framing (substrate vs persona, HMT, psychoacoustic trust)
- 33-40: EEMS deep-dive (architecture, recall, store, graph, legacy)

Examples are stored as pretty-printed JSON and compiled to JSONL by the build pipeline. The system prompt uses a generic assistant, not the BT persona.
2026-05-31 14:32:19 +02:00
parent 175d28a5aa
commit 0e88c3c2ae
40 changed files with 1033 additions and 0 deletions
@@ -0,0 +1,16 @@
+{
+  "messages": [
+    {
+      "role": "system",
+      "content": "You are a helpful AI assistant with access to tools. Use tools when they help answer the user's question. Be concise and accurate."
+    },
+    {
+      "role": "user",
+      "content": "How does the code/docs index work?"
+    },
+    {
+      "role": "assistant",
+      "content": "<think>\nIndex tools — semantic search across indexed code and documentation. Separate from EEMS memory.\n</think>\n\nThe index system provides semantic search across code and documentation files. It chunks files, embeds them, and enables similarity search. Five tools:\n\n- **index_code** — index code files from a given path. Chunks source files and generates embeddings for semantic search. Optional project name for organizing multiple codebases.\n- **index_docs** — index documentation files (markdown, text) from a given path. Same chunking and embedding approach.\n- **index_search** — semantic search across all indexed code and documentation. Returns relevant chunks ranked by similarity.\n- **index_status** — show current index statistics: how many chunks are indexed, which projects, storage size.\n- **index_clear** — wipe all indexed content.\n\nThis is separate from EEMS memory. EEMS stores discrete knowledge records (facts, decisions, procedures). The index stores chunked source code and documentation for retrieval-augmented search.\n\nTypical use: index a project's codebase and docs once, then use `index_search` to find relevant code patterns, API usage, or documentation sections without having to grep through files manually. The current index holds 17k+ code chunks and 44k+ doc chunks across all repos."
+    }
+  ]
+}
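The commit message says these pretty-printed JSON files are compiled to JSONL by the build pipeline. The pipeline itself is not part of this diff, so the following is only a minimal sketch of what such a step might look like. The function name `compile_to_jsonl`, the `*.json` glob, and the `messages` shape check are assumptions made for illustration, not the repository's actual code.

```python
import json
from pathlib import Path


def compile_to_jsonl(src_dir: Path, out_path: Path) -> int:
    """Write each pretty-printed JSON example in src_dir as one JSONL line.

    Hypothetical helper: the real build pipeline may differ.
    Returns the number of examples written.
    """
    count = 0
    with out_path.open("w", encoding="utf-8") as out:
        # Sort by filename so examples keep their numeric order (01, 02, ...).
        for path in sorted(src_dir.glob("*.json")):
            example = json.loads(path.read_text(encoding="utf-8"))
            # Minimal shape check: each example needs a non-empty "messages" list.
            if not example.get("messages"):
                raise ValueError(f"{path.name}: missing 'messages'")
            # No indent means one object per line; newlines inside strings
            # are escaped by json.dumps, so they cannot break the line format.
            out.write(json.dumps(example, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Keeping the sources pretty-printed makes each example easy to review in a diff like this one, while the compiled file gives trainers the one-record-per-line layout they usually expect.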