add v5 substrate training data (40 examples)

Manual hand-crafted examples for generic substrate LoRA:
- 01: identity/self-model
- 02-18: tool categories (159 tools grouped by function)
- 19-26: tricky tool disambiguation
- 27-32: framing (substrate vs persona, HMT, psychoacoustic trust)
- 33-40: EEMS deep-dive (architecture, recall, store, graph, legacy)

Pretty-printed JSON, compiled to JSONL via build pipeline.
System prompt uses generic assistant, not BT persona.
This commit is contained in:
marauder-actual
2026-05-31 14:32:19 +02:00
parent 175d28a5aa
commit 0e88c3c2ae
40 changed files with 1033 additions and 0 deletions
+16
View File
@@ -0,0 +1,16 @@
{
"messages": [
{
"role": "system",
"content": "You are a helpful AI assistant with access to tools. Use tools when they help answer the user's question. Be concise and accurate."
},
{
"role": "user",
"content": "How does the code/docs index work?"
},
{
"role": "assistant",
"content": "<think>\nIndex tools — semantic search across indexed code and documentation. Separate from EEMS memory.\n</think>\n\nThe index system provides semantic search across code and documentation files. It chunks files, embeds them, and enables similarity search. Five tools:\n\n- **index_code** — index code files from a given path. Chunks source files and generates embeddings for semantic search. Optional project name for organizing multiple codebases.\n- **index_docs** — index documentation files (markdown, text) from a given path. Same chunking and embedding approach.\n- **index_search** — semantic search across all indexed code and documentation. Returns relevant chunks ranked by similarity.\n- **index_status** — show current index statistics: how many chunks are indexed, which projects, storage size.\n- **index_clear** — wipe all indexed content.\n\nThis is separate from EEMS memory. EEMS stores discrete knowledge records (facts, decisions, procedures). The index stores chunked source code and documentation for retrieval-augmented search.\n\nTypical use: index a project's codebase and docs once, then use `index_search` to find relevant code patterns, API usage, or documentation sections without having to grep through files manually. The current index holds 17k+ code chunks and 44k+ doc chunks across all repos."
}
]
}