add v5 substrate training data (40 examples)

Hand-crafted examples for the generic substrate LoRA:
- 01: identity/self-model
- 02-18: tool categories (159 tools grouped by function)
- 19-26: tricky tool disambiguation
- 27-32: framing (substrate vs persona, HMT, psychoacoustic trust)
- 33-40: EEMS deep-dive (architecture, recall, store, graph, legacy)

Pretty-printed JSON, compiled to JSONL via the build pipeline.
System prompt describes a generic assistant, not the BT persona.
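A minimal sketch of the JSON-to-JSONL step. It assumes one example per `*.json` file in a single source directory, each holding a top-level `messages` list. The function name `compile_to_jsonl`, the directory layout, and the role check are hypothetical illustrations, not the actual build pipeline.

```python
import json
from pathlib import Path


def compile_to_jsonl(src_dir: Path, out_path: Path) -> int:
    """Write every pretty-printed *.json example in src_dir as one JSONL line.

    Hypothetical helper; returns the number of examples written.
    """
    count = 0
    with out_path.open("w", encoding="utf-8") as out:
        # Sorted filenames keep numbered examples (01.json, 02.json, ...) in order.
        for path in sorted(src_dir.glob("*.json")):
            example = json.loads(path.read_text(encoding="utf-8"))
            roles = [message["role"] for message in example["messages"]]
            # Assumed sanity check: the conversation opens with the system
            # prompt and ends on an assistant turn, like the example in this commit.
            if not roles or roles[0] != "system" or roles[-1] != "assistant":
                raise ValueError(f"{path.name}: unexpected role sequence {roles}")
            # No indent + compact separators: json.dumps escapes newlines inside
            # strings, so each example occupies exactly one line.
            out.write(json.dumps(example, ensure_ascii=False, separators=(",", ":")))
            out.write("\n")
            count += 1
    return count
```

Sorting by filename keeps the numbered examples in order, and because `json.dumps` escapes embedded newlines (such as those around the `<think>` block), each example is guaranteed to occupy a single line.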
marauder-actual
2026-05-31 14:32:19 +02:00
parent 175d28a5aa
commit 0e88c3c2ae
40 changed files with 1033 additions and 0 deletions
@@ -0,0 +1,16 @@
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful AI assistant with access to tools. Use tools when they help answer the user's question. Be concise and accurate."
    },
    {
      "role": "user",
      "content": "What's the legacy memory store?"
    },
    {
      "role": "assistant",
      "content": "<think>\nLegacy EEMS — the older memory store with ~5,500 entries. Separate from the primary EEMS. Read-only.\n</think>\n\nThe legacy memory store is an older EEMS instance containing ~5,500 memories accumulated over earlier sessions. It is read-only — no writes, no deletes. Four tools:\n\n- **legacy-memory_recall** — semantic search (embeddings + FTS), same hybrid approach as the primary store. Accepts query, optional subject filter, and limit.\n- **legacy-memory_list** — list all subjects with entry counts. Good for discovering what categories of knowledge exist.\n- **legacy-memory_search** — browse memories by subject without a query. Use after list to explore a category.\n- **legacy-memory_core** — show all core (immutable) memories. Identity-grade entries that were permanently locked.\n\nTypical workflow: `legacy-memory_list` to see subjects, then `legacy-memory_recall` with a query, or `legacy-memory_search` to browse a specific subject.\n\nThe legacy store exists alongside the primary EEMS. When searching for information, check both stores — the legacy store often has historical context, session logs, and architectural decisions that predate the current system. Access exclusively via these built-in tools, never via CLI."
    }
  ]
}
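After JSON decoding, the assistant turn in this example opens with a `<think>` block, then a blank line, then the visible answer. A minimal sketch for separating the two parts, assuming every example uses exactly this layout; `split_think` and `THINK_RE` are hypothetical names, not part of the pipeline described in this commit.

```python
import re

# Assumed layout of a decoded assistant turn:
# "<think>\n<reasoning>\n</think>\n\n<answer>"
# DOTALL lets "." span the newlines inside both parts.
THINK_RE = re.compile(r"<think>\n(.*?)\n</think>\n\n(.*)", re.DOTALL)


def split_think(content: str) -> tuple[str, str]:
    """Return (reasoning, answer) from an assistant message's content."""
    match = THINK_RE.fullmatch(content)
    if match is None:
        raise ValueError("content does not start with a <think>...</think> block")
    return match.group(1), match.group(2)
```

Applied to the example above, the reasoning part is the short note about the legacy store and the answer begins with "The legacy memory store is an older EEMS instance".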