tensors/models-db.md

# Models Database Implementation

SQLite database (`models.db`) for local model metadata storage and CivitAI model information cache.

## Dual Purpose

1. **Metadata Storage** — Track local safetensor files with their SHA256 hashes and link them to CivitAI model info
2. **Information Source** — Cache CivitAI API responses for offline queries, search, and model discovery

## Schema Overview

### Local Files Tracking

```sql
-- Your local safetensor files
local_files (
    id INTEGER PRIMARY KEY,
    file_path TEXT NOT NULL UNIQUE,      -- Absolute path to file
    sha256 TEXT NOT NULL,                 -- Full SHA256 hash
    header_size INTEGER,                  -- Safetensor header size in bytes
    tensor_count INTEGER,                 -- Number of tensors in file
    civitai_model_id INTEGER,             -- Links to models.civitai_id
    civitai_version_id INTEGER,           -- Links to model_versions.civitai_id
    created_at TEXT,
    updated_at TEXT
)

-- Key-value metadata extracted from safetensor headers
safetensor_metadata (
    id INTEGER PRIMARY KEY,
    local_file_id INTEGER NOT NULL,       -- FK to local_files.id
    key TEXT NOT NULL,
    value TEXT,
    UNIQUE(local_file_id, key)
)
```

### CivitAI Model Cache

```sql
-- Model creators
creators (id, username, image_url)

-- Models from CivitAI
models (
    id INTEGER PRIMARY KEY,
    civitai_id INTEGER UNIQUE NOT NULL,   -- CivitAI model ID
    name TEXT NOT NULL,
    description TEXT,                      -- HTML description
    type TEXT NOT NULL,                    -- Checkpoint, LORA, etc.
    nsfw INTEGER,
    creator_id INTEGER,                    -- FK to creators
    download_count INTEGER,
    thumbs_up_count INTEGER,
    ...
)

-- Model versions (each model has multiple versions)
model_versions (
    id INTEGER PRIMARY KEY,
    civitai_id INTEGER UNIQUE NOT NULL,   -- CivitAI version ID
    model_id INTEGER NOT NULL,            -- FK to models.id
    name TEXT NOT NULL,
    base_model TEXT,                       -- "SD 1.5", "SDXL 1.0", "Pony", etc.
    download_url TEXT,
    version_index INTEGER,                 -- 0 = latest
    ...
)

-- Trigger words for LoRAs
trained_words (version_id, word, position)

-- Downloadable files for each version
version_files (
    civitai_id INTEGER UNIQUE,
    version_id INTEGER,                    -- FK to model_versions
    name TEXT,
    size_kb REAL,
    format TEXT,                           -- safetensors, ckpt, etc.
    fp TEXT,                               -- fp16, fp32, bf16
    is_primary INTEGER,
    download_url TEXT
)

-- File hashes (SHA256, AutoV1, AutoV2, etc.)
file_hashes (file_id, hash_type, hash_value)
```

### Tags and Images

```sql
tags (id, name)
model_tags (model_id, tag_id)

-- Preview images with generation params
version_images (version_id, url, width, height, nsfw_level, ...)
image_generation_params (image_id, key, value)  -- prompt, sampler, cfg, etc.
image_resources (image_id, name, type, hash, weight)  -- LoRAs used in image
```

## Views

```sql
-- Models with their latest version info
v_models_with_latest:
    id, civitai_id, name, type, nsfw, creator, latest_version, base_model, download_count, thumbs_up_count

-- Local files with linked CivitAI info
v_local_files_full:
    file_path, sha256, model_name, model_type, version_name, base_model, creator
```

## Implementation Strategy

### 1. Scan Command (`tsr scan`)

Scan local model directories and populate `local_files`:

```python
def scan_models(directory: Path, db: Connection) -> None:
    """Scan directory for safetensor files and add to database."""
    for path in directory.rglob("*.safetensors"):
        sha256 = compute_sha256(path)
        header = read_safetensor_header(path)

        # Insert or update local_files
        db.execute("""
            INSERT INTO local_files (file_path, sha256, header_size, tensor_count)
            VALUES (?, ?, ?, ?)
            ON CONFLICT(file_path) DO UPDATE SET
                sha256 = excluded.sha256,
                updated_at = datetime('now')
        """, (str(path), sha256, header['size'], header['tensor_count']))

        # Store metadata
        for key, value in header['metadata'].items():
            db.execute("""
                INSERT INTO safetensor_metadata (local_file_id, key, value)
                VALUES (?, ?, ?)
                ON CONFLICT DO UPDATE SET value = excluded.value
            """, (file_id, key, json.dumps(value)))
```

### 2. Link Command (`tsr link`)

Match local files to CivitAI by hash lookup:

```python
def link_to_civitai(db: Connection, api_key: str | None) -> None:
    """Link local files to CivitAI models using hash matching."""
    unlinked = db.execute("""
        SELECT id, sha256 FROM local_files
        WHERE civitai_model_id IS NULL
    """).fetchall()

    for file_id, sha256 in unlinked:
        # Check local hash cache first
        version = db.execute("""
            SELECT mv.civitai_id, mv.model_id
            FROM file_hashes fh
            JOIN version_files vf ON fh.file_id = vf.id
            JOIN model_versions mv ON vf.version_id = mv.id
            WHERE fh.hash_value = ? AND fh.hash_type = 'SHA256'
        """, (sha256,)).fetchone()

        if not version:
            # Fall back to API lookup
            data = fetch_civitai_by_hash(sha256, api_key)
            if data:
                store_model_version(db, data)
                version = (data['id'], data['modelId'])

        if version:
            db.execute("""
                UPDATE local_files
                SET civitai_version_id = ?, civitai_model_id = ?
                WHERE id = ?
            """, (version[0], version[1], file_id))
```

### 3. Cache Command (`tsr cache`)

Fetch and store full model details from CivitAI:

```python
def cache_model(model_id: int, db: Connection, api_key: str | None) -> None:
    """Fetch and cache complete model data from CivitAI."""
    data = fetch_civitai_model(model_id, api_key)
    if not data:
        return

    # Upsert creator
    creator = data.get('creator', {})
    if creator:
        db.execute("""
            INSERT INTO creators (username, image_url) VALUES (?, ?)
            ON CONFLICT(username) DO UPDATE SET image_url = excluded.image_url
        """, (creator['username'], creator.get('image')))

    # Upsert model
    db.execute("""
        INSERT INTO models (civitai_id, name, description, type, nsfw, ...)
        VALUES (?, ?, ?, ?, ?, ...)
        ON CONFLICT(civitai_id) DO UPDATE SET ...
    """, ...)

    # Process versions, files, hashes, images, trained words
    for idx, version in enumerate(data.get('modelVersions', [])):
        store_version(db, model_id, version, version_index=idx)
```

### 4. Query Commands

**List local models with CivitAI info:**
```python
def list_local_models(db: Connection) -> list[dict]:
    """List all local files with their linked CivitAI metadata."""
    return db.execute("""
        SELECT * FROM v_local_files_full ORDER BY model_name
    """).fetchall()
```

**Search cached models:**
```python
def search_cached(query: str, model_type: str | None, db: Connection) -> list[dict]:
    """Search cached models without hitting the API."""
    sql = """
        SELECT m.*, mv.base_model, mv.download_url
        FROM models m
        JOIN model_versions mv ON mv.model_id = m.id AND mv.version_index = 0
        WHERE m.name LIKE ?
    """
    params = [f'%{query}%']

    if model_type:
        sql += " AND m.type = ?"
        params.append(model_type)

    return db.execute(sql, params).fetchall()
```

**Find trigger words for a local LoRA:**
```python
def get_trigger_words(file_path: str, db: Connection) -> list[str]:
    """Get trigger words for a local LoRA file."""
    return db.execute("""
        SELECT tw.word
        FROM trained_words tw
        JOIN model_versions mv ON tw.version_id = mv.id
        JOIN local_files lf ON lf.civitai_version_id = mv.civitai_id
        WHERE lf.file_path = ?
        ORDER BY tw.position
    """, (file_path,)).fetchall()
```

## Database Location

Following XDG conventions, the database should live at:

```python
from tensors.config import DATA_DIR

DB_PATH = DATA_DIR / "models.db"  # ~/.local/share/tensors/models.db
```

## CLI Integration

```bash
# Scan models directory
tsr db scan /models/

# Link local files to CivitAI (uses API for unknown hashes)
tsr db link

# Cache a specific model's full data
tsr db cache 999258

# List local models with CivitAI info
tsr db list

# Search cached models (offline)
tsr db search "bimbo" --type lora

# Show trigger words for a LoRA
tsr db triggers /models/loras/70s_VPMS.safetensors

# Show generation params from example images
tsr db prompts 999258
```

## Benefits

1. **Offline First** — Query cached data without API calls
2. **Hash Deduplication** — Detect duplicate files by SHA256
3. **Metadata Enrichment** — Combine safetensor header info with CivitAI metadata
4. **Trigger Word Lookup** — Find correct prompts for LoRAs
5. **Example Prompts** — Extract working prompts from preview images
6. **Version Tracking** — Know which version you have vs. latest available