# Spark -- DGX Finetuning

Docs, scripts, and recipes for finetuning on the DGX Spark.

## Target Host: sin (sinanju)

| Field | Value |
|---|---|
| Hostname | `sin` / `sinanju` |
| User | `madcat` |
| LAN | `192.168.88.108` |
| WireGuard (mesh) | `10.44.0.2` |
| SSH | `ssh sin` or `ssh madcat` (LAN), `mesh-sin` (WG) |

### Hardware

| Component | Spec |
|---|---|
| Model | NVIDIA DGX Spark Founders Edition |
| Baseboard | P4242, rev A.7 |
| GPU | GB10 (sm_121) |
| Memory | ~128 GiB unified (CPU+GPU shared) |
| NIC | Realtek RTL8127 10GbE |
| WiFi | MediaTek 7925 |
| Architecture | aarch64 |

### Software

| Component | Version |
|---|---|
| DGX OS | 7.5.0 |
| Kernel | 6.17.0-1018-nvidia |
| Driver | 580.159.03 |
| BIOS | 5.36_0ACUM018 |

### Services

- **Ollama** -- localhost:11434 (local model serving)
- **opencode-serve** -- 0.0.0.0:4096 (systemd user service)
- **DGX Dashboard** -- localhost:8787

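
To confirm the services listed above are up, a plain TCP probe is usually enough. The following is a minimal sketch, not a supported tool: the `SERVICES` table and the `is_listening` helper are hypothetical names, the ports are copied from the list, and it assumes the script runs on the host itself, so every service is probed over `127.0.0.1` (including the one bound to `0.0.0.0`).

```python
import socket

# Ports copied from the Services list. Probing 127.0.0.1 is an assumption:
# it only works when this script runs on the Spark itself.
SERVICES = {
    "ollama": ("127.0.0.1", 11434),
    "opencode-serve": ("127.0.0.1", 4096),
    "dgx-dashboard": ("127.0.0.1", 8787),
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        state = "up" if is_listening(host, port) else "down"
        print(f"{name:15s} {host}:{port} {state}")
```

A successful connect only shows that something is listening on the port; it does not verify that the service behind it is healthy.
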
### Key Paths

```
/etc/dgx-release                              # version manifest
/opt/nvidia/nvfwupd/bin/nvfwupd               # firmware update tool
/opt/nvidia/bin/spark-ota-check               # OTA check CLI
/home/madcat/.config/opencode/opencode.jsonc  # opencode config
```

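
The version manifest can be read programmatically when a script needs to record which release it ran on. The sketch below assumes `/etc/dgx-release` uses shell-style `KEY="value"` lines; that format is an assumption, so check the file on the host first. `parse_release` is a hypothetical helper name.

```python
from pathlib import Path


def parse_release(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines (assumed format), skipping blanks and comments."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding double quotes from the value.
        fields[key.strip()] = value.strip().strip('"')
    return fields


if __name__ == "__main__":
    path = Path("/etc/dgx-release")
    if path.exists():
        for key, value in parse_release(path.read_text()).items():
            print(f"{key} = {value}")
    else:
        print(f"{path} not found (probably not a DGX OS host)")
```
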
## Repo Structure

```
spark/
  AGENTS.md    # this file
```

## Notes

- `nvidia-smi` showing N/A is **normal** for Spark -- it's not a discrete GPU
- GB10 uses unified memory; there is no separate VRAM
- BIOS `_0ACUM023` exists but is not in the OTA pipeline (request it via NVIDIA support)
- Known BIOS issue: early firmware can hang during long-context (~200k tokens) vLLM prefill

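
Because there is no separate VRAM and `nvidia-smi` may report N/A, one rough way to see how much memory a finetuning job has to work with is to read the system pool from `/proc/meminfo`. This is a sketch under the assumption that `MemAvailable` is a reasonable proxy for what the GPU can still allocate on a unified-memory machine; `meminfo_gib` is a hypothetical helper name.

```python
def meminfo_gib(text: str) -> dict[str, float]:
    """Parse /proc/meminfo text into {field: GiB} for kB-valued fields."""
    out = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Only lines shaped like "Name:  <int> kB" carry a size.
        if len(parts) == 2 and parts[1] == "kB":
            out[name] = int(parts[0]) / (1024 ** 2)
    return out


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as fh:
            info = meminfo_gib(fh.read())
    except OSError:
        info = {}
    for field in ("MemTotal", "MemAvailable"):
        if field in info:
            print(f"{field}: {info[field]:.1f} GiB")
        else:
            print(f"{field}: unavailable")
```

Treat the number as an upper bound: the CPU side, the page cache, and services such as Ollama draw from the same pool.
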