1908cf91c4
Adds client-side concurrent queueing to style-sweep.

-P N submits N prompts to ComfyUI's HTTP queue concurrently via ThreadPoolExecutor. The GPU still processes one prompt at a time (ComfyUI's queue is single-worker), but the HTTP submission, websocket polling, image download, and disk-write phases pipeline with the next prompt's submission.

Expected speedup: 5-15% on a typical Flux sweep where per-image GPU time is ~25-30s and overhead is ~3-5s. Real benefit grows with slower networks or larger images.

Design choices:
- Default P=1 preserves the exact existing sequential behavior and log output (no "(submit #N)" suffix in messages).
- P>1 uses ThreadPoolExecutor.as_completed for completion-order reporting; the manifest is re-sorted to source-list order after.
- Skip-existing + dry-run cases are handled synchronously before the executor even starts (no point pipelining no-ops).
- --abort-on-error is incompatible with parallelism (can't reliably stop in-flight workers); we warn and continue.
- Per-task console output WILL interleave under -P>1 because _run_generation prints its own progress; users are pointed at the manifest for clean per-slug timing.

Why not full async multi-GPU-workflow parallelism:
- ComfyUI processes its queue strictly sequentially; we can't actually run two Flux UNets concurrently without a second ComfyUI instance, second port, second model dir, etc.
- Even with two instances on one GPU, the CUDA cores time-slice and you get ~1.1x not 2x.
- Memory math is tighter than it looks even on Spark's 80GB unified pool: two Flux dev instances = 64GB fixed before any activations.
- Maintenance burden is real; speed gain is marginal.

Client-side pipelining gets the practical wins (overhead hiding, cleaner progress feedback for long sweeps) without the complexity or OOM risk.

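For reviewers, the P>1 path described under "Design choices" can be sketched roughly as below. This is a minimal illustration, not the code in this commit: run_generation, _safe, sweep, and the manifest entry shape are hypothetical stand-ins, and slugs are assumed to be unique. It shows completion-order collection via as_completed, per-task failure containment, and the re-sort back to source-list order.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_generation(slug):
    """Hypothetical stand-in for one submit/poll/download/write cycle."""
    time.sleep(random.uniform(0.0, 0.02))  # simulate out-of-order completion
    if slug == "bad":
        raise RuntimeError("simulated failure")
    return {"slug": slug, "status": "ok"}


def _safe(slug):
    """Contain an individual failure so one bad task does not stop the sweep."""
    try:
        return run_generation(slug)
    except Exception as exc:
        return {"slug": slug, "status": "error", "error": str(exc)}


def sweep(slugs, parallel=1):
    """Run every slug and return manifest entries in source-list order."""
    if parallel < 1:
        raise ValueError("parallel must be >= 1")
    if parallel == 1:
        # Sequential path: no executor involved at all.
        return [_safe(slug) for slug in slugs]

    order = {slug: index for index, slug in enumerate(slugs)}
    manifest = []
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = [pool.submit(_safe, slug) for slug in slugs]
        for future in as_completed(futures):
            # Entries arrive in completion order, which is nondeterministic.
            manifest.append(future.result())

    # Restore source-list order so the manifest does not depend on timing.
    manifest.sort(key=lambda entry: order[entry["slug"]])
    return manifest
```

Calling sweep(["a", "bad", "c"], parallel=3) in this sketch returns three entries in the order given, with the middle one marked as an error. Sorting after collection, rather than iterating the futures in submission order, keeps completion-order reporting possible inside the loop.
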
7 new tests covering: invalid P=0, P=1 equivalence with sequential, multi-style execution, source-order manifest preservation under chaotic completion, skip-existing in parallel mode, individual failure containment, and abort-on-error warning. 267 -> 274 tests.