6 Commits

Author SHA1 Message Date
marauder-actual 2200120133 fix: lower seqlen to 512 for short calibration examples 2026-06-01 04:27:24 +02:00
Training examples are ~500-1500 tokens, so with seqlen=2048 no sample
is long enough to be cached and AutoRound fails with a 'no data has
been cached' error. Also remove the deprecated format param.
marauder-actual 465e74f49e fix: patch is_mllm_model for Qwen3.6 text-only model 2026-06-01 04:26:15 +02:00
AutoRound misidentifies Qwen3_5ForConditionalGeneration as a VLM
and tries to load a vision processor. Patch to force LLM mode.
marauder-actual 367ed705ab fix: convert chat messages to text for AutoRound calibration 2026-06-01 04:23:28 +02:00
marauder-actual 4edaeeb21b switch quantization from llm-compressor to AutoRound 2026-06-01 04:22:07 +02:00
llm-compressor pins transformers<=4.57.6, so it can't load Qwen3.6.
AutoRound (Intel) works with transformers 5.x and is already
installed as an llmcompressor dependency. It produces vLLM-compatible
INT4 output.
marauder-actual 934be8ce48 fix: load tokenizer from base repo for quant venv compat 2026-06-01 04:16:32 +02:00
The merged model has tokenizer_class=TokenizersBackend (transformers 5.x),
which is unknown to transformers 4.57.6 in the quant venv.
marauder-actual 0fa46c9fed add AWQ quantization script (llm-compressor) 2026-06-01 04:15:15 +02:00
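The seqlen fix in 2200120133 can be illustrated with a small sketch. It assumes, as AutoRound's 'no data has been cached' error suggests, that the calibration loader only caches samples with at least seqlen tokens. The helper usable_samples and the token counts are hypothetical, chosen to fall in the ~500-1500 range the commit describes; this is not AutoRound's actual code.

```python
# Hypothetical illustration, not AutoRound's implementation: assume the
# calibration loader keeps only samples that fill a full seqlen window.
def usable_samples(token_lengths: list[int], seqlen: int) -> list[int]:
    """Return the lengths of samples with at least `seqlen` tokens."""
    return [n for n in token_lengths if n >= seqlen]


# Made-up token counts in the ~500-1500 range from the commit message.
lengths = [520, 810, 1100, 1490]

print(len(usable_samples(lengths, 2048)))  # nothing long enough: no data cached
print(len(usable_samples(lengths, 512)))   # every sample qualifies
```

Under this assumption, lowering seqlen to 512 lets every sample in the range through, at the cost of each calibration window covering fewer tokens.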