# SDXL NPU Conversion (Experimental)
This guide converts a Stable Diffusion XL .safetensors checkpoint into an NPU model for evaluation on supported Snapdragon devices.
> **WARNING:** SDXL conversion is experimental and substantially heavier than SD1.5 conversion. High-resolution SDXL workflows can require 100 GB+ of disk space and 128 GB of RAM + swap. Read the prerequisites carefully before starting.
## Prerequisites
- OS: Linux or WSL
- Qualcomm AI Engine Direct SDK 2.28: use v2.28 specifically to avoid version-related issues. Download v2.28.0.241029.zip.
- uv — Python environment manager
- RAM + swap: ~128 GB for high-resolution conversion
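Before starting, it can save a failed overnight run to verify that the machine actually meets the disk and memory requirements above. A minimal preflight sketch (the 100 GB / 128 GB thresholds come from the warning above; the check itself is illustrative, not part of the conversion scripts):

```shell
# Preflight check: ~100 GB free disk in the working directory,
# ~128 GB combined RAM + swap for high-resolution conversion.
free_disk_gb=$(df -BG --output=avail . | tail -1 | tr -dc '0-9')
mem_gb=$(free -g | awk '/^Mem:/{m=$2} /^Swap:/{s=$2} END{print m+s}')
echo "free disk: ${free_disk_gb} GB, RAM+swap: ${mem_gb} GB"
[ "$free_disk_gb" -ge 100 ] || echo "WARNING: less than 100 GB free disk" >&2
[ "$mem_gb" -ge 128 ]       || echo "WARNING: less than 128 GB RAM+swap" >&2
```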
If you have a CUDA-capable GPU, you can switch the torch dependency in pyproject.toml to the GPU build to speed up data preparation.
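Instead of editing `pyproject.toml` directly, one way to swap in a GPU build after syncing is to reinstall torch from a CUDA wheel index. The index URL and CUDA version below are examples, not prescriptions; pick the build matching your driver:

```shell
# Example only: replace the CPU torch wheel with a CUDA 12.1 build
# (run after `uv sync`, inside the activated virtual environment)
uv pip install torch --index-url https://download.pytorch.org/whl/cu121
```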
## Conversion Scripts
> **INFO:** Download the script: convertsdxl.zip.
## Environment Setup
```bash
cd convertsdxl
uv venv -p 3.10.17
source .venv/bin/activate
uv sync
```

Set the `QNN_SDK_ROOT` path inside `convert_all_sdxl.sh` to your extracted QNN SDK.
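For reference, the line to edit in `convert_all_sdxl.sh` might look like this (the path is an assumed example; use wherever you actually extracted the SDK):

```shell
# Example only: point QNN_SDK_ROOT at the extracted SDK directory
export QNN_SDK_ROOT="$HOME/qairt/2.28.0.241029"
```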
## Example: export_sdxl.sh
```bash
set -e

model_path=~/Downloads/anythingxl.safetensors
model_name=anythingxl
realistic=false   # set true for realistic-style prompts
scheduler=dpm     # dpm | lcm | eulera
cfg=5,7           # 5–7, random per image
steps=15,30       # 15–30, random per image

# Currently only 8gen3 is built. The 8gen3 build runs on 8 Gen 3, 8 Elite,
# and 8 Elite Gen 5 / 8 Gen 5.
soc_versions=("8gen3")

uv venv -p 3.10.17 --clear
source .venv/bin/activate
uv sync

realistic_flag=""
if [ "$realistic" = true ]; then
  realistic_flag="--realistic"
fi

# ===== 1024×1024 =====
echo "Processing base resolution: 1024x1024"
python prepare_data_sdxl.py --model_path $model_path $realistic_flag --scheduler $scheduler --cfg $cfg --step $steps
python gen_quant_data_sdxl.py
python export_onnx_sdxl.py --model_path $model_path

for soc in "${soc_versions[@]}"; do
  bash scripts/convert_all_sdxl.sh --min_soc $soc
done

# ===== Package =====
echo "Packaging output files..."
for soc in "${soc_versions[@]}"; do
  touch output/qnn_models_sdxl_${soc}/SDXL
  zip -r ${model_name}_qnn2.28_${soc}.zip output/qnn_models_sdxl_${soc}
done
```

## Key Parameters
| Parameter | Notes |
|---|---|
| `scheduler` | `dpm`, `lcm`, or `eulera`. Pick the one that matches the model's recommended sampler. |
| `cfg` | A range used to sample CFG values during quantization data generation. Wider ranges produce more general models but increase quantization noise. |
| `steps` | Same idea as `cfg`: a range that determines how varied the calibration data is. |
| `soc_versions` | Currently only `8gen3` is supported. The same binary runs on 8 Gen 3, 8 Elite, and 8 Elite Gen 5 / 8 Gen 5. |
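The actual per-image sampling happens inside `prepare_data_sdxl.py`, which is not shown here, but the `min,max` range syntax can be illustrated with a small sketch:

```shell
# Illustrative only: draw one cfg value and one step count per image
# from ranges given in the "min,max" form used by cfg=5,7 / steps=15,30.
cfg_range="5,7"
step_range="15,30"
IFS=, read cfg_min cfg_max <<< "$cfg_range"
IFS=, read step_min step_max <<< "$step_range"
# Continuous value for CFG, integer for step count
cfg=$(awk -v a="$cfg_min" -v b="$cfg_max" 'BEGIN{srand(); printf "%.1f", a+rand()*(b-a)}')
steps=$(( step_min + RANDOM % (step_max - step_min + 1) ))
echo "cfg=$cfg steps=$steps"
```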
## Output
You'll get a single zip per SOC tier:
`anythingxl_qnn2.28_8gen3.zip`

Import it in the app the same way as SD1.5 NPU models. See Custom Models.
## Caveats
- Expect generation quality slightly below the source SDXL checkpoint — quantization noise is more visible at SDXL scale than at SD1.5.
- The `SDXL` marker file inside the output is required so the app loads it as an SDXL model rather than SD1.5.
- Conversion takes considerably longer than SD1.5 — plan for an overnight run on a workstation.