SDXL NPU Conversion (Experimental)

This guide converts a Stable Diffusion XL .safetensors checkpoint into an NPU model for evaluation on supported Snapdragon devices.

WARNING

SDXL conversion is experimental and substantially heavier than SD1.5 conversion. High-resolution SDXL workflows can require 100 GB+ disk space and 128 GB of RAM + swap. Read the prerequisites carefully before starting.

Prerequisites

  • OS: Linux or WSL
  • Qualcomm AI Engine Direct SDK 2.28 — please use v2.28 to avoid potential issues. Download v2.28.0.241029.zip.
  • uv — Python environment manager
  • RAM + swap: ~128 GB for high-resolution conversion
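
The memory and disk figures above can be checked before starting a long run. The following is a minimal sketch, assuming a Linux or WSL host where /proc/meminfo is available. The function names are hypothetical and not part of convertsdxl, and the thresholds are simply the high-resolution figures quoted in this guide. It only prints warnings and does not stop anything.

```shell
# Hypothetical preflight helpers; not part of the convertsdxl scripts.
required_mem_gb=128    # RAM + swap figure quoted in this guide
required_disk_gb=100   # disk figure quoted in this guide

# Print MemTotal + SwapTotal in whole GiB. /proc/meminfo reports values in kB.
mem_plus_swap_gb() {
    awk '/^(MemTotal|SwapTotal):/ { kb += $2 } END { printf "%d\n", kb / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Print free space, in whole GiB, on the filesystem that holds directory $1.
free_disk_gb() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

if [ -r /proc/meminfo ]; then
    mem_gb=$(mem_plus_swap_gb)
    if [ "$mem_gb" -lt "$required_mem_gb" ]; then
        echo "warning: RAM + swap is ${mem_gb} GB; this guide suggests ~${required_mem_gb} GB" >&2
    fi
fi

disk_gb=$(free_disk_gb .)
if [ "$disk_gb" -lt "$required_disk_gb" ]; then
    echo "warning: ${disk_gb} GB free here; this guide suggests ${required_disk_gb} GB+" >&2
fi
```

Run it from the directory where the conversion output will be written, since the disk check measures the filesystem of the current directory.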

If you have a CUDA-capable GPU, you can switch the torch dependency in pyproject.toml to the GPU build to speed up data preparation.

Conversion Scripts

INFO

Download and extract the conversion scripts: convertsdxl.zip.

Environment Setup

bash
cd convertsdxl
uv venv -p 3.10.17
source .venv/bin/activate
uv sync

Set the QNN_SDK_ROOT path inside scripts/convert_all_sdxl.sh to the directory where you extracted the QNN SDK.
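
A wrong SDK path tends to surface only after the long data-preparation steps, so it can help to check it first. This is a small sketch under stated assumptions: check_sdk_root is a hypothetical helper, not something shipped in convertsdxl, and the scripts/ location is inferred from how the example script in this guide invokes convert_all_sdxl.sh.

```shell
# Hypothetical helper; not part of the convertsdxl scripts.
# Succeeds only if $1 is non-empty and names an existing directory.
check_sdk_root() {
    if [ -z "$1" ]; then
        echo "QNN_SDK_ROOT is empty" >&2
        return 1
    fi
    if [ ! -d "$1" ]; then
        echo "QNN_SDK_ROOT does not point to a directory: $1" >&2
        return 1
    fi
    return 0
}

# Show the lines that mention QNN_SDK_ROOT, if the script is present,
# so you can see where the path needs to be edited.
if [ -f scripts/convert_all_sdxl.sh ]; then
    grep -n 'QNN_SDK_ROOT' scripts/convert_all_sdxl.sh || true
fi
```

After editing the script, call check_sdk_root with the same path you entered to confirm that the directory exists.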

Example: export_sdxl.sh

bash
set -e

model_path=~/Downloads/anythingxl.safetensors
model_name=anythingxl
realistic=false                 # set true for realistic-style prompts
scheduler=dpm                   # dpm | lcm | eulera
cfg=5,7                         # CFG range: a value in 5–7 is sampled per image
steps=15,30                     # step range: a value in 15–30 is sampled per image

# Currently only 8gen3 is built. The 8gen3 build runs on 8 Gen 3, 8 Elite, and 8 Elite Gen 5 / 8 Gen 5.
soc_versions=("8gen3")

uv venv -p 3.10.17 --clear
source .venv/bin/activate
uv sync

realistic_flag=""
if [ "$realistic" = true ]; then
    realistic_flag="--realistic"
fi

# ===== 1024×1024 =====
echo "Processing base resolution: 1024x1024"
# $realistic_flag is left unquoted on purpose so that an empty value adds no argument.
python prepare_data_sdxl.py --model_path "$model_path" $realistic_flag --scheduler "$scheduler" --cfg "$cfg" --step "$steps"
python gen_quant_data_sdxl.py
python export_onnx_sdxl.py --model_path "$model_path"

for soc in "${soc_versions[@]}"; do
    bash scripts/convert_all_sdxl.sh --min_soc "$soc"
done

# ===== Package =====
echo "Packaging output files..."
for soc in "${soc_versions[@]}"; do
    touch "output/qnn_models_sdxl_${soc}/SDXL"    # marker file: tells the app this is an SDXL model
    zip -r "${model_name}_qnn2.28_${soc}.zip" "output/qnn_models_sdxl_${soc}"
done

Key Parameters

Parameter      Notes
scheduler      dpm, lcm, or eulera. Pick one that matches the model's recommended sampler.
cfg            A range used to sample CFG values during quantization data generation. Wider ranges produce more general models but increase quantization noise.
steps          Same idea as cfg: a range that determines how varied the calibration data is.
soc_versions   Currently only 8gen3 is supported. The same binary runs on 8 Gen 3, 8 Elite, and 8 Elite Gen 5 / 8 Gen 5.
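
To make the "range" idea concrete, here is a rough bash sketch of drawing one value per image from a "min,max" string such as steps=15,30. It is only an illustration: sample_int_range is a hypothetical helper, it handles integers only, and this guide does not show how prepare_data_sdxl.py actually samples its values.

```shell
# Hypothetical helper (bash): pick a random integer from a "min,max" string.
# Illustration only; not taken from prepare_data_sdxl.py.
sample_int_range() {
    local lo=${1%,*}   # text before the comma
    local hi=${1#*,}   # text after the comma
    echo $(( lo + RANDOM % (hi - lo + 1) ))
}

steps=15,30
for i in 1 2 3 4 5; do
    echo "image $i: steps=$(sample_int_range "$steps")"
done
```

A wider range spreads the calibration images over more distinct settings, which is the trade-off the cfg and steps notes describe.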

Output

You'll get one zip per SoC target listed in soc_versions:

anythingxl_qnn2.28_8gen3.zip

Import it in the app the same way as SD1.5 NPU models. See Custom Models.

Caveats

  • Expect generation quality slightly below the source SDXL checkpoint — quantization noise is more visible at SDXL scale than at SD1.5.
  • The SDXL marker file inside the output is required so the app loads it as an SDXL model rather than SD1.5.
  • Conversion takes considerably longer than SD1.5 — plan for an overnight run on a workstation.
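
Because the marker file is easy to lose when repackaging by hand, a quick check before zipping can help. The sketch below assumes the output layout used by the example script in this guide (output/qnn_models_sdxl_<soc>/SDXL). has_sdxl_marker is a hypothetical helper, not part of convertsdxl.

```shell
# Hypothetical helper; not part of the convertsdxl scripts.
# Succeeds if the model directory $1 contains the SDXL marker file.
has_sdxl_marker() {
    [ -f "$1/SDXL" ]
}

for soc in 8gen3; do
    dir="output/qnn_models_sdxl_${soc}"
    if has_sdxl_marker "$dir"; then
        echo "ok: SDXL marker present in $dir"
    else
        echo "missing: $dir/SDXL (needed for the app to load the model as SDXL)" >&2
    fi
done
```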