SD1.5 NPU Conversion
This guide walks through converting an SD1.5 .safetensors checkpoint into a Local Dream-compatible NPU model.
TIP
On-device CPU conversion is already supported by the app — this guide is only for the NPU path.
Prerequisites
- OS: Linux or WSL (other platforms not verified)
- Qualcomm AI Engine Direct SDK v2.28: other SDK versions are not verified and may cause issues, so download v2.28.0.241029.zip.
- uv — Python environment manager
- zstd — install via your package manager:
sudo apt-get install zstd
WARNING
You need at least ~20 GB of RAM to convert a 512×512 model; higher resolutions require 64 GB or more of combined RAM and swap. Confirm your system meets this before starting, since partial conversions are slow to recover from.
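As a quick preflight check on Linux/WSL, you can sum RAM and swap from /proc/meminfo before kicking off a conversion (the 20 GB threshold matches the 512×512 requirement above; raise it for extra resolutions):

```shell
#!/usr/bin/env sh
# Sum MemTotal and SwapTotal (reported in kB) and compare against
# the ~20 GB needed for a 512x512 conversion.
required_gb=20
total_kb=$(awk '/^(MemTotal|SwapTotal):/ {sum += $2} END {print sum}' /proc/meminfo)
total_gb=$((total_kb / 1024 / 1024))
echo "RAM + swap: ${total_gb} GB (need >= ${required_gb} GB)"
if [ "$total_gb" -lt "$required_gb" ]; then
    echo "Insufficient memory; add swap before converting." >&2
fi
```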
If you have a CUDA-capable GPU, you can edit pyproject.toml to use the GPU build of torch for a faster data preparation phase. The quantization step itself runs on CPU.
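One way to make that edit is with uv's index pinning. The fragment below is a sketch only; the index name and CUDA version (`cu121`) are assumptions you must match to your GPU driver:

```toml
# Hypothetical pyproject.toml fragment -- adjust the CUDA tag to your setup.
[[tool.uv.index]]
name = "pytorch-cu121"
url = "https://download.pytorch.org/whl/cu121"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-cu121" }
```

Re-run `uv sync` after editing so the GPU wheel is installed.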
Conversion Scripts
INFO
Download the script: npuconvertv2.zip.
Environment Setup
cd npuconvert
uv venv -p 3.10.17
source .venv/bin/activate
uv sync
Then set the QNN_SDK_ROOT path inside convert_all.sh and convert_all_unet_only.sh to point at your extracted QNN SDK.
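In each script this is a single export line; the directory below is only an example for an SDK unpacked under the home directory:

```shell
# Point both convert scripts at the unpacked SDK.
# The path is an assumption -- use wherever you extracted v2.28.0.241029.zip.
export QNN_SDK_ROOT="$HOME/qairt/2.28.0.241029"
```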
Example: export.sh
The script below converts a checkpoint at the base 512×512 resolution and at two extra resolutions (512×768 and 768×512), packaging output for three SOC tiers.
set -e
clip_skip=2 # 1 or 2
model_path=~/Downloads/AnythingXL_v50.safetensors
model_name=AnythingV5
realistic=false # set true for realistic-style prompts
# Extra resolutions (width height). Leave empty to skip.
extra_resolutions=(
"512 768"
"768 512"
)
# SOC tiers
soc_versions=("8gen2" "8gen1" "min")
# Non-flagship tiers can't run higher resolutions
extra_resolution_soc_versions=("8gen2" "8gen1")
uv venv -p 3.10.17 --clear
source .venv/bin/activate
uv sync
realistic_flag=""
if [ "$realistic" = true ]; then
realistic_flag="--realistic"
fi
process_extra_resolution() {
local width=$1
local height=$2
local size="${width}x${height}"
echo "Processing resolution: ${size}"
python prepare_data.py --model_path "$model_path" --clip_skip "$clip_skip" --height "$height" --width "$width" $realistic_flag
python gen_quant_data.py
python export_onnx_unet_only.py --model_path "$model_path" --clip_skip "$clip_skip" --height "$height" --width "$width"
for soc in "${extra_resolution_soc_versions[@]}"; do
bash scripts/convert_all_unet_only.sh --min_soc $soc
done
mv output output_${size}
# Patch files relative to the 512×512 base UNet
for soc in "${extra_resolution_soc_versions[@]}"; do
zstd --patch-from ./output_512/qnn_models_${soc}/unet.bin \
output_${size}/qnn_models_${soc}/unet.bin \
-o ./output_512/qnn_models_${soc}/${size}.patch
done
}
# ===== Base 512×512 (required) =====
echo "Processing base resolution: 512x512"
python prepare_data.py --model_path "$model_path" --clip_skip "$clip_skip" $realistic_flag
python gen_quant_data.py
python export_onnx.py --model_path "$model_path" --clip_skip "$clip_skip"
for soc in "${soc_versions[@]}"; do
bash scripts/convert_all.sh --min_soc $soc
done
mv output output_512
# ===== Extra resolutions =====
for resolution in "${extra_resolutions[@]}"; do
read -r width height <<< "$resolution"
process_extra_resolution "$width" "$height"
done
# ===== Package =====
echo "Packaging output files..."
for soc in "${soc_versions[@]}"; do
zip -r ${model_name}_qnn2.28_${soc}.zip output_512/qnn_models_${soc}
done
Key Parameters
| Parameter | Notes |
|---|---|
| clip_skip | 1 or 2. Match the value the original model was trained with; most anime checkpoints use 2. |
| realistic | true switches to a realistic-image prompt set during quantization data generation. Use it for photo-style models. |
| extra_resolutions | Each extra resolution adds a separate UNet conversion plus a .patch file relative to the 512×512 base. The patch keeps the in-app download small. |
| soc_versions | Tiers to build for. min covers Hexagon V68+ non-flagship chips; 8gen1 covers Snapdragon 8 Gen 1 / 8+ Gen 1; 8gen2 covers 8 Gen 2/3/4/5. |
Output
You will get one .zip per SOC tier, e.g.:
AnythingV5_qnn2.28_min.zip
AnythingV5_qnn2.28_8gen1.zip
AnythingV5_qnn2.28_8gen2.zip
Each zip can be imported directly in the app. See Custom Models.
Troubleshooting
- The conversion process is extremely slow. This is normal; expect several hours per resolution per SOC tier on a workstation-class CPU.
- Out-of-memory mid-conversion. Increase swap. The 768×768 SD1.5 conversion will OOM on a 32 GB machine without swap.
- App refuses to load the imported zip. Make sure you packaged the right SOC tier for your chip. An _8gen2 zip will not run on an 8 Gen 1 device, etc.
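One common way to add swap on Linux (requires root; the 32 GB size is an example sized for the 768×768 warning above, and the swap file path is an assumption):

```shell
# Create and enable a 32 GB swap file.
# This is a one-off; add an /etc/fstab entry to persist across reboots.
sudo fallocate -l 32G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
swapon --show   # confirm the new swap is active
```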