SD1.5 NPU Conversion

This guide explains how to convert an SD1.5 .safetensors checkpoint into an NPU model that Local Dream can load.

TIP

The app already supports on-device CPU conversion; this guide covers only the NPU path.

Prerequisites

  • OS: Linux or WSL (other platforms not verified)
  • Qualcomm AI Engine Direct SDK 2.28 — use this exact version to avoid compatibility issues. Download v2.28.0.241029.zip.
  • uv — Python environment manager
  • zstd — install via your package manager: sudo apt-get install zstd
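A quick way to confirm the command-line prerequisites are on PATH before you start (the tool names are the standard binaries; zip/unzip are needed later for packaging):

```bash
# Check that the command-line prerequisites are installed.
for tool in uv zstd zip unzip; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found ($(command -v "$tool"))"
    else
        echo "$tool: MISSING"
    fi
done
```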

WARNING

You need at least ~20 GB of RAM to convert a 512×512 model. For higher resolutions, at least 64 GB of combined RAM and swap is required. Confirm your system meets this before starting; a partially completed conversion is slow to recover from.
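To check your headroom before starting, on Linux:

```bash
# Check physical RAM and active swap (in GiB) before starting a conversion.
free -g
swapon --show   # empty output means no swap is configured
```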

If you have a CUDA-capable GPU, you can edit pyproject.toml to use the GPU build of torch for a faster data preparation phase. The quantization step itself runs on CPU.

Conversion Scripts

INFO

Download the script: npuconvertv2.zip.

Environment Setup

bash
cd npuconvert
uv venv -p 3.10.17
source .venv/bin/activate
uv sync

Then set the QNN_SDK_ROOT path inside convert_all.sh and convert_all_unet_only.sh to point at your extracted QNN SDK.
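For example (the exact directory depends on where you unpacked the SDK zip; the path below is illustrative):

```bash
# Illustrative path; adjust to wherever you extracted v2.28.0.241029.zip.
export QNN_SDK_ROOT="$HOME/qairt/2.28.0.241029"
```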

Example: export.sh

The script below converts a checkpoint at the base 512×512 resolution and at two extra resolutions (512×768 and 768×512), packaging output for three SOC tiers.

bash
set -e

clip_skip=2 # 1 or 2
model_path=~/Downloads/AnythingXL_v50.safetensors
model_name=AnythingV5
realistic=false  # set true for realistic-style prompts

# Extra resolutions (width height). Leave empty to skip.
extra_resolutions=(
    "512 768"
    "768 512"
)

# SOC tiers
soc_versions=("8gen2" "8gen1" "min")
# Non-flagship tiers can't run higher resolutions
extra_resolution_soc_versions=("8gen2" "8gen1")

uv venv -p 3.10.17 --clear
source .venv/bin/activate
uv sync

realistic_flag=""
if [ "$realistic" = true ]; then
    realistic_flag="--realistic"
fi

process_extra_resolution() {
    local width=$1
    local height=$2
    local size="${width}x${height}"
    echo "Processing resolution: ${size}"

    python prepare_data.py --model_path "$model_path" --clip_skip "$clip_skip" --height "$height" --width "$width" $realistic_flag
    python gen_quant_data.py
    python export_onnx_unet_only.py --model_path "$model_path" --clip_skip "$clip_skip" --height "$height" --width "$width"

    for soc in "${extra_resolution_soc_versions[@]}"; do
        bash scripts/convert_all_unet_only.sh --min_soc $soc
    done

    mv output output_${size}

    # Patch files relative to the 512×512 base UNet
    for soc in "${extra_resolution_soc_versions[@]}"; do
        zstd --patch-from ./output_512/qnn_models_${soc}/unet.bin \
             output_${size}/qnn_models_${soc}/unet.bin \
             -o ./output_512/qnn_models_${soc}/${size}.patch
    done
}

# ===== Base 512×512 (required) =====
echo "Processing base resolution: 512x512"
# $realistic_flag stays unquoted so it expands to nothing when empty
python prepare_data.py --model_path "$model_path" --clip_skip "$clip_skip" $realistic_flag
python gen_quant_data.py
python export_onnx.py --model_path "$model_path" --clip_skip "$clip_skip"

for soc in "${soc_versions[@]}"; do
    bash scripts/convert_all.sh --min_soc $soc
done

mv output output_512

# ===== Extra resolutions =====
for resolution in "${extra_resolutions[@]}"; do
    read -r width height <<< "$resolution"
    process_extra_resolution "$width" "$height"
done

# ===== Package =====
echo "Packaging output files..."
for soc in "${soc_versions[@]}"; do
    zip -r "${model_name}_qnn2.28_${soc}.zip" "output_512/qnn_models_${soc}"
done

Key Parameters

| Parameter | Notes |
| --- | --- |
| clip_skip | 1 or 2. Match the value the original model was trained with; most anime checkpoints use 2. |
| realistic | true switches to a realistic-image prompt set during quantization data generation. Use it for photo-style models. |
| extra_resolutions | Each extra resolution adds a separate UNet conversion plus a .patch file relative to the 512×512 base. The patch keeps the in-app download small. |
| soc_versions | Tiers to build for. min covers non-flagship Hexagon V68+ chips; 8gen1 covers Snapdragon 8 Gen 1 / 8+ Gen 1; 8gen2 covers Snapdragon 8 Gen 2/3/4/5. |
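The patch mechanism itself can be sanity-checked with a standalone zstd round trip. The file names below are placeholders, not the real UNet binaries: base.bin stands in for the 512×512 unet.bin, new.bin for a higher-resolution one.

```bash
# Make two slightly different files.
head -c 1000000 /dev/zero > base.bin
cp base.bin new.bin
echo "extra data" >> new.bin

# Create a patch of new.bin relative to base.bin (what export.sh does)...
zstd -q -f --patch-from=base.bin new.bin -o new.patch

# ...then reconstruct new.bin from base.bin + patch (what happens on download).
zstd -q -f -d --patch-from=base.bin new.patch -o restored.bin
cmp new.bin restored.bin && echo "patch round-trip OK"
```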

Output

You will get one .zip per SOC tier, e.g.:

AnythingV5_qnn2.28_min.zip
AnythingV5_qnn2.28_8gen1.zip
AnythingV5_qnn2.28_8gen2.zip

Each zip can be imported directly in the app. See Custom Models.

Troubleshooting

  • Conversion is extremely slow. This is normal; expect several hours per resolution per SOC tier on a workstation-class CPU.
  • Out-of-memory mid-conversion. Increase swap. The 768×768 SD1.5 conversion will OOM on a 32 GB machine without swap.
  • App refuses to load the imported zip. Make sure you packaged the right SOC tier for your chip. A _8gen2 zip will not run on an 8 Gen 1 build slot, etc.