wan-gguf

Author: calcuis
Downloads: 27,983
Likes: 126
License: Apache 2.0
Created: Feb 26, 2025
Last Modified: Jul 8, 2025

gguf quantized version of wan video

  • drag gguf to > ./ComfyUI/models/diffusion_models
  • drag t5xxl-um to > ./ComfyUI/models/text_encoders
  • drag vae to > ./ComfyUI/models/vae (or fetch all three with the script below)
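
if you prefer scripting the downloads to dragging files, here is a minimal sketch using huggingface_hub; the vae filename below is a placeholder, so check the repo's file list for the actual name

from huggingface_hub import hf_hub_download

# diffusion model -> ./ComfyUI/models/diffusion_models
hf_hub_download(
    repo_id="calcuis/wan-gguf",
    filename="wan2.1-v5-vace-1.3b-q4_0.gguf",
    local_dir="ComfyUI/models/diffusion_models",
)

# text encoder -> ./ComfyUI/models/text_encoders
hf_hub_download(
    repo_id="chatpig/umt5xxl-encoder-gguf",
    filename="umt5xxl-encoder-q4_0.gguf",
    local_dir="ComfyUI/models/text_encoders",
)

# vae -> ./ComfyUI/models/vae ("wan-vae.gguf" is a placeholder name)
hf_hub_download(
    repo_id="calcuis/wan-gguf",
    filename="wan-vae.gguf",
    local_dir="ComfyUI/models/vae",
)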

workflow

  • for the i2v model, drag clip-vision-h to > ./ComfyUI/models/clip_vision
  • run the .bat file in the main directory (assuming you are using the gguf pack below)
  • if you opt for the fp8 scaled umt5xxl encoder (this applies to any fp8 scaled t5, actually), use cpu offload: switch device from default to cpu in the gguf clip loader; it won't affect speed; btw, cpu offload also works fine for both the gguf umt5xxl and the gguf vae
  • drag any demo video (below) into your browser to load the workflow

review

  • pig is a lazy architecture for the gguf node; it applies to all model, encoder, and vae gguf file(s). if you try to run one in the comfyui-gguf node instead, you might need to manually add pig to its IMG_ARCH_LIST (in loader.py), which is easier than editing the gguf file itself; see the sketch after this list. btw, any model architecture compatible with comfyui-gguf, including wan, should also work in the gguf node
  • 1.3b model: the t2v and vace ggufs are working fine; good for older or low-end machines
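
the loader.py edit is a one-liner; a sketch assuming IMG_ARCH_LIST is the set of recognized image-model architectures (the surrounding entries here are illustrative, not the file's actual contents):

# ComfyUI/custom_nodes/ComfyUI-GGUF/loader.py
IMG_ARCH_LIST = {"flux", "sd1", "sdxl", "wan", "pig"}  # append "pig" to the existing set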

run it with diffusers🧨 (alternative 1)

import torch
from transformers import UMT5EncoderModel
from diffusers import (
    AutoencoderKLWan,
    GGUFQuantizationConfig,
    UniPCMultistepScheduler,
    WanVACEPipeline,
    WanVACETransformer3DModel,
)
from diffusers.utils import export_to_video

# load the gguf-quantized vace transformer straight from the hub file
model_path = "https://huggingface.co/calcuis/wan-gguf/blob/main/wan2.1-v5-vace-1.3b-q4_0.gguf"
transformer = WanVACETransformer3DModel.from_single_file(
    model_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

# gguf-quantized umt5xxl text encoder
text_encoder = UMT5EncoderModel.from_pretrained(
    "chatpig/umt5xxl-encoder-gguf",
    gguf_file="umt5xxl-encoder-q4_0.gguf",
    torch_dtype=torch.bfloat16,
)

# keep the vae in float32 for numerical stability
vae = AutoencoderKLWan.from_pretrained(
    "callgg/wan-decoder",
    subfolder="vae",
    torch_dtype=torch.float32,
)

# assemble the pipeline around the quantized components
pipe = WanVACEPipeline.from_pretrained(
    "callgg/wan-decoder",
    transformer=transformer,
    text_encoder=text_encoder,
    vae=vae,
    torch_dtype=torch.bfloat16,
)

# unipc scheduler with flow shift 3.0; cpu offload and vae tiling keep vram usage low
flow_shift = 3.0
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=flow_shift)
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

prompt = "a pig moving quickly in a beautiful winter scenery nature trees sunset tracking camera"
negative_prompt = "blurry ugly bad"

output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=720,
    height=480,
    num_frames=57,
    num_inference_steps=24,
    guidance_scale=2.5,
    conditioning_scale=0.0,  # no vace conditioning video/mask supplied here
    generator=torch.Generator().manual_seed(0),
).frames[0]
export_to_video(output, "output.mp4", fps=16)
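
two design notes on the example above: the vae is loaded in float32 on purpose (AutoencoderKLWan tends to lose quality at reduced precision), and a flow shift around 3.0 is the usual recommendation for 480p output (5.0 for 720p); enable_model_cpu_offload plus vae tiling trades some speed for a much lower vram peak.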

run it with gguf-connector (alternative 2)
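
gguf-connector is available on PyPI (pip install gguf-connector); once it is installed, launch the runner from any terminal: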

ggc v2

update

  • wan2.1-v5-vace-1.3b: except for the block weights, every tensor is kept in f32 (this avoids triggering a time/text embedding key error during inference); you can verify the layout with the snippet below
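
a quick check using the gguf package (pip install gguf); treating "blocks." as the prefix for block weights is an assumption about the wan transformer's key layout:

from gguf import GGUFReader

reader = GGUFReader("wan2.1-v5-vace-1.3b-q4_0.gguf")
for t in reader.tensors:
    if not t.name.startswith("blocks."):
        print(t.name, t.tensor_type.name)  # expected: F32 for every non-block tensor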
