A Comprehensive Manual Installation Guide for Wan2GP: Windows & Linux

Hello everyone! As a technical enthusiast exploring the latest developments in AI video generation, I recently spent some time working with Wan2GP (a Wan 2.1 wrapper). Given the wide variety of hardware configurations out there, from the trusty GTX 10 series to the cutting-edge RTX 50 series, getting the environment set up correctly can be surprisingly hardware-specific.

In this post, I would like to share a detailed, step-by-step manual installation guide. Whether you are on Windows or Linux, this guide aims to help you get up and running smoothly.

🛠️ Prerequisites

Before we dive into the installation, there are a few essential tools and drivers we need to ensure are present on your system. Having these ready will prevent common errors later on.

System Requirements

  • GPU: A compatible NVIDIA GPU ranging from the GTX 10XX series up to the RTX 50XX series.
  • OS: Windows 10/11 or Linux.

Essential Software

Please ensure you have the following installed. I have included links to the specific versions recommended for the best compatibility:

  1. Git: Required for cloning the repository. Download Git here.
  2. Visual Studio Build Tools: Essential for compiling C++ extensions (needed for CUDA). Please install Build Tools for Visual Studio 2022 and ensure the "Desktop development with C++" workload is selected. Download VS2022 Build Tools.
  3. CUDA Toolkit: You will need version 12.8 or higher for the best support, especially for newer cards. Download CUDA Toolkit.
  4. NVIDIA Drivers: Please keep your drivers up to date to ensure compatibility with the CUDA Toolkit. Update Drivers.
  5. FFMPEG: Crucial for video processing. After downloading and unzipping, please remember to add the bin folder to your system's PATH environment variable. Download FFMPEG.
  6. Python: Version 3.10.9 is the recommended baseline. Download Python 3.10.9.
  7. Environment Manager: I highly recommend using Miniconda to manage your environments, though a standard Python venv works as well. Download Miniconda.
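
For the FFMPEG step above, the key is that the bin folder actually ends up on PATH. Here is a minimal sketch for Linux; the /opt/ffmpeg path is a placeholder, so substitute wherever you unzipped the archive (on Windows, use the Environment Variables dialog or setx instead):

```shell
# Append the ffmpeg bin folder to PATH for the current shell session.
# /opt/ffmpeg is a placeholder -- use your actual unzip location.
export PATH="$PATH:/opt/ffmpeg/bin"

# Confirm the entry is now on PATH
echo "$PATH" | grep -q "/opt/ffmpeg/bin" && echo "PATH updated"
```

To make the change persistent, add the export line to your ~/.bashrc (or your shell's equivalent). Once ffmpeg is actually installed at that location, `ffmpeg -version` is the quickest way to confirm everything is wired up.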

🚀 Step 1: Repository Setup & Environment Creation

Regardless of your operating system or GPU, the first step is to get the code and create a clean sandbox for our dependencies.

  1. Clone the Repository: Open your terminal (or Command Prompt) in the folder where you want the project to live and run the following. Note that git creates the Wan2GP folder for you, so there is no need to make one first:

    git clone https://github.com/deepbeepmeep/Wan2GP.git
    cd Wan2GP
    
  2. Create the Conda Environment: We will create an environment named wan2gp running Python 3.10.9.

    conda create -n wan2gp python=3.10.9
    
  3. Activate the Environment:

    conda activate wan2gp
    

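A quick sanity check that the activation worked: conda exports the active environment's name in the CONDA_DEFAULT_ENV variable (if you chose a plain venv instead, this variable stays unset and the fallback text prints):

```shell
# Prints the active conda environment name, or "none" if no env is active.
# After `conda activate wan2gp`, this should print: Active env: wan2gp
echo "Active env: ${CONDA_DEFAULT_ENV:-none}"
```
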
🖥️ Step 2: Choose Your Installation Path (Windows)

To ensure stability, the installation steps vary slightly depending on your GPU architecture. Please locate your GPU generation below and follow the specific commands.

Option A: Windows for GTX 10XX - 16XX

Target: PyTorch 2.6.0 | CUDA 12.6

For older architectures, we stick to a very stable PyTorch release.

  1. Install PyTorch:
    pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 torchaudio==2.6.0+cu126 --index-url https://download.pytorch.org/whl/cu126
    
  2. Install Requirements:
    pip install -r requirements.txt
    

Option B: Windows for RTX 20XX / Quadro

Target: PyTorch 2.6.0 | CUDA 12.6 | SageAttention 1.0.6

  1. Install PyTorch:
    pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 torchaudio==2.6.0+cu126 --index-url https://download.pytorch.org/whl/cu126
    
  2. Install Triton:
    pip install -U "triton-windows<3.3"
    
  3. Install SageAttention (v1):
    pip install sageattention==1.0.6
    
  4. Install Requirements:
    pip install -r requirements.txt
    

Option C: Windows for RTX 30XX

Target: PyTorch 2.6.0 | CUDA 12.6 | SageAttention 2.1.1

  1. Install PyTorch:
    pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 torchaudio==2.6.0+cu126 --index-url https://download.pytorch.org/whl/cu126
    
  2. Install Triton:
    pip install -U "triton-windows<3.3"
    
  3. Install SageAttention (v2.1.1):
    pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.1.1-windows/sageattention-2.1.1+cu126torch2.6.0-cp310-cp310-win_amd64.whl
    
  4. Install Requirements:
    pip install -r requirements.txt
    

Option D: Windows for RTX 40XX & 50XX (Standard)

Target: PyTorch 2.7.1 | CUDA 12.8 | SageAttention 2.2.0

For modern cards, we upgrade to PyTorch 2.7.1 to leverage CUDA 12.8 features.

  1. Install PyTorch:
    pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
    
  2. Install Triton:
    pip install -U "triton-windows<3.4"
    
  3. Install SageAttention (v2.2.0):
    pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.2.0-windows/sageattention-2.2.0+cu128torch2.7.1-cp310-cp310-win_amd64.whl
    
  4. Install Requirements:
    pip install -r requirements.txt
    

Option E: Windows for RTX 50XX (NV FP4 Optimized)

Target: PyTorch 2.9.1 | CUDA 13.0

Note: This is an experimental setup specifically for using NV FP4 optimized kernels on RTX 50-series cards. PyTorch 2.9.1 is bleeding-edge; generally, stick to Option D unless you specifically need these kernels.

  1. Install PyTorch:
    pip install torch==2.9.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
    
  2. Install Triton:
    pip install -U "triton-windows<3.4"
    
  3. Install SageAttention:
    pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.2.0-windows.post4/sageattention-2.2.0+cu130torch2.9.0andhigher.post4-cp39-abi3-win_amd64.whl
    
  4. Install Requirements:
    pip install -r requirements.txt
    

🐧 Step 3: Choose Your Installation Path (Linux)

For our Linux users, the process is very similar, though we often build SageAttention from source or use standard pip packages rather than Windows-specific wheels.

Option A: Linux for GTX 10XX - 16XX

pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 torchaudio==2.6.0+cu126 --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt

Option B: Linux for RTX 20XX / Quadro

pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 torchaudio==2.6.0+cu126 --index-url https://download.pytorch.org/whl/cu126
pip install -U "triton<3.3"
pip install sageattention==1.0.6
pip install -r requirements.txt

Option C: Linux for RTX 30XX

We compile SageAttention from source to ensure compatibility.

pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 torchaudio==2.6.0+cu126 --index-url https://download.pytorch.org/whl/cu126
pip install -U "triton<3.3"

# Reinstall setuptools to avoid build issues, then build SageAttention
python -m pip install "setuptools<=75.8.2" --force-reinstall
git clone https://github.com/thu-ml/SageAttention
cd SageAttention 
pip install -e .
cd ..

pip install -r requirements.txt

Option D: Linux for RTX 40XX & 50XX

pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
pip install -U "triton<3.4"

# Build SageAttention
python -m pip install "setuptools<=75.8.2" --force-reinstall
git clone https://github.com/thu-ml/SageAttention
cd SageAttention 
pip install -e .
cd ..

pip install -r requirements.txt

⚡ Performance Optimization & Configuration

Once installed, there are several ways to tune Wan2GP for your specific hardware.

Attention Modes

The choice of attention mechanism significantly impacts inference speed.

  • SDPA (default): Standard PyTorch attention. Reliable and compatible with everything.
  • Sage: Offers a ~30% speed boost with a negligible cost to quality.
  • Sage2: Offers a ~40% speed boost.
  • Flash Attention: Excellent performance, though installation on Windows can be complex.

Compatibility Cheat Sheet:

  • GTX 10XX: SDPA only.
  • RTX 20XX: SDPA, Sage1.
  • RTX 30XX/40XX: SDPA, Flash, Sage1, Sage2/Sage2++.
  • RTX 50XX: All of the above plus Sage3.
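
If you are unsure which row of the cheat sheet applies to you, you can ask the driver directly. This sketch assumes nvidia-smi is on PATH and that your driver is recent enough to support the compute_cap query field:

```shell
# Print GPU name and compute capability to match against the cheat sheet.
# The compute_cap query field requires a reasonably recent NVIDIA driver.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
else
  echo "nvidia-smi not found - is the NVIDIA driver installed?"
fi
```

As a rough guide, compute capability 7.5 corresponds to RTX 20XX/16XX, 8.6 to RTX 30XX, 8.9 to RTX 40XX, and 12.0 to RTX 50XX.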

Performance Profiles (RAM/VRAM Usage)

You can select profiles to manage how the model is loaded:

  • Profile 3 (LowRAM_HighVRAM): Loads the entire model into VRAM. Best for speed, but requires substantial VRAM (e.g., 24GB for an 8-bit 14B model).
  • Profile 4 (LowRAM_LowVRAM): The default setting. Loads model parts dynamically. It is slower but allows running larger models on GPUs with less VRAM.

Optional: Flash Attention

If you wish to use Flash Attention:

  • Windows:
    pip install https://github.com/Redtash1/Flash_Attention_2_Windows/releases/download/v2.7.0-v2.7.4/flash_attn-2.7.4.post1+cu128torch2.7.0cxx11abiFALSE-cp310-cp310-win_amd64.whl
    
  • Linux:
    pip install flash-attn==2.7.2.post1
    

🧪 Advanced: Optimized INT4 / FP4 Kernels (RTX 50XX Only)

For users with RTX 50-series (SM120+) GPUs, there are specialized kernels available for INT4/FP4 dequantization. These are highly experimental and hardware-dependent.

Light2xv NVP4 Kernels

Requires Python 3.10, PyTorch 2.9.1, and CUDA 13.

Nunchaku INT4/FP4 Kernels

Available for both PyTorch 2.7.1 and 2.9.1.

  • Windows (PT 2.7.1): pip install .../nunchaku-1.2.0+torch2.7-cp310-cp310-win_amd64.whl
  • Linux (PT 2.7.1): pip install .../nunchaku-1.2.0+torch2.7-cp310-cp310-linux_x86_64.whl

(Please refer to the original repository for the full list of Nunchaku download links).


❓ Troubleshooting

If you encounter issues, here are a few quick tips:

  1. Sage Attention Errors:

    • Ensure Triton is installed correctly.
    • Try clearing the Triton cache.
    • If all else fails, force the standard attention mode:
      python wgp.py --attention sdpa
      
  2. Out of Memory (OOM):

    • Try lowering the generation resolution or video length.
    • Ensure quantization is enabled (default).
    • Switch to Profile 4 to prioritize VRAM savings.
    • Consider using the 1.3B parameter model instead of the larger 14B model.
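
For the Triton cache tip above: the cache normally lives under ~/.triton (or %USERPROFILE%\.triton on Windows), and deleting it is safe because kernels are simply recompiled on the next run. A sketch, assuming the default location and that TRITON_CACHE_DIR is not overriding it:

```shell
# Remove the Triton kernel cache; it is rebuilt automatically on the next launch.
rm -rf ~/.triton/cache
```
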

I hope this guide helps you get Wan2GP running on your machine! It is a powerful tool, and with the right setup, you can achieve impressive performance across a wide range of hardware.

Happy generating!

Read More

Original link: Manual Installation Guide For Windows & Linux