A Comprehensive Manual Installation Guide for Wan2GP: Windows & Linux
Hello everyone! As a technical enthusiast exploring the latest developments in AI video generation, I recently spent some time working with Wan2GP (a Wan 2.1 wrapper). Given the wide variety of hardware configurations out there—from the trusty GTX 10 series to the cutting-edge RTX 50 series—getting the environment set up correctly can be tricky.
In this post, I would like to share a detailed, step-by-step manual installation guide. Whether you are on Windows or Linux, this guide aims to help you get up and running smoothly.
🛠️ Prerequisites
Before we dive into the installation, there are a few essential tools and drivers we need to ensure are present on your system. Having these ready will prevent common errors later on.
System Requirements
- GPU: A compatible NVIDIA GPU ranging from the GTX 10XX series up to the RTX 50XX series.
- OS: Windows 10/11 or Linux.
Essential Software
Please ensure you have the following installed. I have included links to the specific versions recommended for the best compatibility:
- Git: Required for cloning the repository. Download Git here.
- Visual Studio Build Tools: Essential for compiling C++ extensions (needed for CUDA). Please install Build Tools for Visual Studio 2022 and ensure the "Desktop development with C++" workload is selected. Download VS2022 Build Tools.
- CUDA Toolkit: You will need version 12.8 or higher for the best support, especially for newer cards. Download CUDA Toolkit.
- NVIDIA Drivers: Please keep your drivers up to date to ensure compatibility with the CUDA Toolkit. Update Drivers.
- FFMPEG: Crucial for video processing. After downloading and unzipping, please remember to add the `bin` folder to your system's PATH environment variable. Download FFMPEG.
- Python: Version 3.10.9 is the recommended baseline. Download Python 3.10.9.
- Environment Manager: I highly recommend using Miniconda to manage your environments, though a standard Python `venv` works as well. Download Miniconda.
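Before moving on, it can save time to confirm that each tool is actually visible on your PATH. Here is a quick sanity check; these version flags are the standard ones for each tool:

```
git --version       # Git
nvcc --version      # CUDA Toolkit compiler
nvidia-smi          # NVIDIA driver and visible GPU(s)
ffmpeg -version     # FFMPEG (also proves the bin folder is on PATH)
python --version    # Python
```

If any of these commands is not found, revisit the corresponding install before continuing.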
🚀 Step 1: Repository Setup & Environment Creation
Regardless of your operating system or GPU, the first step is to get the code and create a clean sandbox for our dependencies.
1. Clone the Repository: Create a folder named `Wan2GP`. Open your terminal (or Command Prompt) in this folder and run:

   ```
   git clone https://github.com/deepbeepmeep/Wan2GP.git
   ```

2. Create the Conda Environment: We will create an environment named `wan2gp` running Python 3.10.9:

   ```
   conda create -n wan2gp python=3.10.9
   ```

3. Activate the Environment:

   ```
   conda activate wan2gp
   ```
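Once activated, your prompt should show a `(wan2gp)` prefix. If you want to double-check that the right interpreter is in use before installing anything:

```
python --version    # should print Python 3.10.9
conda env list      # the active environment is marked with an asterisk
```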
🖥️ Step 2: Choose Your Installation Path (Windows)
To ensure stability, the installation steps vary slightly depending on your GPU architecture. Please locate your GPU generation below and follow the specific commands.
Option A: Windows for GTX 10XX - 16XX
Target: PyTorch 2.6.0 | CUDA 12.6
For older architectures, we stick to a very stable PyTorch release.
- Install PyTorch:

  ```
  pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 torchaudio==2.6.0+cu126 --index-url https://download.pytorch.org/whl/cu126
  ```

- Install Requirements:

  ```
  pip install -r requirements.txt
  ```
Option B: Windows for RTX 20XX / Quadro
Target: PyTorch 2.6.0 | CUDA 12.6 | SageAttention 1.0.6
- Install PyTorch:

  ```
  pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 torchaudio==2.6.0+cu126 --index-url https://download.pytorch.org/whl/cu126
  ```

- Install Triton:

  ```
  pip install -U "triton-windows<3.3"
  ```

- Install SageAttention (v1):

  ```
  pip install sageattention==1.0.6
  ```

- Install Requirements:

  ```
  pip install -r requirements.txt
  ```
Option C: Windows for RTX 30XX
Target: PyTorch 2.6.0 | CUDA 12.6 | SageAttention 2.1.1
- Install PyTorch:

  ```
  pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 torchaudio==2.6.0+cu126 --index-url https://download.pytorch.org/whl/cu126
  ```

- Install Triton:

  ```
  pip install -U "triton-windows<3.3"
  ```

- Install SageAttention (v2.1.1):

  ```
  pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.1.1-windows/sageattention-2.1.1+cu126torch2.6.0-cp310-cp310-win_amd64.whl
  ```

- Install Requirements:

  ```
  pip install -r requirements.txt
  ```
Option D: Windows for RTX 40XX & 50XX (Standard)
Target: PyTorch 2.7.1 | CUDA 12.8 | SageAttention 2.2.0
For modern cards, we upgrade to PyTorch 2.7.1 to leverage CUDA 12.8 features.
- Install PyTorch:

  ```
  pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
  ```

- Install Triton:

  ```
  pip install -U "triton-windows<3.4"
  ```

- Install SageAttention (v2.2.0):

  ```
  pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.2.0-windows/sageattention-2.2.0+cu128torch2.7.1-cp310-cp310-win_amd64.whl
  ```

- Install Requirements:

  ```
  pip install -r requirements.txt
  ```
Option E: Windows for RTX 50XX (NV FP4 Optimized)
Target: PyTorch 2.9.1 | CUDA 13.0
Note: This is an experimental setup specifically for using NV FP4 optimized kernels on RTX 50-series cards. PyTorch 2.9.1 is bleeding-edge; generally, stick to Option D unless you specifically need these kernels.
- Install PyTorch:

  ```
  pip install torch==2.9.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
  ```

- Install Triton:

  ```
  pip install -U "triton-windows<3.4"
  ```

- Install SageAttention:

  ```
  pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.2.0-windows.post4/sageattention-2.2.0+cu130torch2.9.0andhigher.post4-cp39-abi3-win_amd64.whl
  ```

- Install Requirements:

  ```
  pip install -r requirements.txt
  ```
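Whichever option you followed (and this applies equally to the Linux paths below), a quick way to confirm that PyTorch sees your GPU, and that SageAttention imports if you installed it, is a pair of one-liners run inside the activated environment:

```
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
python -c "import sageattention; print('SageAttention import OK')"
```

The first line should report your installed version, the CUDA build, and `True`; if it prints `False`, recheck your driver and CUDA Toolkit versions before going further.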
🐧 Step 3: Choose Your Installation Path (Linux)
For our Linux users, the process is very similar, though we often build SageAttention from source or use standard pip packages rather than Windows-specific wheels.
Option A: Linux for GTX 10XX - 16XX
```
pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 torchaudio==2.6.0+cu126 --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
```
Option B: Linux for RTX 20XX / Quadro
```
pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 torchaudio==2.6.0+cu126 --index-url https://download.pytorch.org/whl/cu126
pip install -U "triton<3.3"
pip install sageattention==1.0.6
pip install -r requirements.txt
```
Option C: Linux for RTX 30XX
We compile SageAttention from source to ensure compatibility.
```
pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 torchaudio==2.6.0+cu126 --index-url https://download.pytorch.org/whl/cu126
pip install -U "triton<3.3"

# Reinstall setuptools to avoid build issues, then build SageAttention
python -m pip install "setuptools<=75.8.2" --force-reinstall
git clone https://github.com/thu-ml/SageAttention
cd SageAttention
pip install -e .
cd ..

pip install -r requirements.txt
```
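One caveat with the source build above (and the identical build in Option D below): compiling SageAttention's CUDA kernels requires `nvcc` to be discoverable. If the build fails complaining that it cannot find CUDA, pointing `CUDA_HOME` at your toolkit before running `pip install -e .` usually helps. The `/usr/local/cuda` path below is only the common default; adjust it for your system:

```
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
```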
Option D: Linux for RTX 40XX & 50XX
```
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
pip install -U "triton<3.4"

# Build SageAttention from source
python -m pip install "setuptools<=75.8.2" --force-reinstall
git clone https://github.com/thu-ml/SageAttention
cd SageAttention
pip install -e .
cd ..

pip install -r requirements.txt
```
⚡ Performance Optimization & Configuration
Once installed, there are several ways to tune Wan2GP for your specific hardware.
Attention Modes
The choice of attention mechanism significantly impacts inference speed. You can select one at launch; see the example after the cheat sheet below.
- SDPA (default): Standard PyTorch attention. Reliable and compatible with everything.
- Sage: Offers a ~30% speed boost with a negligible cost to quality.
- Sage2: Offers a ~40% speed boost.
- Flash Attention: Excellent performance, though installation on Windows can be complex.
Compatibility Cheat Sheet:
- GTX 10XX: SDPA only.
- RTX 20XX: SDPA, Sage1.
- RTX 30XX/40XX: SDPA, Flash, Sage1, Sage2/Sage2++.
- RTX 50XX: All of the above plus Sage3.
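The attention mode is passed via the `--attention` flag, the same flag used in the Troubleshooting section below. The exact mode names (`sage`, `sage2`) are my assumption based on the labels above, so verify them against `python wgp.py --help` on your install:

```
python wgp.py --attention sage2
```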
Performance Profiles (RAM/VRAM Usage)
You can select profiles to manage how the model is loaded (launch example after this list):
- Profile 3 (LowRAM_HighVRAM): Loads the entire model into VRAM. Best for speed, but requires substantial VRAM (e.g., 24GB for an 8-bit 14B model).
- Profile 4 (LowRAM_LowVRAM): The default setting. Loads model parts dynamically. It is slower but allows running larger models on GPUs with less VRAM.
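To choose a profile at launch, I am assuming Wan2GP exposes it as a `--profile` command-line switch; please confirm the flag name with `python wgp.py --help` on your version:

```
python wgp.py --profile 3    # LowRAM_HighVRAM: whole model in VRAM, fastest
python wgp.py --profile 4    # LowRAM_LowVRAM: the default, dynamic loading
```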
Optional: Flash Attention
If you wish to use Flash Attention:
- Windows:

  ```
  pip install https://github.com/Redtash1/Flash_Attention_2_Windows/releases/download/v2.7.0-v2.7.4/flash_attn-2.7.4.post1+cu128torch2.7.0cxx11abiFALSE-cp310-cp310-win_amd64.whl
  ```

- Linux:

  ```
  pip install flash-attn==2.7.2.post1
  ```
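If the wheel installed cleanly, the import below should succeed; a mismatch between the wheel's CUDA/PyTorch tags and your environment typically surfaces right here:

```
python -c "import flash_attn; print(flash_attn.__version__)"
```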
🧪 Advanced: Optimized INT4 / FP4 Kernels (RTX 50XX Only)
For users with RTX 50-series (SM120+) GPUs, there are specialized kernels available for INT4/FP4 dequantization. These are highly experimental and hardware-dependent.
Light2xv NVP4 Kernels
Requires Python 3.10, PyTorch 2.9.1, and CUDA 13.
- Windows: Download Wheel
- Linux: Download Wheel
Nunchaku INT4/FP4 Kernels
Available for both PyTorch 2.7.1 and 2.9.1.
- Windows (PT 2.7.1):

  ```
  pip install .../nunchaku-1.2.0+torch2.7-cp310-cp310-win_amd64.whl
  ```

- Linux (PT 2.7.1):

  ```
  pip install .../nunchaku-1.2.0+torch2.7-cp310-cp310-linux_x86_64.whl
  ```
(Please refer to the original repository for the full list of Nunchaku download links).
❓ Troubleshooting
If you encounter issues, here are a few quick tips:
- Sage Attention Errors:
  - Ensure Triton is installed correctly.
  - Try clearing the Triton cache (see the snippet after this list).
  - If all else fails, force the standard attention mode:

    ```
    python wgp.py --attention sdpa
    ```

- Out of Memory (OOM):
  - Try lowering the generation resolution or video length.
  - Ensure quantization is enabled (it is by default).
  - Switch to Profile 4 to prioritize VRAM savings.
  - Consider using the 1.3B parameter model instead of the larger 14B model.
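Regarding the Triton cache mentioned above: Triton keeps compiled kernels in a per-user cache directory, by default `~/.triton/cache` (i.e. `%USERPROFILE%\.triton\cache` on Windows). Deleting it is safe; kernels are simply recompiled on the next run:

```
# Linux
rm -rf ~/.triton/cache

# Windows (Command Prompt)
rmdir /s /q "%USERPROFILE%\.triton\cache"
```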
I hope this guide helps you get Wan2GP running on your machine! It is a powerful tool, and with the right setup, you can achieve impressive performance across a wide range of hardware.
Happy generating!
Original link: Manual Installation Guide For Windows & Linux
