A Comprehensive Manual Installation Guide for Wan2GP: Windows & Linux

Hello everyone! As a technical enthusiast exploring the latest developments in AI video generation, I recently spent some time working with Wan2GP (a Wan 2.1 wrapper). Given the wide variety of hardware configurations out there, from the trusty GTX 10 series to the cutting-edge RTX 50 series, getting the environment set up correctly can be surprisingly hardware-specific.

In this post, I would like to share a detailed, step-by-step manual installation guide. Whether you are on Windows or Linux, this guide aims to help you get up and running smoothly.

🛠️ Prerequisites

Before we dive into the installation, there are a few essential tools and drivers we need to ensure are present on your system. Having these ready will prevent common errors later on.

System Requirements

  • GPU: A compatible NVIDIA GPU ranging from the GTX 10XX series up to the RTX 50XX series.
  • OS: Windows 10/11 or Linux.

Essential Software

Please ensure you have the following installed. I have included links to the specific versions recommended for the best compatibility:

  1. Git: Required for cloning the repository. Download Git here.
  2. Visual Studio Build Tools: Essential for compiling C++ extensions (needed for CUDA). Please install Build Tools for Visual Studio 2022 and ensure the "Desktop development with C++" workload is selected. Download VS2022 Build Tools.
  3. CUDA Toolkit: You will need version 12.8 or higher for the best support, especially for newer cards. Download CUDA Toolkit.
  4. NVIDIA Drivers: Please keep your drivers up to date to ensure compatibility with the CUDA Toolkit. Update Drivers.
  5. FFMPEG: Crucial for video processing. After downloading and unzipping, please remember to add the bin folder to your system's PATH environment variable. Download FFMPEG.
  6. Python: Version 3.10.9 is the recommended baseline. Download Python 3.10.9.
  7. Environment Manager: I highly recommend using Miniconda to manage your environments, though a standard Python venv works as well. Download Miniconda.
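
For the FFMPEG step above, the key is that the bin folder actually ends up on PATH. Here is a minimal sketch for Linux; the /opt/ffmpeg path is a placeholder, so substitute wherever you unzipped the archive (on Windows, use the Environment Variables dialog or setx instead):

```shell
# Append the ffmpeg bin folder to PATH for the current shell session.
# /opt/ffmpeg is a placeholder -- use your actual unzip location.
export PATH="$PATH:/opt/ffmpeg/bin"

# Confirm the entry is now on PATH
echo "$PATH" | grep -q "/opt/ffmpeg/bin" && echo "PATH updated"
```

To make the change persistent, add the export line to your ~/.bashrc (or your shell's equivalent). Once ffmpeg is actually installed at that location, `ffmpeg -version` is the quickest way to confirm everything is wired up.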

🚀 Step 1: Repository Setup & Environment Creation

Regardless of your operating system or GPU, the first step is to get the code and create a clean sandbox for our dependencies.

  1. Clone the Repository: Open your terminal (or Command Prompt) in the folder where you want the project to live and run the following. Note that git creates the Wan2GP folder for you, so there is no need to make one first:

    git clone https://github.com/deepbeepmeep/Wan2GP.git
    cd Wan2GP
    
  2. Create the Conda Environment: We will create an environment named wan2gp running Python 3.10.9.

    conda create -n wan2gp python=3.10.9
    
  3. Activate the Environment:

    conda activate wan2gp
    

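A quick sanity check that the activation worked: conda exports the active environment's name in the CONDA_DEFAULT_ENV variable (if you chose a plain venv instead, this variable stays unset and the fallback text prints):

```shell
# Prints the active conda environment name, or "none" if no env is active.
# After `conda activate wan2gp`, this should print: Active env: wan2gp
echo "Active env: ${CONDA_DEFAULT_ENV:-none}"
```
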
🖥️ Step 2: Choose Your Installation Path (Windows)

To ensure stability, the installation steps vary slightly depending on your GPU architecture. Please locate your GPU generation below and follow the specific commands.

Option A: Windows for GTX 10XX - 16XX

Target: PyTorch 2.6.0 | CUDA 12.6

For older architectures, we stick to a very stable PyTorch release.

  1. Install PyTorch:
    pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 torchaudio==2.6.0+cu126 --index-url https://download.pytorch.org/whl/cu126
    
  2. Install Requirements:
    pip install -r requirements.txt
    

Option B: Windows for RTX 20XX / Quadro

Target: PyTorch 2.6.0 | CUDA 12.6 | SageAttention 1.0.6

  1. Install PyTorch:
    pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 torchaudio==2.6.0+cu126 --index-url https://download.pytorch.org/whl/cu126
    
  2. Install Triton:
    pip install -U "triton-windows<3.3"
    
  3. Install SageAttention (v1):
    pip install sageattention==1.0.6
    
  4. Install Requirements:
    pip install -r requirements.txt
    

Option C: Windows for RTX 30XX

Target: PyTorch 2.6.0 | CUDA 12.6 | SageAttention 2.1.1

  1. Install PyTorch:
    pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 torchaudio==2.6.0+cu126 --index-url https://download.pytorch.org/whl/cu126
    
  2. Install Triton:
    pip install -U "triton-windows<3.3"
    
  3. Install SageAttention (v2.1.1):
    pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.1.1-windows/sageattention-2.1.1+cu126torch2.6.0-cp310-cp310-win_amd64.whl
    
  4. Install Requirements:
    pip install -r requirements.txt
    

Option D: Windows for RTX 40XX & 50XX (Standard)

Target: PyTorch 2.7.1 | CUDA 12.8 | SageAttention 2.2.0

For modern cards, we upgrade to PyTorch 2.7.1 to leverage CUDA 12.8 features.

  1. Install PyTorch:
    pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
    
  2. Install Triton:
    pip install -U "triton-windows<3.4"
    
  3. Install SageAttention (v2.2.0):
    pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.2.0-windows/sageattention-2.2.0+cu128torch2.7.1-cp310-cp310-win_amd64.whl
    
  4. Install Requirements:
    pip install -r requirements.txt
    

Option E: Windows for RTX 50XX (NV FP4 Optimized)

Target: PyTorch 2.9.1 | CUDA 13.0

Note: This is an experimental setup specifically for using NV FP4 optimized kernels on RTX 50-series cards. PyTorch 2.9.1 is bleeding-edge; generally, stick to Option D unless you specifically need these kernels.

  1. Install PyTorch:
    pip install torch==2.9.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
    
  2. Install Triton:
    pip install -U "triton-windows<3.4"
    
  3. Install SageAttention:
    pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.2.0-windows.post4/sageattention-2.2.0+cu130torch2.9.0andhigher.post4-cp39-abi3-win_amd64.whl
    
  4. Install Requirements:
    pip install -r requirements.txt
    

🐧 Step 3: Choose Your Installation Path (Linux)

For our Linux users, the process is very similar, though we often build SageAttention from source or use standard pip packages rather than Windows-specific wheels.

Option A: Linux for GTX 10XX - 16XX

pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 torchaudio==2.6.0+cu126 --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt

Option B: Linux for RTX 20XX / Quadro

pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 torchaudio==2.6.0+cu126 --index-url https://download.pytorch.org/whl/cu126
pip install -U "triton<3.3"
pip install sageattention==1.0.6
pip install -r requirements.txt

Option C: Linux for RTX 30XX

We compile SageAttention from source to ensure compatibility.

pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 torchaudio==2.6.0+cu126 --index-url https://download.pytorch.org/whl/cu126
pip install -U "triton<3.3"

# Reinstall setuptools to avoid build issues, then build SageAttention
python -m pip install "setuptools<=75.8.2" --force-reinstall
git clone https://github.com/thu-ml/SageAttention
cd SageAttention 
pip install -e .
cd ..

pip install -r requirements.txt

Option D: Linux for RTX 40XX & 50XX

pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
pip install -U "triton<3.4"

# Build SageAttention
python -m pip install "setuptools<=75.8.2" --force-reinstall
git clone https://github.com/thu-ml/SageAttention
cd SageAttention 
pip install -e .
cd ..

pip install -r requirements.txt

⚡ Performance Optimization & Configuration

Once installed, there are several ways to tune Wan2GP for your specific hardware.

Attention Modes

The choice of attention mechanism significantly impacts inference speed.

  • SDPA (default): Standard PyTorch attention. Reliable and compatible with everything.
  • Sage: Offers a ~30% speed boost with a negligible cost to quality.
  • Sage2: Offers a ~40% speed boost.
  • Flash Attention: Excellent performance, though installation on Windows can be complex.

Compatibility Cheat Sheet:

  • GTX 10XX: SDPA only.
  • RTX 20XX: SDPA, Sage1.
  • RTX 30XX/40XX: SDPA, Flash, Sage1, Sage2/Sage2++.
  • RTX 50XX: All of the above plus Sage3.
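
If you are unsure which row of the cheat sheet applies to you, you can ask the driver directly. This sketch assumes nvidia-smi is on PATH and that your driver is recent enough to support the compute_cap query field:

```shell
# Print GPU name and compute capability to match against the cheat sheet.
# The compute_cap query field requires a reasonably recent NVIDIA driver.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
else
  echo "nvidia-smi not found - is the NVIDIA driver installed?"
fi
```

As a rough guide, compute capability 7.5 corresponds to RTX 20XX/16XX, 8.6 to RTX 30XX, 8.9 to RTX 40XX, and 12.0 to RTX 50XX.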

Performance Profiles (RAM/VRAM Usage)

You can select profiles to manage how the model is loaded:

  • Profile 3 (LowRAM_HighVRAM): Loads the entire model into VRAM. Best for speed, but requires substantial VRAM (e.g., 24GB for an 8-bit 14B model).
  • Profile 4 (LowRAM_LowVRAM): The default setting. Loads model parts dynamically. It is slower but allows running larger models on GPUs with less VRAM.

Optional: Flash Attention

If you wish to use Flash Attention:

  • Windows:
    pip install https://github.com/Redtash1/Flash_Attention_2_Windows/releases/download/v2.7.0-v2.7.4/flash_attn-2.7.4.post1+cu128torch2.7.0cxx11abiFALSE-cp310-cp310-win_amd64.whl
    
  • Linux:
    pip install flash-attn==2.7.2.post1
    

🧪 Advanced: Optimized INT4 / FP4 Kernels (RTX 50XX Only)

For users with RTX 50-series (SM120+) GPUs, there are specialized kernels available for INT4/FP4 dequantization. These are highly experimental and hardware-dependent.

Light2xv NVP4 Kernels

Requires Python 3.10, PyTorch 2.9.1, and CUDA 13.

Nunchaku INT4/FP4 Kernels

Available for both PyTorch 2.7.1 and 2.9.1.

  • Windows (PT 2.7.1): pip install .../nunchaku-1.2.0+torch2.7-cp310-cp310-win_amd64.whl
  • Linux (PT 2.7.1): pip install .../nunchaku-1.2.0+torch2.7-cp310-cp310-linux_x86_64.whl

(Please refer to the original repository for the full list of Nunchaku download links).


❓ Troubleshooting

If you encounter issues, here are a few quick tips:

  1. Sage Attention Errors:

    • Ensure Triton is installed correctly.
    • Try clearing the Triton cache.
    • If all else fails, force the standard attention mode:
      python wgp.py --attention sdpa
      
  2. Out of Memory (OOM):

    • Try lowering the generation resolution or video length.
    • Ensure quantization is enabled (default).
    • Switch to Profile 4 to prioritize VRAM savings.
    • Consider using the 1.3B parameter model instead of the larger 14B model.
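
For the Triton cache tip above: the cache normally lives under ~/.triton (or %USERPROFILE%\.triton on Windows), and deleting it is safe because kernels are simply recompiled on the next run. A sketch, assuming the default location and that TRITON_CACHE_DIR is not overriding it:

```shell
# Remove the Triton kernel cache; it is rebuilt automatically on the next launch.
rm -rf ~/.triton/cache
```
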

I hope this guide helps you get Wan2GP running on your machine! It is a powerful tool, and with the right setup, you can achieve impressive performance across a wide range of hardware.

Happy generating!

Read More

Original link: Manual Installation Guide For Windows & Linux