End2End-Diffusion/diffusion-bench: Towards Holistic Evaluation of Generative Diffusion Transformers!

##############################################################################
#                                                                            #
#   ____  _  __  __           _                            .-----------.     #
#  |  _ \(_)/ _|/ _|_   _ ___(_) ___  _ __                 |           |     #
#  | | | | | |_| |_| | | / __| |/ _ \| '_ \                | ░▒▓█▓▒░▒▓ |     #
#  | |_| | |  _|  _| |_| \__ \ | (_) | | | |               | ▒▓█████▓▒ |     #
#  |____/|_|_| |_|  \__,_|___/_|\___/|_| |_|               | ▓███████▓ |     #
#                                                          |     ↓     |     #
#   ____                  _                                | █████████ |     #
#  | __ )  ___ _ __   ___| |__                             | ▓███████▓ |     #
#  |  _ \ / _ \ '_ \ / __| '_ \                            | ▒▓█████▓▒ |     #
#  | |_) |  __/ | | | (__| | | |                           |           |     #
#  |____/ \___|_| |_|\___|_| |_|                           '-----------'     #
#                                                                            #
#           Because ImageNet evaluation alone is no longer enough!           #
#                                                                            #
##############################################################################


📣 Announcement post: Call for DiffusionBench: A Holistic Benchmark for Diffusion Transformers. Help us grow the benchmark with new evaluation axes, new metrics, and faithful reproductions of published methods.

This repo contains the unified codebase for DiffusionBench. It supports training and evaluation across different generation tasks (ImageNet, T2I, …) through a single interface. Please see the sections below for the detailed structure. Come join us!

Figure: Qualitative results from DiffusionBench. Text-to-image samples at 256×256 from models trained for 200K iterations using DiffusionBench.

# install uv project manager (if you don't already have it)
curl -LsSf https://astral.sh/uv/install.sh | sh

# install dependencies
uv sync

# prepare data (replace <dataset> with the dataset to prepare)
uv run python scripts/prepare.py --data <dataset>

# download pretrained models
uv run hf download diffusion-bench/diffusion-bench --local-dir pretrained_models --exclude .gitattributes

Reproduction flow: Stage 1 → Stage 2. Set these environment variables first (used for the output directory and W&B logging):

export EXPERIMENT_NAME=<run-name>
export ENTITY=<wandb-entity>
export PROJECT=<wandb-project>
export WANDB_KEY=<key>
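
A forgotten variable is easy to miss until a run is already under way. The sketch below is a hypothetical helper (`missing_env_vars` is not part of the repo) that lists which of the four variables above are unset or empty. It assumes all four are required for every run.

```shell
# Hypothetical helper, not part of the repo: print the names of the
# required variables that are unset or empty, separated by spaces.
missing_env_vars() (
    missing=""
    for name in EXPERIMENT_NAME ENTITY PROJECT WANDB_KEY; do
        # Look up the variable whose name is stored in $name.
        # eval is safe here because the names come from the fixed list above.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            # Append the name, adding a space only if the list is non-empty.
            missing="${missing}${missing:+ }${name}"
        fi
    done
    printf '%s' "$missing"
)
```

One way to use it is to call `missing_env_vars` before launching training and stop if its output is non-empty.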

Stage 1. Train the RAE tokenizer:

uv run torchrun --standalone --nproc_per_node=8 \
    src/train_stage1.py \
    --config [STAGE1_CONFIG_PATH] \
    --results-dir results/stage1 --precision bf16 --compile --wandb

Stage 2. Train the diffusion model in VAE, RAE, or pixel space:

uv run torchrun --standalone --nproc_per_node=8 \
    src/train.py \
    --config [STAGE2_CONFIG_PATH] \
    --results-dir results/stage2 --precision bf16 --compile --wandb

Stage 2 training configs run online evaluation during training (the eval: block). For standalone evaluation of a released checkpoint, use the sampling/ configs. Each one sets stage_2.ckpt (pointing into pretrained_models/) and the eval-time guidance, so the weights load automatically:

export EXPERIMENT_NAME=<run-name>

# stage 1 reconstruction (rFID/PSNR/SSIM/LPIPS)
uv run torchrun --nproc_per_node=8 src/offline_eval_stage1.py --config [STAGE1_CONFIG_PATH]

# stage 2 generation (FID/IS, GenEval/DPGBench/...)
uv run torchrun --nproc_per_node=8 src/offline_eval.py --config [STAGE2_CONFIG_PATH]

Config layout:

configs/
├── stage1/
└── stage2/
    ├── training/
    │   ├── imagenet/
    │   └── t2i/
    └── sampling/
        ├── imagenet/
        └── t2i/

Stage 2 configs cover four families: VAE (11), RAE (6), REG (4), and Pixel (3). The set is identical for ImageNet and T2I, so switching a config between tasks takes a single path change. The sampling/ set mirrors training/ but adds the trained checkpoint and eval-time guidance, so it runs offline eval directly.
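
Because the directory layout is symmetric, both swaps can be expressed as plain path rewrites. The following is a minimal sketch, not a repo utility: `sampling_config_for` and `swap_task` are hypothetical helpers, `example.yaml` is a placeholder file name, and the sketch assumes a config keeps the same file name in every directory.

```shell
# Hypothetical helpers (not part of the repo). Both assume a config keeps the
# same file name under training/ and sampling/, and under imagenet/ and t2i/.

# Map a Stage 2 training config path to its sampling counterpart.
sampling_config_for() {
    printf '%s\n' "$1" | sed 's#/training/#/sampling/#'
}

# Move a config path from one task directory to another (e.g. imagenet to t2i).
swap_task() {
    printf '%s\n' "$1" | sed "s#/$2/#/$3/#"
}

# "example.yaml" is a placeholder file name.
sampling_config_for configs/stage2/training/imagenet/example.yaml
# prints configs/stage2/sampling/imagenet/example.yaml
swap_task configs/stage2/training/imagenet/example.yaml imagenet t2i
# prints configs/stage2/training/t2i/example.yaml
```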

For ImageNet, pick the CFG-off baseline ([STAGE2_CONFIG_PATH].yaml) or the per-model best-CFG variant ([STAGE2_CONFIG_PATH]-cfg-t0.0-0.9.yaml).
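
To evaluate every sampling config for a task in one go, a dry-run loop can print the offline-eval command for each config before anything is launched. This is a sketch under stated assumptions: `print_eval_commands` is a hypothetical helper, and it assumes the configs are `*.yaml` files sitting directly inside the directory passed to it. For ImageNet, a directory-wide glob would pick up both the CFG-off and the best-CFG variants.

```shell
# Hypothetical helper, not part of the repo: print one offline-eval command
# per sampling config found in the given directory (dry run, nothing is run).
print_eval_commands() (
    config_dir="$1"
    for cfg in "$config_dir"/*.yaml; do
        # If the glob matched nothing, the literal pattern remains; skip it.
        [ -e "$cfg" ] || continue
        printf 'uv run torchrun --nproc_per_node=8 src/offline_eval.py --config %s\n' "$cfg"
    done
)
```

For example, `print_eval_commands configs/stage2/sampling/imagenet` lists the commands for review. Piping that output to `sh` would run them one after another, provided the paths contain no spaces.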

| Category | Methods |
|---|---|
| Latent Space | Pixel Space |
| | RAE (30+ representation encoders): DINOv2, SigLIP2, WebSSL, PE, LangPE, and more |
| | RAEv2 (30+ representation encoders): DINOv2, SigLIP2, WebSSL, PE, LangPE, and more |
| | VAE (10+ VAEs): FLUX.2, FLUX.1, SD3.5, VA-VAE, E2E-VAE, and more |
| Output Prediction | x-prediction, v-prediction |
| Transport | Rectified-Flow, MeanFlow, Improved-MeanFlow, Pixel-MeanFlow, Drifting |
| Loss | Flow Matching, REPA, iREPA |
| Architecture | LightningDiT, JiT, DDT |
| Tasks | ImageNet: class-conditional generation |
| | T2I: text-to-image generation |
| Evaluation | ImageNet: FID, IS |
| | T2I: GenEval, DPGBench, GenAIBench, VQAScore |
| Training Backend | DDP, FSDP [TODO] |

| | Status | Details |
|---|---|---|
| Coding Agents | Yes | Agent-compatible. See skills/ for setup and workflow skills. |
| AutoResearch | [TODO] | AutoResearch integration is planned (not yet available). |

We welcome contributions! Please refer to docs/contributors.md and docs/contributing.md for further details.

The codebase builds on several open-source projects. We thank the authors for making their work publicly available.
