# FLUX.2
by Black Forest Labs: https://bfl.ai.
Documentation for our API can be found here: [docs.bfl.ai](https://docs.bfl.ai/).
This repo contains minimal inference code to run image generation & editing with our FLUX.2 open-weight models.
## `FLUX.2 [dev]`
`FLUX.2 [dev]` is a 32B parameter flow matching transformer model capable of generating and editing (multiple) images. The model is released under the [FLUX.2-dev Non-Commercial License](model_licenses/LICENSE-FLUX-DEV) and can be found [here](https://huggingface.co/black-forest-labs/FLUX.2-dev).
Note that the script below for `FLUX.2 [dev]` needs a considerable amount of VRAM (an H100-equivalent GPU). We partnered with Hugging Face to make quantized versions that run on consumer hardware; below you can find instructions on how to run it on an RTX 4090 with a remote text encoder. For other quantization sizes and combinations, check the [diffusers quantization guide](docs/flux2_dev_hf.md).
### Text-to-image examples
![t2i-grid](assets/teaser_generation.png)
### Editing examples
![edit-grid](assets/teaser_editing.png)
### Prompt upsampling
`FLUX.2 [dev]` benefits significantly from prompt upsampling. The inference script below offers the option to upsample prompts locally with the same model we use for text encoding ([`Mistral-Small-3.2-24B-Instruct-2506`](https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506)), or, alternatively, with any model on [OpenRouter](https://openrouter.ai/) via an API call.
See the [upsampling guide](docs/flux2_with_prompt_upsampling.md) for additional details and guidance on when to use upsampling.
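If you want to experiment with OpenRouter-based upsampling outside the inference script, the call boils down to a single chat-completions request. A minimal sketch, assuming OpenRouter's OpenAI-compatible `/chat/completions` endpoint and an `OPENROUTER_API_KEY` environment variable; the model slug and system prompt here are illustrative placeholders, not the repo's defaults:

```python
import os

import requests


def upsample_prompt(prompt: str, model: str = "mistralai/mistral-small-3.2-24b-instruct") -> str:
    """Ask an OpenRouter-hosted model to expand a terse image prompt."""
    response = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": model,
            "messages": [
                # Placeholder instruction; tune this to taste.
                {"role": "system", "content": "Rewrite the user's image prompt with rich, concrete visual detail. Return only the rewritten prompt."},
                {"role": "user", "content": prompt},
            ],
        },
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]


print(upsample_prompt("a hermit crab in a soda can"))
```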
## `FLUX.2` autoencoder
The FLUX.2 autoencoder is considerably improved over the [FLUX.1 autoencoder](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/ae.safetensors). The autoencoder is released under [Apache 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md) and can be found [here](https://huggingface.co/black-forest-labs/FLUX.2-dev/blob/main/ae.safetensors). For more information, see our [technical blogpost](https://bfl.ai/blog/flux-2).
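Because the autoencoder ships as a standalone safetensors file, you can fetch and inspect it without the rest of the pipeline. A minimal sketch using `huggingface_hub` and `safetensors` (the printed tensor names are simply whatever the checkpoint contains, not a documented API):

```python
from huggingface_hub import hf_hub_download
from safetensors import safe_open

# Download just the Apache-2.0 autoencoder weights from the FLUX.2 [dev] repo.
ae_path = hf_hub_download("black-forest-labs/FLUX.2-dev", "ae.safetensors")

# Peek at a few tensor names and shapes without loading the full checkpoint.
with safe_open(ae_path, framework="pt") as f:
    for name in list(f.keys())[:5]:
        print(name, f.get_slice(name).get_shape())
```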
## Local installation
The inference code was tested on GB200 and H100 (with CPU offloading).
### GB200
On GB200, we tested `FLUX.2 [dev]` using CUDA 12.9 and Python 3.12.
```bash
python3.12 -m venv .venv
source .venv/bin/activate
pip install -e . --extra-index-url https://download.pytorch.org/whl/cu129 --no-cache-dir
```
### H100
On H100, we tested `FLUX.2 [dev]` using CUDA 12.6 and Python 3.10.
```bash
python3.10 -m venv .venv
source .venv/bin/activate
pip install -e . --extra-index-url https://download.pytorch.org/whl/cu126 --no-cache-dir
```
## Run the CLI
Before running the CLI, you may download the weights from [here](https://huggingface.co/black-forest-labs/FLUX.2-dev) and set the following environment variables.
```bash
export FLUX2_MODEL_PATH="<flux2_path>"
export AE_MODEL_PATH="<ae_path>"
```
If you don't set the environment variables, the weights will be downloaded
automatically.
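The fallback corresponds to a pattern like the following. This is a sketch only: the repo's actual loading code may differ, and the checkpoint filename `flux2-dev.safetensors` is an assumption; check the model card for the real filename.

```python
import os

from huggingface_hub import hf_hub_download

# Use the local checkpoint if the env var is set; otherwise pull from the Hub.
# Filenames below are assumed for illustration.
flux2_path = os.environ.get("FLUX2_MODEL_PATH") or hf_hub_download(
    "black-forest-labs/FLUX.2-dev", "flux2-dev.safetensors"
)
ae_path = os.environ.get("AE_MODEL_PATH") or hf_hub_download(
    "black-forest-labs/FLUX.2-dev", "ae.safetensors"
)
```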
You can start an interactive session with loaded weights by running the
following command. This allows you to do both text-to-image generation and
editing of one or multiple images.
```bash
export PYTHONPATH=src
python scripts/cli.py
```
On H100, we additionally set the flag `--cpu_offloading True`.
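For example, the full H100 invocation is:

```bash
export PYTHONPATH=src
python scripts/cli.py --cpu_offloading True
```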
## Watermarking
We've added an option to embed invisible watermarks directly into the generated images
via the [invisible-watermark library](https://github.com/ShieldMnt/invisible-watermark).
Additionally, we recommend implementing a solution that marks the metadata of your outputs, such as [C2PA](https://c2pa.org/).
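For reference, a minimal sketch of what the library does on its own, following its documented `WatermarkEncoder` API; the payload string and watermarking method below are placeholders, not necessarily what this repo embeds:

```python
import cv2
from imwatermark import WatermarkEncoder

# invisible-watermark operates on BGR numpy arrays as produced by OpenCV.
bgr = cv2.imread("flux2_output.png")

encoder = WatermarkEncoder()
encoder.set_watermark("bytes", b"flux2")  # placeholder payload
bgr_marked = encoder.encode(bgr, "dwtDct")  # frequency-domain, invisible method

cv2.imwrite("flux2_output_watermarked.png", bgr_marked)
```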
## 🧨 Lower VRAM diffusers example
The example below should run on an RTX 4090.
```python
import io

import requests
import torch
from diffusers import Flux2Pipeline, Flux2Transformer2DModel
from diffusers.utils import load_image
from huggingface_hub import get_token

repo_id = "diffusers/FLUX.2-dev-bnb-4bit"
device = "cuda:0"
torch_dtype = torch.bfloat16


def remote_text_encoder(prompts):
    # Offload text encoding to Hugging Face's hosted endpoint so the 24B
    # Mistral text encoder never has to fit in local VRAM.
    response = requests.post(
        "https://remote-text-encoder-flux-2.huggingface.co/predict",
        json={"prompt": prompts},
        headers={
            "Authorization": f"Bearer {get_token()}",
            "Content-Type": "application/json",
        },
    )
    prompt_embeds = torch.load(io.BytesIO(response.content))
    return prompt_embeds.to(device)


# Load the 4-bit quantized transformer (stored under the repo's `transformer`
# subfolder), then build the pipeline without a local text encoder: prompt
# embeddings come from the remote encoder above.
transformer = Flux2Transformer2DModel.from_pretrained(
    repo_id, subfolder="transformer", torch_dtype=torch_dtype
)
pipe = Flux2Pipeline.from_pretrained(
    repo_id, transformer=transformer, text_encoder=None, torch_dtype=torch_dtype
).to(device)

prompt = "Realistic macro photograph of a hermit crab using a soda can as its shell, partially emerging from the can, captured with sharp detail and natural colors, on a sunlit beach with soft shadows and a shallow depth of field, with blurred ocean waves in the background. The can has the text `BFL Diffusers` on it and it has a color gradient that starts with #FF5733 at the top and transitions to #33FF57 at the bottom."

image = pipe(
    prompt_embeds=remote_text_encoder(prompt),
    # image=load_image("https://huggingface.co/spaces/zerogpu-aoti/FLUX.1-Kontext-Dev-fp8-dynamic/resolve/main/cat.png"),  # optional image input for editing
    generator=torch.Generator(device=device).manual_seed(42),
    num_inference_steps=50,  # 28 steps can be a good trade-off
    guidance_scale=4,
).images[0]
image.save("flux2_output.png")
```
## Citation
If you find the provided code or models useful for your research, consider citing them as:
```bib
@misc{flux-2-2025,
author={Black Forest Labs},
title={{FLUX.2: State-of-the-Art Visual Intelligence}},
year={2025},
howpublished={\url{https://bfl.ai/blog/flux-2}},
}
```