release hunyuan3d-dit-v2-0-fast
Zeqiang-Lai committed Feb 3, 2025
1 parent 9cc9cf7 commit 3207473
Showing 4 changed files with 78 additions and 34 deletions.
33 changes: 17 additions & 16 deletions README.md
@@ -25,10 +25,6 @@

<br>

> From Hunyuan3D Team: Happy New Year!
![happynewyear](https://github.com/user-attachments/assets/69aa40a7-8657-4c2b-8efd-99eda6c26fe4)


> Join our **[Wechat](#)** and **[Discord](https://discord.gg/GuaWYwzKbX)** groups to discuss and get help from us.
@@ -44,12 +40,18 @@

## 🔥 News

- Feb 3, 2025: 🐎 Release [Hunyuan3D-DiT-v2-0-Fast](https://huggingface.co/tencent/Hunyuan3D-2/tree/main/hunyuan3d-dit-v2-0-fast),
our guidance-distilled model that can halve the DiT inference time; see [here](minimal_demo.py) for usage and the
loading sketch after this list.
- Jan 27, 2025: 🛠️ Release Blender addon for Hunyuan3D 2.0, Check it out [here](#blender-addon).
- Jan 23, 2025: 💬 We thank community members for creating a [Windows installation tool](https://github.com/YanWenKun/Hunyuan3D-2-WinPortable), ComfyUI support with [ComfyUI-Hunyuan3DWrapper](https://github.com/kijai/ComfyUI-Hunyuan3DWrapper) and [ComfyUI-3D-Pack](https://github.com/MrForExample/ComfyUI-3D-Pack), and other awesome [extensions](#community-resources).
- Jan 21, 2025: 💬 Enjoy exciting 3D generation on our website [Hunyuan3D Studio](https://3d.hunyuan.tencent.com)!
- Jan 21, 2025: 🤗 Release inference code and pretrained models
of [Hunyuan3D 2.0](https://huggingface.co/tencent/Hunyuan3D-2). Please give it a try
via [huggingface space](https://huggingface.co/spaces/tencent/Hunyuan3D-2) and
our [official site](https://3d.hunyuan.tencent.com)!
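
As a quick illustration of the new fast model, here is a minimal, hedged loading sketch. The import path is assumed from the package layout (`hy3dgen/shapegen/pipelines.py`), the `subfolder` and `variant` values mirror the `minimal_demo.py` change further down, and the step count of 30 is just the demo's value; background removal is omitted here (see `minimal_demo.py`).

```python
# Sketch: load the guidance-distilled "fast" DiT and generate a mesh from an image.
# Import path and parameter values are assumptions based on this commit's demo code.
import torch
from PIL import Image
from hy3dgen.shapegen import Hunyuan3DDiTFlowMatchingPipeline

pipeline = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained(
    'tencent/Hunyuan3D-2',
    subfolder='hunyuan3d-dit-v2-0-fast',   # the distilled checkpoint released in this commit
    variant='fp16',
)

image = Image.open('assets/demo.png')
mesh = pipeline(image=image,
                num_inference_steps=30,     # example value taken from minimal_demo.py
                generator=torch.manual_seed(2025))[0]
mesh.export('demo_fast.glb')
```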

@@ -103,6 +105,7 @@ and the condition following ability.

Generation results of Hunyuan3D 2.0:
<p align="left">
<img src="https://github.com/user-attachments/assets/69aa40a7-8657-4c2b-8efd-99eda6c26fe4" height=250>
<img src="assets/images/e2e-1.gif" height=250>
<img src="assets/images/e2e-2.gif" height=250>
</p>
@@ -185,12 +188,15 @@ python3 gradio_app.py

### API Server

You can launch an API server locally and post web requests for image/text-to-3D, texturing an existing mesh, etc.

```bash
python api_server.py --host 0.0.0.0 --port 8080
```

A demo POST request for image-to-3D without texture:

```bash
img_b64_str=$(base64 -i assets/demo.png)
curl -X POST "http://localhost:8080/generate" \
@@ -203,19 +209,15 @@ curl -X POST "http://localhost:8080/generate" \
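
For readers who prefer Python over curl, a rough equivalent of the demo request above might look like the sketch below. The JSON field name `image` and the binary GLB response are assumptions inferred from the truncated demo, not a documented contract.

```python
# Sketch: post a base64-encoded image to a locally running API server.
# The payload shape ("image" field) and GLB response are assumptions.
import base64
import requests

with open('assets/demo.png', 'rb') as f:
    img_b64_str = base64.b64encode(f.read()).decode('utf-8')

resp = requests.post('http://localhost:8080/generate', json={'image': img_b64_str})
resp.raise_for_status()
with open('demo.glb', 'wb') as f:
    f.write(resp.content)   # assumed to be the generated mesh in GLB format
```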

### Blender Addon

With an API server launched, you can also use Hunyuan3D 2.0 directly in Blender with our [Blender Addon](blender_addon.py). Please follow our tutorial to install and use it.

https://github.com/user-attachments/assets/8230bfb5-32b1-4e48-91f4-a977c54a4f3e




### Official Site

If you don't want to host it yourself, don't forget to visit [Hunyuan3D](https://3d.hunyuan.tencent.com) for quick use.


## 📑 Open-Source Plan

- [x] Inference Code
@@ -252,13 +254,12 @@ If you found this repository helpful, please cite our reports:

Thanks to the contributions of community members, here are some great extensions for Hunyuan3D 2.0:

- [ComfyUI-3D-Pack](https://github.com/MrForExample/ComfyUI-3D-Pack)
- [ComfyUI-Hunyuan3DWrapper](https://github.com/kijai/ComfyUI-Hunyuan3DWrapper)
- [Hunyuan3D-2-for-windows](https://github.com/sdbds/Hunyuan3D-2-for-windows)
- [📦 A bundle for running on Windows | 整合包](https://github.com/YanWenKun/Hunyuan3D-2-WinPortable)
- [Hunyuan3D-2GP](https://github.com/deepbeepmeep/Hunyuan3D-2GP)


## Acknowledgements

We would like to thank the contributors to
12 changes: 12 additions & 0 deletions hy3dgen/shapegen/models/hunyuan3ddit.py
@@ -287,6 +287,7 @@ def __init__(
theta: int = 10_000,
qkv_bias: bool = True,
time_factor: float = 1000,
guidance_embed: bool = False,
ckpt_path: Optional[str] = None,
**kwargs,
):
@@ -303,6 +304,7 @@ def __init__(
self.qkv_bias = qkv_bias
self.time_factor = time_factor
self.out_channels = self.in_channels
self.guidance_embed = guidance_embed

if hidden_size % num_heads != 0:
raise ValueError(
@@ -316,6 +318,9 @@ def __init__(
self.latent_in = nn.Linear(self.in_channels, self.hidden_size, bias=True)
self.time_in = MLPEmbedder(in_dim=256, hidden_dim=self.hidden_size)
self.cond_in = nn.Linear(context_in_dim, self.hidden_size)
self.guidance_in = (
MLPEmbedder(in_dim=256, hidden_dim=self.hidden_size) if guidance_embed else nn.Identity()
)

self.double_blocks = nn.ModuleList(
[
@@ -374,7 +379,14 @@ def forward(
) -> Tensor:
cond = contexts['main']
latent = self.latent_in(x)

vec = self.time_in(timestep_embedding(t, 256, self.time_factor).to(dtype=latent.dtype))
if self.guidance_embed:
guidance = kwargs.get('guidance', None)
if guidance is None:
raise ValueError("Didn't get guidance strength for guidance distilled model.")
vec = vec + self.guidance_in(timestep_embedding(guidance, 256, self.time_factor))

cond = self.cond_in(cond)
pe = None

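To make the new `guidance_embed` path concrete: instead of running two forward passes and mixing them with a CFG scale, a distilled model receives the scale as an extra embedding added to the timestep embedding. The hedged, self-contained sketch below illustrates the idea; `timestep_embedding` follows the standard sinusoidal construction, the sequential MLPs stand in for `MLPEmbedder`, and all sizes are illustrative rather than the real Hunyuan3D-2 config.

```python
# Sketch of guidance embedding as used by guidance-distilled DiT models.
# Shapes and module sizes are illustrative, not the actual Hunyuan3D-2 configuration.
from typing import Optional

import torch
import torch.nn as nn


def timestep_embedding(t: torch.Tensor, dim: int, max_period: int = 10_000) -> torch.Tensor:
    """Standard sinusoidal embedding of a per-item scalar (timestep or guidance scale)."""
    half = dim // 2
    freqs = torch.exp(-torch.log(torch.tensor(float(max_period))) * torch.arange(half) / half)
    args = t[:, None].float() * freqs[None]
    return torch.cat([torch.cos(args), torch.sin(args)], dim=-1)


class TinyGuidedDiT(nn.Module):
    def __init__(self, hidden_size: int = 64, guidance_embed: bool = True):
        super().__init__()
        self.guidance_embed = guidance_embed
        self.time_in = nn.Sequential(nn.Linear(256, hidden_size), nn.SiLU(),
                                     nn.Linear(hidden_size, hidden_size))
        # Mirrors `self.guidance_in = MLPEmbedder(...) if guidance_embed else nn.Identity()`.
        self.guidance_in = (
            nn.Sequential(nn.Linear(256, hidden_size), nn.SiLU(),
                          nn.Linear(hidden_size, hidden_size))
            if guidance_embed else nn.Identity()
        )

    def conditioning_vector(self, t: torch.Tensor, guidance: Optional[torch.Tensor]) -> torch.Tensor:
        vec = self.time_in(timestep_embedding(t, 256))
        if self.guidance_embed:
            if guidance is None:
                raise ValueError("guidance strength is required for a guidance-distilled model")
            # The CFG scale is embedded and added, so a single forward pass suffices.
            vec = vec + self.guidance_in(timestep_embedding(guidance, 256))
        return vec


# Example: one conditioning vector per batch item, with guidance scale 5.0 baked in.
model = TinyGuidedDiT()
t = torch.full((2,), 0.5)
guidance = torch.full((2,), 5.0)
print(model.conditioning_vector(t, guidance).shape)  # torch.Size([2, 64])
```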
41 changes: 24 additions & 17 deletions hy3dgen/shapegen/pipelines.py
Expand Up @@ -142,18 +142,21 @@ def from_single_file(
config_path,
device='cuda',
dtype=torch.float16,
use_safetensors=None,
**kwargs,
):
# load config
with open(config_path, 'r') as f:
config = yaml.safe_load(f)

# load ckpt
if use_safetensors:
ckpt_path = ckpt_path.replace('.ckpt', '.safetensors')
if not os.path.exists(ckpt_path):
raise FileNotFoundError(f"Model file {ckpt_path} not found")
logger.info(f"Loading model from {ckpt_path}")

if ckpt_path.endswith('.safetensors'):
if use_safetensors:
# parse safetensors
import safetensors.torch
safetensors_ckpt = safetensors.torch.load_file(ckpt_path, device='cpu')
@@ -165,8 +168,7 @@ def from_single_file(
ckpt[model_name] = {}
ckpt[model_name][new_key] = value
else:
ckpt = torch.load(ckpt_path, map_location='cpu', weights_only=True)

ckpt = torch.load(ckpt_path, map_location='cpu')
# load model
model = instantiate_from_config(config['model'])
model.load_state_dict(ckpt['model'])
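
The key-regrouping loop in this hunk is only partly visible. As a hedged illustration of the pattern, the snippet below loads a flat `model.*`-style state dict (safetensors or plain torch) and nests it per top-level module; file paths and the helper name are placeholders, not the pipeline's actual API.

```python
# Sketch: load a single-file checkpoint and regroup flat keys such as
# "model.layer.weight" into {"model": {"layer.weight": ...}}.  Illustrative only.
import os
from collections import defaultdict

import torch


def load_grouped_ckpt(ckpt_path: str, use_safetensors: bool = False) -> dict:
    if use_safetensors:
        ckpt_path = ckpt_path.replace('.ckpt', '.safetensors')
    if not os.path.exists(ckpt_path):
        raise FileNotFoundError(f"Model file {ckpt_path} not found")

    if use_safetensors:
        import safetensors.torch
        flat = safetensors.torch.load_file(ckpt_path, device='cpu')
        grouped: dict = defaultdict(dict)
        for key, value in flat.items():
            model_name, new_key = key.split('.', 1)   # e.g. "model", "layer.weight"
            grouped[model_name][new_key] = value
        return dict(grouped)

    # Plain torch checkpoints are assumed to be grouped already.
    return torch.load(ckpt_path, map_location='cpu')
```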
@@ -205,21 +207,25 @@ def from_pretrained(
**kwargs,
):
original_model_path = model_path
# try local path
base_dir = os.environ.get('HY3DGEN_MODELS', '~/.cache/hy3dgen')
model_path = os.path.expanduser(os.path.join(base_dir, model_path, subfolder))
print('Try to load model from local path:', model_path)
if not os.path.exists(model_path):
# try local path
base_dir = os.environ.get('HY3DGEN_MODELS', '~/.cache/hy3dgen')
model_path = os.path.expanduser(os.path.join(base_dir, model_path, subfolder))
if not os.path.exists(model_path):
try:
import huggingface_hub
# download from huggingface
path = huggingface_hub.snapshot_download(repo_id=original_model_path)
model_path = os.path.join(path, subfolder)
except ImportError:
logger.warning(
"You need to install HuggingFace Hub to load models from the hub."
)
raise RuntimeError(f"Model path {model_path} not found")
print('Model path not exists, try to download from huggingface')
try:
import huggingface_hub
# download from huggingface
path = huggingface_hub.snapshot_download(repo_id=original_model_path)
model_path = os.path.join(path, subfolder)
except ImportError:
logger.warning(
"You need to install HuggingFace Hub to load models from the hub."
)
raise RuntimeError(f"Model path {model_path} not found")
except Exception as e:
raise e

if not os.path.exists(model_path):
raise FileNotFoundError(f"Model path {original_model_path} not found")
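
The resolution order above, a local cache rooted at `HY3DGEN_MODELS` (default `~/.cache/hy3dgen`) followed by a Hugging Face snapshot download, can be summarized with the hedged helper below. It is a simplification of the diff, not the exact pipeline code.

```python
# Sketch: resolve a model repo id + subfolder to a local directory,
# preferring the HY3DGEN_MODELS cache and falling back to Hugging Face Hub.
import os


def resolve_model_path(repo_id: str, subfolder: str) -> str:
    base_dir = os.environ.get('HY3DGEN_MODELS', '~/.cache/hy3dgen')
    local_path = os.path.expanduser(os.path.join(base_dir, repo_id, subfolder))
    if os.path.exists(local_path):
        return local_path

    try:
        import huggingface_hub
    except ImportError as e:
        raise RuntimeError("huggingface_hub is required to download models from the hub") from e

    # Downloads (or reuses) the whole repo snapshot, then points at the subfolder.
    snapshot = huggingface_hub.snapshot_download(repo_id=repo_id)
    model_path = os.path.join(snapshot, subfolder)
    if not os.path.exists(model_path):
        raise FileNotFoundError(f"Model path {repo_id}/{subfolder} not found")
    return model_path


# Example usage (an assumption, matching the checkpoint released in this commit):
# resolve_model_path('tencent/Hunyuan3D-2', 'hunyuan3d-dit-v2-0-fast')
```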

@@ -554,6 +560,7 @@ def __call__(
if hasattr(self.model, 'guidance_embed') and \
self.model.guidance_embed is True:
guidance = torch.tensor([guidance_scale] * batch_size, device=device, dtype=dtype)
print(f'Using guidance embed with scale {guidance_scale}')

for i, t in enumerate(tqdm(timesteps, disable=not enable_pbar, desc="Diffusion Sampling:")):
# expand the latents if we are doing classifier free guidance
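The claim that the fast model "can halve the DiT inference time" follows from skipping the classifier-free-guidance double pass. The sketch below contrasts the two step styles under that assumption; `model` is a stand-in callable, not the actual Hunyuan3D-2 DiT interface or sampling loop.

```python
# Sketch: why guidance distillation roughly halves DiT cost per sampling step.
# Classic CFG evaluates the model on a doubled batch (conditional + unconditional),
# while a guidance-distilled model takes the scale as an embedding and runs once.
import torch


def cfg_step(model, latents, t, cond, uncond, guidance_scale):
    latent_in = torch.cat([latents, latents])               # doubled batch -> ~2x compute
    noise = model(latent_in, torch.cat([t, t]), torch.cat([cond, uncond]))
    noise_cond, noise_uncond = noise.chunk(2)
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)


def distilled_step(model, latents, t, cond, guidance_scale):
    guidance = torch.full((latents.shape[0],), guidance_scale, device=latents.device)
    return model(latents, t, cond, guidance=guidance)        # single pass; scale is embedded
```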
26 changes: 25 additions & 1 deletion minimal_demo.py
@@ -73,6 +73,30 @@ def text_to_3d(prompt='a car'):
mesh.export('t2i_demo.glb')


def image_to_3d_fast(image_path='assets/demo.png'):
rembg = BackgroundRemover()
model_path = 'tencent/Hunyuan3D-2'

image = Image.open(image_path)

if image.mode == 'RGB':
image = rembg(image)

pipeline = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained(
model_path,
subfolder='hunyuan3d-dit-v2-0-fast',
variant='fp16'
)

mesh = pipeline(image=image, num_inference_steps=30, mc_algo='mc',
generator=torch.manual_seed(2025))[0]
mesh = FloaterRemover()(mesh)
mesh = DegenerateFaceRemover()(mesh)
mesh = FaceReducer()(mesh)
mesh.export('mesh.glb')


if __name__ == '__main__':
image_to_3d()
image_to_3d_fast()
# image_to_3d()
# text_to_3d()
