
Importing xgrammar is slow; it seems to compile kernels each time #253


Open
hnyls2002 opened this issue Mar 19, 2025 · 3 comments

@hnyls2002
Collaborator

Just importing xgrammar takes about 60 seconds.

The import appears to hang at the kernel-compilation stage:

  File "/venv/lib/python3.8/site-packages/xgrammar/__init__.py", line 1, in <module>
    from . import testing
  File "/venv/lib/python3.8/site-packages/xgrammar/testing.py", line 11, in <module>
    from .matcher import GrammarMatcher, bitmask_dtype
  File "/venv/lib/python3.8/site-packages/xgrammar/matcher.py", line 13, in <module>
    from .kernels import apply_token_bitmask_inplace_kernels
  File "/venv/lib/python3.8/site-packages/xgrammar/kernels/__init__.py", line 12, in <module>
    from .apply_token_bitmask_inplace_cuda import apply_token_bitmask_inplace_cuda
  File "/venv/lib/python3.8/site-packages/xgrammar/kernels/apply_token_bitmask_inplace_cuda.py", line 54, in <module>
    _load_torch_ops()
  File "/venv/lib/python3.8/site-packages/xgrammar/kernels/apply_token_bitmask_inplace_cuda.py", line 42, in _load_torch_ops
    torch.utils.cpp_extension.load_inline(
  File "lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1434, in load_inline
    return _jit_compile(
  File "lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
    _write_ninja_file_and_build_library(
  File "lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "/venv/lib/python3.8/subprocess.py", line 495, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/venv/lib/python3.8/subprocess.py", line 1015, in communicate
    stdout = self.stdout.read()
KeyboardInterrupt
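
For context, torch.utils.cpp_extension.load_inline JIT-compiles the given C++/CUDA sources with ninja on first use and caches the built library under TORCH_EXTENSIONS_DIR (default ~/.cache/torch_extensions), so the first import pays the full compile cost. Below is a minimal standalone sketch of that mechanism; the extension name and C++ body are made up for illustration, and only load_inline itself is the real API:

    import time

    import torch
    import torch.utils.cpp_extension as cpp_ext

    # Hypothetical one-function extension; xgrammar's real sources are larger.
    cpp_source = """
    #include <torch/extension.h>
    torch::Tensor add_one(torch::Tensor x) { return x + 1; }
    """

    for label in ("first load (ninja compiles)", "second load (cache hit)"):
        start = time.time()
        ext = cpp_ext.load_inline(
            name="demo_ext",        # cache key under TORCH_EXTENSIONS_DIR
            cpp_sources=cpp_source,
            functions=["add_one"],  # pybind11 bindings are auto-generated
        )
        print(f"{label}: {time.time() - start:.1f}s")

    print(ext.add_one(torch.zeros(2)))  # tensor([1., 1.])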
@abhiaagarwal

Seems related to #292. I would really like the ability to precompile this extension, as compiling it at import time can be painful inside a Docker container.
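
Until precompiled wheels exist, one workaround sketch (my assumption, not an xgrammar feature) is to trigger the import once during the image build so that containers hit the cached build at runtime. TORCH_EXTENSIONS_DIR is the standard torch env var controlling where JIT-built extensions are cached; the path below is arbitrary:

    # warmup.py -- run once at image build time, e.g. via `RUN python warmup.py`.
    # TORCH_EXTENSIONS_DIR must be set before the import, since the cache
    # location is read when the extension is built.
    import os

    os.environ.setdefault("TORCH_EXTENSIONS_DIR", "/opt/torch_extensions")
    # Building CUDA code without a visible GPU may additionally require
    # TORCH_CUDA_ARCH_LIST (e.g. "8.0;9.0") so nvcc knows which archs to target.

    import xgrammar  # noqa: E402  -- triggers the one-time kernel compilation

    print("JIT cache warmed at", os.environ["TORCH_EXTENSIONS_DIR"])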

@Ubospica
Collaborator

Ubospica commented Apr 13, 2025

I think the latest version should have solved this by making the Triton kernel (instead of the Torch extension) the default. @abhiaagarwal @hnyls2002 Could you check whether the compilation is still slow now?

The kernel is not precompiled because xgrammar needs to support a wide range of backends: https://pypi.org/project/xgrammar/#files. Adding CUDA-version-specific wheels might make the packaging too complex for now. We will consider this feature in the future.

@hnyls2002
Collaborator Author

Sure, I have switched to the Triton kernel in my fork, but even with it the first call still costs about one minute. I have no idea what happens during that first call.
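
For reference, Triton also JIT-compiles each kernel on its first launch and caches the binary (under ~/.triton/cache by default, configurable via TRITON_CACHE_DIR), so a slow first call followed by fast later calls would be consistent with that. A minimal timing sketch, unrelated to xgrammar's actual kernel:

    import time

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_one_kernel(x_ptr, n, BLOCK: tl.constexpr):
        # Each program instance handles one BLOCK-sized chunk of x.
        offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
        mask = offs < n
        tl.store(x_ptr + offs, tl.load(x_ptr + offs, mask=mask) + 1, mask=mask)

    x = torch.zeros(1 << 20, device="cuda")
    grid = (triton.cdiv(x.numel(), 1024),)
    for label in ("first call (Triton compiles)", "second call (cached)"):
        start = time.time()
        add_one_kernel[grid](x, x.numel(), BLOCK=1024)
        torch.cuda.synchronize()
        print(f"{label}: {time.time() - start:.3f}s")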
