
Importing xgrammar is slow; it seems to compile kernels each time #253


Open
hnyls2002 opened this issue Mar 19, 2025 · 3 comments

@hnyls2002
Collaborator

Just importing xgrammar takes about 60 seconds.

The import appears to hang at the kernel-compilation stage:

  File "/venv/lib/python3.8/site-packages/xgrammar/__init__.py", line 1, in <module>
    from . import testing
  File "/venv/lib/python3.8/site-packages/xgrammar/testing.py", line 11, in <module>
    from .matcher import GrammarMatcher, bitmask_dtype
  File "/venv/lib/python3.8/site-packages/xgrammar/matcher.py", line 13, in <module>
    from .kernels import apply_token_bitmask_inplace_kernels
  File "/venv/lib/python3.8/site-packages/xgrammar/kernels/__init__.py", line 12, in <module>
    from .apply_token_bitmask_inplace_cuda import apply_token_bitmask_inplace_cuda
  File "/venv/lib/python3.8/site-packages/xgrammar/kernels/apply_token_bitmask_inplace_cuda.py", line 54, in <module>
    _load_torch_ops()
  File "/venv/lib/python3.8/site-packages/xgrammar/kernels/apply_token_bitmask_inplace_cuda.py", line 42, in _load_torch_ops
    torch.utils.cpp_extension.load_inline(
  File "lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1434, in load_inline
    return _jit_compile(
  File "lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
    _write_ninja_file_and_build_library(
  File "lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "/venv/lib/python3.8/subprocess.py", line 495, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/venv/lib/python3.8/subprocess.py", line 1015, in communicate
    stdout = self.stdout.read()
KeyboardInterrupt
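
For context, torch.utils.cpp_extension.load_inline JIT-compiles the given C++/CUDA sources with ninja on first use and caches the built library under TORCH_EXTENSIONS_DIR (default ~/.cache/torch_extensions), so the first import pays the full compile cost. Below is a minimal standalone sketch of that mechanism; the extension name and C++ body are made up for illustration, and only load_inline itself is the real API:

    import time

    import torch
    import torch.utils.cpp_extension as cpp_ext

    # Hypothetical one-function extension; xgrammar's real sources are larger.
    cpp_source = """
    #include <torch/extension.h>
    torch::Tensor add_one(torch::Tensor x) { return x + 1; }
    """

    for label in ("first load (ninja compiles)", "second load (cache hit)"):
        start = time.time()
        ext = cpp_ext.load_inline(
            name="demo_ext",        # cache key under TORCH_EXTENSIONS_DIR
            cpp_sources=cpp_source,
            functions=["add_one"],  # pybind11 bindings are auto-generated
        )
        print(f"{label}: {time.time() - start:.1f}s")

    print(ext.add_one(torch.zeros(2)))  # tensor([1., 1.])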
@abhiaagarwal

Seems related to #292. I would really like the ability to precompile this extension, as compiling it at import time can be painful inside a Docker container.
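
Until precompiled wheels exist, one workaround sketch (my assumption, not an xgrammar feature) is to trigger the import once during the image build so that containers hit the cached build at runtime. TORCH_EXTENSIONS_DIR is the standard torch env var controlling where JIT-built extensions are cached; the path below is arbitrary:

    # warmup.py -- run once at image build time, e.g. via `RUN python warmup.py`.
    # TORCH_EXTENSIONS_DIR must be set before the import, since the cache
    # location is read when the extension is built.
    import os

    os.environ.setdefault("TORCH_EXTENSIONS_DIR", "/opt/torch_extensions")
    # Building CUDA code without a visible GPU may additionally require
    # TORCH_CUDA_ARCH_LIST (e.g. "8.0;9.0") so nvcc knows which archs to target.

    import xgrammar  # noqa: E402  -- triggers the one-time kernel compilation

    print("JIT cache warmed at", os.environ["TORCH_EXTENSIONS_DIR"])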

@Ubospica
Collaborator

Ubospica commented Apr 13, 2025

I think the latest version should have solved this by making the Triton kernel (instead of the Torch extension) the default. @abhiaagarwal @hnyls2002 Could you check whether the compilation is still slow now?

The kernel is not precompiled because xgrammar needs to support a wide range of backends: https://pypi.org/project/xgrammar/#files. Adding CUDA-version-specific wheels might make the packaging too complex for now. We will consider this feature in the future.

@hnyls2002
Collaborator Author

Sure, I have switched to the Triton kernel in my fork, but even with it the first call still costs about one minute. I have no idea what happens during that first call.
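
For reference, Triton also JIT-compiles each kernel on its first launch and caches the binary (under ~/.triton/cache by default, configurable via TRITON_CACHE_DIR), so a slow first call followed by fast later calls would be consistent with that. A minimal timing sketch, unrelated to xgrammar's actual kernel:

    import time

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_one_kernel(x_ptr, n, BLOCK: tl.constexpr):
        # Each program instance handles one BLOCK-sized chunk of x.
        offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
        mask = offs < n
        tl.store(x_ptr + offs, tl.load(x_ptr + offs, mask=mask) + 1, mask=mask)

    x = torch.zeros(1 << 20, device="cuda")
    grid = (triton.cdiv(x.numel(), 1024),)
    for label in ("first call (Triton compiles)", "second call (cached)"):
        start = time.time()
        add_one_kernel[grid](x, x.numel(), BLOCK=1024)
        torch.cuda.synchronize()
        print(f"{label}: {time.time() - start:.3f}s")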
