I'm not sure if this is a bug or intended behavior.
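The JSON specification forbids unescaped control characters (Unicode characters U+0000 to U+001F) inside strings. For example, the following is not valid JSON (here \t stands for a literal tab character, U+0009, not the two-character escape sequence):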
{"text": "\tab char is illegal here"}
A grammar compiled with GrammarCompiler.compile_builtin_json_grammar() correctly rejects this input, but a grammar compiled with GrammarCompiler.compile_json_schema() accepts it as valid JSON.
Standalone code snippet, tested with xgrammar-0.1.17:
import xgrammar
from transformers import AutoTokenizer, AutoConfig

INVALID_INPUT = '{"text": "\tab char is illegal here"}'


def check_if_rejects_invalid_json(grammar: xgrammar.CompiledGrammar) -> None:
    matcher = xgrammar.GrammarMatcher(grammar, terminate_without_stop_token=True)
    assert not matcher._debug_accept_string(INVALID_INPUT, debug_print=True)


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
    tokenizer_info = xgrammar.TokenizerInfo.from_huggingface(tokenizer)
    grammar_compiler = xgrammar.GrammarCompiler(tokenizer_info)

    # The builtin JSON grammar correctly rejects the INVALID_INPUT
    builtin_json_grammar = grammar_compiler.compile_builtin_json_grammar()
    check_if_rejects_invalid_json(builtin_json_grammar)
    # prints: [11:13:09] /Users/runner/work/xgrammar/xgrammar/cpp/grammar_matcher_base.cc:301: Character 9 "\t" Rejected

    # A grammar compiled from a JSON schema accepts it
    json_schema_grammar = grammar_compiler.compile_json_schema(
        {
            "type": "object",
            "properties": {
                "text": {"type": "string"},
            },
            "required": ["text"],
        }
    )
    check_if_rejects_invalid_json(json_schema_grammar)
    # raises AssertionError