Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: DuckDB reserved words aren't accurately listed in IdentifierPreparer, leading to synax errors. #1190

Open
1 task done
asford opened this issue Dec 31, 2024 · 5 comments · Fixed by #1199 · May be fixed by #1213
Open
1 task done

[Bug]: DuckDB reserved words aren't accurately listed in IdentifierPreparer, leading to synax errors. #1190

asford opened this issue Dec 31, 2024 · 5 comments · Fixed by #1199 · May be fixed by #1213
Assignees
Labels
bug Something isn't working

Comments

@asford
Copy link

asford commented Dec 31, 2024

What happened?

duckdb_engine inherits it's IdentifierPreparer from the postgres dialect, which defines a fixed set of reserved keywords.
Postgres's set of reserved keywords is slightly different than the sql standard and duckdb (ref), and has more non-reserved keywords.

DuckDB's reserved keyworlds are more expansive and are accessible under duckdb_keywords.

If a column name uses a duckdb-reserved but postgres-unreserved keyword as a column name, e.g. "show",
then sqlalchemy won't properly quote the identifier and a semi-opaque syntax error is returned.

This can be patched at runtime via:

duckdb_keywords = set(
    duckdb.execute(
        "select keyword_name from duckdb_keywords() where keyword_category == 'reserved'"
    ).df()["keyword_name"]
)

duckdb_engine.DuckDBIdentifierPreparer.reserved_words = duckdb_keyword

This could (should?) be moved into duckdb_engine.

DuckDB Engine Version

0.14.0

DuckDB Version

1.0

SQLAlchemy Version

1.4.41

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@asford asford added the bug Something isn't working label Dec 31, 2024
@Mause
Copy link
Owner

Mause commented Jan 10, 2025

Hey there, thanks for reporting this! I've released a fix in 0.14.2

Feel free to let me know if you find further issues!

@johnallen3d
Copy link

@Mause - I think this change is causing issues in applications using some form of concurrency. In our case we started experiencing crashes of Gunicorn workers in Superset (using stock configuration) when using duckdb-engine version 0.14.2.

apache/superset#31978 (comment)

@Mause
Copy link
Owner

Mause commented Jan 29, 2025

Well that isn't good - do you have a code sample I could use to reproduce this @johnallen3d ?

@Mause Mause reopened this Jan 29, 2025
@johnallen3d
Copy link

I have minimal Python experience @Mause but I've managed to recreate what I think is happening within these Superset/Gunicorn requests in a simple script. This causes crashes in the majority of execution:

import threading
from sqlalchemy import create_engine, text

_thread_local = threading.local()

def get_engine():
    if not hasattr(_thread_local, "engine"):
        _thread_local.engine = create_engine("duckdb:///:memory:")
    return _thread_local.engine

def query():
    engine = get_engine()
    with engine.connect() as conn:
        result = conn.execute(text("SELECT version();"))
        print(result.scalar())

threads = [threading.Thread(target=query) for _ in range(10)]

for t in threads:
    t.start()

for t in threads:
    t.join()

If I directly modify the code inside of my local duckdb-engine __init__.py to use duckdb.cursor().execute() then this script works fine. I'm not sure if cursor() is technically the correct usage though.

@asford
Copy link
Author

asford commented Jan 30, 2025

Oof, sorry for the bad beta on that.

The duckdb docs on multi threaded operations specifically calls out that different threads need to use .cursor to open separate handles to the shared db.

https://duckdb.org/docs/guides/python/multiple_threads.html

So this solution is likely correct & safe. The content returned here doesn't depend on the database, so there's no problem with executing against the default db.

johnallen3d added a commit to johnallen3d/duckdb_engine that referenced this issue Jan 30, 2025
@johnallen3d johnallen3d linked a pull request Jan 30, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
3 participants