-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Dune Table Create/Insert #129
base: main
Are you sure you want to change the base?
Conversation
8ef35c4
to
a193538
Compare
There was an inconsistency with the types returned from postgres and those being passed in. PG.fetch returned `DataFrame` but PG.save expected `TypedDataFrame` Resulted in a bug where data.is_empty did not exist. This PR aligns the types naively, but this will all be improved in a follow up #129 (which assigns proper/non-empty types) to Postgres dataframes.
a193538
to
5b956bf
Compare
So, this has been a draft for a while because of some big changes it contains. Namely we utilize the create-table and insert-data endpoints documented here: https://docs.dune.com/api-reference/tables/endpoint/overview Technically, this already provides an equivalent replacement for upload csv with the added benefit of having types. However, naively calling create table before insert data wipes the table (this is what makes it equivalent to upload csv). Dune doesn't offer an explicit endpoint to check if a table exists. We could make a cheeky work-around that queries cc @fleupold |
Could you outline how this new mechanism would be used in a config file? I feel like the semantics should be similar to the current postgres datasource, where we also have optional replace logic. |
Update: Dune's API docs are not quite accurate.
Reworking some of the logic to invoke
We could resolve this by introducing some polling/wait for table existence/non-existence before continue. However, this seems like a hack. After polling for existence we find a further inconsistency regarding false positive of table existence. That is:
Its beginning to feel like we may want to mark this feature as "unstable"... |
if "." not in table_name: | ||
raise ValueError("Table name must be in the format namespace.table_name") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That the end to end test is expected to fail because it is reading a remote file from main. Once this PR lands. The file will be updated and the e2e test will pass again.
One idea that could simplify this dramatically is to leave the user in charge of "create_table". Then
This would be aligned the postgres functionality and eliminate all this create/delete logic. Note that, if the table schema changes, the user would have to call "delete" and "create" again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Big stuff, cool stuff! 🎉
Using Dune Table Endpoints
create
andinsert
in favour ofupload_csv
brings type-consistency to the data columns.We introduce
TableExistsPolicy
to Dune Destinations with support forappend
andreplace
. This logic is hinged on the "skip_create" function which checks if "append" is the desired insertion type and "table_exists". Table existence is determined by querying the table. When it does not exist, the query execution fails (this error is caught and false is returned).Closes #127
Closes #69
Special Notes
The scope of Dune types supported is limited to the following:
https://docs.dune.com/api-reference/tables/endpoint/create#body-schema-type
This means that JSON document upload would have to utilize
upload_csv
...Test Plan
Tests are modified to adapt to the new constraints (e.g. table_name must be in the form username.table_name)
New tests introduced covering the new code paths.
This can be tested end to end with
python -m src.main --jobs p2d-test
. See the resulting table here - it is being appended to.