Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Dune Table Create/Insert #129

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

[WIP] Dune Table Create/Insert #129

wants to merge 8 commits into from

Conversation

bh2smith
Copy link
Owner

@bh2smith bh2smith commented Dec 21, 2024

Using Dune Table Endpoints create and insert in favour of upload_csv brings type-consistency to the data columns.

We introduce TableExistsPolicy to Dune Destinations with support for append and replace. This logic is hinged on the "skip_create" function which checks if "append" is the desired insertion type and "table_exists". Table existence is determined by querying the table. When it does not exist, the query execution fails (this error is caught and false is returned).

Closes #127

Closes #69

Special Notes

The scope of Dune types supported is limited to the following:
https://docs.dune.com/api-reference/tables/endpoint/create#body-schema-type

This means that JSON document upload would have to utilize upload_csv...

Test Plan

  1. Tests are modified to adapt to the new constraints (e.g. table_name must be in the form username.table_name)

  2. New tests introduced covering the new code paths.

  3. This can be tested end to end with python -m src.main --jobs p2d-test. See the resulting table here - it is being appended to.

@bh2smith bh2smith changed the title Dune Table Create/Insert [WIP] Dune Table Create/Insert Dec 21, 2024
@bh2smith bh2smith force-pushed the 127/dune-insert branch 2 times, most recently from 8ef35c4 to a193538 Compare December 22, 2024 01:20
@bh2smith bh2smith requested a review from mooster531 December 22, 2024 01:21
bh2smith added a commit that referenced this pull request Dec 25, 2024
There was an inconsistency with the types returned from postgres and
those being passed in.

PG.fetch returned `DataFrame`
but 
PG.save expected `TypedDataFrame`

Resulted in a bug where data.is_empty did not exist. 

This PR aligns the types naively, but this will all be improved in a
follow up #129 (which assigns proper/non-empty types) to Postgres
dataframes.
@bh2smith
Copy link
Owner Author

bh2smith commented Jan 7, 2025

So, this has been a draft for a while because of some big changes it contains. Namely we utilize the create-table and insert-data endpoints documented here: https://docs.dune.com/api-reference/tables/endpoint/overview

Technically, this already provides an equivalent replacement for upload csv with the added benefit of having types.

However, naively calling create table before insert data wipes the table (this is what makes it equivalent to upload csv). Dune doesn't offer an explicit endpoint to check if a table exists. We could make a cheeky work-around that queries select count(*) from table_name to check for existence.

cc @fleupold

@fleupold
Copy link
Contributor

fleupold commented Jan 8, 2025

Could you outline how this new mechanism would be used in a config file?

I feel like the semantics should be similar to the current postgres datasource, where we also have optional replace logic.

@bh2smith
Copy link
Owner Author

bh2smith commented Jan 8, 2025

Could you outline how this new mechanism would be used in a config file?

I would assume that the config file would remain essentially the same except that now the parameters "append" "replace" (analogous to those for postgres) would now work for Dune Destinations. However, I don't expect we will get the specialized insert_ignore, insert_update to come along with it. We would no longer need upload_csv because this functionality is covered by "replace".

These would work as follows:

  • append: check for table existence with cheeky workaround (select from table) to check for existence and only call insert.
  • replace: call create (dropping the old table) and then insert.

This is all based on the following Dune API fact:

"create" endpoint has this property: "If a table already exists with the same name, the request will fail."

Screenshot 2025-01-08 at 13 39 43

@bh2smith bh2smith marked this pull request as ready for review January 11, 2025 11:17
@bh2smith
Copy link
Owner Author

bh2smith commented Jan 11, 2025

Update: Dune's API docs are not quite accurate.
An attempt to Create Table when Exists will return

  1. {'error': 'This table already exists'} when the schema declaration is different from the existing.
  2. {'already_existed': True, 'message': 'Table already existed and matched the request'} when the schema matches.

Reworking some of the logic to invoke delete_table reveals that there are some latency issues on the Dune side. Namely.

  • table_exists (determined by querying the table name) will still yield true for a while after the table has been Deleted.
  • insert can't be called immediately after create table (presumably because the table's creation has not yet been propagated throughout the system).

We could resolve this by introducing some polling/wait for table existence/non-existence before continue. However, this seems like a hack.

After polling for existence we find a further inconsistency regarding false positive of table existence. That is:

  1. create table and wait for table_exists to be true.
  2. Attempt to insert, "table doesn't exist"...

Its beginning to feel like we may want to mark this feature as "unstable"...

Comment on lines +45 to +46
if "." not in table_name:
raise ValueError("Table name must be in the format namespace.table_name")
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That the end to end test is expected to fail because it is reading a remote file from main. Once this PR lands. The file will be updated and the e2e test will pass again.

@bh2smith
Copy link
Owner Author

One idea that could simplify this dramatically is to leave the user in charge of "create_table". Then

  • append would merely call insert.
  • replace would call "clear_data" followed by insert.

This would be aligned the postgres functionality and eliminate all this create/delete logic.

Note that, if the table schema changes, the user would have to call "delete" and "create" again.

Copy link
Collaborator

@mooster531 mooster531 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big stuff, cool stuff! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistent type inference Dune Append Seems Possible
3 participants