Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

source-redshift-batch: Code health improvements #2432

Merged
merged 8 commits into from
Feb 24, 2025

Conversation

willdonnelly
Copy link
Member

@willdonnelly willdonnelly commented Feb 20, 2025

Description:

As with source-postgres-batch and source-mysql-batch, this PR is mostly a bunch of test suite improvements along with the change to omit query templates from discovered bindings by default in favor of letting the connector pick the appropriate default query when unspecified.

I've also added CI tests for source-redshift-batch which simply launch a local Postgres instance to test against. While Redshift is different from Postgres in some important ways, it's still close enough that our test suite passes when run against Postgres and if it's a choice between "no tests in CI builds" and "somewhat unrepresentative tests in CI builds" the latter is definitely preferable.


This change is Reviewable

As with all the other batch SQL captures, omit the default query
template and instead retcon that resource property as an "override",
meanwhile adding new schema/table name properties and using those
by default in discovered bindings.
Moves around and renames the test suite helper functions to
match what source-postgres-batch has. This adds the table name
generation functions but doesn't use them since that would alter
the test snapshots too.
This changes the names of all test tables but means less manual
fiddling around in the table setup, a net win.
Redshift is kind of PostgreSQL and kind of not (for instance,
Redshift has a completely different columnar storage engine and
doesn't expose an XMIN system column), but it's close enough that
our test suite passes when run against Postgres.

Since it's tricky to test against Redshift in CI builds, we might
as well do the next-best thing and test against a local Postgres
instance since practically speaking the alternative is no CI tests
at all.
@willdonnelly willdonnelly added the change:planned This is a planned change label Feb 20, 2025
@willdonnelly willdonnelly requested a review from a team February 20, 2025 21:54
Copy link
Member

@williamhbaker williamhbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

We do use an actual redshift (serverless) application for CI tests wtih materialize-redshift, see this config. Not sure if that's worth trying to work in for the redshift batch capture tests now or in the future.

@willdonnelly
Copy link
Member Author

willdonnelly commented Feb 21, 2025

We do use an actual redshift (serverless) application for CI tests wtih materialize-redshift

Oh thanks, somehow I missed that. I'll see if I can get that working for source-redshift-batch as well, testing against the actual system is definitely preferable if it's not terribly complicated.

Adds SOPS-encrypted configs and tweaks the test logic to use them
for connecting to a test database. There are configs for testing
against a real Redshift instance and against a local Postgres
instance, but by default Redshift will be used.

The credentials plumbing so that SOPS can be run within a test
to decrypt the configs is nontrivial.
@willdonnelly willdonnelly force-pushed the wgd/redshift-batch-code-health-20250218 branch from 96a0b48 to 62bab10 Compare February 21, 2025 21:36
@willdonnelly
Copy link
Member Author

willdonnelly commented Feb 21, 2025

PTAL. I've added another commit which gets the full connector test suite running against that same Redshift instance as part of CI builds.

The setup for that felt a bit complicated. Adding encrypted config files and rewriting the test helper logic to sops-decrypt those encrypted configs was straightforward enough, but:

  • I had to add an extra CI step for running Go tests outside of a Docker build (because plumbing GCP KMS credentials into the Docker build was complicated and I didn't feel 100% confident it wouldn't manage to leak them somehow)
  • I'm not sure how exactly SOPS gets the appropriate credentials during the integration test setup, but I had to add a bit of logic to write them into a tempfile and set GOOGLE_APPLICATION_CREDENTIALS to get it working here. Possibly there's a simpler solution that I missed?

@williamhbaker
Copy link
Member

I'm also not precisely sure how the materialization connector integration tests get the GCP credentials where they need to be, which is here where the configs are decrypted using the sops command directly, unless its via the specific name of the environment variable being what sops expects it to be (GCP_SERVICE_ACCOUNT_KEY).

@willdonnelly
Copy link
Member Author

Yeah, I just tried doing the dumb thing and leaving it unset and apparently that's all it takes, so I guess SOPS (or rather, the GCP client library it uses, because I checked and the SOPS repo has no mentions of that environment variable in the code) will indeed take it directly from GCP_SERVICE_ACCOUNT_KEY.

I thought I had tried that already, but I guess maybe I screwed something up when I did that the first time (in the context of making it work via the Dockerfile before I gave up on that, so maybe that was part of the problem) because it seems like it's working now.

@willdonnelly willdonnelly merged commit d2a7998 into main Feb 24, 2025
52 of 56 checks passed
@willdonnelly willdonnelly deleted the wgd/redshift-batch-code-health-20250218 branch February 24, 2025 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
change:planned This is a planned change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants