From 05e110420a48c6ef8487635824b0e80ee7ccbe0b Mon Sep 17 00:00:00 2001
From: Mahdi Dibaiee
Date: Tue, 25 Feb 2025 12:34:36 +0000
Subject: [PATCH] docs: add Oracle Batch Query connector docs

Inspired by postgres docs
---
 .../capture-connectors/OracleDB/flashback.md  |  1 -
 .../OracleDB/oracle-batch.md                  | 29 +++++++++++++++++++
 2 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 site/docs/reference/Connectors/capture-connectors/OracleDB/oracle-batch.md

diff --git a/site/docs/reference/Connectors/capture-connectors/OracleDB/flashback.md b/site/docs/reference/Connectors/capture-connectors/OracleDB/flashback.md
index f0bcc561aa..61d7b99643 100644
--- a/site/docs/reference/Connectors/capture-connectors/OracleDB/flashback.md
+++ b/site/docs/reference/Connectors/capture-connectors/OracleDB/flashback.md
@@ -1,4 +1,3 @@
-
 # OracleDB (Flashback)
 
 This connector captures data from OracleDB into Flow collections using [Oracle Flashback](https://www.oracle.com/database/technologies/flashback/).
diff --git a/site/docs/reference/Connectors/capture-connectors/OracleDB/oracle-batch.md b/site/docs/reference/Connectors/capture-connectors/OracleDB/oracle-batch.md
new file mode 100644
index 0000000000..67966bf819
--- /dev/null
+++ b/site/docs/reference/Connectors/capture-connectors/OracleDB/oracle-batch.md
@@ -0,0 +1,29 @@
+# Oracle Batch Query Connector
+
+This connector captures data from Oracle into Flow collections by periodically
+executing queries and translating the results into JSON documents.
+
+For local development or open-source workflows, [`ghcr.io/estuary/source-oracle-batch:dev`](https://ghcr.io/estuary/source-oracle-batch:dev) provides the latest version of the connector as a Docker image. You can also follow the link in your browser to see past image versions.
+
+We recommend using our [Oracle CDC Connector](http://go.estuary.dev/source-oracle) instead
+if possible. Using CDC provides lower-latency data capture, delete and update events, and usually
+has a smaller impact on the source database.
+
+However, there are some circumstances where this might not be feasible. Perhaps you need
+to capture from a managed Oracle instance which doesn't support LogMiner, or one with resource constraints
+which make LogMiner unviable, or perhaps you need to capture the contents of a view or the result of an ad-hoc query.
+That's the sort of situation this connector is intended for.
+
+The number one caveat you need to be aware of when using this connector is that **it will
+periodically execute its update query over and over**. At the default polling interval of
+5 minutes, a naive `SELECT * FROM foo` query against a 100 MiB view will produce roughly 30 GiB/day
+of ingested data, most of it duplicated.
+
+This is why the connector's autodiscovery logic only returns ordinary tables of data, because
+in that particular case we can use the `ORA_ROWSCN` system column as a cursor and ask the database
+to `SELECT ORA_ROWSCN, foo.* FROM foo WHERE ORA_ROWSCN > $1;`.
+
+If you start editing these queries or manually adding capture bindings for views or to run
+ad-hoc queries, you need to either have some way of restricting the query to "just the new
+rows since last time" or else have your polling interval set high enough that the data rate
+`<DatasetSize> / <PollingInterval>` is an amount of data you're willing to deal with.
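
The data-rate caveat in the new page can be sanity-checked with a little arithmetic. The following is a minimal sketch, not part of the connector: the function name `daily_ingest_bytes` and the example intervals are hypothetical, and it assumes every poll re-reads the full dataset (i.e. the query has no cursor).

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
SECONDS_PER_DAY = 86_400


def daily_ingest_bytes(dataset_bytes: int, polling_interval_seconds: int) -> float:
    """Bytes captured per day when each poll re-reads the whole dataset.

    Hypothetical helper for illustration only; assumes a cursor-less query.
    """
    if polling_interval_seconds <= 0:
        raise ValueError("polling interval must be positive")
    polls_per_day = SECONDS_PER_DAY / polling_interval_seconds
    return dataset_bytes * polls_per_day


if __name__ == "__main__":
    # Compare a few polling intervals for a 100 MiB view.
    for label, seconds in [("5 minutes", 300), ("1 hour", 3_600), ("24 hours", 86_400)]:
        gib_per_day = daily_ingest_bytes(100 * MIB, seconds) / GIB
        print(f"{label:>9}: {gib_per_day:.2f} GiB/day")
```

For the 100 MiB view polled every 5 minutes this works out to 288 polls and about 28 GiB per day, in line with the approximate figure quoted in the docs. Raising the interval to 24 hours brings the same query down to a single 100 MiB read per day.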