
SDK and CLI for CrateDB Cloud Cluster APIs #81


Draft · wants to merge 48 commits into base branch main
Conversation


@amotl amotl commented Nov 14, 2023

About

Wraps a few croud calls into a different kind of API, mostly for managing clusters. It is in the same spirit as, and builds upon, GH-73.

Documentation

https://cratedb-toolkit--81.org.readthedocs.build/cluster/
https://cratedb-toolkit--81.org.readthedocs.build/util/shell.html

Configuration

Configuration works by defining a few environment variables. For convenience, put them within a dotenv file (.env).

CRATEDB_CLUSTER_NAME=Hotzenplotz
CRATEDB_USERNAME='admin'
CRATEDB_PASSWORD='H3IgNXNvQBJM3CiElOiVHuSp6CjXMCiQYhB4I9dLccVHGvvvitPSYr1vTpt4'

Python SDK

Deploy/start/resume a cluster, run a database workload, and suspend the cluster again.

from pprint import pprint
from cratedb_toolkit import ManagedCluster

# Acquire database cluster handle, obtaining cluster identifier
# or name from the user's environment.
cluster = ManagedCluster.from_env().start()

# Run database workload.
results = cluster.query("SELECT * from sys.summits LIMIT 2;")
pprint(results)

# Suspend cluster.
cluster.suspend()

See also examples/python/cloud_cluster.py and examples/python/cloud_import.py.

CLI interface

Deploy a cluster, import data, and query a few sample records.

ctk cluster start
ctk load table "https://cdn.crate.io/downloads/datasets/cratedb-datasets/machine-learning/timeseries/nab-machine-failure.csv"
ctk shell --command 'SELECT * FROM "nab-machine-failure" LIMIT 10;'

See also examples/shell/cloud_cluster.sh and examples/shell/cloud_import.sh.

References

Backlog

  • Software tests
  • Documentation

Trivia

Favorite poems

In the toolkit’s warren, new features abound,
With clusters managed, imports now sound.
CLI and shell, both smarter and spry,
Python and notebooks reach for the sky.
Docker skips gently if it’s not around,
While tests and examples in harmony are found.
Oh, what a hop for this code-loving bunny!
Data to the cloud—smooth, fast, and sunny!

A rabbit hopped to cloud and sky,
With toolkit new, it aimed to try:
"Start a cluster, import with flair,
Use Python, shell, or notebook there.
Settings checked, containers spun,
Tests and docs—oh what fun!
Cloud and code now leap as one! 🐇☁️

In the cloud and on the ground,
Rabbits hop where clusters are found.
With toolkit new and CLI bright,
They probe, deploy, and import with might.
From shell to Python, tests abound,
Data hops in—jobs resound!
🐇✨

🐇
A toolkit grows with clusters grand,
Cloud or local, at your command.
With settings checked and errors clear,
New guides and tests are gathered here.
Import, deploy, or query deep,
The rabbit’s promise: data you’ll keep!
Hopping forward, code refined,
Cloud and shell, now intertwined.


@amotl amotl force-pushed the amo/cloud-second branch 3 times, most recently from 89f8c3a to f794fd9 Compare November 15, 2023 12:19
@amotl amotl marked this pull request as ready for review November 16, 2023 13:14
Comment on lines 146 to 154
# FIXME: Fix `croud clusters deploy`.
# It yields *two* payloads to stdout, making it
# unusable in JSON-capturing situations.
# The main advantage of the `JSONDecoder` class is that it also provides
# a `.raw_decode` method, which will ignore extra data after the end of the JSON.
# https://stackoverflow.com/a/75168292
payload = wr.invoke()
decoder = json.JSONDecoder()
data = decoder.raw_decode(payload)
Observation

This works around a minor flaw of croud, which emits its JSON payload to stdout twice, rendering the output unparseable.

Suggestion

croud should be changed to emit only a single JSON payload to stdout when invoking croud clusters deploy.
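The workaround above relies on `json.JSONDecoder.raw_decode`, which returns an `(object, end_index)` tuple and ignores any trailing data. A minimal, self-contained illustration of that behavior on a doubled payload (the payload content is invented for illustration):

```python
import json

# Simulate croud emitting the same JSON payload twice to stdout.
payload = '{"cluster": "Hotzenplotz"}\n{"cluster": "Hotzenplotz"}'

decoder = json.JSONDecoder()
# `raw_decode` parses the first JSON document and reports where it ended,
# ignoring the duplicate trailing data.
data, end = decoder.raw_decode(payload)
```

Note that `raw_decode` returns a tuple, so callers need to unpack it rather than use the return value directly as the decoded object.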

@amotl amotl Apr 26, 2025

We submitted an upstream fix about this obstacle.

Comment on lines 66 to 86
def fix_job_info_table_name(self):
"""
Adjust the fully qualified table name by adding appropriate quotes.
Fixes a minor flaw of the upstream API.
Currently, the API returns `testdrive.pems-1`, which cannot be used
as-is, because it is not properly quoted.
So, converge the table name into `"testdrive"."pems-1"` manually, for a
fully qualified representation.
FIXME: Remove after upstream has fixed the flaw.
"""
job_info = self.info
if "destination" in job_info and "table" in job_info["destination"]:
table = job_info["destination"]["table"]
if '"' not in table and "." in table:
schema, table = table.split(".")
table = f'"{schema}"."{table}"'
job_info["destination"]["table"] = table
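The quoting logic from the workaround above can be condensed into a standalone helper; this is a sketch for illustration, and the function name is made up here:

```python
def quote_table_name(table: str) -> str:
    """Convert an unquoted `testdrive.pems-1` into `"testdrive"."pems-1"`."""
    # Only rewrite unquoted, schema-qualified names, mirroring the
    # conditional in `fix_job_info_table_name` above.
    if '"' not in table and "." in table:
        schema, name = table.split(".", 1)
        return f'"{schema}"."{name}"'
    return table
```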

Observation

This works around another minor upstream flaw.

Suggestion

The API should return schema and table names as separate attributes (preferred), or at least quote the value of the existing destination.table attribute so that it can be reused without further ado.

Example

{
  "destination": {
    "table": "\"testdrive\".\"pems-1\""
  }
}


We created an upstream ticket about this observation.


Comment on lines 67 to 97
def configure_croud(no_spinner: bool = None, use_spinner: bool = None):
"""
Turn off croud's Halo spinner when running in Jupyter Notebooks. It does not work well.
- https://github.com/ManrajGrover/halo/issues/32
- https://github.com/manrajgrover/halo/issues/179
"""
if no_spinner or ((CONFIG.RUNNING_ON_JUPYTER or CONFIG.RUNNING_ON_PYTEST) and not use_spinner):
mod = types.ModuleType(
"croud.tools.spinner", "Mocking the croud.tools.spinner module, to turn off the Halo spinner"
)
setattr(mod, "HALO", NoopContextManager()) # noqa: B010
sys.modules["croud.tools.spinner"] = mod


class NoopContextManager:
"""
For making the Halo progressbar a no-op.
"""

def __init__(self, *args, **kwargs):
pass

def __enter__(self):
pass

def __exit__(self, exc_type, exc_value, exc_traceback):
pass

def stop(self):
pass
@amotl amotl Nov 17, 2023

Observation

We had to turn off the Halo spinner currently used by croud, because it did not work well within a Jupyter Notebook environment. We are observing exactly those issues, despite the former one being officially resolved. Apparently, the problem came back.

Suggestion

Submit a patch to croud so it only uses interactivity when is_tty() is True, or similar. At the very least, do not start the Halo spinner at module scope; initialize and configure it at runtime instead.
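A sketch of the suggested runtime gating, assuming a Halo-like spinner interface; the helper name and fallback class are hypothetical, not croud's actual API:

```python
import sys


class NoopSpinner:
    """No-op stand-in exposing the context-manager surface of a Halo spinner."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        return False

    def stop(self):
        pass


def make_spinner(text: str = "Working"):
    # Only use an interactive spinner when attached to a real terminal.
    if sys.stdout.isatty():
        try:
            from halo import Halo  # Optional dependency.
            return Halo(text=text)
        except ImportError:
            pass
    return NoopSpinner()
```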

@amotl amotl Apr 26, 2025

We've reported this to the issue tracker of the upstream project.

@amotl amotl force-pushed the amo/cloud-second branch 3 times, most recently from 062ee9b to 7d89748 Compare November 18, 2023 01:56
Comment on lines +18 to +19
# Log in to CrateDB Cloud.
croud login --idp azuread
@amotl amotl Nov 20, 2023

Observation

There is an alternative, headless way of doing that: using croud config show, you can display the croud configuration in YAML format.

current-profile: cratedb.cloud
default-format: table
profiles:
  cratedb.cloud:
    auth-token: xxxxxxxxxx
    endpoint: https://console.cratedb.cloud
    key: REDACTED
    organization-id: null
    region: _any_
    secret: xxxxxxxxxx

If you fill in the key and secret values, obtained by creating an API key with croud api-keys create 1, operations like croud clusters list will start working without further ado, even after logging out again using croud logout.

It works well. Thanks, @proddata.

Suggestion

Based on those insights, improve the SDK correspondingly, by also accepting environment variables CRATEDB_CLOUD_API_KEY and CRATEDB_CLOUD_API_SECRET.

Footnotes

  1. Alternatively, you can obtain an API key on your account page, at the "API keys" section. -- https://console.cratedb.cloud/account/settings


We've implemented headless/unattended operations in croud and ctk. Thanks again.

@property
def headless_config(self) -> Configuration:
tmp_file = NamedTemporaryFile()
tmp_path = Path(tmp_file.name)
config = Configuration("headless.yaml", tmp_path)
config.profile["key"] = os.environ.get("CRATEDB_CLOUD_API_KEY")
config.profile["secret"] = os.environ.get("CRATEDB_CLOUD_API_SECRET")
config.profile["organization-id"] = os.environ.get("CRATEDB_CLOUD_ORGANIZATION_ID")
return config


I think the interface could be further improved by using the new JWT authentication mechanism instead of traditional credentials.

@amotl amotl Apr 26, 2025

Added the topic about using JWT to the backlog, to be processed in a later iteration.

@amotl amotl May 2, 2025

Fluent authentication using the provided per-cluster JWT token has been implemented per 72d3fd9.

amotl added 25 commits May 4, 2025 15:02
... likewise, rename `--cratedb-http-url` to `--http-url`.
- `CRATEDB_CLOUD_CLUSTER_ID` to `CRATEDB_CLUSTER_ID`
- `CRATEDB_CLOUD_CLUSTER_NAME` to `CRATEDB_CLUSTER_NAME`
The current set of input parameters for addressing a database cluster,
`cluster_id`, `cluster_name`, `sqlalchemy_url`, and `http_url`, needed
a few bits of bundling, so they are now backed by a real model class.

Likewise, the `ManagedCluster` and `StandaloneCluster` classes now
provide the `adapter` property, which will return a suitable
`DatabaseAdapter` instance. This significantly connects the dots for a
better composite object model.

In this spirit, the `DatabaseCluster` class emerges as a new entry point
to acquire a database cluster handle universally.
... for addressing a database cluster by URL. The new option accepts
URLs in both SQLAlchemy and HTTP formats.
@amotl amotl force-pushed the amo/cloud-second branch from db1e1c5 to d92d5c0 Compare May 4, 2025 13:02
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 11

♻️ Duplicate comments (6)
cratedb_toolkit/__init__.py (1)

14-16: Consider making initialization explicit instead of running at import time

Running preconfigure() during module import can lead to unexpected side effects, especially in testing environments. Consider making this configurable or providing a way to disable this behavior.

#!/bin/bash
# Check how preconfigure is implemented and what it modifies
rg -A 10 "def preconfigure" --type py
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 14-14: cratedb_toolkit/init.py#L14
Added line #L14 was not covered by tests


[warning] 16-16: cratedb_toolkit/init.py#L16
Added line #L16 was not covered by tests

cratedb_toolkit/io/cli.py (1)

39-45: ⚠️ Potential issue

Function signature updated but empty string handling needs attention.

The function parameters have been updated to reflect the new unified approach to cluster addressing, but there's a potential issue with empty strings being treated as valid values.

Empty strings ("") passed as CLI arguments for cluster_id or cluster_name could lead to validation issues. Consider normalizing empty strings to None:

def load_table(
    ctx: click.Context,
    url: str,
    cluster_id: str,
    cluster_name: str,
    cluster_url: str,
    schema: str,
    table: str,
    format_: str,
    compression: str,
    transformation: t.Union[Path, None],
):
+    # Normalize empty strings to None
+    cluster_id = cluster_id if cluster_id else None
+    cluster_name = cluster_name if cluster_name else None
+    cluster_url = cluster_url if cluster_url else None
doc/cluster/backlog.md (1)

35-41: 🛠️ Refactor suggestion

Missing resolution action for concurrent resume operations.

The issue of concurrent cluster resume operations is described, but the proposed solution ends with "=>" without clearly stating what action should be taken.

  => Evaluate `cloud.info.last_async_operation.status` = `SENT|IN_PROGRESS`.

  See also https://github.com/crate/cratedb-toolkit/actions/runs/14682363239/job/41206608090?pr=81#step:5:384.
+  => Add retry mechanism with exponential backoff when this error is detected, or
+     provide clear error messages to users about the need to wait for the existing operation to complete.
cratedb_toolkit/io/croud.py (2)

162-163: Avoid mutating TableAddress in-place

The code modifies target.table in-place, which could be unexpected for callers that might use the original object again. Consider creating a new TableAddress object with the modified table name.

# Honor `schema` argument.
if target.schema is not None:
-    target.table = f'"{target.schema}"."{target.table}"'
+    quoted_table = f'"{target.schema}"."{target.table}"'
+    # Create a new target object with the quoted table name
+    target = dataclasses.replace(target, table=quoted_table)
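The non-mutating variant suggested above can be verified in isolation: `dataclasses.replace` returns a copy and leaves the original untouched. The `TableAddress` shape here is assumed to be a plain dataclass for illustration:

```python
import dataclasses


@dataclasses.dataclass
class TableAddress:
    schema: str
    table: str


target = TableAddress(schema="testdrive", table="pems-1")
# Build a new object carrying the quoted name; `target` stays unchanged.
quoted = dataclasses.replace(target, table=f'"{target.schema}"."{target.table}"')
```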

112-117: Consider using exponential backoff for retries

The retry loop uses fixed parameters (max_retries=20, retry_delay=0.15) which may still be too short for real-world cloud lag. Cloud operations can sometimes take longer than expected to appear.

Consider implementing an exponential backoff strategy:

-for _ in range(max_retries):
-    job = self.find_job(job_id=job_id)
-    if job.found:
-        break
-    time.sleep(retry_delay)
+for attempt in range(max_retries):
+    job = self.find_job(job_id=job_id)
+    if job.found:
+        break
+    # Use exponential backoff with initial delay
+    sleep_time = retry_delay * (2 ** attempt)
+    time.sleep(sleep_time)
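The backoff pattern in the diff above, extracted into a small reusable helper; this is a sketch, and the real retry loop in croud.py polls find_job rather than a generic predicate:

```python
import time


def wait_until(predicate, max_retries: int = 6, base_delay: float = 0.15) -> bool:
    """Poll `predicate` with exponential backoff; return True once it holds."""
    for attempt in range(max_retries):
        if predicate():
            return True
        # Delay doubles on each attempt: 0.15s, 0.3s, 0.6s, ...
        time.sleep(base_delay * (2 ** attempt))
    return False
```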
cratedb_toolkit/util/croud.py (1)

242-257: 🛠️ Refactor suggestion

Add checks for required credentials in headless_config

The headless configuration doesn't verify if required credentials are set before creating the configuration, which could lead to authentication failures.

@classproperty
def get_headless_config(cls) -> Configuration:
    cfg = Configuration("croud.yaml")
    if cfg._file_path.exists() and "CRATEDB_CLOUD_API_KEY" not in os.environ:
        return cfg

    tmp_file = NamedTemporaryFile()
    tmp_path = Path(tmp_file.name)
    config = Configuration("headless.yaml", tmp_path)

    # Get credentials from the environment.
    config.profile["key"] = os.environ.get("CRATEDB_CLOUD_API_KEY")
    config.profile["secret"] = os.environ.get("CRATEDB_CLOUD_API_SECRET")
    config.profile["organization-id"] = os.environ.get("CRATEDB_CLOUD_ORGANIZATION_ID")
    # config.profile["endpoint"] = os.environ.get("CRATEDB_CLOUD_ENDPOINT")  # noqa: ERA001

+    # Check if required credentials are available
+    if not config.profile["key"] or not config.profile["secret"]:
+        logger.warning("Missing required cloud credentials: CRATEDB_CLOUD_API_KEY and/or CRATEDB_CLOUD_API_SECRET")
+        
    return config
🧹 Nitpick comments (24)
cratedb_toolkit/testing/testcontainers/util.py (2)

141-162: Robust error handling for Docker unavailability.

This new DockerSkippingContainer class elegantly converts Docker connection errors into pytest skips, making tests fail gracefully when Docker isn't available. The early initialization of _container prevents potential issues in the parent's destructor.

However, there's a TODO at line 155 to synchronize error detection with PytestTestcontainerAdapter. Consider creating a shared helper function for Docker error detection to avoid duplication and ensure consistent behavior.

+def is_docker_connection_error(ex: DockerException) -> bool:
+    """Check if the exception indicates Docker daemon is unavailable."""
+    return any(token in str(ex) for token in (
+        "Connection aborted", 
+        "Error while fetching server API version"
+    ))

 class DockerSkippingContainer(DockerContainer):
     # ...
     except DockerException as ex:
-            if any(token in str(ex) for token in ("Connection aborted", "Error while fetching server API version")):
+            if is_docker_connection_error(ex):
                 # ...

 class PytestTestcontainerAdapter:
     # ...
     except DockerException as ex:
-            if any(token in str(ex) for token in ("Connection aborted", "Error while fetching server API version")):
+            if is_docker_connection_error(ex):
                 # ...
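The shared helper proposed in the diff can be exercised on plain strings. This variant takes the exception message directly instead of a DockerException, so it stays self-contained:

```python
def is_docker_connection_error(message: str) -> bool:
    """Check whether an exception message indicates an unavailable Docker daemon."""
    tokens = ("Connection aborted", "Error while fetching server API version")
    return any(token in message for token in tokens)
```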

192-196: Consider consistent error messages between container classes.

The skip reason message is slightly different between DockerSkippingContainer ("Docker is not running") and PytestTestcontainerAdapter ("Docker daemon is not available"). Consider standardizing these messages for consistency.

-                reason="Skipping test because Docker daemon is not available", allow_module_level=True
+                reason="Skipping test because Docker is not running", allow_module_level=True
cratedb_toolkit/util/setting.py (1)

93-93: Improve readability of the guidance message

The guidance string might display raw Python list representations, making it harder for users to understand the available options.

Consider formatting the lists for better readability:

-    guidance = f"Use one of the CLI argument {parameter_names} or environment variable {environment_variables}"
+    formatted_params = ", ".join(parameter_names)
+    formatted_envvars = ", ".join(environment_variables)
+    guidance = f"Use one of the CLI arguments ({formatted_params}) or environment variables ({formatted_envvars})"
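The suggested formatting can be checked quickly; the parameter and variable names below are illustrative values, standing in for what the toolkit's settings module would supply:

```python
parameter_names = ["--cluster-id", "--cluster-name"]
environment_variables = ["CRATEDB_CLUSTER_ID", "CRATEDB_CLUSTER_NAME"]

# Join the raw lists so users see readable options instead of Python reprs.
formatted_params = ", ".join(parameter_names)
formatted_envvars = ", ".join(environment_variables)
guidance = (
    f"Use one of the CLI arguments ({formatted_params}) "
    f"or environment variables ({formatted_envvars})"
)
```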
CHANGES.md (1)

13-17: Use consistent Markdown formatting for breaking changes section

The Breaking changes section is currently formatted as bold text instead of a proper markdown heading, which triggered a linter warning.

-**Breaking changes**
+### Breaking changes
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

13-13: Emphasis used instead of a heading
null

(MD036, no-emphasis-as-heading)

doc/cluster/_address.md (1)

1-17: Clear documentation for cluster addressing options.

The documentation clearly explains the supported cluster addressing options through CLI and environment variables, along with their mutual exclusivity and precedence rules. This information is crucial for users to understand how to properly configure their connection to CrateDB clusters.

However, there are some minor formatting inconsistencies in the document. Consider adding a space after the colons in the definition lists to maintain consistent formatting:

-:CLI options:
+:CLI options: 

And similarly for the "Environment variables:" line.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~5-~5: Loose punctuation mark.
Context: ... different situations, managed or not. :CLI options: --cluster-id, `--cluste...

(UNLIKELY_OPENING_PUNCTUATION)

doc/cluster/python.md (2)

41-42: Consider improving formatting for definition lists.

Similar to the _address.md file, there's a formatting inconsistency with the definition list. Consider adding a space after the colon:

-:Environment variables:
+:Environment variables: 

This would improve readability and maintain consistent formatting across documentation files.


54-61: Consider adding more details on error handling in examples.

The code example demonstrates basic usage but doesn't show how to handle potential errors when connecting to the cluster or executing queries. Consider expanding the example to include error handling to give users a more complete picture of working with the API in production environments.

For example:

from pprint import pprint
from cratedb_toolkit import ManagedCluster
from cratedb_toolkit.exception import ClusterOperationError

try:
    with ManagedCluster.from_env() as cluster:
        pprint(cluster.query("SELECT * from sys.summits LIMIT 2;"))
except ClusterOperationError as e:
    print(f"Cluster operation failed: {e}")
doc/util/shell.md (2)

1-8: Documentation formatting should follow standard markdown conventions.

The document starts with (shell)= which appears to be a reference label syntax for a documentation system (likely Sphinx with MyST markdown), but it creates somewhat awkward formatting in the heading structure.

Make sure this label syntax is intended for your documentation system. If using Sphinx with MyST markdown, this is the correct format for cross-referencing, but consider adding a line break between the label and the heading for better readability:

(shell)=
+
# ctk shell
🧰 Tools
🪛 LanguageTool

[uncategorized] ~1-~1: A punctuation mark might be missing here.
Context: (shell)= # ctk shell The database shell interface ...

(AI_EN_LECTOR_MISSING_PUNCTUATION)


11-16: Consider adding version information for uv installation.

While recommending uv is good, it might be helpful to indicate minimum version requirements or installation instructions for uv itself, as it's still a relatively newer package manager.

uv tool install --upgrade 'cratedb-toolkit'
+
+# If you don't have uv installed, you can install it via:
+curl -fsS https://astral.sh/uv/install.sh | bash
cratedb_toolkit/model.py (1)

63-71: Consider improving the URL scheme detection logic.

The from_string method makes assumptions about URL schemes that might not cover all cases.

The current implementation assumes any URL starting with "crate" is an SQLAlchemy URI, but this might lead to incorrect parsing in edge cases. Consider a more robust detection method:

@classmethod
def from_string(cls, url: str) -> "DatabaseAddress":
    """
    Factory method to create an instance from an SQLAlchemy database URL in string format.
    """
-    if url.startswith("crate"):
+    # Parse the URL to determine the scheme
+    parsed_url = URL(url)
+    if parsed_url.scheme == "crate":
        return cls.from_sqlalchemy_uri(url)
    else:
        return cls.from_http_uri(url)
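The suggested diff references an unspecified `URL` class; a stdlib-only variant of the same scheme-based detection could use `urllib.parse`. This is a sketch, not the reviewed code:

```python
from urllib.parse import urlsplit


def is_sqlalchemy_uri(url: str) -> bool:
    # CrateDB SQLAlchemy URLs use schemes such as `crate://`,
    # whereas plain HTTP addresses use `http://` or `https://`.
    return urlsplit(url).scheme.startswith("crate")
```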
doc/cluster/backlog.md (2)

58-65: Formatting issue in code block example.

There's a formatting issue where the code block example and the next item don't have proper spacing.

  ```python
  # The context manager stops the cluster automatically on exit.
  with ManagedCluster.from_env().start() as cluster:
      # Do something with the cluster
      results = cluster.query("SELECT * FROM sys.nodes")
  # Cluster is automatically suspended when exiting the block
  ```

  • ctk cluster start: Avoid creating a project on each deploy operation

🧰 Tools
🪛 LanguageTool

[uncategorized] ~65-~65: Loose punctuation mark.
Context: ...ng the block   ``` - `ctk cluster start`: Avoid creating a project on each deploy...

(UNLIKELY_OPENING_PUNCTUATION)

87-90: Comma before 'because' clause.

There's a comma before the 'because' clause which is not needed if the clause is essential to the meaning.

```diff
- Shell: Currently it is not possible to create multiple instances of a `ManagedCluster`,
-  because parameters like `username` and `password` are obtained from the environment, 
+ Shell: Currently it is not possible to create multiple instances of a `ManagedCluster`
+  because parameters like `username` and `password` are obtained from the environment, 
🧰 Tools
🪛 LanguageTool

[formatting] ~87-~87: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ...multiple instances of a ManagedCluster, because parameters like username and `passwor...

(COMMA_BEFORE_BECAUSE)

cratedb_toolkit/io/croud.py (1)

67-84: Workaround for upstream API flaw needs tracking

The fix_job_info_table_name method is a workaround for an upstream API flaw. The FIXME comment is good, but it would be better to link to a ticket tracking this issue for future removal of this workaround.

"""
Adjust full-qualified table name by adding appropriate quotes.
Fixes a minor flaw on the upstream API.

Currently, the API returns `testdrive.pems-1`, but that can not be used at
all, because it is not properly quoted. It also can not be used 1:1, because
it is not properly quoted.

So, converge the table name into `"testdrive"."pems-1"` manually, for a
full-qualified representation.

-FIXME: Remove after upstream has fixed the flaw.
+FIXME: Remove after upstream has fixed the flaw in https://github.com/crate/croud/issues/566
"""
cratedb_toolkit/cluster/cli.py (1)

115-116: Consider using Click's exit code mechanism instead of sys.exit()

Using sys.exit() directly in a Click command can bypass Click's error handling. Consider using Click's mechanisms for reporting errors and exit codes.

-        sys.exit(0 if (cluster_info.ready and has_db_result) else 1)
+        ctx.exit(0 if (cluster_info.ready and has_db_result) else 1)
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 115-116: cratedb_toolkit/cluster/cli.py#L115-L116
Added lines #L115 - L116 were not covered by tests

cratedb_toolkit/cluster/croud.py (2)

391-421: Improve local vs. remote file detection

The current check for determining if a resource is local or remote uses a combination of checking for "://" in the URL and checking if the file exists. This might not be reliable in all cases, especially with relative paths.

# Compute command-line arguments for invoking `croud`.
# TODO: Sanitize table name - which characters are allowed?
-is_remote = "://" in resource.url
-if not is_remote and Path(resource.url).exists():
+# More robust check for remote URLs
+def is_remote_url(url: str) -> bool:
+    """Check if a URL is remote by looking for standard URL schemes."""
+    remote_schemes = ("http://", "https://", "s3://", "gs://", "azure://", "ftp://")
+    return any(url.startswith(scheme) for scheme in remote_schemes)
+
+is_remote = is_remote_url(resource.url)
+if not is_remote:
+    # For local paths, verify they exist
+    path = Path(resource.url)
+    if not path.exists():
+        raise ValueError(f"Local file not found: {resource.url}")
+
+    # Proceed with local file import

324-336: Error message might be misleading on suspension

The error message in set_suspended always says "Resuming cluster failed" even when the operation is to suspend a cluster.

def set_suspended(self, body: t.Dict[str, t.Any]):
    data, errors = self.client.put(self.url.suspend, body=body)
    if data is None:
        if not errors:
            errors = "Unknown error"
-        raise CroudException(f"Resuming cluster failed: {errors}")
+        operation = "Resuming" if body.get("suspended") is False else "Suspending"
+        raise CroudException(f"{operation} cluster failed: {errors}")
    _wait_for_completed_operation(
        client=self.client,
        cluster_id=self.cluster_id,
        request_params={"type": "SUSPEND", "limit": 1},
    )
    return data
cratedb_toolkit/cluster/core.py (8)

195-200: Clarify error handling with TODO comment

There's a TODO comment about improving error handling with flexfun, but it's not clear what needs to be improved.

Either implement the improvement or add more details to the TODO to explain what needs to be improved:

-        # TODO: With `flexfun`, can this section be improved?
+        # TODO: Consider refactoring this error handling section to fully utilize the flexfun decorator's
+        # capabilities for more consistent error behavior across the codebase.

366-367: Implement TODO for better job reporting

The TODO comment indicates a need to explicitly report about failed_records and other potential issues during data loading.

Consider implementing a more detailed reporting structure for job status, especially for failed records:

            logger.info("Job information:\n%s", json.dumps(cloud_job.info, indent=2))
-            # TODO: Explicitly report about `failed_records`, etc.
+            # Report failed records if any
+            if cloud_job.info.get('failed_records', 0) > 0:
+                logger.warning(f"Job completed with {cloud_job.info.get('failed_records')} failed records")
+                if cloud_job.info.get('failed_records_sample'):
+                    logger.warning(f"Sample of failed records: {cloud_job.info.get('failed_records_sample')}")

Would you like me to implement a more comprehensive reporting solution for job status and failures?


202-204: Implement the delete method

The delete method is stubbed out with NotImplementedError but should be implemented for completeness of the cluster lifecycle management.

Would you like me to help implement the delete method to properly handle cluster deletion? It would involve:

  1. Validating that a cluster ID is provided
  2. Calling the appropriate Cloud API method for cluster deletion
  3. Handling errors and updating the cluster state
  4. Logging the operation

267-270: Improve error handling for failed deployments

The TODO comments indicate a need for better error reporting when deployments fail.

Consider implementing more detailed error reporting:

            if not self.exists:
-                # TODO: Gather and propagate more information why the deployment failed.
-                #       Capture deployment logs or status if available.
-                #       Possibly use details from `last_async_operation`.
-                raise CroudException(f"Deployment of cluster failed: {self.cluster_name}")
+                # Extract detailed error information from deployment status
+                error_details = {}
+                if hasattr(self.operation, "get_status"):
+                    status = self.operation.get_status()
+                    error_details["status"] = status
+                    if "last_async_operation" in status:
+                        error_details["operation"] = status["last_async_operation"]
+                
+                # Format a detailed error message
+                error_message = f"Deployment of cluster failed: {self.cluster_name}"
+                if error_details:
+                    error_message += f"\nDetails: {json.dumps(error_details, indent=2)}"
+                
+                logger.error(error_message)
+                raise CroudException(error_message)

405-422: Singleton pattern implementation may cause issues with multiple clusters

The current implementation of get_client_bundle creates a singleton client bundle, which could cause issues if different credentials are needed for different operations on the same cluster.

Consider adding a parameter to force creating a new client bundle with different credentials:

-    def get_client_bundle(self, username: str = None, password: str = None) -> ClientBundle:
+    def get_client_bundle(self, username: str = None, password: str = None, force_new: bool = False) -> ClientBundle:
        """
        Return a bundle of client handles to the CrateDB Cloud cluster database.

        - adapter: A high-level `DatabaseAdapter` instance, offering a few convenience methods.
        - dbapi: A DBAPI connection object, as provided by SQLAlchemy's `dbapi_connection`.
        - sqlalchemy: An SQLAlchemy `Engine` object.
        """

        # Store the client bundle, effectively making it a singleton.
-        if self._client_bundle is not None:
+        if self._client_bundle is not None and not force_new:
            return self._client_bundle
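The cached-bundle pattern with a `force_new` escape hatch can be illustrated with a toy class (names are simplified stand-ins, not the toolkit's real API):

```python
class Bundle:
    """Hypothetical stand-in for ClientBundle."""
    def __init__(self, username=None):
        self.username = username


class Cluster:
    def __init__(self):
        self._client_bundle = None

    def get_client_bundle(self, username=None, force_new=False):
        # Reuse the cached bundle unless the caller demands a fresh one.
        if self._client_bundle is not None and not force_new:
            return self._client_bundle
        self._client_bundle = Bundle(username=username)
        return self._client_bundle


cluster = Cluster()
a = cluster.get_client_bundle(username="admin")
b = cluster.get_client_bundle(username="reader")  # cached: credentials ignored
c = cluster.get_client_bundle(username="reader", force_new=True)  # fresh bundle
```

Without `force_new`, the second call silently returns the bundle created with the first credentials, which is exactly the hazard described above.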

428-430: Address SQL query TODO

The TODO comment suggests using the records package for invoking SQL statements.

Either implement the suggestion or clarify the TODO:

-        # TODO: Use the `records` package for invoking SQL statements.
+        # TODO: Consider replacing this with the `records` package for 
+        # invoking SQL statements, which provides more robust result handling and formatting.
+        # See: https://github.com/kennethreitz/records

131-136: Check for empty strings in identifiers

The current validation only checks if identifiers are None but doesn't check for empty strings, which could lead to errors.

Add validation for empty strings:

        # Default settings and sanity checks.
        self.cluster_id = self.cluster_id or self.settings.cluster_id or None
        self.cluster_name = self.cluster_name or self.settings.cluster_name or None
-        if self.cluster_id is None and self.cluster_name is None:
+        if not self.cluster_id and not self.cluster_name:
            raise DatabaseAddressMissingError(
                "Failed to address cluster: Either cluster identifier or name needs to be specified"
            )
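The difference between the `is None` check and the truthiness check can be demonstrated in isolation (a toy `validate` helper, not the toolkit's code):

```python
def validate(cluster_id=None, cluster_name=None):
    # `not x` is True for None and for the empty string "", so both are rejected.
    if not cluster_id and not cluster_name:
        raise ValueError(
            "Failed to address cluster: Either cluster identifier or name needs to be specified"
        )
    return cluster_id or cluster_name


empty = ""
assert empty is not None   # the old `is None` check would accept this...
assert not empty           # ...while the truthiness check rejects it
```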
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 131-134: cratedb_toolkit/cluster/core.py#L131-L134
Added lines #L131 - L134 were not covered by tests


374-377: Add reason to exception message

The TODO comment indicates a need to add a "reason" to the exception message.

Implement the suggested improvement:

-                # TODO: Add "reason" to exception message.
-                message = f"Data loading failed.\n{cloud_job.message}"
+                # Extract failure reason from cloud_job
+                failure_reason = cloud_job.info.get('failure_reason', 'Unknown reason')
+                message = f"Data loading failed. Reason: {failure_reason}\n{cloud_job.message}"
                logger.error(f"{message}\n{texts.error()}\n")
                raise OperationFailed(message)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db1e1c5 and d92d5c0.

📒 Files selected for processing (107)
  • .github/workflows/main.yml (1 hunks)
  • .github/workflows/mongodb.yml (1 hunks)
  • CHANGES.md (2 hunks)
  • cratedb_toolkit/__init__.py (2 hunks)
  • cratedb_toolkit/adapter/rockset/cli.py (2 hunks)
  • cratedb_toolkit/adapter/rockset/server/dependencies.py (1 hunks)
  • cratedb_toolkit/api/cli.py (0 hunks)
  • cratedb_toolkit/api/guide.py (0 hunks)
  • cratedb_toolkit/api/main.py (0 hunks)
  • cratedb_toolkit/cfr/cli.py (9 hunks)
  • cratedb_toolkit/cfr/marimo.py (1 hunks)
  • cratedb_toolkit/cli.py (2 hunks)
  • cratedb_toolkit/cluster/cli.py (2 hunks)
  • cratedb_toolkit/cluster/core.py (1 hunks)
  • cratedb_toolkit/cluster/croud.py (1 hunks)
  • cratedb_toolkit/cluster/guide.py (1 hunks)
  • cratedb_toolkit/cluster/model.py (1 hunks)
  • cratedb_toolkit/cluster/util.py (0 hunks)
  • cratedb_toolkit/cmd/tail/cli.py (3 hunks)
  • cratedb_toolkit/config.py (1 hunks)
  • cratedb_toolkit/exception.py (2 hunks)
  • cratedb_toolkit/info/cli.py (6 hunks)
  • cratedb_toolkit/info/core.py (2 hunks)
  • cratedb_toolkit/info/http.py (1 hunks)
  • cratedb_toolkit/io/cli.py (4 hunks)
  • cratedb_toolkit/io/croud.py (4 hunks)
  • cratedb_toolkit/io/dynamodb/api.py (1 hunks)
  • cratedb_toolkit/io/influxdb.py (1 hunks)
  • cratedb_toolkit/io/mongodb/adapter.py (3 hunks)
  • cratedb_toolkit/io/mongodb/api.py (4 hunks)
  • cratedb_toolkit/io/processor/kinesis_lambda.py (2 hunks)
  • cratedb_toolkit/job/cli.py (0 hunks)
  • cratedb_toolkit/job/croud.py (0 hunks)
  • cratedb_toolkit/model.py (4 hunks)
  • cratedb_toolkit/option.py (1 hunks)
  • cratedb_toolkit/options.py (0 hunks)
  • cratedb_toolkit/settings/compare.py (5 hunks)
  • cratedb_toolkit/shell/cli.py (3 hunks)
  • cratedb_toolkit/testing/testcontainers/cratedb.py (5 hunks)
  • cratedb_toolkit/testing/testcontainers/influxdb2.py (2 hunks)
  • cratedb_toolkit/testing/testcontainers/minio.py (2 hunks)
  • cratedb_toolkit/testing/testcontainers/mongodb.py (2 hunks)
  • cratedb_toolkit/testing/testcontainers/util.py (4 hunks)
  • cratedb_toolkit/util/app.py (1 hunks)
  • cratedb_toolkit/util/client.py (1 hunks)
  • cratedb_toolkit/util/common.py (2 hunks)
  • cratedb_toolkit/util/crash.py (2 hunks)
  • cratedb_toolkit/util/croud.py (8 hunks)
  • cratedb_toolkit/util/database.py (6 hunks)
  • cratedb_toolkit/util/runtime.py (1 hunks)
  • cratedb_toolkit/util/setting.py (1 hunks)
  • doc/adapter/rockset.md (1 hunks)
  • doc/backlog/main.md (2 hunks)
  • doc/bugs.md (1 hunks)
  • doc/cfr/jobstats.md (1 hunks)
  • doc/cfr/systable.md (3 hunks)
  • doc/cluster/_address.md (1 hunks)
  • doc/cluster/backlog.md (1 hunks)
  • doc/cluster/cli.md (1 hunks)
  • doc/cluster/index.md (1 hunks)
  • doc/cluster/python.md (1 hunks)
  • doc/cluster/tutorial.md (1 hunks)
  • doc/conf.py (1 hunks)
  • doc/index.md (1 hunks)
  • doc/info/index.md (1 hunks)
  • doc/io/dms/standalone.md (1 hunks)
  • doc/io/dynamodb/cdc-lambda.md (4 hunks)
  • doc/io/dynamodb/cdc.md (2 hunks)
  • doc/io/dynamodb/loader.md (2 hunks)
  • doc/io/index.md (2 hunks)
  • doc/io/influxdb/loader.md (3 hunks)
  • doc/io/mongodb/cdc.md (3 hunks)
  • doc/io/mongodb/loader.md (4 hunks)
  • doc/io/mongodb/migr8.md (1 hunks)
  • doc/sandbox.md (2 hunks)
  • doc/util/index.md (1 hunks)
  • doc/util/settings.md (1 hunks)
  • doc/util/shell.md (1 hunks)
  • examples/aws/dynamodb_kinesis_lambda_oci_cratedb.py (1 hunks)
  • examples/cloud_import.py (0 hunks)
  • examples/notebook/cloud_import.ipynb (1 hunks)
  • examples/python/cloud_cluster.py (1 hunks)
  • examples/python/cloud_import.py (1 hunks)
  • examples/shell/cloud_cluster.sh (1 hunks)
  • examples/shell/cloud_import.sh (1 hunks)
  • pyproject.toml (8 hunks)
  • tests/adapter/test_rockset.py (3 hunks)
  • tests/api/test_util.py (1 hunks)
  • tests/cfr/test_info.py (1 hunks)
  • tests/cfr/test_jobstats.py (3 hunks)
  • tests/cfr/test_systable.py (6 hunks)
  • tests/cluster/test_cli.py (1 hunks)
  • tests/cluster/test_core.py (1 hunks)
  • tests/cluster/test_guide.py (1 hunks)
  • tests/cluster/test_model.py (1 hunks)
  • tests/cmd/test_tail.py (1 hunks)
  • tests/conftest.py (3 hunks)
  • tests/examples/test_python.py (1 hunks)
  • tests/examples/test_shell.py (1 hunks)
  • tests/examples/test_shell.sh (1 hunks)
  • tests/info/test_cli.py (3 hunks)
  • tests/info/test_http.py (1 hunks)
  • tests/io/dynamodb/__init__.py (0 hunks)
  • tests/io/dynamodb/conftest.py (1 hunks)
  • tests/io/dynamodb/test_cli.py (1 hunks)
  • tests/io/influxdb/conftest.py (4 hunks)
  • tests/io/influxdb/test_cli.py (3 hunks)
⛔ Files not processed due to max files limit (12)
  • tests/io/mongodb/conftest.py
  • tests/io/mongodb/test_cli.py
  • tests/io/test_import.py
  • tests/io/test_processor.py
  • tests/retention/test_cli.py
  • tests/retention/test_examples.py
  • tests/settings/test_cli.py
  • tests/shell/test_cli.py
  • tests/test_model.py
  • tests/test_shell.py
  • tests/util/database.py
  • tests/util/shunit2
💤 Files with no reviewable changes (9)
  • tests/io/dynamodb/__init__.py
  • cratedb_toolkit/api/cli.py
  • cratedb_toolkit/api/guide.py
  • cratedb_toolkit/job/croud.py
  • cratedb_toolkit/cluster/util.py
  • cratedb_toolkit/options.py
  • examples/cloud_import.py
  • cratedb_toolkit/api/main.py
  • cratedb_toolkit/job/cli.py
✅ Files skipped from review due to trivial changes (9)
  • doc/io/mongodb/migr8.md
  • tests/io/dynamodb/conftest.py
  • .github/workflows/mongodb.yml
  • doc/cfr/systable.md
  • cratedb_toolkit/util/client.py
  • doc/cluster/tutorial.md
  • examples/shell/cloud_import.sh
  • examples/shell/cloud_cluster.sh
  • tests/cluster/test_guide.py
🚧 Files skipped from review as they are similar to previous changes (66)
  • doc/util/index.md
  • cratedb_toolkit/io/dynamodb/api.py
  • doc/cfr/jobstats.md
  • doc/io/dms/standalone.md
  • cratedb_toolkit/io/influxdb.py
  • doc/index.md
  • tests/io/influxdb/test_cli.py
  • doc/adapter/rockset.md
  • tests/io/dynamodb/test_cli.py
  • doc/util/settings.md
  • doc/io/dynamodb/cdc-lambda.md
  • tests/cluster/test_model.py
  • cratedb_toolkit/util/common.py
  • doc/conf.py
  • tests/info/test_http.py
  • tests/adapter/test_rockset.py
  • cratedb_toolkit/io/mongodb/api.py
  • tests/cfr/test_systable.py
  • doc/io/dynamodb/loader.md
  • doc/bugs.md
  • doc/io/influxdb/loader.md
  • tests/cmd/test_tail.py
  • tests/cfr/test_jobstats.py
  • tests/cfr/test_info.py
  • cratedb_toolkit/info/http.py
  • doc/info/index.md
  • cratedb_toolkit/testing/testcontainers/mongodb.py
  • doc/io/mongodb/loader.md
  • tests/examples/test_shell.py
  • doc/cluster/index.md
  • cratedb_toolkit/testing/testcontainers/minio.py
  • cratedb_toolkit/settings/compare.py
  • cratedb_toolkit/cmd/tail/cli.py
  • cratedb_toolkit/io/processor/kinesis_lambda.py
  • cratedb_toolkit/testing/testcontainers/influxdb2.py
  • tests/info/test_cli.py
  • tests/io/influxdb/conftest.py
  • examples/python/cloud_cluster.py
  • cratedb_toolkit/util/runtime.py
  • doc/io/mongodb/cdc.md
  • cratedb_toolkit/util/crash.py
  • tests/examples/test_shell.sh
  • tests/api/test_util.py
  • examples/notebook/cloud_import.ipynb
  • .github/workflows/main.yml
  • cratedb_toolkit/cluster/guide.py
  • doc/cluster/cli.md
  • cratedb_toolkit/info/core.py
  • cratedb_toolkit/option.py
  • doc/io/dynamodb/cdc.md
  • tests/examples/test_python.py
  • examples/aws/dynamodb_kinesis_lambda_oci_cratedb.py
  • cratedb_toolkit/shell/cli.py
  • cratedb_toolkit/io/mongodb/adapter.py
  • tests/cluster/test_core.py
  • pyproject.toml
  • cratedb_toolkit/exception.py
  • cratedb_toolkit/info/cli.py
  • cratedb_toolkit/util/app.py
  • examples/python/cloud_import.py
  • doc/sandbox.md
  • cratedb_toolkit/testing/testcontainers/cratedb.py
  • cratedb_toolkit/util/database.py
  • tests/cluster/test_cli.py
  • cratedb_toolkit/config.py
  • cratedb_toolkit/cluster/model.py
🧰 Additional context used
🧠 Learnings (4)
tests/conftest.py (2)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: tests/conftest.py:55-62
Timestamp: 2025-04-26T17:45:20.650Z
Learning: In the CrateDB toolkit, the container start operation is idempotent, meaning it can be safely called multiple times without causing issues. This allows for flexible container lifecycle management where both `setup()` may call `container.start()` and parent classes like `PytestTestcontainerAdapter` may also invoke `start()` methods.
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: tests/conftest.py:168-170
Timestamp: 2025-05-02T15:26:10.385Z
Learning: In the `cloud_environment` fixture in tests/conftest.py, environment variables like TEST_CRATEDB_CLOUD_API_KEY, TEST_CRATEDB_CLOUD_API_SECRET, TEST_CRATEDB_CLOUD_ORGANIZATION_ID, and TEST_CRATEDB_CLUSTER_NAME are partially mutually exclusive, meaning not all of them need to be present simultaneously for tests to work.
cratedb_toolkit/cluster/cli.py (1)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: cratedb_toolkit/cluster/cli.py:82-89
Timestamp: 2025-04-27T03:07:04.375Z
Learning: In the cratedb-toolkit codebase, the `handle_command_errors` context manager is designed as an error handler for specific operations (not as a general protective wrapper). It's intentionally used to handle errors for targeted operations rather than wrapping entire function executions.
cratedb_toolkit/cluster/core.py (1)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: cratedb_toolkit/api/main.py:263-265
Timestamp: 2025-04-26T21:59:59.463Z
Learning: When handling DNS propagation delays after resource creation (like CrateDB Cloud clusters), both fixed sleeps and immediate polling have drawbacks. Fixed sleeps are brittle, but immediate polling risks negative DNS caching. A hybrid approach with a short initial sleep followed by polling with exponential backoff is preferred.
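The hybrid approach described in this learning could look roughly like this (a sketch with assumed parameter values, not the toolkit's actual code):

```python
import time


def wait_until_resolvable(check, initial_sleep=2.0, base_delay=0.5,
                          max_delay=30.0, timeout=120.0):
    """Hybrid wait: short fixed sleep, then poll with exponential backoff.

    `check` is a zero-argument callable returning True once the resource
    (e.g. a freshly created cluster's DNS name) is reachable.
    """
    # A short initial sleep avoids polling too early and having resolvers
    # cache a negative DNS answer.
    time.sleep(initial_sleep)
    deadline = time.monotonic() + timeout
    delay = base_delay
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
    return False
```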
cratedb_toolkit/cfr/marimo.py (1)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: cratedb_toolkit/cfr/marimo.py:53-54
Timestamp: 2025-05-02T11:43:42.639Z
Learning: The submodules within `cratedb_toolkit/io` employ a different naming scheme than other parts of the codebase and retain their own conventions (e.g., may continue using `cratedb_sqlalchemy_url` instead of `sqlalchemy_url`).
🧬 Code Graph Analysis (6)
cratedb_toolkit/util/setting.py (1)
cratedb_toolkit/config.py (2)
  • RUNNING_ON_PYTEST (26-27)
  • RUNNING_ON_JUPYTER (22-23)
cratedb_toolkit/model.py (3)
cratedb_toolkit/exception.py (2)
  • DatabaseAddressDuplicateError (35-43)
  • DatabaseAddressMissingError (16-32)
cratedb_toolkit/util/data.py (1)
  • asbool (25-36)
cratedb_toolkit/cluster/core.py (1)
  • from_params (599-603)
cratedb_toolkit/testing/testcontainers/util.py (4)
tests/conftest.py (1)
  • setup (56-62)
tests/io/mongodb/conftest.py (1)
  • setup (38-48)
tests/io/influxdb/conftest.py (1)
  • setup (28-34)
cratedb_toolkit/testing/testcontainers/cratedb.py (2)
  • start (176-182)
  • stop (184-189)
cratedb_toolkit/adapter/rockset/server/dependencies.py (2)
cratedb_toolkit/util/database.py (1)
  • DatabaseAdapter (43-421)
cratedb_toolkit/model.py (1)
  • dburi (101-105)
cratedb_toolkit/cfr/cli.py (4)
cratedb_toolkit/util/app.py (1)
  • make_cli (11-52)
cratedb_toolkit/util/data.py (1)
  • path_from_url (51-54)
cratedb_toolkit/cfr/systable.py (2)
  • SystemTableExporter (116-190)
  • SystemTableImporter (193-257)
cratedb_toolkit/model.py (4)
  • dburi (101-105)
  • DatabaseAddress (55-176)
  • from_string (63-70)
  • from_string (198-199)
cratedb_toolkit/cli.py (1)
cratedb_toolkit/util/setting.py (1)
  • init_dotenv (30-37)
🪛 GitHub Check: codecov/patch
cratedb_toolkit/cluster/cli.py

[warning] 80-82: cratedb_toolkit/cluster/cli.py#L80-L82
Added lines #L80 - L82 were not covered by tests


[warning] 93-95: cratedb_toolkit/cluster/cli.py#L93-L95
Added lines #L93 - L95 were not covered by tests


[warning] 106-110: cratedb_toolkit/cluster/cli.py#L106-L110
Added lines #L106 - L110 were not covered by tests


[warning] 115-116: cratedb_toolkit/cluster/cli.py#L115-L116
Added lines #L115 - L116 were not covered by tests


[warning] 128-128: cratedb_toolkit/cluster/cli.py#L128
Added line #L128 was not covered by tests


[warning] 130-131: cratedb_toolkit/cluster/cli.py#L130-L131
Added lines #L130 - L131 were not covered by tests


[warning] 134-134: cratedb_toolkit/cluster/cli.py#L134
Added line #L134 was not covered by tests


[warning] 146-146: cratedb_toolkit/cluster/cli.py#L146
Added line #L146 was not covered by tests


[warning] 148-149: cratedb_toolkit/cluster/cli.py#L148-L149
Added lines #L148 - L149 were not covered by tests


[warning] 152-152: cratedb_toolkit/cluster/cli.py#L152
Added line #L152 was not covered by tests


[warning] 172-172: cratedb_toolkit/cluster/cli.py#L172
Added line #L172 was not covered by tests


[warning] 174-177: cratedb_toolkit/cluster/cli.py#L174-L177
Added lines #L174 - L177 were not covered by tests

cratedb_toolkit/cluster/core.py

[warning] 99-101: cratedb_toolkit/cluster/core.py#L99-L101
Added lines #L99 - L101 were not covered by tests


[warning] 121-128: cratedb_toolkit/cluster/core.py#L121-L128
Added lines #L121 - L128 were not covered by tests


[warning] 131-134: cratedb_toolkit/cluster/core.py#L131-L134
Added lines #L131 - L134 were not covered by tests


[warning] 138-140: cratedb_toolkit/cluster/core.py#L138-L140
Added lines #L138 - L140 were not covered by tests


[warning] 144-144: cratedb_toolkit/cluster/core.py#L144
Added line #L144 was not covered by tests


[warning] 148-148: cratedb_toolkit/cluster/core.py#L148
Added line #L148 was not covered by tests

cratedb_toolkit/__init__.py

[warning] 14-14: cratedb_toolkit/__init__.py#L14
Added line #L14 was not covered by tests


[warning] 16-16: cratedb_toolkit/__init__.py#L16
Added line #L16 was not covered by tests


[warning] 18-20: cratedb_toolkit/__init__.py#L18-L20
Added lines #L18 - L20 were not covered by tests


[warning] 22-22: cratedb_toolkit/__init__.py#L22
Added line #L22 was not covered by tests

cratedb_toolkit/adapter/rockset/cli.py

[warning] 38-38: cratedb_toolkit/adapter/rockset/cli.py#L38
Added line #L38 was not covered by tests


[warning] 41-41: cratedb_toolkit/adapter/rockset/cli.py#L41
Added line #L41 was not covered by tests

cratedb_toolkit/adapter/rockset/server/dependencies.py

[warning] 12-13: cratedb_toolkit/adapter/rockset/server/dependencies.py#L12-L13
Added lines #L12 - L13 were not covered by tests

cratedb_toolkit/cfr/cli.py

[warning] 32-32: cratedb_toolkit/cfr/cli.py#L32
Added line #L32 was not covered by tests


[warning] 35-35: cratedb_toolkit/cfr/cli.py#L35
Added line #L35 was not covered by tests


[warning] 61-61: cratedb_toolkit/cfr/cli.py#L61
Added line #L61 was not covered by tests


[warning] 63-63: cratedb_toolkit/cfr/cli.py#L63
Added line #L63 was not covered by tests


[warning] 112-112: cratedb_toolkit/cfr/cli.py#L112
Added line #L112 was not covered by tests


[warning] 141-141: cratedb_toolkit/cfr/cli.py#L141
Added line #L141 was not covered by tests


[warning] 164-165: cratedb_toolkit/cfr/cli.py#L164-L165
Added lines #L164 - L165 were not covered by tests


[warning] 181-182: cratedb_toolkit/cfr/cli.py#L181-L182
Added lines #L181 - L182 were not covered by tests


[warning] 207-207: cratedb_toolkit/cfr/cli.py#L207
Added line #L207 was not covered by tests

cratedb_toolkit/cfr/marimo.py

[warning] 53-54: cratedb_toolkit/cfr/marimo.py#L53-L54
Added lines #L53 - L54 were not covered by tests

cratedb_toolkit/cli.py

[warning] 25-25: cratedb_toolkit/cli.py#L25
Added line #L25 was not covered by tests

🪛 markdownlint-cli2 (0.17.2)
CHANGES.md

13-13: Emphasis used instead of a heading
null

(MD036, no-emphasis-as-heading)

🪛 LanguageTool
doc/backlog/main.md

[uncategorized] ~8-~8: Loose punctuation mark.
Context: ... methods. - cratedb_toolkit/cfr/cli.py: Migrate to ctx.meta["address"], like ...

(UNLIKELY_OPENING_PUNCTUATION)


[style] ~10-~10: This expression is wordy and overused. Consider replacing it with a more concise and formal alternative.
Context: ...is will probably cause a disk spillover sooner or later on this instance. - When running the te...

(SOONER_OR_LATER)

doc/cluster/_address.md

[uncategorized] ~5-~5: Loose punctuation mark.
Context: ... different situations, managed or not. :CLI options: --cluster-id, `--cluste...

(UNLIKELY_OPENING_PUNCTUATION)

doc/cluster/backlog.md

[uncategorized] ~65-~65: Loose punctuation mark.
Context: ...ng the block ``` - ctk cluster start: Avoid creating a project on each deploy...

(UNLIKELY_OPENING_PUNCTUATION)


[formatting] ~87-~87: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ...multiple instances of a ManagedCluster, because parameters like username and `passwor...

(COMMA_BEFORE_BECAUSE)

doc/cluster/python.md

[uncategorized] ~40-~40: Loose punctuation mark.
Context: ...r parameters or environment variables. :Environment variables: `CRATEDB_CLUSTE...

(UNLIKELY_OPENING_PUNCTUATION)

doc/io/index.md

[uncategorized] ~1-~1: A punctuation mark might be missing here.
Context: (io-subsystem)= # I/O Subsystem Load and extract data int...

(AI_EN_LECTOR_MISSING_PUNCTUATION)

doc/util/shell.md

[uncategorized] ~1-~1: A punctuation mark might be missing here.
Context: (shell)= # ctk shell The database shell interface ...

(AI_EN_LECTOR_MISSING_PUNCTUATION)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Cloud: Python 3.13 on OS ubuntu-latest
🔇 Additional comments (52)
tests/conftest.py (5)

45-71: Well-designed testing helper class for container management.

This new CrateDBFixture class creates a clean abstraction for CrateDB test containers, properly leveraging the new PytestTestcontainerAdapter base class. The implementation correctly handles container lifecycle and database connection setup.


59-62: Explicit container start doesn't cause issues.

As I've learned from your previous feedback, the container start operation is idempotent, so the explicit self.container.start() call here is fine even though PytestTestcontainerAdapter.run_setup() will call start() again.


89-105: Clean environment isolation for tests.

This new fixture nicely complements prune_environment by specifically targeting variables defined in ManagedClusterSettings. The implementation correctly guards against None envvar names as fixed in a previous commit.


140-146: Good default pattern for test cluster name.

This fixture properly retrieves the cluster name from environment variables with a sensible default fallback.


148-174: Correctly implements CrateDB Cloud test environment setup.

The fixture appropriately patches environment variables and handles toolkit configuration. Based on your previous feedback, I understand that environment variables here are partially mutually exclusive, so the current check at line 161-162 is appropriate for your use case.

cratedb_toolkit/testing/testcontainers/util.py (2)

48-54: Improved port handling for container addresses.

The enhanced get_real_host_address() method now properly handles port discovery from multiple attributes and normalizes port strings by stripping protocol suffixes. This makes the endpoint addresses more reliable for client usage.


164-206: Well-designed adapter for Pytest testcontainer integration.

The PytestTestcontainerAdapter class provides a clean abstraction for managing testcontainers in Pytest fixtures. It properly handles Docker availability checks and container lifecycle management. The design pattern requiring subclasses to implement setup() enables flexible container configuration.
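A minimal sketch of this pattern (hypothetical names; the real adapter also performs Docker availability checks):

```python
from abc import ABC, abstractmethod


class ContainerAdapter(ABC):
    """Base class owning the shared container lifecycle."""

    def __init__(self):
        self.container = None
        self.run_setup()

    def run_setup(self):
        # Shared lifecycle management lives in the base class:
        # configure first, then start.
        self.setup()
        self.start()

    @abstractmethod
    def setup(self):
        """Subclasses configure their specific container here."""

    def start(self):
        # Idempotent start: safe even if setup() already started the container.
        if self.container is not None:
            self.container.start()


class DummyContainer:
    def __init__(self):
        self.started = 0

    def start(self):
        self.started += 1


class MyFixture(ContainerAdapter):
    def setup(self):
        self.container = DummyContainer()
        self.container.start()
```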

cratedb_toolkit/util/setting.py (5)

14-20: Clean and well-structured dataclass implementation

The Setting dataclass provides a clean way to encapsulate a Click parameter along with an optional group identifier. Good use of type hints and a convenient asdict() helper method.


23-27: Good utility function for command-line argument detection

This helper function is concise and effectively detects long-form options in command-line arguments, which will help with determining whether to use CLI or environment variable sources.


30-37: Effective environment variable loading with appropriate conditional logic

The init_dotenv() function handles loading variables from the .env file with proper environment detection for pytest. The logging at info level is helpful for debugging configuration sources.


40-69: Robust configuration parser with multi-environment support

The obtain_settings() function effectively handles multiple input sources with appropriate environment detection for Jupyter and pytest. Good error propagation and context management with Click.


72-104: Well-implemented mutual exclusivity checker

The check_mutual_exclusiveness() function provides clear validation and helpful error messages. The approach of collecting names and values first before validation is clean and maintainable.
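A rough sketch of what such a checker might look like (the real implementation in cratedb_toolkit/util/setting.py differs in its inputs and error type):

```python
def check_mutual_exclusiveness(settings: dict):
    """Collect names and truthy values first, then validate: at most one
    of the mutually exclusive settings may be given."""
    names = sorted(settings)
    given = {name: value for name, value in settings.items() if value}
    if len(given) > 1:
        raise ValueError(
            f"The settings {names} are mutually exclusive, "
            f"but multiple were given: {sorted(given)}"
        )
    # Return the single (name, value) pair, or (None, None) if nothing was set.
    return next(iter(given.items()), (None, None))
```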

cratedb_toolkit/adapter/rockset/server/dependencies.py (1)

12-13:

❓ Verification inconclusive

Environment variable name standardized correctly

The code appropriately updates from CRATEDB_SQLALCHEMY_URL to CRATEDB_CLUSTER_URL, aligning with the PR's objective to unify cluster addressing across the codebase.

The static analysis hints indicate that this code lacks test coverage. Would you like me to help create a test for this function?


🏁 Script executed:

#!/bin/bash
# Check if there are tests for this specific module/function
rg "test.*database_adapter" --type py tests/adapter/rockset

Length of output: 123


🏁 Script executed:

#!/bin/bash
# Search the entire repository for any test definitions to determine if tests exist
rg "def test_" --type py -n

Length of output: 16979


Environment variable rename approved; add missing unit test

The switch from CRATEDB_SQLALCHEMY_URL to CRATEDB_CLUSTER_URL is correct and aligns with the PR’s goal. However, I didn’t find any unit tests for the function in
cratedb_toolkit/adapter/rockset/server/dependencies.py (the one that reads CRATEDB_CLUSTER_URL and returns a DatabaseAdapter).

• cratedb_toolkit/adapter/rockset/server/dependencies.py (lines 12–13)

Could we add a test (e.g. in tests/adapter/test_dependencies.py or alongside your rockset tests) such as:

def test_get_database_adapter_reads_cluster_url(monkeypatch):
    monkeypatch.setenv("CRATEDB_CLUSTER_URL", "http://example.com")
    from cratedb_toolkit.adapter.rockset.server.dependencies import get_database_adapter
    adapter = get_database_adapter()
    assert adapter.dburi == "http://example.com"

Let me know if you’d like me to draft that.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 12-13: cratedb_toolkit/adapter/rockset/server/dependencies.py#L12-L13
Added lines #L12 - L13 were not covered by tests

doc/io/index.md (3)

1-1: Added appropriate anchor for documentation cross-referencing

The added anchor (io-subsystem)= will improve navigation and cross-referencing within the documentation.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~1-~1: A punctuation mark might be missing here.
Context: (io-subsystem)= # I/O Subsystem Load and extract data int...

(AI_EN_LECTOR_MISSING_PUNCTUATION)


57-57: Environment variable name standardized correctly

The change from CRATEDB_CLOUD_CLUSTER_ID to CRATEDB_CLUSTER_ID aligns with the PR's objective to unify cluster addressing and configuration across the toolkit.


64-64: Updated to use CDN URL for example dataset

The example now uses a CDN URL (cdn.crate.io) instead of a GitHub raw URL, which should provide more reliable and potentially faster access to the example dataset.

cratedb_toolkit/cli.py (2)

16-16: LGTM: New import for .env support

This import of init_dotenv supports loading environment variables from .env files, which aligns with the PR objective to handle configuration through environment variables stored in dotenv files.


25-25:

✅ Verification successful

Ensure .env files are properly loaded before CLI execution

The addition of init_dotenv() call ensures that environment variables from .env files are loaded before any CLI logic executes. This enhances configuration flexibility for users.


🏁 Script executed:

#!/bin/bash
# Check if init_dotenv includes proper error handling for missing .env files
rg -A 5 "init_dotenv" --type py cratedb_toolkit/util/setting.py

Length of output: 467


We need to inspect what init_dotenv() actually does to confirm that it loads environment variables. Let’s pull in more of the implementation around find_dotenv and see if load_dotenv() is ever called or if any error handling exists.


🏁 Script executed:

#!/bin/bash
# Show the full init_dotenv function and related imports
rg -n "def init_dotenv" -A 20 cratedb_toolkit/util/setting.py
rg -n "load_dotenv" -n cratedb_toolkit/util/setting.py
rg -n "dotenv" -n cratedb_toolkit/util/setting.py

Length of output: 1398


Verified: .env loading is correctly implemented

  • In cratedb_toolkit/util/setting.py, init_dotenv() calls find_dotenv(), logs the resolved path, and invokes load_dotenv(), ensuring any .env variables are loaded before CLI execution.
  • Per python-dotenv’s default behavior, missing .env files are silently ignored, which is acceptable for optional configuration.
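For illustration, here is a stdlib-only approximation of the behavior described above (the real python-dotenv `load_dotenv()` handles many more edge cases; the key name below is hypothetical):

```python
import os


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines and export them into the process environment,
    silently ignoring a missing file, mirroring python-dotenv's default."""
    if not os.path.exists(path):
        return False  # a missing .env is fine: configuration is optional
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            # Existing environment variables take precedence.
            os.environ.setdefault(key.strip(), value.strip().strip("'\""))
    return True
```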
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 25-25: cratedb_toolkit/cli.py#L25
Added line #L25 was not covered by tests

CHANGES.md (3)

7-11: LGTM: Comprehensive Cloud API feature documentation

Clear documentation of the new Cloud API features, including support for headless/unattended operations and the JWT authentication additions.


16-17: LGTM: Clear documentation of breaking changes

Explicit documentation of the breaking changes around environment variable and CLI option naming consolidation, which helps users adapt to the new API.


40-40: LGTM: Update to new environment variable naming

Good documentation of the Query Collector's update to use the new consolidated environment variable.

cratedb_toolkit/cfr/cli.py (9)

15-15: LGTM: Updated import path

Updated import path for make_cli to use the new location in util.app instead of api.cli.


32-32: LGTM: Unified cluster addressing

These changes align with the PR's goal of consolidating separate URL options into a single cluster_url parameter.

Also applies to: 35-35

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 32-32: cratedb_toolkit/cfr/cli.py#L32
Added line #L32 was not covered by tests


61-61: LGTM: Consistent naming in sys_import

The changes to use cluster_url in the sys_import command maintain consistency with the unified cluster addressing approach.

Also applies to: 63-63

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 61-61: cratedb_toolkit/cfr/cli.py#L61
Added line #L61 was not covered by tests


77-77: LGTM: Updated example to use new environment variable

The help text now correctly references the consolidated CRATEDB_CLUSTER_URL environment variable.


112-112: LGTM: Standardized URL retrieval for job statistics collection

Updated to use the unified cluster_url from context metadata for database address retrieval.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 112-112: cratedb_toolkit/cfr/cli.py#L112
Added line #L112 was not covered by tests


141-141: LGTM: Consistent URL handling in job statistics view

Using the standardized cluster_url for database address information maintains consistency with other commands.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 141-141: cratedb_toolkit/cfr/cli.py#L141
Added line #L141 was not covered by tests


164-165: LGTM: Updated environment variable for job statistics reporting

Updated to use the standardized cluster_url for database addressing and setting the new consolidated environment variable.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 164-165: cratedb_toolkit/cfr/cli.py#L164-L165
Added lines #L164 - L165 were not covered by tests


181-182: LGTM: Consistent URL handling in job statistics UI

Using the standardized cluster_url for database addressing and setting the new consolidated environment variable maintains consistency.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 181-182: cratedb_toolkit/cfr/cli.py#L181-L182
Added lines #L181 - L182 were not covered by tests


207-207: LGTM: Updated info recorder to use standardized URL

The info record command now uses the unified cluster_url for consistency with other commands.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 207-207: cratedb_toolkit/cfr/cli.py#L207
Added line #L207 was not covered by tests

cratedb_toolkit/__init__.py (3)

1-1: LGTM: Added linter directive

Adding the ruff: noqa: E402 directive suppresses linter warnings about import positioning, which is necessary due to the code structure.


18-21: LGTM: Exposing key cluster abstractions

Exposing the new cluster abstractions (DatabaseCluster, ManagedCluster, StandaloneCluster) and configuration functions at the package level provides a clean API for users.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 18-20: cratedb_toolkit/__init__.py#L18-L20
Added lines #L18 - L20 were not covered by tests


22-29: LGTM: Well-defined public API

The explicit __all__ list clearly defines the public API of the package, which is a good practice for maintainability.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 22-22: cratedb_toolkit/__init__.py#L22
Added line #L22 was not covered by tests

cratedb_toolkit/adapter/rockset/cli.py (1)

9-9: Consistent CLI option refactoring for unified cluster addressing.

The code has been properly refactored to use the standard option_cluster_url imported from cratedb_toolkit.option instead of a local definition. The parameter name in the function signature and all references have been updated consistently from cratedb_sqlalchemy_url to cluster_url. The environment variable in the help text has also been updated from CRATEDB_SQLALCHEMY_URL to CRATEDB_CLUSTER_URL.

This change aligns with the unified cluster management approach described in the PR objectives.

Also applies to: 22-22, 29-29, 34-34, 38-38, 41-41

doc/backlog/main.md (2)

3-17: Well-prioritized backlog for future API improvements.

The new "Iteration +0" section outlines important next steps for improving the CLI API, particularly around cluster addressing. The proposal to consolidate --cluster-url, --cluster-id, and --cluster-name into a single, more intuitive --cluster option could significantly improve the user experience.

The backlog also identifies important operational issues such as table pruning on CrateDB Cloud and verbose logging during tests, which could affect system reliability and developer experience.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~8-~8: Loose punctuation mark.
Context: ... methods. - cratedb_toolkit/cfr/cli.py: Migrate to ctx.meta["address"], like ...

(UNLIKELY_OPENING_PUNCTUATION)


[style] ~10-~10: This expression is wordy and overused. Consider replacing it with a more concise and formal alternative.
Context: ...is will probably cause a disk spillover sooner or later on this instance. - When running the te...

(SOONER_OR_LATER)


57-57: Consistent command and environment variable naming.

The backlog has been updated to use the new command name ctk cluster list-jobs (instead of ctk list-jobs) and the environment variable CRATEDB_CLUSTER_ID (instead of CRATEDB_CLOUD_CLUSTER_ID), maintaining consistency with the changes in the actual implementation.

Also applies to: 62-62

doc/cluster/python.md (1)

1-80: Comprehensive documentation for the Python cluster management API.

The documentation provides a clear and complete guide to using the ManagedCluster class, covering installation, authentication, configuration, and usage with practical examples. The explanation of context manager behavior and the stop_on_exit parameter is particularly valuable for users to understand resource management.

The documentation also follows good practices by including:

  • Multiple authentication options (interactive and headless)
  • Environment variable configuration
  • Explicit notes about precedence and mutual exclusivity
  • Code examples demonstrating real-world usage
  • References to a full tutorial for more details
🧰 Tools
🪛 LanguageTool

[uncategorized] ~40-~40: Loose punctuation mark.
Context: ...r parameters or environment variables. :Environment variables: `CRATEDB_CLUSTE...

(UNLIKELY_OPENING_PUNCTUATION)

doc/util/shell.md (2)

34-46: Excellent authentication documentation for CrateDB Cloud.

This section clearly explains the two authentication methods for CrateDB Cloud: interactive login with identity providers and headless authentication using API keys. The examples are helpful and easy to follow.


56-76: Connection examples align well with the unified cluster addressing scheme.

The examples for connecting to a standalone CrateDB instance properly demonstrate the use of the new unified --cluster-url option and the corresponding environment variable CRATEDB_CLUSTER_URL, which aligns with the broader refactoring across the toolkit.

cratedb_toolkit/io/cli.py (4)

8-10: Import changes reflect the new unified cluster addressing scheme.

The change from importing multiple cluster-related classes to a single DatabaseCluster import simplifies the code and aligns with the refactoring to consolidate cluster addressing options.


30-32: CLI options have been unified for cluster addressing.

The replacement of previous separate options with new consolidated options (option_cluster_id, option_cluster_name, option_cluster_url) improves the consistency of the CLI interface.


60-61: Renamed variable for clarity.

Changing the variable name from resource to source improves readability by making it clearer that this represents the source data for the import operation.


64-69: Simplified cluster creation using the new factory method.

The code now uses the DatabaseCluster.create factory method which encapsulates the logic for determining the cluster type based on the provided parameters. This is a significant improvement in code simplicity and maintainability.
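The dispatch logic of such a factory method could be sketched roughly as follows. The class and parameter names are taken from this review, but the stand-in classes and the exact selection logic are assumptions for illustration, not the actual implementation:

```python
from typing import Optional

# Minimal stand-ins for illustration; the real classes live in cratedb_toolkit.
class ManagedCluster:
    def __init__(self, cluster_id: Optional[str] = None, cluster_name: Optional[str] = None):
        self.cluster_id = cluster_id
        self.cluster_name = cluster_name

class StandaloneCluster:
    def __init__(self, address: str):
        self.address = address

class DatabaseCluster:
    @classmethod
    def create(cls, cluster_id: Optional[str] = None,
               cluster_name: Optional[str] = None,
               cluster_url: Optional[str] = None):
        # Dispatch on whichever address component was provided.
        if cluster_id or cluster_name:
            return ManagedCluster(cluster_id=cluster_id, cluster_name=cluster_name)
        if cluster_url:
            return StandaloneCluster(address=cluster_url)
        raise ValueError("One of cluster id, name, or URL is required")

cluster = DatabaseCluster.create(cluster_url="crate://localhost:4200")
```

Callers no longer need to know which concrete cluster flavor they are talking to; the factory encapsulates that decision.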

cratedb_toolkit/model.py (3)

17-52: Well-structured validation for cluster address options.

The new ClusterAddressOptions dataclass provides a clean way to encapsulate and validate cluster address inputs. The validation logic ensures mutual exclusivity and handles edge cases properly.
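The mutual-exclusivity validation described here could look something like the following sketch. The field names follow this review; the error types and messages in the real implementation may differ:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClusterAddressOptions:
    """Sketch: exactly one cluster address option must be provided."""
    cluster_id: Optional[str] = None
    cluster_name: Optional[str] = None
    cluster_url: Optional[str] = None

    def __post_init__(self):
        given = [v for v in (self.cluster_id, self.cluster_name, self.cluster_url) if v]
        if len(given) > 1:
            raise ValueError("The cluster id, name, and URL options are mutually exclusive")
        if not given:
            raise ValueError("One of cluster id, name, or URL is required")
```

Running the validation in `__post_init__` means every construction path, whether from CLI options or environment variables, goes through the same checks.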


90-99: Good addition of credential injection method.

The with_credentials method provides a clean way to add authentication credentials to a database address, supporting a fluent interface pattern.
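As a free-function sketch (the real `with_credentials` is a method on the address object), credential injection amounts to rewriting the netloc portion of the URL:

```python
from urllib.parse import urlsplit, urlunsplit

def with_credentials(url: str, username: str, password: str) -> str:
    """Illustrative sketch of injecting credentials into a database URL."""
    parts = urlsplit(url)
    netloc = f"{username}:{password}@{parts.hostname or 'localhost'}"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

print(with_credentials("http://localhost:4200/", "admin", "secret"))
# http://admin:secret@localhost:4200/
```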


114-116: Improved default handling for missing host.

Setting a default host of "localhost" when none is provided improves the robustness of the httpuri property.
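The fallback can be illustrated with a small helper; this is a hypothetical free function, not the actual property, and the default port is an assumption (4200 is CrateDB's conventional HTTP port):

```python
from urllib.parse import urlsplit

def httpuri(address: str) -> str:
    """Sketch: default to localhost when the address omits a host."""
    parts = urlsplit(address)
    host = parts.hostname or "localhost"
    port = parts.port or 4200  # assumed default CrateDB HTTP port
    return f"http://{host}:{port}"

print(httpuri("crate://?schema=stats"))  # http://localhost:4200
```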

doc/cluster/backlog.md (1)

1-13: Well-structured backlog document with clear contribution guidelines.

The introduction sets a positive tone for contributions and provides clear information about how to get involved with the project.

cratedb_toolkit/util/croud.py (2)

282-321: Well-implemented table quoting function

The table_fqn function correctly handles various table name scenarios, including schema-qualified names and already-quoted names. The docstring is thorough with good examples and edge case handling for empty tables.
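The behavior described above can be sketched like this. The signature, default schema, and edge-case handling are assumptions based on this review, not the actual implementation:

```python
def table_fqn(table: str, default_schema: str = "doc") -> str:
    """Sketch: quote a possibly schema-qualified table name."""
    if not table:
        raise ValueError("Table name must not be empty")

    def quote(ident: str) -> str:
        # Leave already-quoted identifiers untouched.
        if ident.startswith('"') and ident.endswith('"'):
            return ident
        return f'"{ident}"'

    parts = table.split(".", 1)
    if len(parts) == 1:
        parts = [default_schema, parts[0]]
    return ".".join(quote(p) for p in parts)

print(table_fqn("nab-machine-failure"))  # "doc"."nab-machine-failure"
print(table_fqn("testdrive.readings"))   # "testdrive"."readings"
```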


47-62: Well-designed safedecode method with clean error handling

The invoke_safedecode method properly handles partial JSON data and unpacks the tuple returned by JSONDecoder.raw_decode(). The comments clearly explain why this is needed as a workaround for upstream issues.
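The core trick is standard-library behavior: `json.JSONDecoder.raw_decode()` returns a `(object, end_index)` tuple and does not complain about trailing non-JSON data. A minimal sketch of the approach (function name borrowed from this review):

```python
import json

def invoke_safedecode(payload: str):
    """Sketch: decode the first JSON document, tolerating trailing noise."""
    decoder = json.JSONDecoder()
    obj, end = decoder.raw_decode(payload)
    return obj

result = invoke_safedecode('{"status": "SUCCEEDED"}\nINFO: trailing log output')
print(result)  # {'status': 'SUCCEEDED'}
```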

cratedb_toolkit/cluster/croud.py (1)

24-62: Well-designed URL generators

The CloudRootUrlGenerator class is well-designed with clear properties for generating different API URLs. The with_organization method provides a fluent interface for setting the organization ID.
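The fluent pattern can be sketched as follows. The base URL and the `clusters` property are assumptions for illustration; they do not reflect the actual croud API surface:

```python
from typing import Optional

class CloudRootUrlGenerator:
    """Illustrative sketch of a fluent API-URL generator."""
    def __init__(self, base: str = "https://console.cratedb.cloud/api/v2"):
        self.base = base
        self.organization_id: Optional[str] = None

    def with_organization(self, organization_id: str) -> "CloudRootUrlGenerator":
        self.organization_id = organization_id
        return self  # returning self enables method chaining

    @property
    def clusters(self) -> str:
        if self.organization_id:
            return f"{self.base}/organizations/{self.organization_id}/clusters/"
        return f"{self.base}/clusters/"

url = CloudRootUrlGenerator().with_organization("org-123").clusters
```

Returning `self` from `with_organization` is what makes the one-liner chaining style possible.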

cratedb_toolkit/cluster/core.py (2)

426-435: 🛠️ Refactor suggestion

JWT token expiration not handled

The current implementation creates a JWT token context once in probe() and reuses it for every query. If the token expires (typically after 1 hour), subsequent queries will fail.

Track token creation time and refresh it when needed:

def __init__(self, ...):
    # Existing code...
    self._jwt_ctx = nullcontext()
+   self._jwt_token_created_at = 0  # Timestamp when token was created

def probe(self):
    # Existing code...
    self._jwt_ctx = jwt_token_patch(self.info.jwt.token)
+   self._jwt_token_created_at = time.time()
    # Rest of the method...

def query(self, sql: str):
    # Ensure we have cluster connection details
    if not self.info or not self.info.cloud.get("url"):
        self.probe()
+   
+   # Refresh token if it's older than 30 minutes (to be safe before the 1h expiry)
+   current_time = time.time()
+   if current_time - self._jwt_token_created_at > 1800:  # 30 minutes
+       logger.debug("Refreshing JWT token")
+       self._jwt_ctx = jwt_token_patch(self.info.jwt.token)
+       self._jwt_token_created_at = current_time
    
    with self._jwt_ctx:
        # Rest of the method...

Likely an incorrect or invalid review comment.


324-328: ⚠️ Potential issue

Add validation for empty operation

The code doesn't check if self.operation is None before calling suspend(), which could lead to an AttributeError.

Add validation for empty operation:

        if self.cluster_id is None:
            raise DatabaseAddressMissingError("Need cluster identifier to stop cluster")
        logger.info(f"Stopping CrateDB Cloud Cluster: id={self.cluster_id}, name={self.cluster_name}")
-        if self.operation:
-            self.operation.suspend()
+        if not self.operation:
+            # If the operation is None, try to probe the cluster first to initialize it
+            logger.debug("Operation not initialized, probing cluster")
+            self.probe()
+            if not self.operation:
+                raise CroudException(f"Could not initialize operation for cluster: {self.cluster_id}")
+        
+        self.operation.suspend()
        self.probe()
        return self

Likely an incorrect or invalid review comment.

Comment on lines +53 to +54
sqlalchemy_url = os.getenv("CRATEDB_CLUSTER_URL", "crate://?schema=stats")
engine = sa.create_engine(sqlalchemy_url)

💡 Verification agent

❓ Verification inconclusive

Environment variable and variable name standardized correctly

The change from CRATEDB_SQLALCHEMY_URL to CRATEDB_CLUSTER_URL and updating the local variable name aligns with the PR objective to unify cluster addressing across the codebase.

While the code changes look good, the static analysis tool indicates this code isn't covered by tests. Consider adding test coverage for this Marimo application.


🏁 Script executed:

#!/bin/bash
# Check if there are any tests for Marimo integration
rg "test.*marimo" --type py tests/

Length of output: 36


Add test coverage for Marimo integration

No tests currently cover cratedb_toolkit/cfr/marimo.py (lines 53–54). Please add unit or integration tests to exercise the new CRATEDB_CLUSTER_URL logic, for example:

• Create a new test file tests/test_marimo.py
• Mock os.getenv to verify both the default URL ("crate://?schema=stats") and an overridden CRATEDB_CLUSTER_URL
• Mock sa.create_engine to assert it’s called with the expected connection string

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 53-54: cratedb_toolkit/cfr/marimo.py#L53-L54
Added lines #L53 - L54 were not covered by tests

🤖 Prompt for AI Agents (early access)
In cratedb_toolkit/cfr/marimo.py at lines 53-54, add test coverage for the new environment variable logic by creating a test file tests/test_marimo.py. In this test, mock os.getenv to simulate both the default value "crate://?schema=stats" and a custom CRATEDB_CLUSTER_URL, and mock sa.create_engine to verify it is called with the correct connection string in each case. This will ensure the environment variable handling and engine creation are properly tested.

Comment on lines +57 to 58
export CRATEDB_CLUSTER_ID='e1e38d92-a650-48f1-8a70-8133f2d5c400'
```

💡 Verification agent

🧩 Analysis chain

Missing username and password environment variables

The example removed the previously included environment variables for CRATEDB_USERNAME and CRATEDB_PASSWORD. Users might need these credentials to authenticate with the cluster.

Is authentication now handled differently, or should these environment variables still be included in the example?


🏁 Script executed:

#!/bin/bash
# Look for references to authentication in the codebase
rg "CRATEDB_USERNAME|CRATEDB_PASSWORD" --type md doc/

Length of output: 127


Restore missing authentication environment variables
The doc/io/index.md example only sets CRATEDB_CLUSTER_ID, but users still need credentials to connect. Unless you’ve switched to a token-based or OIDC flow, please re-add the username/password vars around lines 57–58:

 export CRATEDB_CLUSTER_ID='e1e38d92-a650-48f1-8a70-8133f2d5c400'
+export CRATEDB_USERNAME='admin'
+export CRATEDB_PASSWORD='<your-password>'

If authentication truly moved to another mechanism (e.g. API tokens or SSO), update the example and note the new steps here.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

 export CRATEDB_CLUSTER_ID='e1e38d92-a650-48f1-8a70-8133f2d5c400'
+export CRATEDB_USERNAME='admin'
+export CRATEDB_PASSWORD='<your-password>'
🤖 Prompt for AI Agents (early access)
In doc/io/index.md around lines 57 to 58, the example sets only CRATEDB_CLUSTER_ID but omits CRATEDB_USERNAME and CRATEDB_PASSWORD environment variables needed for authentication. Restore these two variables in the example to ensure users have the necessary credentials to connect, unless authentication has changed to a different method. If authentication has changed, update the example and documentation to reflect the new process clearly.

Comment on lines +38 to +41
 if not cluster_url:
     logger.error("Unable to operate without database address")
     sys.exit(1)
-ctx.meta.update({"cratedb_sqlalchemy_url": cratedb_sqlalchemy_url})
+ctx.meta.update({"cluster_url": cluster_url})

💡 Verification agent

🧩 Analysis chain

Consider adding test coverage for the error path.

The error handling for a missing cluster URL is not covered by tests according to the static analysis hints. This is a critical validation path that ensures the CLI fails properly when required parameters are missing.


🏁 Script executed:

#!/bin/bash
# Check for existing test coverage of the error path

# Look for test files related to rockset cli
echo "Test files for rockset CLI:"
fd "test.*rockset.*cli" -t f

# Check if there are any tests that specifically test the error path
echo -e "\nTests for missing cluster_url validation:"
rg -A 3 -B 3 "missing.*cluster.*url|no.*cluster.*url|without.*cluster.*url" --glob="*test*" --glob="*.py"

Length of output: 1228


Add tests for missing cluster URL error path

No existing tests cover the scenario where cluster_url is not provided in the CLI. Please add unit tests to verify that the application:

  • Logs "Unable to operate without database address"
  • Exits with status code 1

at both locations:

  • cratedb_toolkit/adapter/rockset/cli.py (lines 38–41)
  • cratedb_toolkit/cmd/tail/cli.py

These tests should simulate invoking the CLI without a cluster_url and assert on the logged output and exit code.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 38-38: cratedb_toolkit/adapter/rockset/cli.py#L38
Added line #L38 was not covered by tests


[warning] 41-41: cratedb_toolkit/adapter/rockset/cli.py#L41
Added line #L41 was not covered by tests

🤖 Prompt for AI Agents (early access)
In cratedb_toolkit/adapter/rockset/cli.py around lines 38 to 41, add unit tests to cover the error path when cluster_url is missing. The tests should simulate running the CLI without providing cluster_url, then assert that the logger outputs "Unable to operate without database address" and that the process exits with status code 1. This ensures the critical validation path is properly tested and the CLI fails as expected.

Comment on lines +37 to +46
def __post_init__(self):
    self.fix_job_info_table_name()

@classmethod
def unknown(cls, message: str):
    cj = cls()
    cj._custom_message = message
    cj._custom_status = "UNKNOWN"
    return cj


🛠️ Refactor suggestion

__post_init__ could handle non-existent keys more robustly

The fix_job_info_table_name method is called in __post_init__, but there's no check if self.info has been initialized correctly before accessing its nested keys. For jobs created with CloudJob.unknown(), info will be an empty dict.

def __post_init__(self):
-    self.fix_job_info_table_name()
+    if self.info:  # Only attempt to fix table name if info dictionary is populated
+        self.fix_job_info_table_name()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

 def __post_init__(self):
-    self.fix_job_info_table_name()
+    if self.info:  # Only attempt to fix table name if info dictionary is populated
+        self.fix_job_info_table_name()

 @classmethod
 def unknown(cls, message: str):
     cj = cls()
     cj._custom_message = message
     cj._custom_status = "UNKNOWN"
     return cj
🤖 Prompt for AI Agents (early access)
In cratedb_toolkit/io/croud.py around lines 37 to 46, the __post_init__ method calls fix_job_info_table_name without verifying if self.info is properly initialized, which can cause errors when info is an empty dict as in CloudJob.unknown(). Modify __post_init__ to check if self.info exists and contains the expected keys before calling fix_job_info_table_name, or update fix_job_info_table_name to handle missing or empty info gracefully to avoid key errors.

Comment on lines +186 to +203
@contextlib.contextmanager
def handle_command_errors(operation_name):
    """Handle common command errors and exit with appropriate error messages."""
    try:
        yield
    except CroudException as ex:
        logger.error(
            f"Failed to {operation_name}. "
            f"Please check if you are addressing the right cluster, "
            f"and if credentials and permissions are valid. "
            f"The underlying error was: {ex}"
        )
        raise
    except ClickException:
        raise
    except Exception as ex:
        logger.exception(f"Unexpected error on operation: {operation_name}")
        raise SystemExit(1) from ex

💡 Verification agent

🧩 Analysis chain

Add tests for handle_command_errors context manager

This error-handling code lacks test coverage according to static analysis, which could lead to unexpected behavior in error cases.

You should add tests that verify:

  • A CroudException is logged and re-raised
  • A ClickException passes through without modification
  • A generic exception is converted to SystemExit(1)

🏁 Script executed:

#!/bin/bash
# Check for existing tests for handle_command_errors
rg "handle_command_errors" tests/ || echo "No tests found for the handle_command_errors context manager"

Length of output: 164


Add tests for the handle_command_errors context manager

The handle_command_errors helper in cratedb_toolkit/cluster/cli.py currently has no test coverage. Please add unit tests that verify:

  • A CroudException is caught, logged via logger.error(...), and then re-raised.
  • A ClickException passes straight through unchanged.
  • Any other Exception is logged via logger.exception(...) and translated into a SystemExit(1).

Suggested test locations:

  • tests/test_cli_error_handling.py

Example using pytest and caplog:

import pytest
import contextlib
from click import ClickException
from cratedb_toolkit.cluster.cli import handle_command_errors
from croud.exceptions import CroudException

def test_croud_exception_is_logged_and_reraised(caplog):
    caplog.set_level("ERROR")
    with pytest.raises(CroudException):
        with handle_command_errors("do_op"):
            raise CroudException("oops")
    assert "Failed to do_op." in caplog.text
    assert "The underlying error was: oops" in caplog.text

def test_click_exception_passes_through():
    with pytest.raises(ClickException):
        with handle_command_errors("do_op"):
            raise ClickException("click fail")

def test_generic_exception_becomes_system_exit(caplog):
    caplog.set_level("ERROR")
    with pytest.raises(SystemExit) as excinfo:
        with handle_command_errors("do_op"):
            raise ValueError("bad")
    assert excinfo.value.code == 1
    assert "Unexpected error on operation: do_op" in caplog.text
🤖 Prompt for AI Agents (early access)
In cratedb_toolkit/cluster/cli.py around lines 186 to 203, the handle_command_errors context manager lacks test coverage. Add unit tests in a new file like tests/test_cli_error_handling.py that verify: a CroudException is logged with logger.error and re-raised, a ClickException passes through unchanged, and any other Exception is logged with logger.exception and converted into a SystemExit(1). Use pytest and caplog to capture logs and exceptions as shown in the provided example.

Comment on lines +243 to +247
# TODO: Add documentation about those environment variables.
cratedb_cloud_tier = os.environ.get("CRATEDB_CLOUD_TIER", "default")
cratedb_cloud_product = os.environ.get("CRATEDB_CLOUD_PRODUCT", "crfree") # s2, etc.
cratedb_channel = os.environ.get("CRATEDB_CHANNEL", "stable")
cratedb_version = os.environ.get("CRATEDB_VERSION", DEFAULT_CRATEDB_VERSION)

🛠️ Refactor suggestion

Environment variables need documentation

The code uses several environment variables (CRATEDB_CLOUD_TIER, CRATEDB_CLOUD_PRODUCT, CRATEDB_CHANNEL, and CRATEDB_VERSION) that lack documentation as noted in the TODO comment.

Remove the TODO and add detailed documentation for these variables in the user documentation. The variables should be explained with:

  • Their purpose
  • Default values
  • Allowed values (e.g., valid tiers and products)
  • Examples of usage
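A documentation snippet could list the variables together with their defaults, taken from the code above (the `DEFAULT_CRATEDB_VERSION` value is defined elsewhere and left elided here):

```shell
# Optional deployment settings with their current defaults.
export CRATEDB_CLOUD_TIER='default'     # subscription tier
export CRATEDB_CLOUD_PRODUCT='crfree'   # product plan, e.g. crfree or s2
export CRATEDB_CHANNEL='stable'         # release channel
# CRATEDB_VERSION falls back to DEFAULT_CRATEDB_VERSION when unset.
```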
🤖 Prompt for AI Agents (early access)
In cratedb_toolkit/cluster/croud.py around lines 243 to 247, remove the TODO comment about documenting environment variables and instead add detailed documentation for CRATEDB_CLOUD_TIER, CRATEDB_CLOUD_PRODUCT, CRATEDB_CHANNEL, and CRATEDB_VERSION in the user documentation. The documentation should explain each variable's purpose, default values, allowed values (such as valid tiers and products), and provide usage examples.

Comment on lines +142 to +165
def __enter__(self):
    """Enter the context manager, ensuring the cluster is running."""
    return self.start()

def __exit__(self, exc_type, exc_val, exc_tb):
    """Exit the context manager, suspending the cluster."""
    if exc_type is not None:
        # Log the exception but still attempt to suspend the cluster
        logger.error(f"Exception occurred while shutting down: {exc_type.__name__}: {exc_val}")

    try:
        self.close_connections()
    except Exception as ex:
        logger.error(f"Failed to close connections: {ex}")
    finally:
        if self.stop_on_exit:
            try:
                self.stop()
                logger.info(f"Successfully stopped cluster: id={self.cluster_id}, name={self.cluster_name}")
            except Exception as ex:
                logger.error(f"Failed to stop cluster: {ex}")

    # Don't suppress the original exception.
    return False

💡 Verification agent

🧩 Analysis chain

Ensure context manager test coverage

The context manager implementation (__enter__ and __exit__) lacks test coverage according to static analysis hints. These methods are critical as they ensure proper resource acquisition and cleanup, especially when handling errors.

Consider adding tests for:

  1. Normal operation (verifying start() is called and its value returned)
  2. Error scenarios (exceptions inside the with block)
  3. Cleanup behavior (ensuring close_connections() is always called, and stop() is called when stop_on_exit=True)

🏁 Script executed:

#!/bin/bash
# Check for any existing context manager tests
echo "Searching for context manager tests..."
rg -p "def test.*context|with .*[Cc]luster" tests/
echo "Searching for tests of __enter__ or __exit__..."
rg -p "__enter__|__exit__" tests/

Length of output: 706


Add explicit tests for the cluster context manager

The __enter__/__exit__ implementation in cratedb_toolkit/cluster/core.py (lines 142–165) is critical for starting, cleaning up, and stopping clusters but isn’t directly exercised by existing tests. Please add focused unit tests that:

  • Verify normal operation:
    • Mock start() and assert it’s called once and its return value is yielded by the with block.
  • Cover exception handling:
    • Simulate an exception inside the with block, assert that the error is logged, that close_connections() still runs, and that the original exception propagates.
  • Ensure cleanup on exit:
    • With stop_on_exit=True, mock and assert calls to close_connections() and then stop(), verifying the correct call order.

Location to test:
• cratedb_toolkit/cluster/core.py @ lines 142–165

Tests can live under tests/cluster/… and use unittest.mock (or pytest’s monkeypatch) to spy on start(), close_connections(), and stop().

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 144-144: cratedb_toolkit/cluster/core.py#L144
Added line #L144 was not covered by tests


[warning] 148-148: cratedb_toolkit/cluster/core.py#L148
Added line #L148 was not covered by tests

🤖 Prompt for AI Agents (early access)
In tests/cluster/, add new unit tests targeting cratedb_toolkit/cluster/core.py lines 142-165 to cover the context manager methods __enter__ and __exit__. Create tests that mock start() to verify it is called once and its return value is yielded during normal operation. Add tests simulating exceptions inside the with block to confirm that logger.error is called with the exception details, close_connections() is still invoked, and the original exception is propagated. Also, test cleanup behavior with stop_on_exit=True by mocking close_connections() and stop(), asserting both are called in the correct order on exit. Use unittest.mock or pytest monkeypatch to spy on these methods and logger calls.

Comment on lines +99 to +101
settings = obtain_settings(specs=cls.settings_spec)
check_mutual_exclusiveness(specs=cls.settings_spec, settings=settings)
return cls(**settings)

💡 Verification agent

🧩 Analysis chain

Add test coverage for ManagedClusterSettings.from_cli_or_env

The static analysis hints indicate that the from_cli_or_env method lacks test coverage. This method is critical for obtaining and validating settings from CLI or environment variables.

Add tests to verify that:

  1. Settings are correctly obtained from CLI arguments
  2. Settings are correctly obtained from environment variables
  3. Mutual exclusiveness checks work as expected
  4. Error cases are handled properly

🏁 Script executed:

#!/bin/bash
# Check for existing tests related to settings
echo "Searching for tests related to ManagedClusterSettings..."
rg -p "ManagedClusterSettings" tests/

Length of output: 450


🏁 Script executed:

#!/bin/bash
# Find the definition of from_cli_or_env in the codebase
echo "Searching for from_cli_or_env definition..."
rg -n "def from_cli_or_env" -n cratedb_toolkit/cluster/core.py

# Find any existing tests invoking from_cli_or_env
echo "Searching for from_cli_or_env usage in tests..."
rg -n "from_cli_or_env" -n tests/

Length of output: 333


Add missing tests for ManagedClusterSettings.from_cli_or_env

The from_cli_or_env class method (cratedb_toolkit/cluster/core.py:98–101) isn’t invoked by any existing tests. Please add coverage to verify that:

  • CLI arguments are correctly parsed and mapped to ManagedClusterSettings
  • Environment variables are picked up when CLI args are absent
  • check_mutual_exclusiveness enforces exclusive settings as expected
  • Error conditions (e.g. conflicting flags or missing required settings) raise the proper exceptions

Note that tests/conftest.py already references settings_spec but does not call from_cli_or_env directly—add new tests under tests/ to cover these scenarios.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 99-101: cratedb_toolkit/cluster/core.py#L99-L101
Added lines #L99 - L101 were not covered by tests

🤖 Prompt for AI Agents (early access)
In cratedb_toolkit/cluster/core.py around lines 99 to 101, the class method from_cli_or_env lacks test coverage. Add new tests under the tests/ directory that invoke from_cli_or_env to verify it correctly obtains settings from CLI arguments and environment variables, enforces mutual exclusiveness via check_mutual_exclusiveness, and properly raises exceptions on error cases such as conflicting or missing settings. Ensure these tests cover all expected behaviors and edge cases for this method.

Comment on lines +290 to +295
# Wait a bit to let the deployment settle, mostly to work around DNS propagation issues.
attempts = 0
while not self.info.ready and attempts < self.POST_DEPLOY_WAIT_ATTEMPTS:
    attempts += 1
    self.probe()
    time.sleep(self.POST_DEPLOY_WAIT_PAUSE)

🛠️ Refactor suggestion

DNS propagation delay handling is brittle

The fixed-sleep approach for DNS propagation is brittle. DNS propagation timing varies greatly across networks and might not be resolved by a fixed sleep duration.

Implement a hybrid approach with initial wait followed by exponential backoff:

-        # Wait a bit to let the deployment settle, mostly to work around DNS propagation issues.
-        attempts = 0
-        while not self.info.ready and attempts < self.POST_DEPLOY_WAIT_ATTEMPTS:
-            attempts += 1
-            self.probe()
-            time.sleep(self.POST_DEPLOY_WAIT_PAUSE)
+        # Hybrid approach for DNS propagation
+        # First, a short initial sleep to avoid negative DNS caching
+        time.sleep(1.5)
+        
+        # Then use exponential backoff for active polling
+        import random
+        from boltons.urlutils import URL
+        
+        max_retries = 10
+        base_delay = 0.5
+        for attempt in range(max_retries):
+            # Check if the cluster is ready
+            self.probe()
+            if self.info.ready:
+                break
+                
+            # Try to verify DNS resolution if URL is available
+            try:
+                cluster_url = self.info.cloud.get("url")
+                if cluster_url:
+                    # Test DNS resolution
+                    import socket
+                    hostname = URL(cluster_url).host
+                    socket.gethostbyname(hostname)
+                    logger.info(f"DNS resolved after {attempt+1} attempts")
+            except Exception as ex:
+                logger.debug(f"Waiting for DNS propagation (attempt {attempt+1}): {ex}")
+                
+            # Exponential backoff with jitter
+            jitter = random.uniform(0, 0.1)
+            delay = min(base_delay * (2 ** attempt) + jitter, 10)  # Cap at 10 seconds
+            time.sleep(delay)
🤖 Prompt for AI Agents (early access)
In cratedb_toolkit/cluster/core.py around lines 290 to 295, the current fixed sleep loop for waiting on DNS propagation is brittle and may not handle variable delays well. Replace the fixed sleep with a hybrid approach: start with an initial short wait, then use exponential backoff for subsequent retries, increasing the wait time after each failed probe attempt. This will make the DNS propagation wait more adaptive and robust.

Comment on lines +309 to +314
if self.cluster_id is None:
    raise DatabaseAddressMissingError("Need cluster identifier to resume cluster")
logger.info(f"Resuming CrateDB Cloud Cluster: id={self.cluster_id}, name={self.cluster_name}")
if self.operation:
    self.operation.resume()
self.probe()

⚠️ Potential issue

Add validation for empty operation in resume method

Similar to the stop method, the resume method doesn't check if self.operation is None before calling resume(), which could lead to an AttributeError.

Add validation for empty operation:

        if self.cluster_id is None:
            raise DatabaseAddressMissingError("Need cluster identifier to resume cluster")
        logger.info(f"Resuming CrateDB Cloud Cluster: id={self.cluster_id}, name={self.cluster_name}")
-        if self.operation:
-            self.operation.resume()
+        if not self.operation:
+            # If the operation is None, try to probe the cluster first to initialize it
+            logger.debug("Operation not initialized, probing cluster")
+            self.probe()
+            if not self.operation:
+                raise CroudException(f"Could not initialize operation for cluster: {self.cluster_id}")
+        
+        self.operation.resume()
        self.probe()
        return self
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
--- a/cratedb_toolkit/cluster/core.py
+++ b/cratedb_toolkit/cluster/core.py
@@ def resume(self):
         if self.cluster_id is None:
             raise DatabaseAddressMissingError("Need cluster identifier to resume cluster")
         logger.info(f"Resuming CrateDB Cloud Cluster: id={self.cluster_id}, name={self.cluster_name}")
-        if self.operation:
-            self.operation.resume()
+        if not self.operation:
+            # If the operation hasn’t been initialized yet, probe to set it up
+            logger.debug("Operation not initialized, probing cluster")
+            self.probe()
+            if not self.operation:
+                raise CroudException(f"Could not initialize operation for cluster: {self.cluster_id}")
+
+        # Now resume the operation
+        self.operation.resume()
         self.probe()
         return self
🤖 Prompt for AI Agents (early access)
In cratedb_toolkit/cluster/core.py around lines 309 to 314, the resume method calls self.operation.resume() without checking if self.operation is None, which can cause an AttributeError. Add a validation to check if self.operation is not None before calling resume(), similar to the stop method, to prevent errors when operation is empty.
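Stripped of the Cloud-API specifics, the guard pattern proposed here (probe to lazily initialize the handle, then fail loudly if it is still missing) looks like the following. The class and exception names are stand-ins for the real `ManagedCluster` and `CroudException`:

```python
class Cluster:
    """Minimal sketch of the lazy-initialization guard described above."""

    def __init__(self, cluster_id):
        self.cluster_id = cluster_id
        self.operation = None

    def probe(self):
        # Stand-in for the real probe; the real code queries the Cloud API
        # and populates `self.operation` as a side effect.
        self.operation = object()

    def resume(self):
        if not self.operation:
            # Probe first so the operation handle exists before we use it.
            self.probe()
            if not self.operation:
                raise RuntimeError(
                    f"Could not initialize operation for cluster: {self.cluster_id}"
                )
        # Safe: self.operation is guaranteed non-None here.
        return self
```

The point of the second `if not self.operation` check is that probing may legitimately fail to produce a handle, and raising there is preferable to an opaque `AttributeError` one line later.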

@amotl amotl force-pushed the amo/cloud-second branch from d92d5c0 to 07efa81 Compare May 4, 2025 19:28

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
doc/cluster/backlog.md (1)

35-41: Specify a clear resolution after status evaluation
The backlog describes checking cloud.info.last_async_operation.status but stops short of advising what to do next. A concrete next step (e.g., retry behavior or user notification) will help contributors implement this scenario correctly.

Example addition:

  => Evaluate `cloud.info.last_async_operation.status` = `SENT|IN_PROGRESS`.
+ => If still in progress, retry with exponential backoff or inform the user to wait before retrying.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

41-41: Bare URL used
null

(MD034, no-bare-urls)
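
The suggested resolution, retrying while `last_async_operation.status` is `SENT|IN_PROGRESS` and otherwise informing the user, could be sketched as below. The function signature and the `delays` parameter are illustrative assumptions, not part of the toolkit's API:

```python
import time


def wait_for_operation(get_status, delays, pending=("SENT", "IN_PROGRESS")):
    """Poll `get_status()` until it leaves the pending states, or give up.

    `delays` is an iterable of sleep durations (e.g. an exponential-backoff
    schedule). Returns the terminal status, or None if the operation is
    still pending after the schedule is exhausted.
    """
    for delay in delays:
        status = get_status()
        if status not in pending:
            return status
        time.sleep(delay)
    return None  # caller should tell the user to wait and retry later
```

Returning `None` instead of raising leaves the "inform the user" decision to the caller, which matches the backlog's phrasing.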

🧹 Nitpick comments (9)
doc/backlog/main.md (4)

3-3: Nested header level for clarity. The new ## Iteration +0 header is at the same level as the top-level backlog title (## Main Backlog), making it appear as a peer section rather than a sub-section. Consider demoting it to ### Iteration +0 to maintain a clear hierarchy.


12-15: Refine debug log snippet and add language tag. The fenced code block for the HTTP client debug output currently lacks a language identifier (MD040) and includes full JSON responses that may be overly verbose. Consider:

  • Adding a language tag (e.g., text or shell).
  • Summarizing or truncating the logs (e.g., using ...) to keep the backlog concise.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

12-12: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)


57-57: Escape inner quotes in code quote. The bullet Make `ctk cluster list-jobs` respect "status": "SUCCEEDED" mixes backticks and double quotes, which can confuse Markdown renderers. Consider using single quotes for the status or escaping the inner quotes for clarity:

- UX: Make `ctk cluster list-jobs` respect 'status': 'SUCCEEDED'

61-63: Specify language for fenced environment snippet. The block showing the CRATEDB_CLUSTER_URL example is fenced without a language. Add shell (e.g., ```shell) to comply with markdownlint (MD040) and improve readability.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

61-61: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

doc/cluster/backlog.md (5)

41-41: Wrap the bare URL in a markdown link
Bare URLs trigger markdownlint's MD034 rule. Converting this to a descriptive link improves readability and lint compliance.

- See also https://github.com/crate/cratedb-toolkit/actions/runs/14682363239/job/41206608090?pr=81#step:5:384.
+ See also [GitHub Actions run #14682363239](https://github.com/crate/cratedb-toolkit/actions/runs/14682363239/job/41206608090?pr=81#step:5:384).
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

41-41: Bare URL used
null

(MD034, no-bare-urls)


47-47: Wrap the bare URL in a markdown link
Avoid bare URLs per MD034; use a labeled hyperlink.

- - Update `shunit2`. -- https://github.com/kward/shunit2
+ - Update `shunit2`. -- [kward/shunit2 on GitHub](https://github.com/kward/shunit2)
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

47-47: Bare URL used
null

(MD034, no-bare-urls)


51-51: Remove stray backtick at end of line
There's an unmatched backtick that may render incorrectly in Markdown.

- - Naming things: s/UniversalCluster/DatabaseCluster/`
+ - Naming things: s/UniversalCluster/DatabaseCluster/

72-72: Add a language specifier to the fenced code block
Specifying a language tag enables syntax highlighting and satisfies MD040.

-   ```
+   ```json
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

72-72: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)


87-87: Remove unnecessary comma before "because"
The comma before "because" is not needed when the clause is essential.

- - Shell: Currently it is not possible to create multiple instances of a `ManagedCluster`, 
+ - Shell: Currently it is not possible to create multiple instances of a `ManagedCluster`
🧰 Tools
🪛 LanguageTool

[formatting] ~87-~87: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ...multiple instances of a ManagedCluster, because parameters like username and `passwor...

(COMMA_BEFORE_BECAUSE)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d92d5c0 and 07efa81.

📒 Files selected for processing (4)
  • cratedb_toolkit/shell/cli.py (3 hunks)
  • doc/backlog/main.md (2 hunks)
  • doc/cluster/backlog.md (1 hunks)
  • tests/examples/test_shell.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/examples/test_shell.sh
  • cratedb_toolkit/shell/cli.py
🧰 Additional context used
🪛 LanguageTool
doc/cluster/backlog.md

[uncategorized] ~65-~65: Loose punctuation mark.
Context: ...ng the block ``` - ctk cluster start: Avoid creating a project on each deploy...

(UNLIKELY_OPENING_PUNCTUATION)


[formatting] ~87-~87: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ...multiple instances of a ManagedCluster, because parameters like username and `passwor...

(COMMA_BEFORE_BECAUSE)

doc/backlog/main.md

[uncategorized] ~8-~8: Loose punctuation mark.
Context: ... methods. - cratedb_toolkit/cfr/cli.py: Migrate to ctx.meta["address"], like ...

(UNLIKELY_OPENING_PUNCTUATION)


[style] ~10-~10: This expression is wordy and overused. Consider replacing it with a more concise and formal alternative.
Context: ...is will probably cause a disk spillover sooner or later on this instance. - When running the te...

(SOONER_OR_LATER)


[uncategorized] ~59-~59: The preposition ‘to’ seems more likely in this position.
Context: ... Accept alias --format {jsonl,ndjson} for --format json_row - Catch recursion ...

(AI_HYDRA_LEO_REPLACE_FOR_TO)

🪛 markdownlint-cli2 (0.17.2)
doc/cluster/backlog.md

41-41: Bare URL used
null

(MD034, no-bare-urls)


47-47: Bare URL used
null

(MD034, no-bare-urls)


72-72: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

doc/backlog/main.md

12-12: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)


61-61: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms (12)
  • GitHub Check: build-and-publish
  • GitHub Check: Generic: Python 3.12 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.9 on OS ubuntu-latest
  • GitHub Check: Kinesis: Python 3.12 on OS ubuntu-latest
  • GitHub Check: Kinesis: Python 3.9 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.8 on OS ubuntu-latest
  • GitHub Check: CFR for OS windows-latest
  • GitHub Check: CFR for OS ubuntu-latest
  • GitHub Check: Python 3.11, LocalStack 3.6, OS ubuntu-latest
  • GitHub Check: Cloud: Python 3.13 on OS ubuntu-latest
  • GitHub Check: Python 3.9, LocalStack 3.6, OS ubuntu-latest
  • GitHub Check: CFR for OS macos-13
🔇 Additional comments (4)
doc/backlog/main.md (4)

4-5: Clarify CLI flag consolidation. You propose both a canonical --cluster-url and a single --cluster option. It may be helpful to specify whether they serve different purposes or if one replaces multiple existing flags. This will guide future implementation and documentation.


8-8: Verify migration of CLI metadata. Ensure that cratedb_toolkit/cfr/cli.py is updated to set and pass ctx.meta["address"] identically to cratedb_toolkit/info/cli.py, and update any associated tests or documentation.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~8-~8: Loose punctuation mark.
Context: ... methods. - cratedb_toolkit/cfr/cli.py: Migrate to ctx.meta["address"], like ...

(UNLIKELY_OPENING_PUNCTUATION)


62-62: Verify updated environment variable. Confirm that the SDK and CLI configuration loader accept CRATEDB_CLUSTER_URL (instead of the deprecated CRATEDB_SQLALCHEMY_URL), and update any references in the documentation or .env examples accordingly.


70-70: Ensure constant is defined and used. The backlog notes storing CRATEDB_CLUSTER_ID in cratedb_toolkit.constants. Verify that this constant is declared in the codebase, documented in constants.py, and consumed by the CLI/SDK as intended.
