SDK and CLI for CrateDB Cloud Cluster APIs #81
Conversation
cratedb_toolkit/cluster/croud.py
```python
# FIXME: Fix `croud clusters deploy`.
# It yields *two* payloads to stdout, making it
# unusable in JSON-capturing situations.
# The main advantage of the `JSONDecoder` class is that it also provides
# a `.raw_decode` method, which will ignore extra data after the end of the JSON.
# https://stackoverflow.com/a/75168292
payload = wr.invoke()
decoder = json.JSONDecoder()
data = decoder.raw_decode(payload)
```
Observation

This works around a minor flaw of `croud`, which emits the JSON output twice, rendering it unparseable.

Suggestion

`croud` should be changed to output only a single JSON payload to stdout when invoking `croud clusters deploy`.
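For illustration, the workaround can be reproduced in isolation: `json.JSONDecoder.raw_decode` parses the first JSON document and ignores any trailing data. The payloads below are made up; they only mimic the doubled output.

```python
import json

# Simulated stdout of `croud clusters deploy`, which currently
# emits two JSON payloads back-to-back (example payloads are made up).
stdout = '{"id": "123", "name": "testdrive"}\n{"id": "123", "name": "testdrive"}'

# `raw_decode` returns the first decoded object together with the
# index where parsing stopped, ignoring the extra trailing data.
decoder = json.JSONDecoder()
data, end = decoder.raw_decode(stdout)
print(data["name"])
```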
We submitted an upstream fix about this obstacle.
cratedb_toolkit/io/croud.py
```python
def fix_job_info_table_name(self):
    """
    Adjust the fully qualified table name by adding appropriate quotes.
    Fixes a minor flaw on the upstream API.

    Currently, the API returns `testdrive.pems-1`, but that value can not
    be used 1:1, because it is not properly quoted.

    So, converge the table name into `"testdrive"."pems-1"` manually, for a
    fully qualified representation.

    FIXME: Remove after upstream has fixed the flaw.
    """
    job_info = self.info
    if "destination" in job_info and "table" in job_info["destination"]:
        table = job_info["destination"]["table"]
        if '"' not in table and "." in table:
            schema, table = table.split(".")
            table = f'"{schema}"."{table}"'
            job_info["destination"]["table"] = table
```
Observation

This works around another minor upstream flaw.

Suggestion

The API should either return schema and table names as separate attributes (preferred), or at least quote the value of the existing `destination.table` attribute so that it can be reused without further ado.
Example

```json
{
  "destination": {
    "table": "\"testdrive\".\"pems-1\""
  }
}
```
We created an upstream ticket about this observation.
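For illustration, the conversion the workaround performs can be exercised standalone. `quote_table_name` is a hypothetical helper mirroring the logic of `fix_job_info_table_name` above:

```python
def quote_table_name(table: str) -> str:
    # Hypothetical helper mirroring `fix_job_info_table_name`:
    # convert `testdrive.pems-1` into `"testdrive"."pems-1"`,
    # leaving already-quoted or schema-less names untouched.
    if '"' not in table and "." in table:
        schema, name = table.split(".", 1)
        return f'"{schema}"."{name}"'
    return table


print(quote_table_name("testdrive.pems-1"))  # -> "testdrive"."pems-1"
```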
cratedb_toolkit/config.py
```python
def configure_croud(no_spinner: bool = None, use_spinner: bool = None):
    """
    Turn off croud's Halo spinner when running in Jupyter Notebooks. It does not work well.

    - https://github.com/ManrajGrover/halo/issues/32
    - https://github.com/manrajgrover/halo/issues/179
    """
    if no_spinner or ((CONFIG.RUNNING_ON_JUPYTER or CONFIG.RUNNING_ON_PYTEST) and not use_spinner):
        mod = types.ModuleType(
            "croud.tools.spinner", "Mocking the croud.tools.spinner module, to turn off the Halo spinner"
        )
        setattr(mod, "HALO", NoopContextManager())  # noqa: B010
        sys.modules["croud.tools.spinner"] = mod


class NoopContextManager:
    """
    For making the Halo progressbar a no-op.
    """

    def __init__(self, *args, **kwargs):
        pass

    def __enter__(self):
        pass

    def __exit__(self, exc_type, exc_value, exc_traceback):
        pass

    def stop(self):
        pass
```
Observation

We had to turn off the Halo spinner currently used in `croud`, because it did not work well within a Jupyter Notebook environment. We are observing exactly the issues listed below, despite the first one being officially resolved. Apparently, it came back.

- Jupyter Notebooks don't support Halo (manrajgrover/halo#32)
- BUG: `Halo.__init__.<locals>.clean_up()` takes 0 positional arguments but 1 was given (manrajgrover/halo#179)

Suggestion

Submit a patch to `croud` to only use interactivity when `is_tty()` is true, or similar. At the very least, don't instantiate `HALO` at module scope; initialize/configure it at runtime instead.
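A minimal sketch of that suggestion, deciding at call time rather than import time. The `make_spinner` helper is hypothetical; it assumes `halo.Halo` as the upstream dependency and uses `contextlib.nullcontext` as the non-interactive stand-in:

```python
import contextlib
import sys


def make_spinner():
    # Hypothetical sketch: pick the spinner at call time, not at import
    # time. Only enable the interactive spinner when stdout is attached
    # to a real terminal; Jupyter and pytest capture stdout, so
    # `isatty()` returns False there.
    if sys.stdout.isatty():
        from halo import Halo  # imported lazily, only when usable
        return Halo(text="Working", spinner="dots")
    return contextlib.nullcontext()


with make_spinner():
    pass  # long-running work would go here
```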
We've reported this to the issue tracker of the upstream project.
```shell
# Log in to CrateDB Cloud.
croud login --idp azuread
```
Observation

There is an alternative, headless way of doing that: using `croud config show`, you can display the croud configuration file in YAML format.
```yaml
current-profile: cratedb.cloud
default-format: table
profiles:
  cratedb.cloud:
    auth-token: xxxxxxxxxx
    endpoint: https://console.cratedb.cloud
    key: REDACTED
    organization-id: null
    region: _any_
    secret: xxxxxxxxxx
```
If you fill in the `key` and `secret` values, obtained by running `croud api-keys create` to create an API key [^1], operations like `croud clusters list` will start working without further ado, even after logging out again using `croud logout`.
It works well. Thanks, @proddata.
Suggestion

Based on those insights, improve the SDK correspondingly, by also accepting the environment variables `CRATEDB_CLOUD_API_KEY` and `CRATEDB_CLOUD_API_SECRET`.
Footnotes

[^1]: Alternatively, you can obtain an API key on your account page, in the "API keys" section. -- https://console.cratedb.cloud/account/settings
We've implemented headless/unattended operations in croud and ctk. Thanks again.
cratedb-toolkit/cratedb_toolkit/util/croud.py
Lines 149 to 157 in 3b601c1
```python
@property
def headless_config(self) -> Configuration:
    tmp_file = NamedTemporaryFile()
    tmp_path = Path(tmp_file.name)
    config = Configuration("headless.yaml", tmp_path)
    config.profile["key"] = os.environ.get("CRATEDB_CLOUD_API_KEY")
    config.profile["secret"] = os.environ.get("CRATEDB_CLOUD_API_SECRET")
    config.profile["organization-id"] = os.environ.get("CRATEDB_CLOUD_ORGANIZATION_ID")
    return config
```
I think the interface could be further improved by using the new JWT authentication mechanism instead of traditional credentials?
Added the topic about using JWT to the backlog, to be processed on a later iteration.
Fluent authentication using provided per-cluster JWT token has been implemented per 72d3fd9.
... likewise, rename `--cratedb-http-url` to `--http-url`.
- `CRATEDB_CLOUD_CLUSTER_ID` to `CRATEDB_CLUSTER_ID`
- `CRATEDB_CLOUD_CLUSTER_NAME` to `CRATEDB_CLUSTER_NAME`
The current set of input parameters for addressing a database cluster, `cluster_id`, `cluster_name`, `sqlalchemy_url`, and `http_url`, needed a few bits of bundling, so they are now backed by a real model class. Likewise, the `ManagedCluster` and `StandaloneCluster` classes now provide the `adapter` property, which will return a suitable `DatabaseAdapter` instance. This significantly connects the dots for a better composite object model. In this spirit, the `DatabaseCluster` class emerges as a new entry point to acquire a database cluster handle universally.
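A rough sketch of what such a parameter bundle could look like. The class name `ClusterAddress` and its validation are hypothetical; the actual model class in this changeset may differ:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterAddress:
    # Hypothetical bundle of the four input parameters mentioned above.
    cluster_id: Optional[str] = None
    cluster_name: Optional[str] = None
    sqlalchemy_url: Optional[str] = None
    http_url: Optional[str] = None

    def __post_init__(self):
        # Require at least one way of addressing the cluster.
        if not any([self.cluster_id, self.cluster_name, self.sqlalchemy_url, self.http_url]):
            raise ValueError("Cluster address requires an identifier, name, or URL")


address = ClusterAddress(cluster_name="testdrive")
print(address.cluster_name)
```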
... for addressing a database cluster by URL. The new option accepts URLs in both SQLAlchemy and HTTP formats.
Actionable comments posted: 11
♻️ Duplicate comments (6)
cratedb_toolkit/__init__.py (1)
14-16: Consider making initialization explicit instead of running at import time

Running `preconfigure()` during module import can lead to unexpected side effects, especially in testing environments. Consider making this configurable or providing a way to disable this behavior.

```shell
#!/bin/bash
# Check how preconfigure is implemented and what it modifies
rg -A 10 "def preconfigure" --type py
```

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 14-14: cratedb_toolkit/init.py#L14
Added line #L14 was not covered by tests

[warning] 16-16: cratedb_toolkit/init.py#L16
Added line #L16 was not covered by tests

cratedb_toolkit/io/cli.py (1)
39-45: ⚠️ Potential issue

Function signature updated but empty string handling needs attention.

The function parameters have been updated to reflect the new unified approach to cluster addressing, but there's a potential issue with empty strings being treated as valid values.

Empty strings (`""`) passed as CLI arguments for `cluster_id` or `cluster_name` could lead to validation issues. Consider normalizing empty strings to `None`:

```diff
 def load_table(
     ctx: click.Context,
     url: str,
     cluster_id: str,
     cluster_name: str,
     cluster_url: str,
     schema: str,
     table: str,
     format_: str,
     compression: str,
     transformation: t.Union[Path, None],
 ):
+    # Normalize empty strings to None
+    cluster_id = cluster_id if cluster_id else None
+    cluster_name = cluster_name if cluster_name else None
+    cluster_url = cluster_url if cluster_url else None
```

doc/cluster/backlog.md (1)
35-41: 🛠️ Refactor suggestion

Missing resolution action for concurrent resume operations.

The issue of concurrent cluster resume operations is described, but the proposed solution ends with "=>" without clearly stating what action should be taken.

```diff
 => Evaluate `cloud.info.last_async_operation.status` = `SENT|IN_PROGRESS`.
    See also https://github.com/crate/cratedb-toolkit/actions/runs/14682363239/job/41206608090?pr=81#step:5:384.
+=> Add retry mechanism with exponential backoff when this error is detected, or
+   provide clear error messages to users about the need to wait for the existing operation to complete.
```

cratedb_toolkit/io/croud.py (2)
162-163: Avoid mutating TableAddress in-place

The code modifies `target.table` in-place, which could be unexpected for callers that might use the original object again. Consider creating a new `TableAddress` object with the modified table name.

```diff
 # Honor `schema` argument.
 if target.schema is not None:
-    target.table = f'"{target.schema}"."{target.table}"'
+    quoted_table = f'"{target.schema}"."{target.table}"'
+    # Create a new target object with the quoted table name
+    target = dataclasses.replace(target, table=quoted_table)
```
112-117: Consider using exponential backoff for retries

The retry loop uses fixed parameters (`max_retries=20, retry_delay=0.15`) which may still be too short for real-world cloud lag. Cloud operations can sometimes take longer than expected to appear.

Consider implementing an exponential backoff strategy:

```diff
-for _ in range(max_retries):
-    job = self.find_job(job_id=job_id)
-    if job.found:
-        break
-    time.sleep(retry_delay)
+for attempt in range(max_retries):
+    job = self.find_job(job_id=job_id)
+    if job.found:
+        break
+    # Use exponential backoff with initial delay
+    sleep_time = retry_delay * (2 ** attempt)
+    time.sleep(sleep_time)
```

cratedb_toolkit/util/croud.py (1)
242-257: 🛠️ Refactor suggestion

Add checks for required credentials in headless_config

The headless configuration doesn't verify if required credentials are set before creating the configuration, which could lead to authentication failures.

```diff
 @classproperty
 def get_headless_config(cls) -> Configuration:
     cfg = Configuration("croud.yaml")
     if cfg._file_path.exists() and "CRATEDB_CLOUD_API_KEY" not in os.environ:
         return cfg
     tmp_file = NamedTemporaryFile()
     tmp_path = Path(tmp_file.name)
     config = Configuration("headless.yaml", tmp_path)

     # Get credentials from the environment.
     config.profile["key"] = os.environ.get("CRATEDB_CLOUD_API_KEY")
     config.profile["secret"] = os.environ.get("CRATEDB_CLOUD_API_SECRET")
     config.profile["organization-id"] = os.environ.get("CRATEDB_CLOUD_ORGANIZATION_ID")
     # config.profile["endpoint"] = os.environ.get("CRATEDB_CLOUD_ENDPOINT")  # noqa: ERA001

+    # Check if required credentials are available
+    if not config.profile["key"] or not config.profile["secret"]:
+        logger.warning("Missing required cloud credentials: CRATEDB_CLOUD_API_KEY and/or CRATEDB_CLOUD_API_SECRET")
+
     return config
```
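Two of the review comments above suggest retrying with exponential backoff. A self-contained sketch of that idea, with the job polling simulated; `poll_with_backoff` and `job_found` are hypothetical names, not part of the codebase:

```python
import time


def poll_with_backoff(check, max_retries=5, initial_delay=0.01):
    # Generic exponential backoff: wait initial_delay, then double
    # the delay after each unsuccessful attempt.
    for attempt in range(max_retries):
        if check():
            return True
        time.sleep(initial_delay * (2 ** attempt))
    return False


# Simulate a cloud job that only becomes visible on the third poll.
calls = {"n": 0}


def job_found():
    calls["n"] += 1
    return calls["n"] >= 3


print(poll_with_backoff(job_found))  # -> True
```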
🧹 Nitpick comments (24)
cratedb_toolkit/testing/testcontainers/util.py (2)
141-162: Robust error handling for Docker unavailability.

This new `DockerSkippingContainer` class elegantly converts Docker connection errors into pytest skips, making tests fail gracefully when Docker isn't available. The early initialization of `_container` prevents potential issues in the parent's destructor.

However, there's a TODO at line 155 to synchronize error detection with `PytestTestcontainerAdapter`. Consider creating a shared helper function for Docker error detection to avoid duplication and ensure consistent behavior.

```diff
+def is_docker_connection_error(ex: DockerException) -> bool:
+    """Check if the exception indicates Docker daemon is unavailable."""
+    return any(token in str(ex) for token in (
+        "Connection aborted",
+        "Error while fetching server API version"
+    ))

 class DockerSkippingContainer(DockerContainer):
     # ...
     except DockerException as ex:
-        if any(token in str(ex) for token in ("Connection aborted", "Error while fetching server API version")):
+        if is_docker_connection_error(ex):
             # ...

 class PytestTestcontainerAdapter:
     # ...
     except DockerException as ex:
-        if any(token in str(ex) for token in ("Connection aborted", "Error while fetching server API version")):
+        if is_docker_connection_error(ex):
             # ...
```
192-196: Consider consistent error messages between container classes.

The skip reason message is slightly different between `DockerSkippingContainer` ("Docker is not running") and `PytestTestcontainerAdapter` ("Docker daemon is not available"). Consider standardizing these messages for consistency.

```diff
-    reason="Skipping test because Docker daemon is not available", allow_module_level=True
+    reason="Skipping test because Docker is not running", allow_module_level=True
```

cratedb_toolkit/util/setting.py (1)

93-93: Improve readability of the guidance message

The guidance string might display raw Python list representations, making it harder for users to understand the available options.

Consider formatting the lists for better readability:

```diff
-guidance = f"Use one of the CLI argument {parameter_names} or environment variable {environment_variables}"
+formatted_params = ", ".join(parameter_names)
+formatted_envvars = ", ".join(environment_variables)
+guidance = f"Use one of the CLI arguments ({formatted_params}) or environment variables ({formatted_envvars})"
```

CHANGES.md (1)
13-17: Use consistent Markdown formatting for breaking changes section

The Breaking changes section is currently formatted as bold text instead of a proper markdown heading, which triggered a linter warning.

```diff
-**Breaking changes**
+### Breaking changes
```

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

13-13: Emphasis used instead of a heading (MD036, no-emphasis-as-heading)
doc/cluster/_address.md (1)
1-17: Clear documentation for cluster addressing options.

The documentation clearly explains the supported cluster addressing options through CLI and environment variables, along with their mutual exclusivity and precedence rules. This information is crucial for users to understand how to properly configure their connection to CrateDB clusters.

However, there are some minor formatting inconsistencies in the document. Consider adding a space after the colons in the definition lists (the `:CLI options:` line, and similarly the `:Environment variables:` line) to maintain consistent formatting.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~5-~5: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)

doc/cluster/python.md (2)

41-42: Consider improving formatting for definition lists.

Similar to the _address.md file, there's a formatting inconsistency with the definition list. Consider adding a space after the colon in `:Environment variables:`. This would improve readability and maintain consistent formatting across documentation files.
54-61: Consider adding more details on error handling in examples.

The code example demonstrates basic usage but doesn't show how to handle potential errors when connecting to the cluster or executing queries. Consider expanding the example to include error handling to give users a more complete picture of working with the API in production environments.

For example:

```python
from pprint import pprint

from cratedb_toolkit import ManagedCluster
from cratedb_toolkit.exception import ClusterOperationError

try:
    with ManagedCluster.from_env() as cluster:
        pprint(cluster.query("SELECT * from sys.summits LIMIT 2;"))
except ClusterOperationError as e:
    print(f"Cluster operation failed: {e}")
```

doc/util/shell.md (2)
1-8: Documentation formatting should follow standard markdown conventions.

The document starts with `(shell)=`, which appears to be a reference label syntax for a documentation system (likely Sphinx with MyST markdown), but it creates somewhat awkward formatting in the heading structure.

Make sure this label syntax is intended for your documentation system. If using Sphinx with MyST markdown, this is the correct format for cross-referencing, but consider adding a line break between the label and the heading for better readability:

```diff
 (shell)=
+
 # ctk shell
```

11-16: Consider adding version information for uv installation.

While recommending uv is good, it might be helpful to indicate minimum version requirements or installation instructions for uv itself, as it's still a relatively newer package manager.

```diff
 uv tool install --upgrade 'cratedb-toolkit'
+
+# If you don't have uv installed, you can install it via:
+curl -fsS https://astral.sh/uv/install.sh | bash
```

cratedb_toolkit/model.py (1)
63-71: Consider improving the URL scheme detection logic.

The `from_string` method makes assumptions about URL schemes that might not cover all cases.

The current implementation assumes any URL starting with "crate" is an SQLAlchemy URI, but this might lead to incorrect parsing in edge cases. Consider a more robust detection method:

```diff
 @classmethod
 def from_string(cls, url: str) -> "DatabaseAddress":
     """
     Factory method to create an instance from an SQLAlchemy database URL in string format.
     """
-    if url.startswith("crate"):
+    # Parse the URL to determine the scheme
+    parsed_url = URL(url)
+    if parsed_url.scheme == "crate":
         return cls.from_sqlalchemy_uri(url)
     else:
         return cls.from_http_uri(url)
```

doc/cluster/backlog.md (2)
58-65: Formatting issue in code block example.

There's a formatting issue where the code block example and the next item don't have proper spacing.

```python
# The context manager stops the cluster automatically on exit.
with ManagedCluster.from_env().start() as cluster:
    # Do something with the cluster
    results = cluster.query("SELECT * FROM sys.nodes")
    # Cluster is automatically suspended when exiting the block
```

- `ctk cluster start`: Avoid creating a project on each deploy operation

87-90: Comma before 'because' clause.

There's a comma before the 'because' clause which is not needed if the clause is essential to the meaning.

```diff
-Shell: Currently it is not possible to create multiple instances of a `ManagedCluster`,
-because parameters like `username` and `password` are obtained from the environment,
+Shell: Currently it is not possible to create multiple instances of a `ManagedCluster`
+because parameters like `username` and `password` are obtained from the environment,
```

🧰 Tools
🪛 LanguageTool

[formatting] ~87-~87: If the 'because' clause is essential to the meaning, do not use a comma before the clause. (COMMA_BEFORE_BECAUSE)
cratedb_toolkit/io/croud.py (1)
67-84: Workaround for upstream API flaw needs tracking

The `fix_job_info_table_name` method is a workaround for an upstream API flaw. The FIXME comment is good, but it would be better to link to a ticket tracking this issue for future removal of this workaround.

```diff
 """
 Adjust full-qualified table name by adding appropriate quotes.
 Fixes a minor flaw on the upstream API.

 Currently, the API returns `testdrive.pems-1`, but that can not be used at
 all, because it is not properly quoted.

 So, converge the table name into `"testdrive"."pems-1"` manually, for a
 full-qualified representation.

-FIXME: Remove after upstream has fixed the flaw.
+FIXME: Remove after upstream has fixed the flaw in https://github.com/crate/croud/issues/566
 """
```

cratedb_toolkit/cluster/cli.py (1)

115-116: Consider using Click's exit code mechanism instead of sys.exit()

Using `sys.exit()` directly in a Click command can bypass Click's error handling. Consider using Click's mechanisms for reporting errors and exit codes.

```diff
-sys.exit(0 if (cluster_info.ready and has_db_result) else 1)
+ctx.exit(0 if (cluster_info.ready and has_db_result) else 1)
```

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 115-116: cratedb_toolkit/cluster/cli.py#L115-L116
Added lines #L115 - L116 were not covered by tests

cratedb_toolkit/cluster/croud.py (2)
391-421: Improve local vs. remote file detection

The current check for determining if a resource is local or remote uses a combination of checking for "://" in the URL and checking if the file exists. This might not be reliable in all cases, especially with relative paths.

```diff
 # Compute command-line arguments for invoking `croud`.
 # TODO: Sanitize table name - which characters are allowed?
-is_remote = "://" in resource.url
-if not is_remote and Path(resource.url).exists():
+# More robust check for remote URLs
+def is_remote_url(url: str) -> bool:
+    """Check if a URL is remote by looking for standard URL schemes."""
+    remote_schemes = ("http://", "https://", "s3://", "gs://", "azure://", "ftp://")
+    return any(url.startswith(scheme) for scheme in remote_schemes)
+
+is_remote = is_remote_url(resource.url)
+if not is_remote:
+    # For local paths, verify they exist
+    path = Path(resource.url)
+    if not path.exists():
+        raise ValueError(f"Local file not found: {resource.url}")
```

324-336: Error message might be misleading on suspension

The error message in `set_suspended` always says "Resuming cluster failed" even when the operation is to suspend a cluster.

```diff
 def set_suspended(self, body: t.Dict[str, t.Any]):
     data, errors = self.client.put(self.url.suspend, body=body)
     if data is None:
         if not errors:
             errors = "Unknown error"
-        raise CroudException(f"Resuming cluster failed: {errors}")
+        operation = "Resuming" if body.get("suspended") is False else "Suspending"
+        raise CroudException(f"{operation} cluster failed: {errors}")
     _wait_for_completed_operation(
         client=self.client,
         cluster_id=self.cluster_id,
         request_params={"type": "SUSPEND", "limit": 1},
     )
     return data
```

cratedb_toolkit/cluster/core.py (8)
195-200: Clarify error handling with TODO comment

There's a TODO comment about improving error handling with `flexfun`, but it's not clear what needs to be improved. Either implement the improvement or add more details to the TODO:

```diff
-# TODO: With `flexfun`, can this section be improved?
+# TODO: Consider refactoring this error handling section to fully utilize the flexfun decorator's
+# capabilities for more consistent error behavior across the codebase.
```

366-367: Implement TODO for better job reporting

The TODO comment indicates a need to explicitly report about `failed_records` and other potential issues during data loading. Consider implementing a more detailed reporting structure for job status, especially for failed records:

```diff
 logger.info("Job information:\n%s", json.dumps(cloud_job.info, indent=2))
-# TODO: Explicitly report about `failed_records`, etc.
+# Report failed records if any
+if cloud_job.info.get('failed_records', 0) > 0:
+    logger.warning(f"Job completed with {cloud_job.info.get('failed_records')} failed records")
+    if cloud_job.info.get('failed_records_sample'):
+        logger.warning(f"Sample of failed records: {cloud_job.info.get('failed_records_sample')}")
```

202-204: Implement the delete method

The `delete` method is stubbed out with `NotImplementedError` but should be implemented for completeness of the cluster lifecycle management. It would involve:

- Validating that a cluster ID is provided
- Calling the appropriate Cloud API method for cluster deletion
- Handling errors and updating the cluster state
- Logging the operation

267-270: Improve error handling for failed deployments

The TODO comments indicate a need for better error reporting when deployments fail. Consider implementing more detailed error reporting:

```diff
 if not self.exists:
-    # TODO: Gather and propagate more information why the deployment failed.
-    #       Capture deployment logs or status if available.
-    #       Possibly use details from `last_async_operation`.
-    raise CroudException(f"Deployment of cluster failed: {self.cluster_name}")
+    # Extract detailed error information from deployment status
+    error_details = {}
+    if hasattr(self.operation, "get_status"):
+        status = self.operation.get_status()
+        error_details["status"] = status
+        if "last_async_operation" in status:
+            error_details["operation"] = status["last_async_operation"]
+
+    # Format a detailed error message
+    error_message = f"Deployment of cluster failed: {self.cluster_name}"
+    if error_details:
+        error_message += f"\nDetails: {json.dumps(error_details, indent=2)}"
+
+    logger.error(error_message)
+    raise CroudException(error_message)
```

405-422: Singleton pattern implementation may cause issues with multiple clusters

The current implementation of `get_client_bundle` creates a singleton client bundle, which could cause issues if different credentials are needed for different operations on the same cluster. Consider adding a parameter to force creating a new client bundle with different credentials:

```diff
-def get_client_bundle(self, username: str = None, password: str = None) -> ClientBundle:
+def get_client_bundle(self, username: str = None, password: str = None, force_new: bool = False) -> ClientBundle:
     """
     Return a bundle of client handles to the CrateDB Cloud cluster database.

     - adapter: A high-level `DatabaseAdapter` instance, offering a few convenience methods.
     - dbapi: A DBAPI connection object, as provided by SQLAlchemy's `dbapi_connection`.
     - sqlalchemy: An SQLAlchemy `Engine` object.
     """
     # Store the client bundle, effectively making it a singleton.
-    if self._client_bundle is not None:
+    if self._client_bundle is not None and not force_new:
         return self._client_bundle
```

428-430: Address SQL query TODO

The TODO comment suggests using the `records` package for invoking SQL statements. Either implement the suggestion or clarify the TODO:

```diff
-# TODO: Use the `records` package for invoking SQL statements.
+# TODO: Consider replacing this with the `records` package for
+# invoking SQL statements, which provides more robust result handling and formatting.
+# See: https://github.com/kennethreitz/records
```

131-136: Check for empty strings in identifiers

The current validation only checks if identifiers are `None` but doesn't check for empty strings, which could lead to errors. Add validation for empty strings:

```diff
 # Default settings and sanity checks.
 self.cluster_id = self.cluster_id or self.settings.cluster_id or None
 self.cluster_name = self.cluster_name or self.settings.cluster_name or None
-if self.cluster_id is None and self.cluster_name is None:
+if not self.cluster_id and not self.cluster_name:
     raise DatabaseAddressMissingError(
         "Failed to address cluster: Either cluster identifier or name needs to be specified"
     )
```

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 131-134: cratedb_toolkit/cluster/core.py#L131-L134
Added lines #L131 - L134 were not covered by tests

374-377: Add reason to exception message

The TODO comment indicates a need to add a "reason" to the exception message. Implement the suggested improvement:

```diff
-# TODO: Add "reason" to exception message.
-message = f"Data loading failed.\n{cloud_job.message}"
+# Extract failure reason from cloud_job
+failure_reason = cloud_job.info.get('failure_reason', 'Unknown reason')
+message = f"Data loading failed. Reason: {failure_reason}\n{cloud_job.message}"
 logger.error(f"{message}\n{texts.error()}\n")
 raise OperationFailed(message)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (107)
- .github/workflows/main.yml (1 hunks)
- .github/workflows/mongodb.yml (1 hunks)
- CHANGES.md (2 hunks)
- cratedb_toolkit/__init__.py (2 hunks)
- cratedb_toolkit/adapter/rockset/cli.py (2 hunks)
- cratedb_toolkit/adapter/rockset/server/dependencies.py (1 hunks)
- cratedb_toolkit/api/cli.py (0 hunks)
- cratedb_toolkit/api/guide.py (0 hunks)
- cratedb_toolkit/api/main.py (0 hunks)
- cratedb_toolkit/cfr/cli.py (9 hunks)
- cratedb_toolkit/cfr/marimo.py (1 hunks)
- cratedb_toolkit/cli.py (2 hunks)
- cratedb_toolkit/cluster/cli.py (2 hunks)
- cratedb_toolkit/cluster/core.py (1 hunks)
- cratedb_toolkit/cluster/croud.py (1 hunks)
- cratedb_toolkit/cluster/guide.py (1 hunks)
- cratedb_toolkit/cluster/model.py (1 hunks)
- cratedb_toolkit/cluster/util.py (0 hunks)
- cratedb_toolkit/cmd/tail/cli.py (3 hunks)
- cratedb_toolkit/config.py (1 hunks)
- cratedb_toolkit/exception.py (2 hunks)
- cratedb_toolkit/info/cli.py (6 hunks)
- cratedb_toolkit/info/core.py (2 hunks)
- cratedb_toolkit/info/http.py (1 hunks)
- cratedb_toolkit/io/cli.py (4 hunks)
- cratedb_toolkit/io/croud.py (4 hunks)
- cratedb_toolkit/io/dynamodb/api.py (1 hunks)
- cratedb_toolkit/io/influxdb.py (1 hunks)
- cratedb_toolkit/io/mongodb/adapter.py (3 hunks)
- cratedb_toolkit/io/mongodb/api.py (4 hunks)
- cratedb_toolkit/io/processor/kinesis_lambda.py (2 hunks)
- cratedb_toolkit/job/cli.py (0 hunks)
- cratedb_toolkit/job/croud.py (0 hunks)
- cratedb_toolkit/model.py (4 hunks)
- cratedb_toolkit/option.py (1 hunks)
- cratedb_toolkit/options.py (0 hunks)
- cratedb_toolkit/settings/compare.py (5 hunks)
- cratedb_toolkit/shell/cli.py (3 hunks)
- cratedb_toolkit/testing/testcontainers/cratedb.py (5 hunks)
- cratedb_toolkit/testing/testcontainers/influxdb2.py (2 hunks)
- cratedb_toolkit/testing/testcontainers/minio.py (2 hunks)
- cratedb_toolkit/testing/testcontainers/mongodb.py (2 hunks)
- cratedb_toolkit/testing/testcontainers/util.py (4 hunks)
- cratedb_toolkit/util/app.py (1 hunks)
- cratedb_toolkit/util/client.py (1 hunks)
- cratedb_toolkit/util/common.py (2 hunks)
- cratedb_toolkit/util/crash.py (2 hunks)
- cratedb_toolkit/util/croud.py (8 hunks)
- cratedb_toolkit/util/database.py (6 hunks)
- cratedb_toolkit/util/runtime.py (1 hunks)
- cratedb_toolkit/util/setting.py (1 hunks)
- doc/adapter/rockset.md (1 hunks)
- doc/backlog/main.md (2 hunks)
- doc/bugs.md (1 hunks)
- doc/cfr/jobstats.md (1 hunks)
- doc/cfr/systable.md (3 hunks)
- doc/cluster/_address.md (1 hunks)
- doc/cluster/backlog.md (1 hunks)
- doc/cluster/cli.md (1 hunks)
- doc/cluster/index.md (1 hunks)
- doc/cluster/python.md (1 hunks)
- doc/cluster/tutorial.md (1 hunks)
- doc/conf.py (1 hunks)
- doc/index.md (1 hunks)
- doc/info/index.md (1 hunks)
- doc/io/dms/standalone.md (1 hunks)
- doc/io/dynamodb/cdc-lambda.md (4 hunks)
- doc/io/dynamodb/cdc.md (2 hunks)
- doc/io/dynamodb/loader.md (2 hunks)
- doc/io/index.md (2 hunks)
- doc/io/influxdb/loader.md (3 hunks)
- doc/io/mongodb/cdc.md (3 hunks)
- doc/io/mongodb/loader.md (4 hunks)
- doc/io/mongodb/migr8.md (1 hunks)
- doc/sandbox.md (2 hunks)
- doc/util/index.md (1 hunks)
- doc/util/settings.md (1 hunks)
- doc/util/shell.md (1 hunks)
- examples/aws/dynamodb_kinesis_lambda_oci_cratedb.py (1 hunks)
- examples/cloud_import.py (0 hunks)
- examples/notebook/cloud_import.ipynb (1 hunks)
- examples/python/cloud_cluster.py (1 hunks)
- examples/python/cloud_import.py (1 hunks)
- examples/shell/cloud_cluster.sh (1 hunks)
- examples/shell/cloud_import.sh (1 hunks)
- pyproject.toml (8 hunks)
- tests/adapter/test_rockset.py (3 hunks)
- tests/api/test_util.py (1 hunks)
- tests/cfr/test_info.py (1 hunks)
- tests/cfr/test_jobstats.py (3 hunks)
- tests/cfr/test_systable.py (6 hunks)
- tests/cluster/test_cli.py (1 hunks)
- tests/cluster/test_core.py (1 hunks)
- tests/cluster/test_guide.py (1 hunks)
- tests/cluster/test_model.py (1 hunks)
- tests/cmd/test_tail.py (1 hunks)
- tests/conftest.py (3 hunks)
- tests/examples/test_python.py (1 hunks)
- tests/examples/test_shell.py (1 hunks)
- tests/examples/test_shell.sh (1 hunks)
- tests/info/test_cli.py (3 hunks)
- tests/info/test_http.py (1 hunks)
- tests/io/dynamodb/__init__.py (0 hunks)
- tests/io/dynamodb/conftest.py (1 hunks)
- tests/io/dynamodb/test_cli.py (1 hunks)
- tests/io/influxdb/conftest.py (4 hunks)
- tests/io/influxdb/test_cli.py (3 hunks)
⛔ Files not processed due to max files limit (12)
- tests/io/mongodb/conftest.py
- tests/io/mongodb/test_cli.py
- tests/io/test_import.py
- tests/io/test_processor.py
- tests/retention/test_cli.py
- tests/retention/test_examples.py
- tests/settings/test_cli.py
- tests/shell/test_cli.py
- tests/test_model.py
- tests/test_shell.py
- tests/util/database.py
- tests/util/shunit2
💤 Files with no reviewable changes (9)
- tests/io/dynamodb/__init__.py
- cratedb_toolkit/api/cli.py
- cratedb_toolkit/api/guide.py
- cratedb_toolkit/job/croud.py
- cratedb_toolkit/cluster/util.py
- cratedb_toolkit/options.py
- examples/cloud_import.py
- cratedb_toolkit/api/main.py
- cratedb_toolkit/job/cli.py
✅ Files skipped from review due to trivial changes (9)
- doc/io/mongodb/migr8.md
- tests/io/dynamodb/conftest.py
- .github/workflows/mongodb.yml
- doc/cfr/systable.md
- cratedb_toolkit/util/client.py
- doc/cluster/tutorial.md
- examples/shell/cloud_import.sh
- examples/shell/cloud_cluster.sh
- tests/cluster/test_guide.py
🚧 Files skipped from review as they are similar to previous changes (66)
- doc/util/index.md
- cratedb_toolkit/io/dynamodb/api.py
- doc/cfr/jobstats.md
- doc/io/dms/standalone.md
- cratedb_toolkit/io/influxdb.py
- doc/index.md
- tests/io/influxdb/test_cli.py
- doc/adapter/rockset.md
- tests/io/dynamodb/test_cli.py
- doc/util/settings.md
- doc/io/dynamodb/cdc-lambda.md
- tests/cluster/test_model.py
- cratedb_toolkit/util/common.py
- doc/conf.py
- tests/info/test_http.py
- tests/adapter/test_rockset.py
- cratedb_toolkit/io/mongodb/api.py
- tests/cfr/test_systable.py
- doc/io/dynamodb/loader.md
- doc/bugs.md
- doc/io/influxdb/loader.md
- tests/cmd/test_tail.py
- tests/cfr/test_jobstats.py
- tests/cfr/test_info.py
- cratedb_toolkit/info/http.py
- doc/info/index.md
- cratedb_toolkit/testing/testcontainers/mongodb.py
- doc/io/mongodb/loader.md
- tests/examples/test_shell.py
- doc/cluster/index.md
- cratedb_toolkit/testing/testcontainers/minio.py
- cratedb_toolkit/settings/compare.py
- cratedb_toolkit/cmd/tail/cli.py
- cratedb_toolkit/io/processor/kinesis_lambda.py
- cratedb_toolkit/testing/testcontainers/influxdb2.py
- tests/info/test_cli.py
- tests/io/influxdb/conftest.py
- examples/python/cloud_cluster.py
- cratedb_toolkit/util/runtime.py
- doc/io/mongodb/cdc.md
- cratedb_toolkit/util/crash.py
- tests/examples/test_shell.sh
- tests/api/test_util.py
- examples/notebook/cloud_import.ipynb
- .github/workflows/main.yml
- cratedb_toolkit/cluster/guide.py
- doc/cluster/cli.md
- cratedb_toolkit/info/core.py
- cratedb_toolkit/option.py
- doc/io/dynamodb/cdc.md
- tests/examples/test_python.py
- examples/aws/dynamodb_kinesis_lambda_oci_cratedb.py
- cratedb_toolkit/shell/cli.py
- cratedb_toolkit/io/mongodb/adapter.py
- tests/cluster/test_core.py
- pyproject.toml
- cratedb_toolkit/exception.py
- cratedb_toolkit/info/cli.py
- cratedb_toolkit/util/app.py
- examples/python/cloud_import.py
- doc/sandbox.md
- cratedb_toolkit/testing/testcontainers/cratedb.py
- cratedb_toolkit/util/database.py
- tests/cluster/test_cli.py
- cratedb_toolkit/config.py
- cratedb_toolkit/cluster/model.py
🧰 Additional context used
🧠 Learnings (4)
tests/conftest.py (2)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: tests/conftest.py:55-62
Timestamp: 2025-04-26T17:45:20.650Z
Learning: In the CrateDB toolkit, the container start operation is idempotent, meaning it can be safely called multiple times without causing issues. This allows for flexible container lifecycle management where both `setup()` may call `container.start()` and parent classes like `PytestTestcontainerAdapter` may also invoke `start()` methods.
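The idempotent-start contract described in this learning can be sketched as follows. This is an illustrative skeleton only, with hypothetical class names, not the toolkit's actual `PytestTestcontainerAdapter`:

```python
class ContainerAdapter:
    """Minimal idempotent container lifecycle: start() may be called
    repeatedly; only the first call performs the actual startup."""

    def __init__(self):
        self.started = False
        self.start_count = 0  # bookkeeping to demonstrate idempotency

    def start(self):
        if self.started:
            return
        self.started = True
        self.start_count += 1

    def setup(self):
        # A fixture's setup() may start the container itself ...
        self.start()

    def run_setup(self):
        # ... and the parent adapter may call start() again, harmlessly.
        self.setup()
        self.start()

adapter = ContainerAdapter()
adapter.run_setup()
assert adapter.start_count == 1  # started exactly once despite two calls
```

Because `start()` guards on its own state, both the fixture and its parent class can trigger startup without coordinating with each other.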
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: tests/conftest.py:168-170
Timestamp: 2025-05-02T15:26:10.385Z
Learning: In the `cloud_environment` fixture in tests/conftest.py, environment variables like TEST_CRATEDB_CLOUD_API_KEY, TEST_CRATEDB_CLOUD_API_SECRET, TEST_CRATEDB_CLOUD_ORGANIZATION_ID, and TEST_CRATEDB_CLUSTER_NAME are partially mutually exclusive, meaning not all of them need to be present simultaneously for tests to work.
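The "partially mutually exclusive" idea — accept several alternative ways to address a cluster, but reject conflicting combinations — could be validated roughly like this. The helper name and validation policy are illustrative, not the toolkit's actual code; the environment variable names are the ones this review discusses:

```python
import os

def resolve_cluster_address(env=os.environ):
    """Pick exactly one cluster addressing method from the environment.

    Accepts one of CRATEDB_CLUSTER_ID, CRATEDB_CLUSTER_NAME, or
    CRATEDB_CLUSTER_URL; rejects conflicting combinations.
    """
    candidates = {
        name: env.get(name)
        for name in ("CRATEDB_CLUSTER_ID", "CRATEDB_CLUSTER_NAME", "CRATEDB_CLUSTER_URL")
    }
    # Collect the settings that are actually present, then validate once.
    given = {name: value for name, value in candidates.items() if value}
    if len(given) > 1:
        raise ValueError(f"Mutually exclusive settings given: {sorted(given)}")
    if not given:
        raise ValueError("One cluster address setting is required")
    return next(iter(given.items()))

name, value = resolve_cluster_address({"CRATEDB_CLUSTER_NAME": "testdrive"})
assert (name, value) == ("CRATEDB_CLUSTER_NAME", "testdrive")
```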
cratedb_toolkit/cluster/cli.py (1)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: cratedb_toolkit/cluster/cli.py:82-89
Timestamp: 2025-04-27T03:07:04.375Z
Learning: In the cratedb-toolkit codebase, the `handle_command_errors` context manager is designed as an error handler for specific operations (not as a general protective wrapper). It's intentionally used to handle errors for targeted operations rather than wrapping entire function executions.
cratedb_toolkit/cluster/core.py (1)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: cratedb_toolkit/api/main.py:263-265
Timestamp: 2025-04-26T21:59:59.463Z
Learning: When handling DNS propagation delays after resource creation (like CrateDB Cloud clusters), both fixed sleeps and immediate polling have drawbacks. Fixed sleeps are brittle, but immediate polling risks negative DNS caching. A hybrid approach with a short initial sleep followed by polling with exponential backoff is preferred.
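The hybrid approach described in this learning — a short fixed sleep followed by polling with exponential backoff — can be sketched as follows; the `probe` callback and timing constants are illustrative, not taken from the toolkit:

```python
import time

def wait_until_reachable(probe, initial_sleep=2.0, max_attempts=6, base_delay=1.0):
    """Hybrid wait strategy: a short fixed sleep first (avoids negative DNS
    caching), then poll with exponential backoff until probe() succeeds."""
    time.sleep(initial_sleep)
    for attempt in range(max_attempts):
        if probe():
            return True
        # Exponential backoff: base_delay * 1, 2, 4, 8, ...
        time.sleep(base_delay * (2 ** attempt))
    return False

# Example: a probe that starts succeeding on the third call,
# simulating DNS that propagates after a delay.
calls = {"n": 0}
def probe():
    calls["n"] += 1
    return calls["n"] >= 3

assert wait_until_reachable(probe, initial_sleep=0.0, base_delay=0.0) is True
```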
cratedb_toolkit/cfr/marimo.py (1)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: cratedb_toolkit/cfr/marimo.py:53-54
Timestamp: 2025-05-02T11:43:42.639Z
Learning: The submodules within `cratedb_toolkit/io` employ a different naming scheme than other parts of the codebase and retain their own conventions (e.g., may continue using `cratedb_sqlalchemy_url` instead of `sqlalchemy_url`).
🧬 Code Graph Analysis (6)
cratedb_toolkit/util/setting.py (1)
- cratedb_toolkit/config.py (2): RUNNING_ON_PYTEST (26-27), RUNNING_ON_JUPYTER (22-23)

cratedb_toolkit/model.py (3)
- cratedb_toolkit/exception.py (2): DatabaseAddressDuplicateError (35-43), DatabaseAddressMissingError (16-32)
- cratedb_toolkit/util/data.py (1): asbool (25-36)
- cratedb_toolkit/cluster/core.py (1): from_params (599-603)

cratedb_toolkit/testing/testcontainers/util.py (4)
- tests/conftest.py (1): setup (56-62)
- tests/io/mongodb/conftest.py (1): setup (38-48)
- tests/io/influxdb/conftest.py (1): setup (28-34)
- cratedb_toolkit/testing/testcontainers/cratedb.py (2): start (176-182), stop (184-189)

cratedb_toolkit/adapter/rockset/server/dependencies.py (2)
- cratedb_toolkit/util/database.py (1): DatabaseAdapter (43-421)
- cratedb_toolkit/model.py (1): dburi (101-105)

cratedb_toolkit/cfr/cli.py (4)
- cratedb_toolkit/util/app.py (1): make_cli (11-52)
- cratedb_toolkit/util/data.py (1): path_from_url (51-54)
- cratedb_toolkit/cfr/systable.py (2): SystemTableExporter (116-190), SystemTableImporter (193-257)
- cratedb_toolkit/model.py (4): dburi (101-105), DatabaseAddress (55-176), from_string (63-70), from_string (198-199)

cratedb_toolkit/cli.py (1)
- cratedb_toolkit/util/setting.py (1): init_dotenv (30-37)
🪛 GitHub Check: codecov/patch
cratedb_toolkit/cluster/cli.py
[warning] 80-82: cratedb_toolkit/cluster/cli.py#L80-L82
Added lines #L80 - L82 were not covered by tests
[warning] 93-95: cratedb_toolkit/cluster/cli.py#L93-L95
Added lines #L93 - L95 were not covered by tests
[warning] 106-110: cratedb_toolkit/cluster/cli.py#L106-L110
Added lines #L106 - L110 were not covered by tests
[warning] 115-116: cratedb_toolkit/cluster/cli.py#L115-L116
Added lines #L115 - L116 were not covered by tests
[warning] 128-128: cratedb_toolkit/cluster/cli.py#L128
Added line #L128 was not covered by tests
[warning] 130-131: cratedb_toolkit/cluster/cli.py#L130-L131
Added lines #L130 - L131 were not covered by tests
[warning] 134-134: cratedb_toolkit/cluster/cli.py#L134
Added line #L134 was not covered by tests
[warning] 146-146: cratedb_toolkit/cluster/cli.py#L146
Added line #L146 was not covered by tests
[warning] 148-149: cratedb_toolkit/cluster/cli.py#L148-L149
Added lines #L148 - L149 were not covered by tests
[warning] 152-152: cratedb_toolkit/cluster/cli.py#L152
Added line #L152 was not covered by tests
[warning] 172-172: cratedb_toolkit/cluster/cli.py#L172
Added line #L172 was not covered by tests
[warning] 174-177: cratedb_toolkit/cluster/cli.py#L174-L177
Added lines #L174 - L177 were not covered by tests
cratedb_toolkit/cluster/core.py
[warning] 99-101: cratedb_toolkit/cluster/core.py#L99-L101
Added lines #L99 - L101 were not covered by tests
[warning] 121-128: cratedb_toolkit/cluster/core.py#L121-L128
Added lines #L121 - L128 were not covered by tests
[warning] 131-134: cratedb_toolkit/cluster/core.py#L131-L134
Added lines #L131 - L134 were not covered by tests
[warning] 138-140: cratedb_toolkit/cluster/core.py#L138-L140
Added lines #L138 - L140 were not covered by tests
[warning] 144-144: cratedb_toolkit/cluster/core.py#L144
Added line #L144 was not covered by tests
[warning] 148-148: cratedb_toolkit/cluster/core.py#L148
Added line #L148 was not covered by tests
cratedb_toolkit/__init__.py
[warning] 14-14: cratedb_toolkit/__init__.py#L14
Added line #L14 was not covered by tests
[warning] 16-16: cratedb_toolkit/__init__.py#L16
Added line #L16 was not covered by tests
[warning] 18-20: cratedb_toolkit/__init__.py#L18-L20
Added lines #L18 - L20 were not covered by tests
[warning] 22-22: cratedb_toolkit/__init__.py#L22
Added line #L22 was not covered by tests
cratedb_toolkit/adapter/rockset/cli.py
[warning] 38-38: cratedb_toolkit/adapter/rockset/cli.py#L38
Added line #L38 was not covered by tests
[warning] 41-41: cratedb_toolkit/adapter/rockset/cli.py#L41
Added line #L41 was not covered by tests
cratedb_toolkit/adapter/rockset/server/dependencies.py
[warning] 12-13: cratedb_toolkit/adapter/rockset/server/dependencies.py#L12-L13
Added lines #L12 - L13 were not covered by tests
cratedb_toolkit/cfr/cli.py
[warning] 32-32: cratedb_toolkit/cfr/cli.py#L32
Added line #L32 was not covered by tests
[warning] 35-35: cratedb_toolkit/cfr/cli.py#L35
Added line #L35 was not covered by tests
[warning] 61-61: cratedb_toolkit/cfr/cli.py#L61
Added line #L61 was not covered by tests
[warning] 63-63: cratedb_toolkit/cfr/cli.py#L63
Added line #L63 was not covered by tests
[warning] 112-112: cratedb_toolkit/cfr/cli.py#L112
Added line #L112 was not covered by tests
[warning] 141-141: cratedb_toolkit/cfr/cli.py#L141
Added line #L141 was not covered by tests
[warning] 164-165: cratedb_toolkit/cfr/cli.py#L164-L165
Added lines #L164 - L165 were not covered by tests
[warning] 181-182: cratedb_toolkit/cfr/cli.py#L181-L182
Added lines #L181 - L182 were not covered by tests
[warning] 207-207: cratedb_toolkit/cfr/cli.py#L207
Added line #L207 was not covered by tests
cratedb_toolkit/cfr/marimo.py
[warning] 53-54: cratedb_toolkit/cfr/marimo.py#L53-L54
Added lines #L53 - L54 were not covered by tests
cratedb_toolkit/cli.py
[warning] 25-25: cratedb_toolkit/cli.py#L25
Added line #L25 was not covered by tests
🪛 markdownlint-cli2 (0.17.2)
CHANGES.md
13-13: Emphasis used instead of a heading
null
(MD036, no-emphasis-as-heading)
🪛 LanguageTool
doc/backlog/main.md
[uncategorized] ~8-~8: Loose punctuation mark.
Context: "... methods. - `cratedb_toolkit/cfr/cli.py`: Migrate to `ctx.meta["address"]`, like ..."
(UNLIKELY_OPENING_PUNCTUATION)

[style] ~10-~10: This expression is wordy and overused. Consider replacing it with a more concise and formal alternative.
Context: "...is will probably cause a disk spillover sooner or later on this instance. - When running the te..."
(SOONER_OR_LATER)

doc/cluster/_address.md
[uncategorized] ~5-~5: Loose punctuation mark.
Context: "... different situations, managed or not. :CLI options: `--cluster-id`, `--cluste..."
(UNLIKELY_OPENING_PUNCTUATION)

doc/cluster/backlog.md
[uncategorized] ~65-~65: Loose punctuation mark.
Context: "...ng the block ``` - `ctk cluster start`: Avoid creating a project on each deploy..."
(UNLIKELY_OPENING_PUNCTUATION)

[formatting] ~87-~87: If the 'because' clause is essential to the meaning, do not use a comma before the clause.
Context: "...multiple instances of a `ManagedCluster`, because parameters like `username` and `passwor..."
(COMMA_BEFORE_BECAUSE)

doc/cluster/python.md
[uncategorized] ~40-~40: Loose punctuation mark.
Context: "...r parameters or environment variables. :Environment variables: `CRATEDB_CLUSTE..."
(UNLIKELY_OPENING_PUNCTUATION)

doc/io/index.md
[uncategorized] ~1-~1: A punctuation mark might be missing here.
Context: "(io-subsystem)= # I/O Subsystem Load and extract data int..."
(AI_EN_LECTOR_MISSING_PUNCTUATION)

doc/util/shell.md
[uncategorized] ~1-~1: A punctuation mark might be missing here.
Context: "(shell)= # ctk shell The database shell interface ..."
(AI_EN_LECTOR_MISSING_PUNCTUATION)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Cloud: Python 3.13 on OS ubuntu-latest
🔇 Additional comments (52)
tests/conftest.py (5)

45-71: Well-designed testing helper class for container management.
This new `CrateDBFixture` class creates a clean abstraction for CrateDB test containers, properly leveraging the new `PytestTestcontainerAdapter` base class. The implementation correctly handles container lifecycle and database connection setup.

59-62: Explicit container start doesn't cause issues.
As I've learned from your previous feedback, the container start operation is idempotent, so the explicit `self.container.start()` call here is fine even though `PytestTestcontainerAdapter.run_setup()` will call `start()` again.

89-105: Clean environment isolation for tests.
This new fixture nicely complements `prune_environment` by specifically targeting variables defined in `ManagedClusterSettings`. The implementation correctly guards against `None` envvar names as fixed in a previous commit.

140-146: Good default pattern for test cluster name.
This fixture properly retrieves the cluster name from environment variables with a sensible default fallback.

148-174: Correctly implements CrateDB Cloud test environment setup.
The fixture appropriately patches environment variables and handles toolkit configuration. Based on your previous feedback, I understand that environment variables here are partially mutually exclusive, so the current check at lines 161-162 is appropriate for your use case.

cratedb_toolkit/testing/testcontainers/util.py (2)

48-54: Improved port handling for container addresses.
The enhanced `get_real_host_address()` method now properly handles port discovery from multiple attributes and normalizes port strings by stripping protocol suffixes. This makes the endpoint addresses more reliable for client usage.

164-206: Well-designed adapter for Pytest testcontainer integration.
The `PytestTestcontainerAdapter` class provides a clean abstraction for managing testcontainers in Pytest fixtures. It properly handles Docker availability checks and container lifecycle management. The design pattern requiring subclasses to implement `setup()` enables flexible container configuration.

cratedb_toolkit/util/setting.py (5)
14-20: Clean and well-structured dataclass implementation
The `Setting` dataclass provides a clean way to encapsulate a Click parameter along with an optional group identifier. Good use of type hints and a convenient `asdict()` helper method.

23-27: Good utility function for command-line argument detection
This helper function is concise and effectively detects long-form options in command-line arguments, which will help with determining whether to use CLI or environment variable sources.

30-37: Effective environment variable loading with appropriate conditional logic
The `init_dotenv()` function handles loading variables from the `.env` file with proper environment detection for pytest. The logging at info level is helpful for debugging configuration sources.

40-69: Robust configuration parser with multi-environment support
The `obtain_settings()` function effectively handles multiple input sources with appropriate environment detection for Jupyter and pytest. Good error propagation and context management with Click.

72-104: Well-implemented mutual exclusivity checker
The `check_mutual_exclusiveness()` function provides clear validation and helpful error messages. The approach of collecting names and values first before validation is clean and maintainable.

cratedb_toolkit/adapter/rockset/server/dependencies.py (1)
12-13: ❓ Verification inconclusive
Environment variable name standardized correctly
The code appropriately updates from `CRATEDB_SQLALCHEMY_URL` to `CRATEDB_CLUSTER_URL`, aligning with the PR's objective to unify cluster addressing across the codebase.
The static analysis hints indicate that this code lacks test coverage. Would you like me to help create a test for this function?

🏁 Script executed:

```shell
#!/bin/bash
# Check if there are tests for this specific module/function
rg "test.*database_adapter" --type py tests/adapter/rockset
```

Length of output: 123

🏁 Script executed:

```shell
#!/bin/bash
# Search the entire repository for any test definitions to determine if tests exist
rg "def test_" --type py -n
```

Length of output: 16979

Environment variable rename approved; add missing unit test
The switch from `CRATEDB_SQLALCHEMY_URL` → `CRATEDB_CLUSTER_URL` is correct and aligns with the PR's goal. However, I didn't find any unit tests for the function in `cratedb_toolkit/adapter/rockset/server/dependencies.py` (the one that reads `CRATEDB_CLUSTER_URL` and returns a `DatabaseAdapter`).

• cratedb_toolkit/adapter/rockset/server/dependencies.py (lines 12-13)

Could we add a test (e.g. in `tests/adapter/test_dependencies.py` or alongside your rockset tests) such as:

```python
def test_get_database_adapter_reads_cluster_url(monkeypatch):
    monkeypatch.setenv("CRATEDB_CLUSTER_URL", "http://example.com")
    from cratedb_toolkit.adapter.rockset.server.dependencies import get_database_adapter
    adapter = get_database_adapter()
    assert adapter.dburi == "http://example.com"
```

Let me know if you'd like me to draft that.

doc/io/index.md (3)
1-1: Added appropriate anchor for documentation cross-referencing
The added anchor `(io-subsystem)=` will improve navigation and cross-referencing within the documentation.

57-57: Environment variable name standardized correctly
The change from `CRATEDB_CLOUD_CLUSTER_ID` to `CRATEDB_CLUSTER_ID` aligns with the PR's objective to unify cluster addressing and configuration across the toolkit.

64-64: Updated to use CDN URL for example dataset
The example now uses a CDN URL (`cdn.crate.io`) instead of a GitHub raw URL, which should provide more reliable and potentially faster access to the example dataset.

cratedb_toolkit/cli.py (2)
16-16: LGTM: New import for .env support
This import of `init_dotenv` supports loading environment variables from .env files, which aligns with the PR objective to handle configuration through environment variables stored in dotenv files.

25-25: ✅ Verification successful
Ensure .env files are properly loaded before CLI execution
The addition of the `init_dotenv()` call ensures that environment variables from .env files are loaded before any CLI logic executes. This enhances configuration flexibility for users.

🏁 Script executed:

```shell
#!/bin/bash
# Check if init_dotenv includes proper error handling for missing .env files
rg -A 5 "init_dotenv" --type py cratedb_toolkit/util/setting.py
```

Length of output: 467

We need to inspect what `init_dotenv()` actually does to confirm that it loads environment variables. Let's pull in more of the implementation around `find_dotenv` and see if `load_dotenv()` is ever called or if any error handling exists.

🏁 Script executed:

```shell
#!/bin/bash
# Show the full init_dotenv function and related imports
rg -n "def init_dotenv" -A 20 cratedb_toolkit/util/setting.py
rg -n "load_dotenv" cratedb_toolkit/util/setting.py
rg -n "dotenv" cratedb_toolkit/util/setting.py
```

Length of output: 1398

Verified: `.env` loading is correctly implemented
- In `cratedb_toolkit/util/setting.py`, `init_dotenv()` calls `find_dotenv()`, logs the resolved path, and invokes `load_dotenv()`, ensuring any `.env` variables are loaded before CLI execution.
- Per python-dotenv's default behavior, missing `.env` files are silently ignored, which is acceptable for optional configuration.

CHANGES.md (3)
7-11: LGTM: Comprehensive Cloud API feature documentation
Clear documentation of the new Cloud API features, including support for headless/unattended operations and the JWT authentication additions.

16-17: LGTM: Clear documentation of breaking changes
Explicit documentation of the breaking changes around environment variable and CLI option naming consolidation, which helps users adapt to the new API.

40-40: LGTM: Update to new environment variable naming
Good documentation of the Query Collector's update to use the new consolidated environment variable.

cratedb_toolkit/cfr/cli.py (9)
15-15: LGTM: Updated import path
Updated import path for `make_cli` to use the new location in `util.app` instead of `api.cli`.

32-32: LGTM: Unified cluster addressing
These changes align with the PR's goal of consolidating separate URL options into a single `cluster_url` parameter.
Also applies to: 35-35

61-61: LGTM: Consistent naming in sys_import
The changes to use `cluster_url` in the `sys_import` command maintain consistency with the unified cluster addressing approach.
Also applies to: 63-63

77-77: LGTM: Updated example to use new environment variable
The help text now correctly references the consolidated `CRATEDB_CLUSTER_URL` environment variable.

112-112: LGTM: Standardized URL retrieval for job statistics collection
Updated to use the unified `cluster_url` from context metadata for database address retrieval.

141-141: LGTM: Consistent URL handling in job statistics view
Using the standardized `cluster_url` for database address information maintains consistency with other commands.

164-165: LGTM: Updated environment variable for job statistics reporting
Updated to use the standardized `cluster_url` for database addressing and setting the new consolidated environment variable.

181-182: LGTM: Consistent URL handling in job statistics UI
Using the standardized `cluster_url` for database addressing and setting the new consolidated environment variable maintains consistency.

207-207: LGTM: Updated info recorder to use standardized URL
The info record command now uses the unified `cluster_url` for consistency with other commands.

cratedb_toolkit/__init__.py (3)
1-1: LGTM: Added linter directive
Adding the `ruff: noqa: E402` directive suppresses linter warnings about import positioning, which is necessary due to the code structure.

18-21: LGTM: Exposing key cluster abstractions
Exposing the new cluster abstractions (`DatabaseCluster`, `ManagedCluster`, `StandaloneCluster`) and configuration functions at the package level provides a clean API for users.

22-29: LGTM: Well-defined public API
The explicit `__all__` list clearly defines the public API of the package, which is a good practice for maintainability.

cratedb_toolkit/adapter/rockset/cli.py (1)
9-9: Consistent CLI option refactoring for unified cluster addressing.
The code has been properly refactored to use the standard `option_cluster_url` imported from `cratedb_toolkit.option` instead of a local definition. The parameter name in the function signature and all references have been updated consistently from `cratedb_sqlalchemy_url` to `cluster_url`. The environment variable in the help text has also been updated from `CRATEDB_SQLALCHEMY_URL` to `CRATEDB_CLUSTER_URL`.
This change aligns with the unified cluster management approach described in the PR objectives.
Also applies to: 22-22, 29-29, 34-34, 38-38, 41-41
doc/backlog/main.md (2)
3-17: Well-prioritized backlog for future API improvements.
The new "Iteration +0" section outlines important next steps for improving the CLI API, particularly around cluster addressing. The proposal to consolidate `--cluster-url`, `--cluster-id`, and `--cluster-name` into a single, more intuitive `--cluster` option could significantly improve the user experience.
The backlog also identifies important operational issues such as table pruning on CrateDB Cloud and verbose logging during tests, which could affect system reliability and developer experience.

57-57: Consistent command and environment variable naming.
The backlog has been updated to use the new command name `ctk cluster list-jobs` (instead of `ctk list-jobs`) and the environment variable `CRATEDB_CLUSTER_ID` (instead of `CRATEDB_CLOUD_CLUSTER_ID`), maintaining consistency with the changes in the actual implementation.
Also applies to: 62-62
doc/cluster/python.md (1)
1-80: Comprehensive documentation for the Python cluster management API.
The documentation provides a clear and complete guide to using the `ManagedCluster` class, covering installation, authentication, configuration, and usage with practical examples. The explanation of context manager behavior and the `stop_on_exit` parameter is particularly valuable for users to understand resource management.
The documentation also follows good practices by including:
- Multiple authentication options (interactive and headless)
- Environment variable configuration
- Explicit notes about precedence and mutual exclusivity
- Code examples demonstrating real-world usage
- References to a full tutorial for more details
doc/util/shell.md (2)

34-46: Excellent authentication documentation for CrateDB Cloud.
This section clearly explains the two authentication methods for CrateDB Cloud: interactive login with identity providers and headless authentication using API keys. The examples are helpful and easy to follow.

56-76: Connection examples align well with the unified cluster addressing scheme.
The examples for connecting to a standalone CrateDB instance properly demonstrate the use of the new unified `--cluster-url` option and the corresponding environment variable `CRATEDB_CLUSTER_URL`, which aligns with the broader refactoring across the toolkit.

cratedb_toolkit/io/cli.py (4)
8-10
: Import changes reflect the new unified cluster addressing scheme.The change from importing multiple cluster-related classes to a single
DatabaseCluster
import simplifies the code and aligns with the refactoring to consolidate cluster addressing options.
30-32
: CLI options have been unified for cluster addressing.The replacement of previous separate options with new consolidated options (
option_cluster_id
,option_cluster_name
,option_cluster_url
) improves the consistency of the CLI interface.
60-61
: Renamed variable for clarity.Changing the variable name from
resource
tosource
improves readability by making it clearer that this represents the source data for the import operation.
64-69
: Simplified cluster creation using the new factory method.The code now uses the
DatabaseCluster.create
factory method which encapsulates the logic for determining the cluster type based on the provided parameters. This is a significant improvement in code simplicity and maintainability.cratedb_toolkit/model.py (3)
17-52
: Well-structured validation for cluster address options.The new
ClusterAddressOptions
dataclass provides a clean way to encapsulate and validate cluster address inputs. The validation logic ensures mutual exclusivity and handles edge cases properly.
90-99
: Good addition of credential injection method.The
with_credentials
method provides a clean way to add authentication credentials to a database address, supporting a fluent interface pattern.
114-116: Improved default handling for missing host. Setting a default host of "localhost" when none is provided improves the robustness of the `httpuri` property.

doc/cluster/backlog.md (1)
1-13: Well-structured backlog document with clear contribution guidelines. The introduction sets a positive tone for contributions and provides clear information about how to get involved with the project.

cratedb_toolkit/util/croud.py (2)
282-321: Well-implemented table quoting function. The `table_fqn` function correctly handles various table name scenarios, including schema-qualified names and already-quoted names. The docstring is thorough, with good examples and edge-case handling for empty tables.
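A quoting helper of the shape this comment describes might look roughly like the following (a simplified sketch under stated assumptions, not the toolkit's actual `table_fqn`):

```python
def quote_fqn(table: str) -> str:
    """Quote schema and table identifiers unless already quoted.

    Simplified sketch: assumes at most one schema separator and
    no dots inside quoted identifiers.
    """
    if not table:
        return table
    parts = table.split(".", 1)
    quoted = [p if p.startswith('"') and p.endswith('"') else f'"{p}"' for p in parts]
    return ".".join(quoted)

print(quote_fqn("testdrive.pems-1"))      # "testdrive"."pems-1"
print(quote_fqn('"testdrive"."pems-1"'))  # already quoted, unchanged
```

Quoting matters here because identifiers like `pems-1` contain characters that are invalid in unquoted SQL identifiers.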
47-62: Well-designed safedecode method with clean error handling. The `invoke_safedecode` method properly handles partial JSON data and unpacks the tuple returned by `JSONDecoder.raw_decode()`. The comments clearly explain why this is needed as a workaround for upstream issues.

cratedb_toolkit/cluster/croud.py (1)
24-62: Well-designed URL generators. The `CloudRootUrlGenerator` class is well designed, with clear properties for generating different API URLs. The `with_organization` method provides a fluent interface for setting the organization ID.

cratedb_toolkit/cluster/core.py (2)
426-435: 🛠️ Refactor suggestion: JWT token expiration not handled

The current implementation creates a JWT token context once in `probe()` and reuses it for every query. If the token expires (typically after 1 hour), subsequent queries will fail. Track token creation time and refresh it when needed:

def __init__(self, ...):
    # Existing code...
    self._jwt_ctx = nullcontext()
+   self._jwt_token_created_at = 0  # Timestamp when token was created

def probe(self):
    # Existing code...
    self._jwt_ctx = jwt_token_patch(self.info.jwt.token)
+   self._jwt_token_created_at = time.time()
    # Rest of the method...

def query(self, sql: str):
    # Ensure we have cluster connection details
    if not self.info or not self.info.cloud.get("url"):
        self.probe()
+
+   # Refresh token if it's older than 30 minutes (to be safe before the 1h expiry)
+   current_time = time.time()
+   if current_time - self._jwt_token_created_at > 1800:  # 30 minutes
+       logger.debug("Refreshing JWT token")
+       self._jwt_ctx = jwt_token_patch(self.info.jwt.token)
+       self._jwt_token_created_at = current_time

    with self._jwt_ctx:
        # Rest of the method...

Likely an incorrect or invalid review comment.
324-328: ⚠️ Potential issue: Add validation for empty `operation`

The code doesn't check if `self.operation` is `None` before calling `suspend()`, which could lead to an AttributeError. Add validation for empty `operation`:

if self.cluster_id is None:
    raise DatabaseAddressMissingError("Need cluster identifier to stop cluster")
logger.info(f"Stopping CrateDB Cloud Cluster: id={self.cluster_id}, name={self.cluster_name}")
-if self.operation:
-    self.operation.suspend()
+if not self.operation:
+    # If the operation is None, try to probe the cluster first to initialize it
+    logger.debug("Operation not initialized, probing cluster")
+    self.probe()
+    if not self.operation:
+        raise CroudException(f"Could not initialize operation for cluster: {self.cluster_id}")
+
+self.operation.suspend()
self.probe()
return self

Likely an incorrect or invalid review comment.
sqlalchemy_url = os.getenv("CRATEDB_CLUSTER_URL", "crate://?schema=stats")
engine = sa.create_engine(sqlalchemy_url)
💡 Verification agent
❓ Verification inconclusive
Environment variable and variable name standardized correctly
The change from `CRATEDB_SQLALCHEMY_URL` to `CRATEDB_CLUSTER_URL` and the updated local variable name align with the PR objective to unify cluster addressing across the codebase.
While the code changes look good, the static analysis tool indicates this code isn't covered by tests. Consider adding test coverage for this Marimo application.
🏁 Script executed:
#!/bin/bash
# Check if there are any tests for Marimo integration
rg "test.*marimo" --type py tests/
Length of output: 36
Add test coverage for Marimo integration
No tests currently cover `cratedb_toolkit/cfr/marimo.py` (lines 53–54). Please add unit or integration tests to exercise the new `CRATEDB_CLUSTER_URL` logic, for example:
• Create a new test file tests/test_marimo.py
• Mock `os.getenv` to verify both the default URL (`"crate://?schema=stats"`) and an overridden `CRATEDB_CLUSTER_URL`
• Mock `sa.create_engine` to assert it’s called with the expected connection string
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 53-54: cratedb_toolkit/cfr/marimo.py#L53-L54
Added lines #L53 - L54 were not covered by tests
🤖 Prompt for AI Agents (early access)
In cratedb_toolkit/cfr/marimo.py at lines 53-54, add test coverage for the new environment variable logic by creating a test file tests/test_marimo.py. In this test, mock os.getenv to simulate both the default value "crate://?schema=stats" and a custom CRATEDB_CLUSTER_URL, and mock sa.create_engine to verify it is called with the correct connection string in each case. This will ensure the environment variable handling and engine creation are properly tested.
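A minimal test in the suggested direction might look like this (a sketch using pytest's `monkeypatch` fixture; `resolve_url` is a hypothetical stand-in for the logic under review, not a function in the codebase):

```python
import os

def resolve_url(getenv=os.getenv) -> str:
    # Mirrors the reviewed logic: environment variable with a fallback default.
    return getenv("CRATEDB_CLUSTER_URL", "crate://?schema=stats")

def test_default_url(monkeypatch):
    monkeypatch.delenv("CRATEDB_CLUSTER_URL", raising=False)
    assert resolve_url() == "crate://?schema=stats"

def test_override_url(monkeypatch):
    monkeypatch.setenv("CRATEDB_CLUSTER_URL", "crate://user@testcluster.example.net")
    assert resolve_url() == "crate://user@testcluster.example.net"
```

Mocking `sa.create_engine` would follow the same pattern, asserting it receives the resolved connection string.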
export CRATEDB_CLUSTER_ID='e1e38d92-a650-48f1-8a70-8133f2d5c400'
```
💡 Verification agent
🧩 Analysis chain
Missing username and password environment variables
The example removed the previously included environment variables for `CRATEDB_USERNAME` and `CRATEDB_PASSWORD`. Users might need these credentials to authenticate with the cluster.
Is authentication now handled differently, or should these environment variables still be included in the example?
🏁 Script executed:
#!/bin/bash
# Look for references to authentication in the codebase
rg "CRATEDB_USERNAME|CRATEDB_PASSWORD" --type md doc/
Length of output: 127
Restore missing authentication environment variables
The `doc/io/index.md` example only sets `CRATEDB_CLUSTER_ID`, but users still need credentials to connect. Unless you’ve switched to a token-based or OIDC flow, please re-add the username/password vars around lines 57–58:
export CRATEDB_CLUSTER_ID='e1e38d92-a650-48f1-8a70-8133f2d5c400'
+export CRATEDB_USERNAME='admin'
+export CRATEDB_PASSWORD='<your-password>'
If authentication truly moved to another mechanism (e.g. API tokens or SSO), update the example and note the new steps here.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
export CRATEDB_CLUSTER_ID='e1e38d92-a650-48f1-8a70-8133f2d5c400'
export CRATEDB_USERNAME='admin'
export CRATEDB_PASSWORD='<your-password>'
🤖 Prompt for AI Agents (early access)
In doc/io/index.md around lines 57 to 58, the example sets only CRATEDB_CLUSTER_ID but omits CRATEDB_USERNAME and CRATEDB_PASSWORD environment variables needed for authentication. Restore these two variables in the example to ensure users have the necessary credentials to connect, unless authentication has changed to a different method. If authentication has changed, update the example and documentation to reflect the new process clearly.
if not cluster_url:
    logger.error("Unable to operate without database address")
    sys.exit(1)
-ctx.meta.update({"cratedb_sqlalchemy_url": cratedb_sqlalchemy_url})
+ctx.meta.update({"cluster_url": cluster_url})
💡 Verification agent
🧩 Analysis chain
Consider adding test coverage for the error path.
The error handling for a missing cluster URL is not covered by tests according to the static analysis hints. This is a critical validation path that ensures the CLI fails properly when required parameters are missing.
🏁 Script executed:
#!/bin/bash
# Check for existing test coverage of the error path
# Look for test files related to rockset cli
echo "Test files for rockset CLI:"
fd "test.*rockset.*cli" -t f
# Check if there are any tests that specifically test the error path
echo -e "\nTests for missing cluster_url validation:"
rg -A 3 -B 3 "missing.*cluster.*url|no.*cluster.*url|without.*cluster.*url" --glob="*test*" --glob="*.py"
Length of output: 1228
Add tests for missing cluster URL error path
No existing tests cover the scenario where `cluster_url` is not provided in the CLI. Please add unit tests to verify that the application:
- Logs "Unable to operate without database address"
- Exits with status code 1

at both locations:
- cratedb_toolkit/adapter/rockset/cli.py (lines 38–41)
- cratedb_toolkit/cmd/tail/cli.py

These tests should simulate invoking the CLI without a `cluster_url` and assert on the logged output and exit code.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 38-38: cratedb_toolkit/adapter/rockset/cli.py#L38
Added line #L38 was not covered by tests
[warning] 41-41: cratedb_toolkit/adapter/rockset/cli.py#L41
Added line #L41 was not covered by tests
🤖 Prompt for AI Agents (early access)
In cratedb_toolkit/adapter/rockset/cli.py around lines 38 to 41, add unit tests to cover the error path when cluster_url is missing. The tests should simulate running the CLI without providing cluster_url, then assert that the logger outputs "Unable to operate without database address" and that the process exits with status code 1. This ensures the critical validation path is properly tested and the CLI fails as expected.
def __post_init__(self):
    self.fix_job_info_table_name()

@classmethod
def unknown(cls, message: str):
    cj = cls()
    cj._custom_message = message
    cj._custom_status = "UNKNOWN"
    return cj
🛠️ Refactor suggestion
`__post_init__` could handle non-existent keys more robustly

The `fix_job_info_table_name` method is called in `__post_init__`, but there's no check if `self.info` has been initialized correctly before accessing its nested keys. For jobs created with `CloudJob.unknown()`, `info` will be an empty dict.
 def __post_init__(self):
-    self.fix_job_info_table_name()
+    if self.info:  # Only attempt to fix table name if info dictionary is populated
+        self.fix_job_info_table_name()
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
 def __post_init__(self):
-    self.fix_job_info_table_name()
+    if self.info:  # Only attempt to fix table name if info dictionary is populated
+        self.fix_job_info_table_name()

 @classmethod
 def unknown(cls, message: str):
     cj = cls()
     cj._custom_message = message
     cj._custom_status = "UNKNOWN"
     return cj
🤖 Prompt for AI Agents (early access)
In cratedb_toolkit/io/croud.py around lines 37 to 46, the __post_init__ method calls fix_job_info_table_name without verifying if self.info is properly initialized, which can cause errors when info is an empty dict as in CloudJob.unknown(). Modify __post_init__ to check if self.info exists and contains the expected keys before calling fix_job_info_table_name, or update fix_job_info_table_name to handle missing or empty info gracefully to avoid key errors.
@contextlib.contextmanager
def handle_command_errors(operation_name):
    """Handle common command errors and exit with appropriate error messages."""
    try:
        yield
    except CroudException as ex:
        logger.error(
            f"Failed to {operation_name}. "
            f"Please check if you are addressing the right cluster, "
            f"and if credentials and permissions are valid. "
            f"The underlying error was: {ex}"
        )
        raise
    except ClickException:
        raise
    except Exception as ex:
        logger.exception(f"Unexpected error on operation: {operation_name}")
        raise SystemExit(1) from ex
💡 Verification agent
🧩 Analysis chain
Add tests for `handle_command_errors` context manager
This error-handling code lacks test coverage according to static analysis, which could lead to unexpected behavior in error cases.
You should add tests that verify:
- A `CroudException` is logged and re-raised
- A `ClickException` passes through without modification
- A generic exception is converted to `SystemExit(1)`
🏁 Script executed:
#!/bin/bash
# Check for existing tests for handle_command_errors
rg "handle_command_errors" tests/ || echo "No tests found for the handle_command_errors context manager"
Length of output: 164
Add tests for the `handle_command_errors` context manager

The `handle_command_errors` helper in `cratedb_toolkit/cluster/cli.py` currently has no test coverage. Please add unit tests that verify:
- A `CroudException` is caught, logged via `logger.error(...)`, and then re-raised.
- A `ClickException` passes straight through unchanged.
- Any other `Exception` is logged via `logger.exception(...)` and translated into a `SystemExit(1)`.
Suggested test locations:
- tests/test_cli_error_handling.py
Example using pytest and caplog:
import pytest
import contextlib
from click import ClickException
from cratedb_toolkit.cluster.cli import handle_command_errors
from croud.exceptions import CroudException
def test_croud_exception_is_logged_and_reraised(caplog):
caplog.set_level("ERROR")
with pytest.raises(CroudException):
with handle_command_errors("do_op"):
raise CroudException("oops")
assert "Failed to do_op." in caplog.text
assert "The underlying error was: oops" in caplog.text
def test_click_exception_passes_through():
with pytest.raises(ClickException):
with handle_command_errors("do_op"):
raise ClickException("click fail")
def test_generic_exception_becomes_system_exit(caplog):
caplog.set_level("ERROR")
with pytest.raises(SystemExit) as excinfo:
with handle_command_errors("do_op"):
raise ValueError("bad")
assert excinfo.value.code == 1
assert "Unexpected error on operation: do_op" in caplog.text
🤖 Prompt for AI Agents (early access)
In cratedb_toolkit/cluster/cli.py around lines 186 to 203, the handle_command_errors context manager lacks test coverage. Add unit tests in a new file like tests/test_cli_error_handling.py that verify: a CroudException is logged with logger.error and re-raised, a ClickException passes through unchanged, and any other Exception is logged with logger.exception and converted into a SystemExit(1). Use pytest and caplog to capture logs and exceptions as shown in the provided example.
# TODO: Add documentation about those environment variables.
cratedb_cloud_tier = os.environ.get("CRATEDB_CLOUD_TIER", "default")
cratedb_cloud_product = os.environ.get("CRATEDB_CLOUD_PRODUCT", "crfree")  # s2, etc.
cratedb_channel = os.environ.get("CRATEDB_CHANNEL", "stable")
cratedb_version = os.environ.get("CRATEDB_VERSION", DEFAULT_CRATEDB_VERSION)
🛠️ Refactor suggestion
Environment variables need documentation
The code uses several environment variables (`CRATEDB_CLOUD_TIER`, `CRATEDB_CLOUD_PRODUCT`, `CRATEDB_CHANNEL`, and `CRATEDB_VERSION`) that lack documentation, as noted in the TODO comment.
Remove the TODO and add detailed documentation for these variables in the user documentation. The variables should be explained with:
- Their purpose
- Default values
- Allowed values (e.g., valid tiers and products)
- Examples of usage
🤖 Prompt for AI Agents (early access)
In cratedb_toolkit/cluster/croud.py around lines 243 to 247, remove the TODO comment about documenting environment variables and instead add detailed documentation for CRATEDB_CLOUD_TIER, CRATEDB_CLOUD_PRODUCT, CRATEDB_CHANNEL, and CRATEDB_VERSION in the user documentation. The documentation should explain each variable's purpose, default values, allowed values (such as valid tiers and products), and provide usage examples.
def __enter__(self):
    """Enter the context manager, ensuring the cluster is running."""
    return self.start()

def __exit__(self, exc_type, exc_val, exc_tb):
    """Exit the context manager, suspending the cluster."""
    if exc_type is not None:
        # Log the exception but still attempt to suspend the cluster
        logger.error(f"Exception occurred while shutting down: {exc_type.__name__}: {exc_val}")

    try:
        self.close_connections()
    except Exception as ex:
        logger.error(f"Failed to close connections: {ex}")
    finally:
        if self.stop_on_exit:
            try:
                self.stop()
                logger.info(f"Successfully stopped cluster: id={self.cluster_id}, name={self.cluster_name}")
            except Exception as ex:
                logger.error(f"Failed to stop cluster: {ex}")

    # Don't suppress the original exception.
    return False
💡 Verification agent
🧩 Analysis chain
Ensure context manager test coverage
The context manager implementation (`__enter__` and `__exit__`) lacks test coverage according to static analysis hints. These methods are critical as they ensure proper resource acquisition and cleanup, especially when handling errors.
Consider adding tests for:
- Normal operation (verifying `start()` is called and its value returned)
- Error scenarios (exceptions inside the `with` block)
- Cleanup behavior (ensuring `close_connections()` is always called, and `stop()` is called when `stop_on_exit=True`)
🏁 Script executed:
#!/bin/bash
# Check for any existing context manager tests
echo "Searching for context manager tests..."
rg -p "def test.*context|with .*[Cc]luster" tests/
echo "Searching for tests of __enter__ or __exit__..."
rg -p "__enter__|__exit__" tests/
Length of output: 706
Add explicit tests for the cluster context manager
The `__enter__`/`__exit__` implementation in cratedb_toolkit/cluster/core.py (lines 142–165) is critical for starting, cleaning up, and stopping clusters but isn’t directly exercised by existing tests. Please add focused unit tests that:
- Verify normal operation:
• Mock start() and assert it’s called once and its return value is yielded by the with block. - Cover exception handling:
• Simulate an exception inside the with block, assert that the error is logged, that close_connections() still runs, and that the original exception propagates. - Ensure cleanup on exit:
• With stop_on_exit=True, mock and assert calls to close_connections() and then stop(), verifying the correct call order.
Location to test:
• cratedb_toolkit/cluster/core.py @ lines 142–165
Tests can live under tests/cluster/… and use unittest.mock (or pytest’s monkeypatch) to spy on start(), close_connections(), and stop().
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 144-144: cratedb_toolkit/cluster/core.py#L144
Added line #L144 was not covered by tests
[warning] 148-148: cratedb_toolkit/cluster/core.py#L148
Added line #L148 was not covered by tests
🤖 Prompt for AI Agents (early access)
In tests/cluster/, add new unit tests targeting cratedb_toolkit/cluster/core.py lines 142-165 to cover the context manager methods __enter__ and __exit__. Create tests that mock start() to verify it is called once and its return value is yielded during normal operation. Add tests simulating exceptions inside the with block to confirm that logger.error is called with the exception details, close_connections() is still invoked, and the original exception is propagated. Also, test cleanup behavior with stop_on_exit=True by mocking close_connections() and stop(), asserting both are called in the correct order on exit. Use unittest.mock or pytest monkeypatch to spy on these methods and logger calls.
settings = obtain_settings(specs=cls.settings_spec)
check_mutual_exclusiveness(specs=cls.settings_spec, settings=settings)
return cls(**settings)
💡 Verification agent
🧩 Analysis chain
Add test coverage for ManagedClusterSettings.from_cli_or_env
The static analysis hints indicate that the `from_cli_or_env` method lacks test coverage. This method is critical for obtaining and validating settings from CLI or environment variables.
Add tests to verify that:
- Settings are correctly obtained from CLI arguments
- Settings are correctly obtained from environment variables
- Mutual exclusiveness checks work as expected
- Error cases are handled properly
🏁 Script executed:
#!/bin/bash
# Check for existing tests related to settings
echo "Searching for tests related to ManagedClusterSettings..."
rg -p "ManagedClusterSettings" tests/
Length of output: 450
🏁 Script executed:
#!/bin/bash
# Find the definition of from_cli_or_env in the codebase
echo "Searching for from_cli_or_env definition..."
rg -n "def from_cli_or_env" -n cratedb_toolkit/cluster/core.py
# Find any existing tests invoking from_cli_or_env
echo "Searching for from_cli_or_env usage in tests..."
rg -n "from_cli_or_env" -n tests/
Length of output: 333
Add missing tests for ManagedClusterSettings.from_cli_or_env
The `from_cli_or_env` class method (cratedb_toolkit/cluster/core.py:98–101) isn’t invoked by any existing tests. Please add coverage to verify that:
- CLI arguments are correctly parsed and mapped to `ManagedClusterSettings`
- Environment variables are picked up when CLI args are absent
- `check_mutual_exclusiveness` enforces exclusive settings as expected
- Error conditions (e.g. conflicting flags or missing required settings) raise the proper exceptions
Note that tests/conftest.py already references `settings_spec` but does not call `from_cli_or_env` directly; add new tests under `tests/` to cover these scenarios.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 99-101: cratedb_toolkit/cluster/core.py#L99-L101
Added lines #L99 - L101 were not covered by tests
🤖 Prompt for AI Agents (early access)
In cratedb_toolkit/cluster/core.py around lines 99 to 101, the class method from_cli_or_env lacks test coverage. Add new tests under the tests/ directory that invoke from_cli_or_env to verify it correctly obtains settings from CLI arguments and environment variables, enforces mutual exclusiveness via check_mutual_exclusiveness, and properly raises exceptions on error cases such as conflicting or missing settings. Ensure these tests cover all expected behaviors and edge cases for this method.
# Wait a bit to let the deployment settle, mostly to work around DNS propagation issues.
attempts = 0
while not self.info.ready and attempts < self.POST_DEPLOY_WAIT_ATTEMPTS:
    attempts += 1
    self.probe()
    time.sleep(self.POST_DEPLOY_WAIT_PAUSE)
🛠️ Refactor suggestion
DNS propagation delay handling is brittle
The fixed-sleep approach for DNS propagation is brittle. DNS propagation timing varies greatly across networks and might not be resolved by a fixed sleep duration.
Implement a hybrid approach with initial wait followed by exponential backoff:
- # Wait a bit to let the deployment settle, mostly to work around DNS propagation issues.
- attempts = 0
- while not self.info.ready and attempts < self.POST_DEPLOY_WAIT_ATTEMPTS:
- attempts += 1
- self.probe()
- time.sleep(self.POST_DEPLOY_WAIT_PAUSE)
+ # Hybrid approach for DNS propagation
+ # First, a short initial sleep to avoid negative DNS caching
+ time.sleep(1.5)
+
+ # Then use exponential backoff for active polling
+ import random
+ from boltons.urlutils import URL
+
+ max_retries = 10
+ base_delay = 0.5
+ for attempt in range(max_retries):
+ # Check if the cluster is ready
+ self.probe()
+ if self.info.ready:
+ break
+
+ # Try to verify DNS resolution if URL is available
+ try:
+ cluster_url = self.info.cloud.get("url")
+ if cluster_url:
+ # Test DNS resolution
+ import socket
+ hostname = URL(cluster_url).host
+ socket.gethostbyname(hostname)
+ logger.info(f"DNS resolved after {attempt+1} attempts")
+ except Exception as ex:
+ logger.debug(f"Waiting for DNS propagation (attempt {attempt+1}): {ex}")
+
+ # Exponential backoff with jitter
+ jitter = random.uniform(0, 0.1)
+ delay = min(base_delay * (2 ** attempt) + jitter, 10) # Cap at 10 seconds
+ time.sleep(delay)
🤖 Prompt for AI Agents (early access)
In cratedb_toolkit/cluster/core.py around lines 290 to 295, the current fixed sleep loop for waiting on DNS propagation is brittle and may not handle variable delays well. Replace the fixed sleep with a hybrid approach: start with an initial short wait, then use exponential backoff for subsequent retries, increasing the wait time after each failed probe attempt. This will make the DNS propagation wait more adaptive and robust.
if self.cluster_id is None:
    raise DatabaseAddressMissingError("Need cluster identifier to resume cluster")
logger.info(f"Resuming CrateDB Cloud Cluster: id={self.cluster_id}, name={self.cluster_name}")
if self.operation:
    self.operation.resume()
self.probe()
Add validation for empty `operation` in resume method

Similar to the `stop` method, the `resume` method doesn’t check if `self.operation` is `None` before calling `resume()`, which could lead to an AttributeError. Add validation for empty `operation`:
if self.cluster_id is None:
raise DatabaseAddressMissingError("Need cluster identifier to resume cluster")
logger.info(f"Resuming CrateDB Cloud Cluster: id={self.cluster_id}, name={self.cluster_name}")
- if self.operation:
- self.operation.resume()
+ if not self.operation:
+ # If the operation is None, try to probe the cluster first to initialize it
+ logger.debug("Operation not initialized, probing cluster")
+ self.probe()
+ if not self.operation:
+ raise CroudException(f"Could not initialize operation for cluster: {self.cluster_id}")
+
+ self.operation.resume()
self.probe()
return self
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
--- a/cratedb_toolkit/cluster/core.py
+++ b/cratedb_toolkit/cluster/core.py
@@ def resume(self):
 if self.cluster_id is None:
     raise DatabaseAddressMissingError("Need cluster identifier to resume cluster")
 logger.info(f"Resuming CrateDB Cloud Cluster: id={self.cluster_id}, name={self.cluster_name}")
-if self.operation:
-    self.operation.resume()
+if not self.operation:
+    # If the operation hasn’t been initialized yet, probe to set it up
+    logger.debug("Operation not initialized, probing cluster")
+    self.probe()
+    if not self.operation:
+        raise CroudException(f"Could not initialize operation for cluster: {self.cluster_id}")
+
+# Now resume the operation
+self.operation.resume()
 self.probe()
 return self
🤖 Prompt for AI Agents (early access)
In cratedb_toolkit/cluster/core.py around lines 309 to 314, the resume method calls self.operation.resume() without checking if self.operation is None, which can cause an AttributeError. Add a validation to check if self.operation is not None before calling resume(), similar to the stop method, to prevent errors when operation is empty.
Actionable comments posted: 0
♻️ Duplicate comments (1)
doc/cluster/backlog.md (1)
35-41
: Specify a clear resolution after status evaluation
The backlog describes checking `cloud.info.last_async_operation.status` but stops short of advising what to do next. A concrete next step (e.g., retry behavior or user notification) will help contributors implement this scenario correctly. Example addition:
=> Evaluate `cloud.info.last_async_operation.status` = `SENT|IN_PROGRESS`. + => If still in progress, retry with exponential backoff or inform the user to wait before retrying.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
41-41: Bare URL used
null(MD034, no-bare-urls)
🧹 Nitpick comments (9)
doc/backlog/main.md (4)
3-3
: Nested header level for clarity. The new `## Iteration +0` header is at the same level as the top-level backlog title (`## Main Backlog`), making it appear as a peer section rather than a sub-section. Consider demoting it to `### Iteration +0` to maintain a clear hierarchy.
12-15
: Refine debug log snippet and add language tag. The fenced code block for the HTTP client debug output currently lacks a language identifier (MD040) and includes full JSON responses that may be overly verbose. Consider:
- Adding a language tag (e.g.,
text or
shell).- Summarizing or truncating the logs (e.g., using
...
) to keep the backlog concise.🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
12-12: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
57-57
: Escape inner quotes in code quote. The bulletMake \
ctk cluster list-jobs` respect "status": "SUCCEEDED"` mixes backticks and double quotes, which can confuse Markdown renderers. Consider using single quotes for the status or escaping the inner quotes for clarity:- UX: Make `ctk cluster list-jobs` respect 'status': 'SUCCEEDED'
61-63
: Specify language for fenced environment snippet. The block showing theCRATEDB_CLUSTER_URL
example is fenced without a language. Addshell
(e.g., ```shell) to comply with markdownlint (MD040) and improve readability.🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
61-61: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
doc/cluster/backlog.md (5)
41-41
: Wrap the bare URL in a markdown link
Bare URLs trigger markdownlint's MD034 rule. Converting this to a descriptive link improves readability and lint compliance.- See also https://github.com/crate/cratedb-toolkit/actions/runs/14682363239/job/41206608090?pr=81#step:5:384. + See also [GitHub Actions run #14682363239](https://github.com/crate/cratedb-toolkit/actions/runs/14682363239/job/41206608090?pr=81#step:5:384).🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
41-41: Bare URL used
null(MD034, no-bare-urls)
47-47
: Wrap the bare URL in a markdown link
Avoid bare URLs per MD034; use a labeled hyperlink.- - Update `shunit2`. -- https://github.com/kward/shunit2 + - Update `shunit2`. -- [kward/shunit2 on GitHub](https://github.com/kward/shunit2)🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
47-47: Bare URL used
null(MD034, no-bare-urls)
51-51
: Remove stray backtick at end of line
There's an unmatched backtick that may render incorrectly in Markdown.- - Naming things: s/UniversalCluster/DatabaseCluster/` + - Naming things: s/UniversalCluster/DatabaseCluster/
72-72: Add a language specifier to the fenced code block.
Specifying a language tag enables syntax highlighting and satisfies MD040; add `json` to the opening fence.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
72-72: Fenced code blocks should have a language specified (MD040, fenced-code-language)
87-87: Remove unnecessary comma before "because".
The comma before "because" is not needed when the clause is essential.

```diff
- - Shell: Currently it is not possible to create multiple instances of a `ManagedCluster`,
+ - Shell: Currently it is not possible to create multiple instances of a `ManagedCluster`
```
🧰 Tools
🪛 LanguageTool
[formatting] ~87-~87: If the 'because' clause is essential to the meaning, do not use a comma before the clause. (COMMA_BEFORE_BECAUSE)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- cratedb_toolkit/shell/cli.py (3 hunks)
- doc/backlog/main.md (2 hunks)
- doc/cluster/backlog.md (1 hunks)
- tests/examples/test_shell.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/examples/test_shell.sh
- cratedb_toolkit/shell/cli.py
🧰 Additional context used
🪛 LanguageTool
doc/cluster/backlog.md
[uncategorized] ~65-~65: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
[formatting] ~87-~87: If the 'because' clause is essential to the meaning, do not use a comma before the clause. (COMMA_BEFORE_BECAUSE)
doc/backlog/main.md
[uncategorized] ~8-~8: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
[style] ~10-~10: This expression is wordy and overused. Consider replacing it with a more concise and formal alternative. (SOONER_OR_LATER)
[uncategorized] ~59-~59: The preposition 'to' seems more likely in this position. (AI_HYDRA_LEO_REPLACE_FOR_TO)
🪛 markdownlint-cli2 (0.17.2)
doc/cluster/backlog.md
41-41: Bare URL used (MD034, no-bare-urls)
47-47: Bare URL used (MD034, no-bare-urls)
72-72: Fenced code blocks should have a language specified (MD040, fenced-code-language)
doc/backlog/main.md
12-12: Fenced code blocks should have a language specified (MD040, fenced-code-language)
61-61: Fenced code blocks should have a language specified (MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms (12)
- GitHub Check: build-and-publish
- GitHub Check: Generic: Python 3.12 on OS ubuntu-latest
- GitHub Check: Generic: Python 3.9 on OS ubuntu-latest
- GitHub Check: Kinesis: Python 3.12 on OS ubuntu-latest
- GitHub Check: Kinesis: Python 3.9 on OS ubuntu-latest
- GitHub Check: Generic: Python 3.8 on OS ubuntu-latest
- GitHub Check: CFR for OS windows-latest
- GitHub Check: CFR for OS ubuntu-latest
- GitHub Check: Python 3.11, LocalStack 3.6, OS ubuntu-latest
- GitHub Check: Cloud: Python 3.13 on OS ubuntu-latest
- GitHub Check: Python 3.9, LocalStack 3.6, OS ubuntu-latest
- GitHub Check: CFR for OS macos-13
🔇 Additional comments (4)
doc/backlog/main.md (4)
4-5: Clarify CLI flag consolidation.
You propose both a canonical `--cluster-url` and a single `--cluster` option. It may be helpful to specify whether they serve different purposes, or if one replaces multiple existing flags. This will guide future implementation and documentation.
8-8: Verify migration of CLI metadata.
Ensure that `cratedb_toolkit/cfr/cli.py` is updated to set and pass `ctx.meta["address"]` identically to `cratedb_toolkit/info/cli.py`, and update any associated tests or documentation.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~8-~8: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
62-62: Verify updated environment variable.
Confirm that the SDK and CLI configuration loader accept `CRATEDB_CLUSTER_URL` (instead of the deprecated `CRATEDB_SQLALCHEMY_URL`), and update any references in the documentation or `.env` examples accordingly.
70-70: Ensure constant is defined and used.
The backlog notes storing `CRATEDB_CLUSTER_ID` in `cratedb_toolkit.constants`. Verify that this constant is declared in the codebase, documented in `constants.py`, and consumed by the CLI/SDK as intended.
About
Wraps a few calls of croud into a different kind of API, mostly about managing clusters. It is in the same spirit as, and is also building upon, GH-73.
Documentation
https://cratedb-toolkit--81.org.readthedocs.build/cluster/
https://cratedb-toolkit--81.org.readthedocs.build/util/shell.html
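As noted in the review discussion, `croud clusters deploy` currently writes two JSON payloads to stdout, so the wrapper parses its output with `json.JSONDecoder.raw_decode`, which decodes the first document and ignores trailing data. A minimal self-contained sketch of that trick (the payload string below is made up for illustration):

```python
import json

# Simulated stdout of `croud clusters deploy`: two JSON payloads back to back.
payload = '{"id": "123", "name": "testdrive"}\n{"status": "ok"}'

# raw_decode() returns the first decoded object plus the index where it ended,
# instead of raising on the extra data like json.loads() would.
decoder = json.JSONDecoder()
data, end = decoder.raw_decode(payload)
print(data["name"])  # prints: testdrive
```

The leftover text starting at index `end` is simply discarded by the wrapper.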
Configuration
Configuration works by defining a few environment variables. For convenience, put them within a dotenv file (.env).
Python SDK
Deploy/start/resume cluster, run database workload, and suspend cluster again.
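That lifecycle can be pictured as a context manager that provisions on entry, runs the workload in the body, and suspends on exit. The sketch below uses stand-in functions rather than the real SDK (see the linked examples for the actual API):

```python
from contextlib import contextmanager

# Stand-in lifecycle sketch: deploy/resume -> run workload -> suspend.
# In the real SDK these steps talk to CrateDB Cloud; here they just record events.
events = []

def deploy_or_resume(name):
    events.append(f"deploy/resume {name}")

def run_workload(name):
    events.append(f"query {name}")

def suspend(name):
    events.append(f"suspend {name}")

@contextmanager
def managed_cluster(name):
    deploy_or_resume(name)
    try:
        yield name
    finally:
        # Suspend the cluster again, even if the workload raises.
        suspend(name)

with managed_cluster("testdrive") as cluster:
    run_workload(cluster)

print(events)
```

The `finally` clause is the important design point: the cluster is suspended again regardless of whether the workload succeeds.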
See also examples/python/cloud_cluster.py and examples/python/cloud_import.py.
CLI interface
Deploy cluster, import data, and query a few samples worth of data.
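A hypothetical shell session sketching that flow; `ctk cluster start` and `ctk cluster list-jobs` appear elsewhere in this pull request, while the environment variable value and the remaining command lines are illustrative placeholders, not verified invocations:

```shell
# Select the target cluster; the CRATEDB_CLUSTER_ID value is a placeholder.
export CRATEDB_CLUSTER_ID=e1b5e35a-0000-4000-8000-000000000000

# Deploy (or resume) the cluster, then inspect import jobs.
ctk cluster start
ctk cluster list-jobs

# Query a few samples worth of data through the toolkit's shell wrapper.
ctk shell --command "SELECT * FROM sys.summits LIMIT 3;"
```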
See also examples/shell/cloud_cluster.sh and examples/shell/cloud_import.sh.
References
Backlog
Trivia
Favorite poems
In the toolkit’s warren, new features abound,
With clusters managed, imports now sound.
CLI and shell, both smarter and spry,
Python and notebooks reach for the sky.
Docker skips gently if it’s not around,
While tests and examples in harmony are found.
Oh, what a hop for this code-loving bunny!
Data to the cloud—smooth, fast, and sunny!
A rabbit hopped to cloud and sky,
With toolkit new, it aimed to try:
"Start a cluster, import with flair,
Use Python, shell, or notebook there.
Settings checked, containers spun,
Tests and docs—oh what fun!
Cloud and code now leap as one! 🐇☁️
In the cloud and on the ground,
Rabbits hop where clusters are found.
With toolkit new and CLI bright,
They probe, deploy, and import with might.
From shell to Python, tests abound,
Data hops in—jobs resound!
🐇✨
🐇
A toolkit grows with clusters grand,
Cloud or local, at your command.
With settings checked and errors clear,
New guides and tests are gathered here.
Import, deploy, or query deep,
The rabbit’s promise: data you’ll keep!
Hopping forward, code refined,
Cloud and shell, now intertwined.