Feat: state import/export #4038

erindru · 2025-03-26T21:51:05Z

This PR implements the ability to export the state database to a file and import it back.

The state export file is a JSON file. I tried to implement a streaming interface via the StateStream abstraction and the json_stream library. The goal is to be able to dump large projects without loading everything into memory and crashing with an OOM.
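As a rough illustration of the streaming idea, the sketch below writes a JSON object one item at a time so that only the current item is held in memory. It uses only the standard library rather than json_stream, and `write_state_stream`, `fake_snapshots` and the section names are hypothetical stand-ins, not the PR's actual StateStream API.

```python
import io
import json
from typing import Any, Dict, Iterable, Iterator, TextIO


def write_state_stream(sections: Dict[str, Iterable[Any]], out: TextIO) -> None:
    """Write {"section": [item, ...], ...} to `out`, serializing one item at a time."""
    out.write("{")
    for section_index, (name, items) in enumerate(sections.items()):
        if section_index:
            out.write(",")
        out.write(json.dumps(name) + ":[")
        for item_index, item in enumerate(items):
            if item_index:
                out.write(",")
            # Only the current item is serialized; the iterable is never materialized.
            out.write(json.dumps(item))
        out.write("]")
    out.write("}")


def fake_snapshots() -> Iterator[Dict[str, Any]]:
    # Hypothetical stand-in for rows read lazily from the state database.
    for i in range(3):
        yield {"name": f"model_{i}", "version": str(i)}


buffer = io.StringIO()
write_state_stream(
    {"snapshots": fake_snapshots(), "environments": iter([{"name": "prod"}])},
    buffer,
)
exported = json.loads(buffer.getvalue())
```

In a real export the output would be a file handle rather than a `StringIO`, and the iterables would be backed by paged database queries.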

In terms of version compatibility, there is a hard requirement to use the same version of SQLMesh to load the state as was used to dump it. This greatly simplifies the implementation and ensures our Pydantic model definitions will always be compatible. Guidance is included in the documentation on how to upgrade an older state file to be compatible with a new version of SQLMesh.
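A minimal sketch of what such a same-version check could look like is shown below. The `check_versions` helper, the `StateVersionMismatch` exception and the `sqlmesh_version` key are assumptions made for illustration; the PR's actual validation may be structured differently.

```python
from typing import Dict


class StateVersionMismatch(Exception):
    """Raised when a state file was produced by a different version (hypothetical)."""


def check_versions(file_versions: Dict[str, str], running_versions: Dict[str, str]) -> None:
    # Collect every version key whose value in the file differs from the running one.
    mismatched = {
        key: (file_versions.get(key), running)
        for key, running in running_versions.items()
        if file_versions.get(key) != running
    }
    if mismatched:
        details = ", ".join(
            f"{key}: file={file_v!r}, running={running_v!r}"
            for key, (file_v, running_v) in sorted(mismatched.items())
        )
        raise StateVersionMismatch(
            f"State file was produced by a different version ({details})"
        )
```

Failing before any rows are written keeps a mismatched file from partially overwriting existing state.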

State export:

State import:

erindru · 2025-03-26T21:54:27Z

sqlmesh/cli/main.py

+
+
+@cli.group(no_args_is_help=True)
+def state() -> None:


This is a slight departure from our existing structure but I decided to group state operations to keep things open for more operations in future (such as perhaps being able to query state).

So the CLI syntax is sqlmesh state dump or sqlmesh state load vs something like sqlmesh state_dump or sqlmesh state_load.

@treysp I'd be keen to know if this is a direction you were planning to head in, given the recent CLI refactoring work.

erindru · 2025-03-26T21:56:48Z

sqlmesh/core/context.py

+                    f"This [b]destructive[/b] operation will delete all existing state against the '{self.selected_gateway}' gateway \n"
+                    f"and replace it with what's in the '{input_file.as_posix()}' file."
+                )
+                if isinstance(self.console, TerminalConsole):


I had some trouble figuring out how to handle the confirmations since they need to go before the load or dump progress even starts.

I'm not super happy with this implementation; open to ideas.

This logic needs to go into the console implementation itself.

I've refactored how the console interactions work and have moved it

erindru · 2025-03-26T21:59:27Z

sqlmesh/core/state_sync/base.py

@@ -459,6 +464,16 @@ def add_interval(
        )
        self.add_snapshots_intervals([snapshot_intervals])

+    @abc.abstractmethod
+    def load(self, stream: StateStream, clear: bool = True) -> None:


I decided to make dump/load first class citizens on StateSync rather than trying to coordinate everything over the public interface.

The reason is to allow different StateSync implementations to perform local optimizations or call internal methods without exposing them publicly. It also makes it easier to wrap things in transactions.
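To illustrate the transaction point, here is a hedged sketch of an abstract interface with a SQLite-backed implementation that wraps the whole import in a single transaction. `StateSyncSketch`, `SqliteStateSync`, `export_rows` and `import_rows` are hypothetical names; they are not the real StateSync classes or method signatures.

```python
import abc
import sqlite3
from typing import Iterable, Iterator, Tuple

Row = Tuple[str, str]


class StateSyncSketch(abc.ABC):
    """Hypothetical, heavily reduced stand-in for a StateSync interface."""

    @abc.abstractmethod
    def export_rows(self) -> Iterator[Row]: ...

    @abc.abstractmethod
    def import_rows(self, rows: Iterable[Row], clear: bool = True) -> None: ...


class SqliteStateSync(StateSyncSketch):
    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE snapshots (name TEXT PRIMARY KEY, payload TEXT)")

    def export_rows(self) -> Iterator[Row]:
        yield from self._conn.execute("SELECT name, payload FROM snapshots ORDER BY name")

    def import_rows(self, rows: Iterable[Row], clear: bool = True) -> None:
        # Using the connection as a context manager commits on success and
        # rolls back on an exception, so a failed import leaves old state intact.
        with self._conn:
            if clear:
                self._conn.execute("DELETE FROM snapshots")
            self._conn.executemany("INSERT INTO snapshots VALUES (?, ?)", rows)
```

Because the implementation owns the import, it can combine the clear and the inserts into one atomic unit, which would be harder to guarantee if a caller drove each step through public methods.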

sqlmesh/core/state_sync/common.py

sqlmesh/core/state_sync/dump_load.py

izeigerman · 2025-03-27T15:25:28Z

sqlmesh/core/context.py

@@ -2074,6 +2077,56 @@ def clear_caches(self) -> None:
        for path in self.configs:
            rmtree(path / c.CACHE)

+    def dump_state(self, output_file: Path, confirm: bool = True) -> None:


What about the following options:

Dump local state only

Only dump a specific environment(s)

I was planning to add those in follow-up PRs to prevent this one from becoming too big.

Also, what is local state? Sounds like something that isn't even present in the StateSync, is it a ContextDiff against prod?

Local state is context.snapshots reflecting local changes that have not yet been persisted to the StateSync.

To support it, we need some extra metadata in the state file to "taint" the file as non-importable if it contains local state.
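One way the "taint" could work is sketched below, assuming a `metadata` section with an `importable` flag. Both key names and both helper functions are hypothetical and are not taken from the PR.

```python
from typing import Any, Dict


def build_metadata(includes_local_state: bool) -> Dict[str, Any]:
    # A file that contains unpersisted local changes is marked as not importable.
    return {"importable": not includes_local_state}


def ensure_importable(state_file: Dict[str, Any]) -> None:
    metadata = state_file.get("metadata", {})
    # Treat a missing flag as not importable to stay on the safe side.
    if not metadata.get("importable", False):
        raise ValueError("This state file contains local state and cannot be imported")
```

The import path would call `ensure_importable` before touching the state database.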

Local state export is now available via:

$ sqlmesh state export --local -o local_state.json

And specific environments can now be exported like:

$ sqlmesh state export --environment foo --environment bar -o specific_environment_state.json

izeigerman · 2025-03-27T15:26:18Z

sqlmesh/core/context.py

+        # trigger a connection to the StateSync so we can fail early if there is a problem
+        self.state_sync.get_versions(validate=True)
+
+        if confirm and isinstance(self.console, TerminalConsole):


This logic should be a part of the console implementation. I don't like breaking the abstraction here.

I've refactored how the console interactions work and have moved it

izeigerman · 2025-03-27T15:27:19Z

sqlmesh/core/context.py

+
+        dump_state(self.state_sync, output_file, self.console)
+
+    def load_state(self, input_file: Path, confirm: bool = True) -> None:


I'd prefer export / import over dump / load.

I've switched the terminology

izeigerman · 2025-03-27T18:06:13Z

sqlmesh/core/state_sync/dump_load.py

+            yield "environments", _dump_environments(state_stream.environments)
+            console.update_state_dump_environments(complete=True)
+
+            yield "auto_restatements", _dump_auto_restatements(state_stream.auto_restatements)


I don't know if we should literally dump our tables 1-to-1. This is way too low level. For example a complete snapshot instance is assembled using data from _snapshots, _intervals and _auto_restatements tables. I don't think we want to expose users to all these internals. Instead, I believe it should just be environments, snapshots, versions.

Additionally, the format should be compatible with export of the local state.

Regarding AutoRestatements, I could see that the _auto_restatements table is joined in when calling get_snapshots(), but I couldn't see how it was being populated if the snapshots were written back via push_snapshots().

But I guess part of the load could be to extract the auto restatement information from the Snapshot records themselves and call update_auto_restatements() to create the auto restatement records.

I've improved the import implementation to keep track of the auto restatements as the snapshots are being inserted and then insert them at the end.

This means the auto restatements table no longer needs to be written to the state file
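The approach described above can be sketched roughly as follows. The snapshots are plain dicts, the `next_auto_restatement_ts` field name is an assumption, and `push_snapshots` and `update_auto_restatements` are passed in as callables with simplified, hypothetical signatures rather than the real StateSync methods.

```python
from typing import Any, Callable, Dict, Iterable, List

Snapshot = Dict[str, Any]


def import_snapshots(
    snapshots: Iterable[Snapshot],
    push_snapshots: Callable[[List[Snapshot]], None],
    update_auto_restatements: Callable[[Dict[str, int]], None],
    batch_size: int = 2,
) -> None:
    pending: Dict[str, int] = {}
    batch: List[Snapshot] = []
    for snapshot in snapshots:
        # Remember the auto restatement info while the snapshot streams past.
        ts = snapshot.get("next_auto_restatement_ts")
        if ts is not None:
            pending[snapshot["name"]] = ts
        batch.append(snapshot)
        if len(batch) >= batch_size:
            push_snapshots(batch)
            batch = []  # start a fresh list; the pushed one is left untouched
    if batch:
        push_snapshots(batch)
    # Write the auto restatement records once, after all snapshots are inserted.
    if pending:
        update_auto_restatements(pending)
```

Only the small mapping of names to timestamps is held for the duration of the import; the snapshots themselves are still processed in batches.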

georgesittas

Did a quick first pass, agree with Iaroslav's comments. Nice 👍

georgesittas · 2025-03-27T19:29:38Z

docs/concepts/state.md

@@ -0,0 +1,235 @@
+# State
+
+SQLMesh stores information about your project in a state database separate from your main warehouse.


[Nit] should we rephrase as "... possibly separate ..." here?

Well, I agree that technically correct is the best kind of correct :)

I guess it's about what we want to encourage. The point is that state is a different type of workload from a warehouse workload, so for the best experience it needs to be stored in a suitable database type.

Of course this falls down if your warehouse is an OLTP database (PostgreSQL, MySQL, MSSQL) in which case storing state in your warehouse is perfectly fine.

I've tweaked this wording

erindru · 2025-03-31T02:58:50Z

sqlmesh/cli/main.py

+    required=True,
+    type=click.Path(exists=True, dir_okay=False, readable=True, path_type=Path),
+)
+@click.option(


This effectively makes the default import strategy "merge" instead of "replace", so if the user wants to wipe all state they need to pass --replace specifically.

@izeigerman do you have a preference of what the default should be?
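The difference between the two strategies can be illustrated with plain dictionaries keyed by name. This is a conceptual sketch only; `import_state` is a hypothetical helper, and the real import operates on the state database rather than on dicts.

```python
from typing import Any, Dict


def import_state(
    existing: Dict[str, Any], incoming: Dict[str, Any], replace: bool = False
) -> Dict[str, Any]:
    if replace:
        # "replace": wipe everything and keep only what is in the file.
        return dict(incoming)
    # "merge" (the default here): keep existing entries, overwrite on key collisions.
    merged = dict(existing)
    merged.update(incoming)
    return merged
```

With merge as the default, entries that are absent from the file survive the import, which is the less destructive choice if the flag is forgotten.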

erindru force-pushed the erin/state-dump branch from 76b08ca to 480d023 Compare March 26, 2025 22:07

erindru marked this pull request as ready for review March 26, 2025 22:35

erindru force-pushed the erin/state-dump branch from 480d023 to 6c67823 Compare March 31, 2025 02:48

erindru changed the title ~~Feat: state dump/load~~ Feat: state import/export Mar 31, 2025

Feat: state import/export

27c6a44

erindru force-pushed the erin/state-dump branch from 6c67823 to 27c6a44 Compare March 31, 2025 03:08
