[EPIC] Restructure Defra codebase #3447

Open
AndrewSisley opened this issue Feb 13, 2025 · 0 comments

AndrewSisley commented Feb 13, 2025

Summary

This ticket proposes a reorganisation of the Defra repository into multiple access layers in order to achieve a clean separation of concerns for both our users and internal developers.

Each access layer represents a level at which users can interact with the database in a consistent way. For example, users may wish to embed the view layer within their mobile application, or use a lightweight write-focused db to record data within a wind turbine.

Each layer will be tested independently using the shared language of the action and multiplier libraries (a concept to be ported from the corekv repository; multiplier is explained here). This allows us to keep the testing of each layer focused and decoupled, whilst keeping the tests simple to read for maintainers usually focused on other layers.

This ticket does not propose that we implement all of the suggestions at once; it only seeks to lay out a long-term goal to move towards.

Existing problem space

Currently, we try and force all clients to implement the same, very large, client interfaces.

This was an optimistic way of forcing consistency and making testing simpler. It was semi-successful, but resulted in unsatisfiable interface elements (e.g. http.Blockstore can never realistically be implemented) and a very complicated test framework whose use of the existing CI-multiplier-matrix is fairly wasteful, in a way that will likely become intolerable in the near future.

The enforcement of the client interfaces, and their consumption in the existing test framework, makes required divergences unexpected and painful to test. When they are tested (and often they are not, because of the lack of framework support), the tests typically read very differently from the majority of Defra tests, making maintenance harder for less-familiar developers.

An additional problem is the spaghettiness encouraged by the current lack of structure. Outside of the http and cli packages, everything is lumped into one blob. Types are unspecialised and often need to fulfil multiple roles, exposing internals of some areas to areas that have no use for them. The layering that does exist is often blurred by these cross-layer objects, which are often defined in the user-public client package.

These problems will likely significantly worsen if/when the number of contributors expands, as is planned for later this year.

Proposal

The Defra repository is to be broken up into distinct layers. Each layer should be assumed to be consumable by external users; other, higher-level layers within the repository should be seen as internal users of the layer.

The independence of each layer should be prioritised. It is possible that in the future they will be owned by different teams. Shared dependencies between layers should be minimised and extracted from the repository (e.g. errors and telemetry).

Long term, this will permit each layer to be independently versioned and released if desired, allowing feature support to be added layer by layer. It will also allow each layer/team to be somewhat language independent.

Each layer will have its own distinct set of integration and benchmark tests. The only permitted coupling between the test layers should be the action and multiplier packages, which should be mandatory in order to allow maintainers (and users) from different layers to readily read all Defra tests. The concrete action and multiplier implementations should not be shared (e.g. a CreateDoc action likely needs to be independently implemented for the node, view and client layers). This decoupling will allow for much better scaling of the multiplier system, as there is no reason to mix a secondary-index multiplier with a p2p or gql multiplier.
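
A rough sketch of how one layer's tests might look under this scheme. It assumes nothing about the real corekv API: Action, Multiplier, State, Restart, RestartAfterEach and Run are hypothetical names invented for illustration, and CreateDoc here is only a toy stand-in for a layer-specific implementation.

```go
package main

import "fmt"

// State is hypothetical per-test state threaded through every action.
type State struct {
	Docs map[string]string
	Log  []string
}

// Action is a single declarative test step (assumed shape, not the corekv API).
type Action interface {
	Execute(s *State) error
}

// Multiplier rewrites a list of actions to produce a variant of the same test.
type Multiplier interface {
	Name() string
	Apply(actions []Action) []Action
}

// CreateDoc is this layer's own implementation of a create action.
type CreateDoc struct{ ID, Body string }

func (a CreateDoc) Execute(s *State) error {
	if _, exists := s.Docs[a.ID]; exists {
		return fmt.Errorf("doc %q already exists", a.ID)
	}
	s.Docs[a.ID] = a.Body
	s.Log = append(s.Log, "create:"+a.ID)
	return nil
}

// Restart stands in for any layer-specific step a multiplier may inject.
type Restart struct{}

func (Restart) Execute(s *State) error {
	s.Log = append(s.Log, "restart")
	return nil
}

// RestartAfterEach is a multiplier that inserts a Restart after every action.
type RestartAfterEach struct{}

func (RestartAfterEach) Name() string { return "restart-after-each" }

func (RestartAfterEach) Apply(actions []Action) []Action {
	out := make([]Action, 0, len(actions)*2)
	for _, a := range actions {
		out = append(out, a, Restart{})
	}
	return out
}

// Run applies the multipliers in order, then executes the actions on fresh state.
func Run(actions []Action, ms ...Multiplier) (*State, error) {
	for _, m := range ms {
		actions = m.Apply(actions)
	}
	s := &State{Docs: map[string]string{}}
	for _, a := range actions {
		if err := a.Execute(s); err != nil {
			return s, err
		}
	}
	return s, nil
}

func main() {
	test := []Action{CreateDoc{ID: "a", Body: "{}"}, CreateDoc{ID: "b", Body: "{}"}}
	s, err := Run(test, RestartAfterEach{})
	fmt.Println(s.Log, err)
}
```

Under this sketch only the Action and Multiplier interfaces would be shared; each layer would supply its own CreateDoc and its own multipliers.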

The interfaces at each layer should be broken up. This will allow independent implementation of specialist vertical slices (e.g. a write-focused P2P node for recording and broadcasting data from a wind turbine).
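
As an illustration of what a broken-up interface could enable, the sketch below defines three narrow interfaces and a write-focused node that implements only two of them. All names (DocWriter, DocReader, Broadcaster, TurbineNode, Record) are hypothetical and are not existing Defra types.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical narrow interfaces replacing one large client interface.
type DocWriter interface {
	Write(id string, body []byte) error
}

type DocReader interface {
	Read(id string) ([]byte, error)
}

type Broadcaster interface {
	Broadcast(id string) error
}

// TurbineNode is a write-focused node: it implements DocWriter and
// Broadcaster, and deliberately does not implement DocReader.
type TurbineNode struct {
	pending map[string][]byte
	sent    []string
}

func NewTurbineNode() *TurbineNode {
	return &TurbineNode{pending: map[string][]byte{}}
}

func (n *TurbineNode) Write(id string, body []byte) error {
	if id == "" {
		return errors.New("empty document id")
	}
	n.pending[id] = body
	return nil
}

func (n *TurbineNode) Broadcast(id string) error {
	if _, ok := n.pending[id]; !ok {
		return fmt.Errorf("unknown document %q", id)
	}
	n.sent = append(n.sent, id)
	return nil
}

// Record depends only on the capabilities it actually needs.
func Record(w DocWriter, b Broadcaster, id string, body []byte) error {
	if err := w.Write(id, body); err != nil {
		return err
	}
	return b.Broadcast(id)
}

func main() {
	node := NewTurbineNode()
	err := Record(node, node, "reading-1", []byte(`{"rpm": 12}`))
	_, canRead := any(node).(DocReader)
	fmt.Println(err, canRead)
}
```

Because Record asks for two small interfaces rather than one large one, a specialist implementation never has to stub out methods it cannot realistically support.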

discussion: It has been suggested that no types should be shared between layers, breaking as many direct dependencies as possible (constructors being the possible exception). This would maximise the decoupling between layers, but requires additional boilerplate and indirection.

The layers

Initially, 4 or 5 layers are proposed within the Defra repository, with an additional, lowest layer, corekv, already existing outside it.

Each layer owns any storage it is responsible for. For example, the Datastore is owned by, and internal to, the view layer. Long term, this opens up new opportunities for store-type independence, such as using a local file-persisted view cache alongside a global in-memory p2p/db store.

Higher-level layers are free to re-export parts of lower layers should they choose to; however, they are not bound to reuse the lower-level types and interfaces when they do so.
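
One way this could look in Go is for the higher layer to declare the interface it depends on, using only its own types, with a small adapter at the boundary. This is a sketch under assumptions: dbBlock, dbStore, ViewDoc, DocSource, dbAdapter and Describe are hypothetical names, and both "layers" are kept in one file only so that the example runs on its own.

```go
package main

import "fmt"

// --- hypothetical lower (db) layer ---

// dbBlock is the lower layer's own representation of stored data.
type dbBlock struct {
	Key     string
	Payload []byte
}

type dbStore struct{ blocks map[string]dbBlock }

func (s *dbStore) GetBlock(key string) (dbBlock, bool) {
	b, ok := s.blocks[key]
	return b, ok
}

// --- hypothetical higher (view) layer ---

// ViewDoc is the view layer's own type; it does not embed or alias dbBlock.
type ViewDoc struct {
	ID   string
	JSON string
}

// DocSource is declared by the consumer and mentions only view-layer types.
type DocSource interface {
	Doc(id string) (ViewDoc, error)
}

// dbAdapter is the only place where both layers' types meet.
type dbAdapter struct{ store *dbStore }

func (a dbAdapter) Doc(id string) (ViewDoc, error) {
	b, ok := a.store.GetBlock(id)
	if !ok {
		return ViewDoc{}, fmt.Errorf("doc %q not found", id)
	}
	return ViewDoc{ID: b.Key, JSON: string(b.Payload)}, nil
}

// Describe is view-layer logic written purely against view-layer types.
func Describe(src DocSource, id string) (string, error) {
	d, err := src.Doc(id)
	if err != nil {
		return "", err
	}
	return d.ID + "=" + d.JSON, nil
}

func main() {
	store := &dbStore{blocks: map[string]dbBlock{
		"doc-1": {Key: "doc-1", Payload: []byte(`{"name":"turbine"}`)},
	}}
	out, err := Describe(dbAdapter{store: store}, "doc-1")
	fmt.Println(out, err)
}
```

The adapter is the extra boilerplate this approach costs; in exchange, a change to dbBlock only ever touches the adapter.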

Client

The client layer holds the http and cli, and any future similar layers.

This is likely the easiest layer to bring into line with the proposal, as http and cli are already fairly well separated; it would mostly be a case of moving some directories and adding tests.

discussion: I am leaning towards this being two layers, not one, as cli sits on top of the http layer and will differ slightly from it. I suggest this long-term decision is made primarily by the dev-ex team.

Node

A high-level layer containing anything largely pertaining to p2p or untyped access (e.g. gql/sql/etc). The description is a little vague, but it basically contains anything that is not in the other layers.

View

The view layer contains everything required for local, core, embedded, human-readable, access to the data stored in a Defra database.

Separating this from db will probably require the updating of the datastore to be driven by the db.event system, as otherwise concurrent writes made via the db package will result in a partial datastore.

This layer contains:

  • Collection definitions, but not schemas, which are instead owned by db
  • ACP, as it is required to decrypt docs, particularly during query execution
  • Secondary indexes
  • Views
  • Lens
  • The datastore (completely owned and managed by this layer)
  • The query engine; the mapper output types are defined in this layer, but created in node.

Db

The lowest layer currently held within the Defra repository. This layer is responsible for the management of global data (documents and schemas) via CRDTs.

This layer is not responsible for presenting the stored data in a human-readable format. For example, it is not responsible for encrypting/decrypting the data stored within it.

Events will be broadcast via the events system as they currently are, driving p2p and view.
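
A minimal sketch of this event-driven arrangement, in which the view's datastore is updated only by consuming db events, so that it sees every write regardless of which caller made it. The types below (UpdateEvent, Bus, Datastore, Replay) are hypothetical and do not reflect the actual db.event API; the second subscriber merely stands in for p2p.

```go
package main

import (
	"fmt"
	"sync"
)

// UpdateEvent is a hypothetical event emitted by the db layer after a write.
type UpdateEvent struct {
	DocID string
	Body  string
}

// Bus fans each published event out to every subscriber.
type Bus struct {
	mu   sync.Mutex
	subs []chan UpdateEvent
}

func (b *Bus) Subscribe(buffer int) <-chan UpdateEvent {
	ch := make(chan UpdateEvent, buffer)
	b.mu.Lock()
	b.subs = append(b.subs, ch)
	b.mu.Unlock()
	return ch
}

func (b *Bus) Publish(e UpdateEvent) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.subs {
		ch <- e
	}
}

// Close signals all subscribers that no further events will arrive.
func (b *Bus) Close() {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.subs {
		close(ch)
	}
	b.subs = nil
}

// Datastore is the view layer's own materialised copy of the documents.
type Datastore struct {
	mu   sync.Mutex
	docs map[string]string
}

// Follow applies events in order until the channel is closed.
func (d *Datastore) Follow(events <-chan UpdateEvent, wg *sync.WaitGroup) {
	defer wg.Done()
	for e := range events {
		d.mu.Lock()
		d.docs[e.DocID] = e.Body
		d.mu.Unlock()
	}
}

func (d *Datastore) Get(id string) (string, bool) {
	d.mu.Lock()
	defer d.mu.Unlock()
	v, ok := d.docs[id]
	return v, ok
}

// Replay wires a view and a p2p stand-in to one bus, publishes the events,
// and returns the resulting view plus the number of events p2p observed.
func Replay(events []UpdateEvent) (*Datastore, int) {
	bus := &Bus{}
	view := &Datastore{docs: map[string]string{}}
	var wg sync.WaitGroup
	wg.Add(2)
	go view.Follow(bus.Subscribe(4), &wg)
	p2pEvents := bus.Subscribe(4)
	forwarded := 0
	go func() {
		defer wg.Done()
		for range p2pEvents {
			forwarded++
		}
	}()
	for _, e := range events {
		bus.Publish(e)
	}
	bus.Close()
	wg.Wait()
	return view, forwarded
}

func main() {
	view, forwarded := Replay([]UpdateEvent{
		{DocID: "a", Body: "v1"},
		{DocID: "a", Body: "v2"},
	})
	body, _ := view.Get("a")
	fmt.Println(body, forwarded)
}
```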

Example directory structure

Example directory structure (non-exhaustive):

./client/
./client/http/
./client/http/playground/
./client/http/test/
./client/cli/
./client/internal/
./client/example_interfaces.go      // declaring stuff that all sibling directories implement (e.g. cli/http)
./client/dependant_interfaces.go    // declaring shared dependencies (e.g. a duplication of ./node/example_interfaces.go)
./client/test/
./client/docs/
./node/node.go
./node/p2p/
./node/gql/
./node/keyring/
./node/internal/
./node/example_interfaces.go         // these of course don't need to be all in one file
./node/dependant_interfaces.go
./node/test/
./node/test/multiplier/    // 'node' specific multipliers, acp implementations
./view/db.go              // not all in one file, not necessarily one type either
./view/collection/
./view/lens/
./view/index/
./view/acp/local/
./view/acp/sourcehub/
./view/normal_value/
./view/datastore/
./view/internal/planner/
./view/example_interfaces.go
./view/dependant_interfaces.go
./view/test/
./view/test/multiplier/    // 'view' specific multipliers, such as secondary indexes, views, transactions
./db/schema/
./db/crdt/
./db/block/
./db/event/
./db/test/
@AndrewSisley AndrewSisley added the epic Includes sub-issues label Feb 13, 2025
@AndrewSisley AndrewSisley self-assigned this Feb 18, 2025
@AndrewSisley AndrewSisley added this to the DefraDB v0.16 milestone Feb 18, 2025
@AndrewSisley AndrewSisley added the refactor This issue specific to or requires *notable* refactoring of existing codebases and components label Feb 18, 2025