Transformation that converts design matrix into records #2847

Closed
xjules opened this issue Feb 4, 2022 · 5 comments

Comments

@xjules
Contributor

xjules commented Feb 4, 2022

When working on DOE (design of experiments), it is currently quite cumbersome to read the parameters from the design matrix file (e.g. by means of an external job) and then create the JSON representations of the parameters one by one, only to load them explicitly as records later on. It would therefore be useful to have a transformation that reads such a design matrix CSV file and creates the parameter records automatically.
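A minimal sketch of the core step such a transformation would perform. It assumes a CSV design matrix with one header row of parameter names, one row per realization, and only numeric scalar cells. `design_matrix_to_records` and the dict-based record shape are hypothetical illustrations, not the ERT API.

```python
import csv
import io
from typing import Dict, List

# Hypothetical record shape: parameter name -> scalar value, one per realization.
ParameterRecord = Dict[str, float]


def design_matrix_to_records(csv_text: str) -> List[ParameterRecord]:
    """Turn a design matrix (one row per realization, one column per
    parameter) into one parameter record per realization."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records: List[ParameterRecord] = []
    for row in reader:
        # Assumption: every cell is a numeric scalar; float() raises otherwise.
        records.append({name: float(value) for name, value in row.items()})
    return records


matrix = "PORO,PERM\n0.2,100\n0.25,150\n"
records = design_matrix_to_records(matrix)
print(records)  # [{'PORO': 0.2, 'PERM': 100.0}, {'PORO': 0.25, 'PERM': 150.0}]
```

The realization index is simply the row position, so the whole matrix is parsed in one pass instead of one JSON file per parameter.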

@berland
Contributor

berland commented Feb 4, 2022

For reference, this is the corresponding code for ERT2: https://github.com/equinor/semeio/blob/main/semeio/jobs/design2params/design2params.py

@jondequinor jondequinor self-assigned this Mar 16, 2022
@jondequinor
Contributor

jondequinor commented Mar 16, 2022

Transformation multiplexing/singleplexing

There's clearly a need for multiplexing transformations, i.e. transformations that create multiple records from one file, or vice versa, or both. Transformations already do singleplexing with from_record and to_record. Multiplexing would introduce to_records and from_records.
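To make the "vice versa" direction concrete, here is a minimal sketch of a multiplexing write: several per-realization records combined into a single file. It assumes each record is a plain dict of parameter name to float and that the target file is CSV; `records_to_design_matrix` is a hypothetical name, not part of ERT.

```python
import csv
import io
from typing import Dict, Sequence

# Hypothetical record shape: parameter name -> scalar value, one per realization.
Record = Dict[str, float]


def records_to_design_matrix(records: Sequence[Record]) -> str:
    """Multiplex many records into one CSV text (one row per realization)."""
    if not records:
        raise ValueError("at least one record is required")
    # Column order is taken from the first record.
    fieldnames = list(records[0])
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # DictWriter raises ValueError if a record has a key not in fieldnames.
        writer.writerow(record)
    return buffer.getvalue()


text = records_to_design_matrix(
    [{"PORO": 0.2, "PERM": 100.0}, {"PORO": 0.25, "PERM": 150.0}]
)
# text == "PORO,PERM\n0.2,100.0\n0.25,150.0\n"
```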

This increases the complexity of the transformation API. So, to make this livable, we need very strict rules for how *plexing is dealt with.

E.g.

  • SerializationTransformation does not currently allow any multiplexing.
  • CopyTransformation is singleplexing only.
  • DesignMatrixTransformation allows multiplexing and singleplexing.
  • EclSumTransformation will only allow one-directional singleplexing, because it produces a "single" record tree from file to record only.
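One way to keep these rules strict is to declare each transformation's *plexing capabilities as flags and check them up front. This is a minimal sketch: the `Plexing` flag, the `CAPABILITIES` table and `supports()` are hypothetical, and reading "no multiplexing" as "singleplexing in both directions" for SerializationTransformation and CopyTransformation is an assumption.

```python
from enum import Flag, auto


class Plexing(Flag):
    """Hypothetical capability flags for transformations."""
    SINGLE_TO_RECORD = auto()    # file -> one record
    SINGLE_FROM_RECORD = auto()  # one record -> file
    MULTI_TO_RECORDS = auto()    # file -> many records
    MULTI_FROM_RECORDS = auto()  # many records -> file
    SINGLE = SINGLE_TO_RECORD | SINGLE_FROM_RECORD
    MULTI = MULTI_TO_RECORDS | MULTI_FROM_RECORDS


# Capabilities per transformation, mirroring the rules listed above.
CAPABILITIES = {
    "SerializationTransformation": Plexing.SINGLE,
    "CopyTransformation": Plexing.SINGLE,
    "DesignMatrixTransformation": Plexing.SINGLE | Plexing.MULTI,
    "EclSumTransformation": Plexing.SINGLE_TO_RECORD,
}


def supports(transformation: str, mode: Plexing) -> bool:
    # True only if every bit of `mode` is among the transformation's capabilities.
    return mode in CAPABILITIES[transformation]
```

A `Flag` lets one transformation declare several modes at once, and the check becomes a single membership test.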

Once a DesignMatrixTransformation instance has been configured and created, it should already be decided what kind of *plexing it will do. So some form of rule or heuristic would live in the instance factory. The point is that consumers of the transformation shouldn't have to guess, and it should fail immediately if something is unclear or unsupported.
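A minimal sketch of a factory that settles the *plexing mode at construction time and fails fast. The config keys (`location`, `mode`, `realizations`), the `Mode` enum, the stand-in dataclass and the heuristic (one realization means singleplexing, more means multiplexing) are all hypothetical; the comment above does not say what the rule should be.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Mode(Enum):
    SINGLE = "single"  # file <-> one record
    MULTI = "multi"    # file <-> many records


@dataclass(frozen=True)
class DesignMatrixTransformation:
    """Hypothetical stand-in: the mode is fixed once the instance exists."""
    location: str
    mode: Mode


def make_design_matrix_transformation(config: Dict[str, object]) -> DesignMatrixTransformation:
    location = config.get("location")
    if not isinstance(location, str):
        raise ValueError("'location' must be a string")
    raw_mode = config.get("mode")
    if raw_mode is None:
        # Heuristic (assumption): infer the mode from the ensemble size.
        realizations = config.get("realizations")
        if not isinstance(realizations, int) or realizations < 1:
            raise ValueError("cannot infer mode: 'realizations' must be a positive int")
        mode = Mode.SINGLE if realizations == 1 else Mode.MULTI
    else:
        try:
            mode = Mode(raw_mode)
        except ValueError:
            raise ValueError(f"unsupported mode {raw_mode!r}") from None
    return DesignMatrixTransformation(location=location, mode=mode)
```

An explicit `mode` wins over the heuristic, and anything ambiguous raises during configuration rather than when records are later read or written.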

Further, I initially thought that the interface would be something like this:

```python
async def to_record(self, root_path: Path = Path()) -> Record: ...
async def to_records(self, root_path: Path = Path()) -> RecordCollection: ...
```

but a significant use of design matrices is to create only one group (which we can call design_matrix). This group is only part of the parameters that are to be defined; other parameters might come from a stochastic source.

So RecordCollection is too generic and too dumb a data model for this. A RecordCollection should be able to support:

  • selecting a subset based on grouping (in the parameter dimension)
  • being built iteratively, meaning multiple sources can constitute a record
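To illustrate the two requirements above, here is a sketch of a collection that is built group by group from several sources and can be sliced by group in the parameter dimension. `GroupedRecordCollection`, `add_group`, `select` and `realization` are hypothetical names, and records are modelled as plain dicts of scalars.

```python
from typing import Dict, List

# Hypothetical record shape: parameter name -> scalar value.
Record = Dict[str, float]


class GroupedRecordCollection:
    """Sketch of a collection built from several sources and sliceable by group."""

    def __init__(self, ensemble_size: int) -> None:
        self.ensemble_size = ensemble_size
        self._groups: Dict[str, List[Record]] = {}

    def add_group(self, name: str, records: List[Record]) -> None:
        # Each source contributes one group; ensemble sizes must agree.
        if name in self._groups:
            raise ValueError(f"group {name!r} already defined")
        if len(records) != self.ensemble_size:
            raise ValueError(
                f"group {name!r} has {len(records)} records, expected {self.ensemble_size}"
            )
        self._groups[name] = records

    def select(self, *names: str) -> "GroupedRecordCollection":
        # Subset in the parameter dimension, keeping every realization.
        subset = GroupedRecordCollection(self.ensemble_size)
        for name in names:
            subset.add_group(name, self._groups[name])
        return subset

    def realization(self, index: int) -> Dict[str, Record]:
        # All groups for one realization.
        return {name: records[index] for name, records in self._groups.items()}

    @property
    def groups(self) -> List[str]:
        return list(self._groups)


collection = GroupedRecordCollection(ensemble_size=2)
collection.add_group("design_matrix", [{"PORO": 0.2}, {"PORO": 0.25}])
collection.add_group("stochastic", [{"MULT": 1.1}, {"MULT": 0.9}])
print(collection.select("design_matrix").groups)  # ['design_matrix']
print(collection.realization(1))  # {'design_matrix': {'PORO': 0.25}, 'stochastic': {'MULT': 0.9}}
```

Here the design matrix contributes only the `design_matrix` group, and a stochastic source adds its own group to the same collection.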

For DOE, the parameters are also scalars rather than vectors, so #2934 blocks this.

TBC…

@eivindjahren
Contributor

Closing as related to ert3, which is no longer the direction taken by the project. Feel free to reopen if still relevant.

@sondreso
Collaborator

This one relates to code in the ert.data package and is still relevant.

@sondreso sondreso reopened this Sep 16, 2022
@eivindjahren eivindjahren removed the ert3 label Sep 16, 2022
@sondreso
Collaborator

sondreso commented Feb 7, 2023

Closing in favor of #4656

@sondreso sondreso closed this as completed Feb 7, 2023