Data Manager Documentation Update #951

Merged 38 commits on Jul 19, 2024
Commits
c7dae3c
Delete Team Migration FAQ
jspeerless Jul 18, 2024
94a0432
Badly formatted migration guide
jspeerless Jul 18, 2024
2a1deb4
Attempted Table formatting
jspeerless Jul 18, 2024
a447f08
Remove erronious Molecular Design doc
jspeerless Jul 18, 2024
9642946
Migration guide formatting
jspeerless Jul 18, 2024
9ffa18a
Get endpoints right this time
jspeerless Jul 18, 2024
4b9b811
Update top level and getting_started docs for Teams over Projects
jspeerless Jul 19, 2024
49ae83c
Update "workflows" docs
jspeerless Jul 19, 2024
076f7b4
Merge branch 'main' of https://github.com/CitrineInformatics/citrine-…
jspeerless Jul 19, 2024
10eea2f
Update version
jspeerless Jul 19, 2024
47c2949
Update docs/source/FAQ/data_manager_migration.rst
jspeerless Jul 19, 2024
5e8c23c
Update docs/source/FAQ/data_manager_migration.rst
jspeerless Jul 19, 2024
5ba2c72
Update docs/source/FAQ/data_manager_migration.rst
jspeerless Jul 19, 2024
0f946fc
Update docs/source/FAQ/data_manager_migration.rst
jspeerless Jul 19, 2024
829fec0
Update docs/source/FAQ/data_manager_migration.rst
jspeerless Jul 19, 2024
64efd1b
Update docs/source/FAQ/data_manager_migration.rst
jspeerless Jul 19, 2024
cf0ba47
Update docs/source/FAQ/data_manager_migration.rst
jspeerless Jul 19, 2024
eb78af0
Update docs/source/FAQ/data_manager_migration.rst
jspeerless Jul 19, 2024
2129368
Update docs/source/FAQ/data_manager_migration.rst
jspeerless Jul 19, 2024
cafe381
Update docs/source/FAQ/data_manager_migration.rst
jspeerless Jul 19, 2024
a557a5f
Update docs/source/FAQ/data_manager_migration.rst
jspeerless Jul 19, 2024
4a89b1b
Update docs/source/FAQ/data_manager_migration.rst
jspeerless Jul 19, 2024
64e9b29
Update docs/source/FAQ/data_manager_migration.rst
jspeerless Jul 19, 2024
823fbbf
Update docs/source/getting_started/datasets.rst
jspeerless Jul 19, 2024
dd4cb13
Merge branch 'main' of https://github.com/CitrineInformatics/citrine-…
jspeerless Jul 19, 2024
b428239
update version
jspeerless Jul 19, 2024
19a4515
Update index.rst's to reflect changes
jspeerless Jul 19, 2024
b4af156
Update docs/source/FAQ/data_manager_migration.rst
jspeerless Jul 19, 2024
902fdac
Update docs/source/FAQ/data_manager_migration.rst
jspeerless Jul 19, 2024
8ca0744
Update docs/source/FAQ/data_manager_migration.rst
jspeerless Jul 19, 2024
767f9f6
Update docs/source/FAQ/data_manager_migration.rst
jspeerless Jul 19, 2024
c1aff0f
Update docs/source/FAQ/data_manager_migration.rst
jspeerless Jul 19, 2024
0035475
Update docs/source/data_entry.rst
jspeerless Jul 19, 2024
d760980
Update docs/source/getting_started/datasets.rst
jspeerless Jul 19, 2024
934d8ae
Suggestion to dataset op
jspeerless Jul 19, 2024
a7ea178
Update version
jspeerless Jul 19, 2024
b5de4db
Update docs/source/FAQ/data_manager_migration.rst
jspeerless Jul 19, 2024
10acd1e
Merge branch 'main' of https://github.com/CitrineInformatics/citrine-…
jspeerless Jul 19, 2024
144 changes: 144 additions & 0 deletions docs/source/FAQ/data_manager_migration.rst
@@ -0,0 +1,144 @@
=============================
Migrating to Use Data Manager
=============================

Summary
=======

This guide gives Citrine Python users the background and instructions needed to migrate code
so that it takes full advantage of Data Manager features and
is ready for the removal of endpoints in Citrine Python v4.0.

The key change is that :py:class:`Datasets <citrine.resources.dataset.Dataset>` are now assets
of :py:class:`Teams <citrine.resources.team.Team>`,
rather than :py:class:`Projects <citrine.resources.project.Project>`.
The bulk of code changes will be migrating calls that access collections of data objects and Datasets from a Project-based method to a Team or Dataset-based method.

If you require any additional assistance migrating your Citrine Python code,
do not hesitate to reach out to your Citrine customer support team.

What’s new?
===========

Once Data Manager has been enabled on your deployment of the Citrine Platform,
the primary change that will affect Citrine Python code is that Datasets,
formerly contained within a Project, are now assets of a Team.
In other words, Teams contain both Datasets and Projects.

Projects still contain assets such as GemTables, Predictors, DesignSpaces, etc., but Datasets and their contents now live at the Team level.
Data within a Dataset (in the form of GEMD Objects, Attributes, and Templates, as well as files) is used within a Project only by creating a GemTable.

After Data Manager is activated, any new Datasets created,
either via Citrine Python or the Citrine Platform web UI, will be created at a Team level,
and will not be accessible via the typical `project.<Collection>.{method}` endpoints\*.
New collections, at both the Team and Dataset level, will be available in v3.4 of Citrine Python.

\*Newly registered Datasets can be made accessible via Project-based methods by pulling them into a Project with `project.pull_in_resource(resource=dataset)`.
However, this is not recommended, as the endpoints that list data by Project and the “pull_in” endpoint for Datasets will be removed in v4.0.

How does this change my code?
=============================

The change in behavior is largely confined to two sets of operations on Datasets and their constituent GEMD data objects:
Sharing and Project-based Collections.

Sharing
-------

**Within a Team**

Previously, sharing a Dataset from one Project to another was a 2-step process: first publishing the Dataset to a Team, then pulling the Dataset into the new project.
Now that all Datasets are assets of Teams, sharing within a Team is unnecessary.
The `publish`, `un_publish`, and `pull_in_resource` endpoints, when applied to Datasets, will all undergo deprecation.
To be precise, the following calls will emit a deprecation warning in Citrine Python versions 3.4 and above, and will be removed in version 4.0:

.. code-block:: python

    # Publishing a Dataset to a Team will do nothing once Data Manager is activated
    project.publish(resource=dataset)

    # Un-publishing a Dataset will similarly be a no-op with Data Manager activated
    project.un_publish(resource=dataset)

    # Pulling a Dataset into a Project can still be done with Data Manager activated,
    # but it is not recommended and will be deprecated
    project.pull_in_resource(resource=dataset)
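
One way to locate remaining uses of these calls is to record the warnings that a piece of code emits. The sketch below uses only Python's standard `warnings` module. `legacy_publish` is a hypothetical stand-in for a call such as `project.publish(resource=dataset)`, and the sketch assumes the deprecation is reported as a `DeprecationWarning`; neither is taken from the Citrine Python source.

```python
import warnings


def legacy_publish(dataset_name):
    # Hypothetical stand-in for a deprecated call such as
    # project.publish(resource=dataset); it only emits the warning.
    warnings.warn(
        "publish is deprecated for Datasets once Data Manager is activated",
        DeprecationWarning,
        stacklevel=2,
    )


def find_deprecated_calls(func, *args, **kwargs):
    """Run func and return the messages of any DeprecationWarnings it emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        func(*args, **kwargs)
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]


messages = find_deprecated_calls(legacy_publish, "My Dataset")
print(messages)
```

In a test suite, a similar effect can be had by turning the warnings into errors with `warnings.simplefilter("error", DeprecationWarning)`, so that any leftover deprecated call fails loudly.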

**Between Teams**

Previously, sharing a Dataset between Projects in different Teams was a 3-step process:
publishing to the Team, sharing from one Team to another, then pulling into a Project.
With Data Manager, only the sharing action is needed.

Previous code for sharing My Dataset from Project A in Team A to eventually use in a Training Set
in Project B in Team B:

.. code-block:: python

    project_a.publish(resource=my_dataset)
    team_a.share(
        resource=my_dataset,
        target_team_id=team_b.uid,
    )
    project_b.pull_in_resource(resource=my_dataset)

Is now:

.. code-block:: python

    team_a.share(
        resource=my_dataset,
        target_team_id=team_b.uid,
    )
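
If the same Dataset needs to reach several Teams, the single `share` call can be wrapped in a small loop. The following is a minimal sketch: `share_with_teams` and `RecordingTeam` are hypothetical names introduced here, and `RecordingTeam` only mimics the `share(resource=..., target_team_id=...)` call shown above so that the example runs without a Citrine deployment.

```python
class RecordingTeam:
    """Hypothetical stand-in for a Team; it records share() calls only."""

    def __init__(self, uid):
        self.uid = uid
        self.calls = []

    def share(self, *, resource, target_team_id):
        self.calls.append((resource, target_team_id))


def share_with_teams(source_team, dataset, target_teams):
    """Share one dataset from source_team with each team in target_teams."""
    shared_with = []
    for target in target_teams:
        source_team.share(resource=dataset, target_team_id=target.uid)
        shared_with.append(target.uid)
    return shared_with


team_a = RecordingTeam("team-a")
targets = [RecordingTeam("team-b"), RecordingTeam("team-c")]
shared = share_with_teams(team_a, "my_dataset", targets)
print(shared)
```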

Project-based Collections
-------------------------

As Datasets are now assets of Teams, typical ways to `list()`, `get()`, or otherwise manipulate Datasets or data objects within a Project will undergo a deprecation cycle.
As of v3.4, these endpoints will still work as usual with a deprecation warning, but will be removed in v4.0.
It is therefore recommended to migrate your code from all project-based listing endpoints as soon as possible
to adhere to supported patterns and avoid any costly errors.

The following endpoints will emit a deprecation warning in Citrine Python versions 3.4 and above, and will be removed in version 4.0.
Moreover, they will not reference Datasets or their contents that are registered after Data Manager has been activated:

.. code-block:: python

    # Listing Datasets or their Contents (such as MaterialSpecs or ProcessTemplates) from a Project
    project.datasets.list()
    project.gemd.list()
    project.process_runs.list()
    ...

    # Getting Datasets or GEMD Assets via their UID and a Project
    project.datasets.get(uid)
    project.measurement_specs.get(uid)
    ...

    # Doing any operations (updating, deleting, etc.) to Datasets via a Project collection
    project.datasets.update()

The following new methods, introduced in Citrine Python v3.4, are preferred:

.. code-block:: python

    # Listing Datasets or their Contents
    team.datasets.list()
    team.gemd.list()
    dataset.property_templates.list()
    ...

    # Getting Datasets or GEMD Assets via their UID
    team.datasets.get(uid)
    team.ingredient_runs.get(uid)
    dataset.process_specs.get(uid)
    ...

    # Doing any operations (updating, deleting, dumping, etc.) to Datasets or GEMD Assets
    team.datasets.delete(uid)
    dataset.condition_templates.update(object)
    ...

Note again that even though the Project-based endpoints remain operational until their removal in v4.0,
any new Datasets are registered at the Team level and are thus inaccessible via these Project-based collections,
unless “pulled in” to a specific Project in that Team.
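
To see which Datasets a Project-based listing would miss, the two listings can be compared by `uid`. This is a rough sketch under stated assumptions: `datasets_missing_from_project` and `_collection` are hypothetical helpers, and the `SimpleNamespace` objects merely imitate `team.datasets.list()` and `project.datasets.list()` so the example runs on its own. Against a real deployment, the Project-based call would also emit the deprecation warning described above.

```python
from types import SimpleNamespace


def datasets_missing_from_project(team, project):
    """Return Team-level datasets that the Project-based listing does not show."""
    project_uids = {dataset.uid for dataset in project.datasets.list()}
    return [
        dataset
        for dataset in team.datasets.list()
        if dataset.uid not in project_uids
    ]


def _collection(*uids):
    # Hypothetical stand-in for a dataset collection exposing list().
    items = [SimpleNamespace(uid=uid) for uid in uids]
    return SimpleNamespace(list=lambda: iter(items))


demo_team = SimpleNamespace(datasets=_collection("a", "b", "c"))
demo_project = SimpleNamespace(datasets=_collection("a"))

missing = datasets_missing_from_project(demo_team, demo_project)
print([dataset.uid for dataset in missing])
```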
2 changes: 1 addition & 1 deletion docs/source/FAQ/index.rst
@@ -6,5 +6,5 @@ FAQ
:maxdepth: 2

prohibited_data_patterns
team_management_migration
v3_migration
data_manager_migration
125 changes: 0 additions & 125 deletions docs/source/FAQ/team_management_migration.rst

This file was deleted.

10 changes: 5 additions & 5 deletions docs/source/data_entry.rst
@@ -49,7 +49,8 @@ Equivalent behavior is available through the type-agnostic ``gemd`` collection a
dataset.register(ProcessSpec(...))

Note that registration must be performed within the scope of a dataset: the dataset into which the objects are being written.
The data model object collections that are defined with the project scope (such as `project.process_specs`) are read-only and will throw an error if their register method is called.
The data model object collections that are defined with the team scope (such as `team.process_specs`) are read-only.
Attempts to register, update, etc. via those collections will throw an error.

If you are registering several objects at the same time, you can use the ``register_all`` method that is available via the same objects:

@@ -73,20 +74,19 @@ For example:

.. code-block:: python

project.process_templates.get(LinkByUID(scope="standard-templates", id="milling"))
team.process_templates.get(LinkByUID(scope="standard-templates", id="milling"))

If you know the CitrineID, you do not need to specify a scope:

.. code-block:: python

project.process_templates.get(CitrineID)

team.process_templates.get(CitrineID)

If you don't know any of the data model object's unique identifiers, then you can list the data model objects and find your object in that list:

.. code-block:: python

project.process_templates.list()
team.process_templates.list()

These results can be further constrained by dataset:

4 changes: 2 additions & 2 deletions docs/source/getting_started/basic_functionality.rst
@@ -83,11 +83,11 @@ Get
^^^

Get retrieves a specific resource with a known unique identifier string.
If the project ``ceramic_resistors_project`` has a dataset with an id that you have saved as ``special_dataset_id``, then you could retrieve it with:
If the team ``ceramic_resistors_team`` has a dataset with an id that you have saved as ``special_dataset_id``, then you could retrieve it with:

.. code-block:: python

ceramic_resistors_project.datasets.get(special_dataset_id)
ceramic_resistors_team.datasets.get(special_dataset_id)

List
^^^^
36 changes: 18 additions & 18 deletions docs/source/getting_started/code_examples.rst
@@ -21,44 +21,44 @@ Assuming that your Citrine deployment is ``https://matsci.citrine-platform.com``
citrine = Citrine(API_KEY, API_SCHEME, API_HOST, API_PORT)


Create a Project and Dataset
Create a Team and Dataset
----------------------------

One of your first actions might be to create a new Project and a Dataset, which you and your collaborators can populate.
The code below creates a Project and one Dataset associated with it.
It also inspects the newly registered Project to get its unique id.
One of your first actions might be to create a new Team and a Dataset, which you and your collaborators can populate.
The code below creates a Team and one Dataset associated with it.
It also inspects the newly registered Team to get its unique id.
Note that all resources are given descriptive names and summaries.

.. code-block:: python

from citrine.resources.project import Project
from citrine.resources.team import Team
from citrine.resources.dataset import Dataset
band_gaps_project = citrine.projects.register(name="Band gaps",
band_gaps_team = citrine.teams.register(name="Band gaps",
description="Actual and DFT computed band gaps")
print("My new project has name {} and id {}".format(
band_gaps_project.name, band_gaps_project.uid))
print("My new team has name {} and id {}".format(
band_gaps_team.name, band_gaps_team.uid))

Strehlow_Cook_description = "Band gaps for elemental and binary " \
"semiconductors with phase and temperature of measurement. DOI 10.1063/1.3253115"
Strehlow_Cook_dataset = Dataset(name="Strehlow and Cook",
summary="Strehlow and Cook band gaps", description=Strehlow_Cook_description)
Strehlow_Cook_dataset = band_gaps_project.datasets.register(Strehlow_Cook_dataset)
Strehlow_Cook_dataset = band_gaps_team.datasets.register(Strehlow_Cook_dataset)

Find an existing Project and Dataset
Find an existing Team and Dataset
------------------------------------

Often you will work with existing resources.
The code below retrieves a Project with the name "Copper oxides project" and a datset with a known unique id that is stored as ``dataset_A_uid``.
The code below retrieves a Team with the name "Copper oxides team" and a dataset with a known unique id that is stored as ``dataset_A_uid``.
For more information on retrieving resources, see :ref:`Reading Resources <functionality_reading_label>`.

.. code-block:: python

project_name = "Copper oxides project"
all_projects = citrine.projects.list()
copper_oxides_project = next((project for project in all_projects
if project.name == project_name), None)
assert copper_oxides_project is not None
dataset_A = copper_oxides_project.datasets.get(uid=dataset_A_uid)
team_name = "Copper oxides team"
all_teams = citrine.teams.list()
copper_oxides_team = next((team for team in all_teams
if team.name == team_name), None)
assert copper_oxides_team is not None
dataset_A = copper_oxides_team.datasets.get(uid=dataset_A_uid)
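
The name lookup above can be factored into a small helper that also reports duplicate names, which a bare ``next(...)`` would silently ignore. This is only a sketch: ``find_by_name`` is a hypothetical helper, not part of Citrine Python, and the ``SimpleNamespace`` objects stand in for the Teams returned by ``citrine.teams.list()``.

```python
from types import SimpleNamespace


def find_by_name(resources, name):
    """Return the single resource whose .name equals name, or raise LookupError."""
    matches = [resource for resource in resources if resource.name == name]
    if not matches:
        raise LookupError(f"no resource named {name!r}")
    if len(matches) > 1:
        raise LookupError(f"{len(matches)} resources named {name!r}")
    return matches[0]


# Hypothetical stand-ins for the result of citrine.teams.list()
demo_teams = [
    SimpleNamespace(name="Copper oxides team", uid="team-1"),
    SimpleNamespace(name="Band gaps", uid="team-2"),
]

copper_oxides_team = find_by_name(demo_teams, "Copper oxides team")
print(copper_oxides_team.uid)
```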

Find a template
---------------
@@ -69,7 +69,7 @@ The example below searches for a process template with the tag "Oven_17" and ass

.. code-block:: python

firing_templates = list(band_gaps_project.process_templates.list_by_tag(tag="Oven_17"))
firing_templates = list(band_gaps_team.process_templates.list_by_tag(tag="Oven_17"))
assert len(firing_templates) == 1
firing_template_17 = firing_templates[0]
