|
| 1 | +============================= |
| 2 | +Migrating to Use Data Manager |
| 3 | +============================= |
| 4 | + |
| 5 | +Summary |
| 6 | +======= |
| 7 | + |
| 8 | +This guide provides users of Citrine Python background and instructions for migrating code to |
| 9 | +take full advantage of Data Manager features and |
| 10 | +prepare for the future removal of endpoints that will occur with Citrine Python v4.0. |
| 11 | + |
| 12 | +The key change is that :py:class:`Datasets <citrine.resources.dataset.Dataset>` are now assets |
| 13 | +of :py:class:`Teams <citrine.resources.team.Team>`, |
| 14 | +rather than :py:class:`Projects <citrine.resources.project.Project>`. |
| 15 | +The bulk of code changes will be migrating calls that access collections of data objects and Datasets from a Project-based method to a Team or Dataset-based method. |
| 16 | + |
| 17 | +If you require any additional assistance migrating your Citrine Python code, |
| 18 | +do not hesitate to reach out to your Citrine customer support team. |
| 19 | + |
| 20 | +What’s new? |
| 21 | +=========== |
| 22 | + |
| 23 | +Once Data Manager has been enabled on your deployment of the Citrine Platform, |
| 24 | +the primary change that will affect Citrine Python code is that Datasets, |
| 25 | +formerly contained within a Project, are now assets of a Team. |
| 26 | +In other words, Teams contain both Datasets and Projects. |
| 27 | + |
| 28 | +Projects still contain assets such as GemTables, Predictors, DesignSpaces, etc., but Datasets and their contents are now at the level of a Team. |
| 29 | +Data within a Dataset (in the form of GEMD Objects, Attributes, and Templates, as well as files) can be used within a Project only by creating a GemTable. |
| 30 | + |
| 31 | +After Data Manager is activated, any new Datasets created, |
| 32 | +either via Citrine Python or the Citrine Platform web UI, will be created at a Team level, |
| 33 | +and will not be accessible via the typical `project.<Collection>.{method}` endpoints\*. |
| 34 | +New collections, at both the Team and Dataset level, will be available in v3.4 of Citrine Python. |
| 35 | + |
| 36 | +\*Newly registered Datasets can be made accessible via Project-based methods by pulling them into a Project with `project.pull_in_resource(resource=dataset)`. |
| 37 | +However, this is not recommended, because the endpoints that list data by Project and the “pull_in” endpoint for Datasets will be removed in v4.0. |
| 38 | + |
| 39 | +How does this change my code? |
| 40 | +============================= |
| 41 | + |
| 42 | +The change in behavior is largely confined to two sets of operations on Datasets and their constituent GEMD data objects: |
| 43 | +Sharing and Project-based Collections. |
| 44 | + |
| 45 | +Sharing |
| 46 | +------- |
| 47 | + |
| 48 | +**Within a Team** |
| 49 | + |
| 50 | +Previously, sharing a Dataset from one Project to another was a 2-step process: first publishing the Dataset to a Team, then pulling the Dataset into the new project. |
| 51 | +Now that all Datasets are assets of Teams, sharing within a Team is unnecessary. |
| 52 | +The `publish`, `un_publish`, and `pull_in_resource` endpoints, when applied to Datasets, will be deprecated. |
| 53 | +To be precise, the following calls will emit a deprecation warning in Citrine Python versions 3.4 and above, and will be removed in version 4.0: |
| 54 | + |
| 55 | +.. code-block:: python |
| 56 | +
|
| 57 | + # Publishing a Dataset to a Team will do nothing once Data Manager is activated |
| 58 | + project.publish(resource=dataset) |
| 59 | +
|
| 60 | + # Un-publishing a Dataset will similarly be a no-op with Data Manager activated |
| 61 | + project.un_publish(resource=dataset) |
| 62 | +
|
| 63 | + # Pulling a Dataset into a Project can still be done with Data Manager activated, but is not |
| 64 | + # recommended and will be deprecated |
| 65 | + project.pull_in_resource(resource=dataset) |
| 66 | +
|
| 67 | +**Between Teams** |
| 68 | + |
| 69 | +Previously, sharing a Dataset between Projects in different Teams was a 3-step process: |
| 70 | +publishing to the Team, sharing from one Team to another, then pulling into a Project. |
| 71 | +With Data Manager, only the sharing action is needed. |
| 72 | + |
| 73 | +Previous code for sharing My Dataset from Project A in Team A to eventually use in a Training Set |
| 74 | +in Project B in Team B: |
| 75 | + |
| 76 | +.. code-block:: python |
| 77 | +
|
| 78 | + project_a.publish(resource=my_dataset) |
| 79 | + team_a.share( |
| 80 | + resource=my_dataset, |
| 81 | + target_team_id=team_b.uid, |
| 82 | + ) |
| 83 | + project_b.pull_in_resource(resource=my_dataset) |
| 84 | +
|
| 85 | +Is now: |
| 86 | + |
| 87 | +.. code-block:: python |
| 88 | +
|
| 89 | + team_a.share( |
| 90 | + resource=my_dataset, |
| 91 | + target_team_id=team_b.uid, |
| 92 | + ) |
| 93 | +
|
| 94 | +Project-based Collections |
| 95 | +------------------------- |
| 96 | + |
| 97 | +As Datasets are now assets of Teams, typical ways to `list()`, `get()`, or otherwise manipulate Datasets or data objects within a Project will undergo a deprecation cycle. |
| 98 | +As of v3.4, these endpoints will still work as usual with a deprecation warning, but will be removed in v4.0. |
| 99 | +It is therefore recommended to migrate your code away from all Project-based listing endpoints as soon as possible |
| 100 | +to adhere to supported patterns and avoid any costly errors. |
| 101 | + |
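One way to locate remaining Project-based calls during a migration is to record deprecation warnings while exercising your code. The sketch below uses only Python's standard `warnings` module. It assumes that Citrine Python reports deprecations through that module with the `DeprecationWarning` category, which this guide does not confirm; `legacy_project_call` is a hypothetical stand-in for a call such as `project.datasets.list()`.

```python
import warnings


def legacy_project_call():
    """Hypothetical stand-in for a deprecated Project-based call."""
    warnings.warn(
        "Project-based collections are deprecated; use Team- or Dataset-based collections.",
        DeprecationWarning,
        stacklevel=2,
    )
    return []


def find_deprecated_calls(func):
    """Run func and return the messages of any DeprecationWarnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including categories that are hidden by default.
        warnings.simplefilter("always")
        func()
    return [str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)]


messages = find_deprecated_calls(legacy_project_call)
print(messages)
```

Under the same assumption, running a script with `python -W error::DeprecationWarning` turns each such warning into an exception, which makes any remaining deprecated call fail loudly in a test run.
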
| 102 | +The following endpoints will emit a deprecation warning in Citrine Python versions 3.4 and above, and will be removed in version 4.0. |
| 103 | +Moreover, they will not return Datasets or their contents that are registered after Data Manager has been activated: |
| 104 | + |
| 105 | +.. code-block:: python |
| 106 | +
|
| 107 | + # Listing Datasets or their Contents (such as MaterialSpecs or ProcessTemplates) from a Project |
| 108 | + project.datasets.list() |
| 109 | + project.gemd.list() |
| 110 | + project.process_runs.list() |
| 111 | + ... |
| 112 | +
|
| 113 | + # Getting Datasets or GEMD Assets via their UID and a Project |
| 114 | + project.datasets.get(uid) |
| 115 | + project.measurement_specs.get(uid) |
| 116 | + ... |
| 117 | +
|
| 118 | + # Doing any operations (updating, deleting, etc.) to Datasets via a Project collection |
| 119 | + project.datasets.update() |
| 120 | +
|
| 121 | +The following new methods, introduced in Citrine Python v3.4, are preferred: |
| 122 | + |
| 123 | +.. code-block:: python |
| 124 | +
|
| 125 | + # Listing Datasets or their Contents |
| 126 | + team.datasets.list() |
| 127 | + team.gemd.list() |
| 128 | + dataset.property_templates.list() |
| 129 | + ... |
| 130 | +
|
| 131 | + # Getting Datasets or GEMD Assets via their UID |
| 132 | + team.datasets.get(uid) |
| 133 | + team.ingredient_runs.get(uid) |
| 134 | + dataset.process_specs.get(uid) |
| 135 | + ... |
| 136 | +
|
| 137 | + # Doing any operations (updating, deleting, dumping, etc.) to Datasets or GEMD Assets |
| 138 | + team.datasets.delete(uid) |
| 139 | + dataset.condition_templates.update(object) |
| 140 | + ... |
| 141 | +
|
| 142 | +Note again that even though the Project-based endpoints will still be operational, |
| 143 | +any newly registered Datasets will live at the Team level and will thus be inaccessible via Project-based collections, |
| 144 | +unless “pulled in” to a specific Project in that Team. |