
Commit

MrCsabaToth committed Oct 9, 2023
1 parent acd0280 commit aaa63a6
Showing 2 changed files with 10 additions and 10 deletions.
10 changes: 5 additions & 5 deletions workshops/tfx-caip-tf23/lab-01-tfx-walkthrough/labs/lab-01.ipynb
@@ -23,7 +23,7 @@
  "\n",
  "In this notebook, you will work with the [Covertype Data Set](https://github.com/jarokaz/mlops-labs/blob/master/datasets/covertype/README.md) and use TFX to analyze, understand, and pre-process the dataset and train, analyze, validate, and deploy a multi-class classification model to predict the type of forest cover from cartographic features.\n",
  "\n",
- "You will utilize **TFX Interactive Context** to work with the TFX components interactivelly in a Jupyter notebook environment. Working in an interactive notebook is useful when doing initial data exploration, experimenting with models, and designing ML pipelines. You should be aware that there are differences in the way interactive notebooks are orchestrated, and how they access metadata artifacts. In a production deployment of TFX on GCP, you will use an orchestrator such as Kubeflow Pipelines, or Cloud Composer. In an interactive mode, the notebook itself is the orchestrator, running each TFX component as you execute the notebook cells. In a production deployment, ML Metadata will be managed in a scalabe database like MySQL, and artifacts in apersistent store such as Google Cloud Storage. In an interactive mode, both properties and payloads are stored in a local file system of the Jupyter host.\n",
+ "You will utilize **TFX Interactive Context** to work with the TFX components interactively in a Jupyter notebook environment. Working in an interactive notebook is useful when doing initial data exploration, experimenting with models, and designing ML pipelines. You should be aware that there are differences in the way interactive notebooks are orchestrated, and how they access metadata artifacts. In a production deployment of TFX on GCP, you will use an orchestrator such as Kubeflow Pipelines, or Cloud Composer. In an interactive mode, the notebook itself is the orchestrator, running each TFX component as you execute the notebook cells. In a production deployment, ML Metadata will be managed in a scalable database like MySQL, and artifacts in a persistent store such as Google Cloud Storage. In an interactive mode, both properties and payloads are stored in a local file system of the Jupyter host.\n",
  "\n",
  "**Setup Note**:\n",
  "\n",
@@ -335,7 +335,7 @@
  "source": [
  "## Generating statistics using StatisticsGen\n",
  "\n",
- "The `StatisticsGen` component generates data statistics that can be used by other TFX components. StatisticsGen uses [TensorFlow Data Validation](https://www.tensorflow.org/tfx/data_validation/get_started). `StatisticsGen` generate statistics for each split in the `ExampleGen` component's output. In our case there two splits: `train` and `eval`.\n",
+ "The `StatisticsGen` component generates data statistics that can be used by other TFX components. StatisticsGen uses [TensorFlow Data Validation](https://www.tensorflow.org/tfx/data_validation/get_started). `StatisticsGen` generates statistics for each split in the `ExampleGen` component's output. In our case there two splits: `train` and `eval`.\n",
  "\n",
  "<img src=https://github.com/GoogleCloudPlatform/mlops-on-gcp/raw/master/images/StatisticsGen.png width=\"200\">"
  ]
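
A hedged sketch of the StatisticsGen usage described above, continuing the hypothetical `context` and `example_gen` from the previous sketch:

    from tfx.components import StatisticsGen

    # Statistics are computed per split of ExampleGen's output (train and eval).
    statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
    context.run(statistics_gen)
    context.show(statistics_gen.outputs['statistics'])  # inline TFDV visualization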
@@ -388,7 +388,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "## Infering data schema using SchemaGen\n",
+ "## Inferring data schema using SchemaGen\n",
  "\n",
  "Some TFX components use a description input data called a schema. The schema is an instance of `schema.proto`. It can specify data types for feature values, whether a feature has to be present in all examples, allowed value ranges, and other properties. `SchemaGen` automatically generates the schema by inferring types, categories, and ranges from data statistics. The auto-generated schema is best-effort and only tries to infer basic properties of the data. It is expected that developers review and modify it as needed. `SchemaGen` uses [TensorFlow Data Validation](https://www.tensorflow.org/tfx/data_validation/get_started).\n",
  "\n",
@@ -676,7 +676,7 @@
  "\n",
  "### Define the pre-processing module\n",
  "\n",
- "To configure `Trainsform`, you need to encapsulate your pre-processing code in the Python `preprocessing_fn` function and save it to a python module that is then provided to the Transform component as an input. This module will be loaded by transform and the `preprocessing_fn` function will be called when the `Transform` component runs.\n",
+ "To configure `Transform`, you need to encapsulate your pre-processing code in the Python `preprocessing_fn` function and save it to a python module that is then provided to the Transform component as an input. This module will be loaded by transform and the `preprocessing_fn` function will be called when the `Transform` component runs.\n",
  "\n",
  "In most cases, your implementation of the `preprocessing_fn` makes extensive use of [TensorFlow Transform](https://www.tensorflow.org/tfx/guide/tft) for performing feature engineering on your dataset."
  ]
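
The general shape of such a pre-processing module, sketched with invented Covertype-style feature names (the lab's actual features and transforms may differ):

    # preprocessing.py -- hypothetical module_file handed to Transform.
    import tensorflow_transform as tft

    def preprocessing_fn(inputs):
        """Called by the Transform component; feature names are illustrative."""
        outputs = {}
        # Scale a numeric feature to z-scores computed over the full dataset.
        outputs['Elevation_xf'] = tft.scale_to_z_score(inputs['Elevation'])
        # Map a categorical feature to vocabulary indices.
        outputs['Soil_Type_xf'] = tft.compute_and_apply_vocabulary(inputs['Soil_Type'])
        return outputs

    # In the notebook, the component is then wired up roughly like this:
    from tfx.components import Transform

    transform = Transform(
        examples=example_gen.outputs['examples'],
        schema=schema_gen.outputs['schema'],
        module_file='preprocessing.py')  # illustrative path
    context.run(transform)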
@@ -1343,7 +1343,7 @@
  "source": [
  "## Next steps\n",
  "\n",
- "This concludes your introductory walthrough through TFX pipeline components. In the lab, you used TFX to analyze, understand, and pre-process the dataset and train, analyze, validate, and deploy a multi-class classification model to predict the type of forest cover from cartographic features. You utilized a TFX Interactive Context for prototype development of a TFX pipeline directly in a Jupyter notebook. Next, you worked with the TFDV library to modify your dataset schema to add feature constraints to catch data anamolies that can negatively impact your model's performance. You utilized TFT library for feature proprocessing for consistent feature transformations for your model at training and serving time. Lastly, using the TFMA library, you added model performance constraints to ensure you only push more accurate models than previous runs to production.\n",
+ "This concludes your introductory walkthrough through TFX pipeline components. In the lab, you used TFX to analyze, understand, and pre-process the dataset and train, analyze, validate, and deploy a multi-class classification model to predict the type of forest cover from cartographic features. You utilized a TFX Interactive Context for prototype development of a TFX pipeline directly in a Jupyter notebook. Next, you worked with the TFDV library to modify your dataset schema to add feature constraints to catch data anomalies that can negatively impact your model's performance. You utilized TFT library for feature preprocessing for consistent feature transformations for your model at training and serving time. Lastly, using the TFMA library, you added model performance constraints to ensure you only push more accurate models than previous runs to production.\n",
  "\n",
  "\n",
  "The next labs in the series will guide through developing a TFX pipeline, deploying and running the pipeline on **AI Platform Pipelines** and automating the pipeline build and deployment processes with **Cloud Build**."
10 changes: 5 additions & 5 deletions (second changed file)
@@ -23,7 +23,7 @@
  "\n",
  "In this notebook, you will work with the [Covertype Data Set](https://github.com/jarokaz/mlops-labs/blob/master/datasets/covertype/README.md) and use TFX to analyze, understand, and pre-process the dataset and train, analyze, validate, and deploy a multi-class classification model to predict the type of forest cover from cartographic features.\n",
  "\n",
- "You will utilize **TFX Interactive Context** to work with the TFX components interactivelly in a Jupyter notebook environment. Working in an interactive notebook is useful when doing initial data exploration, experimenting with models, and designing ML pipelines. You should be aware that there are differences in the way interactive notebooks are orchestrated, and how they access metadata artifacts. In a production deployment of TFX on GCP, you will use an orchestrator such as Kubeflow Pipelines, or Cloud Composer. In an interactive mode, the notebook itself is the orchestrator, running each TFX component as you execute the notebook cells. In a production deployment, ML Metadata will be managed in a scalabe database like MySQL, and artifacts in apersistent store such as Google Cloud Storage. In an interactive mode, both properties and payloads are stored in a local file system of the Jupyter host.\n",
+ "You will utilize **TFX Interactive Context** to work with the TFX components interactively in a Jupyter notebook environment. Working in an interactive notebook is useful when doing initial data exploration, experimenting with models, and designing ML pipelines. You should be aware that there are differences in the way interactive notebooks are orchestrated, and how they access metadata artifacts. In a production deployment of TFX on GCP, you will use an orchestrator such as Kubeflow Pipelines, or Cloud Composer. In an interactive mode, the notebook itself is the orchestrator, running each TFX component as you execute the notebook cells. In a production deployment, ML Metadata will be managed in a scalable database like MySQL, and artifacts in a persistent store such as Google Cloud Storage. In an interactive mode, both properties and payloads are stored in a local file system of the Jupyter host.\n",
  "\n",
  "**Setup Note**:\n",
  "\n",
@@ -337,7 +337,7 @@
  "source": [
  "## Generating statistics using StatisticsGen\n",
  "\n",
- "The `StatisticsGen` component generates data statistics that can be used by other TFX components. StatisticsGen uses [TensorFlow Data Validation](https://www.tensorflow.org/tfx/data_validation/get_started). `StatisticsGen` generate statistics for each split in the `ExampleGen` component's output. In our case there two splits: `train` and `eval`.\n",
+ "The `StatisticsGen` component generates data statistics that can be used by other TFX components. StatisticsGen uses [TensorFlow Data Validation](https://www.tensorflow.org/tfx/data_validation/get_started). `StatisticsGen` generates statistics for each split in the `ExampleGen` component's output. In our case there two splits: `train` and `eval`.\n",
  "\n",
  "<img src=https://github.com/GoogleCloudPlatform/mlops-on-gcp/raw/master/images/StatisticsGen.png width=\"200\">"
  ]
@@ -390,7 +390,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "## Infering data schema using SchemaGen\n",
+ "## Inferring data schema using SchemaGen\n",
  "\n",
  "Some TFX components use a description input data called a schema. The schema is an instance of `schema.proto`. It can specify data types for feature values, whether a feature has to be present in all examples, allowed value ranges, and other properties. `SchemaGen` automatically generates the schema by inferring types, categories, and ranges from data statistics. The auto-generated schema is best-effort and only tries to infer basic properties of the data. It is expected that developers review and modify it as needed. `SchemaGen` uses [TensorFlow Data Validation](https://www.tensorflow.org/tfx/data_validation/get_started).\n",
  "\n",
@@ -684,7 +684,7 @@
  "\n",
  "### Define the pre-processing module\n",
  "\n",
- "To configure `Trainsform`, you need to encapsulate your pre-processing code in the Python `preprocessing_fn` function and save it to a python module that is then provided to the Transform component as an input. This module will be loaded by transform and the `preprocessing_fn` function will be called when the `Transform` component runs.\n",
+ "To configure `Transform`, you need to encapsulate your pre-processing code in the Python `preprocessing_fn` function and save it to a python module that is then provided to the Transform component as an input. This module will be loaded by transform and the `preprocessing_fn` function will be called when the `Transform` component runs.\n",
  "\n",
  "In most cases, your implementation of the `preprocessing_fn` makes extensive use of [TensorFlow Transform](https://www.tensorflow.org/tfx/guide/tft) for performing feature engineering on your dataset."
  ]
@@ -1362,7 +1362,7 @@
  "source": [
  "## Next steps\n",
  "\n",
- "This concludes your introductory walthrough through TFX pipeline components. In the lab, you used TFX to analyze, understand, and pre-process the dataset and train, analyze, validate, and deploy a multi-class classification model to predict the type of forest cover from cartographic features. You utilized a TFX Interactive Context for prototype development of a TFX pipeline directly in a Jupyter notebook. Next, you worked with the TFDV library to modify your dataset schema to add feature constraints to catch data anamolies that can negatively impact your model's performance. You utilized TFT library for feature proprocessing for consistent feature transformations for your model at training and serving time. Lastly, using the TFMA library, you added model performance constraints to ensure you only push more accurate models than previous runs to production.\n",
+ "This concludes your introductory walkthrough through TFX pipeline components. In the lab, you used TFX to analyze, understand, and pre-process the dataset and train, analyze, validate, and deploy a multi-class classification model to predict the type of forest cover from cartographic features. You utilized a TFX Interactive Context for prototype development of a TFX pipeline directly in a Jupyter notebook. Next, you worked with the TFDV library to modify your dataset schema to add feature constraints to catch data anomalies that can negatively impact your model's performance. You utilized TFT library for feature preprocessing for consistent feature transformations for your model at training and serving time. Lastly, using the TFMA library, you added model performance constraints to ensure you only push more accurate models than previous runs to production.\n",
  "\n",
  "\n",
  "The next labs in the series will guide through developing a TFX pipeline, deploying and running the pipeline on **AI Platform Pipelines** and automating the pipeline build and deployment processes with **Cloud Build**."
