Commit

added SHAP section; pending plot

Shpaner authored and Shpaner committed Nov 22, 2024
1 parent 6cfa71e commit d1c5d1c

Showing 7 changed files with 221 additions and 1 deletion.
1 change: 1 addition & 0 deletions .gitignore
@@ -117,6 +117,7 @@ ENV/
env.bak/
venv.bak/
eda/
venvpy_311

# Spyder project settings
.spyderproject
73 changes: 73 additions & 0 deletions docs/_sources/usage_guide.rst.txt
@@ -1313,6 +1313,79 @@ Return Metrics (Optional)
weighted avg 0.98 0.98 0.98 200
--------------------------------------------------------------------------------
SHAP (SHapley Additive exPlanations)
---------------------------------------

This example demonstrates how to compute and visualize SHAP (SHapley Additive exPlanations)
values for a machine learning model whose pipeline includes feature selection.
SHAP values quantify how much each individual feature contributes to a model's predictions.

**Steps**

1. The dataset is transformed through the model's feature selection pipeline to ensure only the selected features are used for SHAP analysis.

2. The final model (e.g., ``XGBoost`` classifier) is retrieved from the custom Model object. This is required because SHAP operates on the underlying model, not the pipeline.

3. SHAP's ``TreeExplainer`` is used to explain the predictions of the XGBoost classifier.

4. SHAP values are calculated for the transformed dataset to quantify the contribution of each feature to the predictions.

5. A summary plot is generated to visualize the impact of each feature across all data points.


Step 1: Transform the test data using the feature selection pipeline
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    ## The pipeline applies preprocessing (e.g., imputation, scaling) and feature
    ## selection (RFE) to X_test
    X_test_transformed = model_xgb.get_feature_selection_pipeline().transform(X_test)

Step 2: Retrieve the trained XGBoost classifier from the pipeline
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    ## The last estimator in the pipeline is the XGBoost model
    xgb_classifier = model_xgb.estimator[-1]

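To confirm that the retrieved object is the fitted booster itself rather than
the surrounding pipeline, a quick type check can help (a minimal sanity check,
assuming the final estimator is an ``XGBClassifier`` as in this example):

.. code-block:: python

    ## Quick sanity check: the last pipeline step should be the fitted
    ## XGBoost classifier, not another transformer
    from xgboost import XGBClassifier

    assert isinstance(xgb_classifier, XGBClassifier)
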
Step 3: Extract feature names from the training data, and initialize the SHAP explainer for the XGBoost classifier
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


.. code-block:: python

    ## Import SHAP for model explainability
    import shap

    ## Feature names are required for interpretability in SHAP plots
    feature_names = X_train.columns.to_list()

    ## Initialize the SHAP explainer with the model
    explainer = shap.TreeExplainer(xgb_classifier)

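Note that ``X_train.columns`` lists every column seen before feature selection,
while ``X_test_transformed`` contains only the columns kept by the selector.
If the counts differ, the names passed to SHAP should be taken from the fitted
selector so that they line up column-for-column with ``X_test_transformed``.
A sketch of one way to do this, assuming the pipeline's final step is a
scikit-learn selector such as ``RFE`` that supports ``get_feature_names_out``:

.. code-block:: python

    ## Derive names from the fitted selector so they match the columns of
    ## X_test_transformed (assumes the last pipeline step is the selector)
    selector = model_xgb.get_feature_selection_pipeline()[-1]
    feature_names = list(selector.get_feature_names_out(X_train.columns))
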
Step 4: Compute SHAP values for the transformed test dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    ## Compute SHAP values for the transformed dataset
    shap_values = explainer.shap_values(X_test_transformed)

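Before plotting, it is worth confirming the result's shape: for a binary
XGBoost classifier, ``TreeExplainer.shap_values`` typically returns a single
array with one row per test sample and one column per selected feature.

.. code-block:: python

    ## The two shapes should agree: (n_samples, n_selected_features)
    print(shap_values.shape)
    print(X_test_transformed.shape)
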
Step 5: Generate a summary plot of SHAP values
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    ## Plot SHAP values
    ## Summary plot of SHAP values for all features across all data points
    shap.summary_plot(shap_values, X_test_transformed, feature_names=feature_names)

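To save the figure to disk rather than displaying it (for example, to embed
the pending plot in these docs), ``summary_plot`` accepts ``show=False`` so
the active Matplotlib figure can be captured; the output path below is
illustrative:

.. code-block:: python

    import matplotlib.pyplot as plt

    ## Suppress interactive display so the figure can be saved
    shap.summary_plot(
        shap_values,
        X_test_transformed,
        feature_names=feature_names,
        show=False,
    )
    plt.savefig("shap_summary_plot.png", dpi=150, bbox_inches="tight")
    plt.close()
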
.. _Regression:

Regression
8 changes: 8 additions & 0 deletions docs/index.html
@@ -186,6 +186,14 @@ <h1>Model Tuner Documentation<a class="headerlink" href="#model-tuner-documentat
</li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="usage_guide.html#shap-shapley-additive-explanations">SHAP (SHapley Additive exPlanations)</a><ul>
<li class="toctree-l3"><a class="reference internal" href="usage_guide.html#step-1-transform-the-test-data-using-the-feature-selection-pipeline">Step 1: Transform the test data using the feature selection pipeline</a></li>
<li class="toctree-l3"><a class="reference internal" href="usage_guide.html#step-2-retrieve-the-trained-xgboost-classifier-from-the-pipeline">Step 2: Retrieve the trained XGBoost classifier from the pipeline</a></li>
<li class="toctree-l3"><a class="reference internal" href="usage_guide.html#step-3-extract-feature-names-from-the-training-data-and-initialize-the-shap-explainer-for-the-xgboost-classifier">Step 3: Extract feature names from the training data, and initialize the SHAP explainer for the XGBoost classifier</a></li>
<li class="toctree-l3"><a class="reference internal" href="usage_guide.html#step-4-compute-shap-values-for-the-transformed-test-dataset-and-generate-a-summary-plot-of-shap-values">Step 4: Compute SHAP values for the transformed test dataset and generate a summary plot of SHAP values</a></li>
<li class="toctree-l3"><a class="reference internal" href="usage_guide.html#step-5-generate-a-summary-plot-of-shap-values">Step 5: Generate a summary plot of SHAP values</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="usage_guide.html#regression">Regression</a><ul>
Binary file modified docs/objects.inv
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/searchindex.js

Large diffs are not rendered by default.

65 changes: 65 additions & 0 deletions docs/usage_guide.html
@@ -116,6 +116,14 @@
</li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#shap-shapley-additive-explanations">SHAP (SHapley Additive exPlanations)</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#step-1-transform-the-test-data-using-the-feature-selection-pipeline">Step 1: Transform the test data using the feature selection pipeline</a></li>
<li class="toctree-l3"><a class="reference internal" href="#step-2-retrieve-the-trained-xgboost-classifier-from-the-pipeline">Step 2: Retrieve the trained XGBoost classifier from the pipeline</a></li>
<li class="toctree-l3"><a class="reference internal" href="#step-3-extract-feature-names-from-the-training-data-and-initialize-the-shap-explainer-for-the-xgboost-classifier">Step 3: Extract feature names from the training data, and initialize the SHAP explainer for the XGBoost classifier</a></li>
<li class="toctree-l3"><a class="reference internal" href="#step-4-compute-shap-values-for-the-transformed-test-dataset-and-generate-a-summary-plot-of-shap-values">Step 4: Compute SHAP values for the transformed test dataset and generate a summary plot of SHAP values</a></li>
<li class="toctree-l3"><a class="reference internal" href="#step-5-generate-a-summary-plot-of-shap-values">Step 5: Generate a summary plot of SHAP values</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="#regression">Regression</a><ul>
@@ -1319,6 +1327,63 @@ <h4>Return Metrics (Optional)<a class="headerlink" href="#return-metrics-optiona
</section>
</section>
</section>
<section id="shap-shapley-additive-explanations">
<h2>SHAP (SHapley Additive exPlanations)<a class="headerlink" href="#shap-shapley-additive-explanations" title="Link to this heading"></a></h2>
<p>This example demonstrates how to compute and visualize SHAP (SHapley Additive exPlanations)
values for a machine learning model whose pipeline includes feature selection.
SHAP values quantify how much each individual feature contributes to a model’s predictions.</p>
<p><strong>Steps</strong></p>
<ol class="arabic simple">
<li><p>The dataset is transformed through the model’s feature selection pipeline to ensure only the selected features are used for SHAP analysis.</p></li>
<li><p>The final model (e.g., <code class="docutils literal notranslate"><span class="pre">XGBoost</span></code> classifier) is retrieved from the custom Model object. This is required because SHAP operates on the underlying model, not the pipeline.</p></li>
<li><p>SHAP’s <code class="docutils literal notranslate"><span class="pre">TreeExplainer</span></code> is used to explain the predictions of the XGBoost classifier.</p></li>
<li><p>SHAP values are calculated for the transformed dataset to quantify the contribution of each feature to the predictions.</p></li>
<li><p>A summary plot is generated to visualize the impact of each feature across all data points.</p></li>
</ol>
<section id="step-1-transform-the-test-data-using-the-feature-selection-pipeline">
<h3>Step 1: Transform the test data using the feature selection pipeline<a class="headerlink" href="#step-1-transform-the-test-data-using-the-feature-selection-pipeline" title="Link to this heading"></a></h3>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1">## The pipeline applies preprocessing (e.g., imputation, scaling) and feature</span>
<span class="c1">## selection (RFE) to X_test</span>
<span class="n">X_test_transformed</span> <span class="o">=</span> <span class="n">model_xgb</span><span class="o">.</span><span class="n">get_feature_selection_pipeline</span><span class="p">()</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">X_test</span><span class="p">)</span>
</pre></div>
</div>
</section>
<section id="step-2-retrieve-the-trained-xgboost-classifier-from-the-pipeline">
<h3>Step 2: Retrieve the trained XGBoost classifier from the pipeline<a class="headerlink" href="#step-2-retrieve-the-trained-xgboost-classifier-from-the-pipeline" title="Link to this heading"></a></h3>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1">## The last estimator in the pipeline is the XGBoost model</span>
<span class="n">xgb_classifier</span> <span class="o">=</span> <span class="n">model_xgb</span><span class="o">.</span><span class="n">estimator</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
</pre></div>
</div>
</section>
<section id="step-3-extract-feature-names-from-the-training-data-and-initialize-the-shap-explainer-for-the-xgboost-classifier">
<h3>Step 3: Extract feature names from the training data, and initialize the SHAP explainer for the XGBoost classifier<a class="headerlink" href="#step-3-extract-feature-names-from-the-training-data-and-initialize-the-shap-explainer-for-the-xgboost-classifier" title="Link to this heading"></a></h3>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1">## Import SHAP for model explainability</span>
<span class="kn">import</span> <span class="nn">shap</span>

<span class="c1">## Feature names are required for interpretability in SHAP plots</span>
<span class="n">feature_names</span> <span class="o">=</span> <span class="n">X_train</span><span class="o">.</span><span class="n">columns</span><span class="o">.</span><span class="n">to_list</span><span class="p">()</span>

<span class="c1">## Initialize the SHAP explainer with the model</span>
<span class="n">explainer</span> <span class="o">=</span> <span class="n">shap</span><span class="o">.</span><span class="n">TreeExplainer</span><span class="p">(</span><span class="n">xgb_classifier</span><span class="p">)</span>
</pre></div>
</div>
</section>
<section id="step-4-compute-shap-values-for-the-transformed-test-dataset-and-generate-a-summary-plot-of-shap-values">
<h3>Step 4: Compute SHAP values for the transformed test dataset<a class="headerlink" href="#step-4-compute-shap-values-for-the-transformed-test-dataset-and-generate-a-summary-plot-of-shap-values" title="Link to this heading"></a></h3>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1">## Compute SHAP values for the transformed dataset</span>
<span class="n">shap_values</span> <span class="o">=</span> <span class="n">explainer</span><span class="o">.</span><span class="n">shap_values</span><span class="p">(</span><span class="n">X_test_transformed</span><span class="p">)</span>
</pre></div>
</div>
</section>
<section id="step-5-generate-a-summary-plot-of-shap-values">
<h3>Step 5: Generate a summary plot of SHAP values<a class="headerlink" href="#step-5-generate-a-summary-plot-of-shap-values" title="Link to this heading"></a></h3>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1">## Plot SHAP values</span>
<span class="c1">## Summary plot of SHAP values for all features across all data points</span>
<span class="n">shap</span><span class="o">.</span><span class="n">summary_plot</span><span class="p">(</span><span class="n">shap_values</span><span class="p">,</span> <span class="n">X_test_transformed</span><span class="p">,</span> <span class="n">feature_names</span><span class="o">=</span><span class="n">feature_names</span><span class="p">,)</span>
</pre></div>
</div>
</section>
</section>
</section>
<section id="regression">
<span id="id1"></span><h1>Regression<a class="headerlink" href="#regression" title="Link to this heading"></a></h1>
73 changes: 73 additions & 0 deletions source/usage_guide.rst
@@ -1313,6 +1313,79 @@ Return Metrics (Optional)
weighted avg 0.98 0.98 0.98 200
--------------------------------------------------------------------------------
SHAP (SHapley Additive exPlanations)
---------------------------------------

This example demonstrates how to compute and visualize SHAP (SHapley Additive exPlanations)
values for a machine learning model whose pipeline includes feature selection.
SHAP values quantify how much each individual feature contributes to a model's predictions.

**Steps**

1. The dataset is transformed through the model's feature selection pipeline to ensure only the selected features are used for SHAP analysis.

2. The final model (e.g., ``XGBoost`` classifier) is retrieved from the custom Model object. This is required because SHAP operates on the underlying model, not the pipeline.

3. SHAP's ``TreeExplainer`` is used to explain the predictions of the XGBoost classifier.

4. SHAP values are calculated for the transformed dataset to quantify the contribution of each feature to the predictions.

5. A summary plot is generated to visualize the impact of each feature across all data points.


Step 1: Transform the test data using the feature selection pipeline
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    ## The pipeline applies preprocessing (e.g., imputation, scaling) and feature
    ## selection (RFE) to X_test
    X_test_transformed = model_xgb.get_feature_selection_pipeline().transform(X_test)

Step 2: Retrieve the trained XGBoost classifier from the pipeline
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    ## The last estimator in the pipeline is the XGBoost model
    xgb_classifier = model_xgb.estimator[-1]

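To confirm that the retrieved object is the fitted booster itself rather than
the surrounding pipeline, a quick type check can help (a minimal sanity check,
assuming the final estimator is an ``XGBClassifier`` as in this example):

.. code-block:: python

    ## Quick sanity check: the last pipeline step should be the fitted
    ## XGBoost classifier, not another transformer
    from xgboost import XGBClassifier

    assert isinstance(xgb_classifier, XGBClassifier)
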
Step 3: Extract feature names from the training data, and initialize the SHAP explainer for the XGBoost classifier
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


.. code-block:: python

    ## Import SHAP for model explainability
    import shap

    ## Feature names are required for interpretability in SHAP plots
    feature_names = X_train.columns.to_list()

    ## Initialize the SHAP explainer with the model
    explainer = shap.TreeExplainer(xgb_classifier)

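Note that ``X_train.columns`` lists every column seen before feature selection,
while ``X_test_transformed`` contains only the columns kept by the selector.
If the counts differ, the names passed to SHAP should be taken from the fitted
selector so that they line up column-for-column with ``X_test_transformed``.
A sketch of one way to do this, assuming the pipeline's final step is a
scikit-learn selector such as ``RFE`` that supports ``get_feature_names_out``:

.. code-block:: python

    ## Derive names from the fitted selector so they match the columns of
    ## X_test_transformed (assumes the last pipeline step is the selector)
    selector = model_xgb.get_feature_selection_pipeline()[-1]
    feature_names = list(selector.get_feature_names_out(X_train.columns))
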
Step 4: Compute SHAP values for the transformed test dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    ## Compute SHAP values for the transformed dataset
    shap_values = explainer.shap_values(X_test_transformed)

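Before plotting, it is worth confirming the result's shape: for a binary
XGBoost classifier, ``TreeExplainer.shap_values`` typically returns a single
array with one row per test sample and one column per selected feature.

.. code-block:: python

    ## The two shapes should agree: (n_samples, n_selected_features)
    print(shap_values.shape)
    print(X_test_transformed.shape)
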
Step 5: Generate a summary plot of SHAP values
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    ## Plot SHAP values
    ## Summary plot of SHAP values for all features across all data points
    shap.summary_plot(shap_values, X_test_transformed, feature_names=feature_names)

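To save the figure to disk rather than displaying it (for example, to embed
the pending plot in these docs), ``summary_plot`` accepts ``show=False`` so
the active Matplotlib figure can be captured; the output path below is
illustrative:

.. code-block:: python

    import matplotlib.pyplot as plt

    ## Suppress interactive display so the figure can be saved
    shap.summary_plot(
        shap_values,
        X_test_transformed,
        feature_names=feature_names,
        show=False,
    )
    plt.savefig("shap_summary_plot.png", dpi=150, bbox_inches="tight")
    plt.close()
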
.. _Regression:

Regression
