diff --git a/assets/scatter_plots_all_grid.png b/assets/scatter_plots_grid.png similarity index 100% rename from assets/scatter_plots_all_grid.png rename to assets/scatter_plots_grid.png diff --git a/docs/_images/scatter_plots_grid.png b/docs/_images/scatter_plots_grid.png index 5a51facd8..78652ac74 100644 Binary files a/docs/_images/scatter_plots_grid.png and b/docs/_images/scatter_plots_grid.png differ diff --git a/docs/_sources/changelog.rst.txt b/docs/_sources/changelog.rst.txt index e2a4f46ee..81e95ca2a 100644 --- a/docs/_sources/changelog.rst.txt +++ b/docs/_sources/changelog.rst.txt @@ -24,6 +24,63 @@ Changelog ========= +`Version 0.0.14`_ +---------------------- + +.. _Version 0.0.14: https://lshpaner.github.io/eda_toolkit/v0.0.14/index.html + +Ensure Crosstabs Dictionary is Populated with ``return_dict=True`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This resolves the issue where the ``stacked_crosstab_plot`` function fails to +populate and return the crosstabs dictionary (``crosstabs_dict``) when +``return_dict=True`` and ``output="plots_only"``. The fix ensures that crosstabs +are always generated when ``return_dict=True``, regardless of the output parameter. + +- Always Generate Crosstabs with ``return_dict=True``: + + - Added logic to ensure crosstabs are created and populated in ``crosstabs_dict`` whenever ``return_dict=True``, even if the output parameter is set to ``"plots_only"``. + +- Separation of Crosstabs Display from Generation: + + - The generation of crosstabs is now independent of the output parameter. + - Crosstabs display (``print``) occurs only when output includes ``"both"`` or ``"crosstabs_only"``. + +Enhancements and Fixes for ``scatter_fit_plot`` Function +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This addresses critical issues and introduces key enhancements for the ``scatter_fit_plot`` function. 
+These changes aim to improve usability, flexibility, and robustness of the function. + +1. Added ``exclude_combinations`` Parameter. Users can now exclude specific variable pairs from being plotted by providing a list of tuples with the combinations to omit. + +2. Added ``combinations`` Parameter to ``show_plot``. Users can also now show just the list of combinations that are part of the selection process when ``all_vars=True``. + +3. When plotting a single variable pair with ``show_plot="both"``, the function threw an ``AttributeError``. Single-variable pairs are now properly handled. + +4. Changed the default value of ``show_plot`` to ``"both"`` to prevent excessive individual plots when handling large variable sets. + +5. Fixed Issues with Legend, ``xlim``, and ``ylim``; inputs were not being used; these have been corrected. + + +Fix Default Title and Filename Handling in ``flex_corr_matrix`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This resolves issues in the ``flex_corr_matrix`` function where: + +1. No default title was provided when ``title=None``, resulting in missing titles on plots. +2. Saved plot filenames were incorrect, leading to issues like ``.png.png`` when ``title`` was not provided. + +The fix ensures that a default title ("Correlation Matrix") is used for both plot display and file saving when no ``title`` +is explicitly provided. If ``title`` is explicitly set to ``None``, the plot will have no title, +but the saved filename will still use ``"correlation_matrix"``. + +1. If no ``title`` is provided, ``"Correlation Matrix"`` is used as the default for filenames and displayed titles. If ``title=None`` is explicitly passed, no title is displayed on the plot. + +2. File names are generated based on the ``title`` or default to ``"correlation_matrix"`` if ``title`` is not provided. Spaces in the ``title`` are replaced with underscores, and special characters like ``:`` are removed to ensure valid filenames. 
+ + + `Version 0.0.13`_ ---------------------- diff --git a/docs/_sources/eda_plots.rst.txt b/docs/_sources/eda_plots.rst.txt index c5fa3af87..1e79ec811 100644 --- a/docs/_sources/eda_plots.rst.txt +++ b/docs/_sources/eda_plots.rst.txt @@ -2931,7 +2931,7 @@ These settings allow for the creation of scatter plots that comprehensively expl
-.. image:: ../assets/scatter_plots_all_grid.png +.. image:: ../assets/scatter_plots_grid.png :alt: Scatter Plot Comparisons (Grouped2) :align: center :width: 900px diff --git a/docs/changelog.html b/docs/changelog.html index 4e5e9e118..c66148aec 100644 --- a/docs/changelog.html +++ b/docs/changelog.html @@ -88,6 +88,12 @@
  • Contributors/Maintainers
  • Citing EDA Toolkit
  • Changelog
  • Changelog

    +
    +

    Version 0.0.14

    +
    +

    Ensure Crosstabs Dictionary is Populated with return_dict=True

    +

    This resolves the issue where the stacked_crosstab_plot function fails to populate and return the crosstabs dictionary (crosstabs_dict) when return_dict=True and output="plots_only". The fix ensures that crosstabs are always generated when return_dict=True, regardless of the output parameter.

    +
      +
    • Always Generate Crosstabs with return_dict=True:

      • Added logic to ensure crosstabs are created and populated in crosstabs_dict whenever return_dict=True, even if the output parameter is set to "plots_only".

    • Separation of Crosstabs Display from Generation:

      • The generation of crosstabs is now independent of the output parameter.

      • Crosstabs display (print) occurs only when output includes "both" or "crosstabs_only".
    +
    +
    +
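The separation of generation from display described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `crosstab_step`, `build_crosstab`, the `"<col>_vs_<func_col>"` key format, and the use of a `Counter` as a stand-in for a real crosstab table are all assumptions made for this example.

```python
from collections import Counter


def build_crosstab(rows, col, func_col):
    """Count co-occurrences of (col, func_col) values; a stand-in for a crosstab."""
    return Counter((row[col], row[func_col]) for row in rows)


def crosstab_step(rows, pairs, output="both", return_dict=False):
    """Hypothetical helper: generation depends on return_dict, display on output."""
    crosstabs_dict = {}
    needs_display = output in ("both", "crosstabs_only")

    # Generate tables whenever the caller wants them back OR wants them printed.
    if return_dict or needs_display:
        for col, func_col in pairs:
            table = build_crosstab(rows, col, func_col)
            crosstabs_dict[f"{col}_vs_{func_col}"] = table  # assumed key format

            # Printing is decided separately from generation.
            if needs_display:
                print(f"Crosstab for {col} vs {func_col}: {dict(table)}")

    return crosstabs_dict if return_dict else None


rows = [
    {"age_group": "18-29", "sex": "Male"},
    {"age_group": "18-29", "sex": "Female"},
    {"age_group": "30-39", "sex": "Male"},
]
result = crosstab_step(
    rows, [("age_group", "sex")], output="plots_only", return_dict=True
)
```

With `output="plots_only"` nothing is printed, yet `result` still holds the table, which is the behavior the fix describes.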

    Enhancements and Fixes for scatter_fit_plot Function

    +

    This addresses critical issues and introduces key enhancements for the scatter_fit_plot function. These changes aim to improve usability, flexibility, and robustness of the function.

    +
      +
    1. Added exclude_combinations Parameter. Users can now exclude specific variable pairs from being plotted by providing a list of tuples with the combinations to omit.

    2. Added combinations Parameter to show_plot. Users can also now show just the list of combinations that are part of the selection process when all_vars=True.

    3. When plotting a single variable pair with show_plot="both", the function threw an AttributeError. Single-variable pairs are now properly handled.

    4. Changed the default value of show_plot to "both" to prevent excessive individual plots when handling large variable sets.

    5. Fixed Issues with Legend, xlim, and ylim; inputs were not being used; these have been corrected.
    +
    +
    +
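As a rough illustration of how a list of tuples can exclude variable pairs, the sketch below filters all pairwise combinations. It does not reproduce the function's internals: `select_pairs` is a hypothetical helper, the column names are illustrative, and matching pairs regardless of order is an assumption of this example.

```python
from itertools import combinations


def select_pairs(variables, exclude_combinations=None):
    """Hypothetical helper: all variable pairs minus the excluded ones."""
    # Assumption: ("a", "b") and ("b", "a") refer to the same pair.
    excluded = {frozenset(pair) for pair in (exclude_combinations or [])}
    return [
        pair
        for pair in combinations(variables, 2)
        if frozenset(pair) not in excluded
    ]


pairs = select_pairs(
    ["age", "education-num", "hours-per-week"],
    exclude_combinations=[("education-num", "age")],
)
```

Here `pairs` keeps the two combinations that involve `hours-per-week` and drops the excluded `age` / `education-num` pair.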

    Fix Default Title and Filename Handling in flex_corr_matrix

    +

    This resolves issues in the flex_corr_matrix function where:

    +
      +
    1. No default title was provided when title=None, resulting in missing titles on plots.

    2. Saved plot filenames were incorrect, leading to issues like .png.png when title was not provided.
    +

    The fix ensures that a default title (“Correlation Matrix”) is used for both plot display and file saving when no title is explicitly provided. If title is explicitly set to None, the plot will have no title, but the saved filename will still use "correlation_matrix".

    +
      +
    1. If no title is provided, "Correlation Matrix" is used as the default for filenames and displayed titles. If title=None is explicitly passed, no title is displayed on the plot.

    2. File names are generated based on the title or default to "correlation_matrix" if title is not provided. Spaces in the title are replaced with underscores, and special characters like : are removed to ensure valid filenames.
    +
    +
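The filename rules above can be sketched in a few lines. This is only an illustration under stated assumptions: `plot_filename`, its `extension` argument, and the exact set of characters stripped are hypothetical and are not taken from the flex_corr_matrix source.

```python
import re

DEFAULT_FILENAME = "correlation_matrix"  # fallback named in the changelog


def plot_filename(title=None, extension="png"):
    """Hypothetical helper: derive a safe filename from an optional title."""
    if title:
        # Drop special characters such as ":" (assumed set: anything that is
        # not a word character, whitespace, or hyphen).
        base = re.sub(r"[^\w\s-]", "", title)
        # Replace runs of whitespace with a single underscore.
        base = re.sub(r"\s+", "_", base.strip())
    else:
        base = DEFAULT_FILENAME
    # The extension is appended exactly once, which avoids ".png.png".
    return f"{base}.{extension}"


default_name = plot_filename()
custom_name = plot_filename("Income: Census Data")
```

With no title the helper falls back to `correlation_matrix.png`; with the title shown it yields `Income_Census_Data.png`.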

    Version 0.0.13

    This version introduces a series of updates and fixes across multiple functions to enhance error handling, improve cross-environment compatibility, streamline usability, and optimize performance. These changes address critical issues, add new features, and ensure consistent behavior in both terminal and notebook environments.

    @@ -827,8 +883,8 @@

    Version 0.0.1b0stacked_crosstab_plot function, allowing for more customizable and specific plot generation based on user requirements.

    -
    -

    Version 0.0.1b0

    +
    +

    Version 0.0.1b0

    Refined KDE Distributions

    Key Changes

      @@ -846,8 +902,8 @@

      Version 0.0.1b0 -

      Version 0.0.1b0

      +
      +

      Version 0.0.1b0

      Enhanced KDE Distributions Function

      Added Parameters

        @@ -929,8 +985,8 @@

        Version 0.0.1b0 -

        Version 0.0.1b0

        +
        +

        Version 0.0.1b0

        Contingency Table Updates

        • fillna('') added to output so that null values come through, removed 'All' column name from output, sort options 0 and 1, updated docstring documentation. Tested successfully on Python 3.7.3.

        • diff --git a/docs/eda_plots.html b/docs/eda_plots.html index 80b72cb03..213920f2f 100644 --- a/docs/eda_plots.html +++ b/docs/eda_plots.html @@ -2353,7 +2353,7 @@

          Regression-Centric Scatter Plots Example) -

        @@ -2437,7 +2437,7 @@

        Scatter Plots (All Combinations Example)) -

        diff --git a/docs/index.html b/docs/index.html index e59808d38..7dd52bc4b 100644 --- a/docs/index.html +++ b/docs/index.html @@ -321,6 +321,12 @@

        Table of ContentsContributors/Maintainers
      1. Citing EDA Toolkit
      2. Changelog
          +
        • Version 0.0.14 +
        • Version 0.0.13
        • References
        • diff --git a/docs/objects.inv b/docs/objects.inv index b71df91a5..6f838b193 100644 Binary files a/docs/objects.inv and b/docs/objects.inv differ diff --git a/docs/searchindex.js b/docs/searchindex.js index f3bfffee1..52a596e17 100644 --- a/docs/searchindex.js +++ b/docs/searchindex.js @@ -1 +1 @@ -Search.setIndex({"alltitles": {"2D Partial Dependence Plots": [[6, "d-partial-dependence-plots"]], "2D Plots - CA Housing Example": [[6, "d-plots-ca-housing-example"]], "3D Partial Dependence Plots": [[6, "id19"]], "3D Plots - CA Housing Example": [[6, "id21"]], "ASCII Art": [[1, null]], "ASCII Art Collection": [[1, "ascii-art-collection"]], "About EDA Toolkit": [[8, null]], "Acknowledgements": [[0, null]], "Add Environment Detection to dataframe_columns Function": [[2, "add-environment-detection-to-dataframe-columns-function"]], "Add ValueError for Insufficient Pool Size in add_ids and Enhance ID Deduplication": [[2, "add-valueerror-for-insufficient-pool-size-in-add-ids-and-enhance-id-deduplication"]], "Add tqdm Progress Bar to dataframe_columns Function": [[2, "add-tqdm-progress-bar-to-dataframe-columns-function"]], "Adding Unique Identifiers": [[5, "adding-unique-identifiers"]], "Applications in Modeling": [[10, "applications-in-modeling"]], "Available Scale Conversions": [[6, "available-scale-conversions"]], "Binning Numerical Columns": [[5, "binning-numerical-columns"]], "Box Plots Grid Example": [[6, "box-plots-grid-example"]], "Box and Violin Plots": [[6, "box-and-violin-plots"]], "Box-Cox Transformation": [[10, "box-cox-transformation"]], "Box-Cox Transformation Example 1": [[6, "box-cox-transformation-example-1"]], "Box-Cox Transformation Example 2": [[6, "box-cox-transformation-example-2"]], "Calculation Details": [[5, "calculation-details"]], "Census Income Example": [[5, "census-income-example"]], "Centering Data Using the Median": [[10, "centering-data-using-the-median"]], "Changelog": [[2, null]], "Changes in stacked_crosstab_plot": [[2, 
"changes-in-stacked-crosstab-plot"]], "Citing EDA Toolkit": [[3, null]], "Confidence Intervals for Lambda": [[10, "confidence-intervals-for-lambda"]], "Contributors/Maintainers": [[4, null]], "Correlation Matrices": [[6, "correlation-matrices"]], "Creating Contingency Tables": [[5, "creating-contingency-tables"]], "Creating Effective Visualizations": [[6, null]], "Data Fraction Usage": [[6, "data-fraction-usage"]], "Data Management": [[8, null]], "Data Management Overview": [[5, null]], "Data Management Techniques": [[5, "data-management-techniques"]], "DataFrame Analysis": [[5, "dataframe-analysis"]], "DataFrame Column Names": [[5, "dataframe-column-names"]], "Description": [[7, "description"]], "Enhance strip_trailing_period to Support Strings and Mixed Data Types": [[2, "enhance-strip-trailing-period-to-support-strings-and-mixed-data-types"]], "Example 1": [[6, "example-1"]], "Example 2": [[6, "example-2"]], "Example Calculation": [[10, "example-calculation"]], "Examples": [[1, "examples"]], "Explanation of Each Component": [[10, "explanation-of-each-component"]], "Feature Scaling and Outliers": [[6, "feature-scaling-and-outliers"]], "Features": [[1, "features"]], "Full Correlation Matrix Example": [[6, "full-correlation-matrix-example"]], "Gaussian Assumption for Normality": [[10, null]], "Generating Summary Tables for Variable Combinations": [[5, "generating-summary-tables-for-variable-combinations"]], "Getting Started": [[8, null]], "Heuristics for Visualizations": [[6, "heuristics-for-visualizations"]], "Highlighting Specific Columns in a DataFrame": [[5, "highlighting-specific-columns-in-a-dataframe"]], "Histogram Example (Count)": [[6, "histogram-example-count"]], "Histogram Example (Density)": [[6, "histogram-example-density"]], "Histogram Example - (Mean and Median)": [[6, "histogram-example-mean-and-median"]], "Histogram Example - (Mean, Median, and Std. 
Deviation)": [[6, "histogram-example-mean-median-and-std-deviation"]], "Histograms and Kernel Density Estimation (KDE)": [[10, "histograms-and-kernel-density-estimation-kde"]], "Improvements": [[2, "improvements"]], "Installation": [[7, "installation"]], "Interactive Plot": [[6, "interactive-plot"]], "KDE Distribution Function": [[6, "kde-distribution-function"]], "KDE and Histogram Distribution Plots": [[6, "kde-and-histogram-distribution-plots"]], "KDE and Histograms Example": [[6, "kde-and-histograms-example"]], "Key Features": [[7, "key-features"]], "Logit Transformation": [[10, "logit-transformation"]], "Logit Transformation Example": [[6, "logit-transformation-example"]], "Mathematical Definition": [[10, "mathematical-definition"]], "Median and IQR Scaling": [[10, "median-and-iqr-scaling"]], "Methodologies": [[6, "methodologies"]], "New Features": [[2, "new-features"]], "Non-Normalized Stacked Bar Plots Example": [[6, "non-normalized-stacked-bar-plots-example"]], "Notes": [[1, null], [5, null], [5, null], [5, null], [5, null], [6, null], [6, null], [6, null]], "Notes:": [[5, null]], "Observed Outliers Sans Cutoffs": [[6, "observed-outliers-sans-cutoffs"]], "Other Enhancements and Fixes": [[2, "other-enhancements-and-fixes"]], "Overview": [[1, "overview"]], "Partial Dependence Foundations": [[10, "partial-dependence-foundations"]], "Partial Dependence Plots": [[6, "partial-dependence-plots"]], "Path directories": [[5, "path-directories"]], "Pearson Correlation Coefficient": [[10, "pearson-correlation-coefficient"]], "Pivoted Stacked Bar Plots Example": [[6, "pivoted-stacked-bar-plots-example"]], "Pivoted Violin Plots Grid Example": [[6, "pivoted-violin-plots-grid-example"]], "Plain Outliers Example": [[6, "plain-outliers-example"]], "Plotting Heuristics": [[8, null]], "Practical Considerations": [[10, "practical-considerations"]], "Prerequisites": [[7, "prerequisites"]], "Project Links": [[7, "project-links"]], "Properties and Benefits": [[10, 
"properties-and-benefits"]], "Purpose and Assumptions": [[10, "purpose-and-assumptions"]], "Purpose of EDA Toolkit": [[7, "purpose-of-eda-toolkit"]], "References": [[9, null]], "Regression-Centric Scatter Plots Example": [[6, "regression-centric-scatter-plots-example"]], "Regular Non-Stacked Bar Plots Example": [[6, "regular-non-stacked-bar-plots-example"]], "Retaining a Sample for Analysis": [[6, "retaining-a-sample-for-analysis"]], "RobustScaler Outliers Examples": [[6, "robustscaler-outliers-examples"]], "Saving DataFrames to Excel with Customized Formatting": [[5, "saving-dataframes-to-excel-with-customized-formatting"]], "Scatter Fit Plot": [[6, "scatter-fit-plot"]], "Scatter Plots (All Combinations Example)": [[6, "scatter-plots-all-combinations-example"]], "Scatter Plots Grouped by Category Example": [[6, "scatter-plots-grouped-by-category-example"]], "Scatter Plots and Best Fit Lines": [[6, "scatter-plots-and-best-fit-lines"]], "Scatter Plots: Excluding Specific Combinations": [[6, "scatter-plots-excluding-specific-combinations"]], "Stacked Bar Plots With Crosstabs Example": [[6, "stacked-bar-plots-with-crosstabs-example"]], "Stacked Crosstab Plots": [[6, "stacked-crosstab-plots"]], "Standardized Dates": [[5, "standardized-dates"]], "Static Plot": [[6, "static-plot"]], "Table of Contents": [[8, null]], "The Yeo-Johnson Transformation": [[10, "the-yeo-johnson-transformation"]], "Theoretical Overview": [[8, null]], "Trailing Period Removal": [[5, "trailing-period-removal"]], "Treated Outliers With Cutoffs": [[6, "treated-outliers-with-cutoffs"]], "Triangular Correlation Matrix Example": [[6, "triangular-correlation-matrix-example"]], "Version 0.0.10": [[2, "version-0-0-10"]], "Version 0.0.11": [[2, "version-0-0-11"]], "Version 0.0.12": [[2, "version-0-0-12"]], "Version 0.0.13": [[2, "version-0-0-13"]], "Version 0.0.1b0": [[2, "version-0-0-1b0"], [2, "id11"], [2, "id12"], [2, "id13"]], "Version 0.0.1rc0": [[2, "version-0-0-1rc0"]], "Version 0.0.2": [[2, 
"version-0-0-2"]], "Version 0.0.3": [[2, "version-0-0-3"]], "Version 0.0.4": [[2, "version-0-0-4"]], "Version 0.0.5": [[2, "version-0-0-5"]], "Version 0.0.6": [[2, "version-0-0-6"]], "Version 0.0.7": [[2, "version-0-0-7"]], "Version 0.0.8": [[2, "version-0-0-8"]], "Version 0.0.8a": [[2, "version-0-0-8a"]], "Version 0.0.8b": [[2, "version-0-0-8b"]], "Version 0.0.8c": [[2, "version-0-0-8c"]], "Version 0.0.9": [[2, "version-0-0-9"]], "Violin Plots Grid Example": [[6, "violin-plots-grid-example"]], "Welcome to the EDA Toolkit Python Library Documentation!": [[7, null]], "What is EDA?": [[7, "what-is-eda"]]}, "docnames": ["acknowledgements", "art", "changelog", "citations", "contributors", "data_management", "eda_plots", "getting_started", "index", "references", "theoretical_overview"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.todo": 2, "sphinx.ext.viewcode": 1}, "filenames": ["acknowledgements.rst", "art.rst", "changelog.rst", "citations.rst", "contributors.rst", "data_management.rst", "eda_plots.rst", "getting_started.rst", "index.rst", "references.rst", "theoretical_overview.rst"], "indexentries": {"add_ids()": [[5, "add_ids", false]], "box_violin_plot()": [[6, "box_violin_plot", false]], "built-in function": [[1, "print_art", false], [5, "add_ids", false], [5, "contingency_table", false], [5, "dataframe_columns", false], [5, "ensure_directory", false], [5, "highlight_columns", false], [5, "parse_date_with_rule", false], [5, "save_dataframes_to_excel", false], [5, "strip_trailing_period", false], [5, "summarize_all_combinations", false], [6, "box_violin_plot", false], [6, "data_doctor", false], [6, "flex_corr_matrix", false], [6, "kde_distributions", false], [6, 
"plot_2d_pdp", false], [6, "plot_3d_pdp", false], [6, "scatter_fit_plot", false], [6, "stacked_crosstab_plot", false]], "contingency_table()": [[5, "contingency_table", false]], "data_doctor()": [[6, "data_doctor", false]], "dataframe_columns()": [[5, "dataframe_columns", false]], "ensure_directory()": [[5, "ensure_directory", false]], "flex_corr_matrix()": [[6, "flex_corr_matrix", false]], "highlight_columns()": [[5, "highlight_columns", false]], "kde_distributions()": [[6, "kde_distributions", false]], "parse_date_with_rule()": [[5, "parse_date_with_rule", false]], "plot_2d_pdp()": [[6, "plot_2d_pdp", false]], "plot_3d_pdp()": [[6, "plot_3d_pdp", false]], "print_art()": [[1, "print_art", false]], "save_dataframes_to_excel()": [[5, "save_dataframes_to_excel", false]], "scatter_fit_plot()": [[6, "scatter_fit_plot", false]], "stacked_crosstab_plot()": [[6, "stacked_crosstab_plot", false]], "strip_trailing_period()": [[5, "strip_trailing_period", false]], "summarize_all_combinations()": [[5, "summarize_all_combinations", false]]}, "objects": {"": [[5, 0, 1, "", "add_ids"], [6, 0, 1, "", "box_violin_plot"], [5, 0, 1, "", "contingency_table"], [6, 0, 1, "", "data_doctor"], [5, 0, 1, "", "dataframe_columns"], [5, 0, 1, "", "ensure_directory"], [6, 0, 1, "", "flex_corr_matrix"], [5, 0, 1, "", "highlight_columns"], [6, 0, 1, "", "kde_distributions"], [5, 0, 1, "", "parse_date_with_rule"], [6, 0, 1, "", "plot_2d_pdp"], [6, 0, 1, "", "plot_3d_pdp"], [1, 0, 1, "", "print_art"], [5, 0, 1, "", "save_dataframes_to_excel"], [6, 0, 1, "", "scatter_fit_plot"], [6, 0, 1, "", "stacked_crosstab_plot"], [5, 0, 1, "", "strip_trailing_period"], [5, 0, 1, "", "summarize_all_combinations"]]}, "objnames": {"0": ["py", "function", "Python function"]}, "objtypes": {"0": "py:function"}, "terms": {"": [0, 1, 2, 4, 5, 6, 10], "0": [3, 5, 6, 7, 8, 10], "00": 5, "000": 6, "0000": 6, "000000": 6, "0000ff": 6, "00140": [6, 9], "0040": 6, "00bfc4": 6, "01": 5, "0119": 6, "0163": 6, "019590": 6, 
"02": [2, 5], "0278": 6, "03021": [6, 9], "033257": 6, "0333": 6, "037743": 6, "04": [5, 6], "05": [6, 10], "0517": 6, "0556": 6, "07": [5, 6], "0724": 6, "08": 5, "086108": 6, "09": 6, "1": [2, 5, 7, 8, 10], "10": [3, 5, 6, 7, 8, 9], "100": [5, 6, 10], "1016": [6, 9], "105": 6, "10724": 6, "11": [5, 6, 8], "1109": [6, 9], "111": [5, 6], "115": 6, "11687": 6, "117": 6, "119": 6, "11th": [5, 6], "12": [5, 6, 7, 8], "120": [5, 6], "123": [2, 5], "1234": 5, "12929": 6, "13": [3, 5, 6, 8], "131": 6, "13162633": 3, "13163208": 3, "13174": 6, "132222": 6, "1348": 5, "13706": 5, "13920": 6, "14": [3, 5, 6, 7], "147": 6, "14x4": 6, "15": [5, 6], "150": 5, "15784": 5, "15x5": 6, "16": [5, 6], "161880": 6, "16192": 6, "1667": 6, "17": 6, "1717": 6, "1748": 6, "177": 6, "1779": 6, "18": [5, 6, 7], "180807": 6, "181": 6, "1873": 6, "189": 6, "19": 6, "1964": 10, "19716": 5, "1994": 7, "1996": [5, 6, 7, 9], "1997": [6, 9], "1b0": 8, "1d": 6, "1rc0": 8, "2": [5, 7, 8, 10], "20": [5, 6], "200": 5, "2007": [6, 9], "2020": 5, "2021": [5, 6, 9], "2022": 5, "2024": 3, "203488": 5, "21": [5, 6, 7], "21105": [6, 9], "2115": 6, "215646": [5, 6], "216561": 6, "22": 6, "22379": 5, "2245": 6, "227960": 6, "22803": 5, "23": 6, "234721": [5, 6], "236": 6, "24": 5, "24432": [5, 6, 7, 9], "24720": 5, "25": [2, 5, 6], "250": 5, "2509": 6, "2565": 5, "25th": 10, "26": 6, "27": 6, "274": 5, "28": [5, 6], "280": 6, "285": 6, "28523": 5, "29": [5, 6], "291": [6, 9], "292": 6, "29305": 6, "295": 6, "297": [6, 9], "2d": [2, 8, 9, 10], "3": [5, 6, 7, 8, 9, 10], "30": [5, 6], "300": [5, 6], "3021": [6, 9], "3054": 6, "31": 5, "3188": 6, "32": 5, "32650": [5, 6], "33": [5, 6, 9], "3333": 6, "333333": 6, "338409": [5, 6], "33906": 5, "34": [5, 6], "3461": 6, "351102": 5, "355015": 6, "36": [5, 6], "3680": 5, "37": [5, 6], "37155": 6, "3719": 6, "38": [5, 6], "3809": 6, "3853": 6, "389562": 6, "38it": 5, "39": [5, 6], "3986": 6, "399428377": 6, "3d": [2, 8, 10], "3d_pdp": 6, "4": [5, 6, 7, 8, 10], "40": 
[5, 6], "400": 6, "400000": 6, "408117383": 6, "41": [5, 6], "4110": 6, "415": 6, "417": 6, "41762": 5, "42": [5, 6], "4267": 5, "43": 5, "43832": 5, "44807": 5, "45": [5, 6], "458295720": 6, "46": 5, "46560": 5, "467": 5, "468": 5, "469": 5, "47": 6, "470": 5, "471": 5, "472": 5, "4722": 6, "4746": 6, "477": 6, "479262902": [5, 6], "484": 6, "48842": [5, 6], "49": [5, 6], "5": [5, 6, 7, 8, 10], "50": [5, 6, 10], "5000": 6, "50k": [5, 6], "50k_": 6, "50th": 10, "51": [5, 6], "520438": 6, "5219": 6, "521908": 6, "5281": 3, "53": [5, 6], "5338": 6, "535": 6, "55": [6, 9], "5556": 6, "56": 5, "561810758": [5, 6], "5623": 5, "56it": 5, "5707": 6, "5713": 6, "58": 6, "582248222": [5, 6], "5856": 5, "59": [5, 6], "595": 6, "598098459": [5, 6], "6": [5, 6, 7, 8, 9], "60": [5, 6, 9], "61": [5, 6], "614411": 6, "6172": 5, "62": 6, "64": [5, 6], "65": 6, "66": [5, 6], "6619": 6, "6664": 6, "668": 6, "669717925": 6, "6738": 6, "6761": 6, "68": 10, "68624": 6, "69": [5, 6], "7": [5, 6, 7, 8, 10], "70": [5, 6], "705": 6, "71": 6, "7152": [6, 9], "720": 5, "73": 6, "73402": 6, "74": 5, "746": 6, "75": [5, 6], "7536": 6, "75th": 10, "76": [5, 6], "769": 6, "77": 6, "77516": [5, 6], "776705221": [5, 6], "7778": 6, "79": [5, 6], "8": [5, 6, 8], "80": [5, 6], "808080": 6, "809": 6, "81": 6, "815": 6, "82": 5, "8213": 6, "83": 6, "832": 5, "83311": [5, 6], "84": 10, "8409": 6, "85": [5, 6], "850675": 6, "8601": 5, "87": 6, "87it": 5, "88it": 5, "89": [5, 6], "8a": 8, "8b": 8, "8c": 8, "8d": 2, "9": [5, 6, 8, 9], "90": [2, 5, 6, 9], "9076": 6, "91": [5, 6], "912323": 6, "923": 6, "93": 6, "936876": 6, "939": 6, "94": 6, "9468": 6, "95": [5, 6, 9, 10], "955": 6, "96": [5, 6, 9], "961427355": 6, "963": 5, "966": 5, "97": 5, "97261": 6, "98": 5, "984": 6, "99": [5, 6, 10], "A": [1, 2, 5, 6, 7, 9, 10], "As": 10, "By": [2, 6, 10], "For": [5, 6, 7, 10], "If": [1, 2, 5, 6, 10], "In": [5, 6, 10], "Into": 6, "It": [2, 5, 6, 7, 10], "No": [2, 6, 10], "Not": [5, 6], "One": [2, 10], "The": [1, 2, 
5, 6, 7, 8], "Then": [5, 10], "There": 10, "These": [2, 5, 6, 10], "To": [2, 6, 10], "With": [4, 8], "_": [1, 6, 10], "__": 1, "___": 1, "____": 1, "_____": 1, "_c": 10, "_cutoff": 6, "_plotli": 2, "_w_cutoff": 6, "ab": 6, "abil": [2, 6], "abl": 6, "abov": [2, 6], "absolut": [2, 6], "academ": 0, "accept": [2, 6], "access": [6, 10], "accord": [2, 6, 10], "accordingli": 5, "account": [2, 6], "accur": [2, 6], "accuraci": [5, 10], "achiev": 10, "acknowledg": [2, 8], "across": [2, 5, 6, 10], "act": 10, "actual": 6, "ad": [2, 6, 8, 10], "adapt": [2, 5], "add": [5, 6, 8], "add_best_fit_lin": 6, "add_id": [5, 8], "addit": [2, 6], "addition": [5, 6, 7], "address": [2, 6, 7, 10], "adher": [2, 6], "adjust": [2, 5, 6, 10], "adm": [5, 6], "advanc": [5, 6], "advis": 6, "aesthet": [2, 6], "affect": 6, "after": [2, 5, 6, 10], "ag": [5, 6], "against": [2, 6], "age_boxcox": 6, "age_boxcox_alpha": 6, "age_boxcox_kde_cutoff": 6, "age_boxplot_list": 6, "age_group": [5, 6], "age_robust": 6, "ages_18_to_40": 5, "aggreg": 6, "alic": 5, "alien": 1, "align": [2, 5, 6, 10], "all": [1, 2, 5, 7, 8, 10], "all_combin": 5, "all_var": 6, "allow": [2, 5, 6, 10], "alon": 10, "along": [2, 6], "alongsid": 6, "alpha": [2, 6, 10], "alphabet": 5, "alreadi": 5, "also": [0, 2, 6], "alter": 6, "altern": [6, 10], "alwai": [2, 5, 6], "ambigu": 6, "among": 2, "amount": 5, "an": [0, 2, 4, 5, 6, 10], "analysi": [2, 7, 8, 10], "analyst": 7, "analyt": 4, "analyz": [2, 5, 6], "angl": [2, 6], "ani": [2, 5, 6, 7, 10], "annot": [2, 6], "anomali": [6, 7], "anoth": [6, 10], "anyth": 2, "appar": [6, 10], "appeal": 6, "appear": [2, 5, 6], "append": [5, 6], "appli": [0, 2, 4, 5, 6, 7, 10], "applic": [2, 6, 8], "apply_as_new_col_to_df": 6, "apply_cutoff": 6, "approach": [2, 5, 6, 10], "appropri": [2, 6, 10], "approxim": [6, 10], "ar": [1, 2, 5, 6, 10], "arcsinh": 6, "area": 6, "arg": 2, "argument": [2, 6], "arima": 10, "aros": 2, "around": [2, 6, 10], "arrai": [2, 6], "arrang": 6, "arrow": 6, "art": 8, "art_nam": 1, 
"artifact": 5, "artifici": 4, "artwork": 1, "ascii": 8, "ascii_art": 1, "asian": 5, "aspect": [2, 6, 7], "assess": [6, 10], "assign": [2, 5, 6], "associ": [7, 10], "assum": [5, 10], "assumpt": [6, 8], "astyp": 2, "attempt": [5, 6], "attent": 6, "attract": 6, "attribut": 6, "aug": 3, "author": [3, 4], "auto": [5, 6], "autofit": 5, "autom": [4, 7], "automat": [1, 2, 5, 6, 7], "autoregress": [6, 9], "avail": [1, 8], "aveoccup": 6, "averag": [6, 10], "averoom": 6, "avoid": [2, 6], "ax": [2, 6], "axi": [2, 6], "azimuth": 6, "bachelor": [5, 6], "back": [2, 5, 6, 10], "backbon": 5, "background": [1, 5], "background_color": [2, 5], "backward": 2, "badg": 2, "balanc": 6, "band": 6, "bandwidth": 10, "bar": [5, 7, 8], "barebon": 6, "barh": 6, "barri": [6, 9], "base": [1, 2, 5, 6, 10], "base_path": 5, "baselin": 6, "basic": 6, "bb": 1, "bbox_inch": 6, "becaus": [5, 6], "becom": 10, "been": [2, 5, 6], "befor": [2, 5, 6, 7, 10], "begin": [5, 10], "behav": 10, "behavior": [1, 2, 6], "being": [2, 5, 6], "bell": 10, "belong": 5, "below": [2, 5, 6], "beneath": 6, "benefici": 6, "benefit": 8, "best": [2, 7, 8, 10], "best_fit_linecolor": 6, "best_fit_linestyl": 6, "beta": 2, "better": [2, 6, 7, 10], "between": [2, 5, 6, 10], "bin": [2, 6, 8, 10], "bin_ag": 5, "binrang": 6, "binwidth": [2, 6], "biolog": 10, "black": [1, 5, 6], "block": [2, 6], "blue": 6, "bob": 5, "bold": 5, "bool": [1, 5, 6], "boolean": [2, 6], "borderless": 5, "both": [1, 2, 5, 6, 10], "bound": [5, 6], "boundari": 5, "box": [2, 7, 8], "box_violin": 6, "box_violin_kw": 6, "box_violin_plot": [2, 6, 8], "box_violin_ylim": 6, "boxcox": 6, "boxplot": [2, 6], "boxprop": 6, "breakdown": [6, 10], "brief": 2, "bring": [4, 6], "broad": [2, 7], "brown": 6, "browser": 6, "bug": 2, "built": [2, 6], "bulk": 6, "c": 6, "c0": 6, "c5gp7": [5, 6, 7, 9], "c_i": 10, "ca": 8, "ca_state_bb": 1, "ca_state_wb": 1, "calcul": [2, 6, 8], "california": [1, 4, 6], "call": [2, 5], "camera": [2, 6], "can": [2, 5, 6, 7, 10], "cannot": [5, 10], 
"cap": 6, "capabl": [1, 2, 5], "capit": [5, 6], "captur": 10, "career": 0, "case": [2, 5, 6, 10], "categor": [2, 5, 6], "categori": [5, 8], "caus": 2, "cbar_label": 6, "cbar_thick": [2, 6], "cbar_x": [2, 6], "cbrt": 6, "cdot": 10, "cell": 5, "censu": [6, 7, 8, 9], "census_id": [5, 6], "census_summary_t": 5, "center": [6, 8], "center_baselin": 6, "central": [6, 10], "centric": 8, "certain": 6, "certifi": 2, "chang": [6, 8, 10], "changelog": 8, "charact": [5, 6], "characterist": [6, 7], "charli": 5, "chart": 6, "check": [2, 5, 6], "chi": 10, "choic": 2, "choos": [2, 5, 6, 10], "chosen": 10, "ci": 10, "circl": [], "citat": 2, "cite": 8, "civ": [5, 6], "clariti": [2, 6], "clean": [2, 5, 6, 7], "cleaner": [2, 5, 6], "cleanup": 2, "clear": [2, 6, 10], "clearer": [2, 6], "clearli": [2, 6], "cleric": [5, 6], "close": 10, "closer": 10, "clutter": 6, "cmap": [2, 6], "code": [2, 5, 6, 10], "codebas": 2, "coeffici": [6, 8], "cohes": 6, "col": [2, 5, 6], "col1": 2, "col2": 2, "collabor": 4, "collect": 8, "colleg": 6, "collis": [2, 5], "color": [2, 5, 6], "colorbar": 6, "colormap": [2, 6], "column": [2, 6, 8], "column_nam": 5, "combin": [2, 7, 8, 10], "come": 2, "comment": 2, "common": [2, 5, 6, 7], "commonli": 10, "compar": [2, 6, 10], "comparison": 6, "compat": [2, 6], "complement": 10, "complementari": 10, "complet": [5, 10], "complex": [2, 6, 10], "compon": 8, "comprehens": [2, 5, 6, 7, 10], "compress": 6, "comput": [2, 5, 6, 9], "concept": [6, 10], "concern": 10, "concis": 2, "condit": [2, 6, 10], "condition": 2, "confid": [6, 8], "configur": [2, 6], "confirm": [2, 6], "conflict": 6, "confus": [2, 6], "consecut": 5, "consid": [6, 10], "consider": 8, "consist": [2, 5, 6, 10], "consolid": 2, "constant": [5, 6, 10], "constitut": 5, "constrain": [6, 10], "constraint": 5, "construct": 10, "contain": [1, 2, 5, 6, 10], "content": [2, 5, 6], "context": 10, "conting": [2, 6, 7, 8], "contingency_t": [5, 8], "continu": [2, 5, 6, 10], "contour": 10, "contrast": 6, "contributor": 8, 
"control": [2, 6], "convei": 6, "convers": [2, 5, 8], "convert": [2, 5, 6], "coolwarm": [2, 6], "coordin": 2, "cornel": 4, "correct": [2, 6, 7], "correctli": [1, 2, 5, 6], "correl": [2, 8], "correspond": [2, 5, 6], "could": 6, "count": [2, 5, 8, 10], "countri": 5, "cours": 4, "cov": 10, "covari": 10, "cox": [2, 8], "creat": [1, 2, 7, 8], "creation": 6, "critic": [2, 6, 10], "crop": 6, "cross": 2, "crosstab": [2, 8], "crosstab_age_incom": 6, "crosstab_age_sex": 6, "crosstab_df": 2, "crosstabs_onli": 6, "crucial": [5, 6, 7], "cube": 6, "cumul": 6, "current": [1, 5], "curv": [6, 10], "custom": [2, 6, 7, 8], "custom_ord": 6, "customiz": [2, 6, 7], "cut": 5, "cutoff": [2, 8], "d": [2, 6, 9], "dai": 5, "dark": 6, "dashboard": 6, "data": [0, 4, 7, 9], "data_doctor": [2, 6, 8], "data_fract": 6, "data_nam": 5, "data_output": 5, "data_path": 5, "data_typ": 2, "datafram": [2, 6, 7, 8], "dataframe_column": [5, 8], "dataset": [5, 6, 7, 10], "date": [2, 7, 8], "date_column": 5, "date_str": 5, "datetim": 2, "david": [5, 10], "dd": 5, "deal": [5, 6, 10], "decad": 4, "decid": 6, "decim": [2, 5], "decimal_plac": [2, 5], "decis": [2, 6], "decreas": 10, "dedic": 0, "dedupl": 8, "deeper": 6, "deepest": 0, "default": [1, 2, 5, 6], "defin": [2, 5, 6, 10], "definit": [5, 8], "degre": [2, 6, 10], "demograph": 6, "demonstr": [5, 6, 7], "denot": 10, "densiti": [2, 8], "depend": [2, 5, 7, 8], "deprec": 2, "depth": [2, 6], "deriv": 10, "desc": 2, "descend": [5, 6], "describ": [2, 6], "descript": [2, 6, 8], "design": [2, 5, 6, 7, 10], "desir": [1, 6], "despit": 6, "detail": [1, 2, 6, 7, 8], "detect": [5, 6, 8], "determin": [2, 5, 6, 10], "dev": 6, "develop": [4, 10], "deviat": [2, 7, 8, 10], "df": [2, 5, 6], "df_censu": 5, "df_dict": 5, "df_num": 6, "diagon": 6, "dict": [5, 6], "dictionari": [1, 5, 6], "did": 2, "diego": [0, 4], "differ": [2, 6, 10], "digit": [2, 5], "dimens": 6, "dimensionless": 10, "dir": 5, "direct": [6, 10], "directli": [2, 5, 6, 7], "directori": [1, 6, 7, 8], "disabl": [2, 
6], "disable_sci_not": [2, 6], "discov": 7, "discret": [5, 10], "dispers": 10, "displai": [1, 2, 5, 6], "distinct": [2, 5, 6], "distinguish": 6, "distort": 6, "distract": 6, "distribut": [2, 5, 7, 8, 10], "dive": 5, "divers": [2, 6], "divid": [5, 6, 10], "divorc": [5, 6], "do": [2, 5, 6], "docstr": [2, 6], "doctor": 6, "document": [1, 2, 5, 6, 8, 10], "doe": [2, 5, 6, 10], "doi": [3, 5, 6, 7, 9], "domin": 5, "don": 6, "done": [6, 10], "dot": 10, "doubl": 6, "down": 6, "downplai": 6, "downscal": 10, "dr": 0, "draw": 6, "driven": 5, "dtype": 5, "due": [2, 5, 6], "duplic": 2, "dure": [0, 2, 5], "dx_": 10, "dx_c": 10, "dynam": [2, 6], "e": [2, 5, 6, 10], "each": [1, 2, 5, 6, 8], "eas": [2, 5, 7], "easi": [2, 6, 7], "easier": [2, 6, 10], "easili": 6, "ebrahim": 0, "ecosystem": 2, "eda": [1, 2, 6], "eda_toolkit": [2, 5, 6, 7], "eda_toolkit_logo": 1, "edg": [2, 6], "edgecolor": [2, 6], "educ": [0, 4, 5, 6], "effect": [2, 4, 5, 7, 8, 10], "effici": [2, 6], "either": [2, 5, 6], "element": [2, 5, 6], "elev": 6, "elimin": 2, "els": 2, "emp": [5, 6], "emphas": [2, 6], "emploi": 6, "employ": 5, "empti": [2, 5], "enabl": [2, 6, 7], "enable_zoom": [2, 6], "encount": 6, "end": [2, 5, 10], "endeavor": 0, "endpoint": 5, "engin": [0, 5, 6, 9], "enhanc": [5, 6, 7, 8, 10], "enough": 2, "ensembl": 6, "ensu": 7, "ensur": [1, 2, 5, 6, 7], "ensure_directori": [5, 8], "enter": [2, 5], "entir": [5, 6, 10], "entri": [2, 5, 6], "environ": [0, 5, 6, 8, 9], "equal": [6, 10], "equat": 6, "equival": 5, "error": [2, 5, 6], "especi": [2, 5, 6, 10], "essenti": [5, 7, 10], "estim": [6, 8], "etc": [6, 7], "ev": 5, "evalu": 10, "even": [2, 5], "everyth": 6, "exact": [2, 6], "exactli": 6, "examin": 6, "exampl": [2, 7, 8], "exce": 5, "excel": [4, 7, 8], "except": [0, 2, 5], "excess": 6, "exclud": [5, 8], "exclude_combin": 6, "exclus": [5, 6], "exec": [5, 6], "execut": 6, "exhaust": 6, "exist": [1, 2, 5], "exp": [6, 10], "expand": 2, "expect": [2, 10], "expenditur": 10, "experi": [2, 4], "experienc": 2, 
"explain": [2, 5], "explan": [2, 6, 8], "explicit": 2, "explicitli": 2, "explor": [2, 6, 7, 10], "exploratori": [6, 7], "exponenti": 6, "export": [6, 7], "express": [0, 10], "extend": [0, 6], "extens": [1, 2, 6], "extract": [2, 6], "extrem": [2, 6, 10], "f": [2, 6, 10], "f8766d": 6, "f8c5c8": 5, "facecolor": 6, "facilit": [2, 4, 5, 6, 7], "factor": 6, "failur": 2, "fall": [5, 6, 10], "fallback": 2, "fals": [1, 2, 5, 6], "famili": [5, 6, 10], "far": 6, "farm": 6, "fashion": 6, "featur": [5, 8, 10], "feature_nam": 6, "feature_names_list": [2, 6], "feature_proport": 6, "feder": 5, "feedback": 2, "femal": [5, 6], "female_": 6, "fetch": 6, "fetch_california_h": 6, "few": [6, 10], "ff0000": 6, "field": 6, "figsiz": [2, 6], "figur": [2, 6], "file": [1, 2, 5, 6], "file_nam": 5, "file_path": 5, "file_prefix": [2, 6], "filenam": [2, 6], "fill": [2, 6], "fill_alpha": [2, 6], "fillna": 2, "filter": [5, 6], "filtered_df": 5, "final": [5, 6], "financi": [4, 6], "find": [2, 5, 10], "fine": 6, "finer": 6, "first": [2, 5, 6, 10], "fish": 6, "fit": [2, 5, 7, 8], "five": 6, "fix": [6, 8], "flag": 2, "flex_corr_matrix": [2, 6, 8], "flexibl": [1, 2, 6, 10], "flip": 6, "float": [2, 5, 6], "fnlwgt": [5, 6], "fnlwgt_w_cutoff": 6, "focu": 6, "focus": 6, "folder": 5, "follow": [1, 2, 5, 6, 7, 10], "font": [2, 6], "fontsiz": 2, "form": [5, 7], "format": [2, 6, 7, 8], "formatth": 6, "former": 5, "formerli": 2, "formula": 10, "found": [1, 2, 5, 6], "foundat": 8, "four": 6, "frac": [6, 10], "fraction": 8, "framework": 6, "freedom": [6, 10], "frequenc": [2, 5, 6, 10], "frequent": 5, "friendli": [2, 5], "from": [0, 1, 2, 4, 5, 6, 7, 10], "full": [2, 5, 8, 10], "fuller": 6, "fulli": 6, "func_col": [2, 6], "function": [1, 5, 7, 8, 10], "further": [2, 5, 6], "futur": [2, 6], "futurewarn": 6, "g": [2, 5, 6, 10], "gain": [5, 6, 7], "gaussian": [6, 8], "gener": [2, 6, 7, 8], "georg": 10, "geq": [5, 10], "get": 7, "get_legend": 2, "get_text": 2, "gil": [1, 3, 4], "github": 7, "give": [2, 6], "given": 
[2, 5, 6, 10], "glanc": 6, "go": 5, "goal": 10, "got": 2, "gov": [5, 6], "grace": 2, "gracefulli": 2, "grad": [5, 6], "gradient": 6, "gradientboostingregressor": 6, "graduat": 0, "grai": 6, "granular": 2, "graphic": [6, 9], "gratitud": 0, "greater": [2, 5, 6], "green": 6, "grei": [1, 6], "grey_alien_wb": 1, "grid": [2, 8, 10], "grid_figs": 6, "grid_resolut": 6, "grid_valu": 6, "ground": 6, "group": [2, 5, 8], "growth": 6, "gt": 6, "guarante": 2, "guid": [0, 7], "guidanc": 2, "guidelin": 6, "h": [5, 6, 10], "h_pad": 6, "ha": [2, 4, 5, 6, 10], "half": 6, "hall": 1, "halt": 6, "halv": 6, "handl": [1, 2, 5, 6, 7, 10], "handler": [5, 6], "hat": 10, "have": [2, 5, 6, 10], "he": 4, "head": [5, 6], "header": [2, 5], "health": 4, "healthcar": 4, "heatmap": [2, 6], "height": 6, "help": [2, 5, 6, 7, 10], "here": [5, 6, 10], "heteroscedast": 10, "hex": [2, 5], "hi": 0, "hidden": 6, "hide": [2, 5], "hide_index": [2, 5], "high": [2, 5, 6], "higher": [6, 7], "highest": 5, "highli": [6, 10], "highlight": [2, 6, 8, 10], "highlight_column": [5, 8], "highlighted_df": 5, "hist": [2, 6], "hist_color": 6, "hist_edgecolor": [2, 6], "hist_kw": 6, "hist_ylim": 6, "histogram": [2, 8], "histplot": 6, "hold": [4, 5, 6], "homoscedast": 10, "horizont": [2, 6], "hour": [5, 6], "hous": 8, "houseag": 6, "household": 6, "hover": 6, "how": [1, 5, 6, 7, 10], "howev": [2, 6, 10], "html": [5, 6], "html_file_nam": [2, 6], "html_file_path": [2, 6], "http": [3, 5, 6, 7, 9], "huber": 6, "hue": [2, 6], "hue_dict": 6, "hue_palett": 6, "hunter": [6, 9], "husband": [5, 6], "hyperbol": 6, "hyperlink": 5, "hypothes": 7, "i": [1, 2, 4, 5, 6, 8, 10], "icon": 2, "id": [5, 6, 7, 8], "id_colnam": 5, "idea": 5, "ideal": 6, "identif": 2, "identifi": [2, 6, 7, 8], "ignor": 6, "illustr": [1, 6, 10], "imag": [1, 5, 6], "image_filenam": 6, "image_path_png": [2, 5, 6], "image_path_svg": [2, 5, 6], "imbal": 5, "immedi": 6, "impact": [2, 6, 10], "implement": [2, 5, 10], "import": [1, 2, 5, 6], "imposs": 5, "improv": [5, 8, 
10], "inc": [5, 6], "inch": 6, "includ": [2, 5, 6, 7, 10], "inclus": 5, "incom": [6, 7, 8, 9, 10], "inconsist": [2, 5], "incorpor": [2, 6], "incorrect": [2, 6], "incorrectli": 6, "increas": [2, 5, 6, 10], "increment": 2, "inde": 6, "independ": 2, "index": [2, 5, 6], "indic": [2, 5, 6, 10], "individu": [2, 5, 6, 10], "individual_figs": 6, "industri": 4, "inf": 5, "infer": 10, "infin": 5, "influenc": [2, 6, 10], "influenti": 6, "inform": [2, 5, 6, 7], "infti": 10, "initi": [2, 6, 7], "inner": 6, "input": [1, 2, 6, 10], "insight": [5, 6, 7, 10], "inspct": 6, "inspect": 6, "instal": [5, 8], "instanc": [5, 6, 10], "instead": [2, 6, 10], "instruct": [5, 6, 7], "insuffici": 8, "int": [2, 5, 6, 10], "int64": 5, "intact": 6, "integ": [2, 5], "integr": [2, 6, 7], "intellig": 4, "intend": [2, 6], "intent": 6, "interact": [1, 2, 8, 10], "interest": [6, 10], "interfac": [2, 6], "intern": [2, 6], "interpret": [2, 6, 10], "interquartil": [6, 10], "interv": [5, 6, 8], "introduc": [2, 5], "introduct": 2, "intuit": [2, 6, 7, 10], "invalid": [2, 6], "invalu": 10, "invers": [6, 10], "investig": 7, "involv": [5, 6, 7], "io": 5, "ipykernel": 2, "ipython": 2, "iqr": [6, 8], "irrelev": 6, "is_notebook_env": 2, "island": 5, "iso": 5, "issu": [2, 7, 10], "item": 6, "iter": [2, 6, 10], "its": [2, 5, 6, 10], "itself": 6, "j": [6, 9], "jinja2": 7, "johnson": [6, 8], "join": 5, "joint": [6, 10], "jointli": 6, "joss": [6, 9], "journal": [6, 9], "journei": 0, "jupyt": [2, 5], "just": 6, "justifi": 10, "k": [6, 9, 10], "kde": [2, 7, 8], "kde_color": 6, "kde_density_single_distribut": 6, "kde_distribut": [2, 6, 8], "kde_kw": 6, "kde_ylim": 6, "kdeplot": 6, "keep": 6, "kei": [1, 2, 5, 6, 8, 10], "kernel": [6, 8], "keyboard": 6, "keyerror": 6, "keyword": [2, 6], "kind": 6, "known": 6, "kohavi": [5, 6, 7, 9], "kwarg": [2, 6], "l": [3, 10], "label": [2, 5, 6], "label_ag": 5, "label_fonts": [2, 6], "label_nam": 6, "lambda": [6, 8], "larg": [2, 5, 6], "larger": 6, "largest": [], "last": 5, "later": 6, 
"latest": 2, "layout": [2, 6], "ldot": 10, "lead": [6, 10], "learn": [0, 2, 4, 5, 6, 7, 9, 10], "learning_r": 6, "least": [2, 5, 6, 10], "leav": 2, "lectur": 4, "left": [5, 6, 10], "left_margin": [2, 6], "legend": [2, 6], "legend_label": 6, "legend_labels_list": 6, "legibl": 6, "len": 2, "length": [2, 5, 6], "leon": 1, "leon_shpaner_bb": 1, "leon_shpaner_wb": 1, "leonid": [3, 4], "leq": 5, "less": [2, 5, 6, 10], "let": 10, "letter": [6, 9], "level": [5, 6, 10], "leverag": [2, 6, 7], "librari": [2, 5, 6, 8], "licens": 2, "lie": [6, 10], "lightblu": 6, "like": [0, 2, 5, 6, 10], "likelihood": 10, "limit": [2, 6], "line": [2, 7, 8, 10], "linear": [6, 10], "linestyl": 6, "link": 8, "list": [1, 2, 5, 6], "lmbda": 6, "ln": 10, "load": [5, 6], "local": 5, "locat": [5, 6], "log": [2, 6, 10], "log_scale_var": [2, 6], "logarithm": [6, 10], "logic": [2, 5, 6], "logist": 6, "logit": 8, "logo": [1, 2], "logscal": 6, "long": 6, "longer": 6, "look": 6, "loop": [2, 6], "lose": 10, "loss": [2, 5, 6], "lower": [2, 6], "lower_cutoff": 6, "lr": 10, "lt": [5, 6], "m": [0, 4, 6, 9], "machin": [2, 4, 5, 6, 7, 9, 10], "made": [2, 10], "magnitud": 6, "mai": [2, 5, 6, 10], "main": 7, "maintain": [2, 6, 8], "major": [5, 10], "make": [2, 5, 6, 10], "male": [5, 6], "male_": 6, "manag": [2, 4, 6, 7, 10], "manageri": [5, 6], "mani": [6, 7, 10], "manipul": 7, "manner": 6, "manual": [2, 6], "map": [2, 6, 10], "marco": 0, "margin": [2, 6, 10], "marit": [5, 6], "mark": [2, 6], "marker": 6, "marri": [5, 6], "master": 4, "match": [1, 2, 6], "mathbb": 10, "mathbf": 10, "mathemat": [5, 8], "matplotlib": [2, 6, 7, 9], "matplotlib_colormap": 6, "matric": [2, 8], "matrix": [2, 8], "max": [2, 6, 10], "max_col": 6, "max_depth": 6, "max_unique_valu": 5, "max_unique_value_pct": 5, "max_unique_value_tot": 5, "maxab": 6, "maxim": 10, "maximum": [6, 10], "mcse": [6, 9], "mean": [2, 5, 7, 8, 10], "mean_color": 6, "meaning": [5, 6], "measur": [5, 6, 10], "mechan": 2, "median": [2, 7, 8], "median_color": 6, "medinc": 
6, "meet": [5, 7, 10], "mentor": 0, "mentorship": 0, "messag": [2, 6], "method": [2, 5, 6, 7, 10], "methodologi": 8, "metric": 6, "metrics_box_violin": 2, "metrics_comp": 6, "metrics_list": 6, "mid": 10, "middl": 10, "might": [6, 10], "min": [2, 6, 10], "min_length": 5, "mind": [6, 7], "minim": [2, 6], "minimum": [5, 6], "minmax": 6, "minor": 2, "minu": 6, "misalign": 6, "misinterpret": 6, "mislead": 2, "miss": [1, 2, 5, 6, 7], "mix": 8, "mle": 10, "mm": 5, "mode": [2, 6], "model": [2, 6, 7, 8], "model_select": 6, "modifi": [2, 6], "modul": [1, 2], "modulenotfounderror": 2, "month": [3, 5], "more": [2, 5, 6, 10], "most": [2, 5, 6, 7], "mous": 6, "move": [2, 6], "mu": 10, "mu_i": 10, "mu_x": 10, "much": 10, "multi": 6, "multidimension": 6, "multipl": [1, 2, 5, 6, 7, 10], "multipli": 2, "must": [5, 6, 10], "my_datafram": 2, "n": 10, "n_col": 6, "n_estim": 6, "n_row": [2, 6], "na": [2, 5], "name": [1, 2, 6, 8], "nan": [2, 5, 6], "narrow": [6, 10], "nativ": 5, "natur": [6, 10], "navig": [5, 6], "nbformat": 7, "ndarrai": 6, "neatli": 2, "necessari": [2, 5, 10], "need": [1, 2, 5, 6, 7, 10], "neg": [6, 10], "neither": [2, 6], "neq": 10, "nest": 6, "neutral": 6, "never": [5, 6], "new": [5, 6, 8], "newer": 6, "next": [5, 6], "nh": 10, "nomenclatur": 2, "non": [1, 2, 5, 8, 10], "none": [1, 2, 5, 6], "nonetyp": 2, "nor": [2, 6], "normal": [2, 8], "notat": [2, 6], "notebook": [2, 5], "noth": [5, 10], "notic": [5, 6], "now": [2, 6], "np": [2, 6], "null": [2, 5], "null_pct": 5, "null_tot": 5, "num": [5, 6], "num_digit": 5, "number": [2, 5, 6, 10], "numer": [2, 6, 8], "numpi": [2, 6, 7], "nuniqu": 5, "o": [3, 5, 6], "object": [2, 5, 6], "observ": [8, 10], "obviou": 6, "occup": [5, 6], "occur": [2, 5], "occurr": 5, "odd": [6, 10], "off": 10, "offer": [2, 6, 7, 10], "often": [6, 7, 10], "ol": 10, "older": [2, 6], "omit": [2, 6], "one": [1, 2, 5, 6, 10], "ones": 6, "onli": [2, 6, 10], "op": 6, "opaqu": 6, "open": [6, 9, 10], "oper": [2, 5, 6, 10], "opportun": 5, "optim": [2, 6, 10], 
"option": [1, 2, 5, 6, 7], "orang": 6, "order": [2, 5, 6], "ordinari": 10, "org": [3, 5, 6, 7, 9], "organ": [2, 6], "orient": 6, "origin": [5, 6, 10], "original_df": 5, "oscar": [1, 3, 4], "oscar_gil_bb": 1, "oscar_gil_wb": 1, "other": [5, 6, 8, 10], "otherwis": 6, "our": [0, 10], "out": [6, 10], "outcom": [6, 10], "outlier": [2, 7, 8, 10], "outlin": 1, "output": [1, 2, 5, 6, 10], "output_fil": 1, "output_path": 1, "outsid": [2, 6, 10], "over": [2, 4, 6, 10], "overal": [2, 6], "overcompl": 6, "overhead": 2, "overlai": 6, "overlaid": 6, "overlap": 6, "overrid": 6, "overview": [6, 10], "own": 6, "p": [6, 10], "pac": 5, "pace": [6, 9], "packag": 7, "pad": [2, 6], "page": [6, 7], "pair": [5, 6], "pairwis": 6, "palett": 6, "panda": [2, 5, 6, 7], "param": 2, "paramet": [1, 2, 5, 6, 10], "parametr": 10, "pardir": 5, "parent": 5, "pars": 5, "parse_date_with_rul": [5, 8], "part": 5, "partial": [2, 8], "partial_depend": 6, "particular": [], "particularli": [2, 5, 6, 10], "pass": [2, 6], "path": [1, 2, 6, 8], "patient": 5, "pattern": [6, 7], "pd": [5, 6, 10], "pdf": 10, "pdp": [6, 10], "pearson": [6, 8], "per": [5, 6], "percent": [2, 6], "percentag": [5, 6], "percentil": [6, 10], "perfect": 10, "perfectli": 6, "perform": [2, 5, 6], "performancewarn": 2, "period": [2, 6, 8], "person": 4, "perspect": [2, 6], "pi": 10, "pictur": 6, "pink": 5, "pip": 7, "pitfal": 2, "pivot": [0, 8], "place": [2, 5], "plai": 0, "plain": [2, 5, 8], "plot": [2, 7, 10], "plot_2d_pdp": [2, 6, 8], "plot_3d_pdp": [2, 6, 8], "plot_mean": 6, "plot_median": 6, "plot_typ": [2, 6], "plotli": [2, 6, 7], "plotly_colormap": 6, "plots_onli": 6, "plt": 2, "pm": 10, "png": [2, 5, 6], "png_imag": 5, "point": [2, 6, 10], "pointer": 6, "pool": [5, 8], "pool_siz": 2, "popul": 6, "popular": 7, "posit": [2, 6, 10], "possibl": [2, 5, 6, 7, 10], "potenti": [2, 5, 6, 10], "power": [2, 6, 10], "pr": 2, "practic": [6, 8], "practition": 10, "pre": 2, "preced": 6, "predefin": 1, "predict": [2, 6, 10], "prefer": [2, 6], 
"prefix": [2, 6], "prepar": [2, 5, 6], "preprocess": [5, 6], "prerequisit": 8, "presenc": 6, "present": [2, 5, 6], "preserv": [5, 6, 10], "preval": 6, "prevent": [2, 5, 6], "previou": [2, 6], "previous": 2, "price": 6, "primari": 6, "print": [1, 2, 5, 6], "print_art": [1, 8], "prior": 6, "privat": [5, 6], "probabl": [2, 6, 9, 10], "proceed": [5, 6], "process": [2, 5, 6, 7], "produc": [2, 6], "product": 10, "prof": [5, 6], "profession": 4, "profil": 10, "program": [0, 4], "programmat": 6, "progress": [5, 8], "project": [2, 4, 5, 6, 8], "promin": 6, "proper": [2, 5, 6], "properli": [2, 6], "properti": [6, 8], "proport": [2, 5, 6, 10], "provid": [0, 1, 2, 5, 6, 7, 10], "public": 6, "publish": 3, "purpl": 6, "purpos": [2, 6, 8], "pursu": 0, "py": [1, 2], "pypi": [2, 7], "python": [2, 4, 6, 8], "q1": 6, "q2": 6, "q3": 6, "q4": 6, "q_1": 10, "q_3": 10, "qualiti": [6, 7], "quantifi": [6, 10], "quantil": 6, "quantile_rang": 6, "quantit": 6, "quantiti": 6, "quartil": 6, "quick": [], "quickli": [6, 7], "r": [4, 5, 6, 7, 9, 10], "r_": 10, "race": 5, "racial": 5, "rais": [1, 2, 5, 6, 10], "random": [2, 5, 6], "random_st": 6, "rang": [2, 5, 6, 7, 10], "rather": [6, 10], "ratio": 10, "raw": 6, "re": [2, 6, 10], "read": 5, "readabl": [2, 5, 6], "readi": 7, "readm": 2, "real": [2, 10], "reciproc": 6, "recommend": [5, 6], "record": 5, "red": 6, "reduc": [2, 6, 10], "redund": 6, "refactor": 2, "refer": [6, 8], "referenc": 1, "refin": [2, 6], "reflect": [2, 6, 10], "regardless": 6, "regener": 2, "regress": [8, 10], "regular": [2, 8], "rel": 10, "relat": [2, 5, 6], "relationship": [5, 6, 7, 10], "releas": 2, "relev": [2, 6, 7], "reli": 6, "reliabl": [5, 6, 10], "relianc": 2, "remain": [2, 5, 6, 10], "remov": [2, 6, 7, 8, 10], "remove_stack": [2, 6], "renam": [2, 6], "render": [2, 5], "repeat": 10, "replac": [2, 5], "replica": 2, "report": [4, 6, 7], "repositori": [5, 6, 7, 9], "repres": [2, 5, 6, 10], "represent": [1, 2, 6], "reproduc": [2, 5, 6], "requir": [2, 5, 6, 7, 10], "rescal": 
6, "research": 7, "reset": 2, "residu": 10, "resolut": 6, "resolv": [2, 5], "respect": [2, 5, 6, 10], "respons": 10, "rest": 6, "result": [2, 5, 6, 10], "result_df": 2, "retain": [2, 8, 10], "retri": 2, "retriev": 6, "return": [2, 5, 6], "return_df": [2, 5], "return_dict": 6, "reveal": 6, "rich": [6, 7], "right": [5, 6, 10], "right_margin": [2, 6], "riversid": 4, "robust": [2, 6, 10], "robustscal": [8, 10], "role": [0, 2, 6], "root": [2, 6, 10], "rot": 6, "rotat": [2, 6], "rotate_plot": 6, "round": 5, "row": [2, 5, 6], "royc": 1, "royce_hal": 1, "royce_hall_bb": 1, "royce_hall_wb": 1, "rule": [5, 6], "run": [2, 5, 6, 7], "runtim": 2, "s0167": [6, 9], "same": [2, 6], "sampl": [5, 8, 10], "sampled_df": 6, "san": [0, 4, 8], "save": [1, 2, 6, 7, 8], "save_dataframes_to_excel": [2, 5, 8], "save_format": [2, 6], "save_plot": [2, 6], "scalabl": 2, "scale": [2, 8], "scale_convers": [2, 6], "scale_conversion_kw": [2, 6], "scatter": [2, 7, 8], "scatter_color": 6, "scatter_fit_plot": [2, 6, 8], "scatterplot": 6, "scenario": [2, 6, 10], "scheme": 6, "school": 0, "scienc": [0, 4, 5, 6, 7, 9], "scientif": [2, 6], "scientist": [0, 4, 7], "scikit": [2, 6, 7, 10], "scope": 6, "score": 6, "scroll": 6, "seaborn": [2, 6, 7, 9], "seamless": [2, 6], "seamlessli": [2, 7], "second": [5, 6], "section": [2, 5, 6], "see": 6, "seed": [2, 5, 6], "seen": 6, "select": [2, 6, 10], "select_dtyp": 6, "self": [5, 6], "sensit": 10, "separ": [2, 5, 6], "sequenc": 6, "seri": [2, 5, 6, 10], "serv": [2, 4, 6], "servic": 4, "session": 2, "set": [2, 5, 6, 10], "set_as_index": 5, "set_titl": 2, "setminu": 10, "setp": 2, "setup": [2, 5, 6], "sever": [2, 6], "sex": [5, 6], "shape": [5, 6, 10], "sheet": 5, "shift": 6, "shilei": 0, "should": [1, 6], "show": [2, 5, 6, 10], "show_cbar": 6, "show_correl": 6, "show_legend": [2, 6], "show_modebar": [2, 6], "show_plot": 6, "showcas": 6, "shown": 6, "shpaner": [1, 3, 4], "shpaner_2024_13162633": 3, "shrink": 2, "side": 6, "sigma": 10, "sigma_i": 10, "sigma_x": 10, 
"sign": 6, "signal": 10, "signatur": 2, "signific": [2, 5, 10], "significantli": 2, "silver": 6, "similar": 6, "similarli": [6, 10], "simpl": 6, "simpler": 2, "simplic": [6, 7], "simplif": 2, "simplifi": [2, 5, 10], "simultan": [1, 6, 10], "sinc": [5, 6, 10], "sine": 6, "singl": [2, 5, 6, 10], "single_figs": 6, "single_var_image_filenam": 6, "size": [5, 6, 8], "skew": [6, 10], "skip": 2, "sklearn": 6, "slightli": 2, "small": 2, "smaller": 6, "smallest": [], "smooth": [6, 10], "smoother": [2, 6], "smoothli": 10, "sn": 6, "snippet": [5, 6], "so": [2, 5, 6], "softwar": [3, 6, 9], "some": [2, 5, 6, 10], "sort": [2, 5], "sort_bi": [2, 5], "sort_cols_alpha": 5, "sortbi": 2, "sourc": [6, 7, 9], "space": [2, 6], "span": 6, "spars": [6, 9], "spatial": [6, 9], "special": [2, 10], "specialti": [5, 6], "specif": [2, 7, 8], "specifi": [1, 2, 5, 6, 7], "split": 6, "spot": 5, "spous": [5, 6], "spread": [6, 10], "sql": 4, "sqrt": [6, 10], "squar": [2, 6, 10], "stabil": [2, 6, 10], "stabl": [2, 10], "stack": [2, 7, 8], "stacked_crosstab": 6, "stacked_crosstab_plot": [6, 8], "standard": [2, 6, 7, 8, 10], "standardized_d": 5, "start": [2, 5, 7], "stat": [2, 6], "state": [1, 5, 6], "statement": 2, "static": [2, 8], "statist": [2, 4, 5, 6, 7, 9, 10], "statistician": 10, "statu": [2, 5, 6], "std": 8, "std_color": 6, "std_dev_level": 6, "stdrz": 6, "stem": 6, "step": [2, 5, 7], "still": [2, 6], "store": [2, 6], "str": [1, 2, 5, 6], "straightforward": 2, "strategi": 6, "streamlin": [2, 5, 7], "strength": [6, 10], "strictli": [2, 10], "string": [5, 6, 8], "strip": 5, "strip_trailing_period": [5, 8], "stronger": 10, "structur": [1, 2, 7], "style": [2, 5, 6], "styler": [2, 5], "subplot": 6, "subset": [6, 10], "substitut": 10, "subtl": 2, "subtract": 10, "success": 0, "successfulli": [0, 2], "suffici": [2, 6], "suffix": 1, "suggest": [2, 5, 10], "suit": 7, "suitabl": [2, 6, 10], "sum_": 10, "summar": [7, 10], "summari": [2, 6, 7, 8], "summarize_all_combin": [5, 8], "summary_t": 5, "support": 
[0, 5, 6, 8], "suppos": [6, 10], "suppress": 6, "sure": 5, "surfac": [2, 10], "svg": [2, 5, 6], "svg_imag": 5, "swap": 6, "switch": 2, "sy": 2, "symmetr": 6, "syntax": 6, "system": [5, 7], "t": 6, "tab": 5, "tabl": [2, 6, 7], "tabular": 6, "tailor": 6, "take": [5, 6, 10], "tall": 6, "target": [6, 10], "tarshizi": 0, "task": [5, 7], "tatist": 6, "teach": 4, "techniqu": [6, 7, 8, 10], "tell": 6, "ten": 4, "tend": 10, "tendenc": 6, "term": 6, "termin": [2, 5], "test": [2, 6, 10], "test_siz": 6, "text": [2, 5, 6, 10], "text_wrap": [2, 6], "th": 10, "than": [2, 5, 6, 10], "thank": 0, "thei": [2, 5, 6, 10], "them": [1, 2, 5, 6, 7], "theoret": [6, 10], "therefor": 6, "thi": [1, 2, 5, 6, 7, 10], "thick": 6, "those": [2, 6, 10], "three": 6, "through": [2, 6], "throw": 2, "thu": [5, 6], "tick": [2, 6], "tick_fonts": [2, 6], "tight": 6, "time": [0, 2, 5, 6, 10], "titl": [2, 3, 6], "title_i": [2, 6], "title_x": [2, 6], "to_list": 6, "togeth": 10, "toggl": [2, 6], "tone": [], "tool": [2, 6, 7], "toolkit": [1, 2, 6], "top": 6, "top_margin": [2, 6], "topic": 5, "total": [5, 6, 10], "toward": 2, "tqdm": [5, 8], "track": [2, 5], "trade": 10, "tradit": 10, "trail": [2, 6, 8], "train": [6, 10], "train_test_split": 6, "transform": [2, 8], "transpar": [2, 6], "treat": 8, "treatment": 6, "trend": [6, 7], "triangl": 6, "triangular": [2, 8], "trigger": 6, "true": [1, 2, 5, 6, 10], "truncat": 5, "truth": 6, "try": 2, "tune": 6, "tupl": [2, 5, 6], "two": [2, 5, 6, 10], "txt": 1, "type": [5, 6, 7, 8], "typeerror": 2, "typic": [6, 10], "u": [0, 5, 6], "uci": [5, 6, 7, 9], "ucla": [1, 4], "unambigu": 5, "unbound": 6, "unchang": [2, 10], "uncov": [6, 7], "undefin": [6, 10], "under": [5, 6, 10], "underli": [7, 10], "understand": [5, 6, 7, 10], "unequ": 6, "unifi": 6, "uniform": 2, "uniqu": [2, 6, 7, 8], "unique_id": 2, "unique_values_tot": 5, "unique_var": 5, "unit": 5, "univers": [0, 4], "unknown_art": 1, "unlik": 10, "unnecessari": [2, 6], "unprocess": 5, "unrecogn": 5, "unscal": 6, "unstack": 
6, "unus": 2, "unwav": 0, "up": [2, 5, 6], "updat": [2, 5, 6], "upper": [2, 5, 6], "upper_cutoff": 6, "upright": 6, "url": 3, "us": [1, 2, 5, 6, 7, 8], "usabl": 2, "usag": [2, 8], "user": [1, 2, 5, 6, 7], "userwarn": 6, "util": [1, 5, 6, 7], "v": 6, "valid": [2, 6, 10], "valid_plot_typ": 2, "valu": [2, 5, 6, 7, 10], "value_count": 5, "valueerror": [1, 5, 6, 8, 10], "vari": [5, 10], "variabl": [2, 6, 7, 8, 10], "varianc": [6, 10], "varieti": [4, 6, 7], "variou": [2, 6, 7, 10], "vars_of_interest": 6, "vdot": 5, "vector": [2, 10], "verbiag": 2, "verifi": [2, 5], "versa": 6, "versatil": [2, 6], "version": [3, 5, 6, 7, 8], "version_info": 2, "versu": 2, "vertic": [2, 6], "via": [2, 6], "vice": 6, "view": [2, 6, 10], "view_angl": 6, "violat": 10, "violin": [2, 7, 8], "violinplot": 6, "viridi": 6, "visibl": [2, 6], "visual": [2, 5, 7, 8, 9, 10], "vmax": 6, "vmin": 6, "vriabl": 6, "w_pad": 6, "wa": [2, 6], "wai": [6, 10], "want": [2, 6], "wareh": 4, "warn": [2, 5, 6], "waskom": [6, 9], "we": [0, 5, 6, 7, 10], "week": [5, 6], "weight": 6, "welcom": 8, "well": [6, 10], "were": [5, 10], "what": [2, 8], "wheel": 6, "when": [1, 2, 5, 6, 7, 10], "where": [1, 2, 5, 6, 10], "whether": [2, 5, 6], "which": [2, 5, 6, 7, 10], "while": [2, 5, 6, 10], "white": [1, 5], "whitespac": 6, "who": 2, "why": 10, "wide": [4, 6, 10], "width": [2, 6], "wife": [5, 6], "wirefram": [2, 6], "wireframe_color": 6, "wish": 6, "with_cent": 6, "within": [2, 4, 5, 6, 10], "without": [2, 6], "word": 10, "work": [1, 2, 5, 6, 10], "workclass": [5, 6], "workflow": [2, 5, 7], "world": 10, "would": [0, 2, 6, 10], "wrangl": 4, "wrap": [2, 6], "write": 5, "x": [2, 5, 6, 9, 10], "x_": 10, "x_1": 10, "x_2": 10, "x_c": 10, "x_i": 10, "x_j": 10, "x_k": 10, "x_label": [2, 6], "x_label_plotli": 2, "x_n": 10, "x_p": 10, "x_test": 6, "x_train": 6, "x_var": 6, "xlabel": 6, "xlabel_align": 6, "xlabel_rot": 6, "xlim": [2, 6], "xlsx": 5, "xlsxwriter": [5, 7], "xmax": 6, "xmin": 6, "xx": 2, "xy": 10, "y": [2, 6, 10], 
"y_axis_label": 6, "y_i": 10, "y_label": [2, 6], "y_label_plotli": 2, "y_test": 6, "y_train": 6, "y_var": 6, "year": [3, 4, 5], "yellow": 5, "yeo": [6, 8], "ylabel": 6, "ylabel_align": 6, "ylabel_rot": 6, "ylim": [2, 6], "ymax": 6, "ymin": 6, "you": [5, 6, 7, 10], "your": [5, 6, 7, 10], "yy": 2, "yyyi": 5, "z": 6, "z_label": [2, 6], "z_label_plotli": 2, "zenodo": [2, 3], "zero": [2, 5, 6, 10], "zoom": [2, 6], "zoom_out_factor": [2, 6], "zz": 2}, "titles": ["Acknowledgements", "ASCII Art", "Changelog", "Citing EDA Toolkit", "Contributors/Maintainers", "Data Management Overview", "Creating Effective Visualizations", "Welcome to the EDA Toolkit Python Library Documentation!", "Table of Contents", "References", "Gaussian Assumption for Normality"], "titleterms": {"0": 2, "1": 6, "10": 2, "11": 2, "12": 2, "13": 2, "1b0": 2, "1rc0": 2, "2": [2, 6], "2d": 6, "3": 2, "3d": 6, "4": 2, "5": 2, "6": 2, "7": 2, "8": 2, "8a": 2, "8b": 2, "8c": 2, "9": 2, "The": 10, "With": 6, "about": 8, "acknowledg": 0, "ad": 5, "add": 2, "add_id": 2, "all": 6, "analysi": [5, 6], "applic": 10, "art": 1, "ascii": 1, "assumpt": 10, "avail": 6, "bar": [2, 6], "benefit": 10, "best": 6, "bin": 5, "box": [6, 10], "ca": 6, "calcul": [5, 10], "categori": 6, "censu": 5, "center": 10, "centric": 6, "chang": 2, "changelog": 2, "cite": 3, "coeffici": 10, "collect": 1, "column": 5, "combin": [5, 6], "compon": 10, "confid": 10, "consider": 10, "content": 8, "conting": 5, "contributor": 4, "convers": 6, "correl": [6, 10], "count": 6, "cox": [6, 10], "creat": [5, 6], "crosstab": 6, "custom": 5, "cutoff": 6, "data": [2, 5, 6, 8, 10], "datafram": 5, "dataframe_column": 2, "date": 5, "dedupl": 2, "definit": 10, "densiti": [6, 10], "depend": [6, 10], "descript": 7, "detail": 5, "detect": 2, "deviat": 6, "directori": 5, "distribut": 6, "document": 7, "each": 10, "eda": [3, 7, 8], "effect": 6, "enhanc": 2, "environ": 2, "estim": 10, "exampl": [1, 5, 6, 10], "excel": 5, "exclud": 6, "explan": 10, "featur": [1, 2, 
6, 7], "fit": 6, "fix": 2, "format": 5, "foundat": 10, "fraction": 6, "full": 6, "function": [2, 6], "gaussian": 10, "gener": 5, "get": 8, "grid": 6, "group": 6, "heurist": [6, 8], "highlight": 5, "histogram": [6, 10], "hous": 6, "i": 7, "id": 2, "identifi": 5, "improv": 2, "incom": 5, "instal": 7, "insuffici": 2, "interact": 6, "interv": 10, "iqr": 10, "johnson": 10, "kde": [6, 10], "kei": 7, "kernel": 10, "lambda": 10, "librari": 7, "line": 6, "link": 7, "list": [], "logit": [6, 10], "maintain": 4, "manag": [5, 8], "mathemat": 10, "matric": 6, "matrix": 6, "mean": 6, "median": [6, 10], "methodologi": 6, "mix": 2, "model": 10, "name": 5, "new": 2, "non": 6, "normal": [6, 10], "note": [1, 5, 6], "numer": 5, "observ": 6, "other": 2, "outlier": 6, "overview": [1, 5, 8], "partial": [6, 10], "path": 5, "pearson": 10, "period": 5, "pivot": 6, "plain": 6, "plot": [6, 8], "pool": 2, "practic": 10, "prerequisit": 7, "progress": 2, "project": 7, "properti": 10, "purpos": [7, 10], "python": 7, "refer": 9, "regress": 6, "regular": 6, "remov": 5, "retain": 6, "robustscal": 6, "sampl": 6, "san": 6, "save": 5, "scale": [6, 10], "scatter": 6, "size": 2, "specif": [5, 6], "stack": 6, "stacked_crosstab_plot": 2, "standard": 5, "start": 8, "static": 6, "std": 6, "string": 2, "strip_trailing_period": 2, "summari": 5, "support": 2, "tabl": [5, 8], "techniqu": 5, "theoret": 8, "toolkit": [3, 7, 8], "tqdm": 2, "trail": 5, "transform": [6, 10], "treat": 6, "triangular": 6, "type": 2, "uniqu": 5, "us": 10, "usag": 6, "valueerror": 2, "variabl": 5, "version": 2, "violin": 6, "visual": 6, "welcom": 7, "what": 7, "yeo": 10}}) \ No newline at end of file +Search.setIndex({"alltitles": {"2D Partial Dependence Plots": [[6, "d-partial-dependence-plots"]], "2D Plots - CA Housing Example": [[6, "d-plots-ca-housing-example"]], "3D Partial Dependence Plots": [[6, "id19"]], "3D Plots - CA Housing Example": [[6, "id21"]], "ASCII Art": [[1, null]], "ASCII Art Collection": [[1, "ascii-art-collection"]], 
"About EDA Toolkit": [[8, null]], "Acknowledgements": [[0, null]], "Add Environment Detection to dataframe_columns Function": [[2, "add-environment-detection-to-dataframe-columns-function"]], "Add ValueError for Insufficient Pool Size in add_ids and Enhance ID Deduplication": [[2, "add-valueerror-for-insufficient-pool-size-in-add-ids-and-enhance-id-deduplication"]], "Add tqdm Progress Bar to dataframe_columns Function": [[2, "add-tqdm-progress-bar-to-dataframe-columns-function"]], "Adding Unique Identifiers": [[5, "adding-unique-identifiers"]], "Applications in Modeling": [[10, "applications-in-modeling"]], "Available Scale Conversions": [[6, "available-scale-conversions"]], "Binning Numerical Columns": [[5, "binning-numerical-columns"]], "Box Plots Grid Example": [[6, "box-plots-grid-example"]], "Box and Violin Plots": [[6, "box-and-violin-plots"]], "Box-Cox Transformation": [[10, "box-cox-transformation"]], "Box-Cox Transformation Example 1": [[6, "box-cox-transformation-example-1"]], "Box-Cox Transformation Example 2": [[6, "box-cox-transformation-example-2"]], "Calculation Details": [[5, "calculation-details"]], "Census Income Example": [[5, "census-income-example"]], "Centering Data Using the Median": [[10, "centering-data-using-the-median"]], "Changelog": [[2, null]], "Changes in stacked_crosstab_plot": [[2, "changes-in-stacked-crosstab-plot"]], "Citing EDA Toolkit": [[3, null]], "Confidence Intervals for Lambda": [[10, "confidence-intervals-for-lambda"]], "Contributors/Maintainers": [[4, null]], "Correlation Matrices": [[6, "correlation-matrices"]], "Creating Contingency Tables": [[5, "creating-contingency-tables"]], "Creating Effective Visualizations": [[6, null]], "Data Fraction Usage": [[6, "data-fraction-usage"]], "Data Management": [[8, null]], "Data Management Overview": [[5, null]], "Data Management Techniques": [[5, "data-management-techniques"]], "DataFrame Analysis": [[5, "dataframe-analysis"]], "DataFrame Column Names": [[5, 
"dataframe-column-names"]], "Description": [[7, "description"]], "Enhance strip_trailing_period to Support Strings and Mixed Data Types": [[2, "enhance-strip-trailing-period-to-support-strings-and-mixed-data-types"]], "Enhancements and Fixes for scatter_fit_plot Function": [[2, "enhancements-and-fixes-for-scatter-fit-plot-function"]], "Ensure Crosstabs Dictionary is Populated with return_dict=True": [[2, "ensure-crosstabs-dictionary-is-populated-with-return-dict-true"]], "Example 1": [[6, "example-1"]], "Example 2": [[6, "example-2"]], "Example Calculation": [[10, "example-calculation"]], "Examples": [[1, "examples"]], "Explanation of Each Component": [[10, "explanation-of-each-component"]], "Feature Scaling and Outliers": [[6, "feature-scaling-and-outliers"]], "Features": [[1, "features"]], "Fix Default Title and Filename Handling in flex_corr_matrix": [[2, "fix-default-title-and-filename-handling-in-flex-corr-matrix"]], "Full Correlation Matrix Example": [[6, "full-correlation-matrix-example"]], "Gaussian Assumption for Normality": [[10, null]], "Generating Summary Tables for Variable Combinations": [[5, "generating-summary-tables-for-variable-combinations"]], "Getting Started": [[8, null]], "Heuristics for Visualizations": [[6, "heuristics-for-visualizations"]], "Highlighting Specific Columns in a DataFrame": [[5, "highlighting-specific-columns-in-a-dataframe"]], "Histogram Example (Count)": [[6, "histogram-example-count"]], "Histogram Example (Density)": [[6, "histogram-example-density"]], "Histogram Example - (Mean and Median)": [[6, "histogram-example-mean-and-median"]], "Histogram Example - (Mean, Median, and Std. 
Deviation)": [[6, "histogram-example-mean-median-and-std-deviation"]], "Histograms and Kernel Density Estimation (KDE)": [[10, "histograms-and-kernel-density-estimation-kde"]], "Improvements": [[2, "improvements"]], "Installation": [[7, "installation"]], "Interactive Plot": [[6, "interactive-plot"]], "KDE Distribution Function": [[6, "kde-distribution-function"]], "KDE and Histogram Distribution Plots": [[6, "kde-and-histogram-distribution-plots"]], "KDE and Histograms Example": [[6, "kde-and-histograms-example"]], "Key Features": [[7, "key-features"]], "Logit Transformation": [[10, "logit-transformation"]], "Logit Transformation Example": [[6, "logit-transformation-example"]], "Mathematical Definition": [[10, "mathematical-definition"]], "Median and IQR Scaling": [[10, "median-and-iqr-scaling"]], "Methodologies": [[6, "methodologies"]], "New Features": [[2, "new-features"]], "Non-Normalized Stacked Bar Plots Example": [[6, "non-normalized-stacked-bar-plots-example"]], "Notes": [[1, null], [5, null], [5, null], [5, null], [5, null], [6, null], [6, null], [6, null]], "Notes:": [[5, null]], "Observed Outliers Sans Cutoffs": [[6, "observed-outliers-sans-cutoffs"]], "Other Enhancements and Fixes": [[2, "other-enhancements-and-fixes"]], "Overview": [[1, "overview"]], "Partial Dependence Foundations": [[10, "partial-dependence-foundations"]], "Partial Dependence Plots": [[6, "partial-dependence-plots"]], "Path directories": [[5, "path-directories"]], "Pearson Correlation Coefficient": [[10, "pearson-correlation-coefficient"]], "Pivoted Stacked Bar Plots Example": [[6, "pivoted-stacked-bar-plots-example"]], "Pivoted Violin Plots Grid Example": [[6, "pivoted-violin-plots-grid-example"]], "Plain Outliers Example": [[6, "plain-outliers-example"]], "Plotting Heuristics": [[8, null]], "Practical Considerations": [[10, "practical-considerations"]], "Prerequisites": [[7, "prerequisites"]], "Project Links": [[7, "project-links"]], "Properties and Benefits": [[10, 
"properties-and-benefits"]], "Purpose and Assumptions": [[10, "purpose-and-assumptions"]], "Purpose of EDA Toolkit": [[7, "purpose-of-eda-toolkit"]], "References": [[9, null]], "Regression-Centric Scatter Plots Example": [[6, "regression-centric-scatter-plots-example"]], "Regular Non-Stacked Bar Plots Example": [[6, "regular-non-stacked-bar-plots-example"]], "Retaining a Sample for Analysis": [[6, "retaining-a-sample-for-analysis"]], "RobustScaler Outliers Examples": [[6, "robustscaler-outliers-examples"]], "Saving DataFrames to Excel with Customized Formatting": [[5, "saving-dataframes-to-excel-with-customized-formatting"]], "Scatter Fit Plot": [[6, "scatter-fit-plot"]], "Scatter Plots (All Combinations Example)": [[6, "scatter-plots-all-combinations-example"]], "Scatter Plots Grouped by Category Example": [[6, "scatter-plots-grouped-by-category-example"]], "Scatter Plots and Best Fit Lines": [[6, "scatter-plots-and-best-fit-lines"]], "Scatter Plots: Excluding Specific Combinations": [[6, "scatter-plots-excluding-specific-combinations"]], "Stacked Bar Plots With Crosstabs Example": [[6, "stacked-bar-plots-with-crosstabs-example"]], "Stacked Crosstab Plots": [[6, "stacked-crosstab-plots"]], "Standardized Dates": [[5, "standardized-dates"]], "Static Plot": [[6, "static-plot"]], "Table of Contents": [[8, null]], "The Yeo-Johnson Transformation": [[10, "the-yeo-johnson-transformation"]], "Theoretical Overview": [[8, null]], "Trailing Period Removal": [[5, "trailing-period-removal"]], "Treated Outliers With Cutoffs": [[6, "treated-outliers-with-cutoffs"]], "Triangular Correlation Matrix Example": [[6, "triangular-correlation-matrix-example"]], "Version 0.0.10": [[2, "version-0-0-10"]], "Version 0.0.11": [[2, "version-0-0-11"]], "Version 0.0.12": [[2, "version-0-0-12"]], "Version 0.0.13": [[2, "version-0-0-13"]], "Version 0.0.14": [[2, "version-0-0-14"]], "Version 0.0.1b0": [[2, "version-0-0-1b0"], [2, "id12"], [2, "id13"], [2, "id14"]], "Version 0.0.1rc0": [[2, 
"version-0-0-1rc0"]], "Version 0.0.2": [[2, "version-0-0-2"]], "Version 0.0.3": [[2, "version-0-0-3"]], "Version 0.0.4": [[2, "version-0-0-4"]], "Version 0.0.5": [[2, "version-0-0-5"]], "Version 0.0.6": [[2, "version-0-0-6"]], "Version 0.0.7": [[2, "version-0-0-7"]], "Version 0.0.8": [[2, "version-0-0-8"]], "Version 0.0.8a": [[2, "version-0-0-8a"]], "Version 0.0.8b": [[2, "version-0-0-8b"]], "Version 0.0.8c": [[2, "version-0-0-8c"]], "Version 0.0.9": [[2, "version-0-0-9"]], "Violin Plots Grid Example": [[6, "violin-plots-grid-example"]], "Welcome to the EDA Toolkit Python Library Documentation!": [[7, null]], "What is EDA?": [[7, "what-is-eda"]]}, "docnames": ["acknowledgements", "art", "changelog", "citations", "contributors", "data_management", "eda_plots", "getting_started", "index", "references", "theoretical_overview"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.todo": 2, "sphinx.ext.viewcode": 1}, "filenames": ["acknowledgements.rst", "art.rst", "changelog.rst", "citations.rst", "contributors.rst", "data_management.rst", "eda_plots.rst", "getting_started.rst", "index.rst", "references.rst", "theoretical_overview.rst"], "indexentries": {"add_ids()": [[5, "add_ids", false]], "box_violin_plot()": [[6, "box_violin_plot", false]], "built-in function": [[1, "print_art", false], [5, "add_ids", false], [5, "contingency_table", false], [5, "dataframe_columns", false], [5, "ensure_directory", false], [5, "highlight_columns", false], [5, "parse_date_with_rule", false], [5, "save_dataframes_to_excel", false], [5, "strip_trailing_period", false], [5, "summarize_all_combinations", false], [6, "box_violin_plot", false], [6, "data_doctor", false], [6, "flex_corr_matrix", false], 
[6, "kde_distributions", false], [6, "plot_2d_pdp", false], [6, "plot_3d_pdp", false], [6, "scatter_fit_plot", false], [6, "stacked_crosstab_plot", false]], "contingency_table()": [[5, "contingency_table", false]], "data_doctor()": [[6, "data_doctor", false]], "dataframe_columns()": [[5, "dataframe_columns", false]], "ensure_directory()": [[5, "ensure_directory", false]], "flex_corr_matrix()": [[6, "flex_corr_matrix", false]], "highlight_columns()": [[5, "highlight_columns", false]], "kde_distributions()": [[6, "kde_distributions", false]], "parse_date_with_rule()": [[5, "parse_date_with_rule", false]], "plot_2d_pdp()": [[6, "plot_2d_pdp", false]], "plot_3d_pdp()": [[6, "plot_3d_pdp", false]], "print_art()": [[1, "print_art", false]], "save_dataframes_to_excel()": [[5, "save_dataframes_to_excel", false]], "scatter_fit_plot()": [[6, "scatter_fit_plot", false]], "stacked_crosstab_plot()": [[6, "stacked_crosstab_plot", false]], "strip_trailing_period()": [[5, "strip_trailing_period", false]], "summarize_all_combinations()": [[5, "summarize_all_combinations", false]]}, "objects": {"": [[5, 0, 1, "", "add_ids"], [6, 0, 1, "", "box_violin_plot"], [5, 0, 1, "", "contingency_table"], [6, 0, 1, "", "data_doctor"], [5, 0, 1, "", "dataframe_columns"], [5, 0, 1, "", "ensure_directory"], [6, 0, 1, "", "flex_corr_matrix"], [5, 0, 1, "", "highlight_columns"], [6, 0, 1, "", "kde_distributions"], [5, 0, 1, "", "parse_date_with_rule"], [6, 0, 1, "", "plot_2d_pdp"], [6, 0, 1, "", "plot_3d_pdp"], [1, 0, 1, "", "print_art"], [5, 0, 1, "", "save_dataframes_to_excel"], [6, 0, 1, "", "scatter_fit_plot"], [6, 0, 1, "", "stacked_crosstab_plot"], [5, 0, 1, "", "strip_trailing_period"], [5, 0, 1, "", "summarize_all_combinations"]]}, "objnames": {"0": ["py", "function", "Python function"]}, "objtypes": {"0": "py:function"}, "terms": {"": [0, 1, 2, 4, 5, 6, 10], "0": [3, 5, 6, 7, 8, 10], "00": 5, "000": 6, "0000": 6, "000000": 6, "0000ff": 6, "00140": [6, 9], "0040": 6, "00bfc4": 6, "01": 5, 
"0119": 6, "0163": 6, "019590": 6, "02": [2, 5], "0278": 6, "03021": [6, 9], "033257": 6, "0333": 6, "037743": 6, "04": [5, 6], "05": [6, 10], "0517": 6, "0556": 6, "07": [5, 6], "0724": 6, "08": 5, "086108": 6, "09": 6, "1": [2, 5, 7, 8, 10], "10": [3, 5, 6, 7, 8, 9], "100": [5, 6, 10], "1016": [6, 9], "105": 6, "10724": 6, "11": [5, 6, 8], "1109": [6, 9], "111": [5, 6], "115": 6, "11687": 6, "117": 6, "119": 6, "11th": [5, 6], "12": [5, 6, 7, 8], "120": [5, 6], "123": [2, 5], "1234": 5, "12929": 6, "13": [3, 5, 6, 8], "131": 6, "13162633": 3, "13163208": 3, "13174": 6, "132222": 6, "1348": 5, "13706": 5, "13920": 6, "14": [3, 5, 6, 7, 8], "147": 6, "14x4": 6, "15": [5, 6], "150": 5, "15784": 5, "15x5": 6, "16": [5, 6], "161880": 6, "16192": 6, "1667": 6, "17": 6, "1717": 6, "1748": 6, "177": 6, "1779": 6, "18": [5, 6, 7], "180807": 6, "181": 6, "1873": 6, "189": 6, "19": 6, "1964": 10, "19716": 5, "1994": 7, "1996": [5, 6, 7, 9], "1997": [6, 9], "1b0": 8, "1d": 6, "1rc0": 8, "2": [5, 7, 8, 10], "20": [5, 6], "200": 5, "2007": [6, 9], "2020": 5, "2021": [5, 6, 9], "2022": 5, "2024": 3, "203488": 5, "21": [5, 6, 7], "21105": [6, 9], "2115": 6, "215646": [5, 6], "216561": 6, "22": 6, "22379": 5, "2245": 6, "227960": 6, "22803": 5, "23": 6, "234721": [5, 6], "236": 6, "24": 5, "24432": [5, 6, 7, 9], "24720": 5, "25": [2, 5, 6], "250": 5, "2509": 6, "2565": 5, "25th": 10, "26": 6, "27": 6, "274": 5, "28": [5, 6], "280": 6, "285": 6, "28523": 5, "29": [5, 6], "291": [6, 9], "292": 6, "29305": 6, "295": 6, "297": [6, 9], "2d": [2, 8, 9, 10], "3": [5, 6, 7, 8, 9, 10], "30": [5, 6], "300": [5, 6], "3021": [6, 9], "3054": 6, "31": 5, "3188": 6, "32": 5, "32650": [5, 6], "33": [5, 6, 9], "3333": 6, "333333": 6, "338409": [5, 6], "33906": 5, "34": [5, 6], "3461": 6, "351102": 5, "355015": 6, "36": [5, 6], "3680": 5, "37": [5, 6], "37155": 6, "3719": 6, "38": [5, 6], "3809": 6, "3853": 6, "389562": 6, "38it": 5, "39": [5, 6], "3986": 6, "399428377": 6, "3d": [2, 8, 10], 
"3d_pdp": 6, "4": [5, 6, 7, 8, 10], "40": [5, 6], "400": 6, "400000": 6, "408117383": 6, "41": [5, 6], "4110": 6, "415": 6, "417": 6, "41762": 5, "42": [5, 6], "4267": 5, "43": 5, "43832": 5, "44807": 5, "45": [5, 6], "458295720": 6, "46": 5, "46560": 5, "467": 5, "468": 5, "469": 5, "47": 6, "470": 5, "471": 5, "472": 5, "4722": 6, "4746": 6, "477": 6, "479262902": [5, 6], "484": 6, "48842": [5, 6], "49": [5, 6], "5": [5, 6, 7, 8, 10], "50": [5, 6, 10], "5000": 6, "50k": [5, 6], "50k_": 6, "50th": 10, "51": [5, 6], "520438": 6, "5219": 6, "521908": 6, "5281": 3, "53": [5, 6], "5338": 6, "535": 6, "55": [6, 9], "5556": 6, "56": 5, "561810758": [5, 6], "5623": 5, "56it": 5, "5707": 6, "5713": 6, "58": 6, "582248222": [5, 6], "5856": 5, "59": [5, 6], "595": 6, "598098459": [5, 6], "6": [5, 6, 7, 8, 9], "60": [5, 6, 9], "61": [5, 6], "614411": 6, "6172": 5, "62": 6, "64": [5, 6], "65": 6, "66": [5, 6], "6619": 6, "6664": 6, "668": 6, "669717925": 6, "6738": 6, "6761": 6, "68": 10, "68624": 6, "69": [5, 6], "7": [5, 6, 7, 8, 10], "70": [5, 6], "705": 6, "71": 6, "7152": [6, 9], "720": 5, "73": 6, "73402": 6, "74": 5, "746": 6, "75": [5, 6], "7536": 6, "75th": 10, "76": [5, 6], "769": 6, "77": 6, "77516": [5, 6], "776705221": [5, 6], "7778": 6, "79": [5, 6], "8": [5, 6, 8], "80": [5, 6], "808080": 6, "809": 6, "81": 6, "815": 6, "82": 5, "8213": 6, "83": 6, "832": 5, "83311": [5, 6], "84": 10, "8409": 6, "85": [5, 6], "850675": 6, "8601": 5, "87": 6, "87it": 5, "88it": 5, "89": [5, 6], "8a": 8, "8b": 8, "8c": 8, "8d": 2, "9": [5, 6, 8, 9], "90": [2, 5, 6, 9], "9076": 6, "91": [5, 6], "912323": 6, "923": 6, "93": 6, "936876": 6, "939": 6, "94": 6, "9468": 6, "95": [5, 6, 9, 10], "955": 6, "96": [5, 6, 9], "961427355": 6, "963": 5, "966": 5, "97": 5, "97261": 6, "98": 5, "984": 6, "99": [5, 6, 10], "A": [1, 2, 5, 6, 7, 9, 10], "As": 10, "By": [2, 6, 10], "For": [5, 6, 7, 10], "If": [1, 2, 5, 6, 10], "In": [5, 6, 10], "Into": 6, "It": [2, 5, 6, 7, 10], "No": [2, 6, 10], 
"Not": [5, 6], "One": [2, 10], "The": [1, 2, 5, 6, 7, 8], "Then": [5, 10], "There": 10, "These": [2, 5, 6, 10], "To": [2, 6, 10], "With": [4, 8], "_": [1, 6, 10], "__": 1, "___": 1, "____": 1, "_____": 1, "_c": 10, "_cutoff": 6, "_plotli": 2, "_w_cutoff": 6, "ab": 6, "abil": [2, 6], "abl": 6, "abov": [2, 6], "absolut": [2, 6], "academ": 0, "accept": [2, 6], "access": [6, 10], "accord": [2, 6, 10], "accordingli": 5, "account": [2, 6], "accur": [2, 6], "accuraci": [5, 10], "achiev": 10, "acknowledg": [2, 8], "across": [2, 5, 6, 10], "act": 10, "actual": 6, "ad": [2, 6, 8, 10], "adapt": [2, 5], "add": [5, 6, 8], "add_best_fit_lin": 6, "add_id": [5, 8], "addit": [2, 6], "addition": [5, 6, 7], "address": [2, 6, 7, 10], "adher": [2, 6], "adjust": [2, 5, 6, 10], "adm": [5, 6], "advanc": [5, 6], "advis": 6, "aesthet": [2, 6], "affect": 6, "after": [2, 5, 6, 10], "ag": [5, 6], "against": [2, 6], "age_boxcox": 6, "age_boxcox_alpha": 6, "age_boxcox_kde_cutoff": 6, "age_boxplot_list": 6, "age_group": [5, 6], "age_robust": 6, "ages_18_to_40": 5, "aggreg": 6, "aim": 2, "alic": 5, "alien": 1, "align": [2, 5, 6, 10], "all": [1, 2, 5, 7, 8, 10], "all_combin": 5, "all_var": [2, 6], "allow": [2, 5, 6, 10], "alon": 10, "along": [2, 6], "alongsid": 6, "alpha": [2, 6, 10], "alphabet": 5, "alreadi": 5, "also": [0, 2, 6], "alter": 6, "altern": [6, 10], "alwai": [2, 5, 6], "ambigu": 6, "among": 2, "amount": 5, "an": [0, 2, 4, 5, 6, 10], "analysi": [2, 7, 8, 10], "analyst": 7, "analyt": 4, "analyz": [2, 5, 6], "angl": [2, 6], "ani": [2, 5, 6, 7, 10], "annot": [2, 6], "anomali": [6, 7], "anoth": [6, 10], "anyth": 2, "appar": [6, 10], "appeal": 6, "appear": [2, 5, 6], "append": [5, 6], "appli": [0, 2, 4, 5, 6, 7, 10], "applic": [2, 6, 8], "apply_as_new_col_to_df": 6, "apply_cutoff": 6, "approach": [2, 5, 6, 10], "appropri": [2, 6, 10], "approxim": [6, 10], "ar": [1, 2, 5, 6, 10], "arcsinh": 6, "area": 6, "arg": 2, "argument": [2, 6], "arima": 10, "aros": 2, "around": [2, 6, 10], "arrai": [2, 
6], "arrang": 6, "arrow": 6, "art": 8, "art_nam": 1, "artifact": 5, "artifici": 4, "artwork": 1, "ascii": 8, "ascii_art": 1, "asian": 5, "aspect": [2, 6, 7], "assess": [6, 10], "assign": [2, 5, 6], "associ": [7, 10], "assum": [5, 10], "assumpt": [6, 8], "astyp": 2, "attempt": [5, 6], "attent": 6, "attract": 6, "attribut": 6, "attributeerror": 2, "aug": 3, "author": [3, 4], "auto": [5, 6], "autofit": 5, "autom": [4, 7], "automat": [1, 2, 5, 6, 7], "autoregress": [6, 9], "avail": [1, 8], "aveoccup": 6, "averag": [6, 10], "averoom": 6, "avoid": [2, 6], "ax": [2, 6], "axi": [2, 6], "azimuth": 6, "bachelor": [5, 6], "back": [2, 5, 6, 10], "backbon": 5, "background": [1, 5], "background_color": [2, 5], "backward": 2, "badg": 2, "balanc": 6, "band": 6, "bandwidth": 10, "bar": [5, 7, 8], "barebon": 6, "barh": 6, "barri": [6, 9], "base": [1, 2, 5, 6, 10], "base_path": 5, "baselin": 6, "basic": 6, "bb": 1, "bbox_inch": 6, "becaus": [5, 6], "becom": 10, "been": [2, 5, 6], "befor": [2, 5, 6, 7, 10], "begin": [5, 10], "behav": 10, "behavior": [1, 2, 6], "being": [2, 5, 6], "bell": 10, "belong": 5, "below": [2, 5, 6], "beneath": 6, "benefici": 6, "benefit": 8, "best": [2, 7, 8, 10], "best_fit_linecolor": 6, "best_fit_linestyl": 6, "beta": 2, "better": [2, 6, 7, 10], "between": [2, 5, 6, 10], "bin": [2, 6, 8, 10], "bin_ag": 5, "binrang": 6, "binwidth": [2, 6], "biolog": 10, "black": [1, 5, 6], "block": [2, 6], "blue": 6, "bob": 5, "bold": 5, "bool": [1, 5, 6], "boolean": [2, 6], "borderless": 5, "both": [1, 2, 5, 6, 10], "bound": [5, 6], "boundari": 5, "box": [2, 7, 8], "box_violin": 6, "box_violin_kw": 6, "box_violin_plot": [2, 6, 8], "box_violin_ylim": 6, "boxcox": 6, "boxplot": [2, 6], "boxprop": 6, "breakdown": [6, 10], "brief": 2, "bring": [4, 6], "broad": [2, 7], "brown": 6, "browser": 6, "bug": 2, "built": [2, 6], "bulk": 6, "c": 6, "c0": 6, "c5gp7": [5, 6, 7, 9], "c_i": 10, "ca": 8, "ca_state_bb": 1, "ca_state_wb": 1, "calcul": [2, 6, 8], "california": [1, 4, 6], "call": 
[2, 5], "camera": [2, 6], "can": [2, 5, 6, 7, 10], "cannot": [5, 10], "cap": 6, "capabl": [1, 2, 5], "capit": [5, 6], "captur": 10, "career": 0, "case": [2, 5, 6, 10], "categor": [2, 5, 6], "categori": [5, 8], "caus": 2, "cbar_label": 6, "cbar_thick": [2, 6], "cbar_x": [2, 6], "cbrt": 6, "cdot": 10, "cell": 5, "censu": [6, 7, 8, 9], "census_id": [5, 6], "census_summary_t": 5, "center": [6, 8], "center_baselin": 6, "central": [6, 10], "centric": 8, "certain": 6, "certifi": 2, "chang": [6, 8, 10], "changelog": 8, "charact": [2, 5, 6], "characterist": [6, 7], "charli": 5, "chart": 6, "check": [2, 5, 6], "chi": 10, "choic": 2, "choos": [2, 5, 6, 10], "chosen": 10, "ci": 10, "circl": [], "citat": 2, "cite": 8, "civ": [5, 6], "clariti": [2, 6], "clean": [2, 5, 6, 7], "cleaner": [2, 5, 6], "cleanup": 2, "clear": [2, 6, 10], "clearer": [2, 6], "clearli": [2, 6], "cleric": [5, 6], "close": 10, "closer": 10, "clutter": 6, "cmap": [2, 6], "code": [2, 5, 6, 10], "codebas": 2, "coeffici": [6, 8], "cohes": 6, "col": [2, 5, 6], "col1": 2, "col2": 2, "collabor": 4, "collect": 8, "colleg": 6, "collis": [2, 5], "color": [2, 5, 6], "colorbar": 6, "colormap": [2, 6], "column": [2, 6, 8], "column_nam": 5, "combin": [2, 7, 8, 10], "come": 2, "comment": 2, "common": [2, 5, 6, 7], "commonli": 10, "compar": [2, 6, 10], "comparison": 6, "compat": [2, 6], "complement": 10, "complementari": 10, "complet": [5, 10], "complex": [2, 6, 10], "compon": 8, "comprehens": [2, 5, 6, 7, 10], "compress": 6, "comput": [2, 5, 6, 9], "concept": [6, 10], "concern": 10, "concis": 2, "condit": [2, 6, 10], "condition": 2, "confid": [6, 8], "configur": [2, 6], "confirm": [2, 6], "conflict": 6, "confus": [2, 6], "consecut": 5, "consid": [6, 10], "consider": 8, "consist": [2, 5, 6, 10], "consolid": 2, "constant": [5, 6, 10], "constitut": 5, "constrain": [6, 10], "constraint": 5, "construct": 10, "contain": [1, 2, 5, 6, 10], "content": [2, 5, 6], "context": 10, "conting": [2, 6, 7, 8], "contingency_t": [5, 8], 
"continu": [2, 5, 6, 10], "contour": 10, "contrast": 6, "contributor": 8, "control": [2, 6], "convei": 6, "convers": [2, 5, 8], "convert": [2, 5, 6], "coolwarm": [2, 6], "coordin": 2, "cornel": 4, "correct": [2, 6, 7], "correctli": [1, 2, 5, 6], "correl": [2, 8], "correlation_matrix": 2, "correspond": [2, 5, 6], "could": 6, "count": [2, 5, 8, 10], "countri": 5, "cours": 4, "cov": 10, "covari": 10, "cox": [2, 8], "creat": [1, 2, 7, 8], "creation": 6, "critic": [2, 6, 10], "crop": 6, "cross": 2, "crosstab": 8, "crosstab_age_incom": 6, "crosstab_age_sex": 6, "crosstab_df": 2, "crosstabs_dict": 2, "crosstabs_onli": [2, 6], "crucial": [5, 6, 7], "cube": 6, "cumul": 6, "current": [1, 5], "curv": [6, 10], "custom": [2, 6, 7, 8], "custom_ord": 6, "customiz": [2, 6, 7], "cut": 5, "cutoff": [2, 8], "d": [2, 6, 9], "dai": 5, "dark": 6, "dashboard": 6, "data": [0, 4, 7, 9], "data_doctor": [2, 6, 8], "data_fract": 6, "data_nam": 5, "data_output": 5, "data_path": 5, "data_typ": 2, "datafram": [2, 6, 7, 8], "dataframe_column": [5, 8], "dataset": [5, 6, 7, 10], "date": [2, 7, 8], "date_column": 5, "date_str": 5, "datetim": 2, "david": [5, 10], "dd": 5, "deal": [5, 6, 10], "decad": 4, "decid": 6, "decim": [2, 5], "decimal_plac": [2, 5], "decis": [2, 6], "decreas": 10, "dedic": 0, "dedupl": 8, "deeper": 6, "deepest": 0, "default": [1, 5, 6, 8], "defin": [2, 5, 6, 10], "definit": [5, 8], "degre": [2, 6, 10], "demograph": 6, "demonstr": [5, 6, 7], "denot": 10, "densiti": [2, 8], "depend": [2, 5, 7, 8], "deprec": 2, "depth": [2, 6], "deriv": 10, "desc": 2, "descend": [5, 6], "describ": [2, 6], "descript": [2, 6, 8], "design": [2, 5, 6, 7, 10], "desir": [1, 6], "despit": 6, "detail": [1, 2, 6, 7, 8], "detect": [5, 6, 8], "determin": [2, 5, 6, 10], "dev": 6, "develop": [4, 10], "deviat": [2, 7, 8, 10], "df": [2, 5, 6], "df_censu": 5, "df_dict": 5, "df_num": 6, "diagon": 6, "dict": [5, 6], "dictionari": [1, 5, 6, 8], "did": 2, "diego": [0, 4], "differ": [2, 6, 10], "digit": [2, 5], 
"dimens": 6, "dimensionless": 10, "dir": 5, "direct": [6, 10], "directli": [2, 5, 6, 7], "directori": [1, 6, 7, 8], "disabl": [2, 6], "disable_sci_not": [2, 6], "discov": 7, "discret": [5, 10], "dispers": 10, "displai": [1, 2, 5, 6], "distinct": [2, 5, 6], "distinguish": 6, "distort": 6, "distract": 6, "distribut": [2, 5, 7, 8, 10], "dive": 5, "divers": [2, 6], "divid": [5, 6, 10], "divorc": [5, 6], "do": [2, 5, 6], "docstr": [2, 6], "doctor": 6, "document": [1, 2, 5, 6, 8, 10], "doe": [2, 5, 6, 10], "doi": [3, 5, 6, 7, 9], "domin": 5, "don": 6, "done": [6, 10], "dot": 10, "doubl": 6, "down": 6, "downplai": 6, "downscal": 10, "dr": 0, "draw": 6, "driven": 5, "dtype": 5, "due": [2, 5, 6], "duplic": 2, "dure": [0, 2, 5], "dx_": 10, "dx_c": 10, "dynam": [2, 6], "e": [2, 5, 6, 10], "each": [1, 2, 5, 6, 8], "eas": [2, 5, 7], "easi": [2, 6, 7], "easier": [2, 6, 10], "easili": 6, "ebrahim": 0, "ecosystem": 2, "eda": [1, 2, 6], "eda_toolkit": [2, 5, 6, 7], "eda_toolkit_logo": 1, "edg": [2, 6], "edgecolor": [2, 6], "educ": [0, 4, 5, 6], "effect": [2, 4, 5, 7, 8, 10], "effici": [2, 6], "either": [2, 5, 6], "element": [2, 5, 6], "elev": 6, "elimin": 2, "els": 2, "emp": [5, 6], "emphas": [2, 6], "emploi": 6, "employ": 5, "empti": [2, 5], "enabl": [2, 6, 7], "enable_zoom": [2, 6], "encount": 6, "end": [2, 5, 10], "endeavor": 0, "endpoint": 5, "engin": [0, 5, 6, 9], "enhanc": [5, 6, 7, 8, 10], "enough": 2, "ensembl": 6, "ensu": 7, "ensur": [1, 5, 6, 7, 8], "ensure_directori": [5, 8], "enter": [2, 5], "entir": [5, 6, 10], "entri": [2, 5, 6], "environ": [0, 5, 6, 8, 9], "equal": [6, 10], "equat": 6, "equival": 5, "error": [2, 5, 6], "especi": [2, 5, 6, 10], "essenti": [5, 7, 10], "estim": [6, 8], "etc": [6, 7], "ev": 5, "evalu": 10, "even": [2, 5], "everyth": 6, "exact": [2, 6], "exactli": 6, "examin": 6, "exampl": [2, 7, 8], "exce": 5, "excel": [4, 7, 8], "except": [0, 2, 5], "excess": [2, 6], "exclud": [2, 5, 8], "exclude_combin": [2, 6], "exclus": [5, 6], "exec": [5, 6], 
"execut": 6, "exhaust": 6, "exist": [1, 2, 5], "exp": [6, 10], "expand": 2, "expect": [2, 10], "expenditur": 10, "experi": [2, 4], "experienc": 2, "explain": [2, 5], "explan": [2, 6, 8], "explicit": 2, "explicitli": 2, "explor": [2, 6, 7, 10], "exploratori": [6, 7], "exponenti": 6, "export": [6, 7], "express": [0, 10], "extend": [0, 6], "extens": [1, 2, 6], "extract": [2, 6], "extrem": [2, 6, 10], "f": [2, 6, 10], "f8766d": 6, "f8c5c8": 5, "facecolor": 6, "facilit": [2, 4, 5, 6, 7], "factor": 6, "fail": 2, "failur": 2, "fall": [5, 6, 10], "fallback": 2, "fals": [1, 2, 5, 6], "famili": [5, 6, 10], "far": 6, "farm": 6, "fashion": 6, "featur": [5, 8, 10], "feature_nam": 6, "feature_names_list": [2, 6], "feature_proport": 6, "feder": 5, "feedback": 2, "femal": [5, 6], "female_": 6, "fetch": 6, "fetch_california_h": 6, "few": [6, 10], "ff0000": 6, "field": 6, "figsiz": [2, 6], "figur": [2, 6], "file": [1, 2, 5, 6], "file_nam": 5, "file_path": 5, "file_prefix": [2, 6], "filenam": [6, 8], "fill": [2, 6], "fill_alpha": [2, 6], "fillna": 2, "filter": [5, 6], "filtered_df": 5, "final": [5, 6], "financi": [4, 6], "find": [2, 5, 10], "fine": 6, "finer": 6, "first": [2, 5, 6, 10], "fish": 6, "fit": [2, 5, 7, 8], "five": 6, "fix": [6, 8], "flag": 2, "flex_corr_matrix": [6, 8], "flexibl": [1, 2, 6, 10], "flip": 6, "float": [2, 5, 6], "fnlwgt": [5, 6], "fnlwgt_w_cutoff": 6, "focu": 6, "focus": 6, "folder": 5, "follow": [1, 2, 5, 6, 7, 10], "font": [2, 6], "fontsiz": 2, "form": [5, 7], "format": [2, 6, 7, 8], "formatth": 6, "former": 5, "formerli": 2, "formula": 10, "found": [1, 2, 5, 6], "foundat": 8, "four": 6, "frac": [6, 10], "fraction": 8, "framework": 6, "freedom": [6, 10], "frequenc": [2, 5, 6, 10], "frequent": 5, "friendli": [2, 5], "from": [0, 1, 2, 4, 5, 6, 7, 10], "full": [2, 5, 8, 10], "fuller": 6, "fulli": 6, "func_col": [2, 6], "function": [1, 5, 7, 8, 10], "further": [2, 5, 6], "futur": [2, 6], "futurewarn": 6, "g": [2, 5, 6, 10], "gain": [5, 6, 7], "gaussian": [6, 
8], "gener": [2, 6, 7, 8], "georg": 10, "geq": [5, 10], "get": 7, "get_legend": 2, "get_text": 2, "gil": [1, 3, 4], "github": 7, "give": [2, 6], "given": [2, 5, 6, 10], "glanc": 6, "go": 5, "goal": 10, "got": 2, "gov": [5, 6], "grace": 2, "gracefulli": 2, "grad": [5, 6], "gradient": 6, "gradientboostingregressor": 6, "graduat": 0, "grai": 6, "granular": 2, "graphic": [6, 9], "gratitud": 0, "greater": [2, 5, 6], "green": 6, "grei": [1, 6], "grey_alien_wb": 1, "grid": [2, 8, 10], "grid_figs": 6, "grid_resolut": 6, "grid_valu": 6, "ground": 6, "group": [2, 5, 8], "growth": 6, "gt": 6, "guarante": 2, "guid": [0, 7], "guidanc": 2, "guidelin": 6, "h": [5, 6, 10], "h_pad": 6, "ha": [2, 4, 5, 6, 10], "half": 6, "hall": 1, "halt": 6, "halv": 6, "handl": [1, 5, 6, 7, 8, 10], "handler": [5, 6], "hat": 10, "have": [2, 5, 6, 10], "he": 4, "head": [5, 6], "header": [2, 5], "health": 4, "healthcar": 4, "heatmap": [2, 6], "height": 6, "help": [2, 5, 6, 7, 10], "here": [5, 6, 10], "heteroscedast": 10, "hex": [2, 5], "hi": 0, "hidden": 6, "hide": [2, 5], "hide_index": [2, 5], "high": [2, 5, 6], "higher": [6, 7], "highest": 5, "highli": [6, 10], "highlight": [2, 6, 8, 10], "highlight_column": [5, 8], "highlighted_df": 5, "hist": [2, 6], "hist_color": 6, "hist_edgecolor": [2, 6], "hist_kw": 6, "hist_ylim": 6, "histogram": [2, 8], "histplot": 6, "hold": [4, 5, 6], "homoscedast": 10, "horizont": [2, 6], "hour": [5, 6], "hous": 8, "houseag": 6, "household": 6, "hover": 6, "how": [1, 5, 6, 7, 10], "howev": [2, 6, 10], "html": [5, 6], "html_file_nam": [2, 6], "html_file_path": [2, 6], "http": [3, 5, 6, 7, 9], "huber": 6, "hue": [2, 6], "hue_dict": 6, "hue_palett": 6, "hunter": [6, 9], "husband": [5, 6], "hyperbol": 6, "hyperlink": 5, "hypothes": 7, "i": [1, 4, 5, 6, 8, 10], "icon": 2, "id": [5, 6, 7, 8], "id_colnam": 5, "idea": 5, "ideal": 6, "identif": 2, "identifi": [2, 6, 7, 8], "ignor": 6, "illustr": [1, 6, 10], "imag": [1, 5, 6], "image_filenam": 6, "image_path_png": [2, 5, 6], 
"image_path_svg": [2, 5, 6], "imbal": 5, "immedi": 6, "impact": [2, 6, 10], "implement": [2, 5, 10], "import": [1, 2, 5, 6], "imposs": 5, "improv": [5, 8, 10], "inc": [5, 6], "inch": 6, "includ": [2, 5, 6, 7, 10], "inclus": 5, "incom": [6, 7, 8, 9, 10], "inconsist": [2, 5], "incorpor": [2, 6], "incorrect": [2, 6], "incorrectli": 6, "increas": [2, 5, 6, 10], "increment": 2, "inde": 6, "independ": 2, "index": [2, 5, 6], "indic": [2, 5, 6, 10], "individu": [2, 5, 6, 10], "individual_figs": 6, "industri": 4, "inf": 5, "infer": 10, "infin": 5, "influenc": [2, 6, 10], "influenti": 6, "inform": [2, 5, 6, 7], "infti": 10, "initi": [2, 6, 7], "inner": 6, "input": [1, 2, 6, 10], "insight": [5, 6, 7, 10], "inspct": 6, "inspect": 6, "instal": [5, 8], "instanc": [5, 6, 10], "instead": [2, 6, 10], "instruct": [5, 6, 7], "insuffici": 8, "int": [2, 5, 6, 10], "int64": 5, "intact": 6, "integ": [2, 5], "integr": [2, 6, 7], "intellig": 4, "intend": [2, 6], "intent": 6, "interact": [1, 2, 8, 10], "interest": [6, 10], "interfac": [2, 6], "intern": [2, 6], "interpret": [2, 6, 10], "interquartil": [6, 10], "interv": [5, 6, 8], "introduc": [2, 5], "introduct": 2, "intuit": [2, 6, 7, 10], "invalid": [2, 6], "invalu": 10, "invers": [6, 10], "investig": 7, "involv": [5, 6, 7], "io": 5, "ipykernel": 2, "ipython": 2, "iqr": [6, 8], "irrelev": 6, "is_notebook_env": 2, "island": 5, "iso": 5, "issu": [2, 7, 10], "item": 6, "iter": [2, 6, 10], "its": [2, 5, 6, 10], "itself": 6, "j": [6, 9], "jinja2": 7, "johnson": [6, 8], "join": 5, "joint": [6, 10], "jointli": 6, "joss": [6, 9], "journal": [6, 9], "journei": 0, "jupyt": [2, 5], "just": [2, 6], "justifi": 10, "k": [6, 9, 10], "kde": [2, 7, 8], "kde_color": 6, "kde_density_single_distribut": 6, "kde_distribut": [2, 6, 8], "kde_kw": 6, "kde_ylim": 6, "kdeplot": 6, "keep": 6, "kei": [1, 2, 5, 6, 8, 10], "kernel": [6, 8], "keyboard": 6, "keyerror": 6, "keyword": [2, 6], "kind": 6, "known": 6, "kohavi": [5, 6, 7, 9], "kwarg": [2, 6], "l": [3, 10], 
"label": [2, 5, 6], "label_ag": 5, "label_fonts": [2, 6], "label_nam": 6, "lambda": [6, 8], "larg": [2, 5, 6], "larger": 6, "largest": [], "last": 5, "later": 6, "latest": 2, "layout": [2, 6], "ldot": 10, "lead": [2, 6, 10], "learn": [0, 2, 4, 5, 6, 7, 9, 10], "learning_r": 6, "least": [2, 5, 6, 10], "leav": 2, "lectur": 4, "left": [5, 6, 10], "left_margin": [2, 6], "legend": [2, 6], "legend_label": 6, "legend_labels_list": 6, "legibl": 6, "len": 2, "length": [2, 5, 6], "leon": 1, "leon_shpaner_bb": 1, "leon_shpaner_wb": 1, "leonid": [3, 4], "leq": 5, "less": [2, 5, 6, 10], "let": 10, "letter": [6, 9], "level": [5, 6, 10], "leverag": [2, 6, 7], "librari": [2, 5, 6, 8], "licens": 2, "lie": [6, 10], "lightblu": 6, "like": [0, 2, 5, 6, 10], "likelihood": 10, "limit": [2, 6], "line": [2, 7, 8, 10], "linear": [6, 10], "linestyl": 6, "link": 8, "list": [1, 2, 5, 6], "lmbda": 6, "ln": 10, "load": [5, 6], "local": 5, "locat": [5, 6], "log": [2, 6, 10], "log_scale_var": [2, 6], "logarithm": [6, 10], "logic": [2, 5, 6], "logist": 6, "logit": 8, "logo": [1, 2], "logscal": 6, "long": 6, "longer": 6, "look": 6, "loop": [2, 6], "lose": 10, "loss": [2, 5, 6], "lower": [2, 6], "lower_cutoff": 6, "lr": 10, "lt": [5, 6], "m": [0, 4, 6, 9], "machin": [2, 4, 5, 6, 7, 9, 10], "made": [2, 10], "magnitud": 6, "mai": [2, 5, 6, 10], "main": 7, "maintain": [2, 6, 8], "major": [5, 10], "make": [2, 5, 6, 10], "male": [5, 6], "male_": 6, "manag": [2, 4, 6, 7, 10], "manageri": [5, 6], "mani": [6, 7, 10], "manipul": 7, "manner": 6, "manual": [2, 6], "map": [2, 6, 10], "marco": 0, "margin": [2, 6, 10], "marit": [5, 6], "mark": [2, 6], "marker": 6, "marri": [5, 6], "master": 4, "match": [1, 2, 6], "mathbb": 10, "mathbf": 10, "mathemat": [5, 8], "matplotlib": [2, 6, 7, 9], "matplotlib_colormap": 6, "matric": [2, 8], "matrix": [2, 8], "max": [2, 6, 10], "max_col": 6, "max_depth": 6, "max_unique_valu": 5, "max_unique_value_pct": 5, "max_unique_value_tot": 5, "maxab": 6, "maxim": 10, "maximum": [6, 
10], "mcse": [6, 9], "mean": [2, 5, 7, 8, 10], "mean_color": 6, "meaning": [5, 6], "measur": [5, 6, 10], "mechan": 2, "median": [2, 7, 8], "median_color": 6, "medinc": 6, "meet": [5, 7, 10], "mentor": 0, "mentorship": 0, "messag": [2, 6], "method": [2, 5, 6, 7, 10], "methodologi": 8, "metric": 6, "metrics_box_violin": 2, "metrics_comp": 6, "metrics_list": 6, "mid": 10, "middl": 10, "might": [6, 10], "min": [2, 6, 10], "min_length": 5, "mind": [6, 7], "minim": [2, 6], "minimum": [5, 6], "minmax": 6, "minor": 2, "minu": 6, "misalign": 6, "misinterpret": 6, "mislead": 2, "miss": [1, 2, 5, 6, 7], "mix": 8, "mle": 10, "mm": 5, "mode": [2, 6], "model": [2, 6, 7, 8], "model_select": 6, "modifi": [2, 6], "modul": [1, 2], "modulenotfounderror": 2, "month": [3, 5], "more": [2, 5, 6, 10], "most": [2, 5, 6, 7], "mous": 6, "move": [2, 6], "mu": 10, "mu_i": 10, "mu_x": 10, "much": 10, "multi": 6, "multidimension": 6, "multipl": [1, 2, 5, 6, 7, 10], "multipli": 2, "must": [5, 6, 10], "my_datafram": 2, "n": 10, "n_col": 6, "n_estim": 6, "n_row": [2, 6], "na": [2, 5], "name": [1, 2, 6, 8], "nan": [2, 5, 6], "narrow": [6, 10], "nativ": 5, "natur": [6, 10], "navig": [5, 6], "nbformat": 7, "ndarrai": 6, "neatli": 2, "necessari": [2, 5, 10], "need": [1, 2, 5, 6, 7, 10], "neg": [6, 10], "neither": [2, 6], "neq": 10, "nest": 6, "neutral": 6, "never": [5, 6], "new": [5, 6, 8], "newer": 6, "next": [5, 6], "nh": 10, "nomenclatur": 2, "non": [1, 2, 5, 8, 10], "none": [1, 2, 5, 6], "nonetyp": 2, "nor": [2, 6], "normal": [2, 8], "notat": [2, 6], "notebook": [2, 5], "noth": [5, 10], "notic": [5, 6], "now": [2, 6], "np": [2, 6], "null": [2, 5], "null_pct": 5, "null_tot": 5, "num": [5, 6], "num_digit": 5, "number": [2, 5, 6, 10], "numer": [2, 6, 8], "numpi": [2, 6, 7], "nuniqu": 5, "o": [3, 5, 6], "object": [2, 5, 6], "observ": [8, 10], "obviou": 6, "occup": [5, 6], "occur": [2, 5], "occurr": 5, "odd": [6, 10], "off": 10, "offer": [2, 6, 7, 10], "often": [6, 7, 10], "ol": 10, "older": [2, 6], 
"omit": [2, 6], "one": [1, 2, 5, 6, 10], "ones": 6, "onli": [2, 6, 10], "op": 6, "opaqu": 6, "open": [6, 9, 10], "oper": [2, 5, 6, 10], "opportun": 5, "optim": [2, 6, 10], "option": [1, 2, 5, 6, 7], "orang": 6, "order": [2, 5, 6], "ordinari": 10, "org": [3, 5, 6, 7, 9], "organ": [2, 6], "orient": 6, "origin": [5, 6, 10], "original_df": 5, "oscar": [1, 3, 4], "oscar_gil_bb": 1, "oscar_gil_wb": 1, "other": [5, 6, 8, 10], "otherwis": 6, "our": [0, 10], "out": [6, 10], "outcom": [6, 10], "outlier": [2, 7, 8, 10], "outlin": 1, "output": [1, 2, 5, 6, 10], "output_fil": 1, "output_path": 1, "outsid": [2, 6, 10], "over": [2, 4, 6, 10], "overal": [2, 6], "overcompl": 6, "overhead": 2, "overlai": 6, "overlaid": 6, "overlap": 6, "overrid": 6, "overview": [6, 10], "own": 6, "p": [6, 10], "pac": 5, "pace": [6, 9], "packag": 7, "pad": [2, 6], "page": [6, 7], "pair": [2, 5, 6], "pairwis": 6, "palett": 6, "panda": [2, 5, 6, 7], "param": 2, "paramet": [1, 2, 5, 6, 10], "parametr": 10, "pardir": 5, "parent": 5, "pars": 5, "parse_date_with_rul": [5, 8], "part": [2, 5], "partial": [2, 8], "partial_depend": 6, "particular": [], "particularli": [2, 5, 6, 10], "pass": [2, 6], "path": [1, 2, 6, 8], "patient": 5, "pattern": [6, 7], "pd": [5, 6, 10], "pdf": 10, "pdp": [6, 10], "pearson": [6, 8], "per": [5, 6], "percent": [2, 6], "percentag": [5, 6], "percentil": [6, 10], "perfect": 10, "perfectli": 6, "perform": [2, 5, 6], "performancewarn": 2, "period": [2, 6, 8], "person": 4, "perspect": [2, 6], "pi": 10, "pictur": 6, "pink": 5, "pip": 7, "pitfal": 2, "pivot": [0, 8], "place": [2, 5], "plai": 0, "plain": [2, 5, 8], "plot": [2, 7, 10], "plot_2d_pdp": [2, 6, 8], "plot_3d_pdp": [2, 6, 8], "plot_mean": 6, "plot_median": 6, "plot_typ": [2, 6], "plotli": [2, 6, 7], "plotly_colormap": 6, "plots_onli": [2, 6], "plt": 2, "pm": 10, "png": [2, 5, 6], "png_imag": 5, "point": [2, 6, 10], "pointer": 6, "pool": [5, 8], "pool_siz": 2, "popul": [6, 8], "popular": 7, "posit": [2, 6, 10], "possibl": [2, 5, 
6, 7, 10], "potenti": [2, 5, 6, 10], "power": [2, 6, 10], "pr": 2, "practic": [6, 8], "practition": 10, "pre": 2, "preced": 6, "predefin": 1, "predict": [2, 6, 10], "prefer": [2, 6], "prefix": [2, 6], "prepar": [2, 5, 6], "preprocess": [5, 6], "prerequisit": 8, "presenc": 6, "present": [2, 5, 6], "preserv": [5, 6, 10], "preval": 6, "prevent": [2, 5, 6], "previou": [2, 6], "previous": 2, "price": 6, "primari": 6, "print": [1, 2, 5, 6], "print_art": [1, 8], "prior": 6, "privat": [5, 6], "probabl": [2, 6, 9, 10], "proceed": [5, 6], "process": [2, 5, 6, 7], "produc": [2, 6], "product": 10, "prof": [5, 6], "profession": 4, "profil": 10, "program": [0, 4], "programmat": 6, "progress": [5, 8], "project": [2, 4, 5, 6, 8], "promin": 6, "proper": [2, 5, 6], "properli": [2, 6], "properti": [6, 8], "proport": [2, 5, 6, 10], "provid": [0, 1, 2, 5, 6, 7, 10], "public": 6, "publish": 3, "purpl": 6, "purpos": [2, 6, 8], "pursu": 0, "py": [1, 2], "pypi": [2, 7], "python": [2, 4, 6, 8], "q1": 6, "q2": 6, "q3": 6, "q4": 6, "q_1": 10, "q_3": 10, "qualiti": [6, 7], "quantifi": [6, 10], "quantil": 6, "quantile_rang": 6, "quantit": 6, "quantiti": 6, "quartil": 6, "quick": [], "quickli": [6, 7], "r": [4, 5, 6, 7, 9, 10], "r_": 10, "race": 5, "racial": 5, "rais": [1, 2, 5, 6, 10], "random": [2, 5, 6], "random_st": 6, "rang": [2, 5, 6, 7, 10], "rather": [6, 10], "ratio": 10, "raw": 6, "re": [2, 6, 10], "read": 5, "readabl": [2, 5, 6], "readi": 7, "readm": 2, "real": [2, 10], "reciproc": 6, "recommend": [5, 6], "record": 5, "red": 6, "reduc": [2, 6, 10], "redund": 6, "refactor": 2, "refer": [6, 8], "referenc": 1, "refin": [2, 6], "reflect": [2, 6, 10], "regardless": [2, 6], "regener": 2, "regress": [8, 10], "regular": [2, 8], "rel": 10, "relat": [2, 5, 6], "relationship": [5, 6, 7, 10], "releas": 2, "relev": [2, 6, 7], "reli": 6, "reliabl": [5, 6, 10], "relianc": 2, "remain": [2, 5, 6, 10], "remov": [2, 6, 7, 8, 10], "remove_stack": [2, 6], "renam": [2, 6], "render": [2, 5], "repeat": 10, 
"replac": [2, 5], "replica": 2, "report": [4, 6, 7], "repositori": [5, 6, 7, 9], "repres": [2, 5, 6, 10], "represent": [1, 2, 6], "reproduc": [2, 5, 6], "requir": [2, 5, 6, 7, 10], "rescal": 6, "research": 7, "reset": 2, "residu": 10, "resolut": 6, "resolv": [2, 5], "respect": [2, 5, 6, 10], "respons": 10, "rest": 6, "result": [2, 5, 6, 10], "result_df": 2, "retain": [2, 8, 10], "retri": 2, "retriev": 6, "return": [2, 5, 6], "return_df": [2, 5], "return_dict": [6, 8], "reveal": 6, "rich": [6, 7], "right": [5, 6, 10], "right_margin": [2, 6], "riversid": 4, "robust": [2, 6, 10], "robustscal": [8, 10], "role": [0, 2, 6], "root": [2, 6, 10], "rot": 6, "rotat": [2, 6], "rotate_plot": 6, "round": 5, "row": [2, 5, 6], "royc": 1, "royce_hal": 1, "royce_hall_bb": 1, "royce_hall_wb": 1, "rule": [5, 6], "run": [2, 5, 6, 7], "runtim": 2, "s0167": [6, 9], "same": [2, 6], "sampl": [5, 8, 10], "sampled_df": 6, "san": [0, 4, 8], "save": [1, 2, 6, 7, 8], "save_dataframes_to_excel": [2, 5, 8], "save_format": [2, 6], "save_plot": [2, 6], "scalabl": 2, "scale": [2, 8], "scale_convers": [2, 6], "scale_conversion_kw": [2, 6], "scatter": [2, 7, 8], "scatter_color": 6, "scatter_fit_plot": [6, 8], "scatterplot": 6, "scenario": [2, 6, 10], "scheme": 6, "school": 0, "scienc": [0, 4, 5, 6, 7, 9], "scientif": [2, 6], "scientist": [0, 4, 7], "scikit": [2, 6, 7, 10], "scope": 6, "score": 6, "scroll": 6, "seaborn": [2, 6, 7, 9], "seamless": [2, 6], "seamlessli": [2, 7], "second": [5, 6], "section": [2, 5, 6], "see": 6, "seed": [2, 5, 6], "seen": 6, "select": [2, 6, 10], "select_dtyp": 6, "self": [5, 6], "sensit": 10, "separ": [2, 5, 6], "sequenc": 6, "seri": [2, 5, 6, 10], "serv": [2, 4, 6], "servic": 4, "session": 2, "set": [2, 5, 6, 10], "set_as_index": 5, "set_titl": 2, "setminu": 10, "setp": 2, "setup": [2, 5, 6], "sever": [2, 6], "sex": [5, 6], "shape": [5, 6, 10], "sheet": 5, "shift": 6, "shilei": 0, "should": [1, 6], "show": [2, 5, 6, 10], "show_cbar": 6, "show_correl": 6, "show_legend": 
[2, 6], "show_modebar": [2, 6], "show_plot": [2, 6], "showcas": 6, "shown": 6, "shpaner": [1, 3, 4], "shpaner_2024_13162633": 3, "shrink": 2, "side": 6, "sigma": 10, "sigma_i": 10, "sigma_x": 10, "sign": 6, "signal": 10, "signatur": 2, "signific": [2, 5, 10], "significantli": 2, "silver": 6, "similar": 6, "similarli": [6, 10], "simpl": 6, "simpler": 2, "simplic": [6, 7], "simplif": 2, "simplifi": [2, 5, 10], "simultan": [1, 6, 10], "sinc": [5, 6, 10], "sine": 6, "singl": [2, 5, 6, 10], "single_figs": 6, "single_var_image_filenam": 6, "size": [5, 6, 8], "skew": [6, 10], "skip": 2, "sklearn": 6, "slightli": 2, "small": 2, "smaller": 6, "smallest": [], "smooth": [6, 10], "smoother": [2, 6], "smoothli": 10, "sn": 6, "snippet": [5, 6], "so": [2, 5, 6], "softwar": [3, 6, 9], "some": [2, 5, 6, 10], "sort": [2, 5], "sort_bi": [2, 5], "sort_cols_alpha": 5, "sortbi": 2, "sourc": [6, 7, 9], "space": [2, 6], "span": 6, "spars": [6, 9], "spatial": [6, 9], "special": [2, 10], "specialti": [5, 6], "specif": [2, 7, 8], "specifi": [1, 2, 5, 6, 7], "split": 6, "spot": 5, "spous": [5, 6], "spread": [6, 10], "sql": 4, "sqrt": [6, 10], "squar": [2, 6, 10], "stabil": [2, 6, 10], "stabl": [2, 10], "stack": [2, 7, 8], "stacked_crosstab": 6, "stacked_crosstab_plot": [6, 8], "standard": [2, 6, 7, 8, 10], "standardized_d": 5, "start": [2, 5, 7], "stat": [2, 6], "state": [1, 5, 6], "statement": 2, "static": [2, 8], "statist": [2, 4, 5, 6, 7, 9, 10], "statistician": 10, "statu": [2, 5, 6], "std": 8, "std_color": 6, "std_dev_level": 6, "stdrz": 6, "stem": 6, "step": [2, 5, 7], "still": [2, 6], "store": [2, 6], "str": [1, 2, 5, 6], "straightforward": 2, "strategi": 6, "streamlin": [2, 5, 7], "strength": [6, 10], "strictli": [2, 10], "string": [5, 6, 8], "strip": 5, "strip_trailing_period": [5, 8], "stronger": 10, "structur": [1, 2, 7], "style": [2, 5, 6], "styler": [2, 5], "subplot": 6, "subset": [6, 10], "substitut": 10, "subtl": 2, "subtract": 10, "success": 0, "successfulli": [0, 2], 
"suffici": [2, 6], "suffix": 1, "suggest": [2, 5, 10], "suit": 7, "suitabl": [2, 6, 10], "sum_": 10, "summar": [7, 10], "summari": [2, 6, 7, 8], "summarize_all_combin": [5, 8], "summary_t": 5, "support": [0, 5, 6, 8], "suppos": [6, 10], "suppress": 6, "sure": 5, "surfac": [2, 10], "svg": [2, 5, 6], "svg_imag": 5, "swap": 6, "switch": 2, "sy": 2, "symmetr": 6, "syntax": 6, "system": [5, 7], "t": 6, "tab": 5, "tabl": [2, 6, 7], "tabular": 6, "tailor": 6, "take": [5, 6, 10], "tall": 6, "target": [6, 10], "tarshizi": 0, "task": [5, 7], "tatist": 6, "teach": 4, "techniqu": [6, 7, 8, 10], "tell": 6, "ten": 4, "tend": 10, "tendenc": 6, "term": 6, "termin": [2, 5], "test": [2, 6, 10], "test_siz": 6, "text": [2, 5, 6, 10], "text_wrap": [2, 6], "th": 10, "than": [2, 5, 6, 10], "thank": 0, "thei": [2, 5, 6, 10], "them": [1, 2, 5, 6, 7], "theoret": [6, 10], "therefor": 6, "thi": [1, 2, 5, 6, 7, 10], "thick": 6, "those": [2, 6, 10], "three": 6, "threw": 2, "through": [2, 6], "throw": 2, "thu": [5, 6], "tick": [2, 6], "tick_fonts": [2, 6], "tight": 6, "time": [0, 2, 5, 6, 10], "titl": [3, 6, 8], "title_i": [2, 6], "title_x": [2, 6], "to_list": 6, "togeth": 10, "toggl": [2, 6], "tone": [], "tool": [2, 6, 7], "toolkit": [1, 2, 6], "top": 6, "top_margin": [2, 6], "topic": 5, "total": [5, 6, 10], "toward": 2, "tqdm": [5, 8], "track": [2, 5], "trade": 10, "tradit": 10, "trail": [2, 6, 8], "train": [6, 10], "train_test_split": 6, "transform": [2, 8], "transpar": [2, 6], "treat": 8, "treatment": 6, "trend": [6, 7], "triangl": 6, "triangular": [2, 8], "trigger": 6, "true": [1, 5, 6, 8, 10], "truncat": 5, "truth": 6, "try": 2, "tune": 6, "tupl": [2, 5, 6], "two": [2, 5, 6, 10], "txt": 1, "type": [5, 6, 7, 8], "typeerror": 2, "typic": [6, 10], "u": [0, 5, 6], "uci": [5, 6, 7, 9], "ucla": [1, 4], "unambigu": 5, "unbound": 6, "unchang": [2, 10], "uncov": [6, 7], "undefin": [6, 10], "under": [5, 6, 10], "underli": [7, 10], "underscor": 2, "understand": [5, 6, 7, 10], "unequ": 6, "unifi": 6, 
"uniform": 2, "uniqu": [2, 6, 7, 8], "unique_id": 2, "unique_values_tot": 5, "unique_var": 5, "unit": 5, "univers": [0, 4], "unknown_art": 1, "unlik": 10, "unnecessari": [2, 6], "unprocess": 5, "unrecogn": 5, "unscal": 6, "unstack": 6, "unus": 2, "unwav": 0, "up": [2, 5, 6], "updat": [2, 5, 6], "upper": [2, 5, 6], "upper_cutoff": 6, "upright": 6, "url": 3, "us": [1, 2, 5, 6, 7, 8], "usabl": 2, "usag": [2, 8], "user": [1, 2, 5, 6, 7], "userwarn": 6, "util": [1, 5, 6, 7], "v": 6, "valid": [2, 6, 10], "valid_plot_typ": 2, "valu": [2, 5, 6, 7, 10], "value_count": 5, "valueerror": [1, 5, 6, 8, 10], "vari": [5, 10], "variabl": [2, 6, 7, 8, 10], "varianc": [6, 10], "varieti": [4, 6, 7], "variou": [2, 6, 7, 10], "vars_of_interest": 6, "vdot": 5, "vector": [2, 10], "verbiag": 2, "verifi": [2, 5], "versa": 6, "versatil": [2, 6], "version": [3, 5, 6, 7, 8], "version_info": 2, "versu": 2, "vertic": [2, 6], "via": [2, 6], "vice": 6, "view": [2, 6, 10], "view_angl": 6, "violat": 10, "violin": [2, 7, 8], "violinplot": 6, "viridi": 6, "visibl": [2, 6], "visual": [2, 5, 7, 8, 9, 10], "vmax": 6, "vmin": 6, "vriabl": 6, "w_pad": 6, "wa": [2, 6], "wai": [6, 10], "want": [2, 6], "wareh": 4, "warn": [2, 5, 6], "waskom": [6, 9], "we": [0, 5, 6, 7, 10], "week": [5, 6], "weight": 6, "welcom": 8, "well": [6, 10], "were": [2, 5, 10], "what": [2, 8], "wheel": 6, "when": [1, 2, 5, 6, 7, 10], "whenev": 2, "where": [1, 2, 5, 6, 10], "whether": [2, 5, 6], "which": [2, 5, 6, 7, 10], "while": [2, 5, 6, 10], "white": [1, 5], "whitespac": 6, "who": 2, "why": 10, "wide": [4, 6, 10], "width": [2, 6], "wife": [5, 6], "wirefram": [2, 6], "wireframe_color": 6, "wish": 6, "with_cent": 6, "within": [2, 4, 5, 6, 10], "without": [2, 6], "word": 10, "work": [1, 2, 5, 6, 10], "workclass": [5, 6], "workflow": [2, 5, 7], "world": 10, "would": [0, 2, 6, 10], "wrangl": 4, "wrap": [2, 6], "write": 5, "x": [2, 5, 6, 9, 10], "x_": 10, "x_1": 10, "x_2": 10, "x_c": 10, "x_i": 10, "x_j": 10, "x_k": 10, "x_label": [2, 6], 
"x_label_plotli": 2, "x_n": 10, "x_p": 10, "x_test": 6, "x_train": 6, "x_var": 6, "xlabel": 6, "xlabel_align": 6, "xlabel_rot": 6, "xlim": [2, 6], "xlsx": 5, "xlsxwriter": [5, 7], "xmax": 6, "xmin": 6, "xx": 2, "xy": 10, "y": [2, 6, 10], "y_axis_label": 6, "y_i": 10, "y_label": [2, 6], "y_label_plotli": 2, "y_test": 6, "y_train": 6, "y_var": 6, "year": [3, 4, 5], "yellow": 5, "yeo": [6, 8], "ylabel": 6, "ylabel_align": 6, "ylabel_rot": 6, "ylim": [2, 6], "ymax": 6, "ymin": 6, "you": [5, 6, 7, 10], "your": [5, 6, 7, 10], "yy": 2, "yyyi": 5, "z": 6, "z_label": [2, 6], "z_label_plotli": 2, "zenodo": [2, 3], "zero": [2, 5, 6, 10], "zoom": [2, 6], "zoom_out_factor": [2, 6], "zz": 2}, "titles": ["Acknowledgements", "ASCII Art", "Changelog", "Citing EDA Toolkit", "Contributors/Maintainers", "Data Management Overview", "Creating Effective Visualizations", "Welcome to the EDA Toolkit Python Library Documentation!", "Table of Contents", "References", "Gaussian Assumption for Normality"], "titleterms": {"0": 2, "1": 6, "10": 2, "11": 2, "12": 2, "13": 2, "14": 2, "1b0": 2, "1rc0": 2, "2": [2, 6], "2d": 6, "3": 2, "3d": 6, "4": 2, "5": 2, "6": 2, "7": 2, "8": 2, "8a": 2, "8b": 2, "8c": 2, "9": 2, "The": 10, "With": 6, "about": 8, "acknowledg": 0, "ad": 5, "add": 2, "add_id": 2, "all": 6, "analysi": [5, 6], "applic": 10, "art": 1, "ascii": 1, "assumpt": 10, "avail": 6, "bar": [2, 6], "benefit": 10, "best": 6, "bin": 5, "box": [6, 10], "ca": 6, "calcul": [5, 10], "categori": 6, "censu": 5, "center": 10, "centric": 6, "chang": 2, "changelog": 2, "cite": 3, "coeffici": 10, "collect": 1, "column": 5, "combin": [5, 6], "compon": 10, "confid": 10, "consider": 10, "content": 8, "conting": 5, "contributor": 4, "convers": 6, "correl": [6, 10], "count": 6, "cox": [6, 10], "creat": [5, 6], "crosstab": [2, 6], "custom": 5, "cutoff": 6, "data": [2, 5, 6, 8, 10], "datafram": 5, "dataframe_column": 2, "date": 5, "dedupl": 2, "default": 2, "definit": 10, "densiti": [6, 10], "depend": [6, 10], 
"descript": 7, "detail": 5, "detect": 2, "deviat": 6, "dictionari": 2, "directori": 5, "distribut": 6, "document": 7, "each": 10, "eda": [3, 7, 8], "effect": 6, "enhanc": 2, "ensur": 2, "environ": 2, "estim": 10, "exampl": [1, 5, 6, 10], "excel": 5, "exclud": 6, "explan": 10, "featur": [1, 2, 6, 7], "filenam": 2, "fit": 6, "fix": 2, "flex_corr_matrix": 2, "format": 5, "foundat": 10, "fraction": 6, "full": 6, "function": [2, 6], "gaussian": 10, "gener": 5, "get": 8, "grid": 6, "group": 6, "handl": 2, "heurist": [6, 8], "highlight": 5, "histogram": [6, 10], "hous": 6, "i": [2, 7], "id": 2, "identifi": 5, "improv": 2, "incom": 5, "instal": 7, "insuffici": 2, "interact": 6, "interv": 10, "iqr": 10, "johnson": 10, "kde": [6, 10], "kei": 7, "kernel": 10, "lambda": 10, "librari": 7, "line": 6, "link": 7, "list": [], "logit": [6, 10], "maintain": 4, "manag": [5, 8], "mathemat": 10, "matric": 6, "matrix": 6, "mean": 6, "median": [6, 10], "methodologi": 6, "mix": 2, "model": 10, "name": 5, "new": 2, "non": 6, "normal": [6, 10], "note": [1, 5, 6], "numer": 5, "observ": 6, "other": 2, "outlier": 6, "overview": [1, 5, 8], "partial": [6, 10], "path": 5, "pearson": 10, "period": 5, "pivot": 6, "plain": 6, "plot": [6, 8], "pool": 2, "popul": 2, "practic": 10, "prerequisit": 7, "progress": 2, "project": 7, "properti": 10, "purpos": [7, 10], "python": 7, "refer": 9, "regress": 6, "regular": 6, "remov": 5, "retain": 6, "return_dict": 2, "robustscal": 6, "sampl": 6, "san": 6, "save": 5, "scale": [6, 10], "scatter": 6, "scatter_fit_plot": 2, "size": 2, "specif": [5, 6], "stack": 6, "stacked_crosstab_plot": 2, "standard": 5, "start": 8, "static": 6, "std": 6, "string": 2, "strip_trailing_period": 2, "summari": 5, "support": 2, "tabl": [5, 8], "techniqu": 5, "theoret": 8, "titl": 2, "toolkit": [3, 7, 8], "tqdm": 2, "trail": 5, "transform": [6, 10], "treat": 6, "triangular": 6, "true": 2, "type": 2, "uniqu": 5, "us": 10, "usag": 6, "valueerror": 2, "variabl": 5, "version": 2, "violin": 6, 
"visual": 6, "welcom": 7, "what": 7, "yeo": 10}}) \ No newline at end of file diff --git a/docs/v0.0.10/.doctrees/environment.pickle b/docs/v0.0.10/.doctrees/environment.pickle index b62914952..994429ee8 100644 Binary files a/docs/v0.0.10/.doctrees/environment.pickle and b/docs/v0.0.10/.doctrees/environment.pickle differ diff --git a/docs/v0.0.10/data_management.html b/docs/v0.0.10/data_management.html index ca85634d7..a8fe117f4 100644 --- a/docs/v0.0.10/data_management.html +++ b/docs/v0.0.10/data_management.html @@ -162,7 +162,7 @@

          Path directories
          ensure_directory(path)
          Parameters:
          -

          path (str) – The path to the directory that needs to be ensured.

          +

          path (str) – The path to the directory that needs to be ensured.

          Returns:

          None
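
          The behavior described above (take a path, make sure the directory exists, return nothing) can be sketched with the standard library alone. This is a minimal illustration, not the toolkit's own implementation; the name `ensure_directory_sketch` is hypothetical.

```python
import os


def ensure_directory_sketch(path: str) -> None:
    """Hypothetical stand-in: create `path` if it is missing, return None."""
    # exist_ok=True makes the call safe to repeat on an existing directory.
    os.makedirs(path, exist_ok=True)
```

          Calling the function a second time on the same path is a no-op, which is the usual reason to prefer `exist_ok=True` over a separate existence check.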

          @@ -231,10 +231,10 @@

          Adding Unique Identifiers
          Parameters:
          • df (pd.DataFrame) – The dataframe to add IDs to.

          • -
          • id_colname (str, optional) – The name of the new column for the IDs. Defaults to "ID".

          • -
          • num_digits (int, optional) – The number of digits for the unique IDs. Defaults to 9.

          • -
          • seed (int, optional) – The seed for the random number generator. Defaults to None.

          • -
          • set_as_index (bool, optional) – Whether to set the new ID column as the index. Defaults to False.

          • +
          • id_colname (str, optional) – The name of the new column for the IDs. Defaults to "ID".

          • +
          • num_digits (int, optional) – The number of digits for the unique IDs. Defaults to 9.

          • +
          • seed (int, optional) – The seed for the random number generator. Defaults to None.

          • +
          • set_as_index (bool, optional) – Whether to set the new ID column as the index. Defaults to False.

          Returns:
          @@ -403,7 +403,7 @@

          Trailing Period Removal
          Parameters:
          Returns:
          @@ -532,16 +532,16 @@

          Standardized Dates
          Parameters:
          -

          date_str (str) – A date string to be standardized.

          +

          date_str (str) – A date string to be standardized.

          Returns:

          A standardized date string in the format YYYY-MM-DD.

          Return type:
          -

          str

          +

          str

          Raises:
          -

          ValueError – If date_str is in an unrecognized format or if the function +

          ValueError – If date_str is in an unrecognized format or if the function cannot parse the date.
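
          As a rough illustration of the contract above (a date string in, a `YYYY-MM-DD` string out, `ValueError` on anything unrecognized), the sketch below tries a short list of formats in order. The candidate formats and the name `standardize_date_sketch` are assumptions made for this example; the library's actual parsing rule may differ.

```python
from datetime import datetime

# Assumed candidate input formats, tried in order (hypothetical list).
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d/%m/%Y")


def standardize_date_sketch(date_str: str) -> str:
    """Return `date_str` as YYYY-MM-DD or raise ValueError."""
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(date_str.strip(), fmt)
        except ValueError:
            continue  # this format did not match, try the next one
        return parsed.strftime("%Y-%m-%d")
    raise ValueError(f"Unrecognized date format: {date_str!r}")
```

          Because month-first is tried before day-first, an ambiguous input such as `01/02/2024` is read as January 2 in this sketch; a value like `15/03/2024` can only match the day-first format.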

          @@ -621,9 +621,9 @@

          DataFrame Analysis
          Parameters:
          • df (pandas.DataFrame) – The DataFrame to analyze.

          • -
          • background_color (str, optional) – Hex color code or color name for background styling in the output +

          • background_color (str, optional) – Hex color code or color name for background styling in the output DataFrame. Defaults to None.

          • -
          • return_df (bool, optional) – If True, returns the plain DataFrame with the summary statistics. If +

          • return_df (bool, optional) – If True, returns the plain DataFrame with the summary statistics. If False, returns a styled DataFrame for visual presentation. Defaults to False.

          @@ -875,17 +875,17 @@

          Generating Summary Tables for Variable Combinations
          Parameters:
          • df (pandas.DataFrame) – The pandas DataFrame containing the data.

          • -
          • variables (list of str) – List of column names from the DataFrame to generate combinations.

          • -
          • data_path (str) – Path where the output Excel file will be saved.

          • -
          • data_name (str) – Name of the output Excel file.

          • -
          • min_length (int, optional) – Minimum size of the combinations to generate. Defaults to 2.

          • +
          • variables (list of str) – List of column names from the DataFrame to generate combinations.

          • +
          • data_path (str) – Path where the output Excel file will be saved.

          • +
          • data_name (str) – Name of the output Excel file.

          • +
          • min_length (int, optional) – Minimum size of the combinations to generate. Defaults to 2.

          Returns:

          A tuple containing a dictionary of summary tables and a list of all generated combinations.

          Return type:
          -

          tuple(dict, list)

          +

          tuple(dict, list)
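
          The role of `min_length` is easiest to see by enumerating the combinations directly. The sketch below covers only that enumeration step, under the assumption that combinations run from `min_length` up to the full set of variables; it does not build summary tables or write Excel files, and `all_combinations_sketch` is a hypothetical name.

```python
from itertools import combinations


def all_combinations_sketch(variables, min_length=2):
    """List every combination of `variables` with at least `min_length` items."""
    combos = []
    # Assumed upper bound: the combination that uses every variable.
    for size in range(min_length, len(variables) + 1):
        combos.extend(combinations(variables, size))
    return combos


if __name__ == "__main__":
    for combo in all_combinations_sketch(["age", "sex", "race"]):
        print(combo)
```

          With three variables and the default `min_length=2`, this yields the three pairs followed by the single triple, so the number of combinations grows quickly as variables are added.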

          @@ -1081,9 +1081,9 @@

          Saving DataFrames to Excel with Customized Formatting
          Parameters:
            -
          • file_path (str) – Full path to the output Excel file.

          • -
          • df_dict (dict) – Dictionary where keys are sheet names and values are DataFrames to save.

          • -
          • decimal_places (int) – Number of decimal places to round numeric columns. Default is 0.

          • +
          • file_path (str) – Full path to the output Excel file.

          • +
          • df_dict (dict) – Dictionary where keys are sheet names and values are DataFrames to save.

          • +
          • decimal_places (int) – Number of decimal places to round numeric columns. Default is 0.

          @@ -1143,12 +1143,12 @@

          Creating Contingency Tables
          Parameters:
          • df (pandas.DataFrame) – The DataFrame to analyze.

          • -
          • cols (str or list of str, optional) – Name of the column (as a string) for a single column or list of column names for multiple columns. Must provide at least one column.

          • -
          • sort_by (int, optional) – Enter 0 to sort results by column groups; enter 1 to sort results by totals in descending order. Defaults to 0.

          • +
          • cols (str or list of str, optional) – Name of the column (as a string) for a single column or list of column names for multiple columns. Must provide at least one column.

          • +
          • sort_by (int, optional) – Enter 0 to sort results by column groups; enter 1 to sort results by totals in descending order. Defaults to 0.

          Raises:
          -

          ValueError – If no columns are specified or if sort_by is not 0 or 1.

          +

          ValueError – If no columns are specified or if sort_by is not 0 or 1.

          Returns:

          A DataFrame containing the contingency table with the specified columns, a 'Total' column representing the count of occurrences, and a 'Percentage' column representing the percentage of the total count.
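
          To make the `Total`, `Percentage`, and `sort_by` semantics concrete, here is a small standard-library sketch that works on a list of dictionaries rather than a DataFrame. It mirrors the documented checks (at least one column, `sort_by` of 0 or 1) but is an illustration only; `contingency_sketch` is a hypothetical name, and rounding to two decimals is an assumption.

```python
from collections import Counter


def contingency_sketch(rows, cols, sort_by=0):
    """Count occurrences of each value combination in `cols`."""
    if not cols:
        raise ValueError("At least one column must be specified.")
    if sort_by not in (0, 1):
        raise ValueError("sort_by must be 0 or 1.")
    if isinstance(cols, str):
        cols = [cols]  # accept a single column name

    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    total = sum(counts.values())
    table = [
        {**dict(zip(cols, key)), "Total": n, "Percentage": round(100 * n / total, 2)}
        for key, n in counts.items()
    ]
    if sort_by == 0:
        # Sort by the column groups themselves.
        table.sort(key=lambda r: tuple(r[c] for c in cols))
    else:
        # Sort by totals in descending order.
        table.sort(key=lambda r: r["Total"], reverse=True)
    return table
```

          Passing `sort_by=1` puts the most frequent group first, which is usually the more convenient view when scanning for dominant categories.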

          @@ -1211,8 +1211,8 @@

          Highlighting Specific Columns in a DataFrame
          Parameters:
          • df (pandas.DataFrame) – The DataFrame to be styled.

          • -
          • columns (list of str) – List of column names to be highlighted.

          • -
          • color (str, optional) – The background color to be applied for highlighting (default is “yellow”).

          • +
          • columns (list of str) – List of column names to be highlighted.

          • +
          • color (str, optional) – The background color to be applied for highlighting (default is “yellow”).

          Returns:
          diff --git a/docs/v0.0.10/eda_plots.html b/docs/v0.0.10/eda_plots.html index d8f2c6496..b204bbd92 100644 --- a/docs/v0.0.10/eda_plots.html +++ b/docs/v0.0.10/eda_plots.html @@ -339,50 +339,50 @@

          KDE Distribution Function
          Parameters:
          • df (pandas.DataFrame) – The DataFrame containing the data to plot.

          • -
          • vars_of_interest (list of str, optional) – List of column names for which to generate distribution plots. If ‘all’, plots will be generated for all numeric columns.

          • -
          • figsize (tuple of int, optional) – Size of each individual plot, default is (5, 5). Used when only one plot is being generated or when saving individual plots.

          • -
          • grid_figsize (tuple of int, optional) – Size of the overall grid of plots when multiple plots are generated in a grid. Ignored when only one plot is being generated or when saving individual plots. If not specified, it is calculated based on figsize, n_rows, and n_cols.

          • -
          • hist_color (str, optional) – Color of the histogram bars, default is '#0000FF'.

          • -
          • kde_color (str, optional) – Color of the KDE plot, default is '#FF0000'.

          • -
          • mean_color (str, optional) – Color of the mean line if plot_mean is True, default is '#000000'.

          • -
          • median_color (str, optional) – Color of the median line if plot_median is True, default is '#000000'.

          • -
          • hist_edgecolor (str, optional) – Color of the histogram bar edges, default is '#000000'.

          • -
          • hue (str, optional) – Column name to group data by, adding different colors for each group.

          • -
          • fill (bool, optional) – Whether to fill the histogram bars with color, default is True.

          • -
          • fill_alpha (float, optional) – Alpha transparency for the fill color of the histogram bars, where 0 is fully transparent and 1 is fully opaque. Default is 1.

          • -
          • n_rows (int, optional) – Number of rows in the subplot grid. If not provided, it will be calculated automatically.

          • -
          • n_cols (int, optional) – Number of columns in the subplot grid. If not provided, it will be calculated automatically.

          • -
          • w_pad (float, optional) – Width padding between subplots, default is 1.0.

          • -
          • h_pad (float, optional) – Height padding between subplots, default is 1.0.

          • -
          • image_path_png (str, optional) – Directory path to save the PNG image of the overall distribution plots.

          • -
          • image_path_svg (str, optional) – Directory path to save the SVG image of the overall distribution plots.

          • -
          • image_filename (str, optional) – Filename to use when saving the overall distribution plots.

          • -
          • bbox_inches (str, optional) – Bounding box to use when saving the figure. For example, 'tight'.

          • -
          • single_var_image_filename (str, optional) – Filename to use when saving the separate distribution plots. The variable name will be appended to this filename. This parameter uses figsize for determining the plot size, ignoring grid_figsize.

          • -
          • y_axis_label (str, optional) – The label to display on the y-axis, default is 'Density'.

          • -
          • plot_type (str, optional) – The type of plot to generate, options are 'hist', 'kde', or 'both'. Default is 'both'.

          • -
          • log_scale_vars (str or list of str, optional) – Variable name(s) to apply log scaling. Can be a single string or a list of strings.

          • -
          • bins (int or sequence, optional) – Specification of histogram bins, default is 'auto'.

          • -
          • binwidth (float, optional) – Width of each bin, overrides bins but can be used with binrange.

          • -
          • label_fontsize (int, optional) – Font size for axis labels, including xlabel, ylabel, and tick marks, default is 10.

          • -
          • tick_fontsize (int, optional) – Font size for tick labels on the axes, default is 10.

          • -
          • text_wrap (int, optional) – Maximum width of the title text before wrapping, default is 50.

          • -
          • disable_sci_notation (bool, optional) – Toggle to disable scientific notation on axes, default is False.

          • -
          • stat (str, optional) – Aggregate statistic to compute in each bin (e.g., 'count', 'frequency', 'probability', 'percent', 'density'), default is 'density'.

          • -
          • xlim (tuple or list, optional) – Limits for the x-axis as a tuple or list of (min, max).

          • -
          • ylim (tuple or list, optional) – Limits for the y-axis as a tuple or list of (min, max).

          • -
          • plot_mean (bool, optional) – Whether to plot the mean as a vertical line, default is False.

          • -
          • plot_median (bool, optional) – Whether to plot the median as a vertical line, default is False.

          • -
          • std_dev_levels (list of int, optional) – Levels of standard deviation to plot around the mean.

          • -
          • std_color (str or list of str, optional) – Color(s) for the standard deviation lines, default is '#808080'.

          • -
          • label_names (dict, optional) – Custom labels for the variables of interest. Keys should be column names, and values should be the corresponding labels to display.

          • -
          • show_legend (bool, optional) – Whether to show the legend on the plots, default is True.

          • +
          • vars_of_interest (list of str, optional) – List of column names for which to generate distribution plots. If ‘all’, plots will be generated for all numeric columns.

          • +
          • figsize (tuple of int, optional) – Size of each individual plot, default is (5, 5). Used when only one plot is being generated or when saving individual plots.

          • +
          • grid_figsize (tuple of int, optional) – Size of the overall grid of plots when multiple plots are generated in a grid. Ignored when only one plot is being generated or when saving individual plots. If not specified, it is calculated based on figsize, n_rows, and n_cols.

          • +
          • hist_color (str, optional) – Color of the histogram bars, default is '#0000FF'.

          • +
          • kde_color (str, optional) – Color of the KDE plot, default is '#FF0000'.

          • +
          • mean_color (str, optional) – Color of the mean line if plot_mean is True, default is '#000000'.

          • +
          • median_color (str, optional) – Color of the median line if plot_median is True, default is '#000000'.

          • +
          • hist_edgecolor (str, optional) – Color of the histogram bar edges, default is '#000000'.

          • +
          • hue (str, optional) – Column name to group data by, adding different colors for each group.

          • +
          • fill (bool, optional) – Whether to fill the histogram bars with color, default is True.

          • +
          • fill_alpha (float, optional) – Alpha transparency for the fill color of the histogram bars, where 0 is fully transparent and 1 is fully opaque. Default is 1.

          • +
          • n_rows (int, optional) – Number of rows in the subplot grid. If not provided, it will be calculated automatically.

          • +
          • n_cols (int, optional) – Number of columns in the subplot grid. If not provided, it will be calculated automatically.

          • +
          • w_pad (float, optional) – Width padding between subplots, default is 1.0.

          • +
          • h_pad (float, optional) – Height padding between subplots, default is 1.0.

          • +
          • image_path_png (str, optional) – Directory path to save the PNG image of the overall distribution plots.

          • +
          • image_path_svg (str, optional) – Directory path to save the SVG image of the overall distribution plots.

          • +
          • image_filename (str, optional) – Filename to use when saving the overall distribution plots.

          • +
          • bbox_inches (str, optional) – Bounding box to use when saving the figure. For example, 'tight'.

          • +
          • single_var_image_filename (str, optional) – Filename to use when saving the separate distribution plots. The variable name will be appended to this filename. This parameter uses figsize for determining the plot size, ignoring grid_figsize.

          • +
          • y_axis_label (str, optional) – The label to display on the y-axis, default is 'Density'.

          • +
          • plot_type (str, optional) – The type of plot to generate, options are 'hist', 'kde', or 'both'. Default is 'both'.

          • +
          • log_scale_vars (str or list of str, optional) – Variable name(s) to apply log scaling. Can be a single string or a list of strings.

          • +
          • bins (int or sequence, optional) – Specification of histogram bins, default is 'auto'.

          • +
          • binwidth (float, optional) – Width of each bin, overrides bins but can be used with binrange.

          • +
          • label_fontsize (int, optional) – Font size for axis labels, including xlabel, ylabel, and tick marks, default is 10.

          • +
          • tick_fontsize (int, optional) – Font size for tick labels on the axes, default is 10.

          • +
          • text_wrap (int, optional) – Maximum width of the title text before wrapping, default is 50.

          • +
          • disable_sci_notation (bool, optional) – Toggle to disable scientific notation on axes, default is False.

          • +
          • stat (str, optional) – Aggregate statistic to compute in each bin (e.g., 'count', 'frequency', 'probability', 'percent', 'density'), default is 'density'.

          • +
          • xlim (tuple or list, optional) – Limits for the x-axis as a tuple or list of (min, max).

          • +
          • ylim (tuple or list, optional) – Limits for the y-axis as a tuple or list of (min, max).

          • +
          • plot_mean (bool, optional) – Whether to plot the mean as a vertical line, default is False.

          • +
          • plot_median (bool, optional) – Whether to plot the median as a vertical line, default is False.

          • +
          • std_dev_levels (list of int, optional) – Levels of standard deviation to plot around the mean.

          • +
          • std_color (str or list of str, optional) – Color(s) for the standard deviation lines, default is '#808080'.

          • +
          • label_names (dict, optional) – Custom labels for the variables of interest. Keys should be column names, and values should be the corresponding labels to display.

          • +
          • show_legend (bool, optional) – Whether to show the legend on the plots, default is True.

          • kwargs (additional keyword arguments) – Additional keyword arguments passed to the Seaborn plotting function.

          Raises:
            -
          • ValueError

              +
            • ValueError

              • If plot_type is not one of 'hist', 'kde', or 'both'.

              • If stat is not one of 'count', 'density', 'frequency', 'probability', 'proportion', 'percent'.

              • If log_scale_vars contains variables that are not present in the DataFrame.

              • @@ -390,7 +390,7 @@

                 KDE Distribution Function
                 grid_figsize is provided when only one plot is being created.

            • -
            • UserWarning

                +
              • UserWarning

                • If both bins and binwidth are specified, which may affect performance.
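
                 The argument checks listed under Raises can be pictured as a small validation step run before any plotting. The sketch below is a hedged illustration of that pattern, not the library's code: the function name and message wording are hypothetical, and `bins` is assumed to default to `None` here so that "both specified" is easy to detect.

```python
import warnings

VALID_PLOT_TYPES = ("hist", "kde", "both")
VALID_STATS = ("count", "density", "frequency", "probability", "proportion", "percent")


def validate_plot_args_sketch(plot_type="both", stat="density", bins=None, binwidth=None):
    """Raise ValueError on bad options; warn when bins and binwidth are both set."""
    if plot_type not in VALID_PLOT_TYPES:
        raise ValueError(f"plot_type must be one of {VALID_PLOT_TYPES}.")
    if stat not in VALID_STATS:
        raise ValueError(f"stat must be one of {VALID_STATS}.")
    if bins is not None and binwidth is not None:
        # Both settings were supplied explicitly; flag it instead of failing.
        warnings.warn(
            "Both bins and binwidth are specified, which may affect performance.",
            UserWarning,
        )
```

                 Separating hard errors (`ValueError`) from soft ones (`UserWarning`) lets a call proceed with a questionable but workable combination of options while still rejecting values the plot cannot interpret.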

              • @@ -680,45 +680,45 @@

                 Stacked Crosstab Plots
                 Parameters:
                • df (pandas.DataFrame) – The DataFrame containing the data to plot.

                • -
                • col (str) – The name of the column in the DataFrame to be analyzed.

                • -
                • func_col (list) – List of ground truth columns to be analyzed.

                • -
                • legend_labels_list (list) – List of legend labels for each ground truth column.

                • -
                • title (list) – List of titles for the plots.

                • -
                • kind (str, optional) – The kind of plot to generate ('bar' or 'barh' for horizontal bars), default is 'bar'.

                • -
                • width (float, optional) – The width of the bars in the bar plot, default is 0.9.

                • -
                • rot (int, optional) – The rotation angle of the x-axis labels, default is 0.

                • -
                • custom_order (list, optional) – Specifies a custom order for the categories in the col.

                • -
                • image_path_png (str, optional) – Directory path where generated PNG plot images will be saved.

                • -
                • image_path_svg (str, optional) – Directory path where generated SVG plot images will be saved.

                • -
                • save_formats (list, optional) – List of file formats to save the plot images in.

                • -
                • color (list, optional) – List of colors to use for the plots. If not provided, a default color scheme is used.

                • -
                • output (str, optional) – Specify the output type: "plots_only", "crosstabs_only", or "both". Default is "both".

                • -
                • return_dict (bool, optional) – Specify whether to return the crosstabs dictionary, default is False.

                • -
                • x (int, optional) – The width of the figure.

                • -
                • y (int, optional) – The height of the figure.

                • -
                • p (int, optional) – The padding between the subplots.

                • -
                • file_prefix (str, optional) – Prefix for the filename when output includes plots.

                • -
                • logscale (bool, optional) – Apply log scale to the y-axis, default is False.

                • -
                • plot_type (str, optional) – Specify the type of plot to generate: "both", "regular", "normalized". Default is "both".

                • -
                • show_legend (bool, optional) – Specify whether to show the legend, default is True.

                • -
                • label_fontsize (int, optional) – Font size for axis labels, default is 12.

                • -
                • tick_fontsize (int, optional) – Font size for tick labels on the axes, default is 10.

                • -
                • text_wrap (int, optional) – The maximum width of the title text before wrapping, default is 50.

                • -
                • remove_stacks (bool, optional) – If True, removes stacks and creates a regular bar plot using only the col parameter. Only works when plot_type is set to 'regular'. Default is False.

                • -
                • xlim (tuple or list, optional) – Limits for the x-axis as a tuple or list of (min, max).

                • -
                • ylim (tuple or list, optional) – Limits for the y-axis as a tuple or list of (min, max).

                • +
                • col (str) – The name of the column in the DataFrame to be analyzed.

                • +
                • func_col (list) – List of ground truth columns to be analyzed.

                • +
                • legend_labels_list (list) – List of legend labels for each ground truth column.

                • +
                • title (list) – List of titles for the plots.

                • +
                • kind (str, optional) – The kind of plot to generate ('bar' or 'barh' for horizontal bars), default is 'bar'.

                • +
                • width (float, optional) – The width of the bars in the bar plot, default is 0.9.

                • +
                • rot (int, optional) – The rotation angle of the x-axis labels, default is 0.

                • +
                • custom_order (list, optional) – Specifies a custom order for the categories in the col.

                • +
                • image_path_png (str, optional) – Directory path where generated PNG plot images will be saved.

                • +
                • image_path_svg (str, optional) – Directory path where generated SVG plot images will be saved.

                • +
                • save_formats (list, optional) – List of file formats to save the plot images in.

                • +
                • color (list, optional) – List of colors to use for the plots. If not provided, a default color scheme is used.

                • +
                • output (str, optional) – Specify the output type: "plots_only", "crosstabs_only", or "both". Default is "both".

                • +
                • return_dict (bool, optional) – Specify whether to return the crosstabs dictionary, default is False.

                • +
                • x (int, optional) – The width of the figure.

                • +
                • y (int, optional) – The height of the figure.

                • +
                • p (int, optional) – The padding between the subplots.

                • +
                • file_prefix (str, optional) – Prefix for the filename when output includes plots.

                • +
                • logscale (bool, optional) – Apply log scale to the y-axis, default is False.

                • +
                • plot_type (str, optional) – Specify the type of plot to generate: "both", "regular", "normalized". Default is "both".

                • +
                • show_legend (bool, optional) – Specify whether to show the legend, default is True.

                • +
                • label_fontsize (int, optional) – Font size for axis labels, default is 12.

                • +
                • tick_fontsize (int, optional) – Font size for tick labels on the axes, default is 10.

                • +
                • text_wrap (int, optional) – The maximum width of the title text before wrapping, default is 50.

                • +
                • remove_stacks (bool, optional) – If True, removes stacks and creates a regular bar plot using only the col parameter. Only works when plot_type is set to 'regular'. Default is False.

                • +
                • xlim (tuple or list, optional) – Limits for the x-axis as a tuple or list of (min, max).

                • +
                • ylim (tuple or list, optional) – Limits for the y-axis as a tuple or list of (min, max).

                Raises:
                  -
                • ValueError

                    +
                  • ValueError

                    • If output is not one of "both", "plots_only", or "crosstabs_only".

                    • If plot_type is not one of "both", "regular", "normalized".

                    • If remove_stacks is set to True and plot_type is not "regular".

                    • If the lengths of title, func_col, and legend_labels_list are not equal.

                  • -
                  • KeyError – If any columns specified in col or func_col are missing in the DataFrame.

                  • +
                  • KeyError – If any columns specified in col or func_col are missing in the DataFrame.
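The Raises rules listed above for ``stacked_crosstab_plot`` can be sketched as a standalone check. This is only an illustration: ``validate_crosstab_args`` is a hypothetical helper, not part of the library, and the order in which the real function performs these checks is an assumption. The ``columns`` argument stands in for ``df.columns`` so the sketch runs without pandas.

```python
def validate_crosstab_args(
    columns,
    col,
    func_col,
    legend_labels_list,
    title,
    output="both",
    plot_type="both",
    remove_stacks=False,
):
    """Hypothetical helper mirroring the documented Raises rules."""
    # output must be one of the three documented modes
    if output not in ("both", "plots_only", "crosstabs_only"):
        raise ValueError(f"Invalid output: {output!r}")

    # plot_type must be one of the three documented modes
    if plot_type not in ("both", "regular", "normalized"):
        raise ValueError(f"Invalid plot_type: {plot_type!r}")

    # remove_stacks is documented to work only with plot_type='regular'
    if remove_stacks and plot_type != "regular":
        raise ValueError("remove_stacks=True requires plot_type='regular'")

    # title, func_col, and legend_labels_list must have equal lengths
    if not (len(title) == len(func_col) == len(legend_labels_list)):
        raise ValueError(
            "title, func_col, and legend_labels_list must have equal lengths"
        )

    # every referenced column must exist in the DataFrame
    missing = [c for c in [col, *func_col] if c not in columns]
    if missing:
        raise KeyError(f"Columns missing from DataFrame: {missing}")
```

Passing the column names as a plain list keeps the checks easy to exercise in isolation; a real call would pass ``df.columns`` instead.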

                Returns:
                @@ -1165,31 +1165,31 @@

                Box and Violin Plots

                Parameters:
                • df (pandas.DataFrame) – The DataFrame containing the data to plot.

                • -
                • metrics_list (list of str) – List of metric names (columns in df) to plot.

                • -
                • metrics_comp (list of str) – List of comparison categories (columns in df).

                • -
                • n_rows (int, optional) – Number of rows in the subplot grid. Calculated automatically if not provided.

                • -
                • n_cols (int, optional) – Number of columns in the subplot grid. Calculated automatically if not provided.

                • -
                • image_path_png (str, optional) – Optional directory path to save .png images.

                • -
                • image_path_svg (str, optional) – Optional directory path to save .svg images.

                • -
                • save_plots (str, optional) – String, "all", "individual", or "grid" to control saving plots.

                • -
                • show_legend (bool, optional) – Boolean, True if showing the legend in the plots. Default is True.

                • -
                • plot_type (str, optional) – Specify the type of plot, either "boxplot" or "violinplot". Default is "boxplot".

                • -
                • xlabel_rot (int, optional) – Rotation angle for x-axis labels. Default is 0.

                • -
                • show_plot (str, optional) – Specify the plot display mode: "individual", "grid", or "both". Default is "both".

                • -
                • rotate_plot (bool, optional) – Boolean, True if rotating (pivoting) the plots. Default is False.

                • -
                • individual_figsize (tuple or list, optional) – Width and height of the figure for individual plots. Default is (6, 4).

                • -
                • grid_figsize (tuple or list, optional) – Width and height of the figure for grid plots.

                • -
                • label_fontsize (int, optional) – Font size for axis labels. Default is 12.

                • -
                • tick_fontsize (int, optional) – Font size for axis tick labels. Default is 10.

                • -
                • text_wrap (int, optional) – The maximum width of the title text before wrapping. Default is 50.

                • -
                • xlim (tuple or list, optional) – Limits for the x-axis as a tuple or list of (min, max).

                • -
                • ylim (tuple or list, optional) – Limits for the y-axis as a tuple or list of (min, max).

                • -
                • label_names (dict, optional) – Dictionary mapping original column names to custom labels. Default is None.

                • +
                • metrics_list (list of str) – List of metric names (columns in df) to plot.

                • +
                • metrics_comp (list of str) – List of comparison categories (columns in df).

                • +
                • n_rows (int, optional) – Number of rows in the subplot grid. Calculated automatically if not provided.

                • +
                • n_cols (int, optional) – Number of columns in the subplot grid. Calculated automatically if not provided.

                • +
                • image_path_png (str, optional) – Optional directory path to save .png images.

                • +
                • image_path_svg (str, optional) – Optional directory path to save .svg images.

                • +
                • save_plots (str, optional) – String, "all", "individual", or "grid" to control saving plots.

                • +
                • show_legend (bool, optional) – Boolean, True if showing the legend in the plots. Default is True.

                • +
                • plot_type (str, optional) – Specify the type of plot, either "boxplot" or "violinplot". Default is "boxplot".

                • +
                • xlabel_rot (int, optional) – Rotation angle for x-axis labels. Default is 0.

                • +
                • show_plot (str, optional) – Specify the plot display mode: "individual", "grid", or "both". Default is "both".

                • +
                • rotate_plot (bool, optional) – Boolean, True if rotating (pivoting) the plots. Default is False.

                • +
                • individual_figsize (tuple or list, optional) – Width and height of the figure for individual plots. Default is (6, 4).

                • +
                • grid_figsize (tuple or list, optional) – Width and height of the figure for grid plots.

                • +
                • label_fontsize (int, optional) – Font size for axis labels. Default is 12.

                • +
                • tick_fontsize (int, optional) – Font size for axis tick labels. Default is 10.

                • +
                • text_wrap (int, optional) – The maximum width of the title text before wrapping. Default is 50.

                • +
                • xlim (tuple or list, optional) – Limits for the x-axis as a tuple or list of (min, max).

                • +
                • ylim (tuple or list, optional) – Limits for the y-axis as a tuple or list of (min, max).

                • +
                • label_names (dict, optional) – Dictionary mapping original column names to custom labels. Default is None.

                • kwargs (additional keyword arguments) – Additional keyword arguments passed to the Seaborn plotting function.

                Raises:
                -

                ValueError

                  +

                  ValueError

                  • If show_plot is not one of "individual", "grid", or "both".

                  • If save_plots is not one of None, "all", "individual", or "grid".

                  • If save_plots is set without specifying image_path_png or image_path_svg.

                  • @@ -1344,42 +1344,42 @@

                    Scatter Fit Plot

                    Parameters:
                    • df (pandas.DataFrame) – The DataFrame containing the data.

                    • -
                    • x_vars (list of str, optional) – List of variable names to plot on the x-axis.

                    • -
                    • y_vars (list of str, optional) – List of variable names to plot on the y-axis.

                    • -
                    • n_rows (int, optional) – Number of rows in the subplot grid. Calculated based on the number of plots and n_cols if not specified.

                    • -
                    • n_cols (int, optional) – Number of columns in the subplot grid. Calculated based on the number of plots and max_cols if not specified.

                    • -
                    • max_cols (int, optional) – Maximum number of columns in the subplot grid. Default is 4.

                    • -
                    • image_path_png (str, optional) – Directory path to save PNG images of the scatter plots.

                    • -
                    • image_path_svg (str, optional) – Directory path to save SVG images of the scatter plots.

                    • -
                    • save_plots (str, optional) – Controls which plots to save: "all", "individual", or "grid". If None, plots will not be saved.

                    • -
                    • show_legend (bool, optional) – Whether to display the legend on the plots. Default is True.

                    • -
                    • xlabel_rot (int, optional) – Rotation angle for x-axis labels. Default is 0.

                    • -
                    • show_plot (str, optional) – Controls plot display: "individual", "grid", or "both". Default is "both".

                    • -
                    • rotate_plot (bool, optional) – Whether to rotate (pivot) the plots. Default is False.

                    • -
                    • individual_figsize (tuple or list, optional) – Width and height of the figure for individual plots. Default is (6, 4).

                    • -
                    • grid_figsize (tuple or list, optional) – Width and height of the figure for grid plots. Calculated based on the number of rows and columns if not specified.

                    • -
                    • label_fontsize (int, optional) – Font size for axis labels. Default is 12.

                    • -
                    • tick_fontsize (int, optional) – Font size for axis tick labels. Default is 10.

                    • -
                    • text_wrap (int, optional) – The maximum width of the title text before wrapping. Default is 50.

                    • -
                    • add_best_fit_line (bool, optional) – Whether to add a best fit line to the scatter plots. Default is False.

                    • -
                    • scatter_color (str, optional) – Color code for the scattered points. Default is "C0".

                    • -
                    • best_fit_linecolor (str, optional) – Color code for the best fit line. Default is "red".

                    • -
                    • best_fit_linestyle (str, optional) – Linestyle for the best fit line. Default is "-".

                    • -
                    • hue (str, optional) – Column name for the grouping variable that will produce points with different colors.

                    • -
                    • hue_palette (dict, list, or str, optional) – Specifies colors for each hue level. Can be a dictionary mapping hue levels to colors, a list of colors, or the name of a seaborn color palette. This parameter requires the hue parameter to be set.

                    • -
                    • size (str, optional) – Column name for the grouping variable that will produce points with different sizes.

                    • -
                    • sizes (dict, optional) – Dictionary mapping sizes (smallest and largest) to min and max values.

                    • -
                    • marker (str, optional) – Marker style used for the scatter points. Default is "o".

                    • -
                    • show_correlation (bool, optional) – Whether to display the Pearson correlation coefficient in the plot title. Default is True.

                    • -
                    • xlim (tuple or list, optional) – Limits for the x-axis as a tuple or list of (min, max).

                    • -
                    • ylim (tuple or list, optional) – Limits for the y-axis as a tuple or list of (min, max).

                    • -
                    • all_vars (list of str, optional) – If provided, automatically generates scatter plots for all combinations of variables in this list, overriding x_vars and y_vars.

                    • -
                    • label_names (dict, optional) – A dictionary to rename columns for display in the plot titles and labels.

                    • -
                    • kwargs (dict, optional) – Additional keyword arguments to pass to sns.scatterplot.

                    • +
                    • x_vars (list of str, optional) – List of variable names to plot on the x-axis.

                    • +
                    • y_vars (list of str, optional) – List of variable names to plot on the y-axis.

                    • +
                    • n_rows (int, optional) – Number of rows in the subplot grid. Calculated based on the number of plots and n_cols if not specified.

                    • +
                    • n_cols (int, optional) – Number of columns in the subplot grid. Calculated based on the number of plots and max_cols if not specified.

                    • +
                    • max_cols (int, optional) – Maximum number of columns in the subplot grid. Default is 4.

                    • +
                    • image_path_png (str, optional) – Directory path to save PNG images of the scatter plots.

                    • +
                    • image_path_svg (str, optional) – Directory path to save SVG images of the scatter plots.

                    • +
                    • save_plots (str, optional) – Controls which plots to save: "all", "individual", or "grid". If None, plots will not be saved.

                    • +
                    • show_legend (bool, optional) – Whether to display the legend on the plots. Default is True.

                    • +
                    • xlabel_rot (int, optional) – Rotation angle for x-axis labels. Default is 0.

                    • +
                    • show_plot (str, optional) – Controls plot display: "individual", "grid", or "both". Default is "both".

                    • +
                    • rotate_plot (bool, optional) – Whether to rotate (pivot) the plots. Default is False.

                    • +
                    • individual_figsize (tuple or list, optional) – Width and height of the figure for individual plots. Default is (6, 4).

                    • +
                    • grid_figsize (tuple or list, optional) – Width and height of the figure for grid plots. Calculated based on the number of rows and columns if not specified.

                    • +
                    • label_fontsize (int, optional) – Font size for axis labels. Default is 12.

                    • +
                    • tick_fontsize (int, optional) – Font size for axis tick labels. Default is 10.

                    • +
                    • text_wrap (int, optional) – The maximum width of the title text before wrapping. Default is 50.

                    • +
                    • add_best_fit_line (bool, optional) – Whether to add a best fit line to the scatter plots. Default is False.

                    • +
                    • scatter_color (str, optional) – Color code for the scattered points. Default is "C0".

                    • +
                    • best_fit_linecolor (str, optional) – Color code for the best fit line. Default is "red".

                    • +
                    • best_fit_linestyle (str, optional) – Linestyle for the best fit line. Default is "-".

                    • +
                    • hue (str, optional) – Column name for the grouping variable that will produce points with different colors.

                    • +
                    • hue_palette (dict, list, or str, optional) – Specifies colors for each hue level. Can be a dictionary mapping hue levels to colors, a list of colors, or the name of a seaborn color palette. This parameter requires the hue parameter to be set.

                    • +
                    • size (str, optional) – Column name for the grouping variable that will produce points with different sizes.

                    • +
                    • sizes (dict, optional) – Dictionary mapping sizes (smallest and largest) to min and max values.

                    • +
                    • marker (str, optional) – Marker style used for the scatter points. Default is "o".

                    • +
                    • show_correlation (bool, optional) – Whether to display the Pearson correlation coefficient in the plot title. Default is True.

                    • +
                    • xlim (tuple or list, optional) – Limits for the x-axis as a tuple or list of (min, max).

                    • +
                    • ylim (tuple or list, optional) – Limits for the y-axis as a tuple or list of (min, max).

                    • +
                    • all_vars (list of str, optional) – If provided, automatically generates scatter plots for all combinations of variables in this list, overriding x_vars and y_vars.

                    • +
                    • label_names (dict, optional) – A dictionary to rename columns for display in the plot titles and labels.

                    • +
                    • kwargs (dict, optional) – Additional keyword arguments to pass to sns.scatterplot.

                    Raises:
                    -

                    ValueError

                      +

                      ValueError

                      • If all_vars is provided and either x_vars or y_vars is also provided.

                      • If neither all_vars nor both x_vars and y_vars are provided.

                      • If hue_palette is specified without hue.
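The interaction between ``all_vars``, ``x_vars``, ``y_vars``, and the grid parameters of ``scatter_fit_plot`` can be sketched with the standard library alone. Both functions below are hypothetical illustrations, not the library's code. They assume that ``all_vars`` yields unordered pairs, that ``x_vars`` and ``y_vars`` are paired as a Cartesian product, and that the grid uses at most ``max_cols`` columns with just enough rows to hold every plot.

```python
import math
from itertools import combinations


def resolve_pairs(x_vars=None, y_vars=None, all_vars=None, hue=None, hue_palette=None):
    """Hypothetical helper: apply the documented checks, then list (x, y) pairs."""
    # all_vars is mutually exclusive with x_vars / y_vars
    if all_vars is not None and (x_vars is not None or y_vars is not None):
        raise ValueError("Provide either all_vars or x_vars and y_vars, not both")

    # without all_vars, both x_vars and y_vars are required
    if all_vars is None and (x_vars is None or y_vars is None):
        raise ValueError("Provide all_vars, or both x_vars and y_vars")

    # hue_palette is documented to require hue
    if hue_palette is not None and hue is None:
        raise ValueError("hue_palette requires hue to be set")

    if all_vars is not None:
        # assumption: every unordered pair of variables is plotted once
        return list(combinations(all_vars, 2))

    # assumption: each x variable is paired with each y variable
    return [(x, y) for x in x_vars for y in y_vars]


def grid_shape(n_plots, n_rows=None, n_cols=None, max_cols=4):
    """Hypothetical helper: derive (n_rows, n_cols) when they are not given."""
    if n_cols is None:
        # never exceed max_cols, and keep at least one column
        n_cols = max(1, min(n_plots, max_cols))
    if n_rows is None:
        # enough rows to hold every plot
        n_rows = math.ceil(n_plots / n_cols)
    return n_rows, n_cols


pairs = resolve_pairs(all_vars=["age", "income", "hours"])
rows, cols = grid_shape(len(pairs))
```

With three variables this produces three pairs, which fit on a single row under the assumed layout rule.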

                      • @@ -1547,30 +1547,30 @@

                        Correlation Matrices

                        Parameters:
                        • df (pandas.DataFrame) – The DataFrame containing the data.

                        • -
                        • cols (list of str, optional) – List of column names to include in the correlation matrix. If None, all columns are included.

                        • -
                        • annot (bool, optional) – Whether to annotate the heatmap with correlation coefficients. Default is True.

                        • -
                        • cmap (str, optional) – The colormap to use for the heatmap. Default is "coolwarm".

                        • -
                        • save_plots (bool, optional) – Controls whether to save the plots. Default is False.

                        • -
                        • image_path_png (str, optional) – Directory path to save PNG images of the heatmap.

                        • -
                        • image_path_svg (str, optional) – Directory path to save SVG images of the heatmap.

                        • -
                        • figsize (tuple, optional) – Width and height of the figure for the heatmap. Default is (10, 10).

                        • -
                        • title (str, optional) – Title of the heatmap. Default is "Cervical Cancer Data: Correlation Matrix".

                        • -
                        • label_fontsize (int, optional) – Font size for tick labels and colorbar label. Default is 12.

                        • -
                        • tick_fontsize (int, optional) – Font size for axis tick labels. Default is 10.

                        • -
                        • xlabel_rot (int, optional) – Rotation angle for x-axis labels. Default is 45.

                        • -
                        • ylabel_rot (int, optional) – Rotation angle for y-axis labels. Default is 0.

                        • -
                        • xlabel_alignment (str, optional) – Horizontal alignment for x-axis labels. Default is "right".

                        • -
                        • ylabel_alignment (str, optional) – Vertical alignment for y-axis labels. Default is "center_baseline".

                        • -
                        • text_wrap (int, optional) – The maximum width of the title text before wrapping. Default is 50.

                        • -
                        • vmin (float, optional) – Minimum value for the heatmap color scale. Default is -1.

                        • -
                        • vmax (float, optional) – Maximum value for the heatmap color scale. Default is 1.

                        • -
                        • cbar_label (str, optional) – Label for the colorbar. Default is "Correlation Index".

                        • -
                        • triangular (bool, optional) – Whether to show only the upper triangle of the correlation matrix. Default is True.

                        • -
                        • kwargs (dict, optional) – Additional keyword arguments to pass to seaborn.heatmap().

                        • +
                        • cols (list of str, optional) – List of column names to include in the correlation matrix. If None, all columns are included.

                        • +
                        • annot (bool, optional) – Whether to annotate the heatmap with correlation coefficients. Default is True.

                        • +
                        • cmap (str, optional) – The colormap to use for the heatmap. Default is "coolwarm".

                        • +
                        • save_plots (bool, optional) – Controls whether to save the plots. Default is False.

                        • +
                        • image_path_png (str, optional) – Directory path to save PNG images of the heatmap.

                        • +
                        • image_path_svg (str, optional) – Directory path to save SVG images of the heatmap.

                        • +
                        • figsize (tuple, optional) – Width and height of the figure for the heatmap. Default is (10, 10).

                        • +
                        • title (str, optional) – Title of the heatmap. Default is "Cervical Cancer Data: Correlation Matrix".

                        • +
                        • label_fontsize (int, optional) – Font size for tick labels and colorbar label. Default is 12.

                        • +
                        • tick_fontsize (int, optional) – Font size for axis tick labels. Default is 10.

                        • +
                        • xlabel_rot (int, optional) – Rotation angle for x-axis labels. Default is 45.

                        • +
                        • ylabel_rot (int, optional) – Rotation angle for y-axis labels. Default is 0.

                        • +
                        • xlabel_alignment (str, optional) – Horizontal alignment for x-axis labels. Default is "right".

                        • +
                        • ylabel_alignment (str, optional) – Vertical alignment for y-axis labels. Default is "center_baseline".

                        • +
                        • text_wrap (int, optional) – The maximum width of the title text before wrapping. Default is 50.

                        • +
                        • vmin (float, optional) – Minimum value for the heatmap color scale. Default is -1.

                        • +
                        • vmax (float, optional) – Maximum value for the heatmap color scale. Default is 1.

                        • +
                        • cbar_label (str, optional) – Label for the colorbar. Default is "Correlation Index".

                        • +
                        • triangular (bool, optional) – Whether to show only the upper triangle of the correlation matrix. Default is True.

                        • +
                        • kwargs (dict, optional) – Additional keyword arguments to pass to seaborn.heatmap().

                        Raises:
                        -

                        ValueError

                          +

                          ValueError

                          • If annot is not a boolean.

                          • If cols is not a list.

                          • If save_plots is not a boolean.
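The three type checks listed above for ``flex_corr_matrix`` can be sketched as follows. ``check_corr_matrix_args`` is a hypothetical helper used only for illustration. It raises ``ValueError`` on a type mismatch because that is the exception the documentation names, and it assumes ``cols=None`` is accepted, since the parameter description says all columns are included in that case.

```python
def check_corr_matrix_args(cols=None, annot=True, save_plots=False):
    """Hypothetical helper mirroring the documented type checks."""
    # annot must be a real bool; integers such as 1 are rejected
    if not isinstance(annot, bool):
        raise ValueError("annot must be a boolean")

    # cols may be None (all columns) or a list of column names
    if cols is not None and not isinstance(cols, list):
        raise ValueError("cols must be a list of column names")

    # save_plots must be a real bool
    if not isinstance(save_plots, bool):
        raise ValueError("save_plots must be a boolean")
```

Note that ``isinstance(1, bool)`` is ``False`` in Python, so passing ``annot=1`` fails this check even though ``1`` is truthy.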

                          • @@ -1691,24 +1691,24 @@

                            2D Partial Dependence Plots
                            • model (estimator object) – The trained machine learning model used to generate partial dependence plots.

                            • X_train (pandas.DataFrame or numpy.ndarray) – The training data used to compute partial dependence. Should correspond to the features used to train the model.

                            • -
                            • feature_names (list of str) – A list of feature names corresponding to the columns in X_train.

                            • -
                            • features (list of int or tuple of int) – A list of feature indices or tuples of feature indices for which to generate partial dependence plots.

                            • -
                            • title (str, optional) – The title for the entire plot. Default is "PDP of house value on CA non-location features".

                            • -
                            • grid_resolution (int, optional) – The number of grid points to use for plotting the partial dependence. Higher values provide smoother curves but may increase computation time. Default is 50.

                            • -
                            • plot_type (str, optional) – The type of plot to generate. Choose "grid" for a grid layout, "individual" for separate plots, or "both" to generate both layouts. Default is "grid".

                            • -
                            • grid_figsize (tuple, optional) – Tuple specifying the width and height of the figure for the grid layout. Default is (12, 8).

                            • -
                            • individual_figsize (tuple, optional) – Tuple specifying the width and height of the figure for individual plots. Default is (6, 4).

                            • -
                            • label_fontsize (int, optional) – Font size for the axis labels and titles. Default is 12.

                            • -
                            • tick_fontsize (int, optional) – Font size for the axis tick labels. Default is 10.

                            • -
                            • text_wrap (int, optional) – The maximum width of the title text before wrapping. Useful for managing long titles. Default is 50.

                            • -
                            • image_path_png (str, optional) – The directory path where PNG images of the plots will be saved, if saving is enabled.

                            • -
                            • image_path_svg (str, optional) – The directory path where SVG images of the plots will be saved, if saving is enabled.

                            • -
                            • save_plots (str, optional) – Controls whether to save the plots. Options include "all", "individual", "grid", or None (default). If saving is enabled, ensure that image_path_png or image_path_svg is provided.

                            • -
                            • file_prefix (str, optional) – Prefix for the filenames of the saved grid plots. Default is "partial_dependence".

                            • +
                            • feature_names (list of str) – A list of feature names corresponding to the columns in X_train.

                            • +
                            • features (list of int or tuple of int) – A list of feature indices or tuples of feature indices for which to generate partial dependence plots.

                            • +
                            • title (str, optional) – The title for the entire plot. Default is "PDP of house value on CA non-location features".

                            • +
                            • grid_resolution (int, optional) – The number of grid points to use for plotting the partial dependence. Higher values provide smoother curves but may increase computation time. Default is 50.

                            • +
                            • plot_type (str, optional) – The type of plot to generate. Choose "grid" for a grid layout, "individual" for separate plots, or "both" to generate both layouts. Default is "grid".

                            • +
                            • grid_figsize (tuple, optional) – Tuple specifying the width and height of the figure for the grid layout. Default is (12, 8).

                            • +
                            • individual_figsize (tuple, optional) – Tuple specifying the width and height of the figure for individual plots. Default is (6, 4).

                            • +
                            • label_fontsize (int, optional) – Font size for the axis labels and titles. Default is 12.

                            • +
                            • tick_fontsize (int, optional) – Font size for the axis tick labels. Default is 10.

                            • +
                            • text_wrap (int, optional) – The maximum width of the title text before wrapping. Useful for managing long titles. Default is 50.

                            • +
                            • image_path_png (str, optional) – The directory path where PNG images of the plots will be saved, if saving is enabled.

                            • +
                            • image_path_svg (str, optional) – The directory path where SVG images of the plots will be saved, if saving is enabled.

                            • +
                            • save_plots (str, optional) – Controls whether to save the plots. Options include "all", "individual", "grid", or None (default). If saving is enabled, ensure that image_path_png or image_path_svg is provided.

                            • +
                            • file_prefix (str, optional) – Prefix for the filenames of the saved grid plots. Default is "partial_dependence".

                          Raises:
                          -

                          ValueError

                            +

                            ValueError

                            • If plot_type is not one of "grid", "individual", or "both".

                            • If save_plots is enabled but neither image_path_png nor image_path_svg is provided.
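The two error conditions listed above can be sketched as a standalone check. This is only an illustration of the documented behaviour, not the library's actual code: the helper name `validate_pdp_args` is hypothetical, and it assumes that "saving is enabled" means `save_plots` is anything other than `None`.

```python
def validate_pdp_args(plot_type="grid", save_plots=None,
                      image_path_png=None, image_path_svg=None):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    valid_plot_types = ("grid", "individual", "both")
    if plot_type not in valid_plot_types:
        raise ValueError(
            f"plot_type must be one of {valid_plot_types}, got {plot_type!r}"
        )
    # Assumption: any non-None save_plots value counts as "saving enabled".
    if save_plots is not None and not (image_path_png or image_path_svg):
        raise ValueError(
            "save_plots is enabled; provide image_path_png or image_path_svg"
        )
    return True
```

Calling `validate_pdp_args(save_plots="grid")` without either image path raises `ValueError`, which matches the second bullet above.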

                            @@ -1825,47 +1825,47 @@

                            3D Partial Dependence Plots
                            • model (estimator object) – The trained machine learning model used to generate partial dependence plots.

                            • dataframe (pandas.DataFrame or numpy.ndarray) – The dataset on which the model was trained or a representative sample. If a DataFrame is provided, feature_names_list should correspond to the column names. If a NumPy array is provided, feature_names_list should correspond to the indices of the columns.

                            • -
                            • feature_names_list (list of str) – A list of two feature names or indices corresponding to the features for which partial dependence plots are generated.

                            • -
                            • x_label (str, optional) – Label for the x-axis in the plots. Default is None.

                            • -
                            • y_label (str, optional) – Label for the y-axis in the plots. Default is None.

                            • -
                            • z_label (str, optional) – Label for the z-axis in the plots. Default is None.

                            • -
                            • title (str) – The title for the plots.

                            • -
                            • html_file_path (str, optional) – Path to save the interactive Plotly HTML file. Required if plot_type is "interactive" or "both". Default is None.

                            • -
                            • html_file_name (str, optional) – Name of the HTML file to save the interactive Plotly plot. Required if plot_type is "interactive" or "both". Default is None.

                            • -
                            • image_filename (str, optional) – Base filename for saving static Matplotlib plots as PNG and/or SVG. Default is None.

                            • -
                            • plot_type (str, optional) – The type of plots to generate. Options are:

                            • +
                            • feature_names_list (list of str) – A list of two feature names or indices corresponding to the features for which partial dependence plots are generated.

                            • +
                            • x_label (str, optional) – Label for the x-axis in the plots. Default is None.

                            • +
                            • y_label (str, optional) – Label for the y-axis in the plots. Default is None.

                            • +
                            • z_label (str, optional) – Label for the z-axis in the plots. Default is None.

                            • +
                            • title (str) – The title for the plots.

                            • +
                            • html_file_path (str, optional) – Path to save the interactive Plotly HTML file. Required if plot_type is "interactive" or "both". Default is None.

                            • +
                            • html_file_name (str, optional) – Name of the HTML file to save the interactive Plotly plot. Required if plot_type is "interactive" or "both". Default is None.

                            • +
                            • image_filename (str, optional) – Base filename for saving static Matplotlib plots as PNG and/or SVG. Default is None.

                            • +
                            • plot_type (str, optional) – The type of plots to generate. Options are "static" (generate only static Matplotlib plots), "interactive" (generate only interactive Plotly plots), or "both" (generate both static and interactive plots). Default is "both".

                            • matplotlib_colormap (matplotlib.colors.Colormap, optional) – Custom colormap for the Matplotlib plot. If not provided, a default colormap is used.

                            • -
                            • plotly_colormap (str, optional) – Colormap for the Plotly plot. Default is "Viridis".

                            • -
                            • zoom_out_factor (float, optional) – Factor to adjust the zoom level of the Plotly plot. Default is None.

                            • -
                            • wireframe_color (str, optional) – Color for the wireframe in the Matplotlib plot. If None, no wireframe is plotted. Default is None.

                            • -
                            • view_angle (tuple, optional) – Elevation and azimuthal angles for the Matplotlib plot view. Default is (22, 70).

                            • -
                            • figsize (tuple, optional) – Figure size for the Matplotlib plot. Default is (7, 4.5).

                            • -
                            • text_wrap (int, optional) – Maximum width of the title text before wrapping. Useful for managing long titles. Default is 50.

                            • -
                            • horizontal (float, optional) – Horizontal camera position for the Plotly plot. Default is -1.25.

                            • -
                            • depth (float, optional) – Depth camera position for the Plotly plot. Default is 1.25.

                            • -
                            • vertical (float, optional) – Vertical camera position for the Plotly plot. Default is 1.25.

                            • -
                            • cbar_x (float, optional) – Position of the color bar along the x-axis in the Plotly plot. Default is 1.05.

                            • -
                            • cbar_thickness (int, optional) – Thickness of the color bar in the Plotly plot. Default is 25.

                            • -
                            • title_x (float, optional) – Horizontal position of the title in the Plotly plot. Default is 0.5.

                            • -
                            • title_y (float, optional) – Vertical position of the title in the Plotly plot. Default is 0.95.

                            • -
                            • top_margin (int, optional) – Top margin for the Plotly plot layout. Default is 100.

                            • -
                            • image_path_png (str, optional) – Directory path to save the PNG file of the Matplotlib plot. Default is None.

                            • -
                            • image_path_svg (str, optional) – Directory path to save the SVG file of the Matplotlib plot. Default is None.

                            • -
                            • show_cbar (bool, optional) – Whether to display the color bar in the Matplotlib plot. Default is True.

                            • -
                            • grid_resolution (int, optional) – The resolution of the grid for computing partial dependence. Default is 20.

                            • -
                            • left_margin (int, optional) – Left margin for the Plotly plot layout. Default is 20.

                            • -
                            • right_margin (int, optional) – Right margin for the Plotly plot layout. Default is 65.

                            • -
                            • label_fontsize (int, optional) – Font size for axis labels in the Matplotlib plot. Default is 8.

                            • -
                            • tick_fontsize (int, optional) – Font size for tick labels in the Matplotlib plot. Default is 6.

                            • -
                            • enable_zoom (bool, optional) – Whether to enable zooming in the Plotly plot. Default is True.

                            • -
                            • show_modebar (bool, optional) – Whether to display the mode bar in the Plotly plot. Default is True.

                            • +
                            • plotly_colormap (str, optional) – Colormap for the Plotly plot. Default is "Viridis".

                            • +
                            • zoom_out_factor (float, optional) – Factor to adjust the zoom level of the Plotly plot. Default is None.

                            • +
                            • wireframe_color (str, optional) – Color for the wireframe in the Matplotlib plot. If None, no wireframe is plotted. Default is None.

                            • +
                            • view_angle (tuple, optional) – Elevation and azimuthal angles for the Matplotlib plot view. Default is (22, 70).

                            • +
                            • figsize (tuple, optional) – Figure size for the Matplotlib plot. Default is (7, 4.5).

                            • +
                            • text_wrap (int, optional) – Maximum width of the title text before wrapping. Useful for managing long titles. Default is 50.

                            • +
                            • horizontal (float, optional) – Horizontal camera position for the Plotly plot. Default is -1.25.

                            • +
                            • depth (float, optional) – Depth camera position for the Plotly plot. Default is 1.25.

                            • +
                            • vertical (float, optional) – Vertical camera position for the Plotly plot. Default is 1.25.

                            • +
                            • cbar_x (float, optional) – Position of the color bar along the x-axis in the Plotly plot. Default is 1.05.

                            • +
                            • cbar_thickness (int, optional) – Thickness of the color bar in the Plotly plot. Default is 25.

                            • +
                            • title_x (float, optional) – Horizontal position of the title in the Plotly plot. Default is 0.5.

                            • +
                            • title_y (float, optional) – Vertical position of the title in the Plotly plot. Default is 0.95.

                            • +
                            • top_margin (int, optional) – Top margin for the Plotly plot layout. Default is 100.

                            • +
                            • image_path_png (str, optional) – Directory path to save the PNG file of the Matplotlib plot. Default is None.

                            • +
                            • image_path_svg (str, optional) – Directory path to save the SVG file of the Matplotlib plot. Default is None.

                            • +
                            • show_cbar (bool, optional) – Whether to display the color bar in the Matplotlib plot. Default is True.

                            • +
                            • grid_resolution (int, optional) – The resolution of the grid for computing partial dependence. Default is 20.

                            • +
                            • left_margin (int, optional) – Left margin for the Plotly plot layout. Default is 20.

                            • +
                            • right_margin (int, optional) – Right margin for the Plotly plot layout. Default is 65.

                            • +
                            • label_fontsize (int, optional) – Font size for axis labels in the Matplotlib plot. Default is 8.

                            • +
                            • tick_fontsize (int, optional) – Font size for tick labels in the Matplotlib plot. Default is 6.

                            • +
                            • enable_zoom (bool, optional) – Whether to enable zooming in the Plotly plot. Default is True.

                            • +
                            • show_modebar (bool, optional) – Whether to display the mode bar in the Plotly plot. Default is True.

                            Raises:
                            -

                            ValueError

                              +

                              ValueError

                              • If plot_type is not one of "static", "interactive", or "both".

                              • If plot_type is "interactive" or "both" and html_file_path or html_file_name is not provided.
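As a rough sketch of how the two conditions above interact with the HTML output parameters, the following standalone function validates the arguments and builds the output path. The name `resolve_html_output` is hypothetical, and joining the directory and file name with `os.path.join` is an assumption about how the two parameters combine; the library may do this differently.

```python
import os


def resolve_html_output(plot_type="both", html_file_path=None,
                        html_file_name=None):
    """Hypothetical helper: return the HTML output path, or None if static-only."""
    valid_plot_types = ("static", "interactive", "both")
    if plot_type not in valid_plot_types:
        raise ValueError(
            f"plot_type must be one of {valid_plot_types}, got {plot_type!r}"
        )
    if plot_type in ("interactive", "both"):
        # Both pieces are required whenever an interactive plot is produced.
        if not html_file_path or not html_file_name:
            raise ValueError(
                "html_file_path and html_file_name are required when "
                "plot_type is 'interactive' or 'both'"
            )
        # Assumption: the final location is directory + file name.
        return os.path.join(html_file_path, html_file_name)
    return None
```

Note that with the documented default `plot_type="both"`, omitting the HTML parameters triggers the error, so a static-only call needs `plot_type="static"` set explicitly.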

diff --git a/docs/v0.0.11/.doctrees/environment.pickle b/docs/v0.0.11/.doctrees/environment.pickle
index 38e645849..7347cafd3 100644
Binary files a/docs/v0.0.11/.doctrees/environment.pickle and b/docs/v0.0.11/.doctrees/environment.pickle differ
diff --git a/docs/v0.0.12/.doctrees/environment.pickle b/docs/v0.0.12/.doctrees/environment.pickle
index f244cc9da..720304bbc 100644
Binary files a/docs/v0.0.12/.doctrees/environment.pickle and b/docs/v0.0.12/.doctrees/environment.pickle differ
diff --git a/docs/v0.0.5/.buildinfo b/docs/v0.0.5/.buildinfo
index 353dc5664..56e615cbd 100644
--- a/docs/v0.0.5/.buildinfo
+++ b/docs/v0.0.5/.buildinfo
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 14d3b31e6ef370db026ab086dac9c520
+config: 2c4b89aaffb03de3043e77ead7021a5d
 tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/docs/v0.0.5/.buildinfo.bak b/docs/v0.0.5/.buildinfo.bak
index ff6ef39ed..353dc5664 100644
--- a/docs/v0.0.5/.buildinfo.bak
+++ b/docs/v0.0.5/.buildinfo.bak
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 42de9baa26c146a450cafdcd02e4c95f
+config: 14d3b31e6ef370db026ab086dac9c520
 tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/docs/v0.0.5/.doctrees/environment.pickle b/docs/v0.0.5/.doctrees/environment.pickle
index 7fb10cbe7..46e828b92 100644
Binary files a/docs/v0.0.5/.doctrees/environment.pickle and b/docs/v0.0.5/.doctrees/environment.pickle differ
diff --git a/docs/v0.0.6/.buildinfo b/docs/v0.0.6/.buildinfo
index cf5e3e8fe..ea039be3e 100644
--- a/docs/v0.0.6/.buildinfo
+++ b/docs/v0.0.6/.buildinfo
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: d6a3131db8c8d02d091982941432fad9
+config: 2346f42102dd7aee5865ae04881f86e0
 tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/docs/v0.0.6/.buildinfo.bak b/docs/v0.0.6/.buildinfo.bak
index 1a500f15c..cf5e3e8fe 100644
--- a/docs/v0.0.6/.buildinfo.bak
+++ b/docs/v0.0.6/.buildinfo.bak
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 5e377420975cef2c832be9a69a8a7b5e
+config: d6a3131db8c8d02d091982941432fad9
 tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/docs/v0.0.6/.doctrees/environment.pickle b/docs/v0.0.6/.doctrees/environment.pickle
index cc6448e7c..fba467dce 100644
Binary files a/docs/v0.0.6/.doctrees/environment.pickle and b/docs/v0.0.6/.doctrees/environment.pickle differ
diff --git a/docs/v0.0.7/.buildinfo b/docs/v0.0.7/.buildinfo
index ca0720dd0..b9498ddbe 100644
--- a/docs/v0.0.7/.buildinfo
+++ b/docs/v0.0.7/.buildinfo
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: c22d1538c30028bdb9ac082dcebbb68f
+config: 912163ee02898a10ae5ecb04fe35fd30
 tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/docs/v0.0.7/.buildinfo.bak b/docs/v0.0.7/.buildinfo.bak
index 4e82a2271..ca0720dd0 100644
--- a/docs/v0.0.7/.buildinfo.bak
+++ b/docs/v0.0.7/.buildinfo.bak
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 9454070ee1599010ba800b72973e1365
+config: c22d1538c30028bdb9ac082dcebbb68f
 tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/docs/v0.0.7/.doctrees/environment.pickle b/docs/v0.0.7/.doctrees/environment.pickle
index abd845ef2..e6ac3af60 100644
Binary files a/docs/v0.0.7/.doctrees/environment.pickle and b/docs/v0.0.7/.doctrees/environment.pickle differ
diff --git a/docs/v0.0.8/.buildinfo b/docs/v0.0.8/.buildinfo
index e01e2c532..9740f9490 100644
--- a/docs/v0.0.8/.buildinfo
+++ b/docs/v0.0.8/.buildinfo
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 39b654b79ced0121f2e15ada0c422029
+config: 42f68d4f63cd7edcd4e14882dcd27b7c
 tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/docs/v0.0.8/.buildinfo.bak b/docs/v0.0.8/.buildinfo.bak
index ba4ff1fb7..e01e2c532 100644
--- a/docs/v0.0.8/.buildinfo.bak
+++ b/docs/v0.0.8/.buildinfo.bak
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 6bd52975582850e0487917125869e347
+config: 39b654b79ced0121f2e15ada0c422029
 tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/docs/v0.0.8/.doctrees/environment.pickle b/docs/v0.0.8/.doctrees/environment.pickle
index b6f64fa1c..010762aed 100644
Binary files a/docs/v0.0.8/.doctrees/environment.pickle and b/docs/v0.0.8/.doctrees/environment.pickle differ
diff --git a/docs/v0.0.9/.buildinfo b/docs/v0.0.9/.buildinfo
index 107c1d352..42de70526 100644
--- a/docs/v0.0.9/.buildinfo
+++ b/docs/v0.0.9/.buildinfo
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 616a87ce13b44e73dae606393e5ad6ba
+config: b2540de4381bcc93826364b882104ea5
 tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/docs/v0.0.9/.buildinfo.bak b/docs/v0.0.9/.buildinfo.bak
index 5ce48115b..107c1d352 100644
--- a/docs/v0.0.9/.buildinfo.bak
+++ b/docs/v0.0.9/.buildinfo.bak
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: ad776a938cae4ff505a099d2a0b3a002
+config: 616a87ce13b44e73dae606393e5ad6ba
 tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/docs/v0.0.9/.doctrees/environment.pickle b/docs/v0.0.9/.doctrees/environment.pickle
index e716eb90b..7092b5d90 100644
Binary files a/docs/v0.0.9/.doctrees/environment.pickle and b/docs/v0.0.9/.doctrees/environment.pickle differ
diff --git a/source/changelog.rst b/source/changelog.rst
index e2a4f46ee..81e95ca2a 100644
--- a/source/changelog.rst
+++ b/source/changelog.rst
@@ -24,6 +24,63 @@
 Changelog
 =========
 
+`Version 0.0.14`_
+----------------------
+
+.. _Version 0.0.14: https://lshpaner.github.io/eda_toolkit/v0.0.14/index.html
+
+Ensure Crosstabs Dictionary is Populated with ``return_dict=True``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This resolves the issue where the ``stacked_crosstab_plot`` function fails to
+populate and return the crosstabs dictionary (``crosstabs_dict``) when
+``return_dict=True`` and ``output="plots_only"``. The fix ensures that crosstabs
+are always generated when ``return_dict=True``, regardless of the output parameter.
+
+- Always Generate Crosstabs with ``return_dict=True``:
+
+  - Added logic to ensure crosstabs are created and populated in ``crosstabs_dict`` whenever ``return_dict=True``, even if the output parameter is set to ``"plots_only"``.
+
+- Separation of Crosstabs Display from Generation:
+
+  - The generation of crosstabs is now independent of the output parameter.
+  - Crosstabs display (``print``) occurs only when output includes ``"both"`` or ``"crosstabs_only"``.
+
+Enhancements and Fixes for ``scatter_fit_plot`` Function
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This addresses critical issues and introduces key enhancements for the ``scatter_fit_plot`` function.
+These changes aim to improve usability, flexibility, and robustness of the function.
+
+1. Added ``exclude_combinations`` Parameter. Users can now exclude specific variable pairs from being plotted by providing a list of tuples with the combinations to omit.
+
+2. Added ``combinations`` Parameter to ``show_plot``. Users can also now show just the list of combinations that are part of the selection process when ``all_vars=True``.
+
+3. When plotting a single variable pair with ``show_plot="both"``, the function threw an ``AttributeError``. Single-variable pairs are now properly handled.
+
+4. Changed the default value of ``show_plot`` to ``"both"`` to prevent excessive individual plots when handling large variable sets.
+
+5. Fixed Issues with Legend, ``xlim``, and ``ylim``; inputs were not being used; these have been corrected.
+
+
+Fix Default Title and Filename Handling in ``flex_corr_matrix``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This resolves issues in the ``flex_corr_matrix`` function where:
+
+1. No default title was provided when ``title=None``, resulting in missing titles on plots.
+2. Saved plot filenames were incorrect, leading to issues like ``.png.png`` when ``title`` was not provided.
+
+The fix ensures that a default title ("Correlation Matrix") is used for both plot display and file saving when no ``title``
+is explicitly provided. If ``title`` is explicitly set to ``None``, the plot will have no title,
+but the saved filename will still use ``"correlation_matrix"``.
+
+1. If no ``title`` is provided, ``"Correlation Matrix"`` is used as the default for filenames and displayed titles. If ``title=None`` is explicitly passed, no title is displayed on the plot.
+
+2. File names are generated based on the ``title`` or default to ``"correlation_matrix"`` if ``title`` is not provided. Spaces in the ``title`` are replaced with underscores, and special characters like ``:`` are removed to ensure valid filenames.
+
+
+
 `Version 0.0.13`_
 ----------------------
 
diff --git a/source/eda_plots.rst b/source/eda_plots.rst
index c5fa3af87..1e79ec811 100644
--- a/source/eda_plots.rst
+++ b/source/eda_plots.rst
@@ -2931,7 +2931,7 @@ These settings allow for the creation of scatter plots that comprehensively expl
-.. image:: ../assets/scatter_plots_all_grid.png
+.. image:: ../assets/scatter_plots_grid.png
    :alt: Scatter Plot Comparisons (Grouped2)
    :align: center
    :width: 900px
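The filename rule described in the ``flex_corr_matrix`` changelog entry (spaces become underscores, characters such as ``:`` are dropped, and ``"correlation_matrix"`` is the fallback) can be illustrated with a small standalone function. This is a sketch under stated assumptions, not the library's code: the name `corr_matrix_filename` is hypothetical, "special characters" is assumed to mean anything other than letters, digits, whitespace, underscores, and hyphens, and lowercasing is assumed so that the default title maps to ``"correlation_matrix"``.

```python
import re


def corr_matrix_filename(title=None):
    """Hypothetical helper: derive a base filename (no extension) from a title."""
    # Per the changelog, a missing or None title still saves under the default.
    if title is None:
        return "correlation_matrix"
    # Assumption: drop everything except word characters, whitespace, hyphens.
    cleaned = re.sub(r"[^\w\s-]", "", title).strip()
    # Replace runs of whitespace with a single underscore.
    # Assumption: the result is lowercased.
    return re.sub(r"\s+", "_", cleaned).lower()
```

Returning the base name without an extension leaves it to the caller to append ``.png`` or ``.svg`` exactly once, which is one way to avoid doubled suffixes like the ``.png.png`` case mentioned in the changelog.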