diff --git a/assets/scatter_plots_all_grid.png b/assets/scatter_plots_grid.png
similarity index 100%
rename from assets/scatter_plots_all_grid.png
rename to assets/scatter_plots_grid.png
diff --git a/docs/_images/scatter_plots_grid.png b/docs/_images/scatter_plots_grid.png
index 5a51facd8..78652ac74 100644
Binary files a/docs/_images/scatter_plots_grid.png and b/docs/_images/scatter_plots_grid.png differ
diff --git a/docs/_sources/changelog.rst.txt b/docs/_sources/changelog.rst.txt
index e2a4f46ee..81e95ca2a 100644
--- a/docs/_sources/changelog.rst.txt
+++ b/docs/_sources/changelog.rst.txt
@@ -24,6 +24,63 @@ Changelog
 =========

+`Version 0.0.14`_
+----------------------
+
+.. _Version 0.0.14: https://lshpaner.github.io/eda_toolkit/v0.0.14/index.html
+
+Ensure Crosstabs Dictionary is Populated with ``return_dict=True``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This resolves the issue where the ``stacked_crosstab_plot`` function fails to
+populate and return the crosstabs dictionary (``crosstabs_dict``) when
+``return_dict=True`` and ``output="plots_only"``. The fix ensures that crosstabs
+are always generated when ``return_dict=True``, regardless of the output parameter.
+
+- Always Generate Crosstabs with ``return_dict=True``:
+
+    - Added logic to ensure crosstabs are created and populated in ``crosstabs_dict`` whenever ``return_dict=True``, even if the output parameter is set to ``"plots_only"``.
+
+- Separation of Crosstabs Display from Generation:
+
+    - The generation of crosstabs is now independent of the output parameter.
+    - Crosstabs display (``print``) occurs only when output includes ``"both"`` or ``"crosstabs_only"``.
+
+Enhancements and Fixes for ``scatter_fit_plot`` Function
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This addresses critical issues and introduces key enhancements for the ``scatter_fit_plot`` function.
+These changes aim to improve usability, flexibility, and robustness of the function.
+
+1. Added ``exclude_combinations`` Parameter. Users can now exclude specific variable pairs from being plotted by providing a list of tuples with the combinations to omit.
+
+2. Added ``combinations`` Parameter to ``show_plot``. Users can also now show just the list of combinations that are part of the selection process when ``all_vars=True``.
+
+3. When plotting a single variable pair with ``show_plot="both"``, the function threw an ``AttributeError``. Single-variable pairs are now properly handled.
+
+4. Changed the default value of ``show_plot`` to ``"both"`` to prevent excessive individual plots when handling large variable sets.
+
+5. Fixed issues with the legend, ``xlim``, and ``ylim``: these inputs were previously ignored and are now applied.
+
+
+Fix Default Title and Filename Handling in ``flex_corr_matrix``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This resolves issues in the ``flex_corr_matrix`` function where:
+
+1. No default title was provided when ``title=None``, resulting in missing titles on plots.
+2. Saved plot filenames were incorrect, leading to issues like ``.png.png`` when ``title`` was not provided.
+
+The fix ensures that a default title ("Correlation Matrix") is used for both plot display and file saving when no ``title``
+is explicitly provided. If ``title`` is explicitly set to ``None``, the plot will have no title,
+but the saved filename will still use ``"correlation_matrix"``.
+
+1. If no ``title`` is provided, ``"Correlation Matrix"`` is used as the default for filenames and displayed titles. If ``title=None`` is explicitly passed, no title is displayed on the plot.
+
+2. File names are generated based on the ``title`` or default to ``"correlation_matrix"`` if ``title`` is not provided. Spaces in the ``title`` are replaced with underscores, and special characters like ``:`` are removed to ensure valid filenames.
+
+
+
 `Version 0.0.13`_
 ----------------------

diff --git a/docs/_sources/eda_plots.rst.txt b/docs/_sources/eda_plots.rst.txt
index c5fa3af87..1e79ec811 100644
--- a/docs/_sources/eda_plots.rst.txt
+++ b/docs/_sources/eda_plots.rst.txt
@@ -2931,7 +2931,7 @@ These settings allow for the creation of scatter plots that comprehensively expl
Ensure Crosstabs Dictionary is Populated with return_dict=True
This resolves the issue where the stacked_crosstab_plot function fails to populate and return the crosstabs dictionary (crosstabs_dict) when return_dict=True and output="plots_only". The fix ensures that crosstabs are always generated when return_dict=True, regardless of the output parameter.
- Always Generate Crosstabs with return_dict=True:
    - Added logic to ensure crosstabs are created and populated in crosstabs_dict whenever return_dict=True, even if the output parameter is set to "plots_only".
- Separation of Crosstabs Display from Generation:
    - The generation of crosstabs is now independent of the output parameter.
    - Crosstabs display (print) occurs only when output includes "both" or "crosstabs_only".
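The separation of generation from display described in this entry can be sketched roughly as follows. This is an illustration under stated assumptions, not the code of stacked_crosstab_plot: the helper name build_crosstabs, the list-of-dicts input, and the use of Counter in place of a real crosstab are all hypothetical.

```python
from collections import Counter

VALID_OUTPUTS = {"both", "plots_only", "crosstabs_only"}


def build_crosstabs(rows, col, func_cols, output="both", return_dict=False):
    """Hypothetical helper: count (col, func_col) value pairs per func_col."""
    if output not in VALID_OUTPUTS:
        raise ValueError(f"output must be one of {sorted(VALID_OUTPUTS)}")

    display = output in ("both", "crosstabs_only")
    crosstabs_dict = {}

    # Generation is triggered by return_dict (or by display),
    # so output="plots_only" no longer leaves the dictionary empty.
    if return_dict or display:
        for func_col in func_cols:
            counts = Counter((row[col], row[func_col]) for row in rows)
            crosstabs_dict[func_col] = dict(counts)

    # Display is the only step gated by the output parameter.
    if display:
        for name, table in crosstabs_dict.items():
            print(name, table)

    return crosstabs_dict if return_dict else None


rows = [
    {"age_group": "18-29", "sex": "F"},
    {"age_group": "18-29", "sex": "M"},
    {"age_group": "30-39", "sex": "F"},
]
result = build_crosstabs(
    rows, "age_group", ["sex"], output="plots_only", return_dict=True
)
```

With output="plots_only" nothing is printed, yet result still holds the counts for each requested column.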
Enhancements and Fixes for scatter_fit_plot Function
This addresses critical issues and introduces key enhancements for the scatter_fit_plot function. These changes aim to improve usability, flexibility, and robustness of the function.
1. Added exclude_combinations Parameter. Users can now exclude specific variable pairs from being plotted by providing a list of tuples with the combinations to omit.
2. Added combinations Parameter to show_plot. Users can also now show just the list of combinations that are part of the selection process when all_vars=True.
3. When plotting a single variable pair with show_plot="both", the function threw an AttributeError. Single-variable pairs are now properly handled.
4. Changed the default value of show_plot to "both" to prevent excessive individual plots when handling large variable sets.
5. Fixed issues with the legend, xlim, and ylim: these inputs were previously ignored and are now applied.
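One way to picture the exclude_combinations behavior is the sketch below. The helper select_pairs is hypothetical, and treating ("income", "age") as equivalent to ("age", "income") is an assumption made here; the library may match pairs differently.

```python
from itertools import combinations


def select_pairs(all_vars, exclude_combinations=None):
    """Hypothetical helper: all variable pairs minus the excluded ones."""
    # Assumption: pair order does not matter when matching exclusions.
    excluded = {frozenset(pair) for pair in (exclude_combinations or [])}
    return [
        pair
        for pair in combinations(all_vars, 2)
        if frozenset(pair) not in excluded
    ]


pairs = select_pairs(
    ["age", "income", "hours"],
    exclude_combinations=[("income", "age")],
)
```

Here pairs keeps ("age", "hours") and ("income", "hours") and drops the excluded age/income pair.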
Fix Default Title and Filename Handling in flex_corr_matrix
This resolves issues in the flex_corr_matrix function where:
1. No default title was provided when title=None, resulting in missing titles on plots.
2. Saved plot filenames were incorrect, leading to issues like .png.png when title was not provided.
The fix ensures that a default title (“Correlation Matrix”) is used for both plot display and file saving when no title is explicitly provided. If title is explicitly set to None, the plot will have no title, but the saved filename will still use "correlation_matrix".
1. If no title is provided, "Correlation Matrix" is used as the default for filenames and displayed titles. If title=None is explicitly passed, no title is displayed on the plot.
2. File names are generated based on the title or default to "correlation_matrix" if title is not provided. Spaces in the title are replaced with underscores, and special characters like : are removed to ensure valid filenames.
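The title and filename rules above could be sketched as below. The helper resolve_title_and_filename, the sentinel used to tell "not provided" apart from an explicit None, the exact set of stripped characters, and the lowercasing of every filename are assumptions for illustration; only the default title, the underscore replacement, and the removal of characters such as ":" come from the changelog.

```python
import re

_UNSET = object()  # hypothetical sentinel: "title was not passed at all"
DEFAULT_TITLE = "Correlation Matrix"


def resolve_title_and_filename(title=_UNSET):
    """Hypothetical helper returning (displayed title, base filename)."""
    # Not provided -> show the default; explicit None -> show no title.
    display_title = DEFAULT_TITLE if title is _UNSET else title

    # The filename always has a base, even when no title is displayed.
    base = display_title if display_title else DEFAULT_TITLE

    # Drop special characters such as ":" and turn spaces into underscores.
    cleaned = re.sub(r"[^\w\s-]", "", base)
    filename = re.sub(r"\s+", "_", cleaned.strip()).lower()
    return display_title, filename


default_case = resolve_title_and_filename()
no_title_case = resolve_title_and_filename(None)
custom_case = resolve_title_and_filename("Age: Income Matrix")
```

Appending the extension exactly once to the returned base name is what avoids results like .png.png.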
This version introduces a series of updates and fixes across multiple functions to enhance error handling, improve cross-environment compatibility, streamline usability, and optimize performance. These changes address critical issues, add new features, and ensure consistent behavior in both terminal and notebook environments.
@@ -827,8 +883,8 @@
Refined KDE Distributions
Key Changes
Enhanced KDE Distributions Function
Added Parameters
Contingency Table Updates
fillna('') added to output so that null values come through, removed 'All' column name from output, sort options 0 and 1, updated docstring documentation. Tested successfully on Python 3.7.3.
path (str) – The path to the directory that needs to be ensured.
None
@@ -231,10 +231,10 @@
df (pd.DataFrame) – The dataframe to add IDs to.
id_colname (str, optional) – The name of the new column for the IDs. Defaults to "ID".
num_digits (int, optional) – The number of digits for the unique IDs. Defaults to 9.
seed (int, optional) – The seed for the random number generator. Defaults to None.
set_as_index (bool, optional) – Whether to set the new ID column as the index. Defaults to False.
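As a rough illustration of how num_digits and seed might interact, the sketch below draws unique fixed-width IDs from a seeded generator. The helper generate_unique_ids is hypothetical, and the rule that IDs never start with a zero is an assumption, not something the documentation states.

```python
import random


def generate_unique_ids(n_rows, num_digits=9, seed=None):
    """Hypothetical helper: n_rows unique IDs, each num_digits long."""
    low = 10 ** (num_digits - 1)   # assumption: no leading zero
    high = 10 ** num_digits - 1
    if n_rows > high - low + 1:
        raise ValueError("num_digits is too small for the number of rows")

    rng = random.Random(seed)      # a local generator keeps the seed isolated
    seen = set()
    ids = []
    while len(ids) < n_rows:
        candidate = rng.randint(low, high)
        if candidate not in seen:  # redraw on collision to keep IDs unique
            seen.add(candidate)
            ids.append(str(candidate))
    return ids


ids = generate_unique_ids(5, num_digits=9, seed=42)
```

Passing the same seed again reproduces the same list, which is the practical reason for exposing seed at all.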
df (pd.DataFrame) – The DataFrame containing the column to be processed.
column_name (str) – The name of the column containing floats with potential trailing periods.
date_str (str) – A date string to be standardized.
A standardized date string in the format YYYY-MM-DD.
str
ValueError – If date_str is in an unrecognized format or if the function cannot parse the date.
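The contract described here (a date string in, a YYYY-MM-DD string out, ValueError otherwise) can be sketched as follows. The function name standardize_date and the list of accepted input formats are assumptions for illustration; the formats the library actually recognizes are not listed in this section.

```python
from datetime import datetime

# Assumed candidate formats, tried in order; day-first wins when ambiguous.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y")


def standardize_date(date_str):
    """Hypothetical helper: return date_str as YYYY-MM-DD or raise ValueError."""
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(date_str.strip(), fmt)
        except ValueError:
            continue  # this format did not match, try the next one
        return parsed.strftime("%Y-%m-%d")
    raise ValueError(f"Unrecognized date format: {date_str!r}")


day_first = standardize_date("25/12/2023")
month_first = standardize_date("12/25/2023")
```

Both calls resolve to the same date because "12/25/2023" cannot be read day-first and falls through to the month-first format.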
df (pandas.DataFrame) – The DataFrame to analyze.
background_color (str, optional) – Hex color code or color name for background styling in the output DataFrame. Defaults to None.
return_df (bool, optional) – If True, returns the plain DataFrame with the summary statistics. If False, returns a styled DataFrame for visual presentation. Defaults to False.
df (pandas.DataFrame) – The pandas DataFrame containing the data.
variables (list of str) – List of column names from the DataFrame to generate combinations.
data_path (str) – Path where the output Excel file will be saved.
data_name (str) – Name of the output Excel file.
min_length (int, optional) – Minimum size of the combinations to generate. Defaults to 2.
A tuple containing a dictionary of summary tables and a list of all generated combinations.
tuple(dict, list)
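The role of min_length can be illustrated with a short sketch that enumerates every combination of at least that size. The helper variable_combinations is hypothetical and covers only the enumeration step, not the summary tables or the Excel output.

```python
from itertools import combinations


def variable_combinations(variables, min_length=2):
    """Hypothetical helper: all combinations with at least min_length items."""
    return [
        combo
        for size in range(min_length, len(variables) + 1)
        for combo in combinations(variables, size)
    ]


combos = variable_combinations(["age", "sex", "race"], min_length=2)
```

For three variables and min_length=2 this yields the three pairs followed by the single triple.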
file_path (str) – Full path to the output Excel file.
df_dict (dict) – Dictionary where keys are sheet names and values are DataFrames to save.
decimal_places (int) – Number of decimal places to round numeric columns. Default is 0.
df (pandas.DataFrame) – The DataFrame to analyze.
cols (str or list of str, optional) – Name of the column (as a string) for a single column or list of column names for multiple columns. Must provide at least one column.
sort_by (int, optional) – Enter 0 to sort results by column groups; enter 1 to sort results by totals in descending order. Defaults to 0.
ValueError – If no columns are specified or if sort_by is not 0 or 1.
A DataFrame containing the contingency table with the specified columns, a 'Total' column representing the count of occurrences, and a 'Percentage' column representing the percentage of the total count.
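The shape of the returned table and the two sort_by modes can be sketched without pandas as below. The helper contingency_rows works on a list of dicts and returns a list of dicts, which is an assumption made to keep the example self-contained; rounding percentages to two decimals and replacing missing values with an empty string are also assumptions.

```python
from collections import Counter


def contingency_rows(rows, cols, sort_by=0):
    """Hypothetical helper: counts and percentages per group of cols."""
    if isinstance(cols, str):
        cols = [cols]
    if not cols:
        raise ValueError("At least one column must be specified.")
    if sort_by not in (0, 1):
        raise ValueError("sort_by must be 0 or 1.")

    # Missing values become "" so they still appear as their own group.
    counts = Counter(
        tuple("" if row.get(c) is None else row[c] for c in cols)
        for row in rows
    )
    total = sum(counts.values())
    if total == 0:
        return []

    if sort_by == 0:   # sort by the column groups themselves
        items = sorted(counts.items(), key=lambda kv: tuple(map(str, kv[0])))
    else:              # sort by totals, largest first
        items = sorted(counts.items(), key=lambda kv: -kv[1])

    return [
        {**dict(zip(cols, key)), "Total": n,
         "Percentage": round(100 * n / total, 2)}
        for key, n in items
    ]


data = [{"sex": "F"}, {"sex": "M"}, {"sex": "F"}, {"sex": None}]
by_group = contingency_rows(data, "sex", sort_by=0)
by_total = contingency_rows(data, "sex", sort_by=1)
```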
df (pandas.DataFrame) – The DataFrame to be styled.
columns (list of str) – List of column names to be highlighted.
color (str, optional) – The background color to be applied for highlighting (default is “yellow”).
df (pandas.DataFrame) – The DataFrame containing the data to plot.
vars_of_interest (list of str, optional) – List of column names for which to generate distribution plots. If ‘all’, plots will be generated for all numeric columns.
figsize (tuple of int, optional) – Size of each individual plot, default is (5, 5). Used when only one plot is being generated or when saving individual plots.
grid_figsize (tuple of int, optional) – Size of the overall grid of plots when multiple plots are generated in a grid. Ignored when only one plot is being generated or when saving individual plots. If not specified, it is calculated based on figsize, n_rows, and n_cols.
hist_color (str, optional) – Color of the histogram bars, default is '#0000FF'.
kde_color (str, optional) – Color of the KDE plot, default is '#FF0000'.
mean_color (str, optional) – Color of the mean line if plot_mean is True, default is '#000000'.
median_color (str, optional) – Color of the median line if plot_median is True, default is '#000000'.
hist_edgecolor (str, optional) – Color of the histogram bar edges, default is '#000000'.
hue (str, optional) – Column name to group data by, adding different colors for each group.
fill (bool, optional) – Whether to fill the histogram bars with color, default is True.
fill_alpha (float, optional) – Alpha transparency for the fill color of the histogram bars, where 0 is fully transparent and 1 is fully opaque. Default is 1.
n_rows (int, optional) – Number of rows in the subplot grid. If not provided, it will be calculated automatically.
n_cols (int, optional) – Number of columns in the subplot grid. If not provided, it will be calculated automatically.
w_pad (float, optional) – Width padding between subplots, default is 1.0.
h_pad (float, optional) – Height padding between subplots, default is 1.0.
image_path_png (str, optional) – Directory path to save the PNG image of the overall distribution plots.
image_path_svg (str, optional) – Directory path to save the SVG image of the overall distribution plots.
image_filename (str, optional) – Filename to use when saving the overall distribution plots.
bbox_inches (str, optional) – Bounding box to use when saving the figure. For example, 'tight'.
single_var_image_filename (str, optional) – Filename to use when saving the separate distribution plots. The variable name will be appended to this filename. This parameter uses figsize for determining the plot size, ignoring grid_figsize.
y_axis_label (str, optional) – The label to display on the y-axis, default is 'Density'.
plot_type (str, optional) – The type of plot to generate, options are 'hist', 'kde', or 'both'. Default is 'both'.
log_scale_vars (str or list of str, optional) – Variable name(s) to apply log scaling. Can be a single string or a list of strings.
bins (int or sequence, optional) – Specification of histogram bins, default is 'auto'.
binwidth (float, optional) – Width of each bin, overrides bins but can be used with binrange.
label_fontsize (int, optional) – Font size for axis labels, including xlabel, ylabel, and tick marks, default is 10.
tick_fontsize (int, optional) – Font size for tick labels on the axes, default is 10.
text_wrap (int, optional) – Maximum width of the title text before wrapping, default is 50.
disable_sci_notation (bool, optional) – Toggle to disable scientific notation on axes, default is False.
stat (str, optional) – Aggregate statistic to compute in each bin (e.g., 'count', 'frequency', 'probability', 'percent', 'density'), default is 'density'.
xlim (tuple or list, optional) – Limits for the x-axis as a tuple or list of (min, max).
ylim (tuple or list, optional) – Limits for the y-axis as a tuple or list of (min, max).
plot_mean (bool, optional) – Whether to plot the mean as a vertical line, default is False.
plot_median (bool, optional) – Whether to plot the median as a vertical line, default is False.
std_dev_levels (list of int, optional) – Levels of standard deviation to plot around the mean.
std_color (str or list of str, optional) – Color(s) for the standard deviation lines, default is '#808080'.
label_names (dict, optional) – Custom labels for the variables of interest. Keys should be column names, and values should be the corresponding labels to display.
show_legend (bool, optional) – Whether to show the legend on the plots, default is True.
kwargs (additional keyword arguments) – Additional keyword arguments passed to the Seaborn plotting function.
ValueError –
- If plot_type is not one of 'hist', 'kde', or 'both'.
- If stat is not one of 'count', 'density', 'frequency', 'probability', 'proportion', 'percent'.
- If log_scale_vars contains variables that are not present in the DataFrame.
UserWarning –
- If both bins and binwidth are specified, which may affect performance.
df (pandas.DataFrame) – The DataFrame containing the data to plot.
col (str) – The name of the column in the DataFrame to be analyzed.
func_col (list) – List of ground truth columns to be analyzed.
legend_labels_list (list) – List of legend labels for each ground truth column.
title (list) – List of titles for the plots.
kind (str, optional) – The kind of plot to generate ('bar' or 'barh' for horizontal bars), default is 'bar'.
width (float, optional) – The width of the bars in the bar plot, default is 0.9.
rot (int, optional) – The rotation angle of the x-axis labels, default is 0.
custom_order (list, optional) – Specifies a custom order for the categories in the col.
image_path_png (str, optional) – Directory path where generated PNG plot images will be saved.
image_path_svg (str, optional) – Directory path where generated SVG plot images will be saved.
save_formats (list, optional) – List of file formats to save the plot images in.
color (list, optional) – List of colors to use for the plots. If not provided, a default color scheme is used.
output (str, optional) – Specify the output type: "plots_only", "crosstabs_only", or "both". Default is "both".
return_dict (bool, optional) – Specify whether to return the crosstabs dictionary, default is False.
x (int, optional) – The width of the figure.
y (int, optional) – The height of the figure.
p (int, optional) – The padding between the subplots.
file_prefix (str, optional) – Prefix for the filename when output includes plots.
logscale (bool, optional) – Apply log scale to the y-axis, default is False.
plot_type (str, optional) – Specify the type of plot to generate: "both", "regular", "normalized". Default is "both".
show_legend (bool, optional) – Specify whether to show the legend, default is True.
label_fontsize (int, optional) – Font size for axis labels, default is 12.
tick_fontsize (int, optional) – Font size for tick labels on the axes, default is 10.
text_wrap (int, optional) – The maximum width of the title text before wrapping, default is 50.
remove_stacks (bool, optional) – If True, removes stacks and creates a regular bar plot using only the col parameter. Only works when plot_type is set to 'regular'. Default is False.
xlim (tuple or list, optional) – Limits for the x-axis as a tuple or list of (min, max).
ylim (tuple or list, optional) – Limits for the y-axis as a tuple or list of (min, max).
ValueError –
- If output is not one of "both", "plots_only", or "crosstabs_only".
- If plot_type is not one of "both", "regular", "normalized".
- If remove_stacks is set to True and plot_type is not "regular".
- If the lengths of title, func_col, and legend_labels_list are not equal.
KeyError – If any columns specified in col or func_col are missing in the DataFrame.
df (pandas.DataFrame) – The DataFrame containing the data to plot.
metrics_list (list of str) – List of metric names (columns in df) to plot.
metrics_comp (list of str) – List of comparison categories (columns in df).
n_rows (int, optional) – Number of rows in the subplot grid. Calculated automatically if not provided.
n_cols (int, optional) – Number of columns in the subplot grid. Calculated automatically if not provided.
image_path_png (str, optional) – Optional directory path to save .png images.
image_path_svg (str, optional) – Optional directory path to save .svg images.
save_plots (str, optional) – Controls which plots to save: "all", "individual", or "grid".
show_legend (bool, optional) – Whether to show the legend in the plots. Default is True.
plot_type (str, optional) – Specify the type of plot, either "boxplot" or "violinplot". Default is "boxplot".
xlabel_rot (int, optional) – Rotation angle for x-axis labels. Default is 0.
show_plot (str, optional) – Specify the plot display mode: "individual", "grid", or "both". Default is "both".
rotate_plot (bool, optional) – Whether to rotate (pivot) the plots. Default is False.
individual_figsize (tuple or list, optional) – Width and height of the figure for individual plots. Default is (6, 4).
grid_figsize (tuple or list, optional) – Width and height of the figure for grid plots.
label_fontsize (int, optional) – Font size for axis labels. Default is 12.
tick_fontsize (int, optional) – Font size for axis tick labels. Default is 10.
text_wrap (int, optional) – The maximum width of the title text before wrapping. Default is 50.
xlim (tuple or list, optional) – Limits for the x-axis as a tuple or list of (min, max).
ylim (tuple or list, optional) – Limits for the y-axis as a tuple or list of (min, max).
label_names (dict, optional) – Dictionary mapping original column names to custom labels. Default is None.
kwargs (additional keyword arguments) – Additional keyword arguments passed to the Seaborn plotting function.
ValueError –
- If show_plot is not one of "individual", "grid", or "both".
- If save_plots is not one of None, "all", "individual", or "grid".
- If save_plots is set without specifying image_path_png or image_path_svg.
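The three ValueError conditions listed above amount to a small argument check, sketched below. The function validate_plot_options is hypothetical and only mirrors the documented rules; the wording of the error messages is invented for the example.

```python
VALID_SHOW = ("individual", "grid", "both")
VALID_SAVE = (None, "all", "individual", "grid")


def validate_plot_options(
    show_plot="both",
    save_plots=None,
    image_path_png=None,
    image_path_svg=None,
):
    """Hypothetical helper mirroring the documented ValueError rules."""
    if show_plot not in VALID_SHOW:
        raise ValueError(f"show_plot must be one of {VALID_SHOW}")
    if save_plots not in VALID_SAVE:
        raise ValueError(f"save_plots must be one of {VALID_SAVE}")
    # Saving needs at least one destination directory.
    if save_plots is not None and not (image_path_png or image_path_svg):
        raise ValueError(
            "save_plots requires image_path_png or image_path_svg"
        )
    return True


ok = validate_plot_options(save_plots="grid", image_path_png="images/png")
```

Checking arguments before any figure is created keeps a bad option from surfacing halfway through a long plotting loop.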
df (pandas.DataFrame) – The DataFrame containing the data.
x_vars (list of str, optional) – List of variable names to plot on the x-axis.
y_vars (list of str, optional) – List of variable names to plot on the y-axis.
n_rows (int, optional) – Number of rows in the subplot grid. Calculated based on the number of plots and n_cols if not specified.
n_cols (int, optional) – Number of columns in the subplot grid. Calculated based on the number of plots and max_cols if not specified.
max_cols (int, optional) – Maximum number of columns in the subplot grid. Default is 4.
image_path_png (str, optional) – Directory path to save PNG images of the scatter plots.
image_path_svg (str, optional) – Directory path to save SVG images of the scatter plots.
save_plots (str, optional) – Controls which plots to save: "all", "individual", or "grid". If None, plots will not be saved.
show_legend (bool, optional) – Whether to display the legend on the plots. Default is True.
xlabel_rot (int, optional) – Rotation angle for x-axis labels. Default is 0.
show_plot (str, optional) – Controls plot display: "individual", "grid", or "both". Default is "both".
rotate_plot (bool, optional) – Whether to rotate (pivot) the plots. Default is False.
individual_figsize (tuple or list, optional) – Width and height of the figure for individual plots. Default is (6, 4).
grid_figsize (tuple or list, optional) – Width and height of the figure for grid plots. Calculated based on the number of rows and columns if not specified.
label_fontsize (int, optional) – Font size for axis labels. Default is 12.
tick_fontsize (int, optional) – Font size for axis tick labels. Default is 10.
text_wrap (int, optional) – The maximum width of the title text before wrapping. Default is 50.
add_best_fit_line (bool, optional) – Whether to add a best fit line to the scatter plots. Default is False.
scatter_color (str, optional) – Color code for the scattered points. Default is "C0".
best_fit_linecolor (str, optional) – Color code for the best fit line. Default is "red".
best_fit_linestyle (str, optional) – Linestyle for the best fit line. Default is "-".
hue (str, optional) – Column name for the grouping variable that will produce points with different colors.
hue_palette (dict, list, or str, optional) – Specifies colors for each hue level. Can be a dictionary mapping hue levels to colors, a list of colors, or the name of a seaborn color palette. This parameter requires the hue parameter to be set.
size (str, optional) – Column name for the grouping variable that will produce points with different sizes.
sizes (dict, optional) – Dictionary mapping sizes (smallest and largest) to min and max values.
marker (str, optional) – Marker style used for the scatter points. Default is "o".
show_correlation (bool, optional) – Whether to display the Pearson correlation coefficient in the plot title. Default is True.
xlim (tuple or list, optional) – Limits for the x-axis as a tuple or list of (min, max).
ylim (tuple or list, optional) – Limits for the y-axis as a tuple or list of (min, max).
all_vars (list of str, optional) – If provided, automatically generates scatter plots for all combinations of variables in this list, overriding x_vars and y_vars.
label_names (dict, optional) – A dictionary to rename columns for display in the plot titles and labels.
kwargs (dict, optional) – Additional keyword arguments to pass to sns.scatterplot.
ValueError –
If all_vars is provided and either x_vars or y_vars is also provided.
If neither all_vars nor both x_vars and y_vars are provided.
If hue_palette is specified without hue.
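As a rough sketch of how the variable-selection rules and the grid layout described above could fit together, the snippet below resolves the list of (x, y) pairs and derives a grid shape from max_cols. The helper names `resolve_pairs` and `grid_shape` are hypothetical, and two details are assumptions rather than documented behavior: all_vars is expanded into unordered pairs, and x_vars/y_vars are paired as a Cartesian product.

```python
from itertools import combinations, product
from math import ceil


def resolve_pairs(x_vars=None, y_vars=None, all_vars=None):
    """Return the (x, y) pairs to plot, enforcing the documented argument rules."""
    if all_vars is not None and (x_vars is not None or y_vars is not None):
        raise ValueError("Provide either all_vars or x_vars/y_vars, not both")
    if all_vars is not None:
        # Assumption: each unordered pair of variables is plotted once.
        return list(combinations(all_vars, 2))
    if x_vars is None or y_vars is None:
        raise ValueError("Provide all_vars, or both x_vars and y_vars")
    # Assumption: every x variable is paired with every y variable.
    return list(product(x_vars, y_vars))


def grid_shape(n_plots, max_cols=4, n_rows=None, n_cols=None):
    """Derive (n_rows, n_cols) for the subplot grid when they are not given."""
    if n_cols is None:
        n_cols = max(1, min(n_plots, max_cols))
    if n_rows is None:
        n_rows = max(1, ceil(n_plots / n_cols))
    return n_rows, n_cols
```

For example, three variables passed through all_vars give three pairs, which fit in a single row of three columns under the default max_cols of 4.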
df (pandas.DataFrame) – The DataFrame containing the data.
cols (list of str, optional) – List of column names to include in the correlation matrix. If None, all columns are included.
annot (bool, optional) – Whether to annotate the heatmap with correlation coefficients. Default is True.
cmap (str, optional) – The colormap to use for the heatmap. Default is "coolwarm".
save_plots (bool, optional) – Controls whether to save the plots. Default is False.
image_path_png (str, optional) – Directory path to save PNG images of the heatmap.
image_path_svg (str, optional) – Directory path to save SVG images of the heatmap.
figsize (tuple, optional) – Width and height of the figure for the heatmap. Default is (10, 10).
title (str, optional) – Title of the heatmap. Default is "Cervical Cancer Data: Correlation Matrix".
label_fontsize (int, optional) – Font size for tick labels and colorbar label. Default is 12.
tick_fontsize (int, optional) – Font size for axis tick labels. Default is 10.
xlabel_rot (int, optional) – Rotation angle for x-axis labels. Default is 45.
ylabel_rot (int, optional) – Rotation angle for y-axis labels. Default is 0.
xlabel_alignment (str, optional) – Horizontal alignment for x-axis labels. Default is "right".
ylabel_alignment (str, optional) – Vertical alignment for y-axis labels. Default is "center_baseline".
text_wrap (int, optional) – The maximum width of the title text before wrapping. Default is 50.
vmin (float, optional) – Minimum value for the heatmap color scale. Default is -1.
vmax (float, optional) – Maximum value for the heatmap color scale. Default is 1.
cbar_label (str, optional) – Label for the colorbar. Default is "Correlation Index".
triangular (bool, optional) – Whether to show only the upper triangle of the correlation matrix. Default is True.
kwargs (dict, optional) – Additional keyword arguments to pass to seaborn.heatmap().
ValueError –
If annot is not a boolean.
If cols is not a list.
If save_plots is not a boolean.
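To make the triangular option above more concrete, here is a minimal pure-Python sketch of a mask that hides everything strictly below the diagonal, so only the upper triangle remains. It relies on the convention used by the mask argument of seaborn.heatmap(), where True cells are not drawn. Whether the library builds its mask this way, and whether it keeps the diagonal, are assumptions; the helper names are hypothetical.

```python
def lower_triangle_mask(n):
    """Return an n x n mask where True marks cells strictly below the diagonal."""
    return [[col < row for col in range(n)] for row in range(n)]


def apply_mask(matrix, mask):
    """Replace masked cells with None so only the upper triangle keeps values."""
    return [
        [None if hidden else value for value, hidden in zip(row, mask_row)]
        for row, mask_row in zip(matrix, mask)
    ]


# Toy symmetric correlation matrix for three columns.
corr = [
    [1.0, 0.8, -0.2],
    [0.8, 1.0, 0.1],
    [-0.2, 0.1, 1.0],
]
upper_only = apply_mask(corr, lower_triangle_mask(len(corr)))
```

Because a correlation matrix is symmetric, hiding one triangle removes no information and leaves a less cluttered heatmap.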
model (estimator object) – The trained machine learning model used to generate partial dependence plots.
X_train (pandas.DataFrame or numpy.ndarray) – The training data used to compute partial dependence. Should correspond to the features used to train the model.
feature_names (list of str) – A list of feature names corresponding to the columns in X_train.
features (list of int or tuple of int) – A list of feature indices or tuples of feature indices for which to generate partial dependence plots.
title (str, optional) – The title for the entire plot. Default is "PDP of house value on CA non-location features".
grid_resolution (int, optional) – The number of grid points to use for plotting the partial dependence. Higher values provide smoother curves but may increase computation time. Default is 50.
plot_type (str, optional) – The type of plot to generate. Choose "grid" for a grid layout, "individual" for separate plots, or "both" to generate both layouts. Default is "grid".
grid_figsize (tuple, optional) – Tuple specifying the width and height of the figure for the grid layout. Default is (12, 8).
individual_figsize (tuple, optional) – Tuple specifying the width and height of the figure for individual plots. Default is (6, 4).
label_fontsize (int, optional) – Font size for the axis labels and titles. Default is 12.
tick_fontsize (int, optional) – Font size for the axis tick labels. Default is 10.
text_wrap (int, optional) – The maximum width of the title text before wrapping. Useful for managing long titles. Default is 50.
image_path_png (str, optional) – The directory path where PNG images of the plots will be saved, if saving is enabled.
image_path_svg (str, optional) – The directory path where SVG images of the plots will be saved, if saving is enabled.
save_plots (str, optional) – Controls whether to save the plots. Options include "all", "individual", "grid", or None (default). If saving is enabled, ensure that image_path_png or image_path_svg is provided.
file_prefix (str, optional) – Prefix for the filenames of the saved grid plots. Default is "partial_dependence".
ValueError –
If plot_type is not one of "grid", "individual", or "both".
If save_plots is enabled but neither image_path_png nor image_path_svg is provided.
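To illustrate what grid_resolution controls, the following is a brute-force sketch of one-dimensional partial dependence in plain Python: the chosen feature is swept over an evenly spaced grid, and the model's predictions are averaged over all rows at each grid point. It is a simplified illustration, not the routine the plotting function calls. The helper name `partial_dependence_1d`, the min-to-max grid range, and the toy linear model are all assumptions made for the example.

```python
def partial_dependence_1d(predict, rows, feature_idx, grid_resolution=50):
    """Average predict(row) over all rows while sweeping one feature over a grid.

    predict: callable taking one row (list of floats) and returning a float.
    rows: list of rows, each a list of feature values.
    """
    if grid_resolution < 2:
        raise ValueError("grid_resolution must be at least 2")
    values = [row[feature_idx] for row in rows]
    lo, hi = min(values), max(values)
    # Assumption: an evenly spaced grid between the observed min and max.
    step = (hi - lo) / (grid_resolution - 1)
    grid = [lo + i * step for i in range(grid_resolution)]

    averages = []
    for grid_value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_idx] = grid_value  # overwrite only the swept feature
            total += predict(modified)
        averages.append(total / len(rows))
    return grid, averages


# Toy linear "model": prediction = 2 * x0 + 3 * x1.
def toy_predict(row):
    return 2 * row[0] + 3 * row[1]


toy_rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
grid, pd_values = partial_dependence_1d(toy_predict, toy_rows, feature_idx=0, grid_resolution=3)
```

The cost grows with grid_resolution times the number of rows, which is why higher resolutions give smoother curves at the price of longer computation.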
model (estimator object) – The trained machine learning model used to generate partial dependence plots.
dataframe (pandas.DataFrame or numpy.ndarray) – The dataset on which the model was trained or a representative sample. If a DataFrame is provided, feature_names_list should correspond to the column names. If a NumPy array is provided, feature_names_list should correspond to the indices of the columns.
feature_names_list (list of str) – A list of two feature names or indices corresponding to the features for which partial dependence plots are generated.
x_label (str, optional) – Label for the x-axis in the plots. Default is None.
y_label (str, optional) – Label for the y-axis in the plots. Default is None.
z_label (str, optional) – Label for the z-axis in the plots. Default is None.
title (str) – The title for the plots.
html_file_path (str, optional) – Path to save the interactive Plotly HTML file. Required if plot_type is "interactive" or "both". Default is None.
html_file_name (str, optional) – Name of the HTML file to save the interactive Plotly plot. Required if plot_type is "interactive" or "both". Default is None.
image_filename (str, optional) – Base filename for saving static Matplotlib plots as PNG and/or SVG. Default is None.
plot_type (str, optional) – The type of plots to generate. Options are:
- "static": Generate only static Matplotlib plots.
- "interactive": Generate only interactive Plotly plots.
- "both": Generate both static and interactive plots.
Default is "both".
matplotlib_colormap (matplotlib.colors.Colormap, optional) – Custom colormap for the Matplotlib plot. If not provided, a default colormap is used.
plotly_colormap (str, optional) – Colormap for the Plotly plot. Default is "Viridis".
zoom_out_factor (float, optional) – Factor to adjust the zoom level of the Plotly plot. Default is None.
wireframe_color (str, optional) – Color for the wireframe in the Matplotlib plot. If None, no wireframe is plotted. Default is None.
view_angle (tuple, optional) – Elevation and azimuthal angles for the Matplotlib plot view. Default is (22, 70).
figsize (tuple, optional) – Figure size for the Matplotlib plot. Default is (7, 4.5).
text_wrap (int, optional) – Maximum width of the title text before wrapping. Useful for managing long titles. Default is 50.
horizontal (float, optional) – Horizontal camera position for the Plotly plot. Default is -1.25.
depth (float, optional) – Depth camera position for the Plotly plot. Default is 1.25.
vertical (float, optional) – Vertical camera position for the Plotly plot. Default is 1.25.
cbar_x (float, optional) – Position of the color bar along the x-axis in the Plotly plot. Default is 1.05.
cbar_thickness (int, optional) – Thickness of the color bar in the Plotly plot. Default is 25.
title_x (float, optional) – Horizontal position of the title in the Plotly plot. Default is 0.5.
title_y (float, optional) – Vertical position of the title in the Plotly plot. Default is 0.95.
top_margin (int, optional) – Top margin for the Plotly plot layout. Default is 100.
image_path_png (str, optional) – Directory path to save the PNG file of the Matplotlib plot. Default is None.
image_path_svg (str, optional) – Directory path to save the SVG file of the Matplotlib plot. Default is None.
show_cbar (bool, optional) – Whether to display the color bar in the Matplotlib plot. Default is True.
grid_resolution (int, optional) – The resolution of the grid for computing partial dependence. Default is 20.
left_margin (int, optional) – Left margin for the Plotly plot layout. Default is 20.
right_margin (int, optional) – Right margin for the Plotly plot layout. Default is 65.
label_fontsize (int, optional) – Font size for axis labels in the Matplotlib plot. Default is 8.
tick_fontsize (int, optional) – Font size for tick labels in the Matplotlib plot. Default is 6.
enable_zoom (bool, optional) – Whether to enable zooming in the Plotly plot. Default is True.
show_modebar (bool, optional) – Whether to display the mode bar in the Plotly plot. Default is True.
ValueError –
If plot_type is not one of "static", "interactive", or "both".
If plot_type is "interactive" or "both" and html_file_path or html_file_name is not provided.
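The text_wrap parameter listed above caps the width of each title line. A minimal sketch of that behavior, using the standard-library textwrap module, is shown below. That the library wraps titles this way is an assumption, and the helper name `wrap_title` is hypothetical.

```python
import textwrap


def wrap_title(title, text_wrap=50):
    """Break a long title into lines of at most text_wrap characters."""
    return textwrap.fill(title, width=text_wrap)


wrapped = wrap_title("PDP of house value on CA non-location features", text_wrap=20)
```

The wrapped string contains newline characters, so it can be passed directly as a Matplotlib title to produce a multi-line heading.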