Add Clustering Method to PriceTaker #1579

Open
wants to merge 10 commits into main
@@ -5,7 +5,7 @@
Price takers are entities that must accept market prices since they lack the market power
to directly influence the market price. Likewise, it is assumed that a price taker's resource or energy
system is small enough such that it does not significantly impact the market. When coupled with multi-period modeling,
the price-taker model is able to synthesize grid-centric modeling with steady-state process-centric modeling, as
depicted in figure below.
depicted in the figure below.

.. |pricetaker| image:: images/pricetaker.png
:width: 1200
@@ -32,15 +32,13 @@ The following equations represent the multi-period price taker model, where :mat
h(d, u_{s,t}, δ_{s,t}, u_{s,t+1}, δ_{s,t+1}) = 0; ∀ s ∈ S, t ∈ T
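
As an illustration (an assumed example, not part of the PR text): a typical instance of the coupling constraint h is a storage state-of-charge balance linking consecutive periods,

soc_{s,t+1} − soc_{s,t} − η u_{s,t} Δt = 0; ∀ s ∈ S, t ∈ T

where η is a charging efficiency and u_{s,t} is interpreted as the charging power.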


The price taker multi-period modeling workflow involves the integration of multiple software platforms into the IDAES optimization model
and can be broken down into two distinct functions, as shown in the figure below. In part 1, simulated or historical
ISO (Independent System Operator) data is used to generate locational marginal price (LMP)
The price taker multi-period modeling workflow is shown in part 1 of the figure below. The price taker model uses
simulated or historical ISO (Independent System Operator) data to generate locational marginal price (LMP)
signals, and production cost models (PCMs) are used to compute and optimize the time-varying dispatch schedules for each
resource based on their respective bid curves. Advanced data analytics (RAVEN) reinterpret the LMP signals and PCM
resource based on their respective bid curves. Advanced data analytics reinterpret the LMP signals and PCM outputs
as stochastic realizations of the LMPs in the form of representative days (or simply the full-year price signals).
In part 2, PRESCIENT uses a variety of input parameters (design capacity, minimum power output, ramp rate, minimum up/down time, marginal cost, no load cost, and startup profile)
to generate data for the market surrogates. Meanwhile, IDAES uses the double loop simulation to integrate detailed
process models (b, ii) into the daily (a, c) and hourly (i, iii) grid operations workflow.
Part 2 describes how a variety of input parameters (design capacity, minimum power output, ramp rate, minimum up/down time, marginal cost, no-load cost, and startup profile)
are used to generate data for market surrogates, which, in conjunction with the price taker model, can be used to optimize hybrid energy systems.
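
To make part 1 concrete, here is a minimal sketch (hedged: the function signature follows this PR's clustering.py, the two-value return is inferred from its docstring, and the synthetic signal stands in for real ISO data):

import numpy as np

from idaes.apps.grid_integration.pricetaker.clustering import cluster_lmp_data

# Synthetic stand-in for one year of hourly LMP data [$/MWh]
lmp_signal = list(40 + 20 * np.sin(np.linspace(0, 120 * np.pi, 8760)))

# Cluster full days (horizon_length=24) into 10 representative days;
# eps zeroes out near-zero centroid noise
lmp_clusters, weights = cluster_lmp_data(
    lmp_signal, horizon_length=24, n_clusters=10, seed=42, eps=1e-4
)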

.. |hybrid_energy_system| image:: images/hybrid_energy_system.png
:width: 1200
135 changes: 101 additions & 34 deletions idaes/apps/grid_integration/pricetaker/clustering.py
@@ -19,6 +19,7 @@
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.interpolate import splrep, splev
from pyomo.common.dependencies import attempt_import
import idaes.logger as idaeslog

Expand Down Expand Up @@ -73,7 +74,11 @@ def generate_daily_data(raw_data: list, horizon_length: int):


def cluster_lmp_data(
raw_data: list, horizon_length: int, n_clusters: int, seed: int = 42
raw_data: list,
horizon_length: int,
n_clusters: int,
seed: int = 42,
eps: float = 1e-4,
):
"""
Clusters the given price signal into n_clusters using the k-means clustering
@@ -92,6 +97,9 @@
seed: int,
Seed value for initializing random number generator within Kmeans

eps: float,
Centroid values with magnitude below this threshold are set to 0 to limit noise in the data

Returns:
lmp_data_clusters: dict
A dictionary of representative day LMP data, indices are indexed
@@ -116,7 +124,7 @@
labels = kmeans.labels_

# Set any centroid values with magnitude < eps to 0 to avoid noise
centroids = centroids * (abs(centroids) >= 1e-4)
centroids = centroids * (abs(centroids) >= eps)

# Create dicts for lmp data and the weight of each cluster
# By default, the data is of type numpy.int or numpy.float.
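
The mask-multiply above zeroes near-zero centroid entries in one vectorized step; a standalone illustration with invented values:

import numpy as np

centroids = np.array([[25.3, 3.2e-5, -7.8e-6, 41.0]])
eps = 1e-4

# The comparison yields a boolean mask that numpy treats as 0/1, so the
# multiply zeroes out every entry smaller than eps in magnitude
centroids = centroids * (abs(centroids) >= eps)
# array([[25.3,  0. ,  0. , 41. ]])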
@@ -136,6 +144,7 @@ def get_optimal_num_clusters(
kmin: int = 2,
kmax: int = 30,
method: str = "silhouette",
sensitivity: int = 1,
generate_elbow_plot: bool = False,
seed: int = 42,
):
@@ -158,6 +167,9 @@
Method for obtaining the optimal number of clusters.
Supported methods are elbow and silhouette

sensitivity : int,
Sensitivity of knee detection, where 0 detects knees most easily and larger values are more conservative

generate_elbow_plot : bool,
If True, generates an elbow plot for inertia as a function of
number of clusters
@@ -181,28 +193,94 @@

k_values = list(range(kmin, kmax + 1))
inertia_values = []
mean_silhouette = []

for k in k_values:
kmeans = KMeans(n_clusters=k, random_state=seed).fit(samples)
inertia_values.append(kmeans.inertia_)

if method == "silhouette":
# Calculate the average silhouette score, if the chosen method
# is silhouette
mean_silhouette.append(silhouette_score(samples, kmeans.labels_))

# Identify the optimal number of clusters
if method == "elbow":
raise NotImplementedError(
"elbow method is not supported currently for finding the optimal "
"number of clusters."
)
if method == "silhouette":
mean_silhouette = []
mean_silhouette.append(silhouette_score(samples, kmeans.labels_))

elif method == "silhouette":
# Identify the optimal number of clusters
max_index = mean_silhouette.index(max(mean_silhouette))
n_clusters = k_values[max_index]
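# (Silhouette scores lie in [-1, 1]; taking the k with the highest mean
# score favors cohesive, well-separated clusters.)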

# The implementation below is based on
# Ville Satopaa, Jeannie Albrecht, David Irwin, Barath Raghavan
# Finding a “Kneedle” in a Haystack:
# Detecting Knee Points in System Behavior
# https://raghavan.usc.edu/papers/kneedle-simplex11.pdf

elif method == "elbow":
# Invert inertia values such that plot is concave down and increasing
inverted_inertia_values = [-y for y in inertia_values]
# TODO: See if make_splrep works and generates similar results
# Use a smoothing spline that retains the data's original shape
spline = splrep(k_values, inverted_inertia_values, k=3)
# Evaluate the spline within [kmin, kmax] to avoid extrapolating beyond the data
k_smooth = np.linspace(kmin, kmax, 100)
inertia_smooth = splev(k_smooth, spline)

k_norm = _normalize_values(k_smooth)
inertia_norm = _normalize_values(inertia_smooth)

# Compute the set of differences (x, y) -> (x, y-x)
inertia_diff = []
for i in range(len(inertia_norm)):
set_of_differences = inertia_norm[i] - k_norm[i]
inertia_diff.append(set_of_differences)

# Identify local maxima
local_maxima = [
(k_norm[i], inertia_diff[i])
for i in range(1, len(inertia_diff) - 1)
if inertia_diff[i] > inertia_diff[i - 1]
and inertia_diff[i] > inertia_diff[i + 1]
]

# Calculate optimal # of clusters based on the # of local maxima
threshold_values = []
threshold_triggered = False
n_clusters = 0

if len(local_maxima) == 0:
n_clusters = 0
_logger.warning(
"The optimal number of cluster cannot be determined for this dataset."
)
elif len(local_maxima) == 1:
n_clusters_norm = local_maxima[0][0]
ind = k_norm.index(n_clusters_norm)
# k_norm indexes the 100-point smoothed grid, so map back through k_smooth
n_clusters = round(k_smooth[ind])
else:
n = len(local_maxima)
summation = 0
for i in range(0, n - 1):
summation += local_maxima[i + 1][0] - local_maxima[i][0]
# For each local maximum, compute the threshold and the indices of the current and next local maxima
for i in range(0, n - 1):
threshold = local_maxima[i][1] - (sensitivity * summation / (n - 1))
threshold_values.append(threshold)
l_max_index = inertia_diff.index(local_maxima[i][1])
next_l_max_index = inertia_diff.index(local_maxima[i + 1][1])
for j in range(l_max_index + 1, next_l_max_index):
if inertia_diff[j] < threshold:
threshold_triggered = True
normalized_optimal_n_clusters = local_maxima[i][0]
index = k_norm.index(normalized_optimal_n_clusters)
n_clusters = round(k_smooth[index])
if threshold_triggered:
break
# If the optimal # of clusters cannot be identified, use the first local maximum
if not threshold_triggered:
normalized_optimal_n_clusters = local_maxima[0][0]
index = k_norm.index(normalized_optimal_n_clusters)
n_clusters = round(k_smooth[index])
_logger.warning(
"The number of optimal clusters could not be accurately identified. "
"Consider lowering the sensitivity."
)

else:
raise ValueError(
f"Unrecognized method {method} for optimal number of clusters."
Expand All @@ -229,24 +307,13 @@ def get_optimal_num_clusters(
return n_clusters


def locate_elbow(x: list, y: list):
"""
Identifies the elbow/knee for the input/output data
def _normalize_values(values):
"""Min-max scales the values to the range [0, 1]."""
normalized_values = []

Args:
x : list
List of independent variables
y : list
List of dependent variables

Returns:
opt_x : float
Optimal x at which curvature changes significantly
"""
# The implementation is based on
# Ville Satopaa, Jeannie Albrecht, David Irwin, Barath Raghavan
# Finding a “Kneedle” in a Haystack:
# Detecting Knee Points in System Behavior
# https://raghavan.usc.edu/papers/kneedle-simplex11.pdf
# Hoist min/max out of the loop; they are invariant across iterations
max_value = max(values)
min_value = min(values)

for i in values:
normalized_value = (i - min_value) / (max_value - min_value)
normalized_values.append(normalized_value)

return len(np.array(x)) + len(np.array(y))
return normalized_values
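
Pulling the pieces together, below is a self-contained sketch of the Kneedle computation implemented above, on toy inertia values (spline smoothing omitted for brevity):

def normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Toy inertia curve with a knee at k = 4
k_values = list(range(2, 11))
inertia = [100, 55, 30, 26, 23, 21, 20, 19, 18]

# Kneedle: invert so the curve is concave and increasing, normalize both
# axes to [0, 1], then inspect y - x; the knee is where y - x peaks
k_norm = normalize(k_values)
y_norm = normalize([-v for v in inertia])
diff = [y - x for x, y in zip(k_norm, y_norm)]
print(k_values[diff.index(max(diff))])  # -> 4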
@@ -173,15 +173,15 @@ def my_operation_model(m, design_blk):
ConfigValue(
default=True,
domain=Bool,
doc="Should op_mode, startup, shutdown vars be defined?",
doc="Boolean flag to determine if `op_mode`, `startup`, and `shutdown` vars should be defined",
),
)
CONFIG.declare(
"declare_lmp_param",
ConfigValue(
default=True,
domain=Bool,
doc="Should LMP data automatically be appended to the model?",
doc="Boolean flag to determine if LMP data should automatically be appended to the model",
),
)

46 changes: 23 additions & 23 deletions idaes/apps/grid_integration/pricetaker/price_taker_model.py
@@ -71,7 +71,7 @@
"horizon_length",
ConfigValue(
domain=PositiveInt,
doc="Length of each representative day/period",
doc="Length of each time horizon",
),
)
CONFIG.declare(
@@ -99,14 +99,14 @@

# List of arguments needed for adding startup/shutdown constraints
CONFIG.declare(
"up_time",
"minimum_up_time",
ConfigValue(
domain=PositiveInt,
doc="Minimum uptime [in hours]",
),
)
CONFIG.declare(
"down_time",
"minimum_down_time",
ConfigValue(
domain=PositiveInt,
doc="Minimum downtime [in hours]",
@@ -136,10 +136,10 @@
),
)
CONFIG.declare(
"annualization_factor",
"capital_recovery_factor",
ConfigValue(
domain=is_in_range(0, 1),
doc="Capital cost annualization factor [fraction]",
doc="Capital cost annualization factor [fraction of investment cost/year]",
),
)
CONFIG.declare(
Expand Down Expand Up @@ -620,8 +620,8 @@ def add_startup_shutdown(
self,
op_block_name: str,
des_block_name: Optional[str] = None,
up_time: int = 1,
down_time: int = 1,
minimum_up_time: int = 1,
minimum_down_time: int = 1,
):
"""
Adds minimum uptime/downtime constraints for a given unit/process
@@ -635,19 +635,19 @@
op_block_name. This argument is specified if the design is
being optimized simultaneously, e.g., "ngcc_design"

up_time: int, default=1,
minimum_up_time: int, default=1,
Total uptime (must be >= 1), e.g., 4 time periods
Uptime must include the minimum uptime and the time required
for shutdown.

down_time: int, default=1,
minimum_down_time: int, default=1,
Total downtime (must be >= 1), e.g., 4 time periods
Downtime must include the minimum downtime and the time
required for startup
"""
# Check up_time and down_time for validity
self.config.up_time = up_time
self.config.down_time = down_time
# Check minimum_up_time and minimum_down_time for validity
self.config.minimum_up_time = minimum_up_time
self.config.minimum_down_time = minimum_down_time

op_blocks = self._get_operation_blocks(
blk_name=op_block_name,
@@ -681,15 +681,15 @@
blk=start_shut_blk[d],
op_blocks=op_blocks[d],
install_unit=install_unit,
up_time=up_time,
down_time=down_time,
minimum_up_time=minimum_up_time,
minimum_down_time=minimum_down_time,
set_time=self.set_time,
)

# Save the uptime and downtime data for reference
self._op_blk_uptime_downtime[op_block_name] = {
"up_time": up_time,
"down_time": down_time,
"minimum_up_time": minimum_up_time,
"minimum_down_time": minimum_down_time,
}

# Logger info for where constraint is located on the model
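
A hedged usage sketch of the renamed keywords (m is assumed to be a PriceTakerModel instance and the "ngcc" block names are illustrative only):

m.add_startup_shutdown(
    op_block_name="ngcc_operation",
    des_block_name="ngcc_design",
    minimum_up_time=4,   # periods; includes time required for shutdown
    minimum_down_time=4, # periods; includes time required for startup
)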
@@ -803,7 +803,7 @@ def add_overall_cashflows(
lifetime: int = 30,
discount_rate: float = 0.08,
corporate_tax_rate: float = 0.2,
annualization_factor: Optional[float] = None,
capital_recovery_factor: Optional[float] = None,
cash_inflow_scale_factor: Optional[float] = 1.0,
):
"""
Expand All @@ -822,7 +822,7 @@ def add_overall_cashflows(
Fractional value of corporate tax used in NPV calculations.
Must be between 0 and 1.

annualization_factor: float, default=None,
capital_recovery_factor: float, default=None,
Capital recovery factor used to annualize capital costs

cash_inflow_scale_factor: float, default=1.0,
@@ -841,7 +841,7 @@
self.config.lifetime = lifetime
self.config.discount_rate = discount_rate
self.config.corporate_tax = corporate_tax_rate
self.config.annualization_factor = annualization_factor
self.config.capital_recovery_factor = capital_recovery_factor
self.config.cash_inflow_scale_factor = cash_inflow_scale_factor

if not self._has_hourly_cashflows:
@@ -902,17 +902,17 @@
expr=cf.net_profit == cf.net_cash_inflow - cf.fom - cf.corporate_tax
)

if annualization_factor is None:
if capital_recovery_factor is None:
# If the capital recovery factor is not specified
annualization_factor = discount_rate / (
capital_recovery_factor = discount_rate / (
1 - (1 + discount_rate) ** (-lifetime)
)

cf.lifetime_npv = Expression(
expr=(1 / annualization_factor) * cf.net_profit - cf.capex
expr=(1 / capital_recovery_factor) * cf.net_profit - cf.capex
)
cf.npv = Expression(
expr=cf.net_profit - annualization_factor * cf.capex,
expr=cf.net_profit - capital_recovery_factor * cf.capex,
)

self._has_overall_cashflows = True
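
As a quick numeric check of the default path above (discount_rate=0.08, lifetime=30):

discount_rate, lifetime = 0.08, 30
crf = discount_rate / (1 - (1 + discount_rate) ** (-lifetime))
print(round(crf, 4))  # 0.0888

# With net_profit and capex in consistent units:
#   lifetime_npv = net_profit / crf - capex   (lifetime basis)
#   npv          = net_profit - crf * capex   (annualized basis)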