Skip to content

Commit

Permalink
Update R, packages, etc. and bug fixing
Browse files Browse the repository at this point in the history
  • Loading branch information
ahaeusser committed Dec 21, 2024
1 parent cc6632b commit 8ce18e6
Show file tree
Hide file tree
Showing 12 changed files with 2,856 additions and 2,952 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ Imports:
crayon
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.2.3
RoxygenNote: 7.3.2
Suggests:
knitr,
rmarkdown,
Expand Down
4 changes: 2 additions & 2 deletions R/plot_bar.R
Original file line number Diff line number Diff line change
Expand Up @@ -101,8 +101,8 @@ plot_bar <- function(data,
# Adjust annotations
p <- p + labs(title = title)
p <- p + labs(subtitle = subtitle)
p <- p + labs(x = if_else(is_empty(xlab), as_name(enquo(x)), xlab))
p <- p + labs(y = if_else(is_empty(ylab), as_name(enquo(y)), ylab))
p <- p + labs(x = xlab)
p <- p + labs(y = ylab)
p <- p + labs(caption = caption)

# Adjust ggplot2 theme
Expand Down
4 changes: 2 additions & 2 deletions R/plot_density.R
Original file line number Diff line number Diff line change
Expand Up @@ -90,8 +90,8 @@ plot_density <- function(data,
# Adjust annotations
p <- p + labs(title = title)
p <- p + labs(subtitle = subtitle)
p <- p + labs(x = if_else(is_empty(xlab), as_name(enquo(x)), xlab))
p <- p + labs(y = if_else(is_empty(ylab), "Density", ylab))
p <- p + labs(x = xlab)
p <- p + labs(y = ylab)
p <- p + labs(caption = caption)

# Adjust ggplot2 theme
Expand Down
4 changes: 2 additions & 2 deletions R/plot_histogram.R
Original file line number Diff line number Diff line change
Expand Up @@ -89,8 +89,8 @@ plot_histogram <- function(data,
# Adjust annotations
p <- p + labs(title = title)
p <- p + labs(subtitle = subtitle)
p <- p + labs(x = if_else(is_empty(xlab), as_name(enquo(x)), xlab))
p <- p + labs(y = if_else(is_empty(ylab), "Count", ylab))
p <- p + labs(x = xlab)
p <- p + labs(y = ylab)
p <- p + labs(caption = caption)

# Adjust ggplot2 theme
Expand Down
4 changes: 2 additions & 2 deletions R/plot_line.R
Original file line number Diff line number Diff line change
Expand Up @@ -87,8 +87,8 @@ plot_line <- function(data,
# Adjust annotations
p <- p + labs(title = title)
p <- p + labs(subtitle = subtitle)
p <- p + labs(x = if_else(is_empty(xlab), as_name(enquo(x)), xlab))
p <- p + labs(y = if_else(is_empty(ylab), as_name(enquo(y)), ylab))
p <- p + labs(x = xlab)
p <- p + labs(y = ylab)
p <- p + labs(caption = caption)

# Adjust ggplot2 theme
Expand Down
4 changes: 2 additions & 2 deletions R/plot_point.R
Original file line number Diff line number Diff line change
Expand Up @@ -87,8 +87,8 @@ plot_point <- function(data,
# Adjust annotations
p <- p + labs(title = title)
p <- p + labs(subtitle = subtitle)
p <- p + labs(x = if_else(is_empty(xlab), as_name(enquo(x)), xlab))
p <- p + labs(y = if_else(is_empty(ylab), as_name(enquo(y)), ylab))
p <- p + labs(x = xlab)
p <- p + labs(y = ylab)
p <- p + labs(caption = caption)

# Adjust ggplot2 theme
Expand Down
4 changes: 2 additions & 2 deletions R/plot_qq.R
Original file line number Diff line number Diff line change
Expand Up @@ -137,8 +137,8 @@ plot_qq <- function(data,
# Adjust annotations
p <- p + labs(title = title)
p <- p + labs(subtitle = subtitle)
p <- p + labs(x = if_else(is_empty(xlab), "Theoretical quantile", xlab))
p <- p + labs(y = if_else(is_empty(ylab), "Sample quantile", ylab))
p <- p + labs(x = xlab)
p <- p + labs(y = ylab)
p <- p + labs(caption = caption)

# Adjust ggplot2 theme
Expand Down
123 changes: 65 additions & 58 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,14 +47,14 @@ prices in \[EUR/MWh\] from the ENTSO-E Transparency Platform. The data
set contains hourly time series data from 2019-01-01 to 2020-12-31 for 8
European bidding zones (BZN):

- DE: Germany (including Luxembourg)
- DK: Denmark
- ES: Spain
- FI: Finland
- FR: France
- NL: Netherlands
- NO1: Norway 1 (Oslo)
- SE1: Sweden 1 (Lulea)
- DE: Germany (including Luxembourg)
- DK: Denmark
- ES: Spain
- FI: Finland
- FR: France
- NL: Netherlands
- NO1: Norway 1 (Oslo)
- SE1: Sweden 1 (Lulea)

In this vignette, we will use only four time series to demonstrate the
functionality of the package (the data set is filtered to the bidding
Expand All @@ -66,6 +66,7 @@ The function `summarise_stats()` calculates descriptive statistics for
each time series.

``` r

series_id = "bidding_zone"
value_id = "value"
index_id = "time"
Expand All @@ -81,7 +82,7 @@ main_frame <- elec_price %>%
filter(bidding_zone %in% c("DE", "FR", "NO1", "SE1"))

main_frame
#> # A tibble: 70,176 x 5
#> # A tibble: 70,176 × 5
#> time item unit bidding_zone value
#> <dttm> <chr> <chr> <chr> <dbl>
#> 1 2019-01-01 00:00:00 Day-ahead Price [EUR/MWh] DE 10.1
Expand All @@ -94,7 +95,7 @@ main_frame
#> 8 2019-01-01 07:00:00 Day-ahead Price [EUR/MWh] DE -4.93
#> 9 2019-01-01 08:00:00 Day-ahead Price [EUR/MWh] DE -6.33
#> 10 2019-01-01 09:00:00 Day-ahead Price [EUR/MWh] DE -4.93
#> # ... with 70,166 more rows
#> # 70,166 more rows

main_frame %>%
plot_line(
Expand All @@ -113,31 +114,32 @@ main_frame %>%
<img src="man/figures/README-clean_data-1.svg" width="100%" />

``` r

summarise_data(
.data = main_frame,
context = context
)
#> # A tibble: 4 x 8
#> # A tibble: 4 × 8
#> bidding_zone start end n_obs n_missing
#> <chr> <dttm> <dttm> <int> <int>
#> 1 DE 2019-01-01 00:00:00 2020-12-31 23:00:00 17544 0
#> 2 FR 2019-01-01 00:00:00 2020-12-31 23:00:00 17544 0
#> 3 NO1 2019-01-01 00:00:00 2020-12-31 23:00:00 17544 0
#> 4 SE1 2019-01-01 00:00:00 2020-12-31 23:00:00 17544 0
#> # ... with 3 more variables: pct_missing <dbl>, n_zeros <int>, pct_zeros <dbl>
#> # 3 more variables: pct_missing <dbl>, n_zeros <int>, pct_zeros <dbl>

summarise_stats(
.data = main_frame,
context = context
)
#> # A tibble: 4 x 11
#> # A tibble: 4 × 11
#> bidding_zone mean median mode sd p0 p25 p75 p100 skewness
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 DE 34.1 35.1 35.9 16.9 -90.0 26.3 43.8 200. -0.776
#> 2 FR 35.8 36 37.2 15.5 -75.8 26.6 44.9 200. 0.323
#> 3 NO1 24.3 26.9 2.66 17.1 -1.73 6.95 38.8 109. 0.189
#> 4 SE1 26.1 27.9 37.9 15.9 -1.73 11.4 38.5 189. 0.378
#> # ... with 1 more variable: kurtosis <dbl>
#> # 1 more variable: kurtosis <dbl>
```

## Split data into training and testing
Expand All @@ -154,6 +156,7 @@ options for splitting the data are available via `type` (see function
reference for more details).

``` r

# Setup for time series cross validation
type = "first"
value = 2400 # size for training window
Expand All @@ -180,7 +183,7 @@ split_frame <- split_frame %>%
filter(split %in% c(1:50))

split_frame
#> # A tibble: 200 x 4
#> # A tibble: 200 × 4
#> bidding_zone split train test
#> <chr> <int> <list> <list>
#> 1 DE 1 <int [2,400]> <int [24]>
Expand All @@ -193,7 +196,7 @@ split_frame
#> 8 DE 8 <int [2,400]> <int [24]>
#> 9 DE 9 <int [2,400]> <int [24]>
#> 10 DE 10 <int [2,400]> <int [24]>
#> # ... with 190 more rows
#> # 190 more rows
```

## Training and forecasting
Expand All @@ -206,21 +209,22 @@ have to convert the data set `main_frame` from a `tibble` to a
`tsibble`. Due to the sample size and computation time, only very simple
benchmark methods are used:

- `SNAIVE`: Seasonal naive model with weekly seasonality (from package
`fable`)
- `STL-NAIVE`: STL-decomposition model and naive forecast. The series
is decomposed via STL and the seasonal adjusted series is predicted
via the naive approach. Afterwards, seasonal component is added to
the forecasts (from packages `fable` and `feasts`)
- `SNAIVE2`: Variation of the seasonal naive approach. Mondays,
Saturdays and Sundays are treated with a weekly lag. Tuesdays,
Wednesdays, Thursdays and Fridays are treated with a daily lag.
- `SMEDIAN`: Seasonal median model
- `SNAIVE`: Seasonal naive model with weekly seasonality (from package
`fable`)
- `STL-NAIVE`: STL-decomposition model and naive forecast. The series is
decomposed via STL and the seasonal adjusted series is predicted via
the naive approach. Afterwards, seasonal component is added to the
forecasts (from packages `fable` and `feasts`)
- `SNAIVE2`: Variation of the seasonal naive approach. Mondays,
Saturdays and Sundays are treated with a weekly lag. Tuesdays,
Wednesdays, Thursdays and Fridays are treated with a daily lag.
- `SMEDIAN`: Seasonal median model

The functions `SMEDIAN()` and `SNAIVE2()` are extensions to the `fable`
package

``` r

# Slice training data from main_frame according to split_frame
train_frame <- slice_train(
main_frame = main_frame,
Expand All @@ -229,7 +233,7 @@ train_frame <- slice_train(
)

train_frame
#> # A tibble: 480,000 x 6
#> # A tibble: 480,000 × 6
#> time item unit bidding_zone value split
#> <dttm> <chr> <chr> <chr> <dbl> <int>
#> 1 2019-01-01 00:00:00 Day-ahead Price [EUR/MWh] DE 10.1 1
Expand All @@ -242,7 +246,7 @@ train_frame
#> 8 2019-01-01 07:00:00 Day-ahead Price [EUR/MWh] DE -4.93 1
#> 9 2019-01-01 08:00:00 Day-ahead Price [EUR/MWh] DE -6.33 1
#> 10 2019-01-01 09:00:00 Day-ahead Price [EUR/MWh] DE -4.93 1
#> # ... with 479,990 more rows
#> # 479,990 more rows

# Convert tibble to tsibble
train_frame <- train_frame %>%
Expand All @@ -266,7 +270,7 @@ train_frame
#> 8 2019-01-01 07:00:00 Day-ahead Price [EUR/MWh] DE -4.93 1
#> 9 2019-01-01 08:00:00 Day-ahead Price [EUR/MWh] DE -6.33 1
#> 10 2019-01-01 09:00:00 Day-ahead Price [EUR/MWh] DE -4.93 1
#> # ... with 479,990 more rows
#> # 479,990 more rows

# Model training via fabletools::model()
model_frame <- train_frame %>%
Expand All @@ -292,7 +296,7 @@ model_frame
#> 8 DE 8 <SNAIVE> <STL decomposition model> <SNAIVE2> <SMEDIAN>
#> 9 DE 9 <SNAIVE> <STL decomposition model> <SNAIVE2> <SMEDIAN>
#> 10 DE 10 <SNAIVE> <STL decomposition model> <SNAIVE2> <SMEDIAN>
#> # ... with 190 more rows
#> # 190 more rows

# Forecasting via fabletools::forecast()
fable_frame <- model_frame %>%
Expand All @@ -301,19 +305,20 @@ fable_frame <- model_frame %>%
fable_frame
#> # A fable: 19,200 x 6 [1h] <UTC>
#> # Key: bidding_zone, split, .model [800]
#> bidding_zone split .model time value .mean
#> <chr> <int> <chr> <dttm> <dist> <dbl>
#> 1 DE 1 SNAIVE 2019-04-11 00:00:00 N(33, 367) 33
#> 2 DE 1 SNAIVE 2019-04-11 01:00:00 N(33, 367) 32.6
#> 3 DE 1 SNAIVE 2019-04-11 02:00:00 N(34, 367) 34.1
#> 4 DE 1 SNAIVE 2019-04-11 03:00:00 N(37, 367) 36.9
#> 5 DE 1 SNAIVE 2019-04-11 04:00:00 N(45, 367) 44.7
#> 6 DE 1 SNAIVE 2019-04-11 05:00:00 N(54, 367) 53.6
#> 7 DE 1 SNAIVE 2019-04-11 06:00:00 N(60, 367) 59.9
#> 8 DE 1 SNAIVE 2019-04-11 07:00:00 N(47, 367) 46.9
#> 9 DE 1 SNAIVE 2019-04-11 08:00:00 N(48, 367) 48
#> 10 DE 1 SNAIVE 2019-04-11 09:00:00 N(47, 367) 47
#> # ... with 19,190 more rows
#> bidding_zone split .model time
#> <chr> <int> <chr> <dttm>
#> 1 DE 1 SNAIVE 2019-04-11 00:00:00
#> 2 DE 1 SNAIVE 2019-04-11 01:00:00
#> 3 DE 1 SNAIVE 2019-04-11 02:00:00
#> 4 DE 1 SNAIVE 2019-04-11 03:00:00
#> 5 DE 1 SNAIVE 2019-04-11 04:00:00
#> 6 DE 1 SNAIVE 2019-04-11 05:00:00
#> 7 DE 1 SNAIVE 2019-04-11 06:00:00
#> 8 DE 1 SNAIVE 2019-04-11 07:00:00
#> 9 DE 1 SNAIVE 2019-04-11 08:00:00
#> 10 DE 1 SNAIVE 2019-04-11 09:00:00
#> # ℹ 19,190 more rows
#> # ℹ 2 more variables: value <dist>, .mean <dbl>

# Convert fable_frame (fable) to future_frame (tibble)
future_frame <- make_future(
Expand All @@ -322,7 +327,7 @@ future_frame <- make_future(
)

future_frame
#> # A tibble: 19,200 x 6
#> # A tibble: 19,200 × 6
#> time bidding_zone model split horizon point
#> <dttm> <chr> <chr> <int> <int> <dbl>
#> 1 2019-04-11 00:00:00 DE SNAIVE 1 1 33
Expand All @@ -335,7 +340,7 @@ future_frame
#> 8 2019-04-11 07:00:00 DE SNAIVE 1 8 46.9
#> 9 2019-04-11 08:00:00 DE SNAIVE 1 9 48
#> 10 2019-04-11 09:00:00 DE SNAIVE 1 10 47
#> # ... with 19,190 more rows
#> # 19,190 more rows
```

## Evaluation of forecast accuracy
Expand All @@ -344,19 +349,20 @@ To evaluate the forecast accuracy, the function `make_accuracy()` is
used. You can define whether to evaluate the accuracy by `horizon` or by
`split`. Several accuracy metrics are available:

- `ME`: mean error
- `MAE`: mean absolute error
- `MSE`: mean squared error
- `RMSE`: root mean squared error
- `MAPE`: mean absolute percentage error
- `sMAPE`: scaled mean absolute percentage error
- `MPE`: mean percentage error
- `rMAE`: relative mean absolute error (relative to some user-defined
benchmark method)
- `ME`: mean error
- `MAE`: mean absolute error
- `MSE`: mean squared error
- `RMSE`: root mean squared error
- `MAPE`: mean absolute percentage error
- `sMAPE`: scaled mean absolute percentage error
- `MPE`: mean percentage error
- `rMAE`: relative mean absolute error (relative to some user-defined
benchmark method)

### Forecast accuracy by forecast horizon

``` r

# Estimate accuracy metrics by forecast horizon
accuracy_horizon <- make_accuracy(
future_frame = future_frame,
Expand All @@ -366,7 +372,7 @@ accuracy_horizon <- make_accuracy(
)

accuracy_horizon
#> # A tibble: 2,688 x 6
#> # A tibble: 2,688 × 6
#> bidding_zone model dimension n metric value
#> <chr> <chr> <chr> <int> <chr> <dbl>
#> 1 DE SMEDIAN horizon 1 MAE 4.84
Expand All @@ -379,7 +385,7 @@ accuracy_horizon
#> 8 DE SMEDIAN horizon 8 MAE 6.73
#> 9 DE SMEDIAN horizon 9 MAE 7.38
#> 10 DE SMEDIAN horizon 10 MAE 9.06
#> # ... with 2,678 more rows
#> # 2,678 more rows

# Visualize results
accuracy_horizon %>%
Expand All @@ -402,6 +408,7 @@ accuracy_horizon %>%
### Forecast accuracy by split

``` r

# Estimate accuracy metrics by forecast horizon
accuracy_split <- make_accuracy(
future_frame = future_frame,
Expand All @@ -411,7 +418,7 @@ accuracy_split <- make_accuracy(
)

accuracy_split
#> # A tibble: 5,600 x 6
#> # A tibble: 5,600 × 6
#> bidding_zone model dimension n metric value
#> <chr> <chr> <chr> <int> <chr> <dbl>
#> 1 DE SMEDIAN split 1 MAE 2.80
Expand All @@ -424,7 +431,7 @@ accuracy_split
#> 8 DE SMEDIAN split 8 MAE 4.87
#> 9 DE SMEDIAN split 9 MAE 11.5
#> 10 DE SMEDIAN split 10 MAE 5.02
#> # ... with 5,590 more rows
#> # 5,590 more rows

# Visualize results
accuracy_split %>%
Expand Down
Loading

0 comments on commit 8ce18e6

Please sign in to comment.