From d375ead0b4041ed195ae7abeb72ad81fcd5cadb9 Mon Sep 17 00:00:00 2001 From: Jack Davison Date: Sat, 27 Apr 2024 21:42:35 +0100 Subject: [PATCH 1/3] feat: add `panels` arg to [timeVariation()] --- R/timeVariation.R | 277 ++++++++++++++++++++++++------------------- man/timeVariation.Rd | 198 ++++++++++++++++--------------- 2 files changed, 254 insertions(+), 221 deletions(-) diff --git a/R/timeVariation.R b/R/timeVariation.R index 1d59168d..28ad4732 100644 --- a/R/timeVariation.R +++ b/R/timeVariation.R @@ -10,57 +10,56 @@ #' in the way vehicles vary by vehicles type e.g. less heavy vehicles at #' weekends. #' -#' The \code{timeVariation} function makes it easy to see how concentrations -#' (and many other variable types) vary by hour of the day and day of the week. +#' The [timeVariation()] function makes it easy to see how concentrations (and +#' many other variable types) vary by hour of the day and day of the week. #' #' The plots also show the 95\% confidence intervals in the mean. The 95\% #' confidence intervals in the mean are calculated through bootstrap #' simulations, which will provide more robust estimates of the confidence #' intervals (particularly when there are relatively few data). #' -#' The function can handle multiple pollutants and uses the flexible \code{type} -#' option to provide separate panels for each 'type' --- see \code{cutData} for -#' more details. \code{timeVariation} can also accept a \code{group} option -#' which is useful if data are stacked. This will work in a similar way to -#' having multiple pollutants in separate columns. +#' The function can handle multiple pollutants and uses the flexible `type` +#' option to provide separate panels for each 'type' --- see [cutData()] for +#' more details. [timeVariation()] can also accept a `group` option which is +#' useful if data are stacked. This will work in a similar way to having +#' multiple pollutants in separate columns. #' -#' The user can supply their own \code{ylim} e.g. 
\code{ylim = c(0, 200)} that -#' will be used for all plots. \code{ylim} can also be a list of length four to -#' control the y-limits on each individual plot e.g. \code{ylim = -#' list(c(-100,500), c(200, 300), c(-400,400), c(50,70))}. These pairs -#' correspond to the hour, weekday, month and day-hour plots respectively. +#' The user can supply their own `ylim` e.g. `ylim = c(0, 200)` that will be +#' used for all plots. `ylim` can also be a list of length four to control the +#' y-limits on each individual plot e.g. `ylim = list(c(-100,500), c(200, 300), +#' c(-400,400), c(50,70))`. These pairs correspond to the hour, weekday, month +#' and day-hour plots respectively. #' -#' The option \code{difference} will calculate the difference in means of two +#' The option `difference` will calculate the difference in means of two #' pollutants together with bootstrap estimates of the 95\% confidence intervals #' in the difference in the mean. This works in two ways: either two pollutants -#' are supplied in separate columns e.g. \code{pollutant = c("no2", "o3")}, or -#' there are two unique values of \code{group}. The difference is calculated as -#' the second pollutant minus the first and is labelled as such. Considering -#' differences in this way can provide many useful insights and is particularly -#' useful for model evaluation when information is needed about where a model -#' differs from observations by many different time scales. The manual contains -#' various examples of using \code{difference = TRUE}. +#' are supplied in separate columns e.g. `pollutant = c("no2", "o3")`, or there +#' are two unique values of `group`. The difference is calculated as the second +#' pollutant minus the first and is labelled as such. Considering differences in +#' this way can provide many useful insights and is particularly useful for +#' model evaluation when information is needed about where a model differs from +#' observations by many different time scales. 
The manual contains various +#' examples of using `difference = TRUE`. #' -#' Note also that the \code{timeVariation} function works well on a subset of -#' data and in conjunction with other plots. For example, a -#' \code{\link{polarPlot}} may highlight an interesting feature for a particular -#' wind speed/direction range. By filtering for those conditions -#' \code{timeVariation} can help determine whether the temporal variation of -#' that feature differs from other features --- and help with source -#' identification. +#' Note also that the [timeVariation()] function works well on a subset of data +#' and in conjunction with other plots. For example, a [polarPlot()] may +#' highlight an interesting feature for a particular wind speed/direction range. +#' By filtering for those conditions [timeVariation()] can help determine +#' whether the temporal variation of that feature differs from other features +#' --- and help with source identification. #' -#' In addition, \code{timeVariation} will work well with other variables if +#' In addition, [timeVariation()] will work well with other variables if #' available. Examples include meteorological and traffic flow data. #' #' Depending on the choice of statistic, a subheading is added. Users can -#' control the text in the subheading through the use of \code{sub} e.g. -#' \code{sub = ""} will remove any subheading. +#' control the text in the subheading through the use of `sub` e.g. `sub = ""` +#' will remove any subheading. #' #' @param mydata A data frame of hourly (or higher temporal resolution data). -#' Must include a \code{date} field and at least one variable to plot. +#' Must include a `date` field and at least one variable to plot. #' @param pollutant Name of variable to plot. Two or more pollutants can be -#' plotted, in which case a form like \code{pollutant = c("nox", "co")} should -#' be used. +#' plotted, in which case a form like `pollutant = c("nox", "co")` should be +#' used. 
#' @param local.tz Should the results be calculated in local time that includes #' a treatment of daylight savings time (DST)? The default is not to consider #' DST issues, provided the data were imported without a DST offset. Emissions @@ -72,111 +71,112 @@ #' is to express time as local time. This correction tends to produce #' better-defined diurnal profiles of concentration (or other variables) and #' allows a better comparison to be made with emissions/activity data. If set -#' to \code{FALSE} then GMT is used. Examples of usage include \code{local.tz -#' = "Europe/London"}, \code{local.tz = "America/New_York"}. See -#' \code{cutData} and \code{import} for more details. -#' @param normalise Should variables be normalised? The default is \code{FALSE}. -#' If \code{TRUE} then the variable(s) are divided by their mean values. This -#' helps to compare the shape of the diurnal trends for variables on very -#' different scales. +#' to `FALSE` then GMT is used. Examples of usage include `local.tz = +#' "Europe/London"`, `local.tz = "America/New_York"`. See `cutData` and +#' `import` for more details. +#' @param normalise Should variables be normalised? The default is `FALSE`. If +#' `TRUE` then the variable(s) are divided by their mean values. This helps to +#' compare the shape of the diurnal trends for variables on very different +#' scales. #' @param xlab x-axis label; one for each sub-plot. #' @param name.pol Names to be given to the pollutant(s). This is useful if you #' want to give a fuller description of the variables, maybe also including #' subscripts etc. -#' @param type \code{type} determines how the data are split i.e. conditioned, -#' and then plotted. The default is will produce a single plot using the -#' entire data. Type can be one of the built-in types as detailed in -#' \code{cutData} e.g. \dQuote{season}, \dQuote{year}, \dQuote{weekday} and so -#' on. For example, \code{type = "season"} will produce four plots --- one for -#' each season. 
+#' @param type `type` determines how the data are split i.e. conditioned, and +#' then plotted. The default will produce a single plot using the entire +#' data. Type can be one of the built-in types as detailed in `cutData` e.g. +#' \dQuote{season}, \dQuote{year}, \dQuote{weekday} and so on. For example, +#' `type = "season"` will produce four plots --- one for each season. #' -#' It is also possible to choose \code{type} as another variable in the data -#' frame. If that variable is numeric, then the data will be split into four +#' It is also possible to choose `type` as another variable in the data frame. +#' If that variable is numeric, then the data will be split into four #' quantiles (if possible) and labelled accordingly. If type is an existing #' character or factor variable, then those categories/levels will be used #' directly. This offers great flexibility for understanding the variation of #' different variables and how they depend on one another. #' -#' Only one \code{type} is allowed in\code{timeVariation}. +#' Only one `type` is allowed in [timeVariation()]. #' @param group This sets the grouping variable to be used. For example, if a -#' data frame had a column \code{site} setting \code{group = "site"} will plot -#' all sites together in each panel. See examples below. -#' @param difference If two pollutants are chosen then setting \code{difference -#' = TRUE} will also plot the difference in means between the two variables as -#' \code{pollutant[2] - pollutant[1]}. Bootstrap 95\% confidence intervals of -#' the difference in means are also calculated. A horizontal dashed line is -#' shown at y = 0. The difference can also be calculated if there is a column -#' that identifies two groups e.g. having used \code{splitByDate}. In this -#' case it is possible to call \code{timeVariation} with the option -#' \code{group = "split.by"} and \code{difference = TRUE}. 
+#' data frame had a column `site` setting `group = "site"` will plot all sites +#' together in each panel. See examples below. +#' @param difference If two pollutants are chosen then setting `difference = +#' TRUE` will also plot the difference in means between the two variables as +#' `pollutant[2] - pollutant[1]`. Bootstrap 95\% confidence intervals of the +#' difference in means are also calculated. A horizontal dashed line is shown +#' at y = 0. The difference can also be calculated if there is a column that +#' identifies two groups e.g. having used `splitByDate`. In this case it is +#' possible to call [timeVariation()] with the option `group = "split.by"` and +#' `difference = TRUE`. #' @param statistic Can be \dQuote{mean} (default) or \dQuote{median}. If the #' statistic is \sQuote{mean} then the mean line and the 95\% confidence #' interval in the mean are plotted by default. If the statistic is #' \sQuote{median} then the median line is plotted together with the 5/95 and #' 25/75th quantiles are plotted. Users can control the confidence intervals -#' with \code{conf.int}. -#' @param conf.int The confidence intervals to be plotted. If \code{statistic = -#' "mean"} then the confidence intervals in the mean are plotted. If -#' \code{statistic = "median"} then the \code{conf.int} and \code{1 - -#' conf.int} \emph{quantiles} are plotted. \code{conf.int} can be of length 2, -#' which is most useful for showing quantiles. For example \code{conf.int = -#' c(0.75, 0.99)} will yield a plot showing the median, 25/75 and 5/95th -#' quantiles. +#' with `conf.int`. +#' @param conf.int The confidence intervals to be plotted. If `statistic = +#' "mean"` then the confidence intervals in the mean are plotted. If +#' `statistic = "median"` then the `conf.int` and `1 - conf.int` *quantiles* +#' are plotted. `conf.int` can be of length 2, which is most useful for +#' showing quantiles. 
For example `conf.int = c(0.75, 0.99)` will yield a plot +#' showing the median, 25/75 and 5/95th quantiles. #' @param B Number of bootstrap replicates to use. Can be useful to reduce this #' value when there are a large number of observations available to increase #' the speed of the calculations without affecting the 95\% confidence #' interval calculations by much. -#' @param ci Should confidence intervals be shown? The default is \code{TRUE}. -#' Setting this to \code{FALSE} can be useful if multiple pollutants are -#' chosen where over-lapping confidence intervals can over complicate plots. +#' @param ci Should confidence intervals be shown? The default is `TRUE`. +#' Setting this to `FALSE` can be useful if multiple pollutants are chosen +#' where over-lapping confidence intervals can over complicate plots. #' @param cols Colours to be used for plotting. Options include #' \dQuote{default}, \dQuote{increment}, \dQuote{heat}, \dQuote{jet} and -#' \code{RColorBrewer} colours --- see the \code{openair} \code{openColours} -#' function for more details. For user defined the user can supply a list of -#' colour names recognised by R (type \code{colours()} to see the full list). -#' An example would be \code{cols = c("yellow", "green", "blue")} +#' `RColorBrewer` colours --- see the [openColours()] function for more +#' details. For user defined the user can supply a list of colour names +#' recognised by R (type `colours()` to see the full list). An example would +#' be `cols = c("yellow", "green", "blue")` #' @param ref.y A list with details of the horizontal lines to be added -#' representing reference line(s). For example, \code{ref.y = list(h = 50, lty -#' = 5)} will add a dashed horizontal line at 50. Several lines can be plotted -#' e.g. \code{ref.y = list(h = c(50, 100), lty = c(1, 5), col = c("green", -#' "blue"))}. See \code{panel.abline} in the \code{lattice} package for more -#' details on adding/controlling lines. 
-#' @param key By default \code{timeVariation} produces four plots on one page. +#' representing reference line(s). For example, `ref.y = list(h = 50, lty = +#' 5)` will add a dashed horizontal line at 50. Several lines can be plotted +#' e.g. `ref.y = list(h = c(50, 100), lty = c(1, 5), col = c("green", +#' "blue"))`. See `panel.abline` in the `lattice` package for more details on +#' adding/controlling lines. +#' @param key By default [timeVariation()] produces four plots on one page. #' While it is useful to see these plots together, it is sometimes necessary -#' just to use one for a report. If \code{key} is \code{TRUE}, a key is added -#' to all plots allowing the extraction of a single plot \emph{with} key. See -#' below for an example. +#' just to use one for a report. If `key` is `TRUE`, a key is added to all +#' plots allowing the extraction of a single plot *with* key. See below for an +#' example. #' @param key.columns Number of columns to be used in the key. With many #' pollutants a single column can make to key too wide. The user can thus -#' choose to use several columns by setting \code{columns} to be less than the +#' choose to use several columns by setting `columns` to be less than the #' number of pollutants. #' @param start.day What day of the week should the plots start on? The user can #' change the start day by supplying an integer between 0 and 6. Sunday = 0, #' Monday = 1, \ldots For example to start the weekday plots on a Saturday, -#' choose \code{start.day = 6}. +#' choose `start.day = 6`. #' @param panel.gap The gap between panels in the hour-day plot. -#' @param auto.text Either \code{TRUE} (default) or \code{FALSE}. If \code{TRUE} -#' titles and axis labels will automatically try and format pollutant names -#' and units properly e.g. by subscripting the \sQuote{2} in NO2. +#' @param auto.text Either `TRUE` (default) or `FALSE`. 
If `TRUE` titles and +#' axis labels will automatically try and format pollutant names and units +#' properly e.g. by subscripting the \sQuote{2} in NO2. #' @param alpha The alpha transparency used for plotting confidence intervals. 0 #' is fully transparent and 1 is opaque. The default is 0.4 -#' @param month.last Should the order of the plots be changed so the plot -#' showing monthly means be the last plot for a logical hierarchy of averaging -#' periods? -#' @param plot Should a plot be produced? \code{FALSE} can be useful when -#' analysing data to extract plot components and plotting them in other ways. -#' @param ... Other graphical parameters passed onto \code{lattice:xyplot} and -#' \code{cutData}. For example, in the case of \code{cutData} the option -#' \code{hemisphere = "southern"}. +#' @param panels The plots to use in the lower row of the [timeVariation()] +#' plot, defaulting to `c("hour", "month", "day")`. Users may wish to instead +#' provide `c("hour", "day", "month")` for a more logical order. Only +#' specified panels are drawn. For example, if only one month's data is +#' provided, users could provide `c("hour", "day")` to only draw the "hour" +#' and "weekday" lower panels and exclude the "month" panel. +#' @param month.last Not used. Please use the `panels` argument. +#' @param plot Should a plot be produced? `FALSE` can be useful when analysing +#' data to extract plot components and plotting them in other ways. +#' @param ... Other graphical parameters passed onto [lattice::xyplot()] and +#' [cutData()]. For example, in the case of [cutData()] the option `hemisphere +#' = "southern"`. #' #' @import lattice #' @export #' @return an [openair][openair-package] object. The four components of -#' timeVariation are: \code{day.hour}, \code{hour}, \code{day} and -#' \code{month}. Associated data.frames can be extracted directly using the -#' \code{subset} option, e.g. 
as in \code{plot(object, subset = "day.hour")}, -#' \code{summary(output, subset = "hour")}, etc., for \code{output <- -#' timeVariation(mydata, "nox")} +#' [timeVariation()] are: `day.hour`, `hour`, `day` and `month`. Associated +#' data.frames can be extracted directly using the `subset` option, e.g. as in +#' `plot(object, subset = "day.hour")`, `summary(output, subset = "hour")`, +#' etc., for `output <- timeVariation(mydata, "nox")` #' @author David Carslaw #' @family time series and trend functions #' @examples @@ -191,6 +191,13 @@ #' pollutant = "pm10", ylab = "pm10 (ug/m3)") #' } #' +#' # for a single month of data +#' \dontrun{ +#' timeVariation(selectByDate(mydata, month = 1, year = 2000), +#' pollutant = "pm10", +#' panels = c("hour", "day")) # exclude superfluous 'month' panel +#' } +#' #' # multiple pollutants with concentrations normalised #' \dontrun{timeVariation(mydata, pollutant = c("nox", "co"), normalise = TRUE)} #' @@ -257,17 +264,36 @@ #' col = "firebrick") #' } #' -timeVariation <- function(mydata, pollutant = "nox", local.tz = NULL, - normalise = FALSE, xlab = c( - "hour", "hour", "month", - "weekday" - ), name.pol = pollutant, - type = "default", group = NULL, difference = FALSE, - statistic = "mean", conf.int = 0.95, B = 100, ci = TRUE, cols = "hue", - ref.y = NULL, key = NULL, key.columns = 1, start.day = 1, +timeVariation <- function(mydata, + pollutant = "nox", + local.tz = NULL, + normalise = FALSE, + xlab = c("hour", "hour", "month", "weekday"), + name.pol = pollutant, + type = "default", + group = NULL, + difference = FALSE, + statistic = "mean", + conf.int = 0.95, + B = 100, + ci = TRUE, + cols = "hue", + ref.y = NULL, + key = NULL, + key.columns = 1, + start.day = 1, panel.gap = 0.2, - auto.text = TRUE, alpha = 0.4, month.last = FALSE, plot = TRUE, + auto.text = TRUE, + alpha = 0.4, + panels = c("hour", "month", "day"), + plot = TRUE, + month.last, ...) 
{ + if (!missing(month.last)) { + cli::cli_warn( + "{.arg month.last} has been deprecated. Please use the {.arg panels} argument to reorder the lower panels of {.fun timeVariation}." + ) + } ## get rid of R check annoyances variable <- NULL @@ -495,7 +521,7 @@ timeVariation <- function(mydata, pollutant = "nox", local.tz = NULL, mydata <- mutate(mydata, wkday = wday(date, label = TRUE, abbr = FALSE), wkday = ordered(wkday, levels = day.ord), - hour= hour(date), + hour = hour(date), mnth = month(date) ) @@ -965,6 +991,10 @@ timeVariation <- function(mydata, pollutant = "nox", local.tz = NULL, } main.plot <- function(...) { + # check panels are correct + panels <- unique(panels) + rlang::arg_match(panels, c("hour", "month", "day"), multiple = TRUE) + if (type == "default") { print(update( day.hour, @@ -986,18 +1016,19 @@ timeVariation <- function(mydata, pollutant = "nox", local.tz = NULL, ), position = c(0, 0.5, 1, y.upp), more = TRUE) } - # Build the plot panels in different orders - if (!month.last) { - # The original plot orders - print(hour, position = c(0, y.dwn, 0.33, 0.53), more = TRUE) - print(month, position = c(0.33, y.dwn, 0.66, 0.53), more = TRUE) - print(day, position = c(0.66, y.dwn, 1, 0.53)) - } else { - # Move around the plot order so they follow a logical hierarchy of averaging - # periods hour-day-month - print(hour, position = c(0, y.dwn, 0.33, 0.53), more = TRUE) - print(day, position = c(0.33, y.dwn, 0.66, 0.53), more = TRUE) - print(month, position = c(0.66, y.dwn, 1, 0.53)) + # create list of possible panels + theplots <- list("hour" = hour, "month" = month, "day" = day) + # filter by user-specified panels + theplots <- theplots[panels] + # get horizontal bounds for the number of panels + bounds <- seq(0, 1, length.out = length(panels) + 1) + # iteratively plot lower panels + for (i in seq_along(theplots)) { + print( + theplots[[i]], + position = c(bounds[i], y.dwn, bounds[i + 1], 0.53), + more = i != max(seq_along(theplots)) + ) } ## use 
grid to add an overall title diff --git a/man/timeVariation.Rd b/man/timeVariation.Rd index b372e4f9..06fc9368 100644 --- a/man/timeVariation.Rd +++ b/man/timeVariation.Rd @@ -26,8 +26,9 @@ timeVariation( panel.gap = 0.2, auto.text = TRUE, alpha = 0.4, - month.last = FALSE, + panels = c("hour", "month", "day"), plot = TRUE, + month.last, ... ) } @@ -36,8 +37,8 @@ timeVariation( Must include a \code{date} field and at least one variable to plot.} \item{pollutant}{Name of variable to plot. Two or more pollutants can be -plotted, in which case a form like \code{pollutant = c("nox", "co")} should -be used.} +plotted, in which case a form like \code{pollutant = c("nox", "co")} should be +used.} \item{local.tz}{Should the results be calculated in local time that includes a treatment of daylight savings time (DST)? The default is not to consider @@ -50,14 +51,13 @@ of \dQuote{smearing-out} the concentrations. Sometimes, a useful approach is to express time as local time. This correction tends to produce better-defined diurnal profiles of concentration (or other variables) and allows a better comparison to be made with emissions/activity data. If set -to \code{FALSE} then GMT is used. Examples of usage include \code{local.tz - = "Europe/London"}, \code{local.tz = "America/New_York"}. See -\code{cutData} and \code{import} for more details.} +to \code{FALSE} then GMT is used. Examples of usage include \code{local.tz = "Europe/London"}, \code{local.tz = "America/New_York"}. See \code{cutData} and +\code{import} for more details.} -\item{normalise}{Should variables be normalised? The default is \code{FALSE}. -If \code{TRUE} then the variable(s) are divided by their mean values. This -helps to compare the shape of the diurnal trends for variables on very -different scales.} +\item{normalise}{Should variables be normalised? The default is \code{FALSE}. If +\code{TRUE} then the variable(s) are divided by their mean values. 
This helps to +compare the shape of the diurnal trends for variables on very different +scales.} \item{xlab}{x-axis label; one for each sub-plot.} @@ -65,34 +65,32 @@ different scales.} want to give a fuller description of the variables, maybe also including subscripts etc.} -\item{type}{\code{type} determines how the data are split i.e. conditioned, -and then plotted. The default is will produce a single plot using the -entire data. Type can be one of the built-in types as detailed in -\code{cutData} e.g. \dQuote{season}, \dQuote{year}, \dQuote{weekday} and so -on. For example, \code{type = "season"} will produce four plots --- one for -each season. +\item{type}{\code{type} determines how the data are split i.e. conditioned, and +then plotted. The default will produce a single plot using the entire +data. Type can be one of the built-in types as detailed in \code{cutData} e.g. +\dQuote{season}, \dQuote{year}, \dQuote{weekday} and so on. For example, +\code{type = "season"} will produce four plots --- one for each season. -It is also possible to choose \code{type} as another variable in the data -frame. If that variable is numeric, then the data will be split into four +It is also possible to choose \code{type} as another variable in the data frame. +If that variable is numeric, then the data will be split into four quantiles (if possible) and labelled accordingly. If type is an existing character or factor variable, then those categories/levels will be used directly. This offers great flexibility for understanding the variation of different variables and how they depend on one another. -Only one \code{type} is allowed in\code{timeVariation}.} +Only one \code{type} is allowed in \code{\link[=timeVariation]{timeVariation()}}.} \item{group}{This sets the grouping variable to be used. For example, if a -data frame had a column \code{site} setting \code{group = "site"} will plot -all sites together in each panel. 
See examples below.} - -\item{difference}{If two pollutants are chosen then setting \code{difference - = TRUE} will also plot the difference in means between the two variables as -\code{pollutant[2] - pollutant[1]}. Bootstrap 95\\% confidence intervals of -the difference in means are also calculated. A horizontal dashed line is -shown at y = 0. The difference can also be calculated if there is a column -that identifies two groups e.g. having used \code{splitByDate}. In this -case it is possible to call \code{timeVariation} with the option -\code{group = "split.by"} and \code{difference = TRUE}.} +data frame had a column \code{site} setting \code{group = "site"} will plot all sites +together in each panel. See examples below.} + +\item{difference}{If two pollutants are chosen then setting \code{difference = TRUE} will also plot the difference in means between the two variables as +\code{pollutant[2] - pollutant[1]}. Bootstrap 95\\% confidence intervals of the +difference in means are also calculated. A horizontal dashed line is shown +at y = 0. The difference can also be calculated if there is a column that +identifies two groups e.g. having used \code{splitByDate}. In this case it is +possible to call \code{\link[=timeVariation]{timeVariation()}} with the option \code{group = "split.by"} and +\code{difference = TRUE}.} \item{statistic}{Can be \dQuote{mean} (default) or \dQuote{median}. If the statistic is \sQuote{mean} then the mean line and the 95\\% confidence @@ -101,13 +99,11 @@ interval in the mean are plotted by default. If the statistic is 25/75th quantiles are plotted. Users can control the confidence intervals with \code{conf.int}.} -\item{conf.int}{The confidence intervals to be plotted. If \code{statistic = - "mean"} then the confidence intervals in the mean are plotted. If -\code{statistic = "median"} then the \code{conf.int} and \code{1 - - conf.int} \emph{quantiles} are plotted. 
\code{conf.int} can be of length 2, -which is most useful for showing quantiles. For example \code{conf.int = - c(0.75, 0.99)} will yield a plot showing the median, 25/75 and 5/95th -quantiles.} +\item{conf.int}{The confidence intervals to be plotted. If \code{statistic = "mean"} then the confidence intervals in the mean are plotted. If +\code{statistic = "median"} then the \code{conf.int} and \code{1 - conf.int} \emph{quantiles} +are plotted. \code{conf.int} can be of length 2, which is most useful for +showing quantiles. For example \code{conf.int = c(0.75, 0.99)} will yield a plot +showing the median, 25/75 and 5/95th quantiles.} \item{B}{Number of bootstrap replicates to use. Can be useful to reduce this value when there are a large number of observations available to increase @@ -115,28 +111,26 @@ the speed of the calculations without affecting the 95\\% confidence interval calculations by much.} \item{ci}{Should confidence intervals be shown? The default is \code{TRUE}. -Setting this to \code{FALSE} can be useful if multiple pollutants are -chosen where over-lapping confidence intervals can over complicate plots.} +Setting this to \code{FALSE} can be useful if multiple pollutants are chosen +where over-lapping confidence intervals can over complicate plots.} \item{cols}{Colours to be used for plotting. Options include \dQuote{default}, \dQuote{increment}, \dQuote{heat}, \dQuote{jet} and -\code{RColorBrewer} colours --- see the \code{openair} \code{openColours} -function for more details. For user defined the user can supply a list of -colour names recognised by R (type \code{colours()} to see the full list). -An example would be \code{cols = c("yellow", "green", "blue")}} +\code{RColorBrewer} colours --- see the \code{\link[=openColours]{openColours()}} function for more +details. For user defined the user can supply a list of colour names +recognised by R (type \code{colours()} to see the full list). 
An example would +be \code{cols = c("yellow", "green", "blue")}} \item{ref.y}{A list with details of the horizontal lines to be added -representing reference line(s). For example, \code{ref.y = list(h = 50, lty - = 5)} will add a dashed horizontal line at 50. Several lines can be plotted -e.g. \code{ref.y = list(h = c(50, 100), lty = c(1, 5), col = c("green", - "blue"))}. See \code{panel.abline} in the \code{lattice} package for more -details on adding/controlling lines.} +representing reference line(s). For example, \code{ref.y = list(h = 50, lty = 5)} will add a dashed horizontal line at 50. Several lines can be plotted +e.g. \code{ref.y = list(h = c(50, 100), lty = c(1, 5), col = c("green", "blue"))}. See \code{panel.abline} in the \code{lattice} package for more details on +adding/controlling lines.} -\item{key}{By default \code{timeVariation} produces four plots on one page. +\item{key}{By default \code{\link[=timeVariation]{timeVariation()}} produces four plots on one page. While it is useful to see these plots together, it is sometimes necessary -just to use one for a report. If \code{key} is \code{TRUE}, a key is added -to all plots allowing the extraction of a single plot \emph{with} key. See -below for an example.} +just to use one for a report. If \code{key} is \code{TRUE}, a key is added to all +plots allowing the extraction of a single plot \emph{with} key. See below for an +example.} \item{key.columns}{Number of columns to be used in the key. With many pollutants a single column can make to key too wide. The user can thus @@ -150,31 +144,34 @@ choose \code{start.day = 6}.} \item{panel.gap}{The gap between panels in the hour-day plot.} -\item{auto.text}{Either \code{TRUE} (default) or \code{FALSE}. If \code{TRUE} -titles and axis labels will automatically try and format pollutant names -and units properly e.g. by subscripting the \sQuote{2} in NO2.} +\item{auto.text}{Either \code{TRUE} (default) or \code{FALSE}. 
If \code{TRUE} titles and +axis labels will automatically try and format pollutant names and units +properly e.g. by subscripting the \sQuote{2} in NO2.} \item{alpha}{The alpha transparency used for plotting confidence intervals. 0 is fully transparent and 1 is opaque. The default is 0.4} -\item{month.last}{Should the order of the plots be changed so the plot -showing monthly means be the last plot for a logical hierarchy of averaging -periods?} +\item{panels}{The plots to use in the lower row of the \code{\link[=timeVariation]{timeVariation()}} +plot, defaulting to \code{c("hour", "month", "day")}. Users may wish to instead +provide \code{c("hour", "day", "month")} for a more logical order. Only +specified panels are drawn. For example, if only one month's data is +provided, users could provide \code{c("hour", "day")} to only draw the "hour" +and "weekday" lower panels and exclude the "month" panel.} + +\item{plot}{Should a plot be produced? \code{FALSE} can be useful when analysing +data to extract plot components and plotting them in other ways.} -\item{plot}{Should a plot be produced? \code{FALSE} can be useful when -analysing data to extract plot components and plotting them in other ways.} +\item{month.last}{Not used. Please use the \code{panels} argument.} -\item{...}{Other graphical parameters passed onto \code{lattice:xyplot} and -\code{cutData}. For example, in the case of \code{cutData} the option -\code{hemisphere = "southern"}.} +\item{...}{Other graphical parameters passed onto \code{\link[lattice:xyplot]{lattice::xyplot()}} and +\code{\link[=cutData]{cutData()}}. For example, in the case of \code{\link[=cutData]{cutData()}} the option \code{hemisphere = "southern"}.} } \value{ an \link[=openair-package]{openair} object. The four components of -timeVariation are: \code{day.hour}, \code{hour}, \code{day} and -\code{month}. Associated data.frames can be extracted directly using the -\code{subset} option, e.g. 
as in \code{plot(object, subset = "day.hour")}, -\code{summary(output, subset = "hour")}, etc., for \code{output <- - timeVariation(mydata, "nox")} +\code{\link[=timeVariation]{timeVariation()}} are: \code{day.hour}, \code{hour}, \code{day} and \code{month}. Associated +data.frames can be extracted directly using the \code{subset} option, e.g. as in +\code{plot(object, subset = "day.hour")}, \code{summary(output, subset = "hour")}, +etc., for \code{output <- timeVariation(mydata, "nox")} } \description{ Plots the diurnal, day of the week and monthly variation for different @@ -188,8 +185,8 @@ and meteorology. For traffic sources, there are often important differences in the way vehicles vary by vehicles type e.g. less heavy vehicles at weekends. -The \code{timeVariation} function makes it easy to see how concentrations -(and many other variable types) vary by hour of the day and day of the week. +The \code{\link[=timeVariation]{timeVariation()}} function makes it easy to see how concentrations (and +many other variable types) vary by hour of the day and day of the week. The plots also show the 95\\% confidence intervals in the mean. The 95\\% confidence intervals in the mean are calculated through bootstrap @@ -197,42 +194,40 @@ simulations, which will provide more robust estimates of the confidence intervals (particularly when there are relatively few data). The function can handle multiple pollutants and uses the flexible \code{type} -option to provide separate panels for each 'type' --- see \code{cutData} for -more details. \code{timeVariation} can also accept a \code{group} option -which is useful if data are stacked. This will work in a similar way to -having multiple pollutants in separate columns. +option to provide separate panels for each 'type' --- see \code{\link[=cutData]{cutData()}} for +more details. \code{\link[=timeVariation]{timeVariation()}} can also accept a \code{group} option which is +useful if data are stacked. 
This will work in a similar way to having +multiple pollutants in separate columns. -The user can supply their own \code{ylim} e.g. \code{ylim = c(0, 200)} that -will be used for all plots. \code{ylim} can also be a list of length four to -control the y-limits on each individual plot e.g. \code{ylim = -list(c(-100,500), c(200, 300), c(-400,400), c(50,70))}. These pairs -correspond to the hour, weekday, month and day-hour plots respectively. +The user can supply their own \code{ylim} e.g. \code{ylim = c(0, 200)} that will be +used for all plots. \code{ylim} can also be a list of length four to control the +y-limits on each individual plot e.g. \code{ylim = list(c(-100,500), c(200, 300), c(-400,400), c(50,70))}. These pairs correspond to the hour, weekday, month +and day-hour plots respectively. The option \code{difference} will calculate the difference in means of two pollutants together with bootstrap estimates of the 95\\% confidence intervals in the difference in the mean. This works in two ways: either two pollutants -are supplied in separate columns e.g. \code{pollutant = c("no2", "o3")}, or -there are two unique values of \code{group}. The difference is calculated as -the second pollutant minus the first and is labelled as such. Considering -differences in this way can provide many useful insights and is particularly -useful for model evaluation when information is needed about where a model -differs from observations by many different time scales. The manual contains -various examples of using \code{difference = TRUE}. - -Note also that the \code{timeVariation} function works well on a subset of -data and in conjunction with other plots. For example, a -\code{\link{polarPlot}} may highlight an interesting feature for a particular -wind speed/direction range. By filtering for those conditions -\code{timeVariation} can help determine whether the temporal variation of -that feature differs from other features --- and help with source -identification. 
- -In addition, \code{timeVariation} will work well with other variables if +are supplied in separate columns e.g. \code{pollutant = c("no2", "o3")}, or there +are two unique values of \code{group}. The difference is calculated as the second +pollutant minus the first and is labelled as such. Considering differences in +this way can provide many useful insights and is particularly useful for +model evaluation when information is needed about where a model differs from +observations by many different time scales. The manual contains various +examples of using \code{difference = TRUE}. + +Note also that the \code{\link[=timeVariation]{timeVariation()}} function works well on a subset of data +and in conjunction with other plots. For example, a \code{\link[=polarPlot]{polarPlot()}} may +highlight an interesting feature for a particular wind speed/direction range. +By filtering for those conditions \code{\link[=timeVariation]{timeVariation()}} can help determine +whether the temporal variation of that feature differs from other features +--- and help with source identification. + +In addition, \code{\link[=timeVariation]{timeVariation()}} will work well with other variables if available. Examples include meteorological and traffic flow data. Depending on the choice of statistic, a subheading is added. Users can -control the text in the subheading through the use of \code{sub} e.g. -\code{sub = ""} will remove any subheading. +control the text in the subheading through the use of \code{sub} e.g. \code{sub = ""} +will remove any subheading. 
} \examples{ @@ -246,6 +241,13 @@ timeVariation(subset(mydata, ws > 3 & wd > 100 & wd < 270), pollutant = "pm10", ylab = "pm10 (ug/m3)") } +# for a single month of data +\dontrun{ +timeVariation(selectByDate(mydata, month = 1, year = 2000), +pollutant = "pm10", +panels = c("hour", "day")) # exclude superfluous 'month' panel +} + # multiple pollutants with concentrations normalised \dontrun{timeVariation(mydata, pollutant = c("nox", "co"), normalise = TRUE)} From b612d7f640bef7546d32ebd1edb8d348c36f4c8f Mon Sep 17 00:00:00 2001 From: Jack Davison Date: Sat, 27 Apr 2024 21:44:53 +0100 Subject: [PATCH 2/3] docs: update `NEWS.md` --- NEWS.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/NEWS.md b/NEWS.md index c52572a5..5a6c5831 100644 --- a/NEWS.md +++ b/NEWS.md @@ -2,7 +2,9 @@ ## New Features -- add option to `corPlot` to carry through "use" option in `cor`. +- `timeVariation()` has gained the `panels` argument. This allows users both to specify the order of the bottom row of panels (each of "hour", "day", and "month") and to exclude panels. This is likely most useful when only a single month of data has been provided; setting `panels = c("hour", "day")` will exclude the superfluous 'month' panel entirely. + +- add option to `corPlot()` to carry through "use" option in `cor`. ## Bug fixes From a7a3f36434b02b514e99247cc2218d827b0f822d Mon Sep 17 00:00:00 2001 From: Jack Davison Date: Sat, 27 Apr 2024 23:16:39 +0100 Subject: [PATCH 3/3] docs: fix issues w/ function linking --- R/timeVariation.R | 2 +- man/timeVariation.Rd | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/R/timeVariation.R b/R/timeVariation.R index 28ad4732..40d8142f 100644 --- a/R/timeVariation.R +++ b/R/timeVariation.R @@ -166,7 +166,7 @@ #' @param month.last Not used. Please use the `panels` argument. #' @param plot Should a plot be produced? `FALSE` can be useful when analysing #' data to extract plot components and plotting them in other ways.
-#' @param ... Other graphical parameters passed onto [lattice:xyplot()] and +#' @param ... Other graphical parameters passed onto [lattice::xyplot()] and #' [cutData()]. For example, in the case of [cutData()] the option `hemisphere #' = "southern"`. #' diff --git a/man/timeVariation.Rd b/man/timeVariation.Rd index 06fc9368..1a637150 100644 --- a/man/timeVariation.Rd +++ b/man/timeVariation.Rd @@ -163,7 +163,7 @@ data to extract plot components and plotting them in other ways.} \item{month.last}{Not used. Please use the \code{panels} argument.} -\item{...}{Other graphical parameters passed onto \code{\link[=lattice:xyplot]{lattice:xyplot()}} and +\item{...}{Other graphical parameters passed onto \code{\link[lattice:xyplot]{lattice::xyplot()}} and \code{\link[=cutData]{cutData()}}. For example, in the case of \code{\link[=cutData]{cutData()}} the option \code{hemisphere = "southern"}.} } \value{
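
For reviewers, a minimal R sketch of the two uses of `panels` that this series documents: reordering the lower row and excluding a panel. It assumes an openair build with these patches applied and the bundled `mydata` example data; the pollutant choices and the `jan2000` name are illustrative only, not taken from the package.

```r
# Sketch only: assumes openair with the `panels` argument from this series
# and the `mydata` example dataset that ships with the package.
library(openair)

# Reorder the lower row so the averaging periods run hour, weekday, month
# (the documented default order is c("hour", "month", "day")).
timeVariation(mydata, pollutant = "nox", panels = c("hour", "day", "month"))

# With a single month of data the 'month' panel carries no information,
# so list only the panels that should be drawn.
jan2000 <- selectByDate(mydata, month = 1, year = 2000)  # illustrative name
timeVariation(jan2000, pollutant = "pm10", panels = c("hour", "day"))
```

Per the updated documentation, `month.last` is no longer used, so any call that relied on it would need to pass an explicit `panels` vector instead.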