--- 
title: "A new approach to interspecific synchrony in population ecology using tail association"
author: "Shyamolina Ghosh, Lawrence W. Sheppard, Philip C. Reid, Daniel Reuman"
fontsize: 12 pt
geometry: "left=1 in,right=1 in,top=1 in,bottom=1 in"

output: 
  pdf_document:
    number_sections: no
    keep_tex: yes
    fig_caption: yes
    includes:
      in_header: head_MainText.sty

mainfont: Times New Roman         
tables: True
link-citations: True
urlcolor : blue
indent : True

csl: TheAmericanNaturalist.csl
bibliography: REF_CIS.bib
---

```{r setup_MainText, echo=F}
library(rmarkdown)
knitr::opts_chunk$set(echo = TRUE, fig.pos = "H")
options(scipen = 1, digits = 5) # Print numbers (including inline R output) with up to 5 significant digits
seed<-101
```

\noindent \emph{Affiliations:}

\noindent Ghosh: Department of Ecology and Evolutionary Biology and Kansas Biological Survey, University of Kansas, Lawrence, KS, 66045, USA

\noindent Sheppard: Department of Ecology and Evolutionary Biology and Kansas Biological Survey, University of Kansas, Lawrence, KS, 66045, USA

\noindent Reid: Continuous Plankton Recorder Survey, The Marine Biological Association, The Laboratory, Citadel Hill, Plymouth PL1 2PB, UK; School of Biological & Marine Sciences, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK

\noindent Reuman: Department of Ecology and Evolutionary Biology and Kansas Biological Survey, University of Kansas, Lawrence, KS, 66045, USA

\noindent \emph{Correspondence:} Daniel Reuman, 2101 Constant Ave, Lawrence, KS, 66047, reuman@ku.edu, 626 560 7084. 


\noindent \emph{Short title/ Running head:} Interspecific synchrony and tail association


\newpage

# Abstract 

\noindent Standard methods for studying the association between two ecologically important variables
provide only a small slice of the information content of the association, but
<!--DAN CHANGED: methods--> statistical approaches<!--END CHANGES--> are available<!--DAN CHANGED:, based on *copulas*,--><!--END CHANGES--> that provide comprehensive information.
In particular,<!--DAN CHANGED:copula--> available <!--END CHANGES-->approaches can reveal
*tail associations*, i.e., accentuated or reduced associations 
between the more extreme values of variables. We here study the nature 
and causes of tail associations between phenological or population-density 
variables of co-located species, and their ecological importance. We employ a 
simple method of measuring tail associations which we call the *partial Spearman correlation*.
Using multidecadal, multi-species spatiotemporal datasets on aphid first flights and marine 
phytoplankton population densities, we assess the potential for tail association
to illuminate two major topics of study in community ecology: the stability or instability of
aggregate community measures such as total community biomass and its relationship
with the synchronous or compensatory dynamics of the community's constituent species;
and the potential for fluctuations and trends in species phenology to
result in trophic mismatches. We find that positively associated fluctuations
in the population densities of co-located species commonly show asymmetric tail 
associations, i.e., it is common for two species' densities
to be more correlated when large than when small, or vice versa. Ordinary measures
of association such as correlation do not take this asymmetry into account. Likewise, 
positively associated fluctuations in the phenology of co-located species 
also commonly show asymmetric tail associations. We provide evidence that tail associations
between two or more species' population density or phenology time series can be inherited
from mutual tail associations of these quantities with an environmental driver. We argue 
that our understanding of community dynamics and stability, and of
phenologies of interacting species, can be meaningfully improved in future work by taking into
account tail associations. 

\vspace{.5cm}

\noindent \textbf{\textit{Keywords:}} aphids, copula, inter-species synchrony, 
match-mismatch hypothesis, plankton, tail association

\newpage

# Introduction\label{Introduction}

All ecologists study relationships between biological and environmental 
variables and among biological variables. But standard methods for studying the association between two variables
provide only a small slice of the information content of the association.
For instance, the two pairs of variables in Fig. \ref{pedagogfig}A, B have identical
Pearson correlation coefficients, and also have identical Spearman correlation coefficients, 
but nonetheless display
very different patterns of association [@Ghosh_copula; @ghosh2020tail].<!--***Shya CHANGED: added new citation of our Ecosphere paper.--> Correlations
are not the only way to study associations,
but they are very commonly used, and other standard methods in ecology
provide a similarly limited amount of information that neglects patterns of association
[@nelsen2006_copula; @Genest2007; @joe2014_dependence; @MaiScherer2017; @Anderson2018] that seem likely 
to be ecologically important [@Ghosh_copula; @ghosh2020tail].<!--***Shya CHANGED: added new citation of our Ecosphere paper.-->

<!--DAN CHANGED:
The variables of Fig. \ref{pedagogfig}A
(respectively, Fig. \ref{pedagogfig}B) are more strongly related in the left
(respectively, right) portions of their distributions,
thereby displaying asymmetric *tail association*. For two 
positively associated variables, *left-tail* (respectively, *right-tail*) 
*association* is stronger association
between values in the left or lower portions (respectively, the right or upper
portions) of the two distributions, as in Fig. \ref{pedagogfig}A 
(respectively, Fig. \ref{pedagogfig}B). Tail association is 
a potentially important pattern of association
that is not captured by standard correlation coefficients. -->
The variables of Fig. \ref{pedagogfig}A
(respectively, Fig. \ref{pedagogfig}B) are more strongly related in the left
(respectively, right) portions of their distributions,
thereby displaying asymmetric associations of the distribution tails, henceforth called
asymmetric *tail association*. For two 
positively associated variables, stronger association
between values in the left or lower portions of the distributions of the variables is henceforth
referred to as *left-tail association* (Fig. \ref{pedagogfig}A), whereas stronger association
between values in the right or upper portions of the distributions of the variables is henceforth
referred to as *right-tail association* (Fig. \ref{pedagogfig}B). The word "distribution"
is sometimes omitted from the terminology, but implied. Tail association is 
a potentially important pattern of association
that is not captured by standard correlation coefficients. 
<!--END MODIFIED TEXT-->

Statistical approaches exist, however, that provide a complete description
of the relationship between variables; these approaches are based on the idea of the *copula*. 
Tail associations are an important aspect of a copula approach to dependence,
and tail association will be a focus of this paper.
We here give a conceptual flavor of copulas before subsequently focusing on 
tail association. <!--DAN CHANGED:-->We introduce copulas instead of proceeding directly to 
tail associations, for three reasons: to properly credit the copula ideas at the root of 
our tail association tools, and the researchers who developed them;
to indicate the origin of our tail association tools, so that future researchers
seeking to generalize our approach will have a place to start; and to introduce ideas
(normalized rank plots - see below) that are necessary to define our measures of
tail association.<!--END MODIFIED TEXT--> Copulas can be used to separate the 
information content of a bivariate dataset, $(x_t,y_t)$ for $t=1,\ldots,T$, 
into two non-overlapping parts: the information in the marginal
distributions (which is not about the association between the variables)
and the rest of the information (which is solely about the association).
Following @Ghosh_copula and @Genest2007, the isolated information about the 
association between $x_t$ and $y_t$ is revealed by the plot of $v_t$
against $u_t$, where $u_t$ is the rank of $x_t$ in the set
$\{ x_1,x_2,\ldots,x_T \}$, divided by $T+1$; and $v_t$ is the rank 
of $y_t$ in the set
$\{ y_1,y_2,\ldots,y_T \}$, also divided by $T+1$. Here the rank of the smallest
element of a set is understood to be $1$. We refer to the $u_t$ and $v_t$ as *normalized ranks*
of the $x_t$ and $y_t$. We refer to the plot of $v_t$ against $u_t$ as 
the *normalized rank plot* for $y_t$ and $x_t$. 
For instance, the normalized rank plots for
Fig. \ref{pedagogfig}A,B are in Fig. \ref{pedagogfig}C,D, and show the asymmetric 
associations in the tails.
The normalized rank plot reflects the copula structure of 
$(x_t, y_t)$ [@Ghosh_copula; @Genest2007]. 
Ranking makes the marginal distributions uniform, isolating only the information
on association between the variables.<!--DAN CHANGED:--> @Genest2007 state that 
inferences about dependence structures should always be based on ranks.<!--END MODIFICATION--> 
It is likewise the purpose of copula approaches
to separate association information from information on marginals.
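
As a minimal sketch of the normalized-rank computation just described (illustrative only, and not the analysis code used for this study), normalized ranks can be obtained in R as follows. The helper name `normalized_rank`, the simulated data, and the use of average ranks for ties are assumptions made for the example.

```r
# Normalized ranks: rank within the sample (smallest value has rank 1),
# divided by T + 1. `normalized_rank` is a hypothetical helper name;
# averaging tied ranks is an assumption, not something specified in the text.
normalized_rank <- function(x) {
  rank(x, ties.method = "average") / (length(x) + 1)
}

set.seed(101)
x <- rnorm(30)
y <- x + rnorm(30)            # toy positively associated pair
u <- normalized_rank(x)
v <- normalized_rank(y)

# Normalized rank plot: v against u on the unit square
if (interactive()) {
  plot(u, v, xlim = c(0, 1), ylim = c(0, 1), xlab = "u", ylab = "v")
}

# Ranking leaves the Spearman correlation unchanged
all.equal(cor(x, y, method = "spearman"), cor(u, v, method = "spearman"))
```

Dividing by $T+1$ rather than $T$ keeps all normalized ranks strictly inside the unit interval.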

We emphasize that we have not here provided a formal definition of copulas, instead only 
introducing the fundamental copula idea of separating dependence information from
information on marginals.
Brief [@Genest2007; @Anderson2018; @Ghosh_copula] 
and comprehensive [@nelsen2006_copula; @joe2014_dependence; @MaiScherer2017] introductions 
to copulas are available elsewhere. Copulas can also be used to study multivariate data. 
Copula approaches are applied
widely and to great effect in fields such as finance and neuroscience [@Li2000; @Kim2008;
@Serinaldi2008; @onken2009; @li2013; @Emura2016; @She2018; @Goswami2018], 
but only rarely, so far, in ecology
[@Valpine2014; @Anderson2018; @Popovic2019; @Ghosh_copula; @ghosh2020tail].<!--***Shya CHANGED: new citation of our ecosphere paper added.-->
The potential of copulas for
improving ecological understanding was 
argued by @Ghosh_copula, and those authors also introduced tail association as an important aspect
of copula structure, and elaborated the relationship
between tail association and copulas. 

The work of @Ghosh_copula was a wide-ranging study of the importance, 
causes and consequences of copula structures in associations between
ecological variables. One of the main foci of that paper was 
associations between fluctuations through time of 
population density or phenological measurements of the same
species in different locations. This study instead focuses on population
density and phenological measurements of different species in the same location.
@Ghosh_copula studied, for instance, associations between first-flight time
series, for a given species of aphid, measured at different locations in 
the United Kingdom (UK); and 
associations between plankton density time series, for a given plankton taxon, 
measured at different locations in seas around the UK. We instead study 
associations between first-flight or population density time series
measured in the same location for different (sympatric) species. Thus,
in contrast with the study of @Ghosh_copula, this study 
belongs more to community ecology than to spatial ecology.
Our reasons for this shift are as follows.

First, *synchronous* (positively correlated) 
and *compensatory* (negatively correlated) 
population density dynamics
of different species occupying the same area are longstanding 
topics of concern in community ecology, with important ramifications
for the stability or instability of aggregate community or ecosystem
properties [@raimondo2004interspecific; @kent2007synchrony;
@loreau2008species; @gonzalez2009causes; @jochimsen2013compensatory];
there are reasons to believe tail associations in this context will play an important
but unstudied role in understanding these topics.
A major past insight into community dynamics [@gonzalez2009causes]
was that an aggregate 
property of a community, such as its total biomass, can be relatively
stable through time although its constituent parts (population 
biomasses of individual species)
are highly variable, if the parts show compensatory dynamics [@hallett2014]. 
Likewise,
synchrony amplifies community biomass variability because
the concordant variations of species biomass time series reinforce each
other in the total [@Ma2017]. If synchronous fluctuations show right-tail association,
then species are highly abundant simultaneously, which may produce 
years of extremely high community biomass. Alternatively, if synchronous
fluctuations show left-tail association, species are very scarce simultaneously,
potentially producing years of extremely low community biomass. 
Thus the tail association of synchrony, not just the presence and strength
of synchrony, may independently influence temporal variability of 
aggregate community properties. This is revisited in the \nameref{Discussion}.

Second, studies of the phenology of species interacting in one area
have also played a central role in community ecology, with 
important ramifications for whether and to what extent interactions 
will be modified by climate change [@Durant2007; @Yang2010];
there are reasons to believe tail 
associations between variables in this context may play an
important role, as well.
As climate changes and phenologies shift, there is the potential for 
phenologies of interacting species to shift differently, disrupting
the interaction [@Thackery2010]. This idea is referred to as the match-mismatch
hypothesis. Even if, for instance, year-to-year fluctuations in the emergence times
of two interacting species are 
highly correlated, if this correlation is principally in the 
right (respectively, left) tails of the distributions of possible emergence times, 
so that early (respectively, late) emergences of the species are actually uncorrelated, then 
mismatched years are likely to occur, impacting the species. 
Such mismatches will occur, in this conceptual example, 
when emergence is early (respectively, late). 
Essentially, even with substantial correlation between emergence dates of species, if this correlation
is principally in one of the tails, then uncorrelated emergences, and therefore mismatches, 
can occur under some conditions. 
One potential mechanism by which early emergences, for example,
may be uncorrelated between species while later emergences remain correlated
is if both species follow the same environmental cue for their 
emergence, but physiological limitations of only one of the species prevent 
emergence before a certain date. Advancing emergence dates of myriad species
make this scenario more plausible.

We here begin exploring whether tail associations may be important for 
studies of synchrony and compensatory dynamics, and for studies
of phenology and the match-mismatch hypothesis. We use a 
56-year dataset of population densities of 4 species of dinoflagellates
from the *Ceratium* genus, from 15 locations in the seas around the 
UK; and a 35-year dataset of annual first-flight dates 
for 20 species of aphid from 10 locations within the UK.
The terms left- and right-tail 
association, defined above, do not apply to negatively associated 
variables, because the 
negative association means values in the left tail
of one variable are associated with those in the right tail of the other; 
slightly modified methods are required to study tail association and its
asymmetry in negatively associated variables. 
But our aphid and plankton
population and phenology variables were 
almost exclusively positively associated with each other (see \nameref{Results}).
Therefore, we introduce methods and present results in this study
chiefly for the case of positively associated variables, 
returning to the topics of negatively 
associated variables and compensatory population dynamics 
in the \nameref{Discussion}.

In addition to examining whether tail association in our data is
asymmetric, we also test for possible causes of such patterns.
One possible mechanism, similar to some of the mechanisms 
explored by @Ghosh_copula, 
is explained for the *Ceratium* example as
follows. Earlier work showed that average sea surface temperature 
is an important
correlate of phytoplankton abundance in our data 
[e.g., @defriez2016climate; @sheppard2017; @sheppard2019]:
cold water is associated with more phytoplankton, likely because upwelling and 
mixing of the surface and deeper ocean layers bring both 
nutrients and cold water to the photic zone. 
However, if it is the case for a given location 
that very cold water is associated with no more
*Ceratium*, on average, than is moderately cold water, then 
that corresponds to a positive relationship and a left-tail association 
between the "coldness" of the surface water (measured, for instance,
by how many degrees colder the water is than average)
and *Ceratium* abundance. If such tail association is strong and consistent
across *Ceratium* species, it should
produce positive relationships with left-tail association between 
the abundance time series of the species. Likewise, 
in locations for which the winter coldness-*Ceratium* abundance association
shows less left-tail association, one should see less
left-tail association between different *Ceratium* species.
So tail association between two species may be inherited from the tail
associations of both species with a common environmental driver.
Phytoplankton are also strongly influenced by the abundant generalist copepod 
consumer *Calanus finmarchicus*, so our actual investigation of the mechanism 
proposed here will take into account this influence as well as the 
association with sea surface temperature. For aphid first flight, we examine
the same potential mechanism, but the relevant driver in that case is 
winter temperature.

Thus this paper focuses on whether and why population density 
or phenological time series of co-located species may show asymmetric 
patterns in their tail associations, with a focus on positively associated variables
because positive associations are what occurred in the 
available data. We ask the following specific questions.
(Q1) Do synchronous/positively correlated population density or phenological time series
of co-located species commonly show asymmetric tail associations? (Q2) If so, 
what are the causes of these patterns? We examine potential ecological consequences of 
asymmetric tail associations in the \nameref{Discussion}. We regard our investigation
as a first step toward a better understanding of the potential importance of 
asymmetric tail associations for such central ecological topics as synchrony and 
compensatory dynamics in communities and their influence on community stability;
and the match-mismatch hypothesis in phenology. The \nameref{Discussion} also has 
additional thoughts on next steps toward this goal. 
Our results and the conceptual considerations
introduced above are good evidence, in our view, of the potential for tail
association to make a crucial difference in how ecologists understand these 
important topics.


```{r read_res,echo=F}
res_ff<-readRDS("./Results/aphid_results/ff_npa_stat_results/cor_npa_diff_ff_ln_all.RDS")
res_cer<-readRDS("./Results/plankton_results/npa_stat_results/cor_npa_diff_plankton_ln_all.RDS")
```

# Methods\label{M&M}

## Data\label{Data} 

Our population dataset comprised average annual 
abundance estimates for 15 locations (Fig. \ref{SM-fig_plankton_map})
in the North Sea and British seas for 4 species from the *Ceratium*
genus of dinoflagellates, and for the generalist consumer copepod species
*C. finmarchicus*, for the 56 years 1958 to 2013.
These data were a subset of a larger dataset covering 22 taxa 
and 26 locations, analyzed by @sheppard2017, @sheppard2019, and @Ghosh_copula.
The locations are $2^\circ$ by $2^\circ$ grid cells.
The data were originally obtained from the 
Continuous Plankton Recorder (CPR) dataset, 
now operated and maintained by 
the Marine Biological Association of the United Kingdom. 
Data preprocessing steps were the same as used by @Ghosh_copula. 
*Ceratium* species were extracted in part because they
have a role in harmful algal blooms (red tides) [@Baek2009]; and also because four 
species were available from the genus (Table \ref{tab_plankton_aphid_info}), 
and we chose closely related 
species because they may 
be influenced in similar ways
by environmental variables. The 15 locations we used were 
selected from the 26 locations of the larger dataset 
(Fig. \ref{SM-fig_plankton_map}) as follows. 
First, to reduce the effects of sampling variation on statistical results,
we chose the subset of locations for which more than 35 years
of data were available for all species.
Second, for a given location, we excluded *Ceratium* species 
that were undetected for more than 10$\%$ of sampled years at that location. 
Finally, we considered only those locations for which at least two 
*Ceratium* species remained. We also had data on average growing season 
sea surface temperature for each grid cell and year
[@sheppard2017; @sheppard2019]. Earlier analyses [e.g., @sheppard2019]
demonstrated that sea surface temperature and *C. finmarchicus* 
abundance are important covariates of phytoplankton dynamics 
in UK seas, 
though associations between temperature and phytoplankton are
probably due to relationships both these variables have with nutrient 
abundance in surface ocean layers. Sea surface 
temperature data preprocessing was the same as used by @sheppard2017.
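
The three location-screening steps above can be sketched in R as follows. This is an illustration under stated assumptions, not the preprocessing code used for this study: the function name `screen_locations`, the input layout (a named list of years-by-species matrices, one per location), the coding of unsampled years as `NA` and of undetected species as `0`, and the reading of the first criterion as applying to each species separately are all hypothetical.

```r
# Hypothetical screening helper. `abund` is a named list with one
# years x species matrix per location (NA = not sampled, 0 = undetected;
# both codings are assumptions made for this example).
screen_locations <- function(abund, min_years = 35,
                             max_undetected = 0.10, min_species = 2) {
  kept <- list()
  for (loc in names(abund)) {
    A <- abund[[loc]]
    n_sampled <- colSums(!is.na(A))
    # Step 1: require more than `min_years` years of data for every species
    if (any(n_sampled <= min_years)) next
    # Step 2: drop species undetected in more than 10% of sampled years
    frac_undetected <- colSums(A == 0, na.rm = TRUE) / n_sampled
    A <- A[, frac_undetected <= max_undetected, drop = FALSE]
    # Step 3: keep the location only if at least two species remain
    if (ncol(A) >= min_species) kept[[loc]] <- A
  }
  kept
}

# Toy data: each column is a species with a chosen fraction of zero years
make_loc <- function(n_years, zero_frac) {
  sapply(zero_frac, function(z) {
    x <- rlnorm(n_years)
    x[seq_len(round(z * n_years))] <- 0
    x
  })
}
set.seed(101)
abund <- list(
  locA = make_loc(40, c(sp1 = 0, sp2 = 0.05, sp3 = 0.5)),  # sp3 dropped, 2 remain
  locB = make_loc(30, c(sp1 = 0, sp2 = 0, sp3 = 0)),       # too few years
  locC = make_loc(40, c(sp1 = 0, sp2 = 0.5, sp3 = 0.5))    # only 1 species remains
)
kept <- screen_locations(abund)
names(kept)
```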


Our phenology dataset comprised annual first flight dates
for 20 aphid species (Table \ref{tab_plankton_aphid_info})
from 10 locations across the UK (Fig. \ref{SM-fig_aphid_map}),
spanning the 35 years 1976 to 2010. These data were a subset of a larger
dataset covering 11 locations, analyzed previously by @sheppard2016
and @Ghosh_copula. The data were originally obtained from the 
Rothamsted Insect Survey suction-trap dataset 
[@harrington2014; @bell2015]. Data preprocessing 
was the same as that of @sheppard2016. 
Locations were screened, leading to the removal of one of the original
11 sampling locations, by requiring at least 30 years of data be 
available for all species, again to reduce sampling variation of
statistics. We also had time series of winter average 
temperature for each location and year. The winter temperature for 
year $t$ was the average temperature from December of year $t-1$ through March of year $t$.
Earlier analyses have demonstrated the importance of winter temperature
for aphid first flight date [e.g., @sheppard2016].

<!--\pagebreak
\begin{centering}
\includegraphics[width=10cm]{./Results/pedagog_fig.pdf}
\captionsetup{parbox=none}
\captionof{figure}[short caption]{Pedagogical figure for introducing tail association and partial Spearman correlation.
(A, B) Two pairs of variables that have identical Pearson (P) correlation, and also identical Spearman (S) 
correlation,
but that differ markedly in the nature of the association. Panel A shows stronger left- than right-tail association
and panel B shows the reverse. (C, D) Normalized rank plots 
(see \nameref{Introduction}) for panels A and B, respectively. 
(E, F) Graphics supporting the definitions of partial Spearman correlation and our statistic measuring 
asymmetry of tail association (see \nameref{M&M}). This figure is similar in some respects to Figs 1 and 7 of Ghosh \emph{et al} (2020).}\label{pedagogfig}
\end{centering}-->

<!--Table with species info -->
```{r tab_plankton_aphid_info, echo=F, results='asis',message=F}
library(tinytex)
library(tibble)
library(kableExtra)
library(dplyr)

#aphid_info_org <- tibble(c0=c(1:20),
#                     c1=c("Apple grass aphid","Bird cherry oat aphid","Black bean aphid","Blackberry cereal aphid",
#                       "Blackcurrant sowthistle aphid",
#                      "Corn leaf aphid","Currant lettuce aphid","Damson hop aphid","Grain aphid","Green spruce aphid",
#                      "Leaf-curling plum aphid","Mealy cabbage aphid","Mealy plum aphid","Pea aphid","Peach potato aphid",
#                      "Potato aphid","Rose grain aphid","Shallot aphid","Sycamore aphid","Willow carrot aphid"),
#                    c2=c("Rhopalosiphum insertum", "Rhopalosiphum padi", "Aphis fabae", "Sitobion fragariae", "Hyperomyzus lactucae",
#                      "Rhopalosiphum maidis", "Nasonovia ribisnigri","Phorodon humuli","Sitobion avenae","Elatobium abietinum",
#                       "Brachycaudus helichrysi","Brevicoryne brassicae","Hyalopterus pruni","Acyrthosiphon pisum","Myzus persicae",
#                       "Macrosiphum euphorbiae","Metopolophium dirhodum","Myzus ascalonicus","Drepanosiphum platanoidis","Cavariella #aegopodii")
#)

plankton_aphid_info <- tibble(c0=c(1:4,1:20),
                     c2=c("Ceratium fusus","Ceratium furca","Ceratium tripos","Ceratium macroceros",
                          "Rhopalosiphum insertum", "Rhopalosiphum padi", "Aphis fabae", "Sitobion fragariae",
                          "Hyperomyzus lactucae", "Rhopalosiphum maidis", "Nasonovia ribisnigri","Phorodon humuli",
                          "Sitobion avenae","Elatobium abietinum", "Brachycaudus helichrysi","Brevicoryne brassicae",
                          "Hyalopterus pruni","Acyrthosiphon pisum","Myzus persicae", "Macrosiphum euphorbiae",
                          "Metopolophium dirhodum","Myzus ascalonicus","Drepanosiphum platanoidis","Cavariella aegopodii"                                           ))
knitr::kable(plankton_aphid_info, "latex", booktabs = T, linesep = "\\addlinespace",align="c",
      caption = "Names of 4 plankton and 20 aphid species for which data were used. \\label{tab_plankton_aphid_info}",
      col.names = NULL)%>%
      group_rows("Plankton", 1,4) %>%
      group_rows("Aphids", 5,24) %>%
      column_spec(column = 2,italic=T)%>%add_header_above(header=c("Species ID" = 1, "Latin binomial" = 1))
```

## Statistical methods\label{Methods}

Given bivariate data $(x_t,y_t)$ for a set of years, $t$, of size $T$, 
and after computing
normalized ranks $(u_t,v_t)$ as described in the \nameref{Introduction}, tail
association and asymmetry of tail association were measured using the 
*partial Spearman correlation* of @Ghosh_copula, which we here reintroduce. 
The standard Spearman correlation itself measures association between the variables 
$x_t$ and $y_t$ (or between $u_t$ and $v_t$ - recall the Spearman correlation
is based on ranks, so is the same for both sets of variables);
but Spearman correlation measures only the overall 
association of the samples and cannot tell us how association varies across the distributions
of the variables.
Given two bounds $0\leq l_b < u_b \leq 1$, we define the boundary lines 
$u+v=2l_b$ and $u+v=2u_b$ (Fig. \ref{pedagogfig}E), 
which intersect the unit square on which 
normalized ranks are plotted.<!--DAN MODIFIED:--> The partial Spearman correlation associated 
with the bounds $l_b$ and $u_b$ will be the portion of the Spearman correlation attributable
to the points that fall between these boundary lines.<!--END CHANGES-->
The partial Spearman correlation for the
band between these boundaries and within the unit square is
\begin{equation}\label{eq.Cor}
   \cor_{l_b,u_b}(u,v) = \frac{\sum 
   (u_t-\mean(u)) (v_t-\mean(v))}{(T-1)\sqrt{\var(u)\var(v)}}.
\end{equation}
\noindent Here, sample means and sample variances are computed using all $T$ data points, 
but the sum, $\Sigma$, is over only the indices $t$ for which $u_t+v_t > 2l_b$ and 
$u_t+v_t < 2u_b$. The partial Spearman correlation is not defined if there
are no points in the band. For positively associated 
$(u_t,v_t)$, the partial Spearman correlations 
$\cor_{0,b}$ and $\cor_{1-b,1}$ for $b\leq 0.5$ (Fig. \ref{pedagogfig}F)
measure association in the left and right tails, respectively, and can be compared
via a difference, $\cor_{0,b}-\cor_{1-b,1}$, to measure asymmetry of tail 
association. Positive (respectively, negative) values of this difference mean stronger left-tail
(respectively, right-tail) association.
The sum of $\cor_{0,0.5}$ and 
$\cor_{0.5,1}$ (or the sum of
$\cor_{l_{b_k},u_{b_k}}$ for any other choice of bands 
$(l_{b_k},u_{b_k})$ that partition $(0,1)$) equals the standard 
Spearman correlation, as long as no points happen to lie exactly 
on the bounds. 
<!--DAN CHANGED:-->
Notation is summarized in Table \ref{SM-tab_notation}.
<!--END OF NEW TEXT-->
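
The definition in equation \ref{eq.Cor} can be sketched in R as follows. This is a minimal illustration rather than the code used for our analyses: the function names `partial_spearman` and `tail_asymmetry` and the simulated data are assumptions, and the sketch assumes complete data with no missing values.

```r
# Partial Spearman correlation for the band 2*lb < u + v < 2*ub.
# Means and variances use all points; only the sum is restricted to the band.
# `partial_spearman` is a hypothetical helper name for this illustration.
partial_spearman <- function(x, y, lb, ub) {
  n <- length(x)
  u <- rank(x) / (n + 1)        # normalized ranks
  v <- rank(y) / (n + 1)
  in_band <- (u + v > 2 * lb) & (u + v < 2 * ub)
  if (!any(in_band)) return(NA_real_)   # undefined if the band is empty
  sum((u[in_band] - mean(u)) * (v[in_band] - mean(v))) /
    ((n - 1) * sqrt(var(u) * var(v)))
}

# Asymmetry of tail association: left-tail minus right-tail partial correlation
tail_asymmetry <- function(x, y, b) {
  partial_spearman(x, y, 0, b) - partial_spearman(x, y, 1 - b, 1)
}

set.seed(101)
x <- rnorm(56)
y <- x + rnorm(56)              # toy positively associated pair
tail_asymmetry(x, y, b = 1/3)

# Bands that partition (0, 1) sum to the ordinary Spearman correlation,
# provided no point lies exactly on a bound; the cut 0.505 is chosen here
# because no pair of normalized ranks can sum to 1.01 when T = 56.
total <- partial_spearman(x, y, 0, 0.505) + partial_spearman(x, y, 0.505, 1)
all.equal(total, cor(x, y, method = "spearman"))
```

Positive output from `tail_asymmetry` indicates stronger left-tail association in the toy data, and negative output indicates stronger right-tail association.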

For each sampling location, $n$, we computed a matrix, $C^n$, which we call the 
*community tail association matrix*, which quantifies
asymmetry of tail association between pairs of aphid species or pairs of *Ceratium* 
species at $n$. Denote by
$s_i^n(t)$ the aphid first flight date or the *Ceratium* population density 
for sampling location $n$, for the $i^{\text{th}}$ species that was present in the 
cleaned data for location $n$, and for year $t$.
We then defined the matrix $C^n$ 
by defining $C^n(i,j)$ for two aphid or *Ceratium* 
species $i,j$, as follows. <!--DAN MODIFIED THE FOLLOWING TEXT-->First, $C^n(i,j)$ was not defined, or was set to
the missing-data placeholder "NA", if any of three conditions held: A) $i=j$; B) the hypothesis
that $s_i^n(t)$ and $s_j^n(t)$ were independent could not be rejected 
($5\%$ level, using a test described by @Genest2007, implemented in the 
function `BiCopIndTest` in the `VineCopula` package in R);
or C) independence was rejected but the Spearman correlation
of $s_i^n(t)$ and $s_j^n(t)$ was negative.<!--END CHANGES--> Otherwise we defined 
$C^n(i,j)=\cor_{0,b}(s_i^n(t),s_j^n(t))-\cor_{1-b,1}(s_i^n(t),s_j^n(t))$,
where the partial Spearman correlations in this expression were computed
over the times, $t$, for which data were available for location $n$.
The entry $C^n(i,j)$ was set to NA if independence of 
$s_i^n(t)$ and $s_j^n(t)$ 
could not be rejected because attempting to quantify
tail association (or anything else about association) for independent
variables is pointless. $C^n(i,j)$ was set to NA for negatively associated
$s_i^n(t)$ and $s_j^n(t)$ because negative association occurred for only 
one pair of species in one location 
in our data (plankton sampling location 18, species *C. furca* and 
*C. macroceros*, see \nameref{Results}). 
Tail association
for negatively associated variables should be studied, and this
topic is revisited in the \nameref{Discussion},
but negative associations were too rare in our data to study them. 
The community tail association matrix $C^n$ is symmetric. The
value $b=1/3$ was used for plankton locations, whereas $b=1/2$
was used for aphid locations because aphid time series were shorter, and
larger $b$ reduces sampling variation for our statistics [@Ghosh_copula].
See Appendix \ref{SM-boundb} for more information on the choice of $b$.
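
The construction of $C^n$ for a single location can be sketched in R as follows. This is an illustration only, not our analysis code. The names `partial_spearman` and `community_tail_matrix`, the years-by-species input matrix, and the simulated data are assumptions. To keep the sketch free of package dependencies, it also substitutes a Kendall's-tau test from base R (`cor.test`) for `BiCopIndTest`, so p-values may differ somewhat from those of the test we used.

```r
# Partial Spearman correlation for the band 2*lb < u + v < 2*ub
# (hypothetical helper; see the definition in the text).
partial_spearman <- function(x, y, lb, ub) {
  n <- length(x)
  u <- rank(x) / (n + 1)
  v <- rank(y) / (n + 1)
  in_band <- (u + v > 2 * lb) & (u + v < 2 * ub)
  if (!any(in_band)) return(NA_real_)
  sum((u[in_band] - mean(u)) * (v[in_band] - mean(v))) /
    ((n - 1) * sqrt(var(u) * var(v)))
}

# S: years x species matrix for one location (assumed layout).
community_tail_matrix <- function(S, b, alpha = 0.05) {
  k <- ncol(S)
  C <- matrix(NA_real_, k, k, dimnames = list(colnames(S), colnames(S)))
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      ok <- complete.cases(S[, i], S[, j])   # years with data for both species
      x <- S[ok, i]
      y <- S[ok, j]
      # Stand-in independence test (assumption): Kendall's tau, base R
      p <- cor.test(x, y, method = "kendall", exact = FALSE)$p.value
      rho <- cor(x, y, method = "spearman")
      # Entry stays NA if independence is not rejected or association is negative
      if (p < alpha && rho > 0) {
        C[i, j] <- C[j, i] <-
          partial_spearman(x, y, 0, b) - partial_spearman(x, y, 1 - b, 1)
      }
    }
  }
  C   # the diagonal is left as NA
}

set.seed(101)
n_years <- 50
common <- rnorm(n_years)                      # shared driver (toy data)
S <- cbind(sp1 = common + rnorm(n_years, sd = 0.5),
           sp2 = common + rnorm(n_years, sd = 0.5),
           sp3 = rnorm(n_years))
C <- community_tail_matrix(S, b = 1/3)
C
```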

We also computed a matrix $D^n$, which we call the *community-driver tail association matrix*,
which quantifies tail association between 
aphid or plankton time series and their covariates. 
Denote by $d_k^n(t)$ the value of the $k^{\text{th}}$ covariate 
that operated at sampling location $n$ in year $t$ (winter temperature for an
aphid sampling location, sea surface temperature or *C. finmarchicus* density
for a *Ceratium* location). We then defined $D^n$ by defining 
$D^n(i,k)$ for an aphid or *Ceratium* species $i$ and a covariate $k$,
as follows. <!--DAN MODIFIED THE FOLLOWING-->First, $D^n(i,k)$ was not defined, 
or was set to NA, if 
the hypothesis that $s_i^n(t)$ and $d_k^n(t)$ were independent could not 
be rejected ($5\%$ level, `BiCopIndTest`). Otherwise, we either: A) 
set $D^n(i,k) = \cor_{0,b}(s_i^n(t),d_k^n(t))-\cor_{1-b,1}(s_i^n(t),d_k^n(t))$
if  $s_i^n(t)$ and $d_k^n(t)$ were positively associated (positive Spearman correlation);
or B) set $D^n(i,k) = \cor_{0,b}(s_i^n(t),-d_k^n(t))-\cor_{1-b,1}(s_i^n(t),-d_k^n(t))$
if  $s_i^n(t)$ and $d_k^n(t)$ were negatively associated (negative Spearman 
correlation).<!--END CHANGES-->
For aphid first-flight time series, for which $k$ was always $1$ and 
$d_k^n(t)$ was winter temperature in location $n$, associations between
$s_i^n(t)$ and $d_k^n(t)$ were always negative when they were significant
(see \nameref{Results}). The same was true for *Ceratium* density time series and
sea surface temperature. 
Thus our practice of using $-d_k^n(t)$ was
equivalent, in the case of temperature variables, to using a "coldness"
index such as the number of degrees colder than an average 
or typical reference temperature, in place of temperature. Aphid and *Ceratium* data 
were always positively associated with the coldness index when they were 
significantly associated with it. Although *C. finmarchicus* abundance was 
positively associated with *Ceratium* time series in some sampling locations
and negatively associated in others, it always showed the same sign of association
with all *Ceratium* species within a location. 
Using $-d_k^n(t)$ in place
of $d_k^n(t)$ when negative associations with aphid or *Ceratium* data occurred 
allowed us to study asymmetry of tail association using methods developed with 
positively associated variables in mind. 
<!--***DAN: Lawrence got confused by the below text. I figured it was not essential here so removed it, storing
here for the time being, just in case:
Note that using a similar procedure to 
study pairs of negatively associated biological variables (e.g., two aphid or plankton
time series) would be inappropriate because in that case there is no 
canonical choice of which variable to take the negative of.--> 
We again used
$b=1/3$ for plankton data and covariates, and $b=1/2$ for aphid data and winter
temperature. For display, we horizontally concatenated the matrices
$C^n$ and $D^n$ and displayed matrix values using color. 
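
As a minimal sketch of how single entries of $C^n$ and $D^n$ could be computed, the following R code assumes the band-based definition of the partial Spearman correlation given by @Ghosh_copula (a sum of rank cross-products over points with $2l_b < u+v < 2u_b$). The function names are hypothetical, and `cor.test` with Kendall's tau is used only as a base-R stand-in for `BiCopIndTest`; this is an illustration, not the analysis code itself.

```r
# Partial Spearman correlation within the band 2*lb < u + v < 2*ub.
# ASSUMPTION: band-based definition following Ghosh et al.; illustrative only.
partial_spearman <- function(x, y, lb, ub) {
  n <- length(x)
  u <- rank(x) / (n + 1)                      # normalized ranks in (0, 1)
  v <- rank(y) / (n + 1)
  in_band <- (u + v > 2 * lb) & (u + v < 2 * ub)
  sum((u[in_band] - mean(u)) * (v[in_band] - mean(v))) /
    ((n - 1) * sqrt(var(u) * var(v)))
}

# Lower-tail minus upper-tail partial Spearman correlation.
tail_asymmetry <- function(x, y, b) {
  partial_spearman(x, y, 0, b) - partial_spearman(x, y, 1 - b, 1)
}

# Stand-in independence test (hypothetical helper; the text uses BiCopIndTest).
indep_p <- function(x, y) cor.test(x, y, method = "kendall")$p.value

# Entry C^n(i, j): NA if independent or negatively associated.
species_entry <- function(si, sj, b, alpha = 0.05) {
  if (indep_p(si, sj) >= alpha) return(NA_real_)
  if (cor(si, sj, method = "spearman") <= 0) return(NA_real_)
  tail_asymmetry(si, sj, b)
}

# Entry D^n(i, k): NA if independent; covariate reversed if association is negative.
driver_entry <- function(s, d, b, alpha = 0.05) {
  if (indep_p(s, d) >= alpha) return(NA_real_)
  if (cor(s, d, method = "spearman") > 0) {
    tail_asymmetry(s, d, b)
  } else {
    tail_asymmetry(s, -d, b)
  }
}

# Simulated placeholder data: a species series negatively tied to a covariate.
set.seed(101)
d <- rnorm(40)
s <- -d + rnorm(40, sd = 0.3)
driver_entry(s, d, b = 1/3)
```

With the full band ($l_b=0$, $u_b=1$) the helper reduces to the ordinary Spearman correlation, which is a convenient sanity check on the assumed definition.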

We used the community tail association matrix $C^n$ for each sampling location $n$ to answer
Q1 from the \nameref{Introduction}, as follows. First, we counted the number, 
$N_L^n$, of entries of $C^n$ which were not NA and which were greater than 
$0$. These were the "left-tail dominant" species pairs, i.e., pairs of species 
for which association was stronger in the left than in the right tails of the species distributions.
We also counted the number, $N_R^n$, of right-tail dominant pairs, for which 
the corresponding entries of $C^n$ were negative. If $N_L^n$ was substantially
greater than (respectively, substantially less than) $N_R^n$ for a location $n$,
it suggested that left-tail association (respectively, right-tail association) 
between species in that location was dominant, answering Q1 in the affirmative. 
We also calculated $A_{C,L}^n$, the 
sum of all positive, non-NA entries of $C^n$; $A_{C,R}^n$, the sum
of all negative, non-NA entries of $C^n$; and $A_C^n=A_{C,L}^n+A_{C,R}^n$, a
general measure of asymmetry of tail association in location $n$. 
We refer to $A_C^n$ as the *total community tail association*.
We additionally calculated
the normalized quantities $F_{C,L}^n=A_{C,L}^n/(A_{C,L}^n+|A_{C,R}^n|)$
and $F_{C,R}^n=A_{C,R}^n/(A_{C,L}^n+|A_{C,R}^n|)$. 
Because $0\leq F_{C,L}^n \leq 1$, 
$0 \leq |F_{C,R}^n| \leq 1$, and $F_{C,L}^n+|F_{C,R}^n|=1$, the relative sizes of
$F_{C,L}^n$ and $|F_{C,R}^n|$ indicate the relative dominance of left- and right-tail
association between species at location $n$. Together, all these statistics provide an answer to Q1.
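
The summary statistics above can be sketched in R as follows. The function name and the example matrix are hypothetical, and the sketch assumes that each unordered species pair is counted once (upper triangle of the symmetric matrix $C^n$); the normalized quantities $F_{C,L}^n$ and $F_{C,R}^n$ do not depend on that choice.

```r
# Summary statistics of a community tail association matrix C (symmetric, NA allowed).
# ASSUMPTION: each species pair is counted once, via the upper triangle.
summarize_C <- function(C) {
  vals <- C[upper.tri(C)]
  vals <- vals[!is.na(vals)]
  A_CL <- sum(vals[vals > 0])          # sum of positive entries (left-tail dominant)
  A_CR <- sum(vals[vals < 0])          # sum of negative entries (right-tail dominant)
  denom <- A_CL + abs(A_CR)            # zero if no non-NA, non-zero entries exist
  list(N_L = sum(vals > 0), N_R = sum(vals < 0),
       A_CL = A_CL, A_CR = A_CR, A_C = A_CL + A_CR,
       F_CL = A_CL / denom, F_CR = A_CR / denom)
}

# Made-up 4-species matrix, for illustration only (not values from our data).
C <- matrix(NA_real_, 4, 4)
C[upper.tri(C)] <- c(0.10, NA, -0.05, 0.20, 0.05, -0.15)
C[lower.tri(C)] <- t(C)[lower.tri(C)]
str(summarize_C(C))
```

For this made-up matrix, three pairs are left-tail dominant and two are right-tail dominant, and $F_{C,L}+|F_{C,R}|=1$ by construction.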

We used the community tail association matrix, $C^n$, and the community-driver 
tail association matrix, $D^n$, to answer Q2 from the \nameref{Introduction}
for the *Ceratium* and aphid data, as follows. First, we calculated 
$A_D^n$, the sum of all non-NA entries of $D^n$. This was analogous to 
$A_C^n$, but calculated using the matrix $D^n$ instead of the matrix
$C^n$. We refer to $A_D^n$ as the *total community-driver tail association*.
We then examined whether the values $A_C^n$ and $A_D^n$ were 
correlated across locations, $n$. 
This tests the causal hypothesis in the \nameref{Introduction} because it 
tests whether *Ceratium* or aphid time series having 
stronger right-tail (respectively, left-tail) association 
with environmental covariates in a given location also had 
stronger right-tail (respectively, left-tail) association 
with each other at that location. 
Recall that an environmental covariate was reversed (its negative was used)
when it was negatively associated with a *Ceratium* or aphid species, and that
no covariate was ever significantly positively associated with some 
*Ceratium* or aphid species and significantly negatively associated with another
such species in the same location (see \nameref{Results}).
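
The across-location comparison can be sketched as below. All names are hypothetical and the matrices are simulated placeholders (12 imaginary locations, 4 species, 2 covariates), not our data; the sketch also assumes each species pair contributes once to $A_C^n$.

```r
set.seed(101)

# Sum of non-NA entries; NA if there are no usable entries at all.
sum_non_na <- function(vals) {
  if (all(is.na(vals))) NA_real_ else sum(vals, na.rm = TRUE)
}
total_community <- function(C) sum_non_na(C[upper.tri(C)])  # A_C^n
total_driver    <- function(D) sum_non_na(D)                # A_D^n

# Simulate one placeholder location with a shared tendency toward one tail.
sim_location <- function() {
  shift <- rnorm(1, sd = 0.1)
  C <- matrix(NA_real_, 4, 4)
  C[upper.tri(C)] <- shift + rnorm(6, sd = 0.05)
  C[lower.tri(C)] <- t(C)[lower.tri(C)]
  D <- matrix(shift + rnorm(8, sd = 0.05), 4, 2)
  list(C = C, D = D)
}

locs <- replicate(12, sim_location(), simplify = FALSE)
A_C <- sapply(locs, function(l) total_community(l$C))
A_D <- sapply(locs, function(l) total_driver(l$D))

# Pearson correlation across locations, two-tailed test.
res <- cor.test(A_C, A_D, method = "pearson", alternative = "two.sided")
res$estimate
res$p.value
```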

We also answered Q2 for the aphid data as follows. Within a location, $n$, for each species,
$i$, we computed the mean $\alpha^n_C(i)$ of all non-NA entries $C^n(i,j)$, 
for $j$ ranging across all species for which we had data. This quantity measures an
average tail association of species $i$ with other species in the
same location, with positive values for greater left-tail association and 
negative ones for greater right-tail association. We refer to 
$\alpha^n_C(i)$ as the *species-community tail association* for species $i$.
We then defined
$\alpha^n_D(i)$ as the sum of all non-NA entries $D^n(i,k)$, for $k$ 
ranging across all covariates for which we had data. We refer to this
as the *species-driver tail association* for species $i$. For 
aphids we only had one covariate, winter temperature, so $\alpha^n_D(i)=D^n(i,k)$ 
for $k=1$ corresponding to winter temperature. We provide the more general
definition of $\alpha^n_D(i)$ that applies when more
covariates were available so the definition can also be considered 
(briefly, see below) for *Ceratium* data. We then examined, for each location, $n$, whether 
$\alpha^n_C(i)$ and $\alpha^n_D(i)$ were correlated across species, $i$.
This tests the causal hypothesis in the \nameref{Introduction} because it 
tests whether aphid species which were more right-tail (respectively, left-tail) 
associated with environmental covariates (winter temperature) also had 
time series that were more right-tail (respectively, left-tail) associated with 
the time series of other species in the location. Recall that
winter temperature was always negatively associated with aphid first flight
when it was significantly associated (see \nameref{Results}), 
and negative temperature (a coldness index) was used in computing $D^n(i,k)$.
Testing whether $\alpha^n_C(i)$ and $\alpha^n_D(i)$ were correlated across
species, $i$, within a location, $n$, was not practical for *Ceratium*,
because we only had data for at most four *Ceratium* species per sampling
location, an insufficient number to provide much statistical power
in testing for a correlation. 
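
A minimal R sketch of the species-level statistics follows. Function names are hypothetical, the data are simulated placeholders for one imaginary location with 20 species and one covariate, and the one-tailed alternative is an assumption made for illustration.

```r
set.seed(101)

# Species-community tail association: mean of non-NA entries in each row of C.
species_community <- function(C) {
  a <- rowMeans(C, na.rm = TRUE)       # NaN for rows with no non-NA entries
  a[is.nan(a)] <- NA_real_
  a
}

# Species-driver tail association: sum of non-NA entries in each row of D.
species_driver <- function(D) {
  a <- rowSums(D, na.rm = TRUE)
  a[rowSums(!is.na(D)) == 0] <- NA_real_   # undefined if every entry is NA
  a
}

# Placeholder matrices: each species has its own tendency toward one tail.
n_sp <- 20
sens <- rnorm(n_sp, sd = 0.1)
noise <- matrix(rnorm(n_sp^2, sd = 0.03), n_sp, n_sp)
C <- outer(sens, sens, "+") / 2 + (noise + t(noise)) / 2   # symmetric
diag(C) <- NA
D <- matrix(sens + rnorm(n_sp, sd = 0.03), n_sp, 1)

alpha_C <- species_community(C)
alpha_D <- species_driver(D)

# Correlation across species within the location (incomplete pairs are dropped).
res <- cor.test(alpha_C, alpha_D, method = "pearson", alternative = "greater")
res$estimate
```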

# Results\label{Results}

<!--Results for Q1-->

<!--For Ceratium-->
Associations between *Ceratium* species were always positive when they were significant,
except for one pair of species in one location (plankton sampling location 18, species *C. furca* and 
*C. macroceros*).
Asymmetric tail association was very common between *Ceratium* population density
time series from the same location, answering Q1 in the affirmative for *Ceratium*;
for some locations, left-tail association between *Ceratium*
species was dominant, and for other locations right-tail association was dominant.
To show this, we note that for some locations, the community tail association 
matrix, $C^n$, consisted largely of positive values, 
indicating a preponderance of left-tail association between *Ceratium* time 
series for the location (Fig. \ref{fig_CorlmCoru_plankton_map_loc12_26}A).
For such locations, *Ceratium* population densities are more likely to be correlated
across species at low population densities than at high densities.
For other locations, $C^n$ had mostly negative values, indicating a 
preponderance of right-tail association (Fig. \ref{fig_CorlmCoru_plankton_map_loc12_26}B).
For such locations, *Ceratium* population densities are more likely to be correlated
across species at high population densities than at low densities.
The statistics $F_{C,L}$ and $F_{C,R}$, plotted across all sampling locations
(Fig. \ref{fig_CorlmCoru_plankton_map_loc12_26}C), demonstrate the same result in another way:
most *Ceratium* sampling locations were dominated by either left- or right-tail association,
in approximately equal numbers, and only a few locations had
more symmetric tail association, on average across pairs of *Ceratium* species.

<!--For aphids-->
Associations between aphid time series were always positive 
when they were significant.
Asymmetric tail association was also very common between aphid first flight
time series from the same location, answering Q1 in the affirmative for aphids;
left-tail association was more common for some sampling locations
and right-tail association dominated for others, but for most sites right-tail
association dominated. 
To show this, we note that for some locations, the community
tail association matrix, $C^n$, 
consisted of a slight majority of positive values, 
indicating more left- than right-tail association between aphid time 
series for the location (Fig. \ref{fig_CorlmCoru_ff_map_loc2_5}A);
whereas for other locations, $C^n$ had mostly negative values, indicating a 
preponderance of right-tail association (Fig. \ref{fig_CorlmCoru_ff_map_loc2_5}B).
The statistics $F_{C,L}$ and $F_{C,R}$, plotted across all sampling locations
(Fig. \ref{fig_CorlmCoru_ff_map_loc2_5}C), demonstrate the same result in another way:
most aphid sampling locations had a preponderance of right-tail association, and only a few locations
had more left-tail association, and those only slightly more. Thus, for most locations,
aphid first flights are more correlated across species when first flights are 
later than average.

<!--\pagebreak
\begin{centering}
\textbf{ \hspace{1 cm} (A) \hspace{7 cm} (B)} \\
\includegraphics[width=7.5 cm]{./Results/plankton_results/npa_stat_results/loc12/loc12_Corl-Coru.pdf}
\includegraphics[width=7.5 cm]{./Results/plankton_results/npa_stat_results/loc26/loc26_Corl-Coru.pdf}\\
\textbf{ (C)} \\
\includegraphics[width=6.5 cm]{./Results/plankton_results/npa_stat_results/Corstat_LmU_values_on_map_sp_only.pdf} 
\captionsetup{parbox=none}
\captionof{figure}[short caption]{Either right- or left-tail association between population density time series of 
\emph{Ceratium} species could dominate, depending on the sampling location. (A, B) The community 
tail association matrix, $C^n$, and the community-driver tail association matrix, $D^n$ (\nameref{Methods}), 
horizontally concatenated, for example locations $n=12$ (A) and $n=26$ (B). 
See Table \ref{tab_plankton_aphid_info} for species names. 
All the non-NA values in $C^n$ were positive (red) for location $12$ (A), indicating left-tail
association dominated in that location; but values were largely negative (blue) for location
$26$ (B), indicating right-tail association dominated there. Matrix entries which were NA 
because time series were independent are displayed in yellow. The counts $N_L^n$ and $N_R^n$
(see \nameref{Methods}) also reflect the distinct tail association characteristics of the two
locations. \emph{C. fin.} = \emph{C. finmarchicus}; Temp. = temperature.
Green dots in $D^n$
represent variables which were originally negatively associated, so the negative
of the environmental covariate was used for calculating tail association.
See Fig. \ref{SM-fig_CorlmCoru_plankton_all_loc} 
for analogous figures for the other sampling locations. 
(C) The summary statistics $F_{C,L}$ and $F_{C,R}$ (see \nameref{Methods}) 
for each site show that association between \emph{Ceratium} species was 
substantially dominated by either the left or the right tails of \emph{Ceratium} distributions, with the
exception of a few locations for which tail association was closer to symmetric. Site codes
are colored red or blue depending on which of $F_{C,L}$ or $F_{C,R}$ had higher magnitude. Values are
not plotted for site 3 because the hypothesis that dynamics of distinct \emph{Ceratium} species
were independent could not be rejected for that site. 
\label{fig_CorlmCoru_plankton_map_loc12_26}}
\end{centering}-->

<!--
\begin{center}
  \rule{0.3\textwidth}{\textheight}
  \captionof{figure}{text on the next page}
\end{center}-->

<!--Results for Q2-->

<!--Para for Ceratium, using A_C^n and A_D^n across n.-->
For the *Ceratium* data, the total community tail association, $A_C^n$, and the
total community-driver tail association, $A_D^n$, were significantly correlated
across locations, $n$, supporting our
hypothesis from the \nameref{Introduction} for a cause of tail association between co-located
species, and helping to answer Q2. In other words, tail association between co-located 
species time series was apparently inherited from common tail association of the species 
with environmental drivers. 
Across our `r length(res_cer$CorlmCoru_all_ln_list)` locations, 
$A_C^n$ and $A_D^n$ were significantly positively correlated (Pearson correlation,
two-tailed test, Fig. \ref{fig_plankton_scatter}A). Thus locations for which
*Ceratium* density time series showed greater left-tail (respectively, right-tail) 
association with environmental covariates (measured with $A_D^n$)
also exhibited greater left-tail (respectively, right-tail) association between
density time series for distinct species (measured with $A_C^n$).

<!--Para for aphids, using A_C^n and A_D^n across n.-->
For the aphid data, the total community tail association, $A_C^n$, and the total community-driver
tail association, $A_D^n$, were positively but non-significantly correlated
across our `r length(res_ff$CorlmCoru_all_ln_list)` sampling locations
(Fig. \ref{fig_plankton_scatter}B). Thus locations for which aphid first-flight time series
showed greater left-tail (respectively, right-tail) association with winter temperature
also showed a non-significant tendency toward greater left-tail (respectively, right-tail) association between
the time series of distinct species. 
<!--***DAN CHANGED: The correlation may have been non-significant for the aphid data
-simply because there were slightly fewer aphid sampling locations than there were 
-plankton locations.
-Nevertheless, when combined with the plankton result, this aphid result tends to support 
-the hypothesis that tail association between co-located species time series can be inherited from
-common tail association on environmental drivers. See also the subsequent results for aphids.-->
The correlation was close to significant for the aphid data, and 
may have been non-significant simply because there were slightly fewer aphid sampling locations than there were 
plankton locations. See also the subsequent results for aphids, which were significant and which support
the same overall conclusions.
<!--END OF NEW TEXT-->

<!--\pagebreak
\begin{centering}
\textbf{ (A) \hspace{7 cm} (B)} \\
\includegraphics[width=7.5 cm]{./Results/aphid_results/ff_npa_stat_results/loc2/loc2_Corl-Coru.pdf}
\includegraphics[width=7.5 cm]{./Results/aphid_results/ff_npa_stat_results/loc5/loc5_Corl-Coru.pdf} \\
\textbf{ (C)} \\
\includegraphics[width=7.5 cm]{./Results/aphid_results/ff_npa_stat_results/Corstat_LmU_values_on_map_sp_only.pdf}
\captionsetup{parbox=none}
\captionof{figure}[short caption]{Either right-tail association between first-flight time series of 
aphid species could dominate, or left-tail association could be more common, 
depending on the sampling location. (A, B) The community tail association matrix, $C^n$, 
and the community-driver tail association matrix, $D^n$ (\nameref{Methods}), horizontally 
concatenated, for example locations $n=2$ (A) and $n=5$ (B).
See Table \ref{tab_plankton_aphid_info} for species names.
A slight majority of non-NA values in $C^n$ were positive (red) for location $2$ (A; see the
$N_L^n$ and $N_R^n$ counts displayed), indicating left-tail
association was slightly more common than right-tail association in that location. But values were largely 
negative (blue) for location
$5$ (B), indicating right-tail association dominated there. Matrix entries which were NA 
because time series were independent are displayed in yellow. Temp. = temperature.
Green dots in $D^n$
represent variables which were originally negatively associated, so the negative
of winter temperature was used for calculating tail association (\nameref{Methods}); this happened in all cases
for which temperature and first flight were significantly associated.
See Fig. \ref{SM-fig_CorlmCoru_ff_all_loc} 
for analogous figures for the other sampling locations. 
(C) The summary statistics $F_{C,L}$ and $F_{C,R}$ (see \nameref{Methods}) 
for each site show that association was either 
dominated by the right tails, or, for a few locations, 
showed slightly more left-tail association. Site codes
are colored red or blue depending on which of $F_{C,L}$ or $F_{C,R}$ had higher magnitude.}
\label{fig_CorlmCoru_ff_map_loc2_5}
\end{centering}-->

<!--Para for aphids using the second method.-->
Our second analysis using aphids, based on the species-community tail associations, $\alpha_C^n(i)$, 
and the species-driver tail associations, $\alpha_D^n(i)$ (\nameref{Methods}),
provided further evidence supporting our hypothesis for a cause of tail association between co-located
species (\nameref{Introduction}). For 8 of 10 sampling locations, $\alpha_C^n(i)$ and $\alpha_D^n(i)$
were significantly correlated across species, $i$ (Fig. \ref{fig_aphid_multipanel}).
In other words, for 8 of 10 locations, aphid species with greater left-tail (respectively,
right-tail) association with winter temperature also had greater left-tail (respectively,
right-tail) association with other aphid species.


# Discussion\label{Discussion}

<!--P: summary of answers to Qs-->
Our results show that synchronous population density or phenological time series
of co-located species can very commonly show asymmetric tail association. For some sampling locations
and species, tail association was predominantly in the left tails, and for others it was predominantly 
in the right tails of time series distributions, showing a new kind of ecologically
meaningful variation among ecosystems. The partial Spearman correlation presented
by @Ghosh_copula is a simple and effective way to measure tail association for ecological applications.
Our results also demonstrate a mechanism by which asymmetric tail association between species
can arise: it can
be inherited from joint tail association of the two species with the same environmental variables.
This mechanism seems likely to apply commonly when co-located species are influenced by the
same external factors. Our results convincingly show that standard correlation approaches 
omit phenomena that seem likely to be important for at least two major topics of interest in ecology:
synchronous/compensatory dynamics of species within a community and their influence on 
community stability; and shifting phenologies and the match-mismatch hypothesis. 

<!--P: Interpretations and consequences of patterns for Ceratium-->
The distinct tail association characteristics of *Ceratium* in different sampling areas 
around the UK may have consequences for the stability through time of total *Ceratium* abundance,
which may relate to harmful algal blooms because *Ceratium* species can have a role in such blooms [@Baek2009].
For locations in which left-tail association between *Ceratium* density time series
is dominant, *Ceratium* species tend to be scarce simultaneously,
potentially producing years of very low total *Ceratium* biomass.
In contrast, for locations in which right-tail association is dominant, *Ceratium*
species tend to be highly abundant simultaneously, which may produce 
years of very high *Ceratium* biomass that sometimes correspond to harmful algal blooms.
Our results show that the distinction between these two types of location relates to the
tail association of *Ceratium* species with their environmental covariates,
sea surface temperature and *C. finmarchicus* density. It may be useful to study
in future work why some locations principally have left-tail association with these
drivers and some principally have right-tail association.

<!--P: same for aphids-->
First-flight time series for populations of co-located aphid species were principally 
right-tail associated, i.e., more strongly correlated when first flights were later in the
season. Our results suggest this was probably because cold winters delay aphid first flights,
whereas warm winters do not lead to first flights that are any earlier, on average, than those following moderate winters.
This produces right-tail association between first flights and winter coldness across multiple species,
and this common association leads to right-tail association between aphids. Thus winter temperature
fluctuations lead to temporally dispersed early but temporally coordinated late arrival times of aphid
species on summer hosts (many of which are crops, for the species we studied), a fact that may have pest-control 
significance. Winter temperature is known to 
influence the first-flight dates of virtually all the aphid species for which we had data
[@sheppard2016]. Overwintering aphids are sensitive to frost conditions, and so cold winters
probably reduce early spring populations on winter host plants. This then lengthens the time 
required for populations to reach sufficient densities to stimulate the production of winged morphs
for flight to summer host plants.


<!--P: 4) consequences of tail dependence in spatial synchrony as studied in BIVAN (skewness stuff), links to similar idea for community dynamics, and that is a reason what we have done is important-->
If $x_{s,l}(t)$ denotes the population density of species $s$ ($s=1,\ldots,S$) in location $l$ ($l=1,\ldots,L$)
at time $t$, we have here studied the nature and causes of tail association among the time 
series $x_{s,l}(t)$ for a fixed $l$ and for $s=1,\ldots,S$; whereas @Ghosh_copula studied
the nature, causes and consequences of tail association among the time series
$x_{s,l}(t)$ for fixed $s$ and $l=1,\ldots,L$, a distinct ecological context. One 
of the consequences studied by @Ghosh_copula
relates to and illuminates a potential consequence, mentioned above, of tail association for the ecological
context of this study. @Ghosh_copula showed that the skewness, through time, of the 
spatial-total time series $\sum_l x_{s,l}(t)$ is sensitive to the nature of tail association between
the $x_{s,l}(t)$ ($l=1,\ldots,L$), if these time series are positively associated with each other. Right-tail 
(respectively, left-tail) association tended to produce right (respectively, left) skew in the total.
Right skew corresponds to a spatial-total time series with occasional exceptionally large values, 
i.e., to "spiky", unstable dynamics of the total population. Left skew 
corresponds to a spatial-total time series with occasional exceptionally low values, i.e., to dynamics of the total
population with a tendency to "crash". The variability of the total population can
be regarded as a landscape-level measure of the stability of species 
$s$, and is important, for instance, if species $s$ is a pest or an exploited species.
For the same reasons, the skewness, through time, of the community-total time series
$\sum_s x_{s,l}(t)$ is sensitive to the tail association between the $x_{s,l}(t)$ ($s=1,\ldots,S$),
which we have here studied. Right-tail 
(respectively, left-tail) association again tends to produce right (respectively, left) skew in the total
time series.
In this community context, the total is an aggregate property of the community, and the variability of
this total has been used in an extensive literature [e.g., @hallett2014] to characterize community stability 
through time. This literature has explored the effects of synchronous versus compensatory
dynamics in the $x_{s,l}(t)$ ($s=1,\ldots,S$) on the stability of the total community time series,
$\sum_s x_{s,l}(t)$. But our results show that, even if all the species time series 
$x_{s,l}(t)$ ($s=1,\ldots,S$) are synchronous with each other, the tail association properties
of these time series can influence the stability of the community-total time series. 
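
The link between tail association and skewness of a total can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of our analysis: two series with uniform marginals are drawn from a Clayton copula (lower-tail dependent; the parameter value $\theta=4$ is arbitrary), and the same points reflected through the center of the unit square give an upper-tail dependent pair with identical marginals and identical Spearman correlation.

```r
set.seed(101)
n <- 10000
theta <- 4   # arbitrary Clayton parameter, chosen for illustration

# Clayton copula by conditional inversion: (u, v) is lower-tail dependent.
u <- runif(n)
w <- runif(n)
v <- (u^(-theta) * (w^(-theta / (1 + theta)) - 1) + 1)^(-1 / theta)

# Sample skewness (third standardized moment).
skewness <- function(z) mean((z - mean(z))^3) / sd(z)^3

total_left  <- u + v               # left-tail association between components
total_right <- (1 - u) + (1 - v)   # reflected pair: right-tail association

c(left = skewness(total_left), right = skewness(total_right))
cor(u, v, method = "spearman")     # same for both pairs
```

Because the marginals and the overall rank correlation are the same in both cases, the opposite signs of skewness in the two totals come only from where in the distributions the association is concentrated.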

Although our results are sufficient to show that tail associations are
likely to be important for studies of community dynamics and stability, many 
communities show not only synchronous dynamics between some species pairs 
$x_{s_i,l}(t)$ and $x_{s_j,l}(t)$, but also compensatory dynamics between other pairs. 
Our *Ceratium* time series were almost entirely synchronous, so we could not study 
the importance of tail association for compensatory dynamics.
Next research steps should include the study of tail association between compensatory species
within a local community. Furthermore, *Ceratium* are only part of the 
phytoplankton community in UK seas. It may be advantageous for future work to use 
data characterizing an entire competitive community. For instance, the data of @hallett2014 
constitute annual abundances of all species of plant in an area. In that dataset,
some species pairs show synchronous and some show compensatory dynamics.

<!--P: 7) para on how to deal with negatively associated variables in future work-->
Studying asymmetry of tail association for negatively correlated species density time 
series will require slightly modified methods. The only negative association between aphid or *Ceratium* time series
that occurred in our system was not analyzed. Negative associations between species time series and 
the environmental covariates we considered were handled
statistically by considering the positive association between the species time
series and a "reversed" covariate; this corresponds to a positive 
association with a reconceptualized covariate, e.g., a "coldness" index.
But that approach would make no sense for negatively 
associated time series of two aphid or *Ceratium* species:
there is no canonical choice of which variable to reverse. Asymmetry of tail
association could still be considered, however, for negatively associated variables, $u,v$, 
in an unsigned approach, via the index $|\cor_{0,b}(u,1-v)-\cor_{1-b,1}(u,1-v)|$. Because $|\cor_{0,b}(u,1-v)-\cor_{1-b,1}(u,1-v)|=|\cor_{0,b}(1-u,v)-\cor_{1-b,1}(1-u,v)|$,
no choice need be made on which variable to "reverse."
A large value of this index indicates that tail association between $u$ and $v$ is asymmetric,
though it does not provide information on whether association is stronger between
the left tail of $u$ and the right tail of $v$ or between the right tail of $u$
and the left tail of $v$. 
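
The equality of the two forms of the unsigned index can be checked numerically. The sketch below assumes the band-based partial Spearman correlation of @Ghosh_copula applied directly to normalized ranks $u,v$; the helper names and the simulated, negatively associated data are hypothetical.

```r
# Partial Spearman correlation on normalized ranks, band 2*lb < u + v < 2*ub.
# ASSUMPTION: band-based definition following Ghosh et al.; illustrative only.
partial_spearman_uv <- function(u, v, lb, ub) {
  n <- length(u)
  in_band <- (u + v > 2 * lb) & (u + v < 2 * ub)
  sum((u[in_band] - mean(u)) * (v[in_band] - mean(v))) /
    ((n - 1) * sqrt(var(u) * var(v)))
}

# Unsigned asymmetry index for a pair already arranged to be positively associated.
unsigned_asymmetry <- function(u, v, b) {
  abs(partial_spearman_uv(u, v, 0, b) - partial_spearman_uv(u, v, 1 - b, 1))
}

# Simulated placeholder data with negative association.
set.seed(101)
n <- 30
x <- rnorm(n)
y <- -x + rnorm(n)
u <- rank(x) / (n + 1)
v <- rank(y) / (n + 1)

b <- 1/3
index_reverse_v <- unsigned_asymmetry(u, 1 - v, b)   # reverse v
index_reverse_u <- unsigned_asymmetry(1 - u, v, b)   # reverse u
c(index_reverse_v, index_reverse_u)                  # equal up to rounding
```

The two values agree because reversing $u$ instead of $v$ reflects every point through the center of the unit square, which swaps the lower and upper bands and changes only the sign of the difference.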


<!--\begin{centering}
\textbf{ \hspace{1.2 cm} (A) \hspace{7 cm} (B)} \\
\includegraphics[width=8 cm]{./Results/plankton_results/npa_stat_results/Corstat_scatter_LmU_values.pdf} 
\includegraphics[width=8 cm]{./Results/aphid_results/ff_npa_stat_results/Corstat_scatter_LmU_values.pdf}
\captionsetup{parbox=none}
\captionof{figure}[short caption]{Tail association with environmental covariates was positively related to tail association
between species for aphid and plankton time series. Panels show total community tail association, $A_C^n$, 
plotted against total community-driver tail association, $A_D^n$ (\nameref{Methods}), across
locations, $n$, for \emph{Ceratium} density (A) and aphid first-flight (B) data. Pearson correlations
and associated $p$-values for each panel are in the headers. Points are labeled with location
numbers (see Figs \ref{SM-fig_plankton_map} and \ref{SM-fig_aphid_map}).\label{fig_plankton_scatter}}
\end{centering}-->


<!--what extra info do you get by monitoring tail dep. for plankton?-->
Measures of tail association may also reveal useful information about 
freshwater plankton ecosystems and harmful algal blooms, in addition to information about 
marine harmful algal blooms (discussed above). 
Because blooms are extreme phenomena involving multiple species, 
monitoring the associations of phytoplankton species with each other and 
their associations with 
temperature and nutrient data in the
extremes (this is tail association)
could help us to better understand harmful blooms. Considering tail association
may even produce improvements in statistics that have been 
developed to serve as early warning signals of impending major changes<!--DAN CHANGED:--> 
(so-called "tipping points")<!--END CHANGES--> in plankton
communities and the lakes they inhabit [@carpenter2011early; @butitta2017spatial], 
since some established early warning statistics
make use of skewness of population distributions [@guttal2008changing]. Tail association between
phytoplankton species is related to skewness of the total phytoplankton biomass
time series,<!--DAN CHANGED: as described above--> as described in an earlier Discussion
paragraph.<!--END CHANGES-->

<!--\begin{centering}
\hspace{1 cm}
\includegraphics[width=16 cm]{./Results/aphid_results/ff_npa_stat_results/Corstat_LmU_avg_sp_temp_multipanel_plot_legend.pdf} 
\includegraphics[width=16 cm]{./Results/aphid_results/ff_npa_stat_results/Corstat_LmU_avg_sp_temp_multipanel_plot.pdf} 
\captionsetup{parbox=none}
\captionof{figure}[short caption]{For 8 out of 10 sites, the Pearson correlation ($P$) between the species-community tail association, 
$\alpha_C^n(i)$, and the species-driver tail association, $\alpha_D^n(i)$, across
$i = 1, 2, \dots, 20$, was significantly positive (p $< 0.05$, one-tailed test). This supports the
hypothesis that tail association between species may be inherited from joint tail
association of both species on a common environmental driver. See Table \ref{tab_plankton_aphid_info} 
for species IDs.\label{fig_aphid_multipanel}}
\end{centering}-->

<!--P: 6) although our aphid results were sufficient to demo that tail dependence can be an important factor in the phenology of co-located species, and therefore *may* be an important factor for understanding shifting phenologies and their consequences, fuller application to match-mismatch will require future work using species which interact.
  a) our aphid species have largely different hosts, so don't really interact
  b) think about further commentary
  c) When you present items 5 and 6 here, you will have to cause the reader to think back to your statement in the Intro that our results are just a first step toward the goals which were outlined there (i.e., copulas in community dynamics and phenology studies)-->
Although our aphid results were sufficient to demonstrate that tail association can be an important 
factor in the phenology of co-located species, it will be necessary in future work to apply
tail association ideas to different datasets to assess whether these ideas
can improve our understanding of the consequences of changing phenology for trophic phenological matching. 
The aphid species we studied have different host plants, so they do not directly interact. Shifts and
fluctuations in the phenology of one species probably do not directly influence 
other species in our dataset. Future research should apply tail association to 
time series of phenologies of interacting species, such as the data on tree budburst dates, 
caterpillar abundance, and breeding phenology of great tits (*Parus major*) and 
blue tits (*P. caeruleus*) collected in Wytham Woods, Oxford, and other locations in Europe
[e.g., @Nilsson2006; @Savill2011; @Cole2017], or the extensive data collection 
from multiple trophic levels of @Thackery2010.

<!--DAN CHANGED-->
A final, potentially valuable direction for future research is to combine our approach, based
on tail associations, with other recent approaches that emphasize other statistical 
aspects of synchrony. For instance, research has now shown that synchrony and compensatory
dynamics in communities have "timescale structure", i.e., the dynamics of two
or more species can be synchronous on some timescales of analysis and compensatory on 
others [@Keitt2006; @Vasseur2014; @Zhao2020]. How timescale specificity and tail associations
interact is unknown, but potentially interesting. Multivariate copula 
approaches [@joe2014_dependence; @Czado2019] may be useful
in this and other future extensions of the work we have begun here.
<!--END MODIFICATIONS-->

<!--P: Link back to copulas, and conclude-->
Our results extend those of @Ghosh_copula, who argued that considering 
copulas and tail associations can provide insights across the field of ecology.
However, @Ghosh_copula did not consider co-located species, a context important for
community ecology that we considered here. 

# Acknowledgments

We thank the many contributors to the large datasets we used; D. Stevens and P. Verrier for 
data extraction; and Joel E. Cohen, Lauren Hallett, and Jonathan Walter for helpful 
suggestions. We thank James Bell of the Rothamsted Insect Survey (RIS). The RIS, a UK Capability,
is funded by the Biotechnology and Biological Sciences Research Council under the Core
Capability Grant BBS/E/C/000J0200. SG, LWS and DCR were partly funded by US 
National Science Foundation grants 1714195 and 1442595 and 
the James S. McDonnell Foundation. 

# Author contributions

SG, LWS and DCR designed the study and analyzed the data; SG and DCR wrote the manuscript; 
PCR provided the data and assisted with interpretation of results; and all authors edited the manuscript
and gave final approval for publication.

# Data availability statement
Plankton data are available from the Dryad Digital Repository https://doi.org/10.5061/dryad.rq3jc84 [@Sheppard2019data]. Full code for the analyses can be downloaded from the Dryad Digital Repository https://datadryad.org/stash/share/eC20ojo_e9UTmXAoX1oq3kIfT3aY0iptcUqAk8MHrAA.

# Conflict of interest statement
The authors declare no conflict of interest.


<!-- Figure with float use, keeps fig at the end of ms-->
\begin{figure}[!h] 
\begin{center}
\includegraphics[width=10cm]{./Results/pedagog_fig.pdf}
\caption{Pedagogical figure for introducing tail association and partial Spearman correlation.
(A, B) Two pairs of variables that have identical Pearson (P) correlation, and also identical Spearman (S) 
correlation,
but that differ markedly in the nature of the association. Panel A shows stronger left- than right-tail association
and panel B shows the reverse. (C, D) Normalized rank plots 
(see \nameref{Introduction}) for panels A and B, respectively. 
(E, F) Graphics supporting the definitions of partial Spearman correlation and our statistic measuring 
asymmetry of tail association (see \nameref{M&M}). This figure is similar in some respects to Figs 1 and 7 of Ghosh \emph{et al.} (2020\textit{b}).}\label{pedagogfig}
\end{center}
\end{figure}
<!--***DAN: Shyamolina, the citation here is not an autolink, I think we cannot
make it be an autolink because this is a latex figure, but if you know how please 
make it happen. If you don't know how, we will have to live with it as is, but that
means if the citation ever changes, we will have to change it manually here. So 
please leave this comment so we can remember to do that when the time comes.-->

<!--plankton plots : Q1 : Figure with float use, keeps fig at the end of ms-->
\begin{figure}[!ht]
\begin{center}
\textbf{ \hspace{1 cm} (A) \hspace{7 cm} (B)} \\
\includegraphics[width=7.5 cm]{./Results/plankton_results/npa_stat_results/loc12/loc12_Corl-Coru.pdf}
\includegraphics[width=7.5 cm]{./Results/plankton_results/npa_stat_results/loc26/loc26_Corl-Coru.pdf}\\
\textbf{ (C)} \\
\includegraphics[width=6.5 cm]{./Results/plankton_results/npa_stat_results/Corstat_LmU_values_on_map_sp_only.pdf} 
\caption{Either right- or left-tail association between population density time series of 
\emph{Ceratium} species could dominate, depending on the sampling location. (A, B) The community 
tail association matrix, $C^n$, and the community-driver tail association matrix, $D^n$ (\nameref{Methods}), 
horizontally concatenated, for example locations $n=12$ (A) and $n=26$ (B). 
See Table \ref{tab_plankton_aphid_info} for species names. 
All the non-NA values in $C^n$ were positive (red) for location $12$ (A), indicating left-tail
association dominated in that location; but values were largely negative (blue) for location
$26$ (B), indicating right-tail association dominated there. Matrix entries which were NA 
because time series were independent are displayed in yellow. The counts $N^n_L$ and $N^n_R$
(see \nameref{Methods}) also reflect the distinct tail association characteristics of the two
locations. \emph{C. fin.} = \emph{C. finmarchicus}; Temp. = temperature.
Green dots in $D^n$
represent variables which were originally negatively associated, so the negative
of the environmental covariate was used for calculating tail association.
See Fig. \ref{SM-fig_CorlmCoru_plankton_all_loc} 
for analogous figures for the other sampling locations. 
(C) The summary statistics $F_{C,L}$ and $F_{C,R}$ (see \nameref{Methods}) 
for each site show that association between \emph{Ceratium} species was 
substantially dominated by either the left or the right tails of \emph{Ceratium} distributions, with the
exception of a few locations for which tail association was closer to symmetric. Site codes
are colored red or blue depending on which of $F_{C,L}$ or $F_{C,R}$ had higher magnitude. Values are
not plotted for site 3 because the hypothesis that dynamics of distinct \emph{Ceratium} species
were independent could not be rejected for that site. 
\label{fig_CorlmCoru_plankton_map_loc12_26}}
\end{center}
\end{figure}


<!--aphid plots : Q1 : Figure with float use, keeps fig at the end of ms-->
\begin{figure}[!ht]
\begin{center}
\textbf{ (A) \hspace{7 cm} (B)} \\
\includegraphics[width=7.5 cm]{./Results/aphid_results/ff_npa_stat_results/loc2/loc2_Corl-Coru.pdf}
\includegraphics[width=7.5 cm]{./Results/aphid_results/ff_npa_stat_results/loc5/loc5_Corl-Coru.pdf} \\
\textbf{ (C)} \\
\includegraphics[width=7.5 cm]{./Results/aphid_results/ff_npa_stat_results/Corstat_LmU_values_on_map_sp_only.pdf}
\caption{Depending on the sampling location, either right-tail association between 
first-flight time series of aphid species could dominate or left-tail association 
could be more common. (A, B) The community tail association matrix, $C^n$, 
and the community-driver tail association matrix, $D^n$ (\nameref{Methods}), horizontally 
concatenated, for example locations $n=2$ (A) and $n=5$ (B).
See Table \ref{tab_plankton_aphid_info} for species names.
A slight majority of non-NA values in $C^n$ were positive (red) for location $2$ (A; see the
$N_L^n$ and $N_R^n$ counts displayed), indicating left-tail
association was slightly more common than right-tail association in that location. But values were largely 
negative (blue) for location
$5$ (B), indicating right-tail association dominated there. Matrix entries which were NA 
because time series were independent are displayed in yellow. Temp. = temperature.
Green dots in $D^n$
represent variables which were originally negatively associated, so the negative
of winter temperature was used for calculating tail association (\nameref{Methods}); this happened in all cases
for which temperature and first flight were significantly associated.
See Fig. \ref{SM-fig_CorlmCoru_ff_all_loc} 
for analogous figures for the other sampling locations. 
(C) The summary statistics $F_{C,L}$ and $F_{C,R}$ (see \nameref{Methods}) 
for each site show that association was either 
dominated by the right tails, or, for a few locations, 
showed slightly more left-tail association. Site codes
are colored red or blue depending on which of $F_{C,L}$ or $F_{C,R}$ had higher magnitude.
\label{fig_CorlmCoru_ff_map_loc2_5}}
\end{center}
\end{figure}

<!--plankton and aphid scatter plots: Q2 : float, figs at the end of ms-->
\begin{figure}[!ht]
\begin{center}
\textbf{ \hspace{1.2 cm} (A) \hspace{7 cm} (B)} \\
\includegraphics[width=8 cm]{./Results/plankton_results/npa_stat_results/Corstat_scatter_LmU_values.pdf} 
\includegraphics[width=8 cm]{./Results/aphid_results/ff_npa_stat_results/Corstat_scatter_LmU_values.pdf}
\caption{Tail association with environmental covariates was positively related to tail association
between species for aphid and plankton time series. Panels show total community tail association, $A_C^n$, 
plotted against total community-driver tail association, $A_D^n$ (\nameref{Methods}), across
locations, $n$, for \emph{Ceratium} density (A) and aphid first-flight (B) data. Pearson correlations
and associated $p$-values for each panel are in the headers. Points are labeled with location
numbers (see Figs \ref{SM-fig_plankton_map} and \ref{SM-fig_aphid_map}).\label{fig_plankton_scatter}}
\end{center}
\end{figure}

<!--aphid multipanel plot : Q2 : float, figs at the end of ms-->
\begin{figure}[!ht]
\hspace{1 cm}
\includegraphics[width=16 cm]{./Results/aphid_results/ff_npa_stat_results/Corstat_LmU_avg_sp_temp_multipanel_plot_legend.pdf} 
\includegraphics[width=16 cm]{./Results/aphid_results/ff_npa_stat_results/Corstat_LmU_avg_sp_temp_multipanel_plot.pdf} 
\caption{For 8 out of 10 sites, the Pearson correlation ($P$) between the species-community tail association, 
$\alpha_C^n(i)$, and the species-driver tail association, $\alpha_D^n(i)$, across
$i = 1, 2, \dots, 20$, was significantly positive ($p < 0.05$, one-tailed test). This supports the
hypothesis that tail association between species may be inherited from the joint tail
association of both species with a common environmental driver. See Table \ref{tab_plankton_aphid_info} 
for species IDs.\label{fig_aphid_multipanel}}
\end{figure}


# Literature cited

\setlength{\parindent}{-0.2in}
\setlength{\leftskip}{0.2in}
\setlength{\parskip}{1pt}
\noindent