first version of generative model and data
nimar committed Mar 3, 2015
1 parent a2e6446 commit 5cd64b9
Showing 5 changed files with 733 additions and 33 deletions.
7 changes: 7 additions & 0 deletions .gitignore
@@ -2,3 +2,10 @@
*.bbl
*.bib
*.pyc
+*.aux
+*.blg
+*.out
+*.fdb_latexmk
+*.fls
+*.log

12 changes: 6 additions & 6 deletions README.txt
@@ -26,7 +26,7 @@ test.solution -- the sample solution on the test data
Overview
========

-The model is completely described in ```description.odt``` and this should be
+The model is completely described in ```description.tex``` and this should be
translated in the Probabilistic Programming Language of your
choosing. The unlabeled data in ```test.blind``` (and optionally the labeled
data in ```training.data```) comprises the observations to the model. The
@@ -39,11 +39,11 @@ script ```test.data```. One can also compare the results versus the
baseline in ```test.solution```.

The files ```generate.py``` and ```solve.py``` have only been provided
-for convenience they shouldn't normally be used. However, if you want to
-check the performance of your model on more than just the provided data
-you may generate more as needed. The sample solver is based loosely on
-the published greedy algorithm, and may be used as a competitive
-baseline.
+for convenience. These files shouldn't normally be used. However, if you
+want to check the performance of your model on more than just the
+provided data you may generate more as needed. The sample solver is
+based loosely on the published greedy algorithm, and may be used as a
+competitive baseline.

Authors
=======
103 changes: 76 additions & 27 deletions description.tex
@@ -93,7 +93,7 @@ \subsection{Events}
e^i_m & \sim \text{Exponential}(\ \cdot\ ;\ \lambda_m, location=2)
\end{align*}

-\subsection{True Arrivals}
+\subsection{True Detections}

The seismic energy from an event travels radially outwards in distinct
phases, each of which may or may not be detected by a station depending
@@ -103,14 +103,14 @@ \subsection{True Arrivals}
the surface, refer to \citet{iaspei2011} for a full list. In this work we only
consider the first arriving phase, the {\em P} phase.

-We define a true arrival $\Lambda^{ik}$ as the moment of first arrival
+We define a true detection $\Lambda^{ik}$ as the moment of first arrival
of the energy from an event $i$ at a seismic station $k$. Various signal
processing algorithms are applied on the raw waveforms to detect an
arrival, and then station processing algorithms collect various
attributes of the
-arrival such as time, azimuth, slowness, and amplitude referenced by
+detection such as time, azimuth, slowness, and amplitude referenced by
$\Lambda^{ik}_t$, $\Lambda^{ik}_z$, $\Lambda^{ik}_s$, and
-$\Lambda^{ik}_a$ respectively. Time is quite obviously the arrival time
+$\Lambda^{ik}_a$ respectively. Time is quite obviously the detection time
of the energy, azimuth refers to the geographical direction of the
incoming seismic waves, and amplitude is the height of the initial
peak. Slowness is a more peculiar term; it refers to the inverse of the
@@ -139,17 +139,17 @@ \subsubsection{Detection Probability}
\[\text{logistic}
(\mu_{d0}^k + \mu_{d1}^k e^i_m + \mu_{d2}^k \Delta_{ik}) \ .\]

-\subsubsection{Arrival Time}
+\subsubsection{Detection Time}

The theoretical travel time of a seismic wave at a distance of $\delta$ is
given by the travel time function,
\[ I_T(\delta) = -.023 * \delta^2 + 10.7 * \delta\, + 5.\]

-The arrival time is a Laplacian centered near the theoretical arrival time,
+The detection time is a Laplacian centered near the theoretical detection time,
\[ \Lambda_t^{ik} \sim \text{Laplacian}(\ \cdot \ , e^i_t + I_T(\Delta_{ik}) +
\mu_t^k \ , \ \theta_t^k) . \]

-\subsection{Arrival Azimuth}
+\subsection{Detection Azimuth}

The azimuth of location $b=(lon_2, lat_2)$ as observed from location
$a=(lon_1, lat_1)$ is given by the function $G_z(a, b) \in [0,
@@ -176,13 +176,13 @@ \subsection{Arrival Azimuth}
\psi'(z_1, z_2) &= ((z_2 - z_1) + 360)\ \text{mod}\ 360.
\end{align*}

-The difference of the arrival azimuth from the theoretical
+The difference of the detection azimuth from the theoretical
station-to-event azimuth is distributed as a Laplacian,

\[\psi(G_z(s^k_l, e^i_l), \Lambda_z^{ik}) \sim \text{Laplacian}(\ \cdot
\ ,\ \mu_z^k, \ \theta_z^k \ ) . \]

-\subsection{Arrival Slowness}
+\subsection{Detection Slowness}
The slowness at distance $\delta$ given by $I_S(\delta)$ is simply the
derivative of the travel time. In other words, slowness measures
the time that the seismic wave takes to travel between two points very
@@ -191,34 +191,34 @@ \subsection{Arrival Slowness}
\[ I_S(\delta) = -.046 * \delta + 10.7\, .\]
Note that $I_S$ is always positive since $\delta \in [0, 180]$.

-The arrival slowness is a Laplacian centered near the theoretical
+The detection slowness is a Laplacian centered near the theoretical
slowness,

\[ \Lambda_s^{ik} \sim \text{Laplacian}(\ \cdot \ , I_S(\Delta_{ik}) +
\mu_s^k \ , \ \theta_s^k) . \]

-\subsection{Arrival Amplitude}
+\subsection{Detection Amplitude}

-The log of the arrival amplitude has a Gaussian distribution with a mean
+The log of the detection amplitude has a Gaussian distribution with a mean
determined by the event magnitude and travel time.

\[\log(\Lambda_a^{ik}) \sim \text{Gaussian}(\ \cdot \ ,\ \mu^k_{a0}
+ \mu^k_{a1} e^i_m + \mu^k_{a2} I_T(\Delta_{ik})\ ,\ \sigma_a^k \ ) . \]


-\subsection{False Arrivals}
+\subsection{False Detections}

Each station $k$ has its own time-homogeneous Poisson process generating
-false arrivals with rate $\lambda^k_f$. In other words, if $\xi^k$ is
-the set of false arrivals in a time interval $T$,
+false detections with rate $\lambda^k_f$. In other words, if $\xi^k$ is
+the set of false detections in a time interval $T$,
\begin{align*}
|\xi^k| & \sim \text{Poisson}(\ \cdot \ , \ \lambda^k_f T),
-\intertext{and the arrival time $\xi^k_t$ is uniformly distributed,}
+\intertext{and the detection time $\xi^k_t$ is uniformly distributed,}
\xi^k_t & \sim \text{Uniform}(\ \cdot \ , \ 0, \ T) .
\intertext{The azimuth and slowness are also uniformly distributed
between their possible values, as follows:}
\xi^k_z & \sim \text{Uniform}(\ \cdot \ , \ 0, \ 360), \\
-\xi^k_s & \sim \text{Uniform}(\ \cdot \ , \ I_S(0), \ I_S(180)) .
+\xi^k_s & \sim \text{Uniform}(\ \cdot \ , \ I_S(180), \ I_S(0)) .
\intertext{However, the log-amplitude is distributed as a Gaussian,}
\log{\xi^k_a} & \sim \text{Gaussian}(\ \cdot \ , \ \mu^k_f, \sigma^k_f) .
\end{align*}
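Note on the false-detection hunk above: a minimal Python sketch of how one station's false detections could be drawn, under stated assumptions. The function and argument names (`sample_false_detections`, `rate`, `span`, `mu_f`, `sigma_f`) are illustrative and are not taken from `generate.py`. Instead of drawing a Poisson count and then uniform times, the sketch accumulates exponential inter-arrival gaps, which is an equivalent way to simulate a time-homogeneous Poisson process. It also assumes that the last Gaussian parameter is a standard deviation.

```python
import math
import random


def slowness(delta):
    """Theoretical slowness I_S at distance delta (degrees), as given in the model text."""
    return -0.046 * delta + 10.7


def sample_false_detections(rate, span, mu_f, sigma_f, rng):
    """Draw the false detections of one station over the interval [0, span).

    rate, mu_f and sigma_f stand for lambda^k_f, mu^k_f and sigma^k_f.
    ASSUMPTION: sigma_f is a standard deviation, not a variance.
    """
    detections = []
    # Exponential gaps with the given rate simulate a homogeneous Poisson process.
    t = rng.expovariate(rate)
    while t < span:
        detections.append({
            "time": t,
            "azimuth": rng.uniform(0.0, 360.0),
            # I_S is decreasing, so I_S(180) is the lower bound.
            "slowness": rng.uniform(slowness(180.0), slowness(0.0)),
            # The log-amplitude is Gaussian, so exponentiate the draw.
            "amplitude": math.exp(rng.gauss(mu_f, sigma_f)),
        })
        t += rng.expovariate(rate)
    return detections


if __name__ == "__main__":
    rng = random.Random(0)
    for det in sample_false_detections(0.003, 3600.0, -0.6, 0.8, rng):
        print(det)
```

The bounds passed to the slowness draw follow the corrected order in this commit, with the smaller value `I_S(180)` first.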
@@ -230,20 +230,58 @@ \section{Hyperpriors and Constants}
R & = 6371 \, km \\
\lambda_e & \sim \text{Gamma}(\ \cdot \ , \ 6.0, \frac{1}{4 \pi R^2 T}) \\
\lambda_m & = \log(10) \\
-\mu^k_{d0} & \sim \text{Gaussian}(\ \cdot \ , \ -16.6,\ 4.9) \\
-\mu^k_{d1} & \sim \text{Gaussian}(\ \cdot \ , \ 3.9,\ 0.89) \\
-\mu^k_{d2} & \sim \text{Gaussian}(\ \cdot \ , \ -0.052,\ 0.018) \\
+\left[
+\begin{array}{l}
+\mu^k_{d0} \\
+\mu^k_{d1} \\
+\mu^k_{d2}
+\end{array} \right] & \sim \text{MVarGaussian} \left( \ \cdot \ ; \
+\left[
+\begin{array}{l}
+-10.4 \\
+3.26 \\
+-.0499
+\end{array}
+\right]
+,
+\left[
+\begin{array}{lll}
+13.43 & -2.36 & -.0122 \\
+-2.36 & .452 & .000112 \\
+-.0122 & .000112 & .000125 \\
+\end{array}
+\right]
+\ \right) \\
\mu_t^k & = 0 \\
-\theta_t^k & \sim \text{InvGamma}(\ \cdot \ , \ 96.9, \ 102.0) \\
+\theta_t^k & \sim \text{InvGamma}(\ \cdot \ , \ 120, \ 118) \\
\mu_z^k & = 0 \\
-\theta_z^k & \sim \text{InvGamma}(\ \cdot \ , \ 7.0, \ 80.0) \\
+\theta_z^k & \sim \text{InvGamma}(\ \cdot \ , \ 5.2, \ 44) \\
\mu_s^k & = 0 \\
-\theta_s^k & \sim \text{InvGamma}(\ \cdot \ , \ 7.0, \ 9.0) \\
-\mu^k_{a0} & \sim \text{Gaussian}(\ \cdot \ , \ -7.3, \ 1.1) \\
-\mu^k_{a1} & \sim \text{Gaussian}(\ \cdot \ , \ 2, \ 0.21) \\
-\mu^k_{a2} & \sim \text{Gaussian}(\ \cdot \ , \ -0.002, \ 0.00055) \\
+\theta_s^k & \sim \text{InvGamma}(\ \cdot \ , \ 6.7, \ 7.5) \\
+\left[
+\begin{array}{l}
+\mu^k_{a0} \\
+\mu^k_{a1} \\
+\mu^k_{a2}
+\end{array} \right] & \sim \text{MVarGaussian} \left( \ \cdot \ ; \
+\left[
+\begin{array}{l}
+-7.3 \\
+2.03 \\
+-.00196
+\end{array}
+\right]
+,
+\left[
+\begin{array}{lll}
+1.23 & -.227 & -.000175 \\
+-.227 & .0461 & .0000245 \\
+-.000175 & .0000245 & .000000302 \\
+\end{array}
+\right]
+\ \right) \\
(\sigma^k_a)^2 & \sim \text{InvGamma}(\ \cdot \ , \ 21.1, \ 12.6) \\
-\lambda^k_f & \sim \text{Gamma}(\ \cdot \ , \ 0.782, \ 0.0018) \\
+\lambda^k_f & \sim \text{Gamma}(\ \cdot \ , \ 2.1, \ 0.0013) \\
\mu^k_f & \sim \text{Gaussian}(\ \cdot \ , \ -0.6, \ 0.6) \\
(\sigma^k_f)^2 & \sim \text{InvGamma}(\ \cdot \ , \ 10.34, \ 9.49) \\
\end{align*}
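Note on the hyperprior hunk above: a rough Python sketch of drawing one station's parameters from the revised hyperpriors, using only the standard library. All function names are hypothetical. The sketch assumes that Gamma is parameterized as (shape, scale) and that the second Gaussian parameter is a standard deviation; neither convention is visible in this diff. InvGamma follows the density shown in the appendix, so a draw is the reciprocal of a Gamma draw with shape $\alpha$ and scale $1/\theta$. Only a subset of the table is sampled.

```python
import math
import random

# Mean and covariance of (mu_d0, mu_d1, mu_d2), copied from the hyperprior table.
MU_D_MEAN = [-10.4, 3.26, -0.0499]
MU_D_COV = [
    [13.43, -2.36, -0.0122],
    [-2.36, 0.452, 0.000112],
    [-0.0122, 0.000112, 0.000125],
]


def cholesky(matrix):
    """Return the lower-triangular factor L with L L^T = matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][p] * lower[j][p] for p in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_mvar_gaussian(mean, cov, rng):
    """Draw one vector from MVarGaussian(mean, cov) as mean + L z, z standard normal."""
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(lower[i][j] * z[j] for j in range(i + 1))
            for i, m in enumerate(mean)]


def sample_inv_gamma(alpha, theta, rng):
    """Draw from InvGamma(alpha, theta) as the reciprocal of a Gamma draw
    with shape alpha and scale 1/theta."""
    return 1.0 / rng.gammavariate(alpha, 1.0 / theta)


def sample_station_hyperpriors(rng):
    """Draw a subset of one station's parameters, for illustration only."""
    return {
        "mu_d": sample_mvar_gaussian(MU_D_MEAN, MU_D_COV, rng),
        "theta_t": sample_inv_gamma(120.0, 118.0, rng),
        "theta_z": sample_inv_gamma(5.2, 44.0, rng),
        "theta_s": sample_inv_gamma(6.7, 7.5, rng),
        # ASSUMPTION: Gamma(shape, scale); the diff does not show the convention.
        "lambda_f": rng.gammavariate(2.1, 0.0013),
        # ASSUMPTION: the second Gaussian parameter is a standard deviation.
        "mu_f": rng.gauss(-0.6, 0.6),
        # The table puts the prior on the variance, so take the square root.
        "sigma_f": math.sqrt(sample_inv_gamma(10.34, 9.49, rng)),
    }


if __name__ == "__main__":
    print(sample_station_hyperpriors(random.Random(1)))
```

The Cholesky factor is used because it needs no external linear-algebra library; a NumPy-based sampler would be an equally reasonable choice.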
@@ -379,6 +417,17 @@ \subsection{Inverse-Gamma}
\, x^{-\alpha-1} e^{-\frac{\theta}{x}} \, , \] defined over all $x \in
\mathbb{R}_{>0}$.

+\subsection{Multi-variate Gaussian}
+
+The Multi-variate Gaussian distribution with mean vector
+$\mu \in \mathbb{R}^k$, and
+covariance matrix $\Sigma \in \mathbb{R}^{k \times k}$ has probability density
+
+\[\text{MVarGaussian}(x; \mu, \Sigma) = \frac{1}{\sqrt{(2 \pi)^k | \Sigma
+|}}
+e^{\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)}
+\]
+defined over all $x \in \mathbb{R}^k$.
\end{appendices}

\bibliographystyle{chicagoa}
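Note on the multi-variate Gaussian density added to the appendix in this commit: a small Python sketch of evaluating its logarithm without forming $\Sigma^{-1}$ explicitly. The function names are illustrative and are not part of the repository. The sketch assumes $\Sigma$ is symmetric positive definite. With $\Sigma = L L^T$ it uses $\log|\Sigma| = 2 \sum_i \log L_{ii}$ and $(x-\mu)^T \Sigma^{-1} (x-\mu) = \|y\|^2$, where $L y = x - \mu$.

```python
import math


def cholesky(matrix):
    """Return the lower-triangular factor L with L L^T = matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][p] * lower[j][p] for p in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def mvar_gaussian_logpdf(x, mean, cov):
    """Log of MVarGaussian(x; mean, cov) for a positive-definite cov."""
    k = len(mean)
    lower = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    # Solve L y = diff by forward substitution, so that
    # y . y equals (x - mu)^T Sigma^{-1} (x - mu).
    y = []
    for i in range(k):
        s = sum(lower[i][j] * y[j] for j in range(i))
        y.append((diff[i] - s) / lower[i][i])
    log_det = 2.0 * sum(math.log(lower[i][i]) for i in range(k))
    quad = sum(v * v for v in y)
    return -0.5 * (k * math.log(2.0 * math.pi) + log_det + quad)


if __name__ == "__main__":
    # Standard bivariate Gaussian evaluated at its mean.
    print(math.exp(mvar_gaussian_logpdf([0.0, 0.0], [0.0, 0.0],
                                        [[1.0, 0.0], [0.0, 1.0]])))
```

Working in log space avoids underflow when the quadratic form is large, which can matter when many detections are scored together.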