-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathREADME.Rmd
145 lines (101 loc) · 6.37 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# ciccr
<!-- badges: start -->
[![R-CMD-check](https://github.com/sokbae/ciccr/workflows/R-CMD-check/badge.svg)](https://github.com/sokbae/ciccr/actions)
[![](https://cranlogs.r-pkg.org/badges/ciccr)]( https://CRAN.R-project.org/package=ciccr)
[![codecov](https://codecov.io/gh/sokbae/ciccr/branch/master/graph/badge.svg?token=WZ348KLQGA)](https://app.codecov.io/gh/sokbae/ciccr)
<!-- badges: end -->
The goal of ciccr is to implement methods for carrying out causal inference in case-control and case-population studies ([Jun and Lee, 2023](https://arxiv.org/abs/2004.08318)).
## Installation
You can install the released version of ciccr from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("ciccr")
```
Alternatively, you can install the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools") # uncomment this line if devtools is not installed yet
devtools::install_github("sokbae/ciccr")
```
## Example
We first call the ciccr package.
```{r example}
library(ciccr)
```
To illustrate the usefulness of the package, we use the dataset ACS_CC that is included the package. This dataset is an extract from American Community Survey (ACS) 2018, restricted to white males residing in California with at least a bachelor's degree. The ACS is an ongoing annual survey by the US Census Bureau that provides key information about US population. We use the following variables:
```{r}
y = ACS_CC$topincome
t = ACS_CC$baplus
x = ACS_CC$age
```
- The binary outcome `y` is defined to be one if a respondent's annual total pre-tax wage and salary income is top-coded. In the sample extract, the top-coded income bracket has median income \$565,000 and the next highest income that is not top-coded is \$327,000.
- The binary treatment `t` is defined to be one if a respondent has a master's degree, a professional degree, or a doctoral degree.
- The covariate `x` is age in years and is restricted to be between 25 and 70.
The original ACS survey is not from case-control sampling but we construct a case-control sample by the following procedure:
1. The case sample is composed of 921 individuals whose income is top-coded.
2. The control sample of equal size is randomly drawn without replacement from the pool of individuals whose income is not top-coded.
We now construct cubic b-spline terms with three inner knots using the age variable.
```{r}
x = splines::bs(x, df = 6)
```
Using the retrospective sieve logistic regression model, we estimate the average of the log odds ratio conditional on the case sample by
```{r}
results_case = avg_RR_logit(y, t, x, 'case')
results_case$est
results_case$se
```
Here, option `'case'` refers to conditioning on the event that income is top-coded.
Similarly, we estimate the average of the log odds ratio conditional on the control sample by
```{r}
results_control = avg_RR_logit(y, t, x, 'control')
results_control$est
results_control$se
```
Here, option `'control'` refers to conditioning on the event that income is not top-coded.
We carry out causal inference by
```{r}
results = cicc_RR(y, t, x, 'cc', 0.95)
```
Here, 'cc' refers to case-control sampling and 0.95 refers to the level of the uniform confidence band (0.95 is the default choice).
```{r}
est = results$est
print(est)
se = results$se
print(se)
ci = results$ci
print(ci)
```
The S3 object `results` contains estimates `est`, standard errors `se`, and one-sided confidence bands `ci` at `p = 0` and `p = 1`. The point estimates and confidence interval estimates of the `cicc_RR` command are based on the scale of log relative risk. It is more conventional to look at the results in terms of relative risk. To do so, we plot the results in the following way:
```{r}
cicc_plot(results)
```
To interpret the results, we assume both marginal treatment response (MTR) and marginal treatment selection (MTS). In this setting, MTR means that everyone will not earn less by obtaining a degree higher than bachelor's degree; MTS indicates that those who selected into higher education have higher potential to earn top incomes. Based on the MTR and MTS assumptions, we can conclude that the treatment effect lies in between 1 and the upper end point of the one-sided confidence interval with high probability. Thus, the estimates in the graph above suggest that the effect of obtaining a degree higher than bachelor's degree is anywhere between 1 and the upper end points of the uniform confidence band. This roughly implies that the chance of earning top incomes may increase up to by a factor as large as the upper end points of the uniform confidence band, but allowing for possibility of no positive effect at all. The results are shown over the range of the unknown true case probability. See [Jun and Lee, 2020](https://arxiv.org/abs/2004.08318) for more detailed explanations regarding how to interpret the estimation results.
## Comparison with Logistic Regression
We can compare these results with estimates obtained from logistic regression.
```{r}
logit = stats::glm(y~t+x, family=stats::binomial("logit"))
est_logit = stats::coef(logit)
ci_logit = stats::confint(logit, level = 0.9)
# point estimate
exp(est_logit)
# confidence interval
exp(ci_logit)
```
Here, the relevant coefficient is 2.06 (`t`) and its two-sided 90% confidence interval is [1.75, 2.43]. If we assume strong ignorability, the treatment effect is about 2 and its two-sided confidence interval is between [1.75, 2.43]. However, it is unlikely that the higher-degrees-than-BA treatment satisfies the strong ignorability condition.
## What else the package can do
See the vignette for other examples that include inference on attributable risk and how to work with case-population samples.
# References
Jun, S.J. and Lee, S. (2023). Causal Inference under Outcome-Based Sampling with Monotonicity Assumptions.
https://arxiv.org/abs/2004.08318, accepted for publication in Journal of Business & Economic Statistics.
Manski, C.F. (1997). Monotone Treatment Response. Econometrica, 65(6), 1311-1334.
Manski, C.F. and Pepper, J.V. (2000). Monotone Instrumental Variables: With an Application to the Returns to Schooling. Econometrica, 68(4), 997-1010.