---
title: "Refugee Resettlements in the USA"
author: "Ji In Choi (jic2124), Jung Ah Shin (js5569), Olivia Wang (yw3324), Tiffany Zhu (tz2196)"
date: "December 12, 2019"
output:
  html_document:
    code_folding: hide
    toc: true
    toc_float: true
    theme: journal
    number_sections: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = T)
library(readxl)
library(tidyverse)
library(plotly)
library(knitr)
library(scales)
library(data.table)
library(reshape)
library(ggmosaic)
```
[GitHub](https://github.com/tlzhu19/edav-final-project-41) | [Website](http://rpubs.com/tlzhu/refugees_in_us)
# Introduction
## Motivation
The world’s forcibly displaced population hit a record high in 2017. Over the course of the year, the global refugee population increased by 2.9 million, and by the end of the year 68.5 million individuals were forcibly displaced worldwide as a result of persecution, conflict, or generalized violence (https://www.unhcr.org/5b27be547.pdf). Despite the growing demand for refugee admission and assistance, the United States has taken a drastic turn away from supporting refugees. The number of refugees admitted to the United States dropped from a recent high of 84,994 in FY 2016 to 22,874 in FY 2018, the lowest in about 40 years (since 1977). The current ceiling for refugee admission has also dropped to 45,000, the lowest in the history of the current U.S. resettlement program. Because this comes at a time when the global number of refugees has reached a record high, the ratio of refugees admitted to the United States to the number of refugees worldwide has never been lower. For the first time, U.S. refugee admission policy is moving decisively against the trend in the total number of refugees worldwide (https://www.cgdev.org/blog/reflecting-world-refugee-day-trends-and-consequences-us-refugee-policy).
Recent years thus mark a significant shift in refugee resettlement in the U.S. This report therefore examines the refugee admission trend in the U.S. over the past 10 years (2009-2018).
## Background
According to the UNHCR, refugees are defined as those who have been forced to leave their country due to violence, war, or persecution based on their race, religion, nationality, political opinion or particular social group.
Process of refugee resettlement (https://www.unrefugees.org/refugee-facts/usa/):
1. Refugee resettlement to the U.S. is a lengthy and thorough process that takes approximately two years and involves numerous U.S. governmental agencies.
2. Refugees do not choose the country in which they would like to live. UNHCR, the UN Refugee Agency, identifies the most vulnerable refugees for resettlement and then makes recommendations to select countries.
3. Once a refugee is recommended to the U.S. for resettlement, the U.S. government conducts a thorough vetting of each applicant. This vetting takes 12 to 24 months and includes:
* Screening by 8 federal agencies including the State Department, Department of Homeland Security and the FBI
* Six security database checks and biometric security checks screened against U.S. federal databases
* Medical screening
* Three in-person interviews with Department of Homeland Security Officers
Under the Refugee Act of 1980, the president sets an annual ceiling for refugee admissions in consultation with Congress. The annual ceiling has varied over the years, from a high of 231,700 in FY 1980 to a prior low of 67,000 in FY 1986. Amid a large exodus of Syrians from their war-torn country, President Obama raised the refugee ceiling to 85,000 for FY 2016 and to 110,000 for FY 2017. After taking office, President Trump reduced the FY 2017 cap to 50,000 and set the FY 2018 cap at a historic low of 45,000. Far fewer refugees, 22,874, were actually resettled in FY 2018.
## Questions
There are currently 25.9 million refugees in the world, indicating the dramatic growth in refugees over the past decade. This led us to question what the refugee resettlement trend has been for the past decade, and delve deeper than just the changes in the numbers of refugees. In order to better visualize the trend of refugee resettlement to the U.S., this report will be specifically focusing on the top 5 countries with the highest refugee resettlement population in the U.S. (Burma, Iraq, Somalia, Bhutan, Democratic Republic of the Congo), which accounted for 60.9% of the total refugee arrivals in the U.S. (https://data.newamericaneconomy.org/en/refugee-resettlement-us/).
We are interested in answering the following questions to gain a better understanding of the refugee resettlements in the United States:
1. What insights can we gain from temporal exploratory data analysis of refugee settlement patterns in the U.S. from 2009 to 2018?
* Has the refugee resettlement population increased or decreased over the 10 years?
* Is there a correlation between change in refugee resettlement population and major political events that have happened over the 10 years?
2. What insights can we gain from geographical visualization of refugee settlement patterns in the U.S. over the 10 years? Why might some states have larger refugee settlements than others?
3. What changes in demographic patterns (i.e., religion, gender, age, etc.) within the refugee population over the 10 years can we visualize?
* Given the number of religions included in the refugee population, we will be focusing on the top 4 religions in the world (Christianity, Islam, Hinduism, Buddhism). The data for Iraq separate the Muslim population into three categories: Muslim, Muslim Shiite, and Muslim Suni. For the purposes of this analysis, we combine them into one (https://thecountriesof.com/top-5-largest-religions-in-the-world/).
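
Combining the three Muslim categories into one can be sketched in a few lines of base R. This is a minimal illustration, not the project's code: the labels are spelled as in the text above (the raw files may spell them differently), and the counts are made up.

```r
# Labels spelled as in the text above; the raw files may spell them differently.
religion <- c("Muslim", "Muslim Shiite", "Muslim Suni", "Christian", "Hindu", "Buddhist")
total <- c(10, 30, 20, 25, 10, 5)  # made-up counts

# Collapse every label that starts with "Muslim" into a single category.
religion_grouped <- ifelse(grepl("^Muslim", religion), "Muslim", religion)

# Sum the counts within each collapsed category.
grouped_totals <- tapply(total, religion_grouped, sum)
print(grouped_totals)
```

Matching on a shared prefix keeps the recoding short, but it assumes no unrelated label begins with the same word.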
# Data Sources
## Refugee Processing Center Data
We first collected data from the RPC (Refugee Processing Center), which is operated by the U.S. Department of State, Bureau of Population, Refugees, and Migration (PRM). It coordinates the processing and tracking of the movement of refugees from various countries around the world to the U.S. for resettlement under the U.S. Refugee Admissions Program.
It provides refugee arrival information:
1. by state and nationality
2. by demographic profile
3. by destination and nationality
4. by nationality and religion.
We focused on 1) and 2). Note that the data in 3) was included in 2).
### Total Refugee Resettlements by State
We initially wanted the number of refugee resettlements in each U.S. state per year. Since the RPC website does not allow faceting by year, we downloaded one Excel sheet for each year. The original total arrivals file for each year contained a table with the State, the number of Cases, and the number of individuals that arrived in each state. There were 50-56 records per year (from 2009 to 2018). Each record represented a state or U.S. territory; the extra six records were American Samoa, District of Columbia, Guam, Puerto Rico, Unknown State, and Virgin Islands. For the purposes of our analysis, we wanted to focus on the states rather than other U.S. territories. Because we wanted a single data frame covering all years that excludes the non-state entries in the State column, we had to combine and clean the original data. The detailed steps are provided in Section 3: Data Transformation.
### Demographic Information of Resettled Refugees
For the same reason, we had one Excel file detailing the refugee demographic information for each of the top-5 countries (Bhutan, Burma, DRC, Iraq, and Somalia) for each year. These countries had the largest numbers of refugees resettling in the U.S. Each original demographics file contained five sheets: Age Group, Religion, Ethnicity, Education, and Native Language. The Age Group sheet contained the age groups, gender, cumulative total, and percentage of individuals belonging to each group. The Religion sheet contained the number and percentage of individuals belonging to each religion. Similarly, the Ethnicity, Education, and Native Language sheets contained the total number of individuals in each category. Unlike Age Group and Education, the categories in the other three sheets (Ethnicity, Native Language, and Religion) varied among countries and years, so we had to further clean and combine the original data.
## Refugee Ceiling Data
We obtained 10 data points representing the annual refugee admission ceilings for the past decade. The ceiling has been set by the U.S. president each year since the resettlement program was established by the Refugee Act of 1980. We sourced the data from the website of the State Department's Bureau of Population, Refugees, and Migration.
# Data Transformation
## Cleaning the Data in R
We took the following steps to clean our raw data:
1. From the website, we were able to download '.xlsx' files. Raw files can be found [here](https://github.com/tlzhu19/edav-final-project-41/tree/master/data/raw).
* As we had separate files for the overall refugee resettlement data for each year from 2009 to 2018, we had to combine these files into one and add the year in which the data was recorded.
* We also had separate files for the demographics: each Excel file corresponds to one country in one year. Within each file, each sheet corresponds to one kind of demographic information (e.g., Religion, Education, Native Language).
2. Wrote two functions to clean the Excel sheets for a given year so that they are in a format that is easily readable by R.
* `clean_arrival` to clean the Excel files for all refugee resettlements for each state.
* removed extra rows and moved columns such that there are no repeated columns and those columns that share a name are combined into one
* `clean_demographics` to clean the Excel files for demographic information for refugees from specific countries (namely Bhutan, Burma, DRC, Iraq, and Somalia).
* removed extra rows and moved columns such that there are no repeated columns and those columns that share a name are combined into one.
```{r}
# Used to clean the Arrivals data
clean_arrival <- function(file, sheet_name) {
  dat <- read_excel(file, sheet = sheet_name)
  # Three groups of columns, each holding State, Cases, Inds
  dat1 <- subset(dat, select = c(...6, ...8, ...9))
  dat2 <- subset(dat, select = c(...15, ...18, ...19))
  dat3 <- subset(dat, select = c(...23, ...25, ...27))
  dat1 <- dat1 %>% drop_na()
  dat2 <- dat2 %>% drop_na()
  dat3 <- dat3 %>% drop_na()
  # Set first row (State, Cases, Inds) as column names
  colnames(dat1) <- as.character(unlist(dat1[1, ]))
  dat1 <- dat1[-1, ]
  colnames(dat2) <- as.character(unlist(dat2[1, ]))
  dat2 <- dat2[-1, ]
  colnames(dat3) <- as.character(unlist(dat3[1, ]))
  dat3 <- dat3[-1, ]
  # Stack the three groups into one State, Cases, Inds table
  combined <- rbind(dat1, dat2)
  combined <- rbind(combined, dat3)
  # Counts are read as text, so convert them to numbers
  combined$Cases <- as.numeric(as.character(combined$Cases))
  combined$Inds <- as.numeric(as.character(combined$Inds))
  return(combined)
}
```
```{r}
# Used to clean the Demographics data
clean_demographics <- function(file, sheet_name) {
  df <- read_excel(file, sheet = sheet_name)
  # The columns to keep differ by sheet
  if (sheet_name == 'Age Group') {
    column_names <- c('Department of State', '...2', '...3', '...4')
  } else if (sheet_name == 'Religion') {
    column_names <- c('Department of State', '...2', '...4', '...6')
  } else if (sheet_name %in% c('Ethnicity', 'Education', 'Native Language')) {
    column_names <- c('Department of State', '...2', '...3', '...5')
  } else {
    # Fail early instead of hitting an undefined column_names below
    stop('Unexpected sheet name: ', sheet_name)
  }
  df2 <- subset(df, select = column_names)
  # Keep complete rows only, then drop the final remaining row
  df3 <- df2[complete.cases(df2), ]
  df4 <- df3[-nrow(df3), ]
  # The country name sits between 'demographics/' and '_Demographics' in the file path
  file_unpacked <- unlist(strsplit(file, 'demographics/'))
  file_unpacked2 <- unlist(strsplit(file_unpacked[2], '_Demographics'))
  country_name <- file_unpacked2[1]
  colnames(df4) <- c(sheet_name, 'Male', 'Female', 'Total')
  df4$country <- rep(country_name, nrow(df4))
  return(df4)
}
```
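
The reshaping that `clean_arrival` performs (promoting a header row to column names and stacking side-by-side column groups) can be illustrated on a toy table. The sketch below is a minimal base-R illustration rather than the project's code: the helper `promote_header` and the data frame `raw` are hypothetical, and the numbers are made up.

```r
# Hypothetical helper: use the first row of a block as its column names.
promote_header <- function(block) {
  names(block) <- as.character(unlist(block[1, ]))
  block <- block[-1, , drop = FALSE]
  rownames(block) <- NULL
  block
}

# Toy stand-in for a sheet that lays the same three fields out side by side.
raw <- data.frame(
  a = c("State", "Alabama", "Alaska"),
  b = c("Cases", "40", "25"),
  c = c("Inds", "105", "60"),
  d = c("State", "Ohio", "Texas"),
  e = c("Cases", "900", "2500"),
  f = c("Inds", "2100", "6000"),
  stringsAsFactors = FALSE
)

# Split into column groups, promote each header row, then stack the groups.
left <- promote_header(raw[, c("a", "b", "c")])
right <- promote_header(raw[, c("d", "e", "f")])
combined <- rbind(left, right)

# Counts arrive as text, so convert them to numbers.
combined$Cases <- as.numeric(combined$Cases)
combined$Inds <- as.numeric(combined$Inds)
print(combined)
```

Because every block ends up with the same column names, `rbind` can stack them without any further alignment.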
3. Wrote another function, `combine_files`, to combine each year's Excel file into one.
* For those files that contained overall refugee resettlement data, we generated a function that combined these files as they had common column names.
* For those files that contained demographics data, we generated a function that combined the sheets within each file that belonged to the same kind of demographics information. Because the sheets are named consistently across files, we simply referred to the sheet name when combining those for the same country and year.
* We also added an extra column, `Year`, parsed from the file name (in the format: Arrivals_*year*).
```{r}
# Cleans and combines all the files in the directory given by path
combine_files <- function(path, data_cleaner, sheet_name, title) {
  df <- data.frame()
  # pattern is a regular expression, so match a literal ".xlsx" suffix
  files <- list.files(path = path, pattern = "\\.xlsx$", full.names = TRUE, recursive = FALSE)
  for (file in files) {
    cleaned_data <- data_cleaner(file, sheet_name)
    # The year is the part of the file name that follows `title`
    year_extract <- unlist(strsplit(file, title))
    cleaned_data$Year <- as.numeric(gsub(".xlsx", "", year_extract[2], fixed = TRUE))
    df <- rbind(df, cleaned_data)
  }
  return(df)
}
# Examples
# path = "~/Desktop/Projects/EDAV/edav-final-project-41/data/raw/demographics"
# title = 'Demographics_'
# function
# sheet_name = 'Ethnicity'
# sheet_name = 'Age Group'
# sheet_name = 'Religion'
# sheet_name = 'Education'
# sheet_name = 'Native Language'
# data_cleaner = clean_demographics
# path = "~/Desktop/Projects/EDAV/edav-final-project-41/data/raw/all_arrivals"
# title = 'Arrivals_'
# sheet_name = 'Detailed'
# data_cleaner = clean_arrival
```
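
The year parsing used when combining files can be checked in isolation. The following is a minimal sketch under the assumption that file names follow the `Arrivals_<year>.xlsx` pattern described above; the path itself is hypothetical.

```r
# Hypothetical path; real files are assumed to follow Arrivals_<year>.xlsx.
file <- "data/raw/all_arrivals/Arrivals_2009.xlsx"
title <- "Arrivals_"

# Everything after the title prefix is "<year>.xlsx".
after_title <- unlist(strsplit(file, title, fixed = TRUE))[2]

# Strip the literal extension and convert what is left to a number.
year <- as.numeric(sub(".xlsx", "", after_title, fixed = TRUE))
print(year)
```

Using `fixed = TRUE` treats both the prefix and the extension as literal text, so the dot in `.xlsx` is not interpreted as a regular-expression wildcard.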
4. Saved these as csv files and uploaded them to GitHub for easy access.
* We now have only 1 file for the overall refugee resettlement population data that consisted of the total refugee resettlement population by state.
* In terms of demographics information, or refugee resettlement information by country of origin, we have 1 file for each demographics category (thus 5 in total).
```{r}
# Cleaned all_arrivals data converted to csv files
save_cleaned_all_arrivals <- function(path) {
  read_path <- paste(path, "/edav-final-project-41/data/raw/all_arrivals", sep = '')
  write_path <- paste(path, "/edav-final-project-41/data/clean", sep = '')
  title <- 'Arrivals_'
  sheet_name <- 'Detailed'
  data_cleaner <- clean_arrival
  arrival_df <- combine_files(read_path, data_cleaner, sheet_name, title)
  # file.path() inserts the "/" between the directory and the file name
  write.csv(arrival_df, file = file.path(write_path, "all_arrivals.csv"), row.names = FALSE)
}
# Cleaned demographics data converted to csv files
save_cleaned_all_demographics <- function(path) {
  read_path <- paste(path, "/edav-final-project-41/data/raw/demographics", sep = '')
  write_path <- paste(path, "/edav-final-project-41/data/clean", sep = '')
  title <- 'Demographics_'
  sheet_names <- c('Ethnicity', 'Age Group', 'Religion', 'Education', 'Native Language')
  file_names <- c('ethnicity', 'age_group', 'religion', 'education', 'native_language')
  data_cleaner <- clean_demographics
  for (i in seq_along(sheet_names)) {
    demographic_df <- combine_files(read_path, data_cleaner, sheet_names[i], title)
    # One csv per demographic category, written inside the clean directory
    file_name <- file.path(write_path, paste(file_names[i], '.csv', sep = ''))
    write.csv(demographic_df, file = file_name, row.names = FALSE)
  }
}
```
## Cleaned Data Format
```{r}
# Read cleaned data from github
arrival_df = read.csv('https://raw.githubusercontent.com/tlzhu19/edav-final-project-41/master/data/clean/all_arrivals.csv')
age_df = read.csv('https://raw.githubusercontent.com/tlzhu19/edav-final-project-41/master/data/clean/age_group.csv')
ethnicity_df = read.csv('https://raw.githubusercontent.com/tlzhu19/edav-final-project-41/master/data/clean/ethnicity.csv')
religion_df = read.csv('https://raw.githubusercontent.com/tlzhu19/edav-final-project-41/master/data/clean/religion.csv')
education_df = read.csv('https://raw.githubusercontent.com/tlzhu19/edav-final-project-41/master/data/clean/education.csv')
native_language_df = read.csv('https://raw.githubusercontent.com/tlzhu19/edav-final-project-41/master/data/clean/native_language.csv')
```
After cleaning the data, we have six '.csv' files that can be found [here](https://github.com/tlzhu19/edav-final-project-41/tree/master/data/clean). The following tables are a preview of the first few rows of these csv files.
1. `all_arrivals.csv`: The total number of refugee resettlements to each of the 50 states in the U.S. from 2009-2018. All raw files for this file can be found [here](https://github.com/tlzhu19/edav-final-project-41/tree/master/data/raw/all_arrivals).
```{r}
kable(head(arrival_df))
```
2. `age_group.csv`: The composition of the refugee population by age group, from under 14 years old to age 65 and over. Also records the gender, country, and year for each age group. All raw files for this can be found [here](https://github.com/tlzhu19/edav-final-project-41/tree/master/data/raw/demographics).
```{r}
kable(head(age_df))
```
3. `education.csv`: The composition of the refugee population by education level. Also records the gender, country, and year for each education level. All raw files for this can be found [here](https://github.com/tlzhu19/edav-final-project-41/tree/master/data/raw/demographics).
```{r}
kable(head(education_df))
```
4. `ethnicity.csv`: The composition of the refugee population by ethnicity. Also records the gender, country, and year for each ethnicity. All raw files for this can be found [here](https://github.com/tlzhu19/edav-final-project-41/tree/master/data/raw/demographics).
```{r}
kable(head(ethnicity_df))
```
5. `native_language.csv`: The composition of the refugee population by native language. Also records the gender, country, and year for each native language. All raw files for this can be found [here](https://github.com/tlzhu19/edav-final-project-41/tree/master/data/raw/demographics).
```{r}
kable(head(native_language_df))
```
6. `religion.csv`: The composition of the refugee population by religion. Also records the gender, country, and year for each religion. All raw files for this can be found [here](https://github.com/tlzhu19/edav-final-project-41/tree/master/data/raw/demographics).
```{r}
kable(head(religion_df))
```
Based on some initial analysis, we decided to exclude Ethnicity and Native Language, as they did not provide sufficient information to help us answer our questions regarding refugee resettlement trends in the U.S. More specifically, Ethnicity and Native Language were excluded from our data visualization because they are directly related to country of origin and are therefore redundant.
Furthermore, we excluded many minority religions, since the small number of people belonging to each one made it difficult to observe the overall trend. Because the majority of the refugees belong to the top-4 religions (Christian, Muslim, Hindu, Buddhist), we decided to focus on these religions.
## Transforming the Data
In order to visualize and compare the refugee resettlement trends among countries or between years, we had to group the data by country, year, or both.
For the same reason, in addition to using the raw values given in the data (the number of people belonging to each group), we converted the counts into proportions to allow a clear comparison among countries, religions, education levels, and age groups.
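
Converting counts into within-group proportions can be sketched as follows. This is a minimal base-R illustration with made-up numbers; the column names mirror the cleaned religion file, and grouping by country and year is an assumption about which comparison is wanted.

```r
# Toy counts (made-up numbers); column names mirror the cleaned religion file.
toy <- data.frame(
  country = c("Bhutan", "Bhutan", "Iraq", "Iraq"),
  Year = c(2009, 2009, 2009, 2009),
  Religion = c("Hindu", "Buddhist", "Muslim", "Christian"),
  Total = c(60, 40, 75, 25),
  stringsAsFactors = FALSE
)

# Divide each count by the total of its country-year group.
toy$Proportion <- ave(toy$Total, toy$country, toy$Year,
                      FUN = function(x) x / sum(x))
print(toy)
```

Proportions computed this way sum to 1 within each country-year group, which makes countries with very different arrival totals directly comparable.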
# Missing Values
The datasets from RPC (Refugee Processing Center) did not contain any missing values. However, our arrivals data contained a row labeled “Unknown State”. Since this could not be visualized in our maps, we removed it. Additionally, when we converted the `State` column to a factor, it had 56 levels. The extra six "states" are:
* American Samoa
* District of Columbia
* Guam
* Puerto Rico
* Unknown State
* Virgin Islands
Notice that, apart from “Unknown State”, these are U.S. territories and the nation's capital. We removed these rows since our main goal is to focus on the fifty states.
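
One way to keep only the fifty states is to filter against `state.name`, a vector of state names that ships with base R. The sketch below uses a toy table with made-up numbers and assumes the `State` column spells state names the same way `state.name` does.

```r
# Toy arrivals table (made-up numbers) with the State, Cases, Inds, Year layout.
toy_arrivals <- data.frame(
  State = c("Texas", "Guam", "District of Columbia", "Unknown State", "Ohio"),
  Cases = c(10, 1, 2, 1, 8),
  Inds = c(25, 3, 4, 1, 20),
  Year = 2018,
  stringsAsFactors = FALSE
)

# state.name is a built-in base-R vector holding the names of the 50 states.
states_only <- toy_arrivals[toy_arrivals$State %in% state.name, ]

# Rows that were dropped, for a quick sanity check.
dropped <- setdiff(toy_arrivals$State, state.name)
print(states_only)
print(dropped)
```

Printing the dropped labels is a cheap check that nothing was removed only because of a spelling difference.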
```{r}
# Colors
BUHTAN_COLOR = 'orange'
BURMA_COLOR = BLUE = 'rgb(26, 118, 255)'
BLUE_h = '#1A76FF'
DRC_COLOR = RED = 'rgb(255, 99, 71)'
RED_h = '#FF6347'
IRAQ_COLOR = GREEN = 'rgb(60, 179, 113)'
GREEN_h = '#3CB371'
SOMALIA_COLOR = PURPLE = 'rgb(218, 112, 214)'
PURPLE_h = '#DA70D6'
# for the countries
our_palette = c('orange', BLUE_h, RED_h, GREEN_h, PURPLE_h)
```
```{r, fig.width=4.5, fig.height=4.5, fig.align='center'}
# CODE FOR PLOTTING PERCENTAGE OF MISSING DATA (Education)
na_df<- select(filter(education_df, Education == "Bio Data not Complete" | Education == "Unknown"),c(Education,Total, country, Year))
# Total sum of data
total_df <- education_df %>% group_by(country)%>% summarize(Total = sum(Total))
# Total sum of missing data
na_df <- na_df %>% group_by(country) %>% summarize(Total = sum(Total))
# Change column name
names(na_df)[2] <- "na_Total"
# Put two df together
combine_df <- cbind(total_df,na_df[2])
# Proportion of missing data
combine_df <- transform(combine_df, proportion = na_Total/Total)
# Plot missing data
ggplot(combine_df, aes(x=country, y=proportion, fill=country))+
geom_bar(stat="identity")+
theme(legend.position="none", plot.title=element_text(hjust=0.5))+
labs(x='Country', y='Proportion') +
ggtitle("Missing Education Data by Country") +
scale_fill_manual(values=our_palette)
```
In the Education data, there were two rows called “Bio Data not Complete” and “Unknown”. We treated these two values as missing data since they do not provide information regarding the education level of the refugees that resettled in the U.S. By summing the totals of these two rows and dividing by the total population of each country, we obtained the proportion of missing data in each country.
In the Missing Education Data by Country plot, Somalia has the largest proportion of missing data among the top-5 countries, at approximately 0.39 (39%). Burma has the lowest proportion, at approximately 0.15 (15%).
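
The missing-data proportion described above can also be computed with a join keyed on country, which keeps rows aligned even when a country has no missing-labelled rows. The following is a minimal base-R sketch with made-up counts, not the project's code; the column names mirror `education.csv`.

```r
# Toy education counts (made-up numbers); column names mirror education.csv.
toy_edu <- data.frame(
  country = c("Burma", "Burma", "Burma", "Bhutan", "Bhutan"),
  Education = c("Primary", "Unknown", "Bio Data not Complete", "Primary", "Secondary"),
  Total = c(80, 15, 5, 70, 30),
  stringsAsFactors = FALSE
)
missing_labels <- c("Bio Data not Complete", "Unknown")

# Total count per country.
totals <- aggregate(Total ~ country, data = toy_edu, FUN = sum)

# Count per country in the rows treated as missing.
is_missing <- toy_edu$Education %in% missing_labels
missing <- aggregate(Total ~ country, data = toy_edu[is_missing, ], FUN = sum)
names(missing)[2] <- "na_Total"

# Join on country so rows stay aligned; countries with no missing rows get 0.
combined <- merge(totals, missing, by = "country", all.x = TRUE)
combined$na_Total[is.na(combined$na_Total)] <- 0
combined$proportion <- combined$na_Total / combined$Total
print(combined)
```

Binding the two summaries column-wise works only when both contain the same countries in the same order; the join removes that requirement.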
```{r, fig.width=4.5, fig.height=4.5, fig.align='center'}
# Religion
#group by country and bar plot for Unknown ()
# CODE FOR PLOTTING PERCENTAGE OF MISSING DATA
na_df1<- select(filter(religion_df, Religion == "Unknown"),c(Religion,Total, country, Year))
# Total sum of data
total_df1 <- religion_df %>% group_by(country)%>% summarize(Total = sum(Total))
# Total sum of missing data
na_df1 <- na_df1 %>% group_by(country) %>% summarize(Total = sum(Total))
# Change column name
names(na_df1)[2] <- "na_Total"
# Put two df together
combine_df1 <- cbind(total_df1,na_df1[2])
# Proportion of missing data
combine_df1 <- transform(combine_df1, proportion = na_Total/Total)
# Plot missing data
ggplot(combine_df1, aes(x=country, y=proportion, fill=country))+
geom_bar(stat="identity")+
theme(legend.position="none", plot.title=element_text(hjust=0.5))+
labs(x='Country', y='Proportion') +
ggtitle("Missing Religion Data by Country") +
scale_fill_manual(values=our_palette)
```
Similarly, for the Religion data, we treated the row “Unknown” as missing data and calculated the proportion of missing data within each country. The proportion of missing data was much smaller for Religion than for Education Level, which suggests that information on religion was much more accessible. DRC has the largest share of missing Religion data, at approximately 1.19e-03 (about 0.12%) of its total.
Lastly, Age data did not contain any missing data, so no further analysis was needed to analyze missing patterns for this dataset.
# Results
## Temporal Analysis
Given our datasets, it is apparent that the temporal patterns of our data are of great importance to our analysis. As a result, to provide a general view of the overall change in resettlement population from 2009 to 2018, we chose to use a time series graph.
In this graph, we have years as the independent variable, and population count as the dependent variable. The green line indicates the ceiling set by the U.S. government for the maximum number of refugees that can be admitted. The red line indicates the actual number of individuals admitted.
We hypothesize that the change in the refugee admission ceiling, and the actual refugee resettlement population are correlated with:
1. The government administration at the time
2. The U.S. economy and major events in general
```{r, fig.align='center'}
# Compare ceilings with actual admissions
total_arrival_df <- arrival_df %>%
group_by(Year) %>%
summarise(Actual = sum(Inds))
# data from https://www.migrationpolicy.org/article/refugees-and-asylees-united-states#Refugee_Admission_Ceiling
Ceiling = c(80000, 80000, 80000, 76000, 70000, 70000, 70000, 85000, 50000, 45000)
# Actual = c(74654, 73311, 56424, 58238, 69926, 69987, 69933, 84995, 53716, 22491)
# Year = 2009:2018
# total_arrival_df = data.frame(ceiling, actual, year)
total_arrival_df$Ceiling = Ceiling
total_arrival_df <- total_arrival_df %>%
gather(key, Individuals, Actual, Ceiling)
# One entry per year, 2009-2018
events <- c("
-Barack Obama is inaugurated.
-In Cairo, President Obama addresses the
world's Muslims, vowing to forge a 'new beginning'.
U.S. forces pull out of Iraq's main cities and towns.", '
-The long war in Iraq was officially declared over
-Unemployment rate rose to 9.8%
-The Arizona Immigration Law', '\nU.S. forces killed the al-Qaida leader, Osama Bin Laden', '
-Gay marriage made legal
-Facebook IPO
-U.S. general election where democratic Obama beat Republican
rival Mitt Romney to win a second term in the White House', '
-Detroit filed for bankruptcy
-Unable to reach agreement on federal spending levels,
a dysfunctional Congress stumbled into the first government
shutdown since the mid-1990s', '
-U.S. air attacks against ISIS forces in Iraq and Syria began in August
-U.S. - Cuba thaw after Alan Gross was freed after 5 years', '
-President Obama’s plan to allow 10,000 Syrian refugees into
the U.S was met with stiff resistance from some House Republicans', '\n-Donald Trump elected as the next U.S. president
-Flint, Michigan water crisis', '
-Donald Trump is inaugurated as the 45th president of the U.S.
-American tensions with North Korea intensified
-Devastating hurricane season in southeast Texas and Florida', '
-In California, wildfires left 88 dead in the massive Camp Fire
-Migrant children were separated from their parents as part
of President Trump’s ‘zero tolerance’ border policy')
# gather() stacks the ten Actual rows and then the ten Ceiling rows,
# so the same events are repeated once per series
total_arrival_df$Events = rep(events, times = 2)
p <-ggplot(total_arrival_df, aes(x=Year, y=Individuals, colour=key, label=Events)) +
geom_line() +
geom_point(size = .75) +
geom_vline(xintercept = 2016, color='gray') +
annotate("text", x=2015.3, y =50000, label="Trump\nelected", color='gray', hjust=0) +
geom_vline(xintercept = 2012, color='gray') +
annotate("text", x=2011.2, y =40000, label="Obama\nreelected", color='gray', hjust=0) +
ylab("Number of Individuals") +
ggtitle('Ceilings Vs Actual Refugee Resettlements') +
scale_y_continuous(labels = comma) +
theme(plot.title=element_text(hjust=0.5), legend.title = element_blank())
ggplotly(p, tooltip = c('y', 'label')) %>%
layout(annotations = list(x=-0.1, y = -0.10, text = "Hover over points to read about major events", showarrow = F, xref = 'paper', yref = 'paper', xanchor = 'left', yanchor = 'auto', xshift = 0, yshift = 0, font = list(size = 11, color = "grey")))
```
Looking at the general trend of change in refugee admission ceiling, the two timepoints that stand out the most are 2015 and 2016.
* From 2015 to 2016, the refugee admission ceiling rose sharply from 70,000 to a record high of 85,000 under President Obama’s administration.
* From 2016 to 2018, however, the ceiling fell sharply from 85,000 to a record low of 45,000 under President Trump’s administration, and it has continued to decrease.
Looking at the general trend in the actual refugee resettlement population, we can see that it decreased from 79,943 in 2009 to a low of 51,458 in 2011, the year before President Obama was re-elected.
However, the resettlement population increased from 51,458 in 2011 to a record high of 96,874 in 2016, and decreased drastically from 96,874 in 2016 to a record low of 22,847 in 2018 under President Trump’s administration.
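To put the two declines on a comparable scale, the figures quoted above can be turned into percent changes. The snippet below is a minimal sketch that uses only the numbers stated in the text; the helper `pct_change` and the vector names are our own and are not part of the report's pipeline.

```r
# Figures quoted in the text above (2016 and 2018 values).
ceiling_vals <- c(`2016` = 85000, `2018` = 45000)
actual_vals  <- c(`2016` = 96874, `2018` = 22847)

# Hypothetical helper: percent change between the first and last element.
pct_change <- function(x) {
  (x[[length(x)]] - x[[1]]) / x[[1]] * 100
}

ceiling_change <- pct_change(ceiling_vals)  # roughly -47%
actual_change  <- pct_change(actual_vals)   # roughly -76%
round(c(ceiling = ceiling_change, actual = actual_change), 1)
```

Under these figures, actual resettlements fell proportionally more than the ceiling did between 2016 and 2018.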
In order to take a closer look at the refugee resettlement pattern of each country, we decided to visualize each of our five countries individually (https://data.newamericaneconomy.org/en/refugee-resettlement-us/).
We hypothesize that this trend is correlated with U.S. refugee admission policy, the overall U.S. economy, and major political and social events in each of the 5 countries.
```{r, fig.align='center'}
total_demo <- ethnicity_df
total_demo$Total <- as.numeric(total_demo$Total)
change_in_pop <- total_demo %>%
group_by(Year, country) %>%
summarize(Individuals=sum(Total))
change_in_pop <- rename(change_in_pop, c('country'='Country'))
change_in_pop$Events <- c('\n-A 6.1 magnitude earthquake occurred in eastern
Bhutan, leaving at least 10 dead
-Huanglongbing virus wipes out much of Bhutan’s
orange crop, an important export for Bhutan', "\n- Government signs ceasefire with rebels
of Karen ethnic group.
- Myanmar abolishes pre-publication
media censorship.", '\nEastern Congo offensive - a joint Congo-Rwanda military
offensive against the Hutu FDLR rebel group descended from
those groups that carried out the 1994 Rwanda genocide.', '\n-U.S. hands over responsibility for
security in the Green Zone to Iraqi forces
- lowest monthly death toll since the
U.S.-led invasion of March 2003
- several car bombings ', '\n-Multiple hijacking by Somali pirates
-Oxfam International describes the
humanitarian crisis in Somalia as "very dire".', '\nAlthough Bhutan saw a decline in annual tourism
revenues and a loss of over $67 million due to
earthquake and cyclone damages', "\n- Government changes country's flag,
national anthem and official name.", '\nThe UN renewed sanctions on the DRC which
included guidelines for importers, processors and
consumers buying Congolese minerals, which are the
source of funding for many rebel groups.', '\n-Parliamentary election: secular, non-sectarian
Iraqi National Movement received the most votes
-President Obama announced that U.S. combat
operations in Iraq will end Aug 31', 'Ongoing Somali Civil War', '\nKing Jigme Khesar Namgyel Wangchuck marries
21-year-old student Jetsun Pema', '\n- Thein Sein is sworn in as president
of a new, nominally civilian government.
- U.S. Secretary of State Hillary Clinton
visits, meets Aung San Suu Kyi and President Thein Sein', '\nPresidential and parliamentary elections. Mr Kabila
gains another term. The vote is criticised abroad and
the opposition disputes the result.', 'War officially ended', '\n-East Africa Drought
-Somalia Famine', '\nPolitical leaders enforced a nearly six-month
ban on all public religious activities ahead
of the upcoming elections', '\n- Government signs ceasefire with rebels
of Karen ethnic group.
- Myanmar abolishes pre-publication
media censorship.', '\nThe Child Protection Law of 2009 sets the minimum age
for full-time work at 18, unless a parent consents', 'Ongoing explosions, suicide bombings', 'Somali presidential election', '\nParliamentary elections: opposition People’s Democratic
Party wins 32 seats in the lower house, against the
incumbent Druk Phuensum Tshogpa party’s 15 seats.', '\n- Rioting between Muslims and Buddhists in
Meiktila leaves at least 10 people dead.
- Four private daily newspapers appear for the
first time in almost 50 years as the state monopoly ends.', '\n- Representatives of 11 African countries sign an
accord pledging to help end the conflict in DR Congo.
- 3,000-member UN Intervention Brigade deployed to
fight and disarm rebels in the east.', '\nIraqi protests (Iraqi Insurgency/Iraq Crisis :
violent conflict with the central government &
sectarian violence among religious groups)', 'NA', '\nThe World Bank Group’s Senior Vice President and
Chief Economist, Kaushik Basu, visited Bhutan to learn
from the country’s unique development experience as the
country makes rapid progress in reducing poverty.', '\n- U.S. extends some sanctions for another year,
saying that despite the recent reforms, rights abuses
and army influence on politics and the economy persist.', '\nThe Democratic Republic of Congo confirms two deaths
from Ebola in the north of the country, the first
cases reported from the current outbreak in West Africa', '\nBeginning of Iraqi Civil War & ISIS militants
control the Iraqi cities & Northern Iraq offensive', '\nThe Somali government-led Operation Indian
Ocean was launched to clean up the remaining
insurgent-held pockets in the countryside', '\nJohn Kerry becomes the first-ever U.S. secretary of state
to hold a cabinet-level meeting with a Bhutanese official
when he meets Bhutanese Prime Minister Tshering Tobgay in India', '\n- A draft ceasefire agreement is signed between
the government and 16 rebel groups.
\n- Hundreds of Muslim Rohingyas migrants leave
by sea in flimsy boats.
\n- Opposition National League for Democracy,
led by Aung San Suu Kyi, wins enough seats in
parliamentary elections to form a government.', '\nDozens killed in protests against proposed electoral
law changes which the opposition said were designed
to allow President Kabila to remain in power.', '\nISIL -> destruction of Mosul Museum, bombings in
Baghdad, destruction of the city of Hatra, Nimrud,
and Bash Tapia Castle', '\nAMISOM and Somalia National Army regained many
villages and major towns of Baardhere and Dinsoor', '\nKing Jigme Khesar Namgyel Wangchuck and his wife, Queen Jetsun
Pema announce the birth of Crown Prince Jigme Namgyal Wangchuck', "\n- Htin Kyaw sworn in as president, ushering in a
new era as Aung San Suu Kyi's democracy movement takes
power after 50 years of military domination.", "\nA political deal signed between President Kabila's
ruling coalition and the opposition to delay the
presidential election until 2018 sees Prime Minister
Augustin Matata Ponyo and his cabinet resign, paving
the way for a new cabinet to include opposition figures.", '\nIslamic State attacked and briefly seized
an Iraqi army base in Al Tarah', '\nDrought', '\nBhutan protests to China over its building of a
road in disputed territory', '\n- The United Nations human rights council decides
to set up an investigation into alleged human rights abuses
by the army against the Rohingya Muslim minority.', '\nDR Congo is experiencing a "mega-crisis", with conflict
having forced 1.7 million people to flee their homes
during the year, aid agencies say.', '\n-Iraqi government official threat to Kurdish to
close a border in Northern Iraq follow vote for
independence referendum
-Ongoing suicide car bombings ', '\nMogadishu bombings', 'NA', '\n- President Htin Kyaw resigns on health grounds
and is replaced by Win Myint, a fellow Suu Kyi loyalist.', '\n- Main opposition Union for Democracy and Social
Progress chooses Felix Tshisekedi as its candidate
for the December presidential election.
- Ebola outbreak in the east.', 'Baghdad bombings (two suicide bombings)', '\n-Mogadishu attack
-Cyclone Sagar makes landfall ')
countries_plot <- ggplot(change_in_pop, aes(Year, Individuals, color=Country, shape=Country, label=Events))+
geom_line() +
geom_point(size = .9) +
xlab("Year") +
ylab("Number of Individuals") +
labs(title = "Change in Refugee Resettlements to the U.S. \nof Top 5 Refugee Origins from 2009-2018 ") +
theme(legend.title = element_blank(), plot.title=element_text(hjust=0.5)) +
scale_color_manual(values=our_palette)
ggplotly(countries_plot, tooltip = c('shape', 'y', 'label')) %>%
layout(annotations = list(x = -0.1, y = -0.1, text = "Hover over points to read about major events", showarrow = F, xref = 'paper', yref = 'paper', xanchor = 'left', yanchor = 'auto', xshift = 0, yshift = 0, font = list(size = 11, color = "grey")))
```
Overall, the number of refugees who resettled in the U.S. decreased drastically from 2009 to 2018.
**Bhutan:**
Situated between the superpowers of India and China, the isolated Buddhist kingdom of Bhutan has generated one of the highest numbers of refugees in the world in proportion to its population.
From 1991, over one sixth of Bhutan’s people sought asylum in Nepal, India and other countries around the world. Over 105,000 Bhutanese have spent 15-20 years living in UNHCR-run refugee camps in Nepal. Since 2008, a resettlement process has seen the majority of those living in the camps resettled in the USA, Canada, Australia, New Zealand and Europe (http://bhutaneserefugees.com/).
Overall, the number of Bhutanese refugees who resettled in the U.S. decreased over the 10 years, from 15,000 in 2009 to 707 in 2018.
**Burma:**
The Rohingya people of Burma have faced decades of systematic discrimination, statelessness and targeted violence. Such persecution has forced Rohingya people out of the country for many years, with significant spikes following violent attacks in 1978, 1991-1992, and again in 2016. The events of August 2017 triggered by far the largest and fastest refugee outflow from Burma (https://www.unocha.org/rohingya-refugee-crisis).
There is an overall decrease in the number of Burmese refugees who resettled in the U.S. It first decreased from 2009 to 2013 and then increased from 2013 to 2015, when hundreds of Rohingya Muslim migrants left by sea in flimsy boats. After 2015, however, the number fell drastically to a record low of 3,771 in 2018, despite the rise in overall Rohingya refugees in 2017. We therefore interpret this drop as a reflection of the tightening of U.S. refugee admission policy.
**Democratic Republic of the Congo (D.R.C):**
Despite the DRC’s civil war being brought to an end in 2013, the nation has continued to see sporadic waves of fighting, especially in the eastern parts of the country. Since 2016, a new wave of violence has also affected the DRC’s previously peaceful Kasai region, leaving thousands of civilians struggling for survival (https://www.unhcr.org/en-us/dr-congo-emergency.html).
In addition to widespread violence from armed groups, many displaced people are facing major health risks, including a recent outbreak of Ebola. The eastern provinces of Ituri and North Kivu, which are most affected by the outbreak, are also the areas most affected by displacement and violence (https://www.unrefugees.org/news/democratic-republic-of-the-congo-refugee-crisis-explained/#What%20conflicts%20are%20occurring%20within%20the%20DRC?).
The refugee resettlement trend from the DRC to the U.S. is the reverse of the overall refugee resettlement trend among the 5 countries and experienced an overall increase. It increased drastically from 1,134 in 2009 to 19,829 in 2016, but dropped drastically to 5,252 in 2017. However, there is still an overall increase of 8,171 over the 10 years.
**Iraq:**
After decades of conflict and widespread violence, more than 3.3 million Iraqis have been displaced across the country since 2014. Although armed violence has declined in some parts of the country, armed groups and small-scale military operations continue to carry out unpredictable attacks throughout the country, resulting in new displacements (https://www.unrefugees.org/emergencies/iraq/).
Iraq’s overall refugee resettlement trend in the U.S. is probably the one most closely related to U.S. policy. As we can see from the graph, the refugee resettlement population from Iraq first dropped to a low of 6,339 in 2011, when the U.S. war in Iraq officially ended. However, it peaked again at 20,337 in 2014, when the Iraqi Civil War began and ISIS militants took control of Iraqi cities. This number then fell from 20,337 in 2014 to a low of 91 in 2017.
**Somalia:**
The impact of nearly two-and-a-half decades of armed conflict in Somalia, compounded by drought and other natural hazards, drove over 870,000 Somalis to flee their homes (https://www.unhcr.org/en-us/somalia.html).
We can see from our graph that the number of Somali refugees resettled in the U.S. first increased from 4,620 in 2009 to 10,786 in 2016. Unsurprisingly, however, this number dropped to 139 in 2017.
## Geographical Analysis
In order to visualize the overall refugee resettlement trend within the United States, we chose to use a choropleth map.
We hypothesize that:
1. There would be an overall decrease in refugee resettlement population in the U.S. in all states
2. Coastal states would have more refugees than other states.
```{r, fig.width=9, fig.align='center'}
df_map <- arrival_df
# remove extra 6 'States'
df_map <- df_map[!df_map$State %in% c('American Samoa', 'District of Columbia', 'Guam', 'Puerto Rico', 'Unknown State', 'Virgin Islands'), ]
df_map$State <- as.factor(df_map$State)
df_map$code <- state.abb[match(df_map$State, state.name)]
# add Wyoming and Montana because they are missing
df_add <- data.frame(State = c('Wyoming','Wyoming','Montana'), Cases = c(0, 0, 0), Inds = c(0, 0, 0), Year = c(2009, 2010, 2010), code = c('WY', 'WY', 'MT'))
df_map <- rbind(df_map, df_add)
df_map$State <- tolower(df_map$State)
states <- map_data('state')
map_basic <- ggplot(df_map, aes(map_id = State)) +
geom_map(aes(fill = Inds), map = states) +
expand_limits(x = states$long, y = states$lat) +
facet_wrap(~Year) +
scale_fill_gradient(low="#D6EAF8", high="#154360") +
ggtitle('Refugee Resettlement in the U.S. by State') +
labs(fill = "Number of\nIndividuals") +
theme(plot.title=element_text(hjust=0.5))
map_basic
```
We can see a slight decrease in the overall refugee resettlement population in 2011, followed by a slight increase from 2012 to 2016. However, this population decreased drastically in all states after 2016.
Confirming our hypothesis, we can also see that in all 10 years, the number of refugees in coastal states like California was significantly higher than in other states. Together, states like Texas, California, Washington and New York resettled roughly a quarter of all refugees.
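A share such as "roughly a quarter" can be checked directly from state-level counts. The following is a minimal sketch with made-up numbers; only the column names `State` and `Inds` mirror `arrival_df`, and the data frame `toy_arrivals` is hypothetical.

```r
# Hypothetical arrivals by state for one year (made-up numbers, not real data).
toy_arrivals <- data.frame(
  State = c("California", "Texas", "New York", "Washington", "Ohio", "Idaho"),
  Inds  = c(900, 800, 500, 400, 300, 100)
)

# Share of all individuals resettled in each state.
toy_arrivals$Share <- toy_arrivals$Inds / sum(toy_arrivals$Inds)

# Combined share of a chosen set of states.
top_states <- c("Texas", "California", "Washington", "New York")
top_share  <- sum(toy_arrivals$Share[toy_arrivals$State %in% top_states])
top_share
```

Applied to the real data, the same two steps (a per-state share, then a sum over the states of interest) would give the combined share quoted above.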
In order to further examine the refugee resettlement pattern within the U.S. states, we will be providing an interactive visualization in a later part of this report.
## Demographic Analysis
### Religion
The U.S. has admitted far more Christian refugees than Muslim refugees in recent years. Christians accounted for nearly 80% of refugees who came to the U.S. by the end of 2018. This pattern marks a sharp reversal from several years ago. In fiscal 2016, the number of Muslim refugees admitted reached 38,900, a historic high that outpaced Christian refugee admissions, accounting for 46% of the total number of refugees that resettled to the U.S. that year (https://www.pewresearch.org/fact-tank/2019/10/07/key-facts-about-refugees-to-the-u-s/).
We hypothesize that under President Trump’s administration, the population of Muslim refugees resettled in the U.S. will show the largest decrease compared with other religions.
We decided to visualize this trend using a bar chart faceted by year. We focused on the proportion of each religion in each year, rather than the actual number of individuals, to better compare each religion across the years. Note that for a given year the proportions may not add up to 1 because, as mentioned previously, we only look at the top 4 religions (see the end of section 3.2 Cleaned Data Format).
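The reason the plotted proportions need not sum to 1 can be seen in a small example. This is a sketch with made-up counts; the column names follow the report, but `toy_religion` and the "Other" category are hypothetical.

```r
# Hypothetical counts for one year (made-up numbers).
toy_religion <- data.frame(
  Year     = rep(2009, 5),
  Religion = c("Christian", "Moslem", "Hindu", "Buddhist", "Other"),
  Inds     = c(40, 30, 15, 10, 5)
)

# Proportion of each religion within its year, computed over ALL religions.
toy_religion$Proportion <- toy_religion$Inds /
  ave(toy_religion$Inds, toy_religion$Year, FUN = sum)

# Keep only the top 4 religions, as the report does.
top4 <- toy_religion[toy_religion$Religion %in%
                       c("Christian", "Moslem", "Hindu", "Buddhist"), ]
top4_total <- sum(top4$Proportion)
top4_total
```

Because the denominator includes every religion while only four are kept, the remaining proportions sum to slightly less than 1.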
```{r, fig.height=7, fig.align='center'}
muslim_names = c('Moslem', 'Moslem Suni', 'Moslem Shiite')
religion_df3 <- religion_df %>%
group_by(Year, Religion) %>%
summarise(Inds = sum(Male) + sum(Female)) %>%
mutate(Proportion = Inds / sum(Inds))
# rename to Moslem
religion_df3[religion_df3$Religion %in% muslim_names, 'Religion'] <- 'Moslem'
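# Collapse all rows of one religion (within a year) into a single row,
# summing both the counts and the proportions of the merged sub-groups.
innerFunc <- function(sub_df){
  row <- head(sub_df, 1)
  row$Inds <- sum(sub_df$Inds)
  row$Proportion <- sum(sub_df$Proportion)
  return(row)
}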
religions = c('Christian', 'Moslem', 'Buddhist', 'Hindu')
subset_religion_df3 <- religion_df3[religion_df3$Religion %in% religions, ]
# combines Moslem for each year
create_religion_df <- function(current_df) {
df <- do.call(rbind,
by(data=current_df[current_df$Year==2009, ],
INDICES=current_df[current_df$Year==2009, ]$Religion, FUN=innerFunc))
for (year in c(2010:2018)) {
df2 <- do.call(rbind,
by(data=current_df[current_df$Year==year, ],
INDICES=current_df[current_df$Year==year, ]$Religion, FUN=innerFunc))
df <- rbind(df, df2)
}
return(df)
}
subset_religion_df4 <- create_religion_df(subset_religion_df3)
subset_religion_df4$Religion <- factor(subset_religion_df4$Religion, levels = c('Christian','Moslem','Hindu', 'Buddhist'))
p2 <- ggplot(subset_religion_df4, aes(x= reorder(Religion, -Proportion), y=Proportion, fill=Religion)) +
geom_bar(stat="identity") +
facet_grid(subset_religion_df4$Year) +
labs(x = 'Religion', y = 'Proportion') +
scale_x_discrete(labels = c('Christian','Muslim','Hindu', 'Buddhist')) +
ggtitle('Religion Over Time') +
theme(legend.position = 'none', plot.title=element_text(hjust=0.5))
p2
```
We can see that whereas the proportion of Christian refugees stayed fairly consistent over the 10 years, the proportion of Muslim refugees first increased from 2009 to 2015 and then decreased from 2016 to 2018. Similarly, the proportion of Hindu refugees decreased from 2013 on, whereas the proportion of Buddhist refugees stayed rather consistent throughout the 10 years.
In order to further examine the resettlement pattern of individuals within each religion, we will be providing an interactive visualization in a later part of this report.
### Education Level
Refugee children are found to be five times more likely to be out of school than their non-refugee peers. According to the UNHCR, 76% of refugee adolescents were not in secondary school, only 61% of refugee children attend primary school, and only 3% of refugees enroll in a college or university (https://www.unrefugees.org/refugee-facts/statistics/).
We decided to visualize the education level attained by refugees resettled in the U.S. from 2009 to 2018 with a bar chart faceted by country. We focused on the proportions of each education level for each country instead of the actual number of individuals to better compare each education level across the countries.
```{r, fig.align='center'}
edu_df<- select(filter(education_df, Education != "Bio Data not Complete" & Education != "Unknown"),c(Education,Total, country, Year))
edu_country <- edu_df %>%
group_by(country) %>%
mutate(proportion = Total/sum(Total))
edu_country$Education <- factor(edu_country$Education, levels = c('Kindergarten', 'Primary','Secondary', 'Intermediate', 'Pre-University', 'University/College', 'Graduate School', 'Technical School', 'Professional', 'NONE'))
ggplot(edu_country, aes(x= Education, y=proportion, fill=Education))+
geom_bar(stat="identity") +
facet_grid(edu_country$country) +
labs(x='Education Level', y='Proportion') +
scale_x_discrete(labels = c('Kindergarten', 'Primary','Secondary', 'Intermediate', 'Pre-University', 'University/College', 'Graduate School', 'Technical School', 'Professional', 'No Education')) +
theme(legend.position="none", plot.title=element_text(hjust=0.5), axis.text.x = element_text(angle = 90, hjust = 1)) +
ggtitle("Education Data by Country")
```
We can see that for all 5 countries, most of the refugee population attained an education level of primary school or lower. The DRC and Somalia have the highest proportions of refugees with a primary school education, followed by Bhutan, Burma, and then Iraq. About 30% of the refugees from Bhutan, Burma, and the DRC have a secondary school education, whereas less than 20% of the refugees from Iraq and Somalia do. Less than 10% of the refugees from each of the 5 countries attained an education level higher than professional school.
### Age Group
In order to visualize the distribution of age groups within the refugee resettlement population from each of the 5 countries, we used a bar chart faceted by country. We focused on the proportions of each age group for each country instead of the actual number of individuals to better compare each age group across the countries.
```{r, fig.align='center'}
age_df2 <- age_df %>%
group_by(country) %>%
mutate(Proportion = Total/sum(Total))
age_df2$Age.Group <- factor(age_df2$Age.Group, levels = c("Under 14", "Age 14 to 20", "Age 21 to 30", "Age 31 to 40", "Age 41 to 50", "Age 51 to 64", "Age 65 and Over"))
p3 <- ggplot(age_df2, aes(x= Age.Group, y=Proportion, fill=Age.Group)) +
geom_bar(stat="identity") +
facet_grid(age_df2$country) +
labs(x = 'Age Group', y = 'Proportion') +
theme(legend.position="none", plot.title=element_text(hjust=0.5)) +
ggtitle('Age Group by Country')
p3
```
We can see that DRC and Somalia have the highest proportions of refugees who are under 14.
### Multidimensional Analysis
In order to obtain an overview of the data and observe whether there are any correlations among the demographic variables, we decided to examine the relationships between a few variables using mosaic plots. The specific variables used for multi-dimensional analysis are provided below.
**Age Group and Gender**
We examine the association between age group and gender in the refugee resettlement population.
Our original assumption was that the two variables would be independent of each other, as the events that led to refugee resettlement should have the same effect on different age groups and genders.
```{r, fig.align='center'}
age_df_b = select(filter(age_df), c(Age.Group, Male,Female,country,Year))
melted_data <- melt(age_df_b, id=c("Age.Group","country", "Year"))
melted_data$Age.Group <- factor(melted_data$Age.Group, levels=c("Under 14","Age 14 to 20", "Age 21 to 30", "Age 31 to 40", "Age 41 to 50", "Age 51 to 64", "Age 65 and Over"))
names(melted_data)[4] <- "Gender"
names(melted_data)[5] <- "freq"
melted_data2 <- melted_data %>%
group_by(Age.Group, Gender) %>%
summarize(Total = sum(freq))
ggplot(melted_data2)+
geom_mosaic(
aes(x=product(Gender, Age.Group),
weight=Total,
fill=Gender
),
divider=c("hspine","vspine")
) +
labs(x="Gender", y="Age Group", title="Refugee Resettlements in the U.S. by Age Group and Gender") +
theme(legend.position="none")
```
The mosaic plot shows little to no association between the two variables. Across all age groups, males and females are almost equally represented. Although there are slightly more males than females in the older age groups (age 41 and older), the difference is small.
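A mosaic plot only gives a visual impression of independence. One common complement, which is not part of the original analysis, is Pearson's chi-squared test of independence on the underlying contingency table. The sketch below uses made-up counts for three age groups; `toy_counts` is hypothetical and stands in for a table of totals by `Age.Group` and `Gender`.

```r
# Hypothetical Age.Group x Gender counts (made-up numbers, not real data).
toy_counts <- matrix(
  c(520, 480,
    410, 390,
    300, 310),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    Age.Group = c("Under 14", "Age 14 to 20", "Age 21 to 30"),
    Gender    = c("Male", "Female")
  )
)

# Pearson's chi-squared test of independence (stats package, loaded by default).
indep_test <- chisq.test(toy_counts)
indep_test$parameter  # degrees of freedom: (rows - 1) * (columns - 1)
indep_test$p.value    # a large p-value gives no evidence against independence
```

With very large totals, as in the real data, even tiny imbalances can produce small p-values, so the size of the differences seen in the mosaic plot remains the more informative guide.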
**Education Level and Gender**
We also examine the association between education level and gender in the refugee resettlement population.
Our original assumption was that education would be dependent on gender due to social and cultural conventions that often result in males being prioritized over females to attend school.
```{r, fig.align='center'}
education_df2<- select(filter(education_df, Education != "Bio Data not Complete" & Education != "Unknown"),c(Education, Male, Female, country, Year))
melted_data2 <- melt(education_df2, id=c("Education","country", "Year"))
melted_data2$Education <- factor(melted_data2$Education, levels=c('Kindergarten', 'Primary','Secondary', 'Intermediate', 'Pre-University', 'University/College', 'Graduate School', 'Technical School', 'Professional', 'NONE'))
names(melted_data2)[4] <- "Gender"
names(melted_data2)[5] <- "freq"
melted_data3 <- melted_data2 %>%
group_by(Education, Gender) %>%
summarize(Total = sum(freq))
levels(melted_data3$Education)[levels(melted_data3$Education)=="NONE"] <- "No Education"
ggplot(melted_data3)+
geom_mosaic(
aes(x=product(Gender, Education),
weight=Total,
fill=Gender
),
divider=c("hspine","vspine")
) +
labs(x="Gender", y="Education Level", title="Refugee Resettlements in the U.S. by Education Level and Gender") +
theme(legend.position="none")
```
As we can see, the mosaic plot shows little to no association between the two variables. Contrary to our hypothesis, the gender distribution is almost the same across all education levels. Although slightly more females than males attained the professional, technical and graduate school levels, there is no substantial difference between the education levels of females and males.
Given the nature of our topic and data, we believe that multidimensionality is not the most informative factor in our analysis.
# Interactive Component
## Geographics
```{r, fig.align='center'}
# remove extra 6 'States'
arrival_df2 <- arrival_df[!arrival_df$State %in% c('American Samoa', 'District of Columbia', 'Guam', 'Puerto Rico', 'Unknown State', 'Virgin Islands'), ]
arrival_df2$State <- as.factor(arrival_df2$State)
arrival_df2$code <- state.abb[match(arrival_df2$State, state.name)]
gg <- ggplot(arrival_df2, aes(code, Inds, frame = Year)) +
geom_bar(stat="identity", position='identity') +
labs(x = "State", y="Individuals") +
ggtitle('Refugee Resettlements By State from 2009-2018') +
theme(axis.text.x = element_text(angle = 90, hjust = 1), plot.title=element_text(hjust=0.5))
ggplotly(gg)
```
Through this interactive graph, we can gain a better understanding of the refugee resettlement trends within the U.S. states (2009 - 2018).
In terms of the overall trend, we can observe a significant decrease in overall refugee resettlements in the U.S. from 2009 to 2018. While there was a slight increase from 2015 to 2016, the numbers dropped drastically from 2016 onwards. We can also note that the two states with the largest refugee resettlements are California and Texas, and this held throughout the entire period. Interestingly, 2009 was the only year in which California accepted more than 9,000 refugees, and no other state exceeded this number in any year.
## Demographics: Religion
To dive further into the religions of refugees, we visualized the breakdown of Hindu and Muslim refugees over time in the bar charts below. The options in the drop-down menu are:
* All Individuals: count of individuals in each religion by country
* All Proportions: proportions of each country within a religion
In addition to these two options, there are options to see the breakdown of each category by country. For example, “Hindus from Bhutan” shows the number of Hindu refugees from Bhutan over time, and “Hindus from Bhutan in Proportions” shows the proportion of Hindu refugees who came from Bhutan over time.
The first bar chart shows that the number of Hindu refugees decreased steadily over the past 10 years, mostly driven by Bhutan. Refugees from Bhutan account for more than 99% of Hindu refugees, and their number decreased from nearly 12,000 to 200. This reduction explains the overall decrease in Hindu refugees over the past decade.
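Each drop-down option described above corresponds to a visibility mask over the chart's traces: one `TRUE`/`FALSE` flag per trace. The sketch below builds such button definitions as plain R lists in the shape that plotly's `updatemenus` buttons use with `method = 'restyle'`. The helper `make_button` and the vector `trace_names` are our own illustrative names, not part of the report's code.

```r
# Hypothetical helper: one drop-down button that shows or hides traces.
make_button <- function(label, visible) {
  list(
    method = "restyle",
    args   = list("visible", as.list(visible)),
    label  = label
  )
}

# Traces in the order they are added to the Hindu chart.
trace_names <- c("Bhutan", "Burma", "Bhutan Proportion", "Burma Proportion")

buttons <- list(
  make_button("All Individuals",    c(TRUE, TRUE, FALSE, FALSE)),
  make_button("Hindus from Bhutan", trace_names == "Bhutan"),
  make_button("Hindus from Burma",  trace_names == "Burma"),
  make_button("All Proportions",    c(FALSE, FALSE, TRUE, TRUE))
)

# Inspect the visibility mask of the second button.
str(buttons[[2]]$args)
```

Deriving the masks from `trace_names` keeps each button in sync with the trace order, which is easy to get wrong when the flags are typed by hand.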
```{r}
# Bar charts with drop-down menus for Hindu and Muslim refugees by country
muslim_names = c('Moslem', 'Moslem Suni', 'Moslem Shiite')
religion_df5 = religion_df
religion_df5[religion_df5$Religion %in% muslim_names, 'Religion'] <- 'Moslem'
religions_2 = c('Moslem', 'Hindu')
subset_religion_df5 <- religion_df5[religion_df5$Religion %in% religions_2, ]
subset_religion_df5 <- subset_religion_df5 %>% group_by(Year, Religion, country) %>% summarise(Total = sum(Male)+sum(Female))
#Absolute Number Calculation
hindu_df <- filter(subset_religion_df5, Religion == 'Hindu')
hindu_df_messy <- spread(hindu_df, country, Total)
muslim_df <- filter(subset_religion_df5, Religion == 'Moslem')
muslim_df_messy <- spread(muslim_df, country, Total)
drop <- c('Bhutan')
muslim_df_messy <- muslim_df_messy[, !names(muslim_df_messy) %in% drop]
# Percent Calculation
religion_df6 = religion_df
religion_df6[religion_df6$Religion %in% muslim_names, 'Religion'] <- 'Moslem'
religion_df6 <- religion_df6 %>%
group_by(Year, Religion, country) %>%
summarise(Inds = sum(Male) + sum(Female)) %>%
mutate(Proportion = Inds / sum(Inds))
hindu_percent_df <- filter(religion_df6, Religion == 'Hindu') %>%
select(Year, Religion, country, Proportion) %>%
spread(country, Proportion)
muslim_percent_df <- filter(religion_df6, Religion == 'Moslem') %>%
select(Year, country, Religion, Proportion) %>%
spread(country, Proportion)
muslim_percent_df <- muslim_percent_df[, !names(muslim_percent_df) %in% c('Bhutan')]
setnames(hindu_percent_df, old=c("Bhutan","Burma"), new=c("Bhutan_Percentage", "Burma_Percentage"))
setnames(muslim_percent_df, old=c("Burma", "DRC", "Iraq", "Somalia"), new=c("Burma_Percentage", "DRC_Percentage", "Iraq_Percentage", "Somalia_Percentage"))
hindu_df_messy_final <- merge(hindu_df_messy, hindu_percent_df, by = c("Year", "Religion"))
muslim_df_messy_final <- merge(muslim_df_messy, muslim_percent_df, by = c("Year", "Religion"))
```
```{r, fig.align='center'}
p_hindu_dropdown <- plot_ly(hindu_df_messy_final, x = ~Year) %>%
add_bars(y = ~Bhutan, name = 'Bhutan', marker= list(color=BUHTAN_COLOR)) %>%
add_bars(y = ~Burma, name = 'Burma', marker= list(color=BURMA_COLOR)) %>%
add_bars(y = ~Bhutan_Percentage, name = 'Bhutan Proportion', marker= list(color=BUHTAN_COLOR), visible=F) %>%
add_bars(y = ~Burma_Percentage, name = 'Burma Proportion', marker= list(color=BURMA_COLOR), visible=F) %>%
layout(
title = 'Hindus by Country Over Time',
xaxis = list(domain = c(2009, 2018)),
yaxis = list(title = 'Hindu'),
updatemenus = list(
list(
x = 1.5,
y = 0.8,
buttons = list(
list(method = 'restyle',
args = list('visible', list(TRUE, TRUE, F, F)),
label = 'All Individuals'),
list(method = 'restyle',
args = list('visible', list(TRUE, FALSE, FALSE, FALSE)),
label = '\t\t\tHindus from Bhutan'),
list(method = 'restyle',
args = list('visible', list(FALSE, TRUE, FALSE, FALSE)),
label = '\t\t\tHindus from Burma'),
list(method = 'restyle',
args = list('visible', list(F, F, T, T)),
label = 'All Proportions'),
list(method = 'restyle',
args = list('visible', list(FALSE, FALSE, TRUE, FALSE)),
label = '\t\t\tHindus from \n\t\t\tBhutan in Proportions'),
list(method = 'restyle',
args = list('visible', list(FALSE, FALSE, FALSE, TRUE)),
label = '\t\t\tHindus from \n\t\t\tBurma in Proportions'))