Skip to content

Latest commit

 

History

History
131 lines (98 loc) · 5.25 KB

importingdata.md

File metadata and controls

131 lines (98 loc) · 5.25 KB

Importing Data into R

Israel Arevalo 2022-09-23

Getting Started

This guide was created to assist a new user of R in importing a dataset. R is a powerful tool that allows the user to import a variety of data types (i.e., .csv, .sav, .dat, and more). To start, we want to download a package called haven by running the following code.

Note
While R supports reading filetypes such as .csv, not all data comes in this format. Therefore, we will be installing the haven package to allow us to import various different file types

install.packages('haven')

Now that we’ve downloaded the haven package, let’s load it into our environment. We will do that by running the library function. Notice that now that we have installed our package, we don’t need to wrap our package’s name with ” ” or ’ ’. However, doing so will not yield an error.

library(haven)

The Haven package will allow you to import a variety of data types into your R environment. In order to read these datasets, we will be using the read_DATATYPE function within the Haven package. You can find the full documentation for the Haven library here. The Haven package allows the user to read .dta, .sas, .spss, and .xpt files.

If you are importing a natively supported datatype such as .csv, click here to skip the following sections and navigate to the (Importing CSV files into R) section of this tutorial.

Before Moving On

Several assumptions are made prior to this process.

  1. You have installed R on your computer.
  2. You have installed an Integrated Development Environment (IDE) like RStudio on your computer.
  3. You have a dataset that you can import into RStudio.

Importing Data into R Environment

Now that we have installed the haven package, let’s start importing our data. Importing your data is the same, regardless of whether you are using a MacOS, Windows, or Linux; however, you may come across issues when running your code on different operating systems. This is due to the fact that the paths (even though they may be in the same folder) may vary slightly due to the nature of the operating system. This can be a problem if you house your dataset in a different folder than the one you created your Rmarkdown/Rscript document. To avoid this issue, a common practice is to house your dataset and your R document within the same folder. However, this may not always be possible. A workaround this issue is to create a read_DATATYPE command for each environment you will be working on and comment out # the one you are not using at the time.

Now, let’s see how importing a file looks like. In the example below, we are importing a dataset that is located within the same folder as our R file and calling it df.

df <- read_spss(mylovelydata.sav)

Now let’s look at how our command would look like if our data is located in a different path than our R file. Notice how the path now needs to be wrapped using a ".

df <- read_spss("E:\Dropbox\Research\Data\mylovelydata.sav")

What about other data types?

Here’s a list of the read functions that come within the haven package to help you import your dataset into R.

# Importing SPSS Data Files
df <- read_spss(mylovelydata.sav)


# Importing Stata DTA Files
df <- read_dat(mylovelydata.dta)


# Importing SAS Data
df <- read_sas(mylovelydata.sas7bdat)

Importing CSV and Excel files into R

Importing a .csv file into R is very similar to importing a file using the haven package; however, instead of using the read_DATATYPE function, we will be using read.csv function from native R. Let’s view this command by importing a dataset called mylovelydata.csv into a dataframe called df.

# Importing CSV File
df <- read.csv("E:\Dropbox\Research\Data\mylovelydata.csv")

Lastly, we will review importing Excel files into R. To do this, we will need to download a package called xlsx that will allow us to read .xlsx files. Below is an example of how we would install and load the package, read the data, and call it ‘df’.

install.packages('xlsx')
library(xlsx)

df <- read.xlsx("E:\Dropbox\Research\Data\mylovelydata.xlsx")

Conclusion

While short, I hope this guide provided you with the basics on importing your data into the R environment. For further details, alternative methods, and documentation on importing data into R, please visit RStudio’s guide on this topic.