
How to Upload an Excel File Into R

[This article was first published on R on Stats and R, and kindly contributed to R-bloggers].



  • Introduction
  • Transform an Excel file to a CSV file
  • R working directory
    • Get working directory
    • Set working directory
      • User-friendly method
      • Via the console
      • Via the text editor
  • Import your dataset
    • User-friendly way
    • Via the text editor
  • Import SPSS (.sav) files

Introduction

As we have seen in this article on how to install R and RStudio, R is useful for many kinds of computational tasks and statistical analyses. However, it would not be so powerful and useful without the possibility to import datasets into R. As you will most likely use R with your own data, being able to import it into R is crucial for any user.

In this article I present two different ways to import an Excel file: (i) via the text editor and (ii) in a more "user-friendly" way. I also discuss the main advantages and disadvantages of both methods. Note that:

  • How to import a dataset often depends on the format of the file (Excel, CSV, text, SPSS, Stata, etc.). I focus here only on Excel files as it is the most common type of file for a dataset
  • There are several other ways to import an Excel file (probably even some I am not aware of), but I present the two simplest yet robust ways to import such files
  • No matter what type of file and how you import it, there is one gold standard regarding how datasets are structured: columns correspond to variables, rows correspond to observations (in the broad sense of the term) and each value must have its own cell:

Structure of a dataset. Source: R for Data Science by Hadley Wickham & Garrett Grolemund
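To make this structure concrete, here is a minimal sketch of a dataset that follows it. The variable names and values are made up purely for illustration; they do not come from any real file.

```r
# Hypothetical example data: each column is a variable,
# each row is an observation, and each value sits in its own cell
dat <- data.frame(
  player = c("A", "B", "C"),
  age    = c(24, 31, 28),
  wins   = c(10, 7, NA)  # NA marks a missing value, still in its own cell
)

n_obs  <- nrow(dat)  # number of observations (rows)
n_vars <- ncol(dat)  # number of variables (columns)

# Compact overview of the variables and their types
str(dat)
```

A file organised this way can be imported with one variable per column without any reshaping afterwards.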


Transform an Excel file to a CSV file

Before dealing with the importation, the first thing is to change the format of your Excel file to CSV. CSV is the standard format when working with datasets and programming languages, as it is a more robust format than Excel. If your file is already in CSV format (with the extension .csv), you can skip this section. If the file is not in CSV format (for example, the extension is .xlsx), you can easily transform it to CSV by following these steps:

  1. Open your Excel file
  2. Click on File > Save as
  3. Choose the format .csv
  4. Click on Save

Check that your file name ends with the extension .csv. If that is the case, your file is now ready to be imported. But first, let me introduce an important concept when importing datasets into RStudio: the working directory.
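If you prefer to check the extension from R rather than by eye, the following is one possible sketch. The file name "data.csv" is only an assumption for illustration; replace it with your own file name.

```r
# Hypothetical file name; replace with the name of your own file
file_name <- "data.csv"

# tools::file_ext() returns the extension without the leading dot
ext <- tolower(tools::file_ext(file_name))

# TRUE if the file name ends with .csv (in any letter case)
is_csv <- ext == "csv"
print(is_csv)
```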

R working directory

Although programming languages may be very powerful, they often need our help, and importing a dataset is no exception. Indeed, before importing your data, you must tell RStudio where your file is located (so let RStudio know in which folder to look for your dataset). But before this, let me introduce the working directory. The working directory is the location (on your computer) where RStudio is currently working (in fact, RStudio is not working across your entire computer; it is working inside one folder of your computer). Regarding this working directory, there are two functions that we will need:

  1. getwd() (wd stands for working directory)
  2. setwd()

Get working directory

In most cases, when you open RStudio, the working directory (so where it is currently working) is different from where your dataset is located. To know which working directory RStudio is currently using, run getwd(). On macOS, this function will most likely return a location such as "/Users/yourname/", while on Windows it will most likely return "c:/Documents/". Do not worry if your working directory is different; what matters is to set the working directory correctly (so where your file is located), not where it is now.
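As a small sketch of how this can be checked in practice, the code below prints the current working directory and lists any CSV files found in it. The result depends entirely on your own computer, so no particular output is shown here.

```r
# Where is R currently working?
current_dir <- getwd()
print(current_dir)

# Which CSV files, if any, are in that folder?
# An empty result suggests your dataset is located somewhere else
csv_files <- list.files(current_dir, pattern = "\\.csv$", ignore.case = TRUE)
print(csv_files)
```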

Set working directory

As mentioned earlier, your dataset is most likely located in a different location than your working directory. Without any action from you, RStudio will never be able to import your file as it is not looking in the right folder (you will see the following error in the console: cannot open file 'data.csv': No such file or directory). Now, in order to specify the correct location of your file (that is, to tell RStudio in which folder it should look for your dataset), you have three options:

  1. the user-friendly method
  2. via the console
  3. via the text editor (see below why it is my preferred option)

User-friendly method

To set the correct folder, so to set the working directory equal to the folder where your file is located, follow these steps:

  1. In the lower right pane of RStudio, click on the tab "Files"
  2. Click on "Home" next to the house icon
  3. Go to the folder where your dataset is located
  4. Click on "More"
  5. Click on "Set As Working Directory"

Set working directory in RStudio (user-friendly method)


Alternatively, you can also set the working directory by clicking on Session > Set Working Directory > Choose Directory…

Set working directory in RStudio (user-friendly method)


As you can see in the console, either of the two methods will actually execute the code setwd() with the path to the folder you specified. So by clicking on the buttons you actually asked RStudio to write a line of code for you. This method has the advantage that you do not need to remember the code and that you will not make a mistake in the name of the path to your folder. The disadvantage is that if you leave RStudio and open it again later, you will have to specify the working directory again as RStudio did not save your actions via the buttons.

Via the console

You can specify the working directory by running setwd("path/to/folder") directly in the console, with "path/to/folder" being the path to the folder containing your dataset (do not forget the "" around the path). However, you will need to run the command again when reopening RStudio.
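Here is a minimal sketch of setting the working directory by code with a small safety check. So that it runs on any computer, it uses R's temporary folder as a stand-in; in practice you would replace data_dir with the path to your own folder. Restoring the previous directory at the end is only done to keep the demonstration harmless.

```r
# Stand-in folder for this sketch; replace with your own path,
# for example "path/to/folder"
data_dir <- tempdir()

# Remember where we were so the demo can restore it afterwards
old_dir <- getwd()

# Only change directory if the folder really exists
if (dir.exists(data_dir)) {
  setwd(data_dir)
} else {
  stop("Folder not found: ", data_dir)
}

# Confirm the change
new_dir <- getwd()
print(new_dir)

# Restore the previous working directory (demo only)
setwd(old_dir)
```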

Via the text editor

This method is actually a combination of the two above:

  1. Set the working directory by following the exact same steps as for the user-friendly method (via the buttons)
  2. Copy the code executed in the console and paste it in the text editor (i.e., your script)

I recommend this method for several reasons. First, you do not need to remember the setwd() function. Second, you will not make typos in the path of your folder (a path which can sometimes be quite long if you have folders inside folders). Third, when saving your script (which I assume you do, otherwise you would lose all your work), you also save the actions you just made via the buttons. So when you reopen your script in the future, no matter what the current directory is, by executing your script (which now includes the line of code for setting the working directory), you will at the same time specify the working directory you selected for this project.

Import your dataset

Now that you have transformed your Excel file into a CSV file and you have specified the folder containing your data by setting the working directory, you are ready to actually import your dataset. Recall that there are two methods to import a file:

  1. in a user-friendly way
  2. via the text editor (see also below why it is my preferred option)

No matter which method you choose, it is a good practice to first open your file in TextEdit (on Mac) or Notepad (on Windows) in order to see the raw data. If you open the file in Excel you will see the data already formatted and thus miss some important information needed for the importation. Below is an example of raw data:

Example of raw data


There are a few things we need to look for in order to properly import our dataset:

  • Are the variable names present?
  • How are the values separated? Comma, semicolon, whitespace, tab?
  • Is the decimal a point or a comma?
  • How are missing values specified? Empty cells, NA, null, 0, other?
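The same inspection can also be done from R itself. The sketch below is only an illustration: it first writes a small made-up CSV file to a temporary location (so that the code runs anywhere), then prints its first raw lines so the questions above can be answered by eye.

```r
# Create a small, made-up CSV file in a temporary location (demo only)
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("name,height,weight",
    "Anna,1.68,61",
    "Ben,1.80,",      # empty cell = missing weight
    "Chloe,1.75,70"),
  csv_path
)

# Read the first lines as plain text, without any parsing
raw_lines <- readLines(csv_path, n = 5)
print(raw_lines)

# Quick hints about the separator used in the header line
has_comma     <- grepl(",", raw_lines[1], fixed = TRUE)
has_semicolon <- grepl(";", raw_lines[1], fixed = TRUE)
```

Here the header line contains the variable names, values are separated by commas, the decimal is a point and the missing value is an empty cell.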

User-friendly way

As shown below, simply click on the file > Import Dataset…

Import dataset in RStudio


A window which looks like this will open:

Import window in RStudio


From this window, you can take a preview of your data, and more importantly, check whether your data seems to have been imported correctly. If your data have been correctly imported, you can click on "Import". If this is not the case, you can change the import options at the bottom of the window (below the data preview) according to the information you gathered when looking at the raw data. Below are the import options you will most likely use:

  • Name: set the name of your dataset (default is the name of the file). Avoid special characters and long names (as you will have to type the name of your dataset several times). I personally rename my datasets with a generic name such as "dat", others use "df" (for dataframe), "data", or even "my_data". You could use more explicit names such as "tennis_data" if you are using data on tennis matches for example. However, the main drawback with using specific names for datasets is that if, for instance, you want to reuse the code you created while analysing tennis data on other datasets, you will need to edit your code by replacing all occurrences of "tennis_data" by the name of your new dataset
  • Skip: specify the number of top rows you want to skip (default is 0). Most of the time, 0 is fine. However, if your file contains some blank rows at the top (or information you want to ignore), set the number of rows to skip
  • First Row as Names: specify whether the variable names are present or not (default is that variable names are present)
  • Delimiter: the character which separates the values. From our raw data above, you can see that the delimiter is a comma (","). Change it to semicolon if your values are separated by ";"
  • NA: how missing values are specified (default is empty cells). From our raw data above, you can see that missing values are just empty cells, so leave NA to default or change it to "empty". Change this option if missing values in your raw data are coded as "NA" or "0" (tip: do not code missing values yourself as "0", otherwise you will not be able to distinguish the true zero values from the missing values)

After changing the import options corresponding to your data, click on "Import". You should now see your dataset in a new window and from there you can start analyzing your data.

This user-friendly method has the advantage that you do not need to remember the code (see the next section for the entire code). However, the main drawback is that your import options will not be saved for future use, so you will need to import your dataset manually each time you open RStudio.

Via the text editor

Similarly to setting the working directory, I also recommend using the text editor instead of the user-friendly method for the simple reason that you can save your import options when using the text editor (and not when using the user-friendly method). Saving your import options in your script (thanks to a line of code) allows you to quickly import your dataset the exact same way without having to repeat all the necessary steps every time you import your dataset. The command to import a CSV file is read.csv() (or read.csv2(), which is equivalent but with other default import options). Here is an example with the same file as in the user-friendly method:

dat <- read.csv(
  file = "data.csv",
  header = TRUE,
  sep = ",",
  dec = "."
)

  • dat <-: name of the dataset in RStudio. This means that after the importation, I will need to refer to the dataset by calling dat
  • file =: name of the file in the working directory. Do not forget the "" around the name, the extension .csv at the end and the fact that RStudio is case sensitive ("Data.csv" will give an error) and space sensitive inside "" ("data .csv" will also throw an error). In our case the file is named "data.csv" so file = "data.csv"
  • header =: are variable names present? The default is TRUE, change it to FALSE if it is not the case in your dataset (TRUE and FALSE are always in capital letters, true will not work!)
  • sep =: separator. Equivalent to delimiter in the user-friendly method. Do not forget the "". In our dataset the separator of the values is a comma so sep = ","
  • dec =: decimal. Do not forget the "". In our dataset, the decimal for the numeric values is a point, so dec = "."
  • I do not write that missing values are coded as empty cells in my dataset because it is the default
  • Last but not least, do not forget that the arguments are separated by a comma

Other arguments exist; run ?read.csv to see all of them.
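To illustrate some of these other arguments, here is a hedged sketch for a file that differs from the example above: values separated by semicolons, a comma as decimal, one line of text to skip at the top, and missing values coded either as empty cells or as "NA". The file content is made up and written to a temporary location only so that the code runs anywhere. The arguments skip and na.strings are the code counterparts of the "Skip" and "NA" options of the user-friendly method.

```r
# Create a made-up semicolon-separated file in a temporary location (demo only)
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("exported by some hypothetical tool",  # line to skip
    "name;height;weight",
    "Anna;1,68;61",
    "Ben;1,80;NA",                         # missing value coded as NA
    "Chloe;1,75;"),                        # missing value as empty cell
  csv_path
)

# Import with explicit options
dat <- read.csv(
  file = csv_path,
  header = TRUE,
  sep = ";",                 # values separated by semicolons
  dec = ",",                 # decimal is a comma
  skip = 1,                  # ignore the first line of the file
  na.strings = c("", "NA")   # both empty cells and "NA" are missing values
)

# read.csv2() uses sep = ";" and dec = "," by default
dat2 <- read.csv2(
  file = csv_path,
  skip = 1,
  na.strings = c("", "NA")
)

print(dat)
```

With these options, height is read as a numeric variable and both missing weights are imported as NA.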

After the importation you can check whether your data have been correctly imported by running View(dat) where dat is the name you chose for your data. A window, similar to the one for the user-friendly method, will display your data. Alternatively you can also run head(dat) to see the first 6 rows and check that it corresponds to your Excel file. If something is not correct, edit the import options and check again. If your dataset has been correctly imported, you can now start analyzing your data. See other articles on R if you want to learn how.
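Besides View() and head(), a few other base R functions can help with this check. The sketch below is an illustration on a small made-up file written to a temporary location; with your own data you would only run the checks on your dat object.

```r
# Create and import a small made-up CSV file (demo only)
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("name,height,weight",
    "Anna,1.68,61",
    "Ben,1.80,",
    "Chloe,1.75,70"),
  csv_path
)
dat <- read.csv(file = csv_path, header = TRUE, sep = ",", dec = ".")

# First rows, to compare with the original file
head(dat)

# Type of each variable: numbers imported as text often
# indicate a wrong separator or decimal option
str(dat)

# Number of rows (observations) and columns (variables)
dims <- dim(dat)

# Number of missing values per variable
missing_per_column <- colSums(is.na(dat))
print(missing_per_column)
```

If a whole dataset ends up in one single column, the separator is usually the first import option to revisit.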

The advantage of importing your dataset directly via the code in the text editor is that your import options will be saved for future use, sparing you from importing it manually every time you open your script. You will, however, need to remember the function read.csv() (not the arguments, since you can always check them in the help documentation).

Import SPSS (.sav) files

Only Excel files are covered in detail here. However, SPSS files (.sav) can also be read in R by using the following command:

library(foreign)
dat <- read.spss(
  file = "filename.sav",
  use.value.labels = TRUE,
  to.data.frame = TRUE
)

The read.spss() function outputs a data frame which retains the characteristics of the .sav file, including the names given to the different levels of the categorical variables and the characteristics of the variables. If you need more information about this command, see the help documentation (library(foreign) and then ?read.spss).

Thanks for reading. I hope this article helped you to import an Excel file in RStudio. If your dataset is correctly imported, learn how to manipulate it. As always, if you find a mistake/bug or if you have any questions, do not hesitate to let me know in the comment section below, raise an issue on GitHub or contact me. Get updates every time a new article is published by subscribing to this blog.


Source: https://www.r-bloggers.com/2019/12/how-to-import-an-excel-file-in-rstudio/