Data Visualization

Getting started in R: First steps

R Commands

R is a command-driven program whose language is a dialect of S (its core is written largely in C), which makes it very flexible. You can either use the interactive command interface or write scripts. R is based on the concept of functions. Functions are a fundamental building block of R: in the simplest sense, a function takes an input, does something to it, and then gives an output.

Functions are always followed by parentheses (), and your arguments go inside the parentheses. The arguments are the specific instructions telling R what to do, including the input. For example, running hist(mydata) will create a histogram of the object "mydata". The function hist() tells R that you want to make a histogram, and the argument mydata inside the parentheses tells R what to make a histogram of. You can add further arguments to a command to specify a subset of the data, change the appearance of the histogram, and much more.
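As a rough illustration of the function-and-argument pattern described above, the sketch below calls hist() first with only its input and then with extra arguments. The contents of mydata are invented for the example (100 random numbers), and the titles and colour are arbitrary choices.

```r
# Made-up data: 100 random values standing in for "mydata".
set.seed(42)
mydata <- rnorm(100)

# Simplest call: one argument, the input.
hist(mydata)

# Same function with further arguments that change the appearance.
h <- hist(mydata,
          breaks = 10,                     # suggested number of bins
          main   = "Histogram of mydata",  # plot title
          xlab   = "Value",                # x-axis label
          col    = "steelblue")            # bar colour

# A subset of the data: only the positive values.
hist(mydata[mydata > 0], main = "Positive values only")

# hist() also returns its results, so the bin counts can be inspected.
print(sum(h$counts))
```

Typing ?hist in the console opens the help page listing every argument the function accepts.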

Useful R Commands

  • saveRDS/readRDS: saving very large datasets as R objects instead of CSV files is extremely useful if you are using R to edit a dataset that would take a long time to import and export each time.
  • if (FALSE) { some code }: to skip a block of code without deleting it; nothing inside the braces is run.
  • Sys.time(): to print the runtime of some code:
                   start_time <- Sys.time()
                   # some code
                   end_time <- Sys.time()
                   end_time - start_time
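  • Sys.info()["nodename"]: returns the name of the computer R is running on, which is useful when one script has to run on several machines.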
  • rgb(red, green, blue, alpha): color function, where alpha sets opacity 
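The commands in this list can be tried together in one short script. The following is a minimal sketch rather than a recommended workflow: the object names (big_data, rds_path, and so on) are made up for illustration, and tempfile() is used only so the example does not write into your project folder. It uses saveRDS(), the base R function that pairs with readRDS().

```r
# Save a data frame as an R object and read it back.
big_data <- data.frame(id = 1:1000, value = sqrt(1:1000))
rds_path <- tempfile(fileext = ".rds")   # throwaway file location
saveRDS(big_data, rds_path)
restored <- readRDS(rds_path)
print(all.equal(big_data, restored))

# Code wrapped in if (FALSE) { } is never run.
skipped_block_ran <- FALSE
if (FALSE) {
  skipped_block_ran <- TRUE
}

# Time a piece of code with Sys.time().
start_time <- Sys.time()
total <- sum(as.numeric(1:1e6))
end_time <- Sys.time()
elapsed <- end_time - start_time   # a "difftime" object
print(elapsed)

# Name of the machine R is running on.
machine <- Sys.info()["nodename"]
print(machine)

# A half-transparent red; all four values are on a 0 to 1 scale by default.
semi_red <- rgb(1, 0, 0, alpha = 0.5)
print(semi_red)
```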

Click here for an overview of functions in R.

Click here for a common R function cheat-sheet.

 

Set Your Working Directory

R reads data from and saves data to the working directory (i.e., a folder on your computer). The default working directory is often the Documents folder, so you'll usually want to change it to where you keep your project data.

To find out the current working directory, type:  getwd()

To set the working directory, type:  setwd("pathname") where pathname is the path to the folder on your computer. You can also use the menus or navigate to the folder in RStudio.

On a Mac, this would look like setwd("/Users/username/Documents/myproject")
On a PC, this would look like setwd("C:/Users/username/Documents/myproject")
                                       or setwd("C:\\Users\\username\\Documents\\myproject")
Note: On PCs, you should use forward slashes or double backslashes in the path rather than single backslashes (the Windows default). This is because R treats a single backslash inside a string as the start of an escape sequence (for example, \n means a new line).
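The commands above can be tried safely with the sketch below. It switches to R's temporary session folder (tempdir()) instead of a real project folder, since that folder exists on every machine, and then switches back. The variable names are invented for the example, and the "username" path is the same placeholder used above.

```r
# Remember where we started so we can return afterwards.
original_wd <- getwd()
print(original_wd)

# Change the working directory and confirm the change.
setwd(tempdir())
print(getwd())

# file.path() joins folder names with forward slashes on every platform.
project_path <- file.path("C:", "Users", "username", "Documents", "myproject")
print(project_path)

# A doubled backslash in code is a single backslash in the stored text.
windows_path <- "C:\\Users\\username"
cat(windows_path, "\n")

# Go back to the original working directory.
setwd(original_wd)
```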

Once you set the working directory, you can refer to a file in the working directory using just its name.
Rather than having to type mydata <- read.csv("C:/Users/username/Documents/myproject/nlsy.csv")
you can just type                mydata <- read.csv("nlsy.csv")
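To see the difference between a full path and a bare file name, here is a small self-contained sketch. It assumes nothing about your own files: it writes a tiny made-up file called example.csv into R's temporary folder, then reads it both ways using the base function read.csv().

```r
original_wd <- getwd()
project_dir <- tempdir()   # stands in for your project folder

# Create a small CSV file to read.
toy <- data.frame(id = 1:3, score = c(10.5, 8.2, 9.9))
write.csv(toy, file.path(project_dir, "example.csv"), row.names = FALSE)

# Option 1: spell out the full path.
from_full_path <- read.csv(file.path(project_dir, "example.csv"))

# Option 2: set the working directory, then use only the file name.
setwd(project_dir)
from_short_name <- read.csv("example.csv")
setwd(original_wd)

# Both approaches load the same data.
print(identical(from_full_path, from_short_name))
```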

R Workspace

This is the working environment inside R, which includes any objects (matrices, vectors, lists, data frames, arrays) that you have defined during your session. In RStudio, these objects are listed in the Environment tab.

When you exit R, it will ask you if you want to save your workspace image. We highly recommend that you choose "no"; otherwise R will begin to run more slowly over time as the workspace fills up. Instead, use R scripts to regenerate objects when you begin a new session. If your work session created objects that are large or time-consuming to recreate, you can save the whole workspace as its own file with save.image() or save selected objects with the save() function.
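One possible way to keep only the objects worth keeping is sketched below. The object names and the temporary file are invented for the example; in practice you would choose a file name inside your project folder.

```r
# Two objects created during a session.
scores <- c(90, 85, 77)
model_input <- data.frame(x = 1:5, y = c(2, 4, 6, 8, 10))

# ls() lists the objects currently in the workspace.
print(ls())

# Save selected objects to a file instead of saving the whole workspace.
saved_file <- tempfile(fileext = ".RData")
save(scores, model_input, file = saved_file)

# Remove them, as if starting a fresh session.
rm(scores, model_input)
print(exists("scores"))

# load() restores the objects under their original names.
load(saved_file)
print(scores)
```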

For more details about the R workspace and working directory: http://www.statmethods.net/interface/workspace.html

Data structures in R

The term 'data structures' refers to how R stores and retrieves data. R has a more flexible set of data structures than traditional statistical software. These data structures include vectors, matrices, data frames, and lists.

For a very clear explanation of data structures in R, check out this guide (go to section 9.4.0.2 Basic data structures).
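As a quick orientation, the sketch below creates one small example of each of the four structures named above. All values and names are made up for illustration.

```r
# Vector: an ordered set of values of one type.
ages <- c(23, 35, 41)

# Matrix: values of one type arranged in rows and columns.
grid <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns

# Data frame: a table whose columns can have different types.
people <- data.frame(name = c("Ana", "Ben", "Cy"), age = ages)

# List: a container that can hold objects of any kind, even other structures.
record <- list(id = 7, tags = c("a", "b"), table = people)

# str() prints a compact description of any object's structure.
str(ages)
str(grid)
str(people)
str(record)
```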

R scripts

In addition to writing commands interactively, you can also write scripts in R. This is an excellent way to make your work repeatable.

R scripts have the *.R extension. In R or RStudio, you can open script files and run either a portion of the script or the entire script. The results appear in the console. To run R code from a script, highlight the commands you would like to run and press Ctrl+R (PC; Ctrl+Enter in RStudio) or Cmd+Enter (Mac) on the keyboard. You can run an entire script using the source() function, though it's usually better to highlight and run portions of a script.
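To show what running a whole script with source() looks like, the sketch below writes a three-line script to a temporary file and then runs it. The script contents and file location are made up; normally the script would be a .R file you wrote yourself.

```r
# Create a tiny script file (normally you would write this in an editor).
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "# Lines starting with a pound sign are comments",
  "x <- 1:10",
  "x_mean <- mean(x)"
), script_path)

# Run every line of the script, top to bottom.
source(script_path)

# Objects created by the script are now available in the session.
print(x_mean)
```

Adding echo = TRUE to the source() call prints each command as it runs, which can help when tracking down which line causes a problem.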

Many of the examples in this guide include sample R code in a script file. You can download these scripts, open them in R, read the comments (indicated by a pound (#) sign), and try the commands for yourself. We suggest running one command at a time in order to understand what each line does.

Packages

R comes with a standard set of (base) packages whose functions handle essential tasks such as importing and exporting data, getting summary statistics, creating tables, and running common statistical tests (regression, ANOVA, etc.). These packages are part of the R source code and are automatically included in any R installation.

Additional packages can also be loaded, adding further functionality to R. Since R is an open-source language, users regularly write and share packages. Find a list of contributed packages at the CRAN repository.

  • To see which packages you have installed: library()
  • To see which packages you have currently loaded: search()
  • To install a package: install.packages("packagename") (the package name must be in quotes)
  • To load a package: library(packagename)
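The commands in this list can be tried with the sketch below. It loads tools, a package that ships with every R installation, so that nothing needs to be downloaded. The install step is left commented out because it requires an internet connection; ggplot2 appears there only as an example of a contributed package name.

```r
# Names of all installed packages.
installed <- rownames(installed.packages())
print("tools" %in% installed)

# Load a package, then confirm it appears among the loaded packages.
library(tools)
print("package:tools" %in% search())

# Installing a contributed package (illustration only; needs internet access):
# install.packages("ggplot2")
# library(ggplot2)
```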

Click here for more information about installing and using contributed packages.

 

 

 

This guide is adapted with permission from Wellesley University Libraries

Last Updated: Sep 6, 2023 12:14 PM