R and RStudio in Digital Scholarship

This guide introduces the use of R and RStudio in research and instruction and describes the resources available in the Freedman Center.

The Tidyverse

The tidyverse is a collection of R packages designed for data science. These packages share a common philosophy, grammar, and data structures, making it easier to learn and use them together. The tidyverse is especially known for making data manipulation, visualization, and analysis more intuitive and consistent.

The tidyverse was created with real-world data in mind. Base R has many powerful functions, but their syntax and naming conventions are often inconsistent, which can confuse beginners. The tidyverse addresses this by emphasizing clear, readable code that follows a logical, step-by-step workflow, often using a pipe operator (%>% from the magrittr package, or the native |> available since R 4.1). It also promotes the concept of “tidy data,” where each variable is a column and each observation is a row, which makes data analysis more predictable and efficient.

The tidyverse consists of modular packages that each do one thing well but work together, so the same conventions carry through from data import to visualization and reporting. Created by Hadley Wickham and others at RStudio in 2016, the tidyverse was also designed to support teaching and learning, offering a consistent and beginner-friendly approach to modern data science in R.
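The step-by-step workflow described above can be sketched with a short pipeline. This is a minimal illustration, assuming the dplyr package is installed and R 4.1 or later is in use (for the native |> pipe); the mtcars data set ships with R, and the name mpg_by_cyl is chosen here only for the example.

```r
library(dplyr)

# mtcars ships with R: mpg = miles per gallon, cyl = cylinders, hp = horsepower.
# Each pipe passes the result of one step into the next, so the code reads
# top to bottom in the order the operations happen.
mpg_by_cyl <- mtcars |>
  filter(hp > 100) |>                          # keep cars with more than 100 hp
  group_by(cyl) |>                             # one group per cylinder count
  summarize(mean_mpg = mean(mpg), n = n()) |>  # average mpg and row count per group
  arrange(desc(mean_mpg))                      # highest average mpg first

print(mpg_by_cyl)
```

Replacing each |> with %>% should give the same result on older versions of R, since dplyr makes the magrittr pipe available.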

Core Features of the Tidyverse:

  • Tidy data principles: Each variable is a column, each observation is a row, and each type of observational unit is a table.

  • Piping (%>% or |>): Chains function calls so that code reads as a clear, step-by-step sequence.

  • Consistent syntax and function naming across packages.
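To make the tidy data principle concrete, here is a small sketch that reshapes a "wide" table into tidy form. It assumes the tibble and tidyr packages are installed and R 4.1 or later; the table itself (countries A and B, with made-up case counts) is hypothetical and exists only for illustration.

```r
library(tibble)
library(tidyr)

# Hypothetical "wide" table: one column per year, so the variable "year"
# is hidden in the column names instead of being stored as data.
wide <- tibble(
  country = c("A", "B"),
  `2020`  = c(10, 20),
  `2021`  = c(12, 25)
)

# pivot_longer() moves the year columns into rows. Afterwards each row is
# one country-year observation, and year and cases each have their own column.
tidy <- wide |>
  pivot_longer(
    cols      = c(`2020`, `2021`),
    names_to  = "year",
    values_to = "cases"
  )

print(tidy)
```

In the tidy version, year arrives as text because it came from column names; converting it to a number would be a separate step.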

Key Packages in the Tidyverse:

These packages are all loaded when you run library(tidyverse).

  • ggplot2 – for data visualization

  • dplyr – for data manipulation (e.g., filter, select, mutate, summarize)

  • tidyr – for tidying and reshaping data (e.g., pivoting, separating columns)

  • readr – for reading rectangular data (like CSV files)

  • tibble – a modern version of data frames

  • purrr – for functional programming and iteration

  • stringr – for string (text) processing

  • forcats – for working with categorical variables (factors)
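The sketch below shows how several of these packages might be combined in one short script. It assumes the tidyverse is installed and R 4.1 or later is in use; the responses table, its column names, and its values are hypothetical and built inline so the example runs on its own.

```r
library(tidyverse)

# tibble: hypothetical survey responses, created inline for illustration.
responses <- tibble(
  name   = c("  Ada ", "Grace", "Linus ", "Margaret"),
  role   = c("faculty", "student", "student", "staff"),
  visits = c(3, 8, 5, 2)
)

# dplyr + stringr + forcats: clean the text and convert role to a factor.
cleaned <- responses |>
  mutate(
    name = str_trim(name),    # stringr: strip leading and trailing whitespace
    role = fct_infreq(role)   # forcats: order factor levels by frequency
  )

# purrr: split the rows by role, then compute the mean visits for each piece.
visit_means <- cleaned |>
  split(cleaned$role) |>
  map_dbl(\(df) mean(df$visits))

# ggplot2: bar chart of visits by role (drawn when p is printed).
p <- ggplot(cleaned, aes(x = role, y = visits)) +
  geom_col()

print(visit_means)
```

Because every step takes a data frame (or a list of them) and returns one, the output of one package can be handed directly to the next.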