
A Data Carpentry Workshop

Cornell University

June 14-15, 2017

9:00 am - 5:00 pm

Instructors: Erika Mudrak, Lynn Johnson, David Kent, Stephen Parry

Helpers: Emily Davenport, Francoise Vermeylen, Michael Ko

General Information

Our workshop uses Data Carpentry lessons and is aimed at academic researchers in all fields and at all career stages. This hands-on workshop teaches basic concepts, skills and tools for working more effectively with data.

We will cover Data organization in spreadsheets and OpenRefine, Data analysis and visualization in R, R for Reproducible Scientific Analysis, and Introduction to Python. Participants should bring their laptops and plan to participate actively. By the end of the workshop, learners should be able to manage and analyze data more effectively and to apply these tools and approaches directly to their ongoing research.

Who: The course is aimed at faculty, research staff, postdocs, graduate students, advanced undergraduates, and other researchers in any field. Priority will be given to people from Cornell departments that support CSCU. See this page for a list of such departments.

Where: Albert R. Mann Library Room B30A, 237 Mann Drive, Cornell University. Get directions with OpenStreetMap or Google Maps.

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) on which they have administrative privileges. They should have a few specific software packages installed (listed below). They are also required to abide by Data Carpentry's Code of Conduct.

Prerequisites: We especially encourage registration from those who may be less familiar with the topics above. To allow coverage of more advanced R topics, however, we require that participants be familiar enough with R and RStudio to:

  • Navigate the RStudio interface (console, scripts, and the workspace, history, files, plots, packages, and help tabs)
  • Create a project in a new directory
  • Load a data.frame via read.csv() or read.table()
  • Access columns of a data.frame with $
  • Use the assignment operators (<- or =) and write comments with #
  • Explore a data.frame with head(), str(), summary(), nrow(), ncol(), names(), rownames(), table(), levels(), mean(), length(), max(), and min()
  • Modify data types with as.factor(), relevel(), and as.numeric()
  • Create vectors with c() or seq() and index them with [ ] bracket notation
  • Subset data with [,] bracket notation and logical vectors (==, !=, <, >, %in%) as conditions
  • Make simple plots with plot(), barplot(), and hist()
  • Understand the concept of functions and their arguments
  • Get help via ? or by searching the Help tab

If you have never used R or want a refresher, please prepare for the Data Carpentry workshop by attending CSCU's free workshops:

  • Learn the skills above in Introductory Statistics Using R on June 12th
  • Practice the skills above in Intermediate Statistics Using R on June 13th

Registration: We charge a $40 fee to help defray costs. Please register here.

Contact: Please mail for more information.

Preliminary Schedule


Please be sure to complete these surveys before and after the workshop.

Pre-workshop Survey

Post-workshop Survey

Day 1

Morning: Data organization in spreadsheets and OpenRefine
Afternoon: Data analysis and visualization in R

Day 2

Morning: R for Reproducible Scientific Analysis
Afternoon: Introduction to Python

We will use this Etherpad for chatting, taking notes, and sharing URLs and bits of code.


Data for OpenRefine Lesson



Data organization in spreadsheets

Data cleaning with OpenRefine

R day 1

  • Manipulating and rearranging data in R with dplyr and tidyr
  • Visualizing data in R with ggplot2
  • Script

R day 2



To participate in a Data Carpentry workshop, you will need working copies of the described software. Please make sure to install everything (or at least to download the installers) before the start of your workshop. Participants should bring and use their own laptops to ensure that their tools are set up properly for an efficient workflow once they leave the workshop.

Please follow these Setup Instructions.

As a reference for instructors, we maintain a list of common installation issues and their solutions on the Configuration Problems and Solutions wiki page; participants may also find it useful.