Data Science Workshops & Tutorials

As part of my data science work at Northwestern University, I teach a series of programming workshops. Below are the materials for these workshops, packaged as freestanding tutorials. R code is provided as R Notebooks and Python code as Jupyter notebooks. Let me know if you run into any bugs!

  • Web Scraping in R

    How to download and process data found on the web. I cover structured data, like HTML tables, and unstructured data, like text and images.

  • Building a Shiny App

    Build interactive visualizations of your data and models. Geared toward researchers who want to share their novel datasets with the scientific community.

  • Text Analysis in R

    Working with text data in R has never been easier, thanks to modern tidyverse syntax. I cover the basics of getting started with Natural Language Processing (NLP) in R.

  • Natural Language Toolkit

    NLTK is a widely used package for Natural Language Processing in Python.

  • Journal-Ready Tables from R

    How to export publication-ready data tables and visualizations from R.

  • Parallelized Loops in R

    Some computational processes require loops, but that doesn't mean your code has to run slowly: independent iterations can be spread across multiple CPU cores and run in parallel.

Selected Collaborations

  • Racial Disparity in Arrests Map (2020)

    I co-created this interactive data visualization, built in R with the Shiny framework. The data come from FBI records of nationwide arrests reported by 13,917 police agencies (2,908 county and 11,009 municipal) from 1999 through 2015. [Redbird, Beth, and Kat Albrecht. 2019. "Measuring Racial Disparities in Local and County Police Arrests." Working Paper, Institute for Policy Research, Northwestern University.]

  • Pythonic RIFTEHR (2022)

    RIFTEHR (Relationship Inference From The Electronic Health Records) is an automated algorithm for identifying relatedness between patients in an institution's existing health records, which lets researchers build inferred family trees. This is an important step toward understanding health across generations with big-data approaches. I co-created an improved version of the algorithm written entirely in Python. (in prep)