Data Science — Daniel Robert Turner

Data Science Workshops & Tutorials

As part of my data science work at Northwestern University, I am teaching a series of programming workshops. Below you can find the materials for these workshops as freestanding tutorials. R code is in R notebooks and Python code is in Jupyter notebooks. Let me know if you run into any bugs!

Web Scraping in R

How to download and process arbitrary data found on the internet. I cover structured data, like HTML tables, and unstructured data, like text and images.

Course materials

Building a Shiny App

How to build interactive visualizations of your data and models. Geared toward researchers who want to share their novel datasets with the scientific community.

Course materials

Text Analysis in R

Working with text data in R has never been easier, thanks to modern tidyverse syntax. I give an overview of the basics of getting started with Natural Language Processing in R.

Course materials

Natural Language Toolkit

NLTK is a ubiquitous package for doing Natural Language Processing in Python.

Course materials

Journal-Ready Tables from R

How to export publication-ready data tables and visualizations from R.

(coming late 2022)

Parallelized Loops in R

Some computational processes require loops, but that doesn’t mean your code has to run slowly.

(coming late 2022)

Selected Collaborations

Racial Disparity in Arrests Map (2020)

I was a co-creator of the interactive data visualization, built in R on the Shiny framework. The data comes from FBI records of nationwide arrests, reported by 13,917 police agencies, including 2,908 county and 11,009 municipal police, from 1999 through 2015. [Redbird, Beth, and Kat Albrecht. 2019. "Measuring Racial Disparities in Local and County Police Arrests." Working Paper, Institute for Policy Research, Northwestern University.]

Explore the map

Pythonic RIFTEHR (2022)

I was a co-creator of an improved, pure-Python version of RIFTEHR (Relationship Inference From The Electronic Health Records), an automated algorithm for identifying relatedness between patients in an institution's existing health records. In effect, it lets researchers build inferred family trees, which is an important step toward understanding health across generations using big data approaches. (in prep)

GitHub repo

dturner@u.northwestern.edu
