
2022-2023 Data Science, Statistics and Visualization Services Annual Report

During the 2022-2023 academic year, the Data Science, Statistics, and Visualization (DSSV) team within Northwestern IT Research Computing and Data Services supported the data science and visualization needs of Northwestern researchers in multiple ways:

  • Individual consultations help researchers troubleshoot issues, explore new analysis approaches, get assistance with data manipulation, interpret results, and create visualizations.
  • Workshops provide opportunities to learn new skills and explore tools and analysis methods via high-quality training at no cost to the researchers.
  • Bring Your Own Data (BYOD) working groups support researchers undertaking data-intensive work by connecting them to DSSV staff and other researchers across the university for accountability and advice.
  • Collaborative projects bring together DSSV staff’s specialized skills with researchers from many different domains to undertake impactful research.

Workshops

The DSSV team continued to offer virtual workshops throughout the academic year where researchers could participate synchronously or work through workshop materials and a recorded video later at their own pace. We were excited to return to an in-person format for our summer bootcamps where researchers spent multiple days learning introductory and intermediate R and Python skills on both the Evanston and Chicago campuses. The schedule included an expanded series of visualization topics across multiple languages and programs, covering Shiny, Plotly, Bokeh, Glue, matplotlib, ParaView, JavaScript, HTML and CSS basics, and an introduction to data visualization.

Workshops included customized sessions for Research Experiences for Undergraduates (REU) programs in CIERA and Quantitative Biology, and for research analysts from the Global Poverty Research Lab. We also worked with researchers in the Department of Preventive Medicine to facilitate discussion of ways to improve data visualizations.

  • 104 hours of data science and visualization workshops
  • 1,080 workshop attendees
  • 770 distinct researchers registered from over 150 departments and centers
  • 3,600 hours of researchers attending workshops synchronously

Consultations

We support researchers through both one-on-one research consultations and small Bring Your Own Data (BYOD) working groups that meet weekly over a quarter. This year's totals:

  • 245 one-on-one consultations with researchers from 11 of Northwestern’s schools and the Northwestern Prison Education Program, a 14% increase over 2021-2022
  • 42 BYOD working group participants across eight groups

DSSV consultants supported researchers with consultations on a wide range of topics, including:

  • Creating visualizations of RNAseq data
  • Improving the performance of a predictive model
  • Extracting text from PDFs for analysis
  • Analyzing videos of mice movements
  • Collecting data via an API
  • Developing mixed models for regression analysis of an experiment
  • Extracting keywords and phrases from text
  • Addressing missing values in a survey dataset
  • Organizing data analysis workflows and code files
  • Troubleshooting software installations
  • Writing regular expressions for data extraction
  • Speeding up inefficient code
  • Exploring strategies for analyzing rank data
  • Creating an analysis workflow that includes both R and Python
  • Using Gaussian mixture models
  • Fine-tuning an LLM for classification
  • Creating data tables for publication
  • Counting cells in microscope images
  • Merging multiple datasets
  • Collecting data via web scraping
  • Clustering points in a high-dimensional space
  • Parameterizing an analysis script
  • Incorporating survey weights into data analysis
  • Evaluating options for incorporating ChatGPT into a data collection process
  • Adding interactivity to a Qualtrics survey
  • Learning how to use git and GitHub
  • Conducting analysis of social networks

 

[Figure: bar graph of data science and visualization consultations]

Research Collaboration Highlights

The team continued several collaborations from previous years and started new ones. A few projects are highlighted below. You can read more about the DSSV team’s projects in the Project Portfolio.

RainFlow

Collaborating with the Amaral Lab to develop flow cytometry analysis software that supports a reproducible analysis workflow and uses a new clustering algorithm to improve data analysis across multiple samples.

Social Justice News Nexus

Working with Kari Lydersen from the Medill School of Journalism, Media, Integrated Marketing Communications, a Data Science Student Consultant created matching algorithms to combine data on mental health calls to 911 and police arrests across multiple cities.

Astrophysics Illustrations and Graphics

Created multiple graphics to accompany publications from CIERA researchers Claude-André Faucher-Giguère, Vicky Kalogera, Jillian Rastinejad, and Wen-fai Fong on galaxy evolution, the merging of black holes, and a kilonova.

FOIA Dashboard

A Data Science Student Consultant supported a research assistant for Jacqueline Stevens from Political Science in creating an interactive application to explore publicly available FOIA data.

Inclusive STEM Teaching Text Analysis

In a project with Bennett Goldberg, explored natural language processing techniques to identify themes in over 200,000 short-answer survey responses about participants’ experiences in a professional development program for inclusive STEM teaching practices.

Patient Symptom Tracking Visualizations

Working with James Griffith and his collaborators in the Feinberg School of Medicine to generate a visualization of patients’ urological symptoms, both at the current clinic visit and over time, to aid in treatment decisions and the evaluation of outcomes.

External Impact

Team Members

Team members and their positions during the 2022-2023 academic year.

Staff

  • Christina Maimone: Associate Director, Research Data Services
  • Colby Witherup Wood: Lead Data Scientist
  • Aaron Geller: Senior Data Visualization Specialist
  • Ritika Giri: Data Scientist

Data Science Student Consultants

  • Haley Carter, Plant Biology and Conservation
  • Rahul Devathu, Neuroscience and Data Science
  • Daniel Encinas Zevallos, Political Science
  • Arne Holverscheid, Political Science
  • Benjamin Liu, Math/Statistics/MMSS
  • Ren Lopez, Materials Science and Engineering
  • Julianne Murphy, Health Sciences Integrated Program
  • Julie Anh Nguyen, Applied Math
  • Jose Sotelo, Cognitive Psychology
  • Carrie Stallings, Sociology
  • Dan Turner, Linguistics
  • Patrick Zacher, Cognitive Psychology