2024-2025 Data Science, Statistics, and Visualization Services Annual Report

During the 2024–2025 academic year, the Data Science, Statistics, and Visualization (DSSV) team within Northwestern IT Research Computing and Data Services strengthened its role as a trusted partner for researchers across all Northwestern schools and colleges. Through workshops, consultations, and collaborative projects, DSSV empowered the Northwestern community to tackle complex data challenges and explore emerging technologies.

This year, we taught 237 hours of workshops, up 10% from last year, featuring our summer Code Academy series, advanced R and Python tracks, and new sessions on AI models and workflows. Beyond training, DSSV provided personalized consultations to 236 researchers from every Northwestern school and college and hosted 47 researchers in Bring Your Own Data Working Groups, which saw a 20% increase in attendance this year and fostered unique cross-disciplinary collaborations. DSSV partnered on 17 faculty-led projects, about a quarter of which incorporated AI components. Overall, collaboration between Northwestern researchers and DSSV continues to deepen: more than half of participants in our service areas seek us out for additional support after their first engagement.

Workshops

Workshops give researchers opportunities to learn new skills and explore tools and analysis methods through high-quality training at little or no cost.

The DSSV team offered workshops on Zoom and on the Evanston and Chicago campuses throughout the year. In addition to our popular multi-day, in-person Code Academy summer series in R and Python fundamentals, this year we introduced new advanced topics in our Next Steps in R and Next Steps in Python virtual series. Responding to a growing need among researchers, we also offered new workshops covering AI models and workflows.

237 Hours of Workshops Offered (+10% from FY24)
976 Individuals Registered for One or More Workshops
56% of Workshop Participants Registered for Multiple Workshops
100% of Workshop Attendees Said They Learned Something New
4 out of 5 Workshops with Highest Attendance were AI-related

[Chart: Role of Workshop Registrants]

Consultations

This year, we offered Bring Your Own Data (BYOD) Working Groups in Bioinformatics and Genomics, Social Sciences, and Modeling. Each BYOD group brings together a small number of researchers who are working on related data-intensive projects and who meet weekly over a quarter. These groups supported 47 researchers, a 20% increase over last year, and fostered unique cross-disciplinary collaborations.

In FY25, we supported researchers in every Northwestern school and college through our one-on-one consultation service, which helps researchers troubleshoot issues, explore new analysis approaches, get assistance with data manipulation, interpret results, and create visualizations. Consultations cover a wide range of topics, including statistics, data science, AI, data visualization, and software engineering. The number of AI-related consultations began rising in FY24 and continued to rise in FY25: AI-related consult requests increased by over 200% from FY23 to FY25 and now comprise nearly 10% of all DSSV consults, reflecting the rapidly growing demand for AI-related research support.

Specific topics on which DSSV consultants supported researchers this year included:

  • Using machine learning methods to analyze animal behavior and neuron activity
  • Extracting information from articles using Named Entity Recognition
  • Finetuning LLMs from HuggingFace for information extraction tasks
  • Visualizing qualitative study relationship outcomes and mediators
  • Improving the clarity of data visualizations for publication
  • Replicating research results to validate workflows
  • Reducing bottlenecks in R and Python code
  • Clustering large datasets using k-means and DTW
  • Creating bulk RNA heatmaps and annotating cell counts on slides
  • Organizing longitudinal participant data with variable measures
  • Parallelizing PyTorch code for training large models
  • Extracting nucleotide sequences from VCF to BED files
  • Optimizing Python code for dimensionality reduction on CPU and GPU
  • Conducting repeated measures analysis for indistinguishable dyads in R
  • Separating highly correlated independent variables in regression analysis
  • Applying Bayesian hierarchical modeling in R using Stan
  • Performing sentiment analyses using images and text
  • Selecting appropriate statistical tests for analyzing experimental data
  • Quantifying uncertainty in summary statistics
  • Handling highly correlated variables in statistical analyses
  • Correcting for missing data in large datasets

301 One-on-one Researcher Consultations
236 Distinct Researchers Supported through Consultations
100 Distinct Researchers Supported through Statistical Consulting
35% of Consultation Requests were from Returning Consultation Clients
47 BYOD Working Group Participants (+20% from FY24)

[Chart: Role of Researchers Requesting Consultations]

Collaborative Project Support

Collaborative projects bring together DSSV staff, with their specialized skills, and faculty researchers from many different domains to undertake impactful research. The team continued several collaborations from previous years and started new ones. A few projects are highlighted below. You can read more about the DSSV team’s projects in the Project Portfolio.

In addition to projects in data collection, analysis, and visualization, we assisted with grant proposal preparation and exploratory investigations to determine proof of concept, allowing us to assist faculty beginning at the earliest stages of a project. Several researchers also requested help incorporating AI methods into their projects.

17 Faculty-led Projects Active during 2024-2025
27% of Projects had a Substantial AI Component

Project Highlights

Updating the PSID Clan Formation Codebase

Our team updated and extended Doron Shiffer-Sebba's (Sociology) codebase that analyzes family structures using data from the Panel Study of Income Dynamics (PSID), the longest-running longitudinal household study in the world. The new code is more modular, reproducible, scalable, efficient, and user-friendly.

Quantifying Contempt in Court Cases

We partnered with the SCALES research team (David Schwartz and Zachary Clopton, Law) to build data processing pipelines and extract data from a unique set of full-text federal court documents. The project will enable the research team to better understand federal court operations and practices.

Astrophysics Visualizations

DSSV staff created multiple graphics for CIERA researchers, including a diagram for a journal article on Mars’ climate in collaboration with Hooman Mohseni (McCormick), a schematic diagram of a telescope and related instruments for a grant proposal submitted by Jason Wang (Weinberg), and an update to the Masses in the Stellar Graveyard interactive online visualization to include a new LIGO data release.

FOIA Dashboard

Building on a previous collaboration with Jacqueline Stevens’s (Political Science) research group, our team further developed an online interactive dashboard to explore FOIA requests. We optimized the data extraction workflow, expanded the amount of government data available to the user tenfold, refactored and simplified the codebase, improved the dashboard’s usability, and developed detailed documentation to support long-term maintenance by Professor Stevens’s group members.

Global Legal Education Trends

In collaboration with Carole Silver (Law), we cleaned and analyzed data on visa approvals between 2010 and 2020 as part of a study on international students pursuing US law degrees. This ongoing project has already led to a journal article: Carole Silver and Ritika Giri, "The Importance of Commonality and Difference in Global Legal Education Communities," European Journal of Legal Education, Vol. 6, No. 1, June 2025, 59–81.

Automated Collection of Counselor Educator Job Postings

With Gideon Litherland (Counseling), our team developed an automated pipeline for the collection, cloud-based storage, and initial analysis of data from multiple online job posting resources. To date, the system has collected over 1,500 postings and allows researchers to independently analyze the data and update their database with minimal technical effort, removing the barriers that previously made this process cumbersome.

Documentation of AI Workflows to Enable Cutting-Edge Research

DSSV staff created tutorials on how to use Large Language Models (LLMs) on Quest (Northwestern's high-performance computing cluster) and how to access LLMs and other AI tools on Microsoft Azure. This documentation enables a broader range of Northwestern researchers to incorporate AI models into their research.

Student Consultant Program

Each year, talented Northwestern graduate and undergraduate students apply to our highly competitive Student Data Science Consultant Program. Students come into the program with experience in data science, statistics, or visualization and develop their professional and advanced technical skills through hands-on work with other Northwestern researchers.

51% of Research Consultations Were Conducted by Student Consultants

2024-2025 Data Science, Statistics, and Visualization Student Consultants

  • Trudy Adjei, Biomedical Engineering
  • Saki Amagai, Health and Biomedical Informatics
  • Tuba Dolar, Mechanical Engineering
  • Jonathan Doriscar, Social Psychology
  • Daniel Encinas Zevallos, Political Science
  • Arne Holverscheid, Political Science
  • Yangyang Li, Driskill Graduate Program in Life Sciences
  • Nick Markov, Bioinformatics and Computational Biology
  • Julie Anh Nguyen, Applied Math
  • Yaelle Pierre, Data Science (Undergraduate)
  • Maalavika Pillai, Interdisciplinary Biological Sciences
  • Edwin Pokisa, Data Science (Undergraduate)
  • Anthony Pulvino, Interdisciplinary Biological Sciences
  • Elizabeth Teng, Astronomy
  • Samvardhan Vishnoi, Physics
  • Eileen Zheng Wu, Psychology

2024-2025 Data Science, Statistics, and Visualization Staff Members

Team members and their positions during the 2024-2025 academic year.

  • Christina Maimone, Associate Director, Research Data Services
  • Colby Witherup Wood, Manager, Data Science Services
  • David Nichols, Lead Statistician
  • Aaron Geller, Lead Data Scientist
  • efrén cruz cortés, Data Scientist
  • Ritika Giri, Data Scientist
  • John Lee, Data Scientist
  • Emilio Lehoucq, Data Scientist
  • Jillian Whitton, Statistician

Impacts from Data Science, Statistics, and Visualization Staff beyond Research Computing and Data Services

DSSV staff contributed to the following communities during the 2024-2025 academic year.

At Northwestern:

  • Northwestern Information Technology (IT) Services & Support Engagement Committee
  • Northwestern IT Together Advisory Board
  • Northwestern IT Managers and Team Leads (MTL) group Steering Committee
  • Northwestern IT Generative AI Community of Practice Steering Committee
  • Northwestern IT Employee Excellence Award Committee
  • Computation and Data Exchange (CoDEx) Symposium Session Chairs

Beyond Northwestern:

  • Big Ten Women in Technology
  • Big Ten Academic Alliance Love Data Week
  • American Statistical Association LGBTQ+ Advocacy Committee
  • Campus Research Computing Consortium (CaRCC) Research Computing and Data (RCD) Professionalization Working Group
  • US Research Software Engineer Association
  • Journal of Open Source Software
  • Quantitative Staff Network

DSSV staff also presented at, or contributed as a volunteer or reviewer to, the following conferences:

  • US Research Software Engineer Association Conference (USRSE’25)
  • International Conference on Computational Social Science (IC2S2)
  • Women in Statistics and Data Science
  • Practice & Experience in Advanced Research Computing (PEARC)
  • 2024 International Day of Women in Statistics
  • Joint Statistical Meetings of the American Statistical Association