Data Science, Statistics and Visualization Projects

Project Collaboration

The Data Science, Statistics, and Visualization team supports projects across many disciplines. Read about some of our past projects below. To learn more about working with data scientists, statisticians, or visualization specialists to explore new approaches or create research products, please visit our Collaborative Projects page.

Flow Cytometry Analysis Software: RainFlow

RCDS Consultant: Ritika Giri
Collaborators: Luís Amaral, Helio Tejedor

Typical flow cytometry experiments can simultaneously measure the expression of 25-40 proteins in millions of single cells. However, the lack of reproducible analysis techniques makes it difficult to derive actionable insights from these data. RainFlow is an interactive application that automates flow cytometry data analysis to increase reproducibility. It uses a commercially available reagent, rainbow beads, to establish ground-truth signals: broad-spectrum fluorescent beads are mixed with sample cells to create an internal control for each sample. Bead signal properties, such as drift, stretch, and dispersion, can then be used to identify poor-quality samples and to align signals from samples collected across several days. This alignment improves the performance of automated cell-type analysis algorithms: homogeneous cell populations that had been artificially split by technical artifacts are merged, while true biological variation within normal and cancer cells is revealed. The algorithms and models are being incorporated into a Python package and an Apple App Store application.
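The bead-based alignment idea can be illustrated with a minimal sketch, assuming each sample's bead peaks have already been located and matched to the corresponding peaks of a reference sample in one fluorescence channel. It fits a straight line (stretch and drift) by least squares, uses the residual spread of the bead peaks as a dispersion score for quality control, and applies the fit to cell intensities. The names (`BeadFit`, `fit_bead_alignment`, `align_channel`, `is_poor_quality`) and the linear model are illustrative assumptions, not RainFlow's actual code or API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BeadFit:
    stretch: float     # multiplicative scale between sample and reference bead peaks
    drift: float       # additive offset between sample and reference bead peaks
    dispersion: float  # RMS residual of the bead peaks after alignment


def fit_bead_alignment(sample_peaks: Sequence[float], reference_peaks: Sequence[float]) -> BeadFit:
    """Least-squares fit of reference ~= stretch * sample + drift for one channel."""
    if len(sample_peaks) != len(reference_peaks) or len(sample_peaks) < 2:
        raise ValueError("need at least two matched bead peaks")
    n = len(sample_peaks)
    mean_x = sum(sample_peaks) / n
    mean_y = sum(reference_peaks) / n
    sxx = sum((x - mean_x) ** 2 for x in sample_peaks)
    if sxx == 0:
        raise ValueError("sample bead peaks must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sample_peaks, reference_peaks))
    stretch = sxy / sxx
    drift = mean_y - stretch * mean_x
    residuals = [y - (stretch * x + drift) for x, y in zip(sample_peaks, reference_peaks)]
    dispersion = (sum(r * r for r in residuals) / n) ** 0.5
    return BeadFit(stretch, drift, dispersion)


def align_channel(values: Sequence[float], fit: BeadFit) -> list[float]:
    """Map cell intensities from one sample onto the reference scale."""
    return [fit.stretch * v + fit.drift for v in values]


def is_poor_quality(fit: BeadFit, max_dispersion: float) -> bool:
    """Flag a sample whose bead peaks still do not line up well after alignment."""
    return fit.dispersion > max_dispersion


# Usage: this sample reads 10 units high on every bead peak.
fit = fit_bead_alignment([110, 210, 410, 810], [100, 200, 400, 800])
aligned = align_channel([160.0, 505.0], fit)  # stretch=1.0, drift=-10.0 -> [150.0, 495.0]
```

In practice, cytometry intensities are often transformed (for example, to a log-like scale) before alignment; whether and how RainFlow does this is not described here.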

Firefly

RCDS Consultant: Aaron Geller
Collaborators: Alex Gurvich, Claude-André Faucher-Giguère

Firefly is an interactive particle visualization application that enables exploration of very large multidimensional datasets directly in a web browser or with Python. Beyond simply plotting particle locations, Firefly allows users to “fly” through a 3D dataset (e.g., particle positions) and visualize any other particle attributes through color, size, and/or vectors. Users can thereby identify regions of interest for later analysis, which Firefly makes easy through its integration with Python. Read more about Firefly: http://firefly.rcs.northwestern.edu/

P-RIFTEHR: Python code for inferring relationships from EHR data

RCDS Consultants: Colby Witherup Wood, Dan Turner
Researchers and Collaborators: Amy Krefman, Farhad Ghamsari, Alice Lu, Martin Borsje, Lucia Petito, Fernanda Polubriaginof, Daniel Schneider, Faraz Ahmad, Norrina Allen

Electronic health record (EHR) data are a valuable resource for population health research but lack critical information such as relationships between individuals. Emergency contacts in EHRs can be used to link family members, creating a population that is more representative of a community than traditional family cohorts. We revised a published algorithm: relationship inference from the electronic health record (RIFTEHR). Our version, Pythonic RIFTEHR (P-RIFTEHR), identifies a patient's emergency contacts, matches them to existing patients (when available) using network graphs, checks for conflicts, and infers new relationships. Research Computing and Data Services was able to speed up the code by a factor of 60, making it feasible to run on millions of records in the Northwestern Medicine Electronic Data Warehouse.
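As a toy illustration of the match, check, and infer steps, the sketch below links emergency contacts to existing patients by an exact name-and-phone match, flags pairs whose reciprocal relationships disagree, and chains two parent links into a new grandparent relationship. The matching key, the four-term relationship vocabulary, and all function names are hypothetical simplifications chosen for brevity; this is not P-RIFTEHR's implementation, which matches contacts using network graphs.

```python
from collections import defaultdict

# Hypothetical, deliberately small relationship vocabulary and its inverses.
INVERSE = {"parent": "child", "child": "parent", "sibling": "sibling", "spouse": "spouse"}


def match_contacts(contacts, patients):
    """Link emergency contacts to existing patients by exact (name, phone) match.

    contacts: list of (patient_id, contact_name, contact_phone, relationship),
              meaning "the contact is the patient's <relationship>".
    patients: dict patient_id -> (name, phone).
    Returns dict (patient_id, contact_patient_id) -> relationship.
    """
    index = {(name.strip().lower(), phone): pid for pid, (name, phone) in patients.items()}
    links = {}
    for pid, name, phone, rel in contacts:
        matched = index.get((name.strip().lower(), phone))
        if matched is not None and matched != pid:  # contacts with no patient record are dropped
            links[(pid, matched)] = rel
    return links


def find_conflicts(links):
    """Pairs whose two directions disagree, e.g. each lists the other as a parent."""
    conflicts = []
    for (a, b), rel in links.items():
        back = links.get((b, a))
        # a < b reports each conflicting pair once rather than once per direction
        if back is not None and INVERSE.get(rel) != back and a < b:
            conflicts.append((a, b))
    return conflicts


def infer_grandparents(links):
    """Infer new (grandchild, grandparent) pairs by chaining two parent links."""
    parents = defaultdict(set)  # person -> set of that person's parents
    for (a, b), rel in links.items():
        if rel == "parent":
            parents[a].add(b)
        elif rel == "child":
            parents[b].add(a)
    inferred = set()
    for child, child_parents in parents.items():
        for parent in child_parents:
            for grandparent in parents.get(parent, set()):
                if grandparent != child and (child, grandparent) not in links:
                    inferred.add((child, grandparent))
    return inferred
```

Real conflict checks and inference rules would cover many more relationship types; the point here is only the shape of the pipeline.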

Text Analysis for Medical Education

RCDS Consultant: Christina Maimone
Researchers: Celia O’Brien, Brigid Dolan, Marianne Green, Sandra Sanguino, Patricia Garcia

Provided analytical support to the Feinberg School of Medicine Augusta Webster Office of Medical Education on a Stemmler Fund-supported project to incorporate natural language processing methods into the student assessment process. The project developed a predictive model to identify students who may need additional support in developing the skills to meet competency standards, as well as visualizations of students’ assessment portfolios to support faculty reviewers. The analysis included checks for bias across demographic groups to ensure equitable predictions for all students.
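The description does not specify which bias checks were run. One common approach, sketched here under that assumption, compares how often a model flags students in each demographic group (flag rate) and, among students who actually needed support, how often the model flagged them (true-positive rate). The record layout and the function names `group_metrics` and `largest_gap` are hypothetical.

```python
from collections import defaultdict


def group_metrics(records):
    """Per-group flag rate and true-positive rate.

    records: iterable of (group, flagged_by_model, needed_support), the last two bools.
    """
    counts = defaultdict(lambda: {"n": 0, "flagged": 0, "needed": 0, "caught": 0})
    for group, flagged, needed in records:
        c = counts[group]
        c["n"] += 1
        c["flagged"] += int(flagged)
        c["needed"] += int(needed)
        c["caught"] += int(flagged and needed)
    return {
        group: {
            "flag_rate": c["flagged"] / c["n"],
            # None when a group has no students who needed support
            "tpr": c["caught"] / c["needed"] if c["needed"] else None,
        }
        for group, c in counts.items()
    }


def largest_gap(metrics, key):
    """Difference between the highest and lowest value of one metric across groups."""
    values = [m[key] for m in metrics.values() if m[key] is not None]
    return max(values) - min(values) if values else 0.0
```

Equal flag rates alone can hide unequal treatment, which is why the sketch also compares true-positive rates across groups.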

Assessment of STEM Teaching MOOC

RCDS Consultants: Christina Maimone, Matt Ford
Researchers: Bennett Goldberg

Led data analysis efforts to evaluate the effectiveness of a set of related MOOCs aimed at developing inclusive teaching practices for STEM faculty. Extracted user activity data from multiple online course platforms and combined data into unified data sets with standard metrics. Standardized survey response data across multiple survey platforms and survey versions. Developed summary measures of course engagement and learner experiences. Created data visualizations for publication and exploratory analysis. Produced deidentified public datasets and analysis code for publication.

Racial Disparities in Police Arrests Map

RCDS Consultants: Dan Turner, Austin Alleman
Researcher: Beth Redbird

Created an interactive map and supporting data visualizations using R Shiny and Leaflet to allow researchers and the public to explore new measures of differential arrest rates among racial groups, based on data from the Federal Bureau of Investigation's Uniform Crime Reporting (UCR) Program. The application allows users to select a geographical region of interest and view trends in arrest measures over time for chosen groups.

The Global Rules of Art: Data Visualizations

RCDS Consultant: Christina Maimone
Researcher: Larissa Buchholz

Developed several dozen data visualizations and tables for the book The Global Rules of Art to demonstrate trends in the international art market and career trajectories of artists featured through case studies. The visualizations helped make unique data sets accessible to an audience unaccustomed to quantitative analysis.

Inclusive STEM Teaching Text Analysis

RCDS Consultant: Aaron Geller
Researcher: Bennett Goldberg

Explored natural language processing techniques to identify themes in over 200,000 short-answer survey responses about participants’ experiences in a professional development program for inclusive STEM teaching practices. Compared results of NLP models to themes identified through qualitative analysis by the research team.

Masses in the Stellar Graveyard

RCDS Consultants: Frank Elavsky, Aaron Geller
Researchers: Vicky Kalogera, CIERA, LIGO Collaboration

Developed an interactive visualization of gravitational wave data from the LIGO Collaboration demonstrating the merging of neutron stars and black holes into larger black holes. The visualization provides detailed data on the observations, including links to source data. Related visualizations appear in press releases and textbooks.

CHiMaD System Design Toolbox

RCDS Consultant: Aaron Geller
Researchers: Begum Gulsoy, Clay Houser, Jonathan Emery, CHiMaD

The CHiMaD System Design Toolbox is a website developed to help users create, discuss, and modify material system charts. It builds on CHiMaD Materials Design Training workshops and applies the Materials Design philosophy and framework taught therein. Research Computing and Data Services developed the entire site and individual data visualizations used as part of the material design process.

“LinkEx” Linkage Analysis Tool

RCDS Consultant: Aaron Geller
Researchers: Claudia Haase, Tabea Meier

Created an interactive data exploration tool for the Life-Span Development Lab to visualize emotional and physiological measurement correlations across pairs of people interacting during experiments. The tool provides a code-free interface for researchers to upload and analyze data for correlation analysis between pairs of participants, generate interactive figures, customize and download static figures, and analyze and download the results for a larger dataset with many pairs of participants in one batch.
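A minimal sketch of the per-pair correlation and batch steps, assuming each pair's measurements are already aligned in time as equal-length numeric series. Pearson correlation is used here as an illustrative choice; LinkEx's actual statistics, data format, and function names are not described on this page, so `pearson` and `batch_dyad_correlations` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length series (NaN if either is constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def batch_dyad_correlations(dyads):
    """dyads: dict dyad_id -> (partner_a_series, partner_b_series); returns dyad_id -> r."""
    return {dyad_id: pearson(a, b) for dyad_id, (a, b) in dyads.items()}


# Usage with two made-up pairs of participants.
results = batch_dyad_correlations({
    "dyad_01": ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]),
    "dyad_02": ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]),
})  # {'dyad_01': 1.0, 'dyad_02': -1.0}
```

Returning NaN for a constant series, rather than raising, lets a batch over many pairs finish while still marking the pairs that cannot be correlated.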

Pediatric Organ Dysfunction

RCDS Consultant: Aaron Geller
Researcher: L. Nelson Sanchez-Pinto

Designed interactive data visualizations to support ongoing research and upcoming publications on pediatric organ dysfunction using R Shiny. The application allows users to choose multiple visualization types to facilitate alternative data views.

Healthcare Utilization by MS Patients

RCDS Consultants: Christina Maimone, Jose Sotelo
Researcher: Dominique Kinnett-Hopkins

Exploring patterns of use of the healthcare system in Chicago by patients with multiple sclerosis using electronic health record data from CAPriCORN and the Northwestern EDW. The project examines differences in rates of utilization of emergency rooms based on the socioeconomic status of patients’ neighborhoods, their race, their insurance status, and their history of neurological care.

Infant and Child Development Center Participant Recruitment and Data Integration

RCDS Consultants: Colby Witherup Wood, Christina Maimone
Researchers: Infant and Child Development Center

Research Computing and Data Services has worked with the ICDC on several projects to improve data workflows, move experiments online, and integrate multiple data platforms. The work included designing a new participant tracking database and migrating the data, customizing Qualtrics surveys to meet experimental criteria, and writing scripts to match participants between databases.
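Matching participants between databases often comes down to building a normalized key from identifying fields. The sketch below assumes hypothetical `id`, `first_name`, `last_name`, and `birth_date` fields, normalizes names (case, accents, punctuation, whitespace), and sends any record without exactly one candidate to manual review. It illustrates the general technique only; it is not the scripts written for the ICDC.

```python
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = "".join(ch for ch in no_accents.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def match_key(record):
    # Hypothetical field names; real tracking databases will differ.
    return (
        normalize_name(record["first_name"]),
        normalize_name(record["last_name"]),
        record["birth_date"],
    )


def match_participants(db_a, db_b):
    """Return (matched id pairs, ids from db_a needing manual review)."""
    index = {}
    for rec in db_b:
        index.setdefault(match_key(rec), []).append(rec["id"])
    matches, needs_review = [], []
    for rec in db_a:
        candidates = index.get(match_key(rec), [])
        if len(candidates) == 1:
            matches.append((rec["id"], candidates[0]))
        else:  # no candidate, or several ambiguous ones
            needs_review.append(rec["id"])
    return matches, needs_review
```

Routing ambiguous records to a person instead of guessing keeps automated matching conservative, which matters when records describe children and their families.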

Paris Review Author Interviews

RCDS Consultant: Colby Witherup Wood
Researcher: Sarah Fay

Investigated gender differences in 60 years of Paris Review interviews with authors. Text analysis revealed that the word choices of both the interviewer and the author differed between interviews with male and female authors.

EarlyPrint

RCDS Consultant: Philip Burns
Researcher: Martin Mueller

Supported a Mellon Foundation-funded project to transcribe, annotate, and make searchable printed English-language documents from 1473 through the early 1700s. Provided support for multiple aspects of the project, including the natural language processing workflow and indexing documents for search.

Carceral Policy Reform Data Collection Tool

RCDS Consultants: Christina Maimone, Matthew Rich
Researcher: Heather Schoenfeld

Developed a custom tool and database to facilitate the collection of documents related to carceral reform, identify relevant actors, and document linkages between research observations. The web application helped ensure consistency among multiple qualitative researchers, allowed researchers to navigate between records in multiple ways, and supported uploading documents to shared cloud-based storage.

Paper Conservation Analysis

RCDS Consultant: Haley Carter
Researcher: Karissa Muratore

Designed a statistical analysis approach for assessing paper conservation materials and techniques. Wrote code to perform the analysis and created visualizations of the results. The collaboration with Research Computing and Data Services was essential for completing the analysis of a large experiment for the field and making the results accessible to the library conservation audience.

Publication and Press Release Images

RCDS Consultant: Aaron Geller
Researchers: CIERA

Developed figures for refereed journal publications and press releases in collaboration with Northwestern researchers. Figures range from strict representations of real data to artistic interpretations of scientific concepts.

Example figures for refereed journal articles:

Example press release images:

Artist’s impression of GRB 211211A. The kilonova and gamma-ray burst are on the right.
Congressional Newsletters

RCDS Consultant: Christina Maimone
Researcher: Laurel Harbridge-Yong

Extracted email messages sent by members of Congress from a Gmail inbox, parsed their text and metadata, and built a structured database to facilitate analysis. Developed classification rules to identify which messages were relevant to the research topic, reducing the number of documents the research team needed to review manually. Created a custom data review and qualitative coding tool to give researchers a more efficient and reliable workflow for manually reviewing and extracting data from emails of interest. Together, these steps significantly reduced the time needed to complete the research, making the project feasible.
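The extraction and rule-based filtering steps can be sketched with Python's standard `email` and `re` modules, assuming messages are available as raw RFC 5322 text (for example, from an mbox export). The keyword lists are placeholders supplied by the caller; the project's actual research topic and classification rules are not described here, and `Newsletter`, `parse_raw_email`, `compile_rules`, and `is_candidate` are hypothetical names.

```python
import re
from dataclasses import dataclass
from email import message_from_string, policy


@dataclass
class Newsletter:
    sender: str
    subject: str
    body: str


def parse_raw_email(raw):
    """Extract sender, subject, and plain-text body from one raw RFC 5322 message."""
    msg = message_from_string(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))  # None for HTML-only messages
    body = part.get_content() if part is not None else ""
    return Newsletter(str(msg.get("From", "")), str(msg.get("Subject", "")), body)


def compile_rules(keywords):
    """Whole-word, case-insensitive patterns for a list of placeholder keywords."""
    return [re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE) for k in keywords]


def is_candidate(newsletter, include_rules, exclude_rules=()):
    """Keep a message for manual review if it hits an include rule and no exclude rule."""
    text = f"{newsletter.subject}\n{newsletter.body}"
    if any(rule.search(text) for rule in exclude_rules):
        return False
    return any(rule.search(text) for rule in include_rules)
```

Keeping the rules as simple, caller-supplied keyword lists makes it easy for researchers to audit why a message was or was not flagged for review.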