Speakers at Northwestern Computational Research Day

The Northwestern Computational Research Day provides opportunities for University faculty, researchers, graduate students, and postdocs to discuss successful practices and challenges in research computing.

Keynote Presentations

Shane Larson, Associate Director of CIERA (Center for Interdisciplinary Exploration and Research in Astrophysics) and WCAS Research Associate Professor of Physics and Astronomy

At Home in a Storm of Stars: Observing, Simulating, and Pondering the Milky Way Galaxy–Louis Room, 10:15 a.m. - 11:15 a.m.

Victoria Stodden, Associate Professor of Information Sciences, University of Illinois at Urbana-Champaign

Reproducibility in Computational Research: Code, Data, Statistics, and Implementation–Louis Room, 1:30 p.m. - 2:30 p.m.

Morning Sessions

11:30 a.m. - 12:15 p.m.

Learning Probabilistic Models for Graph Partitioning and Community Detection–Northwestern Room A
Aravindan Vijayaraghavan, Assistant Professor, Electrical Engineering and Computer Science; Robert R. McCormick School of Engineering and Applied Science

Abstract

The Stochastic Block Model, or Planted Partition Model, is the most widely used probabilistic model for community detection and for clustering graphs in various fields, including machine learning, statistics, and the social sciences. Many existing algorithms (e.g., spectral algorithms) successfully learn the communities or clusters when the data is drawn exactly according to the model. However, many of these guarantees do not hold in the presence of modeling errors, or when there is overlap between the different communities. In this talk, I will address the following question: can we design robust, efficient algorithms for learning probabilistic models for community detection that work in the presence of adversarial modeling errors? I will describe different computationally efficient algorithms that provably recover communities or clusters (up to small recovery error). These algorithmic results hold for probabilistic models that are more general than the stochastic block model, and under different kinds of modeling errors or noise.
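As a toy illustration of the baseline setting described above (the exact-model case, not the robust algorithms of the talk), the sketch below samples a two-community stochastic block model and recovers the planted partition spectrally. All sizes and edge probabilities are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sbm(sizes, p_in, p_out):
    """Sample an undirected graph from a stochastic block model.

    sizes: community sizes; an edge appears with probability p_in
    within a community and p_out between communities.
    """
    labels = np.repeat(np.arange(len(sizes)), sizes)
    n = labels.size
    # Edge-probability matrix: p_in on diagonal blocks, p_out off them.
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(int), labels

adj, labels = sample_sbm([50, 50], p_in=0.5, p_out=0.05)

# Spectral recovery: with two planted communities, the sign pattern of
# the second-largest eigenvector of the adjacency matrix separates them.
eigvals, eigvecs = np.linalg.eigh(adj)
guess = (eigvecs[:, -2] > 0).astype(int)
accuracy = max(np.mean(guess == labels), np.mean(guess != labels))
```

With this large a gap between p_in and p_out the spectral guess is essentially exact; the talk's point is that such guarantees can break down once adversarial modeling errors or overlapping communities are introduced.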

Learn more about Aravindan Vijayaraghavan.

Mechanistic Modeling of the (Bio)Conversion of (Bio)Macromolecules–Arch Room
Linda Broadbelt, Professor, Chemical and Biological Engineering; Robert R. McCormick School of Engineering and Applied Science

Abstract

Fast pyrolysis, a potential strategy for the production of transportation fuels from biomass, involves a complex network of competing reactions, which result in the formation of bio-oil, non-condensable gaseous species, and solid char. Bio-oil is a mixture of anhydro sugars, furan derivatives, and oxygenated aromatic and low molecular weight (LMW) compounds. Previously, the successful modeling of fast pyrolysis reactors for biomass conversion was hampered by lumped kinetic models, which fail to predict the bio-oil composition. Hence, a fundamental understanding of the chemistry and kinetics of biomass pyrolysis is important to evaluate the effects of process parameters such as temperature, residence time, and pressure on the composition of bio-oil. In this talk, a mechanistic model that was recently developed to characterize the primary products of fast pyrolysis of cellulose is described. The kinetic model of pyrolysis of pure cellulose was then extended to describe cellulose decomposition in the presence of sodium salts. To quantify the effect of sodium, a density functional theory study of glucose dehydration, an important class of decomposition reactions of a cellulose-derived intermediate, was carried out. The theoretical results reveal alterations in the reaction rate coefficients when sodium is present and a change in the relative rates of different reactions. These kinetic parameters were used in the kinetic model to describe Na-mediated pathways, capturing trends in the experimental product distributions as the salt loading was increased, based on classic catalytic cycles. In contrast to pyrolysis, conversion of macromolecules such as cellulose in nature takes place at ambient temperature, aided by enzymes. Mechanistic details of the action of these enzymes will also be discussed and contrasted with high-temperature pyrolysis pathways.

We have also developed a computational discovery platform for identifying and analyzing novel biochemical pathways to target chemicals. Automated network generation that defines and implements the chemistry of what we have coined “generalized enzyme functions” based on knowledge compiled in existing biochemical databases is employed. The output is a set of compounds and the pathways connecting them, both known and novel. To identify the most promising of the thousands of different pathways generated, we link the automated network generation algorithms with pathway evaluation tools. The simplest screening metrics to rank pathways are pathway length and number of known reactions. More sophisticated screening tools include thermodynamic feasibility and potential of known enzymes for carrying out novel reactions. Our method for automated generation of pathways creates novel compounds and pathways that have not been reported in biochemical or chemical databases. Thus, our method goes beyond a survey of existing compounds and reactions and provides an alternative to the conventional approaches practiced to develop novel biochemical processes that harness the power of enzymes as catalysts.
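The automated network generation loop described above can be sketched, in highly simplified form, as a breadth-first expansion that applies generalized reaction operators to a growing set of species. The integer "species" and the two operators below are toy placeholders, not the actual chemistry or operator set of the platform.

```python
from collections import deque

# Toy species (integers standing in for compounds) and toy "generalized
# enzyme functions" (operators mapping one species to new species).
# Both are purely illustrative placeholders.
operators = [
    ("hydrolysis", lambda s: [s - 1] if s > 0 else []),
    ("condensation", lambda s: [s + 2] if s < 8 else []),
]

def generate_network(seeds, max_generations=3):
    """Breadth-first network generation: apply every operator to every
    known species, recording (reactant, operator, product) edges."""
    known = set(seeds)
    edges = []
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        species, gen = frontier.popleft()
        if gen >= max_generations:
            continue
        for name, op in operators:
            for product in op(species):
                edges.append((species, name, product))
                if product not in known:
                    known.add(product)
                    frontier.append((product, gen + 1))
    return known, edges

species, reactions = generate_network([4])
```

The output mirrors the abstract's description: a set of compounds plus the pathway edges connecting them, which can then be ranked by the simplest screening metric mentioned, pathway length.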

Learn more about Linda Broadbelt.

Analysis Animation: A New Paradigm for Exploring Population Omics Data–Lake Room
Denise Scholtens, Professor of Preventive Medicine (Biostatistics) and Neurological Surgery; Chief, Division of Biostatistics

Abstract

Integration of genetics and metabolomics data demands careful accounting of complex dependencies, particularly when modeling familial omics data, for example, to study fetal programming of related maternal-offspring phenotypes. Efforts to find ‘genetically determined metabotypes’ using classic genome-wide association study (GWAS) approaches have proven useful for characterizing complex disease, but conclusions are often limited to a disjointed series of variant-metabolite associations. Our research group is adapting Bayesian network models to integrate metabotypes with maternal-fetal genetic dependencies and metabolic profile correlations. Using data from the multiethnic Hyperglycemia and Adverse Pregnancy Outcome (HAPO) Study, we demonstrate that strategic specification of ordered dependencies, pre-filtering of candidate metabotypes, spinglass clustering of metabolites, and conditional linear Gaussian methods clarify fetal programming of newborn adiposity related to maternal glycemia. Exploration of network growth over a range of penalty parameters, coupled with interactive plotting and external validation using publicly available results, facilitates interpretation of network edges. These methods are broadly applicable to integration of diverse omics data for related individuals.
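The metabolite-clustering step can be illustrated with a deliberately simplified sketch: group variables whose pairwise correlation exceeds a threshold, using union-find. Note this is a stand-in, not spinglass clustering proper (which is a modularity-based community detection method); the metabolite names and profiles below are synthetic.

```python
import random

random.seed(0)

def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def cluster_by_correlation(profiles, threshold=0.8):
    """Union-find clustering: metabolites whose pairwise correlation
    magnitude exceeds the threshold land in the same cluster."""
    names = list(profiles)
    parent = {m: m for m in names}
    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path compression
            m = parent[m]
        return m
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(correlation(profiles[a], profiles[b])) >= threshold:
                parent[find(a)] = find(b)
    clusters = {}
    for m in names:
        clusters.setdefault(find(m), []).append(m)
    return list(clusters.values())

# Synthetic profiles: met_a and met_b are strongly correlated; met_c is noise.
base = [random.gauss(0, 1) for _ in range(50)]
profiles = {
    "met_a": base,
    "met_b": [x + random.gauss(0, 0.1) for x in base],
    "met_c": [random.gauss(0, 1) for _ in range(50)],
}
clusters = cluster_by_correlation(profiles)
```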

Learn more about Denise Scholtens.

Afternoon I Sessions

2:30 - 3:15 p.m.

Balloon-borne Observations of the Birth of Stars and Planets in Magnetized Galactic Clouds–Northwestern Room A
Giles Novak, Professor, Physics and Astronomy; Judd A. and Marjorie Weinberg College of Arts and Sciences

Abstract

The Galactic magnetic fields that permeate the coldest interstellar clouds, where new stars and planets are born, are believed to control many aspects of the stellar birth process. These fields are notoriously difficult to observe, but advanced detectors for submillimeter astronomy, combined with the availability of long-duration balloons flown over Antarctica, are now making such observations possible on a large scale. Separating (a) the tiny astrophysical signals that carry information on cosmic magnetic fields from (b) the large thermal signals produced by the optics and residual atmosphere is a job for cluster computing. I will review this problem and other challenges, discuss recent results from our NASA-funded program of Antarctic ballooning, and summarize future plans.

Learn more about Giles Novak.

Leveraging computational processes and neuroimaging data to understand the developing human brain–Northwestern Room B
Elizabeth Norton, Assistant Professor, Communication Sciences and Disorders; School of Communication

Abstract

The human brain is impressively complex, and only recently have we started to understand the ways in which differences in brain structure and function can lead to meaningful behavioral differences between individuals. An even greater challenge for our field to tackle is to understand how the brain changes over development. Neuroimaging studies using functional and structural MRI, EEG, and other technologies now let us peer into the minds of adults and children with disorders such as dyslexia or autism, and even into preschoolers and infants who might develop these disorders. This presentation will discuss the computational tools that we use in our lab as cognitive neuroscientists to better understand these brain processes, such as software that allows us to track fibers and segment meaningful brain areas from MRI data, and to analyze and integrate EEG data from two people during interaction. We will discuss how brain measures and computational approaches might be used in a practical way to understand and predict common developmental disorders.

Learn more about Elizabeth Norton.

Using data to understand the social and systemic drivers of HIV–Lake Room
Michelle Birkett, Assistant Professor, Medical Social Sciences; Feinberg School of Medicine

Abstract

Understanding the drivers of health disparities within populations is extremely complex, particularly within stigmatized populations, such as racial or sexual and gender minorities. Health disparities are thought to arise from intersecting individual, relational, and environmental processes driven by stigma, but little is known about the exact pathways. This talk will present how utilizing large and diverse datasets, as well as a systems perspective, allows researchers to understand the social and contextual drivers of these disparities.

Learn more about Michelle Birkett.

Afternoon II Sessions

3:30 - 4:15 p.m.

Good, Fast, Cheap: Applying the Iron Triangle to Big Data Governance–Lake Room
Justin Starren, Professor, Preventive Medicine, Health and Biomedical Informatics Division; Feinberg School of Medicine

Abstract

The presentation builds upon the experience of the Northwestern Medicine Enterprise Data Warehouse (EDW), a mature, decade-old warehouse containing over 6.7 million patients and data from 142 discrete source systems. The presentation will discuss various governance models for large mixed-use data repositories and the relative advantages and risks of each. Our experience is that the “politics,” such as policies governing access and ownership, are much more difficult and time consuming than the technical “plumbing” of the data handling. We will discuss how these issues are magnified in multi-institution repositories, and some strategies for addressing these challenges.

The presentation will introduce the concept of the Iron Triangle and discuss how it can be expanded into the Iron Box to address the unique challenges of big data-centric solutions. A graphical version of the Iron Box will be presented, along with a discussion of how the tool can be applied to these tradeoffs and used to facilitate organizational strategy discussions. A hands-on exercise will help participants learn to apply the tool. The presentation will conclude with case studies demonstrating how a shared-model EDW can facilitate advanced analytics and a Learning Health System.

Vision Science in Visualization–Northwestern Room B
Christie Nothelfer, PhD candidate in the Brain, Behavior & Cognition program in the Department of Psychology

Abstract

Data are ubiquitous. Data visualizations are powerful ways to explore data, and a persuasive way to communicate the patterns within. How do we design effective visualizations? One route is to incorporate research on human vision, designing visualizations that leverage the strengths and weaknesses of the human perceptual system. While the visualization and vision science research communities have historically maintained minimal contact, they have much to teach each other, and collaborative research questions can advance both fields. For example, should we use multiple visual features, like colors and shapes, to distinguish groups of data in a scatterplot, even if it adds more visual complexity? How many sections of a line chart can we perceive and remember? How quickly can we extract relationships between data points, such as whether a bar graph contains more increases or decreases in value? I will present research at this exciting intersection of visualization and vision science, and demonstrate how both research communities can benefit from cross-disciplinary work.

Learn more about Christie Nothelfer.

Building Big Data to Measure Legal Bias–Arch Room
Kat Albrecht, PhD candidate, Department of Sociology; Judd A. and Marjorie Weinberg College of Arts and Sciences

Abstract

Crime data can be extremely difficult to use when you are trying to measure actual crime. One reason for this is that the court system arbitrates the crime categories used by researchers, who often proceed without a clear understanding of the separation between the original criminal event and the court-derived data. Compounding this measurement issue are human sources of bias within the justice system, which the academic literature most notably identifies with judges. While it is true that judges wield a great deal of discretion, over 95% of federal cases are actually decided via plea bargaining, giving prosecutors widespread influence over most plea deals. This project focuses on creating victim-offender sentencing dyads from Florida homicide data (N=43,459) that include prosecutor/defense attorney information in order to probe this potential source of bias. The project relies heavily on computational methods, including transforming PDF records into digital data, scraping massive amounts of newspaper data to develop condensed victim profiles, fuzzy matching to construct a large cohesive dataset, and tests of APIs designed to predict race/ethnicity from images. In addition to substantive findings of racial bias among prosecutors, the benchmarking and functionality of these computational processes are discussed.
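The fuzzy-matching step mentioned above can be illustrated with a minimal sketch using Python's standard library `difflib`; the names and the similarity threshold below are invented for illustration and are not drawn from the actual Florida data.

```python
from difflib import SequenceMatcher

def best_fuzzy_match(name, candidates, threshold=0.85):
    """Return the candidate most similar to `name`, or None if no
    candidate clears the similarity threshold."""
    def similarity(a, b):
        # Ratio of matched characters to total characters, in [0, 1].
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()
    scored = [(similarity(name, c), c) for c in candidates]
    score, match = max(scored)
    return match if score >= threshold else None

# Hypothetical court-record spellings to match newspaper names against:
court_names = ["Jonathan A. Smith", "Maria Gonzales"]
result = best_fuzzy_match("Maria Gonzalez", court_names)
```

In practice the threshold trades false merges against missed matches, so any real record-linkage pipeline would tune it against hand-labeled pairs.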

Statistical Signals Underlying Repeated Attempts That Lead to Success–Arch Room
Yian Yin, PhD candidate, Department of Industrial Engineering and Management Sciences; Robert R. McCormick School of Engineering and Applied Science

Abstract

Success is often preceded by repeated attempts, but little is known about the statistical signatures underlying repeated attempts that eventually lead to success. Two important questions regarding failures have remained open. First, are there quantitative patterns governing the dynamics of repeated attempts that fail? Second, since repeated attempts lead to only two outcomes, success and non-success, are there early statistical signatures in their dynamics that distinguish the two cases? Here we explore simple models of repeated failures by assuming a finite number of past attempts to learn from. We find that there exist regimes with fundamentally different characteristics as we tune the memory length, ranging from a random phase, where each attempt has a similar time cost and performance converges, to a cumulative phase, where the time cost decreases as a power law of the number of past attempts and performance improves continuously, consistent with the universal learning curve that has been investigated extensively across multiple levels and disciplines. To validate our theory, we leverage millions of records from three datasets, ranging from scientists applying for research grants, to entrepreneurs working on startup ventures, to terrorist organizations launching armed attacks. All three systems support the predicted co-existence of different phases. Our findings uncover new statistical indicators within failures that can predict the onset of future success, which not only have practical implications for science but also improve our understanding of predictive patterns underlying complex interconnected systems.
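The power-law decrease in time cost that characterizes the cumulative phase can be checked on data with a simple log-log regression. The attempt times below are synthetic, generated from an assumed exponent, and stand in for the real grant, startup, or attack records.

```python
import math
import random

random.seed(1)

# Synthetic attempt times t_k ~ k^(-gamma) with multiplicative noise,
# standing in for the time cost of the k-th consecutive failed attempt.
gamma_true = 0.5
times = [k ** -gamma_true * random.uniform(0.8, 1.2) for k in range(1, 101)]

# Fit log t = c - gamma * log k by least squares. In the cumulative
# phase the estimated gamma is positive; in the random phase, where
# each attempt costs about the same, it would be near zero.
xs = [math.log(k) for k in range(1, 101)]
ys = [math.log(t) for t in times]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
gamma_hat = -slope
```

The sign and size of the fitted exponent is exactly the kind of early statistical signature the abstract describes: it can be estimated from the first failures alone, before the eventual outcome is known.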

Last Updated: 2 April 2018
