From Bikers to Batters: Steering Statistics Through Real-World Problems
- Glazer, Amanda
- Advisor(s): Stark, Philip
Abstract
A problem-first approach to statistics develops statistical methods directly from real-world questions and problems. This dissertation illustrates this approach through the development of statistical methods and tools in four disciplines: active transportation, higher education, election auditing, and sports. Causal inference and nonparametric methods are emphasized because they avoid parametric assumptions, which are typically incorrect.
The second chapter focuses on active transportation and problems with ensuring data quality. Sufficiently accurate bicycle and pedestrian counts are useful for improving safety analyses, planning infrastructure, and prioritizing funding. The accuracy of instrumental counts is affected by the instrument’s sensing technology, details of siting and installation, calibration, random error, and malfunctions. Some of these errors cannot be detected without an independent, accurate count to compare to the instrumental count. But some failures can be detected (imperfectly) through their signal in the count data, which has led to a variety of algorithms to clean and interpolate instrumental count data. We present different methods for flagging questionable data and provide a detailed comparison of data cleaning approaches.
Higher education is the focus of the next chapter, and the central research question is “do female presenters receive more questions or comments than male presenters during academic job talks?” We collect a large dataset of academic job talks from eight UC Berkeley departments from 2013 to 2019 in order to answer this question. We find that differences in the number, nature, and total duration of audience questions and comments are neither material nor statistically significant. For instance, the median difference (by gender) in the duration of questioning ranges from zero to less than two minutes in the five departments. Moreover, in some departments, candidates who were interrupted more often were more likely to be offered a position, challenging the premise that interruptions are necessarily prejudicial. These results are specific to the departments and years covered by the data, but they are broadly consistent with previous research, which found differences of comparable magnitude. However, those studies concluded that the (small) differences were statistically significant. We present evidence that the nominal statistical significance is an artifact of using inappropriate hypothesis tests. We show that it is possible to calibrate those tests to obtain a proper P-value using randomization.
Motivated by the permutation test work in the previous chapter, the fourth chapter develops a method to construct fast exact/conservative Monte Carlo confidence intervals by inverting exact/conservative Monte Carlo tests about parameters. The method uses a single set of Monte Carlo samples, which both reduces the computational burden and ensures that the problem of finding where the P-value crosses α is well posed. For problems with real-valued parameters, if the P-value is quasiconcave in the parameter, a minor modification of the bisection algorithm quickly finds conservative confidence bounds to any desired degree of accuracy. Additional computational savings are possible for common test statistics in the one-sample and two-sample problem by exploiting the relationship between values of the test statistics for different values of the parameter. Examples across a wide range of disciplines are given to illustrate this new method.
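As a rough illustration of the idea in the preceding paragraph, the sketch below inverts a two-sample permutation test for a shift parameter. It is not the dissertation's implementation. The function names, the difference-in-means statistic, the bracketing rule, and the example data are all assumptions made for illustration. The sketch reuses one fixed set of permutations for every candidate shift, so the Monte Carlo P-value is a deterministic function of the shift, and it returns the rejected endpoint of each bisection so that the reported bound errs toward a wider interval.

```python
import random


def mean(values):
    return sum(values) / len(values)


def perm_p_value(x, y, shift, perms):
    """Two-sided Monte Carlo permutation P-value for H0: y - shift ~ x."""
    pooled = list(x) + [v - shift for v in y]
    n = len(x)

    def stat(order):
        # Absolute difference in means between the two relabeled groups.
        first = [pooled[i] for i in order[:n]]
        second = [pooled[i] for i in order[n:]]
        return abs(mean(second) - mean(first))

    observed = stat(list(range(len(pooled))))
    # Tiny tolerance: floating-point ties count as "at least as extreme".
    hits = sum(stat(p) >= observed - 1e-12 for p in perms)
    # The +1 terms count the observed labeling, keeping the test conservative.
    return (hits + 1) / (len(perms) + 1)


def shift_confidence_interval(x, y, alpha=0.05, n_perms=999, tol=1e-4, seed=0):
    """Hypothetical helper: invert the test above by bisection."""
    rng = random.Random(seed)
    total = len(x) + len(y)
    # One fixed set of permutations, reused for every candidate shift.
    perms = [rng.sample(range(total), total) for _ in range(n_perms)]

    center = mean(y) - mean(x)  # P-value is 1 here, so this shift is never rejected
    spread = max(max(x), max(y)) - min(min(x), min(y))
    bounds = []
    for direction in (-1, 1):
        inside = center
        # Far enough out that the shifted groups do not overlap at all.
        outside = center + direction * (2 * spread + 1)
        if perm_p_value(x, y, outside, perms) > alpha:
            raise ValueError("could not bracket the bound; use more permutations")
        while abs(outside - inside) > tol:
            mid = (inside + outside) / 2
            if perm_p_value(x, y, mid, perms) > alpha:
                inside = mid
            else:
                outside = mid
        bounds.append(outside)  # rejected side: the interval is slightly too wide
    return bounds[0], bounds[1]


# Made-up illustrative data, not taken from the dissertation.
x = [4.1, 5.0, 5.3, 4.7, 5.9, 4.4, 5.6, 5.1]
y = [6.2, 5.8, 7.1, 6.6, 5.5, 6.9, 6.0, 7.4]
lower, upper = shift_confidence_interval(x, y)
print(f"95% confidence interval for the shift: [{lower:.3f}, {upper:.3f}]")
```

Because the permutations are fixed, repeated calls with the same seed give the same interval, and the bisection searches a step function that does not change between evaluations.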
The fifth, sixth, and seventh chapters focus on post-election audits. Post-election audits can provide convincing evidence that election outcomes are correct, meaning that the reported winner(s) really won, by manually inspecting ballots selected at random from a trustworthy paper trail of votes. Risk-limiting audits (RLAs) control the probability that, if the reported outcome is wrong, it is not corrected before the outcome becomes official. RLAs keep this probability below the specified “risk limit.” Chapter five compares RLAs to a proposed Bayesian alternative, Bayesian audits (BAs). BAs control a weighted average probability of correcting wrong outcomes over a hypothetical collection of elections; the weights come from the prior. RLAs and BAs make different assumptions, use different standards of evidence, and offer different assurances. We illustrate these differences using simulations based on real contests. Historically, conducting RLAs of all contests in a jurisdiction has been infeasible, because efficiency is eroded when sampling cannot be targeted to ballot cards that contain the contest(s) under audit. States that conduct RLAs of contests on multi-card ballots or of small contests can dramatically reduce sample sizes by using information about which ballot cards contain which contests, that is, by keeping track of card-style data (CSD). We present a method for using CSD to drastically decrease RLA sample sizes in chapter six. Chapter seven describes an open-source Python implementation of RLAs using CSD for the Hart InterCivic Verity voting system and the Dominion Democracy Suite voting system. The software is demonstrated using all 181 contests in the 2020 general election and all 214 contests in the 2022 general election in Orange County, CA, USA, the fifth-largest election jurisdiction in the U.S., with over 1.8 million active voters.
In the final chapter, we develop a novel method to quantify the impact of injuries on player performance in baseball. To quantify this impact, we can look at the difference between the performance the player would have achieved in the absence of injury and the performance after a given injury. This quantity can be estimated by matching injured players to similar non-injured players. However, matching in observational studies faces complications when units enroll in treatment on a rolling basis (e.g., players are injured at different times). To address this issue, we first introduce a new matched design, GroupMatch with instance replacement, allowing maximum flexibility in control selection. Second, we propose a block bootstrap approach for inference in matched designs with rolling enrollment and demonstrate that it accounts properly for complex correlations across matched sets in our new design and several other contexts. Third, we develop a falsification test to detect violations of the timepoint agnosticism assumption, which is needed to permit flexible matching across time.