Modern statistical practice has taken an Icarian flight in its embrace of model-based inference. Models are underwritten by sophisticated assumptions about the origins of data, involving hypothetical populations that conveniently follow parametric distributions. These assumptions are abstract and often demonstrably false, providing ample grounds for skepticism of model-based findings. Models are also obscure---requiring a high degree of mathematical sophistication to understand and interpret---which serves to preclude deliberation between researchers, prevent scrutiny by diverse stakeholders, and obfuscate underlying normative values and possible weaknesses. They support the gathering of professional statisticians and scientists into a priestly class, empowered to steer a technocratic state through the appearance of knowing. They do not support a healthy science---one that knows its limits, ascertains the truths it can, and earns public trust.
The design-based philosophy of statistics might do better. In design-based theory and practice, emphasis is placed on the physics of how the data were collected. Hypotheses are posited in terms of sharply defined, real-world quantities, and all assumptions necessary to link the data to those hypotheses flow from the design. The assumptions are generally simple and justified, so that the output is rigorous and transparent. Those qualities help ensure that conclusions are usually true. They also support inter-subjective belief (i.e., trust) that a given conclusion is true. Moreover, the design-based view clearly circumscribes the kinds of problems that are amenable to rigorous statistics: those with a known or sharply hypothetical (as-if) design. If widely adopted, design-based statistical thinking may engender the circumspection and humility currently lacking under the influence of model-based data science.
This dissertation develops methods for design-based statistics. The chapters are particularly focused on design-based inference from surveys and experiments, motivated by applications in risk-limiting post-election audits (Chapters V, VI, and VII) and soil carbon sequestration (Chapters II, III, and IV). Risk-limiting audits are fundamentally survey problems. They map a sharp question of interest---who won this contest?---to a collection of null hypotheses about the means of lists of bounded numbers, which represent populations derived from cast ballots. Providing rigorous and transparent evidence that reported election results are accurate is critical to supporting trustworthy elections. Adherence to the design-based paradigm when developing and implementing risk-limiting audits ensures such evidence can be furnished.
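To make that mapping concrete, the following sketch (in Python) encodes each cast ballot in a hypothetical two-candidate plurality contest as a number in $[0,1]$, in the spirit of assorter-based audit frameworks such as SHANGRLA; the contest and ballots are illustrative assumptions, not the exact construction used in later chapters. The reported winner really won if and only if the mean of the encoded list exceeds $1/2$, so the audit tests the complementary null hypothesis that the mean is at most $1/2$.

\begin{verbatim}
# Hypothetical two-candidate plurality contest: Alice (reported
# winner) vs. Bob. Encode each cast ballot as a number in [0, 1].
def assorter(ballot):
    if ballot == "Alice":
        return 1.0
    if ballot == "Bob":
        return 0.0
    return 0.5  # undervote, overvote, or vote for someone else

# Alice really won iff the mean over ALL cast ballots exceeds 1/2,
# so the audit tests the null hypothesis H0: mean <= 1/2.
ballots = ["Alice", "Bob", "Alice", "", "Alice"]
print(sum(assorter(b) for b in ballots) / len(ballots))  # 0.7
\end{verbatim}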
The science of soil carbon sequestration involves a large array of statistical problems that can be classified as either surveys (e.g., carbon stock measurement) or experiments (e.g., management experiments) and handled by a design-based approach. Often, studies involve a survey (sampling soil cores from plots) embedded in an experiment (randomly assigning plots to treatment), or vice versa, as sketched below. Soil carbon sequestration has attracted intense interest for its hypothesized potential to offset emissions and mitigate climate change. Failure to measure sequestration rigorously and to provide transparent evidence of its efficacy could squander resources, cause shortfalls in emissions reductions, and shake public confidence in coordinated efforts to fight climate change. The regular practice of design-based statistics in soil science could support effective action and accurate carbon budgets.
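The nesting of designs can be illustrated with a small sketch on synthetic data (the plot counts, core counts, and carbon values are hypothetical): plots are randomly assigned to treatment, cores are then sampled uniformly without replacement within each plot, and the two randomizations, rather than any model of the soil, supply the probability basis for the resulting difference-in-means estimate.

\begin{verbatim}
import random
random.seed(1)

# Synthetic field: 8 plots, each with 50 soil cores that could be
# measured for carbon concentration (values are illustrative only).
plots = {p: [random.gauss(2.0, 0.3) for _ in range(50)] for p in range(8)}

# Experiment: randomly assign half of the plots to treatment.
treated = set(random.sample(sorted(plots), k=4))

# Survey within the experiment: draw 5 cores uniformly without
# replacement from each plot and average them.
plot_mean = {p: sum(random.sample(cores, k=5)) / 5
             for p, cores in plots.items()}

# Design-based difference-in-means estimate of the treatment effect;
# here the true effect is zero by construction.
t_bar = sum(plot_mean[p] for p in treated) / len(treated)
c_bar = sum(plot_mean[p] for p in plot_mean if p not in treated) / 4
print(t_bar - c_bar)
\end{verbatim}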
The technical emphasis of this dissertation falls on valid inference in the presence of two major design elements: sequential sampling (Chapters IV, V, VI, and VII) and stratification (Chapters III, VI, and VII). Sequential sampling is a natural, necessary, or expedient feature of many real-world data collection procedures. While most traditional inference procedures compute a single inferential statistic (e.g., a $P$-value) on a batch of $n$ data points, a sequential procedure returns a valid statistic at any time during an iterative process of sampling (e.g., one at a time as each data point arrives, or periodically as rounds of data are collected). Sequential analysis thus allows data collection to expand as needed until there is sufficient evidence to draw a conclusion about a hypothesis of interest. We leverage various old and new ideas from probability, game theory, and statistics in developing and implementing efficient methods for sequential analysis. Chapter IV suggests some uses in soil carbon measurement, especially for adaptive experimental designs. In Chapter V, we use the theory of Kelly optimality to develop efficient sequential tests for risk-limiting comparison audits. By minimizing the expected number of ballots needed to confirm the winner(s) of a contest, the tests reduce the cost of implementing risk-limiting audits. In Chapter VI, we compare sequential tests constructed from betting test supermartingales to tests constructed from exponential test supermartingales. We find the former to be more efficient for risk-limiting comparison audits. Chapter VII builds on and generalizes Chapter VI, proposing new definitions of optimality and constructing sequential tests for population means when the population is also stratified.
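A minimal sketch of the betting idea follows, assuming i.i.d. draws from $[0,1]$ and a single fixed bet $\lambda$; the methods in Chapters V, VI, and VII refine this considerably (adaptive and near-optimal bets, sampling without replacement, stratification). Under the null that the mean is at most $\mu_0$, the gambler's wealth process is a nonnegative test supermartingale, and Ville's inequality makes the reciprocal of its running maximum an anytime-valid $P$-value.

\begin{verbatim}
def betting_p_value(xs, mu0, lam):
    """Anytime-valid P-value for H0: mean <= mu0, for observations
    in [0, 1] drawn i.i.d. (i.e., sampled with replacement).

    Under H0, the wealth M_t = prod_{i <= t} (1 + lam*(x_i - mu0))
    is a nonnegative test supermartingale whenever 0 <= lam <= 1/mu0,
    so Ville's inequality gives P(max_t M_t >= 1/a) <= a: the
    reciprocal of the running maximum is valid at any stopping time."""
    assert 0 <= lam <= 1 / mu0
    wealth = max_wealth = 1.0
    for x in xs:
        wealth *= 1.0 + lam * (x - mu0)
        max_wealth = max(max_wealth, wealth)
    return min(1.0, 1.0 / max_wealth)

# Example: the sample mean (0.675) exceeds the null mean 0.5, so the
# wealth grows and the P-value shrinks as data accumulate.
data = [0.8, 0.6, 0.7, 0.4, 0.9, 0.5, 0.8, 0.7] * 10
print(betting_p_value(data, mu0=0.5, lam=0.5))
\end{verbatim}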
Stratification is widely used in design-based statistics to accommodate logistical constraints, increase statistical efficiency, and lower the costs of sampling. Stratification entails partitioning a population into disjoint strata and drawing some number of samples from each stratum uniformly, with or without replacement, and independently across strata. Traditionally, a batch sample of fixed size $n_k$ is drawn from stratum $k$, and inference on the population mean proceeds using Gaussian theory and finite-population asymptotics. Chapter III explores this strategy for measuring soil carbon stocks at a single time or verifying stock change over time. Chapter VI constructs finite-sample nonparametric tests for risk-limiting audits that are both stratified and sequential (see above). Chapter VII builds a general framework for sequential stratified testing and develops optimal and efficient tests for the mean of a stratified bounded population. The tests are valid (i) sequentially, (ii) in finite samples, (iii) without parametric assumptions, (iv) under any stratification, and (v) with all probabilities flowing from the design.
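For concreteness, a sketch of the traditional batch approach follows (the strata, sample values, and stratum sizes are hypothetical): the estimator weights stratum sample means by known stratum sizes, and its design-based standard error carries a finite-population correction for without-replacement sampling.

\begin{verbatim}
from math import sqrt
from statistics import mean, variance

def stratified_estimate(samples, N):
    """Weighted stratum-mean estimator of a population mean and its
    design-based standard error, with finite-population correction.
    samples[k]: uniform without-replacement sample from stratum k;
    N[k]: known size of stratum k."""
    N_tot = sum(N.values())
    est = sum(N[k] / N_tot * mean(xs) for k, xs in samples.items())
    var = sum((N[k] / N_tot) ** 2 * (1 - len(xs) / N[k])
              * variance(xs) / len(xs) for k, xs in samples.items())
    return est, sqrt(var)

# Hypothetical strata, e.g., carbon concentrations by soil type.
samples = {"loam": [2.1, 2.4, 2.2, 2.6], "clay": [1.1, 1.3, 1.2]}
N = {"loam": 120, "clay": 80}
print(stratified_estimate(samples, N))
\end{verbatim}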