The research presented here comprises three studies, each of which uses a unique set of population-wide microdata to address a few of the many challenges that arise when statistical models are used to study the complex dynamics governing the function and evolution of urban systems. These challenges range from the technical to the political, but all are highly relevant to the management of the modern metropolis, in which “data-driven” decision-making is increasingly the norm. Taken together, these studies draw on literature from urban geography, behavioral economics, and computational statistics, among other fields, to synthesize a body of work that questions what can and cannot be achieved through urban systems modeling while making several contributions to its study and practice.
The first study demonstrates how thoughtful model design can be exploited to make powerful inferences about an understudied urban economic policy whose costs are borne primarily by the most vulnerable inhabitants of our cities. In particular, it presents the first causal estimate in the scientific literature of the effect of rent control status on eviction filing rates. A 10-year dataset of eviction notices (n=21,806) is combined with a complete history of property tax records (n=1,978,687) in a regression discontinuity design to estimate a local average treatment effect of 0.013 evictions per residential unit per year conditioned on positive rent control status. Compared to the baseline rate of eviction notices over the same period, this translates to a 240% increase in the likelihood of eviction for tenants living in controlled units. I argue that this finding is best understood not as an inherent inefficiency of rent control policy in general, but rather as the result of specific state-wide laws, passed in the years following the adoption of rent control in San Francisco, which gave owners of rent-controlled properties both an economic incentive to evict and the legal means to do so. The chapter concludes with specific policy recommendations that city officials can enact today to protect their most vulnerable residents.
In the chapter that follows I address one of the myriad challenges that arise once a model, having been thoughtfully designed and estimated, is used to make inferences about the future rather than the past. The issue of simulation error due to sampling of alternatives in discrete choice models is an important one, not only because it represents a longstanding criticism of the statistical validity of many state-of-the-art microsimulation models, but also because it points to a significant disconnect between the development of methods for estimation and the development of methods for simulation. This disconnect reflects a larger rift between theory and practice in urban systems modeling. The purpose of this chapter is to help bridge that gap by defining a novel measure of forecast error and using it to quantify the extent of the problem as it manifests in a disaggregate model of discretionary location choice, typical of those in common use today. The definition of this metric is itself a valuable contribution to a discipline that is often maligned for its inability to assess the accuracy of its methods. I demonstrate this value by using the metric to establish two key findings. First, I show that the proportion of aggregate demand that is misallocated due to sampling of alternatives actually decreases as the size of the universe of alternatives increases (i.e., as the universe becomes more disaggregate). Second, I find that in most scenarios simple random sampling outperforms importance sampling in terms of simulation error due to sampling of alternatives. Both results contradict the conventional wisdom about best practices in microscopic models of travel and land use demand.
The final study presents a method designed to make it easier for researchers and practitioners alike to acquire the kinds of data required to perform microscopic urban modeling. Unlike for households and persons, no public repository of establishment-level microdata currently exists for businesses in the United States. Work in this domain has therefore been limited primarily to those with the resources to purchase expensive commercial datasets. As a result, the development of disaggregate models of business and firm dynamics has lagged behind that of their person- and household-based counterparts, hindering the development of fully integrated transportation and land use microsimulation systems as a whole. Drawing on recent advances in the application of Bayesian networks to population synthesis and data privacy preservation, I estimate a series of probabilistic models on a dataset of proprietary business establishment listings and use them to generate synthetic populations that match the joint distributions of key characteristics in the original data but contain none of the original records. In a second analysis I show how aggregate Census data can be used as control totals when sampling from the fitted models, producing synthetic populations that match the aggregate Census counts but carry a much richer set of features than the Census makes publicly available. In theory, these data, along with the fitted models themselves, can be shared freely among collaborators without fear of copyright violation or disclosure of private data. These results demonstrate the great potential of Bayesian networks, and of probabilistic models in general, to democratize access to microdata and thereby facilitate greater scientific collaboration in the field of disaggregate urban systems modeling.
I conclude with a brief summary of these findings and discuss their relation to current issues in urban systems modeling. I identify several opportunities to improve and build on the work presented here, while also addressing the inherent limits on the ability of model-based research to meet the most pressing needs of our urban communities at this moment in history.