Quantifying risks and the effects of risk factors requires controlling for exposure,
or the number of opportunities for the adverse outcome in question to occur. In the
context of traffic crashes, traffic volumes are frequently used as an exposure measure.
Efforts to study bicyclist crash risk have historically been hindered by the lack of
widespread exposure data. This study presents methods to estimate bicycle traffic
volumes across an entire urban network.
The first major chapter of the dissertation presents a data schema for classifying
bicycle demand datasets. There is an ever-growing abundance of transportation data,
with some of the fastest growth seen in the realm of non-motorized demand. However,
all of the available datasets provide incomplete information about the system. For
example, some only represent a time series of observations at a single location in
space (automated counters), while others cover all space and time but only represent
a small subset of the population of people and trips (crowdsourced data). In order
to understand how these heterogeneous sources of information correspond to one
another, it was deemed necessary to first identify their differences. Six metadata
characteristics were defined, which are termed the population scope, trip aggregation,
temporal scope, temporal resolution, spatial scale, and demographics. Levels are
defined for each dimension, and examples of generic datasets are discussed in terms
of their metadata dimensions.
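
As a minimal sketch of how such a schema might be represented, the six dimensions named above can be held in a simple record and two datasets compared dimension by dimension. The class name, the helper function, and every level string below are hypothetical illustrations; the actual levels defined in the dissertation are not reproduced here.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DemandDatasetMetadata:
    """The six metadata dimensions named in the text.

    The level strings used in the examples below are invented for
    illustration and are not the levels defined in the dissertation.
    """
    population_scope: str
    trip_aggregation: str
    temporal_scope: str
    temporal_resolution: str
    spatial_scale: str
    demographics: str


def differing_dimensions(a, b):
    """Return the names of the dimensions on which two datasets differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


# Hypothetical descriptions of the two generic dataset types mentioned above.
automated_counter = DemandDatasetMetadata(
    population_scope="all cyclists passing the counter",
    trip_aggregation="link-level counts",
    temporal_scope="continuous",
    temporal_resolution="hourly",
    spatial_scale="single location",
    demographics="none",
)
crowdsourced = DemandDatasetMetadata(
    population_scope="app users only",
    trip_aggregation="individual trip traces",
    temporal_scope="continuous",
    temporal_resolution="per trip",
    spatial_scale="entire network",
    demographics="self-reported",
)

mismatches = differing_dimensions(automated_counter, crowdsourced)
print(mismatches)
```

The list of mismatched dimensions indicates which conversions would be needed before the two sources could be compared on a common basis.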
The second major chapter of the dissertation presents a method of fusing multiple
link-level demand estimates to infer peak-hour bicycle traffic volumes. While
the method is agnostic to the specific sources being used, it is presented with a
case study of San Francisco, CA using data from regional travel demand models,
a smartphone crowdsourcing application, and bikeshare system ridership. The defined
process entails first converting the datasets to a common format in terms of
their metadata dimensions, and then fitting these homogenized link-level estimates
to observed counts using a weighted regression technique modeled after Geographically
Weighted Regression. The fitting parameters associated with each dataset are
hypothesized to vary geospatially, and the means by which this variation occurs is
controlled by the specified weighting scheme. A distance decay weighting, where observations
further from a given location contribute less to the parameter estimates, is
found to produce the best results. Cross-validation is employed for model comparison
and the selection of features and hyperparameter values. It is shown, on the
basis of cross-validated root-mean-square deviation, that fusing data sources provides
greater predictive accuracy than can be achieved using any individual source,
and that utilizing localized regression is more predictive than using a single global
parameter for each dataset.
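
The general idea of a distance-decay weighted regression, evaluated by cross-validation, can be sketched as follows. This is an illustration under stated assumptions rather than the dissertation's implementation: the Gaussian kernel, the leave-one-out scheme, the bandwidth values, and the synthetic data (two hypothetical sources whose coefficients drift across space) are all assumptions introduced here.

```python
import numpy as np


def gaussian_weights(coords, target, bandwidth):
    """Distance-decay kernel: sites farther from `target` receive smaller weights."""
    d = np.linalg.norm(coords - target, axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)


def local_coefficients(X, y, coords, target, bandwidth):
    """Weighted least squares centred on `target`: solve (X'WX) b = X'Wy."""
    w = gaussian_weights(coords, target, bandwidth)
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)


def loo_rmsd(X, y, coords, bandwidth):
    """Leave-one-out cross-validated root-mean-square deviation."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta = local_coefficients(X[keep], y[keep], coords[keep], coords[i], bandwidth)
        errors[i] = y[i] - X[i] @ beta
    return float(np.sqrt(np.mean(errors ** 2)))


# Synthetic example (assumed data): 60 count sites, two homogenized source
# estimates per site, and true coefficients that vary smoothly over space.
rng = np.random.default_rng(0)
n_sites = 60
coords = rng.uniform(0.0, 10.0, size=(n_sites, 2))
X = rng.uniform(10.0, 100.0, size=(n_sites, 2))
true_beta = np.column_stack([0.5 + 0.1 * coords[:, 0], 1.0 - 0.05 * coords[:, 1]])
y = np.sum(true_beta * X, axis=1) + rng.normal(0.0, 2.0, size=n_sites)

local_rmsd = loo_rmsd(X, y, coords, bandwidth=2.0)
# A very large bandwidth makes all weights nearly equal, which approximates
# fitting a single global parameter per source.
global_rmsd = loo_rmsd(X, y, coords, bandwidth=1e6)
print(f"local: {local_rmsd:.2f}, near-global: {global_rmsd:.2f}")
```

In this sketch the bandwidth plays the role of a hyperparameter: comparing the cross-validated deviation across candidate bandwidths is one way such a value could be selected.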
The final chapter addresses inferring the temporal distribution of traffic from
continuous automated count data. Latent Dirichlet Allocation is applied as a signal
decomposition model to identify latent spatio-temporal patterns in the observed
count data, which appear to correspond to coherent activity patterns such as AM
commuting, PM commuting, and midday cycling. Each link’s temporal distribution
can thus be expressed in terms of the extent to which each latent pattern is observed
on it. The mixture of these patterns on unobserved links is interpolated using a
purely autoregressive model, in contrast to the historically ad hoc methods used to
determine the temporal characteristics of bicycle traffic on unobserved links.
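
To illustrate what it means to express a link's temporal distribution as a mixture of latent patterns, the sketch below blends a few pattern profiles using per-link mixture weights. It does not fit Latent Dirichlet Allocation; the pattern names follow the text, but the four coarse time bins, the profile values, and the example weights are invented for illustration.

```python
# Hypothetical latent patterns: each is a probability distribution over four
# coarse time-of-day bins (AM peak, midday, PM peak, night). Values are invented.
PATTERNS = {
    "am_commute": [0.70, 0.15, 0.10, 0.05],
    "midday":     [0.15, 0.60, 0.20, 0.05],
    "pm_commute": [0.10, 0.15, 0.70, 0.05],
}


def temporal_distribution(mixture):
    """Blend the latent patterns using a link's mixture weights (must sum to 1)."""
    if abs(sum(mixture.values()) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    n_bins = len(next(iter(PATTERNS.values())))
    return [
        sum(weight * PATTERNS[name][b] for name, weight in mixture.items())
        for b in range(n_bins)
    ]


# Assumed mixture for one link: mostly AM commuting, some PM and midday use.
link_mixture = {"am_commute": 0.5, "midday": 0.2, "pm_commute": 0.3}
link_distribution = temporal_distribution(link_mixture)
print([round(share, 2) for share in link_distribution])
```

Because every pattern is itself a distribution and the weights sum to one, the blended result is also a valid distribution over the time bins.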
The primary conclusion of this work is that the lack of exposure data should no
longer be considered an insurmountable problem for studying bicycle crashes. Using
advanced analytical methods, such as those presented here, in conjunction with
the abundance of new datasets provides a means of generating defensible retrospective
volume estimates for the entire network. This dissertation paves the way for
many future lines of inquiry, including both refinements upon the methods presented
here and application of the volume estimates developed here to problems requiring
exposure quantities, such as the evaluation of crash risk.