The pervasive use of mobile phones produces an abundance of data associated with their users. Data generated by mobile devices include measurements of user actions (e.g., call logs, SMS logs, and app usage sessions) as well as records of the device's local environment (e.g., nearby Bluetooth devices and GPS/location trajectories). These data carry information about the underlying behaviors of device users and their social contexts, giving us new opportunities to learn human spatio-temporal and social patterns.
In this dissertation, we develop new methods to mine data generated by mobile devices and to improve applications in social learning and inference. In particular, we pursue three objectives: (1) using mobile phone data to distinguish dyads embedded in social relationships from those that are not; (2) understanding how human telecommunication activities are associated with urban ecology; and (3) obtaining insights into users' demographic features from mobile app usage data. A key characteristic of these approaches is that they leverage activity and environmental data at the device level (in some cases, aggregated across sets of users), which often allows the data in question to be collected and stored efficiently and in a privacy-preserving manner.
In pursuing the first objective, we introduce \emph{activity correlation spectroscopy}, a new approach to inferring relationships that exploits the spectral and distributional structure of activity correlation within dyads. Unlike existing techniques, our approach can be employed with minimal, individual-level (i.e., non-relational), and non-identifying data that are easily collected using commodity hardware. We demonstrate our methodology with an application to the detection of friendship and group co-membership using mobile device and survey data from the MIT Reality Mining study \citep{eagle2009}.
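To make the idea of "spectral and distributional structure of activity correlation" concrete, the following is a minimal sketch under stated assumptions, not the dissertation's actual feature set. It assumes each person contributes an equal-length per-interval activity series (e.g., hourly counts), computes a sliding-window Pearson correlation series for the dyad, and summarizes it by its normalized power spectrum (spectral structure) and its quantiles (distributional structure). The function name, the window length, and the number of quantiles are all hypothetical choices made for illustration.

```python
import numpy as np


def dyad_correlation_features(x, y, window=24, n_quantiles=5):
    """Hypothetical feature extractor for one dyad (illustrative sketch only).

    x, y: equal-length 1-D arrays of per-interval activity for two people,
    each collected independently (individual-level, non-relational data).
    Returns the normalized power spectrum of their sliding-window correlation
    series followed by quantiles of that series.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")
    n_windows = len(x) - window + 1
    corr = np.zeros(n_windows)
    for t in range(n_windows):
        xs, ys = x[t:t + window], y[t:t + window]
        # Pearson correlation within the window; left at 0 if either slice is constant
        if xs.std() > 0 and ys.std() > 0:
            corr[t] = np.corrcoef(xs, ys)[0, 1]
    # Spectral structure: periodogram of the mean-removed correlation series
    power = np.abs(np.fft.rfft(corr - corr.mean())) ** 2
    total = power.sum()
    if total > 1e-12:
        power = power / total
    # Distributional structure: evenly spaced quantiles of the correlation values
    quantiles = np.quantile(corr, np.linspace(0.0, 1.0, n_quantiles))
    return np.concatenate([power, quantiles])
```

For a week of hourly data (168 points) and a 24-hour window, the vector has 73 spectral entries plus 5 quantiles; such vectors could then be fed to any classifier trained on survey-reported friendships, which is where the choice of model would matter most.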
For the second objective, we provide a novel approach that uses spatio-temporally aggregated cell phone data to learn features of urban ecology (i.e., the spatial distributions of distinct social and economic entities and their associated activities). Specifically, our technique involves four stages: (\RN{1}) decomposing the aggregated cell phone activity within local areal units using spectral methods; (\RN{2}) learning spectral characteristics associated with ecological features using a training set; (\RN{3}) predicting local ecology composition for out-of-sample areas; and (\RN{4}) predicting activity time series for out-of-sample areas. The core of our approach is the projection of spectral features of cell phone activity series onto an ecology-associated basis, which allows us both to identify communication patterns arising from particular types of local activities and/or institutions and to leverage those patterns for classification and activity prediction. We apply our methodology to aggregated communication and Internet traffic data from the cities of Milan and Trento to demonstrate its effectiveness.
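The four stages can be illustrated with a small linear sketch. It assumes, purely for illustration, that each area's activity series is a mixture of ecology-specific profiles, so its discrete Fourier coefficients are approximately its ecology composition times an ecology-associated basis. Here the basis is estimated by ordinary least squares, and out-of-sample compositions are obtained by least-squares projection followed by clipping and renormalization; these estimators, and all function names, are hypothetical stand-ins rather than the dissertation's method.

```python
import numpy as np


def to_spectral(series):
    """Stage I: real-valued spectral features (real and imaginary rFFT parts), one row per area."""
    coeffs = np.fft.rfft(np.asarray(series, dtype=float), axis=1)
    return np.hstack([coeffs.real, coeffs.imag])


def from_spectral(features, n_time):
    """Invert to_spectral, returning time series of length n_time."""
    n_freq = features.shape[1] // 2
    coeffs = features[:, :n_freq] + 1j * features[:, n_freq:]
    return np.fft.irfft(coeffs, n=n_time, axis=1)


def fit_ecology_basis(train_series, train_ecology):
    """Stage II: least-squares basis B such that spectra ≈ ecology @ B (assumed linear model)."""
    spectra = to_spectral(train_series)
    basis, *_ = np.linalg.lstsq(train_ecology, spectra, rcond=None)
    return basis  # shape (n_ecology_types, n_spectral_features)


def predict_ecology(series, basis):
    """Stage III: project new spectra onto the basis, then clip and renormalize to proportions."""
    spectra = to_spectral(series)
    coef, *_ = np.linalg.lstsq(basis.T, spectra.T, rcond=None)
    coef = np.clip(coef.T, 0.0, None)
    sums = coef.sum(axis=1, keepdims=True)
    return np.divide(coef, sums, out=np.zeros_like(coef), where=sums > 0)


def predict_activity(ecology, basis, n_time):
    """Stage IV: synthesize expected activity series from ecology compositions."""
    return from_spectral(np.asarray(ecology, dtype=float) @ basis, n_time)
```

Working in the Fourier domain means the same basis serves both directions: projecting spectra onto it yields a composition estimate (stage III), and multiplying a composition by it and inverting the transform yields a predicted activity series (stage IV).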
Finally, in pursuing the third objective, we demonstrate an integrated system that can cheaply and easily collect application behavior and survey data from mobile phones, and we introduce several novel features that assist in learning individual-level demographic features (e.g., gender and age group). Specifically, our approach to learning and inference for demographic features involves three new techniques: (\RN{1}) decomposing app usage from mobile phones using spectral methods; (\RN{2}) learning spectral characteristics associated with individuals using a training set; and (\RN{3}) combining other temporal features with the learned spectral characteristics to predict demographic features for out-of-sample individuals. The core of our methodology is the use of spectral features in cell phone app activity series, which allows us both to identify behavior patterns arising from particular types of cell phone apps and to leverage those patterns for demographic classification and prediction. We demonstrate the effectiveness of our approach with an application to real mobile app traffic data from the United States.
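A compact sketch of the three steps follows, under several loudly stated assumptions. Each user is represented by an hourly app-usage count series that starts on a Monday at 00:00. Spectral features are taken as the normalized power at a few hypothetical periods (24, 12, and 168 hours). The "other temporal features" are illustrated by two hypothetical ones: night-time share (00:00 to 06:00) and weekend share. A nearest-centroid classifier on standardized features stands in for whatever learner the dissertation actually uses; every name below is illustrative.

```python
import numpy as np


def spectral_features(usage, periods=(24.0, 12.0, 168.0)):
    """Step I: normalized spectral power of each user's hourly usage at chosen periods (hours)."""
    usage = np.asarray(usage, dtype=float)
    n_time = usage.shape[1]
    centered = usage - usage.mean(axis=1, keepdims=True)
    power = np.abs(np.fft.rfft(centered, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n_time, d=1.0)  # cycles per hour
    idx = [int(np.argmin(np.abs(freqs - 1.0 / p))) for p in periods]
    total = power.sum(axis=1, keepdims=True)
    return power[:, idx] / np.where(total > 0, total, 1.0)


def temporal_features(usage):
    """Hypothetical extra features: night (00-06) and weekend shares of usage.

    Assumes each series starts on a Monday at 00:00.
    """
    usage = np.asarray(usage, dtype=float)
    hours = np.arange(usage.shape[1])
    night = (hours % 24) < 6
    weekend = (hours // 24) % 7 >= 5
    total = usage.sum(axis=1)
    total = np.where(total > 0, total, 1.0)
    return np.column_stack([usage[:, night].sum(axis=1) / total,
                            usage[:, weekend].sum(axis=1) / total])


class NearestCentroidDemographics:
    """Steps II-III: per-class centroids of standardized combined features."""

    def _features(self, usage):
        return np.hstack([spectral_features(usage), temporal_features(usage)])

    def fit(self, usage, labels):
        X = self._features(usage)
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0) + 1e-12  # avoid division by zero
        Z = (X - self.mean_) / self.std_
        labels = np.asarray(labels)
        self.classes_ = np.unique(labels)
        self.centroids_ = np.vstack([Z[labels == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, usage):
        Z = (self._features(usage) - self.mean_) / self.std_
        # Squared Euclidean distance from each user to each class centroid
        dists = ((Z[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[np.argmin(dists, axis=1)]
```

The centroid model is chosen only to keep the sketch dependency-free; a regularized logistic regression or tree ensemble over the same combined feature vector would be a natural drop-in replacement.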