Thanks to the pervasiveness of smartphones and their applications, there is an abundance of data generated by mobile devices. These data are either actively contributed by users (e.g., Foursquare check-ins) or passively inferred from mobile phone activity (e.g., a user's location during a phone call). Since smartphones are constantly with their users, mobile data reflect the underlying human activity and provide rich information about spatio-temporal and social patterns. This information can be used to enable a variety of new services.
This dissertation focuses on how to mine mobile phone data to improve a variety of Smart City applications. In particular, we focus on Call Detail Records (CDRs), which are generated every time a user makes a phone call. The spatio-temporal information in CDRs reflects how users move in the city (i.e., users' trajectories) and in which areas they spend their time during the day. In addition, users' phone calls reveal social information (e.g., whom they call). In this dissertation, we use this rich and unique combination of insights into human dynamics in order to: (1) understand and improve transportation in a city via ridesharing, (2) characterize various areas of the city, as well as how these areas interact with each other (Urban Ecology), and (3) predict communication between different areas of the city in order to improve provisioning in cellular infrastructure.
First, we use CDRs to assess the potential of ridesharing. Our offline analysis, based on large CDR data sets and other data sets from four different cities, indicates that ridesharing has great potential when the spatio-temporal and social constraints that users might have when sharing a ride are taken into account. Moreover, we design and implement an online ridesharing system (ORS), with emphasis on scalability.
Second, we use CDRs to infer features of urban ecology (i.e., social and economic activities, and social interaction). We present a novel approach that consists of time series decomposition of aggregate cell phone activity per unit area using spectral methods, and clustering of areal units with similar activity patterns. We validate our methodology using external ground truth data that we collected from municipal and online sources.
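As a rough illustration of the pipeline described above, the following minimal sketch extracts spectral features from per-area activity time series and groups areas with similar patterns. It is not the dissertation's implementation: the use of a plain FFT as the spectral method, the choice of k-means for clustering, and the names `spectral_features` and `kmeans` are all assumptions made for illustration.

```python
import numpy as np


def spectral_features(activity, n_freqs=3):
    """Return normalized FFT magnitudes for the dominant frequency bins.

    activity: array of shape (n_areas, n_hours) with aggregate call counts.
    The FFT is an assumed stand-in for the spectral method in the text.
    """
    # Remove each area's mean so the zero-frequency term does not dominate.
    centered = activity - activity.mean(axis=1, keepdims=True)
    spectrum = np.abs(np.fft.rfft(centered, axis=1))
    # Normalize so areas are compared by spectral shape, not call volume.
    norm = np.linalg.norm(spectrum, axis=1, keepdims=True)
    spectrum = spectrum / np.where(norm == 0, 1.0, norm)
    # Keep the bins with the largest average magnitude across all areas.
    top = np.sort(np.argsort(spectrum.mean(axis=0))[-n_freqs:])
    return spectrum[:, top], top


def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means (an assumed clustering choice); returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each non-empty cluster's center to the mean of its members.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels
```

Given an hourly activity matrix with one row per areal unit, `spectral_features` followed by `kmeans` yields one cluster label per area; under these assumptions, areas with a dominant 24-hour cycle would separate from areas with a dominant 12-hour cycle.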
Finally, we use CDRs to predict cell-to-cell mobile traffic. Traffic prediction is crucial for provisioning and virtualization of cellular architectures. We build a traffic predictor using state-of-the-art machine learning techniques. Our predictor is based on key insights gained from examining the data, and it achieves an accuracy of 85% (compared with 80% for the baseline). Also, by giving higher weight to false positives, which is important for network operators, we can achieve a recall of 94%.