While the number, size, and complexity of databases have increased, the query interfaces that facilitate exploration of these databases have largely remained inadequate.
This dissertation takes a principled approach to user data exploration and proposes techniques that simplify access to large and complex structured and semi-structured databases. In addition to database structure, it takes into account factors that affect usability, such as user preferences, the intuitiveness of interfaces, and the effort expended by users. The interplay of these factors is studied, and the proposed methods use them to maximize the benefit to users. The most common tasks in a data exploration scenario are query formulation, results navigation, and results presentation. We propose and evaluate methods to improve usability for all of these tasks. The proposed techniques are often complementary to each other and exploit domain properties of the data. The effectiveness of the approaches is demonstrated through experiments on real-life datasets and, where applicable, comprehensive user studies.
The first part of the dissertation presents an auto-completion style query formulation interface, which enables users to augment keyword queries by adding structured conditions. The resulting queries are more focused and more likely to return results that the user finds relevant. The next two parts of the dissertation focus on challenges in two commonly used scenarios of results navigation: Categorization and Faceted Navigation. To model and quantify the effort incurred by a user, navigation and cost models are proposed for both scenarios. Techniques are proposed to estimate this effort, taking user preferences into account, and algorithms are developed to compute the minimal set of suggested options that, if followed by the user, minimize the expected effort required to navigate the results. The final part focuses on results presentation. It presents a method to construct result snippets that considers the user effort required to read and comprehend them. This method complements existing methods, which consider solely the importance of the selected attributes.