In a number of domains in computer vision, machine learning and psychology, it is common to model and analyze data in terms of pairwise interactions between data elements, that is, distances between pairs of points. A large variety of supervised and unsupervised algorithms exist for learning the structure of such data. However, it is not always the case that interactions between entities can be described in terms of pairs; often, higher order relations of size three and higher offer a more natural formulation. Consider the k-lines clustering problem, where the objective is to group data points in a d- dimensional vector space into k clusters where elements in each cluster are well approximated by a line. As every pair of data points trivially define a line, there does not exist a useful measure of similarity between pairs of points for this problem. However, it is possible to define measures of similarity over triplets of points that indicate how close they are to being collinear. Another example is the task of constructing an embedding obtained from psychometric experiments involving paired comparisons. A common paired comparison experiment is where subjects are shown three images -- X, Y and Z -- and asked to report which of X or Y is more similar to Z. Here each observation is a triadic relation on the set of images. We refer to the problem of learning from such data -sets as higher order learning. In this dissertation we present solutions to two problems in higher order learning. The first is a new technique for clustering called Clique Averaging, in which we use a novel generative model for hypergraphs to construct a clustering algorithm that has better theoretical as well as empirical performance than existing algorithms. We show applications to illumination invariant clustering and k-subspace learning. The second is a new non-metric multi-dimensional scaling (MDS) algorithm for ordinal data, that is, data sets in which instead of the magnitude, only the relative or rank order of distances is known. We will show how this algorithm can be used to construct a perceptual space for reflectance