Modern online services often use user data to create “personalized experiences,” in which the service infers what a user may want to see or do next based on what similar users have done in the past. In education, many students are turning to online and computer-based learning environments as part of their ongoing education, generating vast logs of student data, yet little work exists on personalizing online learning experiences in a data-driven fashion. This dissertation presents two methodologies that learn from large sets of student-generated data to infer potentially efficient ways to use the learning content on each respective digital learning platform.
The first methodology is tailored primarily to the ASSISTments math-tutoring platform but could potentially be applied to other data sets. It seeks to uncover pedagogically advantageous orderings of learning items. The motivation stems from the format of the ASSISTments platform: students are given related questions in random order until they have “mastered” the knowledge component that underlies the questions. Students might, however, master a knowledge component more efficiently if some ordering of items produces faster learning than a random order. To investigate this hypothesis, an item order extension to the Bayesian Knowledge Tracing (BKT) framework is proposed. BKT is well suited to modeling student knowledge over a sequence of learning opportunities organized around a single knowledge component, under the assumption that tutoring or learning occurs after each assessment opportunity. The item order extension is applied to the publicly released ASSISTments 2012-2013 data set, which consists of approximately 10 million student responses to math questions, and is compared against a baseline BKT model. Cross-validation is used to assess model fit, and a qualitative investigation, together with regression analyses, is performed to uncover the circumstances under which a pedagogically advantageous item order might exist in ASSISTments. Finally, an experiment is conducted using the item order model to test whether the predicted learning outcomes are met in practice.
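For readers unfamiliar with BKT, the following is a minimal sketch of the standard (baseline) update, not of the item order extension proposed in this dissertation. The function names and the parameter names `p_init`, `p_learn`, `p_guess`, and `p_slip` are illustrative assumptions; they correspond to the usual prior, learn, guess, and slip probabilities of the baseline model.

```python
def bkt_update(p_know, correct, p_learn, p_guess, p_slip):
    """One standard BKT step (hypothetical helper, baseline model only).

    First condition the knowledge estimate on the observed response,
    then apply the learning transition that BKT assumes happens after
    each assessment opportunity.
    """
    if correct:
        num = p_know * (1.0 - p_slip)
        den = num + (1.0 - p_know) * p_guess
    else:
        num = p_know * p_slip
        den = num + (1.0 - p_know) * (1.0 - p_guess)
    posterior = num / den
    return posterior + (1.0 - posterior) * p_learn


def bkt_trace(responses, p_init, p_learn, p_guess, p_slip):
    """Run BKT over one student's responses on a single knowledge component.

    Returns the predicted probability of a correct answer before each
    response, plus the final knowledge estimate.
    """
    p_know = p_init
    predictions = []
    for correct in responses:
        # Predicted P(correct) given the current knowledge estimate.
        predictions.append(p_know * (1.0 - p_slip) + (1.0 - p_know) * p_guess)
        p_know = bkt_update(p_know, correct, p_learn, p_guess, p_slip)
    return predictions, p_know
```

In this baseline the learn probability is a single constant. An order-sensitive variant would presumably let that quantity depend on which items were presented and in what sequence, but the exact parameterization is left to the body of the dissertation.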
The second methodology gives learners in Massive Open Online Courses (MOOCs) data-driven, personalized recommendations about what they might want to do next, based on what similar learners have done in the past. Across MOOCs, billions of specific, granular student actions have been recorded and logged. The logged actions are of several types: some relate to answering quiz questions, others to playing or pausing videos, and others to navigating between course pages. This dissertation proposes a methodology in which a model of student behavior is learned from how students have behaved in the past, and proposes that the insights gleaned from those behavior patterns can drive a “personalized recommendation” framework for future MOOC learners, with the aim of enhancing their learning experience. The methodology uses a general deep learning architecture based on recurrent neural networks (RNNs), which are well suited to modeling arbitrarily long sequences of data that may have complex relationships spanning multiple time points. Simpler N-gram (frequentist) baselines and course structure, or “syllabus,” baselines are used to assess RNN performance. A working demonstration that incorporates the personalized recommendation framework into a live MOOC on the edX platform is also presented, showing that such a methodology can reasonably be implemented in practice. The behavior modeling methodology is applied to a wide variety of MOOC datasets to determine the extent to which it works across different MOOCs. The analyzed datasets span tens of millions of navigational actions across tens of thousands of students.
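As an illustration of the kind of N-gram (frequentist) baseline mentioned above, the sketch below counts which action most often follows each short context of preceding actions and recommends that action. It is an assumption-laden toy rather than the dissertation's implementation: the function names, the action labels such as `"video_1"`, and the choice to return `None` for unseen contexts are all hypothetical.

```python
from collections import Counter, defaultdict


def train_ngram(sequences, n=2):
    """Count next-action frequencies for each (n-1)-action context.

    `sequences` is a list of per-student action lists (hypothetical format).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            context = tuple(seq[i:i + n - 1])
            next_action = seq[i + n - 1]
            counts[context][next_action] += 1
    return counts


def predict_next(counts, history, n=2):
    """Return the most frequent follower of the latest context, or None."""
    context = tuple(history[-(n - 1):]) if n > 1 else ()
    if context not in counts:
        return None  # unseen context; a real baseline might back off instead
    return counts[context].most_common(1)[0][0]


# Toy usage with made-up action labels.
logs = [
    ["video_1", "quiz_1", "video_2"],
    ["video_1", "quiz_1", "page_2"],
    ["video_1", "quiz_1", "video_2"],
]
model = train_ngram(logs, n=2)
suggestion = predict_next(model, ["video_1", "quiz_1"], n=2)
```

A model of this kind only sees a fixed, short window of prior actions, which is the limitation that motivates comparing it against a recurrent model able to condition on longer histories.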