What can a machine learning simulation tell us about human
performance in a complex, real-time task such as Tetris?
Although Tetris is often used as a research tool (Mayer,
2014), the strategies and methods used by Tetris players have
seldom been the explicit focus of study. In Study 1, we use
cross-entropy reinforcement learning (CERL) (Szita & Lorincz,
2006; Thiery & Scherrer, 2009) to explore (a) the utility
of high-level strategies (goals or objective functions) for
maximizing performance and (b) a variety of features and
feature weights (methods) for optimizing a low-level, one-zoid
optimization strategy. Two of these optimization strategies
quickly rise to performance plateaus, whereas two others
continue towards higher but more jagged (i.e., variable)
plateaus.
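The abstract leaves CERL's mechanics implicit. As a minimal sketch, assuming a one-zoid controller that rates each legal placement by a weighted sum of board features, and a caller-supplied `evaluate` function that returns the mean game score earned by a given weight vector, the noisy cross-entropy method of Szita and Lorincz (2006) can be written as follows (the population size, elite fraction, and noise constant are illustrative, not the paper's settings):

```python
import numpy as np

def cross_entropy_optimize(evaluate, dim, n_samples=100, elite_frac=0.1,
                           n_iters=50, noise=4.0, rng=None):
    """Noisy cross-entropy method for tuning a feature-weight vector.

    `evaluate(weights)` is assumed to return the mean Tetris score of a
    one-zoid controller that rates each legal placement as the weighted
    sum of board features (e.g., holes, column heights).
    """
    rng = rng or np.random.default_rng(0)
    mu = np.zeros(dim)             # mean of the sampling distribution
    sigma = np.full(dim, 100.0)    # per-weight standard deviation
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(n_iters):
        # Draw candidate weight vectors around the current mean.
        samples = rng.normal(mu, sigma, size=(n_samples, dim))
        scores = np.array([evaluate(w) for w in samples])
        # Keep the top-scoring (elite) candidates.
        elite = samples[np.argsort(scores)[-n_elite:]]
        # Refit the distribution to the elite; the added noise term
        # (Szita & Lorincz, 2006) keeps sigma from collapsing early.
        mu = elite.mean(axis=0)
        sigma = np.sqrt(elite.var(axis=0) + noise)
    return mu
```

Each iteration refits a diagonal Gaussian to the elite samples; the added noise keeps the search from converging prematurely, before the feature weights settle.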
In Study 2, we compare the zoid (i.e., Tetris piece) placement decisions made by our best CERL models with those made by the full spectrum of novice-to-expert human Tetris players. Across 370,131 episodes collected from 67 human players, the ability of two CERL strategies to classify human zoid placements varied with player expertise, from 43% for our lowest-scoring novice to around 65% for our three highest-scoring experts.
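A hedged sketch of how such an agreement (classification) rate could be computed, assuming each episode records the set of legal placements and the index of the placement the human actually chose, with a hypothetical `features` function mapping a candidate board to the model's feature vector:

```python
import numpy as np

def placement_agreement(episodes, weights, features):
    """Fraction of episodes where the model's top-rated placement
    matches the human player's choice.

    Each episode is assumed to be (legal_placements, human_choice),
    where `legal_placements` is a list of candidate board states and
    `human_choice` indexes the placement the player made.
    """
    hits = 0
    for legal_placements, human_choice in episodes:
        # Rate every legal placement with the learned feature weights.
        ratings = [np.dot(weights, features(board))
                   for board in legal_placements]
        if int(np.argmax(ratings)) == human_choice:
            hits += 1
    return hits / len(episodes)
```

Under this reading, 43% agreement for the lowest-scoring novice versus roughly 65% for the best experts would mean the models' preferred placements align increasingly often with human choices as expertise grows.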