- Mathuriya, Amrita;
- Bard, Deborah;
- Mendygral, Peter;
- Meadows, Lawrence;
- Arnemann, James;
- Shao, Lei;
- He, Siyu;
- Kärnä, Tuomas;
- Moise, Diana;
- Pennycook, Simon J;
- Maschhoff, Kristyn;
- Sewall, Jason;
- Kumar, Nalini;
- Ho, Shirley;
- Ringenburg, Michael F;
- Prabhat;
- Lee, Victor
Deep learning is a promising tool for determining the physical model that describes our universe. To handle the considerable computational cost of this problem, we present CosmoFlow: a highly scalable deep learning application built on top of the TensorFlow framework. CosmoFlow uses efficient implementations of 3D convolution and pooling primitives, together with improvements in threading for many element-wise operations, to improve training performance on Intel® Xeon Phi™ processors. We also utilize the Cray PE Machine Learning Plugin for efficient scaling to multiple nodes. We demonstrate fully synchronous data-parallel training on 8192 nodes of Cori with 77% parallel efficiency, achieving 3.5 Pflop/s sustained performance. To our knowledge, this is the first large-scale science application of the TensorFlow framework at supercomputer scale with fully synchronous training. These enhancements enable us to process large 3D dark matter distributions and predict the cosmological parameters $\Omega_M$, $\sigma_8$ and $n_s$ with unprecedented accuracy.
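As a rough illustration of what "fully synchronous data-parallel training" means, the toy sketch below trains a one-parameter linear model across several simulated workers. It is not CosmoFlow code and does not use TensorFlow or the Cray PE Machine Learning Plugin; the function names (`local_gradient`, `allreduce_mean`, `synchronous_step`), the data, and the learning rate are all assumptions made for the example. Each worker computes a gradient on its own shard, the gradients are averaged in a step that stands in for a collective allreduce, and every replica applies the same update before any worker proceeds.

```python
# Hypothetical toy example: fully synchronous data-parallel SGD on y = w * x.
# Workers are simulated in a single process; a real system would use a
# collective communication library for the averaging step.

def local_gradient(w, shard):
    """Gradient of mean((w*x - y)^2) with respect to w over one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(values):
    """Stand-in for an allreduce: every worker receives the same average."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def synchronous_step(weights, shards, lr):
    """All workers compute, then average, then apply the identical update."""
    grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
    averaged = allreduce_mean(grads)
    return [w - lr * g for w, g in zip(weights, averaged)]

# Illustrative data generated from y = 3x, split into equal-size shards.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
num_workers = 4
shards = [data[i::num_workers] for i in range(num_workers)]
weights = [0.0] * num_workers  # every replica starts from the same value

for _ in range(200):
    weights = synchronous_step(weights, shards, lr=0.01)
```

Because the shards are the same size, the averaged gradient equals the gradient over the full batch, so the replicas stay identical and behave like a single worker processing all the data at each step. The cost of that guarantee is that every step waits for the slowest worker, which is why efficient collective communication matters at large node counts.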