Recently, analog compute-in-memory (CIM) architectures based on emerging analog non-volatile memory (NVM) technologies have been explored for deep neural networks (DNNs) to improve energy efficiency. Such architectures, however, leverage charge conservation, an analog operation of nominally infinite resolution; because continuous signals lack the noise margins of digital logic, the operation is susceptible to errors. Computations in DNNs realized with analog NVM therefore carry high uncertainty due to device stochasticity.
Several reports have demonstrated the use of analog NVM for CIM at a limited scale, but it remains unclear whether computational uncertainty will prohibit large-scale DNNs. To explore this critical issue of scalability, this paper first presents a simulation framework to evaluate the feasibility of large-scale DNNs based on the CIM architecture and analog NVM. Simulation results show that DNNs trained for high-precision digital computing engines are not resilient against the uncertainty of analog NVM devices. To avoid such catastrophic failures,
this paper introduces an analog floating-point representation for DNNs and the Hessian-Aware Stochastic Gradient Descent (HA-SGD) training algorithm to enhance the inference accuracy of trained DNNs. With these enhancements, DNNs such as Wide ResNets for the CIFAR-100 image recognition problem are demonstrated to achieve significant accuracy improvements without adding cost to the inference hardware.