Many machine learning problems can be expressed as the optimization of some
cost functional over a parametric family of probability distributions. It is
often beneficial to solve such optimization problems using natural gradient
methods. These methods are invariant to the parametrization of the family, and
thus can yield more effective optimization. Unfortunately, computing the
natural gradient is challenging, as it requires inverting a high-dimensional
matrix at each iteration. We propose a general framework to approximate the
natural gradient for the Wasserstein metric by leveraging a dual formulation
of the metric restricted to a Reproducing Kernel Hilbert Space. Our approach
leads to an estimator of the gradient direction that can trade off accuracy
against computational cost, with theoretical guarantees. We verify its accuracy on
simple examples, and empirically demonstrate the advantage of using such an
estimator on classification tasks on Cifar10 and Cifar100.
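To make the preconditioning idea concrete, the following is a minimal, generic sketch of a natural-gradient-style update in which an estimated, Tikhonov-regularized metric is used to solve for the update direction instead of inverting the metric explicitly. The function name `natural_gradient_step`, the stand-in metric estimate, and the regularizer `reg` are illustrative assumptions; this is not the paper's kernelized Wasserstein estimator.

```python
import numpy as np

def natural_gradient_step(theta, euclid_grad, metric_estimate, lr=0.1, reg=1e-3):
    """One preconditioned (natural-gradient-style) parameter update.

    `metric_estimate` approximates the metric tensor G(theta); the update
    direction is obtained by solving G d = grad rather than forming G^{-1}.
    The regularizer `reg` is an illustrative choice for numerical stability.
    """
    d = theta.size
    G = metric_estimate + reg * np.eye(d)     # Tikhonov-regularized metric estimate
    nat_grad = np.linalg.solve(G, euclid_grad)  # solve G d = grad for the direction
    return theta - lr * nat_grad

# Toy usage with a diagonal stand-in for a sample-based metric estimate.
rng = np.random.default_rng(0)
theta = rng.normal(size=5)
grad = rng.normal(size=5)
G_hat = np.diag(np.linspace(1.0, 10.0, 5))
theta_new = natural_gradient_step(theta, grad, G_hat)
```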