The question of how to incorporate curvature information into the stochastic gradient method of Robbins-Monro is challenging. Some attempts in the literature directly extend quasi-Newton updating techniques from deterministic optimization. We argue that such an approach is not sound, and present a new formulation based on the minimization of gradient variances. In the second part of the talk we discuss how to make a quasi-Newton method robust in an asynchronous distributed computing environment.
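
For context, the baseline referenced above is the classic Robbins-Monro stochastic gradient iteration, x_{k+1} = x_k - alpha_k g(x_k), where g is an unbiased but noisy gradient estimate and the step sizes alpha_k diminish. The following minimal Python sketch illustrates that standard iteration only; the function names, step-size schedule, and toy problem are illustrative choices, not material from the talk:

```python
import numpy as np

def robbins_monro_sgd(grad_sample, x0, steps=1000, a=1.0, b=10.0):
    """Classic Robbins-Monro stochastic gradient iteration.

    grad_sample(x) returns an unbiased (noisy) estimate of the
    true gradient at x. The diminishing steps a / (b + k) satisfy
    the usual conditions: sum alpha_k = inf, sum alpha_k^2 < inf.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(steps):
        alpha = a / (b + k)               # diminishing step size
        x = x - alpha * grad_sample(x)    # noisy gradient step
    return x

# Toy example: minimize E[(x - z)^2 / 2] with z ~ N(1, 0.1^2);
# a stochastic gradient is (x - z) for a freshly sampled z.
rng = np.random.default_rng(0)
grad = lambda x: x - rng.normal(1.0, 0.1)
print(robbins_monro_sgd(grad, x0=0.0))    # converges near 1.0
```

Because each gradient sample is noisy, naively feeding successive gradient differences into a quasi-Newton update contaminates the curvature estimate with noise, which motivates the concern raised in the abstract.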