We study the problem of minimizing nonsmooth convex functions with max structure in very high dimensions. We identify a max-type partial separability property which implies that the smooth approximation obtained via Nesterov's smoothing technique admits a separable quadratic overapproximation. We then use this overapproximation to define a parallel coordinate descent method, and we derive a theoretical parallelization speedup factor that depends on the degree of partial separability of the function. Our results also apply to structured smooth functions; in particular, we present a parallel version of the AdaBoost machine learning algorithm. We compare the parallel coordinate descent method with other approaches on large-scale public datasets.
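To fix ideas, the following is one standard instantiation of the max structure and of Nesterov's smoothing referred to above; the notation $(A, Q, g, d, \mu)$ is illustrative and not taken from this paper. The nonsmooth objective is written as
\[
f(x) \;=\; \max_{z \in Q} \big\{ \langle A x, z \rangle - g(z) \big\},
\qquad
f_\mu(x) \;=\; \max_{z \in Q} \big\{ \langle A x, z \rangle - g(z) - \mu\, d(z) \big\},
\]
where $d$ is a prox-function, strongly convex on $Q$, with $D = \max_{z \in Q} d(z) < \infty$. The smoothed function $f_\mu$ is convex and differentiable with Lipschitz-continuous gradient, and it satisfies
\[
f_\mu(x) \;\le\; f(x) \;\le\; f_\mu(x) + \mu D \quad \text{for all } x,
\]
so minimizing $f_\mu$ for a small smoothing parameter $\mu > 0$ approximately minimizes $f$; this is the standard bound from Nesterov's smoothing technique.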