This presentation shows how large-scale data sets challenge traditional machine learning in fundamental ways.

• Traditional machine learning theory describes tradeoffs associated with the scarcity of data. Qualitatively different tradeoffs appear when we consider instead that computing time is the bottleneck. As a consequence, one needs to reconsider the relations between the machine learning problem, its optimization formulation, and the optimization algorithms.

• Traditional machine learning optimizes average losses (sketched below). Increasing the training set size cannot improve such metrics indefinitely. However, these diminishing returns vanish if we measure instead the diversity of conditions in which the trained system performs well. In other words, big data is not an opportunity to increase average accuracy, but an opportunity to increase coverage.

• Since the benefits of big data come from its diversity, we need conceptual tools to build learning systems that can address all the (changing) aspects of real big data problems. Multitask learning, transfer learning, and deep learning are first steps in this direction.
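As a minimal sketch of the contrast in the second point (the symbols w, f_w, the loss \ell, and the condition index c are illustrative assumptions, not part of the abstract): traditional training minimizes an empirical average of a per-example loss over one training distribution,

\[
\min_{w} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_w(x_i),\, y_i\big),
\]

which saturates as more data is drawn from the same distribution. One hedged way to formalize "coverage" instead tracks performance separately under each condition c of interest and optimizes the worst case,

\[
\min_{w} \; \max_{c} \; \mathbb{E}_{(x,y)\sim D_c}\big[\ell\big(f_w(x),\, y\big)\big],
\]

a criterion that keeps improving as new conditions enter the data rather than merely sharpening one average.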