Supervised Dictionary Learning Julien Mairal Francis Bach INplete, with a number of basis elements greater than the dimension of the data. Recent research has shown that sparsity helps to capture higher-order correlation in data. In [3, 4], sparse decompositions are used with predefined dictionaries for face and signal recognition. In [5], dictionaries are learned for a reconstruction task, and the correspond- ing sparse models are used as features in an SVM. In [6], a discriminative method is introduced for various classification tasks, learning one dictionary per class; the classification process itself is based on the corresponding reconstruction error, and does not exploit the actual decomposition co- efficients. In [7], a generative model for documents is learned at the same time as the parameters of a deep network structure. In [8], multi-task learning is performed by learning features and tasks are selected using a sparsity criterion. The framework we present in this paper extends these approaches by learning simultaneously a single shared dictionary as well as models for different signal classes in a mixed generative and discriminative formulation (see also [9], where a different discriminative term is added to the classical reconstructive one). Similar joint generative/discriminative frame- works have