AUTHORS: Lorenzo Volpi, Alejandro Moreo, Fabrizio Sebastiani
WORK PACKAGE: WP 8 – UbiQuity
TITLE: A Simple Method for Classifier Accuracy Prediction Under Prior Probability Shift
Keywords: Classifier accuracy prediction, Prior probability shift, Label shift, Quantification
Abstract
The standard technique for predicting the accuracy that a classifier will have on unseen data (classifier accuracy prediction – CAP) is cross-validation (CV). However, CV relies on the assumption that the training data and the test data are sampled from the same distribution, an assumption that is often violated in real-world scenarios. When such violations occur (i.e., in the presence of dataset shift), the estimates returned by CV are unreliable. In this paper we propose a CAP method specifically designed to address prior probability shift (PPS), an instance of dataset shift in which the training and test distributions are characterized by different class priors. By solving a system of n² independent linear equations, with n the number of classes, our method estimates the n² entries of the contingency table of the test data, thus allowing the estimation of any evaluation measure defined on that table. Since a key step in this method involves predicting the class priors of the test data, we further observe a connection between our method and the field of “learning to quantify”. Our experiments show that, under PPS, our method combined with state-of-the-art quantification techniques tends to outperform existing CAP methods.
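To make the core intuition concrete, the following is a minimal sketch, not the paper's actual implementation: under PPS the class-conditional misclassification rates p(ŷ=j | y=i) can be assumed invariant between training and test, so estimating them on held-out data and combining them with (quantified) test priors yields the n² entries of the test contingency table, from which measures such as accuracy follow. All function names and numbers below are illustrative assumptions.

```python
import numpy as np

def estimate_contingency(cond_rates: np.ndarray,
                         test_priors: np.ndarray) -> np.ndarray:
    """Hypothetical helper, not from the paper.

    cond_rates[i, j] ≈ p(ŷ=j | y=i), estimated on validation data
                       (each row sums to 1; assumed invariant under PPS).
    test_priors[i]   ≈ p_test(y=i), e.g. predicted by a quantifier.

    Returns the estimated joint table p_test(y=i, ŷ=j): each row of
    conditional rates is scaled by the corresponding test prior.
    """
    return cond_rates * test_priors[:, None]

# Toy binary example (numbers are made up for illustration):
cond = np.array([[0.9, 0.1],    # class 0: 90% correctly classified
                 [0.2, 0.8]])   # class 1: 80% correctly classified
priors = np.array([0.3, 0.7])   # shifted test priors

table = estimate_contingency(cond, priors)
accuracy = np.trace(table)      # diagonal mass = estimated test accuracy
print(accuracy)
```

Any evaluation measure that is a function of the contingency table (accuracy, F1, etc.) can then be computed from the estimated table in the same way.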