In supervised learning, ensuring a model's validity requires identifying dataset shift, i.e., a change in the data distribution from the one the model encountered during training. Detecting such changes requires a comparative analysis of the multidimensional distributions of the training data and of new, unseen datasets. In this work, we span the design space of visualizations for multidimensional comparative data analytics. Based on this design space, we present DataShiftExplorer, a technique tailored to identifying and analyzing changes in multidimensional data distributions. To validate our approach, we carefully manipulate a classification data generator to build two use cases that exhibit the two most common forms of dataset shift: covariate shift and prior-probability shift. Our use cases highlight how DataShiftExplorer facilitates the identification and analysis of data changes, supporting supervised learning.
*Text provided by the author.
Bruno Schneider is a Ph.D. candidate in the Data Analysis and Visualization Group (DBVIS) at the University of Konstanz, Germany. The group is headed by Prof. Dr. Daniel Keim, who has a distinguished career in the fields of Information Visualization and Visual Analytics. Research in his group focuses on identifying opportunities to connect the knowledge of human domain experts with fully automated algorithms and processes in computer science through interactive visual interfaces. Before moving to Germany, Bruno earned his master's degree from the School of Applied Math, FGV, in Rio de Janeiro, Brazil (2014).