The Functoriality of Data -- Understanding Geometric Data Sets Jointly
Speaker:Leonidas J. Guibas
The information contained across many data sets is often highly correlated. Such connections and correlations can arise because the data captured comes from the same or similar objects, or because of particular repetitions, symmetries or other relations and self-relations that the data sources satisfy. This is particularly true for data sets of a geometric character, such as GPS traces, images, videos, 3D scans, 3D models, etc.
We argue that when extracting knowledge from the data in a given data set, we can do significantly better if we exploit the wider context provided by all the relationships between this data set and a "society" or "social network" of other related data sets. We discuss mathematical and algorithmic issues on how to represent and compute relationships or mappings between data sets at multiple levels of detail. We also show how to analyze and leverage networks of maps, small and large, between inter-related data. The network can act as a regularizer, allowing us to to benefit from the "wisdom of the collection" in performing operations on individual data sets or in map inference between them.
This "functorial" view of data puts the spotlight on consistent, shared relations and maps as the key to understanding structure in data. It is a little different from the current dominant paradigm of extracting supervised or unsupervised feature sets, defining distance or similarity metrics, and doing regression or classification -- though sparsity still plays an important role. The inspiration is more from ideas in homological algebra or algebraic topology, exploiting the algebraic structure of data relationships or maps in an effort to disentangle dependencies and assign importance to the vast web of all possible relationships among multiple data sets.
We illustrate these ideas largely using examples from the realm of 3D shapes and images -- but the notions are more generally to the analysis of graphs and other networks, acoustic data, biological data such as microarrays, homeworks in MOOCs, etc.
This is an overview of joint work with multiple collaborators, as discussed in the talk.