We consider a dataset of object associations (pairs), where each object belongs to one of two types (e.g. drugs and side effects). In some cases, additional information about the objects is also available (e.g. drug structures). Our aim is to learn object representations that can be used to predict missing associations and that encode interpretable object attributes capable of explaining the predictions. In this thesis, we propose three approaches and apply them to real-world datasets, primarily from pharmacology, that differ in the type of data available.
Self-Matrix Factorization (SMF) is a method that learns object representations using solely association data. Exploiting the fact that, in general, objects lie in multiple linear manifolds embedded in a high-dimensional space, SMF learns similarities between objects, specifically those that share a manifold, directly from the observed associations. Thus, SMF simultaneously learns object similarities and representations, constraining them to reflect underlying structures in the data. We tested SMF extensively on association datasets containing user-item ratings and drug side effect frequencies. Our results show that SMF outperforms competing methods in recovering missing associations and is also better at learning representations that capture meaningful object attributes.
In our second learning scenario, no explicit associations between objects are available; we observe only the perturbations that objects (drugs and viruses) induce in an environment (a protein-protein interaction network). Our approach learns the object representations through the simultaneous decomposition of several matrices. We show that these representations encode interpretable attributes of the objects involved (drugs, viruses, etc.), that they can be used to predict effective antiviral treatments, and that these predictions are explainable in terms of the learned object attributes.
Finally, in our third learning scenario, object associations (drugs, side effects) are available together with some low-level object features (drug molecular graphs). Object representations are learned through a deep learning model called Features to Signatures (\(\phi 2 \sigma\)), and we show that these representations can be used to predict drug side effect frequencies from molecular graphs. Importantly, \(\phi 2 \sigma\) can be used for ab initio prediction, that is, to predict side effect frequencies for compounds with previously undetected side effects.
March 27, 2025, at 11:30 a.m.
Zoom link: https://fgv-br.zoom.us/j/92321152709?pwd=viA26eSk7JcTCVKbI4RdW5L8iazXSN.1