Seeing What Tastes Good: Revisiting Multimodal Distributional Semantics in the Billion Parameter Era

Dan Oneață

POLITEHNICA Bucharest

Desmond Elliott

University of Copenhagen

Stella Frank

Pioneer Center for Artificial Intelligence

paper · code · poster

Self-supervised vision models are surprisingly good at predicting the semantic attributes of concepts.

Overview

  • Revisit classic work in distributional semantics (e.g. Rubinstein et al., 2015; Collell and Moens, 2016; Lucy and Gauthier, 2017) but with a new twist: larger models and denser data.
  • Evaluate how well large-scale pretrained models capture the semantic attributes of concepts; for example, a rose is red, smells sweet, and is a flower.
  • Evaluate on data that go beyond perception, e.g. emotional, encyclopaedic, and functional knowledge.

Methodology

Methodology overview

  • Train linear probes on frozen representations extracted from pretrained vision or language models (see the sketch below).
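
As a rough illustration of this setup, here is a minimal sketch of a per-attribute linear probe, using scikit-learn's LogisticRegression. The feature extraction step and the attribute annotations are assumed (placeholder arrays), not taken from the paper's released code.

```python
# Minimal sketch (assumed setup): probe frozen features for one binary
# semantic attribute such as "is red". The backbone is abstracted away as
# the `features` array; only the linear probe is trained.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Placeholder data: one frozen feature vector per concept (e.g. from a
# self-supervised ViT) and a binary label saying whether the attribute holds.
n_concepts, dim = 200, 768
features = rng.normal(size=(n_concepts, dim))   # frozen representations
labels = rng.integers(0, 2, size=n_concepts)    # attribute annotations

# Linear probe: the backbone stays frozen, only this classifier is fitted.
probe = LogisticRegression(max_iter=1000)
scores = cross_val_score(probe, features, labels, cv=5, scoring="f1")
print(f"Mean F1 across folds: {scores.mean():.3f}")
```

In practice one such probe is trained per attribute, and probe performance is read as a measure of how much attribute information the frozen representations encode.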

Datasets

Datasets overview

We link the concepts from THINGS (Hebart et al., 2019) to two sets of attributes; the linking step is sketched below.
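
As a rough illustration, the sketch below joins a toy THINGS-style concept list with a toy concept-by-attribute table by matching normalised concept names. The data frames, column names, and matching rule are hypothetical stand-ins, not the actual released files or the paper's exact linking procedure.

```python
# Minimal sketch (hypothetical data): align THINGS concepts with an external
# concept-by-attribute table by matching on a normalised concept name.
import pandas as pd

# Stand-ins for the real resources; in practice these would be loaded from
# the released concept and attribute files.
things = pd.DataFrame({"concept": ["rose", "piano", "hammer"]})
norms = pd.DataFrame({
    "concept": ["Rose", "hammer", "banana"],
    "is_red": [1, 0, 0],
    "is_a_flower": [1, 0, 0],
})

# Normalise names (lowercase, strip whitespace) so that e.g. "Rose" matches "rose".
things["key"] = things["concept"].str.lower().str.strip()
norms["key"] = norms["concept"].str.lower().str.strip()

# Inner join keeps only the concepts that appear in both resources.
linked = things.merge(norms, on="key", suffixes=("_things", "_norms"))
print(linked[["concept_things", "is_red", "is_a_flower"]])
```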

Results: Model Ranking

Results overview (models at the top are better).

Interactive demo

Below you can visualise and interact with the results: pick two models and a dataset, and the app will display (i) a scatterplot of the two models' performance on the selected dataset (top), and (ii) the per-concept predictions for a selected attribute (bottom). To access the demo on a full page, visit this link. Since the demo is hosted on Streamlit Cloud, the app goes to sleep after a period of inactivity, so you may need to wake it up; currently, this only works from the full page, not from the app embedded on this page.