[LLM][Wittgenstein][Bistable] Can LLMs deal with bistable images?
May 1, 2025
This is a very interesting paper that tests how vision-language models perceive bistable images, such as the duck-rabbit illusion made famous by Wittgenstein (image below). The study finds that most vision-language models fail to see both aspects and settle on only one. This is not very surprising, though.
I think two factors facilitate the switching in human perception: (1) visual attention, i.e., which part of the image you’re looking at, and (2) a dynamical process going on in your brain that can be perturbed so that it switches to a different orbit. (2) might in turn cause (1) to shift.
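To make the two factors a bit more concrete, here is a toy sketch. It is not from the paper and not a model of the brain. It assumes a textbook double-well system, dx = (x - x^3 + bias) dt + noise dW, where the two stable states near x = +1 and x = -1 stand in for the two readings of the image. The names `simulate`, `bias`, and `noise` are hypothetical: `bias` plays the role of attention tilting the landscape toward one reading, and `noise` plays the role of a perturbation that can kick the state into the other well.

```python
import random

def simulate(steps=20000, dt=0.01, noise=0.0, bias=0.0, x0=1.0, seed=0):
    """Euler-Maruyama integration of dx = (x - x^3 + bias) dt + noise dW.

    Returns the final state and the number of switches between wells.
    """
    rng = random.Random(seed)
    x = x0
    side = 1 if x > 0 else -1  # which "interpretation" is currently held
    switches = 0
    for _ in range(steps):
        drift = x - x**3 + bias
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        # Count a switch only once the state is clearly inside the other
        # well, so jitter around x = 0 is not counted repeatedly.
        if side == 1 and x < -0.5:
            side, switches = -1, switches + 1
        elif side == -1 and x > 0.5:
            side, switches = 1, switches + 1
    return x, switches

if __name__ == "__main__":
    # No perturbation, no attention bias: the state stays in its well.
    print(simulate(noise=0.0, bias=0.0))
    # Strong "attention" bias toward the other reading: one forced switch.
    print(simulate(noise=0.0, bias=-1.0))
    # Noise only: switches can happen spontaneously.
    print(simulate(noise=0.7, bias=0.0))
```

In this picture, a model that always reports a single reading behaves like the first call: nothing perturbs the state, so it never leaves the well it starts in.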
Evaluating Vision-Language Models on Bistable Images
Artemis Panagopoulou, Coby Melkin, Chris Callison-Burch
https://arxiv.org/pdf/2405.19423