Where’s Wally: The influence of visual salience on referring expression generation

Alasdair Clarke, Micha Elsner, Hannah Rohde

Research output: Contribution to journal (Article)

Abstract

Referring expression generation (REG) presents the converse problem to visual search: given a scene and a specified target, how does one generate a description which would allow somebody else to quickly and accurately locate the target? Previous work in psycholinguistics and natural language processing has failed to find an important and integrated role for vision in this task. That previous work, which relies largely on simple scenes, tends to treat vision as a pre-process for extracting feature categories that are relevant to disambiguation. However, the visual search literature suggests that some descriptions are better than others at enabling listeners to search efficiently within complex stimuli. This paper presents a study testing whether participants are sensitive to visual features that allow them to compose such “good” descriptions. Our results show that visual properties (salience, clutter, area, and distance) influence REG for targets embedded in images from the Where's Wally? books. Referring expressions for large targets are shorter than those for smaller targets, and expressions about targets in highly cluttered scenes use more words. We also find that participants are more likely to mention non-target landmarks that are large, salient, and in close proximity to the target. These findings identify a key role for visual salience in language production decisions and highlight the importance of scene complexity for REG.
Original language: English
Article number: 00329
Journal: Frontiers in Psychology
Volume: 4
DOI: 10.3389/fpsyg.2013.00329
Publication status: Published - 18 Jun 2013

Keywords

  • referring expression generation
  • visual salience
  • visual clutter

Cite this

Where’s Wally: The influence of visual salience on referring expression generation. / Clarke, Alasdair; Elsner, Micha; Rohde, Hannah.

In: Frontiers in Psychology, Vol. 4, 00329, 18.06.2013.

Clarke, Alasdair; Elsner, Micha; Rohde, Hannah. / Where’s Wally: The influence of visual salience on referring expression generation. In: Frontiers in Psychology. 2013; Vol. 4.
@article{ac0971886d454cf296aaf84b55754dd4,
title = "Where’s Wally: The influence of visual salience on referring expression generation",
abstract = "Referring expression generation (REG) presents the converse problem to visual search: given a scene and a specified target, how does one generate a description which would allow somebody else to quickly and accurately locate the target? Previous work in psycholinguistics and natural language processing has failed to find an important and integrated role for vision in this task. That previous work, which relies largely on simple scenes, tends to treat vision as a pre-process for extracting feature categories that are relevant to disambiguation. However, the visual search literature suggests that some descriptions are better than others at enabling listeners to search efficiently within complex stimuli. This paper presents a study testing whether participants are sensitive to visual features that allow them to compose such “good” descriptions. Our results show that visual properties (salience, clutter, area, and distance) influence REG for targets embedded in images from the Where's Wally? books. Referring expressions for large targets are shorter than those for smaller targets, and expressions about targets in highly cluttered scenes use more words. We also find that participants are more likely to mention non-target landmarks that are large, salient, and in close proximity to the target. These findings identify a key role for visual salience in language production decisions and highlight the importance of scene complexity for REG.",
keywords = "referring expression generation, visual salience, visual clutter",
author = "Alasdair Clarke and Micha Elsner and Hannah Rohde",
year = "2013",
month = "6",
day = "18",
doi = "10.3389/fpsyg.2013.00329",
language = "English",
volume = "4",
journal = "Frontiers in Psychology",
issn = "1664-1078",
publisher = "Frontiers Media S.A.",

}

TY - JOUR

T1 - Where’s Wally

T2 - The influence of visual salience on referring expression generation

AU - Clarke, Alasdair

AU - Elsner, Micha

AU - Rohde, Hannah

PY - 2013/6/18

Y1 - 2013/6/18

AB - Referring expression generation (REG) presents the converse problem to visual search: given a scene and a specified target, how does one generate a description which would allow somebody else to quickly and accurately locate the target? Previous work in psycholinguistics and natural language processing has failed to find an important and integrated role for vision in this task. That previous work, which relies largely on simple scenes, tends to treat vision as a pre-process for extracting feature categories that are relevant to disambiguation. However, the visual search literature suggests that some descriptions are better than others at enabling listeners to search efficiently within complex stimuli. This paper presents a study testing whether participants are sensitive to visual features that allow them to compose such “good” descriptions. Our results show that visual properties (salience, clutter, area, and distance) influence REG for targets embedded in images from the Where's Wally? books. Referring expressions for large targets are shorter than those for smaller targets, and expressions about targets in highly cluttered scenes use more words. We also find that participants are more likely to mention non-target landmarks that are large, salient, and in close proximity to the target. These findings identify a key role for visual salience in language production decisions and highlight the importance of scene complexity for REG.

KW - referring expression generation

KW - visual salience

KW - visual clutter

U2 - 10.3389/fpsyg.2013.00329

DO - 10.3389/fpsyg.2013.00329

M3 - Article

VL - 4

JO - Frontiers in Psychology

JF - Frontiers in Psychology

SN - 1664-1078

M1 - 00329

ER -