Where’s Wally: The influence of visual salience on referring expression generation

Alasdair Clarke; Micha Elsner; Hannah Rohde

doi:10.3389/fpsyg.2013.00329

Alasdair Clarke, Micha Elsner^*, Hannah Rohde

^*Corresponding author for this work

Psychology

Research output: Contribution to journal › Article › peer-review

Abstract

Referring expression generation (REG) presents the converse problem to visual search: given a scene and a specified target, how does one generate a description which would allow somebody else to quickly and accurately locate the target? Previous work in psycholinguistics and natural language processing has failed to find an important and integrated role for vision in this task. That previous work, which relies largely on simple scenes, tends to treat vision as a pre-process for extracting feature categories that are relevant to disambiguation. However, the visual search literature suggests that some descriptions are better than others at enabling listeners to search efficiently within complex stimuli. This paper presents a study testing whether participants are sensitive to visual features that allow them to compose such “good” descriptions. Our results show that visual properties (salience, clutter, area, and distance) influence REG for targets embedded in images from the Where's Wally? books. Referring expressions for large targets are shorter than those for smaller targets, and expressions about targets in highly cluttered scenes use more words. We also find that participants are more likely to mention non-target landmarks that are large, salient, and in close proximity to the target. These findings identify a key role for visual salience in language production decisions and highlight the importance of scene complexity for REG.

Original language	English
Article number	00329
Journal	Frontiers in Psychology
Volume	4
DOIs	https://doi.org/10.3389/fpsyg.2013.00329
Publication status	Published - 18 Jun 2013

Keywords

referring expression generation
visual salience
visual clutter

Access to Document

10.3389/fpsyg.2013.00329

Copyright © 2013 Clarke, Elsner and Rohde. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.

Cite this

@article{ac0971886d454cf296aaf84b55754dd4,
  title = "Where{\textquoteright}s Wally: The influence of visual salience on referring expression generation",
  abstract = "Referring expression generation (REG) presents the converse problem to visual search: given a scene and a specified target, how does one generate a description which would allow somebody else to quickly and accurately locate the target? Previous work in psycholinguistics and natural language processing has failed to find an important and integrated role for vision in this task. That previous work, which relies largely on simple scenes, tends to treat vision as a pre-process for extracting feature categories that are relevant to disambiguation. However, the visual search literature suggests that some descriptions are better than others at enabling listeners to search efficiently within complex stimuli. This paper presents a study testing whether participants are sensitive to visual features that allow them to compose such “good” descriptions. Our results show that visual properties (salience, clutter, area, and distance) influence REG for targets embedded in images from the Where's Wally? books. Referring expressions for large targets are shorter than those for smaller targets, and expressions about targets in highly cluttered scenes use more words. We also find that participants are more likely to mention non-target landmarks that are large, salient, and in close proximity to the target. These findings identify a key role for visual salience in language production decisions and highlight the importance of scene complexity for REG.",
  keywords = "referring expression generation, visual salience, visual clutter",
  author = "Alasdair Clarke and Micha Elsner and Hannah Rohde",
  year = "2013",
  month = jun,
  day = "18",
  doi = "10.3389/fpsyg.2013.00329",
  language = "English",
  volume = "4",
  journal = "Frontiers in Psychology",
  issn = "1664-1078",
  publisher = "Frontiers Media S.A.",
}

TY  - JOUR
T1  - Where’s Wally
T2  - The influence of visual salience on referring expression generation
AU  - Clarke, Alasdair
AU  - Elsner, Micha
AU  - Rohde, Hannah
PY  - 2013/6/18
Y1  - 2013/6/18
N2  - Referring expression generation (REG) presents the converse problem to visual search: given a scene and a specified target, how does one generate a description which would allow somebody else to quickly and accurately locate the target? Previous work in psycholinguistics and natural language processing has failed to find an important and integrated role for vision in this task. That previous work, which relies largely on simple scenes, tends to treat vision as a pre-process for extracting feature categories that are relevant to disambiguation. However, the visual search literature suggests that some descriptions are better than others at enabling listeners to search efficiently within complex stimuli. This paper presents a study testing whether participants are sensitive to visual features that allow them to compose such “good” descriptions. Our results show that visual properties (salience, clutter, area, and distance) influence REG for targets embedded in images from the Where's Wally? books. Referring expressions for large targets are shorter than those for smaller targets, and expressions about targets in highly cluttered scenes use more words. We also find that participants are more likely to mention non-target landmarks that are large, salient, and in close proximity to the target. These findings identify a key role for visual salience in language production decisions and highlight the importance of scene complexity for REG.
AB  - Referring expression generation (REG) presents the converse problem to visual search: given a scene and a specified target, how does one generate a description which would allow somebody else to quickly and accurately locate the target? Previous work in psycholinguistics and natural language processing has failed to find an important and integrated role for vision in this task. That previous work, which relies largely on simple scenes, tends to treat vision as a pre-process for extracting feature categories that are relevant to disambiguation. However, the visual search literature suggests that some descriptions are better than others at enabling listeners to search efficiently within complex stimuli. This paper presents a study testing whether participants are sensitive to visual features that allow them to compose such “good” descriptions. Our results show that visual properties (salience, clutter, area, and distance) influence REG for targets embedded in images from the Where's Wally? books. Referring expressions for large targets are shorter than those for smaller targets, and expressions about targets in highly cluttered scenes use more words. We also find that participants are more likely to mention non-target landmarks that are large, salient, and in close proximity to the target. These findings identify a key role for visual salience in language production decisions and highlight the importance of scene complexity for REG.
KW  - referring expression generation
KW  - visual salience
KW  - visual clutter
U2  - 10.3389/fpsyg.2013.00329
DO  - 10.3389/fpsyg.2013.00329
M3  - Article
SN  - 1664-1078
VL  - 4
JO  - Frontiers in Psychology
JF  - Frontiers in Psychology
M1  - 00329
ER  -
