A Curated Corpus for Sentiment-Topic Analysis

Emmanuel Ebuka Ibeke; Chenghua Lin; Christopher David Coe; Adam Zachary Wyner; Dong Liu; Mohamad Hardyman Bin Barawi; Noor Fazilla Abd Yusof

Lincedo Ltd

Research output: Chapter in Book/Report/Conference proceeding › Published conference contribution

Abstract

There has been a rapid growth of research interest in natural language processing that seeks to better understand sentiment or opinion
expressed in text. However, most research focuses on developing new models for opinion mining, with little effort being devoted to
the development of curated datasets for training and evaluation of these models. This work provides a manually annotated corpus of
customer reviews, which has two unique characteristics. First, the corpus captures sentiment and topic information at both the review and
sentence levels. Second, it is time-variant, which preserves the sentiment and topic dynamic information of the reviews. The annotation
process was performed in a two-stage approach by three independent annotators, achieving a substantial level of inter-annotator agreement.
In another set of experiments, we performed supervised sentiment classification using our manual annotations as the gold standard.
Experimental results show that both the Naive Bayes model and the Support Vector Machine achieved more than 92% accuracy on the task of
polarity classification. We hypothesise that this corpus could serve as a benchmark to facilitate training and experimentation in a broad
range of opinion mining tasks.
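
The abstract does not name the statistic behind the reported inter-annotator agreement. As an illustration only, the sketch below computes Fleiss' kappa, a common choice when a fixed number of annotators (here three) label every item. The function name, the label strings and the toy data are hypothetical and are not taken from the corpus. On the widely used Landis and Koch scale, values between 0.61 and 0.80 are described as "substantial".

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels (one per annotator).

    Assumes every item was labelled by the same number of annotators (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of labels")

    # Per-item label counts, e.g. Counter({"pos": 2, "neg": 1}).
    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing annotator pairs, averaged over items.
    p_items = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall label distribution.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())

    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one label is ever used")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical toy data: three annotators, sentence-level polarity labels.
toy = [
    ["pos", "pos", "pos"],
    ["neg", "neg", "neg"],
    ["pos", "pos", "neg"],
    ["neg", "neg", "neg"],
]
print(round(fleiss_kappa(toy), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one polarity class dominates the reviews.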

Original language	English
Title of host publication	Proceedings of the LREC 2016 Workshop “Emotion and Sentiment Analysis”
Editors	J. Fernando Sánchez-Rada, Björn Schuller
Pages	32-39
Number of pages	8
Publication status	Published - 23 May 2016

Keywords

Opinion mining
Sentiment and Topic analysis
Annotation guidelines

Access to Document

http://www.lrec-conf.org/proceedings/lrec2016/workshops/LREC2016Workshop-ESA_Proceedings.pdf

Cite this

Ibeke, E. E., Lin, C., Coe, C. D., Wyner, A. Z., Liu, D., Barawi, M. H. B., & Abd Yusof, N. F. (2016). A Curated Corpus for Sentiment-Topic Analysis. In J. F. Sánchez-Rada, & B. Schuller (Eds.), Proceedings of the LREC 2016 Workshop “Emotion and Sentiment Analysis” (pp. 32-39). http://www.lrec-conf.org/proceedings/lrec2016/workshops/LREC2016Workshop-ESA_Proceedings.pdf

@inproceedings{5e67658a571543fab8c545364f564432,

title = "A Curated Corpus for Sentiment-Topic Analysis",

abstract = "There has been a rapid growth of research interest in natural language processing that seeks to better understand sentiment or opinion expressed in text. However, most research focuses on developing new models for opinion mining, with little effort being devoted to the development of curated datasets for training and evaluation of these models. This work provides a manually annotated corpus of customer reviews, which has two unique characteristics. First, the corpus captures sentiment and topic information at both the review and sentence levels. Second, it is time-variant, which preserves the sentiment and topic dynamic information of the reviews. The annotation process was performed in a two-stage approach by three independent annotators, achieving a substantial level of inter-annotator agreement. In another set of experiments, we performed supervised sentiment classification using our manual annotations as the gold standard. Experimental results show that both the Naive Bayes model and the Support Vector Machine achieved more than 92\% accuracy on the task of polarity classification. We hypothesise that this corpus could serve as a benchmark to facilitate training and experimentation in a broad range of opinion mining tasks.",

keywords = "Opinion mining, Sentiment and Topic analysis, Annotation guidelines",

author = "Ibeke, {Emmanuel Ebuka} and Chenghua Lin and Coe, {Christopher David} and Wyner, {Adam Zachary} and Dong Liu and Barawi, {Mohamad Hardyman Bin} and {Abd Yusof}, {Noor Fazilla}",

year = "2016",

month = may,

day = "23",

language = "English",

pages = "32--39",

editor = "{S{\'a}nchez-Rada}, {J. Fernando} and Schuller, {Bj{\"o}rn}",

booktitle = "Proceedings of the LREC 2016 Workshop ``Emotion and Sentiment Analysis''",

}

TY - GEN

T1 - A Curated Corpus for Sentiment-Topic Analysis

AU - Ibeke, Emmanuel Ebuka

AU - Lin, Chenghua

AU - Coe, Christopher David

AU - Wyner, Adam Zachary

AU - Liu, Dong

AU - Barawi, Mohamad Hardyman Bin

AU - Abd Yusof, Noor Fazilla

PY - 2016/5/23

Y1 - 2016/5/23

N2 - There has been a rapid growth of research interest in natural language processing that seeks to better understand sentiment or opinion expressed in text. However, most research focuses on developing new models for opinion mining, with little effort being devoted to the development of curated datasets for training and evaluation of these models. This work provides a manually annotated corpus of customer reviews, which has two unique characteristics. First, the corpus captures sentiment and topic information at both the review and sentence levels. Second, it is time-variant, which preserves the sentiment and topic dynamic information of the reviews. The annotation process was performed in a two-stage approach by three independent annotators, achieving a substantial level of inter-annotator agreement. In another set of experiments, we performed supervised sentiment classification using our manual annotations as the gold standard. Experimental results show that both the Naive Bayes model and the Support Vector Machine achieved more than 92% accuracy on the task of polarity classification. We hypothesise that this corpus could serve as a benchmark to facilitate training and experimentation in a broad range of opinion mining tasks.

AB - There has been a rapid growth of research interest in natural language processing that seeks to better understand sentiment or opinion expressed in text. However, most research focuses on developing new models for opinion mining, with little effort being devoted to the development of curated datasets for training and evaluation of these models. This work provides a manually annotated corpus of customer reviews, which has two unique characteristics. First, the corpus captures sentiment and topic information at both the review and sentence levels. Second, it is time-variant, which preserves the sentiment and topic dynamic information of the reviews. The annotation process was performed in a two-stage approach by three independent annotators, achieving a substantial level of inter-annotator agreement. In another set of experiments, we performed supervised sentiment classification using our manual annotations as the gold standard. Experimental results show that both the Naive Bayes model and the Support Vector Machine achieved more than 92% accuracy on the task of polarity classification. We hypothesise that this corpus could serve as a benchmark to facilitate training and experimentation in a broad range of opinion mining tasks.

KW - Opinion mining

KW - Sentiment and Topic analysis

KW - Annotation guidelines

M3 - Published conference contribution

SP - 32

EP - 39

BT - Proceedings of the LREC 2016 Workshop “Emotion and Sentiment Analysis”

A2 - Sánchez-Rada, J. Fernando

A2 - Schuller, Björn

ER -
