Abstract
Research interest in natural language processing that seeks to better understand the sentiment or opinion
expressed in text has grown rapidly. However, most research focuses on developing new models for opinion mining, with little effort devoted to
curating datasets for training and evaluating these models. This work provides a manually annotated corpus of
customer reviews with two unique characteristics. First, the corpus captures sentiment and topic information at both the review and
sentence levels. Second, it is time-variant, preserving how the sentiment and topics of the reviews change over time. Annotation
was carried out in two stages by three independent annotators and achieved a substantial level of inter-annotator agreement.
In a further set of experiments, we performed supervised sentiment classification using our manual annotations as the gold standard.
Experimental results show that both a Naive Bayes model and a Support Vector Machine (SVM) achieved more than 92% accuracy on the task of
polarity classification. We hypothesise that this corpus could serve as a benchmark to facilitate training and experimentation in a broad
range of opinion mining tasks.
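The abstract reports substantial agreement among three annotators but does not name the agreement statistic. Fleiss' kappa is a common choice when more than two annotators label every item. On the Landis and Koch scale, "substantial" conventionally means a kappa between 0.61 and 0.80. The minimal sketch below assumes Fleiss' kappa and categorical polarity labels. The function name and the four-item toy example are hypothetical and are not drawn from the corpus.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa for items that were each labelled by the same number of annotators.

    Assumption: the abstract does not name its agreement metric; Fleiss' kappa
    is used here purely for illustration. ratings[i] holds every annotator's
    label for item i.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})

    # n_ij: how many annotators assigned item i to category j.
    counts = [Counter(item) for item in ratings]

    # Observed agreement P_i per item, averaged into P_bar.
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Expected chance agreement P_e from the overall category proportions p_j.
    p_cats = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(p ** 2 for p in p_cats)

    if p_e == 1:  # every label is identical, so agreement is trivially perfect
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical labels from three annotators for four sentences.
toy = [
    ["pos", "pos", "pos"],
    ["neg", "neg", "neg"],
    ["pos", "pos", "neg"],
    ["neg", "neg", "neg"],
]
print(round(fleiss_kappa(toy), 3))  # → 0.657
```

A single disagreement on one of four items is enough to pull kappa well below 1, because kappa discounts the agreement expected by chance from the label proportions.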
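The abstract reports polarity classification with a Naive Bayes model and an SVM but gives no details on features or preprocessing. The sketch below is a minimal multinomial Naive Bayes polarity classifier in plain Python. It assumes bag-of-words features, a simple whitespace tokenizer and add-one smoothing. These are choices made for this illustration and are not necessarily the authors' choices. The class, the helper functions and the training sentences are hypothetical.

```python
import math
from collections import Counter

# Characters trimmed from token edges by the illustrative tokenizer below.
PUNCT = ".,!?;:\"'()"


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace and trim punctuation (hypothetical helper)."""
    tokens = (word.strip(PUNCT) for word in text.lower().split())
    return [t for t in tokens if t]


class NaiveBayesPolarity:
    """Multinomial Naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesPolarity":
        self.classes = sorted(set(labels))
        # Word frequencies per class, pooled over all training texts of that class.
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        # Class priors from label frequencies, stored as log-probabilities.
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        return self

    def predict(self, text: str) -> str:
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    count = self.word_counts[c][word]
                    score += math.log((count + 1) / (self.total_words[c] + v))
            scores[c] = score
        # The class with the highest log-posterior (up to a shared constant) wins.
        return max(scores, key=scores.get)


def accuracy(model: NaiveBayesPolarity, texts: list[str], labels: list[str]) -> float:
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Invented toy reviews; not taken from the annotated corpus.
train_texts = [
    "great battery and excellent screen",
    "love it, works perfectly",
    "excellent value, great sound",
    "terrible battery, broke quickly",
    "awful screen and poor sound",
    "poor quality, terrible support",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

model = NaiveBayesPolarity().fit(train_texts, train_labels)
print(model.predict("excellent screen, great value"))    # → pos
print(model.predict("terrible sound and poor battery"))  # → neg
```

Accuracy figures like the one reported in the abstract only mean something when measured on held-out data. The toy set here only demonstrates the interface. An SVM counterpart would learn a linear decision boundary over the same kind of features. Libraries such as scikit-learn provide both model families if an external dependency is acceptable.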
Original language | English |
---|---|
Title of host publication | Proceedings of the LREC 2016 Workshop “Emotion and Sentiment Analysis” |
Editors | J. Fernando Sánchez-Rada, Björn Schuller |
Pages | 32-39 |
Number of pages | 8 |
Publication status | Published - 23 May 2016 |
Keywords
- Opinion mining
- Sentiment and Topic analysis
- Annotation guidelines