Sentence subjectivity detection with weakly-supervised learning

Chenghua Lin; Yulan He; Richard Everson

Computing Science

Research output: Chapter in Book/Report/Conference proceeding › Published conference contribution

30 Citations (Scopus)

Abstract

This paper presents a hierarchical Bayesian model based on latent Dirichlet allocation (LDA), called subjLDA, for sentence-level subjectivity detection, which automatically identifies whether a given sentence expresses opinion or states facts. In contrast to most of the existing methods relying on either labelled corpora for classifier training or linguistic pattern extraction for subjectivity classification, we view the problem as weakly-supervised generative model learning, where the only input to the model is a small set of domain-independent subjectivity lexical clues. A mechanism is introduced to incorporate the prior information about the subjectivity lexical clues into model learning by modifying the Dirichlet priors of topic-word distributions. The subjLDA model has been evaluated on the Multi-Perspective Question Answering (MPQA) dataset and promising results have been observed in the preliminary experiments. We have also explored adding neutral words as prior information for model learning. It was found that while incorporating subjectivity clues bearing positive or negative polarity can achieve a significant performance gain, the prior lexical information from neutral words is less effective.
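The abstract describes encoding lexical clues by modifying the Dirichlet priors of the topic-word distributions. The sketch below shows one possible way to build such a prior; it is not the authors' code. It assumes two labels (subjective, objective), one topic-word distribution per label rather than several topics per label, a symmetric base value `beta`, and a 0/1 mask that removes a lexicon word's prior mass from every label except its own. The names `LABELS` and `build_topic_word_prior` are hypothetical.

```python
import numpy as np

# Hypothetical label indices; the real label set and ordering in subjLDA are assumptions.
LABELS = {"subjective": 0, "objective": 1}


def build_topic_word_prior(vocab, lexicon, num_labels=2, beta=0.01):
    """Return a (num_labels x V) matrix of Dirichlet hyperparameters.

    Every entry starts at the symmetric value `beta`. For a vocabulary word that
    appears in `lexicon` with label l, the prior mass for all other labels is set
    to zero, so the prior only supports generating that word under label l.
    Lexicon words missing from the vocabulary are ignored.
    """
    word_index = {w: i for i, w in enumerate(vocab)}
    # lam acts as an assumed 0/1 transformation mask over (label, word) pairs.
    lam = np.ones((num_labels, len(vocab)))
    for word, label in lexicon.items():
        col = word_index.get(word)
        if col is None:
            continue
        lam[:, col] = 0.0
        lam[label, col] = 1.0
    return beta * lam


# Usage with a toy vocabulary and two hypothetical subjective clues.
vocab = ["great", "terrible", "table", "said"]
lexicon = {"great": LABELS["subjective"], "terrible": LABELS["subjective"]}
prior = build_topic_word_prior(vocab, lexicon)
print(prior)
```

The resulting matrix would replace a symmetric β when sampling; words outside the lexicon keep the uniform value and are left for the data to assign.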

Original language	English
Title of host publication	Proceedings of the 5th International Joint Conference on Natural Language Processing
Subtitle of host publication	Chiang Mai, Thailand, November 8–13, 2011
Publisher	AFNLP
Pages	1153-1161
Number of pages	9
Publication status	Published - 2011

Access to Document

http://aclweb.org/anthology//I/I11/I11-1129.pdf

Cite this

@inproceedings{ba8d722ab89a4e69a7a5f5f03e355c7d,
  title = "Sentence subjectivity detection with weakly-supervised learning",
  abstract = "This paper presents a hierarchical Bayesian model based on latent Dirichlet allocation (LDA), called subjLDA, for sentence-level subjectivity detection, which automatically identifies whether a given sentence expresses opinion or states facts. In contrast to most of the existing methods relying on either labelled corpora for classifier training or linguistic pattern extraction for subjectivity classification, we view the problem as weakly-supervised generative model learning, where the only input to the model is a small set of domain-independent subjectivity lexical clues. A mechanism is introduced to incorporate the prior information about the subjectivity lexical clues into model learning by modifying the Dirichlet priors of topic-word distributions. The subjLDA model has been evaluated on the Multi-Perspective Question Answering (MPQA) dataset and promising results have been observed in the preliminary experiments. We have also explored adding neutral words as prior information for model learning. It was found that while incorporating subjectivity clues bearing positive or negative polarity can achieve a significant performance gain, the prior lexical information from neutral words is less effective.",
  author = "Chenghua Lin and Yulan He and Richard Everson",
  year = "2011",
  language = "English",
  pages = "1153--1161",
  booktitle = "Proceedings of the 5th International Joint Conference on Natural Language Processing",
  publisher = "AFNLP",
}

TY  - GEN
T1  - Sentence subjectivity detection with weakly-supervised learning
AU  - Lin, Chenghua
AU  - He, Yulan
AU  - Everson, Richard
PY  - 2011
Y1  - 2011
N2  - This paper presents a hierarchical Bayesian model based on latent Dirichlet allocation (LDA), called subjLDA, for sentence-level subjectivity detection, which automatically identifies whether a given sentence expresses opinion or states facts. In contrast to most of the existing methods relying on either labelled corpora for classifier training or linguistic pattern extraction for subjectivity classification, we view the problem as weakly-supervised generative model learning, where the only input to the model is a small set of domain-independent subjectivity lexical clues. A mechanism is introduced to incorporate the prior information about the subjectivity lexical clues into model learning by modifying the Dirichlet priors of topic-word distributions. The subjLDA model has been evaluated on the Multi-Perspective Question Answering (MPQA) dataset and promising results have been observed in the preliminary experiments. We have also explored adding neutral words as prior information for model learning. It was found that while incorporating subjectivity clues bearing positive or negative polarity can achieve a significant performance gain, the prior lexical information from neutral words is less effective.
AB  - This paper presents a hierarchical Bayesian model based on latent Dirichlet allocation (LDA), called subjLDA, for sentence-level subjectivity detection, which automatically identifies whether a given sentence expresses opinion or states facts. In contrast to most of the existing methods relying on either labelled corpora for classifier training or linguistic pattern extraction for subjectivity classification, we view the problem as weakly-supervised generative model learning, where the only input to the model is a small set of domain-independent subjectivity lexical clues. A mechanism is introduced to incorporate the prior information about the subjectivity lexical clues into model learning by modifying the Dirichlet priors of topic-word distributions. The subjLDA model has been evaluated on the Multi-Perspective Question Answering (MPQA) dataset and promising results have been observed in the preliminary experiments. We have also explored adding neutral words as prior information for model learning. It was found that while incorporating subjectivity clues bearing positive or negative polarity can achieve a significant performance gain, the prior lexical information from neutral words is less effective.
M3  - Published conference contribution
SP  - 1153
EP  - 1161
BT  - Proceedings of the 5th International Joint Conference on Natural Language Processing
PB  - AFNLP
ER  -
