Predicting Supervise Machine Learning Performances for Sentiment Analysis Using Contextual-Based Approaches

Azwa Abdul Aziz*, Andrew Starkey

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review



Sentiment Analysis (SA) focuses on mining opinions (their identification and classification) from unstructured text data such as product reviews or microblogs. It is widely used for brand reviews, political campaigns, marketing analysis and gathering customer feedback. One of the prominent approaches to SA is supervised machine learning (SML), in which an algorithm learns a mathematical model from a training dataset with defined class labels. While the results are promising, especially for in-domain sentiment, there is no guarantee that the model will perform equally well on real-time data, owing to the diversity of new data. In addition, previous studies suggest that SML results degrade when applied to cross-domain datasets because new features appear in different domains. So far, studies in SA have emphasised improving sentiment results, whereas there has been little discussion of how to detect performance degradation in a proposed model. Therefore, we provide a method known as Contextual Analysis (CA), a mechanism that constructs relationships between words and sources in a tree structure called a Hierarchical Knowledge Tree (HKT). Then, the Tree Similarity Index (TSI) and Tree Differences Index (TDI), formulas generated from the tree structure, are proposed to find similarities as well as changes between the training and actual datasets. Regression analysis of the datasets reveals a highly significant positive relationship between TSI and SML accuracy. As a result, the prediction model created shows estimation errors within 2.75 to 3.94 and 2.30 to 3.51 for average absolute differences. Moreover, this method can also cluster sentiment words into positive and negative without using any linguistic resources, while at the same time capturing changes in sentiment words when a new dataset is applied.
Original language: English
Pages (from-to): 17722-17733
Number of pages: 12
Journal: IEEE Access
Early online date: 10 Dec 2019
Publication status: Published - 28 Jan 2020


  • Text analytics
  • sentiment analysis
  • contextual analysis
  • supervised machine learning

