Text classification is essential for narrowing down the documents relevant to a particular topic for further study, especially when searching large biomedical databases. Protein-protein interactions are one such topic, with entire databases devoted to them. This paper proposes a semi-supervised learning algorithm, local learning with class priors (LL-CP), for biomedical text classification, in which unlabeled data points are classified in a vector space based on their proximity to labeled data points. The algorithm was evaluated on a corpus of biomedical documents, identifying abstracts that contain information about protein-protein interactions, with promising results. Experimental results show that LL-CP outperforms traditional semi-supervised learning algorithms such as SVM, and that it also performs better than local learning without class priors.
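The general idea of labeling a point from its nearest labeled neighbours and then weighting by class priors can be illustrated with a minimal sketch. This is not the LL-CP algorithm itself, whose details the abstract does not give; it swaps in a plain k-nearest-neighbour similarity vote multiplied by a prior per class. The function names `cosine` and `classify_with_priors`, the `priors` dictionary, the value of `k`, and the toy vectors are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify_with_priors(labeled, unlabeled, priors, k=3):
    """Label each unlabeled vector from its k most similar labeled vectors.

    labeled:   list of (vector, label) pairs
    unlabeled: list of vectors
    priors:    dict mapping label -> assumed prior probability of that class
    """
    predictions = []
    for x in unlabeled:
        # Local step: keep only the k labeled points closest to x.
        neighbours = sorted(
            labeled, key=lambda item: cosine(x, item[0]), reverse=True
        )[:k]
        # Sum similarity per class over those neighbours.
        scores = defaultdict(float)
        for vec, label in neighbours:
            scores[label] += cosine(x, vec)
        # Prior step: scale each class score by its assumed prior.
        weighted = {
            label: score * priors.get(label, 0.0)
            for label, score in scores.items()
        }
        predictions.append(max(weighted, key=weighted.get))
    return predictions


# Toy two-dimensional "document vectors"; real input would be e.g. TF-IDF vectors.
labeled_docs = [
    ([1.0, 0.0], "ppi"),
    ([0.9, 0.1], "ppi"),
    ([0.0, 1.0], "other"),
    ([0.1, 0.9], "other"),
]
uniform = {"ppi": 0.5, "other": 0.5}
print(classify_with_priors(labeled_docs, [[0.8, 0.2], [0.2, 0.8]], uniform))
```

With uniform priors the result depends only on proximity. Skewing `priors` toward one class shifts borderline points toward that class, which is the kind of effect a class prior is meant to have.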
|Title of host publication||Natural Language Processing and Information Systems|
|Subtitle of host publication||14th International Conference on Applications of Natural Language to Information Systems, NLDB 2009, Saarbrücken, Germany, June 24-26, 2009. Revised Papers|
|Publisher||Springer Berlin / Heidelberg|
|Number of pages||10|
|Publication status||Published - 2010|
|Name||Lecture Notes in Computer Science|
|Publisher||Springer Berlin Heidelberg|