Collaborative Recognition and Recovery of the Chinese Intercept Abbreviation

Jinshuo Liu, Yusen Chen, Juan Deng*, Donghong Ji, Jeff Pan

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Evaluating the theme words of a text is an important task in information content security. The variety of Chinese expression, and of abbreviations in particular, makes the supervision of theme words harder. The goal of this paper is to discover intercept abbreviations quickly and accurately in text crawled within a short time period. The paper first segments the target texts and then uses a Support Vector Machine (SVM) to recognize abbreviation candidates among the wrongly segmented texts. It then presents the collaborative methods: an improved Conditional Random Field (CRF) predicts the word corresponding to each character of the abbreviation; and, to solve the problem of the 1:n relationship, the ranking list from the prediction step is merged with the results matched in a thesaurus of abbreviations. The experiments show that the recognition stage achieves 76.5% accuracy and 77.8% recall. At the recovery stage the accuracy is 62.1%, which is 20.8% higher than that of the method based on the Hidden Markov Model (HMM).
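The merging step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's scoring or merging rule, so everything here is an assumption made for illustration: the function name `merge_candidates`, the `bonus` parameter, and the additive re-scoring are hypothetical, not the authors' method. The sketch only shows the general idea of combining a scored ranking list from a predictor with candidate full forms matched in an abbreviation thesaurus.

```python
from typing import Dict, List, Tuple


def merge_candidates(
    predicted: List[Tuple[str, float]],
    thesaurus_matches: List[str],
    bonus: float = 0.5,
) -> List[str]:
    """Merge scored predictions with thesaurus hits (hypothetical scheme).

    predicted: (full form, score) pairs, e.g. from a sequence model.
    thesaurus_matches: full forms found for the abbreviation in a thesaurus.
    bonus: assumed score added to any candidate confirmed by the thesaurus.
    """
    scores: Dict[str, float] = {}
    # Keep the best score seen for each predicted full form.
    for expansion, score in predicted:
        scores[expansion] = max(score, scores.get(expansion, float("-inf")))
    # Boost candidates confirmed by the thesaurus; add unseen ones at `bonus`.
    for expansion in thesaurus_matches:
        scores[expansion] = scores.get(expansion, 0.0) + bonus
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda e: (-scores[e], e))


if __name__ == "__main__":
    # Two hypothetical full forms predicted for one abbreviation.
    ranked = merge_candidates(
        predicted=[("北京大学", 0.6), ("北京大队", 0.7)],
        thesaurus_matches=["北京大学"],
    )
    print(ranked)
```

In this sketch a thesaurus match can promote a lower-scored prediction above a higher-scored one, which is one simple way to handle an abbreviation that maps to several possible full forms.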

Original language: English
Title of host publication: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data - 16th China National Conference, CCL 2017 and 5th International Symposium, NLP-NABD 2017, Proceedings
Editors: Maosong Sun, Baobao Chang, Xiaojie Wang, Deyi Xiong
Publisher: Springer-Verlag
Pages: 224-236
Number of pages: 13
ISBN (Electronic): 9783319690056
ISBN (Print): 9783319690049
Publication status: Published - 7 Oct 2017
Event: 16th China National Conference on Computational Linguistics, CCL 2017 and 5th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data, NLP-NABD 2017 - Nanjing, China
Duration: 13 Oct 2017 to 15 Oct 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10565 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th China National Conference on Computational Linguistics, CCL 2017 and 5th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data, NLP-NABD 2017
Country/Territory: China
City: Nanjing
Period: 13/10/17 to 15/10/17

Keywords

  • Chinese abbreviation
  • Collaborative recovery
  • Improved CRF
