Crime profiling for the Arabic language using computational linguistic techniques

Meshrif Alruily*, Aladdin Ayesh, Hussein Zedan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

13 Citations (Scopus)

Abstract

Arabic is widely spoken, but few text mining tools have been developed to process Arabic text. This paper examines the crime domain in unstructured Arabic text using text mining techniques, and presents the development and application of a Crime Profiling System (CPS). The system extracts meaningful information, in this case the type of crime, location and nationality, from Arabic-language crime news reports. It has two distinguishing attributes: first, information extraction that depends on local grammar, and second, dictionaries that can be generated automatically. The CPS is shown to improve the quality of the data through reduction, retaining only meaningful information. Moreover, the Self Organising Map (SOM) approach is adopted to cluster the crime reports by crime type. Clustering improves because only refined data, consisting of meaningful keywords obtained through the information extraction process, are fed into the SOM; that is, the data are cleansed of noise. The proposed system is validated through experiments on a corpus collated from different sources that was not used during system development. Precision, recall and F-measure are used to evaluate the proposed information extraction approach, and comparisons are made with other systems. Clustering performance is evaluated using three parameters: data size, loading time and quantization error.
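As an illustration of the evaluation measures named in the abstract, the following minimal sketch computes precision, recall and F-measure using their standard definitions. It is not the authors' evaluation code: the function name `precision_recall_f1` and the (field, value) pairs are hypothetical, and the sketch assumes an extracted item counts as correct only if it exactly matches a gold-standard item.

```python
def precision_recall_f1(extracted, gold):
    """Return (precision, recall, F-measure) for two collections of items.

    Assumption: an extracted item is correct only on an exact match
    with an item in the gold standard.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)

    # Precision: share of extracted items that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of gold-standard items that were found.
    recall = true_positives / len(gold) if gold else 0.0
    # F-measure: harmonic mean of precision and recall.
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical (field, value) pairs, made up for illustration only.
gold_items = [
    ("crime_type", "theft"), ("location", "city_a"), ("nationality", "n1"),
    ("crime_type", "fraud"), ("location", "city_b"), ("nationality", "n2"),
]
extracted_items = [
    ("crime_type", "theft"), ("location", "city_a"), ("nationality", "n1"),
    ("location", "city_c"),  # a wrong extraction
]

p, r, f = precision_recall_f1(extracted_items, gold_items)
print(f"precision={p:.2f} recall={r:.2f} f_measure={f:.2f}")
```

Here three of the four extracted items are correct and three of the six gold items are found, so precision exceeds recall; the F-measure balances the two.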

Original language: English
Pages (from-to): 315-341
Number of pages: 27
Journal: Information Processing and Management
Volume: 50
Issue number: 2
Early online date: 31 Oct 2013
Publication status: Published - Mar 2014
Externally published: Yes

Keywords

  • Arabic language
  • Clustering
  • Crime domain
  • Information extraction
  • Pattern recognition
  • Syntactic analysis
