Developing a tagset for automated POS tagging in Arabic

Shihadeh Alqrainy*, Aladdin Ayesh

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

4 Citations (Scopus)

Abstract

Arabic is rich in syntactic and morphological information. Diacritics, the marks placed above and below the letters of an Arabic word, play a major role in adding linguistic attributes to the word in a part-of-speech tagging system. This paper describes a tagset that was built on the inflectional morphology system derived from traditional Arabic grammatical theory. The tagset represents an early stage of research on automatic morphosyntactic annotation of Arabic. The paper aims to present a general tagset for use in a diacritics-based automated tagging system that is under development by the author.

Original language: English
Pages (from-to): 2787-2792
Number of pages: 6
Journal: WSEAS Transactions on Computers
Volume: 5
Issue number: 11
Publication status: Published - Nov 2006
Externally published: Yes

Keywords

  • Arabic language
  • Diacritics
  • Morphological
  • Part-of-speech (POS)
  • Syntactical
  • Tagset
