Developing a tagset for automated POS tagging in Arabic

Shihadeh Alqrainy*, Aladdin Ayesh

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    4 Citations (Scopus)

    Abstract

    Arabic is rich in syntactic and morphological information. Diacritics, which are marks placed above and below the letters of an Arabic word, play a major role in adding linguistic attributes to the word in a part-of-speech tagging system. This paper describes a tagset that was built on the inflectional morphology system derived from traditional Arabic grammatical theory. The tagset represents an early stage of research on automatic morphosyntactic annotation of Arabic. The paper aims to present a general tagset for use in a diacritics-based automated tagging system that is under development by the author.

    Original language: English
    Pages (from-to): 2787-2792
    Number of pages: 6
    Journal: WSEAS Transactions on Computers
    Volume: 5
    Issue number: 11
    Publication status: Published - Nov 2006

    Keywords

    • Arabic language
    • Diacritics
    • Morphological
    • Part-of-speech (POS)
    • Syntactical
    • Tagset
