TY - GEN
T1 - Text mining based on Self-Organizing Map method for Arabic-English documents
AU - Al-Marghilani, Abdulsamad
AU - Zedan, Hussein
AU - Ayesh, Aladdin
N1 - Funding Information:
The research presented in this paper was done in collaboration with my advisors, Prof. Dipti M. Sharma and Dr. Manish Shrivastava. The Part-of-Speech annotation was done in collaboration with Dr. Pinkey Nainwani and Harsh Lalwani. I would like to acknowledge the people who helped me with various tasks, such as collecting the data, translating, and understanding Sindhi better: Mehtab Ahmed Solangi, Mr. Bhagwan Babani and Mr. Chunnilaal Wadhwani. I would like to thank Nehal J. Wani, Arnav Sharma, Himanshu Sharma and Dr. Francis M. Tyers for their constant motivation and support. Lastly, I am thankful to the anonymous reviewers for their invaluable feedback and suggestions. This publication reflects the authors' views only.
PY - 2008
Y1 - 2008
N2 - Computer information and retrieval is becoming increasingly sophisticated and is being exploited in more and more spheres of human activity. Many computer applications are developed as information distribution systems, of which the Internet is one of the best known and most widely used. With enormous quantities of data in different languages available on the net, it is essential that more efficient methods of language data extraction are developed. This paper therefore focuses on text mining of multilingual datasets. Arabic is a highly derivational and inflectional language, requiring proper morphological analysis for effective text mining, and yet no standard approach to word stemming has emerged. This work is an attempt towards the development of a tool useful in the analysis of Arabic-English texts, and is achieved through the multilingual text mining (MTM) of a combined Arabic-English corpus. This project is based on the Self-Organizing Map (SOM) and uses an Arabic-English text corpus as the test-bed. Issues related to Arabic-English text mining, stemming and clustering are discussed in this paper. To the authors' knowledge, there is no significant literature available regarding SOM techniques applied to Arabic-English language text mining. In this work a framework and the outcome of its implementation are presented.
AB - Computer information and retrieval is becoming increasingly sophisticated and is being exploited in more and more spheres of human activity. Many computer applications are developed as information distribution systems, of which the Internet is one of the best known and most widely used. With enormous quantities of data in different languages available on the net, it is essential that more efficient methods of language data extraction are developed. This paper therefore focuses on text mining of multilingual datasets. Arabic is a highly derivational and inflectional language, requiring proper morphological analysis for effective text mining, and yet no standard approach to word stemming has emerged. This work is an attempt towards the development of a tool useful in the analysis of Arabic-English texts, and is achieved through the multilingual text mining (MTM) of a combined Arabic-English corpus. This project is based on the Self-Organizing Map (SOM) and uses an Arabic-English text corpus as the test-bed. Issues related to Arabic-English text mining, stemming and clustering are discussed in this paper. To the authors' knowledge, there is no significant literature available regarding SOM techniques applied to Arabic-English language text mining. In this work a framework and the outcome of its implementation are presented.
UR - http://www.scopus.com/inward/record.url?scp=84871487786&partnerID=8YFLogxK
M3 - Published conference contribution
AN - SCOPUS:84871487786
SP - 174
EP - 181
BT - MAICS 2008 - Proceedings of the 19th Midwest Artificial Intelligence and Cognitive Science Conference
T2 - 19th Midwest Artificial Intelligence and Cognitive Science Conference, MAICS 2008
Y2 - 12 April 2008 through 13 April 2008
ER -