Stemming and root-based approaches to the retrieval of Arabic documents on the Web

Journal Title: Webology - Year 2006, Vol 3, Issue 1

Abstract

Using information retrieval systems to gain access to documents in languages other than English is becoming an increasingly significant problem. Rules, theories, algorithms, and retrieval methods designed and developed for English and other morphologically similar languages may or may not apply in the linguistic environments of other languages. The problem is particularly acute in languages whose morphological rules differ radically from those of English. This paper compares the effects of stemming and root retrieval on information retrieval in Arabic through an exploratory study of the handling of Arabic words by an English-language search engine (ELSE). Search experiments, using 2000 Arabic documents and 40 Arabic search terms (nouns), were conducted in a Web search engine developed for English (AltaVista) and in an Arabic search engine (al-Idrisi) to compare the performance of stemming and root retrieval and to investigate the possibility of adapting AltaVista for use with Arabic text. The results of the experiments show that more effective retrieval can be accomplished through stemming, and that it is possible to adapt an ELSE for use with Arabic without the need to develop root-retrieval features.

Authors and Affiliations

Haidar Moukdad

How To Cite

Haidar Moukdad (2006). Stemming and root-based approaches to the retrieval of Arabic documents on the Web. Webology, 3(1), -. https://www.europub.co.uk/articles/-A-687497