Urdu Text Classification using Majority Voting

Abstract

Text classification assigns predefined categories to text documents using supervised machine learning algorithms. It has practical applications such as spam detection, sentiment detection, and natural language identification. We applied five well-known classification techniques to an Urdu-language corpus and assigned a class to each document by majority voting. The corpus contains 21,769 news documents in seven categories (Business, Entertainment, Culture, Health, Sports, and Weird). Because the algorithms could not work directly on the raw text, we applied preprocessing techniques: tokenization, stop-word removal, and a rule-based stemmer. After preprocessing, 93,400 features were extracted from the data for the machine learning algorithms. With majority voting, we achieved up to 94% precision and recall.
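The majority-voting step described in the abstract can be illustrated with a minimal sketch. It assumes that each classifier has already produced one label per document; the function names `majority_vote` and `vote_corpus` and the tie-breaking rule are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the most classifiers for one document.

    Assumption (not from the paper): ties go to the label that appears
    first in `predictions`, i.e. the earliest classifier wins a tie.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    # most_common(1) returns the first-encountered label among equal counts.
    return Counter(predictions).most_common(1)[0][0]


def vote_corpus(per_classifier_predictions):
    """Combine aligned prediction lists, one list per classifier.

    per_classifier_predictions[i][j] is classifier i's label for document j.
    Returns one voted label per document.
    """
    return [majority_vote(list(votes)) for votes in zip(*per_classifier_predictions)]


# Hypothetical outputs of five classifiers on three documents.
example = [
    ["Sports", "Health", "Business"],
    ["Sports", "Health", "Weird"],
    ["Culture", "Health", "Business"],
    ["Sports", "Business", "Weird"],
    ["Sports", "Health", "Business"],
]
voted = vote_corpus(example)
```

With five voters and more than two classes, a tie such as 2-2-1 is possible, so any real implementation needs an explicit tie-breaking rule; the one used here is only one reasonable choice.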

Authors and Affiliations

Muhammad Usman, Zunaira Shafique, Saba Ayub, Kamran Malik

  • EP ID EP133714
  • DOI 10.14569/IJACSA.2016.070836

How To Cite

Muhammad Usman, Zunaira Shafique, Saba Ayub, Kamran Malik (2016). Urdu Text Classification using Majority Voting. International Journal of Advanced Computer Science & Applications, 7(8), 265-273. https://www.europub.co.uk/articles/-A-133714