Effective Listings of Function Stop words for Twitter

Abstract

Many words in documents recur very frequently but carry little meaning of their own, because they serve mainly to join other words together in a sentence. It is commonly understood that such stop words do not contribute to the context or content of textual documents. Because they occur so frequently, their presence in text mining is an obstacle to understanding the content of the documents. To eliminate this bias, most text mining software and approaches use a stop words list to identify and remove those words. However, developing such a stop words list is difficult, and the resulting lists are inconsistent between textual sources. The problem is further aggravated by sources such as Twitter, whose content is highly repetitive or similar in nature. In this paper, we examine the original work using term frequency, inverse document frequency and term adjacency for developing a stop words list for the Twitter data source. We propose a new technique that uses combinatorial values as an alternative measure to list stop words effectively.
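As a rough illustration of the term frequency and inverse document frequency measures mentioned above, the following sketch ranks tokens as stop word candidates. It is not the paper's method: the function name `rank_stop_word_candidates`, the whitespace tokenizer, the unsmoothed IDF formula log(N / df), and the four sample tweets are all assumptions made for this example, and the term adjacency and combinatorial measures are not shown.

```python
import math
from collections import Counter


def rank_stop_word_candidates(tweets, top_n=5):
    """Rank tokens as stop word candidates (hypothetical helper).

    Tokens are sorted by ascending IDF, so words that appear in the most
    tweets come first; ties are broken by descending corpus term frequency.
    """
    # Assumption: lowercase whitespace tokenization is good enough here.
    tokenized = [tweet.lower().split() for tweet in tweets]
    n_docs = len(tokenized)

    # Corpus-wide term frequency: total occurrences of each token.
    term_freq = Counter(tok for doc in tokenized for tok in doc)
    # Document frequency: number of tweets that contain each token.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))

    # Unsmoothed IDF; a token present in every tweet gets log(1) = 0.
    idf = {tok: math.log(n_docs / df) for tok, df in doc_freq.items()}

    ranked = sorted(idf, key=lambda tok: (idf[tok], -term_freq[tok], tok))
    return [(tok, term_freq[tok], idf[tok]) for tok in ranked[:top_n]]


# Made-up sample data, for illustration only.
tweets = [
    "the cat sat on the mat",
    "the dog ate the bone",
    "a bird in the sky",
    "rain on the roof",
]
candidates = rank_stop_word_candidates(tweets, top_n=2)
for token, tf, idf_value in candidates:
    print(f"{token}\ttf={tf}\tidf={idf_value:.3f}")
```

On this toy input, "the" ranks first because it appears in every tweet. On real Twitter data, a token with low IDF is only a candidate: highly repetitive content can push topical words to the top as well, which is one reason a single measure may not be sufficient.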

Authors and Affiliations

Murphy Choy

How To Cite

Murphy Choy (2012). Effective Listings of Function Stop words for Twitter. International Journal of Advanced Computer Science & Applications, 3(6), 8-11. https://www.europub.co.uk/articles/-A-124949