Effective Listings of Function Stop words for Twitter
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2012, Vol 3, Issue 6
Abstract
Many words in documents recur very frequently but are essentially meaningless, as they serve only to join other words together in a sentence. It is commonly understood that such stop words do not contribute to the context or content of textual documents. Because of their high frequency of occurrence, their presence presents an obstacle to understanding the content of documents in text mining. To eliminate this bias, most text mining software and approaches use a stop words list to identify and remove those words. However, developing such a stop words list is difficult, and the resulting lists are inconsistent between textual sources. The problem is further aggravated by sources such as Twitter, whose content is highly repetitive or similar in nature. In this paper, we examine the original work using term frequency, inverse document frequency and term adjacency for developing a stop words list for the Twitter data source. We propose a new technique using combinatorial values as an alternative measure to effectively list stop words.
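As a rough illustration of the first two measures named above, the following Python sketch ranks tokens by term frequency and reports inverse document frequency alongside. It is not the paper's implementation: the function name `rank_stop_word_candidates`, the whitespace tokenizer, and the sample tweets are all assumptions made for this example, and the term adjacency and combinatorial measures are not shown because the abstract does not define them.

```python
import math
from collections import Counter


def rank_stop_word_candidates(documents, top_n=5):
    """Rank tokens as stop word candidates (hypothetical helper).

    Tokens are sorted by descending term frequency, then ascending
    inverse document frequency, then alphabetically for determinism.
    """
    # Naive lowercase whitespace tokenizer: an assumption, not from the paper.
    tokenized = [doc.lower().split() for doc in documents]

    # Term frequency: total occurrences across the whole collection.
    term_freq = Counter(tok for doc in tokenized for tok in doc)

    # Document frequency: number of documents containing the token.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))

    n_docs = len(tokenized)
    idf = {tok: math.log(n_docs / df) for tok, df in doc_freq.items()}

    ranked = sorted(term_freq, key=lambda t: (-term_freq[t], idf[t], t))
    return [(t, term_freq[t], idf[t]) for t in ranked[:top_n]]


# Made-up sample tweets, for illustration only.
tweets = [
    "the match is on the tv",
    "the bus is late again",
    "love the new phone",
]
candidates = rank_stop_word_candidates(tweets, top_n=2)
for token, tf, idf_value in candidates:
    print(token, tf, round(idf_value, 3))
```

A token that is both very frequent and present in nearly every document (high term frequency, inverse document frequency near zero) is a plausible stop word candidate under this simple scheme; any cut-off threshold would have to be chosen for the data at hand.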
Authors and Affiliations
Murphy Choy