Investigate the Performance of Document Clustering Approach Based on Association Rules Mining

Abstract

The limitations of standard clustering methods and the weaknesses of the Apriori algorithm in frequent-termset clustering motivate our research. We have devised an efficient approach for Web document clustering based on association rules mining (ARWDC). An efficient Multi-Tier Hashing Frequent Termsets algorithm (MTHFT) is used to improve the efficiency of mining association rules by improving the mining of frequent termsets. The documents are then initially partitioned based on the association rules. Since a document usually contains more than one frequent termset, the same document may appear in multiple initial partitions, i.e., the initial partitions overlap. After the partitions are made disjoint, the documents within each partition are grouped using descriptive keywords to obtain the resultant clusters. In this paper, we present an extensive analysis of the ARWDC approach for different sizes of Reuters datasets. Furthermore, the performance of our approach is evaluated using measures such as Precision, Recall and F-measure and compared with existing clustering algorithms such as Bisecting K-means and FIHC. The experimental results show that the ARWDC approach significantly improves efficiency, scalability and accuracy on the Reuters datasets.
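
The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names `precision_recall_f` and `overall_f_measure` are hypothetical, and the class-size-weighted best-match F-measure used here is one common convention in the document clustering literature, assumed for illustration rather than taken from the paper.

```python
def precision_recall_f(cluster, klass):
    """Precision, recall and F-measure of one cluster against one reference class.

    Both arguments are collections of document ids (hypothetical inputs).
    """
    cluster, klass = set(cluster), set(klass)
    overlap = len(cluster & klass)
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(cluster)   # share of the cluster that belongs to the class
    recall = overlap / len(klass)        # share of the class captured by the cluster
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def overall_f_measure(clusters, classes):
    """Class-size-weighted average of the best F-measure each class achieves.

    `clusters` and `classes` map a label to a set of document ids.
    This weighting scheme is an assumption, not confirmed by the source.
    """
    total_docs = sum(len(docs) for docs in classes.values())
    if total_docs == 0:
        return 0.0
    weighted = 0.0
    for docs in classes.values():
        best = max(
            (precision_recall_f(c, docs)[2] for c in clusters.values()),
            default=0.0,
        )
        weighted += len(docs) * best
    return weighted / total_docs


# Toy data: two clusters produced by some algorithm, two reference classes.
clusters = {"c1": {1, 2, 3}, "c2": {4, 5}}
classes = {"a": {1, 2}, "b": {3, 4, 5}}
score = overall_f_measure(clusters, classes)
```

In the toy data, cluster `c1` contains every document of class `a` plus one stray document, so its recall against `a` is perfect while its precision is lower; the F-measure balances the two.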

Authors and Affiliations

Noha Negm, Mohamed Amin, Passent Elkafrawy, Abdel M. Salem

  • EP ID EP136006
  • DOI 10.14569/IJACSA.2013.040820

How To Cite

Noha Negm, Mohamed Amin, Passent Elkafrawy, Abdel M. Salem (2013). Investigate the Performance of Document Clustering Approach Based on Association Rules Mining. International Journal of Advanced Computer Science & Applications, 4(8), 142-151. https://www.europub.co.uk/articles/-A-136006