Document Similarity Detection using K-Means and Cosine Distance
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2019, Vol 10, Issue 2
Abstract
A two-year study by the Ministry of Research, Technology and Education in Indonesia evaluated most universities in the country. The evaluation found irregularities in the soft copies of various doctoral dissertations, which closely resemble texts available on the internet. This suspected plagiarism has a negative effect on both students and faculty members. The main reason behind the behavior is the lack of standardized awareness of plagiarism among faculty members. This study therefore proposes a computerized system that detects plagiarism using K-means and the cosine distance algorithm. The process starts with preprocessing, which includes a novel step of checking words against the Indonesian big dictionary, followed by vector space model design and a combined K-means and cosine distance calculation, with 17 documents as test data. Overall, the results show a detection accuracy of 93.33% for these documents.
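The pipeline described above (preprocessing, vector space model, then K-means combined with cosine distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the two measures are combined, so the sketch assumes K-means assignment by cosine distance over raw term-count vectors, with a deterministic farthest-point initialization. The dictionary check is omitted, and the function names (`tokenize`, `build_vectors`, `cosine_distance`, `kmeans_cosine`) and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplified preprocessing: lowercase and keep alphabetic tokens only.
    # (The dictionary lookup mentioned in the abstract is not reproduced.)
    return re.findall(r"[a-z]+", text.lower())


def build_vectors(docs):
    # Vector space model: one term-count vector per document over a shared vocabulary.
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in token_lists for t in toks})
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        vectors.append([float(counts[t]) for t in vocab])
    return vocab, vectors


def cosine_distance(a, b):
    # 1 - cosine similarity; vectors with zero norm are treated as maximally distant.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def kmeans_cosine(vectors, k, max_iter=20):
    # Assumed variant: assign each document to the nearest centroid by cosine distance.
    # Initialization: first vector, then repeatedly the vector farthest from chosen centroids.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(cosine_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: cosine_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                # New centroid is the component-wise mean of its members.
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical sample documents, not the study's test data.
docs = [
    "plagiarism detection compares document text",
    "document text plagiarism detection system",
    "rice farming needs water and sun",
    "water and sun help rice farming",
]
vocab, vectors = build_vectors(docs)
labels = kmeans_cosine(vectors, k=2)
print(labels)
print(round(cosine_distance(vectors[0], vectors[1]), 4))
```

In this sketch, documents that land in the same cluster with a small pairwise cosine distance would be the candidates flagged for closer inspection. Any distance threshold for calling a pair "similar" would be a further assumption that the abstract does not specify.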
Authors and Affiliations
Wendi Usino, Anton Satria Prabuwono, Khalid Hamed S. Allehaibi, Arif Bramantoro, Hasniaty A, Wahyu Amaldi
Skew Detection and Correction of Mushaf Al-Quran Script using Hough Transform
Document skew detection and correction is mainly one of base preprocessing steps in the document analysis. Correction of the skewed scanned images is critical because it has a direct impact on image quality. In this pape...
A Novel Approach for Dimensionality Reduction and Classification of Hyperspectral Images based on Normalized Synergy
During the last decade, hyperspectral images have attracted increasing interest from researchers worldwide. They provide more detailed information about an observed area and allow an accurate target detection and precise...
Use of Blockchain in Healthcare: A Systematic Literature Review
Blockchain is an emerging field which works on the concept of a digitally distributed ledger and consensus algorithm removing all the threats of intermediaries. Its early applications were related to the finance sector b...
Knowledge Discovery based Framework for Enhancing the House of Quality
Mining techniques proved to have a successful impact in different fields for many targets; one of these targets is to gain customers’ satisfaction through enhancing the products’ quality according to the voice of these c...
Preprocessor Agent Approach to Knowledge Discovery Using Zero-R Algorithm
Data mining and multiagent approach has been used successfully in the development of large complex systems. Agents are used to perform some action or activity on behalf of a user of a computer system. The stu...