Document Similarity Detection using K-Means and Cosine Distance
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2019, Vol 10, Issue 2
Abstract
A two-year study by the Ministry of Research, Technology and Education in Indonesia evaluated most universities in the country. The evaluation found irregularities in the softcopies of various doctoral dissertations, which were similar to texts available on the internet. This suspected plagiarism has a negative effect on both students and faculty members. The main reason behind this behavior is the lack of standardized awareness of plagiarism among faculty members. This study therefore proposes a computerized system that detects plagiarism by using K-means and the cosine distance algorithm. The process starts with preprocessing, which includes a novel step of checking against the Indonesian big dictionary, followed by vector space model design and the combined calculation of K-means and cosine distance, with 17 documents as test data. The results show an overall detection accuracy of 93.33%.
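The pipeline named in the abstract (vector space model, cosine distance, K-means) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how K-means and cosine distance are combined, so the sketch assumes cosine distance is used for the cluster assignment step. The function names (tokenize, build_vectors, cosine_distance, kmeans), the whitespace tokenizer, and the choice of the first k vectors as initial centroids are all hypothetical, and the dictionary-checking preprocessing step is not reproduced.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace. The paper's preprocessing
    # (including its dictionary check) is richer and is not reproduced here.
    return text.lower().split()


def build_vectors(docs):
    # Vector space model: one term-frequency vector per document,
    # all defined over the same sorted vocabulary.
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return [[c[t] for t in vocab] for c in counts]


def cosine_distance(u, v):
    # 1 - cosine similarity; a zero vector is treated as maximally distant.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def kmeans(vectors, k, iterations=10):
    # Lloyd-style iteration. Assumptions: cosine distance drives the
    # assignment step, and the first k vectors seed the centroids so the
    # result is deterministic.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by cosine distance.
        labels = [
            min(range(k), key=lambda j: cosine_distance(v, centroids[j]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Calling build_vectors on a list of document strings and passing the result to kmeans yields one cluster label per document. Documents that land in the same cluster and have a small pairwise cosine_distance would be the candidates to inspect for similarity.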
Authors and Affiliations
Wendi Usino, Anton Satria Prabuwono, Khalid Hamed S. Allehaibi, Arif Bramantoro, Hasniaty A, Wahyu Amaldi
Enhancing Business Intelligence in a Smarter Computing Environment through Cost Analysis
The paper aims at improving Business Intelligence in a Smarter Computing Environment through Cost Analysis. Smarter Computing is a new approach to designing IT infrastructures to create new opportunities like...
Tutoring Functions in a Blended Learning System: Case of Specialized French Teaching
There is an emergence of blended learning today which combines diversified teaching methods, alternating distance learning and classroom learning. As a matter of fact, most Moroccan universities are presently aware of th...
A Semantics for Concurrent Logic Programming Languages Based on Multiple-Valued Logic
In order to obtain an understanding of parallel logic thought it is necessary to establish a fully abstract model of the denotational semantics of logic programming languages. In this paper, a fixed point semantics for t...
Nonlinear Condition Tolerancing Using Monte Carlo Simulation
To ensure accuracy and performance of the products, designers tend to hug the tolerances. While, manufacturers prefer to increase them in order to reduce costs and ensure competition. The analysis and synthesis of tolera...
Skew correction for Chinese character using Hough transform
Chinese Handwritten character recognition is an emerging field in Computer Vision and Pattern Recognition. Documents acquired through Scanner, Mobile or Camera devices are often prone to Skew and Correction of skew...