Document Similarity Detection using K-Means and Cosine Distance

Abstract

A two-year study by the Ministry of Research, Technology and Education in Indonesia evaluated most universities in the country. The evaluation found irregularities in the soft copies of a number of doctoral dissertations, which were similar to texts available on the internet. This suspected plagiarism has a negative effect on both students and faculty members. The main reason behind it is the lack of a standardized awareness of plagiarism among faculty members. This study therefore proposes a computerized system that detects plagiarism using the K-means and cosine distance algorithms. The process starts with preprocessing, which includes a novel step of checking words against the Indonesian big dictionary, followed by vector space model design and the combined calculation of K-means and cosine distance, with 17 documents as test data. Overall, the results show a detection accuracy of 93.33%.
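The abstract names the pipeline stages but not their details, so the following is only a minimal sketch of how such a pipeline might look, not the authors' implementation. It assumes plain term-frequency vectors, a farthest-point choice of initial centroids, and K-means that assigns documents by cosine distance. The function names, the `dictionary` parameter (a hypothetical stand-in for the Indonesian big dictionary check), and the sample sentences are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text, dictionary=None):
    """Lowercase and split into words; optionally keep only dictionary words.

    `dictionary` is a hypothetical stand-in for the dictionary check
    mentioned in the abstract (any set of accepted words).
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if dictionary is not None:
        tokens = [t for t in tokens if t in dictionary]
    return tokens


def build_vectors(docs, dictionary=None):
    """Build term-frequency vectors over a shared, sorted vocabulary."""
    counts = [Counter(tokenize(d, dictionary)) for d in docs]
    vocab = sorted(set().union(*counts))
    return [[float(c[t]) for t in vocab] for c in counts], vocab


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def kmeans_cosine(vectors, k, iterations=20):
    """K-means that assigns each vector to its nearest centroid by cosine distance.

    Initial centroids are chosen deterministically (assumption): the first
    vector, then repeatedly the vector farthest from the centroids so far.
    """
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors,
                  key=lambda v: min(cosine_distance(v, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda j: cosine_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Illustrative documents (not from the study's test data).
docs = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "stock prices fell sharply today",
    "stock prices rose sharply today",
]
vectors, vocab = build_vectors(docs)
labels, _ = kmeans_cosine(vectors, k=2)
```

In a sketch like this, clustering narrows the comparison: pairwise cosine distances would then be computed only between documents that share a cluster, and pairs below a chosen distance threshold would be flagged as suspiciously similar. The threshold is a tuning choice that the abstract does not specify.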

Authors and Affiliations

Wendi Usino, Anton Satria Prabuwono, Khalid Hamed S. Allehaibi, Arif Bramantoro, Hasniaty A, Wahyu Amaldi

  • EP ID EP468311
  • DOI 10.14569/IJACSA.2019.0100222

How To Cite

Wendi Usino, Anton Satria Prabuwono, Khalid Hamed S. Allehaibi, Arif Bramantoro, Hasniaty A, Wahyu Amaldi (2019). Document Similarity Detection using K-Means and Cosine Distance. International Journal of Advanced Computer Science & Applications, 10(2), 165-170. https://www.europub.co.uk/articles/-A-468311