Record Matching Over Query Results Using Fuzzy Ontological Document Clustering

Journal Title: International Journal on Computer Science and Engineering - Year 2011, Vol 3, Issue 2

Abstract

Record matching is an essential step in duplicate detection because it identifies records that represent the same real-world entity. Supervised record matching methods require users to provide training data and therefore cannot be applied to web databases, where query results are generated on the fly. To overcome this problem, a new record matching method named Unsupervised Duplicate Elimination (UDE) is proposed for identifying and eliminating duplicates among records in dynamic query results. The central idea is to adjust the weights of record fields when calculating similarities among records. Two classifiers, a weighted component similarity summing classifier and a support vector machine classifier, are employed iteratively with UDE. The first classifier uses the field weights to match records from different data sources. With the matched records as the positive set and the non-duplicate records as the negative set, the second classifier identifies new duplicates. Then, a new methodology to automatically interpret and cluster knowledge documents using an ontology schema is presented. Moreover, a fuzzy logic control approach is used to match suitable document cluster(s) for given patents based on their derived ontological semantic webs. Thus, this paper takes advantage of the similarity among records from web databases and solves the online duplicate detection problem.
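
The first classifier described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the per-field similarity measure, the weight values, or the decision threshold, so the use of `difflib.SequenceMatcher`, the example weights, the threshold of 0.85, and all function names below are assumptions made purely for illustration. The iterative weight adjustment and the support vector machine step are omitted.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] for one field (a stand-in measure, assumed here)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1: dict, r2: dict, weights: dict) -> float:
    """Weighted sum of per-field similarities; weights are assumed to sum to 1."""
    return sum(w * field_similarity(r1[f], r2[f]) for f, w in weights.items())


def match_records(source_a, source_b, weights, threshold=0.85):
    """Return index pairs (i, j) whose weighted similarity reaches the threshold."""
    return [
        (i, j)
        for i, r1 in enumerate(source_a)
        for j, r2 in enumerate(source_b)
        if record_similarity(r1, r2, weights) >= threshold
    ]


# Hypothetical query results from two web databases.
source_a = [
    {"title": "Data Mining Concepts", "author": "J. Han"},
    {"title": "Operating Systems", "author": "A. Silberschatz"},
]
source_b = [
    {"title": "data mining concepts", "author": "J Han"},
    {"title": "Computer Networks", "author": "A. Tanenbaum"},
]

# Hypothetical field weights; the method described above would adjust these.
weights = {"title": 0.7, "author": 0.3}

matches = match_records(source_a, source_b, weights)
```

In a fuller pipeline along the lines the abstract describes, the pairs in `matches` would serve as the positive set, and the remaining pairs as the negative set, for training the second classifier.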

Authors and Affiliations

V. Vijayaraja, R. Prasanna Kumar, M. A. Mukunthan, G. Bharathi Mohan

How To Cite

V. Vijayaraja, R. Prasanna Kumar, M. A. Mukunthan, G. Bharathi Mohan (2011). Record Matching Over Query Results Using Fuzzy Ontological Document Clustering. International Journal on Computer Science and Engineering, 3(2), 926-932. https://www.europub.co.uk/articles/-A-97303