Efficiency of K-Means Clustering Algorithm in Mining Outliers from Large Data Sets
Journal Title: International Journal on Computer Science and Engineering - Year 2010, Vol 2, Issue 9
Abstract
This paper presents the performance of the k-means clustering algorithm as a function of the method used to input the mean values. Clustering plays a vital role in data mining. Its main job is to group similar data together based on the characteristics they possess. The mean values are the centroids of the specified number of cluster groups. The centroids, although they change during the process of clustering, are calculated using several methods. Clustering algorithms can be applied to image analysis, pattern recognition, bioinformatics, and several other fields. The clustering algorithm consists of two stages: the first stage forms the clusters and calculates the centroids, and the second stage determines the outliers. There are three methods for assigning the mean values in the k-means clustering algorithm. The three mean value assignment methods are implemented, their performance is analysed, and the methods are compared with one another. Outliers, the disadvantage of the process, are used in the analysis to determine the performance with various mean inputs and methods.
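The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the three mean value assignment methods, so the initial means are simply passed in by the caller, and the outlier rule used here (a point farther than a fixed `threshold` from its centroid) is an assumption made only for illustration. The function names `kmeans` and `find_outliers` are likewise hypothetical.

```python
import math


def assign(points, means):
    """Group each point with its nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in means]
    for p in points:
        nearest = min(range(len(means)), key=lambda i: math.dist(p, means[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, initial_means, max_iter=100):
    """Stage 1: form clusters and recompute centroids until they stop moving.

    `initial_means` is supplied by the caller; how it is chosen is the
    variable the paper studies and is not modelled here.
    """
    means = [tuple(m) for m in initial_means]
    for _ in range(max_iter):
        clusters = assign(points, means)
        # New centroid = coordinate-wise mean; an empty cluster keeps its old one.
        new_means = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else means[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_means == means:
            break
        means = new_means
    # Final assignment so the returned clusters match the returned centroids.
    return means, assign(points, means)


def find_outliers(means, clusters, threshold):
    """Stage 2 (assumed rule): flag points farther than `threshold` from their centroid."""
    return [
        p
        for m, cluster in zip(means, clusters)
        for p in cluster
        if math.dist(p, m) > threshold
    ]
```

Calling `kmeans(points, initial_means)` with different choices of `initial_means` and then counting the result of `find_outliers` gives one simple way to compare initialisation methods on the same data, in the spirit of the comparison the abstract describes.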
Authors and Affiliations
Sridhar. A, Sowndarya. S
Inheritance Hierarchy Based Reuse & Reusability Metrics in OOSD
Reuse and reusability are two major aspects of object-oriented software that can be measured from the inheritance hierarchy. Reusability is the prerequisite of reuse, but both may or may not be measured using the same metric. Th...
Adaptive Background subtraction in Dynamic Environments Using Fuzzy Logic
Extracting a background from an image is the enabling step for many high-level vision processing tasks, such as object tracking and activity analysis. Although there are a number of object extraction algorithms...
Two Factor Biometric Key for Secure Wireless Networks
The applications of wireless networks are steadily increasing throughout the world. Wireless transactions are now happening in highly secure banking networks. To have more reliable networks, security of wireless networks...
Fractals Based Clustering for CBIR
Fractal based CBIR is based on the self similarity fundamentals of fractals. Mathematical and natural fractals are the shapes whose roughness and fragmentation neither tend to vanish, nor fluctuate, but remain essentiall...
Hybrid Particle Swarm Optimization for Regression Testing
Regression Testing ensures that any enhancement made to software will not affect specified functionality of software. The execution of all test cases can be long and complex to run; this makes it a costlier process. The...