
Data mining is an essential process in which intelligent methods are applied to extract needed information from large amounts of data. Clustering is one of the main techniques in data mining applications: it collects similar objects into one group and dissimilar objects into other groups. Contributing areas of research include data mining, statistics, machine learning, spatial database technology, biology and marketing. Clustering is also an unsupervised learning task in which one seeks to identify a finite set of categories, termed clusters, to describe the data. Because clustering partitions a large data set into groups based on their similarity, it is also called data segmentation.

The major clustering techniques are partitioning methods, hierarchical methods, density based methods, grid based methods and model based methods. Partitioning methods divide the objects into a number of partitions, where each partition represents a cluster. Hierarchical methods work by grouping data objects into a tree of clusters. Density based methods discover clusters of arbitrary shape. Grid based methods quantize the object space into a finite number of cells. Model based methods optimize the fit between the given data and some mathematical model.

This survey paper presents an analysis of the partitioning and hierarchical methods.

KEYWORDS: Data mining, Clustering, Partitioning method, Grid based, Model based, Density based and Hierarchical based.

1. INTRODUCTION

Data mining is a technique used to extract hidden patterns from large databases. Data mining refers to extracting or mining knowledge from large amounts of data. Data mining is sometimes treated as knowledge discovery in databases (KDD), a multi-step process that requires accessing and preparing the data, applying a data mining algorithm, analyzing the results and taking appropriate action. The data that is accessed can be stored in one or more operational databases.

In data mining, the data can be mined by passing it through various processes, using one of two learning approaches: supervised learning or unsupervised learning. In supervised learning, often also called directed data mining, the variables under investigation can be split into two groups: explanatory variables and one (or more) dependent variables. The goal of the analysis is to specify a relationship between the dependent variable and the explanatory variables, as is done in regression analysis.

To proceed with directed data mining techniques, the values of the dependent variable must be known for a sufficiently large part of the data set. In unsupervised learning, all the variables are treated in the same way; there is no distinction between dependent and explanatory variables. However, in contrast to the name undirected data mining, there is still some target to achieve. This target might be as general as data reduction or more specific, like clustering.

Comparison between unsupervised and supervised learning: The dividing line between unsupervised learning and supervised learning is the same one that distinguishes discriminant analysis from cluster analysis. Supervised learning requires that the target variable be well defined and that a sufficient number of its values are given. In unsupervised learning, typically either the target variable has only been recorded for too small a number of cases or the target variable is unknown.

Clustering is one of the most interesting topics in data mining; it aims to find inherent structures in data and to identify meaningful subgroups for further analysis. A good clustering method will produce high quality clusters with high intra-cluster similarity and low inter-cluster similarity. The quality of a clustering result depends on both the similarity measure used by the method and its implementation. The quality of the clusters produced by a clustering method is also measured by its ability to discover some or all of the hidden patterns. Other requirements of clustering algorithms are scalability, insensitivity to the order of input records and the ability to deal with noisy data.

2. CLUSTER METHODS

2.1 PARTITIONING METHOD

Partitioning methods obtain a single-level partition of the objects. They rely on greedy heuristics that are applied iteratively to obtain a locally optimal solution. Given n objects, these methods make k <= n clusters of data and use an iterative relocation method.

It is assumed that each cluster has at least one object and each object belongs to only one cluster. Objects can be relocated between clusters as the clusters are refined. The aim of this method is to reduce the variance within each cluster as much as possible and to have large variance between the clusters.

Partitioning algorithms: K-means, Bisecting K-means.

2.2 HIERARCHICAL METHOD

Hierarchical methods obtain a nested partition of the objects, and the result is produced as a tree structure. They yield a nested series of clusters, as opposed to partitioning methods, which produce only a flat set of clusters. A hierarchical method either starts with one cluster and then splits it into smaller and smaller clusters (called divisive or top down), or starts with each object in an individual cluster and then tries to merge similar clusters into larger and larger clusters (called agglomerative or bottom up).
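The bottom-up (agglomerative) variant described above can be illustrated with a minimal sketch. It assumes points are numeric tuples compared by Euclidean distance and uses single linkage (the distance between the closest pair of members) as the cluster distance; the paper does not fix either choice, and the function name `agglomerative` is hypothetical.

```python
import math

def agglomerative(points, num_clusters=1):
    """Bottom-up clustering; returns (clusters, merge_history)."""
    clusters = [[p] for p in points]   # every object starts in its own cluster
    history = []                       # the sequence of merges encodes the tree

    def linkage(a, b):
        # Single linkage (an assumption): distance of the closest pair of members.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > num_clusters:
        # Find the closest pair of clusters among all index pairs i < j.
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        history.append((clusters[i], clusters[j]))
        # Replace the pair by the merged cluster (j > i, so delete j first).
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return clusters, history

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (20.0, 20.0)]
clusters, history = agglomerative(points, num_clusters=2)
print(clusters)
```

Stopping at `num_clusters` corresponds to cutting the tree at a chosen level; running down to a single cluster records the full merge history.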

2.3 DENSITY BASED METHOD

In density based methods, typically for each data point in a cluster, at least a minimum number of points must exist within a given radius. These methods can deal with clusters of arbitrary shape, since their major requirement is that each cluster be a dense region of points surrounded by regions of low density.

2.4 GRID BASED METHOD

In the grid based approach, the object space rather than the data is divided into a grid. This is based on characteristics of the data; such methods deal with non-numeric data more easily and are not affected by data ordering.

2.5 MODEL BASED METHOD

In model based methods, a model is assumed, perhaps based on a probability distribution. Essentially, the algorithm tries to build clusters with a high level of similarity within them and a low level of similarity between them. The similarity measurement is based on the mean values, and the algorithm tries to minimize the squared error function.

3. ANALYSIS OF PARTITIONING AND HIERARCHICAL METHOD

PARTITIONING METHOD

K-means Algorithm

K-means assumes documents are real-valued vectors. In K-means, clusters are based on the centroids of the points in each cluster.

The centroid is (typically) the mean of the points in the cluster. Each point is assigned to the cluster with the closest centroid. The number of clusters, K, must be specified. 'Closeness' is measured by cosine similarity, Euclidean distance, correlation, etc.
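As a small illustration of two of the closeness measures named above, the following sketch computes Euclidean distance and cosine similarity for plain Python lists. The function names are hypothetical, and the vectors are assumed to be non-zero and of equal length.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

p, q = [3.0, 4.0], [6.0, 8.0]
print(euclidean_distance(p, q))  # 5.0
print(cosine_similarity(p, q))   # 1.0
```

The two measures can disagree: here the vectors point in the same direction (cosine similarity 1.0) yet lie 5 units apart, so the choice of measure affects which centroid counts as closest.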

K-means clustering will converge for the common similarity measures mentioned above. Most of the convergence happens in the first few iterations, so the stopping condition is often changed to 'until relatively few points change clusters'.

K-means algorithm

Steps involved in K-means:
Select K points as the initial centroids.
Repeat
    Assign all points to the closest centroid, forming K clusters.
    Re-evaluate the centroid of each cluster.
Until the centroids don't change.

The important properties of the k-means algorithm are: it is efficient in processing large data sets, it often terminates at a local optimum, it works only on numeric values, and the clusters have convex shapes.
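The K-means steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not a tuned implementation: it assumes points are numeric tuples, uses Euclidean distance, picks the initial centroids at random from the data, and adds a `max_iter` cap as a safety limit. The names `kmeans`, `max_iter` and `seed` are hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-means on a list of numeric tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Select K points as the initial centroids (random choice is an assumption).
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assign every point to its closest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Re-evaluate each centroid as the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        # Stop when the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Because the result depends on the initial centroids, a different `seed` can lead to a different local optimum on less well-separated data.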

In K-medoids, or PAM (Partitioning Around Medoids), each cluster is represented by one of the objects in the cluster. The method finds representative objects, called medoids, in clusters. PAM starts from an initial set of medoids and iteratively replaces one of the medoids by one of the non-medoids if this improves the total distance of the resulting clustering. It works more effectively for small data sets than for large data sets.

Bisecting K-means algorithm

The Bisecting K-means algorithm is a straightforward extension of the basic K-means algorithm that is based on a simple idea: to obtain K clusters, split the set of all points into two clusters, select one of these clusters to split, and so on, until K clusters have been produced.

Bisecting K-means algorithm

Steps involved in Bisecting K-means:
Initialize the list of clusters to contain the cluster consisting of all points.
Repeat
    Remove a cluster from the list of clusters.
    {Perform several "trial" bisections of the chosen cluster.}
    For i = 1 to number of trials do
        Bisect the selected cluster using basic K-means.
    End for
    Select the two clusters from the bisection with the lowest total SSE.
    Add these two clusters to the list of clusters.
Until the list of clusters contains K clusters.
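The bisecting procedure above can be sketched as follows. The steps do not say which cluster to remove from the list, so this sketch assumes the cluster with the largest SSE is chosen. It also assumes distinct numeric tuples, Euclidean distance, and K no larger than the number of points; the helper names `sse`, `two_means` and `bisecting_kmeans` are hypothetical.

```python
import math
import random

def sse(cluster):
    """Sum of squared Euclidean distances from each point to the cluster mean."""
    centroid = tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
    return sum(math.dist(p, centroid) ** 2 for p in cluster)

def two_means(points, rng, max_iter=50):
    """One run of basic K-means with K=2; returns the two resulting clusters."""
    centroids = rng.sample(points, 2)      # two distinct points as initial centroids
    for _ in range(max_iter):
        halves = ([], [])
        for p in points:
            nearer = 0 if math.dist(p, centroids[0]) <= math.dist(p, centroids[1]) else 1
            halves[nearer].append(p)
        if not halves[0] or not halves[1]:
            break                          # defensive guard against an empty half
        new_centroids = [tuple(sum(dim) / len(h) for dim in zip(*h)) for h in halves]
        if new_centroids == centroids:
            break                          # centroids stable: this bisection converged
        centroids = new_centroids
    return halves

def bisecting_kmeans(points, k, trials=5, seed=0):
    """Split clusters one at a time until the list contains k clusters."""
    rng = random.Random(seed)
    clusters = [list(points)]              # start with one cluster holding all points
    while len(clusters) < k:
        # Remove a cluster from the list; choosing the largest-SSE one is an assumption.
        target = max(clusters, key=sse)
        clusters.remove(target)
        # Run several trial bisections and keep the pair with the lowest total SSE.
        best = min(
            (two_means(target, rng) for _ in range(trials)),
            key=lambda halves: sse(halves[0]) + sse(halves[1]),
        )
        clusters.extend(best)
    return clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
for cluster in bisecting_kmeans(points, k=2):
    print(sorted(cluster))
```

Other selection rules, such as always splitting the largest cluster, fit the same loop by changing the `key` passed to `max`.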

We often refine the resulting clusters by using their centroids as the initial centroids for the basic K-means algorithm. This is necessary because, although the K-means algorithm is guaranteed to find a clustering that represents a local minimum with respect to the SSE, in Bisecting K-means we are using the K-means algorithm "locally", i.e., to bisect individual clusters, so the final set of clusters need not be a local minimum with respect to the total SSE.

Pros and Cons of Partitioning Clustering

Partitioning clustering is easy to implement, for example with the k-means algorithm.

Reassignment monotonically decreases the goodness measure G (the total squared distance of the points from their cluster centroids), since each vector is assigned to the closest centroid. The drawbacks of this algorithm are that it gives poor results when a point is close to the center of another cluster, because the data points overlap, and that the user must predefine the number of clusters. Typical termination conditions are that the document partition is unchanged, that the centroid positions do not change, or that a fixed number of iterations has been reached.

HIERARCHICAL METHODS

Agglomerative Hierarchical Algorithm

Start with the points as individual clusters and, at each step, merge the most similar or closest pair of clusters. This requires a definition of cluster similarity or distance.

Steps involved in Agglomerative Hierarchical Algorithm:
1. Compute the similarity between all pairs of clusters, i.e., calculate a similarity matrix whose ij-th entry gives the similarity between the i-th and j-th clusters.
2. Merge the most similar (closest) two clusters.
3. Update the similarity matrix to reflect the pairwise similarity between the new cluster and the original clusters.
4. Repeat steps 2 and 3 until only a single cluster remains.

Divisive Hierarchical Algorithm
- Start at the top with all patterns in one cluster.
- The cluster is split using a flat clustering algorithm.
- This procedure is applied recursively until each pattern is in its own singleton cluster.

Pros and Cons of Hierarchical Clustering

The main advantages of hierarchical clustering are that it requires no a priori information about the number of clusters, it is easy to implement, and it gives the best result in some cases. The cons of hierarchical clustering are that the algorithm can never undo what was done previously, no objective function is directly minimized, and it is sometimes difficult to identify the correct number of clusters from the dendrogram.

4. CONCLUSION

Data mining is used to extract the needed information from large amounts of data.

Clustering is a main task of data mining, which groups similar objects into the same group. This paper presented an analysis of the partitioning and hierarchical methods. The K-means algorithm has the big advantage of clustering large data sets, and its performance increases as the number of clusters increases.

But its use is limited to numeric values. Therefore the agglomerative and divisive hierarchical algorithms were adopted for categorical data; however, due to their complexity, a new approach can be used in which categorical data is first converted into numeric data by assigning a rank value to each categorical attribute, so that K-means can be applied.
