Clustering is one of the most prominent data mining techniques, grouping data objects into clusters based on distance measures. With the rapid growth of high dimensional data, such as microarray gene expression data, clustering runs into a basic problem: the similarity between objects measured in the full dimensional space is often invalid, because such data contain many different types of attributes. As a result, grouping high dimensional data into clusters is often inaccurate and may fall short of expectations as the dimensionality of the dataset grows. This problem is now attracting tremendous research and development attention. To improve the performance of clustering on high dimensional data, issues such as dimensionality reduction, redundancy elimination, subspace clustering, co-clustering, and data labeling for clusters need to be analyzed and improved. In this paper, we present a brief comparison of existing algorithms that focus mainly on clustering high dimensional data.
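The claim that full dimensional similarity becomes unreliable is commonly explained by distance concentration: as the number of dimensions grows, a point's nearest and farthest neighbours end up at nearly the same distance, so distance-based grouping loses contrast. The sketch below is a standard illustration of that effect, not an experiment from this paper. It assumes uniformly random points in the unit hypercube and Euclidean distance; the function name `relative_contrast` and its parameters are hypothetical.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points random points in the unit hypercube.

    Assumption: uniform random data; this is an illustration of distance
    concentration, not a method taken from the surveyed paper.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast between nearest and farthest neighbour shrinks as dim grows.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

Running the script should show the relative contrast falling steadily with dimension, which is one reason the approaches named above (dimensionality reduction, subspace clustering, co-clustering) look for structure in fewer, more relevant dimensions rather than in the full space.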
Babu, B. Hari; Chandra, N. Subash; and Gopal, T. Venu
"Clustering Algorithms For High Dimensional Data – A Survey Of Issues And Existing Approaches,"
International Journal of Computer Science and Informatics: Vol. 2: Iss. 4, Article 13.
Available at: https://www.interscience.in/ijcsi/vol2/iss4/13