What problem does it address?
Clustering means partitioning a set of elements into groups whose members have similar attribute values. Each such group, whose elements have comparable values across the attributes of the domain, is called a cluster.
- Input -
d : data instances
a : data attributes
dFunc : distance function between data instances
- Output -
c : clusters of data instances
- Procedure -
- Choose an initial seed centroid for each cluster.
- Assign every data instance in d to the cluster in c whose centroid is nearest according to dFunc.
- Recompute the centroid of each cluster from the instances now assigned to it.
- Repeat the assignment and recomputation steps until a clustering evaluation criterion is satisfied (e.g. the centroids stop changing).
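The procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes numeric attribute vectors, picks the seed centroids at random from d, recomputes each centroid as the attribute-wise mean of its members (the k-means choice; the text does not say how centroids are re-evaluated), and uses "no centroid moved" as the evaluation criterion, with a max_iter cap. The names cluster, euclidean, mean, k, max_iter and seed are hypothetical, and d_func stands in for dFunc. The mean is a natural centroid for Euclidean distance; other dFunc choices may call for a different centroid rule (e.g. a medoid).

```python
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]  # one data instance: a vector of numeric attribute values


def euclidean(p: Vector, q: Vector) -> float:
    """Hypothetical default dFunc: straight-line distance between two instances."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def mean(points: List[Vector]) -> List[float]:
    """Attribute-wise mean of a non-empty list of instances (assumed centroid rule)."""
    n = len(points)
    return [sum(values) / n for values in zip(*points)]


def cluster(
    d: List[Vector],
    k: int,
    d_func: Callable[[Vector, Vector], float] = euclidean,
    max_iter: int = 100,
    seed: int = 0,
) -> List[List[Vector]]:
    """Group the instances in d into k clusters (k-means-style sketch)."""
    rng = random.Random(seed)
    # Choose k distinct instances from d as the initial seed centroids.
    centroids = [list(p) for p in rng.sample(d, k)]
    clusters: List[List[Vector]] = []
    for _ in range(max_iter):
        # Assign every instance to the cluster whose centroid is nearest under d_func.
        clusters = [[] for _ in range(k)]
        for p in d:
            nearest = min(range(k), key=lambda i: d_func(p, centroids[i]))
            clusters[nearest].append(p)
        # Re-evaluate each centroid; an empty cluster keeps its old centroid.
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        # Assumed evaluation criterion: stop once no centroid has moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters


if __name__ == "__main__":
    points = [(1.0, 1.0), (1.5, 1.5), (2.0, 2.0), (8.0, 8.0), (8.5, 8.5), (9.0, 9.0)]
    groups = cluster(points, k=2)
    print(sorted(sorted(g) for g in groups))
    # [[(1.0, 1.0), (1.5, 1.5), (2.0, 2.0)], [(8.0, 8.0), (8.5, 8.5), (9.0, 9.0)]]
```

Seeding from random data instances is the simplest choice. Because the result can depend on the seeds, a common practice is to run several seeds and keep the clustering with the smallest total within-cluster distance.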
This technique is widely used in practice, e.g. for grouping similar documents and for summarization.