Clustering

From Cohen Courses

This is a technical method discussed in Social Media Analysis 10-802 in Spring 2010.

What problem does it address

Clustering refers to partitioning a set of elements into groups whose members have similar attribute values. Each such group, whose elements have comparable values across the attributes of a domain, is called a cluster.

Algorithm

  • Input -
         d : data instances
         a : data attributes
         dFunc : distance function between data instances
  • Output - c : clusters of data instances
 - Choose initial seed centroids, one for each cluster
 - Assign each data instance in d to the cluster in c whose centroid is nearest, as measured by dFunc
 - Recompute the centroid of each cluster from its current members
 - Repeat the assignment and recomputation steps until a clustering evaluation criterion is met (e.g., the assignments stop changing)
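The steps above describe centroid-based (k-means-style) clustering. Below is a minimal Python sketch under several assumptions not stated on this page: instances are numeric attribute vectors, the number of clusters `k` is chosen by the user, the seed centroids are drawn at random from the data, a centroid is the attribute-wise mean of its members, and the stopping criterion is "assignments no longer change" (or `max_iter` rounds). The names `cluster`, `euclidean`, and `mean_vector` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Point = Sequence[float]


def euclidean(p: Point, q: Point) -> float:
    """Example dFunc: straight-line distance between two attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_vector(points: List[Point]) -> List[float]:
    """Assumed centroid definition: attribute-wise average of the members."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def cluster(d: List[Point], k: int,
            dFunc: Callable[[Point, Point], float] = euclidean,
            max_iter: int = 100, seed: int = 0) -> List[List[Point]]:
    """Hypothetical sketch; requires 1 <= k <= len(d) and max_iter >= 1."""
    rng = random.Random(seed)
    # Choose k initial seed centroids by sampling distinct instances.
    centroids = [list(p) for p in rng.sample(d, k)]
    assignment = None
    for _ in range(max_iter):
        # Assign each instance to the index of its nearest centroid under dFunc.
        new_assignment = [min(range(k), key=lambda j: dFunc(p, centroids[j]))
                          for p in d]
        # Stopping criterion: no instance changed cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each centroid; an empty cluster keeps its old centroid.
        for j in range(k):
            members = [p for p, a in zip(d, assignment) if a == j]
            if members:
                centroids[j] = mean_vector(members)
    return [[p for p, a in zip(d, assignment) if a == j] for j in range(k)]


data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
        (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
clusters = cluster(data, k=2)
for c in clusters:
    print(c)
```

The mean is a natural centroid when dFunc is Euclidean distance; with other distance functions a medoid (the member minimizing total distance to the others) may be a better fit, and the result can depend on the random seed centroids.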

Used in

This technique is widely used in practice, e.g., for grouping similar documents and for summarization.
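When documents are clustered, dFunc has to compare texts rather than numeric attributes. One common choice, assumed here because the page does not name one, is cosine distance between bag-of-words term-frequency vectors. The sketch below uses naive lowercase whitespace tokenization; `term_vector` and `cosine_distance` are hypothetical helper names, and such a function could serve as dFunc in the algorithm above.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term frequencies (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine_distance(u: Counter, v: Counter) -> float:
    """1 - cosine similarity: 0.0 for proportional term counts, 1.0 for no shared terms."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an empty document as unrelated to everything
    return 1.0 - dot / (norm_u * norm_v)


docs = [
    "the cat sat on the mat",
    "the cat lay on the mat",
    "stock prices fell sharply today",
]
vecs = [term_vector(doc) for doc in docs]
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        print(i, j, round(cosine_distance(vecs[i], vecs[j]), 3))
```

The two sentences about the cat share most of their terms and come out close, while the finance sentence shares none and sits at the maximum distance, so a clustering step driven by this dFunc would tend to group the first two together.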

Relevant Papers