Clustering

From Cohen Courses

Latest revision as of 23:42, 6 February 2011

This is a technical method discussed in Social Media Analysis 10-802 in Spring 2010.

What problem does it address

Clustering refers to partitioning a set of elements into multiple groups such that elements in the same group have similar attribute values. Each such group of elements, whose values for the domain's attributes are comparable, is called a cluster.

Algorithm

  • Input -
         d : data instances
         a : data attributes
         dFunc : distance function between data instances
  • Output - c : clusters of data instances
 - Choose initial seed centroids for the clusters
 - Assign each data instance in d to the cluster in c whose centroid is nearest according to dFunc
 - Recompute the centroid of each cluster from the instances assigned to it
 - Repeat the assignment and update steps until a clustering evaluation criterion is satisfied (for example, until the assignments no longer change)
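The steps above describe a centroid-based procedure in the style of k-means. Below is a minimal Python sketch of it, not the course's reference implementation. It assumes that instances are numeric vectors, that a centroid is the coordinate-wise mean of its members, that the caller supplies the number of clusters k (the page does not say how k is chosen), that the seeds are k randomly sampled instances, and that the stopping criterion is "no assignment changed" or a maximum number of iterations. The names centroid_clustering, mean_centroid, euclidean, k, max_iter and seed are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def mean_centroid(members: Sequence[Vector]) -> Vector:
    # Assumption: the centroid is the coordinate-wise mean of the cluster's members.
    n = len(members)
    return [sum(col) / n for col in zip(*members)]


def centroid_clustering(
    d: Sequence[Vector],
    k: int,  # hypothetical: number of clusters, supplied by the caller
    dFunc: Callable[[Vector, Vector], float],
    max_iter: int = 100,  # hypothetical safety cap on iterations
    seed: int = 0,
) -> List[int]:
    """Return the index of the cluster assigned to each instance in d."""
    rng = random.Random(seed)
    # Choose initial seed centroids: k distinct instances sampled at random.
    centroids = [list(p) for p in rng.sample(list(d), k)]
    labels = [-1] * len(d)
    for _ in range(max_iter):
        # Assign every instance to the cluster whose centroid is nearest under dFunc.
        new_labels = [
            min(range(k), key=lambda j: dFunc(x, centroids[j])) for x in d
        ]
        # Evaluation criterion (assumed): stop once no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Re-evaluate each centroid; keep the old one if its cluster is empty.
        for j in range(k):
            members = [x for x, lab in zip(d, labels) if lab == j]
            if members:
                centroids[j] = mean_centroid(members)
    return labels


def euclidean(a: Vector, b: Vector) -> float:
    # One possible dFunc: ordinary Euclidean distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels = centroid_clustering(points, k=2, dFunc=euclidean)
# Points 0 and 1 share one label; points 2 and 3 share the other.
print(labels)
```

Keeping the previous centroid when a cluster becomes empty is only one simple choice; re-seeding the empty cluster from a far-away instance is another common option.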

Used in

Clustering is a widely used technique, for example for grouping similar documents and for summarization.
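For document clustering, the distance function dFunc has to compare documents rather than numeric points. One common choice is cosine distance between term-count vectors. The sketch below assumes simple lowercased whitespace tokenization and treats an empty document as maximally distant. The names term_counts and cosine_distance are hypothetical, and a real pipeline would normally add stop-word removal or TF-IDF weighting.

```python
import math
from collections import Counter


def term_counts(text: str) -> Counter:
    # Assumption: lowercase and split on whitespace; no stemming or stop words.
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity of two term-count vectors (0 means same direction)."""
    # Missing terms in a Counter read as 0, so only shared terms add to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: an empty document is maximally distant
    return 1.0 - dot / (norm_a * norm_b)


docs = ["the cat sat", "the cat ran", "stock prices fell"]
vecs = [term_counts(t) for t in docs]
print(cosine_distance(vecs[0], vecs[1]))  # small: two of three words are shared
print(cosine_distance(vecs[0], vecs[2]))  # 1.0: no words in common
```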

Relevant Papers