Difference between revisions of "Clustering"
From Cohen Courses
Latest revision as of 23:42, 6 February 2011
This is a technical method discussed in Social Media Analysis 10-802 in Spring 2010.
What problem does it address
Clustering refers to partitioning a set of elements into multiple groups such that the elements within a group exhibit similar attribute values. Each such group of elements, whose values are comparable across the attributes of a domain, is called a cluster.
Algorithm
- Input -
d : data instances
a : data attributes
dFunc : distance function between data instances
- Output - c : cluster of data instances
- Choose some initial seed centroids for the clusters
- Assign each data instance in d to the cluster c whose centroid is nearest according to dFunc
- Re-evaluate the centroid of each cluster
- Repeat the assignment and re-evaluation steps until some clustering evaluation criterion has been fulfilled
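The steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation, and it rests on several assumptions that the page does not state: each data instance is a tuple of numeric attribute values, the centroid of a cluster is re-evaluated as the attribute-wise mean of its members (as in k-means), and the evaluation criterion is "assignments stopped changing, or max_iter rounds were reached". The names cluster, euclidean, mean_point and max_iter are hypothetical and chosen only for this example.

```python
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def euclidean(p: Point, q: Point) -> float:
    """One possible dFunc: Euclidean distance between two instances."""
    return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5


def mean_point(points: List[Point]) -> List[float]:
    """Attribute-wise mean of a non-empty list of instances (assumed centroid rule)."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def cluster(
    d: List[Point],
    seeds: List[Point],
    d_func: Callable[[Point, Point], float] = euclidean,
    max_iter: int = 100,
) -> Tuple[List[int], List[List[float]]]:
    """Return (assignment, centroids); assignment[i] is the cluster index of d[i]."""
    # Step 1: start from the caller-supplied seed centroids.
    centroids = [list(s) for s in seeds]
    k = len(centroids)
    assignment: List[int] = []

    for _ in range(max_iter):
        # Step 2: put every instance into the cluster with the nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: d_func(p, centroids[j])) for p in d
        ]
        # Step 4 (assumed criterion): stop once the assignment is stable.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: re-evaluate each centroid from its current members.
        for j in range(k):
            members = [p for p, a in zip(d, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)

    return assignment, centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centers = cluster(data, seeds=[(0, 0), (10, 10)])
    print(labels)
    print(centers)
```

Passing the seeds explicitly keeps the sketch deterministic; in practice the result can depend on which seeds are chosen, and any other distance function can be supplied through d_func.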
Used in
This technique is widely used in practice, e.g. for clustering similar documents, summarization, etc.