Computational methods frequently used in Analysis of Social Media

From Cohen Courses

These are some computational methods discussed in Social Media Analysis 10-802 in Spring 2010.

k-means clustering
A way to cluster items so that items within a cluster are close together and distances between clusters are larger. K-means clustering is sometimes used as a subroutine in spectral clustering. See the Class Meeting for 10-802 02/04/2010
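The usual procedure for k-means (Lloyd's algorithm) alternates between assigning each item to its nearest centroid and moving each centroid to the mean of its items. A minimal sketch, assuming squared Euclidean distance, hand-picked initial centroids, and a fixed number of iterations; the function name `kmeans` and the toy points are illustrative only.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate an assignment step and a centroid update step."""
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Made-up 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
```

In practice the result depends on the initial centroids, so the algorithm is often restarted from several initializations.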
Naive Bayes classifier learning
A fast and simple way to learn a classifier, especially useful for text. See the Class Meeting for 10-802 01/14/2010
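For text, a common variant is multinomial Naive Bayes: count how often each word appears with each label, then score a new document by adding log-probabilities. The sketch below assumes add-one (Laplace) smoothing and whitespace tokenization; the names `train_nb` and `predict_nb` and the four training snippets are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns the counts the model needs."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for tokens, _ in docs for w in tokens}
    return label_counts, word_counts, vocab


def predict_nb(model, tokens):
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        # log P(label) + sum of log P(word | label), with add-one smoothing.
        score = math.log(n_docs / total_docs)
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


train = [
    ("great fun movie".split(), "pos"),
    ("excellent great acting".split(), "pos"),
    ("poor boring movie".split(), "neg"),
    ("boring poor plot".split(), "neg"),
]
model = train_nb(train)
label = predict_nb(model, "great acting".split())
```

Training is a single pass of counting, which is what makes the method fast.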
pointwise mutual information (PMI)
A way of measuring the association between two items, usually words or phrases in text, by comparing how often they co-occur with how often they would co-occur if they were independent. Comparing the PMI of a word w with positive terms (e.g., "excellent") against the PMI of w with negative terms (e.g., "poor") is a good indicator of the semantic orientation of w, provided a large corpus is used. See the Class Meeting for 10-802 01/14/2010
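PMI is usually defined as log2( P(x, y) / (P(x) P(y)) ). A minimal sketch, assuming probabilities are estimated from document-level co-occurrence in a tiny made-up corpus; a real estimate would need a much larger corpus, and the function name `pmi` is illustrative.

```python
import math


def pmi(docs, x, y):
    """PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), estimated from document co-occurrence."""
    n = len(docs)
    p_x = sum(x in d for d in docs) / n
    p_y = sum(y in d for d in docs) / n
    p_xy = sum(x in d and y in d for d in docs) / n
    return math.log2(p_xy / (p_x * p_y))  # raises ValueError if x and y never co-occur


# Made-up corpus: each document is reduced to its set of words.
docs = [set(text.split()) for text in [
    "tasty food excellent service",
    "tasty excellent dessert",
    "tasty but poor parking",
    "bland food poor service",
]]

# Semantic orientation: association with a positive seed minus association with a negative seed.
orientation = pmi(docs, "tasty", "excellent") - pmi(docs, "tasty", "poor")
```

Here `orientation` is positive (about 1), which suggests that "tasty" leans positive in this toy corpus.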
spectral clustering
A way to cluster nodes in a graph into "blocks" so that there are many connections within blocks and fewer between blocks, based on clustering in a space defined by a few eigenvectors of the graph Laplacian. This is a way of approximately solving a graph cut problem. See the Class Meeting for 10-802 02/04/2010
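The simplest case splits a graph into two blocks using the sign of the Fiedler vector, the eigenvector for the second-smallest eigenvalue of the Laplacian L = D - A. The sketch below makes several assumptions: the graph is connected and undirected, the Laplacian is unnormalized, and the eigenvector is found by a shifted power iteration so that only the standard library is needed. The general k-block version would instead run k-means on the rows of several eigenvectors. The name `fiedler_partition` is illustrative.

```python
def fiedler_partition(adj, iterations=500):
    """Split a connected undirected graph in two using the sign of the Fiedler vector."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    # Unnormalized graph Laplacian L = D - A.
    lap = [[(degree[i] if i == j else 0) - adj[i][j] for j in range(n)] for i in range(n)]
    shift = 2 * max(degree)  # upper bound on the largest eigenvalue of L
    v = [float(i) for i in range(n)]
    for _ in range(iterations):
        # Remove the constant component (the eigenvector for eigenvalue 0).
        mean = sum(v) / n
        v = [x - mean for x in v]
        # Power iteration on (shift*I - L) converges to the eigenvector of L
        # with the smallest remaining eigenvalue, i.e. the Fiedler vector.
        v = [shift * v[i] - sum(lap[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return [0 if x < 0 else 1 for x in v]


# Two triangles (nodes 0-2 and 3-5) joined by the single edge 2-3.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
labels = fiedler_partition(adj)
```

The two triangles end up with different labels, so the cut removes only the bridging edge.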
support vector machine classifier learning (SVMs)
A widely used way to learn a classifier that separates the classes with the largest possible margin; especially useful for text. See the Class Meeting for 10-802 01/14/2010
Topic model
A widely-used way to learn the underlying semantic structure of a collection of documents. See the Class Meeting for 10-802 02/16/2010
Relational topic model
A specific topic model that allows links between documents to be modeled. See the Class Meeting for 10-802 02/18/2010
Random walk with restart
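A technique for computing a relevance score between any two nodes in a weighted graph. The score of node j with respect to node i is the steady-state probability of finding at j a random walker that starts at i and, at every step, jumps back to i with a fixed restart probability.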
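The scores can be computed by repeatedly applying r = (1 - c) W r + c e, where W holds the transition probabilities, e is the indicator vector of the start node, and c is the restart probability. A minimal sketch, assuming c = 0.15, a fixed number of iterations, and that every node has at least one outgoing edge; the function name and the example graph are illustrative.

```python
def random_walk_with_restart(weights, start, restart=0.15, iterations=100):
    """Relevance of every node to `start`: r = (1 - restart) * W r + restart * e."""
    n = len(weights)
    # From node i, the walker moves to j with probability weights[i][j] / out[i].
    out = [sum(row) for row in weights]
    r = [1.0 if i == start else 0.0 for i in range(n)]
    for _ in range(iterations):
        # Restart mass goes back to the start node.
        nxt = [restart if j == start else 0.0 for j in range(n)]
        # The remaining mass follows the edges.
        for i in range(n):
            for j in range(n):
                if weights[i][j]:
                    nxt[j] += (1 - restart) * r[i] * weights[i][j] / out[i]
        r = nxt
    return r


# Path graph 0 - 1 - 2 - 3 with unit edge weights.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(weights, start=0)
```

Nodes farther along the path from node 0 receive lower scores, and the scores sum to 1.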
Submodularity
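A diminishing-returns property of set functions: adding an element to a small set helps at least as much as adding it to a larger set that contains the small one. The property explains why greedy methods work well on such functions, and it can be exploited to make those methods faster.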
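Set coverage is a standard example of a monotone submodular function, and the greedy method simply adds whichever candidate has the largest marginal gain. The sketch below is plain greedy selection with no speed-up; the names `coverage` and `greedy` and the candidate sets are made up for illustration.

```python
def coverage(sets, chosen):
    """f(S) = number of distinct elements covered by the chosen sets."""
    return len(set().union(*(sets[name] for name in chosen)))


def greedy(sets, k):
    """Pick k sets, each time adding the one with the largest marginal gain."""
    chosen = []
    for _ in range(k):
        best = max(
            (name for name in sets if name not in chosen),
            key=lambda name: coverage(sets, chosen + [name]) - coverage(sets, chosen),
        )
        chosen.append(best)
    return chosen


sets = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6}, "d": {1, 2}}

# Diminishing returns: "b" adds less once "a" has already been chosen.
gain_alone = coverage(sets, ["b"]) - coverage(sets, [])
gain_after_a = coverage(sets, ["a", "b"]) - coverage(sets, ["a"])

picked = greedy(sets, 2)
```

Because marginal gains can only shrink as the chosen set grows, a gain computed in an earlier round is an upper bound on the current gain, which is the observation that lazy variants of greedy selection rely on.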
Logistic regression
A technique for predicting the probability that an event occurs by fitting a logistic curve to the data.
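With a single feature x, the model is P(y = 1 | x) = 1 / (1 + exp(-(w x + b))), and w and b are chosen to maximize the likelihood of the observed labels. A minimal sketch, assuming plain gradient ascent with a fixed learning rate and a small made-up data set; the names `fit_logistic`, `lr`, and `epochs` are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by gradient ascent on the log-likelihood."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Gradient of the log-likelihood: sum over examples of (label - prediction) * input.
        grad_w = sum((y - sigmoid(w * x + b)) * x for x, y in zip(xs, ys))
        grad_b = sum(y - sigmoid(w * x + b) for x, y in zip(xs, ys))
        w += lr * grad_w / len(xs)
        b += lr * grad_b / len(xs)
    return w, b


# Made-up data: the event becomes more likely as x grows, with one noisy label.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
p_low, p_high = sigmoid(w * 0 + b), sigmoid(w * 7 + b)
```

The fitted curve assigns a low probability at x = 0 and a high probability at x = 7.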