# Important terms used in Analysis of Social Media

Below are some key concepts discussed in Social Media Analysis 10-802 in Spring 2010.

adjacency matrix for a graph
An N-by-N matrix A encoding an N-node graph, where a(i,j)=1 if nodes i and j are connected and a(i,j)=0 otherwise. See the Class Meeting for 10-802 01/28/2010
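As a minimal sketch of this encoding, the following builds the matrix from an edge list. It assumes an undirected graph with 0-based node indices; the function name `adjacency_matrix` and the example path graph are illustrative only.

```python
def adjacency_matrix(n, edges):
    """Return the n-by-n adjacency matrix of an undirected graph.

    `edges` is a list of (i, j) pairs with 0-based node indices.
    """
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1
        a[j][i] = 1  # undirected, so the matrix is symmetric
    return a


# Hypothetical 4-node path graph 0-1-2-3.
A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
for row in A:
    print(row)
```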
associative sorting
The tendency of "actors" in a social network to form links based on common properties (e.g., friendships between people of common age). The opposite is disassociative sorting. In the wider literature these are usually called assortative and disassortative mixing. See the Class Meeting for 10-802 01/28/2010
association network
A graph with two types of nodes, one corresponding to "actors" (usually people) and one corresponding to organizations (e.g., clubs, neighborhoods, workplaces, etc.). This is often called an affiliation network. See the Class Meeting for 10-802 01/28/2010
block diagonal matrix
A matrix that can be decomposed into (sometimes approximately) uniformly dense blocks, all of which span the diagonal. Equivalently, the adjacency matrix (with nodes ordered by block) for a stochastic blockmodel graph where all links between blocks i,j for i!=j have the same probability. See the Class Meeting for 10-802 02/04/2010
block matrix
A matrix that can be decomposed into (sometimes approximately) uniformly dense blocks. Equivalently, the adjacency matrix (with nodes ordered by block) for a stochastic blockmodel graph. See the Class Meeting for 10-802 02/04/2010
Barabasi-Albert random graph
A random graph model in which nodes are added one by one, and attached to previous nodes stochastically, in a way that prefers links to nodes with higher degree. These graphs have scale-free degree distribution and small diameter. See the Class Meeting for 10-802 01/26/2010
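The growth process described above can be sketched as follows. This is a simplified illustration, not a reference implementation: the function name, the star-shaped seed graph, and the parameter `m` (links added per new node) are assumptions made for the example.

```python
import random


def preferential_attachment_graph(n, m, seed=None):
    """Grow a graph to n nodes; each new node links to m distinct earlier
    nodes, chosen with probability proportional to their current degree."""
    if m < 1 or n <= m:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    # Assumed seed graph: a star in which node m is linked to nodes 0..m-1.
    edges = [(i, m) for i in range(m)]
    # Each node appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` is a degree-proportional draw over nodes.
    stubs = [x for edge in edges for x in edge]
    for v in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((t, v))
            stubs.extend((t, v))
    return edges


edges = preferential_attachment_graph(50, 2, seed=0)
print(len(edges))  # 96: 2 seed edges plus 2 edges for each of 47 added nodes
```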
betweenness of a node or edge in a graph
Intuitively, actors (or edges between actors) that are involved in many paths in a social network may have different roles than others (e.g., some role involved in communication or coordination). There are a number of proposed measures of betweenness. See the Class Meeting for 10-802 01/28/2010
centrality of a node in a graph
Intuitively, actors that are central in a social network may have different roles than others (e.g., some sort of leadership role). There are a number of proposed measures of centrality. See the Class Meeting for 10-802 01/28/2010
degree of a node in a graph
The number of neighbors of a node. The distribution of node degrees is one high-level statistic of a graph that is frequently analyzed. See the Class Meeting for 10-802 01/26/2010.
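A short sketch of how a degree distribution might be tabulated from an edge list. It assumes an undirected simple graph (no repeated edges or self-loops); nodes with no edges do not appear in the edge list and so are not counted. The function name and the example edges are illustrative.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree value to the number of nodes that have that degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


# Hypothetical graph: node 0 linked to 1, 2, 3, plus one edge between 1 and 2.
dist = degree_distribution([(0, 1), (0, 2), (0, 3), (1, 2)])
print(sorted(dist.items()))  # [(1, 1), (2, 2), (3, 1)]
```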
degree matrix for a graph
An N-by-N matrix encoding the degree of each node in an N-node graph, where d(i,i) is the degree of node i, and d(i,j)=0 for i!=j. See the Class Meeting for 10-802 02/04/2010
diameter of a graph
The maximum, over all pairs of nodes u,v, of the length of the shortest path between u and v. Diameter and variants such as mean diameter are high-level statistics of a graph that are frequently analyzed. See the Class Meeting for 10-802 01/26/2010
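For an unweighted graph, this definition can be computed directly by running a breadth-first search from every node. The sketch below assumes a connected graph stored as a dictionary of neighbor lists; the names `bfs_distances` and `diameter` are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Longest shortest-path length; assumes the graph is connected."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)


# Hypothetical 4-node path graph 0-1-2-3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(diameter(adj))  # 3
```

Running one search per node costs time proportional to N times the number of edges, which is fine for small graphs but is usually replaced by sampling or approximation on large ones.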
eigenvector of a matrix
If W is an N-by-N matrix, an eigenvector is a nonzero vector v such that Wv=cv, where c is a constant (the eigenvalue of v). See the Class Meeting for 10-802 02/04/2010
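The condition Wv=cv can be checked numerically. The sketch below verifies it for a small hand-picked matrix and then uses power iteration, a standard technique named here as an illustration rather than something taken from the class notes, to approximate the eigenvector with the largest eigenvalue. It assumes the all-ones starting vector is not orthogonal to that eigenvector.

```python
def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def power_iteration(w, steps=100):
    """Approximate the dominant eigenvalue and eigenvector of w."""
    v = [1.0] * len(w)
    for _ in range(steps):
        wv = matvec(w, v)
        scale = max(abs(x) for x in wv)
        v = [x / scale for x in wv]  # rescale so entries stay bounded
    wv = matvec(w, v)
    # Rayleigh quotient v.Wv / v.v gives the eigenvalue estimate.
    c = sum(a * b for a, b in zip(v, wv)) / sum(a * a for a in v)
    return c, v


W = [[2, 1], [1, 2]]
print(matvec(W, [1, 1]))   # [3, 3], so [1, 1] is an eigenvector with c = 3
print(matvec(W, [1, -1]))  # [1, -1], so [1, -1] is an eigenvector with c = 1
c, v = power_iteration(W)
```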
Erdos-Renyi random graph
A random graph model in which each pair of nodes is linked independently according to a Bernoulli random variable with the same probability. These graphs have degree distributions that are binomially distributed, and usually have small diameter. See the Class Meeting for 10-802 01/26/2010
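A minimal sketch of this model, in which every pair of nodes is linked independently with the same probability. The function name and the parameters `n` (number of nodes) and `p` (link probability) are assumptions made for the example.

```python
import random
from itertools import combinations


def erdos_renyi_graph(n, p, seed=None):
    """Return the edge list of a random graph on nodes 0..n-1 in which each
    of the n*(n-1)/2 possible edges is present independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


edges = erdos_renyi_graph(100, 0.05, seed=0)
print(len(edges))  # varies with the seed; the expected value is 0.05 * 4950 = 247.5
```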
giant connected component of a graph
Many graphs, including most natural graphs and most random graph models, will have one large connected component (rather than say two or three). This is usually called the giant connected component. See the Class Meeting for 10-802 01/26/2010
group cohesiveness in graphs
This refers to a high degree of homophily in a portion of a graph, presumably one corresponding to a meaningful subgroup or subcommunity in the graph. See the Class Meeting for 10-802 01/28/2010
homophily in graphs
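The tendency for two nodes u,v connected to a third node w to be connected to each other. Elsewhere this property is often called transitivity or triadic closure, with homophily referring more broadly to the tendency of similar nodes to link. See the Class Meeting for 10-802 01/26/2010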
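One way to quantify the tendency described above is, for a node w, the fraction of pairs of w's neighbors that are themselves linked (commonly called the local clustering coefficient). The sketch below assumes an undirected graph stored as a dictionary of neighbor sets; the function name and example graph are illustrative.

```python
from itertools import combinations


def local_clustering(adj, w):
    """Fraction of pairs of w's neighbors that are linked to each other."""
    pairs = list(combinations(adj[w], 2))
    if not pairs:
        return 0.0  # fewer than two neighbors, so there are no pairs to close
    closed = sum(1 for u, v in pairs if v in adj[u])
    return closed / len(pairs)


# Hypothetical graph: a triangle 0-1-2 plus a pendant node 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(local_clustering(adj, 1))  # 1.0: node 1's two neighbors are linked
```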
Laplacian of a graph
The matrix D-A, where D is the degree matrix of the graph and A the adjacency matrix. This matrix is important in spectral clustering. There are some important variants of this: for instance, the normalized Laplacian is the matrix I-W, where W is the adjacency matrix normalized by node degree (for example, W = D^-1 A). See the Class Meeting for 10-802 02/04/2010
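A small sketch of the unnormalized Laplacian D-A, computed from an adjacency matrix given as a list of rows. The function name and the example graph are illustrative; the graph is assumed to be undirected with no self-loops.

```python
def laplacian(a):
    """Return D - A, where D is the diagonal matrix of row sums (degrees) of a."""
    n = len(a)
    degrees = [sum(row) for row in a]
    return [
        [(degrees[i] if i == j else 0) - a[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 3-node path graph 0-1-2.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
L = laplacian(A)
for row in L:
    print(row)
# [1, -1, 0]
# [-1, 2, -1]
# [0, -1, 1]
```

Each row of D-A sums to zero, so the all-ones vector is always an eigenvector with eigenvalue 0.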
lexicon
A list of words with semantic or other linguistic information about them, e.g., WordNet, the Pitt/OpinionFinder subjectivity lexicon, or dictionary.com.
private state
In sentiment analysis, a private state of a person is a state that can't be verified by anyone else. Subjective statements are defined to be statements about private state. See the Class Meeting for 10-802 01/19/2010
semantic orientation
In sentiment analysis, a document (or word, or phrase, or ...) has a positive orientation if it is primarily favorable (about some "target") and a negative orientation if it is primarily unfavorable. Semantic orientation is also sometimes called "polarity". See the Class Meeting for 10-802 01/14/2010
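As a rough illustration of assigning an orientation to a document, the sketch below sums word-level polarities from a tiny, made-up lexicon. The lexicon entries, the function name, and the simple counting rule are all assumptions for the example. This is not a method taken from the class notes, and it ignores negation, targets, and context.

```python
# Hypothetical polarity lexicon: +1 for favorable words, -1 for unfavorable ones.
LEXICON = {
    "good": 1, "great": 1, "excellent": 1,
    "bad": -1, "poor": -1, "terrible": -1,
}


def semantic_orientation(text, lexicon=LEXICON):
    """Label text by the sign of the summed polarities of its words."""
    score = 0
    for token in text.split():
        word = token.strip(".,!?;:\"'").lower()
        score += lexicon.get(word, 0)  # words not in the lexicon count as 0
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(semantic_orientation("The plot was great and the acting excellent."))  # positive
print(semantic_orientation("Terrible pacing, poor sound."))  # negative
```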
small world effect in graphs
An informal name given to the tendency for natural graphs to have relatively small diameter. See the Class Meeting for 10-802 01/26/2010
spatial segregation models
A class of models that analyze the tendency of strongly homogeneous spatial neighborhoods to develop from relatively weak local preferences for spatial homogeneity. See the Class Meeting for 10-802 01/26/2010
stochastic blockmodel
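A random graph model in which each node belongs to one of k "blocks", and each link between a node in block i and a node in block j is determined by an independent Bernoulli random variable whose probability depends only on the pair of blocks i,j. See the Class Meeting for 10-802 01/28/2010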
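A minimal sketch of sampling from such a model. It assumes an undirected graph, a list `block_of` giving each node's block, and a symmetric k-by-k table `probs` of link probabilities; these names are made up for the example.

```python
import random
from itertools import combinations


def stochastic_blockmodel(block_of, probs, seed=None):
    """Sample an undirected graph in which nodes u and v are linked
    independently with probability probs[block_of[u]][block_of[v]]."""
    rng = random.Random(seed)
    edges = []
    for u, v in combinations(range(len(block_of)), 2):
        if rng.random() < probs[block_of[u]][block_of[v]]:
            edges.append((u, v))
    return edges


# Hypothetical two-block example: dense within blocks, sparse between them.
block_of = [0, 0, 0, 0, 1, 1, 1, 1]
probs = [[0.9, 0.1],
         [0.1, 0.9]]
edges = stochastic_blockmodel(block_of, probs, seed=0)
```

When the nodes are ordered by block, the adjacency matrix of a graph sampled this way shows the approximately uniform dense blocks described in the block matrix entry.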