Visualization of Social Net

From Cohen Courses
Latest revision as of 23:54, 23 October 2010

This is a technical problem related to one of the term projects in Information Extraction 10-707 in Fall 2010.

Visual representation of social networks is important for understanding network data and conveying the results of an analysis [1]. Besides analytical tools, most software packages also include modules for network visualization. Data are explored by displaying nodes and ties in various layouts and by assigning colors, sizes, and other advanced properties to nodes.

Network data are typically represented as graphs in a network layout (nodes and ties). These graphs are not easy to read and do not allow an intuitive interpretation. Various new methods have been developed to display network data in a more intuitive format (e.g. Sociomapping [2]).
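
The node-and-tie representation described above can be sketched in a few lines. This is a minimal illustration, not the approach of any particular tool: the tie list, the function names, and the rule that scales node size by degree and colors the best-connected node are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical ties (undirected edges) between people; illustrative data only.
ties = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan")]


def build_adjacency(ties):
    """Return a dict mapping each node to the set of its neighbours."""
    adjacency = defaultdict(set)
    for a, b in ties:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def node_styles(adjacency, base_size=10, step=5):
    """Derive a display size and colour for each node from its degree.

    The size rule and the colour names are assumptions for illustration.
    """
    max_degree = max(len(neighbours) for neighbours in adjacency.values())
    styles = {}
    for node, neighbours in adjacency.items():
        degree = len(neighbours)
        styles[node] = {
            "size": base_size + step * degree,
            # Highlight the node(s) with the most ties.
            "color": "red" if degree == max_degree else "gray",
        }
    return styles


adjacency = build_adjacency(ties)
styles = node_styles(adjacency)
```

A layout algorithm would then position the nodes; the styles computed here only cover the visual properties attached to each node.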

State of the art

Visualization tools now exist for search, music, networks, online communities, and many other domains. We are interested in particular in the visualization of social networks. Notable examples include a visualization of connections in last.fm, a visualization of relationships among YouTube videos, and a visualization tool for related items on Amazon [3] (http://www.readwriteweb.com/archives/the_best_tools_for_visualization.php).

Visualization software for social networks is usually built on third-party tools. One of the most popular in academia is the Prefuse toolkit (written in Java) from the University of California, Berkeley [4] (http://prefuse.org/). Other toolkits serve different purposes: Flare [5] (http://flare.prefuse.org/) is widely used for web applications, and GUESS [6] (http://graphexploration.cond.org/) supports a command-line front end. The CMU-ENCORE project [7] (http://www.cs.cmu.edu/~encore/) has adopted Prefuse to develop a tool for visualizing cross-document co-reference results.

Progress

The CMU-ENCORE project has focused on this task since the summer of 2010. The overall objectives are:

  • To serve as an interface to the ENCORE (ENtity CO-REference) system

The cross-document co-reference system is designed as a pipeline with three stages: a within-document co-reference process, which produces chains of mentions for every document; a cross-document co-reference process, which clusters the chains together pair by pair; and a visualization tool, which serves as the user interface for viewing the output of the entire system. The visualization tool currently has the following three features:

1) Visualizing the graph of entities

2) Visualizing the profile of entities

3) Visualizing the connection of entity groups

As illustrated below:

Vis-ENCORE.png
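
The pipeline described above, in which per-document mention chains are clustered pair by pair across documents, can be sketched as follows. This is not the ENCORE implementation: the example chains, the token-overlap similarity, the threshold, and all function names are assumptions chosen only to make the pairwise clustering step concrete.

```python
from itertools import combinations

# Hypothetical output of within-document co-reference: one chain of mentions
# per entity per document. Documents and mentions are made up.
chains = [
    {"doc": "d1", "mentions": ["Barack Obama", "Obama", "he"]},
    {"doc": "d2", "mentions": ["President Obama", "Obama"]},
    {"doc": "d2", "mentions": ["Angela Merkel", "Merkel"]},
    {"doc": "d3", "mentions": ["Chancellor Merkel", "she"]},
]


def tokens(chain):
    """Lower-cased words appearing in any mention of the chain."""
    return {word.lower() for m in chain["mentions"] for word in m.split()}


def similarity(a, b):
    """Jaccard overlap of mention tokens; a stand-in for a real model."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


def cluster_chains(chains, threshold=0.2):
    """Compare chains pair by pair and merge those above the threshold."""
    parent = list(range(len(chains)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(chains)), 2):
        if similarity(chains[i], chains[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(chains)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


clusters = cluster_chains(chains)
```

Each resulting cluster is a list of chain indices that the sketch treats as one entity; a visualization tool could draw one node per cluster and link clusters that share a document.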

  • To serve as an analytical tool for business intelligence or forensic purposes. The interface should therefore be able to provide evidence (links to the documents) and serve as an entity-based document browser.

As illustrated below:

Vis-ENCORE2.png
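
One way to back an entity-based document browser with evidence links is a pair of indexes, from entities to documents and from documents to entities. The sketch below assumes a hypothetical cluster format and hypothetical function names; it is an illustration of the idea, not a description of how the ENCORE tool stores its data.

```python
from collections import defaultdict

# Hypothetical cross-document clusters: entity label -> (document, mention).
clusters = {
    "Obama": [("d1", "Barack Obama"), ("d2", "President Obama"), ("d1", "he")],
    "Merkel": [("d2", "Angela Merkel"), ("d3", "Chancellor Merkel")],
}


def build_indexes(clusters):
    """Build entity -> documents and document -> entities lookups."""
    entity_to_docs = defaultdict(set)
    doc_to_entities = defaultdict(set)
    for entity, occurrences in clusters.items():
        for doc, _mention in occurrences:
            entity_to_docs[entity].add(doc)
            doc_to_entities[doc].add(entity)
    return entity_to_docs, doc_to_entities


def evidence(entity, entity_to_docs):
    """Documents that mention the entity, i.e. the links shown as evidence."""
    return sorted(entity_to_docs.get(entity, ()))


def co_occurring(entity, entity_to_docs, doc_to_entities):
    """Entities that share at least one document with the given entity."""
    related = set()
    for doc in entity_to_docs.get(entity, ()):
        related |= doc_to_entities[doc]
    related.discard(entity)
    return sorted(related)


entity_to_docs, doc_to_entities = build_indexes(clusters)
```

Selecting an entity in the interface would then list the documents returned by `evidence`, and `co_occurring` suggests which other entities to link it to.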

Potential Improvement

The tool continues to evolve, mainly in three directions: (1) adding an entity lifeline; (2) adding occurrences of the entity; (3) highlighting the entities of interest.