Difference between revisions of "Visualization of Social Net"

(Created page with 'This is a technical [[Category::Problem|problem]] related to one of the term projects in Information Extraction 10-707 in Fall 2010. Relation extraction broadly speaking ref…')
 
Line 1: Line 1:
 
This is a technical [[Category::Problem|problem]] related to one of the term projects in [[Information Extraction 10-707 in Fall 2010]].
 
  
Relation extraction, broadly speaking, refers to the task of relating entities present in a document. This can take many specific forms, such as labeling the relation between two given entities, finding all entity pairs that satisfy a relation, or even multiway relation extraction (also called [[Record Extraction]]).
+
Visual representation of social networks is important for understanding the network data and conveying the results of the analysis [1]. Besides analytical tools, most software packages also include modules for network visualization. The data is explored by displaying nodes and ties in various layouts and by assigning colors, sizes, and other advanced properties to nodes.
  
==History==
+
Network data is typically represented as graphs in a network layout (nodes and ties). These are not easy to read and do not allow an intuitive interpretation. Various new methods have been developed to display network data in a more intuitive format (e.g. Sociomapping [2]).
 +
 
 +
==State of the art==
  
 
Relation Extraction has been studied in depth at MUC (MUC-6, MUC-7) and ACE (ACE2, ACE 2004, [[ACE 2005 Dataset |ACE 2005]], ACE 2007, ACE 2008) series of conferences and evaluations. For biomedical literature, the BioCreAtIvE II tasks have also been useful.
 
  
==Details==
+
==Our Progress==
  
 
The most common type of Relation Extraction is binary (i.e. for two entities), and can take on one of the following specific forms:
 
Line 28: Line 30:
 
This problem is tackled by using a seed database of entity pairs to learn extraction patterns, which are then used to create candidate triplets (Entity1, Entity2, Relationship). Finally, these candidates are pruned.
  
==State of the art==
+
==Potential Improvement==
 
In spite of the extensive research on the topic, the accuracy of relation extraction systems in ACE evaluations still falls only in the 50-70% range. Open-domain systems do even worse, and usually aren't based on very principled approaches; they contain a lot of special-case handling.
  
==Related Paper==
+
==References==
The [[RelatedPaper::Information Extraction Survey by Sunita Sarawagi]] contains more detail on this problem, and prior work to solve it.
+
[1] .
 +
[2] .

Revision as of 14:13, 30 September 2010

This is a technical problem related to one of the term projects in Information Extraction 10-707 in Fall 2010.

Visual representation of social networks is important for understanding the network data and conveying the results of the analysis [1]. Besides analytical tools, most software packages also include modules for network visualization. The data is explored by displaying nodes and ties in various layouts and by assigning colors, sizes, and other advanced properties to nodes.

Network data is typically represented as graphs in a network layout (nodes and ties). These are not easy to read and do not allow an intuitive interpretation. Various new methods have been developed to display network data in a more intuitive format (e.g. Sociomapping [2]).
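To make the node-and-tie representation concrete, here is a minimal sketch of how a layout and node properties (size, color) might be computed and written out for a drawing tool. Everything in it is an assumption made for illustration: the sample people, the groups, the palette, the circular layout, the degree-based sizing, and the choice of Graphviz DOT text as the output format. It is not taken from the project itself.

```python
import math

def circular_layout(nodes, radius=1.0):
    """Place nodes evenly on a circle; returns {node: (x, y)}."""
    n = len(nodes)
    return {
        node: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, node in enumerate(nodes)
    }

def degree(ties):
    """Count how many ties touch each node."""
    counts = {}
    for a, b in ties:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts

def to_dot(nodes, ties, groups, palette):
    """Render the network as DOT text: position from the layout,
    node width from degree, fill color from the node's group."""
    pos = circular_layout(nodes, radius=2.0)
    deg = degree(ties)
    lines = ["graph social {"]
    for node in nodes:
        x, y = pos[node]
        size = 0.3 + 0.2 * deg.get(node, 0)   # hypothetical sizing rule
        color = palette[groups[node]]
        lines.append(
            f'  "{node}" [pos="{x:.2f},{y:.2f}!", width={size:.1f}, '
            f'style=filled, fillcolor="{color}"];'
        )
    for a, b in ties:
        lines.append(f'  "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical sample network (not from the source).
nodes = ["Ann", "Bob", "Cid", "Dee"]
ties = [("Ann", "Bob"), ("Ann", "Cid"), ("Ann", "Dee"), ("Bob", "Cid")]
groups = {"Ann": "staff", "Bob": "staff", "Cid": "student", "Dee": "student"}
palette = {"staff": "lightblue", "student": "orange"}

dot = to_dot(nodes, ties, groups, palette)
print(dot)
```

The printed text could be passed to a Graphviz layout engine that honors pinned positions; swapping `circular_layout` for another layout function changes the picture without touching the node attributes.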

State of the art

Relation Extraction has been studied in depth in the MUC (MUC-6, MUC-7) and ACE (ACE2, ACE 2004, ACE 2005, ACE 2007, ACE 2008) series of conferences and evaluations. For biomedical literature, the BioCreAtIvE II tasks have also been useful.

Our Progress

The most common type of Relation Extraction is binary (i.e. for two entities), and can take on one of the following specific forms:

  • Finding the type of relationship between given entities in text

Although in one sense this problem is easier than entity extraction because we only need to make one prediction instead of a vector of predictions, it is still considered harder because it requires a variety of syntactic and semantic features, both local and nonlocal. This problem is solved using one of the following types of methods:

1) Feature-based methods: These methods extract a flat set of features from the input and then invoke an off-the-shelf classifier such as a decision tree or an SVM.

2) Kernel-based methods: These methods design special kernels to capture similarity between entities with complicated structures.

3) Rule-based methods: These methods create propositional and first-order rules over structures around the two entities.
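As a rough illustration of the feature-based approach in 1), the sketch below turns an entity pair into a flat set of string features and then classifies it. The feature names, the toy sentences, and the relation labels are all hypothetical. To keep the example dependency-free, a 1-nearest-neighbour rule over shared features stands in for the off-the-shelf classifier; a decision tree or SVM from a library would normally take its place.

```python
def extract_features(tokens, e1, e2):
    """Flat feature set for the entity pair at token positions e1 < e2."""
    between = tokens[e1 + 1:e2]
    feats = {f"between={w.lower()}" for w in between}
    feats.add(f"distance={len(between)}")
    if e1 > 0:
        feats.add(f"before_e1={tokens[e1 - 1].lower()}")
    if e2 + 1 < len(tokens):
        feats.add(f"after_e2={tokens[e2 + 1].lower()}")
    return feats

def predict(train, feats):
    """Stand-in classifier: return the label of the training example
    that shares the most features with `feats`."""
    best_label, best_overlap = None, -1
    for train_feats, label in train:
        overlap = len(train_feats & feats)
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label

# Hypothetical labeled examples: (tokens, index of entity 1, index of entity 2, relation).
labeled = [
    (["Alice", "works", "for", "Acme"], 0, 3, "employed_by"),
    (["Bob", "was", "born", "in", "Paris"], 0, 4, "born_in"),
]
train = [(extract_features(toks, i, j), rel) for toks, i, j, rel in labeled]

query = extract_features(["Carol", "works", "at", "Initech"], 0, 3)
print(predict(train, query))
```

Only local, surface features are used here; the syntactic and nonlocal features mentioned above would be added to the same flat set in a fuller system.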

  • For a given entity and relationship, find all entities that satisfy the relationship with the given entity.
  • For a given relationship, find all entity pairs that satisfy it.

This problem is tackled by using a seed database of entity pairs to learn extraction patterns, which are then used to create candidate triplets (Entity1, Entity2, Relationship). Finally, these candidates are pruned.
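The seed-based procedure can be sketched in a few lines. This is only a toy version under stated assumptions: patterns are the literal text between the two seed entities, entities are approximated as single capitalised words, and pruning keeps candidates found at least `min_support` times. The corpus, the seeds, the relation name `capital_of`, and the threshold are all made up for the example.

```python
import re
from collections import Counter

def learn_patterns(sentences, seeds):
    """Collect the text found between the two entities of each seed pair."""
    patterns = Counter()
    for sent in sentences:
        for e1, e2 in seeds:
            m = re.search(re.escape(e1) + r"(.+?)" + re.escape(e2), sent)
            if m:
                patterns[m.group(1)] += 1
    return patterns

def extract_candidates(sentences, patterns, relation):
    """Apply each learned pattern between two capitalised words and
    count how often each (Entity1, Entity2, Relationship) triplet is found."""
    candidates = Counter()
    for middle in patterns:
        regex = re.compile(r"([A-Z]\w+)" + re.escape(middle) + r"([A-Z]\w+)")
        for sent in sentences:
            for e1, e2 in regex.findall(sent):
                candidates[(e1, e2, relation)] += 1
    return candidates

def prune(candidates, min_support=2):
    """Keep only triplets found at least min_support times (assumed rule)."""
    return sorted(t for t, c in candidates.items() if c >= min_support)

# Hypothetical corpus and seed database.
sentences = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Madrid is the capital of Spain.",
    "Rome, the capital of Italy, is old.",
    "Madrid, the capital of Spain, is warm.",
]
seeds = [("Paris", "France"), ("Rome", "Italy")]

patterns = learn_patterns(sentences, seeds)
candidates = extract_candidates(sentences, patterns, "capital_of")
kept = prune(candidates, min_support=2)
print(kept)
```

Pruning by raw support is the simplest choice; scoring each pattern by how many known seed pairs it recovers would be a natural refinement.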

Potential Improvement

In spite of the extensive research on the topic, the accuracy of relation extraction systems in ACE evaluations still falls only in the 50-70% range. Open-domain systems do even worse, and usually aren't based on very principled approaches; they contain a lot of special-case handling.

References

[1] . [2] .