Difference between revisions of "Standard Citation Datasets"

Latest revision as of 04:08, 7 December 2011

The following two datasets are standard benchmarks for the problem of Citation Matching. In citation matching, a cluster is a set of citations that refer to the same paper, and a nontrivial cluster is one that contains more than one citation.
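To make the cluster terminology concrete, here is a minimal Python sketch that computes the kind of summary statistics quoted for each dataset from a list of cluster labels. The function name `cluster_stats` and the toy labels are hypothetical and chosen only for illustration; they are not part of either dataset's distribution format.

```python
from collections import Counter


def cluster_stats(labels):
    """Summarize a clustering given one cluster id per citation.

    `labels` is a hypothetical input format: labels[i] is the id of the
    paper that citation i refers to.
    """
    sizes = Counter(labels)  # cluster id -> number of citations
    nontrivial = [c for c, n in sizes.items() if n > 1]
    return {
        "citations": sum(sizes.values()),
        "clusters": len(sizes),
        "singletons": len(sizes) - len(nontrivial),
        "nontrivial": len(nontrivial),
        "largest": max(sizes.values()) if sizes else 0,
    }


# Toy example (not real data): six citations referring to three papers.
toy = ["p1", "p1", "p1", "p2", "p3", "p3"]
stats = cluster_stats(toy)
print(stats)
```

In the toy input, `p2` forms a singleton cluster, while `p1` and `p3` form nontrivial clusters.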

CiteSeer Dataset

The CiteSeer dataset has 1563 citations and 906 clusters. It contains four sections, each on a different topic. Over two-thirds of the clusters are singletons; the largest cluster has 21 citations.

Cora Dataset

The Cora dataset has 1295 citations and 134 clusters. Almost every citation in Cora belongs to a nontrivial cluster; the largest cluster contains 54 citations.
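The contrast between the two datasets is easier to see as an average cluster size. The short Python sketch below uses only the citation and cluster counts stated on this page; the variable names are illustrative.

```python
# (citations, clusters) as stated on this page
datasets = {"CiteSeer": (1563, 906), "Cora": (1295, 134)}

# Average number of citations per cluster
avg = {name: c / k for name, (c, k) in datasets.items()}

for name, value in avg.items():
    print(f"{name}: {value:.2f} citations per cluster")
```

This works out to roughly 1.73 citations per cluster for CiteSeer and roughly 9.66 for Cora, which is consistent with CiteSeer being dominated by singletons and Cora by nontrivial clusters.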

One of the papers that uses these datasets is "Joint Inference in Information Extraction".

The dataset can be downloaded from here
