Difference between revisions of "ANERgazet"

From Cohen Courses

Latest revision as of 16:04, 7 December 2010

ANERgazet is an Arabic gazetteer built from web resources. It consists of three gazetteers:

  • Location : This gazetteer consists of 1,950 names of continents, countries, cities, rivers and mountains found in the Arabic version of Wikipedia.
  • Person : A list of 1,920 complete names of people found in Wikipedia and other websites was used to build this gazetteer. The names were split into first and last names, and duplicates were removed. After this processing, the gazetteer consists of 2,309 person names.
  • Organizations : This gazetteer consists of 262 names of companies, football teams and other organizations.
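
The processing described for the Person gazetteer (splitting complete names into parts, then removing duplicates) can be illustrated with a minimal sketch. The exact procedure the authors used is not documented here, so this assumes names are split on whitespace and that the first occurrence of each part is kept. The function name `build_person_gazetteer` and the sample names are hypothetical and are not taken from ANERgazet itself.

```python
def build_person_gazetteer(full_names):
    """Split complete names into parts and drop duplicate parts.

    Assumption: parts are separated by whitespace. The original
    ANERgazet splitting rules are not specified in this page.
    """
    seen = set()
    entries = []
    for full_name in full_names:
        for part in full_name.split():
            # Keep only the first occurrence of each name part.
            if part not in seen:
                seen.add(part)
                entries.append(part)
    return entries


# Hypothetical sample input, for illustration only.
sample = ["محمود درويش", "محمود عباس", "نجيب محفوظ"]
print(build_person_gazetteer(sample))
```

Because each complete name contributes several parts, the resulting list can be longer than the input list even after duplicates are removed, which is consistent with 1,920 complete names yielding 2,309 entries.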

The gazetteer is publicly available and can be downloaded from http://users.dsic.upv.es/~ybenajiba/.

The gazetteer has been used in several papers, such as Benajiba et al., CICLing 2007; Benajiba and Rosso, IICAI 2007; and Benajiba and Rosso, LREC 2008.