Mnduong writeup of Lee Giles' Talk

From Cohen Courses
Revision as of 10:42, 3 September 2010 by WikiAdmin (talk | contribs) (1 revision)

This is a writeup of Lee_Giles_Talk by user:mnduong.

  • In this talk, Professor Giles discussed SeerSuite, a suite of search engines designed for publications in different areas, such as computer science, chemistry, and archaeology.
  • He mentioned the fourth paradigm, the idea that science is now data-driven. He noted that small sciences (such as chemistry), where people don't usually share data, are going to generate 2 to 3 times more data than big sciences.

He went on to discuss each of the engines in SeerSuite. Some of the interesting points are:

  • CiteSeerX: provides personalization. One can upload a paper and get recommendations for other, similar papers. I think this is a very useful feature, as well as an interesting research topic.
  • The system includes a metadata extraction step, which first converts PDF documents into text files. After a filtering step, it uses an SVM-based header parser to parse the header, and then a CRF to parse the citations from the rest of the document.
  • ChemSeer: the engine for chemistry, and a point of focus in the talk. Problems here include how to index chemical formulas and compounds.
  • ArchSeer: the engine for archaeology. The community's main interest is artifacts.
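
To make the order of the metadata extraction stages concrete, here is a rough sketch of how I picture the pipeline. It is not the SeerSuite code. It assumes the PDF has already been converted to plain text, and the filter, header parser and citation parser are simple rule-based stand-ins of my own for the filtering step, the SVM header parser and the CRF citation parser mentioned in the talk. All function names, the Metadata class and the sample text are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Heading that marks the start of the reference list (assumed convention).
REF_HEADING = re.compile(r"^[ \t]*(references|bibliography)[ \t]*$",
                         re.IGNORECASE | re.MULTILINE)


@dataclass
class Metadata:
    """Hypothetical container for the extracted fields."""
    title: str = ""
    authors: list = field(default_factory=list)
    citations: list = field(default_factory=list)


def looks_like_paper(text):
    """Filtering step (assumed rule): keep only text with a reference section."""
    return REF_HEADING.search(text) is not None


def split_header_and_references(text):
    """Cut the text into the header block and the raw reference list."""
    match = REF_HEADING.search(text)
    body, refs = text[:match.start()], text[match.end():]
    header = body.split("\n\n", 1)[0]  # first paragraph is taken as the header
    return header, refs


def parse_header(header):
    """Stand-in for the SVM header parser: line 1 is the title, line 2 the authors."""
    lines = [line.strip() for line in header.splitlines() if line.strip()]
    title = lines[0] if lines else ""
    authors = []
    if len(lines) > 1:
        authors = [a.strip() for a in re.split(r",| and ", lines[1]) if a.strip()]
    return title, authors


def parse_citations(refs):
    """Stand-in for the CRF citation parser: split on "[n]" markers only."""
    parts = re.split(r"^[ \t]*\[\d+\][ \t]*", refs, flags=re.MULTILINE)
    return [" ".join(part.split()) for part in parts if part.strip()]


def extract_metadata(text):
    """Run the stages in the order described: filter, header, citations."""
    if not looks_like_paper(text):
        return None
    header, refs = split_header_and_references(text)
    title, authors = parse_header(header)
    return Metadata(title=title, authors=authors, citations=parse_citations(refs))


# Made-up sample document, used only to exercise the sketch.
SAMPLE_TEXT = """A Study of Search Engines
Ann Author, Bob Writer

Abstract. Some text.

References
[1] C. Doe. First paper. 2001.
[2] E. Roe. Second paper,
continued line. 2005.
"""

if __name__ == "__main__":
    print(extract_metadata(SAMPLE_TEXT))
```

In a real system the two stand-in parsers would be replaced by trained models. The point of the sketch is only the ordering: text conversion, then filtering, then header parsing, then citation parsing on the remainder of the document.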