Mnduong writeup of Lee Giles' Talk
From Cohen Courses
This is a writeup of Lee_Giles_Talk by user:mnduong.
- In this talk, Professor Giles discussed SeerSuite, a suite of search engines for publications in different fields, such as computer science, chemistry, and archaeology.
- He mentioned the fourth paradigm, the view that science is now data-driven. He noted that the small sciences (such as chemistry), where researchers don't usually share data, will generate 2 to 3 times more data than the big sciences.
He then discussed each of the engines in SeerSuite. Some of the interesting points:
- CiteSeerX: provides personalization; one can upload a paper and get recommendations for similar papers. I think this is a very useful feature, as well as an interesting research topic.
- The system includes a metadata extraction step, which first converts PDF documents into plain-text files and then, after a filtering step, uses an SVM-based header parser to parse the header. It then uses a CRF (conditional random field) to parse the citations from the rest of the document.
- ChemSeer: engine for chemistry, a point of focus in the talk. Problems here include how to index chemical formulas and compounds, among others.
- ArchSeer: engine for archaeology. The community's main interest is artifacts.
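The talk did not say how CiteSeerX decides which papers are "similar". As a minimal sketch, assuming a plain bag-of-words model scored with cosine similarity (a swapped-in technique, not necessarily what CiteSeerX uses), recommending papers for an uploaded one could look like this; `term_vector`, `cosine`, `recommend`, and the toy corpus are hypothetical:

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag of words: lowercase the text and count alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(uploaded: str, corpus: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Score every corpus paper against the uploaded text; return the top k (id, score) pairs."""
    query = term_vector(uploaded)
    scored = [(paper_id, cosine(query, term_vector(text))) for paper_id, text in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy corpus: paper id -> title/abstract text.
    corpus = {
        "p1": "support vector machines for header parsing",
        "p2": "conditional random fields for citation parsing",
        "p3": "indexing chemical formulas for search",
    }
    print(recommend("citation parsing with random fields", corpus, k=2))
```

On this toy corpus the paper about conditional random fields for citation parsing ranks first, since it shares four terms with the query. A production recommender could add TF-IDF weighting, stemming, or citation links on top of this baseline.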
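The metadata extraction stages described for CiteSeerX can be sketched as a chain of functions. This is a hedged illustration of the stage ordering only: it assumes the PDF-to-text conversion has already happened, and `label_header_line` and `parse_citation` are hypothetical rule-based stand-ins for the SVM header parser and the CRF citation parser, not trained models.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Metadata:
    header: dict[str, list[str]] = field(default_factory=dict)  # label -> header lines
    citations: list[dict[str, str]] = field(default_factory=list)


def filter_lines(text: str) -> list[str]:
    """Filtering step: strip whitespace and drop empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def split_sections(lines: list[str]) -> tuple[list[str], list[str]]:
    """Assume the header ends at an 'Abstract' line and citations follow a 'References' line."""
    lowered = [line.lower() for line in lines]
    header_end = lowered.index("abstract") if "abstract" in lowered else min(len(lines), 5)
    refs_start = lowered.index("references") + 1 if "references" in lowered else len(lines)
    return lines[:header_end], lines[refs_start:]


def label_header_line(index: int, line: str) -> str:
    """Hypothetical stand-in for the SVM header classifier: one label per header line."""
    if index == 0:
        return "title"
    if "@" in line:
        return "email"
    return "author_or_affiliation"


# Hypothetical stand-in for the CRF: only handles 'Authors. Title. Venue, Year.'
CITATION_PATTERN = re.compile(
    r"^(?P<authors>[^.]+)\.\s+(?P<title>[^.]+)\.\s+(?P<venue>.+?),\s*(?P<year>\d{4})\.?$"
)


def parse_citation(line: str) -> dict[str, str]:
    """Split a reference string into fields, or keep it raw if the pattern does not match."""
    match = CITATION_PATTERN.match(line)
    return match.groupdict() if match else {"raw": line}


def extract(text: str) -> Metadata:
    """Run filtering, header labeling, and citation parsing in order."""
    header_lines, reference_lines = split_sections(filter_lines(text))
    meta = Metadata()
    for i, line in enumerate(header_lines):
        meta.header.setdefault(label_header_line(i, line), []).append(line)
    meta.citations = [parse_citation(line) for line in reference_lines]
    return meta
```

Keeping each stage a separate function means a trained classifier could replace either stand-in without changing `extract`.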
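The talk did not describe how ChemSeer indexes formulas. One possible approach, offered only as an illustrative assumption, is to parse each formula into element counts and index it under a canonical Hill-system string (carbon first, then hydrogen, then the remaining elements alphabetically; all elements alphabetically when there is no carbon), so that `CH3CH2OH` and `C2H5OH` map to the same key `C2H6O`. The sketch handles only flat formulas without parentheses or charges, and `FormulaIndex` and its methods are hypothetical names.

```python
import re
from collections import Counter, defaultdict

# One element symbol (capital letter plus optional lowercase letter) and an optional count.
# Note: symbols are not checked against the periodic table.
ELEMENT_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Count atoms in a flat formula such as 'CH3CH2OH' (no parentheses or charges)."""
    counts: Counter = Counter()
    position = 0
    for match in ELEMENT_TOKEN.finditer(formula):
        if match.start() != position:  # stray characters between element tokens
            raise ValueError(f"unparsable formula: {formula!r}")
        element, number = match.groups()
        counts[element] += int(number) if number else 1
        position = match.end()
    if position != len(formula):  # trailing characters that are not element tokens
        raise ValueError(f"unparsable formula: {formula!r}")
    return counts


def hill_key(counts: Counter) -> str:
    """Canonical Hill-system string used as the index key."""
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


class FormulaIndex:
    """Hypothetical inverted index from canonical formulas and elements to document ids."""

    def __init__(self) -> None:
        self.by_formula: defaultdict[str, set[str]] = defaultdict(set)
        self.by_element: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, formula: str) -> None:
        counts = parse_formula(formula)
        self.by_formula[hill_key(counts)].add(doc_id)
        for element in counts:
            self.by_element[element].add(doc_id)

    def exact(self, formula: str) -> set[str]:
        """Documents whose formula has the same composition, whatever the written order."""
        return set(self.by_formula.get(hill_key(parse_formula(formula)), set()))

    def containing(self, *elements: str) -> set[str]:
        """Documents whose formulas contain all of the given elements."""
        sets = [self.by_element.get(e, set()) for e in elements]
        return set.intersection(*sets) if sets else set()
```

For example, after adding `C6H12O6`, `CH3CH2OH`, and `H2O`, a query for `C2H5OH` finds only the ethanol document, and `containing("C", "O")` finds the first two. A real engine would also need to handle parentheses, charges, and compound names.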