Recovering plans from the web


Citation

Andrea Addis, Giuliano Armano, Daniel Borrajo. Recovering Plans from the Web. In Proceedings of SPARK, Scheduling and Planning Applications woRKshop, ICAPS'09, Thessaloniki, Greece, September 2009.

 @inproceedings{addis09recovering,
  cicyt = {workshops},
  author = {Andrea Addis and Giuliano Armano and Daniel Borrajo},
  title = {Recovering Plans from the Web},
  url = {http://www.plg.inf.uc3m.es/~dborrajo/papers/spark-web09.pdf},
  booktitle = {Proceedings of SPARK, Scheduling and Planning Applications woRKshop, ICAPS'09},
  year = {2009},
  month = {September},
  key = {Planning-Web},
  address = {Thessaloniki (Greece)},
 }


Online version

Paper: Recovering plans from the web


Abstract from the paper

Planning requires the careful and error-prone process of defining a domain model. This is usually performed by planning experts who should know about both the domain at hand and the planning techniques (including, sometimes, the inner workings of these techniques or the tools that implement them). In order for planning to be widely used, this process should be performable by non-planning experts. On the other hand, in many domains there are plenty of electronic documents (including the Web) that describe processes or plans in a semi-structured way. These descriptions mix natural language and certain templates for that specific domain. One such example is the www.WikiHow.com web site, which includes plans in many domains, all described through a set of common templates. In this work, we present a suite of tools that automatically extract knowledge from those unstructured descriptions of plans to be used for diverse planning applications.


Summary

The authors developed a suite of tools that automatically extracts planning knowledge from WikiHow.com. Their system consists of the following parts:

Crawler: The role of the crawler is to find an article, or a set of articles, on WikiHow.com related to a given subject. For example, if the user enters "Make Tortilla", it returns the following web pages (a sketch of this step follows the URL list below):

1- http://www.wikihow.com/Make-Your-Own-Tortillas

2- http://www.wikihow.com/Make-Tortilla-de-Patatas

[...]
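The paper does not describe the crawler's implementation, so the following is only a minimal sketch of this step in Python. The wikiHow search endpoint, the link-filtering heuristic, and the requests/BeautifulSoup stack are illustrative assumptions, not details from the paper.

 # Minimal crawler sketch: query wikiHow's search page and collect article URLs.
 import requests
 from bs4 import BeautifulSoup

 def crawl_wikihow(query, max_results=10):
     """Return wikiHow article URLs related to a user query such as 'Make Tortilla'."""
     search_url = "https://www.wikihow.com/wikiHowTo"  # assumed search endpoint
     response = requests.get(search_url, params={"search": query}, timeout=10)
     response.raise_for_status()
     soup = BeautifulSoup(response.text, "html.parser")

     urls = []
     for link in soup.find_all("a", href=True):
         href = link["href"]
         # Keep only links that look like wikiHow article pages (assumed URL shape).
         if href.startswith("https://www.wikihow.com/") and "wikiHowTo" not in href:
             if href not in urls:
                 urls.append(href)
         if len(urls) >= max_results:
             break
     return urls

 print(crawl_wikihow("Make Tortilla"))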

Page Processor: The page processor takes a web page from WikiHow.com and parses it. It uses the semantic tags that WikiHow.com's developers define in the HTML (the site's common templates) to convert the HTML page into a structured format. A rough sketch of this step is shown below.
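This sketch assumes that each instruction in an article's Steps section carries a "step" class in the HTML; the selectors are assumptions about wikiHow's current templates, not the markup handling described in the paper.

 # Page-processor sketch: turn a wikiHow article into a structured record.
 import requests
 from bs4 import BeautifulSoup

 def process_page(url):
     """Convert a wikiHow article into a dict with its title and list of steps."""
     soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

     title = soup.find("h1").get_text(strip=True)

     # Assumed markup: each instruction is an element with class "step".
     steps = [step.get_text(" ", strip=True) for step in soup.find_all(class_="step")]

     return {"title": title, "steps": steps}

 article = process_page("http://www.wikihow.com/Make-Your-Own-Tortillas")
 for number, step in enumerate(article["steps"], start=1):
     print(number, step[:60])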

Data Miner: The data miner statistically estimates which component or action is most likely to be used in a specific context. A toy sketch of this idea follows.
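The paper does not give the data miner's algorithm, so this is only a toy sketch of the idea: count how often each action appears in the steps of the crawled articles and turn the counts into relative frequencies. Treating the first word of a step as its "action" is a simplifying assumption made here for illustration.

 # Data-miner sketch: estimate P(action | context) from step frequencies.
 from collections import Counter

 def mine_actions(articles):
     """articles: list of dicts as produced by process_page() above."""
     counts = Counter()
     for article in articles:
         for step in article["steps"]:
             words = step.split()
             if words:
                 counts[words[0].lower()] += 1  # crude proxy for the step's action
     total = sum(counts.values()) or 1
     # Relative frequency approximates how likely each action is in this context.
     return {action: n / total for action, n in counts.most_common()}

 # Example: action frequencies across the crawled "Make Tortilla" articles.
 # print(mine_actions([process_page(u) for u in crawl_wikihow("Make Tortilla")]))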