Difference between revisions of "Mehdi project abstract"

Revision as of 22:22, 29 September 2010

What I plan to do

In the Analysis of Social Media class (and also as part of my PhD research), I developed a program that can extract instructions for any how-to query from the Web. The key ideas behind the program were to use HTML features together with a classifier to identify the instructions. However, many other websites contain instructions written in different formats, and the structure of these websites is not known to our program. In this project, I propose to use the redundancy of the data available on the Web to learn the structure of new websites.

Motivation

Recently, many websites have been developed that provide solutions and tips for many tasks and projects (e.g., eHow.com or WikiHow.com). Each how-to manual on these sites provides step-by-step instructions for a given task. Currently, eHow.com contains more than 1.5 million articles produced by both experts and amateur users. According to web statistics, 70 million people visit eHow.com each month. By extracting new instructions from "unknown" websites, we may be able to add new instructions to these websites.

Interesting point

There is a lot of redundancy in the content of the instructions that our program can extract at this stage. This redundancy might be useful for extracting new instructions.

Evaluation

The performance of the system can be measured by comparing the instructions extracted by our program with the content of eHow.com or WikiHow.com. The comparison can be done by me or by independent users.

Techniques that can be used to solve this problem

Using wrappers to learn the structure of websites.
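
To make the wrapper idea concrete, here is a minimal sketch of how redundancy might be used to learn the structure of a new page. It assumes that a few steps already extracted from known sites also appear on the new page; the tag path under which they appear is taken as the wrapper and then applied to the whole page. The names `learn_wrapper` and `apply_wrapper`, the exact-match comparison, and the sample page are illustrative assumptions, not part of the existing program.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class PathTextParser(HTMLParser):
    """Collect (tag_path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            self.items.append(("/".join(self.stack), text))


def path_texts(html):
    parser = PathTextParser()
    parser.feed(html)
    parser.close()
    return parser.items


def normalize(text):
    return " ".join(text.lower().split())


def learn_wrapper(html, known_steps):
    """Return the tag path that most often contains an already-known step."""
    known = {normalize(step) for step in known_steps}
    votes = Counter(
        path for path, text in path_texts(html) if normalize(text) in known
    )
    return votes.most_common(1)[0][0] if votes else None


def apply_wrapper(html, path):
    """Extract every text node found under the learned tag path."""
    return [text for p, text in path_texts(html) if p == path]


# Hypothetical page from an "unknown" website.
page = """
<html><body>
<div><p>Welcome to my blog</p></div>
<ul class="steps">
<li>Preheat the oven</li>
<li>Mix flour and sugar</li>
<li>Bake for 30 minutes</li>
</ul>
</body></html>
"""
known = ["Preheat the oven", "Bake for 30 minutes"]  # steps seen on known sites
wrapper = learn_wrapper(page, known)
steps = apply_wrapper(page, wrapper)
```

In this toy case the two known steps vote for the same tag path, and applying that path also returns the step that was not known beforehand. A real system would likely need fuzzy matching and attribute-aware paths; exact string matching is used here only to keep the sketch short.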

What question to answer

Is there enough redundancy in the extracted instructions that we can use it to extract new instructions? How many new instructions can our program extract?
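
One possible way to put a number on the first question is sketched below: the fraction of distinct extracted steps that occur on at least two sources. The function name `redundancy_ratio`, the exact-match normalization, and the sample data are assumptions made for illustration; the project itself does not specify a redundancy measure.

```python
from collections import defaultdict


def normalize(step):
    return " ".join(step.lower().split())


def redundancy_ratio(steps_by_source):
    """Fraction of distinct steps that appear on two or more sources."""
    sources_per_step = defaultdict(set)
    for source, steps in steps_by_source.items():
        for step in steps:
            sources_per_step[normalize(step)].add(source)
    if not sources_per_step:
        return 0.0
    repeated = sum(1 for s in sources_per_step.values() if len(s) >= 2)
    return repeated / len(sources_per_step)


# Hypothetical extraction results for one how-to query.
extracted = {
    "siteA": ["Preheat the oven", "Mix flour"],
    "siteB": ["preheat the oven", "Add eggs"],
}
ratio = redundancy_ratio(extracted)
```

A higher ratio would suggest that more of the already-extracted steps could serve as anchors for learning the structure of new websites.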

Team Member

Mehdi Samadi

@@ Line 1: / Line 1: @@
 == What I plan to do ==
-In analysis of social media class (and also as part of my PhD research) I developed a program that was able to extract instructions for any how-to query from the Web. The key ideas behind the program were using some HTML features and also a classifier to extract instructions from the Web. However there are many other websites that contain instructions that are written in different formats which are unknown to our program. In this project I propose to use redundancy of the data available on the Web to learn the structure of new websites.
+In analysis of social media class (and also as part of my PhD research) I developed a program that was able to extract instructions for any how-to query from the Web. The key ideas behind the program were using some HTML features and also a classifier to extract instructions from the Web. However there are many other websites that contain instructions that are written in different formats. The structure of these websites are not known to our program. In this project I propose to use redundancy of the data available on the Web to learn the structure of new websites.
 == Motivation ==
