Difference between revisions of "Forum-Based Language Learning Analysis"
Revision as of 20:44, 14 February 2011
Team Members
Introduction
Second-language learning requires a lot of time and effort. Fortunately, some tools can make the task easier. Online forums are a social medium that learners use, for example, to ask for help with a particular grammatical rule or idiom.
Online forums have been used to create topic-topic, user-user, and user-topic graphs. These graphs have been used for tasks such as building recommendation systems, investigating knowledge propagation, and identifying influence. In this work we plan to use data from a forum dedicated to studying the Spanish language to facilitate language learning, either by identifying salient topics or by proposing a study peer.
Motivation
The primary goal of this work will be the extraction of topics in the forum. Our motivation is to find not just what learners of Spanish find difficult in the realms of vocabulary, grammar, and culture, but also how those difficulties relate to each other and change over time. In particular, we would like to investigate the stages of language learning in terms of topics of concern, with the intention of showing whether or not there is a general pattern among learners. If such patterns can be found, evidence of certain linguistic difficulties could be used to predict further difficulties, and students could be offered help possibly even before they are aware that help is needed. Along the same lines, it could also be possible to suggest to a learner other forum users with related strengths and weaknesses as potential study peers.
Dataset
For this dataset we will perform a crawl of http://forums.tomisimo.org/
Some statistics about the forum:
- Threads: 9,046
- Posts: 100,535
- Members: 4,863
- Active Members: 742
The primary areas of the forum are:
- Vocabulary
- Translations
- Grammar
- Practice & Homework
- Teaching & Learning
- Culture
- Teaching and Learning Techniques
- Introductions
- General Chat
The forum is run on the vBulletin system and anonymous postings are not allowed.
Proposed Work
Network Structure
We will construct a network with nodes of types: Thread, Post, User, and Topic. The first three node types are explicit in the forum structure. The Topic nodes are not explicit, and must be extracted from the thread titles, post texts, and network structure. The following table shows potential link types between these nodes.
| | Thread | Post | User | Topic |
| Thread | Hyperlink | Part-of | Creator, Participant | Primary, Secondary |
| Post | | Direct Reply, Indirect Reply | Author | Primary, Secondary |
| User | | | Quotation, Hyperlink | Interest |
| Topic | | | | Related |
It will be possible to further attach the following attributes to these nodes:
- Thread
  - Date
  - Posted in section
  - Number of views
- Post
  - Date
- User
  - Date joined
  - Native language
  - Age
  - Location
  - Interests
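The node types, link types, and attributes described in this section could be represented along the following lines. This is only a minimal sketch: the `ForumGraph` class and its method names are hypothetical, and reading each table cell as a link from the row type to the column type is an assumption on our part.

```python
from dataclasses import dataclass, field

NODE_TYPES = {"Thread", "Post", "User", "Topic"}

# Allowed link types per (row type, column type) pair.
# Assumption: each table cell describes links from the row type to the column type.
LINK_TYPES = {
    ("Thread", "Thread"): {"Hyperlink"},
    ("Thread", "Post"): {"Part-of"},
    ("Thread", "User"): {"Creator", "Participant"},
    ("Thread", "Topic"): {"Primary", "Secondary"},
    ("Post", "Post"): {"Direct Reply", "Indirect Reply"},
    ("Post", "User"): {"Author"},
    ("Post", "Topic"): {"Primary", "Secondary"},
    ("User", "User"): {"Quotation", "Hyperlink"},
    ("User", "Topic"): {"Interest"},
    ("Topic", "Topic"): {"Related"},
}


@dataclass
class ForumGraph:
    """Hypothetical container for a typed, attributed forum network."""

    nodes: dict = field(default_factory=dict)  # node id -> (node type, attributes)
    edges: dict = field(default_factory=dict)  # node id -> set of (link type, target id)

    def add_node(self, node_id, node_type, **attrs):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = (node_type, attrs)

    def add_link(self, src, dst, link_type):
        # Reject links that the table does not list for this pair of node types.
        pair = (self.nodes[src][0], self.nodes[dst][0])
        if link_type not in LINK_TYPES.get(pair, set()):
            raise ValueError(f"{link_type!r} is not a valid link for {pair}")
        self.edges.setdefault(src, set()).add((link_type, dst))

    def neighbors(self, node_id, link_type):
        # Targets reachable from node_id over links of the given type, sorted by id.
        return sorted(d for lt, d in self.edges.get(node_id, set()) if lt == link_type)
```

Node attributes such as the date, section, or native language can be passed as keyword arguments to `add_node`; the attribute names themselves are left open here.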
Evaluation
We will perform both a coarse-grained and a fine-grained evaluation of our topic model. For both approaches, we will randomly partition the posts (nodes) into two sets: training and testing. The former will be used to train our topic model, while the latter will be used for evaluation.
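The random partition could be sketched as follows. The function name `split_posts`, the 20% test fraction, and the fixed seed are illustrative assumptions; the proposal does not specify a split ratio.

```python
import random


def split_posts(post_ids, test_fraction=0.2, seed=0):
    """Randomly partition post ids into (training, testing) lists.

    test_fraction and seed are assumed values, not taken from the proposal.
    """
    rng = random.Random(seed)  # local generator, so the split is reproducible
    shuffled = list(post_ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Fixing the seed keeps the same posts in the testing set across runs, which makes the coarse-grained and fine-grained results comparable.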
Coarse-grain evaluation
Since the forum is already structured into 9 broad categories (see above), these categories can be used as labels for testing. The training data will be used to train our topic model, which will in turn be used to classify each testing node into one of the 9 categories. Accuracy and Kappa values will be reported for this task.
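The two reported measures could be computed as in the sketch below. We assume here that "Kappa" means Cohen's kappa between the forum's own category labels and the predicted ones; the function names are hypothetical.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of testing nodes whose predicted category matches the forum category."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def cohen_kappa(gold, predicted):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(gold)
    p_observed = accuracy(gold, predicted)
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Chance agreement: probability that both label sequences pick the same category.
    p_expected = sum(gold_counts[c] * pred_counts[c] for c in gold_counts) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # undefined when only one category ever occurs
    return (p_observed - p_expected) / (1.0 - p_expected)
```

Kappa is useful alongside accuracy because the forum sections are unlikely to be equally large, so a model that always predicts the biggest section could score a high accuracy while having a kappa near zero.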
Fine-grain evaluation
However, a more interesting question is how a topic model can be used to divide a general category, such as grammar, into more concrete topics such as gender problems and