Forum-Based Language Learning Analysis
Team Members
Introduction
Online forums have been used to create topic-topic, user-user, and user-topic graphs. These graphs have been used for tasks such as building recommendation systems, investigating knowledge propagation, and identifying influence. In this work we plan to use data from a forum dedicated to studying the Spanish language to identify salient topics among learners of Spanish and to track influence among the users of the forum.
Dataset
For this dataset we will crawl http://forums.tomisimo.org/
Some statistics about the forum:
- Threads: 9,046
- Posts: 100,535
- Members: 4,863
- Active Members: 742
The primary areas of the forum are:
- Vocabulary
- Translations
- Grammar
- Practice & Homework
- Teaching & Learning
- Culture
- Teaching and Learning Techniques
- Introductions
- General Chat
The forum is run on the vBulletin system and anonymous postings are not allowed.
Proposed Work
We will construct a network with nodes of types: Thread, Post, User, and Topic. The first three node types are explicit in the forum structure. The Topic nodes are not explicit, and must be extracted from the thread titles, post texts, and network structure. The following table shows potential link types between these nodes.
| | Thread | Post | User | Topic |
| Thread | Hyperlink | Membership | Creator, Participant | Primary, Secondary |
| Post | | Direct Reply, Replay | Author | Primary, Secondary |
| User | | | Quotation, Hyperlink | Interest |
| Topic | | | | Related |
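The node and link types in the table above can be sketched as a small typed graph. This is only an illustrative sketch, not the project's implementation: the class name `ForumGraph`, the method names, and the node ids such as `t1` are hypothetical, and treating every link as undirected is an assumption suggested by the table listing each pair of node types only once. Link labels are copied from the table as written.

```python
from collections import defaultdict

# Node types named in the proposal; the order is used to normalize type pairs.
NODE_TYPES = ("Thread", "Post", "User", "Topic")

# Allowed link labels for each pair of node types, transcribed from the table.
LINK_TYPES = {
    ("Thread", "Thread"): {"Hyperlink"},
    ("Thread", "Post"): {"Membership"},
    ("Thread", "User"): {"Creator", "Participant"},
    ("Thread", "Topic"): {"Primary", "Secondary"},
    ("Post", "Post"): {"Direct Reply", "Replay"},  # labels kept as in the table
    ("Post", "User"): {"Author"},
    ("Post", "Topic"): {"Primary", "Secondary"},
    ("User", "User"): {"Quotation", "Hyperlink"},
    ("User", "Topic"): {"Interest"},
    ("Topic", "Topic"): {"Related"},
}


def type_pair(type_a, type_b):
    """Order two node types the same way the table does."""
    return tuple(sorted((type_a, type_b), key=NODE_TYPES.index))


class ForumGraph:
    """Hypothetical container for the typed forum network (undirected links assumed)."""

    def __init__(self):
        self.nodes = {}                # node id -> node type
        self.edges = defaultdict(set)  # node id -> {(neighbor id, link label)}

    def add_node(self, node_id, node_type):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = node_type

    def add_link(self, a, b, label):
        # Reject any label the table does not allow for this pair of node types.
        pair = type_pair(self.nodes[a], self.nodes[b])
        if label not in LINK_TYPES.get(pair, set()):
            raise ValueError(f"{label!r} is not a valid link for {pair}")
        self.edges[a].add((b, label))
        self.edges[b].add((a, label))

    def neighbors(self, node_id, label=None):
        """Return sorted neighbor ids, optionally restricted to one link label."""
        return sorted(n for n, l in self.edges[node_id] if label is None or l == label)


# Made-up ids for illustration only; real ids would come from the crawl.
g = ForumGraph()
g.add_node("t1", "Thread")
g.add_node("p1", "Post")
g.add_node("u1", "User")
g.add_link("t1", "p1", "Membership")
g.add_link("u1", "p1", "Author")
g.add_link("u1", "t1", "Creator")
```

Validating labels against the table at insertion time keeps the schema in one place, so adding or renaming a link type only requires editing `LINK_TYPES`. Topic nodes would be added the same way once topics have been extracted.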