Difference between revisions of "Project Proposal Second Draft:Daniel and Sherry"

Revision as of 16:12, 15 February 2011

Team Members

Dataset

Russian movie social network data: This dataset, which may only be used for this project and cannot be released publicly, consists of two files. The first contains directed connections between users, and the second contains users' ratings of Russian movies. In total, the dataset contains approximately 65,000 users, 250,000 friendship links, and 16 million movie ratings.

Project Ideas

We propose to explore the properties of a large social network coupled with movie ratings. We will investigate several problems in this setting, using a mix of supervised and unsupervised machine learning methods.

Tasks

Predict users' movie ratings from their ratings of other movies, their friends' ratings, and the social structure of the network.
Predict the evolution of the social network using interest similarity and network structure.
Detect hidden communities.
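For the network-evolution task, one way to combine the two signals the task names (interest similarity and network structure) is to score each candidate pair of users and rank the pairs. The sketch below is an illustration under stated assumptions, not the method the project will use: it assumes ratings are held as `{user_id: {movie_id: rating}}` and directed links as `{user_id: set of followed user_ids}`, and the helper names `cosine_similarity` and `link_score` and the weight `alpha` are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two sparse rating vectors {movie_id: rating}."""
    common = a.keys() & b.keys()
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    if not common or norm_a == 0 or norm_b == 0:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    return dot / (norm_a * norm_b)


def link_score(u, v, out_links, ratings, alpha=0.5):
    """Hypothetical score for a future directed link u -> v (higher = more likely).

    Structural signal: Jaccard overlap of the two users' out-neighbours.
    Interest signal: cosine similarity of their movie ratings.
    alpha is an illustrative, untuned weight between the two signals.
    """
    nu = out_links.get(u, set())
    nv = out_links.get(v, set())
    union = nu | nv
    jaccard = len(nu & nv) / len(union) if union else 0.0
    interest = cosine_similarity(ratings.get(u, {}), ratings.get(v, {}))
    return alpha * jaccard + (1 - alpha) * interest
```

Scoring every pair among roughly 65,000 users is expensive, so candidates would probably need to be restricted, for example to users within two hops of each other. Instead of fixing `alpha` by hand, the two signals could be fed as features to the linear classifier listed under Potential Methods.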

Evaluation

For rating prediction, we can use cross-validation to compare predicted ratings with users' actual ratings. Prediction of network evolution can be measured directly by the precision and recall of the predicted new links. Hidden communities are harder to evaluate directly, but they can potentially be used as features in the other tasks.
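The evaluation described above can be sketched as follows. The proposal does not name an error measure for ratings, so root-mean-square error (RMSE) is an assumption here, as are the function names. Links are represented as `(source, target)` tuples, and the fold assignment is a simple round-robin split.

```python
from math import sqrt


def link_precision_recall(predicted, actual):
    """Precision and recall of predicted new links (sets of (source, target) tuples)."""
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


def rmse(predicted, actual):
    """Root-mean-square error between predicted and held-out ratings."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation.

    Item i goes to fold i % k; shuffle the data beforehand if it is ordered.
    """
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test
```

For link prediction, a random split of existing links is only a proxy; if the data include link timestamps (the dataset description does not say), training on an earlier snapshot and scoring links that appear later would match the task more closely.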

Potential Methods

Analyze the data for correlations before doing anything else
Linear classification for link prediction
Graph clustering, such as spectral clustering
Regression models
Collaborative Filtering for rating prediction
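As a minimal sketch of a rating-prediction baseline that uses the social structure, the function below predicts a user's mean rating, shifted by how far that user's friends rated the movie above or below their own means. This is a neighbourhood-style heuristic that treats friends as the neighbours, not a full collaborative filtering model; the data layout, the function name `predict_rating`, and the `global_mean` fallback are assumptions.

```python
from statistics import mean


def predict_rating(user, movie, ratings, friends, global_mean):
    """Hypothetical social baseline for rating prediction.

    ratings: {user_id: {movie_id: rating}} built from training data only.
    friends: {user_id: set of user_ids the user is connected to}.
    """
    own = ratings.get(user, {})
    base = mean(own.values()) if own else global_mean  # fall back for users with no ratings
    # How far each friend who rated this movie deviated from their own average.
    deviations = [
        ratings[f][movie] - mean(ratings[f].values())
        for f in friends.get(user, set())
        if movie in ratings.get(f, {})
    ]
    # No clipping: the dataset's rating scale is not stated in the proposal.
    return base + mean(deviations) if deviations else base
```

Subtracting each friend's own mean keeps a friend who rates everything highly from inflating the prediction; a learned weight on the friend term would be a natural next step toward the regression models listed above.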

Midterm Goals

Complete the data analysis
Implement baseline systems for all three tasks
Begin work on final systems
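For a baseline on the hidden-community task, spectral clustering (listed under Potential Methods) needs an eigendecomposition of a large graph matrix. A cheaper baseline, swapped in here purely for illustration, is label propagation: each user repeatedly adopts the label most common among their neighbours until labels stop changing. The sketch assumes the directed friendship links have first been symmetrized into an undirected adjacency map `{user_id: set of neighbour ids}` in which every neighbour also appears as a key; the function name and parameters are hypothetical.

```python
import random
from collections import Counter


def label_propagation(neighbors, max_iters=20, seed=0):
    """Community detection by asynchronous label propagation.

    neighbors: undirected adjacency {node: set of neighbour nodes}; every
    neighbour must also be a key. Returns {node: community label}.
    """
    rng = random.Random(seed)
    labels = {node: node for node in neighbors}  # every node starts in its own community
    order = list(neighbors)
    for _ in range(max_iters):
        rng.shuffle(order)  # visit nodes in a fresh random order each sweep
        changed = False
        for node in order:
            if not neighbors[node]:
                continue  # isolated users keep their own label
            counts = Counter(labels[n] for n in neighbors[node])
            top = max(counts.values())
            tied = [label for label, c in counts.items() if c == top]
            # Keep the current label when it is among the most common, else pick one at random.
            new = labels[node] if labels[node] in tied else rng.choice(tied)
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break  # converged: no label changed in a full sweep
    return labels
```

Labels can only spread along edges, so disconnected parts of the network always end up in different communities; the resulting labels could also serve as features for the other two tasks, as the Evaluation section suggests.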

