Difference between revisions of "The Topic-Perspective Model for Social Tagging Systems"

Latest revision as of 00:06, 2 October 2012

Citation

The Topic-Perspective Model for Social Tagging Systems

Caimei Lu, Xiaohua Hu, Xin Chen, Jung-ran Park, TingTing He, and Zhoujun Li

Online version

http://www.pages.drexel.edu/~cl389/dataset/kdd10-lu.pdf

Summary

In this paper, the authors propose a generative model for social tag annotation in the style of latent Dirichlet allocation (LDA, http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation). The tags attached to a particular URL usually reflect either the content of the URL or the tagger’s perspective on that content. In data mining applications, it is useful to separate content-related tags from perspective-related tags. The proposed generative model gives, for each tag, the probability that it was generated from the content and the probability that it was generated from the tagger’s perspective. In the results section, the authors show that this model improves on previously proposed models for the same task. Tags associated with the user’s perspective can help improve personalized search.

Motivation for proposed model

  • A document is written before a tagger assigns tags to it, so the term generation process for each document should be separated from the tag generation process. The authors use the standard LDA topic model for the term generation process.
  • When a user generates a tag for a document, the tag depends either on the topic distribution of the document or on the user’s perspective. The authors use a switch variable to decide whether the document’s topics or the user’s perspective is used to generate the tag.

Model

SocialTagGM.png

As shown in the figure, the model is divided into two parts by a dashed line. The right part shows the standard LDA generative model. The left part shows how tags are generated. To generate each tag, an indicator variable x is generated first. If x equals 1, the tag is generated using the document’s topic distribution. If x equals 0, a user perspective p is first sampled from the user’s perspective distribution <math>\theta^u</math>, and then tag t is drawn from the tag distribution of perspective p, <math>\varphi_p</math>.
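The tag side of the generative process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the Bernoulli parameter `switch_prob` for the indicator x, and the per-topic tag distributions `topic_tags` are hypothetical, since this summary does not say how x is distributed or how a topic maps to a tag. Only the x = 0 branch (perspective p drawn from the user's distribution, then tag t drawn from that perspective's tag distribution) follows the text directly.

```python
import random


def sample_categorical(probs, rng):
    """Draw an index i with probability probs[i]."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point rounding


def generate_tag(doc_topics, topic_tags, user_perspectives,
                 perspective_tags, switch_prob, rng):
    """Generate one tag and return (x, tag_index).

    x = 1: the tag is generated from the document's topic distribution.
    x = 0: the tag is generated from the user's perspective.
    switch_prob (assumed) is the probability that x equals 1.
    """
    x = 1 if rng.random() < switch_prob else 0
    if x == 1:
        # Assumed: pick a topic, then a tag from that topic's tag distribution.
        z = sample_categorical(doc_topics, rng)
        tag = sample_categorical(topic_tags[z], rng)
    else:
        # From the text: perspective p ~ theta^u, then tag t ~ phi_p.
        p = sample_categorical(user_perspectives, rng)
        tag = sample_categorical(perspective_tags[p], rng)
    return x, tag
```

Calling `generate_tag` repeatedly with a seeded `random.Random` and recording x for each tag gives a toy data set in which every tag is labelled as content-driven or perspective-driven, which is the separation the model is meant to recover.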


Parameter estimation for this model is done using Gibbs sampling.

Experiments and Results

The model is evaluated on a social bookmarking data set crawled from del.icio.us (http://www.del.icio.us).

The evaluation criterion is perplexity (http://en.wikipedia.org/wiki/Perplexity). Because the performance of the model depends on the number of topics and the number of perspectives, both parameters are tuned. The minimum perplexity was obtained when the number of topics and the number of perspectives were both set to 80.
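For readers unfamiliar with the metric, the sketch below computes perplexity in its standard form: the exponential of the negative average log-probability that a model assigns to held-out tokens. It assumes the per-token probabilities are already available; the function name `perplexity` and its input format are hypothetical, and this summary does not state exactly which tokens (terms, tags, or both) the paper scores.

```python
import math


def perplexity(token_probs):
    """Perplexity of held-out tokens, given the model probability of each.

    token_probs: non-empty list of probabilities in (0, 1].
    Lower values mean the model predicts the held-out data better.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    log_likelihood = sum(math.log(p) for p in token_probs)
    return math.exp(-log_likelihood / len(token_probs))
```

A model that spreads its probability uniformly over k candidate tags has perplexity k, so tuning the number of topics and perspectives amounts to looking for the setting that brings this value down furthest on held-out data.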

Related Papers

  • M. Bundschus, S. Yu, V. Tresp, A. Rettinger, M. Dejori, and H.-P. Kriegel, Hierarchical Bayesian Models for Collaborative Tagging Systems, Ninth IEEE International Conference on Data Mining (ICDM '09), IEEE, Miami, Florida, 2009, pp. 728-733.
  • D. Newman, C. Chemudugunta, and P. Smyth, Statistical Entity-Topic Models, 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Philadelphia, PA, 2006, pp. 680-686.