Law et al., ECML 2010



Learning to Tag from Open Vocabulary Labels. Edith Law, Burr Settles, and Tom Mitchell. 2010. In the proceedings of the ECML PKDD 2010 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases.

Online version

From Co-Author's Webpage


This paper uses Latent Dirichlet Allocation (LDA) topic modelling to address the problem of tag recommendation for music. The experiments use the TagATune dataset, a collection of 10,000 unique user-generated tags for 30,000 music clips.

The authors motivate the paper by noting that most approaches to classifying media assume a fixed vocabulary. They argue that machine learning techniques can instead exploit the open vocabularies generated by social tagging and crowd-sourcing communities. Using an open vocabulary raises some obvious problems: noise from misspellings ("chello"), synonymy ("serene" and "mello"), compound phrases ("guitar plucking"), and the sheer size of the vocabulary.

Their proposed method can be summarized as follows:

  • Training
    • First, induce a topic model from the ground-truth tags associated with each music clip in the training set (using Latent Dirichlet Allocation).
    • Then, train a classifier to predict the topic (class) distribution directly from the audio features (using Maximum Entropy).
  • Inference
    • From the audio features, use the Maximum Entropy classifier to predict the topic distribution of a music clip.
    • Based on this distribution, assign each tag a relevance score.
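
The inference steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the toy weights and the tag probabilities are all made up for the example. It also assumes that a tag's relevance score is the topic-weighted sum of the per-topic tag probabilities, which is one natural reading of "each tag is given a relevance score"; the Maximum Entropy classifier is written as a softmax over linear scores.

```python
import math


def predict_topic_distribution(features, weights, biases):
    """Maximum-entropy (softmax) prediction of P(topic | audio features).

    `weights` holds one weight vector per topic; all values are hypothetical.
    """
    scores = [
        sum(w * x for w, x in zip(w_k, features)) + b_k
        for w_k, b_k in zip(weights, biases)
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def tag_relevance(topic_dist, tag_given_topic):
    """Assumed scoring rule: score(tag) = sum_k P(topic k | clip) * P(tag | topic k)."""
    scores = {}
    for p_topic, tag_probs in zip(topic_dist, tag_given_topic):
        for tag, p_tag in tag_probs.items():
            scores[tag] = scores.get(tag, 0.0) + p_topic * p_tag
    return scores


# Toy stand-in for the topic model learned from training tags: P(tag | topic).
tag_given_topic = [
    {"cello": 0.5, "classical": 0.4, "guitar": 0.1},  # topic 0
    {"guitar": 0.6, "rock": 0.3, "classical": 0.1},   # topic 1
]

# Toy stand-in for the trained classifier (two audio features, two topics).
weights = [[2.0, -1.0], [-1.0, 2.0]]
biases = [0.0, 0.0]

clip_features = [1.0, 0.0]
topic_dist = predict_topic_distribution(clip_features, weights, biases)
scores = tag_relevance(topic_dist, tag_given_topic)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

With these toy numbers the clip is assigned mostly to topic 0, so "cello" and "classical" rank above "guitar" and "rock". Because tags are scored through a small number of topics, the classifier only has to be trained once per topic rather than once per tag.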


For their experiments, the authors extracted the audio features typically used in the music tagging literature (see MIREX) for each clip; the paper does not discuss these features in detail.

The authors' experiments show that their technique can reduce training time by 94% compared to learning a classifier for each tag directly, while giving comparable or better results in classification and retrieval of tags for music clips. They also suggest that bird-song classification is another domain where this method could be applied, making use of tags from "citizen scientists".

Related papers