Log Tempered TF-IDF

From Cohen Courses

Revision as of 02:04, 31 March 2011

Log Tempered TF-IDF is a variant of the standard information retrieval TF-IDF metric. This metric weights how important a word is to a document within a given corpus. It is often used in search engines as part of scoring and ranking a document's relevance to a query.

Algorithm / Calculation

Given a document and a corpus, we first calculate the following:

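  • Term Frequency: $\mathrm{tf}_{t,d} = \frac{n_{t,d}}{\sum_k n_{k,d}}$
    • A measure of how important a term is to a document: the number of times $n_{t,d}$ that term $t$ occurs in document $d$, divided by the total number of term occurrences in $d$.
  • Inverse Document Frequency: $\mathrm{idf}_t = \log \frac{|D|}{|\{d \in D : t \in d\}|}$
    • A measure of the general importance of a term across the corpus $D$: terms that appear in fewer documents receive a higher weight.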
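The two quantities above can be sketched in Python as follows. This is a minimal illustration, assuming documents are already tokenized into lists of strings, that term frequency is the raw count divided by document length, and that idf uses the natural logarithm. The function names `term_frequency` and `inverse_document_frequency`, and the convention of giving unseen terms zero weight, are hypothetical choices rather than part of the original page.

```python
import math
from collections import Counter


def term_frequency(term, doc):
    """Fraction of the tokens in `doc` (a list of tokens) that equal `term`."""
    if not doc:
        return 0.0
    return Counter(doc)[term] / len(doc)


def inverse_document_frequency(term, corpus):
    """Natural log of (number of documents / number of documents containing `term`)."""
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0  # hypothetical convention: a term absent from the corpus gets zero weight
    return math.log(len(corpus) / df)


corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat", "down"],
    ["a", "cat", "and", "a", "dog"],
]

print(term_frequency("cat", corpus[0]))            # one of three tokens
print(inverse_document_frequency("cat", corpus))   # "cat" appears in 2 of 3 documents
```

Here "down" occurs in only one of the three documents, so its idf ($\log 3$) is larger than that of "cat" ($\log 1.5$).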

The log-tempered TF-IDF weight of a word is then given by the following formula:

[Formula image: Lt-tfidf.png]
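A minimal sketch of one common log-tempered form, offered as an assumption rather than the exact formula on this page: the raw count $n_{t,d}$ of term $t$ in document $d$ is replaced by $\log(1 + n_{t,d})$ before multiplying by $\mathrm{idf}_t$, so repeated occurrences add progressively less weight. The function names `idf`, `log_tempered_tfidf`, and `rank`, and the toy corpus, are hypothetical.

```python
import math
from collections import Counter


def idf(term, corpus):
    """Natural log of (number of documents / number of documents containing term)."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0


def log_tempered_tfidf(term, doc, corpus):
    """ASSUMED form: log(1 + raw count of term in doc) * idf(term, corpus)."""
    count = Counter(doc)[term]
    return math.log1p(count) * idf(term, corpus)


def rank(query, corpus):
    """Hypothetical helper: order document indices by summed query-term weights."""
    scores = [sum(log_tempered_tfidf(t, doc, corpus) for t in query) for doc in corpus]
    return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)


corpus = [
    ["spam"] * 10 + ["eggs"],  # "spam" repeated ten times
    ["spam", "ham"],
    ["ham", "eggs", "toast"],
]

ratio = log_tempered_tfidf("spam", corpus[0], corpus) / log_tempered_tfidf("spam", corpus[1], corpus)
print(ratio)                    # log(11) / log(2), not 10
print(rank(["spam"], corpus))   # documents ordered by relevance to the query "spam"
```

The example shows the tempering effect: ten occurrences of "spam" earn about 3.46 times the weight of a single occurrence, rather than the tenfold increase a raw count would give.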

Relevant Papers