Cosine similarity

From Cohen Courses
Revision as of 22:46, 6 February 2011 by Nitina

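Cosine similarity measures how similar two vectors are by the cosine of the angle between them. The cosine of the angle between two vectors can be derived from the Euclidean dot product formula:

\[ \mathbf{A} \cdot \mathbf{B} = \|\mathbf{A}\| \, \|\mathbf{B}\| \cos\theta \]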

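Given two vectors of attributes, A and B, separated by an angle θ, the cosine similarity, cos(θ), is expressed in terms of the dot product and the magnitudes as

\[ \text{similarity} = \cos\theta = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \, \|\mathbf{B}\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}} \]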
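As a minimal sketch of this definition, the Python function below computes the dot product divided by the product of the magnitudes. The function name `cosine_similarity` and the choice to raise an error for zero vectors (where the angle is undefined) are assumptions made for illustration, not part of the original page.

```python
import math

def cosine_similarity(a, b):
    """Return cos(theta) for two equal-length numeric vectors (illustrative helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product: sum of element-wise products.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean magnitudes of each vector.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat the zero vector as an error, since the angle is undefined.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Orthogonal vectors such as `[1, 0]` and `[0, 1]` give a similarity of 0, vectors pointing in the same direction give a value of 1 (up to floating-point rounding), and opposite vectors give -1.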

In text domains, a document is generally treated as a bag of words, where each unique word in the vocabulary is one dimension of the vector. The similarity between two documents can therefore be assessed by computing the cosine similarity between their corresponding vectors. Each element of vector A and vector B is generally taken to be the tf-idf weight of the corresponding word. Because tf-idf weights are non-negative, the resulting similarity lies between 0 and 1.
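The bag-of-words approach can be sketched in Python as follows. This is an illustration under several assumptions that are not stated on the original page: documents are tokenized by lowercasing and splitting on whitespace, term frequency is the raw count, idf is log(N / df), and a document whose weights are all zero is given similarity 0.0. The helper names `tfidf_vectors` and `sparse_cosine` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each document to a sparse {word: tf-idf weight} dict (illustrative)."""
    # Assumption: naive tokenization by lowercasing and whitespace splitting.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts
        # Assumption: idf = log(N / df); other idf variants are also common.
        vectors.append({w: c * math.log(n_docs / df[w]) for w, c in tf.items()})
    return vectors

def sparse_cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    # Only words present in both documents contribute to the dot product.
    dot = sum(weight * v[w] for w, weight in u.items() if w in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for all-zero vectors
    return dot / (norm_u * norm_v)

# Hypothetical sample documents.
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply today",
]
vecs = tfidf_vectors(docs)
sim_01 = sparse_cosine(vecs[0], vecs[1])  # documents sharing several words
sim_02 = sparse_cosine(vecs[0], vecs[2])  # documents sharing no words
```

With these sample documents, the first two share words and so receive a positive similarity, while the first and third share none and receive 0. Storing vectors as dicts avoids materializing a dimension for every vocabulary word, which matters because document vectors are typically very sparse.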