Cosine similarity
From Cohen Courses
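Cosine similarity measures how similar two vectors are by taking the cosine of the angle between them. The cosine of the angle between two vectors can be derived from the Euclidean dot product formula:

$$\mathbf{A} \cdot \mathbf{B} = \|\mathbf{A}\| \, \|\mathbf{B}\| \cos\theta$$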
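Given two vectors of attributes, A and B, with θ the angle between them, the cosine similarity, cos(θ), is expressed in terms of the dot product and the magnitudes as

$$\cos\theta = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \, \|\mathbf{B}\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$$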
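As an illustration, the dot-product-over-magnitudes definition can be sketched in a few lines of Python. The function name `cosine_similarity` and the choice to return 0.0 when either vector has zero magnitude are assumptions made for this example; the definition itself leaves the zero-vector case undefined.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")

    # Dot product: sum of element-wise products.
    dot = sum(x * y for x, y in zip(a, b))

    # Euclidean magnitudes of each vector.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    # Assumption for this sketch: treat a zero vector as having
    # similarity 0.0 instead of dividing by zero.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0

    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction give a value close to 1, orthogonal vectors give 0, and vectors pointing in opposite directions give a value close to -1, regardless of their magnitudes.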
In text domains, a document is generally treated as a bag of words, where each unique word in the vocabulary is one dimension of the vector. The similarity between two documents can thus be assessed by computing the cosine similarity between the vectors corresponding to the two documents. Each element of vector A and vector B is generally taken to be the tf-idf weight of the corresponding word in that document.
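The following is a minimal sketch of the bag-of-words approach, using only the Python standard library. Several details are assumptions made for this example and are not prescribed above: tokenization by lowercasing and splitting on whitespace, raw term counts for tf, idf computed as log(N / df), and the helper names `tfidf_vectors` and `cosine_similarity`. Other tf-idf variants (smoothed idf, normalized tf) are common and would give different weights.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one tf-idf vector per document over a shared vocabulary.

    Assumptions of this sketch: whitespace tokenization, raw counts
    for tf, and idf = log(N / df) with no smoothing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    idf = {word: math.log(n_docs / df[word]) for word in vocab}

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # One dimension per vocabulary word; absent words get weight 0.
        vectors.append([tf[word] * idf[word] for word in vocab])
    return vocab, vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: zero vectors are treated as dissimilar
    return dot / (norm_a * norm_b)


docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply",
]
vocab, vectors = tfidf_vectors(docs)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
```

Here the first two documents share several words, so `sim_related` is positive, while the third document shares no words with the first, so `sim_unrelated` is 0. Note that with unsmoothed idf, a word that appears in every document receives weight 0 and does not contribute to the similarity.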