Jaccard similarity

Jaccard similarity measures the similarity between two sample sets as the size of their intersection divided by the size of their union. In the form given here it applies to binary attribute vectors, where each attribute is either present (1) or absent (0). An extended version that handles attributes with counts or continuous values is called the Tanimoto coefficient.

Algorithm

• Input
${\displaystyle \mathbf {A} :{\text{Binary Vector 1}}}$
${\displaystyle \mathbf {B} :{\text{Binary Vector 2}}}$

A and B must have the same length.

• Output
${\displaystyle \mathbf {M_{11}} :{\text{the number of attributes where A is 1 and B is 1}}}$
${\displaystyle \mathbf {M_{01}} :{\text{the number of attributes where A is 0 and B is 1}}}$
${\displaystyle \mathbf {M_{10}} :{\text{the number of attributes where A is 1 and B is 0}}}$
${\displaystyle \mathbf {M_{00}} :{\text{the number of attributes where A is 0 and B is 0}}}$
${\displaystyle {\text{Jaccard similarity}}=\mathbf {J} ={\frac {M_{11}}{M_{01}+M_{10}+M_{11}}}}$
${\displaystyle {\text{Jaccard dissimilarity}}=1-\mathbf {J} }$
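
The computation can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `jaccard_counts` and `jaccard_similarity` are hypothetical, and returning 1.0 when neither vector contains a 1 is an assumed convention, since the ratio is undefined in that case. The similarity divides M11 by the number of attributes where at least one of the two vectors is 1, so M00 does not enter the result.

```python
from typing import Sequence, Tuple


def jaccard_counts(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (M11, M01, M10, M00) for two binary vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("A and B must have the same length")
    m11 = m01 = m10 = m00 = 0
    for x, y in zip(a, b):
        if x not in (0, 1) or y not in (0, 1):
            raise ValueError("vectors must contain only 0 and 1")
        if x == 1 and y == 1:
            m11 += 1
        elif x == 0 and y == 1:
            m01 += 1
        elif x == 1 and y == 0:
            m10 += 1
        else:
            m00 += 1
    return m11, m01, m10, m00


def jaccard_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard similarity J = M11 / (M01 + M10 + M11)."""
    m11, m01, m10, _ = jaccard_counts(a, b)
    denominator = m01 + m10 + m11
    if denominator == 0:
        # Assumed convention: two all-zero vectors are treated as identical.
        return 1.0
    return m11 / denominator


A = [1, 0, 1, 1, 0, 0]
B = [1, 1, 0, 1, 0, 0]
J = jaccard_similarity(A, B)
print(jaccard_counts(A, B))  # (2, 1, 1, 2)
print(J, 1 - J)              # 0.5 0.5
```

In this example M11 = 2, M01 = 1, M10 = 1 and M00 = 2, so J = 2 / (1 + 1 + 2) = 0.5 and the Jaccard dissimilarity is also 0.5. The two attributes where both vectors are 0 do not affect the result.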