Difference between revisions of "Jaccard similarity"
From Cohen Courses
(Created page with 'This is a technical [[category::method]] discussed in Social Media Analysis 10-802 in Spring 2010. == What problem does it address == Quantifying similarity between two vec…')
(14 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | + | Jaccard similarity measures the similarity between two sample sets. It applies to sets represented as binary vectors. An extended version of Jaccard similarity that handles attributes with counts or continuous values is called the [[UsesMethod::Tanimoto coefficient]]. | |
− | == | + | == Algorithm == |
− | + | * Input | |
− | + | :<math> \mathbf{A} : \text{Binary Vector 1}</math> | |
− | + | :<math> \mathbf{B} : \text{Binary Vector 2}</math> | |
− | + | A and B have the same size. | |
− | |||
− | |||
+ | * Output | ||
− | + | :<math> \mathbf{M_{11}} : \text{the number of attributes where A is 1 and B is 1}</math> | |
− | + | :<math> \mathbf{M_{01}} : \text{the number of attributes where A is 0 and B is 1}</math> | |
− | :<math>\mathbf{ | + | :<math> \mathbf{M_{10}} : \text{the number of attributes where A is 1 and B is 0}</math> |
− | + | :<math> \mathbf{M_{00}} : \text{the number of attributes where A is 0 and B is 0}</math> | |
− | |||
− | |||
− | |||
− | :<math> \ | ||
− | == | + | :<math> \text{Jaccard similarity} = \mathbf{J} = \frac{ M_{11} }{ M_{01} + M_{10} + M_{11} }</math> |
− | + | :<math> \text{Jaccard dissimilarity} = 1 - \mathbf{J} </math> | |
== Relevant Papers == | == Relevant Papers == | ||
− | {{#ask: [[UsesMethod:: | + | {{#ask: [[UsesMethod::Jaccard_similarity]] |
| ?AddressesProblem | | ?AddressesProblem | ||
| ?UsesDataset | | ?UsesDataset | ||
}} | }} |
Latest revision as of 21:21, 30 March 2011
Jaccard similarity measures the similarity between two sample sets. It applies to sets represented as binary vectors. An extended version of Jaccard similarity that handles attributes with counts or continuous values is called the Tanimoto coefficient.
Algorithm
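- Input
A: binary vector 1
B: binary vector 2
A and B have the same size.
- Output
M11: the number of attributes where A is 1 and B is 1
M01: the number of attributes where A is 0 and B is 1
M10: the number of attributes where A is 1 and B is 0
M00: the number of attributes where A is 0 and B is 0
Jaccard similarity = J = M11 / (M01 + M10 + M11)
Jaccard dissimilarity = 1 - J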
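
The computation can be illustrated with a minimal Python sketch. It counts the four attribute combinations (M11, M01, M10, M00) for two equal-length binary vectors and divides M11 by the number of positions where at least one vector is 1. The function names `jaccard_counts`, `jaccard_similarity`, and `jaccard_dissimilarity` are hypothetical and do not come from this page. Returning 1.0 when both vectors are all zeros is an assumed convention, since the ratio is undefined in that case.

```python
from typing import Dict, Sequence


def jaccard_counts(a: Sequence[int], b: Sequence[int]) -> Dict[str, int]:
    """Count the attribute combinations M11, M01, M10, M00 for binary vectors."""
    if len(a) != len(b):
        raise ValueError("A and B must have the same size")
    counts = {"M11": 0, "M01": 0, "M10": 0, "M00": 0}
    for x, y in zip(a, b):
        if x not in (0, 1) or y not in (0, 1):
            raise ValueError("vectors must contain only 0 and 1")
        # The key is "M" followed by the value in A, then the value in B.
        counts[f"M{int(x)}{int(y)}"] += 1
    return counts


def jaccard_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """J = M11 / (M01 + M10 + M11); M00 is deliberately ignored."""
    c = jaccard_counts(a, b)
    denominator = c["M01"] + c["M10"] + c["M11"]
    if denominator == 0:
        # Assumed convention: two all-zero vectors are treated as identical.
        return 1.0
    return c["M11"] / denominator


def jaccard_dissimilarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard dissimilarity is 1 - J."""
    return 1.0 - jaccard_similarity(a, b)


A = [1, 1, 0, 0, 1]
B = [1, 0, 1, 0, 1]
print(jaccard_counts(A, B))         # {'M11': 2, 'M01': 1, 'M10': 1, 'M00': 1}
print(jaccard_similarity(A, B))     # 0.5
print(jaccard_dissimilarity(A, B))  # 0.5
```

In this example the position where both vectors are 0 contributes to M00 only, so it does not affect the result: two of the four positions with at least one 1 are shared, giving J = 0.5.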