Similarity measures

The most commonly useful indexes have been collected by Holliday et al. (Holliday, J.D., Hu, C-Y. and Willett, P. (2002) Combinatorial Chemistry and High Throughput Screening 5, 155-166). These are shown in the table and can be referred to by name in applications and toolkit calls that allow user-defined similarity functions.

Measure            Range         Formula
Cosine             0.0, 1.0      [formula image]
Dice               0.0, 1.0      [formula image]
Euclid             0.0, 1.0      [formula image]
Forbes             0.0, ∞        [formula image]
Hamman             -1.0, 1.0     [formula image]
Jaccard            0.0, 1.0      [formula image]
Kulczynski         0.0, 1.0      [formula image]
Manhattan          1.0, 0.0      [formula image]
Matching           0.0, 1.0      [formula image]
Pearson            -1.0, 1.0     [formula image]
Rogers-Tanimoto    0.0, 1.0      [formula image]
Russell-Rao        0.0, 1.0      [formula image]
Simpson            0.0, 1.0      [formula image]
Tanimoto           0.0, 1.0      [formula image]
Yule               -1.0, 1.0     [formula image]

Notes

  • The Tanimoto and Jaccard indexes are the same.
  • The Forbes index has no upper limit.
  • The Manhattan index is a distance, equal to 1.0 minus the Matching index.
  • The Kulczynski index is the mean of the individual substructure similarities.
  • The Simpson index is the best (larger) of the individual substructure similarities.
  • The Dice index is the ratio of the bits in common to the arithmetic mean of the number of on bits in the two items.
  • The Cosine index is the ratio of the bits in common to the geometric mean of the number of on bits in the two items.
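
The relationships in the notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the toolkit's own implementation: fingerprints are modelled as Python sets of on-bit positions, the function names `counts` and `similarities` are hypothetical, both fingerprints are assumed to have at least one on bit, and the "individual substructure similarities" are taken to be the bits in common divided by the number of on bits in each item. The Tanimoto, Matching and Manhattan expressions use their standard bit-count definitions, since the formula images are not reproduced here.

```python
import math


def counts(a_bits, b_bits, n_bits):
    """Return (only_a, only_b, both, neither) bit counts.

    a_bits, b_bits: sets of on-bit positions (an assumed representation).
    n_bits: total fingerprint length.
    """
    both = len(a_bits & b_bits)
    only_a = len(a_bits) - both
    only_b = len(b_bits) - both
    neither = n_bits - only_a - only_b - both
    return only_a, only_b, both, neither


def similarities(a_bits, b_bits, n_bits):
    """Compute a few of the named indexes; assumes each item has an on bit."""
    only_a, only_b, both, neither = counts(a_bits, b_bits, n_bits)
    on_a = only_a + both  # number of on bits in the first item
    on_b = only_b + both  # number of on bits in the second item
    matching = (both + neither) / n_bits
    tanimoto = both / (on_a + on_b - both)
    return {
        "Tanimoto": tanimoto,
        "Jaccard": tanimoto,  # same index under another name
        # bits in common over the arithmetic mean of the on-bit counts
        "Dice": both / ((on_a + on_b) / 2),
        # bits in common over the geometric mean of the on-bit counts
        "Cosine": both / math.sqrt(on_a * on_b),
        # mean of the two one-sided similarities
        "Kulczynski": (both / on_a + both / on_b) / 2,
        # best (larger) of the two one-sided similarities
        "Simpson": max(both / on_a, both / on_b),
        "Matching": matching,
        # a distance rather than a similarity
        "Manhattan": 1.0 - matching,
    }


if __name__ == "__main__":
    a = {0, 1, 2, 3}
    b = {2, 3, 4, 5, 6, 7}
    for name, value in similarities(a, b, n_bits=16).items():
        print(f"{name:12s} {value:.3f}")
```

For these two 16-bit example fingerprints (4 and 6 on bits, 2 in common), the sketch gives Tanimoto 0.25, Dice 0.4, Simpson 0.5, Matching 0.625 and Manhattan 0.375, with Cosine and Kulczynski both a little above 0.4.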

Source: http://www.daylight.com/dayhtml/doc/theory/theory.finger.html

Original post: https://www.cnblogs.com/carol-wei/p/7664957.html