Similarity measures

The most commonly used indexes have been collected by Holliday et al. (Holliday, J.D., Hu, C-Y. and Willett, P. (2002) Combinatorial Chemistry and High Throughput Screening 5, 155-166). These are shown in the table and can be referred to by name in applications and toolkit calls that allow user-defined similarity functions.

Measure	Range	Formula
Cosine	0.0,1.0
Dice	0.0,1.0
Euclid	0.0,1.0
Forbes	0.0,∞
Hamman	-1.0,1.0
Jaccard	0.0,1.0
Kulczynski	0.0,1.0
Manhattan	1.0,0.0
Matching	0.0,1.0
Pearson	-1.0,1.0
Rogers-Tanimoto	0.0,1.0
Russell-Rao	0.0,1.0
Simpson	0.0,1.0
Tanimoto	0.0,1.0
Yule	-1.0,1.0

Notes

The Tanimoto and Jaccard indexes are the same.
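As a minimal sketch of what both names compute, the index can be written as the number of bits in common divided by the number of bits set in either item. Representing a fingerprint as a Python set of on-bit positions, the function name, and the value returned for two empty fingerprints are all assumptions made for illustration only.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Bits in common divided by bits set in either fingerprint."""
    union = len(fp_a | fp_b)
    if union == 0:
        # Assumption: two empty fingerprints score 0.0; other conventions exist.
        return 0.0
    return len(fp_a & fp_b) / union


# Hypothetical fingerprints, given as sets of on-bit positions.
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8}

# 2 bits in common, 5 bits set in at least one fingerprint.
score = tanimoto(fp_a, fp_b)
print(score)
```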

The Forbes index has no upper limit.
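To illustrate why there is no upper limit, the sketch below assumes the commonly cited form of the Forbes index: bits in common, times the fingerprint length, divided by the product of the two on-bit counts. The formula is not reproduced in the text of this section, so treat it, the function name, and the set-based fingerprint representation as assumptions.

```python
def forbes(fp_a: set, fp_b: set, n_bits: int) -> float:
    """Assumed form: common * n_bits / (on_bits_a * on_bits_b)."""
    if not fp_a or not fp_b:
        # Assumption: undefined cases are reported as 0.0 here.
        return 0.0
    common = len(fp_a & fp_b)
    return common * n_bits / (len(fp_a) * len(fp_b))


# Two identical fingerprints with a single on bit: the score equals the
# fingerprint length, so it grows without bound as the length grows.
short = forbes({3}, {3}, n_bits=64)
long_ = forbes({3}, {3}, n_bits=1024)
print(short, long_)
```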

The Manhattan index is a distance, equal to 1.0 minus the Matching index.
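The relationship between the Manhattan and Matching indexes can be checked with a small sketch. It assumes both are normalised by the fingerprint length: Matching counts positions where the two items agree (both on or both off), and Manhattan counts positions where they differ. The function names and the set-of-on-bits representation are illustrative assumptions.

```python
def matching(fp_a: set, fp_b: set, n_bits: int) -> float:
    """Fraction of bit positions where both fingerprints agree."""
    both_on = len(fp_a & fp_b)
    both_off = n_bits - len(fp_a | fp_b)
    return (both_on + both_off) / n_bits


def manhattan(fp_a: set, fp_b: set, n_bits: int) -> float:
    """Fraction of bit positions where the fingerprints differ."""
    return len(fp_a ^ fp_b) / n_bits


# Hypothetical 8-bit fingerprints.
fp_a = {0, 2, 5}
fp_b = {2, 5, 7}

m = matching(fp_a, fp_b, n_bits=8)   # 2 both on + 4 both off, out of 8
d = manhattan(fp_a, fp_b, n_bits=8)  # 2 differing positions, out of 8
print(m, d, m + d)
```

Because every position either agrees or differs, the two values always sum to 1.0 under these assumptions.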

The Kulczynski index is the mean of the individual substructure similarities.

The Simpson index is the best (larger) of the individual substructure similarities.
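A short sketch of the Kulczynski and Simpson notes follows. It takes the "individual substructure similarities" to be the fraction of each item's on bits that also appear in the other item, which is an interpretation rather than something stated in this section. The function names and set-based fingerprints are likewise assumptions.

```python
def substructure_similarities(fp_a: set, fp_b: set) -> tuple:
    """Fraction of A's on bits found in B, and of B's on bits found in A."""
    common = len(fp_a & fp_b)
    return common / len(fp_a), common / len(fp_b)


def kulczynski(fp_a: set, fp_b: set) -> float:
    """Mean of the two one-sided similarities."""
    sim_a, sim_b = substructure_similarities(fp_a, fp_b)
    return (sim_a + sim_b) / 2


def simpson(fp_a: set, fp_b: set) -> float:
    """Larger of the two one-sided similarities."""
    return max(substructure_similarities(fp_a, fp_b))


# Hypothetical non-empty fingerprints: B's on bits are a subset of A's.
fp_a = {1, 2, 3, 4}
fp_b = {3, 4}

k = kulczynski(fp_a, fp_b)  # mean of 2/4 and 2/2
s = simpson(fp_a, fp_b)     # max of 2/4 and 2/2
print(k, s)
```

When one item's on bits are entirely contained in the other's, as here, Simpson reaches its maximum while Kulczynski still reflects the size difference.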

The Dice index is the ratio of the bits in common to the arithmetic mean of the number of on bits in the two items.

The Cosine index is the ratio of the bits in common to the geometric mean of the number of on bits in the two items.
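The two definitions above differ only in which mean appears in the denominator, as the following sketch shows. Fingerprints are represented as Python sets of on-bit positions and are assumed non-empty; the function names are illustrative, not toolkit calls.

```python
import math


def dice(fp_a: set, fp_b: set) -> float:
    """Bits in common over the arithmetic mean of the on-bit counts."""
    arithmetic_mean = (len(fp_a) + len(fp_b)) / 2
    return len(fp_a & fp_b) / arithmetic_mean


def cosine(fp_a: set, fp_b: set) -> float:
    """Bits in common over the geometric mean of the on-bit counts."""
    geometric_mean = math.sqrt(len(fp_a) * len(fp_b))
    return len(fp_a & fp_b) / geometric_mean


# Hypothetical fingerprints with 4 and 9 on bits, 2 of them shared.
fp_a = {1, 2, 3, 4}
fp_b = {3, 4, 5, 6, 7, 8, 9, 10, 11}

d = dice(fp_a, fp_b)    # 2 / 6.5
c = cosine(fp_a, fp_b)  # 2 / 6.0
print(d, c)
```

Since the geometric mean never exceeds the arithmetic mean, the Cosine value is never smaller than the Dice value for the same pair.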

From: http://www.daylight.com/dayhtml/doc/theory/theory.finger.html