tests/test-addremove-similar
author Erling Ellingsen <erlingalf@gmail.com>
Sun, 18 Feb 2007 20:39:25 +0100
changeset 4135 6cb6cfe43c5d
child 4471 736e49292809
permissions -rwxr-xr-x
Avoid some false positives for addremove -s

The original code uses the similarity score

    1 - len(diff(after, before)) / len(after)

The diff can be at most the size of the 'before' file, so any small 'before' file would be considered very similar. Removing an empty file would cause all files added in the same revision to be considered copies of the removed file.

This changes the metric to

    bytes_overlap(before, after) / len(before + after)

i.e. the actual percentage of bytes shared between the two files.
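
The difference between the two metrics can be illustrated with a rough Python sketch. It is not Mercurial's implementation: the function names are hypothetical, `difflib.SequenceMatcher` stands in for Mercurial's own diff code, and shared bytes are assumed to be counted once per file so that identical files score 1.0. The old metric is modelled only through the bound stated above (the diff is at most as large as the 'before' file).

```python
from difflib import SequenceMatcher


def numbered_lines(start, stop):
    """Mimic the output of the test's `for x in range(...): print x` lines."""
    return "".join("%d\n" % x for x in range(start, stop)).encode("ascii")


def old_score_lower_bound(before, after):
    """Lower bound of the old score 1 - len(diff) / len(after).

    Assumes len(diff) <= len(before), as the commit message states.
    """
    if not after:
        return 0.0
    return max(0.0, 1.0 - len(before) / len(after))


def new_score(before, after):
    """Hypothetical sketch of bytes_overlap(before, after) / len(before + after).

    Bytes in matching lines are counted in both files (an assumption),
    so two identical non-empty files score 1.0.
    """
    total = len(before) + len(after)
    if total == 0:
        return 0.0
    a = before.splitlines(True)
    b = after.splitlines(True)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    shared = sum(
        len(line)
        for block in matcher.get_matching_blocks()
        for line in a[block.a:block.a + block.size]
    )
    return 2.0 * shared / total


# The file contents used by the test script below.
empty_file = b""
large_file = numbered_lines(0, 10000)
another_file = numbered_lines(10, 10000)
tiny_file = numbered_lines(0, 50)
small_file = numbered_lines(0, 70)

# Old metric: an empty 'before' file looks like a perfect match.
print("old bound, empty -> another:", old_score_lower_bound(empty_file, another_file))
# New metric: nothing is shared with an empty file.
print("new, empty -> another:", new_score(empty_file, another_file))
print("new, large -> another:", new_score(large_file, another_file))
print("new, tiny  -> small:  ", new_score(tiny_file, small_file))
print("new, large -> small:  ", new_score(large_file, small_file))
```

Under these assumptions only the large-file/another-file and tiny-file/small-file pairs clear the 50% threshold that the script passes with `-s50`.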

#!/bin/sh

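# First repository: remove an empty file and a large file, then add a
# file that differs from the large one only in its first ten lines.
hg init rep; cd rep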

touch empty-file
python -c 'for x in range(10000): print x' > large-file

hg addremove

hg commit -m A

rm large-file empty-file
python -c 'for x in range(10,10000): print x' > another-file

# -s50: guess renames among removed/added files at 50% similarity or more
hg addremove -s50

hg commit -m B

cd ..

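# Second repository: remove a tiny file and a large file, then add a
# small file whose first 50 lines are exactly the tiny file's contents.
hg init rep2; cd rep2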

python -c 'for x in range(10000): print x' > large-file
python -c 'for x in range(50): print x' > tiny-file

hg addremove

hg commit -m A

python -c 'for x in range(70): print x' > small-file
rm tiny-file
rm large-file

hg addremove -s50

hg commit -m B