view tests/md5sum.py @ 4135:6cb6cfe43c5d

Avoid some false positives for addremove -s The original code uses the similary score 1 - len(diff(after, before)) / len(after) The diff can at most be the size of the 'before' file, so any small 'before' file would be considered very similar. Removing an empty file would cause all files added in the same revision to be considered copies of the removed file. This changes the metric to bytes_overlap(before, after) / len(before + after) i.e. the actual percentage of bytes shared between the two files.
author Erling Ellingsen <erlingalf@gmail.com>
date Sun, 18 Feb 2007 20:39:25 +0100
parents 306055f5b65c
children
line wrap: on
line source

#!/usr/bin/env python
#
# Based on python's Tools/scripts/md5sum.py
#
# This software may be used and distributed according to the terms
# of the PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2, which is
# GPL-compatible.

import sys
import os
import md5

for filename in sys.argv[1:]:
    try:
        fp = open(filename, 'rb')
    except IOError, msg:
        sys.stderr.write('%s: Can\'t open: %s\n' % (filename, msg))
        sys.exit(1)

    m = md5.new()
    try:
        while 1:
            data = fp.read(8192)
            if not data:
                break
            m.update(data)
    except IOError, msg:
        sys.stderr.write('%s: I/O error: %s\n' % (filename, msg))
        sys.exit(1)
    sys.stdout.write('%s  %s\n' % (m.hexdigest(), filename))

sys.exit(0)