annotate tests/md5sum.py @ 2577:fa76c5d609c9

bdiff: improve worst case behavior by 100x. on 5.8MB (244.000 lines) text file with similar lines, hash before this change made diff against empty file take 75 seconds. this change improves performance to 0.6 seconds. result is that clone of smallish repo (137MB) with some files like this takes 1 minute instead of 10 minutes. common case of diff is 10% slower now, probably because of worse cache locality. but diff does not affect overall performance in common case (less than 1% of runtime is in diff when it is working ok), so this tradeoff looks good.
author Vadim Gelfer <vadim.gelfer@gmail.com>
date Fri, 07 Jul 2006 15:02:55 -0700
parents 50e1c90b0fcf
children 53e843840349
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
1924
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
1 #! /usr/bin/env python
1928
50e1c90b0fcf clarify license on md5sum.py
Peter van Dijk <peter@dataloss.nl>
parents: 1924
diff changeset
2 #
50e1c90b0fcf clarify license on md5sum.py
Peter van Dijk <peter@dataloss.nl>
parents: 1924
diff changeset
3 # Based on python's Tools/scripts/md5sum.py
50e1c90b0fcf clarify license on md5sum.py
Peter van Dijk <peter@dataloss.nl>
parents: 1924
diff changeset
4 #
50e1c90b0fcf clarify license on md5sum.py
Peter van Dijk <peter@dataloss.nl>
parents: 1924
diff changeset
5 # This software may be used and distributed according to the terms
50e1c90b0fcf clarify license on md5sum.py
Peter van Dijk <peter@dataloss.nl>
parents: 1924
diff changeset
6 # of the PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2, which is
50e1c90b0fcf clarify license on md5sum.py
Peter van Dijk <peter@dataloss.nl>
parents: 1924
diff changeset
7 # GPL-compatible.
50e1c90b0fcf clarify license on md5sum.py
Peter van Dijk <peter@dataloss.nl>
parents: 1924
diff changeset
8
1924
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
9 import sys
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
10 import os
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
11 import md5
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
12
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
13 for filename in sys.argv[1:]:
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
14 try:
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
15 fp = open(filename, 'rb')
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
16 except IOError, msg:
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
17 sys.stderr.write('%s: Can\'t open: %s\n' % (filename, msg))
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
18 sys.exit(1)
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
19
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
20 m = md5.new()
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
21 try:
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
22 while 1:
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
23 data = fp.read(8192)
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
24 if not data:
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
25 break
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
26 m.update(data)
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
27 except IOError, msg:
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
28 sys.stderr.write('%s: I/O error: %s\n' % (filename, msg))
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
29 sys.exit(1)
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
30 sys.stdout.write('%s %s\n' % (m.hexdigest(), filename))
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
31
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
32 sys.exit(0)