I have spotted the biggest bottleneck in "bdiff.c". Actually it was
pretty easy to find after I recompiled the python interpreter and
mercurial for profiling.
In "bdiff.c" function "equatelines" allocates the minimum hash table
size, which can lead to tons of collisions. I introduced an
"overcommit" factor of 16, this is, I allocate 16 times more memory
than the minimum value. Overcommiting 128 times does not improve the
performance over the 16-times case.
#!/bin/sh
hg init
echo a > a
hg add a
hg commit -m "1" -d "1000000 0"
hg status
hg copy a b
hg status
hg --debug commit -m "2" -d "1000000 0"
echo "we should see two history entries"
hg history -v
echo "we should see one log entry for a"
hg log a
echo "this should show a revision linked to changeset 0"
hg debugindex .hg/store/data/a.i
echo "we should see one log entry for b"
hg log b
echo "this should show a revision linked to changeset 1"
hg debugindex .hg/store/data/b.i
echo "this should show the rename information in the metadata"
hg debugdata .hg/store/data/b.d 0 | head -3 | tail -2
$TESTDIR/md5sum.py .hg/store/data/b.i
hg cat b > bsum
$TESTDIR/md5sum.py bsum
hg cat a > asum
$TESTDIR/md5sum.py asum
hg verify