I have spotted the biggest bottleneck in "bdiff.c". It was actually
pretty easy to find once I recompiled the Python interpreter and
Mercurial for profiling.
In "bdiff.c", the function "equatelines" allocates the minimum hash
table size, which can lead to a large number of collisions. I
introduced an "overcommit" factor of 16; that is, I allocate 16 times
more memory than the minimum value. Overcommitting by a factor of 128
does not improve the performance over the 16-times case.
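
To illustrate the idea, here is a minimal sketch of how an overcommit
factor changes the table size and its load factor. This is not the
actual "bdiff.c" code: the helper names "table_buckets" and
"load_factor" are hypothetical, and I am assuming the minimum size is
the smallest power of two that holds every line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from bdiff.c: return the smallest
 * power of two that is >= lines * overcommit. An overcommit of 1 is
 * assumed to correspond to the minimum table size.
 * Overflow of lines * overcommit is not handled in this sketch. */
static size_t table_buckets(size_t lines, size_t overcommit)
{
    size_t buckets = 1;

    assert(overcommit > 0);
    while (buckets < lines * overcommit)
        buckets *= 2;
    return buckets;
}

/* Fraction of buckets occupied once every line has been inserted.
 * A load factor close to 1 means long collision chains or probe
 * sequences; a small one means most lookups hit an empty or
 * matching slot right away. */
static double load_factor(size_t lines, size_t buckets)
{
    return (double)lines / (double)buckets;
}
```

For a 1000-line file, a factor of 1 gives 1024 buckets (load factor
about 0.98), while a factor of 16 gives 16384 buckets (load factor
about 0.06). Going from 16 to 128 only shrinks an already small load
factor, which would be consistent with seeing no further speedup.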
% init
% commit
adding base
% qnew mqbase
% qrefresh
% qdiff
diff -r 67e992f2c4f3 base
--- a/base
+++ b/base
@@ -1,1 +1,1 @@ base
-base
+patched
% qdiff dirname
diff -r 67e992f2c4f3 base
--- a/base
+++ b/base
@@ -1,1 +1,1 @@ base
-base
+patched