https://bz.mercurial-scm.org/show_bug.cgi?id=5533
Bug ID: 5533
Summary: json encoder is too slow
Product: Mercurial
Version: default branch
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: feature
Priority: wish
Component: Mercurial
Assignee: bugzi...@mercurial-scm.org
Reporter: arcppzju+hg...@gmail.com
CC: mercurial-devel@mercurial-scm.org

I wrote a simple program to compare the performance of the stdlib json
module with the json routine we have in core:

    from mercurial import encoding
    import contextlib
    import json
    import time

    def hgescape(obj):
        s = '{'
        s += ','.join('"%s":"%s"' % (encoding.jsonescape(k),
                                     encoding.jsonescape(v))
                      for k, v in obj.iteritems())
        s += '}'
        return s

    @contextlib.contextmanager
    def measure(name):
        t1 = time.time()
        yield
        t2 = time.time()
        print('%s: %s' % (name, t2 - t1))

    lines = []
    with measure('insert 50k lines'):
        for l in xrange(50000):
            lines.append({'author': 'test',
                          'commit': 'fe4713a645e44df4bbaeb8a04ea428a2d1c82a4b',
                          'date': '1999-99-99'})

    with measure('stdlib json escape'):
        s = json.dumps(lines)

    with measure('hg json escape'):
        s = ','.join([hgescape(l) for l in lines])

I got something like:

    insert 50k lines: 0.0199460983276
    stdlib json escape: 0.0517330169678
    hg json escape: 1.18240094185

So the core hg json escaping is roughly 25x slower. That means commands
like "annotate -Tjson" can spend noticeable time just on formatting.

I can think of two paths worth a try:

  1. Write the json encoding logic in C.
  2. Write a general-purpose string-like object in C that does 2 things,
     `+` and `x.join`, in a zero-copy manner. This will increase the
     burden on the GC, though.

I'm not sure whether any existing library already does 2.
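To make idea 2 more concrete, here is a minimal pure-Python sketch of what such a string-like object might look like. The names `Rope` and `flatten` are hypothetical; they are not existing Mercurial or stdlib APIs. `+` and `join()` only build a tree that references their operands, and the characters are copied exactly once, when the result is flattened. It illustrates the semantics only: in pure Python every `+` allocates an extra node object (the added GC burden mentioned above), so any real speedup would depend on a C implementation.

```python
class Rope(object):
    """Hypothetical lazy string builder illustrating idea 2.

    `+` and join() only record references to their operands; the
    characters are copied exactly once, in flatten().
    """

    __slots__ = ('_parts',)

    def __init__(self, parts=()):
        # Leaves are plain strings or other Rope nodes; nothing is copied here.
        self._parts = list(parts)

    def __add__(self, other):
        # rope + x: new node referencing both operands.
        return Rope([self, other])

    def __radd__(self, other):
        # str + rope: str has no handler for Rope, so Python calls this.
        return Rope([other, self])

    def join(self, items):
        # Interleave `self` as the separator between items, like str.join.
        parts = []
        for i, item in enumerate(items):
            if i:
                parts.append(self)
            parts.append(item)
        return Rope(parts)

    def flatten(self):
        # Iterative depth-first walk, so long chains of `+` cannot
        # hit the Python recursion limit.
        out = []
        stack = [self]
        while stack:
            node = stack.pop()
            if isinstance(node, Rope):
                # Push children reversed so they are emitted left to right.
                stack.extend(reversed(node._parts))
            else:
                out.append(node)
        # The single copy of all characters happens here.
        return ''.join(out)


if __name__ == '__main__':
    sep = Rope([','])
    obj = Rope(['{']) + sep.join(['"a":"1"', '"b":"2"']) + '}'
    print(obj.flatten())  # {"a":"1","b":"2"}
```

Building an hgescape-style `{...}` fragment this way never concatenates intermediate strings; whether that pays off depends on the per-node allocation cost, which is why the idea would only make sense in C.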