Running the following code on a MacBook Pro using CouchDBX 1.0.2 (everything local), we see the following output when attaching a file containing 10MB of random data:
Code: https://gist.github.com/bc0c36f36be0c85e2a36 (included in full below)

Output:

    Using curl: 0.168450117111
    Using put_attachment: 0.309157133102
    post time: 2.5557808876
    Using multipart: 2.61283898354
    Encoding base64: 0.0497629642487
    Updating: 5.0550069809

Server log: https://gist.github.com/a80a495fd35049ff871f (there's a HEAD/DELETE/PUT/GET cycle in there that's just cleanup). The calls in question are:

    Using curl: 0.168450117111
    1> [info] [<0.27828.7>] 127.0.0.1 - - 'PUT' /benchmark_entity/bigfile/bigfile/bigfile.gz?rev=78-db58ded2899c5546e349feb5a8c0eee4 201

    Using put_attachment: 0.309157133102
    1> [info] [<0.27809.7>] 127.0.0.1 - - 'PUT' /benchmark_entity/bigfile/smallfile?rev=81-c538b38a8463952f0136143cfa49e9fa 201

    Using multipart: 2.61283898354 (post time: 2.5557808876)
    1> [info] [<0.27809.7>] 127.0.0.1 - - 'POST' /benchmark_entity/bigfile 201

    Updating: 5.0550069809
    1> [info] [<0.27809.7>] 127.0.0.1 - - 'POST' /benchmark_entity/_bulk_docs 201

Profiling our code shows 1.5 sec of CPU usage in our own code (which covers setup/cleanup code not included in the times above) against 11.8 sec of total run time, which roughly matches the PUT/POST times above. So I'm fairly confident the bulk of the times above are not spent in our client code, but in CouchDB's handling.

Why is the form/multipart handler so much slower than a bare PUT of the attachment? Why is the base64 approach slower still? Is it due to bandwidth, CouchDB CPU usage, ...?
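One measurable piece of the base64 gap is plain payload inflation: base64 emits 4 output characters for every 3 input bytes, so the inline-attachment document ships roughly a third more data than the raw PUT, and CouchDB then has to JSON-parse and decode it server-side. A quick standalone sketch of the size overhead (my own illustration, not part of the benchmark):

```python
import base64
import os

# Base64 maps every 3 raw bytes to 4 ASCII characters, so the encoded
# form is ~33% larger than the attachment itself.
raw = os.urandom(10 * 1024 * 1024)  # 10MB of random data, as in the benchmark
encoded = base64.b64encode(raw)
print(len(raw), len(encoded))
print(round(len(encoded) / float(len(raw)), 2))  # ~1.33
```

That extra third has to travel over the wire (cheap locally, but not free) and be decoded back to binary before CouchDB can store it.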
Thanks for any help,
Eli

Full code from https://gist.github.com/bc0c36f36be0c85e2a36:

    import base64
    import contextlib
    import cStringIO
    import subprocess
    import time

    import couchdb
    import couchdb.json
    import couchdb.multipart


    @contextlib.contextmanager
    def stopwatch(m=''):
        t0 = time.time()
        yield
        tdiff = time.time() - t0
        if m:
            print '{}: {}'.format(m, tdiff)
        else:
            print tdiff


    def reset(d):
        try:
            del d['bigfile']
        except couchdb.http.ResourceNotFound:
            pass
        d['bigfile'] = {'foo': 'bar'}
        return d['bigfile']


    s = couchdb.Server()
    d = s['benchmark_entity']

    fn = '/tmp/bigfile.gz'
    fn = '/tmp/smallfile'

    doc = reset(d)
    with stopwatch('Using curl'):
        p = subprocess.Popen([
            'curl', '-X', 'PUT',
            'http://localhost:5984/benchmark_entity/{}/bigfile/bigfile.gz?rev={}'.format(doc.id, doc.rev),
            '-d', '@{}'.format(fn),
            '-H', 'Content-Type: application/gzip',
        ])
        p.wait()

    doc = reset(d)
    with open(fn, 'r') as f:
        with stopwatch('Using put_attachment'):
            d.put_attachment(doc, f)

    doc = reset(d)
    with open(fn, 'r') as f:
        content_name = 'bigfile.gz'
        content = f.read()
        content_type = 'application/gzip'
    with stopwatch('Using multipart'):
        fileobj = cStringIO.StringIO()
        with couchdb.multipart.MultipartWriter(fileobj, headers=None, subtype='form-data') as mpw:
            mime_headers = {'Content-Disposition': 'form-data; name="_doc"'}
            mpw.add('application/json', couchdb.json.encode(doc), mime_headers)
            mime_headers = {'Content-Disposition': 'form-data; name="_attachments"; filename="{}"'.format(content_name)}
            mpw.add(content_type, content, mime_headers)
        header_str, blank_str, body = fileobj.getvalue().split('\r\n', 2)
        http_headers = {'Referer': d.resource.url,
                        'Content-Type': header_str[len('Content-Type: '):]}
        params = {}
        t0 = time.time()
        status, msg, data = d.resource.post(doc['_id'], body, http_headers, **params)
        print 'post time: {}'.format(time.time() - t0)

    doc = reset(d)
    with open(fn, 'r') as f:
        content_name = 'bigfile.gz'
        content = f.read()
        content_type = 'application/gzip'
    with stopwatch('Encoding base64'):
        doc['_attachments'] = {content_name: {'content_type': content_type,
                                              'data': base64.b64encode(content)}}
    with stopwatch('Updating'):
        d.update([doc])
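For anyone puzzled by the split('\r\n', 2) step in the multipart branch: MultipartWriter writes its own Content-Type line (carrying the boundary) ahead of the body, so the code peels that first line off and promotes it to an HTTP header. A toy illustration with a hand-built string (my own construction, not couchdb-python's actual output):

```python
# Hand-built multipart payload: a Content-Type line, a blank line, then the body.
raw = ('Content-Type: multipart/form-data; boundary=abc\r\n'
       '\r\n'
       '--abc\r\nContent-Disposition: form-data; name="_doc"\r\n\r\n{}\r\n--abc--\r\n')

# Split on the first two CRLFs: the Content-Type line, the empty
# separator line, and everything after as the request body.
header_str, blank_str, body = raw.split('\r\n', 2)
print(header_str)                          # Content-Type: multipart/form-data; boundary=abc
print(header_str[len('Content-Type: '):])  # multipart/form-data; boundary=abc
```

The stripped value then becomes the Content-Type header of the POST, so CouchDB sees the boundary and can parse the parts.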
