On Mar 12, 2010, at 5:24 PM, J Chris Anderson wrote:
>
> On Mar 12, 2010, at 7:10 AM, Julian Stahnke wrote:
>
>> Hello!
>>
>> I have a problem with a view being slow, even though it’s indexed and cached
>> and so on. I have a database of books (~120,000 documents) and a map/reduce
>> function that counts how many books there are per author. I’m then calling
>> the view with ?group=true to get the list. I’m neither emitting nor
>> outputting any actual documents, only the counts. This results in an output
>> of about 78,000 key/value pairs that look like the following: {"key":"Albert
>> Kapr","value":3}.
>>
>> Now, even when the view is indexed and cached, it still takes 60 seconds to
>> receive the output, using PHP’s cURL functions, the browser, whatever I’ve
>> tried. Getting the same output served from a static file takes only a
>> fraction of a second.
>>
>> When I set limit=100, it’s basically instantaneous. I want to sort the
>> output by value though, so I can’t really limit it or use ranges. Trying it
>> with about 7,000 books, the request takes about 5 seconds, so it seems to be
>> linear in the number of rows being output?
>
> For each row of output in the group reduce view, CouchDB must calculate one
> final reduction (even when the intermediate reductions are already cached in
> the btree). This is because the btree nodes might not have exactly the same
> boundaries as your group keys.
>
> There is a remedy. You can replace your simple summing reduce with the text
> "_sum" (without quotes). This triggers the same computation, but implemented
> in Erlang inside CouchDB. Most of your slowness is probably due to IO between
> CouchDB and the server-side JavaScript view server. Using the _sum function
> will help with this.
>
> There will still be a calculation per group reduce row, but the cost is much
> lower.
>
> Let us know how much faster this is!
>
> Chris
Oh wow, thanks! It’s now taking about 4 seconds instead of a minute!

Is this function documented somewhere? I didn’t come across it anywhere, so I
added it to the Performance page in the wiki
(http://wiki.apache.org/couchdb/Performance); I hope that is okay. I also found
a commit message[1] which said that more of these functions could be
implemented, though I didn’t quite get how. This seems like it could be very
helpful in some cases. Maybe it should be documented properly somewhere by
somebody who actually knows about it?
Thanks a lot,
Julian
[1] http://svn.apache.org/viewvc?view=revision&revision=774101
>
>>
>> I’m using CouchDB 0.10.1 (the one that’s in homebrew) on a 2006 MacBook Pro.
>>
>> Am I doing anything wrong, or should this really take so long? I wasn’t able
>> to find any information about this—only about indexing being slow, but that
>> doesn’t seem to be my problem.
>>
>> Maybe I should also mention that I’m an interaction design student who used
>> to be a front-end dev, but not a ‘real’ programmer.
>>
>> Thanks for any help!
>>
>> Best,
>> Julian
>>
>>
>> For reference, the map function:
>>
>> function (doc)
>> {
>>   if (doc.author) {
>>     // emit one row per author so the reduce can count books per author
>>     for (var i = 0; i < doc.author.length; i++) {
>>       emit(doc.author[i], 1);
>>     }
>>   } else {
>>     emit(null, 1);
>>   }
>> }
>>
>> The reduce function:
>>
>> function (keys, values, rereduce)
>> {
>>   // sum works for both passes: on rereduce the values are partial sums
>>   return sum(values);
>> }
>>
>> Some sample output:
>>
>> {"rows":[
>> {"key":null,"value":1542},
>> {"key":"... Hans Arp ... /Konzept: Hans Christian Tavel .../","value":1},
>> ---more rows---
>> {"key":"Zwi Erich Kurzweil","value":1}
>> ]}
>