On 12.03.2010 at 17:24, J Chris Anderson wrote:

> 
> On Mar 12, 2010, at 7:10 AM, Julian Stahnke wrote:
> 
>> Hello!
>> 
>> I have a problem with a view being slow, even though it’s indexed and cached 
>> and so on. I have a database of books (~120,000 documents) and a map/reduce 
>> function that counts how many books there are per author. I’m then calling 
>> the view with ?group=true to get the list. I’m neither emitting nor 
>> outputting any actual documents, only the counts. This results in an output 
>> of about 78,000 key/value pairs that look like the following: {"key":"Albert 
>> Kapr","value":3}.
>> 
>> Now, even when the view is indexed and cached, it still takes 60 seconds to 
>> receive the output, using PHP’s cURL functions, the browser, whatever I’ve 
>> tried. Getting the same output served from a static file takes only a 
>> fraction of a second.
>> 
>> When I set limit=100, it’s basically instantaneous. I want to sort the 
>> output by value though, so I can’t really limit it or use ranges. Trying it 
>> with about 7,000 books, the request takes about 5 seconds, so it seems to be 
>> linear in the number of lines being output?
> 
> For each line of output in the group reduce view, CouchDB must calculate 1 
> final reduction (even when the intermediate reductions are already cached in 
> the btree). This is because the btree nodes might not have the exact same 
> boundaries as your group keys.
> 
> There is a remedy. You can replace your simple summing reduce with the text 
> "_sum" (without quotes). This triggers the same function, but implemented in 
> Erlang by CouchDB. Most of your slowness is probably due to IO between 
> CouchDB and server-side JavaScript. Using the _sum function will help with 
> this.
> 
> There will still be a calculation per group reduce row, but the cost is much 
> lower.
> 
> Let us know how much faster this is!
> 
> Chris

Oh wow, thanks! It’s now taking about 4 seconds instead of a minute!
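
For reference, the change amounts to replacing the JavaScript reduce with the
built-in name. A minimal sketch of what the design document might look like,
written as a JavaScript object; the `_id` "_design/books" and the view name
"by_author" are made up for illustration and are not from the actual setup:

```javascript
// Sketch only: the design document id and view name are assumptions.
var designDoc = {
  _id: "_design/books",
  language: "javascript",
  views: {
    by_author: {
      // The map function is stored as a string in the design document.
      map: [
        "function (doc) {",
        "  if (doc.author) {",
        "    for (var i = 0; i < doc.author.length; i++) {",
        "      emit(doc.author[i], 1);",
        "    }",
        "  } else {",
        "    emit(null, 1);",
        "  }",
        "}"
      ].join("\n"),
      // Built-in Erlang reduce: the bare text _sum instead of a JS function.
      reduce: "_sum"
    }
  }
};

// Print the JSON that would be saved to the database.
console.log(JSON.stringify(designDoc, null, 2));
```

The view is then queried with ?group=true exactly as before; only the reduce
entry changes.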

Is this function documented somewhere? I didn’t come across it anywhere, so I 
added it to the Performance page in the wiki: 
http://wiki.apache.org/couchdb/Performance I hope that is okay. I also found a 
commit message[1] saying that one could implement more of these 
functions, but I didn’t quite get how. This seems like it could be very 
helpful in some cases. Maybe it should be documented properly somewhere by 
somebody who actually knows about it?

Thanks a lot,
Julian

[1] http://svn.apache.org/viewvc?view=revision&revision=774101

> 
>> 
>> I’m using CouchDB 0.10.1 (the one that’s in homebrew) on a 2006 MacBook Pro.
>> 
>> Am I doing anything wrong, or should this really take so long? I wasn’t able 
>> to find any information about this—only about indexing being slow, but that 
>> doesn’t seem to be my problem.
>> 
>> Maybe I should also mention that I’m an interaction design student who used 
>> to be a front-end dev, but not a ‘real’ programmer.
>> 
>> Thanks for any help!
>> 
>> Best,
>> Julian
>> 
>> 
>> For reference, the map function:
>> 
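>> function (doc)
>> {
>>   if (doc.author) {
>>     // Emit one row per author; "var" keeps i from leaking as a global.
>>     for (var i = 0; i < doc.author.length; i++) {
>>       emit(doc.author[i], 1);
>>     }
>>   } else {
>>     // Books without an author are counted under the null key.
>>     emit(null, 1);
>>   }
>> }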
>> 
>> The reduce function: 
>> 
>> function (keys, values, rereduce)
>> {
>>   return sum(values);
>> }
>> 
>> Some sample output:
>> 
>> {"rows":[
>> {"key":null,"value":1542},
>> {"key":"... Hans Arp ... /Konzept: Hans Christian Tavel .../","value":1},
>> ---more rows---
>> {"key":"Zwi Erich Kurzweil","value":1}
>> ]}
> 
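
On sorting by value: view rows come back ordered by key, so ordering by count
has to happen after the response arrives. A rough sketch of doing that on the
client, using rows shaped like the sample output quoted above; the helper name
sortByCountDesc is hypothetical:

```javascript
// Rows in key order, as the view returns them (values from the sample output).
var response = {
  rows: [
    { key: null, value: 1542 },
    { key: "... Hans Arp ... /Konzept: Hans Christian Tavel .../", value: 1 },
    { key: "Albert Kapr", value: 3 },
    { key: "Zwi Erich Kurzweil", value: 1 }
  ]
};

// Hypothetical helper: returns a new array sorted by count, highest first.
function sortByCountDesc(rows) {
  // slice() copies the array so the original response stays in key order.
  return rows.slice().sort(function (a, b) {
    return b.value - a.value;
  });
}

var sorted = sortByCountDesc(response.rows);
console.log(sorted.slice(0, 2));
```

With about 78,000 rows this sort should be cheap compared to the transfer
itself, though that has not been measured here.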
