On Mar 12, 2010, at 12:56 PM, Julian Stahnke wrote:

> On 12.03.2010 at 17:24, J Chris Anderson wrote:
> 
>> 
>> On Mar 12, 2010, at 7:10 AM, Julian Stahnke wrote:
>> 
>>> Hello!
>>> 
>>> I have a problem with a view being slow, even though it’s indexed and 
>>> cached and so on. I have a database of books (~120,000 documents) and a 
>>> map/reduce function that counts how many books there are per author. I’m 
>>> then calling the view with ?group=true to get the list. I’m neither 
>>> emitting nor outputting any actual documents, only the counts. This results 
>>> in an output of about 78,000 key/value pairs that look like the following: 
>>> {"key":"Albert Kapr","value":3}.
>>> 
>>> Now, even when the view is indexed and cached, it still takes 60 seconds to 
>>> receive the output, using PHP’s cURL functions, the browser, whatever I’ve 
>>> tried. Getting the same output served from a static file takes only a 
>>> fraction of a second.
>>> 
>>> When I set limit=100, it’s basically instantaneous. I want to sort the 
>>> output by value though, so I can’t really limit it or use ranges. Trying it 
>>> with about 7,000 books, the request takes about 5 seconds, so it seems to 
>>> be linear in the number of lines being output?
>> 
>> For each line of output in the group reduce view, CouchDB must calculate 1 
>> final reduction (even when the intermediate reductions are already cached in 
>> the btree). This is because the btree nodes might not have the exact same 
>> boundaries as your group keys.
>> 
>> There is a remedy. You can replace your simple summing reduce with the text 
>> "_sum" (without quotes). This triggers the same function, but implemented in 
>> Erlang by CouchDB. Most of your slowness is probably due to IO between 
>> CouchDB and serverside JavaScript. Using the _sum function will help with 
>> this.
>> 
>> There will still be a calculation per group reduce row, but the cost is much 
>> lower.
>> 
>> Let us know how much faster this is!
>> 
>> Chris
> 
> Oh wow, thanks! It’s now taking about 4 seconds instead of a minute!
> 
> Is this function documented somewhere? I didn’t come across it anywhere, so I 
> added it to the Performance page in the wiki: 
> http://wiki.apache.org/couchdb/Performance I hope that is okay. I also found 
> a commit message[1] saying that one could implement more of these 
> functions, though I didn’t quite get how. This seems like it could be very 
> helpful in some cases. Maybe it should be documented properly somewhere by 
> somebody who actually knows about it?
> 
> Thanks a lot,
> Julian
> 
> [1] http://svn.apache.org/viewvc?view=revision&revision=774101

I added a _stats reduction to trunk a few days ago which returns an object with 
min, max, count, sum, and sum-of-squares fields.  Those are the primitives you 
need to calculate the aggregates jchris suggested in that commit message.  In 
my testing the internal _stats was about 12x faster than the version in Futon's 
reduce.js test.
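
As a rough illustration of how those primitives combine, here is a minimal Python sketch that derives mean, variance, and standard deviation from a single _stats result on the client side. The field name "sumsqr" for the sum of squares and the helper name aggregates_from_stats are assumptions for the example, and the sample object is made up (it corresponds to the values 1, 2, 3, 4).

```python
import math


def aggregates_from_stats(stats):
    """Derive common aggregates from one _stats-style object.

    Assumes the object has the keys "sum", "count", "min", "max" and
    "sumsqr" (the sum-of-squares field name is an assumption here).
    """
    count = stats["count"]
    if count == 0:
        raise ValueError("cannot aggregate an empty group")
    mean = stats["sum"] / count
    # Population variance: E[x^2] - (E[x])^2, clamped at zero to absorb
    # small negative results caused by floating-point rounding.
    variance = max(stats["sumsqr"] / count - mean * mean, 0.0)
    return {
        "mean": mean,
        "variance": variance,
        "stddev": math.sqrt(variance),
        "range": stats["max"] - stats["min"],
    }


# Made-up sample corresponding to the values 1, 2, 3, 4.
example = {"sum": 10, "count": 4, "min": 1, "max": 4, "sumsqr": 30}
result = aggregates_from_stats(example)
print(result)
```

The variance here is the population variance; a sample variance would need the usual count / (count - 1) correction.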

If you're looking to hack on more of these, open up couch_query_servers.erl and 
search for builtin.  Best,

Adam
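
For anyone finding this thread later, the _sum remedy quoted above might look roughly like the following Python sketch. It builds a design document whose reduce is the string "_sum" and sorts the group=true rows by value on the client, since the view itself is ordered by key. The names "books", "by_author", and doc.author are hypothetical, and the two "Author" rows are invented sample data.

```python
import json

# Hypothetical design document: the design doc name, view name and the
# doc.author field are assumptions, not taken from the thread.
design_doc = {
    "_id": "_design/books",
    "views": {
        "by_author": {
            # The map function is still JavaScript, stored as a string.
            "map": "function(doc) { if (doc.author) { emit(doc.author, 1); } }",
            # The built-in reduce replaces a hand-written JavaScript sum.
            "reduce": "_sum",
        }
    },
}


def sort_rows_by_value(rows):
    """Order group=true rows by count, highest first, ties broken by key."""
    return sorted(rows, key=lambda row: (-row["value"], row["key"]))


# Rows shaped like the output described in the thread; the last two
# entries are made up for illustration.
sample_rows = [
    {"key": "Albert Kapr", "value": 3},
    {"key": "Author B", "value": 7},
    {"key": "Author A", "value": 3},
]
top = sort_rows_by_value(sample_rows)

print(json.dumps(design_doc, indent=2))
print([row["key"] for row in top])
```

Sorting on the client still requires fetching every row, so this only moves the ordering step; it does not reduce the per-row reduction cost discussed above.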
