I chose couch because I needed a way to take arbitrary hashes and combine them, performing various operations on dynamic key/value pairs. Seeing that couch would eventually be able to do this in a distributed manner made it seem like a great fit.
My impression was that the reduce step was incremental once the functions
were defined... Given that my reduce function is referentially transparent,
I don't understand the performance impact incurred by the large dynamic
hash it returns. Can you think of another solution that would be a better
fit for my needs? (I've pasted a rough sketch of what I think a fixed-size
reduce would look like for my case below the quoted thread.)

Chris

On Wed, Jan 7, 2009 at 4:00 PM, Damien Katz <[email protected]> wrote:
> In CouchDB, your reductions must compute to smallish, fixed-size data. The
> problem is your reduce function: it builds up and returns a map of values,
> and as it computes the index, it will actually compute the reduction of
> every value in the view. Every time the index is updated, it does this.
>
> -Damien
>
>
> On Jan 7, 2009, at 6:38 PM, Chris Van Pelt wrote:
>
>> Ok, so I created a gist with the map, reduce, and a document:
>> http://gist.github.com/44497
>>
>> The purpose of this view is to combine multiple judgments (the data
>> attribute of the doc) for a single unit_id. The "fields" attribute
>> tells couch how to aggregate the data (averaging numbers, choosing
>> the most common item, etc.).
>>
>> I do use group=true, along with skip and count, when querying this
>> view. I understand that skip can slow things down, but the request is
>> still slow when skip is 0.
>>
>> Another strange thing is that even when I query one of my "count"
>> views (a simple sum() reduce step) I experience the same lag. Could
>> this be because my count views are part of the same design document?
>>
>> Also, are there better ways to debug this? I've set my log level to
>> debug, but it doesn't give me details about where the processing time
>> is going, and I can only gauge response times to the second...
>>
>> Chris
>>
>> On Wed, Jan 7, 2009 at 3:12 PM, Chris Anderson <[email protected]> wrote:
>>>
>>> On Wed, Jan 7, 2009 at 3:07 PM, Jeremy Wall <[email protected]> wrote:
>>>>
>>>> Maybe someone else could chime in on when you get the hit for reduction?
>>>>
>>>
>>> Based on my use of log() in the reduce function, it looks like for
>>> each reduce query, the reduce function is run once, to obtain the
>>> final reduce value.
>>>
>>> When you run a group=true or group_level reduce query, which returns
>>> values for many keys, you'll end up running the final reduction once
>>> per returned value. I think this could be optimized to avoid running
>>> final reduces if they've already been run for those key ranges. I'm
>>> not sure how much work that would be.
>>>
>>> --
>>> Chris Anderson
>>> http://jchris.mfdz.com
>>>
>
>
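Here's that sketch. If the fix is to keep the reduce output small and
bounded, is this roughly the right shape? It's only for my own
understanding, not the actual view from the gist (the "score" field is
made up; unit_id is real): the reduce only ever returns a two-field
accumulator and handles rereduce, instead of a hash that grows with the
number of judgments.

// Map: emit one numeric value per judgment, keyed by unit_id
// ("score" is a hypothetical field, not one of my real attributes)
function (doc) {
  if (doc.unit_id && typeof doc.score === "number") {
    emit(doc.unit_id, doc.score);
  }
}

// Reduce: fold everything into a fixed-size {sum, count} accumulator,
// so the output stays the same size no matter how many rows come in
function (keys, values, rereduce) {
  var acc = {sum: 0, count: 0};
  for (var i = 0; i < values.length; i++) {
    if (rereduce) {
      acc.sum += values[i].sum;     // values are earlier reduce outputs
      acc.count += values[i].count;
    } else {
      acc.sum += values[i];         // values are what map emitted
      acc.count += 1;
    }
  }
  return acc;
}

Querying that with group=true would then give one small {sum, count} per
unit_id, and I'd compute the average client-side, rather than having the
view itself build the aggregated hash for every unit.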
