On Mon, Sep 28, 2009 at 05:20:56PM -0400, Paul Davis wrote:
> On Mon, Sep 28, 2009 at 5:10 PM, James Marca
> <[email protected]> wrote:
> > Hi All,
> >
> > I ran across some behavior this weekend that I didn't expect.
> >
> > I have a lot of data, and I'm trialing storing the raw data in
> > CouchDB. I have 12 databases (manually "sharded" by month), each with
> > about 115,000,000 documents, and an average size of about 25G. I
> > created a view index for each database, and was updating away, when I
> > noticed a typo in the October database. I corrected the typo (in
> > Futon), reloaded the view page (again in futon), and *assumed* that
> > the old view indexing job was killed and restarted with the new code.
> > In fact, what happened was that the old, incorrect index job kept
> > running (for 48 hours) and when it finished, it restarted.
> >
> > Is this a minor bug, or did I miss an option somewhere?
> >
>
> I'd call it a bad inconvenience for large data sets. Technically what
> happened was that it went through and indexed all of your data with
> the bad version, then the last edit was your change to the design doc,
> which reset the indexes and started things over which is the intended
> behavior.
>
> Just using my brain debugger it seems like it should be fairly easy to
> crash (Erlang speak for 'shutdown gracefully') any running update
> processes. Feel free to create a ticket in JIRA for the feature. Put a
> note on it that it'd be good for anyone wanting to learn the view
> update engine code as all of that would be mostly towards the top of
> the stack.

Thanks for the reply and explanation. I'll add "post a jira ticket" to
today's todo list.

>
> > (The machine I am running this on is running version 0.9.0. I will
> > test something similar in the near future on 0.9.1 machine.)
> >
>
> Everything all the way up to trunk will behave the same way as 0.9.0.
>
> > On the plus side, Erlang and CouchDB happily cranked away both loading
> > the data and then building the views on my 8-core machine, using up
> > lots of the available processor resources (instead of just max-ing out
> > one core). I still wish I could spawn multiple threads per index job,
> > but since there's no way I could write that code, I'll wait.
> >
>
> The view indexer in trunk will now split view updates over two cores
> and is faster than 0.9's indexer. I'm uncertain if this work was back
> ported to 0.10 or not.

Cool, I just downloaded trunk today and will try it out. I'm also
going to experiment with the couchdb-lounge project.

Regards,
James Marca
