Hey John,

I'm encountering a similar problem, where the server or client can no longer make a connection. My strongest theory is that my Rails client, using couchrest-0.33, is not giving up its file descriptors, and too many files are being left open on the Linux machine.
I am using a smaller dataset (400K docs; the DB is roughly 900 MB), but with an intensive word-counting/document-similarity view.
I haven't really had time to take a deeper look yet, so it's not solved for me either. As a workaround, I just used CouchDB as a storage bin and ran Pig/Hadoop to do the computations I needed.
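If the file-descriptor theory is right, one quick way to check is to watch the client process's open-descriptor count while it runs. A minimal sketch, assuming Linux (it reads /proc, which is not portable); the choice of watching our own pid is just for illustration:

```python
import os

def fd_count(pid):
    """Count open file descriptors for a process by listing
    /proc/<pid>/fd (Linux only)."""
    return len(os.listdir("/proc/%d/fd" % pid))

# Example: check our own process. A count that grows steadily while the
# client runs would support the "not giving up file descriptors" theory.
print(fd_count(os.getpid()))
```

Comparing that number against `ulimit -n` for the same user would show how close the process is to the per-process limit.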
Take a look at the "couchdb server connection refused error" thread on this mailing list. It might be of some help.
http://mail-archives.apache.org/mod_mbox/couchdb-user/200908.mbox/%[email protected]%3e

Tommy

On Sep 1, 2009, at 11:19 AM, John Wood wrote:
Thanks for the reply Chris. I'll look into upgrading our test environment to the trunk version of CouchDB, and see if I can reproduce the error there. We're using CouchRest version 0.33 as the client library.

Thanks again,
John

On Tue, Sep 1, 2009 at 12:49 PM, Chris Anderson <[email protected]> wrote:

On Tue, Sep 1, 2009 at 7:52 AM, John Wood <[email protected]> wrote:

Hi everybody,

I'm currently facing an issue with our production installation of CouchDB. Two times within the past 5 days, the Erlang process running CouchDB has pegged one of the 4 cores on the machine, consumed about 40% of the system RAM (which is 4GB), and become completely unresponsive to incoming HTTP requests. The only way we can get it back to normal is to restart CouchDB.

I'm trying to determine what may be causing this, but I'm not having much luck. Nothing stands out in the CouchDB log files. I can see that there are no entries in the log files from the time it goes unresponsive until the time I restart it. Besides that, there don't appear to be any errors leading up to the issue. There are, however, a few errors like the one below, but none right before CouchDB goes unresponsive:

[error] [<0.11738.288>] {error_report,<0.21.0>,
    {<0.11738.288>,crash_report,
        [[{pid,<0.11738.288>},
          {registered_name,[]},
          {error_info,
              {error,
                  {case_clause,{error,enotconn}},
                  [{mochiweb_request,get,2},
                   {couch_httpd,handle_request,4},
                   {mochiweb_http,headers,5},
                   {proc_lib,init_p,5}]}},
          {initial_call,
              {mochiweb_socket_server,acceptor_loop,
                  [{<0.56.0>,#Port<0.148>,#Fun<mochiweb_http.1.81679042>}]}},
          {ancestors,
              [couch_httpd,couch_secondary_services,couch_server_sup,<0.1.0>]},
          {messages,[]},
          {links,[<0.56.0>,#Port<0.5032425>]},
          {dictionary,[{mochiweb_request_qs,[{"limit","0"}]}]},
          {trap_exit,false},
          {status,running},
          {heap_size,28657},
          {stack_size,23},
          {reductions,14034}],
         []]}}

[error] [<0.56.0>] {error_report,<0.21.0>,
    {<0.56.0>,std_error,
        {mochiweb_socket_server,235,
            {child_error,{case_clause,{error,enotconn}}}}}}

=ERROR REPORT==== 30-Aug-2009::04:29:07 ===
{mochiweb_socket_server,235,
    {child_error,{case_clause,{error,enotconn}}}}

I checked some of the other system log files (/var/log/messages, etc.), and there doesn't appear to be any information there either.

Our CouchDB installation is fairly large. We have 7 production databases, totaling almost 250GB. The largest database is 129GB. We are running CouchDB 0.9.0 on Red Hat Enterprise Server 5.3. As far as usage goes, we are constantly inserting documents into the database (5,000 at a time via a bulk insert), and pausing to regenerate the views after 100,000 documents have been inserted. Besides the process that does the inserts, all views are accessed using stale=ok.

Has anybody else faced a similar issue? Can anybody suggest tips regarding how I should go about diagnosing this issue?

Just a guess, based on the information available here, but the enotconn error suggests that the remote client is dropping the connection prematurely. There is an old bug about this in the tracker, which might be a good thing to reopen if we learn much more about the issue (and it is still present in trunk / 0.10):

http://issues.apache.org/jira/browse/COUCHDB-45

There is also this open bug which could be related:

https://issues.apache.org/jira/browse/COUCHDB-394

Perhaps you have clients who aren't properly closing the connection, and then somehow this is running up against a limit in the underlying server system (max number of connections, or maybe even max number of Erlang processes in the VM).
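For reference, the bulk-insert pattern John describes (5,000 documents per request against CouchDB's _bulk_docs endpoint), with connections closed promptly after each request, can be sketched like this. The batching helper is pure Python; the base URL and database name are placeholders, not details from the thread:

```python
import json
import urllib.request

def batches(docs, size=5000):
    """Yield successive chunks of `size` docs (5,000 per bulk insert,
    matching the usage described in the thread)."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def bulk_insert(base_url, db, docs):
    """Post each batch to CouchDB's _bulk_docs endpoint. The `with`
    block closes the response (and its socket) promptly, so file
    descriptors are released. base_url and db are hypothetical; error
    handling is omitted for brevity."""
    for batch in batches(docs):
        req = urllib.request.Request(
            "%s/%s/_bulk_docs" % (base_url, db),
            data=json.dumps({"docs": batch}).encode("utf-8"),
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:  # closed on exit
            resp.read()
```

A client that never closes its responses would keep sockets (and thus file descriptors) open, which is consistent with both the enotconn errors on the server side and the descriptor-exhaustion theory above.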
It would be nice to get to the bottom of this one, eventually. The first step I'd suggest taking is attempting to reproduce on the 0.10.x branch from svn. This will at least tell us if the bug has been fixed. If it's still around and repeatable, that will give us a test case for finally crushing it into oblivion. It might help to know more about which client library you are using, as this bug seems to depend on the TCP behavior of clients.

Chris

Thanks,
John

--
John Wood
Interactive Mediums
[email protected]

--
Chris Anderson
http://jchrisa.net
http://couch.io

--
John Wood
Interactive Mediums
[email protected]
