Hey John,
I'm encountering a similar problem where the server or client can no longer make the connection. My strongest theory is that my Rails client, using couchrest-0.33, is not releasing file descriptors, so too many files end up open on the Linux machine.
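One quick way to sanity-check the file-descriptor theory on Linux is something like this (a rough sketch; the helper name is mine, and you'd point it at the Rails or beam.smp PID rather than your own):

```python
import os
import resource

def fd_usage(pid):
    """Return (open_fd_count, soft_limit) for a process on Linux.

    Note: the limit reported is the *current* process's RLIMIT_NOFILE;
    for a different user's process you'd read /proc/<pid>/limits instead.
    """
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return open_fds, soft

# e.g. fd_usage(rails_pid) -- if open_fds keeps climbing toward the soft
# limit between requests, the client isn't releasing descriptors.
```

If the count grows without bound while the app is idle, that points at the client library rather than CouchDB itself.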

I am using a smaller dataset (400K docs; the DB is roughly 900 MB), but with an intensive word-counting/document-similarity view.

I haven't really had time to take a deeper look yet, so it's not solved for me either. As a workaround, I just use CouchDB as a storage bin and Pig/Hadoop for the necessary computations.

Take a look at the "couchdb server connection refused error" thread on this mailing list. It might be of some help.

http://mail-archives.apache.org/mod_mbox/couchdb-user/200908.mbox/%[email protected]%3e


Tommy

On Sep 1, 2009, at 11:19 AM, John Wood wrote:

Thanks for the reply Chris.

I'll look into upgrading our test environment to the trunk version of
CouchDB, and see if I can reproduce the error there.

We're using CouchRest version 0.33 as the client library.

Thanks again,
John

On Tue, Sep 1, 2009 at 12:49 PM, Chris Anderson <[email protected]> wrote:

On Tue, Sep 1, 2009 at 7:52 AM, John Wood <[email protected]> wrote:
Hi everybody,

I'm currently facing an issue with our production installation of CouchDB.
Twice within the past 5 days, the Erlang process running CouchDB has pegged
one of the machine's 4 cores, consumed about 40% of the system RAM (which
is 4GB), and become completely unresponsive to incoming HTTP requests. The
only way we can get it back to normal is to restart CouchDB.

I'm trying to determine what may be causing this, but I'm not having much
luck. Nothing stands out in the CouchDB log files. There are no entries in
the logs from the time it goes unresponsive until the time I restart it,
and no errors leading up to the issue. There are, however, a few errors
like the one below, but none right before CouchDB goes unresponsive:

[error] [<0.11738.288>] {error_report,<0.21.0>,
  {<0.11738.288>,crash_report,
   [[{pid,<0.11738.288>},
     {registered_name,[]},
     {error_info,
         {error,
             {case_clause,{error,enotconn}},
             [{mochiweb_request,get,2},
              {couch_httpd,handle_request,4},
              {mochiweb_http,headers,5},
              {proc_lib,init_p,5}]}},
     {initial_call,
         {mochiweb_socket_server,acceptor_loop,
[{<0.56.0>,#Port<0.148>,#Fun<mochiweb_http.1.81679042>}]}},
     {ancestors,
         [couch_httpd,couch_secondary_services,couch_server_sup,
          <0.1.0>]},
     {messages,[]},
     {links,[<0.56.0>,#Port<0.5032425>]},
     {dictionary,[{mochiweb_request_qs,[{"limit","0"}]}]},
     {trap_exit,false},
     {status,running},
     {heap_size,28657},
     {stack_size,23},
     {reductions,14034}],
    []]}}
[error] [<0.56.0>] {error_report,<0.21.0>,
  {<0.56.0>,std_error,
   {mochiweb_socket_server,235,
       {child_error,{case_clause,{error,enotconn}}}}}}

=ERROR REPORT==== 30-Aug-2009::04:29:07 ===
{mochiweb_socket_server,235,
                      {child_error,{case_clause,{error,enotconn}}}}

I checked some of the other system log files (/var/log/messages, etc.),
and there doesn't appear to be any information there either.

Our CouchDB installation is fairly large. We have 7 production databases,
totaling almost 250GB; the largest is 129GB. We are running CouchDB 0.9.0
on Red Hat Enterprise Server 5.3. As far as usage goes, we are constantly
inserting documents into the database (5,000 at a time via a bulk insert),
pausing to regenerate the views after every 100,000 documents have been
inserted. Aside from the process that does the inserts, all views are
accessed using stale=ok.
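For what it's worth, the insert cycle amounts to something like this (a Python sketch of the shape of the loop, not our actual code; post_bulk and regenerate_views are hypothetical stand-ins for the real HTTP calls):

```python
BULK_SIZE = 5_000        # docs per _bulk_docs POST
REGEN_EVERY = 100_000    # pause to rebuild views this often

def insert_all(docs, post_bulk, regenerate_views):
    """Insert docs in bulk batches, pausing periodically to rebuild views."""
    inserted = 0
    for i in range(0, len(docs), BULK_SIZE):
        batch = docs[i:i + BULK_SIZE]
        post_bulk(batch)           # POST /db/_bulk_docs with {"docs": batch}
        inserted += len(batch)
        if inserted % REGEN_EVERY == 0:
            regenerate_views()     # query each view without stale=ok
    return inserted
```

Every other reader hits the views with ?stale=ok, so only this one process ever triggers index updates.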

Has anybody else faced a similar issue? Can anybody suggest how I should
go about diagnosing it?


Just a guess, based on the information available here, but the
enotconn error suggests that the remote client is dropping the
connection prematurely. There is an old bug about this in the tracker,
which might be a good thing to reopen if we learn much more about the
issue (and it is still present in trunk / 0.10):

http://issues.apache.org/jira/browse/COUCHDB-45

There is also this open bug which could be related:

https://issues.apache.org/jira/browse/COUCHDB-394

Perhaps you have clients who aren't properly closing the connection, and
somehow this is running up against a limit in the underlying server system
(the max number of connections, or maybe even the max number of Erlang
processes in the VM).
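If it is connection pile-up, one crude way to watch for it on Linux is to count the sockets parked on CouchDB's port (a sketch; 5984 is the default port, and /proc/net/tcp reports addresses in hex):

```python
def count_tcp_connections(port=5984, proc_file="/proc/net/tcp"):
    """Count /proc/net/tcp entries whose local port matches `port`."""
    hex_port = format(port, "04X")
    count = 0
    with open(proc_file) as f:
        next(f)                           # skip the header row
        for line in f:
            local_addr = line.split()[1]  # "hexip:hexport"
            if local_addr.split(":")[1] == hex_port:
                count += 1
    return count

# A number that climbs steadily and never falls back suggests clients
# are holding connections open.
```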

It would be nice to get to the bottom of this one, eventually.

The first step I'd suggest taking is attempting to reproduce on the
0.10.x branch from svn. This will at least tell us if the bug has been
fixed. If it's still around and repeatable, that will give us a test
case for finally crushing it into oblivion.

It might help to know more about which client library you are using,
as this bug seems to depend on the TCP behavior of clients.

Chris

Thanks,
John

--
John Wood
Interactive Mediums
[email protected]




--
Chris Anderson
http://jchrisa.net
http://couch.io




--
John Wood
Interactive Mediums
[email protected]
