Hey John,
I'm encountering a similar problem where the server or client can no longer make the connection. My strongest theory is that my Rails client, using couchrest-0.33, is not releasing file descriptors, so too many files end up open on the Linux machine.
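One quick way to sanity-check the file-descriptor theory on Linux is something like this (a rough sketch; the helper name is mine, and you'd point it at the Rails or beam.smp PID rather than your own):

```python
import os
import resource

def fd_usage(pid):
    """Return (open_fd_count, soft_limit) for a process on Linux.

    Note: the limit reported is the *current* process's RLIMIT_NOFILE;
    for a different user's process you'd read /proc/<pid>/limits instead.
    """
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return open_fds, soft

# e.g. fd_usage(rails_pid) -- if open_fds keeps climbing toward the soft
# limit between requests, the client isn't releasing descriptors.
```

If the count grows without bound while the app is idle, that points at the client library rather than CouchDB itself.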

I am using a smaller dataset (400K docs; the DB is roughly 900 MB), but with an intensive word-counting/document-similarity view.

I haven't really had time to take a deeper look yet, so it's not solved for me either. As a workaround, I just use CouchDB as a storage bin and Pig/Hadoop for the necessary computations.

Take a look at the "couchdb server connection refused error" thread on this mailing list. It might be of some help.

http://mail-archives.apache.org/mod_mbox/couchdb-user/200908.mbox/%[email protected]%3e


Tommy

On Sep 1, 2009, at 11:19 AM, John Wood wrote:

Thanks for the reply Chris.

I'll look into upgrading our test environment to the trunk version of
CouchDB, and see if I can reproduce the error there.

We're using CouchRest version 0.33 as the client library.

Thanks again,
John

On Tue, Sep 1, 2009 at 12:49 PM, Chris Anderson <[email protected]> wrote:

On Tue, Sep 1, 2009 at 7:52 AM, John Wood <[email protected]> wrote:
Hi everybody,

I'm currently facing an issue with our production installation of CouchDB.
Twice within the past 5 days, the Erlang process running CouchDB has pegged
one of the machine's 4 cores, consumed about 40% of the system RAM (which
is 4GB), and become completely unresponsive to incoming HTTP requests. The
only way we can get it back to normal is to restart CouchDB.

I'm trying to determine what may be causing this, but I'm not having much
luck. Nothing stands out in the CouchDB log files. There are no entries in
the logs from the time it goes unresponsive until the time I restart it,
and no errors leading up to the issue. There are, however, a few errors
like the one below, but none right before CouchDB goes unresponsive:

[error] [<0.11738.288>] {error_report,<0.21.0>,
  {<0.11738.288>,crash_report,
   [[{pid,<0.11738.288>},
     {registered_name,[]},
     {error_info,
         {error,
             {case_clause,{error,enotconn}},
             [{mochiweb_request,get,2},
              {couch_httpd,handle_request,4},
              {mochiweb_http,headers,5},
              {proc_lib,init_p,5}]}},
     {initial_call,
         {mochiweb_socket_server,acceptor_loop,
[{<0.56.0>,#Port<0.148>,#Fun<mochiweb_http.1.81679042>}]}},
     {ancestors,
         [couch_httpd,couch_secondary_services,couch_server_sup,
          <0.1.0>]},
     {messages,[]},
     {links,[<0.56.0>,#Port<0.5032425>]},
     {dictionary,[{mochiweb_request_qs,[{"limit","0"}]}]},
     {trap_exit,false},
     {status,running},
     {heap_size,28657},
     {stack_size,23},
     {reductions,14034}],
    []]}}
[error] [<0.56.0>] {error_report,<0.21.0>,
  {<0.56.0>,std_error,
   {mochiweb_socket_server,235,
       {child_error,{case_clause,{error,enotconn}}}}}}

=ERROR REPORT==== 30-Aug-2009::04:29:07 ===
{mochiweb_socket_server,235,
                      {child_error,{case_clause,{error,enotconn}}}}

I checked some of the other system log files (/var/log/messages, etc.),
and there doesn't appear to be any information there either.

Our CouchDB installation is fairly large. We have 7 production databases,
totaling almost 250GB; the largest is 129GB. We are running CouchDB 0.9.0
on Red Hat Enterprise Server 5.3. As far as usage goes, we are constantly
inserting documents into the database (5,000 at a time via a bulk insert),
pausing to regenerate the views after every 100,000 documents have been
inserted. Aside from the process that does the inserts, all views are
accessed using stale=ok.
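For what it's worth, the insert cycle amounts to something like this (a Python sketch of the shape of the loop, not our actual code; post_bulk and regenerate_views are hypothetical stand-ins for the real HTTP calls):

```python
BULK_SIZE = 5_000        # docs per _bulk_docs POST
REGEN_EVERY = 100_000    # pause to rebuild views this often

def insert_all(docs, post_bulk, regenerate_views):
    """Insert docs in bulk batches, pausing periodically to rebuild views."""
    inserted = 0
    for i in range(0, len(docs), BULK_SIZE):
        batch = docs[i:i + BULK_SIZE]
        post_bulk(batch)           # POST /db/_bulk_docs with {"docs": batch}
        inserted += len(batch)
        if inserted % REGEN_EVERY == 0:
            regenerate_views()     # query each view without stale=ok
    return inserted
```

Every other reader hits the views with ?stale=ok, so only this one process ever triggers index updates.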

Has anybody else faced a similar issue? Can anybody suggest how I should
go about diagnosing it?


Just a guess, based on the information available here, but the
enotconn error suggests that the remote client is dropping the
connection prematurely. There is an old bug about this in the tracker,
which might be a good thing to reopen if we learn much more about the
issue (and it is still present in trunk / 0.10):

http://issues.apache.org/jira/browse/COUCHDB-45

There is also this open bug which could be related:

https://issues.apache.org/jira/browse/COUCHDB-394

Perhaps you have clients who aren't properly closing the connection, and
somehow this is running up against a limit in the underlying server system
(the max number of connections, or maybe even the max number of Erlang
processes in the VM).
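If it is connection pile-up, one crude way to watch for it on Linux is to count the sockets parked on CouchDB's port (a sketch; 5984 is the default port, and /proc/net/tcp reports addresses in hex):

```python
def count_tcp_connections(port=5984, proc_file="/proc/net/tcp"):
    """Count /proc/net/tcp entries whose local port matches `port`."""
    hex_port = format(port, "04X")
    count = 0
    with open(proc_file) as f:
        next(f)                           # skip the header row
        for line in f:
            local_addr = line.split()[1]  # "hexip:hexport"
            if local_addr.split(":")[1] == hex_port:
                count += 1
    return count

# A number that climbs steadily and never falls back suggests clients
# are holding connections open.
```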

It would be nice to get to the bottom of this one, eventually.

The first step I'd suggest taking is attempting to reproduce on the
0.10.x branch from svn. This will at least tell us if the bug has been
fixed. If it's still around and repeatable, that will give us a test
case for finally crushing it into oblivion.

It might help to know more about which client library you are using,
as this bug seems to depend on the TCP behavior of clients.

Chris

Thanks,
John

--
John Wood
Interactive Mediums
[email protected]




--
Chris Anderson
http://jchrisa.net
http://couch.io




--
John Wood
Interactive Mediums
[email protected]
