stderr shows this when I hit an empty response:

heart_beat_kill_pid = 17700
heart_beat_timeout = 11
Killed
heart: Sun Aug 19 18:23:54 2012: Erlang has closed.
heart: Sun Aug 19 18:23:55 2012: Executed "/usr/local/bin/couchdb -k". Terminating.
heart_beat_kill_pid = 18390
heart_beat_timeout = 11
Killed
heart: Sun Aug 19 18:35:18 2012: Erlang has closed.
heart: Sun Aug 19 18:35:18 2012: Executed "/usr/local/bin/couchdb -k". Terminating.
heart_beat_kill_pid = 18438
heart_beat_timeout = 11
So, it looks like the OS is killing the process because it's running out of memory. I can see in syslog that the oom-killer is killing processes at exactly the same time. What's strange, though, is that there's no mention of the oom-killer killing couchdb; only other processes are mentioned as being killed.

On Sun, Aug 19, 2012 at 8:15 AM, Robert Newson <rnew...@apache.org> wrote:
> 3.9Mb isn't large enough to trigger memory issues on its own on a node with 380M of ram. Can you use 'top' or 'atop' to see what memory consumption was like before the crash? Erlang/OTP does usually report out of memory errors when it crashes (to stderr, which doesn't hit the .log file, iirc).
>
> B.
>
> On 19 Aug 2012, at 11:30, CGS wrote:
>
>> On Sat, Aug 18, 2012 at 9:15 PM, Tim Tisdall <tisd...@gmail.com> wrote:
>>
>>> So, it's possible that couchdb is running out of memory when processing a large JSON file?
>>
>> Definitely.
>>
>>> From my last example, the JSON file is 3.9Mb, which I didn't think was too big, but I only have ~380Mb of RAM. However, I am able to do several thousand similar _bulk_docs updates of around the same size before I see the error... are memory leaks possible with erlang?
>>
>> It looks more like a RAM limitation per process. There may be a memory leak, but I am not sure.
>>
>>> Also, why is there nothing in the logs about running out of memory? (shouldn't that be something the program is able to detect?)
>>
>> It seems CouchDB doesn't catch this type of warning.
>>
>>> I switched over to using _bulk_docs because the database grew way too fast if I did only one update at a time. I'm doing about 5000 - 200000 document updates each time I run my script, so I've been doing the updates in batches of 150.
>>
>> I don't know about your requirements, but I remember a project in which I created a round-robin buffer to feed the docs to CouchDB. In that project I had to find a balance between the number of slices and the number of docs per slice in order to minimize the insertion time. Maybe this idea will help you in your project as well.
>>
>> CGS
>>
>>> -Tim
>>>
>>> On Fri, Aug 17, 2012 at 9:33 PM, CGS <cgsmcml...@gmail.com> wrote:
>>>> I managed to reproduce the error:
>>>>
>>>> [Sat, 18 Aug 2012 00:57:38 GMT] [debug] [<0.170.0>] OAuth Params: []
>>>> [Sat, 18 Aug 2012 00:58:37 GMT] [debug] [<0.114.0>] Include Doc: <<"_design/_replicator">> {1, <<91,250,44,153,238,254,43,46,180,150,45,181,10,163,207,212>>}
>>>> [Sat, 18 Aug 2012 00:58:37 GMT] [info] [<0.32.0>] Apache CouchDB has started on http://0.0.0.0:5984/
>>>>
>>>> ...and I think I also identified the problem: the JSON is too long/large.
>>>>
>>>> Here is how to reproduce the error:
>>>>
>>>> 1. Set the CouchDB error level to debug.
>>>> 2. Create an extra-huge JSON file:
>>>>    echo -n "{\"docs\":[{\"key\":\"1\"}" > my_json.json && for var in $(seq 2 2000000) ; do echo -n ",{\"key\":\"${var}\"}" >> my_json.json ; done && echo -n "]}" >> my_json.json
>>>> 3. Attempt to send it with curl (requires database "test" to already exist, preferably empty):
>>>>
>>>> curl -X POST http://127.0.0.7:5984/test/_bulk_docs -H 'Content-Type: application/json' -d @my_json.json > /dev/null
>>>>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>>>>                                  Dload  Upload   Total   Spent    Left  Speed
>>>> 100 33.2M    0     0  100 33.2M      0   856k  0:00:39  0:00:39 --:--:--     0
>>>> curl: (52) Empty reply from server
>>>>
>>>> Erlang shell report for the same problem:
>>>>
>>>> =INFO REPORT==== 18-Aug-2012::03:12:57 ===
>>>>     alarm_handler: {set,{system_memory_high_watermark,[]}}
>>>>
>>>> =INFO REPORT==== 18-Aug-2012::03:12:57 ===
>>>>     alarm_handler: {set,{process_memory_high_watermark,<0.149.0>}}
>>>> /usr/local/lib/erlang/lib/os_mon-2.2.9/priv/bin/memsup: Erlang has closed. Erlang has closed
>>>>
>>>> Tim, try to split your JSON into smaller pieces. Bulk operations tend to use a lot of memory.
>>>>
>>>> The _design/_replicator error comes with the multipart file set by cURL by default in such cases. Once a second piece is sent toward the server, the crash is registered. The first piece's report looks like:
>>>>
>>>> [Sat, 18 Aug 2012 00:57:38 GMT] [debug] [<0.170.0>] 'POST' /test/_bulk_docs {1,1} from "127.0.0.1"
>>>>
>>>> I hope this info may help.
>>>>
>>>> CGS
>>>>
>>>> On Fri, Aug 17, 2012 at 7:30 PM, Tim Tisdall <tisd...@gmail.com> wrote:
>>>>
>>>>> Okay, so it always states that _replicator line any time I manually restart the server. I think it's just a standard logging message when the level is set to "debug".
>>>>>
>>>>> On Fri, Aug 17, 2012 at 1:13 PM, Tim Tisdall <tisd...@gmail.com> wrote:
>>>>>> No. All my ids (except for design documents) are strings containing integers. Also, none of my design documents are called anything like "_replicator". The only thing with that name is in the _replicator database, which I'm not doing anything with.
>>>>>>
>>>>>> Why does it say "Include Doc"? And what's that series of numbers afterwards? That log message seems to consistently occur just before the log message about the server starting. Is that just a normal message you get when the server restarts and you have logging set to "debug"?
>>>>>>
>>>>>> On Fri, Aug 17, 2012 at 1:03 PM, Robert Newson <rnew...@apache.org> wrote:
>>>>>>>
>>>>>>> Does app_stats_test contain a document called _design/_replicator, or is a document with that id in the body of your bulk post?
>>>>>>>
>>>>>>> B.
>>>>>>>
>>>>>>> On 17 Aug 2012, at 17:52, Tim Tisdall wrote:
>>>>>>>
>>>>>>>> I do have UTF8 characters in the JSON, but isn't that acceptable? I have no problem retrieving UTF8-encoded content from the server, and I have a bunch of it saved in there already too.
>>>>>>>>
>>>>>>>> On Fri, Aug 17, 2012 at 10:35 AM, CGS <cgsmcml...@gmail.com> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Do you somehow have special characters (non-latin1 ones) in your JSON? That error looks strangely close to trying to transform a list of unicode characters into a binary. I might be wrong though.
>>>>>>>>>
>>>>>>>>> CGS
>>>>>>>>>
>>>>>>>>> On Fri, Aug 17, 2012 at 4:09 PM, Tim Tisdall <tisd...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I thought I added that to the init script before when you mentioned it, but I checked and it was gone.
I added a "cd ~couchdb" in >>> there >>>>>>>>>> and now I no longer get eaccess errors, but the process still >>> crashes >>>>>>>>>> with very little information: >>>>>>>>>> >>>>>>>>>> [Fri, 17 Aug 2012 14:01:44 GMT] [debug] [<0.1372.0>] 'POST' >>>>>>>>>> /app_stats_test/_bulk_docs {1,0} from "127.0.0.1" >>>>>>>>>> Headers: [{'Accept',"*/*"}, >>>>>>>>>> {'Content-Length',"3902444"}, >>>>>>>>>> {'Content-Type',"application/json"}, >>>>>>>>>> {'Host',"localhost:5984"}] >>>>>>>>>> [Fri, 17 Aug 2012 14:01:44 GMT] [debug] [<0.1372.0>] OAuth >>> Params: [] >>>>>>>>>> [Fri, 17 Aug 2012 14:02:16 GMT] [debug] [<0.115.0>] Include Doc: >>>>>>>>>> <<"_design/_replicator">> {1, >>>>>>>>>> >>>>>>>>>> <<91,250,44,153, >>>>>>>>>> >>>>>>>>>> 238,254,43,46, >>>>>>>>>> >>>>>>>>>> 180,150,45,181, >>>>>>>>>> >>>>>>>>>> 10,163,207,212>>} >>>>>>>>>> [Fri, 17 Aug 2012 14:02:17 GMT] [info] [<0.32.0>] Apache CouchDB >>> has >>>>>>>>>> started on http://127.0.0.1:5984/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Someone mentioned seeing the JSON that I'm submitting... Wouldn't >>>>>>>>>> mal-formed JSON throw an error? >>>>>>>>>> >>>>>>>>>> -Tim >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Aug 17, 2012 at 4:33 AM, Robert Newson < >>> rnew...@apache.org> >>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> I've seen couchdb start despite the eacces errors before and >>>>> tracked it >>>>>>>>>> down to the current working directory setting. It seems that the >>> cwd >>>>> is >>>>>>>>>> searched first, and then erlang looks elsewhere. So, if our >>> startup >>>>> script >>>>>>>>>> doesn't change it to somewhere that the couchdb user can read, you >>>>> get >>>>>>>>>> spurious eacces errors. >>>>>>>>>>> >>>>>>>>>>> Don't ask me how I know this. >>>>>>>>>>> >>>>>>>>>>> B. >>>>>>>>>>> >>>>>>>>>>> On 16 Aug 2012, at 20:19, Tim Tisdall wrote: >>>>>>>>>>> >>>>>>>>>>>> Paul, did you ever solve the eaccess problem you had described >>>>> here: >>>>>>>>>>>> >>>>>>>>>> >>>>> >>> http://mail-archives.apache.org/mod_mbox/couchdb-user/201106.mbox/%3c4e0b304f.5080...@lymegreen.co.uk%3E >>>>>>>>>>>> I found that post from doing Google searches for my issue. >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Aug 14, 2012 at 11:41 PM, Paul Davis >>>>>>>>>>>> <paul.joseph.da...@gmail.com> wrote: >>>>>>>>>>>>> On Tue, Aug 14, 2012 at 9:38 PM, Tim Tisdall < >>> tisd...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>>>>>> I'm still having problems with couchdb, but I'm trying out >>>>> different >>>>>>>>>>>>>> things to see if I can narrow down what the problem is... >>>>>>>>>>>>>> >>>>>>>>>>>>>> I stopped using fsockopen() in PHP and am using curl now to >>>>> hopefully >>>>>>>>>>>>>> be able to see more debugging info. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I get an empty response when sending a POST to _bulk_docs. >>> From >>>>> the >>>>>>>>>>>>>> couch logs it seems like the server restarts in the middle of >>>>>>>>>>>>>> processing the request. 
Here's what I have in my logs: (I >>> have >>>>> no >>>>>>>>>>>>>> idea what the _replicator portion is about there, I'm >>> currently >>>>> not >>>>>>>>>>>>>> using it) >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:30 GMT] [debug] [<0.1255.0>] 'POST' >>>>>>>>>>>>>> /app_stats_test/_bulk_docs {1,0} from "127.0.0.1" >>>>>>>>>>>>>> Headers: [{'Accept',"*/*"}, >>>>>>>>>>>>>> {'Content-Length',"2802300"}, >>>>>>>>>>>>>> {'Content-Type',"application/json"}, >>>>>>>>>>>>>> {'Host',"localhost:5984"}] >>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:30 GMT] [debug] [<0.1255.0>] OAuth >>>>> Params: [] >>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:45 GMT] [debug] [<0.115.0>] Include >>> Doc: >>>>>>>>>>>>>> <<"_design/_replicator">> {1, >>>>>>>>>>>>>> >>>>>>>>>> <<91,250,44,153, >>>>>>>>>>>>>> >>>>>>>>>> 238,254,43,46, >>>>>>>>>>>>>> >>>>>>>>>> 180,150,45,181, >>>>>>>>>>>>>> >>>>>>>>>> 10,163,207,212>>} >>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:45 GMT] [info] [<0.32.0>] Apache >>> CouchDB >>>>> has >>>>>>>>>>>>>> started on http://127.0.0.1:5984/ >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> In my code logs I have the following by running curl in >>> verbose >>>>> mode: >>>>>>>>>>>>>> >>>>>>>>>>>>>> * About to connect() to localhost port 5984 (#0) >>>>>>>>>>>>>> * Trying 127.0.0.1... * connected >>>>>>>>>>>>>> * Connected to localhost (127.0.0.1) port 5984 (#0) >>>>>>>>>>>>>>> POST /app_stats_test/_bulk_docs HTTP/1.0 >>>>>>>>>>>>>> Host: localhost:5984 >>>>>>>>>>>>>> Accept: */* >>>>>>>>>>>>>> Content-Type: application/json >>>>>>>>>>>>>> Content-Length: 2802300 >>>>>>>>>>>>>> >>>>>>>>>>>>>> * Empty reply from server >>>>>>>>>>>>>> * Connection #0 to host localhost left intact >>>>>>>>>>>>>> curl error: 52 : Empty reply from server >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I also tried using HTTP/1.1 and I get an empty response after >>>>>>>>>>>>>> receiving only a "100 Continue", but the end result appears >>> the >>>>> same. >>>>>>>>>>>>>> >>>>>>>>>>>>>> -Tim >>>>>>>>>>>>> >>>>>>>>>>>>> If you have a request that triggers this, a good way to catch >>> it >>>>> is >>>>>>>>>> like such: >>>>>>>>>>>>> >>>>>>>>>>>>> $ /usr/local/bin/couchdb # or however you start it >>>>>>>>>>>>> $ ps ax | grep beam.smp # Get the pid of couchdb >>>>>>>>>>>>> $ gdb >>>>>>>>>>>>> (gdb) attach $pid # Where $pid was just found with ps. >>> Might >>>>>>>>>>>>> throw up an access prompt >>>>>>>>>>>>> (gdb) continue >>>>>>>>>>>>> # At this point, run the command that makes couchdb reboot >>>>> in a >>>>>>>>>>>>> # different console. If it happens you should see Gdb >>> notice >>>>> the >>>>>>>>>>>>> # error. Then the following: >>>>>>>>>>>>> (gdb) t a a bt >>>>>>>>>>>>> >>>>>>>>>>>>> And that should spew out a bunch of stack traces. If you can >>> get >>>>> that >>>>>>>>>>>>> we should be able to fairly specifically narrow down the issue. >>>>>>>>>>> >>>>>>>>>> >>>>>>> >>>>> >>> >
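
Separately, here is a minimal shell sketch of the batch-splitting idea CGS recommends above ("try to split your JSON into smaller pieces"): instead of one huge _bulk_docs POST, the same generated {"key":"N"} docs from the repro go out in small slices. The batch size, total count, database URL, and batch.json file name are illustrative assumptions, not values taken from the thread.

#!/bin/sh
# Hypothetical sketch of the batching suggestion -- not code from the thread.
# Sends the repro's {"key":"N"} docs in slices of BATCH docs per _bulk_docs
# request; BATCH, TOTAL, DB_URL and batch.json are assumed values to adjust.
BATCH=150
TOTAL=2000000
DB_URL="http://127.0.0.1:5984/test"

start=1
while [ "$start" -le "$TOTAL" ]; do
    end=$(( start + BATCH - 1 ))
    [ "$end" -gt "$TOTAL" ] && end=$TOTAL
    {
        printf '{"docs":['
        for i in $(seq "$start" "$end"); do
            # Comma-separate every doc after the first one in this slice.
            [ "$i" -gt "$start" ] && printf ','
            printf '{"key":"%s"}' "$i"
        done
        printf ']}'
    } > batch.json
    # Each request stays small, instead of the single giant payload that
    # tripped the Erlang memory high-watermark alarms shown above.
    curl -s -X POST "$DB_URL/_bulk_docs" \
         -H 'Content-Type: application/json' -d @batch.json > /dev/null
    start=$(( end + 1 ))
done

In practice you would build each slice from your real documents rather than synthetic keys, and tune BATCH as CGS describes to balance insertion speed against memory use.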