On Wednesday, September 18, 2013, James Marca wrote:

> Dear list,
>
> For future reference, I think my problem is solved, and it doesn't
> appear to be a CouchDB or Erlang thing, but rather a library/Gentoo
> Linux issue.
>
> This is a Gentoo Linux box, and Gentoo likes to be rebuilt from top to
> bottom every 6 months or so, I bit the bullet and did that. In the
> process I noticed here and there messages about links to icu library
> within couchdb that required a rebuild of couchdb. So, wildly
> guessing, I *think* that was the problem...an older build of icu was
> being used during the couchdb build, but was incompatible with some
> other, more recently built system library.
>
> Or perhaps it was something else. Regardless, a rebuild of everything
> solved the problems I was having. Been stable for a few hours now with
> about twice the load that was crashing it before.
>
> Thanks,
>
> James Marca

The odd thing is that it was looking in the lost+found folder. Like your
files have been deleted or something happened during a restart and/or fs
checking. IMO you could find such events in the system logs. It would
also explain why a rebuild fixed things.

- benoit

> On Mon, Sep 16, 2013 at 08:28:09PM +0200, Dave Cottlehuber wrote:
> > My gut feel is that some OS thing is killing off beam and the usual
> > suspect for that is OOM. I see you've noted nothing wrt this in the
> > logs though.
> >
> > On ubuntu > 12.x this works:
> >
> > ps -ef |grep beam
> > # you'll see 2 processes, so do this for both pids
> > cat /proc/$PID/oom_score
> > 124
> > # echo '-1000' > /proc/$PID/oom_score_adj
> > # cat /proc/$PID/oom_score
> >
> > only other advice I can offer is to login & run as sudo <couchdb_user>
> > `couchdb -i` for a while, it's interactive mode and *maybe* something
> > useful will be left…
> >
> > On 16 September 2013 18:59, James Marca
> > <[email protected]> wrote:
> > > On Sun, Sep 15, 2013 at 10:10:24PM -0700, James Marca wrote:
> > >> On Sun, Sep 15, 2013 at 08:04:27PM +0200, Dave Cottlehuber wrote:
> > >> > NIF scheduler issues could be a reasonable suspect;
> > >> >
> > >> > heart: Fri Sep 13 20:59:36 2013: heart-beat time-out, no activity for
> > >> > 15 seconds
> > >> >
> > >> > 15 seconds is a *long* time however.
> > >> >
> > >> > 1.4.0 needs 14B04 or higher I think due to one of our dependencies, so
> > >> > I'd suggest reverting back to that & seeing if you are having any
> > >> > other issues.
> > >> >
> > >> > Also, probably unrelated, why is kernel polling disabled?
> > >>
> > >> Honestly, on my gentoo boxes I just use the ebuild. I have no idea
> > >> why kernel polling is false...it is whatever the default is in the
> > >> ebuild I guess. I have no clue about whether kpoll should be enabled,
> > >> so I'm trusting the default.
> > >
> > > correction. kernel polling is enabled. The kpoll option is set when
> > > building, and /usr/bin/couchdb has +K true. If I invoke erl with
> > > +K true, then kpoll=true. One thing I do not have though is HiPE
> > > enabled.
> > >
> > > --
> > > This message has been scanned for viruses and
> > > dangerous content by MailScanner, and is
> > > believed to be clean.
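
For anyone finding this thread later: Dave's manual OOM check (find the beam pids, then read each `oom_score`) can be wrapped in a small script. The following is only a sketch, assuming a Linux `/proc` that exposes `comm` and `oom_score` per process. The function name `list_beam_oom_scores` and its optional directory argument are made up for illustration and are not part of CouchDB or Erlang.

```shell
#!/bin/sh
# Sketch: print "pid name oom_score" for every process whose command
# name contains "beam" (e.g. beam or beam.smp).
# The optional first argument is a hypothetical override of the proc
# directory, so the function can be pointed at a test tree; it
# defaults to /proc.
list_beam_oom_scores() {
    proc_root="${1:-/proc}"
    for dir in "$proc_root"/[0-9]*; do
        # Skip entries that vanished or that we are not allowed to read.
        [ -r "$dir/comm" ] || continue
        [ -r "$dir/oom_score" ] || continue
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        case "$name" in
            *beam*)
                pid=${dir##*/}
                score=$(cat "$dir/oom_score" 2>/dev/null) || continue
                printf '%s %s %s\n' "$pid" "$name" "$score"
                ;;
        esac
    done
}

# Report on the live system when the script is run directly.
list_beam_oom_scores "$@"
```

The script only reads scores. Lowering a score still means writing to `/proc/$PID/oom_score_adj` as root, as in Dave's message, and a high score by itself does not prove the OOM killer was involved; the system logs remain the place to confirm that.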
