So our problem is this: when we restart a region server, or it goes
down, HBase slows down. Meanwhile our PHP front-end app keeps sending
very high-frequency Thrift calls, so we end up spawning 20,000+ threads
on the Thrift server, which destroys all memory on the boxes, causes
the DNs to shut down, and crashes everything else.

Is there a way to put a thread limit on Thrift? Maybe 1000 threads MAX?
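
Something like this in hbase-site.xml is what I had in mind -- though I am
guessing at the property names here (newer HBase Thrift servers seem to
expose a bounded worker pool), so please correct me if these don't exist
in our version:

<property>
  <name>hbase.thrift.minWorkerThreads</name>
  <value>16</value>
</property>
<property>
  <name>hbase.thrift.maxWorkerThreads</name>
  <value>1000</value>
</property>
<property>
  <name>hbase.thrift.maxQueuedRequests</name>
  <value>1000</value>
</property>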

-Jack

On Sat, Mar 12, 2011 at 3:31 AM, Suraj Varma <svarma...@gmail.com> wrote:

> >> to:java.lang.OutOfMemoryError: unable to create new native thread
>
> This indicates that you are oversubscribed on your RAM to the extent
> that the JVM doesn't have any space to create native threads (which
> are allocated outside of the JVM heap).
>
> You may actually have to _reduce_ your heap sizes to allow more space
> for native threads (do an inventory of all the JVM heaps and don't let
> the total go over about 75% of available RAM).
> Another option is to use the -Xss JVM arg to reduce the per-thread
> stack size - set it to 512k or 256k (you may have to experiment/perf
> test a bit to see what the optimum size is).
> Or ... get more RAM ...
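>
> For example, something like this in hbase-env.sh (and similarly via
> HADOOP_OPTS in hadoop-env.sh for the datanodes) - I'm assuming the stock
> env scripts here, and the exact value is something you'd have to test:
>
> # shrink the per-thread native stack from the default (often 1MB)
> export HBASE_OPTS="$HBASE_OPTS -Xss256k"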
>
> --Suraj
>
> On Fri, Mar 11, 2011 at 8:11 PM, Jack Levin <magn...@gmail.com> wrote:
> > I am noticing following errors also:
> >
> > 2011-03-11 17:52:00,376 ERROR
> > org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> > 10.103.7.3:50010, storageID=DS-824332190-10.103.7.3-50010-1290043658438,
> > infoPort=50075, ipcPort=50020):DataXceiveServer: Exiting due
> > to:java.lang.OutOfMemoryError: unable to create new native thread
> >        at java.lang.Thread.start0(Native Method)
> >        at java.lang.Thread.start(Thread.java:597)
> >        at
> >
> org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:132)
> >        at java.lang.Thread.run(Thread.java:619)
> >
> >
> > and this:
> >
> > nf_conntrack: table full, dropping packet.
> > nf_conntrack: table full, dropping packet.
> > nf_conntrack: table full, dropping packet.
> > nf_conntrack: table full, dropping packet.
> > nf_conntrack: table full, dropping packet.
> > nf_conntrack: table full, dropping packet.
> > net_ratelimit: 10 callbacks suppressed
> > nf_conntrack: table full, dropping packet.
> > possible SYN flooding on port 9090. Sending cookies.
> >
> > This seems like a network stack issue?
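> >
> > If it is conntrack, maybe we bump the table size via sysctl, something
> > like this (the number is a guess, and on older kernels the key may be
> > net.ipv4.netfilter.ip_conntrack_max instead):
> >
> > # /etc/sysctl.conf, then apply with `sysctl -p`
> > net.netfilter.nf_conntrack_max = 262144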
> >
> > So, does the datanode need a heap bigger than 1GB?  Or is it possible we
> > ran out of RAM for other reasons?
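> >
> > If the heap is the issue, I assume we'd raise it per-daemon in
> > hadoop-env.sh, e.g. (the 2g here is just a placeholder):
> >
> > export HADOOP_DATANODE_OPTS="-Xmx2g $HADOOP_DATANODE_OPTS"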
> >
> > -Jack
> >
> > On Thu, Mar 10, 2011 at 1:29 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> >
> >> Looks like a datanode went down.  InterruptedException is how Java
> >> interrupts IO in threads; it's similar to the EINTR errno.  That
> >> means the actual source of the abort is higher up...
> >>
> >> So back to how InterruptedException works... at some point a thread in
> >> the JVM decides that the VM should abort.  So it calls
> >> thread.interrupt() on all the threads it knows/cares about to
> >> interrupt their IO.  That is what you are seeing in the logs. The root
> >> cause lies above, I think.
> >>
> >> Look for the first "Exception" string or any FATAL or ERROR strings in
> >> the datanode logfiles.
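> >>
> >> A toy illustration of the mechanism (nothing HDFS-specific, just plain
> >> java.lang.Thread):
> >>
> >> public class InterruptDemo {
> >>     public static void main(String[] args) {
> >>         Thread t = new Thread(new Runnable() {
> >>             public void run() {
> >>                 try {
> >>                     Thread.sleep(60000);   // stands in for blocking IO
> >>                 } catch (InterruptedException e) {
> >>                     // this is the kind of trace you see in the logs
> >>                     System.out.println("interrupted by the aborting thread");
> >>                 }
> >>             }
> >>         });
> >>         t.start();
> >>         t.interrupt();  // what the aborting thread does to the threads it knows about
> >>     }
> >> }
> >>
> >> The InterruptedException is just the messenger; whoever called
> >> interrupt() is what you want to find.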
> >>
> >> -ryan
> >>
> >> On Thu, Mar 10, 2011 at 1:03 PM, Jack Levin <magn...@gmail.com> wrote:
> >> > http://pastebin.com/ZmsyvcVc  Here is the regionserver log; they all
> >> > have similar stuff.
> >> >
> >> > On Thu, Mar 10, 2011 at 11:34 AM, Stack <st...@duboce.net> wrote:
> >> >
> >> >> What's in the regionserver logs?  Please put up regionserver and
> >> >> datanode excerpts.
> >> >> Thanks Jack,
> >> >> St.Ack
> >> >>
> >> >> On Thu, Mar 10, 2011 at 10:31 AM, Jack Levin <magn...@gmail.com> wrote:
> >> >> > All was well, until this happened:
> >> >> >
> >> >> > http://pastebin.com/iM1niwrS
> >> >> >
> >> >> > and all regionservers went down. Is this the xciever issue?
> >> >> >
> >> >> > <property>
> >> >> > <name>dfs.datanode.max.xcievers</name>
> >> >> > <value>12047</value>
> >> >> > </property>
> >> >> >
> >> >> > This is what I have; should I set it higher?
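> >> >> >
> >> >> > One way I can think of to sanity-check whether we're even close to
> >> >> > that limit (assuming jstack is usable on the datanode box; <pid> is
> >> >> > the datanode pid, and this is a rough count since it matches stack
> >> >> > frames, not threads):
> >> >> >
> >> >> > jstack <pid> | grep -c DataXceiver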
> >> >> >
> >> >> > -Jack
> >> >> >
> >> >>
> >> >
> >>
> >
>
