So our problem is this: when a region server restarts or goes down, HBase
slows down. Meanwhile our PHP front-end app keeps firing very
high-frequency Thrift calls, so we end up spawning 20000+ threads on the
Thrift server, which exhausts all memory on the boxes, causes the
DataNodes to shut down, and crashes everything else.

Is there a way to put a thread limiter on Thrift? Maybe 1000 threads max?
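Something like a hard cap on the worker pool is what I have in mind. A
minimal sketch against the plain libthrift Java server (assuming a recent
libthrift with the builder-style Args API; the processor wiring is
elided):

    import org.apache.thrift.TProcessor;
    import org.apache.thrift.server.TServer;
    import org.apache.thrift.server.TThreadPoolServer;
    import org.apache.thrift.transport.TServerSocket;
    import org.apache.thrift.transport.TTransportException;

    public class BoundedThriftServer {
        public static TServer build(TProcessor processor) throws TTransportException {
            TServerSocket socket = new TServerSocket(9090);
            TThreadPoolServer.Args args = new TThreadPoolServer.Args(socket)
                    .processor(processor)
                    .minWorkerThreads(64)     // warm pool for steady traffic
                    .maxWorkerThreads(1000);  // hard cap: extra connections wait
                                              // instead of spawning more threads
            return new TThreadPoolServer(args);
        }
    }

That way a burst from the PHP side queues up at the server instead of
eating all the RAM in thread stacks.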
-Jack

On Sat, Mar 12, 2011 at 3:31 AM, Suraj Varma <svarma...@gmail.com> wrote:
> >> to:java.lang.OutOfMemoryError: unable to create new native thread
>
> This indicates that you are oversubscribed on your RAM to the extent
> that the JVM doesn't have any space to create native threads (which
> are allocated outside of the JVM heap).
>
> You may actually have to _reduce_ your heap sizes to allow more space
> for native threads (do an inventory of all the JVM heaps and don't let
> the total go over about 75% of available RAM).
> Another option is to use the -Xss stack size JVM arg to reduce the
> per-thread stack size - set it to 512k or 256k (you may have to
> experiment/perf test a bit to see what's the optimum size).
> Or ... get more RAM ...
>
> --Suraj
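>
> P.S. If you try the stack-size route: assuming you start the daemons
> through the stock scripts, the flag would go into hadoop-env.sh (and
> hbase-env.sh for the region servers), along the lines of:
>
>     export HADOOP_DATANODE_OPTS="-Xss256k $HADOOP_DATANODE_OPTS"
>
> Treat that as a sketch - watch for StackOverflowErrors under load
> before rolling it out everywhere.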
>
> On Fri, Mar 11, 2011 at 8:11 PM, Jack Levin <magn...@gmail.com> wrote:
> > I am noticing the following errors also:
> >
> > 2011-03-11 17:52:00,376 ERROR
> > org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> > 10.103.7.3:50010, storageID=DS-824332190-10.103.7.3-50010-1290043658438,
> > infoPort=50075, ipcPort=50020):DataXceiveServer: Exiting due
> > to:java.lang.OutOfMemoryError: unable to create new native thread
> >         at java.lang.Thread.start0(Native Method)
> >         at java.lang.Thread.start(Thread.java:597)
> >         at
> > org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:132)
> >         at java.lang.Thread.run(Thread.java:619)
> >
> > and this:
> >
> > nf_conntrack: table full, dropping packet.
> > nf_conntrack: table full, dropping packet.
> > nf_conntrack: table full, dropping packet.
> > nf_conntrack: table full, dropping packet.
> > nf_conntrack: table full, dropping packet.
> > nf_conntrack: table full, dropping packet.
> > net_ratelimit: 10 callbacks suppressed
> > nf_conntrack: table full, dropping packet.
> > possible SYN flooding on port 9090. Sending cookies.
> >
> > This seems like a network stack issue?
> >
> > So, does the datanode need a higher heap than 1GB? Or possibly we ran
> > out of RAM for other reasons?
> >
> > -Jack
> >
> > On Thu, Mar 10, 2011 at 1:29 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> >
> >> Looks like a datanode went down. InterruptedException is how Java
> >> interrupts IO in threads; it's similar to the EINTR errno. That
> >> means the actual source of the abort is higher up...
> >>
> >> So back to how InterruptedException works... at some point a thread
> >> in the JVM decides that the VM should abort. So it calls
> >> thread.interrupt() on all the threads it knows/cares about to
> >> interrupt their IO. That is what you are seeing in the logs. The
> >> root cause lies above, I think.
> >>
> >> Look for the first "Exception" string or any FATAL or ERROR strings
> >> in the datanode logfiles.
> >>
> >> -ryan
> >>
> >> On Thu, Mar 10, 2011 at 1:03 PM, Jack Levin <magn...@gmail.com> wrote:
> >> > Here is the regionserver log: http://pastebin.com/ZmsyvcVc
> >> > They all have similar stuff.
> >> >
> >> > On Thu, Mar 10, 2011 at 11:34 AM, Stack <st...@duboce.net> wrote:
> >> >
> >> >> What's in the regionserver logs? Please put up regionserver and
> >> >> datanode excerpts.
> >> >> Thanks Jack,
> >> >> St.Ack
> >> >>
> >> >> On Thu, Mar 10, 2011 at 10:31 AM, Jack Levin <magn...@gmail.com> wrote:
> >> >> > All was well, until this happened:
> >> >> >
> >> >> > http://pastebin.com/iM1niwrS
> >> >> >
> >> >> > and all regionservers went down. Is this the xceiver issue?
> >> >> >
> >> >> > <property>
> >> >> >   <name>dfs.datanode.max.xcievers</name>
> >> >> >   <value>12047</value>
> >> >> > </property>
> >> >> >
> >> >> > This is what I have; should I set it higher?
> >> >> >
> >> >> > -Jack
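> >> >> >
> >> >> > P.S. To gauge how close we get to that cap (assuming a stock
> >> >> > JDK is on the box), a rough live count of xceiver threads via
> >> >> > jstack would be something like:
> >> >> >
> >> >> >     jstack <datanode-pid> | grep -c DataXceiver
> >> >> >
> >> >> > (rough because it also matches the DataXceiverServer thread and
> >> >> > any stack frames that mention the class).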