>> to:java.lang.OutOfMemoryError: unable to create new native thread

This indicates that your RAM is oversubscribed to the point that the
JVM has no room left to create native threads (whose stacks are
allocated outside of the JVM heap).
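
If you want to see where the limit sits on a given box, here is a
rough probe (my sketch, nothing from Jack's logs): it keeps starting
parked threads until the OS refuses, so you can watch the count change
as you vary -Xss. Run it on a test machine, not production.

import java.util.concurrent.CountDownLatch;

public class ThreadProbe {
    public static void main(String[] args) {
        final CountDownLatch never = new CountDownLatch(1);
        int count = 0;
        try {
            while (true) {
                Thread t = new Thread(new Runnable() {
                    public void run() {
                        try {
                            // park forever so the stack stays reserved
                            never.await();
                        } catch (InterruptedException ignored) {
                        }
                    }
                });
                t.setDaemon(true); // let the JVM exit once main returns
                t.start();
                count++;
            }
        } catch (OutOfMemoryError e) {
            // same error the datanode hit
            System.out.println("created " + count + " threads before: " + e);
        }
    }
}

Note that you may hit the OS per-user process limit (ulimit -u) before
you run out of native memory; the JVM reports both as this
OutOfMemoryError.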

You may actually have to _reduce_ your heap sizes to allow more space
for native threads. Take an inventory of all the JVM heaps on the box
and don't let the total go over about 75% of available RAM (e.g. on a
16GB machine, keep the combined heaps under roughly 12GB).
Another option is the -Xss JVM arg, which reduces the per-thread stack
size: set it to 512k or 256k (you may have to experiment/perf test a
bit to see what the optimum size is; a sketch is below).
Or ... get more RAM ...
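
If you go the -Xss route for the datanode, it would look something
like this in conf/hadoop-env.sh (a sketch: HADOOP_DATANODE_OPTS is the
standard hook in that script, and 256k is just a starting point to
test, not a recommendation):

export HADOOP_DATANODE_OPTS="-Xss256k $HADOOP_DATANODE_OPTS"

If you go too small you'll start seeing StackOverflowErrors, so step
down gradually and watch the logs.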

--Suraj

On Fri, Mar 11, 2011 at 8:11 PM, Jack Levin <magn...@gmail.com> wrote:
> I am noticing following errors also:
>
> 2011-03-11 17:52:00,376 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 10.103.7.3:50010, storageID=DS-824332190-10.103.7.3-50010-1290043658438,
> infoPort=50075, ipcPort=50020):DataXceiveServer: Exiting due
> to:java.lang.OutOfMemoryError: unable to create new native thread
>        at java.lang.Thread.start0(Native Method)
>        at java.lang.Thread.start(Thread.java:597)
>        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:132)
>        at java.lang.Thread.run(Thread.java:619)
>
>
> and this:
>
> nf_conntrack: table full, dropping packet.
> nf_conntrack: table full, dropping packet.
> nf_conntrack: table full, dropping packet.
> nf_conntrack: table full, dropping packet.
> nf_conntrack: table full, dropping packet.
> nf_conntrack: table full, dropping packet.
> net_ratelimit: 10 callbacks suppressed
> nf_conntrack: table full, dropping packet.
> possible SYN flooding on port 9090. Sending cookies.
>
> This seems like a network stack issue?
>
> So, does the datanode need a higher heap than 1GB?  Or did we
> possibly run out of RAM for other reasons?
>
> -Jack
>
> On Thu, Mar 10, 2011 at 1:29 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>
>> Looks like a datanode went down.  InterruptedException is how Java
>> interrupts IO in threads; it's similar to the EINTR errno.  That
>> means the actual source of the abort is higher up...
>>
>> So back to how InterruptedException works... at some point a thread in
>> the JVM decides that the VM should abort.  So it calls
>> thread.interrupt() on all the threads it knows/cares about to
>> interrupt their IO.  That is what you are seeing in the logs.  The
>> root cause lies above, I think.
>>
>> Look for the first "Exception" string or any FATAL or ERROR strings in
>> the datanode logfiles.
>>
>> -ryan
>>
>> On Thu, Mar 10, 2011 at 1:03 PM, Jack Levin <magn...@gmail.com> wrote:
>> > http://pastebin.com/ZmsyvcVc  Here is the regionserver log; they
>> > all have similar stuff.
>> >
>> > On Thu, Mar 10, 2011 at 11:34 AM, Stack <st...@duboce.net> wrote:
>> >
>> >> Whats in the regionserver logs?  Please put up regionserver and
>> >> datanode excerpts.
>> >> Thanks Jack,
>> >> St.Ack
>> >>
>> >> On Thu, Mar 10, 2011 at 10:31 AM, Jack Levin <magn...@gmail.com> wrote:
>> >> > All was well, until this happen:
>> >> >
>> >> > http://pastebin.com/iM1niwrS
>> >> >
>> >> > and all regionservers went down. Is this the xciever issue?
>> >> >
>> >> > <property>
>> >> > <name>dfs.datanode.max.xcievers</name>
>> >> > <value>12047</value>
>> >> > </property>
>> >> >
>> >> > this is what I have, should I set it higher?
>> >> >
>> >> > -Jack
>> >> >
>> >>
>> >
>>
>
