Oh, really? My ulimit -n is 2048; I'd assumed that would be sufficient
for just testing on my machine. I was going to use 4096 in production.
My hdfs-site.xml has "dfs.datanode.max.xcievers" set to 4096.
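Side note: as far as I can tell, "unable to create new native thread"
is bounded by the OS's per-user process/thread limit (ulimit -u) and by
native stack space, rather than by ulimit -n or the Java heap. A
standalone sketch - not DataNode code, just an illustration - that
reproduces this class of failure when run in a shell with a low thread
limit:

    // Standalone repro sketch (assumption: run it in a shell with a low
    // per-user thread limit, e.g. "ulimit -u 1024"; nothing here comes
    // from the DataNode). Each thread parks on a latch so they all stay
    // alive until the OS refuses to create another one.
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.CountDownLatch;

    public class ThreadLimitDemo {
        public static void main(String[] args) {
            final CountDownLatch hold = new CountDownLatch(1);
            List<Thread> threads = new ArrayList<Thread>();
            try {
                while (true) {
                    Thread t = new Thread(new Runnable() {
                        public void run() {
                            try { hold.await(); } catch (InterruptedException e) { }
                        }
                    });
                    t.start();   // fails here once the OS refuses another thread
                    threads.add(t);
                }
            } catch (OutOfMemoryError e) {
                // Typically "unable to create new native thread": an OS/native
                // limit, not Java heap exhaustion - -Xmx barely matters here.
                System.err.println("Failed after " + threads.size() + " threads: " + e);
            } finally {
                hold.countDown();  // release the parked threads so the JVM can exit
            }
        }
    }

Running that under a low ulimit -u fails at roughly the limit no matter
how large -Xmx is, which would presumably explain why jconsole showed so
little heap usage.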
As for my logs... there are a lot of "INFO" entries; I haven't gotten
around to configuring that down yet, and I'm not quite sure why it's so
extensive at INFO level. My log file is 4.4 GB (is that a sign I've
configured or done something wrong?). I grep -v "INFO" the log to get
the actual error entries (assuming each stack trace actually belongs to
the entry above it, or else those stack lines may be misleading):

------------
2013-03-23 15:11:43,653 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1419421989-192.168.1.5-50010-1363780956652, infoPort=50075, ipcPort=50020):DataXceiveServer: Exiting due to:java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:691)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:133)
    at java.lang.Thread.run(Thread.java:722)

2013-03-23 15:11:44,177 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1419421989-192.168.1.5-50010-1363780956652, infoPort=50075, ipcPort=50020):DataXceiver
java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[closed]. 0 millis timeout left.
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:292)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:339)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:403)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:581)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:406)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
    at java.lang.Thread.run(Thread.java:722)
------------
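That first ERROR is the one that kills the DN. Since I'll have jconsole
open anyway, I'm going to watch the thread count as well as the heap on
the next run - if the live thread count climbs toward the OS limit right
before that ERROR, that would confirm it's a limits problem rather than
a heap problem, as you suggested. For reference, a small sketch of
checking the same numbers programmatically over JMX (the port and URL
below are assumptions on my part; they'd require the DataNode JVM to be
started with the usual com.sun.management.jmxremote.* flags):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;
    import javax.management.MBeanServerConnection;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class DataNodeThreadWatch {
        public static void main(String[] args) throws Exception {
            // Assumption: the DataNode JVM exposes JMX on port 8006; the real
            // port is whatever -Dcom.sun.management.jmxremote.port was set to.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:8006/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection conn = connector.getMBeanServerConnection();
                ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                        conn, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
                while (true) {  // same numbers as jconsole's Threads tab
                    System.out.println("live threads = " + threads.getThreadCount()
                            + ", peak = " + threads.getPeakThreadCount());
                    Thread.sleep(5000);
                }
            } finally {
                connector.close();
            }
        }
    }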
On 3/23/13, Harsh J <ha...@cloudera.com> wrote:
> I'm guessing your OutOfMemory then is due to the "Unable to create
> native thread" message? Do you mind sharing your error logs with us?
> Because if it's that, then it's a ulimit/system limits issue and not a
> real memory issue.
>
> On Sat, Mar 23, 2013 at 2:30 PM, Ted <r6squee...@gmail.com> wrote:
>> I just checked, and after running my tests I generate only 670 MB of
>> data, on 89 blocks.
>>
>> What's more, when I ran the test this time I had increased my memory
>> to 2048 MB, so it completed fine - but I decided to run jconsole
>> through the test so I could see what's happening. The data node never
>> exceeded 200 MB of memory usage. It mostly stayed under 100 MB.
>>
>> I'm not sure why it would complain about out of memory and shut
>> itself down when it was only 1024. It was fairly consistently doing
>> that the last few days, including this morning right before I
>> switched it to 2048.
>>
>> I'm going to run the test again with 1024 MB and jconsole running;
>> none of this makes any sense to me.
>>
>> On 3/23/13, Harsh J <ha...@cloudera.com> wrote:
>>> I run a 128 MB heap size DN for my simple purposes on my Mac and it
>>> runs well for what load I apply on it.
>>>
>>> A DN's primary, growing memory consumption comes from the # of
>>> blocks it carries. All of these blocks' file paths are mapped and
>>> kept in the RAM during its lifetime. If your DN has acquired a lot
>>> of blocks by now, say close to a million or more, then 1 GB may not
>>> suffice anymore to hold them, and you'd need to scale up (add more
>>> RAM, or increase the heap size if you already have more RAM) or
>>> scale out (add another node and run the balancer).
>>>
>>> On Sat, Mar 23, 2013 at 10:03 AM, Ted <r6squee...@gmail.com> wrote:
>>>> Hi, I'm new to hadoop/hdfs and I'm just running some tests on my
>>>> local machine in a single-node setup. I'm encountering
>>>> out-of-memory errors on the JVM running my data node.
>>>>
>>>> I'm pretty sure I can just increase the heap size to fix the
>>>> errors, but my question is about how memory is actually used.
>>>>
>>>> As an example, with other things like an OS's disk cache or, say,
>>>> databases, if you let it use 1 GB of RAM, it will "work" with what
>>>> it has available; if the data is more than 1 GB, it just swaps in
>>>> and out of memory/disk more often, i.e. the cached data is smaller.
>>>> If you give it 8 GB of RAM it still functions the same, just with
>>>> better performance.
>>>>
>>>> With my hdfs setup, this does not appear to be true: if I allocate
>>>> it 1 GB of heap, it doesn't just perform worse / swap data to disk
>>>> more. It outright fails with out of memory and shuts the data node
>>>> down.
>>>>
>>>> So my question is: how do I really tune the memory / decide how
>>>> much memory I need to prevent shutdowns? Is 1 GB just too small,
>>>> even on a single-machine test environment with almost no data at
>>>> all, or is it supposed to work like OS disk caches, where it always
>>>> works but just performs better or worse, and I just have something
>>>> configured wrong? Basically my objective isn't performance; it's
>>>> that the server must not shut itself down. It can slow down, but
>>>> not shut off.
>>>>
>>>> --
>>>> Ted.
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>> --
>> Ted.
>
>
>
> --
> Harsh J

--
Ted.
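P.S. To put rough numbers on your point about per-block file paths held
in the DN's RAM - the per-block byte figure below is a guess for
illustration, not something I measured:

    // Back-of-envelope only: the 300 bytes/block figure is a guess at the
    // cost of a block's path strings plus object overhead, not a measurement.
    public class BlockMemoryEstimate {
        public static void main(String[] args) {
            long blocks = 1000000L;        // ~1M blocks, the scale Harsh mentioned
            long bytesPerBlock = 300L;     // guessed per-block metadata cost
            long mb = blocks * bytesPerBlock / (1024L * 1024L);
            System.out.println("~" + mb + " MB of DN heap just for block metadata");
        }
    }

At that rate a million blocks only costs on the order of 300 MB, which
fits with a 1 GB heap being fine until the block count gets large.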