FW: Issues with performance on Hadoop/Hive

2009-09-01 Thread Ramiya V
Hi, I just wanted to add that I know 45GB of data is really too little to test the performance of Hadoop/Hive, as it needs data in terabytes. However, I have to implement a POC and it requires me to test only 45GB of data. Please let me know if the performance can be improved. Thanks, Ramya

Issues with performance on Hadoop/Hive

2009-09-01 Thread Ramiya V
Hi, I have set up a 4-node (physical) Hadoop cluster. Configuration: 2GB RAM on each machine. Currently I am using the sub-project Hive for firing queries on 45GB of data. I have certain questions that need to be resolved: 1) The performance that I am getting with the above setup is quite bad. It ta
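
Not from the thread, but a minimal first check for a setup like this: make sure the query actually parallelizes across the cluster. A sketch, assuming a Hive CLI of that era; the table name and task count are placeholders:

    # Sketch only -- "mytable" and the reduce-task count are examples.
    # On a small cluster, an unsuitable number of reduce tasks is a
    # common cause of poor query times.
    hive -e "SET mapred.reduce.tasks=8; SELECT COUNT(1) FROM mytable;"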

Re: CFP of 3rd Hadoop in China event (Hadoop World:Beijing)

2009-09-01 Thread He Yongqiang
Hi all, The 3rd Hadoop in China event (Hadoop World:Beijing 2009) is open for registration now. http://hadoop-world-beijing.eventbrite.com/ Please register as early as possible. Thanks, Yongqiang On 09-8-22 12:21 AM, "He Yongqiang" wrote: > > http://www.hadooper.c

Re: Datanode high memory usage

2009-09-01 Thread Bryan Talbot
For info on newer JDK support for compressed oops, see http://java.sun.com/javase/6/webnotes/6u14.html and http://wikis.sun.com/display/HotSpotInternals/CompressedOops -Bryan On Sep 1, 2009, at 12:21 PM, Brian Bockelman wrote: On Sep 1, 2009, at 1:58 PM, Stas Oskin wrote: Hi.
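
Taking the links above literally, a minimal sketch of enabling the flag for the Hadoop daemons; the hadoop-env.sh variable names are assumed from the 0.20-era scripts, and the flag needs a 64-bit JVM on 6u14 or later:

    # hadoop-env.sh -- sketch, not from the thread.
    export HADOOP_NAMENODE_OPTS="-XX:+UseCompressedOops $HADOOP_NAMENODE_OPTS"
    export HADOOP_DATANODE_OPTS="-XX:+UseCompressedOops $HADOOP_DATANODE_OPTS"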

Re: Datanode high memory usage

2009-09-01 Thread Brian Bockelman
On Sep 1, 2009, at 1:58 PM, Stas Oskin wrote: Hi. With regards to memory, have you tried the compressed pointers JDK option (we saw great benefits on the NN)? Java is incredibly hard to get a straight answer from with regards to memory. You need to perform a GC first manually - the act

Re: Datanode high memory usage

2009-09-01 Thread Brian Bockelman
On Sep 1, 2009, at 2:02 PM, Stas Oskin wrote: Hi. What does 'up to 700MB' mean? Is it JVM's virtual memory? resident memory? or java heap in use? 700 MB is what is taken by the overall java process. Resident, shared, or virtual? Unix memory management is not straightforward; the worst thi
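
A quick way to see the distinction being drawn here, using standard jps/ps (sketch only):

    # VSZ = virtual size, RSS = resident size; both reported in KB on Linux.
    DN_PID=$(jps | awk '/DataNode/ {print $1}')
    ps -o pid,vsz,rss,cmd -p "$DN_PID"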

Re: Datanode high memory usage

2009-09-01 Thread Stas Oskin
Hi. What does 'up to 700MB' mean? Is it JVM's virtual memory? resident memory? > or java heap in use? > 700 MB is what is taken by the overall java process. > > How many blocks do you have? For an idle DN, most of the memory is taken by > block info structures. It does not really optimize for it.. May
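
One way to test the block-info theory is to count the blocks this datanode actually holds; a sketch, with the data directory as a placeholder for whatever dfs.data.dir points to:

    # Block files are named blk_<id>; exclude the .meta checksum files.
    find /var/hadoop/dfs/data -name 'blk_*' ! -name '*.meta' | wc -l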

Re: Datanode high memory usage

2009-09-01 Thread Stas Oskin
Hi. [ > https://issues.apache.org/jira/browse/HADOOP-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel] > > Does it have any effect on the issue I have? It seems from the description that the issues are related to various node tasks, and not to one in particular. Regards.

Re: Datanode high memory usage

2009-09-01 Thread Stas Oskin
Hi. > With regards to memory, have you tried the compressed pointers JDK option > (we saw great benefits on the NN)? Java is incredibly hard to get a > straight answer from with regards to memory. You need to perform a GC first > manually - the actual usage is the amount it reports used post-GC

Re: Datanode high memory usage

2009-09-01 Thread Stas Oskin
Hi. The datanode would be using the major part of memory to do the following: > a. Continuously (at regular intervals) send heartbeat messages to namenode > to > say 'I am live and awake' > b. In case any data/file is added to DFS, OR Map Reduce jobs are running, > datanode would again be talking to n

Re: Datanode high memory usage

2009-09-01 Thread Raghu Angadi
I think this thread is moving in all possible directions... without many details on the original problem. There is no need to speculate on where the memory goes: you can run 'jmap -histo:live' and 'jmap -heap' to get a much better idea. What does 'up to 700MB' mean? Is it JVM's virtual memory?
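
Spelled out, that suggestion looks roughly like this (sketch; assumes jmap comes from the same JDK that runs the datanode):

    DN_PID=$(jps | awk '/DataNode/ {print $1}')
    jmap -heap "$DN_PID"                    # heap configuration and per-generation usage
    jmap -histo:live "$DN_PID" | head -20   # top object types; :live forces a GC first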

Re: DistCp - NoClassDefFoundError

2009-09-01 Thread Boris Shkolnik
After the project split, DistCp belongs to hadoop-mapreduce. Make sure hadoop-mapred-tools-$VER-dev.jar is in your classpath. Boris. On 8/31/09 2:19 PM, "Kevin Peterson" wrote: > On Fri, Aug 28, 2009 at 10:34 AM, mpiller wrote: > >> >> I am using the DistCp class inside of my application to copy final ou
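
A sketch of one way to get the jar onto the classpath when launching an application that uses DistCp; the jar version and application names here are placeholders:

    # The hadoop launcher script adds HADOOP_CLASSPATH to the JVM classpath.
    export HADOOP_CLASSPATH=/path/to/hadoop-mapred-tools-0.21.0-dev.jar
    hadoop jar myapp.jar com.example.MyDriver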

Re: Datanode high memory usage

2009-09-01 Thread indoos
Hi, The recommended RAM for namenode, datanode, jobtracker and tasktracker is 1 GB. The datanode would be using the major part of memory to do the following: a. Continuously (at regular intervals) send heartbeat messages to namenode to say 'I am live and awake' b. In case any data/file is added to DFS,

RE: hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink

2009-09-01 Thread umer arshad
I have resolved the issue. What I did: 1) '/etc/init.d/iptables stop' --> stopped the firewall 2) SELINUX=disabled in the '/etc/selinux/config' file --> disabled SELinux. It worked for me after these two changes. thanks, --umer > From: m_umer_ars...@hotmail.com > To: common-user@hadoop.apache.org > Subject
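
Note that 'iptables stop' only lasts until the next reboot, and the SELINUX= edit only applies from the next boot; a sketch of making both stick on a RHEL/CentOS-style machine (assumed from the paths above), sensible only on a trusted private network:

    chkconfig iptables off   # keep the firewall off across reboots
    setenforce 0             # switch SELinux to permissive immediately, no reboot
    # SELINUX=disabled in /etc/selinux/config (as above) takes effect at next boot.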

RE: Datanode high memory usage

2009-09-01 Thread Amogh Vasekar
Ahh.. very luckily I got a message on that JIRA today itself. -- [ https://issues.apache.org/jira/browse/HADOOP-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-6168. -- Resolution: Duplicat

Re: Datanode high memory usage

2009-09-01 Thread Brian Bockelman
Hey Mafish, If you are getting 1-2m blocks on a single datanode, you'll have many other problems - especially with regards to periodic block reports. With regards to memory, have you tried the compressed pointers JDK option (we saw great benefits on the NN)? Java is incredibly hard to ge

Re: Cloudera Video - Hadoop build on eclipse

2009-09-01 Thread Steve Loughran
ashish pareek wrote: Hello Bharath, Earlier even I faced the same problem. I think you are accessing the internet through a proxy, so try using a direct broadband connection. Hope this will solve your problem. Or set Ant's proxy up: http://ant.apache.org/manual/proxy.html Ashish
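
A minimal sketch of the proxy setup from that Ant manual page; host and port are placeholders for your actual proxy:

    # Pass the standard JVM proxy properties to Ant via ANT_OPTS.
    export ANT_OPTS="-Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080"
    ant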

Re: Datanode high memory usage

2009-09-01 Thread Mafish Liu
2009/9/1 Mafish Liu : > Both NameNode and DataNode will be greatly affected by the number of files. > In my test, almost 60% of memory is used in datanodes while storing 1m > files, and the value reaches 80% with 2m files. > My test bed is with 5 nodes, 1 namenode and 4 datanodes. All nodes >

Re: Datanode high memory usage

2009-09-01 Thread Mafish Liu
Both NameNode and DataNode will be greatly affected by the number of files. In my test, almost 60% of memory is used in datanodes while storing 1m files, and the value reaches 80% with 2m files. My test bed is with 5 nodes, 1 namenode and 4 datanodes. All nodes have 2GB of memory and replication is 3. 2009/

Re: Datanode high memory usage

2009-09-01 Thread Stas Oskin
Hi. 2009/9/1 Amogh Vasekar > This won't change the daemon configs. > Hadoop by default allocates 1000MB of memory for each of its daemons, which > can be controlled by HADOOP_HEAPSIZE, HADOOP_NAMENODE_OPTS, > HADOOP_TASKTRACKER_OPTS in the hadoop script. > However, there was a discussion on this
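
For reference, a sketch of where those knobs live in hadoop-env.sh; the values are illustrative, not recommendations from the thread:

    # hadoop-env.sh -- sketch only.
    export HADOOP_HEAPSIZE=1000              # default max heap (MB) for every daemon
    # Assumption: the launcher appends daemon opts after the default -Xmx,
    # so a later -Xmx here overrides it for just the datanode.
    export HADOOP_DATANODE_OPTS="-Xmx256m"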

Re: Datanode high memory usage

2009-09-01 Thread Stas Oskin
Hi. 2009/9/1 Mafish Liu > Did you have many small files in your system? > > Yes, quite a lot. But this should influence the Namenode, and not the Datanode, correct? Regards.