Re: Disable sorting in reducer

2012-10-16 Thread Radim Kolar
I have a need of reducers because I am writing multiple outputs. / You can write multiple outputs in the mapper too - see MultipleOutputFormat.

Re: Hadoop 1.0.3 There is insufficient memory for the Java Runtime Environment to continue.

2012-10-16 Thread Attila Csordas
On Mon, Oct 8, 2012 at 4:29 PM, Arpit Gupta ar...@hortonworks.com wrote: i would recommend using the oracle jdk. / The Oracle version didn't help. Also from your email below you mention that mapred.child.java.opts and mapred.child.ulimit were added to try to solve this problem. Are you setting

Re: Hadoop installation on mac

2012-10-16 Thread Bejoy KS
Hi Suneel, You can get the latest stable versions of Hadoop from the following URL: http://hadoop.apache.org/releases.html#Download. To download, choose a mirror and select the stable version (the ones Harsh suggested) you'd like to go for. (The 1.0.x releases are the current stable versions.)

Re: Hadoop and CUDA

2012-10-16 Thread sudha sadhasivam
Hello, When we create a jar file for Hadoop programs from the command prompt it runs faster. When we create a jar file from NetBeans it runs slower. We could not understand the problem. This is important as we are trying to work with Hadoop and CUDA (jCuda). We could create a jar file only using

Re: Hadoop and CUDA

2012-10-16 Thread Manoj Babu
Hi, If it is a runnable jar you are creating from NetBeans, check that only the necessary dependencies are added. Cheers! Manoj. On Tue, Oct 16, 2012 at 11:38 AM, sudha sadhasivam sudhasadhasi...@yahoo.com wrote: Hello When we create a jar file for hadoop programs from command prompt it runs

Re: Hadoop and CUDA

2012-10-16 Thread sudha sadhasivam
The code executes, but the time taken for execution is high. It does not show any advantage from the two levels of parallelism. G Sudha --- On Tue, 10/16/12, Manoj Babu manoj...@gmail.com wrote: From: Manoj Babu manoj...@gmail.com Subject: Re: Hadoop and CUDA To: user@hadoop.apache.org Date: Tuesday, October

Re: problem using s3 instead of hdfs

2012-10-16 Thread Hemanth Yamijala
Hi, I've not tried this on S3. However, the directory mentioned in the exception is based on the value of this particular configuration key: mapreduce.jobtracker.staging.root.dir. This defaults to ${hadoop.tmp.dir}/mapred/staging. Can you please set this to an S3 location and try? Thanks
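Hemanth's suggestion amounts to overriding a single property; a sketch of the config fragment, where `s3n://your-bucket` is a placeholder for an actual bucket:

```xml
<!-- mapred-site.xml: point the job staging root at an S3 location
     instead of the ${hadoop.tmp.dir}/mapred/staging default.
     s3n://your-bucket is a hypothetical bucket name. -->
<property>
  <name>mapreduce.jobtracker.staging.root.dir</name>
  <value>s3n://your-bucket/mapred/staging</value>
</property>
```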

Re: final the dfs.replication and fsck

2012-10-16 Thread Patai Sangbutsarakum
Thank you so much for confirming that. On Mon, Oct 15, 2012 at 9:25 PM, Harsh J ha...@cloudera.com wrote: Patai, My bad - that was on my mind but I missed noting it down in my earlier reply. Yes, you'd have to control that as well. 2 should be fine for smaller clusters. On Tue, Oct 16,

{Sale Hadoop World Registration - full 3 day conference access}

2012-10-16 Thread Zoe
I have bought it but now need to cancel my trip. I can give it away at a 30% discount. Please contact this email or call 408-821-5915 ASAP.

Re: GroupingComparator

2012-10-16 Thread Dave Beech
Great! Glad the problem is solved. You're right - the object returned by iterator.next() is re-used too. So yes, you would need to clone in this case and you'd have no choice but to create new objects. Please be sure though that you really do need to store values in a list to do what you're
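The reuse behaviour Dave describes can be demonstrated without Hadoop at all. This pure-Java sketch mimics a reduce-side value iterator that refills one shared object on every `next()`, and shows why storing references (rather than copies or clones) ends badly. All class and method names here are invented for illustration; they are not Hadoop API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Pure-Java analogy of the reduce-side pitfall: the iterator returns the
// SAME object on every next() call, only refilling its contents.
public class ReuseDemo {
    // A mutable holder, standing in for a reused Writable.
    static final class Holder {
        final StringBuilder buf = new StringBuilder();
        Holder set(String s) { buf.setLength(0); buf.append(s); return this; }
        @Override public String toString() { return buf.toString(); }
    }

    // Iterator that recycles one Holder, like Hadoop's value iterator.
    static Iterator<Holder> reusingIterator(List<String> values) {
        Holder shared = new Holder();
        Iterator<String> it = values.iterator();
        return new Iterator<Holder>() {
            public boolean hasNext() { return it.hasNext(); }
            public Holder next() { return shared.set(it.next()); }
        };
    }

    // Storing references: every element ends up showing the LAST value.
    static List<String> storeReferences(List<String> in) {
        List<Holder> out = new ArrayList<>();
        reusingIterator(in).forEachRemaining(out::add);
        List<String> res = new ArrayList<>();
        for (Holder h : out) res.add(h.toString());
        return res;
    }

    // Copying each value into a new object keeps them distinct.
    static List<String> storeCopies(List<String> in) {
        List<String> out = new ArrayList<>();
        reusingIterator(in).forEachRemaining(h -> out.add(h.toString()));
        return out;
    }
}
```

In a real reducer the copy step would be cloning the Writable (e.g. constructing a new object from the reused one) before adding it to the list.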

Re: possible resource leak in capacity scheduler

2012-10-16 Thread Radim Kolar
On 16.10.2012 at 1:13, Vinod Kumar Vavilapalli wrote: Which version are you running? / It was branch-0.23, but after I updated to trunk or the latest branch-0.23 it seems to work fine now.

datanodes doesn't work in HDFS

2012-10-16 Thread Khaled Ben Bahri
Hi all, I installed Hadoop HDFS on 3 nodes, a namenode and 2 datanodes. When I want to start the DFS processes, only the SecondaryNameNode is launched; the NameNode and DataNode processes don't start. There is a print screen attached to illustrate what I said. Thanks in advance. Regards, Khaled

Re: datanodes doesn't work in HDFS

2012-10-16 Thread Mohammad Tariq
Hi Khaled, I can't find any attachment. Also, could you please provide us the logs? It seems there is some config-related issue. Regards, Mohammad Tariq On Tue, Oct 16, 2012 at 2:48 PM, Khaled Ben Bahri khaled-...@hotmail.com wrote: Hi all, I installed Hadoop HDFS on 3 nodes, a

Re: datanodes doesn't work in HDFS

2012-10-16 Thread Mohammad Tariq
Could you please post the logs? Regards, Mohammad Tariq On Tue, Oct 16, 2012 at 2:52 PM, Khaled Ben Bahri khaled-...@hotmail.com wrote: Hi all, I installed Hadoop HDFS on 3 nodes, a namenode and 2 datanodes, when i want to start dfs processes, Only secondaryNameNode is launched but

Re: GroupingComparator

2012-10-16 Thread Alberto Cordioli
Yes, I know that keeping an in-memory collection isn't a good idea. The problem is that I need to perform a join, so there are no other possibilities! :( Cheers, Alberto On 16 October 2012 11:08, Dave Beech dbe...@apache.org wrote: Great! Glad the problem is solved. You're right - the object

RE: datanodes doesn't work in HDFS

2012-10-16 Thread Khaled Ben Bahri
There are also in attachment the core-site and hdfs-site files. Thanks in advance. Regards, Khaled From: khaled-...@hotmail.com To: user@hadoop.apache.org Subject: RE: datanodes doesn't work in HDFS Date: Tue, 16 Oct 2012 11:41:24 +0200 here is the print screen I forgot :) and also

Re: datanodes doesn't work in HDFS

2012-10-16 Thread Mohammad Tariq
Change the permissions of the directories where you are planning to put your dfs.data.dir and dfs.name.dir to 755 and start again. Regards, Mohammad Tariq On Tue, Oct 16, 2012 at 3:11 PM, Khaled Ben Bahri khaled-...@hotmail.com wrote: here is the print screen I forgot :) and also
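Mohammad's fix, as a shell sketch. The paths here are hypothetical; substitute whatever your hdfs-site.xml's dfs.name.dir and dfs.data.dir actually point at.

```shell
# Hypothetical locations; use the dfs.name.dir / dfs.data.dir values
# from your own hdfs-site.xml.
mkdir -p /tmp/hadoop-demo/dfs/name /tmp/hadoop-demo/dfs/data
# 755: owner (the hadoop user) gets rwx, everyone else r-x.
chmod 755 /tmp/hadoop-demo/dfs/name /tmp/hadoop-demo/dfs/data
ls -ld /tmp/hadoop-demo/dfs/name /tmp/hadoop-demo/dfs/data
```

After fixing the permissions, restart DFS (e.g. stop-dfs.sh then start-dfs.sh) so the DataNode picks up the now-writable directories.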

Re: datanodes doesn't work in HDFS

2012-10-16 Thread Mohammad Tariq
You are welcome. Also add the hadoop.tmp.dir property in your core-site.xml file. Regards, Mohammad Tariq On Tue, Oct 16, 2012 at 3:31 PM, Khaled Ben Bahri khaled-...@hotmail.com wrote: Thanks a lot, it works perfectly. Regards, Khaled -- From:
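The property Mohammad mentions looks like this; the directory value is a placeholder for any location writable by the user running Hadoop:

```xml
<!-- core-site.xml: base for Hadoop's temporary/working directories.
     /home/hadoop/tmp is a hypothetical path; pick any directory the
     hadoop user can write to. -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/tmp</value>
</property>
```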

Re: problem using s3 instead of hdfs

2012-10-16 Thread Rahul Patodi
I think these blog posts will answer your question: http://www.technology-mania.com/2012/05/s3-instead-of-hdfs-with-hadoop_05.html http://www.technology-mania.com/2011/05/s3-as-input-or-output-for-hadoop-mr.html On Tue, Oct 16, 2012 at 1:30 PM, sudha sadhasivam sudhasadhasi...@yahoo.com wrote:

Hadoop installation on mac

2012-10-16 Thread suneel hadoop
Hi All, Has anyone tried installing Hadoop on a Mac PC? If yes, can you please share the installation steps? Thanks in advance. Thanks, Suneel Sent from my iPhone

Re: Hadoop installation on mac

2012-10-16 Thread Harsh J
Suneel, What version are you trying to run? Following regular tarball instructions on a Mac mostly works just fine. On Tue, Oct 16, 2012 at 4:20 PM, suneel hadoop suneel.bigd...@gmail.com wrote: Hi All, Had anyone tried installing Hadoop on mac pc..if yes can u please share the installation

Re: problem using s3 instead of hdfs

2012-10-16 Thread Yanbo Liang
Because you did not set the default FS in your conf, you need to explicitly indicate the absolute path (including the scheme) of the file in S3 when you run an MR job. 2012/10/16 Rahul Patodi patodirahul.had...@gmail.com I think these blog posts will answer your question:
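Yanbo's point, sketched as a config fragment for Hadoop 1.x's s3n filesystem. The bucket name and keys are placeholders; with this in place bare paths resolve against S3, and without it every input/output path must carry the scheme explicitly (e.g. s3n://your-bucket/input):

```xml
<!-- core-site.xml: make S3 the default filesystem.
     your-bucket and the credential values are hypothetical. -->
<property>
  <name>fs.default.name</name>
  <value>s3n://your-bucket</value>
</property>
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_KEY</value>
</property>
```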

Re: Hadoop installation on mac

2012-10-16 Thread suneel hadoop
Hi Harsh, Thanks for your quick turnaround. The Mac version is 10.7.4 and the Hadoop version I am trying is hadoop-0.21.0. Please share if you have any step-by-step instructions. Thanks, Suneel On Tue, Oct 16, 2012 at 4:26 PM, Harsh J ha...@cloudera.com wrote: Suneel, What version are you

Re: Hadoop installation on mac

2012-10-16 Thread Dave Beech
+1 Installing from the tarball by the usual method is fine for Mac OS. One issue to be aware of is https://issues.apache.org/jira/browse/HADOOP-7489 (but even that doesn't stop it working). On 16 October 2012 11:56, Harsh J ha...@cloudera.com wrote: Suneel, What version are you trying to run? Following

Re: Hadoop installation on mac

2012-10-16 Thread suneel hadoop
Thanks a lot Dave! Bravo, pat on your back. On Tue, Oct 16, 2012 at 4:35 PM, Dave Beech dbe...@apache.org wrote: Instructions for single node operation: http://hadoop.apache.org/docs/r0.21.0/single_node_setup.html Instructions for cluster:

Re: Hadoop installation on mac

2012-10-16 Thread suneel hadoop
Can you please share the link where I can download it, so that I will be on the correct page; else I will be looking at the wrong page. Thanks a lot. On Tue, Oct 16, 2012 at 4:40 PM, Harsh J ha...@cloudera.com wrote: Suneel, Note though that the 0.21.0 version is unsupported and was abandoned.

mapred.reduce.tasks doesn't work

2012-10-16 Thread Yue Guan
Hi there, Is there any chance that setting mapred.reducel.tasks=20 doesn't work in Hadoop 0.20.2? Thanks, Yue

Re: WEKA logistic regression on hadoop

2012-10-16 Thread Bertrand Dechoux
Weka is indeed a more complete package of data mining solutions, but its aim is not to support Hadoop, whereas that is the aim of Mahout. The implemented methods are standard data mining methods. If you are looking for Hadoop support you should ask the Mahout mailing list, but if you have questions on

Re: WEKA logistic regression on hadoop

2012-10-16 Thread Abhishek Shivkumar
As far as I know, Weka cannot be run on Hadoop directly. What can be done is: if your algorithm first generates a model based on training data, you can run your training offline on your laptop and serialize, i.e. write, the trained model to a file. Now, put this model file on HDFS
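The serialize-then-ship step can be sketched with plain Java serialization. `DummyModel` below is a stand-in for a trained `weka.classifiers.Classifier` (Weka classifiers are Serializable); all names are invented for illustration. In a real job, `load()` would run inside the mapper's setup() after the file is shipped via HDFS or the DistributedCache.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Train offline, serialize the model, deserialize it on the cluster.
public class ModelShip {
    // Stand-in for a trained, Serializable classifier.
    static class DummyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        double bias = 0.42;  // pretend learned parameter
    }

    // Write the trained model to a file (done once, on your laptop).
    static void save(Serializable model, File f) {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the model back (done in each mapper's setup()).
    static Object load(File f) {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```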

Re: WEKA logistic regression on hadoop

2012-10-16 Thread Rajesh Nikam
Hi Abhishek, I have also tried using WEKA SMO, however it takes too long (I waited for more than 6 days) to train on a set of more than a million instances. However, logistic regression could come out with a model in 20 mins. This is pretty fast! My problem is I can use the model as is in

Re: problem using s3 instead of hdfs

2012-10-16 Thread Parth Savani
Hello Hemanth, I set the hadoop staging directory to s3 location. However, it complains. Below is the error 12/10/16 10:22:47 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= Exception in thread main java.lang.IllegalArgumentException: Wrong FS:

Re: problem using s3 instead of hdfs

2012-10-16 Thread Parth Savani
One question, Can I use both file systems at the same time (hdfs and s3)? According to this link: http://www.mail-archive.com/core-user@hadoop.apache.org/msg03481.html, I cannot. On Tue, Oct 16, 2012 at 10:32 AM, Parth Savani pa...@sensenetworks.com wrote: Hello Hemanth, I set

Re: mapred.reduce.tasks doesn't work

2012-10-16 Thread Harsh J
Please elaborate a bit more. Are you asking about Hive? It should work, but there are certain queries where you can't change the reducer count from 1 - which is by design in a few cases. P.S. There's a typo there: it's mapred.reduce.tasks, not mapred.reducel.tasks (no l). On Tue, Oct 16, 2012 at
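For reference, the property (spelled correctly) can be set cluster-wide as below; individual jobs can still override it with `-Dmapred.reduce.tasks=20` on the command line or `job.setNumReduceTasks(20)` in code:

```xml
<!-- mapred-site.xml: cluster-wide default number of reduce tasks -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>20</value>
</property>
```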

Re: mapred.reduce.tasks doesn't work

2012-10-16 Thread Yue Guan
My fault. I should ask on the Hive mailing list. On Tue, Oct 16, 2012 at 10:49 AM, Harsh J ha...@cloudera.com wrote: Please elaborate a bit more. Are you asking about Hive? It should work but there are certain queries you can't change the Num(1) reducer counts for - which is by design in a few

Re: problem using s3 instead of hdfs

2012-10-16 Thread Hemanth Yamijala
Parth, I notice in the stack trace below that the LocalJobRunner, instead of the JobTracker, is being used. Are you sure this is a distributed cluster? Could you please check the value of mapred.job.tracker? Thanks, Hemanth On Tue, Oct 16, 2012 at 8:02 PM, Parth Savani

wait at the end of job

2012-10-16 Thread Harun Raşit Er
Hi Everyone, I have a Windows Hadoop cluster consisting of 8 slave nodes and 1 master node. My Hadoop program is a collection of recursive jobs. I create 14 map and 14 reduce tasks in each job. My files are up to 10 MB. My problem is that all jobs wait at the end of the job. Map 100% Reduce 100% is seen on

Re: Hadoop installation on mac

2012-10-16 Thread Deniz Demir
Take a look at this for mac installation: http://denizdemir.com/2012/01/18/setting-up-hadoop-on-macosx-lion-single-node/ On Oct 16, 2012, at 4:13 AM, suneel hadoop suneel.bigd...@gmail.com wrote: can u please share the link where I can download..so that i will be in correct page..else I

Re: GroupingComparator

2012-10-16 Thread Vinod Kumar Vavilapalli
On Oct 15, 2012, at 12:27 PM, Dave Beech wrote: This only happens in the new mapreduce API - in the older mapred API you get the first key, and it appears to stay the same during the loop. It's sometimes useful behaviour, but it's confusing how the two APIs don't act the same. Yes, it is

HDFS using SAN

2012-10-16 Thread Pamecha, Abhishek
Hi, I have read scattered documentation across the net which mostly says HDFS doesn't go well with a SAN being used to store data, while some say it is an emerging trend. I would love to know if there have been any tests performed which hint at what aspects direct storage excels/falls

Strata pass for Wednesday and Thursday for sale

2012-10-16 Thread Mark Kerzner
Hi, I need to cancel my trip to the conference, and I have a pass for the two days, Wednesday and Thursday, Oct. 24-25. My conference reservation was $876. If anybody is interested, please contact me directly. Thank you. Sincerely, Mark

HDFS federation

2012-10-16 Thread Visioner Sadak
I have a single Linux node on which I installed 0.23.3 in pseudo-distributed mode. Can I test federation configuration and functionality using this, or will I have to install Hadoop in a cluster with at least 3 Linux nodes? In case of my single Linux box, I have 3 IPs with me.

package org.kosmix.kosmosfs.access does not exist

2012-10-16 Thread Nan Zhu
Hi all, When I tried to compile Hadoop 1.0.3, it tells me that src/core/org/apache/hadoop/fs/kfs/KFSImpl.java:30: package org.kosmix.kosmosfs.access does not exist Can anyone tell me why this issue happens? Best, -- Nan Zhu School of Computer Science, McGill University

RE: HDFS using SAN

2012-10-16 Thread Jeffrey Buell
It will be difficult to make a SAN work well for Hadoop, but not impossible. I have done direct comparisons (but not published them yet). Direct local storage is likely to have much more capacity and more total bandwidth. But you can do pretty well with a SAN if you stuff it with the

Re: package org.kosmix.kosmosfs.access does not exist

2012-10-16 Thread Charles Woerner
I've seen this happen when the native kfs libs aren't in your java library path. Add them to both LD_LIBRARY_PATH and -Djava.library.path Sent from my iPhone On Oct 16, 2012, at 1:55 PM, Nan Zhu zhunans...@gmail.com wrote: Hi, all When I tried to compile Hadoop 1.0.3, it tells me that
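Charles's suggestion as a shell sketch. The install prefix is hypothetical; point `KFS_LIB` at wherever the native KFS shared libraries actually live on your machine.

```shell
# Hypothetical prefix: substitute the directory that holds the native
# KFS shared libraries (libkfsClient.so etc.) on your system.
KFS_LIB=/usr/local/kfs/lib
# Prepend it to the dynamic linker's search path.
export LD_LIBRARY_PATH="$KFS_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
# ...and hand the same path to the JVM when launching Hadoop, e.g.:
#   HADOOP_OPTS="-Djava.library.path=$KFS_LIB" bin/hadoop jar myjob.jar ...
echo "$LD_LIBRARY_PATH"
```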

Re: package org.kosmix.kosmosfs.access does not exist

2012-10-16 Thread Nan Zhu
Yes, I fixed this issue manually by downloading the KFS library and copying it to lib/, but immediately fell into another package-not-found issue. Why didn't Maven download those dependent jars to the lib/ directory for me? Several days ago, Maven did this very well... but since yesterday, it refuses to work,

HDFS on SAN

2012-10-16 Thread Pamecha, Abhishek
Hi, not sure if my previous message made it as I just subscribed. I have read scattered documentation across the net which mostly says HDFS doesn't go well with a SAN being used to store data, while some say it is an emerging trend. I would love to know if there have been any tests performed which

Re: HDFS using SAN

2012-10-16 Thread lohit
Adding to this: locality is very important for MapReduce applications. One might not see much of a difference for small MapReduce jobs running on direct-attached storage vs a SAN, but when your cluster grows or you find jobs which are heavy on IO, you would see quite a bit of difference. One thing

Re: Fair scheduler.

2012-10-16 Thread Goldstone, Robin J.
This is similar to issues I ran into with permissions/ownership of mapred.system.dir when using the fair scheduler. We are instructed to set the ownership of mapred.system.dir to mapred:hadoop and then when the job tracker starts up (running as user mapred) it explicitly sets the permissions on

RE: HDFS using SAN

2012-10-16 Thread Pamecha, Abhishek
Yes, for MR, my impression is that typically the network utilization is next to none during map and reduce tasks but jumps during shuffle. With a SAN, I would assume there is no such separation. There will be network activity all over the job's time window, with shuffle probably doing more than what it

Re: one or more file system

2012-10-16 Thread Andy Isaacson
RAID5 is suboptimal for HDFS due to the spindle imbalance issue (among other problems). Read this paper for details: Disks are like Snowflakes: No Two Are Alike www.usenix.org/event/hotos11/tech/final_files/Krevat.pdf For best performance configure your storage as JBOD instead of RAID, format
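The JBOD layout Andy recommends surfaces in HDFS config as one directory per physical disk; the DataNode then round-robins block writes across them, so no spindle waits on another. The mount points below are hypothetical:

```xml
<!-- hdfs-site.xml: one data directory per independently-mounted disk (JBOD).
     /mnt/disk1..3 are hypothetical mount points. -->
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/disk1/dfs/data,/mnt/disk2/dfs/data,/mnt/disk3/dfs/data</value>
</property>
```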

Re: HDFS federation

2012-10-16 Thread lohit
You can try out federation by creating 3 different conf directories and starting 3 different NameNodes out of those configurations. These configurations should make sure they have different directories and port numbers. If you want to just give it a try, it is easier to spawn 3 VMs and use them as
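A rough sketch of what the shared part of such a setup might look like in 0.23; the nameservice IDs and ports here are hypothetical, and each NameNode would additionally get its own dfs.namenode.name.dir in its per-instance conf directory:

```xml
<!-- hdfs-site.xml sketch for single-box federation on 0.23:
     three nameservices, each NameNode on its own port.
     ns1/ns2/ns3 and the ports are placeholder values. -->
<property>
  <name>dfs.federation.nameservices</name>
  <value>ns1,ns2,ns3</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1</name>
  <value>localhost:8021</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns2</name>
  <value>localhost:8022</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns3</name>
  <value>localhost:8023</value>
</property>
```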

Re: Fair scheduler.

2012-10-16 Thread Patai Sangbutsarakum
Thanks everyone. Seems like I hit a dead end. It's kind of funny when I read that JIRA: run it 4 times and everything will work... where's that magic number from, lol. Respects. On Tue, Oct 16, 2012 at 4:12 PM, Arpit Gupta ar...@hortonworks.com wrote:

map-red with many input paths

2012-10-16 Thread Koert Kuipers
Currently I run a map-reduce job that reads from a single path with a glob: /data/*. I am considering replacing this one glob path with an explicit list of all the paths (so that I can check for _SUCCESS files in the subdirs and exclude the subdirs that don't have this file, to avoid reading from
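The _SUCCESS filtering idea can be sketched with plain `java.io.File`; a real Hadoop job would do the same walk with `FileSystem.listStatus()` and pass each surviving path to `FileInputFormat.addInputPath()`. The class and method names below are invented for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Keep only the subdirectories whose job wrote a _SUCCESS marker,
// i.e. outputs that finished cleanly.
public class SuccessFilter {
    static List<File> completedSubdirs(File root) {
        List<File> keep = new ArrayList<>();
        File[] subdirs = root.listFiles(File::isDirectory);
        if (subdirs == null) return keep;  // root missing or not a directory
        Arrays.sort(subdirs);              // deterministic order
        for (File d : subdirs) {
            // _SUCCESS is a zero-length marker written on clean job completion.
            if (new File(d, "_SUCCESS").exists()) keep.add(d);
        }
        return keep;
    }
}
```

Each returned directory would become one explicit input path of the job, replacing the /data/* glob.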

Re: map-red with many input paths

2012-10-16 Thread Lohit
There is no limit on the number of input paths you can have for your job. The more input paths you have, the more time is spent calculating job splits, and hence the higher the startup cost of the job. You could write your own InputFormat which can do the filtering based on your use case. Take a look at

Re: Reg LZO compression

2012-10-16 Thread Robert Dyer
Hi Manoj, If the data is the same for both tests and the number of mappers is fewer, then each mapper has more (uncompressed) data to process. Thus each mapper should take longer and overall execution time should increase. As a simple example: if your data is 128MB uncompressed it may use 2