Re: hadoop-0.20 in Eclipse

2009-12-08 Thread Hamza Kaya
Hi, the following screencast may help you: http://www.cloudera.com/blog/2009/04/20/configuring-eclipse-for-hadoop-development-a-screencast/

RE: some current features in hadoop

2009-12-08 Thread Krishna Kumar
Hi Todd, Thanks for the reply. Can you please tell me about some more aspects which are currently going on hadoop development from where I can contribute something, some to do type of things. Thanks and Best Regards, Krishna Kumar Senior Storage Engineer Why do we have to die? If we had to die,

Re: multiple file input

2009-12-08 Thread laser08150815
pmg wrote: > > I am evaluating hadoop for a problem that does a Cartesian product of input > from one file of 600K (File A) with another set of files (FileB1, > FileB2, FileB3) with 2 million lines in total. > > Each line from FileA gets compared with every line from FileB1, FileB2 > etc. etc.

Re: multiple file input

2009-12-08 Thread Ed Kohlwey
One important thing to note is that, with cross products, you'll almost always get better performance if you can fit both files on a single node's disk rather than distributing the files. On Tue, Dec 8, 2009 at 9:18 AM, laser08150815 wrote: > > > pmg wrote: > > > > I am evaluating hadoop for a p

Re: multiple file input

2009-12-08 Thread Gang Luo
To do the cartesian product, every node has to see at least one table completely. So my suggestion is to make input2 the input to the mapper and, in each map task, read the whole of fileA (input1) manually using the HDFS API, since it is smaller, and build a hash table on fileA. For each line from input2,
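The approach described in the message above (keep the smaller file fully in memory inside each task and stream the larger input past it) can be sketched in plain Java, without any Hadoop classes. This is only an illustration under assumptions: the class name `CrossProductSketch`, the method `crossCompare`, and the first-letter comparison are hypothetical and do not come from the thread. In a real job the cached side would be read from HDFS when the task starts, and the streamed side would be the records handed to the mapper. The sketch uses a list rather than a hash table because a full cross comparison visits every pair anyway; a hash table only helps when the comparison is an equality test on some key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

public class CrossProductSketch {

    /**
     * Hypothetical helper: compares every streamed line against every cached line
     * and returns the matching pairs as tab-separated strings.
     *
     * cachedSmallSide stands in for fileA, loaded completely into memory.
     * streamedBigSide stands in for the lines of FileB1, FileB2, ... seen one at a time.
     */
    public static List<String> crossCompare(List<String> cachedSmallSide,
                                            Iterable<String> streamedBigSide,
                                            BiPredicate<String, String> matches) {
        List<String> out = new ArrayList<>();
        for (String b : streamedBigSide) {        // one pass over the large input
            for (String a : cachedSmallSide) {    // full scan of the small, in-memory side
                if (matches.test(a, b)) {
                    out.add(a + "\t" + b);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tiny stand-ins for the two inputs; the real files are far larger.
        List<String> fileA = Arrays.asList("apple", "banana");
        List<String> fileB = Arrays.asList("apricot", "blueberry", "avocado");

        // Assumed comparison for illustration only: lines match if they share a first letter.
        List<String> pairs = crossCompare(fileA, fileB,
                (a, b) -> a.charAt(0) == b.charAt(0));

        for (String pair : pairs) {
            System.out.println(pair);
        }
    }
}
```

The work per streamed line is proportional to the size of the cached side, so this only makes sense when the smaller file fits comfortably in each task's memory.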

Re: LeaseExpiredException Exception

2009-12-08 Thread Ken Krugler
Hi all, In searching the mail/web archives, I occasionally see questions from people (like me) who run into the LeaseExpiredException (in my case, on 0.18.3 while running a 50 server cluster in EMR). Unfortunately I don't see any responses, other than Dennis Kubes saying that he thought s

Re: some current features in hadoop

2009-12-08 Thread Todd Lipcon
On Tue, Dec 8, 2009 at 1:54 AM, Krishna Kumar wrote: > Hi Todd, > > Thanks for the reply. Can you please tell me about some more aspects > which are currently going on hadoop development from where I can > contribute something, some to do type of things. > > Hi Krishna, Simply check the JIRA: htt

Re: LeaseExpiredException Exception

2009-12-08 Thread Jason Venner
Is it possible that this is occurring in a task that is being killed by the framework? Sometimes there is a little lag between the time the tracker 'kills a task' and the time the task fully dies. You could be getting into a situation like that, where the task is in the process of dying but the last write i

Re: some current features in hadoop

2009-12-08 Thread Owen O'Malley
On Dec 8, 2009, at 1:54 AM, Krishna Kumar wrote: Thanks for the reply. Can you please tell me about some more aspects which are currently going on hadoop development from where I can contribute something, some to do type of things. I'd suggest trying out the framework and writing some example

Re: LeaseExpiredException Exception

2009-12-08 Thread Ken Krugler
Hi Jason, Thanks for the info - it's good to hear from somebody else who's run into this :) I tried again with a bigger box for the master, and wound up with the same results. I guess the framework could be killing it - but no idea why. This is during a very simple "write out

Re: hadoop idle time on terasort

2009-12-08 Thread Vasilis Liaskovitis
Hi Scott, thanks for the extra tips, these are very helpful. On Mon, Dec 7, 2009 at 3:57 PM, Scott Carey wrote: > >> >> I am using hadoop-0.20.1 to run terasort and randsort benchmarking >> tests on a small 8-node linux cluster. Most runs consist of usually >> low (<50%) core utilizations in the

Hadoop Pipes with distributed cache

2009-12-08 Thread Upendra Dadi
Hi, I am facing some problems using a distributed cache archive with a Pipes job. In my configuration file I have the following two properties: mapred.create.symlink = yes and mapred.cache.archives = hdfs://localhost:9000/user/upendra/archive/pipeArchive.zip#pipeSym. The zip archive contains

Re: writing files to HDFS (from c++/pipes)

2009-12-08 Thread Prakhar Sharma
Hi Owen, "It also provides the entire job configuration as a string->string map." Can you provide an example of how to do this? I am trying to write a DNA sequence assembler using Hadoop MapReduce to improve the throughput of the assembler. I have to call runTask() repeatedly with different s

error=12, Cannot allocate memory (-;

2009-12-08 Thread pavel kolodin
I have a situation: --- 09/12/09 01:53:37 INFO mapred.FileInputFormat: Total input paths to process : 8 09/12/09 01:53:37 INFO mapred.JobClient: Running job: job_200912090152_0001 09/12/09 01:53:38 INFO mapred.JobClient: map 0% reduce 0% 09/12/09 01:53:54 INFO mapred.JobC

Namenode / Data node disk requirements

2009-12-08 Thread John Martyniak
Do the namenode and data node have the same disk requirements? For example, my slave machines have a 1 TB partition called /hdfs and a 200 GB partition called /mapreduce for the obvious tasks. But on the namenode/jobtracker machine I don't have that; I just have a RAIDed pair o

Error while building project(common) with ant

2009-12-08 Thread Eason.Lee
The error shows that something is wrong with package-info.java: compile-core-classes: [javac] Compiling 346 source files to E:\projects\HadoopCommon\build\classes [javac] E:\projects\HadoopCommon\build\src\org\apache\hadoop\package-info.java:5: unclosed string literal [javac] us