Re: How can I compile and use my own hadoop?

2010-11-08 Thread Lance Norskog
I write simple driver classes and run them from Eclipse. It takes some fiddling, so if you're not comfortable with that, running it from ant is your best bet. On Mon, Nov 8, 2010 at 9:48 PM, Harsh J wrote: > Hi, > > On Tue, Nov 9, 2010 at 5:12 AM, Shen LI wrote: >> Hi, >> How can I compile and u

Re: Predicting how many values will I see in a call to reduce?

2010-11-08 Thread Lance Norskog
It is key to the scheduling paradigm of Hadoop that it doesn't have to tell you how many or when. It would have to store up all of the data for your key before activating your reducer. This is exactly what it cannot do and scale. (right?) On Mon, Nov 8, 2010 at 3:32 AM, Niels Basjes wrote: > Hi,
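To illustrate the point above: a reducer is handed only an iterator, so the number of values is known only once they have all been consumed. A minimal sketch in plain Java (no Hadoop classes; the names ReduceValueCount and drain are hypothetical, chosen for illustration) that buffers the values so they can be counted and then processed in a second pass:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReduceValueCount {
    // Consumes the iterator and keeps every value, so the caller can
    // read the count (list size) and still make a second pass.
    public static <T> List<T> drain(Iterator<T> values) {
        List<T> buffered = new ArrayList<>();
        while (values.hasNext()) {
            buffered.add(values.next());
        }
        return buffered;
    }

    public static void main(String[] args) {
        // Stand-in for the values iterator a reducer would receive.
        Iterator<Integer> values = Arrays.asList(3, 1, 4).iterator();
        List<Integer> buffered = drain(values);
        System.out.println("count = " + buffered.size());
    }
}
```

In a real reducer this buffering holds every value for the key in memory, which is exactly the scaling cost described above. Hadoop also reuses the value object between iterations, so each value would have to be copied before being stored.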

Re: How can I compile and use my own hadoop?

2010-11-08 Thread Harsh J
Hi, On Tue, Nov 9, 2010 at 5:12 AM, Shen LI wrote: > Hi, > How can I compile and use my own hadoop? I modified some source code of > hadoop-0.20.2. Then, I tried to build it with eclipse according to this > tutorial "http://wiki.apache.org/hadoop/EclipseEnvironment". It was built > successfully.

Re: How can I compile and use my own hadoop?

2010-11-08 Thread Wei Xue
Hi, Shen Li, I'm modifying hadoop core code too. I just use the build.xml distributed along with the source code. It just works. 2010/11/9 Shen LI > Hi, > > How can I compile and use my own hadoop? I modified some source code of > hadoop-0.20.2. Then, I tried to build it with eclipse according

Re: Duplicated entries with map job reading from HBase

2010-11-08 Thread Adam Phelps
Ok, poked around at this a little more with a few experiments. The most interesting one is that I ran a couple of the jobs that generate this data in HBase, one for the existing table I had seen the problem on and one for a new table with the same configuration as the old one. When the ana

How can I compile and use my own hadoop?

2010-11-08 Thread Shen LI
Hi, How can I compile and use my own Hadoop? I modified some source code of hadoop-0.20.2. Then I tried to build it with Eclipse according to this tutorial: "http://wiki.apache.org/hadoop/EclipseEnvironment". It was built successfully. But when I checked the output in ${Hadoop_HOME}/build/eclipse

Re: Job without Output files

2010-11-08 Thread Rajappa Iyer
Inserting multiple times is indeed OK. Each new version of the cell will get a new timestamp. On Mon, Nov 8, 2010 at 4:47 AM, Jeff Zhang wrote: > My guess is that HBase has version on cells, so inserting > multiple-times is OK, not sure my guessing is correct > > > On Mon, Nov 8, 2010 at 8:32 P
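The versioning behaviour described above can be pictured with a toy model. This is a sketch in plain Java, not the HBase client API: the class VersionedCell and its methods are hypothetical, and real HBase also limits how many versions it retains, which this sketch ignores.

```java
import java.util.TreeMap;

public class VersionedCell {
    // Maps timestamp -> value; the highest timestamp is the newest version.
    private final TreeMap<Long, String> versions = new TreeMap<>();

    // Writing again with a later timestamp adds a version instead of failing.
    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // A plain read sees only the newest version, or null if nothing was written.
    public String latest() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    public int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell();
        cell.put(100L, "value"); // first task attempt
        cell.put(200L, "value"); // duplicate write from a second attempt
        System.out.println(cell.latest() + " (" + cell.versionCount() + " versions)");
    }
}
```

Under this model a duplicate write of the same value, for example from a speculative task attempt, leaves the latest read unchanged and only adds an extra stored version.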

Re: Job without Output files

2010-11-08 Thread Shuja Rehman
Hi all, what does speculative execution of tasks mean (if it is turned on)? How do I turn it off, and what are its advantages and disadvantages? I am not using TableOutputFormat because I need to use the put statement millions of times in a single job, and if I use TableOutputFormat then the same j

Configure Ganglia with Hadoop

2010-11-08 Thread Shuja Rehman
Hi, I have a cluster of 4 machines and want to configure Ganglia for monitoring purposes. I have read the wiki and added the following lines to hadoop-metrics.properties on each machine:
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
dfs.period=10
dfs.servers=10.10.10.2:8649
mapred.class=or

Re: Job without Output files

2010-11-08 Thread Jeff Zhang
My guess is that HBase keeps versions of cells, so inserting multiple times is OK, but I am not sure my guess is correct. On Mon, Nov 8, 2010 at 8:32 PM, Harsh J wrote: > Hi Jeff, > > On Mon, Nov 8, 2010 at 3:17 PM, Jeff Zhang wrote: >> Hi Harsh, >> >> you point is interesting, then how hbase (TableOutpu

Re: Job without Output files

2010-11-08 Thread Harsh J
Hi Jeff, On Mon, Nov 8, 2010 at 3:17 PM, Jeff Zhang wrote: > Hi Harsh, > > you point is interesting, then how hbase (TableOutputFormat) handle > speculative execution ? which part of code doing this ? > I was under the impression that they do something to avoid speculative execution induced issu

Re: Predicting how many values will I see in a call to reduce?

2010-11-08 Thread Niels Basjes
Hi, 2010/11/7 Anthony Urso > Is there any way to know how many values I will see in a call to > reduce without first counting through them all with the iterator? > > Under 0.21? 0.20? 0.19? > I looked for an answer to the same question a while ago and came to the conclusion that you can't. T

Re: Job without Output files

2010-11-08 Thread Jeff Zhang
Hi Harsh, your point is interesting. How does HBase (TableOutputFormat) handle speculative execution? Which part of the code does this? On Mon, Nov 8, 2010 at 5:41 PM, Harsh J wrote: > Hi again, > > On Mon, Nov 8, 2010 at 2:19 PM, Shuja Rehman wrote: >> Jeff, >> >> I am using java api to dump t

Re: Job without Output files

2010-11-08 Thread Harsh J
Hi again, On Mon, Nov 8, 2010 at 2:19 PM, Shuja Rehman wrote: > Jeff, > > I am using java api to dump the data into hbase and thats why i did not > require any output. I might be outdated in this regard, but doesn't HBase provide proper Input/OutputFormat classes for using table data via Hadoop

Re: Job without Output files

2010-11-08 Thread Shuja Rehman
Thanks, NullOutputFormat works. On Mon, Nov 8, 2010 at 1:46 PM, Harsh J wrote: > Hi, > > On Mon, Nov 8, 2010 at 1:19 AM, Shuja Rehman > wrote: > > Hi > > > > I have a job where i did not need any reducers. I am using only mappers. > At > > the moment, the output of job is generated in files. Bu

Re: Job without Output files

2010-11-08 Thread Shuja Rehman
Jeff, I am using the Java API to dump the data into HBase, and that's why I did not require any output. Thanks On Mon, Nov 8, 2010 at 6:21 AM, Jeff Zhang wrote: > You can have no output files by creating a customized OutputFormat, > but without output files, how do you get the output of result and >

Re: Job without Output files

2010-11-08 Thread Harsh J
Hi, On Mon, Nov 8, 2010 at 1:19 AM, Shuja Rehman wrote: > Hi > > I have a job where i did not need any reducers. I am using only mappers. At > the moment, the output of job is generated in files. But i want to use only > java api to do some calculation and i want that there should be no output >