help: InputFormat problem?

2008-10-26 Thread ZhiHong Fu
Hello: In Hadoop, InputFormats are always based on FileInputFormat, but now I will get data from a web service application. The data will be wrapped as a ResultSet type. Now I am wondering: should I write the ResultSet to a file and then read it out to do the MapReduce job? Or how can I pro
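One way the question above is often answered is to flatten the result rows into a plain text file that TextInputFormat (or a streaming job) can consume directly. The sketch below assumes the ResultSet has already been fetched into an iterable of row dicts; the field names and helper are hypothetical, not from the original thread.

```python
import csv
import io

def rows_to_tsv(rows, fieldnames):
    """Serialize an iterable of row dicts (as a JDBC ResultSet might yield)
    into tab-separated lines, one record per line, so the output file can
    be fed to TextInputFormat or a Hadoop streaming job."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([row[f] for f in fieldnames])
    return out.getvalue()
```

Writing a custom InputFormat that reads the service directly is also possible, but staging to a file keeps the job re-runnable if a task fails.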

[ANNOUNCE] Apache ZooKeeper 3.0.0

2008-10-26 Thread Patrick Hunt
The Apache ZooKeeper team is proud to announce our first official Apache release, version 3.0.0 of ZooKeeper. ZooKeeper is a high-performance coordination service for distributed applications. It exposes common services - such as naming, configuration management, synchronization, and group ser

"local bytes written/read" much higher than the "hdfs bytes"

2008-10-26 Thread 肖欣延
I run a map/reduce job through streaming, and notice that the "local bytes written/read" counters in my job are always many times higher than the "hdfs bytes". But if I run the job directly through Java, this problem goes away. Why does this happen? Is it because JVM memory is not enough and it uses the disk for cach

Re: writable class to be used to read floating point values from input?

2008-10-26 Thread pols cut
Thanks.. I converted the Text --> String --> Float. I am trying to calculate the average of a very large set of numbers. You are right... I plan to use a dummy key (it's not null, as I said before) as input to reduce. Then in reduce, when sorted, I will have a single record which I will use to
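The dummy-key averaging scheme described above can be sketched as plain functions, the way a streaming mapper and reducer would behave. This is an illustrative sketch, not code from the thread; the key name "avg" is an arbitrary choice.

```python
def mapper(lines):
    # Emit every number under a single dummy key so that all
    # values are routed to one reducer.
    for line in lines:
        yield ("avg", float(line.strip()))

def reducer(pairs):
    # Every pair carries the same dummy key, so a single pass
    # over the values computes the mean.
    total = 0.0
    count = 0
    for _key, value in pairs:
        total += value
        count += 1
    return total / count if count else 0.0
```

Note that routing everything through one key serializes the reduce; for very large inputs a combiner emitting (sum, count) pairs would cut the shuffle volume.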

Re: Is there a way to know the input filename at Hadoop Streaming?

2008-10-26 Thread Runping Qi
Each mapper works on only one file split, which is either from file1 or file2 in your case. So the value for map.input.file gives you the exact information you need. Runping On 10/23/08 11:09 AM, "Steve Gao" <[EMAIL PROTECTED]> wrote: > Thanks, Amogh. But my case is slightly different. The
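In streaming, job configuration properties such as map.input.file are exported into the mapper's environment with non-alphanumeric characters replaced by underscores. The mapper below is a hedged sketch of using that variable to tag records by source file; the tagging format is an assumption, not from the thread.

```python
import os

def current_input_file():
    # Hadoop streaming exposes map.input.file to the task's
    # environment as map_input_file.
    return os.environ.get("map_input_file", "")

def mapper(line):
    # Tag each record with the basename of the file it came from,
    # e.g. to tell file1 apart from file2 in the reducer.
    source = os.path.basename(current_input_file())
    return "%s\t%s" % (source, line)
```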

lots of small jobs

2008-10-26 Thread Shirley Cohen
Hi, I have lots of small jobs and would like to compute the aggregate running time of all the mappers and reducers in my job history, rather than tally the numbers by hand through the web interface. I know that the Reporter object can be used to output performance numbers for a single job
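Once the start and finish timestamps of each task attempt have been extracted from the job history logs (by whatever parser you use), aggregating them is straightforward. The helper below is a hypothetical sketch of that last step; it assumes the timestamps are already paired up per attempt.

```python
def aggregate_task_time(task_times):
    """task_times: iterable of (start_ms, finish_ms) pairs, one per map or
    reduce attempt, e.g. scraped from the JobHistory logs of many small
    jobs. Returns total milliseconds spent across all task attempts."""
    return sum(finish - start for start, finish in task_times)
```

Summing per-attempt durations gives aggregate compute time, which is larger than wall-clock time whenever tasks ran in parallel.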

Re: Help: How to change number of mappers in Hadoop streaming?

2008-10-26 Thread Owen O'Malley
On Oct 26, 2008, at 8:38 AM, chaitanya krishna wrote: I forgot to mention that although the number of map tasks is set in the code as I mentioned before, the actual number of map tasks is not necessarily the same number, but is very close to it. The number of reduces is precisely

Re: Help: How to change number of mappers in Hadoop streaming?

2008-10-26 Thread chaitanya krishna
I forgot to mention that although the number of map tasks is set in the code as I mentioned before, the actual number of map tasks is not necessarily the same number, but is very close to it. V.V.Chaitanya Krishna IIIT, Hyderabad, India On Sun, Oct 26, 2008 at 4:29 PM, chaitanya krishna <

Re: Help: How to change number of mappers in Hadoop streaming?

2008-10-26 Thread chaitanya krishna
Hi, in order to have a different number of map tasks for each of the jobs, in the run method of the code I had the following syntax: conf.setNumMapTasks(num); // for number of map tasks conf.setNumReduceTasks(num); // for number of reduce tasks. conf is the JobConf object and num is the number
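The reason setNumMapTasks(num) is only a hint (while the reduce count is exact) is that the map count is derived from input splits. The sketch below is a simplified rendering of the old FileInputFormat split math, ignoring per-file boundaries and the slop factor, to show why the actual map count only approximates the requested one.

```python
def split_size(total_size, requested_maps, block_size, min_size=1):
    # goalSize is what setNumMapTasks() asks for; the framework clamps it
    # between the minimum split size and the HDFS block size, which is why
    # the actual number of maps only approximates the requested number.
    goal = max(total_size // max(requested_maps, 1), 1)
    return max(min_size, min(goal, block_size))

def num_map_tasks(total_size, requested_maps, block_size):
    size = split_size(total_size, requested_maps, block_size)
    return -(-total_size // size)  # ceiling division
```

For example, asking for 3 maps over 1000 bytes yields 334-byte-ish splits and hence 4 actual map tasks: close to, but not exactly, the requested number.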