Re: Hadoop-on-demand and torque

2012-05-20 Thread Stijn De Weirdt
hi all, i'm part of an HPC group at a university, and we have some users who are interested in Hadoop and want to see if it can be useful in their research. We also have researchers who are already using hadoop on their own infrastructure, but that is not enough reason for us to start with

Set number of mappers by the number of input lines for a single file?

2012-05-20 Thread biro lehel
Dear all, I have one single input file, which contains, on every line, some hydrological calibration models (data). Each line of the file should be processed and then the output from every line written to another single output file. I understood that hadoop spawns mapper tasks with the same

Re: Set number of mappers by the number of input lines for a single file?

2012-05-20 Thread Harsh J
Lehel, You may use the NLineInputFormat with N=1: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html On Sun, May 20, 2012 at 2:48 PM, biro lehel lehel.b...@yahoo.com wrote: Dear all, I have one single input file, which contains, on every line,
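For readers skimming the archive, the effect Harsh describes can be illustrated outside Hadoop: NLineInputFormat groups every N input lines into one split, and each split is handed to its own map task, so N=1 yields one mapper per line. A minimal plain-Java sketch of that grouping logic (class and method names are mine, not Hadoop's):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NLineSplitDemo {
    // Mimics NLineInputFormat's grouping: every n consecutive lines form one
    // split; with n = 1, each line becomes its own split (and thus its own mapper).
    static List<List<String>> splitLines(List<String> lines, int n) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            splits.add(new ArrayList<>(lines.subList(i, Math.min(i + n, lines.size()))));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("model-a ...", "model-b ...", "model-c ...");
        List<List<String>> splits = splitLines(input, 1);
        System.out.println(splits.size()); // prints 3: one split (mapper) per line
    }
}
```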

Re: Set number of mappers by the number of input lines for a single file?

2012-05-20 Thread biro lehel
Hello Harsh, Thanks for your answer. The problem is that I'm using version 0.20.2, and, as I checked, NLineInputFormat is not implemented there (at least I couldn't find it). Switching to another version would be kind of a big deal in my infrastructure, since I'm using VMs deployed from

Re: Set number of mappers by the number of input lines for a single file?

2012-05-20 Thread Harsh J
Biro, 0.20.2 did carry NLineInputFormat but in the older/stable (marked deprecated, but was undeprecated subsequently) API package. See http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html which does confirm that 0.20.2 carried it. For 0.20.2, I
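Based on the Javadoc Harsh links, a driver using the old (org.apache.hadoop.mapred) API on 0.20.2 might look like the sketch below. This needs a Hadoop classpath and cluster to actually run; the class name and the input/output arguments are assumptions for illustration.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class OneLinePerMapper {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(OneLinePerMapper.class);
        conf.setInputFormat(NLineInputFormat.class);
        // N = 1: each input line becomes its own split, hence its own map task
        conf.setInt("mapred.line.input.format.linespermap", 1);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
```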

Re: Set number of mappers by the number of input lines for a single file?

2012-05-20 Thread biro lehel
Hello Harsh, In the meantime I figured out what the problem was (it was my bad, an intermixing of the APIs); however, I read somewhere that using it (from the old API) in 0.20.2 can cause problems. So I took NLineInputFormat.java from the 2.0 branch and simply inserted it in my project, and it all went

Re: Hadoop-on-demand and torque

2012-05-20 Thread Pierre Antoine DuBoDeNa
We run a similar infrastructure in a university project. We plan to install Hadoop, and we are looking for alternatives based on Hadoop in case pure Hadoop does not work as expected. Keep us updated on the code release. Best, PA 2012/5/20 Stijn De Weirdt stijn.dewei...@ugent.be hi all, i'm

Re: Hadoop-on-demand and torque

2012-05-20 Thread Ralph Castain
FWIW: Open MPI now has an initial cut at MR+ that runs map-reduce under any HPC environment. We don't have the Java integration yet to support the Hadoop MR class, but you can write a mapper/reducer and execute that programming paradigm. We plan to integrate the Hadoop MR class soon. If you

Re: RemoteException writing files

2012-05-20 Thread Todd McFarland
Thanks for the links. The behavior is as the links describe, but the bottom line is that it works fine if I copy these files on the Linux VMware instance via the command line. Using my Java program remotely, it simply doesn't work. All I can think of is that there is some property on the Java side (in

Re: RemoteException writing files

2012-05-20 Thread Ravi Prakash
Hi Todd, It might be useful to try the CDH user mailing list too. I'm afraid I haven't used CDH, so I'm not entirely certain. The fact that after you run your Java program the NN has created a directory and a 0-byte file means you were able to contact and interact with the NN just fine. I'm
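Context for the symptom Ravi describes: in HDFS, create() only contacts the NameNode (which records the file entry), while the data itself is streamed to DataNodes, so a 0-byte file commonly means the remote client reached the NN but could not reach the DNs (for example, DNs advertising addresses that are unreachable from outside the VM). A minimal remote-write sketch with the old-style API follows; the NameNode address and path are assumptions, and it requires a reachable cluster to run.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RemoteHdfsWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumption: replace with your NameNode's host and port
        conf.set("fs.default.name", "hdfs://namenode-host:8020");
        FileSystem fs = FileSystem.get(conf);
        // The NN creates the file entry here; the bytes go to DataNodes,
        // which the remote client must also be able to reach.
        FSDataOutputStream out = fs.create(new Path("/tmp/remote-test.txt"));
        out.writeBytes("hello from a remote client\n");
        out.close();
        fs.close();
    }
}
```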

Re: Hadoop-on-demand and torque

2012-05-20 Thread Brian Bockelman
Hi Ralph, I admit - I've only been half-following the OpenMPI progress. Do you have a technical write-up of what has been done? Thanks, Brian On May 20, 2012, at 9:31 AM, Ralph Castain wrote: FWIW: Open MPI now has an initial cut at MR+ that runs map-reduce under any HPC environment. We