Hi Doug,

thanks a lot for your reply!
It is clear how to create a job instance and configure it using
TableMapReduceUtil.initTableMapperJob.
Actually, our job is working just perfectly; even the third-party libs are
simple to import using TableMapReduceUtil.addDependencyJars.
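For reference, our setup looks roughly like this (the table name, mapper class
and output types below are simplified placeholders, not our real code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "my-mr-job");
job.setJarByClass(MRDriver1.class);

Scan scan = new Scan();
scan.setCaching(500);        // bigger scanner caching for MR scans
scan.setCacheBlocks(false);  // don't fill the region server block cache

// MyMapper (placeholder) extends TableMapper<Text, LongWritable>
TableMapReduceUtil.initTableMapperJob("mytable", scan,
    MyMapper.class, Text.class, LongWritable.class, job);
// ships HBase, ZooKeeper and the other dependency jars with the job
TableMapReduceUtil.addDependencyJars(job);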

The problem is how to start the MR job...

At the moment we do it this way:
 - set HADOOP_CLASSPATH with hbase, zookeeper, and all third party jars
 - execute "./bin/hadoop jar /tmp/map_reduce_v1.jar package1.MRDriver1"

That works like a charm. The question now is: how do we start the job from our
web application running on Tomcat?

One option might be to fork a new process, like this:

ProcessBuilder pb = new ProcessBuilder("/opt/hadoop/bin/hadoop", "jar",
    "/tmp/map_reduce_v1.jar", "package1.MRDriver1");
...
// configure the ProcessBuilder (environment, working directory, etc.)
Process p = pb.start();

but that does not seem very elegant to us... does it?

So how can we start a job from a running application, in the same process,
without forking?

andre



Doug Meil wrote:

Hi there-

Take a look at this for starters...

http://hbase.apache.org/book.html#mapreduce


If you do job.waitForCompletion(true); it will execute synchronously. (The
boolean argument only controls whether progress is printed; to fire and forget,
use job.submit() instead.) A simple pattern is to spin off a thread that
executes job.waitForCompletion(true), and then you can pick up the results.
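
A rough sketch of that pattern (createJob() is just a placeholder for whatever
job setup you already have):

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.mapreduce.Job;

ExecutorService pool = Executors.newSingleThreadExecutor();
Future<Boolean> result = pool.submit(new Callable<Boolean>() {
    public Boolean call() throws Exception {
        Job job = createJob();              // placeholder for your existing setup code
        return job.waitForCompletion(true); // blocks this worker thread only
    }
});
// later (e.g. from a servlet) pick up the outcome:
boolean success = result.get();

That way Tomcat's request thread is never blocked by the running job.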


-----Original Message-----
From: Andre Reiter [mailto:[email protected]]
Sent: Friday, June 24, 2011 12:41 AM
To: [email protected]
Subject: Re: Running MapReduce from a web application

Hi everybody,

no suggestions about these questions?
How do I submit an MR job from within my application, rather than manually from
a shell using ./bin/hadoop jar ...?

best regards
andre



Andre Reiter wrote:
Now I would like to start MR jobs from my web application running on Tomcat.
Is there an elegant way to do it?

The second question: at the moment I use TextOutputFormat as the output format,
which creates a file in the specified DFS directory (part-r-00000), so I can
read it using ./bin/hadoop fs -cat /tmp/requests/part-r-00000 on the shell.

How can I get the path to this output file after my job is finished, so I can
process it further? Or is there another way to collect the results of an MR
job? A text file is good for humans, but IMHO parsing a text file for results
is not the preferable way...
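
One option we see is to read the part files back directly with the FileSystem
API; a rough sketch, assuming the /tmp/requests output path from above:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path outputDir = new Path("/tmp/requests");

// each reducer writes one part-r-NNNNN file; skip _SUCCESS and _logs
FileStatus[] parts = fs.listStatus(outputDir, new PathFilter() {
    public boolean accept(Path p) { return p.getName().startsWith("part-"); }
});
for (FileStatus part : parts) {
    BufferedReader reader = new BufferedReader(
        new InputStreamReader(fs.open(part.getPath())));
    String line;
    while ((line = reader.readLine()) != null) {
        // TextOutputFormat separates key and value with a tab
        String[] kv = line.split("\t", 2);
        // ... process kv[0] (key) and kv[1] (value)
    }
    reader.close();
}

Or would SequenceFileOutputFormat plus SequenceFile.Reader be the better way,
to avoid text parsing altogether?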

thanks in advance
andre

PS:
versions:
- Linux version 2.6.26-2-amd64 (Debian 2.6.26-25lenny1)
- hadoop-0.20.2-CDH3B4
- hbase-0.90.1-CDH3B4
- zookeeper-3.3.2-CDH3B4



