Re: does anyone have idea on how to run multiple sequential jobs with bash script

2008-06-11 Thread Chris K Wensel
However, for continuous production data processing, hadoop+cascading sounds like a good option. This will be especially true with stream assertions and traps (as mentioned previously, and available in trunk). I've written workloads for clients that render down to ~60 unique Hadoop map/r

Re: does anyone have idea on how to run multiple sequential jobs with bash script

2008-06-11 Thread Chris K Wensel
Thanks Ted. A couple of quick comments. At one level Cascading is a MapReduce query planner, just like PIG, except that the API is for public consumption and fully extensible; in PIG you typically interact with the PigLatin syntax. Consequently, with Cascading, you can layer your own syntax on top

RE: does anyone have idea on how to run multiple sequential jobs with bash script

2008-06-11 Thread Haijun Cao
ood option. Haijun -Original Message- From: Ted Dunning [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 11, 2008 5:01 PM To: core-user@hadoop.apache.org Subject: Re: does anyone have idea on how to run multiple sequential jobs with bash script Pig is much more ambitious than cascading.

Re: does anyone have idea on how to run multiple sequential jobs with bash script

2008-06-11 Thread Ted Dunning
Pig is much more ambitious than cascading. Because of the ambitions, simple things got overlooked. For instance, something as simple as computing a file name to load is not possible in pig, nor is it possible to write functions in pig. You can hook to Java functions (for some things), but you ca

Re: does anyone have idea on how to run multiple sequential jobs with bash script

2008-06-11 Thread Arun C Murthy
On Jun 10, 2008, at 2:48 PM, Meng Mao wrote: I'm interested in the same thing -- is there a recommended way to batch Hadoop jobs together? Hadoop Map-Reduce JobControl: http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Job+Control and http://hadoop.apache.org/core/docs/cur

RE: does anyone have idea on how to run multiple sequential jobs with bash script

2008-06-11 Thread Haijun Cao
, June 11, 2008 2:16 PM To: core-user@hadoop.apache.org Subject: Re: does anyone have idea on how to run multiple sequential jobs with bash script Just a quick plug for Cascading. Our team uses cascading quite a bit and found it to be a simpler way to write map reduce jobs. The guys using it find it

Re: does anyone have idea on how to run multiple sequential jobs with bash script

2008-06-11 Thread Ted Dunning
Just a quick plug for Cascading. Our team uses cascading quite a bit and found it to be a simpler way to write map reduce jobs. The guys using it find it very helpful. On Wed, Jun 11, 2008 at 1:31 PM, Chris K Wensel <[EMAIL PROTECTED]> wrote: > > Depending on the nature of your jobs, Cascading

Re: does anyone have idea on how to run multiple sequential jobs with bash script

2008-06-11 Thread Chris K Wensel
Depending on the nature of your jobs, Cascading has a built-in topological scheduler. It will schedule all your work as its dependencies are satisfied, dependencies being source data and inter-job intermediate data. http://www.cascading.org The first catch is that you will still need b
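
To illustrate what "topological" ordering means here, the following is a rough shell sketch. It does not use Cascading at all; it swaps in the standard tsort utility, which only prints one valid serial order and does not run independent jobs concurrently the way a real scheduler could. The job names (ingest, clean, join) are made up for the example. Each input line reads "prerequisite dependent".

```shell
#!/bin/bash
# Hypothetical job names; each pair means "left must finish before right".
order=$(printf '%s\n' \
    "ingest clean" \
    "clean join" \
    "ingest join" | tsort)

# Walk the jobs in an order that respects every dependency.
# A real script would launch the corresponding Hadoop job here.
for job in $order; do
    echo "would run: $job"
done
```

tsort reports an error if the pairs contain a cycle, which is a cheap sanity check before any job is launched.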

Re: does anyone have idea on how to run multiple sequential jobs with bash script

2008-06-10 Thread Ashish Venugopal
I am not totally sure if I understand the problem that you face, but we do the following in version 0.16.4 (where the hod shell is deprecated). a) Use shell scripts to echo commands into a runme.hod script b) An example of a runme.hod script is: hadoop jar /grid/0/hadoop/current/hadoop-streaming.j

Re: does anyone have idea on how to run multiple sequential jobs with bash script

2008-06-10 Thread Miles Osborne
You have another problem in that Hadoop is still initialising --this will cause subsequent jobs to fail. I've not yet migrated to 17.0 (I still use 16.3), but all my jobs are done from nohuped scripts. If you really want to check on the running status and busy wait, you can look at the jobtracker
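
A minimal sketch of running jobs strictly one after another from a script, stopping as soon as one fails so that later jobs do not run against missing input. run_in_sequence is a hypothetical helper name, not part of Hadoop, and the sketch assumes each command blocks until its job has finished (as a driver that calls JobClient.runJob does).

```shell
#!/bin/bash
# run_in_sequence is a hypothetical helper, not a Hadoop command.
# Each argument is one complete command line. Commands run in order,
# and the first non-zero exit status aborts the remaining ones.
run_in_sequence() {
    local cmd
    for cmd in "$@"; do
        echo "starting: $cmd"
        if ! bash -c "$cmd"; then
            echo "failed: $cmd" >&2
            return 1
        fi
    done
}

# Example call (jar names and paths are placeholders):
#   run_in_sequence \
#       "hadoop jar step1.jar input intermediate" \
#       "hadoop jar step2.jar intermediate output"
```

Checking the exit status only helps if the driver program returns non-zero when its job fails; that is an assumption about your own job code.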

Re: does anyone have idea on how to run multiple sequential jobs with bash script

2008-06-10 Thread Edward Capriolo
wait and sleep are not what you are looking for. You can use 'nohup' to run a job in the background and have its output piped to a file. On Tue, Jun 10, 2008 at 5:48 PM, Meng Mao <[EMAIL PROTECTED]> wrote: > I'm interested in the same thing -- is there a recommended way to batch > Hadoop jobs toge
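
A small sketch of the nohup approach, assuming a POSIX-style nohup is available. run_detached and DETACHED_PID are hypothetical names introduced for this example. The command keeps running after the terminal closes, and both stdout and stderr are appended to the given log file.

```shell
#!/bin/bash
# run_detached is a hypothetical helper name.
# Usage: run_detached LOGFILE COMMAND [ARGS...]
# Starts COMMAND under nohup in the background, appending its stdout
# and stderr to LOGFILE. The PID is stored in DETACHED_PID.
run_detached() {
    local log_file="$1"
    shift
    nohup "$@" >>"$log_file" 2>&1 &
    DETACHED_PID=$!
}

# Example call (the script name is a placeholder):
#   run_detached jobs.log ./run_all_jobs.sh
#   tail -f jobs.log
```

Putting the whole sequence of jobs into one script and detaching that script keeps the jobs sequential while still freeing the terminal.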

Re: does anyone have idea on how to run multiple sequential jobs with bash script

2008-06-10 Thread Meng Mao
I'm interested in the same thing -- is there a recommended way to batch Hadoop jobs together? On Tue, Jun 10, 2008 at 5:45 PM, Richard Zhang <[EMAIL PROTECTED]> wrote: > Hello folks: > I am running several hadoop applications on hdfs. To save the efforts in > issuing the set of commands every tim