Depending on the nature of your jobs, Cascading may help: it has a built-in topological scheduler that runs each unit of work as its dependencies are satisfied, the dependencies being source data and inter-job intermediate data.
http://www.cascading.org
The first catch is that you will still need
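To make that concrete, here is a minimal sketch of two flows wired into a cascade. This assumes the Cascading 1.x-style API, and the taps, paths, and pipe names are invented for illustration:

import java.util.Properties;

import cascading.cascade.Cascade;
import cascading.cascade.CascadeConnector;
import cascading.flow.Flow;
import cascading.flow.FlowConnector;
import cascading.pipe.Pipe;
import cascading.scheme.TextLine;
import cascading.tap.Hfs;
import cascading.tap.Tap;

public class CascadeExample {
  public static void main(String[] args) {
    FlowConnector connector = new FlowConnector(new Properties());

    // first flow: reads the raw source data, writes an intermediate file
    Tap rawSource = new Hfs(new TextLine(), "input/raw");
    Tap intermediate = new Hfs(new TextLine(), "output/intermediate");
    Flow first = connector.connect("first", rawSource, intermediate, new Pipe("first"));

    // second flow: consumes the first flow's sink as its source
    Tap finalSink = new Hfs(new TextLine(), "output/final");
    Flow second = connector.connect("second", intermediate, finalSink, new Pipe("second"));

    // the CascadeConnector links flows by matching sinks to sources,
    // so 'second' will not start until 'first' has written its data
    Cascade cascade = new CascadeConnector().connect(first, second);
    cascade.complete();
  }
}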
Just a quick plug for Cascading. Our team uses Cascading quite a bit and has found it to be a simpler way to write MapReduce jobs. The people using it find it very helpful.
On Wed, Jun 11, 2008 at 1:31 PM, Chris K Wensel [EMAIL PROTECTED] wrote:
Depending on the nature of your jobs, Cascading may help: it has a built-in topological scheduler...
Ted,
I find Cascading very similar to Pig; would you care to share your comments here? If MapReduce programmers are to move up to the next level (a scripting/query language), which way should they go?
Thanks
Haijun
-----Original Message-----
From: Ted Dunning [mailto:[EMAIL PROTECTED]]
Sent: Wednesday,
On Jun 10, 2008, at 2:48 PM, Meng Mao wrote:
I'm interested in the same thing -- is there a recommended way to batch Hadoop jobs together?
Hadoop Map-Reduce JobControl:
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Job+Control
and
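For reference, a minimal sketch of the JobControl approach linked above, assuming the old org.apache.hadoop.mapred API; conf1 and conf2 stand in for fully configured JobConfs:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class ChainedJobs {
  public static void runChain(JobConf conf1, JobConf conf2) throws Exception {
    Job first = new Job(conf1);
    Job second = new Job(conf2);
    second.addDependingJob(first); // 'second' is held back until 'first' succeeds

    JobControl control = new JobControl("chain");
    control.addJob(first);
    control.addJob(second);

    // JobControl is a Runnable; it submits jobs as their dependencies complete
    new Thread(control).start();
    while (!control.allFinished()) {
      Thread.sleep(5000);
    }
    control.stop();
  }
}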
Pig is much more ambitious than Cascading. Because of those ambitions, simple things got overlooked. For instance, something as simple as computing a file name to load is not possible in Pig, nor is it possible to write functions in Pig itself. You can hook to Java functions (for some things), but you
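For what it's worth, "hooking to Java functions" in Pig means writing a UDF. A bare-bones sketch (the class name and behavior are made up for illustration):

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// a trivial Pig UDF that upper-cases its first argument
public class Upper extends EvalFunc<String> {
  public String exec(Tuple input) throws IOException {
    if (input == null || input.size() == 0 || input.get(0) == null)
      return null;
    return ((String) input.get(0)).toUpperCase();
  }
}

In a Pig script you would REGISTER the jar containing it and call it by class name.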
Thanks for sharing. We need to expose a Hadoop cluster to 'casual' users for ad-hoc queries, and it is difficult to ask them to write MapReduce programs, so Pig Latin comes in very handy in that case. However, for continuous production data processing, Hadoop + Cascading sounds like a good option.
Thanks, Ted.
A couple of quick comments.
At one level Cascading is a MapReduce query planner, just like Pig. The difference is that its API is meant for public consumption and is fully extensible, whereas with Pig you typically interact through the PigLatin syntax. Consequently, with Cascading you can layer your own syntax on top.
However, for continuous production data processing, Hadoop + Cascading sounds like a good option.
This will be especially true with stream assertions and traps (as mentioned previously, and available in trunk). <grin>
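For the curious, a stream assertion looks roughly like this. A sketch assuming the trunk/1.x API; the surrounding pipe assembly is omitted:

import cascading.operation.AssertionLevel;
import cascading.operation.assertion.AssertNotNull;
import cascading.pipe.Each;
import cascading.pipe.Pipe;

public class AssertionExample {
  public static Pipe withAssertion(Pipe pipe) {
    // at STRICT level the assertion fails the flow on any null value;
    // the planner can strip assertions out entirely for production runs
    return new Each(pipe, AssertionLevel.STRICT, new AssertNotNull());
  }
}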
I've written workloads for clients that render down to ~60 unique Hadoop jobs.
Hello folks:
I am running several Hadoop applications on HDFS. To save the effort of issuing the set of commands every time, I am trying to use a bash script to run the applications sequentially. To let each job finish before proceeding to the next one, I am using 'wait' in the script.
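As an aside, if the goal is just to run jobs strictly one after another, a single Java driver can do it without any shell tricks, because JobClient.runJob blocks until the job completes. A sketch, with the actual job configuration elided:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SequentialDriver {
  public static void main(String[] args) throws Exception {
    JobConf first = new JobConf(SequentialDriver.class);
    // ... set input/output paths, mapper, reducer for the first job ...
    JobClient.runJob(first); // blocks until the first job finishes

    JobConf second = new JobConf(SequentialDriver.class);
    // ... configure the second job, typically over the first job's output ...
    JobClient.runJob(second); // only starts once the first has completed
  }
}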
I'm interested in the same thing -- is there a recommended way to batch Hadoop jobs together?
On Tue, Jun 10, 2008 at 5:45 PM, Richard Zhang [EMAIL PROTECTED] wrote:
Hello folks:
I am running several Hadoop applications on HDFS. To save the effort of issuing the set of commands every time, I
'wait' and 'sleep' are not what you are looking for. You can use 'nohup' to run a job in the background and have its output redirected to a file.
On Tue, Jun 10, 2008 at 5:48 PM, Meng Mao [EMAIL PROTECTED] wrote:
I'm interested in the same thing -- is there a recommended way to batch Hadoop jobs together?
You have another problem in that Hadoop is still initialising -- this will cause subsequent jobs to fail.
I've not yet migrated to 17.0 (I still use 16.3), but all my jobs are run from nohup'ed scripts. If you really want to check on the running status and busy-wait, you can look at the
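If you do want to busy-wait on a running job programmatically, the RunningJob handle returned by JobClient.submitJob can be polled. This is a guess at what was meant, not the author's code:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class PollingSubmit {
  public static boolean submitAndWait(JobConf conf) throws Exception {
    JobClient client = new JobClient(conf);
    RunningJob job = client.submitJob(conf); // returns immediately
    while (!job.isComplete()) {              // poll the job's status
      Thread.sleep(5000);
    }
    return job.isSuccessful();
  }
}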