Re: Scaling Pig Projects - The Hairy Pig

2010-06-24 Thread hc busy
More great ideas, Scott! The one thing about idempotency of IMPORT is that you may not necessarily want it. The scripts that I wrote will indeed take an alias from a previously imported Pig script and overwrite it with an improved version with additional columns. This satisfies the need to be able

Re: Scaling Pig Projects - The Hairy Pig

2010-06-23 Thread Scott Carey
There is one other thing that would be immensely useful, and does not require that much from pig other than the parser: Script inclusion and alias export. Think bash or other shell languages. You want to define a set of aliases for export for other users. This can be stored in a file

RE: Scaling Pig Projects - The Hairy Pig

2010-06-22 Thread Katukuri, Jay
Russell Jurney [mailto:russell.jur...@gmail.com] Sent: Tuesday, June 22, 2010 10:40 AM To: pig-user@hadoop.apache.org Subject: Scaling Pig Projects - The Hairy Pig I'm curious to hear how other people are scaling the code on big Pig projects. Thousands of lines of dataflow code can get pretty

Re: Scaling Pig Projects - The Hairy Pig

2010-06-22 Thread Alan Gates
Here at Yahoo we use Oozie for managing large workflows (latest open source edition at http://github.com/tucu00/oozie1 though they expect to make another drop before the Hadoop summit). There are plans to make Oozie a full open source project (instead of just making drops to github).

Re: Scaling Pig Projects - The Hairy Pig

2010-06-22 Thread Scott Carey
Even without loops and functions, templating would be very useful. Often, the exact same sort of join happens repeatedly with slightly different aliases or columns --- which is basically copy-paste with substitution. I have seen several subtle bugs in Pig scripts because the find/replace was

Re: Scaling Pig Projects - The Hairy Pig

2010-06-22 Thread Alan Gates
On Jun 22, 2010, at 1:06 PM, Dmitriy Ryaboy wrote: I think everyone has some sort of an ad-hoc system for building and managing these types of things. Seems like a prime candidate for some community development -- we would all benefit from sharing a framework like that, and it should be

Re: Scaling Pig Projects - The Hairy Pig

2010-06-22 Thread hc busy
Russ, That is a great wiki page with a lot of insightful discussions!! As a non-Ph.D. I'd like to say that I feel that the theoretic adherence to Turing machines is rather artificial (I mean, who the heck uses a Turing machine (directly) anyways?? What's the point of simulating it? And at what level?

Re: Scaling Pig Projects - The Hairy Pig

2010-06-22 Thread hc busy
Hey, Scott, yeah, that's brilliant! Macro expansion means the script that PIG receives is an expanded script with all aliases defined, so that PIG can perform its optimization. And the technology is old wheel, I'll bet you can take cpp and get it to work on PigLatin. ;-) On Tue, Jun 22, 2010