More great ideas, Scott!
The one thing about idempotency of IMPORT is that you may not necessarily
want it. The scripts that I wrote will indeed take an alias from a previously
imported Pig script and overwrite it with an improved version with
additional columns. This satisfies the need to be able
There is one other thing that would be immensely useful, and does not require
that much from pig other than the parser:
Script inclusion and alias export.
Think bash or other shell languages. You want to define a set of aliases to
export to other users. This can be stored in a file
From: Russell Jurney [mailto:russell.jur...@gmail.com]
Sent: Tuesday, June 22, 2010 10:40 AM
To: pig-user@hadoop.apache.org
Subject: Scaling Pig Projects - The Hairy Pig
I'm curious to hear how other people are scaling the code on big Pig
projects.
Thousands of lines of dataflow code can get pretty
Here at Yahoo we use Oozie for managing large workflows (latest open
source edition at http://github.com/tucu00/oozie1 though they expect
to make another drop before the Hadoop summit). There are plans to
make Oozie a full open source project (instead of just making drops to
github).
Even without loops and functions, templating would be very useful.
Often, the exact same sort of join happens repeatedly with slightly different
aliases or columns --- which is basically copy-paste with substitution. I have
seen several subtle bugs in Pig scripts because the find/replace was
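
To make the templating idea concrete, here is a minimal sketch in Python using the standard library's string.Template. It is not something from the thread: the template text, the render_join helper, and the alias and column names are all hypothetical. The point is only that a named-placeholder template fails loudly when a substitution is missing, which a manual find/replace does not.

```python
from string import Template

# Hypothetical template for a join that recurs with different aliases and columns.
JOIN_TEMPLATE = Template(
    "$out = JOIN $left BY $left_key, $right BY $right_key;"
)


def render_join(out, left, left_key, right, right_key):
    """Render one Pig JOIN statement from the template.

    Template.substitute() raises KeyError if any placeholder is left
    unfilled, so a forgotten substitution cannot slip through silently.
    """
    return JOIN_TEMPLATE.substitute(
        out=out,
        left=left,
        left_key=left_key,
        right=right,
        right_key=right_key,
    )


if __name__ == "__main__":
    # Two joins of the same shape, differing only in aliases and keys.
    print(render_join("user_clicks", "users", "user_id", "clicks", "uid"))
    print(render_join("user_views", "users", "user_id", "views", "uid"))
```

The rendered statements would then be written into the script that is handed to Pig, so Pig itself needs no changes for this to work.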
On Jun 22, 2010, at 1:06 PM, Dmitriy Ryaboy wrote:
I think everyone has some sort of an ad-hoc system for building and managing
these types of things. Seems like a prime candidate for some community
development -- we would all benefit from sharing a framework like that, and
it should be
Russ, That is a great wiki page with a lot of insightful discussions!!
As a non-Ph.D. I'd like to say that I feel that the theoretical adherence to
Turing machines is rather artificial (I mean, who the heck uses a Turing machine
directly anyway?). What's the point of simulating it? And at what level?
Hey, Scott, yeah, that's brilliant!
Macro expansion means the script that PIG receives is an expanded script with
all aliases defined, so that PIG can perform its optimization.
And the technology is an old wheel; I'll bet you can take cpp and get it to
work on PigLatin.
;-)
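
As a rough illustration of the macro-expansion idea, the sketch below expands simple object-like macros in a script before it would be passed to Pig. It is written in Python rather than using cpp itself, and everything in it is an assumption for illustration: the '#define NAME body' directive syntax, the expand_macros function, and the sample aliases are hypothetical and not part of Pig.

```python
import re

# Matches a hypothetical cpp-style directive: #define NAME body
DEFINE_PATTERN = re.compile(r"\s*#define\s+(\w+)\s+(.*)$")


def expand_macros(source):
    """Expand object-like '#define NAME body' macros in a script.

    Directive lines are removed from the output; every later whole-word
    occurrence of NAME is replaced by its body. Function-like macros,
    #include and conditionals are deliberately not handled.
    """
    macros = {}
    output = []
    for line in source.splitlines():
        match = DEFINE_PATTERN.match(line)
        if match:
            macros[match.group(1)] = match.group(2).strip()
            continue
        for name, body in macros.items():
            # A function replacement avoids backslash interpretation in body.
            line = re.sub(r"\b%s\b" % re.escape(name), lambda m, b=body: b, line)
        output.append(line)
    return "\n".join(output)


if __name__ == "__main__":
    script = "\n".join([
        "#define ACTIVE_USERS FILTER users BY active == 1",
        "active = ACTIVE_USERS;",
        "DUMP active;",
    ])
    print(expand_macros(script))
```

Because the expansion happens on the text before parsing, the script Pig sees contains only ordinary statements with every alias defined, which is the property described above.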
On Tue, Jun 22, 2010