Seconded for PigUnit.

As for a faster debugging procedure, I've gone modular. First, I JUnit-test
individual UDFs against their functional requirements and use cases up
front. Then I mock up my whiteboard workflow as several logical blocks (one
pig file per block), start pig -x local, and enter each block's aliased
statements one at a time, with a DESCRIBE after each. This confirms that
the syntax, schemas, re-aliasing, etc. are correct, and I can merge the
logical blocks back together for optimization once each block is done.
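
A minimal sketch of one such block, entered statement by statement in the
grunt shell under pig -x local. The input path, field names, and aliases are
hypothetical and chosen only for illustration:

```pig
-- block1.pig (hypothetical): load, group, and count
-- Enter one statement at a time and check the schema before moving on.
raw = LOAD 'input/events.tsv' USING PigStorage('\t')
      AS (user:chararray, url:chararray, ts:long);
DESCRIBE raw;      -- confirm field names and types from the AS clause

grouped = GROUP raw BY user;
DESCRIBE grouped;  -- confirm the group key and the nested bag of raw tuples

counts = FOREACH grouped GENERATE group AS user, COUNT(raw) AS hits;
DESCRIBE counts;   -- confirm the re-aliased fields (user, hits)
```

DESCRIBE only prints the schema and does not launch a job, so each check
should come back quickly.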

Once a block is complete, you can also run ILLUSTRATE on it to spot-check
results. Be forewarned, though: I've had ILLUSTRATE fail prematurely on
larger scripts because of their complexity.
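
A small sketch of that spot-check, again with a hypothetical input path,
fields, and aliases:

```pig
-- Hypothetical block; the path, fields, and aliases are assumptions.
raw     = LOAD 'input/events.tsv' USING PigStorage('\t')
          AS (user:chararray, url:chararray, ts:long);
grouped = GROUP raw BY user;
counts  = FOREACH grouped GENERATE group AS user, COUNT(raw) AS hits;

-- Walks a small sample of the data through every step leading to counts.
ILLUSTRATE counts;
```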

Hope this helps,

-Dan


On Tue, May 20, 2014 at 3:26 PM, Suraj Nayak <snay...@gmail.com> wrote:

> Also, Pig is a data flow language where the statements get compiled into
> Java MapReduce jobs and then run. With Python, it's native, so it runs faster.
> On 21-May-2014 12:52 AM, "Suraj Nayak" <snay...@gmail.com> wrote:
>
> > Why not consider PigUnit? PigUnit gives you the flexibility to test locally.
> > Debugging is also pretty simple, much like JUnit.
> >
> > --
> > Suraj
> > On 21-May-2014 12:47 AM, "Paul Houle" <ontolo...@gmail.com> wrote:
> >
> >> Slow iteration is a problem with Pig.
> >>
> >> I still write MR jobs mainly in Java because (1) I control the
> >> execution plan,  (2) can do things nearly zero-copy,  and (3) I can
> >> get a quick iteration cycle by using JUnit to test mappers,  reducers,
> >>  and other components.
> >>
> >> On Tue, May 20, 2014 at 3:02 PM, Kevin Burton <bur...@spinn3r.com> wrote:
> >> > I've noticed that while working with pig my stress level and frustration
> >> > with the system is higher than other systems I've worked with.
> >> >
> >> > I think it's because the iteration cycle is longer.
> >> >
> >> > Even pig -x local takes a while to execute.
> >> >
> >> > Is this just me?
> >> >
> >> > If you're trying to learn and debug python lists, dictionaries, etc.
> >> > It's almost instant response time.
> >> >
> >> > But with pig literally everything takes 30-60 seconds to play with.
> >> >
> >> > --
> >> >
> >> > Founder/CEO Spinn3r.com
> >> > Location: *San Francisco, CA*
> >> > Skype: *burtonator*
> >> > blog: http://burtonator.wordpress.com
> >> > … or check out my Google+
> >> > profile<https://plus.google.com/102718274791889610666/posts>
> >> > <http://spinn3r.com>
> >> > War is peace. Freedom is slavery. Ignorance is strength. Corporations are
> >> > people.
> >>
> >>
> >>
> >> --
> >> Paul Houle
> >> Expert on Freebase, DBpedia, Hadoop and RDF
> >> (607) 539 6254    paul.houle on Skype   ontolo...@gmail.com
> >>
> >
>
