Re: .processed file

2010-10-28 Thread Corbin Hoenes
This is part of something custom that we do with our own Ruby PigRunner. It allows us to do this: It includes the code from constants.pig into the script defining it. That being said, I am not sold that this pattern is the best way to handle this stuff in Pig. O

.processed file

2010-10-28 Thread Dave Wellman
I have seen a Pig error reported in a .processed file. I have not been able to find any documentation about what a .processed file is. Is it akin to a .substituted file?

Re: Reporting progress in a storage UDF

2010-10-28 Thread Alan Gates
Progress is reported by Pig operators, so in general other operators in your pipeline should be reporting progress so that the store function does not need to. The store function is not passed a reference to the progress reporter, so it would not be able to report progress anyway. The pro

Re: UDF Loader - one line in input result in multiple tuples

2010-10-28 Thread John Hui
Awesome Alan, let me try that out and see if it works. John On Thu, Oct 28, 2010 at 11:49 AM, Alan Gates wrote: > On Oct 28, 2010, at 8:36 AM, John Hui wrote: >> I looked into returning a data bag as an option. The problem is that the Loader interface requires me to return a Tuple object. >>

Re: UDF Loader - one line in input result in multiple tuples

2010-10-28 Thread Alan Gates
On Oct 28, 2010, at 8:36 AM, John Hui wrote: I looked into returning a data bag as an option. The problem is that the Loader interface requires me to return a Tuple object: public Tuple getNext() throws IOException { but the DataBag interface is not a derived class of Tuple, so this means I will

Re: UDF Loader - one line in input result in multiple tuples

2010-10-28 Thread John Hui
If I return a single bag with many tuples, how can I split that into multiple tuples? Can you give me an example of how this works? Let me read up on the InputFormat and see if I can work my way around it. Why can't getNext return a type T instead of being coupled to the Tuple data type? Isn't

Re: UDF Loader - one line in input result in multiple tuples

2010-10-28 Thread Dmitriy Ryaboy
Alan means return a tuple of a single bag of many tuples (don't try to make pig work with a loader that returns a bag instead of a tuple.. you'll be up to your neck in the visitor pattern in no time if you start heading that direction). Alternative is to change what constitutes a record your loade
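The "tuple of a single bag of many tuples" shape described above can be sketched in plain Java. This is only an illustration: it uses `java.util` lists as stand-ins for Pig's Tuple and DataBag rather than the real Pig classes, and the class name, method names, and the input format (records separated by `;`, fields by `,`) are all hypothetical. In a Pig script, the second step would normally be a FOREACH ... GENERATE with FLATTEN applied to the loaded bag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BagOfTuplesSketch {

    // Stand-ins (assumptions, not Pig classes):
    //   tuple = List<Object> of fields, bag = List of tuples.

    /** Turns one input line into ONE outer tuple whose only field is a bag of inner tuples. */
    public static List<Object> lineToTupleOfBag(String line) {
        List<List<Object>> bag = new ArrayList<>();
        // Hypothetical format: records separated by ';', fields by ','.
        for (String record : line.split(";")) {
            if (record.isEmpty()) {
                continue; // skip empty records
            }
            bag.add(new ArrayList<Object>(Arrays.asList(record.split(","))));
        }
        List<Object> outer = new ArrayList<>();
        outer.add(bag); // the outer tuple has exactly one field: the bag
        return outer;
    }

    /** Mimics flattening the bag: one output row per inner tuple. */
    @SuppressWarnings("unchecked")
    public static List<List<Object>> flatten(List<Object> outer) {
        return new ArrayList<>((List<List<Object>>) outer.get(0));
    }

    public static void main(String[] args) {
        List<Object> outer = lineToTupleOfBag("a,1;b,2;c,3");
        for (List<Object> row : flatten(outer)) {
            System.out.println(row);
        }
    }
}
```

The point of the sketch is that the loader still hands back a single tuple per input line, so the `getNext()` contract is untouched, and the fan-out to multiple rows happens in a separate flattening step.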

Re: UDF Loader - one line in input result in multiple tuples

2010-10-28 Thread John Hui
I looked into returning a data bag as an option. The problem is that the Loader interface requires me to return a Tuple object: public Tuple getNext() throws IOException { but the DataBag interface is not a derived class of Tuple, so this means I will need to change the internal code for pig for my load

Re: loading from HBase - Pig 0.7

2010-10-28 Thread Dmitriy Ryaboy
It works with 20.2, and the error trace you pasted appears to be completely independent of HBaseStorage.. I see that you are using the snapshot jar -- try putting your hadoop jars and various dependencies on your classpath, and only using the -nohadoop jar that pig also builds. -D On Thu, Oct 28

Re: loading from HBase - Pig 0.7

2010-10-28 Thread Anze
Does anyone know whether Pig (0.8 - svn trunk) should work with Hadoop 0.20.2? I still can't start Pig... Thanks, Anze On Wednesday 27 October 2010, Anze wrote: > Thanks, I guess I would trip over that later on - but for this immediate > problem it doesn't help (of course, because Pig fails at t