Re: PigStorageSchema and S3 bug

2012-10-12 Thread Cheolsoo Park
Hi Meghana, are you sure you're using Apache Pig version 0.10.0-cdh4.1.0? Because a change was made to PigStorageSchema in Pig 0.10 (https://issues.apache.org/jira/browse/PIG-2143), it should not be possible to get your call stack: at org.apache.pig.piggybank.storage.PigStorageSchema.s

Re: NEED HELP in PigStorage

2012-10-12 Thread Dmitriy Ryaboy
Sounds like however you wrote the data, it has some sort of binary delimiter. Figure out what that delimiter is, and tell PigStorage to use it. For example: my_data = load 'path/to/data' using PigStorage('\\u001'); D On Thu, Oct 11, 2012 at 10:23 AM, yogesh dhari wrote: > Hi All, > How t
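A minimal sketch of the "figure out what that delimiter is" step, assuming you can copy one raw line of the data into a Java string. The class DelimiterProbe and its method are hypothetical helpers, not part of Pig, and the sample record is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: scans one raw line of the data and
// reports every ISO control character other than CR/LF, so an invisible
// byte acting as the field delimiter shows up with its code point.
class DelimiterProbe {
    static List<String> controlChars(String line) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (Character.isISOControl(c) && c != '\n' && c != '\r') {
                // Report the code point in U+XXXX form, e.g. Ctrl-A is U+0001.
                found.add(String.format("U+%04X at index %d", (int) c, i));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample record: three fields separated by Ctrl-A (code point 1).
        String ctrlA = String.valueOf((char) 1);
        String sample = String.join(ctrlA, "alice", "2012-10-11", "42");
        controlChars(sample).forEach(System.out::println);

        // Once the delimiter is known, splitting on it recovers the fields;
        // this is the job PigStorage does with the delimiter you pass it.
        String[] fields = sample.split(Pattern.quote(ctrlA));
        System.out.println(fields.length + " fields");
    }
}
```

Run against the sample, it reports U+0001 at indices 5 and 16 and then splits the line into 3 fields; the reported code point is what you would hand to PigStorage.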

Re: Parallel Join with Pig

2012-10-12 Thread Dmitriy Ryaboy
The default partitioning algorithm is basically this: reducer_id = key.hashCode() % num_reducers. If you are joining on values that all map to the same reducer_id using this function, they will go to the same reducer. But if you have a reasonable hash code distribution and a decent volume of uniqu
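To make the skew argument concrete, here is a small Java sketch of the rule quoted above. The sign-bit mask matches Hadoop's default HashPartitioner; whether Pig routes your particular join through exactly this partitioner is an assumption, and the class name and sample keys are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the rule quoted in the message, reducer_id = key.hashCode() % num_reducers.
// Masking with Integer.MAX_VALUE clears the sign bit so a negative hash code
// still yields a valid index; Hadoop's default HashPartitioner does the same.
class HashPartitionSketch {
    static int reducerFor(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Counts how many keys are routed to each reducer id.
    static Map<Integer, Integer> histogram(List<?> keys, int numReducers) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Object k : keys) {
            counts.merge(reducerFor(k, numReducers), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the value itself, so multiples of 10 all
        // collide on reducer 0 when there are 10 reducers.
        System.out.println("skewed: " + histogram(List.of(0, 10, 20, 30, 40), 10));
        // Ten consecutive keys spread one per reducer.
        System.out.println("spread: " + histogram(List.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), 10));
    }
}
```

The first histogram is {0=5}: every key lands on one reducer. The second puts exactly one key on each of the 10 reducers, which is the "reasonable hash code distribution" case.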

Re: How can I read Hive text files on S3 from Pig?

2012-10-12 Thread Dmitriy Ryaboy
Martin, Do you have the complete stack trace? Generally, for Hive interop I recommend HCatalog; AllLoader is neat but it's a third-party contrib and we don't really know it too well. I can check out the error dump and see if there's anything obvious though. D On Fri, Oct 12, 2012 at 8:48 AM, Martin

Re: Small question

2012-10-12 Thread Abhishek
Thanks Arun, you are right. On Oct 12, 2012, at 11:26 AM, Arun Ahuja wrote: > From my interpretation Hive coalesce returns the first non-null value. > > So it seems you are just doing a null check on x and returning y if it is > null and z otherwise? > > In Pig you could do s

How can I read Hive text files on S3 from Pig?

2012-10-12 Thread Martin Goodson
I am trying to load some text files in Hive partitions on S3 using the AllLoader function, without success. I get an error which indicates that AllLoader is expecting the files to be on HDFS: a = LOAD 's3n://x/y/zzz' using org.apache.pig.piggybank.storage.AllLoader(); grunt> 2012-10-12 14:5

Re: question

2012-10-12 Thread Arun Ahuja
Instead of count = foreach perCust generate group, COUNT(filtered_times.movie); use count = foreach perCust generate FLATTEN(group), COUNT(filtered_times.movie); FLATTEN is a special operator that replaces a tuple with the elements inside the tuple. On Thu, Oct 11, 2012 at 4:36 PM, jamal sasha
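As an illustration of "replaces a tuple with the elements inside the tuple", here is a toy Java model in which a tuple is just a List<Object>. This is not Pig's Tuple API. The row contents, including the assumption that perCust was grouped on two keys, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Pig's API: a "tuple" is a List<Object>. flattenField replaces
// the nested tuple at position idx with its elements, mirroring the message's
// description of FLATTEN applied to a tuple-valued field such as group.
class FlattenSketch {
    static List<Object> flattenField(List<Object> row, int idx) {
        List<Object> out = new ArrayList<>(row.subList(0, idx));
        Object field = row.get(idx);
        if (field instanceof List) {
            out.addAll((List<?>) field); // splice the inner fields in place
        } else {
            out.add(field); // in this model a non-tuple field passes through unchanged
        }
        out.addAll(row.subList(idx + 1, row.size()));
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical output row of the first statement: (group, count),
        // where group is the tuple (customer id, day) from a two-key GROUP BY.
        List<Object> row = List.of(List.of("cust42", "2012-10-11"), 3L);
        System.out.println(row);                  // nested
        System.out.println(flattenField(row, 0)); // flat
    }
}
```

The nested row prints as [[cust42, 2012-10-11], 3] and the flattened one as [cust42, 2012-10-11, 3], so the grouping keys become top-level columns next to the count.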

Re: Small question

2012-10-12 Thread Arun Ahuja
From my interpretation, Hive coalesce returns the first non-null value. So it seems you are just doing a null check on x and returning y if it is null and z otherwise? In Pig you could do something like: (x is null ? y : z). This is a standard ternary if/else. Don't see if the 0.00 actually plays
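To compare the original Hive expression with the null-only ternary, here is a toy Java model. Double stands in for a nullable numeric column and the strings "y"/"z" stand in for the two result columns. Both are assumptions, and the class name is hypothetical.

```java
// Toy model of the Hive expression from the question, with Double standing
// in for a nullable numeric column and String for the y/z results.
class CoalesceSketch {
    // COALESCE(a, b): the first non-null argument.
    static Double coalesce(Double a, Double b) {
        return a != null ? a : b;
    }

    // CASE WHEN COALESCE(x, 0.00) = 0.00 THEN y ELSE z
    static String hiveCase(Double x, String y, String z) {
        return coalesce(x, 0.00) == 0.00 ? y : z;
    }

    // The null-only check (x is null ? y : z) from the reply.
    static String nullOnly(Double x, String y, String z) {
        return x == null ? y : z;
    }

    public static void main(String[] args) {
        Double[] inputs = {null, 0.00, 5.0};
        for (Double x : inputs) {
            System.out.println(x + " -> hiveCase=" + hiveCase(x, "y", "z")
                    + ", nullOnly=" + nullOnly(x, "y", "z"));
        }
    }
}
```

The two versions agree for null and for 5.0 but differ when x is exactly 0.00: the Hive CASE picks y there, while the null-only check picks z. A faithful Pig translation may therefore need to test for both conditions rather than for null alone.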

Small question

2012-10-12 Thread Abhishek
Hi all, How do I translate a Hive coalesce statement into Pig? Example: Case when Coalesce(x,0.00)=0.00 then y else z. How do I write this in Pig? Regards Abhi