Re: LOAD function vs. UDF eval

2012-05-29 Thread Norbert Burger
Thanks, Raghu. Maybe another benefit of the UDF route is that it could support the accumulator interface. Since both approaches would use the HBase client API directly, there's no Pig-specific benefit to using a loader, right? Norbert On Tue, May 29, 2012 at 8:37 PM, Raghu Angadi wrote: > I w

Re: LOAD function vs. UDF eval

2012-05-29 Thread Raghu Angadi
I would still use a UDF, it is a lot more flexible. Passing a large number of ids to the loader is part of the problem. Your UDF would take a bag of ids and return bag{(session, events:bag{})}. You can pass the bag of ids in various ways: - load ids as a relation, group all to put all of them in

Re: Diagnostic Operators inside Macros

2012-05-29 Thread Jonathan Coveney
There is a GSoC project to move Grunt into ANTLR, which may make it possible (if it is desirable) to move more of these commands into macros. 2012/5/29 Alan Gates > It's not an intended feature, but it is a side effect of the way macros > are implemented. Pig actually has a couple of parsers in it. One

Re: Issue while running Pig in Hadoop mode

2012-05-29 Thread Prashant Kommireddi
Hi Nikhil, Can you paste your script here or on pastebin? The warning message says you are trying to access a field that does not exist. An easy way to debug would be to make sure you have records flowing out of each Pig statement. You can use the LIMIT operator to dump 10 records or so and troubleshoot

Issue while running Pig in Hadoop mode

2012-05-29 Thread nikhil desai
Hello, I am trying to run Pig in Hadoop mode with 2 clusters. I have installed Hadoop 1.0.3 and Pig 0.10. When I run Pig statements like "foreach" or if I use "MAX or AVG" I get the following error: WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered

Re: Diagnostic Operators inside Macros

2012-05-29 Thread Alan Gates
It's not an intended feature, but it is a side effect of the way macros are implemented. Pig actually has a couple of parsers in it. One parses Pig Latin, the other is used by Grunt, the shell. Grunt does not know Pig Latin, but it knows to pass it on to the Pig Latin parser. Pig Latin knows

Re: Losing ordering after using ORDER BY

2012-05-29 Thread Jonathan Coveney
If you do a grouping, the ordering changes. What you want to do is: D = FOREACH C GENERATE COUNT($1) as countd; D1 = GROUP D ALL; D2 = FOREACH D1 { ord = ORDER $1 BY $0 desc; GENERATE MyCustomEvalFunc(ord); } Keep in mind that you'll be ordering all of your data on one reducer, but this isn't

Losing ordering after using ORDER BY

2012-05-29 Thread James Newhaven
Hi, I've noticed that I seem to be losing the ordering of my relation after passing the result of an ORDER BY to an EVAL function. For example: D = FOREACH C GENERATE COUNT($1) as countd; E = ORDER D BY $0 DESC; D1 = GROUP E ALL; D2 = FOREACH D1 GENERATE MyCustomEvalFunc($1); When inspecting th

Re: Verifying unordered output with PigUnit

2012-05-29 Thread Jonathan Coveney
Generally, sorting is the way to go. It's going to be difficult to get around doing some sort of processing in order to make it easier to evaluate equality. If you want something generally O(n) instead of O(n log n), you could calculate the hashCode for every tuple then SUM it (which is algebraic)
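The hash-and-sum idea above can be sketched outside Pig. The following is a minimal Python illustration, not Pig or PigUnit code: the function names `order_insensitive_digest` and `same_multiset` are hypothetical, and Python's built-in `hash` stands in for the tuple `hashCode`. Because addition is commutative, the summed digest does not depend on tuple order. Equal digests do not prove equal contents (hash collisions are possible), so the sketch also shows an exact multiset comparison for cases where the data fits in memory.

```python
from collections import Counter

def order_insensitive_digest(tuples):
    """Sum the per-tuple hashes in one O(n) pass.

    Addition is commutative, so the result is the same for any ordering
    of the same tuples. Collisions are possible, so a matching digest is
    strong evidence of equality, not proof.
    """
    mask = (1 << 64) - 1  # keep the digest in a fixed 64-bit range
    return sum(hash(t) for t in tuples) & mask

def same_multiset(expected, actual):
    """Exact order-insensitive comparison; needs both sides in memory."""
    return Counter(expected) == Counter(actual)

# Hypothetical expected and actual outputs, differing only in order.
expected = [("a", 1), ("b", 2), ("c", 3)]
actual = [("c", 3), ("a", 1), ("b", 2)]

digests_match = (
    len(expected) == len(actual)
    and order_insensitive_digest(expected) == order_insensitive_digest(actual)
)
```

Comparing the tuple count alongside the digest is cheap and rules out some accidental matches. Note that Python randomizes string hashes per process, so digests are only comparable within one run.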

Re: Syntax highlighting in kate

2012-05-29 Thread Jonathan Coveney
You should throw that on github, and then we could put it on https://cwiki.apache.org/confluence/display/PIG/PigTools 2012/5/29 Johannes Schwenk > For those who are writing pig scripts in kate, I have written a basic > syntax highlighting file which can be found here: > > http://pastebin.com/dFR

LOAD function vs. UDF eval

2012-05-29 Thread Norbert Burger
We're analyzing session(s) using Pig and HBase, and this session data is currently stored in a single HBase table, where rowkey is a sessionid-eventid combo (tall table). I'm trying to optimize the "extract-all-events-for-a-given-session" step of our workflow. This could be a simple JOIN. But th

Syntax highlighting in kate

2012-05-29 Thread Johannes Schwenk
For those who are writing pig scripts in kate, I have written a basic syntax highlighting file which can be found here: http://pastebin.com/dFR71BVx Installation: # mkdir ~/.kde/share/apps/katepart/syntax/ # cp pig.xml ~/.kde/share/apps/katepart/syntax/ Have fun, Johannes Schwenk -- Softwaree

Verifying unordered output with PigUnit

2012-05-29 Thread Johannes Schwenk
Hello all, I'd like to verify output from a pig script that does not sort its results prior to output. Thus the order of the tuples in the output is non-deterministic. I would rather not add sorting to my script, because I am potentially dealing with a lot of data here. As I have found PigLatin do