RE: More on issue with local vs mapreduce mode

2013-11-06 Thread Serega Sheypak
Show the code of your func. How did you define the input and output schema? On 07.11.2013 at 2:09, "Sameer Tilak" wrote: > Dear Serega, > I am now using log4j for debugging my UDF. Here is what I found out. For > some reason in mapreduce mode my exec function does not get called. The > log

Re: Bag of tuples

2013-11-06 Thread Pradeep Gollakota
Each element in A is not a Bag. A relation is a collection of tuples (just like a bag). So each element in A is a tuple whose first element is a Bag. If you want to order the tuples by id, you have to extract them from the bag first. A = LOAD 'data' ...; B = FOREACH A GENERATE FLATTEN($0); C = OR
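
A plain-Python model of the point above may help. It is not Pig code and not the poster's script. It assumes a relation can be pictured as a list of tuples and a bag as a nested list of tuples. The sample values and the helper name flatten_first_field are made up for illustration.

```python
# Plain-Python model of "flatten the bag, then order by id". Not Pig code.
# A relation is modeled as a list of tuples; a bag is a list of tuples too.
# The sample values below are invented for illustration.
relation_a = [
    ([("id3", "x"), ("id1", "y")],),  # one tuple whose first field is a bag
    ([("id2", "z")],),                # another tuple, again holding a bag
]


def flatten_first_field(relation):
    """Emit one output tuple per tuple found in each record's first-field bag."""
    return [inner for record in relation for inner in record[0]]


relation_b = flatten_first_field(relation_a)          # bags are gone, tuples remain
relation_c = sorted(relation_b, key=lambda t: t[0])   # order the tuples by id
print(relation_c)
```

After flattening, the tuples from every bag sit in one relation, so the ordering is global rather than per input record.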

RE: Bag of tuples

2013-11-06 Thread Sameer Tilak
Hi Alan, Thanks for your reply. I am trying to understand how Pig processes these relations. As I mentioned, my UDF returns the result in the following format; {(id1,x,y,z), (id2, a, b, c), (id3,x,a)} /* User 1 info */ {(id10,x,y,z), (id9, a, b, c), (id1,x,a)} /* User 2 info */ {(id8,x,y,z)

RE: More on issue with local vs mapreduce mode

2013-11-06 Thread Sameer Tilak
Dear Serega, I am now using log4j for debugging my UDF. Here is what I found out. For some reason in mapreduce mode my exec function does not get called. The log message in the constructor gets printed onto the console. However, the log message in the exec function does not get printed to the

Re: Bag of tuples

2013-11-06 Thread Alan Gates
Do you mean you want to find the top 5 per input record? Also, what are your ordering criteria? Just sort by id? Something like this should order all tuples in each bag by id and then produce the top 5. My syntax may be a little off as I'm working offline and don't have the manual in front of
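
The per-record variant described above (order the tuples inside each bag by id, then keep the first few) can be pictured with a small Python model. This is only a sketch of the semantics, not the Pig script from the message, which is cut off here. The function name top_n_per_bag and the sample data are assumptions for illustration.

```python
# Plain-Python model of "sort each bag by id, keep the top n". Not Pig code.
def top_n_per_bag(relation, n=5):
    """For each record, sort the bag in its first field by id and keep n tuples."""
    return [(sorted(record[0], key=lambda t: t[0])[:n],) for record in relation]


# Invented sample: two input records, each holding one bag of (id, ...) tuples.
users = [
    ([("id3", "x", "a"), ("id1", "x", "y", "z"), ("id2", "a", "b", "c")],),
    ([("id8", "x", "y", "z")],),
]

# n=2 keeps the demo output short; the thread asks for the top 5.
result = top_n_per_bag(users, n=2)
print(result)
```

Unlike a global flatten, each input record keeps its own bag here. Note that ids compared as strings sort lexicographically, so "id10" comes before "id9".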

Re: More on issue with local vs mapreduce mode

2013-11-06 Thread Serega Sheypak
You get 4 empty tuples. Maybe your UDF parser.customFilter(key,'A') works differently? Maybe you are using an old version? You can add a print statement to the UDF and see what it accepts and what it produces. 2013/11/6 Sameer Tilak > Dear Serega, > > When I run the script in local mode, I get co

RE: More on issue with local vs mapreduce mode

2013-11-06 Thread Sameer Tilak
Dear Serega, When I run the script in local mode, I get correct o/p stored in AU/part-m-000 file. However, when I run it in the mapreduce mode (with i/p and o/p from HDFS), the file /scratch/AU/part-m-000 is of size 4 and there is nothing in it. I am not sure whether AU relation somehow does n

RE: Need example of python code with dependency files

2013-11-06 Thread william.dowling
You said "The .py code takes input from sys.stdin and outputs to sys.stdout" so I infer you are talking about streaming, not a python UDF. In that case, rather than streaming through your python script P.py, instead stream through a shell script S.sh. The shell script can untar shipped or cached

Re: Pig Distributed Cache

2013-11-06 Thread burakkk
No, as I said before, doing the cross product is my workaround. I am trying to do it with a replicated join instead. I'll share the results soon. Thanks. Best regards... On Tue, Nov 5, 2013 at 9:50 PM, Pradeep Gollakota wrote: > I see... do you have to do a full cross product or are you able to do a >
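
For readers weighing the two approaches mentioned above, here is a rough Python model of what a replicated join does: the small relation is held in memory as a lookup table and the large relation is streamed past it. This is a sketch of the general idea only, not the poster's script. The function name replicated_join, the key positions, and the sample rows are assumptions for illustration.

```python
# Plain-Python model of a replicated (map-side) join. Not Pig code.
from collections import defaultdict


def replicated_join(big, small, big_key=0, small_key=0):
    """Join two lists of tuples by holding `small` in memory as a hash table."""
    lookup = defaultdict(list)
    for row in small:                      # load the small relation once
        lookup[row[small_key]].append(row)
    for row in big:                        # stream the large relation
        for match in lookup.get(row[big_key], []):
            yield row + match              # emit the concatenated tuple


# Invented sample rows.
big = [(1, "a"), (2, "b"), (3, "c")]
small = [(1, "X"), (3, "Y"), (3, "Z")]

joined = list(replicated_join(big, small))
print(joined)
```

A full cross product of these two inputs would produce len(big) * len(small) = 9 tuples before any filtering, while the keyed join emits only the 3 matching ones.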