Show the code of your function. How did you define the input and output schemas?
On 07.11.2013 at 2:09, user "Sameer Tilak" wrote:
> Dear Serega,
> I am now using log4j for debugging my UDF. Here is what I found out. For
> some reason in mapreduce mode my exec function does not get called. The
> log
Each element in A is not a Bag. A relation is a collection of tuples (just
like a bag). So each element in A is a tuple whose first element is a Bag.
If you want to order the tuples by id, you have to extract them from the
bag first.
A = LOAD 'data' ...;
B = FOREACH A GENERATE FLATTEN($0);
C = OR
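To make the point above concrete, here is a rough plain-Python model of it (not Pig itself). It assumes each row of A is a tuple whose first field is a bag, modelled as a list of tuples. The sample values and the name flatten_first_field are made up for illustration.

```python
# Model of a relation: a list of rows (tuples). Each row has one field,
# and that field is a bag, modelled here as a list of tuples.
A = [
    ([("id3", "x", "a"), ("id1", "x", "y", "z")],),
    ([("id10", "x", "y", "z"), ("id9", "a", "b", "c")],),
]


def flatten_first_field(relation):
    """Rough analogue of FLATTEN($0): one output row per tuple inside the bag."""
    return [inner for row in relation for inner in row[0]]


B = flatten_first_field(A)

# Once the tuples are out of the bags they can be ordered by id.
# The ids are compared as strings here, so "id10" sorts before "id3".
C = sorted(B, key=lambda t: t[0])
```

Note that the ordering is only possible after the flatten step, because before it every row holds a whole bag rather than a single id.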
Hi Alan,
Thanks for your reply.
I am trying to understand how Pig processes these relations. As I mentioned, my
UDF returns the result in the following format:
{(id1,x,y,z), (id2, a, b, c), (id3,x,a)} /* User 1 info */
{(id10,x,y,z), (id9, a, b, c), (id1,x,a)} /* User 2 info */
{(id8,x,y,z)
Dear Serega,
I am now using log4j for debugging my UDF. Here is what I found out. For some
reason in mapreduce mode my exec function does not get called. The log
message in the constructor gets printed onto the console. However, the log
message in the exec function does not get printed to the
Do you mean you want to find the top 5 per input record? Also, what are your
ordering criteria? Just sort by id? Something like this should order all
tuples in each bag by id and then produce the top 5. My syntax may be a little
off as I'm working offline and don't have the manual in front of
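As a plain-Python illustration of the idea described here (order the tuples inside each bag by id, then keep the first 5), the following sketch may help. It is not Pig syntax. It assumes each row holds one bag, modelled as a list of tuples, and the name top_n_per_bag and the sample data are hypothetical.

```python
def top_n_per_bag(relation, n=5):
    """For each row, sort the tuples in its bag by id (first field) and keep the first n."""
    return [(sorted(row[0], key=lambda t: t[0])[:n],) for row in relation]


# Made-up sample: one row whose bag has seven tuples, one row whose bag has two.
relation = [
    ([("id%d" % i, "x") for i in (7, 3, 1, 6, 2, 5, 4)],),
    ([("id9", "a"), ("id8", "b")],),
]

result = top_n_per_bag(relation)
```

Each output row still holds exactly one bag, so the result has one record per input record. Bags with fewer than n tuples are returned whole, only sorted.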
You get 4 empty tuples.
Maybe your UDF parser.customFilter(key,'A') works differently? Maybe
you are using an old version?
You can add a print statement to the UDF and see what it accepts and what
it produces.
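One way to do that kind of print debugging, sketched in plain Python: wrap the function so every call writes its arguments and result to stderr. The custom_filter below is a hypothetical stand-in, not the real parser.customFilter, whose behaviour is not shown in this thread.

```python
import functools
import sys


def trace(func):
    """Write each call's arguments and result to stderr, then return the result unchanged."""
    @functools.wraps(func)
    def wrapper(*args):
        result = func(*args)
        sys.stderr.write("%s in=%r out=%r\n" % (func.__name__, args, result))
        return result
    return wrapper


@trace
def custom_filter(key, flag):
    # Hypothetical stand-in logic: keep the key only if it starts with the flag.
    return key if key.startswith(flag) else None
```

Writing to stderr keeps the debug lines out of the function's actual output. If the function is never called, no line is printed at all, which is itself a useful clue.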
2013/11/6 Sameer Tilak
> Dear Serega,
>
> When I run the script in local mode, I get co
Dear Serega,
When I run the script in local mode, I get correct o/p stored in AU/part-m-000
file. However, when I run it in the mapreduce mode (with i/p and o/p from
HDFS), the file /scratch/AU/part-m-000 is of size 4 and there is nothing in it.
I am not sure whether AU relation somehow does n
You said "The .py code takes input from sys.stdin and outputs to sys.stdout" so
I infer you are talking about streaming, not a python UDF. In that case, rather
than streaming through your python script P.py, instead stream through a shell
script S.sh. The shell script can untar shipped or cached
No, as I said before, doing the cross product is my workaround. I will
try a replicated join instead and share the results soon.
Thanks
Best regards...
On Tue, Nov 5, 2013 at 9:50 PM, Pradeep Gollakota wrote:
> I see... do you have to do a full cross product or are you able to do a
>