Actually, how do you pass a bag to a UDF? I did this:

    a = LOAD 'file_a' AS (a1, a2, a3);

    bag1 = LOAD 'somefile' AS (f1, f2, f3);

    b = FOREACH a GENERATE myUDF(bag1, a1, a2);

But I got this error:

     Invalid scalar projection: bag1 : A column needs to be projected from a relation for it to be used as a scalar

What is the right way of doing this? Thanks.
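
For reference, one form that usually gets past this error is to collapse bag1 into a single row with GROUP ... ALL and then project the bag column out of that row. This is only a sketch and has not been run against this script. The alias bag1_all is made up for illustration, and myUDF is assumed to accept a bag as its first argument:

    a = LOAD 'file_a' AS (a1, a2, a3);
    bag1 = LOAD 'somefile' AS (f1, f2, f3);

    -- bag1_all (hypothetical alias) has one row: (all, {every tuple of bag1})
    bag1_all = GROUP bag1 ALL;

    -- bag1_all.bag1 projects the bag column from that single-row relation,
    -- which is the "column projected from a relation" the error asks for
    b = FOREACH a GENERATE myUDF(bag1_all.bag1, a1, a2);

With this form the UDF receives the whole bag on every call, so it would still make sense to build the hash map only on the first exec call, as suggested earlier in the thread. Scalar projection expects the relation to hold exactly one row, which GROUP ... ALL gives for non-empty input.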


On Wed, Jun 27, 2012 at 10:30 AM, Dexin Wang <wangde...@gmail.com> wrote:

> That's a good idea (to pass the bag to UDF and initialize it on first UDF
> invocation). Thanks.
>
> Why do you think it is expensive Mridul?
>
>
> On Tue, Jun 26, 2012 at 2:50 PM, Mridul Muralidharan <
> mrid...@yahoo-inc.com> wrote:
>
>>
>>
>> > -----Original Message-----
>> > From: Jonathan Coveney [mailto:jcove...@gmail.com]
>> > Sent: Wednesday, June 27, 2012 3:12 AM
>> > To: user@pig.apache.org
>> > Subject: Re: Passing a BAG to Pig UDF constructor?
>> >
>> > You can also just pass the bag to the UDF, and have a lazy initializer
>> > in exec that loads the bag into memory.
>>
>>
>> Can you elaborate what you mean by pass the bag to the UDF ?
>> Pass it as part of the input to the udf in exec and initialize it only
>> once (first time) ? (If yes, this is expensive)
>> Or something else ?
>>
>>
>> Regards,
>> Mridul
>>
>>
>>
>> >
>> > 2012/6/26 Mridul Muralidharan <mrid...@yahoo-inc.com>
>> >
>> > > You could dump the data in a dfs file and pass the location of the
>> > > file as param to your udf in define - so that it initializes itself
>> > > using that data ...
>> > >
>> > >
>> > > - Mridul
>> > >
>> > >
>> > > > -----Original Message-----
>> > > > From: Dexin Wang [mailto:wangde...@gmail.com]
>> > > > Sent: Tuesday, June 26, 2012 10:58 PM
>> > > > To: user@pig.apache.org
>> > > > Subject: Passing a BAG to Pig UDF constructor?
>> > > >
>> > > > Is it possible to pass a bag to a Pig UDF constructor?
>> > > >
>> > > > Basically in the constructor I want to initialize some hash map so
>> > > > that on every exec operation, I can use the hashmap to do a lookup
>> > > > and find the value I need, and apply some algorithm to it.
>> > > >
>> > > > I realize I could just do a replicated join to achieve similar
>> > > > things but the algorithm is more than a few lines and there are some
>> > > > edge cases so I would rather wrap that logic inside a UDF function.
>> > > > I also realize I could just pass a file path to the constructor and
>> > > > read the files to initialize the hashmap but my files are on
>> > > > Amazon's S3 and I don't want to deal with
>> > > > S3 API to read the file.
>> > > >
>> > > > Is this possible or is there some alternative ways to achieve the
>> > > > same thing?
>> > > >
>> > > > Thanks.
>> > > > Dexin
>> > >
>>
>
>
