Actually, how do you pass a bag to a UDF? I did this:

a = LOAD 'file_a' AS (a1, a2, a3);
bag1 = LOAD 'somefile' AS (f1, f2, f3);
b = FOREACH a GENERATE myUDF(bag1, a1, a2);

But I got this error:

Invalid scalar projection: bag1 : A column needs to be projected from a relation for it to be used as a scalar

What is the right way of doing this?

Thanks.

On Wed, Jun 27, 2012 at 10:30 AM, Dexin Wang <wangde...@gmail.com> wrote:

> That's a good idea (to pass the bag to UDF and initialize it on first UDF
> invocation). Thanks.
>
> Why do you think it is expensive Mridul?
>
> On Tue, Jun 26, 2012 at 2:50 PM, Mridul Muralidharan <mrid...@yahoo-inc.com> wrote:
>
>> > -----Original Message-----
>> > From: Jonathan Coveney [mailto:jcove...@gmail.com]
>> > Sent: Wednesday, June 27, 2012 3:12 AM
>> > To: user@pig.apache.org
>> > Subject: Re: Passing a BAG to Pig UDF constructor?
>> >
>> > You can also just pass the bag to the UDF, and have a lazy initializer
>> > in exec that loads the bag into memory.
>>
>> Can you elaborate what you mean by pass the bag to the UDF?
>> Pass it as part of the input to the udf in exec and initialize it only
>> once (first time)? (If yes, this is expensive)
>> Or something else?
>>
>> Regards,
>> Mridul
>>
>> > 2012/6/26 Mridul Muralidharan <mrid...@yahoo-inc.com>
>> >
>> > > You could dump the data in a dfs file and pass the location of the
>> > > file as param to your udf in define - so that it initializes itself
>> > > using that data ...
>> > >
>> > > - Mridul
>> > >
>> > > > -----Original Message-----
>> > > > From: Dexin Wang [mailto:wangde...@gmail.com]
>> > > > Sent: Tuesday, June 26, 2012 10:58 PM
>> > > > To: user@pig.apache.org
>> > > > Subject: Passing a BAG to Pig UDF constructor?
>> > > >
>> > > > Is it possible to pass a bag to a Pig UDF constructor?
>> > > >
>> > > > Basically in the constructor I want to initialize some hash map so
>> > > > that on every exec operation, I can use the hashmap to do a lookup
>> > > > and find the value I need, and apply some algorithm to it.
>> > > >
>> > > > I realize I could just do a replicated join to achieve similar
>> > > > things but the algorithm is more than a few lines and there are
>> > > > some edge cases so I would rather wrap that logic inside a UDF
>> > > > function. I also realize I could just pass a file path to the
>> > > > constructor and read the files to initialize the hashmap but my
>> > > > files are on Amazon's S3 and I don't want to deal with S3 API to
>> > > > read the file.
>> > > >
>> > > > Is this possible or is there some alternative ways to achieve the
>> > > > same thing?
>> > > >
>> > > > Thanks.
>> > > > Dexin
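
For reference, the lazy-initializer idea quoted in this thread (pass the bag to the UDF and load it into memory on the first exec call) might look roughly like the sketch below. It is only an illustration in plain Java: LazyLookupUdf, LazyLookupDemo, and the loads counter are hypothetical names, and a List<String[]> stands in for Pig's bag of tuples so that the example runs without any Pig classes. A real UDF would extend Pig's EvalFunc and read the bag out of its input tuple.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Pig UDF; uses no Pig classes so it runs on its own.
class LazyLookupUdf {
    // Built on the first exec() call, then reused by every later call.
    private Map<String, String> lookup;
    // Counts how many times the bag was scanned; only here to show it happens once.
    int loads = 0;

    // bag: rows of {key, value}, standing in for a bag of two-field tuples.
    // key: the per-row field to look up.
    String exec(List<String[]> bag, String key) {
        if (lookup == null) {
            lookup = new HashMap<>();
            for (String[] row : bag) {
                lookup.put(row[0], row[1]);
            }
            loads++;
        }
        return lookup.get(key);
    }
}

class LazyLookupDemo {
    public static void main(String[] args) {
        List<String[]> bag = new ArrayList<>();
        bag.add(new String[] {"k1", "v1"});
        bag.add(new String[] {"k2", "v2"});

        LazyLookupUdf udf = new LazyLookupUdf();
        System.out.println(udf.exec(bag, "k1")); // prints v1
        System.out.println(udf.exec(bag, "k2")); // prints v2
        System.out.println(udf.loads);           // prints 1: the bag was scanned once
    }
}
```

Note that the map is built only once, but the bag is still handed to exec on every call, which may be the cost Mridul has in mind when he calls this approach expensive.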