For example, using -param path=filename.txt with

DEFINE MYUDF org.package.MYUDF('$path') SHIP('$path');

doesn't seem to work...
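
For reference, here is a minimal sketch of what the UDF side might look like, assuming the constructor receives the path string from the DEFINE and reads the regular expressions itself, one per line. The class name PatternMatcher and the method classify are hypothetical. In an actual Pig UDF this logic would sit in a class extending org.apache.pig.EvalFunc<String>, with exec(Tuple) pulling a and b out of the tuple and calling classify. It does not address shipping: the file still has to be readable at that path on whichever node runs the task, which may be where the file not found exceptions come from.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: holds the regexes read from the config file.
final class PatternMatcher {
    private final List<Pattern> patterns = new ArrayList<>();

    // Mirrors MYUDF('$path'): the constructor is handed the path as a string
    // and compiles every non-blank line of the file as a regular expression.
    PatternMatcher(String path) throws IOException {
        for (String line : Files.readAllLines(Paths.get(path))) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(Pattern.compile(trimmed));
            }
        }
    }

    // Same logic as the pseudocode in the original question:
    // return "something" if a or b fully matches any pattern.
    String classify(String a, String b) {
        for (Pattern p : patterns) {
            boolean aMatches = a != null && p.matcher(a).matches();
            boolean bMatches = b != null && p.matcher(b).matches();
            if (aMatches || bMatches) {
                return "something";
            }
        }
        return "something else";
    }
}
```

Compiling the patterns once in the constructor keeps the per-record cost down to the matching itself, which seems reasonable for a file of about 10 expressions.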

On Thu, Apr 21, 2011 at 11:18 AM, Mark Laczin <[email protected]> wrote:

> Does anyone know how to ship the config file in this situation?
> I'm encountering problems with file not found exceptions when trying to run
> this over a cluster.
>
>
> On Wed, Apr 20, 2011 at 1:03 PM, Mark Laczin <[email protected]> wrote:
>
>> I kind of solved it by reading in the data from my UDF constructor (it's
>> just a file with a list of like 10 regular expressions, so I did manual file
>> I/O), by passing the path (provided as a parameter), and then just storing
>> it (and then, looping over it and testing a, b by hand).  It's not the
>> MapReduce way, but it will work for this application, considering the small
>> size of the file.
>>
>> If anyone knows how my "patch" might fail, or if there is a better way -
>> feel free to speak up.
>>
>> -Mark
>>
>>
>> On Wed, Apr 20, 2011 at 12:51 PM, Bill Graham <[email protected]> wrote:
>>
>>> You could try doing GROUP ALL on the contents of M, which would
>>> produce a single bag containing each record, and then joining that bag
>>> with data using a surrogate constant key. CROSS would probably also
>>> work instead of the join. Then you'd have a tuple like this to work with:
>>>
>>> (a, b, M:bag)
>>>
>>> I'm not sure if things would blow up if M is too large to fit into
>>> memory in your UDF though.
>>>
>>>
>>> On Wed, Apr 20, 2011 at 6:27 AM, Mark Laczin <[email protected]> wrote:
>>> > I'm trying to do something like this:
>>> > (if 'data' is a set of tuples loaded from a file containing fields a, b
>>> > and c)
>>> > (if 'M' is another set of tuples loaded from a file)
>>> >
>>> > data = FOREACH data GENERATE *, someUDF(a, b, M);
>>> >
>>> > What I'm looking for is to generate (in this case, a string) based on a
>>> > and b, using the contents of M inside the UDF.
>>> >
>>> > The UDF looks like this, in pseudocode:
>>> >
>>> > foreach element x in M {
>>> >  if a matches x or b matches x {
>>> >    return "something"
>>> >  }
>>> > }
>>> > return "something else"
>>> >
>>> > Is this possible?  I keep getting errors related to "Scalars can only be
>>> > used with projections" and the like.
>>> > The thing holding me back from using filters is that I won't know what's
>>> > in M until it's read, and since (in this case) they'll be regular
>>> > expressions, I'd need to be able to join/group with regex matching, which
>>> > I don't think Pig can do.
>>> >
>>> > -Mark
>>> >
>>>
>>
>>
>
