I'm trying to do something like the following, where 'data' is a set of
tuples loaded from a file containing fields a, b and c, and 'M' is another
set of tuples loaded from a file:

data = FOREACH data GENERATE *, someUDF(a, b, M);

What I'm looking for is a way to generate a value (in this case, a string)
based on a and b, using the contents of M inside the UDF.

The UDF looks like this, in pseudocode:

foreach element x in M {
  if a matches x or b matches x {
    return "something"
  }
}
return "something else"
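
To make the intended behaviour concrete, here is a minimal sketch of the
same logic in plain Python. It only illustrates the matching step and does
not show how M would be handed to a UDF from Pig. The names some_udf and
patterns are hypothetical, and it assumes "matches" means the regex must
match the whole string (swap in search() if a substring match is wanted).

```python
import re


def some_udf(a, b, patterns):
    """Return "something" if a or b matches any regex in patterns.

    patterns stands in for the contents of M: an iterable of regular
    expression strings (a hypothetical representation).
    """
    for pattern in patterns:
        regex = re.compile(pattern)
        for value in (a, b):
            # Assumption: missing (None) fields never match, and a match
            # must cover the whole string.
            if value is not None and regex.fullmatch(value):
                return "something"
    return "something else"
```

With patterns = ["foo[0-9]+", "baz"], a row where b is "baz" would yield
"something", and a row where neither field matches would yield
"something else".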

Is this possible?  I keep getting errors such as "Scalars can only be
used with projections" and the like.
What keeps me from using filters is that I won't know what's in M until
it's read. Since its entries will (in this case) be regular expressions,
I'd need to be able to join/group with regex matching, which I don't think
Pig can do.

-Mark
