1) Can someone point me to any examples where Pig is used to break down a
string into substrings based on a pattern?
REGEX_EXTRACT_ALL
<http://pig.apache.org/docs/r0.13.0/func.html#regex-extract-all> might
help. There are also a few other built-in functions, such as
REGEX_EXTRACT, STRSPLIT, and TOKENIZE, that might meet your needs.

2) If I create a UDF using Java to break down the string into substrings,
can I return the substrings as a list to Pig?
You can return substrings as fields in a tuple, or as a bag.
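
As a rough illustration, here is a minimal sketch of the splitting logic
such a UDF could use. It is plain Java with no Pig dependency, and the
class and method names (LogBlobSplitter, split, withType) are made up for
this example. It assumes the bracketed fields never contain "]" and that a
message never contains three bracketed groups in a row. In an actual UDF
this logic would sit inside exec() of a class extending EvalFunc<DataBag>,
with each entry turned into a tuple and added to the bag; that wiring
needs the Pig jars and is left out here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: splits a log "blob" into entries.
final class LogBlobSplitter {

    // One entry: [<date>][<component>][<msgtype>] <message text>
    // The message runs lazily until the next three-bracket header or the
    // end of the blob (assumed layout, see the note above).
    private static final Pattern ENTRY = Pattern.compile(
        "\\[([^\\]]*)\\]\\[([^\\]]*)\\]\\[([^\\]]*)\\]\\s*(.*?)"
        + "(?=\\[[^\\]]*\\]\\[[^\\]]*\\]\\[[^\\]]*\\]|\\z)",
        Pattern.DOTALL);

    // Returns one String[4] per entry: date, component, msgtype, message.
    static List<String[]> split(String blob) {
        List<String[]> entries = new ArrayList<>();
        if (blob == null) {
            return entries;
        }
        Matcher m = ENTRY.matcher(blob);
        while (m.find()) {
            entries.add(new String[] {
                m.group(1), m.group(2), m.group(3), m.group(4).trim()
            });
        }
        return entries;
    }

    // Keeps only the entries whose msgtype equals the requested one.
    static List<String[]> withType(String blob, String msgType) {
        List<String[]> matches = new ArrayList<>();
        for (String[] entry : split(blob)) {
            if (entry[2].equals(msgType)) {
                matches.add(entry);
            }
        }
        return matches;
    }
}
```

Calling LogBlobSplitter.withType(blob, "ERROR") would give back only the
entries of that message type, which is roughly the filtering you describe.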

3) If so, how do I iterate through the list in Pig?
What are you trying to achieve? You could iterate over the substrings
inside the UDF from question 2 before returning. If your UDF returns a
tuple, you could process each field in a foreach. If it returns a bag,
you could flatten it and process the entries row by row.

Hope this helps.



On Fri, Jul 18, 2014 at 1:06 PM, Ron Wurzberger wrote:

> I am processing a data source containing log files. Each record contains a
> number of fields, but one field has a string value that is a dump of the
> log records for a given day. The log entries contained within this string
> "blob" are not fixed length, but do follow a set pattern. I can break this
> blob down into individual log entries fairly easily using Java. The basic
> pattern is [<date>][<component>][<msgtype>] <message text> and a given blob
> may contain up to 100 such records. Can this be broken down using Pig? I'm
> looking for records containing specific message types to output.
>
> Can someone point me to any examples where Pig is used to break down a
> string into substrings based on a pattern?
> If I create a UDF using Java to break down the string into substrings, can
> I return the substrings as a list to Pig?
>     If so, how do I iterate through the list in Pig?
>
> Ron W.
>
