1) Can someone point me to any examples where Pig is used to break down a string into substrings based on a pattern?

REGEX_EXTRACT_ALL <http://pig.apache.org/docs/r0.13.0/func.html#regex-extract-all> might help. There are also a couple of other built-in UDFs that might meet your needs.
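To make the pattern concrete, here is a minimal sketch in plain Java of the splitting logic that a UDF could wrap. The class name LogBlobSplitter, the sample blob, and the exact regex are my own assumptions based on the [<date>][<component>][<msgtype>] <message text> layout described in the original message. It assumes the message text never contains three bracketed groups in a row, so adjust it to your real data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: splits a log "blob" into entries.
public class LogBlobSplitter {
    // Three bracketed header fields: [date][component][msgtype]
    private static final String HEADER =
        "\\[([^\\]]*)\\]\\[([^\\]]*)\\]\\[([^\\]]*)\\]";

    // Header, then message text running lazily up to the next header
    // (checked with a lookahead) or the end of the input.
    private static final Pattern ENTRY = Pattern.compile(
        HEADER + "\\s*(.*?)(?=\\[[^\\]]*\\]\\[[^\\]]*\\]\\[[^\\]]*\\]|\\z)",
        Pattern.DOTALL);

    // Returns one String[4] per entry: date, component, msgtype, message.
    public static List<String[]> split(String blob) {
        List<String[]> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(blob);
        while (m.find()) {
            entries.add(new String[] {
                m.group(1), m.group(2), m.group(3), m.group(4).trim()
            });
        }
        return entries;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        String blob = "[2014-07-18][auth][ERROR] login failed "
                    + "[2014-07-18][db][INFO] connection opened";
        for (String[] e : split(blob)) {
            // Keep only the message types of interest.
            if (e[2].equals("ERROR")) {
                System.out.println(String.join(" | ", e));
            }
        }
    }
}
```

Inside a real UDF you would put each String[] into a tuple and add the tuples to a bag instead of collecting them in a List.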
2) If I create a UDF using Java to break down the string into substrings, can I return the substrings as a list to Pig?

You can return the substrings as fields in a tuple, or as a bag.

3) If so, how do I iterate through the list in Pig?

What are you trying to achieve? You could iterate over the substrings inside the UDF from 2) before returning. Or, if your UDF returns a tuple, you could process each field in a foreach. Or, if your UDF returns a bag, you could flatten it and process the substrings row by row.

Hope this helps.

On Fri, Jul 18, 2014 at 1:06 PM, Ron Wurzberger <[email protected]> wrote:

> I am processing a data source containing log files. Each record contains a
> number of fields, but one field has a string value that is a dump of the
> log records for given day. The log entries contained within this string
> "blob" are not fixed length, but do follow a set pattern. I can break this
> blob down into individual log entries fairly easily using Java. The basic
> pattern is [<date>][<component>][<msgtype>] <message text> and a given blob
> may contain up to 100 such records. Can this be broken down using Pig? I'm
> looking for records containing specific message types to output.
>
> Can someone point me to any examples where Pig is used to break down a
> string into substrings based on a pattern?
> If I create a UDF using Java to break down the string into substrings, can
> I return the substrings as a list to Pig?
> If so, how do I iterate through the list in Pig?
>
> Ron W.
