Opened a JIRA to better clarify the docs here: https://issues.apache.org/jira/browse/PIG-2905

On Tue, Sep 4, 2012 at 12:37 AM, MiaoMiao <liy...@gmail.com> wrote:

> It's a pity the REPLACE documentation doesn't mention regex at all. Thank
> you so much for your reply; knowing what's going on is such a relief.
> Now I can trust myself with Pig a little more.
>
> On Sat, 18 Aug 2012 at 00:05:29 AM, Cheolsoo Park <cheol...@cloudera.com>
> wrote:
> > Hi,
> >
> > If you look at the source code of REPLACE, what it does is basically:
> >
> > > String source = "[02/Aug/2012:05:01:17";
> > > String target = "[";
> > > String replaceWith = "";
> > > return source.replaceAll(target, replaceWith);
> >
> > Note that Java String.replaceAll() treats its first argument (i.e. target,
> > the 2nd parameter of REPLACE) as a regular expression, and "[" is a
> > special character. To use it as is, you have to escape it, so in your
> > Pig script, you should do:
> >
> > > REPLACE(date,'\\[','')
> >
> > Now regarding the result that you're seeing, it looks like whatever
> > exception is thrown inside REPLACE is swallowed rather than failing the
> > job, and null is returned:
> >
> > > try{
> > > ...
> > > }catch(Exception e){
> > > warn("Failed to process input; error - " + e.getMessage(),
> > > PigWarning.UDF_WARNING_1);
> > > return null;
> > > }
> >
> > But I do see the following message at the end of the job status:
> >
> > > 2012-08-17 16:51:25,061 [main] WARN
> > > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > - Encountered *Warning UDF_WARNING_1* 1 time(s)
> >
> > I must admit that this is not very visible though.
> >
> > Thanks,
> > Cheolsoo
> >
> > On Mon, Aug 13, 2012 at 10:03 PM, MiaoMiao <liy...@gmail.com> wrote:
> >
> > > I used Pig for an ETL job, but ran into a strange bug in the
> > > built-in REPLACE function.
> > >
> > > After I replaced '[' with '' in '[02/Aug/2012:05:01:17', the whole
> > > string just went blank.
> > >
> > > Here is some info that may help debug.
> > >
> > > My pig version is: Apache Pig version 0.11.0-SNAPSHOT (r1364475)
> > > compiled Jul 23 2012, 10:30:53
> > >
> > > The original text file:
> > > ip.ip.ip.ip - - [02/Aug/2012:05:01:17 -0600] "GET /player.php/sid/XNDM0Njk3MjEy/v.swf HTTP/1.1" 302 26
> > >
> > > The whole pig script is:
> > > read = load '/home/test/apacheLog'
> > >     using PigStorage(' ')
> > >     as (
> > >         ip:chararray
> > >         , indentity:chararray
> > >         , name:chararray
> > >         , date:chararray
> > >         , timezone:chararray
> > >         , method:chararray
> > >         , path:chararray
> > >         , protocol:chararray
> > >         , status:chararray
> > >         , size:chararray
> > >     );
> > > dump read;
> > > --(ip.ip.ip.ip,-,-,[02/Aug/2012:05:01:17,-0600],"GET,/player.php/sid/XNDM0Njk3MjEy/v.swf,HTTP/1.1",302,26)
> > > data = foreach read generate
> > >     ip
> > >     , REPLACE(date,'[','')
> > >     , REPLACE(timezone,']','')
> > >     , REPLACE(method,'"','')
> > >     , path
> > >     , REPLACE(protocol,'"','')
> > >     , status
> > >     , size;
> > > describe data;
> > > --data: {ip: chararray,date: chararray,timezone: chararray,method: chararray,path: chararray,protocol: chararray,status: chararray,size: chararray}
> > > dump data;
> > > --(ip.ip.ip.ip,,-0600,GET,/player.php/sid/XNDM0Njk3MjEy/v.swf,HTTP/1.1,302,26)

--
*Note that I'm no longer using my Yahoo! email address. Please email me at
billgra...@gmail.com going forward.*
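
For anyone landing on this thread later, the behavior discussed above can be reproduced in plain Java, outside Pig. This is only a minimal sketch, not the actual REPLACE source: the class name ReplaceDemo and the helper replaceOrNull are hypothetical, and the helper merely imitates the "swallow the exception and return null" pattern described in the thread.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class ReplaceDemo {
    // Hypothetical helper (not Pig code): treats `regex` as a regular
    // expression and, like the behavior described in the thread, returns
    // null instead of propagating a pattern error.
    static String replaceOrNull(String source, String regex, String replacement) {
        try {
            return source.replaceAll(regex, replacement);
        } catch (PatternSyntaxException e) {
            // "[" alone is an unclosed character class, so we end up here.
            return null;
        }
    }

    public static void main(String[] args) {
        String date = "[02/Aug/2012:05:01:17";

        // Unescaped "[" is an invalid regex.
        System.out.println(replaceOrNull(date, "[", ""));     // prints: null

        // Escaped bracket: the regex \[ matches a literal "[".
        System.out.println(replaceOrNull(date, "\\[", ""));   // prints: 02/Aug/2012:05:01:17

        // Pattern.quote builds a literal-matching regex from any string.
        System.out.println(replaceOrNull(date, Pattern.quote("["), ""));
    }
}
```

The null in the first case corresponds to the empty date field in the dump output; in a Pig script the escaped form is the one suggested earlier, REPLACE(date,'\\[','').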