Cool, that sounds like a good idea. Thanks, Norbert!
--
regards,
Jakub Glapa


On Tue, Jun 19, 2012 at 1:22 AM, Norbert Burger <[email protected]> wrote:

> Any reason you can't wrap this regex with wildcards aligned with start-line
> and end-line anchors, i.e.:
>
> ^.*([^0-9])\1{3,}.*$
>
> Agreed that it would be nice if MATCHES was less greedy here, but perhaps
> this'll avoid you having to write your own UDF.
>
> Norbert
>
> On Mon, Jun 18, 2012 at 3:31 PM, Jakub Glapa <[email protected]> wrote:
>
> > Hi Norbert,
> > thanks for the tip.
> > I think that the MATCHES operator won't work for me because it tries to
> > match the whole region.
> > In my case I'm interested in detecting the sequence anywhere in the
> > string.
> >
> > e.g.
> > abccccdef - filter out
> > abcdeeeef - filter out
> > aabcdeef - leave
> > 111111abcd - leave
> >
> > I want to filter out all the strings in which a character is repeated
> > at least 4 times in a row, unless that character is a digit.
> >
> > regexp for detecting those is: ([^0-9])\1{3,}
> > but it won't work with MATCHES
> >
> > I have a trivial working UDF that just calls
> > pattern().matcher().find()
> > but maybe there is something built in that I could use?
> >
> >
> > --
> > regards,
> > Jakub Glapa
> >
> >
> > On Mon, Jun 18, 2012 at 3:49 PM, Norbert Burger <[email protected]> wrote:
> >
> > > Jakub -- The MATCHES operator accepts regexes as input. You can add a
> > > NOT to invert the logic.
> > >
> > > http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html
> > >
> > > Norbert
> > >
> > > On Mon, Jun 18, 2012 at 7:14 AM, Jakub Glapa <[email protected]> wrote:
> > >
> > > > Hi all,
> > > > I found a 'matches' operator for pattern matching in Pig Latin.
> > > > I didn't find it in the documentation, but maybe something similar
> > > > exists for searching?
> > > > Basically, in the Java world I would want the result of the
> > > > Matcher.find() method, not Matcher.matches().
> > > > Will I have to end up writing my own UDF for that?
> > > >
> > > > Thanks for help.
> > > >
> > > > PS.
> > > > I'm trying to filter out strings with consecutive repeated
> > > > characters. I've constructed a regexp that detects them.
> > > > Now I just have to apply it somehow.
> > > >
> > > >
> > > > --
> > > > regards,
> > > > Jakub Glapa
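
For anyone finding this thread later, here is a minimal plain-Java sketch of the difference discussed above. It assumes that MATCHES behaves like Java's Matcher.matches(), i.e. the whole string has to match, which is what the thread describes. The class and method names (RepeatedCharFilter, containsRun, matchesWrapped) are made up for illustration and are not part of Pig.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only; not part of Pig.
class RepeatedCharFilter {
    // Regex from the thread: a non-digit followed by 3 or more copies of itself,
    // i.e. the same non-digit character at least 4 times in a row.
    static final Pattern REPEATED = Pattern.compile("([^0-9])\\1{3,}");

    // The same regex wrapped in ^.* and .*$ so that it can match a whole string.
    static final Pattern WRAPPED = Pattern.compile("^.*([^0-9])\\1{3,}.*$");

    // Search semantics: true if the run occurs anywhere in the string.
    static boolean containsRun(String s) {
        return REPEATED.matcher(s).find();
    }

    // Whole-string semantics applied to the wrapped regex.
    static boolean matchesWrapped(String s) {
        return WRAPPED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"abccccdef", "abcdeeeef", "aabcdeef", "111111abcd"};
        for (String s : samples) {
            System.out.println(s
                + " find=" + containsRun(s)
                + " wrapped=" + matchesWrapped(s)
                + " bareMatches=" + REPEATED.matcher(s).matches());
        }
    }
}
```

On the four sample strings, find() on the bare regex and matches() on the wrapped regex agree (true for the first two, false for the last two), while matches() on the bare regex is false for all of them because the run never spans the whole string.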
