Re: [R] Regex: workaround for variable length negative lookbehind

2008-12-01 Thread Wacek Kusnierczyk
Gabor Grothendieck wrote:
> On Mon, Dec 1, 2008 at 12:20 AM, Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:
>> Gabor Grothendieck wrote:
>>> Try this:
>>>
>>> vec <- c("", "baaa", "bbaa", "bbba", "baamm", "aa")
>>> grep("^(?!(.)\\1{1,}

Re: [R] Regex: workaround for variable length negative lookbehind

2008-12-01 Thread Gabor Grothendieck
On Mon, Dec 1, 2008 at 12:20 AM, Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:
> Gabor Grothendieck wrote:
>> Try this:
>>
>>> vec <- c("", "baaa", "bbaa", "bbba", "baamm", "aa")
>>> grep("^(?!(.)\\1{1,}$).*(.)\\2{1,}$", vec, perl = TRUE)
>
> or even
>
> grep("^(?!(.)\\1+$).*(.)

Re: [R] Regex: workaround for variable length negative lookbehind

2008-12-01 Thread joris meys
On Sun, Nov 30, 2008 at 9:59 PM, Stefan Evert <[EMAIL PROTECTED]> wrote:
> Still, I think it's better to write a few lines of R code than to abuse
> regular expressions to do something they were never intended to do. How do
> other people on this list feel about that issue?

Honestly, I believe

Re: [R] Regex: workaround for variable length negative lookbehind

2008-11-30 Thread Wacek Kusnierczyk
Gabor Grothendieck wrote:
> Try this:
>
>> vec <- c("", "baaa", "bbaa", "bbba", "baamm", "aa")
>> grep("^(?!(.)\\1{1,}$).*(.)\\2{1,}$", vec, perl = TRUE)

or even

grep("^(?!(.)\\1+$).*(.)\\2+$", vec, perl = TRUE)

vQ

Re: [R] Regex: workaround for variable length negative lookbehind

2008-11-30 Thread Stefan Evert
> But is there a one-line grep thingy to do this?

Can't think of a one-liner, but a three-line solution you can easily enough wrap in a small function:

vec <- c("", "baaa", "bbaa", "bbba", "baamm", "aa")
idx.1 <- grep("(.)\\1$", vec)
idx.2 <- grep("^(.)\\1*$", vec)
vec[setdiff(idx.1, idx.2)]
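The three lines above could be wrapped in a function roughly as follows. This is only a sketch: the name `ends_in_partial_repeat` is made up for illustration and does not appear in the thread.

```r
# Hypothetical wrapper (the function name is an assumption, not from the thread).
ends_in_partial_repeat <- function(x) {
  idx.1 <- grep("(.)\\1$", x)     # elements ending in a doubled character
  idx.2 <- grep("^(.)\\1*$", x)   # elements made of one character repeated
  x[setdiff(idx.1, idx.2)]        # keep the first set minus the second
}

vec <- c("", "baaa", "bbaa", "bbba", "baamm", "aa")
result <- ends_in_partial_repeat(vec)
print(result)
```

Returning the elements rather than the indices follows the last line of the original three-liner; swapping the final line for `setdiff(idx.1, idx.2)` would give indices instead.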

Re: [R] Regex: workaround for variable length negative lookbehind

2008-11-30 Thread Gabor Grothendieck
Here is a very slight further simplification, i.e. we can drop the final {1,}:

> grep("^(?!(.)\\1{1,}$).*(.)\\2$", vec, perl = TRUE)
[1] 2 3 5

On Sun, Nov 30, 2008 at 3:26 PM, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
> Try this:
>
>> vec <- c("", "baaa", "bbaa", "bbba", "baamm", "aa")
>
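As a quick check, the variants posted in this thread (with `{1,}`, with `+`, and with the final quantifier dropped) can be run side by side on the example vector. The list names `braces`, `plus` and `dropped` are labels invented here for readability.

```r
vec <- c("", "baaa", "bbaa", "bbba", "baamm", "aa")

# The three patterns posted in the thread; the element names are illustrative.
patterns <- c(
  braces  = "^(?!(.)\\1{1,}$).*(.)\\2{1,}$",
  plus    = "^(?!(.)\\1+$).*(.)\\2+$",
  dropped = "^(?!(.)\\1{1,}$).*(.)\\2$"
)

# Run grep() once per pattern; each pattern is passed as the first argument.
hits <- lapply(patterns, grep, x = vec, perl = TRUE)

# TRUE if every variant selects the same indices as the first one.
all_equal <- all(vapply(hits, identical, logical(1), hits[[1]]))
print(hits)
print(all_equal)
```

The final quantifier can go because `.*` is free to absorb any earlier repeats, so only the last two characters need to agree.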

Re: [R] Regex: workaround for variable length negative lookbehind

2008-11-30 Thread Gabor Grothendieck
Try this:

> vec <- c("", "baaa", "bbaa", "bbba", "baamm", "aa")
> grep("^(?!(.)\\1{1,}$).*(.)\\2{1,}$", vec, perl = TRUE)
[1] 2 3 5

The (?!...) succeeds only if the string is not all the same character, and since that consumes no characters, it restarts at the beginning to match anything follow
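To see the two halves of the pattern separately, one can test the lookahead on its own with `regexpr()` and then combine it with the "ends in a doubled character" test. This is an illustrative sketch; the variable names `look`, `passes`, `width` and `ends` are not from the thread.

```r
vec <- c("", "baaa", "bbaa", "bbba", "baamm", "aa")

# The negative lookahead alone: where it succeeds, the match has width zero,
# i.e. nothing has been consumed and matching continues from the start.
look   <- regexpr("^(?!(.)\\1+$)", vec, perl = TRUE)
passes <- as.integer(look) != -1L        # FALSE where the lookahead rejects
width  <- attr(look, "match.length")     # match widths reported by regexpr()

# The second half: the string ends in a doubled character.
ends <- grepl("(.)\\1$", vec, perl = TRUE)

# Combining both conditions should agree with the single-pattern grep().
combined <- which(passes & ends)
one_line <- grep("^(?!(.)\\1{1,}$).*(.)\\2{1,}$", vec, perl = TRUE)
print(combined)
print(one_line)
```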

Re: [R] Regex: workaround for variable length negative lookbehind

2008-11-30 Thread Stefan Evert
Hi Stefan! :-)

> From tools where negative lookbehind can involve variable lengths, one would think this would work: grep("(?

It's really the PCRE library that doesn't like your regexp, not R. The problem is that negative lookbehind is only possible with a fixed-length expression, and sinc
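A small way to probe this restriction from R is to try compiling a pattern and catch the error. The helper `compiles_ok` and the two toy patterns below are invented for illustration; they are not the regexp from the original post, and the exact behaviour depends on the PCRE version R was built against.

```r
# Hypothetical helper: TRUE if grepl() accepts the pattern with perl = TRUE,
# FALSE if pattern compilation raises an error.
compiles_ok <- function(pattern) {
  tryCatch({
    suppressWarnings(grepl(pattern, "ab", perl = TRUE))
    TRUE
  }, error = function(e) FALSE)
}

# A fixed-length lookbehind (one character) is accepted.
fixed <- compiles_ok("(?<!a)b")

# A lookbehind with an unbounded repeat is the kind of pattern PCRE is
# expected to reject; the result is printed rather than assumed.
variable <- compiles_ok("(?<!a+)b")

print(c(fixed = fixed, variable = variable))
```

The lookahead-based patterns in the replies sidestep the issue, since lookahead carries no fixed-length requirement.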

[R] Regex: workaround for variable length negative lookbehind

2008-11-30 Thread Stefan Th. Gries
Hi all

I have the following regular expression problem: I want to find complete elements of a vector that end in a repeated character but where the repetition doesn't make up the whole word. That is, for the vector vec:

vec <- c("", "baaa", "bbaa", "bbba", "baamm", "aa")

I would like to get "b