Re: [R] Regular expressions: offsets of groups

2010-09-30 Thread Titus von der Malsburg
Ok, we decided to have a shot at modifying gregexpr. Let's see how it works out. If anybody is interested in discussing this please contact me. R-help doesn't seem like the right place for further discussion. Is there a default place for discussing things like that? Thanks everybody for your re

Re: [R] Regular expressions: offsets of groups

2010-09-29 Thread Titus von der Malsburg
On Wed, Sep 29, 2010 at 1:58 PM, Michael Bedward wrote: > How is your C coding ? Bill ? Anyone else ?  I could have a got at > writing some prototype code to test in the next few days, though if > someone else with decent C skills is itching to do it please speak up. We have a skilled C- and R-pr

Re: [R] Regular expressions: offsets of groups

2010-09-29 Thread Michael Bedward
I'd definitely be a customer for it Titus. And it does seem like an obvious hole in regex processing in R that cries out to be filled. Um, ggregexpr isn't the sexiest of function names :) Perhaps we can think of something a little easier ? How is your C coding ? Bill ? Anyone else ? I could hav

Re: [R] Regular expressions: offsets of groups

2010-09-29 Thread Titus von der Malsburg
Bill, Michael, good to see I'm not the only one who sees potential for improvements in the regexpr domain. Adding a subpattern argument is certainly a step in the right direction and would make my life much easier. However, in my application I need to know not only the position of one group but a

Re: [R] Regular expressions: offsets of groups

2010-09-28 Thread Michael Bedward
Ah, that's interesting - thanks Bill. That's certainly on the right track for me (Titus, you too ?) especially if the subpattern argument accepted a vector of multiple group indices. As you say, this is straightforward in C. I'd be happy to (try to) make a patch for the R sources if there was some

Re: [R] Regular expressions: offsets of groups

2010-09-28 Thread William Dunlap
> -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of Michael Bedward > Sent: Tuesday, September 28, 2010 12:46 AM > To: Titus von der Malsburg > Cc: r-help@r-project.org > Subject: Re: [R] Regular expressio

Re: [R] Regular expressions: offsets of groups

2010-09-28 Thread Gabor Grothendieck
On Tue, Sep 28, 2010 at 6:52 AM, Titus von der Malsburg wrote: > On Tue, Sep 28, 2010 at 9:46 AM, Michael Bedward > wrote: >> What Titus wants to do is akin to retrieving capturing groups from a >> Matcher object in Java. > > Precisely.  Here's the description: > >  http://download.oracle.com/jav

Re: [R] Regular expressions: offsets of groups

2010-09-28 Thread Titus von der Malsburg
On Tue, Sep 28, 2010 at 9:46 AM, Michael Bedward wrote: > What Titus wants to do is akin to retrieving capturing groups from a > Matcher object in Java. Precisely. Here's the description: http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html#start(int) Gabor's lookbe

Re: [R] Regular expressions: offsets of groups

2010-09-28 Thread Michael Bedward
What Titus wants to do is akin to retrieving capturing groups from a Matcher object in Java. I also thought there must be an existing, elegant solution to this some time ago and searched for it, including looking at the sources (albeit with not much expertise) but came up blank. I also looked at t

Re: [R] Regular expressions: offsets of groups

2010-09-27 Thread Gabor Grothendieck
On Mon, Sep 27, 2010 at 1:34 PM, Titus von der Malsburg wrote: > On Mon, Sep 27, 2010 at 7:29 PM, Gabor Grothendieck > wrote: >> Try this zero width negative look behind expression: >> >>> gregexpr("(?!a+)(b+)", "abcdaabbc", perl = TRUE) >> [[1]] >> [1] 2 7 >> attr(,"match.length") >> [1] 1 2 > >

Re: [R] Regular expressions: offsets of groups

2010-09-27 Thread Henrique Dallazuanna
You've tried: gregexpr("b+", "abcdaabbc") On Mon, Sep 27, 2010 at 12:48 PM, Titus von der Malsburg wrote: > Dear list! > > > gregexpr("a+(b+)", "abcdaabbc") > [[1]] > [1] 1 5 > attr(,"match.length") > [1] 2 4 > > What I want is the offsets of the matches for the group (b+), i.e. 2 > and 7, not

Re: [R] Regular expressions: offsets of groups

2010-09-27 Thread Henrique Dallazuanna
You could do this: gregexpr("ab+", "abcdaabbcbb")[[1]] + 1 On Mon, Sep 27, 2010 at 2:25 PM, Titus von der Malsburg wrote: > On Mon, Sep 27, 2010 at 7:16 PM, Henrique Dallazuanna > wrote: > > You've tried: > > > > gregexpr("b+", "abcdaabbc") > > But this would match the third occurrence of b+ in

Re: [R] Regular expressions: offsets of groups

2010-09-27 Thread Titus von der Malsburg
On Mon, Sep 27, 2010 at 7:29 PM, Gabor Grothendieck wrote: > Try this zero width negative look behind expression: > >> gregexpr("(?!a+)(b+)", "abcdaabbc", perl = TRUE) > [[1]] > [1] 2 7 > attr(,"match.length") > [1] 1 2 Thanks Gabor, but this gives me the same result as gregexpr("b+", "abcdaab

Re: [R] Regular expressions: offsets of groups

2010-09-27 Thread Gabor Grothendieck
On Mon, Sep 27, 2010 at 11:48 AM, Titus von der Malsburg wrote: > Dear list! > >> gregexpr("a+(b+)", "abcdaabbc") > [[1]] > [1] 1 5 > attr(,"match.length") > [1] 2 4 > > What I want is the offsets of the matches for the group (b+), i.e. 2 > and 7, not the offsets of the complete matches.  Is there

Re: [R] Regular expressions: offsets of groups

2010-09-27 Thread Titus von der Malsburg
On Mon, Sep 27, 2010 at 7:16 PM, Henrique Dallazuanna wrote: > You've tried: > > gregexpr("b+", "abcdaabbc") But this would match the third occurrence of b+ in "abcdaabbcbb". But in this example I'm only interested in b+ if it's preceded by a+. Titus _

Re: [R] Regular expressions: offsets of groups

2010-09-27 Thread Titus von der Malsburg
Thank you Jim, but just as the solution that I discussed, your proposal involves deconstructing the pattern and searching several times. I'm looking for a general and efficient solution. Internally, the regexpr engine has all necessary information after one pass through the string. What I need i

Re: [R] Regular expressions: offsets of groups

2010-09-27 Thread jim holtman
try this: > x <- gregexpr("a+(b+)", "abcdaabbcaaacaaab") > justA <- gregexpr("a+", "abcdaabbcaaacaaab") > # find matches in 'x' for 'justA' > indx <- which(justA[[1]] %in% x[[1]]) > # now determine where 'b' starts > justA[[1]][indx] + attr(justA[[1]], 'match.length')[indx] [1] 2 7 17 > On M

[R] Regular expressions: offsets of groups

2010-09-27 Thread Titus von der Malsburg
Dear list! > gregexpr("a+(b+)", "abcdaabbc") [[1]] [1] 1 5 attr(,"match.length") [1] 2 4 What I want is the offsets of the matches for the group (b+), i.e. 2 and 7, not the offsets of the complete matches. Is there a way in R to get that? I know about gsubgn and strapply, but they only give me