On Jan 30, 2018 12:05 AM, Ori Bernstein <o...@eigenstate.org> wrote:
>
> On Mon, 29 Jan 2018 23:23:18 -0600, Edgar Pettijohn <ed...@pettijohn-web.com> 
> wrote:
>
> > I'm trying to use patterns.c for some pattern matching. The manual 
> > mentions captures using "()" around what you want to capture.  I don't 
> > see how to get at the data though.  Here is a sample program.
> > 
> > #include <stdio.h>
> > #include "patterns.h"
> > 
> > int
> > main(int argc, char *argv[])
> > {
> >      const char        *errstr = NULL;
> >      const char        *string = "the quick the brown the fox";
> >      const char        *pattern = "the";
> >      int            ret;
> >      struct str_match     match;
> > 
> >      ret = str_match(string, pattern, &match, &errstr);
> > 
> >      if (errstr != NULL)
> >          printf("%s\n", errstr);
> >      else
> >          printf("number of matches %d\n", match.sm_nmatch);
> > 
> >      return 0;
> > }
> > 
> > It prints 2, but I was expecting 3. I've tried several other patterns 
> > and the answer always seems to be 2, which leads me to believe I'm doing 
> > something wrong.  Any assistance appreciated.
> > 
> > 
> > Thanks,
> > 
> > 
> > Edgar
>
> str_match() looks for a single match of the pattern in the string, not all
> matches. It also makes the (IMO, surprising) decision that a pattern with no
> capture groups implicitly captures the whole match. The whole string goes
> into sm_match[0].
>
> So, in your case, you're matching:
>
> "the quick the brown the fox";
>  ^^^
>
> Accordingly:
>
> match.sm_match[0] = "the quick the brown the fox"
> match.sm_match[1] = "the"
>
> If you had 'quick', you'd get similar behavior:
>
> "the quick the brown the fox";
>      ^^^^^
>
> Equivalently, putting the whole pattern in '()' will match the same thing:
>
> pattern = "(quick)"
>
> But with multiple parens, each pair captures its own substring:
>
> pattern = "(qu)ick (the)"
>
> "the quick the brown the fox";
>      ^^    ^^^
> match.sm_match[0] = "the quick the brown the fox"
> match.sm_match[1] = "qu"
> match.sm_match[2] = "the"
>
> The choice to capture implicitly, I think, is confusing, but the behavior
> seems to me to be correct.
>
> -- 
>     Ori Bernstein

Thanks, that makes sense now. I probably would have figured it out myself if
I'd printed out match.sm_match[0], etc. Live and learn.
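
For counting every occurrence of a literal like "the" (the 3 expected
originally), one option sidesteps patterns.c entirely and loops over
strstr(3). This is only a sketch: count_occurrences() is a made-up name, not
part of patterns.h, and it does plain substring search with no pattern
syntax, counting non-overlapping hits.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count non-overlapping occurrences of the literal needle in haystack.
 * Hypothetical helper for illustration; this is plain substring search
 * via strstr(3), not patterns.c pattern matching.
 */
size_t
count_occurrences(const char *haystack, const char *needle)
{
	size_t		 n = 0;
	size_t		 len = strlen(needle);
	const char	*p = haystack;

	if (len == 0)
		return 0;	/* an empty needle would never advance */

	while ((p = strstr(p, needle)) != NULL) {
		n++;
		p += len;	/* resume the search just past this hit */
	}
	return n;
}
```

Calling count_occurrences("the quick the brown the fox", "the") returns 3.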

Edgar
