Additionally, the step 1c description you cite comes from the "Snowball
English (Porter2)" algorithm.
SQLite implements the original "Porter" algorithm, which is described
here:
http://tartarus.org/~martin/PorterStemmer/
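
In the original algorithm, step 1c is "(*v*) Y -> I": a trailing y becomes
i when the stem before it contains a vowel, which is what the hasVowel()
test quoted below checks. As a rough illustration of how the two rules
differ, here is a minimal sketch in C. It works on a plain forward string,
whereas the SQLite snippet appears to work on the reversed word (z[0]
being the last letter). The function names are made up for this example
and do not come from either code base, and the Porter2 version ignores
that algorithm's special handling of an initial y or a y after a vowel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: plain a/e/i/o/u test. */
static int is_aeiou(char c){
  return c=='a' || c=='e' || c=='i' || c=='o' || c=='u';
}

/* Original Porter definition: a letter is a vowel if it is a, e, i, o, u,
** or a 'y' that is preceded by a consonant. */
static int porter_is_vowel(const char *w, size_t i){
  if( is_aeiou(w[i]) ) return 1;
  if( w[i]=='y' && i>0 ) return !porter_is_vowel(w, i-1);
  return 0;
}

/* Original Porter step 1c: (*v*) Y -> I.
** A trailing 'y' becomes 'i' if the stem before it contains a vowel. */
static void porter_step1c(char *w){
  size_t n, i;
  assert( w!=NULL );
  n = strlen(w);
  if( n<2 || w[n-1]!='y' ) return;
  for(i=0; i<n-1; i++){
    if( porter_is_vowel(w, i) ){ w[n-1] = 'i'; return; }
  }
}

/* Porter2 step 1c (simplified): a trailing 'y' becomes 'i' if it is
** preceded by a non-vowel that is not the first letter of the word. */
static void porter2_step1c(char *w){
  size_t n;
  assert( w!=NULL );
  n = strlen(w);
  if( n<3 || w[n-1]!='y' ) return;
  if( !is_aeiou(w[n-2]) && w[n-2]!='y' ) w[n-1] = 'i';
}
```

Under these assumptions, porter_step1c() leaves "cry" alone and turns
"say" into "sai", while porter2_step1c() turns "cry" into "cri" and
leaves "say" alone. Both leave "by" unchanged and both turn "happy" into
"happi".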

HTH.
-Shane



On Wed, Feb 24, 2010 at 10:05 AM, D. Richard Hipp <d...@hwaci.com> wrote:

> We got the Porter stemmer code directly from Martin Porter.
>
> I'm sorry it does not work like you want it to.  Unfortunately, we
> cannot change it now without introducing a serious incompatibility
> with the millions and millions of applications already in the field
> that are using the existing implementation.
>
> FTS3 has a pluggable stemmer module.  You can write your own stemmer
> that works "correctly" if you like, and link it in for use in your
> applications.  We will also investigate making your recommended
> changes for FTS4.  However, in order to maintain backwards
> compatibility of FTS3, we cannot change the stemmer algorithm, even to
> fix a "bug".
>
> On Feb 24, 2010, at 9:59 AM, James Berry wrote:
>
> > Can somebody please clarify the bug reporting process for sqlite? My
> > understanding is that it's not possible to file bug reports
> > directly, and that the advice is to write to the user list first.
> > I've done that (below) but have had no response so far, and I am
> > concerned that this means the bug report will just be forgotten by
> > others, as well as by me.
> >
> > How does this bug move from a message on a list to a ticket (and
> > ultimately a patch, we hope) in the system?
> >
> > James
> >
> > On Feb 22, 2010, at 2:51 PM, James Berry wrote:
> >
> >> I'm writing to report a bug in the porter-stemmer algorithm
> >> supplied as part of the FTS3 implementation.
> >>
> >> The stemmer has an inverted logic error that prevents it from
> >> properly stemming words of the following form:
> >>
> >>      dry -> dri
> >>      cry -> cri
> >>
> >> This means, for instance, that the following words don't stem the
> >> same:
> >>
> >>      dried -> dri   -doesn't match-   dry
> >>      cried -> cri   -doesn't match-   cry
> >>
> >> The bug seems to have been introduced as a simple logic error by
> >> whoever wrote the stemmer code. The original description of step 1c
> >> is here: http://snowball.tartarus.org/algorithms/english/stemmer.html
> >>
> >>      Step 1c:
> >>              replace suffix y or Y by i if preceded by a non-vowel
> >>              which is not the first letter of the word (so cry -> cri,
> >>              by -> by, say -> say)
> >>
> >> But the code in sqlite reads like this:
> >>
> >> /* Step 1c */
> >> if( z[0]=='y' && hasVowel(z+1) ){
> >>   z[0] = 'i';
> >> }
> >>
> >> In other words, sqlite turns the y into an i only if it is preceded
> >> by a vowel (say -> sai), while the algorithm intends this to be
> >> done if it is _not_ preceded by a vowel.
> >>
> >> But there are two other problems in that same line of code:
> >>
> >>      (1) hasVowel checks whether a vowel exists anywhere in the string,
> >> not just in the next character, which is incorrect, and goes
> >> against the step 1c directions above. (amplify would not be
> >> properly stemmed to amplifi, for instance)
> >>
> >>      (2) The check for the first letter is not performed (for words
> >> like "by", etc)
> >>
> >> I've fixed both of those errors in the patch below:
> >>
> >>    /* Step 1c */
> >> -  if( z[0]=='y' && hasVowel(z+1) ){
> >> +  if( z[0]=='y' && isConsonant(z+1) && z[2] ){
> >>      z[0] = 'i';
> >>    }
> >>
> >> _______________________________________________
> >> sqlite-users mailing list
> >> sqlite-users@sqlite.org
> >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
>
> D. Richard Hipp
> d...@hwaci.com
>