Re: [sqlite] Bug in porter stemmer

2010-02-24 Thread James Berry
Can somebody please clarify the bug reporting process for sqlite? My understanding is that it's not possible to file bug reports directly, and that the advice is to write to the user list first. I've done that (below) but have no response so far and am concerned that this means the bug report

Re: [sqlite] Bug in porter stemmer

2010-02-24 Thread D. Richard Hipp
We got the Porter stemmer code directly from Martin Porter. I'm sorry it does not work like you want it to. Unfortunately, we cannot change it now without introducing a serious incompatibility with the millions and millions of applications already in the field that are using the existing

Re: [sqlite] Bug in porter stemmer

2010-02-24 Thread Shane Harrelson
Additionally, your algorithm reference for step1c is from the Snowball English (Porter2) algorithm. The implementation used in SQLite is for the original Porter algorithm discussed here: http://tartarus.org/~martin/PorterStemmer/ HTH. -Shane On Wed, Feb 24, 2010 at 10:05 AM, D. Richard Hipp
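
To make the step 1c distinction concrete, here is a minimal sketch of that one rule as the two published algorithm descriptions state it. It is not SQLite's code; the function names are made up for illustration, and every other step of both stemmers is omitted. In the original Porter algorithm a final "y" becomes "i" only when the preceding stem contains a vowel; in Porter2 it becomes "i" when preceded by a non-vowel that is not the first letter of the word.

```python
VOWELS = "aeiou"


def is_vowel_porter1(word, i):
    """Original Porter vowel test: a, e, i, o, u, plus y after a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return True
    if ch == "y":
        # y counts as a vowel only when the previous letter is a consonant
        return i > 0 and not is_vowel_porter1(word, i - 1)
    return False


def step1c_porter1(word):
    """Hypothetical helper: original Porter step 1c, (*v*) Y -> I."""
    if word.endswith("y"):
        stem = word[:-1]
        if any(is_vowel_porter1(stem, i) for i in range(len(stem))):
            return stem + "i"
    return word


def step1c_porter2(word):
    """Hypothetical helper: Porter2 (Snowball English) step 1c."""
    # The letter before the final y must be a non-vowel and must not be
    # the first letter of the word, so the word needs at least 3 letters.
    if len(word) > 2 and word[-1] in "yY" and word[-2] not in "aeiouy":
        return word[:-1] + "i"
    return word


if __name__ == "__main__":
    for w in ("dry", "cry", "happy", "sky", "by", "say"):
        print(w, step1c_porter1(w), step1c_porter2(w))
```

Under these definitions the two rules agree on "happy" but differ on "dry" and "cry", which is the case raised in the original report.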

Re: [sqlite] Bug in porter stemmer

2010-02-24 Thread James Berry
drh, Thanks for the response: it's nice to know that the report was actually seen. It would be hubris indeed to claim to fix an implementation bug in Porter's code. The code in sqlite didn't match any of Porter's code I could find, so I assumed it came from elsewhere: but maybe I missed

Re: [sqlite] Bug in porter stemmer

2010-02-24 Thread Scott Hess
Actually, I think a new version of the tokenizer would have to be a distinct tokenizer (i.e., porter versus porter1 versus porter2, whatever). fts4 should not interpret the meaning of an explicit tokenizer differently from fts3, but it could use a different default tokenizer. [Don't take this as
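
For context on why the tokenizer name matters: an FTS3 table names its tokenizer in the CREATE VIRTUAL TABLE statement, so the name is part of the stored schema and existing indexes were built with whatever "porter" meant at the time. The sketch below uses Python's sqlite3 module and assumes the linked SQLite library was compiled with FTS3; the table name, column name, and helper function are illustrative only.

```python
import sqlite3


def porter_matches(doc, query):
    """Index one document with tokenize=porter and report whether query matches.

    Returns None when this SQLite build has no FTS3 support (an assumption
    this sketch cannot check in advance).
    """
    conn = sqlite3.connect(":memory:")
    try:
        # The tokenizer is chosen by name when the table is created.
        conn.execute(
            "CREATE VIRTUAL TABLE docs USING fts3(body, tokenize=porter)"
        )
    except sqlite3.OperationalError:
        conn.close()
        return None
    conn.execute("INSERT INTO docs(body) VALUES (?)", (doc,))
    rows = conn.execute(
        "SELECT body FROM docs WHERE docs MATCH ?", (query,)
    ).fetchall()
    conn.close()
    return len(rows) > 0


if __name__ == "__main__":
    print(porter_matches("he was running home", "run"))
```

Because both the indexed text and the query terms pass through the tokenizer named in the schema, changing what that name means would silently change the behaviour of tables that already exist, which is the argument for giving a revised stemmer a new name.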

[sqlite] Bug in porter stemmer

2010-02-22 Thread James Berry
I'm writing to report a bug in the porter-stemmer algorithm supplied as part of the FTS3 implementation. The stemmer has an inverted logic error that prevents it from properly stemming words of the following form: dry -> dri, cry -> cri. This means, for instance, that the