Hi Richard,
My application is basically just to take a text file as a command line
argument and run the spellchecker on it, showing an alert for each word
that is not found in the dictionary and giving the user some options.
After a bit of experimentation I concluded that one way to speed things
up is to store the entire dictionary in memory as a hash map and look
for exact matches. Only when an exact match isn't found do I fall back
to the spellfix table. This allowed me to scan a document with just over
86000 words in less than 500 milliseconds, which is more than acceptable
for my needs. Certainly not ideal if you aren't on a workstation, but
it's a reasonable tradeoff if memory is not an issue.
Perhaps something similar could be done in the spellfix table itself?
Have an indexed integer column containing a crc32 or similar for each
word in the dictionary so that we can look for exact matches very
quickly. We only fall back to the fuzzy search if no match is found. Can
you see any obvious drawbacks with this? If not, I'd like to put this
optimization forth as an initial suggestion. I'll write again if I can
think of anything else after reading the code more thoroughly.
Kind regards,
Philip Bennefall
On 7/24/2014 12:25 AM, Richard Hipp wrote:
On Wed, Jul 23, 2014 at 6:18 PM, Philip Bennefall <phi...@blastbay.com
<mailto:phi...@blastbay.com>> wrote:
I have to amend my last message. The timings I just gave was for
looking up that word 10 times, not 1. So the longest time I've
seen would be about 150 ms. However, if you have a document with a
few thousand words we would still be looking at a significant
total searching time. Is this to be expected?
There is no expectation.
Spellfix is an experiment in doing fuzzy matching. It was designed
for a specific customer who is doing spell-checking in real-time, as
the text is being entered. Spellfix works way faster than the end
user can enter text, so performance is not an issue in its original
purpose.
Perhaps you are using spellfix in a different way? You are welcomed
to do so. If you want to contribute ideas on how to improve spellfix
for use in different scenarios, we will welcome your input.
There are comments in the code explaining how spellfix works. Please
review the principles of operation and then perhaps run a performance
analysis using gprof or cachegrind. Then describe exactly what you
are doing and why it isn't working out for you and perhaps we can help.
--
D. Richard Hipp
d...@sqlite.org <mailto:d...@sqlite.org>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users