Hi Richard,

My application is basically just to take a text file as a command line argument and run the spellchecker on it, showing an alert for each word that is not found in the dictionary and giving the user some options.

After a bit of experimentation I concluded that one way to speed things up is to store the entire dictionary in memory as a hash map and look for exact matches. Only when an exact match isn't found do I fall back to the spellfix table. This allowed me to scan a document with just over 86000 words in less than 500 milliseconds, which is more than acceptable for my needs. Certainly not ideal if you aren't on a workstation, but it's a reasonable tradeoff if memory is not an issue.

Perhaps something similar could be done in the spellfix table itself? Have an indexed integer column containing a crc32 or similar for each word in the dictionary so that we can look for exact matches very quickly. We only fall back to the fuzzy search if no match is found. Can you see any obvious drawbacks with this? If not, I'd like to put this optimization forth as an initial suggestion. I'll write again if I can think of anything else after reading the code more thoroughly.

Kind regards,

Philip Bennefall
On 7/24/2014 12:25 AM, Richard Hipp wrote:



On Wed, Jul 23, 2014 at 6:18 PM, Philip Bennefall <phi...@blastbay.com <mailto:phi...@blastbay.com>> wrote:

    I have to amend my last message. The timings I just gave was for
    looking up that word 10 times, not 1. So the longest time I've
    seen would be about 150 ms. However, if you have a document with a
    few thousand words we would still be looking at a significant
    total searching time. Is this to be expected?


There is no expectation.

Spellfix is an experiment in doing fuzzy matching. It was designed for a specific customer who is doing spell-checking in real-time, as the text is being entered. Spellfix works way faster than the end user can enter text, so performance is not an issue in its original purpose.

Perhaps you are using spellfix in a different way? You are welcomed to do so. If you want to contribute ideas on how to improve spellfix for use in different scenarios, we will welcome your input.

There are comments in the code explaining how spellfix works. Please review the principles of operation and then perhaps run a performance analysis using gprof or cachegrind. Then describe exactly what you are doing and why it isn't working out for you and perhaps we can help.



--
D. Richard Hipp
d...@sqlite.org <mailto:d...@sqlite.org>

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to