Re: [sqlite] [Spellfix] Searching for short words is very slow

Philip Bennefall Wed, 23 Jul 2014 16:23:24 -0700

Hi Richard,

My application simply takes a text file as a command-line argument and runs the spellchecker on it, showing an alert for each word that is not found in the dictionary and giving the user some options.

After a bit of experimentation I concluded that one way to speed things up is to store the entire dictionary in memory as a hash map and look for exact matches. Only when an exact match isn't found do I fall back to the spellfix table. This allowed me to scan a document with just over 86000 words in less than 500 milliseconds, which is more than acceptable for my needs. Certainly not ideal if you aren't on a workstation, but it's a reasonable tradeoff if memory is not an issue.
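A minimal sketch of that two-stage lookup, in Python rather than whatever the application actually uses. The names `check_words` and `fuzzy_lookup` are hypothetical; `fuzzy_lookup` stands in for the query against the spellfix table, and `difflib` is used here only as a placeholder so the example runs without the extension loaded.

```python
import difflib
import re


def check_words(text, dictionary, fuzzy_lookup):
    """Return {unknown_word: suggestions} for words in `text`.

    Exact matches are resolved against an in-memory set; only words that
    miss the set are handed to `fuzzy_lookup` (a hypothetical stand-in
    for the slower spellfix query).
    """
    known = {w.lower() for w in dictionary}
    suggestions = {}
    for word in re.findall(r"[A-Za-z']+", text):
        key = word.lower()
        # Fast path: exact hit, or a miss we have already resolved once.
        if key in known or key in suggestions:
            continue
        # Slow path: fall back to the fuzzy matcher.
        suggestions[key] = fuzzy_lookup(key)
    return suggestions


# Placeholder fuzzy matcher; a real program would query spellfix instead.
words = ["hello", "world", "spell", "check"]
result = check_words(
    "Hello wrold",
    words,
    lambda w: difflib.get_close_matches(w, words, n=3),
)
```

Caching the suggestions for a repeated misspelling means each unknown word reaches the fuzzy matcher only once per document.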

Perhaps something similar could be done in the spellfix table itself? Have an indexed integer column containing a crc32 or similar for each word in the dictionary so that we can look for exact matches very quickly. We only fall back to the fuzzy search if no match is found. Can you see any obvious drawbacks with this? If not, I'd like to put this optimization forth as an initial suggestion. I'll write again if I can think of anything else after reading the code more thoroughly.
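To illustrate the idea, here is a rough sketch using an ordinary side table rather than anything inside spellfix itself. The table name `vocab_exact`, its columns, and the helper functions are assumptions made for this example and do not reflect how spellfix stores its vocabulary.

```python
import sqlite3
import zlib

conn = sqlite3.connect(":memory:")
# Hypothetical side table: one row per dictionary word plus its CRC-32.
conn.execute("CREATE TABLE vocab_exact (word TEXT NOT NULL, crc INTEGER NOT NULL)")
conn.execute("CREATE INDEX vocab_exact_crc ON vocab_exact(crc)")


def add_word(word):
    """Store a word together with the CRC-32 of its UTF-8 bytes."""
    crc = zlib.crc32(word.encode("utf-8"))
    conn.execute("INSERT INTO vocab_exact (word, crc) VALUES (?, ?)", (word, crc))


def is_exact_match(word):
    """Look the word up through the integer index.

    The word itself is compared as well, because two different words
    can share the same CRC-32.
    """
    crc = zlib.crc32(word.encode("utf-8"))
    row = conn.execute(
        "SELECT 1 FROM vocab_exact WHERE crc = ? AND word = ? LIMIT 1",
        (crc, word),
    ).fetchone()
    return row is not None


for w in ("hello", "world"):
    add_word(w)
```

A caller would try `is_exact_match(word)` first and run the fuzzy search only when it returns False.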


Kind regards,

Philip Bennefall

On 7/24/2014 12:25 AM, Richard Hipp wrote:

On Wed, Jul 23, 2014 at 6:18 PM, Philip Bennefall <phi...@blastbay.com> wrote:
    I have to amend my last message. The timings I just gave were for
    looking up that word 10 times, not 1. So the longest time I've
    seen would be about 150 ms. However, if you have a document with a
    few thousand words we would still be looking at a significant
    total searching time. Is this to be expected?


There is no expectation.

Spellfix is an experiment in doing fuzzy matching. It was designed for a specific customer who is doing spell-checking in real-time, as the text is being entered. Spellfix works way faster than the end user can enter text, so performance is not an issue in its original purpose.

Perhaps you are using spellfix in a different way? You are welcome to do so. If you want to contribute ideas on how to improve spellfix for use in different scenarios, we will welcome your input.

There are comments in the code explaining how spellfix works. Please review the principles of operation and then perhaps run a performance analysis using gprof or cachegrind. Then describe exactly what you are doing and why it isn't working out for you and perhaps we can help.

--
D. Richard Hipp
d...@sqlite.org


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
