> Simon Slavin, thank you for your suggestion. Our deduper prototype 
> uses fuzzy matching methods such as Levenshtein distance to detect 
> duplicates. We have found that these fuzzy matching methods are best 
> implemented in C++ to meet our processing-time requirements.
> We would still like to know about your experience with SQLite WAL 
> databases compared to SQLite non-WAL databases. In particular, we are 
> interested in read processing in SQLite WAL databases. Is it possible 
> for SQLite WAL databases to have faster read processing than SQLite 
> non-WAL databases? If so, what method should we use to gain the read 
> improvement? Thank you.

It is possible that you would see the biggest improvement by 
implementing your matching method as a plain C SQLite extension.  Doing 
so would keep moving and converting data back and forth to a minimum, 
since the workload would run as close to the engine as possible.
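
For illustration, here is a minimal sketch of such a loadable extension. 
It is not the extension I am sending you: the function name 
(edit_distance), the file name (editdist.c / editdist.so) and the plain 
byte-wise Levenshtein routine are stand-ins for whatever matching code 
you already have, and a Unicode-aware version would decode UTF-8 to 
code points before comparing.

/* Minimal sketch of a loadable extension registering a scalar
** edit_distance(a, b) SQL function.  Build e.g. with:
**   gcc -fPIC -shared editdist.c -o editdist.so
*/
#include <sqlite3ext.h>
SQLITE_EXTENSION_INIT1

/* Plain byte-wise Levenshtein distance, single-row dynamic programming. */
static int levenshtein(const unsigned char *a, int na,
                       const unsigned char *b, int nb){
  int *row = sqlite3_malloc((nb+1)*(int)sizeof(int));
  int i, j;
  if( row==0 ) return -1;               /* out of memory */
  for(j=0; j<=nb; j++) row[j] = j;
  for(i=1; i<=na; i++){
    int prev = row[0];                  /* dp[i-1][j-1] */
    row[0] = i;
    for(j=1; j<=nb; j++){
      int tmp = row[j];                 /* dp[i-1][j] */
      int d = prev + (a[i-1]==b[j-1] ? 0 : 1);   /* substitution */
      if( row[j]+1   < d ) d = row[j]+1;         /* deletion     */
      if( row[j-1]+1 < d ) d = row[j-1]+1;       /* insertion    */
      row[j] = d;
      prev = tmp;
    }
  }
  i = row[nb];
  sqlite3_free(row);
  return i;
}

/* SQL wrapper: edit_distance(text, text) -> integer */
static void editDistFunc(sqlite3_context *ctx, int argc, sqlite3_value **argv){
  const unsigned char *a = sqlite3_value_text(argv[0]);
  const unsigned char *b = sqlite3_value_text(argv[1]);
  int d;
  (void)argc;
  if( a==0 || b==0 ){ sqlite3_result_null(ctx); return; }
  d = levenshtein(a, sqlite3_value_bytes(argv[0]),
                  b, sqlite3_value_bytes(argv[1]));
  if( d<0 ) sqlite3_result_error_nomem(ctx);
  else      sqlite3_result_int(ctx, d);
}

/* Entry point name is derived from the file name, hence "editdist". */
int sqlite3_editdist_init(sqlite3 *db, char **pzErrMsg,
                          const sqlite3_api_routines *pApi){
  SQLITE_EXTENSION_INIT2(pApi);
  (void)pzErrMsg;
  return sqlite3_create_function(db, "edit_distance", 2,
                                 SQLITE_UTF8 | SQLITE_DETERMINISTIC,
                                 0, editDistFunc, 0, 0);
}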

I am mailing you a download link to an extension offering a 
Unicode-aware fuzzy compare function (Damerau-Levenshtein, to be 
exact).  Have a look at it, play with it to see how it can fit part of 
your bill, and adapt the code as you wish.
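
Once loaded, calling such a function from SQL is straightforward.  The 
sketch below is purely hypothetical: the database name, the 
names(id, name) table, the distance threshold and the extension path 
are made up, and the naive self-join is only there to show the call, 
not a sensible way to scan a large table (you would restrict candidate 
pairs first, e.g. by length or prefix).

#include <stdio.h>
#include <sqlite3.h>

int main(void){
  sqlite3 *db;
  sqlite3_stmt *stmt;
  const char *sql =
    "SELECT a.id, b.id, a.name, b.name "
    "FROM names a JOIN names b ON a.id < b.id "
    "WHERE edit_distance(a.name, b.name) <= 2;";

  if( sqlite3_open("dedupe.db", &db)!=SQLITE_OK ) return 1;

  /* Allow and load the (hypothetical) extension built earlier. */
  sqlite3_enable_load_extension(db, 1);
  sqlite3_load_extension(db, "./editdist.so", 0, 0);

  if( sqlite3_prepare_v2(db, sql, -1, &stmt, 0)==SQLITE_OK ){
    while( sqlite3_step(stmt)==SQLITE_ROW ){
      printf("%lld <-> %lld : %s / %s\n",
             (long long)sqlite3_column_int64(stmt, 0),
             (long long)sqlite3_column_int64(stmt, 1),
             (const char*)sqlite3_column_text(stmt, 2),
             (const char*)sqlite3_column_text(stmt, 3));
    }
    sqlite3_finalize(stmt);
  }
  sqlite3_close(db);
  return 0;
}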

Like Simon, I think you should get rid of journaling in your case.
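
As a sketch of what that means in practice, and assuming the dedupe 
pass runs on a copy of the database you can afford to rebuild after a 
crash, the relevant PRAGMAs would look like this; whether OFF, MEMORY 
or WAL is right depends on how much crash-safety you actually need:

#include <sqlite3.h>

/* Relax durability for a bulk dedupe pass on a throwaway copy.
** journal_mode=OFF drops the rollback journal entirely; MEMORY or WAL
** are less drastic alternatives.  synchronous=OFF is a further,
** optional relaxation beyond the journaling question itself. */
static void relax_durability(sqlite3 *db){
  sqlite3_exec(db, "PRAGMA journal_mode=OFF;", 0, 0, 0);
  sqlite3_exec(db, "PRAGMA synchronous=OFF;", 0, 0, 0);
}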


