A fixed up version can be found at http://www.dessus.com/libstringmetrics-fixed.zip which includes a working DLL compiled with 32-bit MinGW, plus modified source and the "compile.cmd" used to compile on Windows with MinGW installed. I will remove it once Andrea has merged the changes.
>-----Original Message-----
>From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Keith Medcalf
>Sent: Sunday, 28 September, 2014 14:21
>To: General Discussion of SQLite Database
>Subject: Re: [sqlite] A new extension for sqlite to analyze the stringmetrics
>
>
>Andrea, here are some problems and their solutions:
>
>(1) Where do you get a "tolower" function that takes a char* and returns
>a char*?  The one in ctype.h works a character at a time (thus takes an
>int and returns an int).
>
>(2) Why is the entire sqlite3 engine included in a DLL which is loaded
>as an extension to sqlite3?
>
>Please follow along and I will show you where we can find and fix these
>issues ... this might be helpful to other extension writers as well.
>
>First, the -std=c99 needs to be -std=gnu99 to permit the GNU extension
>functions to be recognized.  Without the GNU extensions a bunch of
>non-ANSI names are not recognized, because -std=c99 selects strict ISO
>mode (it defines __STRICT_ANSI__, just as -ansi does).  Once this change
>is made 99% of the errors go away.
>
>src\wrapper_functions.c: In function 'stringmetricsFunc':
>src\wrapper_functions.c:350:16: warning: 'return' with a value, in function returning void [enabled by default]
>         return (1);
>                ^
>
>This is easy.  The callback for an SQLite scalar function is declared as
>returning void: it hands back its result, or its error, through the
>context rather than through a return value.  So leave the function
>definition as void, replace the "return (1);" with a call to
>sqlite3_result_error() on the context followed by a plain "return;",
>and leave the other return statement as it is.
>
>
>src\wrapper_functions.c:353:4: warning: implicit declaration of function 'tolower' [-Wimplicit-function-declaration]
>    if(strcmp(tolower(kindofoutput),"similarity")==0) {
>    ^
>src\wrapper_functions.c:353:4: warning: passing argument 1 of 'strcmp' makes pointer from integer without a cast [enabled by default]
>In file included from src\wrapper_functions.c:57:0:
>c:\apps\mingw\include\string.h:55:37: note: expected 'const char *' but argument is of type 'int'
>   _CRTIMP int __cdecl __MINGW_NOTHROW strcmp (const char*, const char*) __MINGW_ATTRIB_PURE;
>                                     ^
>src\wrapper_functions.c:355:4: warning: passing argument 1 of 'strcmp' makes pointer from integer without a cast [enabled by default]
>    } else if(strcmp(tolower(kindofoutput),"metric")==0) {
>    ^
>In file included from src\wrapper_functions.c:57:0:
>c:\apps\mingw\include\string.h:55:37: note: expected 'const char *' but argument is of type 'int'
>   _CRTIMP int __cdecl __MINGW_NOTHROW strcmp (const char*, const char*) __MINGW_ATTRIB_PURE;
>
>
>The "tolower" function works on an int (single character) and returns an
>int (single character).  It does not work on whole strings.
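To make the point about "tolower" concrete, here is a minimal sketch of a case-insensitive comparison built from per-character tolower() calls. The helper name compare_ignore_case is hypothetical and is not part of libstringmetrics; it is only one way to do this in portable C where the non-standard "stricmp" is not available (POSIX systems offer strcasecmp in <strings.h> instead).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of libstringmetrics): compare two
 * NUL-terminated strings, ignoring ASCII case, by applying tolower()
 * to one character at a time.  Returns <0, 0 or >0 like strcmp(). */
static int compare_ignore_case(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        /* tolower() takes an int whose value must be representable as
         * an unsigned char (or be EOF), hence the casts. */
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb)
            return ca - cb;     /* first differing character decides */
        if (ca == '\0')
            return 0;           /* both strings ended together */
    }
}
```

With such a helper, a test like compare_ignore_case(kindofoutput, "similarity") == 0 would do what strcmp(tolower(kindofoutput), "similarity") was presumably meant to do.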
>The function for doing a case-insensitive string compare is "stricmp".
>
>This can be fixed by making the following changes in wrapper_functions.c:
>
>    if(kindofoutput!=NULL) {
>        if(stricmp(kindofoutput,"similarity")==0) {
>            sqlite3_result_double(context, similarity);
>        } else if(stricmp(kindofoutput,"metric")==0) {
>            sqlite3_result_text(context, metrics, strlen(metrics)+1, NULL);
>        } else {
>            mex = malloc(strlen(sm_name) + 200 + strlen(metrics)+1);
>            sprintf(mex,"%s between \"%s\" & \"%s\" is \"%s\" and yields a %3.0f%% similarity",sm_name,par1,par2,metrics,similarity*100);
>            sqlite3_result_text(context, mex, strlen(mex)+1, NULL);
>        }
>    } else {
>        mex = malloc(strlen(sm_name) + 200 + strlen(metrics)+1);
>        sprintf(mex,"%s between \"%s\" & \"%s\" is \"%s\" and yields a %3.0f%% similarity",sm_name,par1,par2,metrics,similarity*100);
>        sqlite3_result_text(context, mex, strlen(mex)+1, NULL);
>
>(basically a global search and replace for
>"strcmp(tolower(kindofoutput)," and replacing it with
>"stricmp(kindofoutput,")
>
>
>Now we have left only the problem that the entirety of SQLite3 itself is
>compiled into the extension.
>
>Since we are not compiling the extension into the core, you simply need
>to use the correct header.  "wrapper_functions.c" should be using
>sqlite3ext.h, not sqlite3.h.  You then need to add a macro to get a
>reference to the sqlite3_api thus:
>
>
>#include <sqlite3ext.h>
>#include <string.h>
>#include <stdlib.h>
>#include <malloc.h>
>#include <stddef.h>
>#include "simmetrics.h"
>
>SQLITE_EXTENSION_INIT3
>
> const int SIMMETC = 27;
>
>
>SQLITE_EXTENSION_INIT1 defines the "sqlite3_api" pointer.
>SQLITE_EXTENSION_INIT2 initializes its value.  If you need to access the
>"sqlite3_api" in a source file which is "linked with" something which
>already defines and initializes sqlite3_api, then you just put the
>SQLITE_EXTENSION_INIT3 macro (an extern declaration of the same pointer)
>at the top of those modules.
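As a rough illustration of why an extension built against sqlite3ext.h does not need its own copy of the engine, here is a toy model of the mechanism: the host hands the extension a table of function pointers, and macros redirect ordinary-looking calls through that table. Every name below (host_api_routines, host_api, extension_init and so on) is hypothetical and stands in for the real sqlite3_api_routines, sqlite3_api and entry point; this is a sketch of the idea, not SQLite's actual headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for sqlite3_api_routines: a table of function pointers
 * that the host fills in.  All names here are hypothetical. */
typedef struct host_api_routines {
    size_t (*text_length)(const char *text);
    int    (*version_number)(void);
} host_api_routines;

/* "Host" side: the real implementations exist only here. */
static size_t host_text_length(const char *text) { return strlen(text); }
static int host_version_number(void) { return 3008007; /* arbitrary */ }
static const host_api_routines host_table = {
    host_text_length, host_version_number
};

/* "Extension" side.  This definition plays the role that
 * SQLITE_EXTENSION_INIT1 plays: it creates the pointer.  In a second
 * source file of the same extension, an extern declaration of it would
 * play the role of SQLITE_EXTENSION_INIT3. */
static const host_api_routines *host_api = NULL;

/* Redirect plain-looking calls through the table, in the same spirit
 * as the #define lines in sqlite3ext.h. */
#define api_text_length    host_api->text_length
#define api_version_number host_api->version_number

/* Entry point called by the host at load time.  The assignment plays
 * the role of SQLITE_EXTENSION_INIT2. */
static int extension_init(const host_api_routines *api)
{
    host_api = api;
    return 0;
}

/* Extension code calls the host without linking against it. */
static size_t extension_measure(const char *text)
{
    return api_text_length(text);
}
```

Because the extension side only ever calls through host_api, none of the host's code has to be compiled into it, which is the effect the header change is after.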
>(The definitions are at the end of sqlite3ext.h)
>
>You then change the compile command thusly:
>
>  gcc -s -O3 -std=gnu99 -mdll -mthreads -Bl,--static -static-libgcc
>      -I src
>      -I src\libsimmetrics\include
>      -I ..\sqlite\dist
>      src\*.c
>      src\libsimmetrics\simmetrics\*.c
>      -o stringmetrics.dll
>
>where "..\sqlite\dist" is the location of the sqlite3 header files (I
>point them to my own SQLite3 build directories, you can carry an extra
>copy in the src/sqlite3 directory and refer to those if you prefer).
>This produces a 73K extension module with no external dependencies (other
>than on the MSVCRT.DLL subsystem runtime library) and produces no
>diagnostic output.
>
>I added the -mthreads because I presume this may be used in a multithread
>environment.  It added no linkage to the thread library code, so I assume
>the base functions used were already thread-safe (or could not be made
>so).  I haven't looked into which is the case.
>
>After these changes we get the following (on Windows 8.1 x64, with the
>current MinGW 32-bit compiler) (slightly reformatted to fit your screen):
>
>2014-09-28 14:05:30 [D:\Source\libstringmetrics-master]
>>gcc -s -O3 -std=gnu99 -mdll -mthreads -Bl,--static -static-libgcc
>      -I src
>      -I src\libsimmetrics\include
>      -I ..\sqlite\dist
>      src\*.c
>      src\libsimmetrics\simmetrics\*.c
>      -o stringmetrics.dll
>
>and comparing this extension to the original included in the distribution
>(I stripped it, so it is smaller than the one in the distribution because
>the internal symbol table is gone)
>
>2014-09-28 14:05:33 [D:\Source\libstringmetrics-master]
>>dir *.dll
>
>2014-09-28  12:57  769,038 libstringmetrics.dll
>2014-09-28  14:05   75,776 stringmetrics.dll
>
>and running it:
>
>2014-09-28 13:55:55 [D:\Source\libstringmetrics-master]
>>sqlite
>SQLite version 3.8.7 2014-09-26 18:30:11
>Enter ".help" for usage hints.
>Connected to a transient in-memory database.
>Use ".open FILENAME" to reopen on a persistent database.
>sqlite> .load stringmetrics
>sqlite> .echo on
>sqlite> .read test.sql
>.read test.sql
>select load_extension("libstringmetrics.dll");
>
>select stringmetrics("block_distance_custom","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Block Distance customized between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "0" and yields a 100% similarity
>select stringmetrics("cosine_custom","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Cosine Similarity customized between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "1.000000" and yields a 100% similarity
>select stringmetrics("dice_custom","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Dice Similarity customized between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "1.000000" and yields a 100% similarity
>select stringmetrics("euclidean_distance","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Euclidean Distance between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "2.00" and yields a 55% similarity
>select stringmetrics("euclidean_distance_custom","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Euclidean Distance customized between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "0" and yields a 100% similarity
>select stringmetrics("jaccard","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Jaccard Similarity between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "0.200000" and yields a 20% similarity
>select stringmetrics("jaccard_custom","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Jaccard Similarity customized between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "1.000000" and yields a 100% similarity
>select stringmetrics("jaro","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Jaro Similarity between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "0.920000" and yields a 92% similarity
>select stringmetrics("jaro_winkler","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Jaro Winkler Similarity between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "0.968000" and yields a 97% similarity
>select stringmetrics("levenshtein","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Levenshtein Distance between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "2" and yields a 92% similarity
>select stringmetrics("matching_coefficient","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Matching Coefficient SimMetrics between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "1.00" and yields a 25% similarity
>select stringmetrics("matching_coefficient_custom","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Matching Coefficient SimMetrics customized between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "4.00" and yields a 100% similarity
>select stringmetrics("monge_elkan","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Monge Elkan Similarity between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "1.012500" and yields a 101% similarity
>select stringmetrics("monge_elkan_custom","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Matching Coefficient SimMetrics customized STILL NOT IMPLEMENTED between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "still not implemented" and yields a 0% similarity
>select stringmetrics("needleman_wunch","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Needleman Wunch SimMetrics between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "2.00" and yields a 96% similarity
>select stringmetrics("overlap_coefficient","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Overlap Coefficient Similarity between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "0.500000" and yields a 50% similarity
>select stringmetrics("overlap_coefficient_custom","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Overlap Coefficient Similarity customized between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "1.000000" and yields a 100% similarity
>select stringmetrics("qgrams_distance","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>QGrams Distance between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "12" and yields a 78% similarity
>select stringmetrics("qgrams_distance_custom","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>QGrams Distance customized between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "0" and yields a 100% similarity
>select stringmetrics("smith_waterman","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Smith Waterman SimMetrics between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "21.00" and yields a 84% similarity
>select stringmetrics("smith_waterman_gotoh","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Smith Waterman Gotoh SimMetrics between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "109.00" and yields a 87% similarity
>select stringmetrics("soundex_phonetics","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Soundex Phonetics between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "V221 & V221" and yields a 100% similarity
>select stringmetrics("metaphone_phonetics","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Metaphone Phonetics between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "FJSP & FJSP" and yields a 100% similarity
>select stringmetrics("double_metaphone_phonetics","phrase","via giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
>Double Metaphone Phonetics between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "FJSP & FJSP" and yields a 100% similarity
>
>sqlite>
>
>
>
>>-----Original Message-----
>>From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Andrea Peri
>>Sent: Sunday, 28 September, 2014 02:53
>>To: Gert Van Assche; General Discussion of SQLite Database
>>Subject: Re: [sqlite] A new extension for sqlite to analyze the stringmetrics
>>
>>You should use SQLite 32bit
>>On 28 Sep 2014 10:45, "Gert Van Assche" <ger...@gmail.com> wrote:
>>
>>> Thanks Andrea.
>>> When I download the DLL I get exactly the same error.
>>> I'm using the 32bit SQLite3.exe on a Win 64 bit machine.
>>> Could that cause the error?
>>>
>>> thanks
>>>
>>> gert
>>>
>>> 2014-09-27 20:27 GMT+02:00 Andrea Peri <aperi2...@gmail.com>:
>>>
>>>> https://github.com/aperi2007/libstringmetrics
>>>>
>>>>
>>>> >Andrea, where do I find it?
>>>> >
>>>> >thanks
>>>> >
>>>> >gert
>>>>
>>>>
>>>>
>>>> --
>>>> -----------------
>>>> Andrea Peri
>>>> . . . . . . . . .
>>>> qwerty àèìòù
>>>> -----------------
>>>>
>>>
>>>
>>_______________________________________________
>>sqlite-users mailing list
>>sqlite-users@sqlite.org
>>http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users