Andrea, here are some problems and their solutions:
(1) Where do you get a "tolower" function that takes a char* and returns a
char*? The one in ctype.h works a character at a time (thus takes an int and
returns an int).
(2) Why is the entire sqlite3 engine included in a DLL which is loaded as an
extension to sqlite3?
Please follow along and I will show you where we can find and fix these issues
... this might be helpful to other extension writers as well.
First, the -std=c99 needs to be -std=gnu99 to permit the GNU extension
functions to be recognized.
With -std=c99 (as with -ansi) GCC defines __STRICT_ANSI__, and the headers then
hide a bunch of non-ANSI names. Once this change is made 99% of the errors go
away.
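If you want to check which mode a given set of flags puts the compiler in,
something like the following should do. It is only a sketch: the two function
names are made up for illustration and are not part of the extension. It relies
on GCC defining __STRICT_ANSI__ for the strict modes (-ansi, -std=c99) and not
for -std=gnu99.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if this file was compiled in a strict ISO
   mode (GCC defines __STRICT_ANSI__ for -ansi and -std=c99), 0 otherwise
   (for example with -std=gnu99). */
int strict_iso_mode(void)
{
#ifdef __STRICT_ANSI__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: print the result in a readable form. */
void print_compile_mode(void)
{
    printf("strict ISO mode: %s\n", strict_iso_mode() ? "yes" : "no");
}
```

Compile it once with -std=c99 and once with -std=gnu99 and call
print_compile_mode() to see the difference.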
src\wrapper_functions.c: In function 'stringmetricsFunc':
src\wrapper_functions.c:350:16: warning: 'return' with a value, in function
returning void [enabled by default]
return (1);
^
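This is easy. SQLite scalar functions (the xFunc callback you hand to
sqlite3_create_function) are declared as returning void, so there is no status
code to return. A result or an error is passed back through the context using
the sqlite3_result_* functions. So leave the function definition returning
void, replace the "return (1);" with a call to
sqlite3_result_error(context, "<message>", -1) followed by a plain "return;",
and leave the other return statement as it is.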
src\wrapper_functions.c:353:4: warning: implicit declaration of function
'tolower' [-Wimplicit-function-declaration]
if(strcmp(tolower(kindofoutput),"similarity")==0) {
^
src\wrapper_functions.c:353:4: warning: passing argument 1 of 'strcmp' makes
pointer from integer without a cast [enabled by default]
In file included from src\wrapper_functions.c:57:0:
c:\apps\mingw\include\string.h:55:37: note: expected 'const char *' but
argument is of type 'int'
_CRTIMP int __cdecl __MINGW_NOTHROW strcmp (const char*, const char*)
__MINGW_ATTRIB_PURE;
^
src\wrapper_functions.c:355:4: warning: passing argument 1 of 'strcmp' makes
pointer from integer without a cast [enabled by default]
} else if(strcmp(tolower(kindofoutput),"metric")==0) {
^
In file included from src\wrapper_functions.c:57:0:
c:\apps\mingw\include\string.h:55:37: note: expected 'const char *' but
argument is of type 'int'
_CRTIMP int __cdecl __MINGW_NOTHROW strcmp (const char*, const char*)
__MINGW_ATTRIB_PURE;
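The "tolower" function works on an int (single character) and returns an int
(single character). It does not work on whole strings. It is declared in
ctype.h, which is why it shows up here as an implicit declaration. The function
for doing a case insensitive string compare is "stricmp" (that is the Microsoft
runtime name; the POSIX equivalent is strcasecmp).
This can be fixed by making the following changes in wrapper_functions.c: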
    if(kindofoutput!=NULL) {
        if(stricmp(kindofoutput,"similarity")==0) {
            sqlite3_result_double(context, similarity);
        } else if(stricmp(kindofoutput,"metric")==0) {
            sqlite3_result_text(context, metrics, strlen(metrics)+1, NULL);
        } else {
            mex = malloc(strlen(sm_name) + 200 + strlen(metrics)+1);
            sprintf(mex,"%s between \"%s\" & \"%s\" is \"%s\" and yields a "
                "%3.0f%% similarity",sm_name,par1,par2,metrics,similarity*100);
            sqlite3_result_text(context, mex, strlen(mex)+1, NULL);
        }
    } else {
        mex = malloc(strlen(sm_name) + 200 + strlen(metrics)+1);
        sprintf(mex,"%s between \"%s\" & \"%s\" is \"%s\" and yields a "
            "%3.0f%% similarity",sm_name,par1,par2,metrics,similarity*100);
        sqlite3_result_text(context, mex, strlen(mex)+1, NULL);
    }
(basically a global search and replace for "strcmp(tolower(kindofoutput)," and
replacing it with "stricmp(kindofoutput,")
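Since stricmp is not part of standard C, here is a rough sketch of a portable
alternative built on the per-character tolower from ctype.h. The name
compare_ignore_case is made up for illustration; it is not in the extension or
in any library, and it only handles single-byte characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: compare two NUL-terminated strings while ignoring
   case. Returns <0, 0 or >0 in the same way strcmp does. */
int compare_ignore_case(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        /* tolower() wants an int holding an unsigned char value, so cast
           first to stay safe when plain char is signed. */
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        /* Stop at the first difference or at the end of both strings. */
        if (ca != cb || ca == '\0')
            return ca - cb;
    }
}
```

With that in place the tests would read
compare_ignore_case(kindofoutput,"similarity")==0 and so on, and the code no
longer depends on which runtime library supplies stricmp.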
Now we have left only the problem that the entirety of SQLite3 itself is
compiled into the extension.
Since we are not compiling the extension into the core, you simply need to use
the correct header. "wrapper_functions.c" should be using sqlite3ext.h, not
sqlite3.h. You then need to add a macro to get a reference to the sqlite3_api
thus:
#include <sqlite3ext.h>
#include <string.h>
#include <stdlib.h>
#include <malloc.h>
#include <stddef.h>
#include "simmetrics.h"
SQLITE_EXTENSION_INIT3
const int SIMMETC = 27;
SQLITE_EXTENSION_INIT1 creates the "sqlite3_api" pointer.
SQLITE_EXTENSION_INIT2 initializes its value. If you need to access the
"sqlite3_api" in a source file which is linked with another file that already
declares and initializes sqlite3_api, then you just put the
SQLITE_EXTENSION_INIT3 macro at the top of those modules. (The definitions are
at the end of sqlite3ext.h.)
You then change the compile command thusly:
gcc -s -O3 -std=gnu99 -mdll -mthreads -Bl,--static -static-libgcc
-I src
-I src\libsimmetrics\include
-I ..\sqlite\dist
src\*.c
src\libsimmetrics\simmetrics\*.c
-o stringmetrics.dll
where "..\sqlite\dist" is the location of the sqlite3 header files (I point
them to my own SQLite3 build directories; you can carry an extra copy in the
src/sqlite3 directory and refer to those if you prefer). This produces a 73K
extension module with no external dependencies (other than the MSVCRT.DLL
subsystem runtime library) and produces no diagnostic output.
I added the -mthreads because I presume this may be used in a multithreaded
environment. It added no linkage to the thread library code, so I assume the
base functions used were already thread-safe (or could not be made so). I
haven't looked into which is the case.
After these changes we get the following (on Windows 8.1 x64, with the current
MinGW 32-bit compiler; slightly reformatted to fit your screen):
2014-09-28 14:05:30 [D:\Source\libstringmetrics-master]
>gcc -s -O3 -std=gnu99 -mdll -mthreads -Bl,--static -static-libgcc
-I src
-I src\libsimmetrics\include
-I ..\sqlite\dist
src\*.c
src\libsimmetrics\simmetrics\*.c
-o stringmetrics.dll
and comparing this extension to the original included in the distribution (I
stripped it, so it is smaller than the one in the distribution because the
internal symbol table is gone)
2014-09-28 14:05:33 [D:\Source\libstringmetrics-master]
>dir *.dll
2014-09-28 12:57 769,038 libstringmetrics.dll
2014-09-28 14:05 75,776 stringmetrics.dll
and running it:
2014-09-28 13:55:55 [D:\Source\libstringmetrics-master]
>sqlite
SQLite version 3.8.7 2014-09-26 18:30:11
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .load stringmetrics
sqlite> .echo on
sqlite> .read test.sql
.read test.sql
select load_extension("libstringmetrics.dll");
select stringmetrics("block_distance_custom","phrase","via
giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
Block Distance customized between "via giuseppe-garibaldi,25" & "via giuseppe
garibaldi 25" is "0" and yields a 100% similarity
select stringmetrics("cosine_custom","phrase","via giuseppe-garibaldi,25", "via
giuseppe garibaldi 25",",-");
Cosine Similarity customized between "via giuseppe-garibaldi,25" & "via
giuseppe garibaldi 25" is "1.000000" and yields a 100% similarity
select stringmetrics("dice_custom","phrase","via giuseppe-garibaldi,25", "via
giuseppe garibaldi 25",",-");
Dice Similarity customized between "via giuseppe-garibaldi,25" & "via giuseppe
garibaldi 25" is "1.000000" and yields a 100% similarity
select stringmetrics("euclidean_distance","phrase","via giuseppe-garibaldi,25",
"via giuseppe garibaldi 25",",-");
Euclidean Distance between "via giuseppe-garibaldi,25" & "via giuseppe
garibaldi 25" is "2.00" and yields a 55% similarity
select stringmetrics("euclidean_distance_custom","phrase","via
giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
Euclidean Distance customized between "via giuseppe-garibaldi,25" & "via
giuseppe garibaldi 25" is "0" and yields a 100% similarity
select stringmetrics("jaccard","phrase","via giuseppe-garibaldi,25", "via
giuseppe garibaldi 25",",-");
Jaccard Similarity between "via giuseppe-garibaldi,25" & "via giuseppe
garibaldi 25" is "0.200000" and yields a 20% similarity
select stringmetrics("jaccard_custom","phrase","via giuseppe-garibaldi,25",
"via giuseppe garibaldi 25",",-");
Jaccard Similarity customized between "via giuseppe-garibaldi,25" & "via
giuseppe garibaldi 25" is "1.000000" and yields a 100% similarity
select stringmetrics("jaro","phrase","via giuseppe-garibaldi,25", "via giuseppe
garibaldi 25",",-");
Jaro Similarity between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi
25" is "0.920000" and yields a 92% similarity
select stringmetrics("jaro_winkler","phrase","via giuseppe-garibaldi,25", "via
giuseppe garibaldi 25",",-");
Jaro Winkler Similarity between "via giuseppe-garibaldi,25" & "via giuseppe
garibaldi 25" is "0.968000" and yields a 97% similarity
select stringmetrics("levenshtein","phrase","via giuseppe-garibaldi,25", "via
giuseppe garibaldi 25",",-");
Levenshtein Distance between "via giuseppe-garibaldi,25" & "via giuseppe
garibaldi 25" is "2" and yields a 92% similarity
select stringmetrics("matching_coefficient","phrase","via
giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
Matching Coefficient SimMetrics between "via giuseppe-garibaldi,25" & "via
giuseppe garibaldi 25" is "1.00" and yields a 25% similarity
select stringmetrics("matching_coefficient_custom","phrase","via
giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
Matching Coefficient SimMetrics customized between "via giuseppe-garibaldi,25"
& "via giuseppe garibaldi 25" is "4.00" and yields a 100% similarity
select stringmetrics("monge_elkan","phrase","via giuseppe-garibaldi,25", "via
giuseppe garibaldi 25",",-");
Monge Elkan Similarity between "via giuseppe-garibaldi,25" & "via giuseppe
garibaldi 25" is "1.012500" and yields a 101% similarity
select stringmetrics("monge_elkan_custom","phrase","via giuseppe-garibaldi,25",
"via giuseppe garibaldi 25",",-");
Matching Coefficient SimMetrics customized STILL NOT IMPLEMENTED between "via
giuseppe-garibaldi,25" & "via giuseppe garibaldi 25" is "still
not implemented" and yields a 0% similarity
select stringmetrics("needleman_wunch","phrase","via giuseppe-garibaldi,25",
"via giuseppe garibaldi 25",",-");
Needleman Wunch SimMetrics between "via giuseppe-garibaldi,25" & "via giuseppe
garibaldi 25" is "2.00" and yields a 96% similarity
select stringmetrics("overlap_coefficient","phrase","via
giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
Overlap Coefficient Similarity between "via giuseppe-garibaldi,25" & "via
giuseppe garibaldi 25" is "0.500000" and yields a 50% similarity
select stringmetrics("overlap_coefficient_custom","phrase","via
giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
Overlap Coefficient Similarity customized between "via giuseppe-garibaldi,25" &
"via giuseppe garibaldi 25" is "1.000000" and yields a 100%
similarity
select stringmetrics("qgrams_distance","phrase","via giuseppe-garibaldi,25",
"via giuseppe garibaldi 25",",-");
QGrams Distance between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi
25" is "12" and yields a 78% similarity
select stringmetrics("qgrams_distance_custom","phrase","via
giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
QGrams Distance customized between "via giuseppe-garibaldi,25" & "via giuseppe
garibaldi 25" is "0" and yields a 100% similarity
select stringmetrics("smith_waterman","phrase","via giuseppe-garibaldi,25",
"via giuseppe garibaldi 25",",-");
Smith Waterman SimMetrics between "via giuseppe-garibaldi,25" & "via giuseppe
garibaldi 25" is "21.00" and yields a 84% similarity
select stringmetrics("smith_waterman_gotoh","phrase","via
giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
Smith Waterman Gotoh SimMetrics between "via giuseppe-garibaldi,25" & "via
giuseppe garibaldi 25" is "109.00" and yields a 87% similarity
select stringmetrics("soundex_phonetics","phrase","via giuseppe-garibaldi,25",
"via giuseppe garibaldi 25",",-");
Soundex Phonetics between "via giuseppe-garibaldi,25" & "via giuseppe garibaldi
25" is "V221 & V221" and yields a 100% similarity
select stringmetrics("metaphone_phonetics","phrase","via
giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
Metaphone Phonetics between "via giuseppe-garibaldi,25" & "via giuseppe
garibaldi 25" is "FJSP & FJSP" and yields a 100% similarity
select stringmetrics("double_metaphone_phonetics","phrase","via
giuseppe-garibaldi,25", "via giuseppe garibaldi 25",",-");
Double Metaphone Phonetics between "via giuseppe-garibaldi,25" & "via giuseppe
garibaldi 25" is "FJSP & FJSP" and yields a 100% similarity
sqlite>
>-----Original Message-----
>From: [email protected] [mailto:sqlite-users-
>[email protected]] On Behalf Of Andrea Peri
>Sent: Sunday, 28 September, 2014 02:53
>To: Gert Van Assche; General Discussion of SQLite Database
>Subject: Re: [sqlite] A new extension for sqlite to analyze the
>stringmetrics
>
>You should use SQLite 32bit
>On 28 Sep 2014 10:45, "Gert Van Assche" <[email protected]> wrote:
>
>> Thanks Andrea.
>> When I download the DLL I get exactly the same error.
>> I'm using the 32bit SQLite3.exe on a Win 64 bit machine.
>> Could that cause the error?
>>
>> thanks
>>
>> gert
>>
>> 2014-09-27 20:27 GMT+02:00 Andrea Peri <[email protected]>:
>>
>>> https://github.com/aperi2007/libstringmetrics
>>>
>>>
>>> >Andrea, where do I find it?
>>> >
>>> >thanks
>>> >
>>> >gert
>>>
>>>
>>>
>>> --
>>> -----------------
>>> Andrea Peri
>>> . . . . . . . . .
>>> qwerty àèìòù
>>> -----------------
>>>
>>
>>
>_______________________________________________
>sqlite-users mailing list
>[email protected]
>http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users