I've written a VAPI binding for the multifast library, which implements the Aho–Corasick string matching algorithm.
The library can be downloaded from: http://sourceforge.net/projects/multifast/files/
Its Makefile needs a few modifications to build a shared library:
    add this line at the top:
        SONAME := libahocorasick.so.$(ACVERSION)
    add to CFLAGS:
    add section:
        so: aho_corasick.o node.o
            gcc -shared -Wl,-soname,$(SONAME) -o $(SONAME) aho_corasick.o node.o
            ln -s -f $(SONAME) libahocorasick.so

    run "make so".

What this library is good for:
Finding which of many known substrings occur in a given string (for example:
tags in a text, domains or keywords in a URI, etc.)
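
To make the use case concrete, here is a minimal sketch of the result one is after: every offset at which any of a set of keywords occurs in a text. It deliberately uses a naive scan with GLib's string.index_of instead of the library, so it only illustrates the kind of output, not the Aho–Corasick automaton itself, which finds all keywords in a single pass over the text. The helper name find_all and the sample URI are made up for illustration.

```vala
// Hypothetical helper (not part of the ahocorasick binding): naive scan that
// returns one "offset keyword" entry per occurrence of each keyword in text.
string[] find_all (string text, string[] keywords) {
    string[] found = {};
    foreach (unowned string key in keywords) {
        if (key.length == 0)
            continue; // an empty keyword would match everywhere
        int pos = text.index_of (key);
        while (pos >= 0) {
            found += "%d %s".printf (pos, key);
            pos = text.index_of (key, pos + 1); // also catches overlapping hits
        }
    }
    return found;
}

void main () {
    string[] keywords = { "example.com", "news", "vala" };
    string[] hits = find_all ("http://example.com/news/vala", keywords);
    foreach (unowned string hit in hits)
        stdout.printf ("%s\n", hit);
}
```

This loop rescans the text once per keyword, which is exactly the cost that becomes prohibitive with hundreds of thousands of keys and that a prebuilt automaton avoids.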

My benchmarks on an i7:
500 000 substrings (average key length 32 chars): index build time 15 sec.,
1.7 GB in memory.
10 000 key checks: overall time 0.048 sec.
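
The figures above come without a description of how they were measured. One plausible way, sketched here purely as an assumption, is to wrap each phase in a GLib.Timer. The names time_it and Work are hypothetical, and the lambda below is a dummy workload; in a real run it would contain the index-building calls or the search loop.

```vala
// Hypothetical helper: run a piece of work once and return elapsed seconds.
delegate void Work ();

double time_it (Work work) {
    var timer = new Timer (); // a GLib.Timer starts running on creation
    work ();
    timer.stop ();
    return timer.elapsed ();
}

void main () {
    // Dummy workload standing in for index building or searching.
    double secs = time_it (() => {
        var sb = new StringBuilder ();
        for (int i = 0; i < 10000; i++)
            sb.append ("key");
    });
    stdout.printf ("overall time %.3f sec.\n", secs);
}
```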

Example of use (compile with: --pkg ahocorasick -X -lahocorasick):

using AC;

public static int match_handler (Match m, void* param) {
       uint j;
       for (j = 0; j < m.match_num; j++) {
               stdout.printf ("%ld ", m.position);
               stdout.printf ("%ld ", m.matched_strings[j].id);
               stdout.printf ("%s ", m.matched_strings[j].str);
       }
       return 0; /* 0 = keep searching, i.e. find all matches */
}

public void main (string[] args) {
       var aca = AC.Automata (match_handler);

       var str = AC.String () {
           id = 1,
           str = "test",
           length = 4 // the length of str should be passed here
       };
       aca.add_string (str);

       aca.build (); // builds the index; search can't be executed before this is done

       // renamed from "str": a local can't be declared twice in the same scope
       var text = AC.String () {
           str = "tes",
           length = "tes".length
       };
       aca.search (text, null);
       aca.reset (); // reset AC instance
}

Attachment: ahocorasick.vapi
