On 08.06.2016 13:34, Satya Gadepalli wrote:
I want to look up concepts and entities by their name even if it
contains typos or omissions in Wikidata.
Can I do this using Wikidata-Toolkit?
No, there is no error-tolerant string matching function in there. If no
other tool can help you, Wikidata Toolkit could be used to get access to
all labels and aliases, so that you can run the (slow) search yourself
(deciding for each label whether you like it or not depending on custom
code). But this is not the same as a live search interface.
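To make the "run the search yourself" idea concrete, here is a minimal sketch in Python. It assumes the labels and aliases have already been extracted into a plain dictionary; the names build_index and fuzzy_lookup are made up for this illustration and are not part of Wikidata Toolkit. It uses the standard-library difflib module, which compares the query against every name, so it is only practical for modest amounts of data.

```python
import difflib


def build_index(entities):
    """Map each case-folded label or alias to the set of entity IDs using it.

    `entities` is assumed to look like {"Q42": ["Douglas Adams", ...], ...}.
    """
    index = {}
    for entity_id, names in entities.items():
        for name in names:
            index.setdefault(name.casefold(), set()).add(entity_id)
    return index


def fuzzy_lookup(index, query, max_results=5, cutoff=0.8):
    """Return (name, [entity IDs]) pairs whose name is similar to `query`.

    Similarity is difflib's ratio; `cutoff` is a tuning knob, not a
    recommended value. Results are ordered best match first.
    """
    matches = difflib.get_close_matches(
        query.casefold(), list(index), n=max_results, cutoff=cutoff
    )
    return [(name, sorted(index[name])) for name in matches]


if __name__ == "__main__":
    # Tiny hand-written sample; real data would come from a dump.
    sample = {
        "Q42": ["Douglas Adams", "Douglas Noel Adams"],
        "Q64": ["Berlin"],
    }
    idx = build_index(sample)
    print(fuzzy_lookup(idx, "Duglas Adams"))
```

Because every lookup scans all names, this illustrates the "slow" search mentioned above rather than a live search interface.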
Can I achieve this using a SPARQL query from the web interface?
SPARQL has several string matching functions available, including
general regular expression matching. Running regular expressions over
all labels and aliases for a big language will still take time, possibly
too long for the timeout (since there is no easy way to index for
arbitrary regexps). However, specialised matches, such as searching for
words by their initial letters, are quite fast. See the example query for
"Rockbands starting with M" for illustration.
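As a rough illustration of such a prefix search, the following Python sketch builds a SPARQL query string that can be pasted into the web interface. The helper name build_prefix_query and its parameters are invented for this example. The query shape (P31 "instance of" plus a STRSTARTS filter on the label) is an assumption about what the example query looks like, not a copy of it, and the class ID has to be supplied by the caller.

```python
import re


def build_prefix_query(class_qid, prefix, lang="en", limit=100):
    """Build a SPARQL query for items of a class whose label starts with `prefix`.

    Relies on the wd:, wdt: and rdfs: prefixes that the Wikidata query
    service predefines. The query is returned as text, not executed.
    """
    if not re.fullmatch(r"Q[1-9][0-9]*", class_qid):
        raise ValueError("class_qid must look like 'Q123'")
    if not re.fullmatch(r"[a-z]{2,3}(-[a-z0-9]+)*", lang):
        raise ValueError("lang must be a language code such as 'en'")
    # Escape backslashes and double quotes for a SPARQL string literal.
    escaped = prefix.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item ?label WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} ;\n"
        "        rdfs:label ?label .\n"
        f'  FILTER(LANG(?label) = "{lang}")\n'
        f'  FILTER(STRSTARTS(?label, "{escaped}"))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # Q5741069 is assumed here to be the "rock band" class; please verify.
    print(build_prefix_query("Q5741069", "M"))
```

Replacing STRSTARTS with REGEX would allow arbitrary patterns, at the cost described above.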
If nothing else helps you, you could load the relevant data into a more
specialised string searching database such as Lucene. Wikidata Toolkit
can parse the dumps for you in this case, so you don't have to implement
the whole dump file decompression and parsing, but you would have to
write code to fill your DB. This would only give you a static version of
the data, though; if you want live updates, that is more work.
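For illustration only, here is a sketch of the "fill your DB" step in Python rather than with Wikidata Toolkit. It assumes the JSON dump layout in which the file is one large array with a single entity per line, and it only extracts (entity ID, name) pairs; handing those pairs to Lucene or another index is left out. The function name iter_names is made up for this example.

```python
import json


def iter_names(lines, lang="en"):
    """Yield (entity_id, name) for the label and aliases in one language.

    `lines` is any iterable of text lines from a Wikidata JSON dump,
    assumed to contain one entity per line inside a surrounding [ ... ].
    """
    for raw in lines:
        line = raw.strip().rstrip(",")  # entity lines end with a comma
        if line in ("", "[", "]"):
            continue  # skip the array brackets and blank lines
        entity = json.loads(line)
        entity_id = entity["id"]
        label = entity.get("labels", {}).get(lang)
        if label:
            yield entity_id, label["value"]
        for alias in entity.get("aliases", {}).get(lang, []):
            yield entity_id, alias["value"]


if __name__ == "__main__":
    # Two hand-written lines in the assumed dump layout.
    sample = [
        "[",
        '{"id": "Q42", "labels": {"en": {"language": "en", "value": "Douglas Adams"}},'
        ' "aliases": {"en": [{"language": "en", "value": "Douglas Noel Adams"}]}},',
        '{"id": "Q64", "labels": {"en": {"language": "en", "value": "Berlin"}}}',
        "]",
    ]
    for entity_id, name in iter_names(sample):
        print(entity_id, name)
```

For a real compressed dump, the lines could come from bz2.open(path, "rt", encoding="utf-8") or gzip.open with the same arguments, where path is wherever the dump was downloaded to.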
Regards,
Markus
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata