On 08.06.2016 13:34, Satya Gadepalli wrote:
I want to look up concepts and entities by their name even if it
contains typos or omissions in Wikidata.

Can I do this using Wikidata-Toolkit?

No, Wikidata Toolkit has no error-tolerant string matching function. If no other tool can help you, you could use Wikidata Toolkit to get access to all labels and aliases and run the (slow) search yourself, with custom code deciding for each label whether it counts as a match. But this is not the same as a live search interface.
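
To illustrate what such a "search it yourself" pass could look like, here is a minimal sketch in Python (not Wikidata Toolkit itself, which is a Java library). It assumes the labels and aliases have already been extracted into a plain dictionary; the names `fuzzy_lookup` and `labels_by_id` and the cutoff value are made up for this example, and the similarity measure from the standard `difflib` module is just one possible choice.

```python
import difflib


def fuzzy_lookup(query, labels_by_id, cutoff=0.8, limit=5):
    """Return up to `limit` (entity_id, label, score) tuples, best match first.

    Every label is compared against the query, so the running time grows
    linearly with the number of labels (this is the slow part).
    """
    query_norm = query.casefold()
    hits = []
    for entity_id, labels in labels_by_id.items():
        for label in labels:
            # ratio() is a similarity in [0, 1]; 1.0 means identical strings.
            score = difflib.SequenceMatcher(
                None, query_norm, label.casefold()
            ).ratio()
            if score >= cutoff:
                hits.append((entity_id, label, score))
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits[:limit]


# Hypothetical, hand-written sample of labels and aliases per entity.
labels_by_id = {
    "Q42": ["Douglas Adams", "Douglas Noel Adams"],
    "Q64": ["Berlin"],
}

for entity_id, label, score in fuzzy_lookup("Duglas Adams", labels_by_id):
    print(entity_id, label, round(score, 2))
```

The cutoff trades recall against noise: a lower value tolerates more typos but returns more unrelated labels.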


Can I achieve this using a SPARQL query from the web interface?

SPARQL has several string matching functions available, including general regular expression matching. Running regular expressions over all labels and aliases for a big language will still take time, possibly too long for the timeout (since there is no easy way to index for arbitrary regexps). However, specialised patterns, such as searching for words by their initial letters, are quite fast. See the example query for "Rockbands starting with M" for illustration.
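
As a rough sketch of the kind of prefix query meant here, the following Python helper only builds the query text; it does not send it anywhere. The function name `build_prefix_query` and its parameters are invented for this example, and it assumes the `wd:`, `wdt:` and `rdfs:` prefixes are predefined, as they are in the query service web interface. Restricting the search to instances of one class keeps the set of labels small.

```python
import re


def build_prefix_query(prefix, class_qid, lang="en", limit=100):
    """Build a SPARQL query for items of one class whose label starts with `prefix`."""
    if not re.fullmatch(r"Q\d+", class_qid):
        raise ValueError(f"not an item id: {class_qid!r}")
    if not re.fullmatch(r"[A-Za-z-]+", lang):
        raise ValueError(f"not a language code: {lang!r}")
    # Escape backslashes and double quotes so the prefix stays a valid
    # SPARQL string literal.
    escaped = prefix.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
SELECT ?item ?label WHERE {{
  ?item wdt:P31 wd:{class_qid} ;
        rdfs:label ?label .
  FILTER(LANG(?label) = "{lang}")
  FILTER(STRSTARTS(?label, "{escaped}"))
}}
LIMIT {int(limit)}
""".strip()


# Q5741069 is assumed here to be the item for "rock band";
# please check the id before relying on it.
print(build_prefix_query("M", "Q5741069"))
```

The printed text can be pasted into the web interface. `STRSTARTS` is a standard SPARQL 1.1 function; swapping it for `REGEX` gives general pattern matching at the cost described above.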

If nothing else helps, you could load the relevant data into a more specialised string search engine such as Lucene. Wikidata Toolkit can parse the dumps for you in this case, so you don't have to implement the dump file decompression and parsing yourself, but you would have to write the code to fill your database. This would only give you a static version of the data, though; if you want live updates, this is more work.
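
For a feeling of what the "fill your database" step involves, here is a small sketch in plain Python rather than Wikidata Toolkit. It assumes input in the style of the Wikidata JSON dump (one entity per line inside a JSON array, already decompressed) and uses an in-memory SQLite table purely as a stand-in for a real search index such as Lucene. The names `iter_names`, `fill_db` and the table layout are made up for this example.

```python
import json
import sqlite3


def iter_names(lines, lang="en"):
    """Yield (entity_id, name) for each label and alias in the given language."""
    for line in lines:
        # Each entity sits on its own line; drop the trailing comma and
        # skip the array brackets on the first and last line.
        line = line.strip().rstrip(",")
        if not line or line in ("[", "]"):
            continue
        entity = json.loads(line)
        label = entity.get("labels", {}).get(lang)
        if label:
            yield entity["id"], label["value"]
        for alias in entity.get("aliases", {}).get(lang, []):
            yield entity["id"], alias["value"]


def fill_db(conn, lines, lang="en"):
    """Store all names in a simple table (a placeholder for a search index)."""
    conn.execute("CREATE TABLE IF NOT EXISTS names (entity_id TEXT, name TEXT)")
    conn.executemany("INSERT INTO names VALUES (?, ?)", iter_names(lines, lang))
    conn.commit()


# Hand-written sample in the dump's line layout, heavily shortened.
sample = [
    "[",
    '{"id": "Q42", "labels": {"en": {"language": "en", "value": "Douglas Adams"}},'
    ' "aliases": {"en": [{"language": "en", "value": "Douglas Noel Adams"}]}},',
    '{"id": "Q64", "labels": {"en": {"language": "en", "value": "Berlin"}}, "aliases": {}}',
    "]",
]

conn = sqlite3.connect(":memory:")
fill_db(conn, sample)
print(conn.execute("SELECT COUNT(*) FROM names").fetchone()[0])
```

For a real dump, the lines would come from a decompressing file reader instead of a list, and the insert step would write to whatever index you choose.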

Regards,

Markus



_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
