Laura Morales wrote on 02.11.2017 at 15:54:
>> The tool is in the hdt-jena package (not hdt-java-cli, where the other
>> command-line tools reside), since it uses parts of Jena (e.g. ARQ).
>> There is a wrapper script called hdtsparql.sh for executing it with the
>> proper Java environment.
>
> Does this tool work nicely with large HDT files such as Wikidata? Or does it
> need to load the whole graph+index into memory?

I haven't tested it with huge datasets like Wikidata. But for the moderately sized data I use it on (40M triples), it runs fast and without using much memory, so I believe it just memory-maps the HDT and index files and reads only what it needs to answer the query.
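For what it's worth, here is a sketch of how I invoke it. The file name and query are just examples, and I'm going from memory on the argument order (HDT file first, then the query string), so check the hdt-java README if it doesn't match:

```shell
# Hypothetical example: query an HDT file with the hdtsparql.sh wrapper.
# dataset.hdt is a placeholder; on first use the tool also creates a
# dataset.hdt.index file next to it, which later runs can memory-map.
./hdtsparql.sh dataset.hdt \
  "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
```

Results are streamed to stdout, so for large result sets you can pipe them straight into a file without holding everything in memory.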

-Osma


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suomi...@helsinki.fi
http://www.nationallibrary.fi

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
