> How can I speed up the queries processing even more?

IMHO: drop the unwanted data as early as you can (aggressive prefiltering; ideally, do not import it at all).
> Any suggestion will be appreciated.

In your case:

- I would check the RDF dumps:
  https://www.wikidata.org/wiki/Wikidata:Database_download#RDF_dumps
- I would try to write a custom pre-filter for the 2 million parameters
  (simple text parsing in Go, using multiple cores, or some other fast code),
- and load just the results into PostgreSQL.

I have good experience with parsing and filtering the Wikidata JSON dump (gzipped) and loading the result into a PostgreSQL database. I can run the full code on my laptop, and the result in my case is ~12 GB in PostgreSQL.

The biggest problem is the memory requirement of the "2 million parameters", but you can choose a fast key-value store such as RocksDB. There are also other low-tech parsing solutions.

Regards,
Imre

Adam Sanchez <a.sanche...@gmail.com> wrote (on Mon, 13 Jul 2020, 19:42):

> Hi,
>
> I have to launch 2 million queries against a Wikidata instance.
> I have loaded Wikidata in Virtuoso 7 (512 RAM, 32 cores, SSD disks with
> RAID 0).
> The queries are simple, just 2 types.
>
> select ?s ?p ?o {
> ?s ?p ?o.
> filter (?s = ?param)
> }
>
> select ?s ?p ?o {
> ?s ?p ?o.
> filter (?o = ?param)
> }
>
> If I use a Java ThreadPoolExecutor, it takes 6 hours.
> How can I speed up the queries processing even more?
>
> I was thinking:
>
> a) to implement a Virtuoso cluster to distribute the queries or
> b) to load Wikidata in a Spark dataframe (since the Sansa framework is
> very slow, I would use my own implementation) or
> c) to load Wikidata in a PostgreSQL table and use Presto to distribute
> the queries or
> d) to load Wikidata in a PG-Strom table to use GPU parallelism.
>
> What do you think? I am looking for ideas.
> Any suggestion will be appreciated.
>
> Best,
>
> _______________________________________________
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata