Hi Johannes,

Thank you for raising this issue on the mailing list again.

At
https://stackoverflow.com/questions/61813248/jena-tdbloader-performance-and-limits
there is a question describing the issue, and at
http://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData#Test_with_Apache_Jena
I have documented my own attempts. A few people have given feedback in
the meantime, but I have no report of a success yet. Also, the only
hints for achieving better performance so far concern RAM and disk:
using lots of RAM (up to 2 terabytes) and SSDs (also some 2 terabytes)
was mentioned. I asked at my local IT center, and a machine with that
much RAM costs around 30-60 thousand EUR, which is definitely out of my
budget. I might invest in a 200 EUR 2-terabyte SSD if I could be sure
that this would solve the problem. At this time I doubt it, since the
software keeps crashing on me and there seem to be bugs in the operating
system, the Java Virtual Machine and Jena itself that prevent success.
On top of that, performance degrades so severely for multi-billion
triple imports that testing is almost impossible: the estimated time to
finish is half a year on the (old but sophisticated) hardware that I
use daily.

Cheers
  Wolfgang

On 08.06.20 at 17:54, Hoffart, Johannes wrote:
> Hi,
>
> I want to load the full Wikidata dump, available at 
> https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.bz2 to use 
> in Jena.
>
> I tried it using the tdb2.tdbloader with $JVM_ARGS set to -Xmx120G. 
> Initially, the progress (measured by dataset size) is quick. It slows down 
> very much after a couple of 100GB written, and finally, at around 500GB, the 
> progress is almost halted.
>
> Did anyone ingest Wikidata into Jena before? What are the system 
> requirements? Is there a specific tdb2.tdbloader configuration that would 
> speed things up? For example building an index after data ingest?
>
> Thanks
> Johannes
>
> Johannes Hoffart, Executive Director, Technology Division
> Goldman Sachs Bank Europe SE | Marienturm | Taunusanlage 9-10 | D-60329 
> Frankfurt am Main
> Email: johannes.hoff...@gs.com | Tel: +49 
> (0)69 7532 3558
> Executive Board: Dr. Wolfgang Fink (Chairman) | Thomas Degn-Petersen | Dr. 
> Matthias Bock
> Chairman of the Supervisory Board: Dermot McDonogh
> Registered office: Frankfurt am Main | Amtsgericht Frankfurt am Main HRB 114190
>
-- 

BITPlan - smart solutions
Wolfgang Fahl
Pater-Delp-Str. 1, D-47877 Willich Schiefbahn
Tel. +49 2154 811-480, Fax +49 2154 811-481
Web: http://www.bitplan.de
BITPlan GmbH, Willich - HRB 6820 Krefeld, Tax No.: 10258040548, 
Managing Director: Wolfgang Fahl

