On Jun 20, 2019, at 08:37 AM, Adam Sanchez <a.sanche...@gmail.com> wrote:
>
> For your information
>
> ...
> b) It took 43 hours to load the Wikidata RDF dump
> (wikidata-20190610-all-BETA.ttl, 383G) in the dev version of Virtuoso
> 07.20.3230.
> I had to patch Virtuoso because it was given the following error each
> time I load the RDF data
>
> 09:58:06 PL LOG: File /backup/wikidata-20190610-all-BETA.ttl error
> 42000 TURTLE RDF loader, line 2984680: RDFGE: RDF box with a geometry
> RDF type and a non-geometry content
>
> The virtuoso.db file turned to be 340G.
>
> Server technical features
>
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                12
> On-line CPU(s) list:   0-11
> Thread(s) per core:    2
> Core(s) per socket:    6
> Socket(s):             1
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 63
> Model name:            Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
> Stepping:              2
> CPU MHz:               1199.920
> CPU max MHz:           3800.0000
> CPU min MHz:           1200.0000
> BogoMIPS:              6984.39
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              15360K
> NUMA node0 CPU(s):     0-11
> RAM:                   128G
>
> Best,

Hi, Adam --

We're quite interested in the time your Wikidata load took on
Virtuoso, as it seems rather slow, given our experience with
other large (and much larger!) data sets.

The hardware information you provided focused primarily on the
processors -- but RAM and disk details are much more important
to data loads. Also, there are some significant Virtuoso
configuration settings (in the INI file) which have an impact.

We'd like to get the info that would let us fill in the blanks
on this spreadsheet (itself a work in progress), so we can do
some analysis, and likely provide some tuning hints that would
bring the Virtuoso Wikidata load time down significantly.

https://docs.google.com/spreadsheets/d/1-stlTC_WJmMU3xA_NxA1tSLHw6_sbpjff-5OITtrbFw/edit?usp=sharing

You can see the settings in use for some other deployments,
on the "Current" tab, which may in themselves show you some
places you could improve things immediately.

Last, we would appreciate knowing exactly what you patched to
get around the geodata error, as there are a few open issues
along those lines, which are also works in progress.

Thanks,

Ted

--
A: Yes.                          http://www.idallen.com/topposting.html
| Q: Are you sure?
| | A: Because it reverses the logical flow of conversation.
| | | Q: Why is top posting frowned upon?

Ted Thibodeau, Jr.           //  voice +1-781-273-0900 x32
Senior Support & Evangelism  //  mailto:tthibod...@openlinksw.com
                             //  http://twitter.com/TallTed
OpenLink Software, Inc.      //  http://www.openlinksw.com/
20 Burlington Mall Road, Suite 322, Burlington MA 01803
    Weblog    -- http://www.openlinksw.com/blogs/
    Community -- https://community.openlinksw.com/
    LinkedIn  -- http://www.linkedin.com/company/openlink-software/
    Twitter   -- http://twitter.com/OpenLink
    Facebook  -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata