dr0ptp4kt added a comment.
Good news. With the N-triples style scholarly entity graph files, a buffer capacity of 1000000 (note the extra zero), a write retention queue capacity of 4000, and a heap size of 31g, the import on the gaming-class desktop took about 2.40 days. Recall that with a buffer capacity of 100000 it took about 3.25 days on this desktop (and, again, 5.875 days on wdqs1024). So the higher buffer capacity gave about a 35% speed increase on this desktop (3.25 / 2.40 = 1.35, 1.35 minus 1 = 0.35). It appears, then, that the combination of faster CPU, NVMe, and a higher buffer capacity is somewhere around 145% (5.875 / 2.40 = 2.45, 2.45 minus 1 = 1.45) faster than what we observed on a target data center machine.

It will likely be somewhat less dramatic on 10B triples if the previous munged file runs are any clue. I'm going to think about how to check this notion; it could be done by using the scholarly graph plus a portion of the main graph, which would probably be close enough for our purposes.

A high-speed NVMe is in the process of being acquired so that we can verify on wdqs2024 the level of speedup achieved on a server similar to the graph split test servers. wdqs2024 has a hardware profile similar to wdqs1024 at present.
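For anyone who wants to re-run the arithmetic, here is a minimal sketch in Python. It assumes the durations quoted in this comment and takes the desktop run time from the `real 3447m18.542s` line of the terminal output; the function and variable names are made up for illustration and are not part of any WDQS tooling.

```python
# Durations quoted in the comment, in days.
DAYS_WDQS1024 = 5.875        # buffer capacity 100000 on wdqs1024
DAYS_DESKTOP_100K = 3.25     # buffer capacity 100000 on the desktop

# Wall-clock time of the desktop run with buffer capacity 1000000,
# taken from `real 3447m18.542s`.
REAL_MINUTES = 3447 + 18.542 / 60


def minutes_to_days(minutes: float) -> float:
    """Convert a duration in minutes to days."""
    return minutes / (60 * 24)


def percent_faster(baseline_days: float, new_days: float) -> float:
    """Speed increase of the new run over the baseline, in percent."""
    return (baseline_days / new_days - 1) * 100


days_desktop_1m = minutes_to_days(REAL_MINUTES)
gain_vs_desktop = percent_faster(DAYS_DESKTOP_100K, days_desktop_1m)
gain_vs_wdqs1024 = percent_faster(DAYS_WDQS1024, days_desktop_1m)

print(f"desktop run: {days_desktop_1m:.2f} days")
print(f"vs. desktop with buffer capacity 100000: {gain_vs_desktop:.0f}% faster")
print(f"vs. wdqs1024: {gain_vs_wdqs1024:.0f}% faster")
```

Using the unrounded run time rather than 2.40 days shifts the percentages by about a point, so the results should be read as approximate, in line with the rough figures above.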
Some stuff from the terminal from the import on the gaming-class desktop:

ubuntu22:~$ head -9 ~/rdf/dist/target/service-0.3.138-SNAPSHOT/loadData.log
Sun Apr 7 12:03:19 PM CDT 2024
Processing part-00000-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text/html;charset=UTF-8"><title>blazegraph™ by SYSTAP</title
></head
><body<p>totalElapsed=64069ms, elapsed=64024ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=71897ms, commitTime=1712509470732, mutationCount=7349689</p
></html
>Sun Apr 7 12:04:31 PM CDT 2024
Processing part-00001-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz

# screen output at the end:

Processing part-01023-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text/html;charset=UTF-8"><title>blazegraph™ by SYSTAP</title
></head
><body<p>totalElapsed=51703ms, elapsed=51703ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
><hr><p>COMMIT: totalElapsed=181013ms, commitTime=1712716306763, mutationCount=7946575</p
></html
>Tue Apr 9 09:31:50 PM CDT 2024
File /mnt/firehose/split_0/nt_wd_schol/part-01024-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz not found, terminating

real 3447m18.542s

TASK DETAIL
https://phabricator.wikimedia.org/T359062