dr0ptp4kt added a comment.

  Good news. With the N-triples style scholarly entity graph files, a buffer 
capacity of 1000000 (ten times the earlier 100000), a write retention queue 
capacity of 4000, and a heap size of 31g, the import on the gaming-class desktop 
took about 2.40 days. Recall that with a buffer capacity of 100000 it took about 
3.25 days on this desktop (and, again, 5.875 days on wdqs1024). So the higher 
buffer capacity gave about a 35% speed increase (3.25 / 2.40 = 1.35, 1.35 minus 
1 = 0.35) on this gaming-class desktop.
  
  It appears then that the combination of faster CPU, NVMe, and a higher buffer 
capacity is somewhere around 145% (5.875 / 2.40 = 2.45, 2.45 minus 1 = 1.45) 
faster than what we observed on a target data center machine.
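
  As a cross-check of the percentages above, the arithmetic can be sketched in a 
few lines of Python. The function and constant names are made up for 
illustration; only the three durations come from the runs described here.

```python
def speedup_percent(baseline_days: float, new_days: float) -> float:
    """Percent speed increase of the new run over the baseline run.

    Speed is inversely proportional to duration, so the ratio of speeds is
    baseline_days / new_days; subtracting 1 gives the relative increase.
    """
    return (baseline_days / new_days - 1.0) * 100.0


# Durations (in days) quoted in this comment.
WDQS1024_DAYS = 5.875           # wdqs1024
DESKTOP_100K_DAYS = 3.25        # desktop, buffer capacity 100000
DESKTOP_1M_DAYS = 2.40          # desktop, buffer capacity 1000000

buffer_gain = speedup_percent(DESKTOP_100K_DAYS, DESKTOP_1M_DAYS)
overall_gain = speedup_percent(WDQS1024_DAYS, DESKTOP_1M_DAYS)

print(f"buffer capacity gain on desktop: {buffer_gain:.1f}%")
print(f"desktop vs. wdqs1024:            {overall_gain:.1f}%")
```

  Before rounding, this works out to about 35.4% and 144.8% respectively.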
  
  The gain will likely be somewhat less dramatic on 10B triples if the previous 
munged file runs are any guide. I'm going to think about how to check this 
notion. It could be done by using the scholarly graph plus a portion of the main 
graph, which would probably be close enough for our purposes.
  
  A high-speed NVMe drive is being acquired so that we can verify on wdqs2024 
how much speedup is achieved on a server similar to those used for the graph 
split test servers. wdqs2024 currently has a hardware profile similar to 
wdqs1024.
  
  Some terminal output from the import on the gaming-class desktop:
  
    ubuntu22:~$ head -9 ~/rdf/dist/target/service-0.3.138-SNAPSHOT/loadData.log
    Sun Apr  7 12:03:19 PM CDT 2024
    Processing part-00000-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title
    ></head
    ><body<p>totalElapsed=64069ms, elapsed=64024ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
    ><hr><p>COMMIT: totalElapsed=71897ms, commitTime=1712509470732, mutationCount=7349689</p
    ></html
    >Sun Apr  7 12:04:31 PM CDT 2024
    Processing part-00001-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz
    
    # screen output at the end:
    
    Processing part-01023-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title
    ></head
    ><body<p>totalElapsed=51703ms, elapsed=51703ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p
    ><hr><p>COMMIT: totalElapsed=181013ms, commitTime=1712716306763, mutationCount=7946575</p
    ></html
    >Tue Apr  9 09:31:50 PM CDT 2024
    File /mnt/firehose/split_0/nt_wd_schol/part-01024-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz not found, terminating
    
    real    3447m18.542s
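
  To relate the `real` figure above to the "about 2.40 days" quoted earlier, and 
to pull the per-file commit numbers out of loadData.log, something like the 
following Python sketch could be used. The regular expressions assume the log 
lines look like the excerpt above (including the mail-wrapped line break before 
mutationCount); the function names are hypothetical, and dividing mutationCount 
by totalElapsed is only a rough per-file indicator, not an official metric.

```python
import re

# Matches the shell `time` output line, e.g. "real    3447m18.542s".
REAL_RE = re.compile(r"real\s+(\d+)m([\d.]+)s")

# Matches the COMMIT summary in the Blazegraph HTML response; \s* tolerates
# a line wrap between commitTime and mutationCount.
COMMIT_RE = re.compile(
    r"COMMIT: totalElapsed=(\d+)ms, commitTime=\d+,\s*mutationCount=(\d+)"
)


def real_to_days(line: str) -> float:
    """Convert a `real NmS.SSSs` line to days."""
    match = REAL_RE.search(line)
    if match is None:
        raise ValueError(f"not a `real` timing line: {line!r}")
    seconds = int(match.group(1)) * 60 + float(match.group(2))
    return seconds / 86400.0


def commit_stats(log_text: str) -> list[tuple[int, int]]:
    """Return (totalElapsed in ms, mutationCount) for each COMMIT found."""
    return [(int(ms), int(count)) for ms, count in COMMIT_RE.findall(log_text)]


# The two COMMIT lines shown in the excerpt (first and last file).
sample_log = (
    "><hr><p>COMMIT: totalElapsed=71897ms, commitTime=1712509470732, \n"
    "mutationCount=7349689</p\n"
    "><hr><p>COMMIT: totalElapsed=181013ms, commitTime=1712716306763, \n"
    "mutationCount=7946575</p\n"
)

days = real_to_days("real    3447m18.542s")
stats = commit_stats(sample_log)

print(f"total wall-clock time: {days:.2f} days")
for elapsed_ms, mutations in stats:
    rate = mutations / (elapsed_ms / 1000.0)
    print(f"{mutations} mutations in {elapsed_ms / 1000.0:.1f}s (~{rate:,.0f}/s)")
```

  Pointing `commit_stats` at the full loadData.log instead of the two sample 
lines would give one tuple per imported file.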

TASK DETAIL
  https://phabricator.wikimedia.org/T359062
