dr0ptp4kt added a comment.
This morning of April 3, around 6:25 AM, I had SSH'd in to check progress, and it was working, but going slowly, similar to the day before. It was on a file number in the 1200s, but I didn't write down the number or copy terminal output; I do remember seeing that it was taking around 796 seconds for one of the files at that time. If you look at the previous comment, you'll see those were going slowly as well; not surprising, as we know imports of these munged files get slower as more data is imported. I checked several hours later in the middle of a meeting, and it had gone into a bad spiral. I've been able to use `screen` backscrolling to obtain much of the stack trace, but could not backscroll far enough to tell for sure when the last file imported successfully without a stack trace. What we can say is that //probably// the last somewhat stable commit was on file 1302 at about 7:24 AM; that probably file 1303, and definitely 1304 and 1305, have been failing badly and taking a really long time to do so; and that this would probably have continued indefinitely without killing the process. Here's just a slice of the paste to give an idea of things (notice `lastCommitTime` and `commitCounter` in the stack trace).
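As a quick sanity check on those commit timestamps: the root block in the trace reports epoch-millisecond values, which can be decoded like so (a small illustrative sketch; the two constants are copied verbatim from `firstCommitTime` and `lastCommitTime` in the trace):

```python
from datetime import datetime, timezone

def decode_ms(ms):
    """Convert an epoch-millisecond value, as printed in the Blazegraph
    root block, into an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

first_commit = decode_ms(1711737574896)  # firstCommitTime in the trace
last_commit = decode_ms(1712147041973)   # lastCommitTime in the trace

# Elapsed wall clock between the first commit (Friday, March 29, 1:39 PM CDT)
# and the last good commit on file 1302 (Wednesday, April 3, 7:24 AM CDT):
elapsed = last_commit - first_commit
print(elapsed)  # roughly 4 days, 17:44
```

That elapsed figure (about 4.74 days to reach file 1302) lines up with the start time and the 7:24 AM commit discussed below.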
```
Wed Apr 3 02:05:26 PM CDT 2024
Processing wikidump-000001305.ttl.gz
SPARQL-UPDATE: updateStr=LOAD <file:///mnt/firehose/munge_on_later_data_set/wikidump-000001305.ttl.gz>
java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: org.openrdf.query.UpdateExecutionException: java.lang.RuntimeException: Problem with entry at -83289912769511002: lastRootBlock=rootBlock{
  rootBlock=0, challisField=1302, version=3, nextOffset=47806576684846562,
  localTime=1712147044389 [Wednesday, April 3, 2024 7:24:04 AM CDT],
  firstCommitTime=1711737574896 [Friday, March 29, 2024 1:39:34 PM CDT],
  lastCommitTime=1712147041973 [Wednesday, April 3, 2024 7:24:01 AM CDT],
  commitCounter=1302,
  commitRecordAddr={off=NATIVE:-140859033,len=422},
  commitRecordIndexAddr={off=NATIVE:-93467508,len=220},
  blockSequence=34555, quorumToken=-1, metaBitsAddr=26754033649714513,
  metaStartAddr=11989126, storeType=RW,
  uuid=f993598d-497c-46a7-8434-d25c8859a0b8,
  offsetBits=42, checksum=1600335692,
  createTime=1711737574192 [Friday, March 29, 2024 1:39:34 PM CDT],
  closeTime=0}
```

Unfortunately `jstack` seems to hiccup:

```
ubuntu22:~$ sudo jstack -m 987870
[sudo] password:
Attaching to process ID 987870, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.402-b06
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sun.tools.jstack.JStack.runJStackTool(JStack.java:140)
	at sun.tools.jstack.JStack.main(JStack.java:106)
Caused by: java.lang.RuntimeException: Unable to deduce type of thread from address 0x00007fecb400b800 (expected type JavaThread, CompilerThread, ServiceThread, JvmtiAgentThread, or SurrogateLockerThread)
	at sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:169)
	at sun.jvm.hotspot.runtime.Threads.first(Threads.java:153)
	at sun.jvm.hotspot.tools.PStack.initJFrameCache(PStack.java:200)
	at sun.jvm.hotspot.tools.PStack.run(PStack.java:71)
	at sun.jvm.hotspot.tools.PStack.run(PStack.java:58)
	at sun.jvm.hotspot.tools.PStack.run(PStack.java:53)
	at sun.jvm.hotspot.tools.JStack.run(JStack.java:66)
	at sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:260)
	at sun.jvm.hotspot.tools.Tool.start(Tool.java:223)
	at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
	at sun.jvm.hotspot.tools.JStack.main(JStack.java:92)
	... 6 more
Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x00007fecb400b800
	at sun.jvm.hotspot.runtime.InstanceConstructor.newWrongTypeException(InstanceConstructor.java:62)
	at sun.jvm.hotspot.runtime.VirtualConstructor.instantiateWrapperFor(VirtualConstructor.java:80)
	at sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:165)
	... 16 more
```

Combining the flags doesn't parse:

```
ubuntu22:~$ sudo jstack -Flm 987870
Usage:
    jstack [-l] <pid>
        (to connect to running process)
    jstack -F [-m] [-l] <pid>
        (to connect to a hung process)
    jstack [-m] [-l] <executable> <core>
        (to connect to a core file)
    jstack [-m] [-l] [server_id@]<remote server IP or hostname>
        (to connect to a remote debug server)

Options:
    -F  to force a thread dump. Use when jstack <pid> does not respond (process is hung)
    -m  to print both java and native frames (mixed mode)
    -l  long listing. Prints additional information about locks
    -h or -help to print this help message
```

And running `sudo jstack -F -l -m 987870` attaches the same way ("Debugger attached successfully. Server compiler detected. JVM version is 25.402-b06") and then fails with exactly the same `InvocationTargetException` / `WrongTypeException` stack trace as above.

The number of files imported between the previous comment at 2:39 PM yesterday and 7:24 AM today is about what we might expect if we were, for example, assuming about 10 minutes or less per file, in keeping with the trajectory observed: (1302 - 1179 + 1) * 10 / 60. That is to say, we would have expected 20.6 hours up to file 1302 at ten minutes per file, but it actually took about 16.9 hours.

Blazegraph was still running, so I took this opportunity to check how many triples were in place. `select (count(*) as ?ct) where { ?s ?p ?o }` produced a result of 9082617267.

For the sake of extrapolation, let's suppose it would have kept taking about 8.1 minutes per remaining file and things would have kept working all the way up to file 1332 (with 1302 being the last good one). 30 files times 8.1 minutes per file = 243 minutes, or about 4 hours and 3 minutes. If this had been the case, it would have put us at about 11:27 AM today for completion. Remember, this run was started around Friday, March 29, 2024 at 1:40 PM CT, so we're really quite close to a projection of about 5 days with this lower buffer capacity setting. Given that the other approach took about 3.95 days, we might say that this run was about 24% (1.24 minus 1) longer than the other approach.
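To make the throughput arithmetic above easy to re-check, here's a small sketch of the same calculation (all inputs are the figures quoted in this comment):

```python
# Files imported between the two checkpoints, inclusive:
# file 1179 at 2:39 PM yesterday, file 1302 at 7:24 AM today.
files = 1302 - 1179 + 1                        # 124 files

# At an assumed 10 minutes per file, the expected wall clock would be:
expected_hours = files * 10 / 60               # about 20.7 hours

# The observed wall clock was about 16.9 hours, so the realized rate was:
actual_hours = 16.9
per_file_minutes = actual_hours * 60 / files   # about 8.2 minutes per file

# Extrapolating the remaining 30 files (1303 through 1332) at ~8.1 min/file:
remaining_minutes = 30 * 8.1                   # 243 minutes
print(files, round(per_file_minutes, 1), remaining_minutes)
```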
This sort of instability is unfortunately proving to be common (indeed, it's part of the reason for the graph split: fewer triples and reduced risk), but we may have stumbled onto something with the buffer capacity that gets us both greater stability and greater speed, at least up to a certain number of munged files and at least on this machine. As I said earlier, I'm going to try comparing buffer capacity settings against the graph split files and see how things stack up; graph-split-based files are ultimately our target, after all. Knowing that, at least on this machine, the larger buffer capacity seems likely to be associated with greater stability, we should probably also try a full reload on another server with the munged N3-style Turtle files and the larger buffer capacity, and see what this does for a full graph import. Or at least be aware that this is an option in case we need to do a full import (e.g., we might try it on one node with the higher buffer capacity in parallel with any other nodes attempting to recover with the currently established buffer capacity).

Here's the paste of what stack trace material I could backscroll to in `screen`:

P59389 BlazeGraph stack trace during munged file import, obtained from screen backscroll <https://phabricator.wikimedia.org/P59389>

TASK DETAIL
https://phabricator.wikimedia.org/T359062