dr0ptp4kt added a comment.
This morning of April 3, around 6:25 AM, I had SSH'd in to check progress, and it was working, but going slowly, similar to the day before. It was on a file number in the 1200s, but I didn't write down the number or copy terminal output; I do remember seeing that it was taking around 796 seconds for one of the files at that time. If you look at the previous comment, you'll see those were going slowly as well; not surprising, as we know imports of these munged files get slower as more data is imported. I checked several hours later in the middle of a meeting, and it had gone into a bad spiral. I've been able to use `screen` backscrolling to obtain much of the stack trace, but could not backscroll far enough to tell for sure when the last file imported successfully without a stack trace. What we can say is that //probably// the last somewhat stable commit was on file 1302 at about 7:24 AM; that probably file 1303, and definitely 1304 and 1305, have been failing badly and taking a really long time to do so; and that this would probably have continued indefinitely without killing the process. Here's just a slice of the paste to give an idea of things (notice `lastCommitTime` and `commitCounter` in the stack trace).
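As a quick sanity check on those commit timestamps: the root block in the trace reports epoch-millisecond values, which can be decoded like so (a small illustrative sketch; the two constants are copied verbatim from `firstCommitTime` and `lastCommitTime` in the trace):

```python
from datetime import datetime, timezone

def decode_ms(ms):
    """Convert an epoch-millisecond value, as printed in the Blazegraph
    root block, into an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

first_commit = decode_ms(1711737574896)  # firstCommitTime in the trace
last_commit = decode_ms(1712147041973)   # lastCommitTime in the trace

# Elapsed wall clock between the first commit (Friday, March 29, 1:39 PM CDT)
# and the last good commit on file 1302 (Wednesday, April 3, 7:24 AM CDT):
elapsed = last_commit - first_commit
print(elapsed)  # roughly 4 days, 17:44
```

That elapsed figure (about 4.74 days to reach file 1302) lines up with the start time and the 7:24 AM commit discussed below.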
```
Wed Apr 3 02:05:26 PM CDT 2024
Processing wikidump-000001305.ttl.gz
SPARQL-UPDATE: updateStr=LOAD <file:///mnt/firehose/munge_on_later_data_set/wikidump-000001305.ttl.gz>
java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: org.openrdf.query.UpdateExecutionException: java.lang.RuntimeException: Problem with entry at -83289912769511002: lastRootBlock=rootBlock{
  rootBlock=0, challisField=1302, version=3, nextOffset=47806576684846562,
  localTime=1712147044389 [Wednesday, April 3, 2024 7:24:04 AM CDT],
  firstCommitTime=1711737574896 [Friday, March 29, 2024 1:39:34 PM CDT],
  lastCommitTime=1712147041973 [Wednesday, April 3, 2024 7:24:01 AM CDT],
  commitCounter=1302,
  commitRecordAddr={off=NATIVE:-140859033,len=422},
  commitRecordIndexAddr={off=NATIVE:-93467508,len=220},
  blockSequence=34555, quorumToken=-1, metaBitsAddr=26754033649714513,
  metaStartAddr=11989126, storeType=RW,
  uuid=f993598d-497c-46a7-8434-d25c8859a0b8,
  offsetBits=42, checksum=1600335692,
  createTime=1711737574192 [Friday, March 29, 2024 1:39:34 PM CDT],
  closeTime=0}
```

Unfortunately `jstack` seems to hiccup:

```
ubuntu22:~$ sudo jstack -m 987870
[sudo] password:
Attaching to process ID 987870, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.402-b06
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sun.tools.jstack.JStack.runJStackTool(JStack.java:140)
	at sun.tools.jstack.JStack.main(JStack.java:106)
Caused by: java.lang.RuntimeException: Unable to deduce type of thread from address 0x00007fecb400b800 (expected type JavaThread, CompilerThread, ServiceThread, JvmtiAgentThread, or SurrogateLockerThread)
	at sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:169)
	at sun.jvm.hotspot.runtime.Threads.first(Threads.java:153)
	at sun.jvm.hotspot.tools.PStack.initJFrameCache(PStack.java:200)
	at sun.jvm.hotspot.tools.PStack.run(PStack.java:71)
	at sun.jvm.hotspot.tools.PStack.run(PStack.java:58)
	at sun.jvm.hotspot.tools.PStack.run(PStack.java:53)
	at sun.jvm.hotspot.tools.JStack.run(JStack.java:66)
	at sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:260)
	at sun.jvm.hotspot.tools.Tool.start(Tool.java:223)
	at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
	at sun.jvm.hotspot.tools.JStack.main(JStack.java:92)
	... 6 more
Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x00007fecb400b800
	at sun.jvm.hotspot.runtime.InstanceConstructor.newWrongTypeException(InstanceConstructor.java:62)
	at sun.jvm.hotspot.runtime.VirtualConstructor.instantiateWrapperFor(VirtualConstructor.java:80)
	at sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:165)
	... 16 more
```

Combining the flags doesn't parse:

```
ubuntu22:~$ sudo jstack -Flm 987870
Usage:
    jstack [-l] <pid>
        (to connect to running process)
    jstack -F [-m] [-l] <pid>
        (to connect to a hung process)
    jstack [-m] [-l] <executable> <core>
        (to connect to a core file)
    jstack [-m] [-l] [server_id@]<remote server IP or hostname>
        (to connect to a remote debug server)

Options:
    -F  to force a thread dump. Use when jstack <pid> does not respond (process is hung)
    -m  to print both java and native frames (mixed mode)
    -l  long listing. Prints additional information about locks
    -h or -help to print this help message
```

And running `sudo jstack -F -l -m 987870` attaches the same way ("Debugger attached successfully. Server compiler detected. JVM version is 25.402-b06") and then fails with exactly the same `InvocationTargetException` / `WrongTypeException` stack trace as above.

The number of files imported between the previous comment at 2:39 PM yesterday and 7:24 AM today is about what we might expect if we were, for example, assuming about 10 minutes or less per file, in keeping with the trajectory observed: (1302 - 1179 + 1) * 10 / 60. That is to say, we would have expected 20.6 hours up to file 1302 at ten minutes per file, but it actually took about 16.9 hours.

Blazegraph was still running, so I took this opportunity to check how many triples were in place. `select (count(*) as ?ct) where { ?s ?p ?o }` produced a result of 9082617267.

For the sake of extrapolation, let's suppose it would have kept taking about 8.1 minutes per remaining file and things would have kept working all the way up to file 1332 (with 1302 being the last good one). 30 files times 8.1 minutes per file = 243 minutes, or about 4 hours and 3 minutes. If this had been the case, it would have put us at about 11:27 AM today for completion. Remember, this run was started around Friday, March 29, 2024 at 1:40 PM CT, so we're really quite close to a projection of about 5 days with this lower buffer capacity setting. Given that the other approach took about 3.95 days, we might say that this run was about 24% (1.24 minus 1) longer than the other approach.
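To make the throughput arithmetic above easy to re-check, here's a small sketch of the same calculation (all inputs are the figures quoted in this comment):

```python
# Files imported between the two checkpoints, inclusive:
# file 1179 at 2:39 PM yesterday, file 1302 at 7:24 AM today.
files = 1302 - 1179 + 1                        # 124 files

# At an assumed 10 minutes per file, the expected wall clock would be:
expected_hours = files * 10 / 60               # about 20.7 hours

# The observed wall clock was about 16.9 hours, so the realized rate was:
actual_hours = 16.9
per_file_minutes = actual_hours * 60 / files   # about 8.2 minutes per file

# Extrapolating the remaining 30 files (1303 through 1332) at ~8.1 min/file:
remaining_minutes = 30 * 8.1                   # 243 minutes
print(files, round(per_file_minutes, 1), remaining_minutes)
```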
This sort of instability is unfortunately proving to be common (indeed, it's part of the reason for the graph split: fewer triples and reduced risk), but we may have stumbled onto something with the buffer capacity that gets us both greater stability and greater speed, at least up to a certain number of munged files and at least on this machine. As I said earlier, I'm going to try comparing buffer capacity settings against the graph split files and see how things stack up; graph-split-based files are ultimately our target, after all. Knowing that, at least on this machine, the larger buffer capacity seems likely to be associated with greater stability, we should probably also try a full reload on another server with the munged N3-style Turtle files and the larger buffer capacity, and see what this does for a full graph import. Or at least be aware that this is an option in case we need to do a full import (e.g., we might try it on one node with the higher buffer capacity in parallel with any other nodes attempting to recover with the currently established buffer capacity).

Here's the paste of what stack trace material I could backscroll to in `screen`:

P59389 BlazeGraph stack trace during munged file import, obtained from screen backscroll <https://phabricator.wikimedia.org/P59389>

TASK DETAIL
https://phabricator.wikimedia.org/T359062