[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Smalyshev
Smalyshev added a comment. The problem seems to be persistent and not fixed by restart. Looks like permanent data corruption :( TASK DETAIL https://phabricator.wikimedia.org/T97468 REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign . EMAIL

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Noted. Can you replicate the situation leading up to this event? Are you doing anything that goes around the REST API? Can you publish a compressed copy of the journal file that we can download or expose the machine for ssh access? Thanks, Bryan

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. The issue is an address in the free list that cannot be recycled. You should be able to open the journal in read-only mode based on that stack trace.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Smalyshev
Smalyshev added a comment. @Thompsonbry.systap Leading up to this, we had the updater tool (which fetches updated entities from Wikidata) running for a week or so, loading lots of entities. It all goes through the REST endpoint http://localhost:/bigdata/namespace/wdq/sparql The journal file is 59G by

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Let's boil this down to something that can be replicated by a known series of actions with known data sets. The last time we had something like this it was related to a failure to correctly rollback an UPDATE at the REST API layer. So we need to understand

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. ok. we will need access one way or another. Please start by running the DumpJournal utility class (in the same package as the Journal). Specify the -pages option. Post the output. And please get us a copy of the logs. Thanks, Bryan

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. I am only suggesting that you can probably open the database in a read-only mode and access the data. The "problem" is in the list of deferred allocation slot addresses for recycling. See the Options file in the Journal package for how to force a read-only ope

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Smalyshev
Smalyshev added a comment. @Thompsonbry.systap could you add more details about the read-only mode? Is it going to help to fix the database? We'll need it writable since it is constantly updated...

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Smalyshev
Smalyshev added a comment. @Thompsonbry.systap replicating it from zero can be problematic... :( The code running is release bigdata-1.5.1.jar with our extensions:
com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=com.bigdata.rdf.vocab.DefaultBigdataVocabulary
com.bigdata.rdf.store

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Smalyshev
Smalyshev added a comment. F158276: log

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Was group commit enabled? I suggest that we run this long-running test in an environment where we have shared access. Martyn (per the email to you and nik) will be the point person looking at the RWStore, but this could easily be a failure triggered by ba

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. That log file is only since the journal restart. Do you have the historical log files?

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. We can probably force the RWStore to discard the list of deferred deletes. Assuming that the problem is purely in that deferred deletes list, this would let you continue to apply updates. However, what we should do is use this as an opportunity to root cause

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Smalyshev
Smalyshev added a comment. @Thompsonbry.systap no, unfortunately I didn't keep the log file from the previous Blazegraph run :( It printed to the console, but it turns out I forgot to redirect it to the log, sorry. Will do DumpJournal now. Is it supposed to take a long time? It dumped some data and then si

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. It will take a long time with the -pages option. It is doing a sequential scan of each index. You will see the output update each time it finishes with one index and starts on the next. At the end it will write out a table with statistics about the indices.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. You should be able to see the IOs as it pages through the indices if you are concerned that it might not be making progress. It is probably just busy scanning the disk.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Smalyshev
Smalyshev added a comment. Also, here's how we run Blazegraph for full reproduction: Run script: F158280: run.sh JAR files: F158283: wikidata-query-blazegraph-0.0.1-SNAPSHOT.jar F158284: wikidata-query-c

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. What JVM (version and vendor) and OS?

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Smalyshev
Smalyshev added a comment. What happened before the failure was just regular loading of updates (i.e. 10 threads sending a mix of INSERT/DELETE statements through the REST API). I was checking on it from time to time, and last time I checked, instead of the regular update flow I saw a stream of failure ex

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Smalyshev
Smalyshev added a comment.
> What JVM (version and vendor) and OS?
Linux db01 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt4-3 (2015-02-03) x86_64 GNU/Linux
java version "1.7.0_75"
OpenJDK Runtime Environment (IcedTea 2.5.4) (7u75-2.5.4-2)
OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Ok. We have much less experience with OpenJDK in production. Normally people deploy the Oracle JVM. I am not aware of any specific problems. It is just that I do not have as much confidence around OpenJDK deployments. But this looks like an internals bug, no

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Smalyshev
Smalyshev added a comment. I think this: https://wikitech.wikimedia.org/wiki/Help:Getting_Started can get you set up for shell access to the test machine in question.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Manybubbles
Manybubbles added a comment. In https://phabricator.wikimedia.org/T97468#1243270, @Smalyshev wrote:
> > What JVM (version and vendor) and OS?
>
> Linux db01 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt4-3 (2015-02-03) x86_64 GNU/Linux
>
> java version "1.7.0_75"
> OpenJDK Runtime Environment (Ic

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. ok. like I said, no known issues.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread JanZerebecki
JanZerebecki added a subscriber: JanZerebecki. JanZerebecki added a comment. In https://phabricator.wikimedia.org/T97468#1243114, @Smalyshev wrote:
> As for ssh access, not sure - it's on internal labs machine, need to check with Powers That Be what is the policy about it.
Anyone who help wi

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Stas notes that they had recently deleted a large namespace (one of two) from the workbench GUI. We should check the deferred frees in depth and make sure that the deleted namespace deletes are no longer on the deferred deletes list. Stas also notes that som

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Smalyshev
Smalyshev added a comment. The journal can be picked up at http://wdq-wikidata.testme.wmflabs.org/w/resources/assets/blazegraph.jnl.gz - caution, it's a 14G file!

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Smalyshev
Smalyshev added a comment. Since dumping the journal takes a lot of time, here's a partial one: F158320: dumpJournal - will update when it's done.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Smalyshev
Smalyshev added a comment. Dump script failed with:
java.lang.UnsupportedOperationException
    at com.bigdata.btree.Checkpoint.getHeight(Checkpoint.java:124)
    at com.bigdata.btree.BaseIndexStats.<init>(BaseIndexStats.java:101)
    at com.bigdata.bop.solutions.SolutionSetStream.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. I have created a ticket for this. See http://trac.bigdata.com/ticket/1228 (Recycler error in 1.5.1). Please subscribe for updates. Thanks, Bryan

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Can I get the full stack trace for that failure? It looks like it does not understand the type of index. I would like to figure out which index has this problem. Try DumpJournal without the -pages option. It will run through the indices very quickly. Let m

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. See below for the code that threw the exception in Checkpoint.java:
public final int getHeight() {
    switch (indexType) {
    case BTree:
        return height;
    default:
        t

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. The problem might be an issue with DumpJournal and the SolutionSetStream class. The last thing that it visited was a SolutionSetStream, not a BTree. Can you modify the code to test the index type (interface test) and skip things that are not BTree classes a
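The interface test suggested in this comment could look roughly like the following. This is a minimal sketch under stated assumptions, not the actual DumpJournal patch: `DumpSketch`, `Index`, `BTreeIndex`, `StreamIndex`, and `dumpHeights` are hypothetical stand-ins invented for illustration and are not Blazegraph classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Blazegraph types.
final class DumpSketch {

    interface Index {
        String name();
    }

    // Only B+Tree-like indices expose a height in this sketch.
    static final class BTreeIndex implements Index {
        private final String name;
        private final int height;

        BTreeIndex(String name, int height) {
            this.name = name;
            this.height = height;
        }

        public String name() { return name; }

        int height() { return height; }
    }

    // A stream-like index (for example a solution set stream) has no height.
    static final class StreamIndex implements Index {
        private final String name;

        StreamIndex(String name) { this.name = name; }

        public String name() { return name; }
    }

    // Report the height of every BTree and skip anything else instead of
    // failing, so that the remaining indices are still listed.
    static List<String> dumpHeights(List<Index> indices) {
        List<String> out = new ArrayList<String>();
        for (Index idx : indices) {
            if (!(idx instanceof BTreeIndex)) {
                out.add(idx.name() + ": skipped (not a BTree)");
                continue;
            }
            out.add(idx.name() + ": height=" + ((BTreeIndex) idx).height());
        }
        return out;
    }
}
```

The point of the type check is that one unsupported index no longer aborts the whole dump; whether skipping or reporting partial statistics is preferable is a design choice the thread leaves open.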

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Smalyshev
Smalyshev added a comment. Journal dump with no pages: F158346: dumpJournal

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. It looks like it hit the same error. At least, it has not listed out the rest of the indices and stops on that solution set stream. We will need to redo this with the patch to DumpJournal.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Smalyshev
Smalyshev added a comment. @Thompsonbry.systap please check again, I uploaded the wrong file at first; it is fixed now.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. OK, I see the same number of entries in the forward and reverse indices and across the various statement indices.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Smalyshev
Smalyshev added a comment. dumpJournal with pages: F158480: dumpJournal2

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-29 Thread Smalyshev
Smalyshev added a comment. Should we also create another bug for DumpJournal being unable to handle SolutionSetStream?

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-29 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. We have a ticket for this. http://trac.bigdata.com/ticket/1229

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-29 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Martyn and I talked this through this morning. He is proceeding per http://trac.bigdata.com/ticket/1228 (Recycler error in 1.5.1). His first step will be to understand the problem in terms of the RWStore internal state so we can generate some theories about

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-29 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Our plan is to implement a utility that overwrites the address of the deferred free list. So you would run this utility. It would overwrite the address as 0L so that the deferred free list is empty. You would then re-open the journal normally. The head of
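The idea of overwriting the deferred free list address with 0L can be sketched as below. This is an illustration only and makes loud assumptions: the thread does not describe the RWStore on-disk layout, so the offset constant is hypothetical, and the code operates on an in-memory `ByteBuffer` standing in for a metadata block rather than on a real journal file. `DeferredFreeResetSketch` and its methods are invented names.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; it does NOT reflect the real RWStore metadata layout.
final class DeferredFreeResetSketch {

    // Assumed byte offset of the deferred free list address (hypothetical).
    static final int DEFERRED_FREE_ADDR_OFFSET = 16;

    // Read the current head address of the deferred free list.
    static long readDeferredFreeAddr(ByteBuffer meta) {
        return meta.getLong(DEFERRED_FREE_ADDR_OFFSET);
    }

    // Overwrite the address with 0L so that the list reads as empty.
    // Absolute putLong leaves every other byte of the block untouched.
    static void clearDeferredFreeAddr(ByteBuffer meta) {
        meta.putLong(DEFERRED_FREE_ADDR_OFFSET, 0L);
    }
}
```

A plausible consequence of such a reset is that the slots referenced by the discarded list are never recycled, so the store trades some leaked space for being writable again.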

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-29 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. One of the questions I had is why would #1021 not have caught this problem. So, #1021 only protects against a call to abort() that fails. If there is a write set and - (a) a failed commit where the code does not then go on to invoke abort() -or- (b) the cod

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-29 Thread Smalyshev
Smalyshev added a comment. Where can I check the new code out? The latest commit I see on https://sourceforge.net/p/bigdata/git/ci/master/tree/ in master is from April 15...

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-29 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Can you apply the patch to 1.5.1? The changes that I made are committed against the 1.5.2 development branch. The change to get past that UnsupportedOperationException thrown from Checkpoint.getHeight() is just to return the height field. That is: public f
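The change described in this comment can be illustrated with a small stand-in class. This is a sketch, not the Blazegraph source: `CheckpointSketch` and its `IndexType` enum are hypothetical, and the assumption that the original default branch throws `UnsupportedOperationException` is inferred from the stack trace reported earlier in the thread.

```java
// Hypothetical stand-in for illustration; this is NOT com.bigdata.btree.Checkpoint.
final class CheckpointSketch {

    enum IndexType { BTREE, HTREE, STREAM }

    private final IndexType indexType;
    private final int height;

    CheckpointSketch(IndexType indexType, int height) {
        this.indexType = indexType;
        this.height = height;
    }

    // Shape of the accessor before the patch (assumed): only a BTree
    // reports a height, every other index type is rejected.
    int getHeightStrict() {
        switch (indexType) {
        case BTREE:
            return height;
        default:
            throw new UnsupportedOperationException();
        }
    }

    // Shape of the accessor after the patch: just return the field,
    // whatever the index type is.
    int getHeight() {
        return height;
    }
}
```

With the relaxed accessor, a statistics pass over a non-BTree index gets whatever value the field holds instead of an exception.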

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-29 Thread Smalyshev
Smalyshev added a comment. OK, I hand-patched the source, the output without -pages is here: F158672: dumpJournal. Will add one with -pages as soon as it's ready.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-29 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. I've pushed the change to Checkpoint.getHeight() to SF git as branch TICKET_1228. Commit 44ed00d6fd348b2be9bea73f9b8a8c646744a6fd.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-29 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. See http://trac.bigdata.com/ticket/1228 for some further thoughts on a root cause, some thoughts on how to create a stress test to replicate the problem, and some thoughts on how to fix this. At this point I think we should focus on tests that interrupt the C

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-29 Thread Smalyshev
Smalyshev added a comment. Full dumpJournal output with pages: F158796: dumpJournal2

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-30 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. See http://trac.bigdata.com/ticket/1228#comment:11 for a utility that should unblock you on writes.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-30 Thread Smalyshev
Smalyshev added a comment. @Thompsonbry.systap looks like using the utility fixes the journal, at least to the point that it accepts updates again. Thanks a lot for the quick turnaround!

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-30 Thread Smalyshev
Smalyshev added a comment. New exception now:
ERROR: DefaultResourceLocator.java:531: java.lang.IllegalArgumentException: fromKey > toKey
java.lang.IllegalArgumentException: fromKey > toKey
    at com.bigdata.btree.ChildIterator.<init>(ChildIterator.java:124)
    at com.bigdata.btree.N

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-30 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Not sure. There is really no reliable way to carry the database forward from an error like the one that triggered originally. This might be something we didn't think through in the utility to unblock the journal. Or it might be bad data in the indices from the o

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-30 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Can you also include the operation that was executed for this error?

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-30 Thread Smalyshev
Smalyshev added a comment. Seems to be a transient error; other updates work fine. We don't yet log the SPARQL statements sent, I'll add that now.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-30 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Your branching factors are all set to 128 in this namespace. You should override them per the results I had posted on a different ticket to target 8k pages with few blobs (pages that exceed 8k). Just FYI. This can be done when the namespace is created.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-05-13 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. We have migrated to JIRA. See http://jira.blazegraph.com/browse/BLZG-1236 for the ticket that corresponds to http://trac.bigdata.com/ticket/1228 (Recycler error in 1.5.1). We have not yet been able to reproduce this. I have developed and I am continuing