[Wikidata-bugs] [Maniphest] [Commented On] T90445: BlazeGraph Finalization: Plan for no downtime upgrade

2015-03-02 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. HA uses versioned RMI messages to avoid potential conflicts in rolling upgrades. To upgrade, simply shut down a given HAJournalServer process. Redeploy the code. Restart the HAJournalServer. It will automatically resync and go live. We try to avoid

[Wikidata-bugs] [Maniphest] [Commented On] T90128: BlazeGraph Finalization: Validate AST rewrite

2015-03-03 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Do you have a pointer to your code? I'd like to understand where the pain is. TASK DETAIL https://phabricator.wikimedia.org/T90128 REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign . EMAIL PREFER

[Wikidata-bugs] [Maniphest] [Commented On] T90128: BlazeGraph Finalization: Validate AST rewrite

2015-03-03 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Ok. That makes sense. In terms of turning this feature on and off automatically, there are some options. One is to add a query hint. See the com.bigdata.rdf.sparql.ast.hints.QueryHintRegistry. It looks like this will work out of the box for 3rd party

[Wikidata-bugs] [Maniphest] [Commented On] T92308: Open questions for Blazegraph data model research

2015-03-10 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. It might be useful to take some of these questions to the bigdata-developers mailing list. Some of these questions already have answers on the wiki. 1. RDR Syntax. For the RDR exception, please create a unit test. There are a few places where that test

[Wikidata-bugs] [Maniphest] [Commented On] T92308: Open questions for Blazegraph data model research

2015-03-10 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. What is the correct prefix declaration for "wdt"?

[Wikidata-bugs] [Maniphest] [Commented On] T92308: Open questions for Blazegraph data model research

2015-03-10 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. 1. I am not sure. That's why I would like to see it in a test case. Note that the openrdf jars need to appear before the blazegraph jars on the classpath in order for the blazegraph RDF parser not to be replaced by the openrdf parser. If that happens th

[Wikidata-bugs] [Maniphest] [Commented On] T92308: Open questions for Blazegraph data model research

2015-03-10 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Great! Please put my name on the ticket so I will see it. @beebs.systap Unfortunately the executable JAR might not provide a sufficiently strong class ordering guarantee.

[Wikidata-bugs] [Maniphest] [Commented On] T92308: Open questions for Blazegraph data model research

2015-03-11 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. This may not be the right ticket, but I did some experimentation with the data sets that I referenced above looking at parameterization of the load. Using an Intel 2011 Mac Mini with 16GB of RAM and an SSD I have a total throughput across all datasets of 6

[Wikidata-bugs] [Maniphest] [Commented On] T92308: Open questions for Blazegraph data model research

2015-03-12 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Ok. That query PREFIX wdt: <http://wikidata-wdq.testme.wmflabs.org/entity/assert/> PREFIX entity: <http://wikidata-wdq.testme.wmflabs.org/entity/> SELECT ?h ?date WHERE { ?h wdt:P31 entity:Q5 . ?h wdt:P569 ?date . FILTER NO

[Wikidata-bugs] [Maniphest] [Commented On] T90115: BlazeGraph Security Review

2015-03-24 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. The analytic query mode does offer some ability to bound memory but it is not 100% across all queries. For example, quads mode queries do not currently put the distinct triple pattern filter on the native heap. ORDER BY is not currently on the native heap

[Wikidata-bugs] [Maniphest] [Commented On] T90115: BlazeGraph Security Review

2015-03-24 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Another possibility is using CAS counters (striped atomic counters) to track resources associated with a query and use that to bound memory for queries running on the java managed heap (in addition to bounding the memory associated with the native heap
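The idea of tracking per-query resources with striped atomic counters could be sketched roughly as follows. This is only an illustration under stated assumptions: `QueryMemoryBudget` and its methods are hypothetical names, not Blazegraph classes, and the JDK's `LongAdder` stands in for the striped counter.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-query memory budget; not part of Blazegraph. */
final class QueryMemoryBudget {
    // LongAdder is the JDK's striped counter: adds from many threads contend less
    // than on a single AtomicLong.
    private final LongAdder bytesInUse = new LongAdder();
    private final long limitBytes;

    QueryMemoryBudget(long limitBytes) {
        if (limitBytes <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limitBytes = limitBytes;
    }

    /**
     * Records an allocation made on behalf of the query. The limit check reads
     * sum(), which is only approximate while other threads are updating, so the
     * bound is soft rather than exact.
     */
    void allocate(long bytes) {
        bytesInUse.add(bytes);
        if (bytesInUse.sum() > limitBytes) {
            bytesInUse.add(-bytes); // undo before rejecting
            throw new IllegalStateException(
                "query exceeded memory budget of " + limitBytes + " bytes");
        }
    }

    /** Records that previously allocated bytes were freed. */
    void release(long bytes) {
        bytesInUse.add(-bytes);
    }

    /** Current (approximate under concurrency) number of bytes in use. */
    long inUse() {
        return bytesInUse.sum();
    }
}
```

An operator would call `allocate` before buffering solutions and `release` when the buffer is dropped; a rejected allocation would then be the signal to cancel the query.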

[Wikidata-bugs] [Maniphest] [Commented On] T94539: BlazeGraph uses old xsd:dateTime standard

2015-03-31 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Peter comments that he has also run into this just recently. I think that we should create two tickets 1. The xsd date time specification change. This will need to get documented at the data migration page. We should also modify the service description

[Wikidata-bugs] [Maniphest] [Commented On] T96100: Last update check query is too slow

2015-04-14 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. There are two existing tickets for Blazegraph that might be related to this ticket on your tracker and the other ticket that you filed today. They are: - http://trac.bigdata.com/ticket/953 - http://trac.bigdata.com/ticket/1083 Can you replicate the query

[Wikidata-bugs] [Maniphest] [Commented On] T96100: Last update check query is too slow

2015-04-16 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Ah. That is good. Copying Michael. Thanks, Bryan Bryan Thompson Chief Scientist & Founder SYSTAP, LLC 4501 Tower Road Greensboro, NC 27410 br...@systap.com http://blazegraph.com http://blog.bigdata.com http://bigdata.com http://mapgraph.io Blazegr

[Wikidata-bugs] [Maniphest] [Commented On] T96094: Bad query performance for FILTER NOT EXISTS

2015-04-17 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. This part could be expensive. Instead of having a prefix scan, it is scanning all statements, materializing the Object position from the dictionary, and then checking the string representation of that URI for a match. http://www.wikidata.org/entity/Q30

[Wikidata-bugs] [Maniphest] [Commented On] T96094: Bad query performance for FILTER NOT EXISTS

2015-04-17 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Ok. At that number this will not matter. But it is not being as efficient as you would hope. Something to work on. Bryan

[Wikidata-bugs] [Maniphest] [Commented On] T96490: Blazegraph support for owl:sameAs and redirects

2015-04-18 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. One approach is to update the forward dictionary to map the IRIs onto one of the existing IVs and then rewrite all of the statements. However, this sort of destructive rewrite does not help if you need the "redirects" to be reversible. I am p

[Wikidata-bugs] [Maniphest] [Commented On] T97095: Add query runtime cap for blazegraph

2015-04-23 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. You can do this through web.xml. You can also mark the end point as read-only. See http://wiki.blazegraph.com/wiki/index.php/NanoSparqlServer#web.xml and also inline below for some of the more relevant options. queryThreadPoolSize 16 The size of

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Noted. Can you replicate the situation leading up to this event? Are you doing anything that goes around the REST API? Can you publish a compressed copy of the journal file that we can download or expose the machine for ssh access? Thanks, Bryan

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. The issue is an address in the free list that cannot be recycled. You should be able to open in a read-only mode based on that stack trace.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Let's boil this down to something that can be replicated by a known series of actions with known data sets. The last time we had something like this it was related to a failure to correctly rollback an UPDATE at the REST API layer. So we need to under

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. ok. we will need access one way or another. Please start by running the DumpJournal utility class (in the same package as the Journal). Specify the -pages option. Post the output. And please get us a copy of the logs. Thanks, Bryan - Bryan Thompson

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. I am only suggesting that you can probably open the database in a read-only mode and access the data. The "problem" is in the list of deferred allocation slot addresses for recycling. See the Options file in the Journal package for how to force a

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Was group commit enabled? I suggest that we run this long running test in an environment where we have shared access. Martyn (per the email to you and nik) will be the point person looking at the RWStore, but this could easily be a failure triggered by

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. That log file is only since the journal restart. Do you have the historical log files?

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. We can probably force the RWStore to discard the list of deferred deletes. Assuming that the problem is purely in that deferred deletes list, this would let you continue to apply updates. However, what we should do is use this as an opportunity to root

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. It will take a long time with the -pages option. It is doing a sequential scan of each index. You will see the output update each time it finishes with one index and starts on the next. At the end it will write out a table with statistics about the indices

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. You should be able to see the IOs as it pages through the indices if you are concerned that it might not be making progress. It is probably just busy scanning the disk.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. What JVM (version and vendor) and OS?

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Ok. We have much less experience with OpenJDK in production. Normally people deploy the Oracle JVM. I am not aware of any specific problems. It is just that I do not have as much confidence around OpenJDK deployments. But this looks like an internals bug

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. ok. like I said, no known issues.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Stas notes that they had recently deleted a large namespace (one of two) from the workbench GUI. We should check the deferred frees in depth and make sure that the deleted namespace deletes are no longer on the deferred deletes list. Stas also notes that

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. I have created a ticket for this. See http://trac.bigdata.com/ticket/1228 (Recycler error in 1.5.1). Please subscribe for updates. Thanks, Bryan

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Can I get the full stack trace for that failure? It looks like it does not understand the type of index. I would like to figure out which index has this problem. Try DumpJournal without the -pages option. It will run through the indices very quickly. Let

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. See below for the code that threw the exception in Checkpoint.java public final int getHeight() { switch (indexType) { case BTree: return height; default

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. The problem might be an issue with DumpJournal and the SolutionSetStream class. The last thing that it visited was a SolutionSetStream, not a BTree. **Can you modify the code to test the index type (interface test) and skip things that are not BTree classes

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. It looks like it hit the same error. At least, it has not listed out the rest of the indices and stops on that solution set stream. We will need to redo this with the patch to DumpJournal.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-28 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Ok, I see the same number of entries in the forward and reverse indices and across the various statement indices.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-29 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. We have a ticket for this. http://trac.bigdata.com/ticket/1229

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-29 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Martyn and I talked this through this morning. He is proceeding per http://trac.bigdata.com/ticket/1228 (Recycler error in 1.5.1). His first step will be to understand the problem in terms of the RWStore internal state so we can generate some theories

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-29 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Our plan is to implement a utility that overwrites the address of the deferred free list. So you would run this utility. It would overwrite the address as 0L so that the deferred free list is empty. You would then re-open the journal normally. The head of

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-29 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. One of the questions I had is why would #1021 not have caught this problem. So, #1021 only protects against a call to abort() that fails. If there is a write set and - (a) a failed commit where the code does not then go on to invoke abort() -or- (b) the

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-29 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Can you apply the patch to 1.5.1? The changes that I made are committed against the 1.5.2 development branch. The change to get past that UnsupportedOperationException thrown from Checkpoint.getHeight() is just to return the height field. That is: public
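The change to `getHeight()` described in this comment can be sketched as below. This is a hypothetical stand-in, not the Blazegraph source: the class name `CheckpointSketch`, the constructor, and the enum members other than `BTree` are assumptions made only to keep the example self-contained.

```java
/** Hypothetical stand-in for the checkpoint record; not the Blazegraph source. */
final class CheckpointSketch {
    // Only BTree is named in the thread; the other members are assumed for illustration.
    enum IndexType { BTree, HTree, Stream }

    private final IndexType indexType;
    private final int height;

    CheckpointSketch(IndexType indexType, int height) {
        this.indexType = indexType;
        this.height = height;
    }

    /** Behavior as described before the patch: only a BTree reports a height. */
    int getHeightBeforePatch() {
        switch (indexType) {
            case BTree:
                return height;
            default:
                throw new UnsupportedOperationException();
        }
    }

    /** Behavior as described after the patch: return the field for any index type. */
    int getHeight() {
        return height;
    }
}
```

With the patched accessor, a tool that walks every index (such as DumpJournal in this thread) no longer aborts when it reaches an index that is not a BTree.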

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-29 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. I've pushed the change to Checkpoint.getHeight() to SF git as branch TICKET_1228. Commit 44ed00d6fd348b2be9bea73f9b8a8c646744a6fd.

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T97468: Blazegraph crash on updater

2015-04-29 Thread Thompsonbry.systap
Thompsonbry.systap removed a subscriber: Thompsonbry.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-29 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. See http://trac.bigdata.com/ticket/1228 for some further thoughts on a root cause, some thoughts on how to create a stress test to replicate the problem, and some thoughts on how to fix this. At this point I think we should focus on tests that interrupt the

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-30 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. See http://trac.bigdata.com/ticket/1228#comment:11 for a utility that should unblock you on writes.

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-30 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Not sure. There is really no reliable way to carry the database forward from an error like the one that triggered originally. This might be something we didn't think through in the utility to unblock the journal. Or it might be bad data in the indices fro

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-30 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Can you also include the operation that was executed for this error?

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-04-30 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Your branching factors are all set to 128 in this namespace. You should override them per the results I had posted on a different ticket to target 8k pages with few blobs (pages that exceed 8k). Just FYI. This can be done when the namespace is created

[Wikidata-bugs] [Maniphest] [Commented On] T92308: Open questions for Blazegraph data model research

2015-04-30 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Yes but. There are global defaults and you can set them. For example, if you list out the namespace properties you will see how the 128 default is set. The issue is that we are using different branching factors for the spo and lex relations, and even inside of

[Wikidata-bugs] [Maniphest] [Commented On] T97538: Write integration tests that include creation/destruction of namespaces

2015-05-04 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Does the integration test demonstrate the problem?

[Wikidata-bugs] [Maniphest] [Commented On] T97468: Blazegraph crash on updater

2015-05-13 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. We have migrated to JIRA. See http://jira.blazegraph.com/browse/BLZG-1236 for the ticket that corresponds to http://trac.bigdata.com/ticket/1228 (Recycler error in 1.5.1). We have not yet been able to reproduce this. I have developed and I am continuing

[Wikidata-bugs] [Maniphest] [Commented On] T92308: Open questions for Blazegraph data model research

2015-05-14 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. That is normal. You can choose to explicitly disable bloom filters in advance. Otherwise they are disabled once their expected error rate would be too high. Nothing to be concerned about. Bryan

[Wikidata-bugs] [Maniphest] [Commented On] T90131: BlazeGraph Finalization: Pluggable inline values

2015-02-23 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Inline values are not necessary. They represent a tradeoff between dictionary encoding values and their direct representation as inline values within the statement indices. There is a simple example ColorsEnumExtension in com.bigdata.rdf.internal that

[Wikidata-bugs] [Maniphest] [Commented On] T90117: BlazeGraph Finalization: Scale out plans

2015-02-23 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. This depends on how you model the reified RDF data. However, the inlined statements about statements are not in the same part of the statement indices as the ground statements. This is because the IVs all have a prefix byte that includes whether the IV is

[Wikidata-bugs] [Maniphest] [Commented On] T88717: Investigate BlazeGraph aka BigData for WDQ

2015-02-23 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. I've received the signed CLA from corporate. Once I get Nik's SF account I will set him up as a developer.

[Wikidata-bugs] [Maniphest] [Commented On] T90130: BlazeGraph Finalization: Geo

2015-02-23 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. We would be interested in both the elastic search and GeoSPARQL integrations. Full GeoSPARQL support is complex. It involves not only the spatial index, but extensions to the query planning and there is a large test suite that we would need to pass. A

[Wikidata-bugs] [Maniphest] [Commented On] T90131: BlazeGraph Finalization: Pluggable inline values

2015-02-23 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. I agree that this is related to how you choose to represent, index, and query values with additional annotations (error bounds, uncertainty, different values at different points in time, etc.). This is not a simple issue. One idea that I have seen is that the

[Wikidata-bugs] [Maniphest] [Commented On] T90114: BlazeGraph Finalization: Update performance

2015-02-23 Thread Thompsonbry.systap
Thompsonbry.systap added a subscriber: Thompsonbry.systap. Thompsonbry.systap added a comment. The data storage on the disk is identical for the single machine and HA replication cluster (HAJournalServer) modes. In fact, you can take a compressed snapshot file from the HA replication cluster

[Wikidata-bugs] [Maniphest] [Commented On] T90130: BlazeGraph Finalization: Geo

2015-02-23 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. It is very easy to write and register custom functions for things like distance filtering. See [1]. This could be combined with the MGRS approach and prefix scans on the text index to provide a fairly efficient spatial distance capability. [1] http

[Wikidata-bugs] [Maniphest] [Commented On] T90117: BlazeGraph Finalization: Scale out plans

2015-02-23 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. BlazeGraph supports arbitrary nesting of statements on statements, so, yes, that would be fine.

[Wikidata-bugs] [Maniphest] [Commented On] T90128: BlazeGraph Finalization: Validate AST rewrite

2015-02-24 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. No. But this is very easy to add. I have created a ticket for this [1]. You can register with trac and subscribe to that ticket. Update: I've implemented the hook in the SF GIT branch TICKET_1113. [1] http://trac.bigdata.com/ticket/1113 (Add hook fo

[Wikidata-bugs] [Maniphest] [Commented On] T90116: BlazeGraph Finalization: Machine Sizing/Shaping

2015-02-24 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. There are two known issues with the property path operator that Nik and I discussed today. These are [1] and [2]. The first of these issues means that the operator is actually fully materializing the property paths *before* giving you any results. [1] is a

[Wikidata-bugs] [Maniphest] [Commented On] T90116: BlazeGraph Finalization: Machine Sizing/Shaping

2015-02-24 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. By label lookup I assume that you mean materializing (projecting out through a SELECT expression) the actual RDF Values for URIs or Literals that have become bound by the query. In general, join processing can proceed without dictionary materialization and

[Wikidata-bugs] [Maniphest] [Commented On] T90116: BlazeGraph Finalization: Machine Sizing/Shaping

2015-02-24 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. Concerning thread. See [1] for a general background on query optimization for blazegraph. The QueryEngine class in blazegraph supports parallelism across queries (concurrent non-blocking queries), within queries (different operators in the same query will

[Wikidata-bugs] [Maniphest] [Commented On] T90116: BlazeGraph Finalization: Machine Sizing/Shaping

2015-02-24 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. In terms of the machine shape, the general guidelines you give are appropriate. However, here is how it plays out in terms of GC. Large heaps => long GC pauses. So you want to keep the JVM heap fairly small (4G => 8G). Analytic queries can use the

[Wikidata-bugs] [Maniphest] [Commented On] T90116: BlazeGraph Finalization: Machine Sizing/Shaping

2015-02-25 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. We've assigned the property path optimization and will focus on it in our next sprint. In fact, we hope to get started on this late this week.

[Wikidata-bugs] [Maniphest] [Commented On] T90119: BlazeGraph Finalization: RDF Issues

2015-02-25 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. The RDR inlining of reified statement models is handled by the StatementBuffer class. It is important to have a limited lexical scope in the dump for the different RDF triples involved in the reified statement model. The code needs to buffer incomplete

[Wikidata-bugs] [Maniphest] [Commented On] T90119: BlazeGraph Finalization: RDF Issues

2015-02-25 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. You can use URIs instead of blank nodes. Most of the time when people use blank nodes they SHOULD be using URIs. Blank nodes are existential variables. Coin URIs if you want to have a reference.

[Wikidata-bugs] [Maniphest] [Commented On] T90119: BlazeGraph Finalization: RDF Issues

2015-02-25 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. There is support for inline UUIDs for blank nodes. See UUIDBNodeIV. You could also define a fully inline URI with a well-known prefix and a UUID. Bryan
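Coining a URI from a well-known prefix and a UUID, as suggested here, might look like the sketch below. The prefix `http://example.org/id/` and the class name `CoinedUris` are placeholders chosen for illustration; this shows only how such a URI could be minted on the client side, not how Blazegraph inlines it.

```java
import java.net.URI;
import java.util.UUID;

/** Hypothetical helper for minting URIs in place of blank nodes. */
final class CoinedUris {
    // Placeholder prefix; a real deployment would use a namespace it controls.
    static final String PREFIX = "http://example.org/id/";

    private CoinedUris() {
    }

    /** Returns a fresh URI made of the well-known prefix plus a random UUID. */
    static URI coin() {
        return URI.create(PREFIX + UUID.randomUUID());
    }
}
```

Unlike a blank node, a URI minted this way can be referenced again from later documents or updates.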

[Wikidata-bugs] [Maniphest] [Commented On] T90952: Figure out if we need/can use RDR

2015-02-26 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. You do not need to use the RDR interchange syntax or the RDR syntax for SPARQL query to take advantage of the RDR support inside of BlazeGraph. All you need to do is serialize your data as RDF and use RDF reification to interchange statements modeling

[Wikidata-bugs] [Maniphest] [Commented On] T90952: Figure out if we need/can use RDR

2015-02-26 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. The performance gain from the indexing strategy is roughly 4x when performing a join that would otherwise require the use of a reified statement model join. This is because we actually eliminate the 4 joins required to match the reified statement model

[Wikidata-bugs] [Maniphest] [Commented On] T90952: Figure out if we need/can use RDR

2015-02-26 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. I think that it is important for the data to have a certain "readiness to hand" to really promote reuse. I suggest that you try to model the data using a few different reification strategies and see what the queries look like and how the data c

[Wikidata-bugs] [Maniphest] [Commented On] T90952: Figure out if we need/can use RDR

2015-02-27 Thread Thompsonbry.systap
Thompsonbry.systap added a comment. That is a good point. This can (and should) be fixed. As long as the scope of the document is reasonable (think wiki page), then there is no significant overhead incurred by resolving reified statement models. However this could be a problem if all the data