[Wikidata-bugs] [Maniphest] [Commented On] T90117: BlazeGraph Finalization: Scale out plans

2015-02-25 Thread Manybubbles
Manybubbles added a comment.

This isn't really a scale out thing but i'm putting it here any way:  One 
option for installation is keep all the BlazeGraph servers independently up to 
date using the same mechanism that we'd use to sync data to a cluster.  In this 
case we don't need HA at all - just one copy of the change poller per server.  
No zookeeper.  No quorum.  Just independent machines keeping themselves up to 
date.  Its nice in its simplicity.  I dunno if its the right way for us to go 
though.  @joe, what do you think of that?


TASK DETAIL
  https://phabricator.wikimedia.org/T90117

REPLY HANDLER ACTIONS
  Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign 
.

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Manybubbles
Cc: Joe, Smalyshev, Thompsonbry.systap, Jdouglas, daniel, Beebs.systap, 
Haasepeter, Aklapper, Manybubbles, jkroll, Wikidata-bugs, aude, GWicke, 
JanZerebecki



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T90117: BlazeGraph Finalization: Scale out plans

2015-02-23 Thread Thompsonbry.systap
Thompsonbry.systap added a comment.

BlazeGraph supports arbitrary nesting of statements on statements, so, yes,
that would be fine.


TASK DETAIL
  https://phabricator.wikimedia.org/T90117

REPLY HANDLER ACTIONS
  Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign 
.

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Thompsonbry.systap
Cc: Smalyshev, Thompsonbry.systap, Jdouglas, daniel, Beebs.systap, Haasepeter, 
Aklapper, Manybubbles, jkroll, Wikidata-bugs, aude, GWicke, JanZerebecki



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T90117: BlazeGraph Finalization: Scale out plans

2015-02-23 Thread Smalyshev
Smalyshev added a subscriber: Smalyshev.
Smalyshev added a comment.

@Thompsonbry.systap currently our reification model is pretty close to one 
described here: http://korrekt.org/papers/Wikidata-RDF-export-2014.pdf except 
that we also have direct link between entity and value in addition to fully 
reified representation.

Additional complication is that we have more than two levels, if you look in 
the paper - i.e. we have entity -> value, and then 
>->qualifiers, but both value and qualifiers can be compound 
values - i.e. value may be:

  value a wikidata:Value;
wikidata:amount 1234;
wikidata:precision 10;
wikidata:unit "1";

etc. and the quailifiers each may be composed of values too. So I wonder if 
that still would be fine.


TASK DETAIL
  https://phabricator.wikimedia.org/T90117

REPLY HANDLER ACTIONS
  Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign 
.

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Smalyshev
Cc: Smalyshev, Thompsonbry.systap, Jdouglas, daniel, Beebs.systap, Haasepeter, 
Aklapper, Manybubbles, jkroll, Wikidata-bugs, aude, GWicke, JanZerebecki



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T90117: BlazeGraph Finalization: Scale out plans

2015-02-23 Thread Thompsonbry.systap
Thompsonbry.systap added a comment.

This depends on how you model the reified RDF data.  However, the inlined 
statements about statements are not in the same part of the statement indices 
as the ground statements.  This is because the IVs all have a prefix byte that 
includes whether the IV is a Literal, URI, Blank Node or Statement (inlined 
statements about statements support).  So the statement indices are partitioned 
on each component of the key in terms of whether that key component is a 
Literal, URI, Statement, etc.

The way this works normally, you have a ground statement like:

<:a :knows :b>

You then have statements about that statement:

<<<:a :knows :b>> :source :facebook>
<<<:a :knows :b>> :date "1/12/2015"^^xsd:dateTime>

Those statements about statements have the inlined statement as the first 
component of the key for the SPO(C) index.  Thus they are clustered together, 
but not cluster with the original ground statement.

Thanks,
Bryan


TASK DETAIL
  https://phabricator.wikimedia.org/T90117

REPLY HANDLER ACTIONS
  Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign 
.

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Thompsonbry.systap
Cc: Thompsonbry.systap, Jdouglas, daniel, Beebs.systap, Haasepeter, Aklapper, 
Manybubbles, jkroll, Smalyshev, Wikidata-bugs, aude, GWicke, JanZerebecki



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T90117: BlazeGraph Finalization: Scale out plans

2015-02-20 Thread Manybubbles
Manybubbles added a comment.

Already asked to get statistics on current usage but I imagine we can add
new nodes if we need to. I think we'll have to see what load is like once
we get it for wikigrok and see what its like to run in laba. For more data
scaling up looks to be the thing. I think maybe we can have two instances -
one with just truthy claims which is small and one with fully reified
claims and truthy claims. The small one might be faster because it'll have
better cache hit rate. Maybe not needed. Just an idea.


TASK DETAIL
  https://phabricator.wikimedia.org/T90117

REPLY HANDLER ACTIONS
  Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign 
.

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Manybubbles
Cc: Jdouglas, daniel, Beebs.systap, Haasepeter, Aklapper, Manybubbles, jkroll, 
Smalyshev, Wikidata-bugs, aude, GWicke, JanZerebecki



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T90117: BlazeGraph Finalization: Scale out plans

2015-02-20 Thread Jdouglas
Jdouglas added a subscriber: Jdouglas.
Jdouglas added a comment.

> BlazeGraph doesn't support clustering - only high availability


Does this refer to update scaling, or just query scaling?  Blazegraph supports 
replication clustering, which allows horizontal/linear //query// scaling.

>   Is federated queries an answer?


Blazegraph's scale-out architecture supports dynamic partitioning and service 
discovery, which could help to distribute data in a way that promotes 
distributed querying.



Which problem are we more concerned about: supporting lots and lots of data, or 
supporting lots and lots of queries?

- If the former, note that Systap recommends a scale-up architecture (i.e. 
single-node or replication cluster) for data sets with less than 50B statements.
- If the latter: can we dig into Wikidata's statistics to get concrete numbers 
for current usage?


TASK DETAIL
  https://phabricator.wikimedia.org/T90117

REPLY HANDLER ACTIONS
  Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign 
.

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Jdouglas
Cc: Jdouglas, daniel, Beebs.systap, Haasepeter, Aklapper, Manybubbles, jkroll, 
Smalyshev, Wikidata-bugs, aude, GWicke, JanZerebecki



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T90117: BlazeGraph Finalization: Scale out plans

2015-02-20 Thread Manybubbles
Manybubbles added a comment.

One option might be to keep the "truthy" dump on nice SSDs and make sure those 
are super duper fast but to support the more reified forms only on other 
machines.  Maybe those machines use spinning disks and allow more time between 
writes?


TASK DETAIL
  https://phabricator.wikimedia.org/T90117

REPLY HANDLER ACTIONS
  Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign 
.

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Manybubbles
Cc: daniel, Beebs.systap, Haasepeter, Aklapper, Manybubbles, jkroll, Smalyshev, 
Wikidata-bugs, Jdouglas, aude, GWicke, JanZerebecki



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs