Re: [Wikidata] Scaling Wikidata Query Service

2019-06-26 Thread Eric Prud'hommeaux
On Mon, Jun 17, 2019 at 09:41:51PM +0200, Finn Aarup Nielsen wrote: > > Changing the subject a bit: > > I am surprised to see how many SPARQL requests go to the endpoint when > performing a ShEx validation with the shex-simple Toolforge tool. They are > all very simple and quickly complete. For

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-25 Thread Ted Thibodeau Jr
On Jun 17, 2019, at 03:41 PM, Finn Aarup Nielsen wrote: > > > Changing the subject a bit: Well... Changing the subject a *lot*, to an extent probably worthy of its own subject line, and an entirely new thread, not only because it seems more relevant to the "shex-simple Toolforge tool" you

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-22 Thread Thad Guidry
In the enterprise where I work as a Data Architect, we approach scaling in many ways, but there's no question that the age-old technique of SORTING lines up everything for systems and CPUs to massively ingest and pipeline across IO boundaries. Sometimes this involves more indices, lots of

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-22 Thread Marco Neumann
Thibaut, while it's certainly exciting to see continued work on the development of storage solutions, and hybrids are most likely part of the future story here, I would also like to stay as close as possible to existing Semantic Web / Linked Data standards like RDF and SPARQL to guarantee interop and

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-21 Thread Thibaut DEVERAUX
Dear all, I've seen this suggestion on Quora: https://www.quora.com/Wouldnt-a-mix-database-system-that-handle-both-JSON-documents-and-graph-functions-like-ArangoDB-provide-a-better-scalability-to-enormous-knowledge-graphs-like-Wikidata-than-a-classical-quadstore I'm not qualified enough to know

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-19 Thread Finn Aarup Nielsen
Changing the subject a bit: I am surprised to see how many SPARQL requests go to the endpoint when performing a ShEx validation with the shex-simple Toolforge tool. They are all very simple and quickly complete. For each Wikidata item tested, one of our tests [1] requests tens of times. That

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-17 Thread Stas Malyshev
Hi! > The documented limits of FDB state that it supports up to 100TB of > data. > That is 100 times more > than what WDQS needs at the moment. "Support" is such a multi-faceted word. It can mean "it works very

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-17 Thread Ted Thibodeau Jr
Hello, Stas -- On Jun 13, 2019, at 07:52 PM, Stas Malyshev wrote: > > Hi! > >> It handles data locality across a shared-nothing cluster just fine, i.e., you >> can interact with any node in a Virtuoso cluster and experience identical >> behavior (every node looks like a single node in the

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-17 Thread Sebastian Hellmann
Hi Amirouche, On 16.06.19 23:01, Amirouche Boubekki wrote: On Wed, 12 Jun 2019 at 19:27, Amirouche Boubekki <amirouche.boube...@gmail.com> wrote: Hello Sebastian, First thanks a lot for the reply. I started to believe that what I was saying was complete nonsense.

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-16 Thread Amirouche Boubekki
Hello Sebastian and Stas, On Wed, 12 Jun 2019 at 19:27, Amirouche Boubekki <amirouche.boube...@gmail.com> wrote: > Hello Sebastian, > > First thanks a lot for the reply. I started to believe that what I was > saying was complete nonsense. > > On Wed, 12 Jun 2019 at 16:51, Sebastian Hellmann

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-14 Thread Kingsley Idehen
On 6/13/19 7:55 PM, Stas Malyshev wrote: > Hi! > >> Data living in an RDBMS engine distinct from Virtuoso is handled via the >> engine's Virtual Database module, i.e., you can build powerful RDF Views >> over ODBC- or JDBC-accessible data using Virtuoso. These views also have >> the option of being

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-13 Thread Stas Malyshev
Hi! > Data living in an RDBMS engine distinct from Virtuoso is handled via the > engine's Virtual Database module, i.e., you can build powerful RDF Views > over ODBC- or JDBC-accessible data using Virtuoso. These views also have > the option of being materialized, etc. Yes, but the way the data

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-13 Thread Stas Malyshev
Hi! > It handles data locality across a shared-nothing cluster just fine, i.e., > you can interact with any node in a Virtuoso cluster and experience > identical behavior (every node looks like a single node in the eyes of > the operator). Does this mean no sharding, i.e. each server stores the

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-13 Thread Stas Malyshev
Hi! > Unlike most sites, we do have our own custom frontend in front of > Virtuoso. We did this to allow more styling, as well as to stay flexible > and change implementations at our whim, e.g. we double-parse the SPARQL > queries and even rewrite some to be friendlier. I suggest you do the > same

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-13 Thread Kingsley Idehen
On 6/12/19 1:11 PM, Stas Malyshev wrote: >> That will be vendor lock-in for Wikidata and Wikimedia, along with all the >> poor souls that try to interop with it. > Since Virtuoso is using standard SPARQL, it won't be too much of a > vendor lock-in, though of course the standard does not cover all, so >

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-12 Thread Amirouche Boubekki
On Wed, 12 Jun 2019 at 19:11, Stas Malyshev wrote: > Hi! > > >> So there needs to be some smarter solution, one that we'd be unlikely to > > develop in-house > > > > Big cat, small fish. As Wikidata continues to grow, it will have specific > > needs. > > Needs that are unlikely to be solved by

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-12 Thread Amirouche Boubekki
Hello Sebastian, First thanks a lot for the reply. I started to believe that what I was saying was complete nonsense. On Wed, 12 Jun 2019 at 16:51, Sebastian Hellmann <hellm...@informatik.uni-leipzig.de> wrote: > Hi Amirouche, > On 12.06.19 14:07, Amirouche Boubekki wrote: > > > So there

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-12 Thread Stas Malyshev
Hi! >> So there needs to be some smarter solution, one that we'd be unlikely to > develop in-house > > Big cat, small fish. As Wikidata continues to grow, it will have specific > needs. > Needs that are unlikely to be solved by off-the-shelf solutions. Here I think it's a good place to remind that we're

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-12 Thread Sebastian Hellmann
Hi Amirouche, On 12.06.19 14:07, Amirouche Boubekki wrote: > So there needs to be some smarter solution, one that we'd be unlikely to develop in-house Big cat, small fish. As Wikidata continues to grow, it will have specific needs. Needs that are unlikely to be solved by off-the-shelf solutions.

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-12 Thread Kingsley Idehen
On 6/11/19 12:06 PM, Andra Waagmeester wrote: > > > On Tue, Jun 11, 2019 at 11:23 AM Jerven Bolleman et al wrote: > > > >> So we have been playing this game for ten years now: Everybody > tries other databases, but then most people come back to Virtuoso. > > > Nothing bad about Virtuoso, on

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-12 Thread Amirouche Boubekki
On Sun, 9 Jun 2019 at 23:18, Amirouche Boubekki <amirouche.boube...@gmail.com> wrote: > I made a proposal for a grant at > https://meta.wikimedia.org/wiki/Grants:Project/WDQS_On_FoundationDB > > Mind the fact that this is not about the versioned quadstore. It is about > a simple triplestore, it

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-11 Thread Marco Neumann
And of course, not to forget the fully open source, SPARQL 1.1 compliant RDF database Apache Jena with TDB. Did you already evaluate Apache Jena for use in Wikidata? On Tue, Jun 11, 2019 at 5:07 PM Andra Waagmeester wrote: > > > On Tue, Jun 11, 2019 at 11:23 AM Jerven Bolleman et al wrote: >

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-11 Thread Andra Waagmeester
On Tue, Jun 11, 2019 at 11:23 AM Jerven Bolleman et al wrote: > > >> So we have been playing this game for ten years now: Everybody tries other > databases, but then most people come back to Virtuoso. > Nothing bad about Virtuoso, on the contrary, they are a prime infrastructure provider (Except

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-11 Thread Kingsley Idehen
On 6/10/19 4:25 PM, Stas Malyshev wrote: >> Just a note here: Virtuoso is also a full RDBMS, so you could probably >> keep the Wikibase DB in the same cluster and fix the asynchronicity. That is > Given how the original data is stored (a JSON blob inside a MySQL table) it > would not be very useful. In

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-11 Thread Kingsley Idehen
On 6/10/19 4:46 PM, Stas Malyshev wrote: > Hi! > >> thanks for the elaboration. I can understand the background much better. >> I have to admit, that I am also not a real expert, but very close to the >> real experts like Vidal and Rahm who are co-authors of the SWJ paper or >> the OpenLink devs.

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-11 Thread Kingsley Idehen
On 6/10/19 3:49 PM, Guillaume Lederrey wrote: > On Mon, Jun 10, 2019 at 9:03 PM Sebastian Hellmann > wrote: >> Hi Guillaume, >> >> On 10.06.19 16:54, Guillaume Lederrey wrote: >> >> Hello! >> >> On Mon, Jun 10, 2019 at 4:28 PM Sebastian Hellmann >> wrote: >> >> Hi Guillaume, >> >> On 06.06.19

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-11 Thread Kingsley Idehen
On 6/10/19 10:54 AM, Guillaume Lederrey wrote: >> - Virtuoso has proven quite useful. I don't want to advertise here, but the >> thing they have going for DBpedia uses ridiculous hardware, i.e. 64GB RAM >> and it is also the OS version, not the professional with clustering and >> repartition

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-11 Thread Jerven Bolleman
Hi Guillaume, All, As the lead developer for sparql.uniprot.org, one of the few SPARQL endpoints with much more data (7x) than Wikidata and significant external users, I can chime in with our experiences of hosting data with Virtuoso. All in all, I am very happy with it and it has made our

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-10 Thread Sebastian Hellmann
Yes, I can ask. I am talking a lot with them as we are redeploying DBpedia live and also pushing the new DBpedia to them soon. I think, they also had a specific issue with how Wikidata does linked data, but I didn't get it, as it was mentioned too briefly. All the best, Sebastian On

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-10 Thread Stas Malyshev
Hi! > thanks for the elaboration. I can understand the background much better. > I have to admit, that I am also not a real expert, but very close to the > real experts like Vidal and Rahm who are co-authors of the SWJ paper or > the OpenLink devs. If you know anybody at OpenLink that would be

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-10 Thread Sebastian Hellmann
Hi Stas, thanks for the elaboration. I can understand the background much better. I have to admit, that I am also not a real expert, but very close to the real experts like Vidal and Rahm who are co-authors of the SWJ paper or the OpenLink devs. I am also spoiled, because OpenLink solves

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-10 Thread Stas Malyshev
Hi! > Yes, sharding is what you need, I think, instead of replication. This is > the technique where data is repartitioned into more manageable chunks > across servers. Agreed, if we are to get any solution that is not constrained by hardware limits of a single server, we can not avoid looking

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-10 Thread Stas Malyshev
Hi! > I am not sure how to evaluate this correctly. Scaling databases in > general is a "known hard problem" and graph databases a sub-field of it, > which are optimized for graph-like queries as opposed to column stores > or relational databases. If you say that "throwing hardware at the >

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-10 Thread Guillaume Lederrey
On Mon, Jun 10, 2019 at 9:03 PM Sebastian Hellmann wrote: > > Hi Guillaume, > > On 10.06.19 16:54, Guillaume Lederrey wrote: > > Hello! > > On Mon, Jun 10, 2019 at 4:28 PM Sebastian Hellmann > wrote: > > Hi Guillaume, > > On 06.06.19 21:32, Guillaume Lederrey wrote: > > Hello all! > > There has

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-10 Thread Sebastian Hellmann
Hi Guillaume, On 10.06.19 16:54, Guillaume Lederrey wrote: Hello! On Mon, Jun 10, 2019 at 4:28 PM Sebastian Hellmann wrote: Hi Guillaume, On 06.06.19 21:32, Guillaume Lederrey wrote: Hello all! There have been a number of concerns raised about the performance and scaling of Wikidata Query

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-10 Thread Guillaume Lederrey
Hello! On Mon, Jun 10, 2019 at 4:28 PM Sebastian Hellmann wrote: > > Hi Guillaume, > > On 06.06.19 21:32, Guillaume Lederrey wrote: > > Hello all! > > There have been a number of concerns raised about the performance and > scaling of Wikidata Query Service. We share those concerns and we are >

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-10 Thread Sebastian Hellmann
Hi Guillaume, On 06.06.19 21:32, Guillaume Lederrey wrote: Hello all! There have been a number of concerns raised about the performance and scaling of Wikidata Query Service. We share those concerns and we are doing our best to address them. Here is some info about what is going on: In an ideal

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-09 Thread Amirouche Boubekki
I made a proposal for a grant at https://meta.wikimedia.org/wiki/Grants:Project/WDQS_On_FoundationDB Mind the fact that this is not about the versioned quadstore. It is about a simple triplestore; it is mainly missing bindings for FoundationDB and SPARQL syntax. Also, I will probably need help to

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-07 Thread Amirouche Boubekki
On Thu, 6 Jun 2019 at 21:33, Guillaume Lederrey wrote: > Hello all! > > There have been a number of concerns raised about the performance and > scaling of Wikidata Query Service. We share those concerns and we are > doing our best to address them. Here is some info about what is going > on: > >

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-06 Thread Daniel Mietchen
Thanks, Guillaume - this is very helpful, and it would be great to have similar information posted/collected on other kinds of limits and potential approaches to addressing them. Some weeks ago, we started a project to keep track of such limits, and I have added pointers to your information

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-06 Thread Gerard Meijssen
Hoi, Thank you for this answer. It helps. It helps to understand / appreciate the work that is done. Without updates like this, it becomes increasingly hard to be confident that our future will remain bright. Thanks, GerardM On Thu, 6 Jun 2019 at 21:33, Guillaume Lederrey wrote: > Hello

[Wikidata] Scaling Wikidata Query Service

2019-06-06 Thread Guillaume Lederrey
Hello all! There have been a number of concerns raised about the performance and scaling of Wikidata Query Service. We share those concerns and we are doing our best to address them. Here is some info about what is going on: In an ideal world, WDQS should: * scale in terms of data size * scale in