Re: [Wikidata] Scaling Wikidata Query Service

Stas Malyshev Wed, 12 Jun 2019 10:11:51 -0700

Hi!

>> So there needs to be some smarter solution, one that we'd be unlikely to
>> develop inhouse
> 
> Big cat, small fish. As Wikidata continues to grow, it will have specific
> needs.
> Needs that are unlikely to be solved by off-the-shelf solutions.


Here I think it's a good place to remind ourselves that we're not Google,
and developing a new database engine in-house is probably a bit beyond our
resources and budgets. Fitting an existing solution to our goals, sure,
but developing something new at that scale is probably not going to happen.

> FoundationDB and WiredTiger are respectively used at Apple (among other
> companies)
> and MongoDB since 3.2 all over-the-world. WiredTiger is also used at Amazon.

I believe they are, but I think for our particular goals we have to
limit ourselves to a set of solutions that are a proven good match for
our case.

>> We also have a plan on improving the throughput of Blazegraph, which
>> we're working on now.
> 
> What is the phabricator ticket? Please.

You can see the WDQS task board here:
https://phabricator.wikimedia.org/tag/wikidata-query-service/

> That will be vendor lock-in for wikidata and wikimedia along all the
> poor souls that try to interop with it.

Since Virtuoso uses standard SPARQL, it won't be too much of a vendor
lock-in, though of course the standard does not cover everything, so
some corners differ between SPARQL engines. This is why even migration
between SPARQL engines, even excluding operational aspects, is
non-trivial. Of course, migration to any non-SPARQL engine would be an
order of magnitude more disruptive, so right now we are not seriously
considering that.
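
As a rough illustration of where the corners differ: queries written for
WDQS often rely on Blazegraph-side extensions (such as the bd: parameters
of the label service), which are not part of the SPARQL standard. Below is
a minimal Python sketch that flags such prefixes in a query string. The
helper name vendor_prefixes_used and the prefix list are my own assumptions
for illustration, not an actual WDQS tool, and the list is not complete.

```python
import re

# Prefixes assumed (for illustration only) to signal engine-specific extensions.
# bd: and hint: are Blazegraph namespaces; bif: is used for Virtuoso built-ins.
VENDOR_PREFIXES = {"bd": "Blazegraph", "hint": "Blazegraph", "bif": "Virtuoso"}


def vendor_prefixes_used(query: str) -> dict:
    """Return {prefix: engine} for every vendor prefix that appears in the query."""
    found = {}
    for prefix, engine in VENDOR_PREFIXES.items():
        # Match "prefix:localName" as a prefixed name, not as the tail of a longer word.
        if re.search(r"(?<![\w:])" + re.escape(prefix) + r":\w", query):
            found[prefix] = engine
    return found


# Uses only standard SPARQL syntax.
PORTABLE = """
SELECT ?item WHERE { ?item wdt:P31 wd:Q146 . } LIMIT 10
"""

# Uses the label service with a Blazegraph-namespace parameter.
BLAZEGRAPH_ONLY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
} LIMIT 10
"""
```

A check like this only catches the syntactic part of the problem;
differences in how engines interpret the same standard query would not
show up this way.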

> It has two backends: MMAP and rocksdb.

Sure, but I was talking about the data model - ArangoDB sees the data as
a set of documents. The RDF approach is a bit different.
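
To make the difference concrete, here is a minimal Python sketch that
flattens one document into RDF-style (subject, predicate, object) triples.
The document shape and the function name document_to_triples are
hypothetical; this is not how ArangoDB or Wikibase actually store data.

```python
def document_to_triples(doc: dict, id_key: str = "_id") -> list:
    """Flatten a flat document into (subject, predicate, object) triples.

    A multi-valued field becomes one triple per value.
    """
    subject = doc[id_key]
    triples = []
    for key, value in doc.items():
        if key == id_key:
            continue
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((subject, key, v))
    return triples


# Hypothetical document: a single record holding everything about one entity.
doc = {"_id": "Q42", "P31": "Q5", "P106": ["Q36180", "Q28389"]}
triples = document_to_triples(doc)
```

A document store typically fetches the whole record by its key, while a
triple store answers patterns over individual statements, for example
"every subject whose P106 is Q36180".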

> ArangoDB is a multi-model database, it support:

As I already mentioned, there's a difference between "you can do it" and
"you can do it efficiently". Graphs are simple creatures, and can be
modeled on many backends - KV, document, relational, column store,
whatever you have. The tricky part starts when you need to run millions
of queries on a 10B-triple database. If your backend is not optimal for
that task, it's not going to perform.
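
A toy Python sketch of that difference, assuming an in-memory dict
standing in for a KV store (the class TripleStore and its methods are
hypothetical): the same question is answered once by a full scan and once
through a dedicated (predicate, object) index. Both return the same
result; only the second avoids touching every triple.

```python
from collections import defaultdict


class TripleStore:
    """Toy triple store: a plain list plus a (predicate, object) -> subjects index."""

    def __init__(self):
        self.triples = []
        self.pos_index = defaultdict(set)

    def add(self, s, p, o):
        self.triples.append((s, p, o))
        self.pos_index[(p, o)].add(s)

    def subjects_by_scan(self, p, o):
        # "You can do it": work grows with the total number of triples.
        return {s for (s, p2, o2) in self.triples if p2 == p and o2 == o}

    def subjects_by_index(self, p, o):
        # "You can do it efficiently": one hash lookup, independent of store size.
        return set(self.pos_index.get((p, o), set()))


store = TripleStore()
store.add("Q42", "P31", "Q5")
store.add("Q42", "P106", "Q36180")
store.add("Q80", "P31", "Q5")
```

With three triples the two methods are indistinguishable; at billions of
triples the scan is not an option, and the choice and layout of indexes
is exactly where backends differ.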

-- 
Stas Malyshev
smalys...@wikimedia.org

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
