I like the idea of comparing live instances; could we pose a test-instance
challenge, with some benchmarks, and invite different communities to take
it up, hosting their own demos of what a well-tuned instance of WD could
look like?  (Could also be hosted by us / spun up by advocates for a tool
in our community; could also spur some Kaggle interest.)
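
A minimal sketch of what a harness for such a challenge could look like, assuming each candidate instance exposes a standard SPARQL-over-HTTP endpoint that returns JSON results. The function names (`run_sparql`, `benchmark`), the sample query, and the use of median wall-clock time are illustrative assumptions, not an agreed benchmark. The two endpoint URLs are the public WDQS endpoint and the Virtuoso demo instance linked later in this thread.

```python
import json
import statistics
import time
import urllib.parse
import urllib.request


def run_sparql(endpoint, query, timeout=60):
    """Send one SPARQL query over HTTP GET and return the parsed JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical client name; public endpoints generally expect a User-Agent.
            "User-Agent": "wd-instance-benchmark-sketch/0.1",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def benchmark(endpoints, queries, runner=run_sparql, repeats=3,
              clock=time.perf_counter):
    """Time every query against every endpoint.

    Returns {endpoint_name: {query_name: median_seconds}}; a query that
    raises (timeout, HTTP error, bad JSON) is recorded as None.
    """
    results = {}
    for name, url in endpoints.items():
        results[name] = {}
        for query_name, query in queries.items():
            timings = []
            try:
                for _ in range(repeats):
                    start = clock()
                    runner(url, query)
                    timings.append(clock() - start)
            except Exception:
                results[name][query_name] = None
                continue
            results[name][query_name] = statistics.median(timings)
    return results


if __name__ == "__main__":
    # Assumption: a real challenge would agree on a shared query set;
    # this single query (ten instances of "house cat") is only a placeholder.
    endpoints = {
        "wdqs-blazegraph": "https://query.wikidata.org/sparql",
        "virtuoso-demo": "https://wikidata.demo.openlinksw.com/sparql",
    }
    queries = {
        "cats": (
            "SELECT ?item WHERE { ?item "
            "<http://www.wikidata.org/prop/direct/P31> "
            "<http://www.wikidata.org/entity/Q146> } LIMIT 10"
        ),
    }
    for name, timings in benchmark(endpoints, queries).items():
        print(name, timings)
```

Timing alone would not be a fair comparison (hardware, data freshness and result correctness differ between instances), so a real challenge would presumably record those alongside the numbers.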

The size of the community actively interested in the health of Wikidata
seems like complementary information, alongside overall community size/health
(which appears on the existing metrics list).   //S

On Fri, Aug 27, 2021 at 10:19 AM Kingsley Idehen via Wikidata
<wikidata@lists.wikimedia.org> wrote:

> On 8/25/21 3:17 PM, Mike Pham wrote:
> Thanks for all suggestions, and general enthusiasm in helping scale WDQS!
> A number of you have suggested various graph backends to consider moving to
> from Blazegraph, and I wanted to take a minute to respond more generically.
> There are several criteria we need to consider for a Blazegraph
> alternative. Ideally we would have this list of criteria ready and
> available to share, so that the community can help vet alternatives with
> us. Unfortunately, we do not currently have a full list of these criteria.
> While the criteria we judged candidate graph backends on are available
> here
> <https://docs.google.com/spreadsheets/d/1MXikljoSUVP77w7JKf9EXN40OB-ZkMqT8Y5b2NYVKbU/edit?usp=sharing>,
> it is highly unlikely these will be the exact set we will use in this next
> stage of scaling, and should only be used as a historical reference.
> It is likely that there is no silver bullet solution that will satisfy
> every criterion. We will probably need to make compromises in some areas in
> order to optimize for others. This is a primary reason for conducting the WDQS
> user survey
> <https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2021/08#Wikidata_Query_Service_(WDQS)_User_Survey_2021>:
> we would like a better understanding of what the overall community
> priorities are, including from those who may be less vocal in existing
> discussions. These priorities will then be a major component in distilling
> the criteria (and weights) for a new graph backend.
> The current plan is to share the survey results (as up to date as we can
> make them) at WikidataCon
> <https://www.wikidata.org/wiki/Wikidata:WikidataCon_2021> this year. I
> appreciate the discussion around potential candidates so far, and welcome
> the continued insight/help, but wanted to also be clear that we will not be
> making any decisions about a new graph backend, or have a complete list of
> criteria or testing process, at the moment -- WikidataCon will be the next
> strategic check-in point.
> As always, your patience is appreciated, and I'm looking forward to the
> continuing discussions and collaboration!
> Best,
> Mike
> Hi Mike,
> Here's a suggestion regarding this important matter, circa 2021:
> At the very least, a candidate platform should be able to deliver a
> live instance of the Wikidata dataset, accessible for interaction via a
> SPARQL Query Service endpoint.
> Based on the interesting list of suggestions presented in this mailing
> list (and in the Google Spreadsheet
> <https://docs.google.com/spreadsheets/d/1MXikljoSUVP77w7JKf9EXN40OB-ZkMqT8Y5b2NYVKbU/edit#gid=0&range=M1>
> it's spawned), the larger goal of a vibrant LOD Cloud Knowledge Graph would
> benefit exponentially if each platform actually offered a live instance.
> Irrespective of the final decision made, we are always going to offer a
> live Wikidata instance, just as we do a LOD Cloud Cache etc.
> Also note, the WDQS and SPARQL loose-coupling suggested by Jerven is
> ultra-important: making that cool Query Services App independent of the
> SPARQL Query Service backend will improve utility and general resilience
> immensely.
On 25 August, 2021 at 09:41:28, Samuel Klein wrote:
> Aha, hello jerven :)  I should have remembered your earlier comment,
> delighted you are here.
> Thank you again for sharing your promising experience + benchmarks +
> suggestions -- and for highlighting both similarities and differences.
> SJ
On Tue, Aug 24, 2021 at 2:18 AM Jerven Bolleman
> <jerven.bolleman@sib.swiss> wrote:
>> Hi Samuel, All,
>> I am the software engineer responsible for sparql.uniprot.org.
>> I already offered to help in https://phabricator.wikimedia.org/T206561.
>> So no need to ask Andra or Egon ;)
>> We are good users of Virtuoso and strongly suggest that it be
>> evaluated, as it is in general a good product that does scale.[1]
>> One of the things we did differently than WDQS is to introduce a
>> controlled layer between the "public" and the "database". This
>> allows things like query rewriting/redirection upon data model
>> changes, as well as rewriting some schema rediscovery queries to a known
>> faster query. We also parse the queries with RDF4J before handing them
>> to Virtuoso. This makes sure that the queries we accept are only
>> valid SPARQL 1.1, which avoids users getting used to almost-SPARQL
>> dialects (i.e. we retain the flexibility to move to a different
>> endpoint). We are in the process of updating this code and contributing
>> it to RDF4J, with the first contribution in the develop/4.0.0 branch.
>> I think a number of current customizations in WDQS can be moved to a
>> front RDF4J layer. Then the RDF4J sail/repository layer can be used to
>> preserve flexibility, so that WDQS can more easily switch between
>> backend databases in the future.
>> One large difference between UniProt and WDQS is that Wikidata is
>> continually updated while UniProt is batch released a few times a year.
>> WDQS is somewhat easier in some areas and more difficult in others
>> because of that.
>> Regards,
>> Jerven
>> [1] No database is perfect, but Virtuoso does scale a lot better than
>> Blazegraph did, which we also evaluated in the past. There is still a
>> lot of potential in Virtuoso to scale even better in the future.
On 23/08/2021 21:36, Samuel Klein wrote:
>> > Ah, that's lovely.  Thanks for the update, Kingsley!  Uniprot is a good
>> > parallel to keep in mind.
>> >
>> > For Egon, Andra, others who work with them: Is there someone you'd
>> > recommend chatting with at uniprot?
>> > "scaling alongside uniprot" or at least engaging them on how to solve
>> > shared + comparable issues (they also offer authentication-free SPARQL
>> > querying) sounds like a compelling option.
>> >
>> > S.
>> >
On Thu, Aug 19, 2021 at 4:32 PM Kingsley Idehen via Wikidata
>> > <wikidata@lists.wikimedia.org> wrote:
>> >
>> >     On 8/18/21 5:07 PM, Mike Pham wrote:
>> >>
>> >>     Wikidata community members,
>> >>
>> >>
>> >>     Thank you for all of your work helping Wikidata grow and improve
>> >>     over the years. In the spirit of better communication, we would
>> >>     like to take this opportunity to share some of the current
>> >>     challenges Wikidata Query Service (WDQS) is facing, and some
>> >>     strategies we have for dealing with them.
>> >>
>> >>
>> >>     WDQS currently risks failing to provide acceptable service quality
>> >>     due to the following reasons:
>> >>
>> >>     1. Blazegraph scaling
>> >>
>> >>         1. Graph size. WDQS uses Blazegraph as our graph backend.
>> >>             While Blazegraph can theoretically support 50 billion
>> >>             edges <https://blazegraph.com/>, in reality Wikidata is
>> >>             the largest graph we know of running on Blazegraph (~13
>> >>             billion triples
>> >>             <https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=7&orgId=1&refresh=1m>),
>> >>             and there is a risk that we will reach a size
>> >>             <https://www.w3.org/wiki/LargeTripleStores#Bigdata.28R.29_.2812.7B.29>
>> >>             limit of what it can realistically support
>> >>             <https://phabricator.wikimedia.org/T213210>. Once
>> >>             Blazegraph is maxed out, WDQS can no longer be updated.
>> >>             This will also break Wikidata tools that rely on WDQS.
>> >>
>> >>         2. Software support. Blazegraph is end of life software,
>> >>             which is no longer actively maintained, making it an
>> >>             unsustainable backend to continue moving forward with long
>> >>             term.
>> >>
>> >>
>> >>     Blazegraph maxing out in size poses the greatest risk for
>> >>     catastrophic failure, as it would effectively prevent WDQS from
>> >>     being updated further, and WDQS would inevitably fall out of date. Our long
>> >>     term strategy to address this is to move to a new graph backend
>> >>     that best meets our WDQS needs and is actively maintained, and
>> >>     begin the migration off of Blazegraph as soon as a viable
>> >>     alternative is identified
>> >>     <https://phabricator.wikimedia.org/T206560>.
>> >>
>> >
>> >     Hi Mike,
>> >
>> >     Do bear in mind that pre and post selection of Blazegraph for
>> >     Wikidata, we've always offered an RDF-based DBMS that can handle
>> >     current and future requirements for Wikidata, just as we do DBpedia.
>> >
>> >     At the time of our first rendezvous, handling 50 billion triples
>> >     would have typically required our Cluster Edition which is a
>> >     Commercial Only offering -- basically, that was the deal breaker
>> >     back then.
>> >
>> >     Anyway, in recent times, our Open Source Edition has evolved to
>> >     handle some 80 Billion+ triples (exemplified by the live UniProt
>> >     instance) where performance and scale are primarily a function of
>> >     available memory.
>> >
>> >     I hope this helps.
>> >
>> >     Related:
>> >
>> >     [1] https://wikidata.demo.openlinksw.com/sparql -- Our Live Wikidata
>> >     SPARQL Query Endpoint
>> >     [2] https://docs.google.com/spreadsheets/d/15AXnxMgKyCvLPil_QeGC0DiXOP-Hu8Ln97fZ683ZQF0/edit#gid=0
>> >     -- Google Spreadsheet about various Virtuoso Configurations
>> >     associated with some well-known public endpoints
>> >     [3] https://t.co/EjAAO73wwE -- this query
>> >     doesn't complete with the current Blazegraph-based Wikidata endpoint
>> >     [4] https://t.co/GTATPPJNBI -- same query
>> >     completing when applied to the Virtuoso-based endpoint
>> >     [5] https://t.co/X7mLmcYC69 -- about
>> >     loading Wikidata's datasets into a Virtuoso instance
>> >     [6] https://twitter.com/search?q=%23Wikidata%20%23VirtuosoRDBMS%20%40kidehen&src=typed_query&f=live
>> >     -- various demos shared via Twitter over the years regarding Wikidata
>> >
>> >
>> >
>> > --
>> > Samuel Klein          @metasj           w:user:sj          +1 617 529 4266
>> >
>> --
>>         *Jerven Tjalling Bolleman*
>> Principal Software Developer
>> *SIB | Swiss Institute of Bioinformatics*
>> 1, rue Michel Servet - CH 1211 Geneva 4 - Switzerland
>> t +41 22 379 58 85
>> Jerven.Bolleman@sib.swiss - www.sib.swiss
> --
> Samuel Klein          @metasj           w:user:sj          +1 617 529 4266
