On 8/25/21 3:17 PM, Mike Pham wrote:
>
> Thanks for all the suggestions, and the general enthusiasm in helping
> scale WDQS! A number of you have suggested various graph backends to
> consider as replacements for Blazegraph, and I wanted to take a minute
> to respond more generally.
>
> There are several criteria we need to consider for a Blazegraph
> alternative. Ideally we would have this list of criteria ready and
> available to share, so that the community can help vet alternatives
> with us. Unfortunately, we do not currently have a full list of these
> criteria. While the criteria we judged candidate graph backends on are
> available here
> <https://docs.google.com/spreadsheets/d/1MXikljoSUVP77w7JKf9EXN40OB-ZkMqT8Y5b2NYVKbU/edit?usp=sharing>,
> it is highly unlikely that these will be the exact set we use in this
> next stage of scaling, so they should only be used as a historical
> reference.
>
> It is likely that there is no silver bullet solution that will satisfy
> every criterion. We will probably need to make compromises in some
> areas in order to optimize for others. This is a primary reason for
> conducting the WDQS user survey
> <https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2021/08#Wikidata_Query_Service_(WDQS)_User_Survey_2021>:
> we would like a better understanding of what the overall community
> priorities are, including from those who may be less vocal in existing
> discussions. These priorities will then be a major component in
> distilling the criteria (and weights) for a new graph backend.
>
> The current plan is to share the most up-to-date survey results we
> can at WikidataCon
> <https://www.wikidata.org/wiki/Wikidata:WikidataCon_2021> this year. I
> appreciate the discussion around potential candidates so far, and
> welcome the continued insight/help, but wanted to also be clear that
> we will not be making any decisions about a new graph backend, or have
> a complete list of criteria or testing process, at the moment —
> WikidataCon will be the next strategic check-in point.
>
> As always, your patience is appreciated, and I’m looking forward to
> the continuing discussions and collaboration!
>
> Best,
> Mike
>
>
>
>
> —
>
> *Mike Pham* (he/him)
> Sr Product Manager, Search
> Wikimedia Foundation <https://wikimediafoundation.org/>


Hi Mike,

Here's a suggestion regarding this important matter, circa 2021:

At the very least, a candidate platform should be able to deliver a
live instance of the Wikidata dataset, accessible for interaction via a
SPARQL Query Service endpoint.
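
As a minimal sketch of what "accessible via a SPARQL endpoint" means in
practice, the Python below builds (but does not send) a SPARQL 1.1
Protocol GET request. The endpoint URL is the one listed as [1] in the
links further down; the sample query, the function name
build_sparql_request and the choice of JSON results are assumptions
made purely for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from link [1] in this message; any SPARQL 1.1 Protocol
# endpoint could be substituted here.
ENDPOINT = "https://wikidata.demo.openlinksw.com/sparql"

# Illustrative query only: fetch a handful of arbitrary triples.
SAMPLE_QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol GET request."""
    # The protocol passes the query text in the "query" URL parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, SAMPLE_QUERY)
# To actually run it: urllib.request.urlopen(request, timeout=30)
```

Because only the endpoint URL changes, the same request could be pointed
at any candidate platform that offers a live instance.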

Based on the interesting list of suggestions presented in this mailing
list (and in the Google Spreadsheet
<https://docs.google.com/spreadsheets/d/1MXikljoSUVP77w7JKf9EXN40OB-ZkMqT8Y5b2NYVKbU/edit#gid=0&range=M1>
it has spawned), the larger goal of a vibrant LOD Cloud Knowledge Graph
would benefit greatly if each platform actually offered a live
instance.

Irrespective of the final decision, we will continue to offer a live
Wikidata instance, just as we do with the LOD Cloud Cache.

Also note that the loose coupling of WDQS and SPARQL suggested by Jerven
is very important: making that cool Query Service app independent of the
SPARQL Query Service backend will greatly improve utility and general
resilience.
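
To illustrate the loose coupling, here is a rough Python sketch of a
thin layer sitting between the query app and whichever store is behind
it. QueryGateway, the two stand-in backends and the placeholder query
strings are all hypothetical; nothing here reflects actual WDQS or
UniProt code, and the exact-match rewrite table is a deliberate
simplification of the query rewriting Jerven describes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A backend is anything that takes a query string and returns a result.
Backend = Callable[[str], str]


@dataclass
class QueryGateway:
    """Hypothetical layer between the query UI and the SPARQL store."""

    backend: Backend
    # Exact-match rewrites: known slow query text -> equivalent faster text.
    rewrites: Dict[str, str] = field(default_factory=dict)

    def run(self, query: str) -> str:
        normalized = " ".join(query.split())  # collapse runs of whitespace
        return self.backend(self.rewrites.get(normalized, normalized))


# Two stand-in backends; real ones would send HTTP requests to an endpoint.
def backend_a(query: str) -> str:
    return "A:" + query


def backend_b(query: str) -> str:
    return "B:" + query


# Placeholder strings stand in for real SPARQL queries.
gateway = QueryGateway(backend_a, {"SELECT ?slow": "SELECT ?fast"})
first = gateway.run("SELECT   ?slow")   # rewritten, then sent to backend A
gateway.backend = backend_b             # swap stores; callers are unaffected
second = gateway.run("SELECT ?other")   # passed through to backend B
```

The point of the design is that the app only ever talks to the gateway,
so changing the store behind it is a one-line configuration change.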

*Links*

[1] https://wikidata.demo.openlinksw.com/sparql -- Wikidata instance
we've been hosting for quite some time

[2] http://lod.openlinksw.com/sparql -- 40 Billion+ Triples instance
(used to be the largest live SPARQL Query Service instance until Uniprot
dethroned it!).

[3]
https://medium.com/virtuoso-blog/on-the-mutually-beneficial-nature-of-dbpedia-and-wikidata-5fb2b9f22ada
-- On the Mutually Beneficial Nature of DBpedia and Wikidata


Kingsley


>
> On 25 August, 2021 at 09:41:28, Samuel Klein (meta...@gmail.com
> <mailto:meta...@gmail.com>) wrote:
>
>> Aha, hello jerven :)  I should have remembered your earlier comment,
>> delighted you are here.  
>>
>> Thank you again for sharing your promising experience + benchmarks +
>> suggestions -- and for highlighting both similarities and differences. 
>>
>> SJ
>>
>> On Tue, Aug 24, 2021 at 2:18 AM jerven Bolleman
>> <jerven.bolleman@sib.swiss> wrote:
>>
>>     Hi Samuel, All,
>>
>>     I am the software engineer responsible for sparql.uniprot.org
>>     <http://sparql.uniprot.org>.
>>     I already offered to help in
>>     <https://phabricator.wikimedia.org/T206561>.
>>     So no need to ask Andra or Egon ;)
>>
>>     We are users of Virtuoso and strongly suggest that it be
>>     evaluated, as it is in general a good product that does scale.[1]
>>
>>     One of the things we did differently than WDQS is to introduce a
>>     controlled layer between the "public" and the "database". This
>>     allows things like query rewriting/redirection upon data model
>>     changes, as well as rewriting some schema rediscovery queries to a
>>     known faster query. We also parse the queries with RDF4J before
>>     handing them to Virtuoso. This makes sure that the queries we
>>     accept are only valid SPARQL 1.1, which avoids users getting used
>>     to almost-SPARQL dialects (i.e. we retain the flexibility to move
>>     to a different endpoint). We are in the process of updating this
>>     code and contributing it to RDF4J, with the first contribution in
>>     the develop/4.0.0 branch.
>>
>>     I think a number of current customizations in WDQS can be moved to
>>     a front RDF4J layer. Then the RDF4J sail/repository layer can be
>>     used to preserve flexibility, so that WDQS can more easily switch
>>     between backend databases in the future.
>>
>>     One large difference between UniProt and WDQS is that Wikidata is
>>     continually updated while UniProt is batch released a few times a
>>     year. WDQS is somewhat easier in some areas and more difficult in
>>     others because of that.
>>
>>     Regards,
>>     Jerven
>>
>>     [1] No database is perfect, but Virtuoso does scale a lot better
>>     than Blazegraph did, which we also evaluated in the past. There is
>>     still a lot of potential in Virtuoso to scale even better in the
>>     future.
>>
>>
>>
>>
>>
>>     On 23/08/2021 21:36, Samuel Klein wrote:
>>     > Ah, that's lovely.  Thanks for the update, Kingsley!  Uniprot is
>>     > a good parallel to keep in mind.
>>     >
>>     > For Egon, Andra, others who work with them: Is there someone you'd
>>     > recommend chatting with at uniprot?
>>     > "scaling alongside uniprot" or at least engaging them on how to
>>     > solve shared + comparable issues (they also offer
>>     > authentication-free SPARQL querying) sounds like a compelling
>>     > option.
>>     >
>>     > S.
>>     >
>>     > On Thu, Aug 19, 2021 at 4:32 PM Kingsley Idehen via Wikidata
>>     > <wikidata@lists.wikimedia.org> wrote:
>>     >
>>     >     On 8/18/21 5:07 PM, Mike Pham wrote:
>>     >>
>>     >>     Wikidata community members,
>>     >>
>>     >>
>>     >>     Thank you for all of your work helping Wikidata grow and
>>     >>     improve over the years. In the spirit of better communication,
>>     >>     we would like to take this opportunity to share some of the
>>     >>     current challenges Wikidata Query Service (WDQS) is facing, and
>>     >>     some strategies we have for dealing with them.
>>     >>
>>     >>
>>     >>     WDQS currently risks failing to provide acceptable service
>>     >>     quality due to the following reasons:
>>     >>
>>     >>     1. Blazegraph scaling
>>     >>
>>     >>         1. Graph size. WDQS uses Blazegraph as our graph backend.
>>     >>            While Blazegraph can theoretically support 50 billion
>>     >>            edges <https://blazegraph.com/>, in reality Wikidata is
>>     >>            the largest graph we know of running on Blazegraph (~13
>>     >>            billion triples
>>     >>            <https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=7&orgId=1&refresh=1m>),
>>     >>            and there is a risk that we will reach a size
>>     >>            <https://www.w3.org/wiki/LargeTripleStores#Bigdata.28R.29_.2812.7B.29>
>>     >>            limit of what it can realistically support
>>     >>            <https://phabricator.wikimedia.org/T213210>. Once
>>     >>            Blazegraph is maxed out, WDQS can no longer be updated.
>>     >>            This will also break Wikidata tools that rely on WDQS.
>>     >>
>>     >>         2. Software support. Blazegraph is end-of-life software,
>>     >>            which is no longer actively maintained, making it an
>>     >>            unsustainable backend to continue moving forward with
>>     >>            long term.
>>     >>
>>     >>
>>     >>     Blazegraph maxing out in size poses the greatest risk for
>>     >>     catastrophic failure, as it would effectively prevent WDQS
>>     >>     from being updated further, so that it would inevitably fall
>>     >>     out of date. Our long term strategy to address this is to
>>     >>     move to a new graph backend that best meets our WDQS needs
>>     >>     and is actively maintained, and begin the migration off of
>>     >>     Blazegraph as soon as a viable alternative is identified
>>     >>     <https://phabricator.wikimedia.org/T206560>.
>>     >>
>>     >
>>     >     Hi Mike,
>>     >
>>     >     Do bear in mind that both before and after the selection of
>>     >     Blazegraph for Wikidata, we've always offered an RDF-based
>>     >     DBMS that can handle current and future requirements for
>>     >     Wikidata, just as we do for DBpedia.
>>     >
>>     >     At the time of our first rendezvous, handling 50 billion
>>     >     triples would have typically required our Cluster Edition,
>>     >     which is a Commercial Only offering -- basically, that was the
>>     >     deal breaker back then.
>>     >
>>     >     Anyway, in recent times, our Open Source Edition has evolved to
>>     >     handle some 80 Billion+ triples (exemplified by the live
>>     >     Uniprot instance), where performance and scale are primarily a
>>     >     function of available memory.
>>     >
>>     >     I hope this helps.
>>     >
>>     >     Related:
>>     >
>>     >     [1] https://wikidata.demo.openlinksw.com/sparql -- Our Live
>>     >     Wikidata SPARQL Query Endpoint
>>     >     [2]
>>     >     https://docs.google.com/spreadsheets/d/15AXnxMgKyCvLPil_QeGC0DiXOP-Hu8Ln97fZ683ZQF0/edit#gid=0
>>     >     -- Google Spreadsheet about various Virtuoso Configurations
>>     >     associated with some well-known public endpoints
>>     >     [3] https://t.co/EjAAO73wwE -- this query doesn't complete
>>     >     with the current Blazegraph-based Wikidata endpoint
>>     >     [4] https://t.co/GTATPPJNBI -- same query completing when
>>     >     applied to the Virtuoso-based endpoint
>>     >     [5] https://t.co/X7mLmcYC69 -- about loading Wikidata's
>>     >     datasets into a Virtuoso instance
>>     >     [6]
>>     >     https://twitter.com/search?q=%23Wikidata%20%23VirtuosoRDBMS%20%40kidehen&src=typed_query&f=live
>>     >     -- various demos shared via Twitter over the years regarding
>>     >     Wikidata
>>     >
>>     >     --
>>     >     Regards,
>>     >
>>     >     Kingsley Idehen   
>>     >
>>     >     _______________________________________________
>>     >     Wikidata mailing list -- wikidata@lists.wikimedia.org
>>     >     To unsubscribe send an email to
>>     >     wikidata-le...@lists.wikimedia.org
>>     >
>>     >
>>     >
>>     > --
>>     > Samuel Klein          @metasj           w:user:sj          +1 617 529 4266
>>     >
>>     >
>>
>>     -- 
>>
>>             *Jerven Tjalling Bolleman*
>>     Principal Software Developer
>>     *SIB | Swiss Institute of Bioinformatics*
>>     1, rue Michel Servet - CH 1211 Geneva 4 - Switzerland
>>     t +41 22 379 58 85
>>     Jerven.Bolleman@sib.swiss - www.sib.swiss
>>
>>
>>
>> -- 
>> Samuel Klein          @metasj           w:user:sj          +1 617 529 4266
>


-- 
Regards,

Kingsley Idehen       
Founder & CEO 
OpenLink Software   
Home Page: http://www.openlinksw.com
Community Support: https://community.openlinksw.com
Weblogs (Blogs):
Company Blog: https://medium.com/openlink-software-blog
Virtuoso Blog: https://medium.com/virtuoso-blog
Data Access Drivers Blog: https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers

Personal Weblogs (Blogs):
Medium Blog: https://medium.com/@kidehen
Legacy Blogs: http://www.openlinksw.com/blog/~kidehen/
              http://kidehen.blogspot.com

Profile Pages:
Pinterest: https://www.pinterest.com/kidehen/
Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter: https://twitter.com/kidehen
Google+: https://plus.google.com/+KingsleyIdehen/about
LinkedIn: http://www.linkedin.com/in/kidehen

Web Identities (WebID):
Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
        : http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this

_______________________________________________
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org