Ah, that's lovely. Thanks for the update, Kingsley! UniProt is a good parallel to keep in mind.
For Egon, Andra, and others who work with them: is there someone you'd recommend chatting with at UniProt? "Scaling alongside UniProt", or at least engaging them on how to solve shared and comparable issues (they also offer authentication-free SPARQL querying), sounds like a compelling option.

S.

On Thu, Aug 19, 2021 at 4:32 PM Kingsley Idehen via Wikidata <wikidata@lists.wikimedia.org> wrote:

> On 8/18/21 5:07 PM, Mike Pham wrote:
>
> Wikidata community members,
>
> Thank you for all of your work helping Wikidata grow and improve over the
> years. In the spirit of better communication, we would like to take this
> opportunity to share some of the current challenges Wikidata Query Service
> (WDQS) is facing, and some strategies we have for dealing with them.
>
> WDQS currently risks failing to provide acceptable service quality due to
> the following reasons:
>
> 1. Blazegraph scaling
>    1. Graph size. WDQS uses Blazegraph as our graph backend. While
>       Blazegraph can theoretically support 50 billion edges
>       <https://blazegraph.com/>, in reality Wikidata is the largest graph
>       we know of running on Blazegraph (~13 billion triples
>       <https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=7&orgId=1&refresh=1m>),
>       and there is a risk that we will reach a size limit
>       <https://www.w3.org/wiki/LargeTripleStores#Bigdata.28R.29_.2812.7B.29>
>       of what it can realistically support
>       <https://phabricator.wikimedia.org/T213210>. Once Blazegraph is
>       maxed out, WDQS can no longer be updated. This will also break
>       Wikidata tools that rely on WDQS.
>    2. Software support. Blazegraph is end of life software, which is no
>       longer actively maintained, making it an unsustainable backend to
>       continue moving forward with long term.
>
> Blazegraph maxing out in size poses the greatest risk for catastrophic
> failure, as it would effectively prevent WDQS from being updated further,
> and inevitably fall out of date. Our long term strategy to address this is
> to move to a new graph backend that best meets our WDQS needs and is
> actively maintained, and begin the migration off of Blazegraph as soon as a
> viable alternative is identified
> <https://phabricator.wikimedia.org/T206560>.
>
>
> Hi Mike,
>
> Do bear in mind that pre and post selection of Blazegraph for Wikidata,
> we've always offered an RDF-based DBMS that can handle current and future
> requirements for Wikidata, just as we do DBpedia.
>
> At the time of our first rendezvous, handling 50 billion triples would
> have typically required our Cluster Edition which is a Commercial Only
> offering -- basically, that was the deal breaker back then.
>
> Anyway, in recent times, our Open Source Edition has evolved to handle
> some 80 Billion+ triples (exemplified by the live UniProt instance) where
> performance and scale is primarily a function of available memory.
>
> I hope this helps.
>
> Related:
>
> [1] https://wikidata.demo.openlinksw.com/sparql -- Our Live Wikidata
> SPARQL Query Endpoint
> [2] https://docs.google.com/spreadsheets/d/15AXnxMgKyCvLPil_QeGC0DiXOP-Hu8Ln97fZ683ZQF0/edit#gid=0
> -- Google Spreadsheet about various Virtuoso Configurations associated with
> some well-known public endpoints
> [3] https://t.co/EjAAO73wwE -- this query doesn't complete with the
> current Blazegraph-based Wikidata endpoint
> [4] https://t.co/GTATPPJNBI -- same query completing when applied to the
> Virtuoso-based endpoint
> [5] https://t.co/X7mLmcYC69 -- about loading Wikidata's datasets into a
> Virtuoso instance
> [6] https://twitter.com/search?q=%23Wikidata%20%23VirtuosoRDBMS%20%40kidehen&src=typed_query&f=live
> -- various demos shared via Twitter over the years regarding Wikidata
>
> --
> Regards,
>
> Kingsley Idehen
> Founder & CEO
> OpenLink Software
> Home Page: http://www.openlinksw.com
> Community Support: https://community.openlinksw.com
> Weblogs (Blogs):
> Company Blog: https://medium.com/openlink-software-blog
> Virtuoso Blog: https://medium.com/virtuoso-blog
> Data Access Drivers Blog:
> https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
>
> Personal Weblogs (Blogs):
> Medium Blog: https://medium.com/@kidehen
> Legacy Blogs: http://www.openlinksw.com/blog/~kidehen/
>               http://kidehen.blogspot.com
>
> Profile Pages:
> Pinterest: https://www.pinterest.com/kidehen/
> Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
> Twitter: https://twitter.com/kidehen
> Google+: https://plus.google.com/+KingsleyIdehen/about
> LinkedIn: http://www.linkedin.com/in/kidehen
>
> Web Identities (WebID):
> Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
>         : http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
>
> _______________________________________________
> Wikidata mailing list -- wikidata@lists.wikimedia.org
> To unsubscribe send an email to wikidata-le...@lists.wikimedia.org

--
Samuel Klein  @metasj  w:user:sj  +1 617 529 4266