Ah, that's lovely.  Thanks for the update, Kingsley!  Uniprot is a good
parallel to keep in mind.

For Egon, Andra, and others who work with them: is there someone you'd
recommend chatting with at Uniprot?
"Scaling alongside Uniprot", or at least engaging them on how to solve
shared and comparable issues (they also offer authentication-free SPARQL
querying), sounds like a compelling option.

S.

On Thu, Aug 19, 2021 at 4:32 PM Kingsley Idehen via Wikidata <
wikidata@lists.wikimedia.org> wrote:

> On 8/18/21 5:07 PM, Mike Pham wrote:
>
> Wikidata community members,
>
> Thank you for all of your work helping Wikidata grow and improve over the
> years. In the spirit of better communication, we would like to take this
> opportunity to share some of the current challenges Wikidata Query Service
> (WDQS) is facing, and some strategies we have for dealing with them.
>
> WDQS currently risks failing to provide acceptable service quality due to
> the following reasons:
>
>    1. Blazegraph scaling
>
>       1. Graph size. WDQS uses Blazegraph as our graph backend. While
>          Blazegraph can theoretically support 50 billion edges
>          <https://blazegraph.com/>, in reality Wikidata is the largest graph
>          we know of running on Blazegraph (~13 billion triples
>          <https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=7&orgId=1&refresh=1m>),
>          and there is a risk that we will reach a size limit
>          <https://www.w3.org/wiki/LargeTripleStores#Bigdata.28R.29_.2812.7B.29>
>          of what it can realistically support
>          <https://phabricator.wikimedia.org/T213210>. Once Blazegraph is
>          maxed out, WDQS can no longer be updated. This will also break
>          Wikidata tools that rely on WDQS.
>
>       2. Software support. Blazegraph is end of life software, which is no
>          longer actively maintained, making it an unsustainable backend to
>          continue moving forward with long term.
>
>
> Blazegraph maxing out in size poses the greatest risk for catastrophic
> failure, as it would effectively prevent WDQS from being updated further,
> and WDQS would inevitably fall out of date. Our long term strategy to
> address this is to move to a new graph backend that best meets our WDQS
> needs and is actively maintained, and to begin the migration off of
> Blazegraph as soon as a viable alternative is identified
> <https://phabricator.wikimedia.org/T206560>.
>
>
> Hi Mike,
>
> Do bear in mind that both before and after Blazegraph was selected for
> Wikidata, we've always offered an RDF-based DBMS that can handle current
> and future requirements for Wikidata, just as we do for DBpedia.
>
> At the time of our first rendezvous, handling 50 billion triples would
> have typically required our Cluster Edition which is a Commercial Only
> offering -- basically, that was the deal breaker back then.
>
> Anyway, in recent times, our Open Source Edition has evolved to handle
> some 80 Billion+ triples (exemplified by the live Uniprot instance), where
> performance and scale are primarily a function of available memory.
>
> I hope this helps.
>
> Related:
>
> [1] https://wikidata.demo.openlinksw.com/sparql -- Our Live Wikidata
> SPARQL Query Endpoint
> [2]
> https://docs.google.com/spreadsheets/d/15AXnxMgKyCvLPil_QeGC0DiXOP-Hu8Ln97fZ683ZQF0/edit#gid=0
> -- Google Spreadsheet about various Virtuoso Configurations associated with
> some well-known public endpoints
> [3] https://t.co/EjAAO73wwE -- this query doesn't complete with the
> current Blazegraph-based Wikidata endpoint
> [4] https://t.co/GTATPPJNBI -- same query completing when applied to the
> Virtuoso-based endpoint
> [5] https://t.co/X7mLmcYC69 -- about loading Wikidata's datasets into a
> Virtuoso instance
> [6]
> https://twitter.com/search?q=%23Wikidata%20%23VirtuosoRDBMS%20%40kidehen&src=typed_query&f=live
> -- various demos shared via Twitter over the years regarding Wikidata
>
> --
> Regards,
>
> Kingsley Idehen       
> Founder & CEO
> OpenLink Software
> Home Page: http://www.openlinksw.com
> Community Support: https://community.openlinksw.com
> Weblogs (Blogs):
> Company Blog: https://medium.com/openlink-software-blog
> Virtuoso Blog: https://medium.com/virtuoso-blog
> Data Access Drivers Blog: 
> https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
>
> Personal Weblogs (Blogs):
> Medium Blog: https://medium.com/@kidehen
> Legacy Blogs: http://www.openlinksw.com/blog/~kidehen/
>               http://kidehen.blogspot.com
>
> Profile Pages:
> Pinterest: https://www.pinterest.com/kidehen/
> Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
> Twitter: https://twitter.com/kidehen
> Google+: https://plus.google.com/+KingsleyIdehen/about
> LinkedIn: http://www.linkedin.com/in/kidehen
>
> Web Identities (WebID):
> Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
>         : 
> http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
>
> _______________________________________________
> Wikidata mailing list -- wikidata@lists.wikimedia.org
> To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
>


-- 
Samuel Klein          @metasj           w:user:sj          +1 617 529 4266
