> (i) identify and delete lower priority data (e.g. labels, descriptions, aliases, non-normalized values, etc);
Ouch. As a native Hungarian, I find labels, descriptions, and aliases extremely important. As a data user, I rely on labels and aliases in my concordance tools (which map Wikidata IDs to external IDs).

So please clarify the practical meaning of *"delete"*.

Thanks in advance,
Imre

Mike Pham <mp...@wikimedia.org> wrote (on Wed, Aug 18, 2021, 23:08):

> Wikidata community members,
>
> Thank you for all of your work helping Wikidata grow and improve over the
> years. In the spirit of better communication, we would like to take this
> opportunity to share some of the current challenges Wikidata Query Service
> (WDQS) is facing, and some strategies we have for dealing with them.
>
> WDQS currently risks failing to provide acceptable service quality due to
> the following reasons:
>
> 1. Blazegraph scaling
>
>    1. Graph size. WDQS uses Blazegraph as our graph backend. While
>       Blazegraph can theoretically support 50 billion edges
>       <https://blazegraph.com/>, in reality Wikidata is the largest graph
>       we know of running on Blazegraph (~13 billion triples
>       <https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=7&orgId=1&refresh=1m>),
>       and there is a risk that we will reach a size
>       <https://www.w3.org/wiki/LargeTripleStores#Bigdata.28R.29_.2812.7B.29>
>       limit of what it can realistically support
>       <https://phabricator.wikimedia.org/T213210>. Once Blazegraph is
>       maxed out, WDQS can no longer be updated. This will also break
>       Wikidata tools that rely on WDQS.
>    2. Software support. Blazegraph is end of life software, which is no
>       longer actively maintained, making it an unsustainable backend to
>       continue moving forward with long term.
>
> Blazegraph maxing out in size poses the greatest risk for catastrophic
> failure, as it would effectively prevent WDQS from being updated further,
> and inevitably fall out of date.
> Our long term strategy to address this is
> to move to a new graph backend that best meets our WDQS needs and is
> actively maintained, and begin the migration off of Blazegraph as soon as a
> viable alternative is identified
> <https://phabricator.wikimedia.org/T206560>.
>
> In the interim period, we are exploring disaster mitigation options for
> reducing Wikidata’s graph size in the case that we hit this upper graph
> size limit: (i) identify and delete lower priority data (e.g. labels,
> descriptions, aliases, non-normalized values, etc); (ii) separate out
> certain subgraphs (such as Lexemes and/or scholarly articles). This would
> be a last resort scenario to keep Wikidata and WDQS running with reduced
> functionality while we are able to deploy a more long-term solution.
>
> 2. Update and access scaling
>
>    1. Throughput. WDQS is currently trying to provide fast updates, and
>       fast unlimited queries for all users. As the number of SPARQL
>       queries grows over time
>       <https://www.mediawiki.org/wiki/User:MPopov_(WMF)/Wikimania_2021_Hackathon>
>       alongside graph updates, WDQS is struggling to sufficiently keep up
>       <https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=8&from=now-6M&to=now&refresh=1d>
>       in each dimension of service quality without compromising anywhere.
>       For users, this often leads to timed out queries.
>    2. Equitable service. We are currently unable to adjust system
>       behavior per user/agent. As such, it is not possible to provide
>       equitable service to users: for example, a heavy user could swamp
>       WDQS enough to hinder usability by community users.
>
> In addition to being a querying service for Wikidata, WDQS is also part of
> the edit pipeline of Wikidata (every edit on Wikidata is pushed to WDQS to
> update the data there).
> While deploying the new Flink-based Streaming
> Updater <https://phabricator.wikimedia.org/T244590> will help with
> increasing throughput of Wikidata updates, there is a substantial risk that
> WDQS will be unable to keep up with the combination of increased querying
> and updating, resulting in more tradeoffs between update lag and querying
> latency/timeouts.
>
> In the near term, we would like to work more closely with you to determine
> what acceptable trade-offs would be for preserving WDQS functionality while
> we scale up Wikidata querying. In the long term, we will be conducting more
> user research to better understand your needs so we can (i) optimize
> querying via SPARQL and/or other methods, (ii) explore better user
> management that will allow us to prevent heavy use of WDQS that does not
> align with the goals of our movement and projects, and (iii) make it easier
> for users to set up and run their own query services.
>
> Though this information about the current state of WDQS may not be a total
> surprise to many of you, we want to be as transparent with you as possible
> to ensure that there are as few surprises as possible in the case of any
> potential service disruptions/catastrophic failures, and that we can
> accommodate your work as best as we can in the future evolution of WDQS. We
> plan on doing a session on WDQS scaling challenges during WikidataCon this
> year at the end of October.
>
> Thanks for your understanding with these scaling challenges, and for any
> feedback you have already been providing. If you have new concerns,
> comments and questions, you can best reach us at this talk page
> <https://www.wikidata.org/wiki/Wikidata_talk:Query_Service_scaling_update_Aug_2021>.
> Additionally, if you have not had a chance to fill out our survey
> <https://docs.google.com/forms/d/e/1FAIpQLSe1H_OXQFDCiGlp0QRwP6-Z2CGCgm96MWBBmiqsMLu0a6bhLg/viewform?usp=sf_link>
> yet, please tell us how you use the Wikidata Query Service (see privacy
> statement
> <https://foundation.wikimedia.org/wiki/WDQS_User_Survey_2021_Privacy_Statement>)!
> Whether you are an occasional user or create tools, your feedback is needed
> to decide our future development.
>
> Best,
>
> WMF Search + WMDE
> _______________________________________________
> Wikidata mailing list -- wikidata@lists.wikimedia.org
> To unsubscribe send an email to wikidata-le...@lists.wikimedia.org