>  (i) identify and delete lower priority data (e.g. labels, descriptions,
aliases, non-normalized values, etc);

Ouch.
For me
- as a native Hungarian: the labels, descriptions, and aliases are extremely
important
- as a data user: I am using "labels" and "aliases" in my concordance tools
(mapping Wikidata IDs to external IDs)

So please clarify the practical meaning of *"delete"*.

Thanks in advance,
  Imre



Mike Pham <mp...@wikimedia.org> wrote (on Wed, Aug 18, 2021, 23:08):

> Wikidata community members,
>
> Thank you for all of your work helping Wikidata grow and improve over the
> years. In the spirit of better communication, we would like to take this
> opportunity to share some of the current challenges Wikidata Query Service
> (WDQS) is facing, and some strategies we have for dealing with them.
>
> WDQS currently risks failing to provide acceptable service quality due to
> the following reasons:
>
>
>    1. Blazegraph scaling
>
>       1. Graph size. WDQS uses Blazegraph as our graph backend. While
>          Blazegraph can theoretically support 50 billion edges
>          <https://blazegraph.com/>, in reality Wikidata is the largest graph
>          we know of running on Blazegraph (~13 billion triples
>          <https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=7&orgId=1&refresh=1m>),
>          and there is a risk that we will reach a size
>          <https://www.w3.org/wiki/LargeTripleStores#Bigdata.28R.29_.2812.7B.29>
>          limit of what it can realistically support
>          <https://phabricator.wikimedia.org/T213210>. Once Blazegraph is
>          maxed out, WDQS can no longer be updated. This will also break Wikidata
>          tools that rely on WDQS.
>
>       2. Software support. Blazegraph is end of life software, which is no
>          longer actively maintained, making it an unsustainable backend to
>          continue moving forward with long term.
>
>
> Blazegraph maxing out in size poses the greatest risk for catastrophic
> failure, as it would effectively prevent WDQS from being updated further,
> and inevitably fall out of date. Our long term strategy to address this is
> to move to a new graph backend that best meets our WDQS needs and is
> actively maintained, and begin the migration off of Blazegraph as soon as a
> viable alternative is identified
> <https://phabricator.wikimedia.org/T206560>.
>
> In the interim period, we are exploring disaster mitigation options for
> reducing Wikidata’s graph size in the case that we hit this upper graph
> size limit: (i) identify and delete lower priority data (e.g. labels,
> descriptions, aliases, non-normalized values, etc); (ii) separate out
> certain subgraphs (such as Lexemes and/or scholarly articles). This would
> be a last resort scenario to keep Wikidata and WDQS running with reduced
> functionality while we are able to deploy a more long-term solution.
>
>
>
>    2. Update and access scaling
>
>       1. Throughput. WDQS is currently trying to provide fast updates, and
>          fast unlimited queries for all users. As the number of SPARQL
>          queries grows over time
>          <https://www.mediawiki.org/wiki/User:MPopov_(WMF)/Wikimania_2021_Hackathon>
>          alongside graph updates, WDQS is struggling to sufficiently keep up
>          <https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=8&from=now-6M&to=now&refresh=1d>
>          in each dimension of service quality without compromising anywhere. For
>          users, this often leads to timed out queries.
>
>       2. Equitable service. We are currently unable to adjust system
>          behavior per user/agent. As such, it is not possible to provide
>          equitable service to users: for example, a heavy user could swamp WDQS
>          enough to hinder usability by community users.
>
>
> In addition to being a querying service for Wikidata, WDQS is also part of
> the edit pipeline of Wikidata (every edit on Wikidata is pushed to WDQS to
> update the data there). While deploying the new Flink-based Streaming
> Updater <https://phabricator.wikimedia.org/T244590> will help with
> increasing throughput of Wikidata updates, there is a substantial risk that
> WDQS will be unable to keep up with the combination of increased querying
> and updating, resulting in more tradeoffs between update lag and querying
> latency/timeouts.
>
> In the near term, we would like to work more closely with you to determine
> what acceptable trade-offs would be for preserving WDQS functionality while
> we scale up Wikidata querying. In the long term, we will be conducting more
> user research to better understand your needs so we can (i) optimize
> querying via SPARQL and/or other methods, (ii) explore better user
> management that will allow us to prevent heavy use of WDQS that does not
> align with the goals of our movement and projects, and (iii) make it easier
> for users to set up and run their own query services.
>
> Though this information about the current state of WDQS may not be a total
> surprise to many of you, we want to be as transparent with you as possible
> to ensure that there are as few surprises as possible in the case of any
> potential service disruptions/catastrophic failures, and that we can
> accommodate your work as best as we can in the future evolution of WDQS. We
> plan on doing a session on WDQS scaling challenges during WikidataCon this
> year at the end of October.
>
> Thanks for your understanding with these scaling challenges, and for any
> feedback you have already been providing. If you have new concerns,
> comments and questions, you can best reach us at this talk page
> <https://www.wikidata.org/wiki/Wikidata_talk:Query_Service_scaling_update_Aug_2021>.
> Additionally, if you have not had a chance to fill out our survey
> <https://docs.google.com/forms/d/e/1FAIpQLSe1H_OXQFDCiGlp0QRwP6-Z2CGCgm96MWBBmiqsMLu0a6bhLg/viewform?usp=sf_link>
> yet, please tell us how you use the Wikidata Query Service (see privacy
> statement
> <https://foundation.wikimedia.org/wiki/WDQS_User_Survey_2021_Privacy_Statement>)!
> Whether you are an occasional user or create tools, your feedback is needed
> to decide our future development.
>
> Best,
>
> WMF Search + WMDE
> _______________________________________________
> Wikidata mailing list -- wikidata@lists.wikimedia.org
> To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
>
