Krinkle added a comment.
I still find myself very confused by this task and imagine that others might be struggling as well. I'll try to pick up the thread from before and continue to ask clarifying questions.

- The authoritative source for describing items is Wikidata.org.
- The authoritative source for describing the constraint checks is also Wikidata.org.

What is the authoritative source for executing a constraint check if all caches and secondary services were empty? I believe this is currently MediaWiki (the WBQC extension), which may consult WDQS as part of the constraint check. WDQS in this context is mainly used as a way to query the relational data of Wikidata items, which we can't do efficiently within MediaWiki, so we rely on WDQS for that. This means that to run a constraint check, WDQS needs to be fairly up to date with all the items, which happens through a sync process that is not related to this RFC. Does that sound right?

Speaking of caches and secondary services, where do we currently expose or store the results of constraint checks? As I understand it, they are:

- Saved in Memcached for 1 day after a computation happens.
- Exposed via action=rdf, which is best-effort only: it returns a cache hit or nothing. It's not clear to me when one would use this, and what higher-level requirements it needs to meet. I'll assume for now that there are cases where a JS gadget can't afford to wait for the results to be generated and is fine with them being missing if they weren't recently computed by something unrelated.
- Exposed via Special:ConstraintReport/Q123, which ignores the cache and always computes results fresh.
- Exposed via the API action=wbcheckconstraints, which is the main and reliable way to access this data from the outside. It considers the cache and regenerates on the fly as needed, so it might be slow. (I've tried to capture these three access paths in the sketch below.)

It's not clear to me why Special:ConstraintReport exists in this form. I suspect it is there to allow an external service to act as a cache or store of constraint check results without having to worry about stale caches; it basically exposes the computation endpoint directly. That seems fine. What are those external stores? I think that's WDQS, right? So WDQS is used for storing relational items, but also for storing constraint data. If so, why not use that as the store for this task? (Also, how is that currently backfilled? Are we happy with that? Would a different outcome of this task result in WDQS no longer doing it this way?)

I suspect the reason we don't want to use WDQS for this is that you want to regularly clear it out and repopulate it from scratch, ideally in a way that doesn't require running all non-Memcached checks again, which would presumably take a very long time. How long would that be? And how do we currently load these results into WDQS? Having public dumps of these seems valuable indeed. Is this something that could be dumped from WDQS?

Responding to the main RFC question - using a permanent store logically owned by MediaWiki and populated progressively seems like a better direction indeed and would make sense. I'm not seriously proposing that WDQS be used for this; rather, I'm trying to better understand the needs by asking what it isn't serving right now. Assuming a store will be needed: how big would the canonical data be in gigabytes? What would be the expected writes per second from the job queue, and the expected reads per second from the various endpoints?
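To check my own reading of the list above, here is a minimal sketch of how I understand the three exposure paths relate to the one-day Memcached cache. This is illustrative Python, not the actual PHP of the WBQC extension, and all helper names here are mine:

```
import time

_cache = {}        # stand-in for Memcached: key -> (expiry timestamp, value)
CACHE_TTL = 86400  # results kept for one day, per the above

def _cache_get(key):
    entry = _cache.get(key)
    if entry is not None and entry[0] > time.time():
        return entry[1]
    return None

def _cache_set(key, value, ttl):
    _cache[key] = (time.time() + ttl, value)

def run_constraint_checks(item_id):
    # Placeholder for the expensive computation in the WBQC extension,
    # which may issue SPARQL queries against WDQS for relational data.
    return {"item": item_id, "violations": []}

def api_wbcheckconstraints(item_id):
    # action=wbcheckconstraints: cache-aside; regenerates on a miss,
    # so it can be slow, but it always returns a result.
    result = _cache_get(item_id)
    if result is None:
        result = run_constraint_checks(item_id)
        _cache_set(item_id, result, CACHE_TTL)
    return result

def special_constraint_report(item_id):
    # Special:ConstraintReport/Q123: bypasses the cache, always fresh.
    return run_constraint_checks(item_id)

def action_rdf(item_id):
    # action=rdf: best-effort; returns a cache hit or nothing (None).
    return _cache_get(item_id)
```

If I've got any of these wrong, please correct me.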
Could it have a natural eviction strategy, where MW takes care of replacing or removing entries that are no longer needed, or would it need TTL-based eviction? Depending on the answers to this, using the Main Stash might work here. That is essentially a persisted and replicated cache without LRU/pressure eviction, which sounds like it would fit. It is currently backed by Redis and was until recently used for sessions; it is now being migrated to a simple external MariaDB cluster (T212129 <https://phabricator.wikimedia.org/T212129>).
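To illustrate what I mean by "natural eviction": a rough sketch, assuming a Main-Stash-like key-value store with explicit deletes but no LRU/pressure eviction, and assuming results could be keyed by revision (the keying scheme here is hypothetical):

```
stash = {}  # stand-in for a persisted, replicated KV store (Main Stash)

def _key(item_id, rev_id):
    return f"wbqc:result:{item_id}:{rev_id}"

def store_result(item_id, rev_id, result):
    stash[_key(item_id, rev_id)] = result

def on_item_edit(item_id, old_rev_id, new_rev_id, new_result):
    # Natural eviction: MW itself removes the entry for the superseded
    # revision the moment it stops being needed, so no TTL is required
    # to bound growth.
    store_result(item_id, new_rev_id, new_result)
    stash.pop(_key(item_id, old_rev_id), None)

def read_result(item_id, current_rev_id):
    # Readers only ask for the current revision, so stale entries are
    # unreachable even before they get deleted.
    return stash.get(_key(item_id, current_rev_id))
```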