Krinkle added a comment.

  I still find myself very confused by this task and imagine that others might 
be struggling as well.
  
  I'll try to pick up the thread from before and continue to ask clarifying 
questions.
  
  - The authoritative source for describing items is Wikidata.org.
  - The authoritative source for describing the constraint checks is also on 
Wikidata.org.
  
  What is the authoritative source for executing a constraint check if all 
caches and secondary services were empty? I believe this is currently 
MediaWiki (the WBQC extension), which may consult WDQS as part of the check. 
WDQS in this context is mainly used as a way to query the relational data of 
Wikidata items, which we can't do efficiently within MediaWiki, so we rely on 
WDQS for that. This means that to run a constraint check, WDQS needs to be 
fairly up to date with all the items, which happens through some sync process 
unrelated to this RFC. Does that sound right?
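  
  For concreteness, here is a rough sketch of the kind of WDQS round trip I 
mean. The actual WBQC code is PHP; this is illustrative Python, and the 
public endpoint and exact query shape are assumptions on my part:
  
    import requests

    WDQS = "https://query.wikidata.org/sparql"
    UA = {"User-Agent": "constraint-check-sketch/0.1 (example)"}

    def has_type(item, target_class):
        # A "type" constraint needs to know whether the item is an
        # instance of (a subclass of) the target class -- relational
        # data MediaWiki can't traverse efficiently, hence the WDQS
        # round trip.
        query = "ASK { wd:%s wdt:P31/wdt:P279* wd:%s . }" % (item, target_class)
        r = requests.get(WDQS, params={"query": query, "format": "json"}, headers=UA)
        r.raise_for_status()
        return r.json()["boolean"]

    # e.g. has_type("Q42", "Q5") -- is Douglas Adams a human?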
  
  Speaking of caches and secondary services, where do we currently expose or 
store the results of constraint checks? As I understand it, they are:
  
  - Saved in Memcached for 1 day after a computation happens.
  - Exposed via action=rdf, which is best-effort only: it returns a cache hit 
or nothing. It's not clear to me when one would use this, and what 
higher-level requirements it needs to meet. I'll assume for now there are 
cases somewhere where a JS gadget can't afford to wait for the results to be 
generated and is fine with them being missing if they weren't recently 
computed by something unrelated.
  - Exposed via Special:ConstraintReport/Q123, which ignores the cache and 
always computes it fresh.
  - Exposed via API action=wbcheckconstraints, which is the main and reliable 
way to access this data from the outside. Considers the cache and 
re-generates on the fly as needed, so it might be slow. (A usage sketch 
follows this list.)
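  
  A minimal usage sketch of that API (the endpoint and parameters are from 
the public wikidata.org API; the caching behaviour in the comment is my 
reading of the above):
  
    import requests

    API = "https://www.wikidata.org/w/api.php"

    # May return cached results (up to a day old) or compute them on
    # the fly, so latency varies from request to request.
    r = requests.get(API, params={
        "action": "wbcheckconstraints",
        "format": "json",
        "id": "Q42",
    })
    r.raise_for_status()
    results = r.json()["wbcheckconstraints"]["Q42"]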
  
  It's not clear to me why Special:ConstraintReport exists in this way. I 
suspect it is to allow an external service to act as a cache or store of 
constraint check results without having to worry about stale caches; it's 
basically exposing the computation end-point directly. That seems fine. What 
are those external stores? I think that's WDQS, right? So WDQS is used for 
storing relational items, but also for storing constraint data. If so, why 
not use it as the store for this task? (Also, how is it currently backfilled? 
Are we happy with that? Would a different outcome to this task result in WDQS 
no longer doing it this way?)
  
  I suspect the reason we don't want to use WDQS for this is that you want to 
regularly clear it out and repopulate it from scratch, ideally in a way that 
doesn't require re-running all checks that are no longer in Memcached, which 
presumably would take a very long time. How long would that be? And how do we 
currently load these results into WDQS?
  
  Having public dumps of these seems valuable indeed. Is this something that 
could be dumped from WDQS?
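  
  If the violation data is indeed in WDQS, I'd naively expect a dump to start 
as a paged SPARQL query along these lines. The predicate name and subject 
shape are guesses on my part at how the violations are modelled:
  
    import requests

    WDQS = "https://query.wikidata.org/sparql"
    UA = {"User-Agent": "constraint-dump-sketch/0.1 (example)"}

    # Assumption: violations are stored as triples like
    #   ?statement wikibase:hasViolationForConstraint ?constraint .
    query = """
    SELECT ?statement ?constraint WHERE {
      ?statement wikibase:hasViolationForConstraint ?constraint .
    }
    LIMIT 1000
    """
    r = requests.get(WDQS, params={"query": query, "format": "json"}, headers=UA)
    r.raise_for_status()
    for row in r.json()["results"]["bindings"]:
        print(row["statement"]["value"], row["constraint"]["value"])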
  
  Responding to the main RFC question: using a permanent store logically 
owned by MediaWiki and populated progressively does seem like a better 
direction and would make sense. I'm not seriously proposing WDQS be used for 
this; rather, I'm trying to better understand the needs by asking what WDQS 
isn't serving right now.
  
  Assuming a store will be needed, how big would the canonical data be in 
gigabytes? What would be the expected writes per second from the job queue 
for this? And the expected reads per second from the various endpoints?
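  
  To make that concrete, this is the kind of envelope math I'm after. Every 
input below is a made-up placeholder, not a measurement:
  
    # Placeholder inputs -- to be replaced with real numbers.
    items = 60_000_000        # entities with checkable statements
    result_size = 5 * 1024    # bytes per serialized result set
    edits_per_sec = 10        # edits triggering a recheck via the job queue

    storage_gb = items * result_size / 1024**3
    print(f"~{storage_gb:.0f} GB canonical data")  # ~286 GB with these inputs
    print(f"~{edits_per_sec} writes/s from the job queue")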
  
  Could it have a natural eviction strategy where MW takes care of replacing or 
removing things that are no longer needed, or would it need TTL-based eviction?
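  
  To illustrate what I mean by natural eviction (hypothetical store 
interface; the key scheme is made up), MW would overwrite on recompute and 
delete on events it already observes, so nothing needs a TTL:
  
    def on_constraint_check_completed(store, entity_id, revision_id, results):
        # Overwrite: one row per entity, replaced whenever a newer
        # revision is checked, so stale results never linger.
        store.set("wbqc:" + entity_id, {
            "revision": revision_id,
            "results": results,
        })

    def on_entity_deleted(store, entity_id):
        # MW observes deletions, so it can drop the row directly
        # instead of waiting for a TTL to expire.
        store.delete("wbqc:" + entity_id)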
  
  Depending on the answers to this, Main Stash might work here. It is 
essentially a persisted and replicated cache without LRU/pressure eviction, 
which sounds like it would fit. It is currently backed by Redis and was until 
recently used for sessions; it is now being migrated to a simple external 
MariaDB cluster (T212129 <https://phabricator.wikimedia.org/T212129>).

TASK DETAIL
  https://phabricator.wikimedia.org/T214362
