Lucas_Werkmeister_WMDE added a comment.

  I tinkered a bit more with this in production (`mwscript shell eowiki` on 
mwdebug1001). Specifically, I was curious what the entity usages for 
eowiki:Perseo looked like before they were deduplicated / combined, so I hacked 
together some code to render the revision from scratch.
  
    $s = mws()
    $rr = $s->getRevisionRenderer()
    $rl = $s->getRevisionLookup()
    $rev = $rl->getRevisionByPageId( 115136 )
    $rendered = $rr->getRenderedRevision( $rev )
    $po = $rendered->getSlotParserOutput( 'main' )
    $usages = $po->getExtensionData( 'wikibase-entity-usage' )
  
  I then fed them into the `UsageDeduplicator` (taking it from the 
`UsageAccumulatorFactory`’s private field out of laziness; `sudo` is a special 
PsySh command that bypasses access checks):
  
    $uaf = wbc::getUsageAccumulatorFactory()
    sudo $ud = $uaf->usageDeduplicator
    $euf = new Wikibase\Client\Usage\EntityUsageFactory( wbc::getEntityIdParser() )
    $usageObjects = array_map( fn ( $str ) => $euf->newFromIdentity( $str ), array_keys( $usages ) )
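  The per-entity count of statement usages can also be tallied directly from the raw identity strings, without any Wikibase classes. The following is a minimal sketch that assumes identity strings of the form `Q130832#C.P31` (entity ID, `#`, aspect letter, optional `.`-separated modifier). `countStatementUsages()` is a hypothetical helper and is not part of Wikibase.

```php
<?php
// Hypothetical helper, not part of Wikibase: counts modified statement
// usages ("C.P…") per entity. Assumes identity strings like "Q130832#C.P31".
function countStatementUsages( array $identities ): array {
	$counts = [];
	foreach ( $identities as $identity ) {
		// Split "Q130832#C.P31" into the entity ID and the aspect key "C.P31".
		[ $entityId, $aspectKey ] = explode( '#', $identity, 2 );
		// Only modified statement usages count; a bare "C" usage does not.
		if ( str_starts_with( $aspectKey, 'C.' ) ) {
			$counts[$entityId] = ( $counts[$entityId] ?? 0 ) + 1;
		}
	}
	return $counts;
}
```

  After pasting the function into the same shell session, you could call it as `countStatementUsages( array_keys( $usages ) )`.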
  
  It turns out the original usages include //52// different statement usages 
for Q130832 (the connected item). This is well above the production value of 
`$wgWBClientSettings['entityUsageModifierLimits']['C']` (33), so they 
correctly get collapsed into a single `C` usage:
  
    > count( $usageObjects )
    = 93
    
    > count( $ud->deduplicate( $usageObjects ) )
    = 42
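  The two counts fit the 52 individual `C.P%` usages being replaced by one bare `C` usage (93 - 52 + 1 = 42), provided no other usages were merged or dropped. The following plain PHP sketches that limiting step. It is a hypothetical illustration, not the actual `UsageDeduplicator` code. It uses the same assumed identity-string format (`Q130832#C.P31`) and assumes the limit counts as exceeded only when the number of modifiers is strictly greater than the configured value.

```php
<?php
// Hypothetical sketch of a per-aspect modifier limit; NOT the actual
// UsageDeduplicator code. Identity strings are assumed to look like
// "Q130832#C.P31" (entity ID, "#", aspect letter, optional ".modifier").
function collapseModifiers( array $identities, array $limits ): array {
	$modifiers = []; // entity ID => aspect => set of modifiers
	$result = [];    // used as a set of identity strings
	foreach ( $identities as $identity ) {
		[ $entityId, $aspectKey ] = explode( '#', $identity, 2 );
		$parts = explode( '.', $aspectKey, 2 );
		if ( count( $parts ) === 2 ) {
			$modifiers[$entityId][$parts[0]][$parts[1]] = true;
		} else {
			$result[$identity] = true; // unmodified usages are kept as-is
		}
	}
	foreach ( $modifiers as $entityId => $byAspect ) {
		foreach ( $byAspect as $aspect => $mods ) {
			// Assumption: the limit is exceeded only when strictly greater.
			if ( isset( $limits[$aspect] ) && count( $mods ) > $limits[$aspect] ) {
				// Too many modifiers: keep only the unmodified aspect.
				$result["$entityId#$aspect"] = true;
			} else {
				foreach ( array_keys( $mods ) as $mod ) {
					$result["$entityId#$aspect.$mod"] = true;
				}
			}
		}
	}
	return array_keys( $result );
}
```

  With a limit of 33, the 52 modifiers in the Perseo case would collapse to a single `Q130832#C` entry.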
  
  So at this point, it makes sense that the individual `C.P%` usages are 
removed and only the `C` usage is kept. But then at some later point, 
presumably in `DataUpdateHookHandler::onParserCacheSaveComplete()` / 
`AddUsagesForPageJob`, we somehow have //fewer// than 33 statement usages left, 
so they get (re)added individually.
  
  I think we have two possible paths to continue on from here:
  
  - Understand where this second, shorter list of usages comes from.
  - Fix `DataUpdateHookHandler::onParserCacheSaveComplete()` to not re-add 
`C.P%` usages if a `C` usage already exists (and likewise for other aspects), 
i.e. replace the current `$newUsages = array_diff_key( $usages, $currentUsages )` 
with something a bit smarter.
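  For the second path, a "smarter" diff could look roughly like the sketch below. It is hypothetical in several ways. It assumes both arrays are keyed by identity strings such as `Q130832#C.P31`, as the existing `array_diff_key()` call suggests. `getNewUsages()` is an invented name. It only handles the modifier case (a modified usage whose unmodified aspect is already stored), not any other subsumption rules.

```php
<?php
// Hypothetical sketch, not actual Wikibase code: behaves like
// array_diff_key( $usages, $currentUsages ), but also drops a modified usage
// ("Q1#C.P31") when the current usages already contain the unmodified usage
// for the same entity and aspect ("Q1#C").
// Both arrays are assumed to be keyed by identity strings.
function getNewUsages( array $usages, array $currentUsages ): array {
	$newUsages = array_diff_key( $usages, $currentUsages );
	foreach ( array_keys( $newUsages ) as $identity ) {
		[ $entityId, $aspectKey ] = explode( '#', $identity, 2 );
		$dot = strpos( $aspectKey, '.' );
		if ( $dot === false ) {
			continue; // no modifier: plain array_diff_key semantics apply
		}
		$unmodified = $entityId . '#' . substr( $aspectKey, 0, $dot );
		if ( isset( $currentUsages[$unmodified] ) ) {
			// e.g. "Q1#C.P31" is already covered by a stored "Q1#C"
			unset( $newUsages[$identity] );
		}
	}
	return $newUsages;
}
```

  In the real handler the array values would be usage objects rather than the placeholders used here. With this kind of check, the `C.P%` usages for Q130832 would not be re-added as long as its `C` usage is already stored.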
  
  I think the second path is something we should do sooner or later, but 
arguably it doesn’t address the root cause of the problem, so maybe we should 
investigate the first path before that.

TASK DETAIL
  https://phabricator.wikimedia.org/T255706
