Addshore added a comment.

  I dug into some of this code a little bit more and put some findings below.
  
  I believe that rather than use and abuse this entity info thing, we should 
create our own new things specifically for prefetching referenced entities if 
needed.
  
  EntityInfo as a concept is not currently intentionally used and should be 
removed.
  
  **What is EntityInfo, why does it exist and is it still used?**
  
  - EntityInfo was introduced back in the day to "prefetch" entity info needed 
for rendering a page, this includes terms needed for rendering a page
  - The EntityInfo builder currently uses the terms storage tables, and during 
the rollout of wb_terms towards the end another layer of caching needed to be 
added in front of it 
https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/546659/
  - When the "formattercache" was introduced in 
https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/441010/30/lib/includes/Formatters/WikibaseValueFormatterBuilders.php
 the use of `labelDescriptionLookupFactory->getLabelDescriptionLookup` was 
removed, which would have use EntityInfo based term lookups for formatting 
entity ID values. The new behaviour instead introduced a caching lookup that 
always used a EntityRetrievingTermLookup
  - Despite not using the entityinfo, it hasn't since been cleaned up / removed 
(I seem to remember there being a ticket relating to this? but I can't find it)
  
  As a result of the above, one of the first things that happens when rendering 
an entity is getting the entity info, which is then not used at all during 
rendering, as this is all now done via the formatted cache things.
  
  **Some other interesting observations (hinting toward removing entity info):**
  
  - `FormatterLabelDescriptionLookupFactory::OPT_LABEL_DESCRIPTION_LOOKUP` is 
likely fairly pointless now, as this option is not looked at (see above) when 
creating the formatted to be used for most things. However::::
    - `WikibaseValueFormatterBuilders::newItemPropertyIdHtmlLinkFormatter` uses 
the formatted cache, and creates its own lookup
    - Other methods in this class still call 
`labelDescriptionLookupFactory->getLabelDescriptionLookup` which includes a 
check for OPT_LABEL_DESCRIPTION_LOOKUP These include:
      - newPlainEntityIdFormatter (seems it might need terms? it is a plain 
formatter that is used in newEntityIdFormatter (see next line)
      - newEntityIdFormatter
      - getVocabularyUriFormatter
    - However adding breakpoints locally to the EntityInfoTermLookup class 
methods didn't result in being hit
    - Adding a breakpoint to the OPT_LABEL_DESCRIPTION_LOOKUP case in 
`getLabelDescriptionLookup` resulted in hits for:
      - getVocabularyUriFormatter (so this code path still gets the terminfo 
lookup specified)
      - newLabelsProviderEntityIdHtmlLinkFormatter (we just touched / are 
touching this in fed prop) (it is called by newEntityIdFormatter)
    - All calls to this should be able to use the formatter cache and we remove 
EntityInfo
  
  **Some bookmarks of the code areas I was looking at throughout this**
  F31841798: image.png <https://phabricator.wikimedia.org/F31841798>
  
  **Impact of removal**
  
  The current rate of requests to this EntityInfo builder stuff: 
https://grafana.wikimedia.org/d/u5wAugyik/wikibase-statsdrecordingsimplecache?panelId=2&fullscreen&orgId=1&from=now-1h&to=now
  The rate of requests for EntityInfo is essentially the same as the # of 
parser output generations for wikibase data.
  Looking at the cache metrics
  In a 12 hour period this is ~120million calls, so 10 million an hour, or 166k 
per min or ~2700 calls per second
  The cache hits come from local server cache, but the misses result in lookups 
from the terms storage db tables.
  
  We also track the db calls for this code path that may not be needed on 
https://grafana.wikimedia.org/d/000000548/wikibase-sql-term-storage?orgId=1&refresh=30s
  You can see these in the top panel for the 
"select.EntityInfoBuilder_collectTermsForEntities" metric.
  Averaging 4.7k ops a second/minute (TBA confirm this??)
  
  Needless to say that not doing all of this stuff that is not needed should 
speed things up and reduce memory consumption.
  It will also remove an old outdated concept that should have been killed some 
while ago.

TASK DETAIL
  https://phabricator.wikimedia.org/T253125

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Addshore
Cc: RhinosF1, Aklapper, Addshore, Jimfhahn, darthmon_wmde, Nandana, lucamauri, 
Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs

Reply via email to