EBernhardson added a comment.

  In T239931#5726538 <https://phabricator.wikimedia.org/T239931#5726538>, 
@Ladsgroup wrote:
  
  > Hey, thanks for the comment. I'm planning to turn it back on ASAP. Right
  > now we are in the middle of the migration and it puts too much pressure on
  > s8. Once the migration of wb_terms is over, we can turn this back on again.
  > That's going to happen in two months (hopefully).
  > I have two questions:
  >
  > - If someone makes an edit or creates a new item, the index still gets
  >   updated, but a newly added feature (let's assume, for example, that
  >   "number of claims" gets added to the search index) does not propagate
  >   into the system. Is that correct?
  
  Edits all propagate, with perhaps a few minutes of delay. So if the way some
  field is generated is changed, or a new property is added, that gets updated
  on a standard edit. The problem is that many pages never get edited. When we
  first rolled out the automated saneitizer, there were millions of pages that
  still did not have properties that had been added several years prior.
  
  > - If the above statement is correct, how often do you change the index
  >   structure, meaning you need to run reindexing? When was the last time
  >   this change was needed?
  
  It's not necessarily the indexing structure, but any change to the way 
searchable properties are rendered. There was a ticket (T239950 
<https://phabricator.wikimedia.org/T239950>) filed about 5 days ago to request 
changing the way some wikidata properties are rendered. The only way to roll 
out a change like that is to regenerate the pages and ship them to 
elasticsearch. You could imagine how much worse the load would be if we had to 
re-render all wikidata items from a maintenance script instead of an automated 
process that slowly does it over 8 weeks. I'm not sure when the last change 
that affected wikidata was, but in general there are probably a few updates a 
quarter that affect the search document rendering.
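
  To put rough numbers on the load comparison, here is a minimal sketch of the
  arithmetic. The item count and the one-week maintenance window are
  assumptions chosen only for illustration, not measured values; only the
  8-week figure comes from the comment above.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def renders_per_second(total_pages: int, days: float) -> float:
    """Average re-render rate needed to cover total_pages within the given days."""
    if days <= 0:
        raise ValueError("days must be positive")
    return total_pages / (days * SECONDS_PER_DAY)


# Hypothetical corpus size, an assumption for illustration only.
ASSUMED_PAGES = 70_000_000

# Background pass spread over 8 weeks, as described in the comment.
slow_pass = renders_per_second(ASSUMED_PAGES, days=8 * 7)

# Hypothetical maintenance-script run squeezed into one week.
fast_pass = renders_per_second(ASSUMED_PAGES, days=7)

print(f"8-week pass: {slow_pass:.1f} renders/s")
print(f"1-week run:  {fast_pass:.1f} renders/s")
print(f"ratio:       {fast_pass / slow_pass:.0f}x")
```

  Whatever the real item count is, the ratio depends only on the two time
  windows, so compressing the same work into a shorter run raises the average
  render rate proportionally.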
  
  > If you change it quite often, let us know.
  > The best solution IMO is not to make this less aggressive; it's to stop
  > rendering the HTML of the items, which is a very heavy job for wikidata
  > (unlike Wikipedia pages). Doing it is not super hard, but I'm not sure
  > where to start. I might pick this up to see what I can do.
  
  Sounds like some kind of parse flag? I'm not too familiar with those 
interfaces. Separately, CirrusSearch tends to assume that the ParserCache has 
an anonymously rendered version of all pages somewhere. Is Cirrus somehow 
getting a different cache key than anonymous page views?

TASK DETAIL
  https://phabricator.wikimedia.org/T239931

To: EBernhardson
Cc: EBernhardson, Ladsgroup, Addshore, Aklapper, dcausse, darthmon_wmde, 
DannyS712, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, jayvdb, 
Mbch331, jeremyb