Lucas_Werkmeister_WMDE added a comment.
In T231276#5441664 <https://phabricator.wikimedia.org/T231276#5441664>, @ArielGlenn wrote:

> In T231276#5441586 <https://phabricator.wikimedia.org/T231276#5441586>, @Lucas_Werkmeister_WMDE wrote:
>
>> ...
>>
>> It’s part of the serialization. Not sure why that would be a new issue, though – this seems like a fairly fundamental issue (tying the page ID to the page content even though it’s not stable across delete+restore). Is it possible that File:Bolsonaro_etc is just the first file with structured data that was deleted and then restored?
>
> I'd be willing to put money on that.

I think you’d lose it :) found some more with an ugly query:

  SELECT log_id, log_page, log_title, rev_id
  FROM logging
  JOIN revision ON log_page = rev_page
  JOIN slots ON rev_id = slot_revision_id
  -- this log entry restores a file
  WHERE log_type = 'delete'
  AND log_action = 'restore'
  AND log_namespace = 6
  -- and there is a corresponding revision, predating the restoration, that already had a mediainfo slot
  AND slot_role_id = (SELECT role_id FROM slot_roles WHERE role_name = 'mediainfo')
  AND rev_timestamp < log_timestamp
  -- captions were introduced in January 2019, so we can skip all earlier revisions and log entries
  AND rev_timestamp > 20190101000000
  AND log_timestamp > 20190101000000
  -- the restoration did not reuse the page ID (which we get from a corresponding deletion)
  AND log_page != (SELECT logdel.log_page
                   FROM logging AS logdel
                   WHERE logdel.log_type = 'delete'
                   AND logdel.log_action = 'delete'
                   AND logdel.log_namespace = 6
                   AND logdel.log_title = logging.log_title
                   LIMIT 1)
  LIMIT 10;

For example, File:PL_Stanisław_Witkiewicz-Na_przełęczy_013.jpeg <https://commons.wikimedia.org/wiki/File:PL_Stanis%C5%82aw_Witkiewicz-Na_prze%C5%82%C4%99czy_013.jpeg> had a caption in revision 334323754 <https://commons.wikimedia.org/wiki/Special:PermanentLink/334323754>, 22:37, 10 January 2019; then was deleted <https://commons.wikimedia.org/wiki/Special:Redirect/logid/278125433> 23:10 of the same day; and
later restored. Curiously enough, according to the log entry, the page ID at the time was 11632736 (`log_page` of `log_id = 278125433`); yet, the serialization of the previous revision, 334323754, already contains the entity ID M75745807 (checked using this code <https://wikitech.wikimedia.org/wiki/User:Lucas_Werkmeister_(WMDE)/How_to_get_the_raw_text_of_a_page_or_revision>, but with `mediainfo` instead of `main` for the slot).

Perhaps WikibaseMediaInfo already contains code that’s supposed to take care of this? (Although updating serializations of old revisions like that sounds super dangerous to me.) And it broke recently?

TASK DETAIL
  https://phabricator.wikimedia.org/T231276
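To make the mismatch described above concrete, here is a rough Python sketch of the consistency check. It assumes that a MediaInfo entity ID is simply "M" followed by the page ID, and that the slot content is JSON with a top-level "id" field. The helper names and the stripped-down JSON are made up for illustration; only the two numbers come from the example above.

```python
import json


def expected_entity_id(page_id: int) -> str:
    # Assumption: MediaInfo entity IDs are "M" followed by the page ID.
    return f"M{page_id}"


def serialization_matches_page(serialized: str, page_id: int) -> bool:
    # Compare the entity ID stored in the slot content with the ID
    # derived from the page ID the revision belongs to.
    entity = json.loads(serialized)
    return entity.get("id") == expected_entity_id(page_id)


# Hypothetical, stripped-down stand-in for the mediainfo slot content of
# revision 334323754; a real serialization carries more fields.
old_slot = json.dumps({"type": "mediainfo", "id": "M75745807", "labels": {}})

# Page ID recorded in the deletion log entry (log_id = 278125433).
print(serialization_matches_page(old_slot, 11632736))  # False

# Page ID implied by the entity ID in the serialization.
print(serialization_matches_page(old_slot, 75745807))  # True
```

A check like this only detects the inconsistency; it says nothing about which of the two IDs is the one that should be kept.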