[Wikidata-bugs] [Maniphest] T263730: Wikidata term store contains rows for deleted items

2021-03-15 Thread Addshore
Addshore closed this task as "Resolved".
Addshore claimed this task.
Addshore added a comment.
Restricted Application added a project: User-Addshore.


  SAL logs are connected to T270249 
  
  I checked in the DB and the list of issues looked empty :)

TASK DETAIL
  https://phabricator.wikimedia.org/T263730

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Addshore
Cc: MisterSynergy, Addshore, Aklapper, Lucas_Werkmeister_WMDE, maantietaja, 
Akuckartz, Iflorez, alaa_wmde, Nandana, lucamauri, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Jonas, abian, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T263730: Wikidata term store contains rows for deleted items

2021-03-15 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE closed subtask T270249: Run maintenance script to remove 
deleted items from term store on production as Resolved.

TASK DETAIL
  https://phabricator.wikimedia.org/T263730

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: MisterSynergy, Addshore, Aklapper, Lucas_Werkmeister_WMDE, maantietaja, 
Akuckartz, Iflorez, alaa_wmde, Nandana, lucamauri, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Jonas, abian, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T263730: Wikidata term store contains rows for deleted items

2021-02-17 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE closed subtask T274890: Make 
RemoveDeletedItemsFromTermStore.php maintenance script remove wbt_item_terms 
rows as Resolved.

TASK DETAIL
  https://phabricator.wikimedia.org/T263730

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: MisterSynergy, Addshore, Aklapper, Lucas_Werkmeister_WMDE, Akuckartz, 
Iflorez, alaa_wmde, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Wikidata-bugs, aude, 
Lydia_Pintscher, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T263730: Wikidata term store contains rows for deleted items

2021-01-19 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE closed subtask T270247: Write maintenance script for 
removing deleted items from the term store as Resolved.

TASK DETAIL
  https://phabricator.wikimedia.org/T263730

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: MisterSynergy, Addshore, Aklapper, Lucas_Werkmeister_WMDE, Akuckartz, 
Iflorez, alaa_wmde, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Wikidata-bugs, aude, 
Lydia_Pintscher, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T263730: Wikidata term store contains rows for deleted items

2020-12-15 Thread Lydia_Pintscher
Lydia_Pintscher moved this task from Prioritized Tech (prioritised from top to 
bottom) to Wikidata-Campsite-Iteration-∞ on the Wikidata-Campsite board.
Lydia_Pintscher edited projects, added Wikidata-Campsite 
(Wikidata-Campsite-Iteration-∞); removed Wikidata-Campsite.

TASK DETAIL
  https://phabricator.wikimedia.org/T263730

WORKBOARD
  https://phabricator.wikimedia.org/project/board/3402/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lydia_Pintscher
Cc: MisterSynergy, Addshore, Aklapper, Lucas_Werkmeister_WMDE, Akuckartz, 
Iflorez, alaa_wmde, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Wikidata-bugs, aude, 
Lydia_Pintscher, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T263730: Wikidata term store contains rows for deleted items

2020-12-15 Thread noarave
noarave set the point value for this task to "8".

TASK DETAIL
  https://phabricator.wikimedia.org/T263730

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: noarave
Cc: MisterSynergy, Addshore, Aklapper, Lucas_Werkmeister_WMDE, Akuckartz, 
Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Jonas, Wikidata-bugs, aude, Lydia_Pintscher, 
Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T263730: Wikidata term store contains rows for deleted items

2020-12-15 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  This probably requires a maintenance script to ensure we clean up not just 
`wbt_term_in_lang`, but also `wbt_text_in_lang` and `wbt_text`. The maintenance 
script shouldn’t need to do much more than call 
`TermStoreCleaner::cleanTermInLangIds()`, after checking that the term really 
isn’t used anymore. (I don’t think the script should //find// the unused rows, 
that should be done separately, by whomever runs the maintenance script.)

TASK DETAIL
  https://phabricator.wikimedia.org/T263730

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: MisterSynergy, Addshore, Aklapper, Lucas_Werkmeister_WMDE, Akuckartz, 
Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Jonas, Wikidata-bugs, aude, Lydia_Pintscher, 
Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T263730: Wikidata term store contains rows for deleted items

2020-12-15 Thread Addshore
Addshore added a comment.


  So comparing the 2 queries here, in T263730#6490963 
 and also in P12791 it seems 
that T263730#6490963  
contains more items that we potentially want to cleanup.
  
  I ran the query today and still seemingly get 9594 results, (which means the 
issue isn't occurring now? and was indeed only during the migration)
  
  Whoever tackles this should make sure they are happy with the query they are 
using to identify the entities that need to be cleaned up for.

TASK DETAIL
  https://phabricator.wikimedia.org/T263730

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Addshore
Cc: MisterSynergy, Addshore, Aklapper, Lucas_Werkmeister_WMDE, Akuckartz, 
Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Jonas, Wikidata-bugs, aude, Lydia_Pintscher, 
Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T263730: Wikidata term store contains rows for deleted items

2020-12-01 Thread WMDE-leszek
WMDE-leszek updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T263730

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: WMDE-leszek
Cc: MisterSynergy, Addshore, Aklapper, Lucas_Werkmeister_WMDE, Akuckartz, 
Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Jonas, Wikidata-bugs, aude, Lydia_Pintscher, 
Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T263730: Wikidata term store contains rows for deleted items

2020-10-26 Thread Addshore
Addshore added a project: Wikidata-Campsite.

TASK DETAIL
  https://phabricator.wikimedia.org/T263730

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Addshore
Cc: MisterSynergy, Addshore, Aklapper, Lucas_Werkmeister_WMDE, Akuckartz, 
Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Jonas, Wikidata-bugs, aude, Lydia_Pintscher, 
Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T263730: Wikidata term store contains rows for deleted items

2020-09-24 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  (For the record, I also created a list of potentially affected items at 
P12791, but figured maybe it shouldn’t be public.)

TASK DETAIL
  https://phabricator.wikimedia.org/T263730

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: MisterSynergy, Addshore, Aklapper, Lucas_Werkmeister_WMDE, Akuckartz, 
darthmon_wmde, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T263730: Wikidata term store contains rows for deleted items

2020-09-24 Thread MisterSynergy
MisterSynergy added a comment.


  In T263730#6490727 , 
@Lucas_Werkmeister_WMDE wrote:
  
  > In other words, potentially affected is any item that was edited on 10 
September between 11:06 and 15:58 (UTC) and then deleted before 7 October 
13:27, if I’m not mistaken.
  
  
  
MariaDB [wikidatawiki_p]> SELECT DISTINCT wbit_item_id, log_title, 
log_timestamp FROM wbt_item_terms JOIN logging ON 
wbit_item_id=SUBSTRING(log_title, 2) WHERE log_type='delete' AND 
log_action='delete' AND log_namespace=0 AND log_title NOT IN (SELECT page_title 
FROM page WHERE page_namespace=0) ORDER BY log_timestamp ASC;

+--+---++
| wbit_item_id | log_title | log_timestamp  |
+--+---++
|  1798871 | Q1798871  | 20190908075346 |
| 67206532 | Q67206532 | 20190910183620 |
| 67206284 | Q67206284 | 20190910183629 |
| 67205707 | Q67205707 | 20190910183639 |
| 67205787 | Q67205787 | 20190910183649 |
| 67205329 | Q67205329 | 20190910183659 |
| 16376239 | Q16376239 | 20190910183709 |
| 12865809 | Q12865809 | 20190910183719 |
| 25667640 | Q25667640 | 20190910183731 |
| 25667272 | Q25667272 | 20190910183740 |
| 25664811 | Q25664811 | 20190910183750 |
|  3546228 | Q3546228  | 20190910183800 |
| 65175512 | Q65175512 | 20190910183810 |
| 65051610 | Q65051610 | 20190910183820 |
| 67205948 | Q67205948 | 20190910183830 |
|  9307721 | Q9307721  | 20190910183840 |

 ... (skip plenty of rows here) ...

| 30686328 | Q30686328 | 20191103185910 |
| 25333226 | Q25333226 | 20191103194317 |
|  8877061 | Q8877061  | 20191103194339 |
| 60975079 | Q60975079 | 20191103194433 |
|  8443133 | Q8443133  | 20191103194539 |
| 70013783 | Q70013783 | 20191107163208 |
| 74670014 | Q74670014 | 20191113061939 |
| 74065265 | Q74065265 | 20191115121708 |
| 74672773 | Q74672773 | 20191125085852 |
| 74688761 | Q74688761 | 20191125090312 |
| 74693264 | Q74693264 | 20191125090707 |
| 74670082 | Q74670082 | 20191125093548 |
| 74669422 | Q74669422 | 20191125093827 |
| 74668425 | Q74668425 | 20191125185307 |
| 74674299 | Q74674299 | 20191125234049 |
| 74691179 | Q74691179 | 20191126080348 |
| 74483053 | Q74483053 | 20191126101239 |
| 74675327 | Q74675327 | 20191127111732 |
| 74453398 | Q74453398 | 20191205123011 |
| 74613482 | Q74613482 | 20191206010632 |
| 74692551 | Q74692551 | 20191223084428 |
| 87589401 | Q87589401 | 20200313152209 |
+--+---++

9594 rows in set (17 min 42.76 sec)
  
  Except for one entry, the problem starts indeed on the afternoon of 
2019-10-09 (item deletion timestamp). However, there are plenty of entries 
until ~2019-11-03 in the evening (deletion time), and only 17 entries from a 
later deletion time.

TASK DETAIL
  https://phabricator.wikimedia.org/T263730

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: MisterSynergy
Cc: MisterSynergy, Addshore, Aklapper, Lucas_Werkmeister_WMDE, Akuckartz, 
darthmon_wmde, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T263730: Wikidata term store contains rows for deleted items

2020-09-24 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  I think this is because we were writing to the new term store at some point 
when the item was edited, but not when the item was deleted. Timeline:
  
  10 September 2019, 11:06:
  
  In T225055#5478485 , 
@Stashbot wrote:
  
  > Mentioned in SAL (#wikimedia-operations) [2019-09-10T11:06:02Z] 
 Synchronized wmf-config/InitialiseSettings.php: SWAT: 
6afe963 
:
 Set items term store on write both for all of Wikidata (T225055 
) (duration: 00m 55s)
  
  10 September 2019, 15:24:
  XabatuBot edits the item, adding an Asturian label. (Strictly speaking, this 
isn’t public information, but I see no harm in revealing it here.) Since we are 
at this time writing to the new term store for all of Wikidata, the terms of 
the item are duly written to the new term store.
  
  10 September 2019, 15:58:
  
  > Mentioned in SAL (#wikimedia-operations) [2019-09-10T15:58:37Z] 
ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Revert 
"Set items term store on write both for all of Wikidata" (duration: 01m 02s)
  
  11 September 2019, 11:15:
  
  In T225055#5482842 , 
@Stashbot wrote:
  
  > Mentioned in SAL (#wikimedia-operations) [2019-09-11T11:15:28Z] 
 Synchronized wmf-config/InitialiseSettings.php: SWAT: 
[[gerrit:535815|Set item terms on write both up to Q10mio (T225055 
)]] (duration: 01m 03s)
  
  12 September 2019, 09:51:
  MisterSynergy deletes the item as not notable. At this time, we are only 
writing to the new term store for items with an ID up to ten million, so the 
deletion is not sent to the new term store and the old terms remain.
  
  12 September2019, 12:03:
  
  In T225055#5487492 , 
@Stashbot wrote:
  
  > Mentioned in SAL (#wikimedia-operations) [2019-09-12T12:03:25Z] 
 Synchronized wmf-config/InitialiseSettings.php: SWAT: 
[[gerrit:536167|Set item terms on write both up to Q20mio (T225055 
)]] (duration: 01m 31s)
  
  23 September 2019, 11:23:
  
  In T225055#5515715 , 
@Stashbot wrote:
  
  > Mentioned in SAL (#wikimedia-operations) [2019-09-23T11:23:32Z] 
 Synchronized wmf-config/VariantSettings.php: SWAT: 
[[gerrit:538577|Set item terms on write both up to Q40Mio (T225055 
)]] (duration: 00m 55s)
  
  7 October 2019, 13:27:
  
  In T225055#5552062 , 
@Stashbot wrote:
  
  > Mentioned in SAL (#wikimedia-operations) [2019-10-07T13:27:21Z] 
 Synchronized wmf-config/InitialiseSettings.php: 
[[gerrit:541249|Set all of wikidata to write both for item term store (T225055 
)]] (duration: 00m 54s)
  
  If the item was deleted now, its terms would be removed from the new term 
store, since we are now writing to the new term store for all item IDs.
  
  In other words, potentially affected is any item that was edited on 10 
September between 11:06 and 15:58 (UTC) and then deleted before 7 October 
13:27, if I’m not mistaken. (Items with an ID below 40 million are actually 
only affected if they were deleted before 23 September 11:23, and similar 
caveats apply to thresholds of 20 and 10 million, if we want to be exact.)

TASK DETAIL
  https://phabricator.wikimedia.org/T263730

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: Aklapper, Lucas_Werkmeister_WMDE, Akuckartz, darthmon_wmde, Nandana, 
lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T263730: Wikidata term store contains rows for deleted items

2020-09-24 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE created this task.
Lucas_Werkmeister_WMDE added projects: Wikidata, 
MediaWiki-extensions-WikibaseRepository.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  As a Wikidata editor, I want to create and edit items without getting 
conflict reports for pages that don’t exist.
  
  **Problem:**
  The Wikidata term store (aka new term store, normalized term store) contains 
some rows for deleted items.
  
  **Example:**
  An editor tried to set the Spanish label of an item to «Nicki Nicole» and the 
description to «cantante argentina», but got an error that “Item Nicki Nicole 
(Q67179790) already has label "Nicki Nicole" associated with language code es, 
using the same description text” (project chat permalink 
);
 however, the page Q67179790  was 
actually deleted on 12 September 2019.
  
  Querying the term store on Toolforge, we can see that the rows still exist 
which allowed Wikidata to detect the “conflict”, and also to render 
`{{Q|Q67179790}}` using the label of the deleted item:
  
MariaDB [wikidatawiki_p]> SELECT wbit_item_id, wby_name, wbxl_language, 
wbx_text FROM wbt_item_terms JOIN wbt_term_in_lang ON wbit_term_in_lang_id = 
wbtl_id JOIN wbt_type ON wbtl_type_id = wby_id JOIN wbt_text_in_lang ON 
wbtl_text_in_lang_id = wbxl_id JOIN wbt_text ON wbxl_text_id = wbx_id WHERE 
wbit_item_id = 67179790;

+--+-+---+---+
| wbit_item_id | wby_name| wbxl_language | wbx_text 
 |

+--+-+---+---+
| 67179790 | label   | en| Nicki Nicole 
 |
| 67179790 | label   | es| Nicki Nicole 
 |
| 67179790 | label   | ast   | Nicki Nicole 
 |
| 67179790 | description | en| Argentine singer 
 |
| 67179790 | description | es| cantante argentina   
 |
| 67179790 | description | en-gb | Argentine singer 
 |
| 67179790 | description | sq| këngëtare argjentinase   
 |
| 67179790 | description | fr| chanteuse argentine  
 |
| 67179790 | description | gl| cantante arxentina   
 |
| 67179790 | description | bn| আর্জেন্টিনীয় গায়িকা
 |
| 67179790 | description | ar| مغنية أرجنتينية  
 |
| 67179790 | description | he| זמרת ארגנטינאית  
 |
| 67179790 | description | ro| cântăreață argentiniană  
 |
| 67179790 | description | ca| cantant argentina
 |
| 67179790 | description | en-ca | Argentine singer 
 |

+--+-+---+---+
15 rows in set (0.04 sec)
  
  **Screenshots/mockups:**
  
  **BDD**
  GIVEN 
  AND 
  WHEN
  AND
  THEN
  AND
  
  **Acceptance criteria:**
  
  - The term store does not contain any data for deleted items.
  
  **Open questions:**
  
  - How many deleted items does this affect?

TASK DETAIL
  https://phabricator.wikimedia.org/T263730

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: Aklapper, Lucas_Werkmeister_WMDE, Akuckartz, darthmon_wmde, Nandana, 
lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs