AndrewTavis_WMDE moved this task from In progress to Needs product input on the 
Wikidata Analytics (Kanban) board.
AndrewTavis_WMDE added a comment.


  The Mattermost thread 
<https://mattermost.wikimedia.de/swe/pl/gsr9b485x7geby79t4sg151j7c> where this 
is being discussed has a lot of comments on the data restrictions we're dealing 
with here, since there is no text table for Wikidata in the Data Lake. So far we 
have used a workaround that relies on `revision_text_bytes` to determine the 
minimum size an item can have (i.e., its size when empty). This has given OK-ish 
results, but it definitely has drawbacks and it's not exact.
  
  What I can say so far is:
  
  - Lots of items are created empty (`3,540,260` in one subset)
  - They're not normally deleted (only `0.95%` were in the same subset)
  - They usually get edited afterwards (I've yet to see an item that was created 
empty and is still empty, but please note that this is an eye test on ~30 items)
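  The three observations above could be recomputed over a larger sample with 
something like the sketch below. The `ItemHistory` record and the 
`BASE_EMPTY_BYTES` constant are hypothetical placeholders (not measured values 
or existing tables), and "empty" is still only inferred from revision sizes, so 
the same caveats as the current workaround apply.

```python
from dataclasses import dataclass

# HYPOTHETICAL placeholder: empty-item size minus the QID length.
# Not a measured value; it would need calibrating on real revisions.
BASE_EMPTY_BYTES = 90


def empty_size(qid: str) -> int:
    return BASE_EMPTY_BYTES + len(qid)


@dataclass
class ItemHistory:
    """Hypothetical per-item record built from the revision history."""
    qid: str
    revision_sizes: list[int]  # revision_text_bytes, oldest first
    deleted: bool


def summarize(items: list[ItemHistory]) -> dict:
    # Items whose first revision has exactly the assumed empty size.
    created_empty = [
        it for it in items
        if it.revision_sizes and it.revision_sizes[0] == empty_size(it.qid)
    ]
    n = len(created_empty)
    deleted = sum(it.deleted for it in created_empty)
    # Not deleted and the latest revision is still the empty size.
    still_empty = sum(
        1 for it in created_empty
        if not it.deleted and it.revision_sizes[-1] == empty_size(it.qid)
    )
    # "Edited" here simply means more than one revision exists.
    edited = sum(1 for it in created_empty if len(it.revision_sizes) > 1)
    return {
        "created_empty": n,
        "deleted_share": deleted / n if n else 0.0,
        "still_empty": still_empty,
        "edited_share": edited / n if n else 0.0,
    }
```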
  
  Moving this to `Needs product input` for now. Something basic I can do that 
won't take too much time is to use a range instead of the current `CASE WHEN` 
for determining whether an item is empty from the length of its QID and its 
`revision_text_bytes` size. The range would then not match only items that were 
empty on creation, but I could compute the ratio of matches that were, and we 
could agree on what an acceptable ratio would be (say `> 90%`). The time 
estimate for this is half a day.
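  A minimal sketch of the range idea, assuming "ratio" means the share of items 
flagged by the range that really were empty on creation, measured against a 
manually checked sample. `BASE_EMPTY_BYTES`, the tolerance values and the 
sample format are hypothetical; only the `> 90%` threshold comes from the 
comment above.

```python
from typing import Iterable, Optional

# HYPOTHETICAL placeholder: empty-item size minus the QID length.
BASE_EMPTY_BYTES = 90


def in_empty_range(qid: str, revision_text_bytes: int, tolerance: int) -> bool:
    """Range check replacing an exact CASE WHEN match on the expected size."""
    expected = BASE_EMPTY_BYTES + len(qid)
    return expected - tolerance <= revision_text_bytes <= expected + tolerance


def precision(sample: Iterable[tuple[str, int, bool]], tolerance: int) -> float:
    """Share of range matches that were truly empty on creation.

    `sample` holds (qid, revision_text_bytes, truly_empty) tuples, where
    `truly_empty` comes from a hypothetical manually checked sample."""
    flagged = [truly for qid, size, truly in sample
               if in_empty_range(qid, size, tolerance)]
    return sum(flagged) / len(flagged) if flagged else 0.0


def widest_acceptable_tolerance(sample, candidates, threshold=0.9) -> Optional[int]:
    """Largest candidate tolerance whose precision stays above `threshold`."""
    sample = list(sample)  # allow iterating the sample more than once
    ok = [t for t in candidates if precision(sample, t) > threshold]
    return max(ok) if ok else None
```

  Picking the widest tolerance that still clears the threshold catches as many 
empty-on-creation items as possible while keeping the agreed accuracy.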

TASK DETAIL
  https://phabricator.wikimedia.org/T360761

WORKBOARD
  https://phabricator.wikimedia.org/project/board/6546/


To: AndrewTavis_WMDE
Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414, 
Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, 
Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
_______________________________________________
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org