AndrewTavis_WMDE moved this task from In progress to Needs product input on the Wikidata Analytics (Kanban) board.
AndrewTavis_WMDE added a comment.
The Mattermost thread <https://mattermost.wikimedia.de/swe/pl/gsr9b485x7geby79t4sg151j7c> discussing this has many comments on the data restrictions we're dealing with here, which stem from there being no text table for Wikidata in the Data Lake. A workaround that uses `revision_text_bytes` to determine the minimum size an item could be (i.e. empty) has been used so far with passable results, but it has definite drawbacks and is not exact.

What I can say here is:

- Lots of items are being created empty (`3,540,260` in one subset).
- They are not normally deleted (only `0.95%` of the same subset were).
- They are usually edited afterwards (I've yet to see an item that was created empty and is still empty, but please note that this is an eye test on ~30 items).

Moving this to `Needs product input` for now.

One basic thing that won't take too much time: I can use a range instead of the case when for determining whether an item is empty, based on the length of its QID and the `revision_text_bytes` size. We would then not be identifying empty-on-creation items correctly 100% of the time, but I could also find the ratio and we could agree on what an acceptable accuracy would be (say `> 90%`). Time estimate on this is half a day.

TASK DETAIL
https://phabricator.wikimedia.org/T360761

WORKBOARD
https://phabricator.wikimedia.org/project/board/6546/

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AndrewTavis_WMDE
Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
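
The range-based check proposed in the comment above could look roughly like the following. This is only a minimal sketch, not the query actually in use: the function names, the overhead constant and the tolerance are all hypothetical placeholders, and it assumes the QID appears exactly once in the serialized text of an empty item. The real byte values would have to be measured against known empty items.

```python
# Hypothetical placeholders: neither value comes from the task and both
# would need to be measured against items known to be empty.
ASSUMED_OVERHEAD_BYTES = 100  # bytes of an empty item, excluding the QID
ASSUMED_TOLERANCE_BYTES = 5   # half-width of the accepted size range


def is_probably_empty(
    qid: str,
    revision_text_bytes: int,
    overhead: int = ASSUMED_OVERHEAD_BYTES,
    tolerance: int = ASSUMED_TOLERANCE_BYTES,
) -> bool:
    """Flag a creation revision as empty if its size is within a range.

    The expected size grows with the QID length, on the assumption that
    the QID is stored once in the item's serialized text.
    """
    expected = overhead + len(qid)
    return abs(revision_text_bytes - expected) <= tolerance


def precision(flagged_empty: list[bool], actually_empty: list[bool]) -> float:
    """Share of flagged items that are truly empty, given manual labels."""
    hits = sum(1 for f, a in zip(flagged_empty, actually_empty) if f and a)
    flagged = sum(flagged_empty)
    return hits / flagged if flagged else 0.0
```

With a hand-checked sample of items, `precision` gives the ratio that could be compared against whatever threshold is agreed on (for example `> 90%`).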