chelsyx added a comment.
Categorization
Excluding hidden categories and 'needing_category' categories, as of December 12, 2017, 1,629,592 files (3.73%) don't belong to any category and 22,492,880 files (51.55%) belong to only one category.
F11832678: nfile_by_categories.png
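Both percentages come from counting, for each file, how many categories remain once hidden and 'needing category' maintenance categories are dropped. Below is a minimal Python sketch of that count. It assumes the file-to-category mapping has already been pulled (for example from the categorylinks data) into a dict. The function name, the `excluded` set and the toy data are hypothetical; this is not the query behind the figures above.

```python
from collections import Counter

def category_count_distribution(file_categories, excluded):
    """Return {n_categories: (n_files, share_of_all_files)}.

    file_categories: dict mapping file title -> iterable of category names.
    excluded: set of category names to ignore (hypothetical stand-in for
    hidden and 'needing category' maintenance categories).
    """
    # Number of remaining (non-excluded) categories for every file.
    counts = Counter(
        len(set(cats) - excluded) for cats in file_categories.values()
    )
    total = len(file_categories)
    return {n: (k, k / total) for n, k in sorted(counts.items())}

# Hypothetical toy data: four files, two maintenance categories.
files = {
    "File:A.jpg": ["Media needing categories"],
    "File:B.jpg": ["Bridges in Paris"],
    "File:C.jpg": ["Bridges in Paris", "Seine", "Hidden tracking cat"],
    "File:D.jpg": ["Seine", "Media needing categories"],
}
excluded = {"Media needing categories", "Hidden tracking cat"}
dist = category_count_distribution(files, excluded)
print(dist[0])  # (1, 0.25): one file with no real category
print(dist[1])  # (2, 0.5): two files with exactly one
```

Removing the excluded categories before counting matters. Without that step, File:A.jpg above would count as having one category even though it is effectively uncategorized.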
Breakdown by m
chelsyx added a comment.
Status of tasks of this ticket:
Search hits based on which element the search is hitting: file name vs. description vs. category
This is not currently feasible. A possible solution is T177353#3716344, and we will need help from the search backend team.
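Until the search backend can report this, one rough offline approximation is to take each result and check which of the file's fields contain every query term. The Python sketch below works under that assumption. The tokenizer, the field names and the example record are hypothetical. It also ignores stemming, synonyms and the boosting the real search engine applies, so it can only approximate which field produced a hit.

```python
import re

# Hypothetical field names for a file's searchable text.
FIELDS = ("file_name", "description", "category")

def tokenize(text):
    # Lowercase alphanumeric tokens; underscores and punctuation split words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matching_fields(query, doc):
    """Return the fields of `doc` whose tokens include every query term."""
    terms = tokenize(query)
    hits = []
    for field in FIELDS:
        values = doc.get(field, [])
        if isinstance(values, str):
            values = [values]
        field_tokens = set()
        for value in values:
            field_tokens |= tokenize(value)
        if terms and terms <= field_tokens:
            hits.append(field)
    return hits

# Hypothetical file record.
doc = {
    "file_name": "Pont_Neuf_at_night_2017.jpg",
    "description": "The Pont Neuf bridge in Paris, photographed at night",
    "category": ["Pont Neuf", "Bridges in Paris at night"],
}
print(matching_fields("paris", doc))   # ['description', 'category']
print(matching_fields("bridge", doc))  # ['description'] (no stemming, so "Bridges" misses)
```

The "bridge" case shows the main limitation. The real search engine would likely also match the category through stemming, so exact token overlap undercounts some fields.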
"Unfindable" images m
chelsyx added a comment.
As of November 7, 4,268,386 files (10%) are in a "needing categories" category. The following table breaks down the counts by media type:
img_media_type  need_cat  n_files    proportion
bitmap          no        36176941   84.47%
bitmap          yes       4207232    9.82%
drawing         no        1167389    2.73%
drawing         yes       1774
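Each proportion in the table appears to be the row's file count divided by the total across all rows (every media type, with and without a "needing categories" category). For example, 4207232 / 0.0982 is roughly 42.8M files, which matches 4,268,386 being about 10%. Below is a minimal Python sketch of that aggregation. The row format mirrors the table's columns, and the toy counts are made up.

```python
from collections import defaultdict

def breakdown(rows):
    """rows: iterable of (media_type, need_cat, n_files) tuples.

    Returns ({(media_type, need_cat): (n_files, share_of_total)},
             {need_cat: n_files}) so the overall 'needing categories'
    share can be read off as well.
    """
    rows = list(rows)
    total = sum(n for _, _, n in rows)
    by_row = {(m, c): (n, n / total) for m, c, n in rows}
    by_flag = defaultdict(int)
    for _, c, n in rows:
        by_flag[c] += n
    return by_row, dict(by_flag)

# Made-up counts, same shape as the table.
toy = [("bitmap", "no", 700), ("bitmap", "yes", 200),
       ("drawing", "no", 80), ("drawing", "yes", 20)]
by_row, by_flag = breakdown(toy)
print(by_row[("bitmap", "yes")])                # (200, 0.2)
print(by_flag["yes"] / sum(by_flag.values()))   # 0.22
```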
chelsyx added a comment.
In T177353#3714007, @debt wrote:
Oh, that looks like that will be quite interesting, @chelsyx, although it looks like it might be a bit of manual work involved.
Getting data from the move log is easy, but it will take some time to train and adjust the model. @debt @Ramse
chelsyx added a comment.
In T177353#3716995, @debt wrote:
Great idea, @EBernhardson, let's do it! @chelsyx can you get that sampling from the data we already have?
@debt Yes, I can get those queries from the TestSearchSatisfaction2 table. We will need help from @EBernhardson to run them against test
debt added a comment.
Great idea, @EBernhardson, let's do it! @chelsyx can you get that sampling from the data we already have?
TASK DETAIL https://phabricator.wikimedia.org/T177353
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: chelsyx, debt
Cc: EBernhardson, Ak
EBernhardson added a comment.
While we don't log it, we could certainly take a sample of, say, 20k queries, run them against our test cluster, and poke at the results to see which parts triggered the hit.
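One way to draw a uniform sample of about 20k queries from a large event log is reservoir sampling. It works in a single pass and does not need the log's size in advance. Below is a minimal Python sketch of Algorithm R. The input is assumed to be any iterable of query strings, such as rows read from the TestSearchSatisfaction2 table. Fetching those rows and running the sample against the test cluster are not shown.

```python
import random

def reservoir_sample(items, k, seed=None):
    """Uniformly sample k items from an iterable of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep item i with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample

# Hypothetical stand-in for a stream of logged queries.
queries = (f"query {n}" for n in range(100_000))
picked = reservoir_sample(queries, 20_000, seed=42)
print(len(picked))  # 20000
```

Fixing the seed makes the sample reproducible, which helps if the same queries need to be rerun later against a changed index.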
Ramsey-WMF added a comment.
In T177353#3711572, @chelsyx wrote:
There are 142,994 files with annotations (ImageNote); follow this link for the most current count.
The revision history of the annotations is there, along with the rest of the page's revision history, for example: https://commons.wikimedia.org/w/in
debt added a comment.
Oh, that looks like that will be quite interesting, @chelsyx, although it looks like it might be a bit of manual work involved.
chelsyx added a comment.
For unhelpful file names, I want to extract from the move log the old and new file names of renames whose stated reason is a meaningless or ambiguous name, and then train a model to classify these file names. As far as I know, short text classification like this is a bit tricky. @mpopov do
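A common baseline for very short strings like file names uses character n-grams instead of words, because names such as IMG_4711.JPG contain few meaningful words. Below is a minimal Python sketch of a multinomial naive Bayes classifier over character trigrams with add-one smoothing, using only the standard library. The labels and the tiny training set are hypothetical stand-ins for names extracted from the move log, not real data. It is one possible baseline, not the model planned here.

```python
import math
import re
from collections import Counter, defaultdict

def char_ngrams(name, n=3):
    # Drop the extension, map digits to 0, and pad so prefixes/suffixes count.
    stem = re.sub(r"\.[A-Za-z0-9]+$", "", name).lower()
    stem = re.sub(r"\d", "0", stem)
    padded = f"^{stem}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

class NaiveBayesNames:
    """Multinomial naive Bayes over character n-grams (sketch)."""

    def fit(self, names, labels):
        self.class_counts = Counter(labels)
        self.ngram_counts = defaultdict(Counter)
        for name, label in zip(names, labels):
            self.ngram_counts[label].update(char_ngrams(name))
        self.vocab = {g for c in self.ngram_counts.values() for g in c}
        self.totals = {lab: sum(c.values()) for lab, c in self.ngram_counts.items()}
        return self

    def predict(self, name):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior
            counts = self.ngram_counts[label]
            for g in char_ngrams(name):
                # Laplace (add-one) smoothing over the shared vocabulary.
                score += math.log((counts[g] + 1) / (self.totals[label] + v))
            if score > best_score:
                best, best_score = label, score
        return best

# Hypothetical labelled examples.
names = ["IMG_1234.JPG", "DSC00042.jpg", "IMG_20170101_123456.jpg", "DSCN0815.JPG",
         "Eiffel_Tower_at_night.jpg", "Pont_Neuf_Paris_2017.jpg",
         "Cathedral_of_Cologne_west_facade.jpg", "Red_fox_in_snow.jpg"]
labels = ["unhelpful"] * 4 + ["descriptive"] * 4
model = NaiveBayesNames().fit(names, labels)
print(model.predict("IMG_9876.JPG"))          # unhelpful
print(model.predict("Red_fox_at_night.jpg"))  # descriptive
```

With a training set this small, unseen n-grams dominate the scores, and names unlike the examples can be misclassified. The sketch only shows the shape of the approach. A usable model would need many labelled renames and a held-out evaluation.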
chelsyx added a comment.
There are 142,994 files with annotations (ImageNote); follow this link for the most current count.
The revision history of the annotations is there, along with the rest of the page's revision history, for example: https://commons.wikimedia.org/w/index.php?title=File:Henley_2009_women.jpg
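Because annotations are stored in the file page's wikitext, the revisions in which notes were added or removed can be found by counting notes in each revision. Below is a minimal Python sketch. It assumes each note opens with a {{ImageNote|...}} template and closes with {{ImageNoteEnd|...}}, and that the revisions (oldest first) have already been fetched. The matching is case-insensitive for simplicity. The helper names and example wikitext are hypothetical.

```python
import re

# Assumption: each note opens with {{ImageNote|...}}.
# The regex requires "|" right after the name, so {{ImageNoteEnd|...}} is not counted.
NOTE_OPEN = re.compile(r"\{\{\s*ImageNote\s*\|", re.IGNORECASE)

def count_notes(wikitext):
    return len(NOTE_OPEN.findall(wikitext))

def note_changes(revisions):
    """revisions: list of (rev_id, wikitext), oldest first.

    Returns (rev_id, delta) for each revision that changed the note count.
    """
    changes = []
    prev = 0
    for rev_id, text in revisions:
        n = count_notes(text)
        if n != prev:
            changes.append((rev_id, n - prev))
        prev = n
    return changes

# Hypothetical revision texts.
note1 = "{{ImageNote|id=1|x=10|y=20|w=30|h=40|dimx=800|dimy=600|style=2}}\nCox\n{{ImageNoteEnd|id=1}}\n"
note2 = "{{ImageNote|id=2|x=50|y=20|w=30|h=40|dimx=800|dimy=600|style=2}}\nStroke\n{{ImageNoteEnd|id=2}}\n"
base = "== Summary ==\nA rowing crew.\n"
revs = [(101, base), (102, note1 + base), (103, note1 + note2 + base), (104, base)]
print(note_changes(revs))  # [(102, 1), (103, 1), (104, -2)]
```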