jcrespo added a comment.
the NULL values in the title field where the title is the same as the normalized title (to cut back on duplicated data)
This is the kind of thing that makes a SELECT not simple: the code has to do an IF depending on whether the value is NULL or not. That is not normalized. Even if it takes less space, it is not good practice.
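To illustrate the extra branching, here is a minimal sketch using Python's built-in sqlite3 module. The table name `titles_with_nulls` and its columns are hypothetical, made up only to mimic the proposed "NULL means same as normalized" layout; they are not the extension's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical layout: title is NULL when it equals normalized_title.
CREATE TABLE titles_with_nulls (
    normalized_title TEXT NOT NULL,
    title            TEXT
);
INSERT INTO titles_with_nulls VALUES ('Foo', NULL);
INSERT INTO titles_with_nulls VALUES ('Foo_bar', 'Foo bar');
""")

# Every reader has to remember the NULL convention and undo it,
# here with COALESCE (the SQL equivalent of the IF in the code).
rows = conn.execute("""
    SELECT normalized_title,
           COALESCE(title, normalized_title) AS title
    FROM titles_with_nulls
    ORDER BY normalized_title
""").fetchall()
```

Any query that forgets the COALESCE silently gets NULL back instead of a title, which is the kind of special case being objected to.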
Duplicate data here means having:
title1, tag_1
title1, tag_2
title2, tag_2
title2, tag_3
instead of the simpler:
1, title1
2, title2

1, tag_1
2, tag_2
3, tag_3

1,1
1,2
2,2
2,3

And despite having more tables, it avoids duplication. Renaming a title or a tag means changing a single row. And it saves a lot of space by storing references instead of full contents repeated many times.
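The three-table layout can be sketched as follows, again with Python's sqlite3 module. The table and column names (`titles`, `tags`, `title_tags`) are assumptions chosen for the illustration; only the sample rows come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE titles (id INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE);
CREATE TABLE tags   (id INTEGER PRIMARY KEY, tag   TEXT NOT NULL UNIQUE);
-- Mapping table: stores only references, never the full strings.
CREATE TABLE title_tags (
    title_id INTEGER NOT NULL REFERENCES titles(id),
    tag_id   INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (title_id, tag_id)
);
INSERT INTO titles VALUES (1, 'title1'), (2, 'title2');
INSERT INTO tags   VALUES (1, 'tag_1'), (2, 'tag_2'), (3, 'tag_3');
INSERT INTO title_tags VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")

PAIRS_QUERY = """
    SELECT t.title, g.tag
    FROM title_tags AS tt
    JOIN titles AS t ON t.id = tt.title_id
    JOIN tags   AS g ON g.id = tt.tag_id
    ORDER BY t.title, g.tag
"""

# The join reproduces the flat (title, tag) pairs.
before = conn.execute(PAIRS_QUERY).fetchall()

# Renaming a tag touches one row, however many titles use it.
cursor = conn.execute("UPDATE tags SET tag = 'tag_two' WHERE id = 2")
renamed_rows = cursor.rowcount
after = conn.execute(PAIRS_QUERY).fetchall()
```

In the flat layout the same rename would have to rewrite every row that mentions the tag.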
99.9% of the time the normalization step outputs the exact same title value
If that is the case, store only the ones that have been normalized, do not add NULL values.
And rename the table to something like cognate_normalized_titles. When using the extension, check for the title in this table; if it doesn't exist, it means it has not been normalized (yet?). I would be OK with that, but that is a very different usage from what was originally proposed.
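A rough sketch of that lookup, using Python's sqlite3 module. The table name is the one suggested above; the column names, the sample row, and the helper `lookup_normalized` are hypothetical and not part of the extension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Only titles that differ from their normalized form get a row.
CREATE TABLE cognate_normalized_titles (
    title            TEXT PRIMARY KEY,
    normalized_title TEXT NOT NULL
);
INSERT INTO cognate_normalized_titles VALUES ('Foo bar', 'Foo_bar');
""")


def lookup_normalized(conn, title):
    """Return the stored normalized title, or the title itself if no row exists."""
    row = conn.execute(
        "SELECT normalized_title FROM cognate_normalized_titles WHERE title = ?",
        (title,),
    ).fetchone()
    # A missing row means the title has not been normalized (yet).
    return row[0] if row is not None else title
```

With this layout there are no NULLs to special-case: the absence of a row carries the meaning, and the table stays small if most titles normalize to themselves.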
Cc: daniel, Tobi_WMDE_SW, hoo, Aklapper, jcrespo, Addshore, Marostegui, Minhnv-2809, D3r1ck01, Izno, Luke081515, Wikidata-bugs, aude, Darkdadaah, Mbch331, Jay8g, Krenair
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs