jcrespo added a comment.

the NULL values in the title field where the title is the same as the normalized title (to cut back on duplicated data)

This is the kind of thing that makes a SELECT not simple: the code has to branch (an IF) depending on whether the value is NULL or not. That is not normalized. Even if it takes less space, it is not good practice.

Duplicate data here means having:

title1, tag_1
title1, tag_2
title2, tag_2
title2, tag_3

instead of the simpler:

1, title1
2, title2

1, tag_1
2, tag_2
3, tag_3

1,1
1,2
2,2
2,3

Despite having more tables, this avoids duplication. Renaming a title or a tag means changing a single row. It also saves a lot of space by storing references instead of the full contents repeated many times.
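
To make the three-table layout above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (titles, tags, title_tags) are hypothetical and chosen only for illustration; they are not the extension's actual schema.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with the three tables from the example.

    Table and column names are hypothetical, for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE titles (id INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE);
        CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT NOT NULL UNIQUE);
        CREATE TABLE title_tags (
            title_id INTEGER NOT NULL REFERENCES titles(id),
            tag_id INTEGER NOT NULL REFERENCES tags(id),
            PRIMARY KEY (title_id, tag_id)
        );
        INSERT INTO titles VALUES (1, 'title1'), (2, 'title2');
        INSERT INTO tags VALUES (1, 'tag_1'), (2, 'tag_2'), (3, 'tag_3');
        INSERT INTO title_tags VALUES (1, 1), (1, 2), (2, 2), (2, 3);
    """)
    return conn

def pairs(conn):
    """Rebuild the flat (title, tag) listing by joining through the link table."""
    return conn.execute("""
        SELECT t.title, g.tag
        FROM title_tags tt
        JOIN titles t ON t.id = tt.title_id
        JOIN tags g ON g.id = tt.tag_id
        ORDER BY t.title, g.tag
    """).fetchall()

conn = build_demo_db()
# Renaming a tag touches exactly one row, however many titles use it.
cur = conn.execute("UPDATE tags SET tag = 'tag_two' WHERE id = 2")
print(cur.rowcount)
print(pairs(conn))
```

The flat listing is still available through the join, but the rename is a single-row UPDATE on the tags table instead of one write per (title, tag) pair.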

99.9% of the time the normalization step outputs the exact same title value

If that is the case, store only the ones that have been normalized, do not add NULL values.
And rename the table to something like cognate_normalized_titles.

When using the extension, check for the title in this table; if it does not exist there, it means it has not been normalized (yet?). I would be OK with that, but that is a very different usage from the one originally proposed.
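
A rough sketch of that lookup, again with Python's sqlite3. It assumes a cognate_normalized_titles table with hypothetical columns title and normalized_title, and it assumes that a missing row means the title can be used as-is. The sample titles are invented for the example.

```python
import sqlite3

def make_db():
    """In-memory table holding only titles with a stored normalized form.

    The column names (title, normalized_title) are assumptions, not the
    extension's real schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cognate_normalized_titles ("
        " title TEXT PRIMARY KEY,"
        " normalized_title TEXT NOT NULL)"
    )
    # Invented sample row: a title whose normalized form differs.
    conn.execute(
        "INSERT INTO cognate_normalized_titles VALUES (?, ?)",
        ("Foo_Bar", "Foo Bar"),
    )
    return conn

def normalized_title(conn, title):
    """Return the stored normalized title, or the title itself if no row exists."""
    row = conn.execute(
        "SELECT normalized_title FROM cognate_normalized_titles WHERE title = ?",
        (title,),
    ).fetchone()
    # No row: treated here as "not normalized", so the title is returned unchanged.
    return row[0] if row is not None else title

conn = make_db()
print(normalized_title(conn, "Foo_Bar"))
print(normalized_title(conn, "Plain"))
```

With this layout there are no NULL values to test for in SQL; the only branch is whether a row was found.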


TASK DETAIL
https://phabricator.wikimedia.org/T148988

To: jcrespo
Cc: daniel, Tobi_WMDE_SW, hoo, Aklapper, jcrespo, Addshore, Marostegui, Minhnv-2809, D3r1ck01, Izno, Luke081515, Wikidata-bugs, aude, Darkdadaah, Mbch331, Jay8g, Krenair