daniel added a comment.

@jcrespo Tags? What tags? And why multiple tags for the same title? We are comparing page titles between wikis, with some minimal string normalization applied. That's it.

(Btw, note that we are discussing two completely unrelated kinds of normalization here: string normalization and schema normalization. Let's try not to confuse the two.)

@Addshore I just realized that we don't need the key (i.e. the normalized title) as a string at all. We just need a hash of the normalized title. A 64-bit hash can be represented as an integer. You may still want to have a table for titles, so you don't have to store the same title 20 times if the title exists on 20 wikis. If you do this, I suggest again using a hash, this time of the unnormalized title, as the numeric representation of the title. Since the ID is computed from the title itself, every wiki arrives at the same ID for the same title, which ensures consistency between wikis and reduces the need for lookups.
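
A rough sketch of the two hashes, in Python for illustration only. The normalization rules in `normalize_title`, the choice of SHA-256 truncated to 8 bytes, and the names `hash64`, `title_id` and `match_key` are all assumptions made for this example; nothing here is decided for the actual schema.

```python
import hashlib
import unicodedata


def normalize_title(title: str) -> str:
    """Hypothetical 'minimal string normalization'; the real rules are not settled."""
    # Unicode NFC, underscores treated as spaces, whitespace collapsed, case folded.
    text = unicodedata.normalize("NFC", title).replace("_", " ")
    return " ".join(text.split()).casefold()


def hash64(text: str) -> int:
    """Return an unsigned 64-bit integer derived from the UTF-8 bytes of text."""
    # Assumption: the first 8 bytes of SHA-256. Any stable 64-bit hash would do,
    # as long as every wiki uses the same one.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def title_id(title: str) -> int:
    """Numeric ID of the title as written (unnormalized)."""
    return hash64(title)


def match_key(title: str) -> int:
    """Numeric key used to compare titles between wikis (normalized)."""
    return hash64(normalize_title(title))


if __name__ == "__main__":
    for t in ("Albert_Einstein", "albert einstein"):
        print(t, title_id(t), match_key(t))
```

Both spellings above get the same `match_key` but different `title_id` values, and either value can be computed from the title alone without consulting a table. Keep in mind that a 64-bit hash can collide in principle, and that the value only fits an unsigned 64-bit column as written.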


TASK DETAIL
https://phabricator.wikimedia.org/T148988

To: daniel
Cc: daniel, Tobi_WMDE_SW, hoo, Aklapper, jcrespo, Addshore, Marostegui, Minhnv-2809, D3r1ck01, Izno, Luke081515, Wikidata-bugs, aude, Darkdadaah, Mbch331, Jay8g, Krenair