[Wikidata-bugs] [Maniphest] [Commented On] T148988: Initial Cognate DB review

2016-10-26 Thread jcrespo
jcrespo added a comment.
So, with the feedback @Addshore gave me on IRC, I would suggest:

Converting, in the first case:

CREATE TABLE IF NOT EXISTS /*_*/cognate_titles (
  cgti_site VARBINARY(32) NOT NULL,
  cgti_namespace INT NOT NULL,
  cgti_title VARBINARY(255),
  cgti_key VARBINARY(255) NOT NULL,
  PRIMARY KEY (cgti_site, cgti_namespace, cgti_title)
  )/*$wgDBTableOptions*/;

CREATE INDEX /*i*/cgti_keys ON /*_*/cognate_titles (cgti_site, cgti_namespace, cgti_key);

into something more or less like:

CREATE TABLE IF NOT EXISTS /*_*/cognate_titles (
  ct_site VARBINARY(32) NOT NULL,
  ct_namespace INT NOT NULL,
  ct_title VARBINARY(255),
  ct_key INT UNSIGNED NOT NULL,
  PRIMARY KEY (ct_site, ct_key)
  )/*$wgDBTableOptions*/;

CREATE INDEX /*i*/ct_site_namespace_title ON /*_*/cognate_titles (ct_site, ct_namespace, ct_title);
// add the following if you will want to query all sites with a key (indexes depend on the SELECTs):
// CREATE INDEX /*i*/ct_key ON /*_*/cognate_titles (ct_key);

CREATE TABLE IF NOT EXISTS /*_*/cognate_keys (
  ck_id INT UNSIGNED PRIMARY KEY,
  ck_name VARBINARY(255),
  PRIMARY KEY (cgti_site, cgti_namespace, cgti_title)
  )/*$wgDBTableOptions*/;

CREATE INDEX /*i*/ck_name ON /*_*/cognate_keys (ck_name); // names can be unique or not, up to you

We didn't go deep into (site) and (namespace, title), I would definitely normalize site/interwiki on separate table on the second option you mention, it is sufficiently small in the first case.

The (namespace,title) denormalization will depend on if this allows to get rid of a having extra connections to the regular shards; I am ok if it allows that.TASK DETAILhttps://phabricator.wikimedia.org/T148988EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: jcrespoCc: hoo, Aklapper, jcrespo, Addshore, Marostegui, Minhnv-2809, D3r1ck01, Izno, Luke081515, Wikidata-bugs, aude, Darkdadaah, Mbch331, Jay8g, Krenair___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T148988: Initial Cognate DB review

2016-10-25 Thread Addshore
Addshore added a comment.

In T148988#2742025, @jcrespo wrote:
The main blocker is: is everything that this table has 100% public, or will it contain some private information. If it is fully public, it will be replicate do labs, assuming it will be useful there (I would assume yes). If there is private information, or things that could derive private information, it needs to be filtered before going to labs.


Everything in this table will be public information. Labs replication would likely be useful.TASK DETAILhttps://phabricator.wikimedia.org/T148988EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: AddshoreCc: hoo, Aklapper, jcrespo, Addshore, Marostegui, Minhnv-2809, D3r1ck01, Izno, Luke081515, Wikidata-bugs, aude, Darkdadaah, Mbch331, Jay8g, Krenair___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T148988: Initial Cognate DB review

2016-10-25 Thread jcrespo
jcrespo added a comment.
@Addshore I have some questions, they are not long, but they depend on each other, so I would love to chat with you when you find the time, as that will simplify the interaction. I am at Europe Timezone, so if you can find some time to meet at IRC, it would be great.

The main blocker is: is everything that this table has 100% public, or will it contain some private information. If it is fully public, it will be replicate do labs, assuming it will be useful there (I would assume yes). If there is private information, or things that could derive private information, it needs to be filtered before going to labs.

For the colums and indexes advice, please meet me on IRC.TASK DETAILhttps://phabricator.wikimedia.org/T148988EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: jcrespoCc: hoo, Aklapper, jcrespo, Addshore, Marostegui, Minhnv-2809, D3r1ck01, Izno, Luke081515, Wikidata-bugs, aude, Darkdadaah, Mbch331, Jay8g, Krenair___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs