[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-10-20 Thread Addshore
Addshore added a comment. This is now fully split down into subtasks. TASK DETAIL: https://phabricator.wikimedia.org/T145412 EMAIL PREFERENCES: https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Addshore Cc: jcrespo, Meno25, gerritbot, Darkdadaah, WMDE-leszek, Lydia_Pintscher, gabriel-w

[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-10-13 Thread jcrespo
jcrespo added a comment. Title is already a first-class entity, that's what the page table is. No, right now, page is an entity that has a series of properties: a title, a text, etc. By setting title as a strong entity, it has meaning on its own: a page has 1:1 titles, a title has 1:0 pages. Ther

[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-10-12 Thread daniel
daniel added a comment. Hi @jcrespo! In T145412#2704922, @jcrespo wrote: Performance and scalability. We need a way to efficiently track and query page names across all Wiktionaries. Why not solve the problem forever by making title a first-class entity on core, solving title and *link, page_ass

[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-10-12 Thread gerritbot
gerritbot added a comment. Change 312257 merged by jenkins-bot: Use new db schema https://gerrit.wikimedia.org/r/312257

[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-10-11 Thread jcrespo
jcrespo added a comment. Performance and scalability. We need a way to efficiently track and query page names across all Wiktionaries. Why not solve the problem forever by making title a first-class entity on core, solving title and *link, page_assessment, etc. space issues at the same time? I am n

[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-09-28 Thread Lydia_Pintscher
Lydia_Pintscher added a comment. In T145412#2673669, @daniel wrote: A baseline implementation doesn't need to support normalization. But I do not think we can deploy without it, simply because Cognate would then not really work for French Wiktionary, which is one of the most active Wiktionaries, a

[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-09-28 Thread Lydia_Pintscher
Lydia_Pintscher added a comment. In T145412#2673520, @Addshore wrote: Hmm, okay! I'm starting to think that an initial implementation should perhaps not normalize at all and then take on cases on a case by case basis? @Lydia_Pintscher @daniel @WMDE-leszek thoughts? Yeah that sounds sensible by

[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-09-28 Thread daniel
daniel added a comment. In T145412#2673520, @Addshore wrote: I'm starting to think that an initial implementation should perhaps not normalize at all and then take on cases on a case by case basis? @Lydia_Pintscher @daniel @WMDE-leszek thoughts? Normalization should be limited to what is absolu

[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-09-28 Thread Addshore
Addshore added a comment. In T145412#2670625, @daniel wrote: I just realized: When an entry is added to or removed from the central table, all pages in the table with the same key (including the added or removed entry) need to trigger a purge for the respective wiki page. There should probably be

[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-09-28 Thread Addshore
Addshore added a comment. In T145412#2670778, @Darkdadaah wrote: Maybe I read the code wrong, but we don't want to normalize the case of the words, e.g. "Clause" and "clause": the interwiki from [[de:Clause]] to [[ar:clause]] is wrong. Hmm, okay! I'm starting to think that an initial implementat

[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-09-27 Thread Darkdadaah
Darkdadaah added a comment. Maybe I read the code wrong, but we don't want to normalize the case of the words, e.g. "Clause" and "clause": the interwiki from [[de:Clause]] to [[ar:clause]] is wrong.
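The concern above can be illustrated with a minimal Python sketch. Both function names are hypothetical and neither is taken from the Cognate code; the use of Unicode NFC is likewise only an assumption chosen to give the key function something to do. The point is that a case-folding key merges "Clause" and "clause" into one entry, while a case-preserving key keeps them apart.

```python
import unicodedata


def case_folding_key(title: str) -> str:
    """Hypothetical key that ignores case (the behaviour objected to above)."""
    return unicodedata.normalize("NFC", title).casefold()


def case_preserving_key(title: str) -> str:
    """Hypothetical key that normalizes Unicode form only and keeps case."""
    return unicodedata.normalize("NFC", title)


if __name__ == "__main__":
    for make_key in (case_folding_key, case_preserving_key):
        same = make_key("Clause") == make_key("clause")
        print(f"{make_key.__name__}: 'Clause' and 'clause' share a key: {same}")
```

With the case-folding key, [[de:Clause]] and [[ar:clause]] would be linked to each other, which is the wrong interwiki link described in the comment.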

[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-09-27 Thread daniel
daniel added a comment. I just realized: When an entry is added to or removed from the central table, all pages in the table with the same key (including the added or removed entry) need to trigger a purge for the respective wiki page. There should probably be a ticket for this :)
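A rough Python sketch of the purge rule described above, using an in-memory dictionary as a stand-in for the central table. The class and method names are hypothetical and do not come from the extension; the real implementation would work against the database and the wikis' purge mechanisms.

```python
from collections import defaultdict


class CognateTable:
    """In-memory stand-in for the central table: key -> {(wiki, title), ...}."""

    def __init__(self):
        self._by_key = defaultdict(set)

    def add(self, wiki, title, key):
        """Add an entry; return every page that now needs a purge."""
        self._by_key[key].add((wiki, title))
        # All pages sharing the key are affected, including the new entry.
        return set(self._by_key[key])

    def remove(self, wiki, title, key):
        """Remove an entry; return every page that now needs a purge."""
        # Snapshot first so the removed entry is part of the purge set too.
        affected = set(self._by_key.get(key, ())) | {(wiki, title)}
        self._by_key[key].discard((wiki, title))
        if not self._by_key[key]:
            del self._by_key[key]
        return affected

    def pages_for_key(self, key):
        """Return the pages currently stored under a key."""
        return set(self._by_key.get(key, ()))
```

Adding or removing one entry therefore returns the full set of pages sharing its key, so each of them can be purged on its own wiki.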

[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-09-27 Thread gerritbot
gerritbot added a comment. Change 313003 had a related patch set uploaded (by Addshore): Add normalization for titles => keys https://gerrit.wikimedia.org/r/313003

[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-09-22 Thread daniel
daniel added a comment. For reference: Langlink-entries not matching the page title, from the main namespace of de, en, and fr Wiktionary. F4513480: wiktionary-langlink-mismatch.zip

[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-09-22 Thread Lydia_Pintscher
Lydia_Pintscher added a comment. To clarify: the result is that we need to be able to handle some other namespaces in the future but for now only do the main namespace. Correct?

[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-09-22 Thread gerritbot
gerritbot added a comment. Change 312257 had a related patch set uploaded (by Addshore): Use new db schema https://gerrit.wikimedia.org/r/312257

[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-09-20 Thread daniel
daniel added a comment. In the light of the above comments regarding prefixes and namespaces, a few thoughts about the database schema for connecting the pages. It seems we need the following fields: cgnt_wiki: the wiki ID cgnt_title: the page title (including namespace) cgnt_key: a normalized ve
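As a hedged illustration only, the three fields visible in this excerpt (cgnt_wiki, cgnt_title, cgnt_key) could be laid out as below, using Python's sqlite3 module. The table name `cognate`, the index name, the column types, the primary key, and the helper functions are all assumptions made for the sketch; the excerpt is cut off, so the actual proposal may include further fields.

```python
import sqlite3


def open_cognate_db():
    """Create an in-memory database with a hypothetical 'cognate' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE cognate (
            cgnt_wiki  TEXT NOT NULL,  -- the wiki ID
            cgnt_title TEXT NOT NULL,  -- the page title (including namespace)
            cgnt_key   TEXT NOT NULL,  -- normalized form used for matching
            PRIMARY KEY (cgnt_wiki, cgnt_title)
        )
        """
    )
    # Lookups go by key, so index it (index name is an assumption).
    conn.execute("CREATE INDEX cgnt_key_idx ON cognate (cgnt_key)")
    return conn


def add_page(conn, wiki, title, key):
    """Insert or update one page entry."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO cognate VALUES (?, ?, ?)",
            (wiki, title, key),
        )


def find_cognates(conn, wiki, title):
    """Return (wiki, title) pairs that share a key with the given page."""
    rows = conn.execute(
        """
        SELECT o.cgnt_wiki, o.cgnt_title
        FROM cognate AS s
        JOIN cognate AS o ON o.cgnt_key = s.cgnt_key
        WHERE s.cgnt_wiki = ? AND s.cgnt_title = ?
          AND NOT (o.cgnt_wiki = s.cgnt_wiki AND o.cgnt_title = s.cgnt_title)
        ORDER BY o.cgnt_wiki, o.cgnt_title
        """,
        (wiki, title),
    )
    return rows.fetchall()
```

Keeping the original title alongside the key means links can point at the real page name on each wiki while matching is done on the key alone.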

[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-09-20 Thread daniel
daniel added a comment. @Addshore linking the rhymes-namespaces would be desirable, but not trivial. I would suggest leaving it for later. Stripping prefixes during normalization isn't too hard, but Cognate would need to know that "German/" is the equivalent prefix to "Deutsch:" - and so on for a

[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-09-20 Thread Lydia_Pintscher
Lydia_Pintscher added a comment. In T145412#2652087, @Addshore wrote: While looking at implementing the moves in https://gerrit.wikimedia.org/r/#/c/311696/ I realised that the current db table only covers a single namespace, however the config allows multiple namespaces to be defined as cognate na

[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-09-19 Thread Addshore
Addshore added a comment. In T145412#2629460, @daniel wrote: For the record, some concerns that came up: Performance and scalability. We need a way to efficiently track and query page names across all Wiktionaries. Right now it looks like this is a central DB table that contains a mapping of

[Wikidata-bugs] [Maniphest] [Commented On] T145412: Review & work on Cognate extension

2016-09-12 Thread daniel
daniel added a comment. For the record, some concerns that came up: Performance and scalability. We need a way to efficiently track and query page names across all Wiktionaries. Normalization of page names before comparison. Sorting of language links. We may want a separate extension for that. TA