[Wikidata-bugs] [Maniphest] [Commented On] T105430: Ensure that language tags generated in RDF output are standard language names

2015-07-21 Thread Smalyshev
Smalyshev added a comment.

 The sitelinks are distinguishable by the URL (https://simple.wikipedia.org/)


The URL is in a different triple than the data, and matching on URLs means that 
the client would have to maintain its own database of which wiki URL matches 
which language and do pattern matching on the URL data. This does not sound to 
me like a good solution, either performance-wise or design-wise. It also 
couples two things (language and URL) which should not be coupled, as they 
describe two different things. Since we have a triple using schema:inLanguage 
and data using language tags, we should ensure those have the right values, 
instead of relying on other information to fix wrong values there.
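
To illustrate the concern, here is a minimal Python sketch of what a client 
would have to do if it had to recover the language from the sitelink URL. The 
table and the function name are hypothetical and not part of any existing 
tool; the host prefixes listed are assumptions for illustration only.

```python
from urllib.parse import urlparse

# Hypothetical client-side table. Every consumer of the export would have to
# keep its own copy of this in sync with the list of wikis.
HOST_PREFIX_TO_LANGUAGE = {
    "simple": "en-x-simple",
    "en": "en",
}


def language_from_sitelink_url(url):
    """Guess a language tag from the first label of the sitelink host."""
    host = urlparse(url).hostname or ""
    prefix = host.split(".")[0]
    # Returns None for any wiki the client's table does not know about.
    return HOST_PREFIX_TO_LANGUAGE.get(prefix)
```

If the export already carries the right value in schema:inLanguage, none of 
this lookup is needed on the client side.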

 en is also a standard language code.


True, but it does not adequately describe the data, which refers to Simple 
English, not just English. Just as nl would not adequately describe data 
that refers to nl-informal, etc. That would be a loss of information, and we 
should avoid that when exporting data.

 When you change the language codes at the right position the RDF export gets 
 automatically the correct language codes.


So far I have seen no code that makes such a change, and I do not feel 
comfortable starting a project to refactor language handling across the whole 
of MediaWiki (which would be required if we just change 
`Site::getLanguageCode()`) when I just need the right language tag in RDF. If 
that refactoring ever happens and solves this particular problem, I would be 
glad to refactor this particular fix. But remaining without a fix until some 
undefined moment when that happens does not sound like a good way to go to me.

 I think it is bad programming style to make several workarounds instead of 
 fixing the core problem.


If somebody volunteers to fix the core problem, I think that would be 
excellent. If not, as has evidently been the case since 2012, I don't see 
what use it is to discuss a hypothetical fix that might have been made instead 
of fixing the actual thing that needs to be fixed. Following this strategy 
would only lead us to discussing bigger and bigger global solutions in theory 
without actually getting anything done in practice.


TASK DETAIL
  https://phabricator.wikimedia.org/T105430

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Smalyshev
Cc: Fomafix, gerritbot, Smalyshev, Aklapper, daniel, mkroetzsch, jkroll, 
Wikidata-bugs, Jdouglas, aude, Manybubbles, JanZerebecki, Malyacko, P.Copp



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T105430: Ensure that language tags generated in RDF output are standard language names

2015-07-20 Thread Smalyshev
Smalyshev added a comment.

@Fomafix we're not talking about user interface languages here. We're talking 
about the language specification in the RDF export, which should follow BCP 47 
and commonly accepted language codes; otherwise third-party tools would not be 
able to understand which language these strings are in. Of course, with 
something like Simple English there might not be a standard code (correct me 
if I'm wrong), but at least it should be one that is standard-compliant and 
not the same as en; otherwise a sitelink to Simple English and a sitelink to 
English would not be distinguishable.

As far as I can see, Simple English is a separate wiki from English - I see 
"Search the 113,945 articles in the Simple English Wikipedia" on the homepage, 
so it's not the same articles. Thus, I think we need a separate code for it.

 Changing this codes to de-x-formal and nl-x-informal may be possible when 
 this is necessary to be conform to BCP 47.


That's what I am doing in the patch. Along with several others that also need 
to be changed for standard compliance.
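
As a rough illustration of the idea (not the code in the patch itself), such a 
canonicalization can be sketched in Python as a lookup from internal codes to 
BCP 47 private-use tags. The target tags en-x-simple, de-x-formal and 
nl-x-informal come from this discussion; the internal codes on the left-hand 
side and the function name are assumptions made for the example.

```python
# Hypothetical mapping from internal codes to standard-compliant tags.
# Only codes that need changing are listed; everything else passes through.
CANONICAL_LANGUAGE_TAGS = {
    "simple": "en-x-simple",
    "de-formal": "de-x-formal",
    "nl-informal": "nl-x-informal",
}


def canonicalize_language_code(code):
    """Return the tag to emit in RDF for a given internal language code."""
    return CANONICAL_LANGUAGE_TAGS.get(code, code)
```

Keeping the special cases in one table means they can be dropped in one place 
if the underlying language codes are ever fixed at the source.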




[Wikidata-bugs] [Maniphest] [Commented On] T105430: Ensure that language tags generated in RDF output are standard language names

2015-07-20 Thread Smalyshev
Smalyshev added a comment.

 This language code should be used for the sitelinks. In HTML and in the RDF 
 export


That would lead to a situation where links to the Simple English wiki and to 
the English wiki are indistinguishable, which is not good.

 If you want to change this to en-x-simple then create a separate task.


This is that task.

 This must be changed everywhere where a HTML attribute lang is generated.


That has no relation to the RDF export and is thus outside the scope of this 
task.

 When https://phabricator.wikimedia.org/T43723 is fixed most of your patch for 
 the RDF export is superfluous.


When it is fixed, we can revisit this code, and if the fix allows us to remove 
the special cases then they will be removed. However, since that ticket has 
been open since 2012, I'd rather fix the RDF export now (which otherwise will 
be confusing for third-party users - the main audience of the export) than 
wait for https://phabricator.wikimedia.org/T43723.




[Wikidata-bugs] [Maniphest] [Commented On] T105430: Ensure that language tags generated in RDF output are standard language names

2015-07-17 Thread gerritbot
gerritbot added a subscriber: gerritbot.
gerritbot added a comment.

Change 225518 had a related patch set uploaded (by Smalyshev):
https://phabricator.wikimedia.org/T105430: canonicalize language codes

https://gerrit.wikimedia.org/r/225518



To: Smalyshev, gerritbot
Cc: gerritbot, Smalyshev, Aklapper, daniel, mkroetzsch, jkroll, Wikidata-bugs, 
Jdouglas, aude, Manybubbles, JanZerebecki, Malyacko, P.Copp


