[Wikidata-bugs] [Maniphest] [Updated] T151129: Add monolingual language code moe

2016-11-19 Thread gerritbot
gerritbot added a project: Patch-For-Review.


[Wikidata-bugs] [Maniphest] [Created] T151135: Wikibase composer modules break LoggerFactory configurations when merged into mediawiki/core composer.json

2016-11-19 Thread Florian
Florian created this task. Florian added projects: MediaWiki-General-or-Unknown, Wikibase-DataModel, Wikibase-DataModel-JavaScript, Wikibase-DataModel-Serialization. Herald added a subscriber: Aklapper. Herald added a project: Wikidata.
TASK DESCRIPTION: I discovered this while working on switching my wiki from the default file-based logging to Monolog loggers (my wiki uses Wikibase). So I thought I would "simply" configure $wgMWLoggerDefaultSpi in my LocalSettings.php and have the logs redirected to Redis (which is then read by Logstash and visualized in Kibana, at least that's the plan). This resulted in the following configuration (based on the on-wiki example):

$wgMWLoggerDefaultSpi = array(
    'class' => '\\MediaWiki\\Logger\\MonologSpi',
    'args' => array( array(
        'loggers' => array(
            '@default' => array(
                'processors' => array( 'wiki', 'psr', 'pid', 'uid', 'web' ),
                'handlers'   => array( 'default', 'redis' ),
            ),
            'wfDebug' => array(
                'handlers'   => array( 'default' ),
                'processors' => array( 'psr' ),
            ),
            'profileoutput' => array(
                'handlers'   => array( 'profiler' ),
                'processors' => array( 'psr' ),
            ),
        ),

        'processors' => array(
            'wiki' => array(
                'class' => '\\MediaWiki\\Logger\\Monolog\\WikiProcessor',
            ),
            'psr' => array(
                'class' => '\\Monolog\\Processor\\PsrLogMessageProcessor',
            ),
            'pid' => array(
                'class' => '\\Monolog\\Processor\\ProcessIdProcessor',
            ),
            'uid' => array(
                'class' => '\\Monolog\\Processor\\UidProcessor',
            ),
            'web' => array(
                'class' => '\\Monolog\\Processor\\WebProcessor',
            ),
        ),

        'handlers' => array(
            'default' => array(
                'class' => '\\MediaWiki\\Logger\\Monolog\\LegacyHandler',
                'args' => array( '/data/www/mediawiki-log/monolog-' . date( 'Ymd' ) . '.log' ),
                'formatter' => 'line',
            ),
            'redis' => array(
                'class' => '\\Monolog\\Handler\\RedisHandler',
                'args' => array(
                    function () {
                        $redis = new Redis();
                        $redis->connect( '127.0.0.1', 6379 );
                        return $redis;
                    },
                    'logstash'
                ),
                'formatter' => 'logstash',
            ),
            'profiler' => array(
                'class' => '\\MediaWiki\\Logger\\Monolog\\LegacyHandler',
                'args' => array( '/data/www/mediawiki-log/profiler-' . date( 'Ymd' ) . '.log' ),
                'formatter' => 'profiler',
            ),
        ),

        'formatters' => array(
            'line' => array(
                'class' => '\\Monolog\\Formatter\\LineFormatter',
            ),
            'logstash' => array(
                'class' => '\\Monolog\\Formatter\\LogstashFormatter',
                'args'  => array( 'mediawiki', php_uname( 'n' ), null, '', 1 ),
            ),
            'profiler' => array(
                'class' => '\\Monolog\\Formatter\\LineFormatter',
                'args' => array( "%datetime% %message%\n\n", null, true, true ),
            ),
        ),
    ) ),
);

After setting the $wgMWLoggerDefaultSpi configuration variable, I noticed that no data was being sent to Logstash, nor even to Redis (which I checked next).
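For completeness, this is roughly how the Redis side can be checked (a sketch, not part of the original report; it assumes the phpredis extension, and 'logstash' is the key passed to the RedisHandler above):

// Monolog's RedisHandler appends each record to a Redis list,
// so a length of 0 means no log records arrived at all.
$redis = new Redis();
$redis->connect( '127.0.0.1', 6379 );
var_dump( $redis->lLen( 'logstash' ) );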

After some time, I checked whether my configuration was applied at all by putting the following code directly after the assignment of the $wgMWLoggerDefaultSpi variable:

// Fully qualified, since LocalSettings.php is not namespaced:
$logger = \MediaWiki\Logger\LoggerFactory::getInstance( 'test' );
var_dump( get_class( $logger ) );
exit;

The output showed that a LegacyLogger is returned by the LoggerFactory, which shouldn't happen if the configuration above were applied. So, after checking (twice) that I had used the correct global variable, I checked when getInstance() of LoggerFactory is called for the first time by adding a debug_print_backtrace() call to the method (see the sketch after the backtrace below), and I got the following output:

#0 MediaWiki\Logger\LoggerFactory::getProvider() called at [/data/mediawiki/main/includes/debug/logger/LoggerFactory.php:97]
#1 MediaWiki\Logger\LoggerFactory::getInstance(objectcache) called at [/data/mediawiki/main/includes/objectcache/ObjectCache.php:175]
#2 ObjectCache::newFromParams(Array) called at [/data/mediawiki/main/includes/ServiceWiring.php:336]
#3 Closure$"(Object of class MediaWiki\MediaWikiServices could not be converted to string)
#4 call_user_func_array(Object of class Closure$"2111707029$8f1050010a79dba3428a504eb49b3b82$ could not be converted to string, Array) called at [/data/mediawiki/main/includes/services/ServiceContainer.php:361]
#5 MediaWiki\Services\ServiceContainer->createService(LocalServerObjectCache) called at
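For reference, the instrumentation mentioned above amounts to roughly the following (a sketch only; the method body shown is illustrative and may not match core exactly):

// Temporary debugging aid in includes/debug/logger/LoggerFactory.php:
public static function getInstance( $channel ) {
    // Print the call chain so the first caller becomes visible.
    debug_print_backtrace();
    return self::getProvider()->getLogger( $channel );
}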

[Wikidata-bugs] [Maniphest] [Created] T151129: Add monolingual language code moe

2016-11-19 Thread Amqui
Amqui created this task. Amqui added a project: Wikidata. Herald added a subscriber: Aklapper.
TASK DESCRIPTION: Please add the language code moe to the list of language codes supported for monolingual text values.

Language code: moe
Language name in the language itself: innu-aimun
Language name in English: Innu-aimun
Example of use: http://www.innu-aimun.ca/innu-aimun/vocab/vocab-animals/
Where the language is used: by the Innu people in North America
Wikidata ID: Q13351


[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2016-11-19 Thread daniel
daniel added a comment.
@EBernhardson wow, index expansion galore...


[Wikidata-bugs] [Maniphest] [Commented On] T107595: [RFC] Multi-Content Revisions

2016-11-19 Thread daniel
daniel added a comment.

In T107595#2791142, @TomT0m wrote:
Ok, I got confused. Does that mean that the documentation will no longer have its own wiki page address?


Yes, the documentation would be part of the template page proper, and would not have a separate title.

Would it then be possible to have a special type of "reference" slot which would hold a pointer to another page revision? I guess the parser could be modified to maintain those reference slots when pages are saved.

That would theoretically be possible, but there are currently no plans to do this. I'm also not sure this would be the best way to tie a page revision to template revisions. So far, slots are intended to be editable, not derived. I have been thinking about derived slots, but the use cases for that idea all seem a bit contrived, and would perhaps be better served by a more specialized solution, like a dedicated database table.

For example, the parser computes a new version of the page when its content is modified, and when it expands a template, a hook triggers the slot manager to store the revision number of the template in those "reference" slots - I guess this kind of hook, or something similar, already exists, since we get a list of the used templates when previewing a page.

This could be done with a DB table that associates a revision ID of the "transcluder" with a revision ID of the "transcluded" in each row. Simple enough to do, and it would be stable against the template being moved or renamed, etc. It's going to be a big table, though, and quite a change in how things work. As Tgr pointed out, there is the Memento extension that does this with some limitations. It's a feature that has been discussed time and time again, but it never gained enough traction to be properly implemented.
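To illustrate the idea, a minimal sketch (the table and field names are hypothetical, purely for illustration):

// One row per (transcluding revision, transcluded template revision).
// 'revision_reference' and the rr_* field names are made up here.
$dbw = wfGetDB( DB_MASTER );
$dbw->insert( 'revision_reference', array(
    'rr_rev_id'        => $transcludingRevId, // revision being saved
    'rr_target_rev_id' => $templateRevId,     // template revision it transcluded
), __METHOD__ );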


[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2016-11-19 Thread daniel
daniel added a comment.
Ah, a note about priorities: use case one (completion match) is by far the most pressing need for us. It puts massive load on the DB server, and it's triggered several times whenever a user uses a search field.


[Wikidata-bugs] [Maniphest] [Changed Subscribers] T150891: Find a good way to represent multi-lingual text fields in Elastic

2016-11-19 Thread daniel
daniel added subscribers: Jan_Dittrich, Lydia_Pintscher. daniel added a comment.

In T150891#2802755, @dcausse wrote:

In T150891#2802255, @daniel wrote:
@dcausse I added use cases to the ticket description



Autocomplete: Looking at the current behavior, it seems that you display exact matches first and then prefix matches.



We actually do up to four queries at the moment, until we have found enough matches to fill the desired limit:


full length case insensitive match, user language only
full length case insensitive match, fallback languages
prefix match, user language only
prefix match, fallback languages


We currently rank by a crude heuristic score: max( |sitelinks|, |labels| ).
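In code, the cascade looks roughly like this (a sketch; searchLabels() and its parameters are hypothetical stand-ins, not actual Wikibase code):

// Run up to four passes, stopping once the desired limit is filled.
$passes = array(
    array( 'prefix' => false, 'fallbacks' => false ), // full match, user language
    array( 'prefix' => false, 'fallbacks' => true ),  // full match, fallback languages
    array( 'prefix' => true,  'fallbacks' => false ), // prefix match, user language
    array( 'prefix' => true,  'fallbacks' => true ),  // prefix match, fallback languages
);
$results = array();
foreach ( $passes as $pass ) {
    if ( count( $results ) >= $limit ) {
        break;
    }
    $results = array_merge(
        $results,
        searchLabels( $term, $userLanguage, $pass, $limit - count( $results ) )
    );
}
// The crude heuristic score mentioned above: max( |sitelinks|, |labels| ).
usort( $results, function ( $a, $b ) {
    $scoreA = max( count( $a['sitelinks'] ), count( $a['labels'] ) );
    $scoreB = max( count( $b['sitelinks'] ), count( $b['labels'] ) );
    return $scoreB - $scoreA; // higher score first
} );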

In addition to a prefix field you need an untokenized field in order to promote exact matches first.

Doesn't prefix also require untokenized?

Since prefix and fullmatch fields do not require fancy language features (no tokenization required), do you think it's still important to break by language?

Yes, we want to ignore, or at least strongly demote, languages that the user is not known to speak.

Breaking by language would only be needed for ranking: when two entities are ambiguous, always prefer the match that comes from a language field close to the user language.

Indeed
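That tie-breaking rule could look roughly like this (a sketch; $fallbackChain and the match structure are assumptions for illustration):

// Prefer matches whose language sits earlier in the user's fallback chain.
$rank = function ( $match ) use ( $fallbackChain ) {
    $i = array_search( $match['language'], $fallbackChain );
    return $i === false ? PHP_INT_MAX : $i; // unknown languages rank last
};
usort( $matches, function ( $a, $b ) use ( $rank ) {
    return $rank( $a ) - $rank( $b ); // smaller index = closer to the user's language
} );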

It can become rather complex since we have two competing matches: assuming I'm French, would I prefer an exact match in English or a prefix match in French?

See the algorithm described above.

Do we have enough ambiguities to really care about that? Would a simple solution where we merge all languages into the same field be sufficient?

I do not think it would be sufficient. I think the results would often get swamped with entries that are irrelevant to the user and, worse, impossible to read and interpret, especially for short prefixes like "li".

However, I have no research to support this, and I don't know how we would conduct such research. It boils down to a product-level UX choice, so this is something to ask @Lydia_Pintscher and @Jan_Dittrich about.


[Wikidata-bugs] [Maniphest] [Commented On] T150941: Create a factory of DataValueDeserializer instances configured for the given repository

2016-11-19 Thread gerritbot
gerritbot added a comment.
Change 322080 merged by jenkins-bot:
Add RepositorySpecificDataValueDeserializerFactory

https://gerrit.wikimedia.org/r/322080
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs