Re: [Wikitech-l] #switch limits

2012-09-25 Thread S Page
Tim Starling  variously wrote:

> 
> That template alone uses 47MB for 37000 #switch cases

> I tried converting that template with 37000 switch cases to a Lua
> array. Lua used 6.5MB for the chunk and then another 2.4MB to execute
> it, so 8.9MB in total compared to 47MB for wikitext

It's only a 400kb string, and no "key" is a substring of another key.
So just match the regexp /\|=(.*)$/m , and $1 holds the value.
This works great in Perl, PHP, JavaScript...

D'oh, Extension:RegexParserFunctions not enabled on Wikimedia sites.

Fine, use string functions to look for |= , look from there
onwards for the next '|', and take the substring.
D'oh, $wgPFEnableStringFunctions is set false on Wikimedia sites, bug
6455 (a great read).

Fine, use the string lookup function people have coded in wiki template syntax.
e.g. {{Str find0}} – Very fast zero-based substring search with string
support up to *90* characters.
D'oh, several orders of magnitude too small.

OK, Lua and Scribunto.  Reading the fine tutorial
https://www.mediawiki.org/wiki/Lua_scripting/Tutorial

local p = {}
p.bigStr = [[
|01001=22.4
|01002=17.3
... 36,000 lines
]]

p.findStr = '|' .. p.lookFor .. '='
p.begin, p.ending = string.find( p.bigStr, p.findStr )
... something or other...

Amazingly, my browser and the syntaxhighlighter in the Module
namespace can handle this 400kB textarea, well done!

If I just ask for string.len( p.bigStr ), Scribunto loads and
executes this module. I dunno how to determine its memory consumption.
But when I try to do string.find() I get "Script error", probably
because I've never written any Lua before this evening.
Assuming it's possible, what are the obvious flaws in string matching
that I'm overlooking?

Is there an explanation of how to simulate the Scribunto/Lua calling
environment (the frame setup, I guess) in command-line lua?
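
In case it helps the next person down this road, here's my best guess
at what a working module would look like -- an untested sketch, with a
made-up "lookup" entry point. The two bugs I can now spot in my attempt
above: p.lookFor is undefined at the time the chunk is loaded, and the
module never ends with "return p".

local p = {}

local bigStr = [[
|01001=22.4
|01002=17.3
]]
-- ... 36,000 more lines in the real thing ...

-- Meant to be invoked from wikitext as {{#invoke:MyModule|lookup|01002}},
-- so frame.args[1] holds the key to look up.
function p.lookup( frame )
    local needle = '|' .. frame.args[1] .. '='
    -- The trailing `true` makes string.find() do a plain-text search,
    -- so keys containing Lua pattern magic characters can't derail it.
    local _, valueStart = string.find( bigStr, needle, 1, true )
    if not valueStart then
        return ''
    end
    local lineEnd = string.find( bigStr, '\n', valueStart + 1, true )
        or #bigStr + 1
    return string.sub( bigStr, valueStart + 1, lineEnd - 1 )
end

return p

On the command line, faking the frame looks like it could be as simple
as print( p.lookup( { args = { '01002' } } ) ), but I'd still love a
pointer to a proper harness.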

This was fun :-)
--
=S Page  software engineer on E3



Re: [Wikitech-l] GLAMwiki Toolset Project : Request for Comments - Technical Architecture

2012-09-25 Thread dan entous
>> On 09/20/2012 04:34 PM, dan entous wrote:
>> dear all,
>> 
>> as some of you may already know, the GLAMwiki Toolset Project, 
>> http://outreach.wikimedia.org/wiki/GLAM/Toolset_project, is a collaboration 
>> between Wikimedia Nederland, Wikimedia UK, Wikimedia France and Europeana, 
>> with the goal of providing a set of tools to get materials from GLAM 
>> institutions onto Wikimedia Commons in a way that reuse can easily be 
>> tracked, and that Commons materials can easily be integrated back into the 
>> collection of the original GLAM or even other GLAMs.
>> 
>> as part of our initial goal of creating a GLAM Upload System, we are looking 
>> to gather Wikimedia community input on the proposed architecture and 
>> technologies. if you have time and interest, please take a look and let us 
>> know your thoughts, 
>> http://outreach.wikimedia.org/wiki/GLAM/Toolset_project/Request_for_Comments/Technical_Architecture.
>> 
>> 
>> with kind regards,
>> dan
> 
> On Sep 24, 2012, at 9:54 PM, Emmanuel Engelhart wrote:
> Hi Dan,
> 
> I have a few questions about the choice of the Zend Framework:
> * Why exactly using the Zend Framework?
we would like to use an open-source mvc framework that is widely used, has strong
community support, and has several modules that may be used for current and future
development, rather than develop or use a custom or less widely used mvc
framework

> * Do we really need such a dependency?
the code will be dependent on some type of mvc framework - either custom, less
widely used, or widely used

> * Do we have this framework installed on the Wikimedia servers?
not sure, but the gwtoolset will be used on a wikimedia labs instance, not on a 
wikimedia server. the gwtoolset will have a browser ui that allows glams to 
upload their metadata, match it to mediawiki templates and then upload the 
result to commons.wikimedia.org using the mediawiki api

> * If "no", is that not a problem?
> 
> Regards
> Emmanuel





[Wikitech-l] Can we kill DBO_TRX? It seems evil!

2012-09-25 Thread Daniel Kinzler
Hi all!

Since https://gerrit.wikimedia.org/r/#/c/21584/ got merged, people have been
complaining that they get tons of warnings. A great number of them seem to be
caused by the fact that MediaWiki will, if the DBO_TRX flag is set,
automatically start a transaction on the first call to Database::query().

See e.g. https://bugzilla.wikimedia.org/show_bug.cgi?id=40378

The DBO_TRX flag appears to be set by default in sapi (mod_php) mode. According
to the (very limited) documentation, it's intended to wrap the entire web
request in a single database transaction.

However, since we do not have support for nested transactions, this doesn't
work: the "wrapping" transaction gets implicitely comitted when begin() is
called to start a "proper" transaction, which is often the case when saving new
revisions, etc.

So, DBO_TRX seems to be misguided, or at least broken, to me. Can someone please
explain why it was introduced? It seems the current situation is this:

* every view-only request is wrapped in a transaction, for no good reason I can
see.

* any write operation that uses an explicit transaction, like page editing,
watching pages, etc, will break the wrapping transaction (and cause a warning in
the process). As far as I understand, this really defeats the purpose of the
automatic wrapping transaction.

So, how do we solve this? We could:

* suppress warnings if the DBO_TRX flag is set. That would prevent the logs from
being swamped by transaction warnings, but it would not fix the current broken
(?!) behavior.

* get rid of DBO_TRX (or at least not use it by default). This seems to be the
Right Thing to me, but I suppose there is some point to the automatic
transactions that I am missing.

* Implement support for nested transactions, either using a counter (this would
at least make DBO_TRX work as I guess it was intended) or using savepoints (that
would give us support for actual nested transactions). That would be the Real
Solution, IMHO.


So, can someone shed light on what DBO_TRX is intended to do, and how it is
supposed to work?

-- daniel



Re: [Wikitech-l] GLAMwiki Toolset Project : Request for Comments - Technical Architecture

2012-09-25 Thread Emmanuel Engelhart

On 09/25/2012 11:22 AM, dan entous wrote:

>>> On 09/20/2012 04:34 PM, dan entous wrote:
>>> dear all,
>>>
>>> as some of you may already know, the GLAMwiki Toolset Project,
>>> http://outreach.wikimedia.org/wiki/GLAM/Toolset_project, is a collaboration
>>> between Wikimedia Nederland, Wikimedia UK, Wikimedia France and Europeana, with
>>> the goal of providing a set of tools to get materials from GLAM institutions
>>> onto Wikimedia Commons in a way that reuse can easily be tracked, and that
>>> Commons materials can easily be integrated back into the collection of the
>>> original GLAM or even other GLAMs.
>>>
>>> as part of our initial goal of creating a GLAM Upload System, we are looking to
>>> gather Wikimedia community input on the proposed architecture and technologies.
>>> if you have time and interest, please take a look and let us know your
>>> thoughts,
>>> http://outreach.wikimedia.org/wiki/GLAM/Toolset_project/Request_for_Comments/Technical_Architecture.
>>>
>>> with kind regards,
>>> dan
>>
>> On Sep 24, 2012, at 9:54 PM, Emmanuel Engelhart wrote:
>> Hi Dan,
>>
>> I have a few questions about the choice of the Zend Framework:
>> * Why exactly using the Zend Framework?
>
> we would like to use an open-source mvc framework that is widely used, has strong
> community support, and has several modules that may be used for current and future
> development, rather than develop or use a custom or less widely used mvc framework
>
>> * Do we really need such a dependency?
>
> the code will be dependent on some type of mvc framework - either custom, less
> widely used, or widely used
>
>> * Do we have this framework installed on the Wikimedia servers?
>
> not sure, but the gwtoolset will be used on a wikimedia labs instance, not on a
> wikimedia server. the gwtoolset will have a browser ui that allows glams to
> upload their metadata, match it to mediawiki templates and then upload the
> result to commons.wikimedia.org using the mediawiki api


I thought the lab instance was only an incubation environment and that 
the final goal was to put gwtoolset on the WMF prod. servers. Isn't it?


Emmanuel



Re: [Wikitech-l] GLAMwiki Toolset Project : Request for Comments - Technical Architecture

2012-09-25 Thread dan entous
>>>> On 09/20/2012 04:34 PM, dan entous wrote:
>>>> dear all,
>>>>
>>>> as some of you may already know, the GLAMwiki Toolset Project,
>>>> http://outreach.wikimedia.org/wiki/GLAM/Toolset_project, is a
>>>> collaboration between Wikimedia Nederland, Wikimedia UK, Wikimedia France
>>>> and Europeana, with the goal of providing a set of tools to get materials
>>>> from GLAM institutions onto Wikimedia Commons in a way that reuse can
>>>> easily be tracked, and that Commons materials can easily be integrated
>>>> back into the collection of the original GLAM or even other GLAMs.
>>>>
>>>> as part of our initial goal of creating a GLAM Upload System, we are
>>>> looking to gather Wikimedia community input on the proposed architecture
>>>> and technologies. if you have time and interest, please take a look and
>>>> let us know your thoughts,
>>>> http://outreach.wikimedia.org/wiki/GLAM/Toolset_project/Request_for_Comments/Technical_Architecture.
>>>>
>>>> with kind regards,
>>>> dan
>>> 
>>> On Sep 24, 2012, at 9:54 PM, Emmanuel Engelhart wrote:
>>> Hi Dan,
>>> 
>>> I have a few questions about the choice of the Zend Framework:
>>> * Why exactly using the Zend Framework?
>> we would like to use an open-source mvc framework that is widely used, has
>> strong community support, and has several modules that may be used for current
>> and future development, rather than develop or use a custom or less widely
>> used mvc framework
>> 
>>> * Do we really need such a dependency?
>> the code will be dependent on some type of mvc framework - either custom,
>> less widely used, or widely used
>> 
>>> * Do we have this framework installed on the Wikimedia servers?
>> not sure, but the gwtoolset will be used on a wikimedia labs instance, not 
>> on a wikimedia server. the gwtoolset will have a browser ui that allows 
>> glams to upload their metadata, match it to mediawiki templates and then 
>> upload the result to commons.wikimedia.org using the mediawiki api
> 
> I thought the lab instance was only an incubation environment and that the 
> final goal was to put gwtoolset on the WMF prod. servers. Isn't it?
the final goal, as far as i understand it, is to have it run as its own 
application in its own environment, similar to the way some applications are 
running on the current toolserver. i was told that instead of using the 
toolserver we should use a wikimedia labs instance. would love to hear if this 
is not the intended outcome.

> Emmanuel




[Wikitech-l] New extension: Diff

2012-09-25 Thread Jeroen De Dauw
Hey,

I'm happy to announce the first release of a new little extension I wrote
called Diff.

https://www.mediawiki.org/wiki/Extension:Diff

It's a small utility library which might be of use to anyone creating a new
extension :)

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--


Re: [Wikitech-l] Can we kill DBO_TRX? It seems evil!

2012-09-25 Thread Tim Starling
On 25/09/12 19:33, Daniel Kinzler wrote:
> So, can someone shed light on what DBO_TRX is intended to do, and how it is
> supposed to work?

Maybe you should have asked that before you broke it with I8c0426e1.

DBO_TRX provides the following benefits:

* It provides improved consistency of write operations for code which
is not transaction-aware, for example rollback-on-error.

* It provides a snapshot for consistent reads, which improves
application correctness when concurrent writes are occurring.

DBO_TRX was introduced when we switched over to InnoDB, along with the
introduction of Database::begin() and Database::commit().

begin() and commit() were never meant to be "matched", so it's not
surprising that you would get a lot of warnings if you started trying
to enforce that.

Initially, I set up a scheme where transactions were "nested", in the
sense that begin() incremented the transaction level and commit()
decremented it. When it was decremented to zero, an actual COMMIT was
issued. So you would have a call sequence like:

* begin() -- sends BEGIN
  * begin()  -- does nothing
  * commit() -- does nothing
* commit() -- sends COMMIT
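
In Lua-flavoured pseudo-code (just for concreteness -- the real
implementation was PHP in the Database class, and query() below is a
stand-in for sending SQL to the server), the counting behaviour was
essentially:

local trxLevel = 0

local function query( sql )
    print( sql )  -- stand-in: the real code sends the SQL to the server
end

local function begin()
    if trxLevel == 0 then
        query( 'BEGIN' )   -- only the outermost begin() reaches the server
    end
    trxLevel = trxLevel + 1
end

local function commit()
    trxLevel = trxLevel - 1
    if trxLevel == 0 then
        query( 'COMMIT' )  -- only the outermost commit() reaches the server
    end
end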

This scheme soon proved to be inappropriate, since it turned out that
the most important thing for performance and correctness is for an
application to be able to commit the current transaction after some
particular query has completed. Database::immediateCommit() was
introduced to support this use case -- its function was to immediately
reduce the transaction level to zero and commit the underlying
transaction.

When it became obvious that every Database::commit() call should
really be Database::immediateCommit(), I changed the semantics,
effectively renaming Database::immediateCommit() to
Database::commit(). I removed the idea of nested transactions in
favour of a model of cooperative transaction length management:

* Database::begin() became effectively a no-op for web requests and
was sometimes omitted for brevity.
* Database::commit() should be called after completion of a sequence
of write operations where atomicity is desired, or at the earliest
opportunity when contended locks are held.

In cases where transactions end up being too short due to the need for
a called function to commit a transaction when the caller already has
a transaction open, it is the responsibility of the callers to
introduce some suitable abstraction for serializing the transactions.

When transactions are too long, you hit performance problems due to lock
contention. When transactions are too short, you hit consistency
problems when requests fail. The scheme I introduced favours
performance over consistency. It resolves conflicts between callers
and callees by using the shortest transaction time. I think it was an
appropriate choice for Wikipedia, both then and now, and I think it is
probably appropriate for many other medium to high traffic wikis.

Savepoints were not available at the time the scheme was introduced.
But they are a refinement of the abandoned transaction nesting scheme,
not a refinement of the current scheme which is optimised for reducing
lock contention.

In terms of performance, perhaps it would be feasible to use short
transactions with an explicit begin() with savepoints for nesting. But
then you would lose the consistency benefits of DBO_TRX that I
mentioned at the start of this post.

-- Tim Starling





Re: [Wikitech-l] shortened links (was Re: MediaWiki 1.20 release candidate (and triage announcement))

2012-09-25 Thread Jeremy Baron
On Sep 24, 2012 7:18 PM, "Mark A. Hershberger"  wrote:
> On 09/23/2012 06:33 PM, K. Peachey wrote:
> > On Mon, Sep 24, 2012 at 4:03 AM, Mark A. Hershberger wrote:
> >> On 09/23/2012 12:54 PM, Krinkle wrote:
> >>> https://bugzilla.wikimedia.org/[...]
> >>
> >> Link shortened: http://hexm.de/lp
> >
> > There is no need to shorten urls in emails, Please don't.
>
> I've personally seen mail readers and MTAs that mangle long URLs. Since
> this sort of mangling happens, I see a need for a way to make URLs
usable.  I use a shortener for those URLs as a precautionary measure and
> to help me communicate with others.

I personally agree it's annoying and wish you didn't. But maybe there are
mangling examples I've not seen.

IMHO, it should usually be enough to either wrap in angle brackets or put
the URL in a footnote on its own line with just the footnote number.

Anyway, I do certainly think we have bigger fish to fry.

-Jeremy


Re: [Wikitech-l] shortened links (was Re: MediaWiki 1.20 release candidate (and triage announcement))

2012-09-25 Thread Derric Atzrott
>I personally agree it's annoying and wish you didn't. But maybe there are
>mangling examples I've not seen.
>
>IMHO, it should usually be enough to either wrap in angle brackets or put
>the URL in a footnote on its own line with just the footnote number.
>
>Anyway, I do certainly think we have bigger fish to fry.

Agreed on all counts.  As a compromise, might I suggest including both? [0]

[0-short]: http://exam.pl/
[0-long]: http://www.example.com/

Thank you,
Derric Atzrott




Re: [Wikitech-l] MediaWiki 1.20 release candidate (and triage announcement)

2012-09-25 Thread Krinkle
On Sep 23, 2012, at 8:03 PM, Mark A. Hershberger  wrote:

> On 09/23/2012 12:54 PM, Krinkle wrote:
>> Also, this bugzilla query should be empty before release as well (either by
>> fixing bugs, or reviewing/merging pending commits that claim to fix stuff,
>> or deferring the bug to a later release etc.). People assign and block
>> things usually for a reason:
>>
>> https://bugzilla.wikimedia.org/buglist.cgi?query_format=advanced&target_milestone=1.20.0%20release&product=MediaWiki&resolution=---
> 
> [..]
> 
> I can hold a triage for blockers at UTC1100 on Tue, October 2
> (http://hexm.de/lr).  At this point, there are over 100 non-enhancement
> bugs marked for 1.20.  Please help trim the list before then.
> 

Actually, there are ~ 65 (as my link reflected),
the shortened link forgot to exclude resolved bugs.

-- Krinkle



[Wikitech-l] shortened links (was Re: MediaWiki 1.20 release candidate (and triage announcement))

2012-09-25 Thread Mark A. Hershberger

On 09/25/2012 09:40 AM, Krinkle wrote:
> On Sep 23, 2012, at 8:03 PM, Mark A. Hershberger  wrote:
>> On 09/23/2012 12:54 PM, Krinkle wrote:
>>> https://bugzilla.wikimedia.org/buglist.cgi?query_format=advanced&target_milestone=1.20.0%20release&product=MediaWiki&resolution=---
>> I can hold a triage for blockers at UTC1100 on Tue, October 2
>> (http://hexm.de/lr).  At this point, there are over 100 non-enhancement
>> bugs marked for 1.20.  Please help trim the list before then.

> Actually, there are ~ 65 (as my link reflected),
> the shortened link forgot to exclude resolved bugs.

Thanks for pointing this out, this is a perfect example of link mangling
by a mail reader.

When I click on your url, Thunderbird does not include the ending "---"
and I see 155 bugs.  I didn't realize this was happening until now.

Screenshot of the bug in Thunderbird:
  http://mah.everybody.org/image/tbird-url-bug.png

Mark

-- 
http://hexmode.com/

Human evil is not a problem.  It is a mystery.  It cannot be solved.
  -- When Atheism Becomes a Religion, Chris Hedges



[Wikitech-l] Content handler feature merge (Wikidata branch) scheduled early next week

2012-09-25 Thread Rob Lanphier
Hi everyone,

Assuming no one finds any substantive issues, we plan on merging the
ContentHandler feature (Wikidata branch) early next week, in time for
1.20wmf14 (assuming we're still calling this the "1.20" series then).
The tracking bug for that is here:


The tree is here:


...and recent commits are here:


We'd like to get it in right after a deployment branch point so that
it has some time to settle in master before we foist it on everyone.

Please take a look at this branch, and raise any issues with the
branch or this plan on list.

Thanks!
Rob



Re: [Wikitech-l] Can we kill DBO_TRX? It seems evil!

2012-09-25 Thread Platonides
On 25/09/12 13:38, Tim Starling wrote:
> On 25/09/12 19:33, Daniel Kinzler wrote:
>> So, can someone shed light on what DBO_TRX is intended to do, and how it is
>> supposed to work?
> 
> Maybe you should have asked that before you broke it with I8c0426e1.

He did ask about the whole database transactions topic on Aug 23 (the "Nested
database transactions" thread), and explicitly asked for objections
the next day. I8c0426e1 is from Aug 27.

Nobody said that «begin() and commit() were never meant to be "matched"»
on that thread, the previous one (2010) or even mails in the last few
years AFAIK. I guess that's because you were the only one who knew how
they were meant to be used. :)


(...)
> Initially, I set up a scheme where transactions were "nested", in the
> sense that begin() incremented the transaction level and commit()
> decremented it. When it was decremented to zero, an actual COMMIT was
> issued. So you would have a call sequence like:
> 
> * begin() -- sends BEGIN
>   * begin()  -- does nothing
>   * commit() -- does nothing
> * commit() -- sends COMMIT
> 
> This scheme soon proved to be inappropriate, since it turned out that
> the most important thing for performance and correctness is for an
> application to be able to commit the current transaction after some
> particular query has completed. 

Except when you break your caller, whose transaction you wrecked, with
bad consequences such as losing the descriptions of uploaded files (bug
32551).



As for the original issue, I think that it could be solved by leaving
the "transaction counter" at 0 for the implicit DBO_TRX, and making
begin() commit it and create a new one when called explicitly (we would
need another flag... name that transaction level 0.5? :).




[Wikitech-l] Indexing sha1 hashes in mysql

2012-09-25 Thread Asher Feldman
As we've increased our use of sha1 hashes to identify unique content over
the past year, I occasionally see changesets or discussions about indexing
sha1's in mysql.  When indexing a text field, it's generally beneficial to
define the smallest index that still uniquely matches a high percentage of
rows.  Search and insert performance both benefit from the space savings.

As a cryptographic hash function, sha1 has a very high degree of
uniformity.  We can estimate the percent of partial index look-ups that
will match a unique result just by comparing the size of the table to the
space covered by the index.

sha1 hashes are 160 bits, which mediawiki stores in mysql with base36
encoding.  base36(2^160) == "twj4yidkw7a8pn4g709kzmfoaol3x8g".  Looking at
enwiki.revision.rev_sha1, the smallest current value is
02xi72hkkhn1nvfdeffgp7e1w3s and the largest,
twj4yi9tgesxysgyi41bz16jdkwroha.

The number of combinations covered by indexing the top bits represented by
the left-most 4 thru 10 characters:

sha1_index(4) = 1395184 (twj4)
sha1_index(5) = 50226658 (twj4y)
sha1_index(6) = 1808159706 (twj4yi)
sha1_index(7) = 65093749429 (twj4yid)
sha1_index(8) = 2343374979464 (twj4yidk)
sha1_index(9) = 84361499260736 (twj4yidkw)
sha1_index(10) = 3037013973386503 (twj4yidkw7)

percentage of unique matches in a table of 2B sha1's:

sha1_index(7) = 96.92%
sha1_index(8) = 99.91%
sha1_index(9) = 99.997%
sha1_index(10) = 99.%

percentage of unique matches in a table of 10B sha1's:

sha1_index(8) = 99.573%
sha1_index(9) = 99.988%
sha1_index(10) = 99.9996%
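
As a rough sanity check on those numbers (my own back-of-the-envelope,
assuming uniformly distributed prefixes, which a cryptographic hash
should give us):

-- fraction of rows whose k-character base36 prefix is unique among n rows:
-- (1 - 36^-k)^(n-1), which is approximately exp( -n / 36^k )
local function uniqueFraction( k, n )
    return math.exp( -n / 36 ^ k )
end

print( uniqueFraction( 8, 2e9 ) )    -- ~0.9993 for a 2 billion row table
print( uniqueFraction( 10, 1e11 ) )  -- ~0.99997 for 100 billion rows

This lands in the same ballpark as the figures above.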

Given current table sizes and growth rates, an 8 character index on a sha1
column should be sufficient for years for many cases (e.g. media files
outside of commons, revisions on projects outside of the top 10), while a
10 character index still provides >99.99% coverage of 100 billion sha1's.

Caveat: The likely but rare worst case for a partial index is that we may
have tables with hundreds of rows containing the same sha1, perhaps
revisions of a page that had a crazy revert war.  A lookup for that
specific sha1 will have to do secondary lookups for each match, as would
lookups of any other sha1 that happens to collide within the index space.
If the index is large enough to make the latter case quite unlikely, prudent
use of caching can address the first.

Where an index is desired on a mysql column of base36 encoded sha1
hashes, I recommend ADD INDEX (sha1column(10)).  Shorter indexes will be
sufficient in many cases, but this still provides a >2/3 space savings
while covering a huge (2^51.43) space.

-Asher


Re: [Wikitech-l] New extension: Diff

2012-09-25 Thread Tyler Romeo
This looks pretty interesting. Is there a reason we don't just put this in
the core?

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com



On Tue, Sep 25, 2012 at 6:50 AM, Jeroen De Dauw wrote:

> Hey,
>
> I'm happy to announce the first release of a new little extension I wrote
> called Diff.
>
> https://www.mediawiki.org/wiki/Extension:Diff
>
> It's a small utility library which might be of use to anyone creating a new
> extension :)
>
> Cheers
>
> --
> Jeroen De Dauw
> http://www.bn2vs.com
> Don't panic. Don't be evil.
> --


Re: [Wikitech-l] Indexing sha1 hashes in mysql

2012-09-25 Thread Tyler Romeo
I see no problem with this. SHA-1 has such a strong avalanche effect that
even the chance of having two similar hashes is pretty low.

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com



On Tue, Sep 25, 2012 at 1:54 PM, Asher Feldman wrote:

> As we've increased our use of sha1 hashes to identify unique content over
> the past year, I occasionally see changesets or discussions about indexing
> sha1's in mysql.  When indexing a text field, it's generally beneficial to
> define the smallest index that still uniquely matches a high percentage of
> rows.  Search and insert performance both benefit from the space savings.
>
> As a cryptographic hash function, sha1 has a very high degree of
> uniformity.  We can estimate the percent of partial index look-ups that
> will match a unique result just by comparing the size of the table to the
> space covered by the index.
>
> sha1 hashes are 160bits, which mediawiki stores in mysql with base36
> encoding.  base36(2^160) == "twj4yidkw7a8pn4g709kzmfoaol3x8g".  Looking at
> enwiki.revision.rev_sha1, the smallest current value is
> 02xi72hkkhn1nvfdeffgp7e1w3s and the largest,
> twj4yi9tgesxysgyi41bz16jdkwroha.
>
> The number of combinations covered by indexing the top bits represented by
> the left-most 4 thru 10 characters:
>
> sha1_index(4) = 1395184 (twj4)
> sha1_index(5) = 50226658 (twj4y)
> sha1_index(6) = 1808159706 (twj4yi)
> sha1_index(7) = 65093749429 (twj4yid)
> sha1_index(8) = 2343374979464 (twj4yidk)
> sha1_index(9) = 84361499260736 (twj4yidkw)
> sha1_index(10) = 3037013973386503 (twj4yidkw7)
>
> percentage of unique matches in a table of 2B sha1's:
>
> sha1_index(7) = 96.92%
> sha1_index(8) = 99.91%
> sha1_index(9) = 99.997%
> sha1_index(10) = 99.%
>
> percentage of unique matches in a table of 10B sha1's:
>
> sha1_index(8) = 99.573%
> sha1_index(9) = 99.988%
> sha1_index(10) = 99.9996%
>
> Given current table sizes and growth rates, an 8 character index on a sha1
> column should be sufficient for years for many cases (e.g. media files
> outside of commons, revisions on projects outside of the top 10), while a
> 10 character index still provides >99.99% coverage of 100 billion sha1's.
>
> Caveat: The likely but rare worst case for a partial index is that we may
> have tables with hundreds of rows containing the same sha1, perhaps
> revisions of a page that had a crazy revert war.  A lookup for that
> specific sha1 will have to do secondary lookups for each match, as would
> lookups of any other sha1 that happens to collide within the index space.
> If the index is large enough to make the latter case quite unlikely, prudent
> use of caching can address the first.
>
>  Where an index is desired on a mysql column of base36 encoded sha1
> hashes, I recommend ADD INDEX (sha1column(10)).  Shorter indexes will be
> sufficient in many cases, but this still provides a >2/3 space savings
> while covering a huge (2^51.43) space.
>
> -Asher


Re: [Wikitech-l] Proposal to add an API/Developer/Developer Hub link to the footer of Wikimedia wikis

2012-09-25 Thread S Page
On Mon, Sep 24, 2012 at 12:35 PM, Jon Robson  wrote:

> I've knocked up a first version here:
> http://www.mediawiki.org/wiki/Developer_Hub

( I hope you were abusing Capitalized Title just to sit next to the
current hub,  
https://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_%28capitalization%29
)

That mass of donation text doesn't work for me. But a short developer
story and image could go at the side, next to an actual Developer hub
intro,  which would be something like

* Run Wikipedia's MediaWiki software yourself, it's free
* Contribute to the MediaWiki software, it's open
* Explore the data on Wikimedia's sites with our API, it's remixable
* Find out about WMF software projects, they're awesome too

__TOC__

General developer news (not just MediaWiki releases but API, mobile,
other projects)

The rest of the page is or could be like the current page on the
MediaWiki software.

-- 
=S Page  software engineer on E3



Re: [Wikitech-l] Indexing sha1 hashes in mysql

2012-09-25 Thread Artur Fijałkowski
>  Where an index is desired on a mysql column of base36 encoded sha1
> hashes, I recommend ADD INDEX (sha1column(10)).  Shorter indexes will be
> sufficient in many cases, but this still provides a >2/3 space savings
> while covering a huge (2^51.43) space.

Isn't it better to store BIGINT containing part of (binary) sha1 and
use index on numeric column?

AJF/WarX



Re: [Wikitech-l] Indexing sha1 hashes in mysql

2012-09-25 Thread Tyler Romeo
It would be better, but I believe MediaWiki already uses this type of
storage. Changing to binary would require a schema change.

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com



On Tue, Sep 25, 2012 at 2:20 PM, Artur Fijałkowski wrote:

> > Where an index is desired on a mysql column of base36 encoded sha1
> > hashes, I recommend ADD INDEX (sha1column(10)).  Shorter indexes will be
> > sufficient in many cases, but this still provides a >2/3 space savings
> > while covering a huge (2^51.43) space.
>
> Isn't it better to store BIGINT containing part of (binary) sha1 and
> use index on numeric column?
>
> AJF/WarX
>

Re: [Wikitech-l] Proposal to add an API/Developer/Developer Hub link to the footer of Wikimedia wikis

2012-09-25 Thread Quim Gil
Hi,

On Mon, Sep 24, 2012 at 12:35 PM, Jon Robson  wrote:
> The home page for the developer page should act like a personal appeal
> only with __developers__ as the writers.

I'm not sure about the concept of personal appeal for APIs and open
source projects. They should be objectively appealing in the first
place. Once the most attractive facts are effectively explained we can
spice them up with showcases and personal experiences.

In order to attract and keep new developers we want to strive for
maximum simplification in the Developer Hub. What pearls with a hook do
we have for newcomers?

- An API to operate with precious Wikipedia data in your applications.

- Open source projects improving Wikipedia and looking for contributors.

- MediaWiki is the free, extensible and customizable technology that
powers Wikipedia and thousands of other sites.

- Become a wikitech ambassador in order to test the fresh stuff and
get involved.


Each entry would lead to THE reference page, from which the interested
reader could dive in.

Each entry could feature already in the Developer Hub the best showcases:

- amazing apps using the API,
- Wikipedia mobile, visual editor, article feedback.
- amazing sites powered with MediaWiki.
- amazing selection of ambassadors and contributors.

We do many other things, but they should be linkable from these main
pillars. If you have a very important entry to add, which one in the
list would you remove?

PS: still learning about how to contribute effectively to an RFC in
conjunction with mailing list discussion and bug report.  ;)

-- 
Quim



Re: [Wikitech-l] Proposal to add an API/Developer/Developer Hub link to the footer of Wikimedia wikis

2012-09-25 Thread Mark Holmquist

> 1) The mediawiki homepage puts ME off. This is mainly because I'm more
> interested in doing things with the data on wikipedia rather than the
> software that runs Wikipedia. I think this is the problem we are
> trying to solve - there are many different types of developers out
> there and we need something generic to appeal to as many of them as
> possible.


I want to add something a little more general to this point. I think we 
can avoid appealing to _people_ in this page, and instead appeal to 
_actions_. I wrote about it on the RFC [0], but I'll repeat myself 
briefly here.


I don't think that throwing people into buckets (i.e., appealing to 
types of developers) is a useful way to think about this. If you want to 
*make MediaWiki better*, we send you to a more specific page about how 
to develop, translate, code review, and document. If you want to *set up 
or change a MediaWiki site*, we send you to a page with documentation on 
configuration, performance tweaks, and extensions. If you want to *use 
MediaWiki sites*, we send you to a page with editing help, information 
about user permissions, how to use the interface, and so on. We could 
split this up further, or differently. We could include the same link in 
multiple pages (e.g., pages about extensions could go in many categories).


My point is, I think we could accomplish much the same, maybe more, 
without lumping people together, when we actually mean to group actions.


[0] 
https://www.mediawiki.org/wiki/Talk:Requests_for_comment/MediaWiki.org_Main_Page_tweaks#Lessons_from_Manual:Skins


--
Mark Holmquist
Software Engineer, Wikimedia Foundation
mtrac...@member.fsf.org
http://marktraceur.info



Re: [Wikitech-l] Proposal to add an API/Developer/Developer Hub link to the footer of Wikimedia wikis

2012-09-25 Thread Tomasz Finc
On Tue, Sep 25, 2012 at 11:59 AM, Quim Gil  wrote:
> Each entry could feature already in the Developer Hub the best showcases:
>
> - amazing apps using the API,
> - Wikipedia mobile, visual editor, article feedback.
> - amazing sites powered with MediaWiki.
> - amazing selection of ambassadors and contributors.

This really just highlights how far past MediaWiki we've gotten as a
development community. This is especially true with the Wikimedia
mobile apps like: Wiki Loves Monuments, Wikipedia, Wiktionary, etc.
Shifting from a home on MW.org to developers.wikimedia.org would
easily allow us to grow our volunteer community to accurately reflect
the range of projects that we work on.
join us on IRC and are on our mobile specific wiki pages but we'd love
to have a central place for all of this.

Does anyone have an issue broadening the topic base that we have on
our tech hub?

--tomasz



Re: [Wikitech-l] Content handler feature merge (Wikidata branch) scheduled early next week

2012-09-25 Thread IAlex
Hello,

Would it be possible to have the whole changes as a changeset on Gerrit?
This would make review and comments much easier than having to do this on this 
list.

Cheers!

Alexandre Emsenhuber (ialex)

On 25 Sep 2012, at 19:16, Rob Lanphier wrote:

> Hi everyone,
> 
> Assuming no one finds any substantive issues, we plan on merging the
> ContentHandler feature (Wikidata branch) early next week, in time for
> 1.20wmf14 (assuming we're still calling this the "1.20" series then).
> The tracking bug for that is here:
> 
> 
> The tree is here:
> 
> 
> ...and recent commits are here:
> 
> 
> We'd like to get it in right after a deployment branch point so that
> it has some time to settle in master before we foist it on everyone.
> 
> Please take a look at this branch, and raise any issues with the
> branch or this plan on list.
> 
> Thanks!
> Rob
> 


Re: [Wikitech-l] Content handler feature merge (Wikidata branch) scheduled early next week

2012-09-25 Thread Jeremy Baron
On Tue, Sep 25, 2012 at 7:14 PM, IAlex  wrote:
> Would it be possible to have the whole changes as a changeset on Gerrit?
> This would make review and comments much easier than having to do this on 
> this list.

is 
https://gerrit.wikimedia.org/r/#/q/status:merged+project:mediawiki/core+branch:Wikidata,n,z
sufficient? (copied from the first message on this thread)

-Jeremy



Re: [Wikitech-l] Can we kill DBO_TRX? It seems evil!

2012-09-25 Thread Daniel Kinzler
Hi Tim

Thanks for shedding some light on the DBO_TRX stuff. Seems like few knew it
existed, and hardly anyone understood what it means or how it should be used.

I'll give my thoughts inline and propose a solution at the bottom.

On 25.09.2012 13:38, Tim Starling wrote:
> On 25/09/12 19:33, Daniel Kinzler wrote:
>> So, can someone shed light on what DBO_TRX is intended to do, and how it is
>> supposed to work?
> 
> Maybe you should have asked that before you broke it with I8c0426e1.

Well, I did ask about nested transactions on the list. Nobody mentioned the
scheme you describe. Is it documented somewhere?

Anyway, I just added warnings, the behavior didn't change.

> DBO_TRX provides the following benefits:
> 
> * It provides improved consistency of write operations for code which
> is not transaction-aware, for example rollback-on-error.

But it *breaks* write consistency for code that *is* transaction-aware. Calling
begin() will prematurely commit the already open transaction.

> * It provides a snapshot for consistent reads, which improves
> application correctness when concurrent writes are occurring.

Ok.

> Initially, I set up a scheme where transactions were "nested", in the
> sense that begin() incremented the transaction level and commit()
> decremented it. When it was decremented to zero, an actual COMMIT was
> issued. So you would have a call sequence like:
> 
> * begin() -- sends BEGIN
>   * begin()  -- does nothing
>   * commit() -- does nothing
> * commit() -- sends COMMIT
> 
> This scheme soon proved to be inappropriate, since it turned out that
> the most important thing for performance and correctness is for an
> application to be able to commit the current transaction after some
> particular query has completed. Database::immediateCommit() was
> introduced to support this use case -- its function was to immediately
> reduce the transaction level to zero and commit the underlying
> transaction.

Ok.

> When it became obvious that every Database::commit() call should
> really be Database::immediateCommit(), I changed the semantics,
> effectively renaming Database::immediateCommit() to
> Database::commit(). I removed the idea of nested transactions in
> favour of a model of cooperative transaction length management:
> 
> * Database::begin() became effectively a no-op for web requests and
> was sometimes omitted for brevity.

But it isn't! It's not a no-op if there is an active transaction! It *breaks*
the active transaction! I think that is the crucial point here.

> * Database::commit() should be called after completion of a sequence
> of write operations where atomicity is desired, or at the earliest
> opportunity when contended locks are held.

Ok, so it's basically a savepoint.

Using that scheme, a new transaction should be started immediately after the
commit(). I guess when DBO_TRX is set, query() will do that.

> In cases where transactions end up being too short due to the need for
> a called function to commit a transaction when the caller already has
> a transaction open, it is the responsibility of the callers to
> introduce some suitable abstraction for serializing the transactions.

In the presence of hooks implemented by extensions, this frankly seems
impossible. Also, it would require functions to document exactly if and when
they are using transactions, and hooks have to document whether implementors can
use transactions.

Currently, the only safe way for an extension to use transactions seems to be to
check the trxLevel explicitly, and only start a transaction if there is none
already in progress. Which effectively brings us back to the level-counting 
scheme.

> When transactions are too long, you hit performance problems due to lock
> contention. 

Yes... which makes me wonder why it's a good idea to start a transaction upon
the first select, even for requests that do not write to the database at all.

> When transactions are too short, you hit consistency
> problems when requests fail. The scheme I introduced favours
> performance over consistency. It resolves conflicts between callers
> and callees by using the shortest transaction time. I think it was an
> appropriate choice for Wikipedia, both then and now, and I think it is
> probably appropriate for many other medium to high traffic wikis.
>
> In terms of performance, perhaps it would be feasible to use short
> transactions with an explicit begin() with savepoints for nesting. But
> then you would lose the consistency benefits of DBO_TRX that I
> mentioned at the start of this post.

I'm trying to think of a way to implement this scheme without breaking
transactions and causing creeping inconsistencies.

For the semantics you propose, begin() and commit() seem to be misleading names
- flush() would be more descriptive of what is going on, and implies no nesting.

So, how about this:

* normally, flush() will commit any currently open DBO_TRX transaction (and the
next query will start a new one). In cli (auto-commit) mode, it would do nothing.

* begin() and end() (!) use counted levels. In auto-commit mode, the outer-most
level of them starts and commits a transaction.

* Between begin and end, flush does nothing.

This would allow us to use the "earliest commit" semantics, but also control
blocks of DB operations that should be consistent and atomic. And it would make
it explicit to programmers what they are doing.

Re: [Wikitech-l] Proposal to add an API/Developer/Developer Hub link to the footer of Wikimedia wikis

2012-09-25 Thread Sumana Harihareswara
On 09/25/2012 03:11 PM, Tomasz Finc wrote:
> On Tue, Sep 25, 2012 at 11:59 AM, Quim Gil  wrote:
>> Each entry could feature already in the Developer Hub the best showcases:
>>
>> - amazing apps using the API,
>> - Wikipedia mobile, visual editor, article feedback.
>> - amazing sites powered with MediaWiki.
>> - amazing selection of ambassadors and contributors.
> 
> This really just highlights how far past MediaWiki we've gotten as a
> development community. This is especially true with the Wikimedia
> mobile apps like: Wiki Loves Monuments, Wikipedia, Wiktionary, etc.
> Shifting from a home on MW.org to developers.wikimedia.org would
> easily allow us to grow our volunteer community to accurately reflect
> the range of projects that we work on.
> join us on IRC and are on our mobile specific wiki pages but we'd love
> to have a central place for all of this.
> 
> Does anyone have an issue broadening the topic base that we have on
> our tech hub?
> 
> --tomasz

I want to ensure that we cover the issues that we discussed last time,
in the "Where to host wikimedia related software projects" discussion,
and if wikitech.wikimedia.org is involved, ask Ops for their input as well.

http://www.gossamer-threads.com/lists/wiki/wikitech/272170

http://lists.wikimedia.org/pipermail/labs-l/2012-March/81.html

-- 
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation



Re: [Wikitech-l] Content handler feature merge (Wikidata branch) scheduled early next week

2012-09-25 Thread Tyler Romeo
I think it would be nice to have a changeset in Gerrit showing the actual
merge. I'm not sure how this would be possible with Gerrit, but it would
definitely be useful as a final review (and for QA purposes).

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com



On Tue, Sep 25, 2012 at 3:20 PM, Jeremy Baron  wrote:

> On Tue, Sep 25, 2012 at 7:14 PM, IAlex  wrote:
> > Would it be possible to have the whole changes as an changeset on Gerrit?
> > This would make review and comments much easier than having to do this
> on this list.
>
> is
> https://gerrit.wikimedia.org/r/#/q/status:merged+project:mediawiki/core+branch:Wikidata,n,z
> sufficient? (copied from the first message on this thread)
>
> -Jeremy
>


Re: [Wikitech-l] Content handler feature merge (Wikidata branch) scheduled early next week

2012-09-25 Thread IAlex
Hello,

Having to dispatch comments over hundreds of commits is probably not the best
idea, since I'm sure some of them will get lost. I would prefer having a
central place to do this instead.

Cheers!

Alexandre Emsenhuber (ialex) 

On 25 Sep 2012, at 21:20, Jeremy Baron wrote:

> On Tue, Sep 25, 2012 at 7:14 PM, IAlex  wrote:
>> Would it be possible to have the whole changes as a changeset on Gerrit?
>> This would make review and comments much easier than having to do this on 
>> this list.
> 
> is 
> https://gerrit.wikimedia.org/r/#/q/status:merged+project:mediawiki/core+branch:Wikidata,n,z
> sufficient? (copied from the first message on this thread)
> 
> -Jeremy
> 


Re: [Wikitech-l] Flagged Reviews default by quality levels

2012-09-25 Thread Liquipedia
We recently noticed the loss of this functionality as well and it is quite
unfortunate for us as we were using it to manage the content of several
important pages, most notably our Main Page.

Would it be possible to have this feature re-added as an option? Even if it
is not enabled by default, those of us that used it (and made it work) could
still be able to use it.

Thanks,
On behalf of Liquipedia.net
-Noam






Re: [Wikitech-l] Indexing sha1 hashes in mysql

2012-09-25 Thread Asher Feldman
Base36 certainly isn't the most efficient way to store a sha1, but it's
what is in use all over mediawiki.  I think there was some discussion on
this list of the tradeoffs of different methods when revision.rev_sha1 was
added, and base36 was picked as a compromise.  I don't know why base36 was
picked over base62 once it was decided to stick with an ascii alpha-numeric
encoding but regardless, there was opposition to binary.   Taken on its
own, an integer index would be more efficient but I don't think it makes
sense if we continue using base36.

On Tue, Sep 25, 2012 at 11:20 AM, Artur Fijałkowski wrote:

> > Where an index is desired on a mysql column of base36 encoded sha1
> > hashes, I recommend ADD INDEX (sha1column(10)).  Shorter indexes will be
> > sufficient in many cases, but this still provides a >2/3 space savings
> > while covering a huge (2^51.43) space.
>
> Isn't it better to store BIGINT containing part of (binary) sha1 and
> use index on numeric column?
>
> AJF/WarX
>

Re: [Wikitech-l] Proposal to add an API/Developer/Developer Hub link to the footer of Wikimedia wikis

2012-09-25 Thread Erik Moeller
On Tue, Sep 25, 2012 at 12:11 PM, Tomasz Finc  wrote:
> Does anyone have an issue broadening the topic base that we have on
> our tech hub?

I don't. MediaWiki.org has evolved to serve multiple functions:

- the primary hub for the development of MediaWiki, its APIs, extensions, etc.;
- the primary hub for WMF engineering projects, even some which are
separate but related (e.g.
https://www.mediawiki.org/wiki/Analytics/Kraken )
- the primary hub for hackathons and other events, even when they only
partially relate to MediaWiki.

I suspect that it will also take on additional functions in the future:

- a central repository for gadgets;
- (maybe) a central repository for Lua code and other Scribunto code.

I don't view this intermingling of purposes as inherently problematic.
That's actually a much narrower scope than Meta, which tends to be
defined as "everything else". Why not interpret MediaWiki.org to mean
"Information about MediaWiki and other technical coordination relevant
to Wikimedia"? Yes, MediaWiki serves third parties as well -- but much
of the non-MediaWiki stuff we're talking about here is potentially
interesting to those third parties as well.

Erik
-- 
Erik Möller
VP of Engineering and Product Development, Wikimedia Foundation

Support Free Knowledge: https://wikimediafoundation.org/wiki/Donate



Re: [Wikitech-l] Indexing sha1 hashes in mysql

2012-09-25 Thread Platonides
On 25/09/12 23:12, Asher Feldman wrote:
> Base36 certainly isn't the most efficient way to store a sha1, but it's
> what is in use all over mediawiki.  I think there was some discussion on
> this list of the tradeoffs of different methods when revision.rev_sha1 was
> added, and base36 was picked as a compromise.  I don't know why base36 was
> picked over base62 once it was decided to stick with an ascii alpha-numeric
> encoding but regardless, there was opposition to binary.   Taken on its
> own, an integer index would be more efficient but I don't think it makes
> sense if we continue using base36.

We started using base36 for storing deleted files. The advantage of
base36 is that it's shorter than plain hex, but it can be stored
(without collisions) in a case-insensitive filesystem.




[Wikitech-l] Skin pages on MW.org, and Skin repos in Gerrit

2012-09-25 Thread Daniel Friesen

Skin pages on MW.org (and repos in Gerrit) are now ready.

MediaWiki.org is now ready for modern skins (ones NOT using the old
QuickTemplate and skins/Foo.php patterns) to have pages about them just  
like extensions do.


Relevant links:
https://www.mediawiki.org/wiki/Category:All_skins
https://www.mediawiki.org/wiki/Template:Skin

Some examples that currently exist:
https://www.mediawiki.org/wiki/Skin:Erudite
https://www.mediawiki.org/wiki/Skin:Vector

Gerrit can also handle your skin repos. Ask for a  
mediawiki/skins/{skinname} repo.
So your skins can now have proper review and be updated with the various  
improvements that get made to MediaWiki's skinning system.


When you do introduce a skin here please do follow the file layout  
patterns I used in my tutorials:

http://blog.redwerks.org/2012/02/08/mediawiki-skinning-tutorial/
http://blog.redwerks.org/2012/02/28/mediawiki-subskin-tutorial/
I don't want to see any non-core skins using the outdated  
skins/TheSkin.php pattern in the new skins area.


--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]



Re: [Wikitech-l] Can we kill DBO_TRX? It seems evil!

2012-09-25 Thread Tim Starling
On 26/09/12 05:26, Daniel Kinzler wrote:
>> * Database::commit() should be called after completion of a sequence
>> of write operations where atomicity is desired, or at the earliest
>> opportunity when contended locks are held.
> 
> Ok, so it's basically a savepoint.

Savepoints don't release locks.

>> In cases where transactions end up being too short due to the need for
>> a called function to commit a transaction when the caller already has
>> a transaction open, it is the responsibility of the callers to
>> introduce some suitable abstraction for serializing the transactions.
> 
> In the presence of hooks implemented by extensions, this frankly seems
> impossible. Also, it would require functions to document exactly if and when
> they are using transactions, and hooks have to document whether implementors 
> can
> use transactions.

Presumably there is some particular case where you have encountered
this problem. What is it?

>> When transactions are too long, you hit performance problems due to lock
>> contention. 
> 
> Yes... which makes me wonder why it's a good idea to start a transaction upon
> the first select, even for requests that do not write to the database at all.

Ordinary select queries do not acquire locks. They just open a snapshot.

> For the semantics you propose, begin() and commit() seem to be misleading names
> - flush() would be more descriptive of what is going on, and implies no nesting.

begin() and commit() are named after the queries that they
unconditionally issue. A BEGIN query causes an implicit COMMIT of the
preceding transaction, if there is any. The Database class is
redundantly performing the same operation, not implementing some novel
concept.

> So, how about this:
> 
> * normally, flush() will commit any currently open DBO_TRX transaction (and the
> next query will start a new one). In cli (auto-commit) mode, it would do nothing.
> 
> * begin() and end() (!) use counted levels. In auto-commit mode, the outer-most
> level of them starts and commits a transaction.
> 
> * Between begin and end, flush does nothing.
> 
> This would allow us to use the "earliest commit" semantics, but also control
> blocks of DB operations that should be consistent and atomic. And it would make
> it explicit to programmers what they are doing.

I think that to avoid confusion, begin() and commit() should continue
to issue the queries they are named for.

You shouldn't use the terms "CLI" and "autocommit" interchangeably.
Autocommit is enabled when DBO_TRX is off and $db->mTrxLevel is zero.
There are several callers that set up this situation during a web
request, and there are many CLI scripts that start transactions
explicitly.

Your scheme does not appear to provide a method for hooks to release
highly contended locks that they may acquire. Lock contention is
usually the most important reason for calling commit(). Holding a
contended lock for an excessive amount of time has often brought the
site down. Imagine if someone wrapped a hook that writes to site_stats
with begin/end. The code would work just fine in testing, and then
instantly bring the site down when it was deployed.
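
To illustrate with a hypothetical hook (not code from any extension;
the site_stats column names are from core, the hook is invented):

function onMyStatsHook( $db ) {
    $db->begin();
    $db->update( 'site_stats',
        array( 'ss_total_edits = ss_total_edits + 1' ),
        array( 'ss_row_id' => 1 ) );
    // The unconditional commit() releases the contended row lock right
    // away. Under counted begin()/end(), the lock would be held until
    // the outermost end() -- invisible in testing, fatal under load.
    $db->commit();
}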

-- Tim Starling


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Flagged Reviews default by quality levels

2012-09-25 Thread Aaron Schulz
So you have 2+ quality levels and sometimes want quality versions to be
the default over checked ones? I guess the closest thing to that would
be to restrict who can review/autoreview certain pages via
Special:Stabilization.



--
View this message in context: 
http://wikimedia.7.n6.nabble.com/Flagged-Reviews-default-by-quality-levels-tp4983993p4986082.html
Sent from the Wikipedia Developers mailing list archive at Nabble.com.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Can we kill DBO_TRX? It seems evil!

2012-09-25 Thread Aaron Schulz
I agree that begin()/commit() should do what they say (which they do now).
I'd like to have another construct that behaves the way those two used to
(back when there were immediate* functions). Callers would then have code
like:
$db->enterTransaction();
// ... atomic stuff ...
$db->exitTransaction();
This would use counters for nested begins (or perhaps SAVEPOINTs to deal
with rollback better...though that can cause RTT spam easily). If using
counters, it could be like begin()/finish() in
https://gerrit.wikimedia.org/r/#/c/16696/. The main advantage of doing this
would be that in cli mode (which defaults to using autocommit), all the code
will still start transactions when needed. It would be nice to have the
consistency/robustness. 
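
A rough sketch of the counter idea (my own illustration, not
MediaWiki's actual API; see the gerrit change above for the real
begin()/finish() proposal):

class TransactionScopeSketch {
    private $db;
    private $level = 0;
    public function __construct( $db ) {
        $this->db = $db;
    }
    public function enterTransaction() {
        if ( $this->level++ === 0 ) {
            $this->db->begin(); // only the outermost call issues BEGIN
        }
    }
    public function exitTransaction() {
        if ( --$this->level === 0 ) {
            $this->db->commit(); // only the outermost call issues COMMIT
        }
    }
}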

In any case, echoing what Tim said, most code that has begin()/commit()
does so for performance reasons. In some cases, it can be changed to use
DeferredUpdates or $db->onTransactionIdle(); I had a few patches in
gerrit to this effect. Some things may not actually need begin/commit
explicitly (I got rid of this in some preferences code ages ago). Things
like WikiPage/LocalFile are examples of classes that would have a hard
time not using begin()/commit() as they do. Perhaps some code could be
restructured so that the calls at least match, meaning the splitting of
transactions would be deliberate rather than accidental.
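
For example, a hedged sketch of the onTransactionIdle() style (the
table and row are placeholders):

$row = array( 'note' => 'example' );
// Defer a secondary write until the surrounding transaction (if any)
// has committed, instead of committing the caller's transaction early.
$db->onTransactionIdle( function () use ( $db, $row ) {
    $db->insert( 'my_log_table', $row );
} );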



--
View this message in context: 
http://wikimedia.7.n6.nabble.com/Can-we-kill-DBO-TRX-It-seems-evil-tp4986002p4986083.html
Sent from the Wikipedia Developers mailing list archive at Nabble.com.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Page Curation launch on English Wikipedia

2012-09-25 Thread Fabrice Florin
Hi folks,

I am happy to announce that the Wikimedia Foundation has just launched Page 
Curation, a new suite of tools for reviewing articles on Wikipedia.

Current page patrol tools like Special:NewPages and Twinkle can be hard to use 
quickly and accurately, and have led to frustration for some users. Page 
Curation aims to improve that page patrol experience by making it faster and 
easier to review new pages, using two integrated tools: the New Pages Feed and 
the Curation Toolbar.

Read the Page Curation announcement on our blog:
http://blog.wikimedia.org/2012/09/25/page-curation-launch/

To learn more, visit our introduction page:
 https://en.wikipedia.org/wiki/Wikipedia:Page_Curation/Introduction

If you are an experienced editor, please give Page Curation a try:
https://en.wikipedia.org/wiki/Special:NewPagesFeed

We are also holding IRC office hours on Wednesday, September 26 at 4pm PT 
(23:00 UTC), during which we will be happy to answer any questions you may 
have. Please report any issues on our talk page or to our Community Liaison, 
Oliver Keyes.

A number of patrollers have already started using Page Curation, and we hope 
that more curators will adopt this new toolkit over time. A 'release version' 
was deployed on the English Wikipedia on September 20, 2012, and we plan to 
make it available to other projects in coming weeks.

This feature was created in close collaboration with editors. We would like to 
take this opportunity to thank all the community members who patiently guided 
our progress over the past few months. This includes folks like Athleek123, 
DGG, Dori, Fluffernutter, Logan, The Helpful One, Tom Morris, Utar and 
WereSpielChequers, to name but a few. We are deeply grateful for your generous 
contributions to this project!

We designed Page Curation to offer a better experience, by making it easier for 
curators to review new pages and by providing more feedback to creators so they 
can improve Wikipedia together. 

We hope that you will find this new tool useful. Enjoy!



Fabrice Florin
Product Manager, Editor Engagement Team
Wikimedia Foundation
User:Fabrice Florin (WMF)

https://en.wikipedia.org/wiki/Wikipedia:Editor_Engagement

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Indexing sha1 hashes in mysql

2012-09-25 Thread Tim Starling
On 26/09/12 03:54, Asher Feldman wrote:
>  Where an index is desired on a mysql column of base36 encoded sha1
> hashes, I recommend ADD INDEX (sha1column(10)).  Shorter indexes will be
> sufficient in many cases, but this still provides a >2/3 space savings
> while covering a huge (2^51.43) space.

We usually don't put update patches in MediaWiki for index size
changes. You can change the indexes on existing Wikimedia wikis at
your leisure, and we will change it in tables.sql for the benefit of
new wikis.
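
For concreteness, the sort of one-off statement an admin might run on
an existing wiki (a sketch; the table, column and index names are
illustrative, not taken from tables.sql):

$dbw->query( 'ALTER TABLE revision ADD INDEX rev_sha1_prefix ( rev_sha1(10) )' );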

-- Tim Starling


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] New extension: Diff

2012-09-25 Thread Tim Starling
On 26/09/12 03:54, Tyler Romeo wrote:
> This looks pretty interesting. Is there a reason we don't just put this in
> the core?

It has about 50 lines of useful code wrapped in 1600 lines of
abstraction. I don't think it is the sort of style we want in the core.

-- Tim Starling


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Question About wfUrlEncode

2012-09-25 Thread Tyler Romeo
Hey,

So I'm working on https://gerrit.wikimedia.org/r/22167 (Uri class), and
it's failing a unit test. I know it's failing because of wfUrlencode()
(the failure only started occurring when I moved wfUrlencode() over to
the Uri class). However, I cannot figure out why, because the actual
code hasn't really changed at all (in fact some of wfUrlencode is even
copied/pasted). Maybe somebody can offer some insight?

--
Tyler Romeo
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Indexing sha1 hashes in mysql

2012-09-25 Thread Tim Starling
On 26/09/12 12:11, Tim Starling wrote:
> On 26/09/12 03:54, Asher Feldman wrote:
>>  Where an index is desired on a mysql column of base36 encoded sha1
>> hashes, I recommend ADD INDEX (sha1column(10)).  Shorter indexes will be
>> sufficient in many cases, but this still provides a >2/3 space savings
>> while covering a huge (2^51.43) space.
> 
> We usually don't put update patches in MediaWiki for index size
> changes. You can change the indexes on existing Wikimedia wikis at
> your leisure, and we will change it in tables.sql for the benefit of
> new wikis.

Done in https://gerrit.wikimedia.org/r/#/c/25218/

-- Tim Starling


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l