[Wikitech-l] war on Cite/{{cite}}

2009-01-31 Thread Domas Mituzas
Hello,

I understand the need for cite, that's why it is still there :) But...

- We format the Cite references list only on every 100th request to the
backend, yet it takes 8.15% of backend response time (thanks to the
parser cache; without it Cite formatting would take 815% of cluster
time, though developers should understand I'm not exactly right with
this hyperbole ;-) ).

- When parsing articles like one of the most popular today,
[[en:Rod_Blagojevich_corruption_charges]], it takes 20s to produce the
page, and 17s of that is spent on the Cite block, mostly executing
{{cite}}. That makes every editor wait for ages to get a page displayed,
and due to the cache stampede after invalidation it causes considerable
stress on the site (look at the numbers mentioned above).

- This 8% is in real time, which includes waiting for search,
databases, and simply CPU contention, which we end up having today.
CPU-time wise it is way higher, so it can actually have a 20% CPU time
impact on our application farm. That's at least $100k worth of hardware
(and rising), even if it is new/modern, just for citation formatting.
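
The back-of-envelope arithmetic behind the 815% figure can be sketched roughly as below. Python is used purely for illustration; the function and variable names are made up, and the only inputs are the two numbers quoted above. As the message itself says, the scaling is a deliberate exaggeration, not a measurement.

```python
# Figures quoted in the message above.
EVERY_NTH_REQUEST = 100        # Cite formatting runs on every 100th request
BACKEND_TIME_SHARE = 8.15      # percent of backend response time it takes


def uncached_share(time_share_percent, every_nth):
    """Naively scale the time share as if every request did the formatting."""
    return time_share_percent * every_nth


naive_estimate = round(uncached_share(BACKEND_TIME_SHARE, EVERY_NTH_REQUEST), 2)
print(f"{naive_estimate}% of cluster time without the parser cache")
```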

So, a checklist of what can be done (simple to complex):

[  ] - Simplification of {{cite}}
[  ] - Separate cache for Cite, to avoid reparsing on minor edits,  
that don't involve citations. I have no idea how much this would win,  
but there is theoretical chance of stripping 1% or so. ;)
[  ] - Offload some templates like {{cite}} to actual PHP extensions  
(can of worms, but, oh well, can be standardized process too)
[  ] - Implement a proper scripting engine like Lua for metatemplates
(http://pecl.php.net/package/lua - another can of worms, though yet
again, it can be managed via a trusted set of people, on the top 20
wikis or so).
[  ] - Frustrated operations guy adding something like ( return ; )  
in some random extension, and syncing the live hack. Obviously there  
would be some HAHA YOU THOUGHT I COULDN'T DO THIS comments in there.

I for one can directly participate in at least two of these options. ;-)

Unfortunately, {{cite}} is the only template I can profile/account for  
now, we don't have proper per-template profiling, but I wish to get  
one some day. Then we'd have more war on ... topics ;-D

Generally, templates are a major part of our parsing, and that's over 50%
of our current cluster CPU load.
As we've actually managed to hit 100% last week, something that hasn't
happened for a while, some work has to be done here.

Of course, new hardware will help for a while, but I for one have huge  
personal satisfaction saving donation money. ;-)

CHEERS!
-- 
Domas Mituzas -- http://dammit.lt/ -- [[user:midom]]



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] The never-dying topic: category intersection

2009-01-31 Thread Roan Kattouw
Marcus Buck wrote:
 I just read the last category intersection discussion from December to
 see what the latest state of it is. While doing that, I saw that the
 last message in that thread was this post from Roan Kattouw, providing
 his extension. Oddly, nobody reacted to it. After 65 posts on that
 thread somebody posted a solution and nobody reacted.

 Can this extension be seen live anywhere?
   
Yes, actually, at http://mixesdb.com/db/index.php/Special:AdvancedSearch 
(the site I was hired to write it for in the first place).

Roan Kattouw (Catrope)



Re: [Wikitech-l] The never-dying topic: category intersection

2009-01-31 Thread David Gerard
2009/1/31 Roan Kattouw roan.katt...@home.nl:
 Marcus Buck wrote:

 I just read the last category intersection discussion from December to
 see what the latest state of it is. While doing that, I saw that the
 last message in that thread was this post from Roan Kattouw, providing
 his extension. Oddly, nobody reacted to it. After 65 posts on that
 thread somebody posted a solution and nobody reacted.
 Can this extension be seen live anywhere?

 Yes, actually, at http://mixesdb.com/db/index.php/Special:AdvancedSearch
 (the site I was hired to write it for in the first place).


Win!

So what's in the way of this going live on Wikimedia?

(Commons first?)


- d.



Re: [Wikitech-l] The never-dying topic: category intersection

2009-01-31 Thread Roan Kattouw
David Gerard wrote:
 Win!

 So what's in the way of this going live on Wikimedia?

 (Commons first?)

As I said before, the extension was written especially for MixesDB, and 
has all kinds of features WMF wikis don't need or don't want for 
performance reasons. Also, the UI is pretty crude (note that all the
pretty colors and help stuff were added by the MixesDB guy). The most
useful part is the code that builds, maintains and searches a category 
index, but it hasn't been updated to include Andrew's hack for short 
words yet, and it should probably be ported to use Lucene for use on WMF 
wikis.

Roan Kattouw (Catrope)



Re: [Wikitech-l] war on Cite/{{cite}}

2009-01-31 Thread Marco Schuster
On Sat, Jan 31, 2009 at 2:03 PM, Domas Mituzas  wrote:
 Hello,

 I understand the need for cite, thats why it is still there :) But...
 (...)
What about converting these to ref tags?

 Unfortunately, {{cite}} is the only template I can profile/account for
 now, we don't have proper per-template profiling, but I wish to get
 one some day. Then we'd have more war on ... topics ;-D
Stub templates, for example :D

 Generally, templates are major part of our parsing, and thats over 50%
 of our current cluster CPU load.
Wow. Can you compare that load with the load caused by
using only <ref> tags?

Marco



Re: [Wikitech-l] war on Cite/{{cite}}

2009-01-31 Thread Robert Rohde
A long while ago I remember looking at the parser and realizing that
the recursive template expansion and argument handling led the parser
to run all branches of #if and #switch statements before deciding
which one to include.

In other words, given {{#if: something | statements_A | statements_B
}}, the parser was fully expanding both statements_A and statements_B
before checking #if to decide which one to keep.  Obviously that is
inefficient and in the case of very complicated conditional templates
potentially very expensive.

The parser has changed so much since I last worked with it that I am
having difficulty figuring out if this is still true.  Hopefully,
someone already went through and improved the branch handling logic,
but if not, I would suggest that this would also be a good generalized
target for improving template operation.
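
To make the difference concrete, here is a minimal sketch of eager versus lazy branch handling. It is written in Python for illustration only and is not the MediaWiki parser: expand(), if_eager() and if_lazy() are hypothetical names, and the "non-empty after trimming" truth test is a simplification of how #if decides.

```python
calls = []


def expand(name):
    """Stand-in for an expensive template expansion; records each call."""
    calls.append(name)
    return f"<{name}>"


def if_eager(condition, then_text, else_text):
    # Both branches were already expanded by the caller before we get here.
    return then_text if condition.strip() else else_text


def if_lazy(condition, then_thunk, else_thunk):
    # Branches arrive as callables; only the selected one is ever expanded.
    return then_thunk() if condition.strip() else else_thunk()


eager = if_eager("something", expand("statements_A"), expand("statements_B"))
eager_calls = list(calls)

calls.clear()
lazy = if_lazy("something",
               lambda: expand("statements_A"),
               lambda: expand("statements_B"))
lazy_calls = list(calls)

print(eager_calls)  # both branches were expanded
print(lazy_calls)   # only the branch that is kept was expanded
```

Both variants return the same text; the only difference is how much expansion work is done to get there, which is what matters for deeply nested conditional templates.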

-Robert Rohde




Re: [Wikitech-l] war on Cite/{{cite}}

2009-01-31 Thread Alex
Domas Mituzas wrote:
 
 So, a checklist what can be done ( simple to complex )
 
 [  ] - Simplification of {{cite}}

Short of significant improvements to the parser or requiring people to
ask Domas before editing the template, I can

 [  ] - Separate cache for Cite, to avoid reparsing on minor edits,  
 that don't involve citations. I have no idea how much this would win,  
 but there is theoretical chance of stripping 1% or so. ;)
 [  ] - Offload some templates like {{cite}} to actual PHP extensions  
 (can of worms, but, oh well, can be standardized process too)

I've actually considered something like this in the past, basically
creating a Cite 2.0 extension, where all the main cite options would be
in the ref tags themselves with pre-defined templates written in PHP
for web citations, book citations, etc.; this would greatly reduce the
amount of stuff that needs to be done using the Cite wiki-templates and
run through the parser.

You would have something like:

<ref author=Foo title=Bar type=book>Pages 1-10</ref>

Any parameters in the ref tag would be converted to HTML output using
the book template in the extension rather than a thousand parser
functions in some meta-template, and only the content of the tag (the
page numbers in this case) would have to be run through the parser, so
it would also be backwards-compatible with the current templates until
they can all be migrated.

The main downside to this is that it requires someone to file a Bugzilla
request every time a template needs changing.
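
A rough sketch of the idea, in Python for illustration (the real thing would live in the PHP extension). Everything here is hypothetical: the FORMATS table, the render_ref() name and the output layout are made up, and the tag body is assumed to have been parsed to HTML already.

```python
from html import escape

# Hypothetical built-in citation formats, keyed by the "type" attribute.
FORMATS = {
    "book": "{author}. <i>{title}</i>. {content}",
    "web": '{author}. "{title}". {content}',
}


def render_ref(attrs, parsed_content):
    """Format a citation from tag attributes without invoking the wikitext parser.

    attrs: dict of attributes taken from the ref tag.
    parsed_content: the tag body, assumed to be already parsed to HTML.
    """
    template = FORMATS.get(attrs.get("type"))
    if template is None:
        # No known type: behave like a plain ref so existing pages keep working.
        return parsed_content
    return template.format(
        author=escape(attrs.get("author", "")),
        title=escape(attrs.get("title", "")),
        content=parsed_content,
    )


book = render_ref({"author": "Foo", "title": "Bar", "type": "book"}, "Pages 1-10")
plain = render_ref({}, "{{cite book|...}} output")
print(book)
print(plain)
```

The fallback for refs without a type attribute is what would keep this backwards-compatible with the existing wiki-templates during migration.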



-- 
Alex (wikipedia:en:User:Mr.Z-man)



Re: [Wikitech-l] Drafts extension in testing

2009-01-31 Thread Bence Damokos
Hi,
There seems to be an issue with the extension: when I saved a draft
after editing a section, the draft was considered a draft of the
corresponding section number (it was listed as Article#Section name in
the list of drafts). Once the given section had been removed, clicking on
this saved draft gave me the error that section number 6 doesn't exist, and
thus it could not restore the draft.
After changing the page to again contain at least 6 sections, restoring the
draft was possible, at the cost of removing the new section that had
replaced it.

I believe that this is not really user friendly, even if this is intended
behaviour. (You click on a named section and receive a raw number (of the
section) in the error message, without any help message or the possibility
to restore the text of your draft if someone changes the page in the
meantime in an unexpected way.)

Best regards,
Bence Damokos

On Wed, Jan 21, 2009 at 7:26 PM, Platonides platoni...@gmail.com wrote:

 Alex wrote:
  A possible option would be to have a checkbox (probably on
  Special:Drafts itself, to avoid cluttering the edit page and to avoid
  accidental clicks) to mark drafts as public. This would be especially
  useful when combined with bug 17067, the ability to create drafts of
  protected pages, a user could make a draft, mark it as public, then link
  to it for an admin to add to the page.

 I worry it goes beyond what Drafts attempted to do. So now you start
 having queues of Drafts, someone seeing the public draft shouldn't
 delete others drafts when saving, but perhaps the original draft should
 be marked as 'Foo did an edit from this'. Should the history mark the
 draft author somewhere?

 Welcome to the Wikimedia developer life, Trevor.

 :)




Re: [Wikitech-l] war on Cite/{{cite}}

2009-01-31 Thread Platonides
Would storing an intermediate template improve things?
I mean, keep a template but where the inner templates are substed,
depending on the original parameters.



Robert Rohde wrote:
 A long while ago I remember looking at the parser and realizing that
 the recursive template expansion and argument handling led the parser
 to run all branches of #if and #switch statements before deciding
 which one to include.
 
 In other words, given {{#if: something | statements_A | statements_B
 }}, the parser was fully expanding both statements_A and statements_B
 before checking #if to decide which one to keep.  Obviously that is
 inefficient and in the case of very complicated conditional templates
 potentially very expensive.

The new preprocessor doesn't follow unused branches (or so we were told ;).

http://en.wikipedia.org/wiki/Template:Citation/core screams for having loops




Re: [Wikitech-l] war on Cite/{{cite}}

2009-01-31 Thread Chad
On Sat, Jan 31, 2009 at 1:28 PM, Alex mrzmanw...@gmail.com wrote:

 Domas Mituzas wrote:
 
  So, a checklist what can be done ( simple to complex )
 
  [  ] - Simplification of {{cite}}

 Short of significant improvements to the parser or requiring people to
 ask Domas before editing the template, I can

  [  ] - Separate cache for Cite, to avoid reparsing on minor edits,
  that don't involve citations. I have no idea how much this would win,
  but there is theoretical chance of stripping 1% or so. ;)
  [  ] - Offload some templates like {{cite}} to actual PHP extensions
  (can of worms, but, oh well, can be standardized process too)

 I've actually considered something like this in the past, basically
 creating a Cite 2.0 extension, where all the main cite options would be
 in the ref tags themselves with pre-defined templates written in PHP
 for web citations, book citations, etc.; this would greatly reduce the
 amount of stuff that needs to be done using the Cite wiki-templates and
 run through the parser.

 You would have something like:

 <ref author=Foo title=Bar type=book>Pages 1-10</ref>

 Any parameters in the ref tag would be converted to HTML output using
 the book template in the extension rather than a thousand parser
 functions in some meta-template, and only the content of the tag (the
 page numbers in this case) would have to be run through the parser, so
 it would also be backwards-compatible with the current templates until
 they can all be migrated.

 The main downside to this is that it requires someone to file a Bugzilla
 request every time a template needs changing.


What about throwing them in MediaWiki: space, similar to editnotices?
At least then they could be cached to hell and back in the message
cache.

-Chad


[Wikitech-l] ordered lists starting at a certain number

2009-01-31 Thread jidanni
Gentlemen, In wikitext I want to do
<ol start=6101>
 <li>a
 <li>b
</ol>
but http://www.w3.org/TR/html401/struct/lists.html says that is
deprecated. In fact I really want to just use # and have that
start at 6101.  OK, I'll just hard wire them into the page:
6101. a
6102. b



Re: [Wikitech-l] war on Cite/{{cite}}

2009-01-31 Thread Alex
Chad wrote:
 What about throwing them in MediaWiki: space, similar to editnotices?
 At least then they could be cached to hell and back in the message
 cache.
 
 -Chad

I considered that as well, but I'm not sure how much that will actually
help. Looking at
http://en.wikipedia.org/wiki/Joe%20the%20Plumber?action=purge&forceprofile=true

it took 21.796 seconds to load, most of which seems to be from
Parser::recursiveTagParse, and about 90% of that is from
Cite::referencesFormat-parse. Even if the templates themselves are
heavily cached, it still has to run all the conditionals and formatting
through the parser. Heavy caching might help if there's lots of refs
with the same content on multiple pages, but I don't think that's very
common.
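
As a small illustration of why the hit rate matters, here is a sketch (Python, illustration only) of a cache keyed on the template name plus its exact arguments. The expand_citation() function is a made-up stand-in for the expensive parse, not anything in MediaWiki.

```python
from functools import lru_cache

expansions = 0


@lru_cache(maxsize=None)
def expand_citation(template, args):
    """Stand-in for the expensive parse; args is a tuple of (key, value) pairs."""
    global expansions
    expansions += 1
    fields = ", ".join(f"{key}={value}" for key, value in args)
    return f"[{template}: {fields}]"


refs = [
    ("cite book", (("author", "Foo"), ("title", "Bar"))),
    ("cite book", (("author", "Foo"), ("title", "Bar"))),  # exact repeat: hit
    ("cite web", (("title", "Baz"), ("url", "http://example.org/"))),
    ("cite book", (("author", "Foo"), ("title", "Qux"))),  # one field differs: miss
]
rendered = [expand_citation(template, args) for template, args in refs]
print(expansions, "expansions for", len(refs), "refs")
```

Only byte-identical citations share a cache entry, so if most refs are unique the cache saves very little.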

-- 
Alex (wikipedia:en:User:Mr.Z-man)



Re: [Wikitech-l] war on Cite/{{cite}}

2009-01-31 Thread K. Peachey
 I understand the need for cite, thats why it is still there :) But...
 (...)
 What about converting these to ref tags?
Unfortunately most of those are designed to format the refs to a
proper standard that we use (Harvard/MLA standard iirc) and are
designed to be easily updated when we change our standards (e.g. recently
the pages value changed in one of the cite templates and a bot went
through and fixed them all).



Re: [Wikitech-l] war on Cite/{{cite}}

2009-01-31 Thread Chad
On Sat, Jan 31, 2009 at 5:37 PM, Alex mrzmanw...@gmail.com wrote:

 I considered that as well, but I'm not sure how much that will actually
 help. Looking at

 http://en.wikipedia.org/wiki/Joe%20the%20Plumber?action=purge&forceprofile=true

 it took 21.796 seconds to load, most of which seems to be from
 Parser::recursiveTagParse, and about 90% of that is from
 Cite::referencesFormat-parse. Even if the templates themselves are
 heavily cached, it still has to run all the conditionals and formatting
 through the parser. Heavy caching might help if there's lots of refs
 with the same content on multiple pages, but I don't think that's very
 common.

 --
 Alex (wikipedia:en:User:Mr.Z-man)



Throw a caching layer on top of it. Do a final expansion until final
substitution at the {{cite book}} etc level. Then you've got less to
recursively parse.

-Chad


Re: [Wikitech-l] – Fixing {val}

2009-01-31 Thread Aryeh Gregor
On Sat, Jan 31, 2009 at 7:12 PM, Platonides platoni...@gmail.com wrote:
 {{val}} is just a presentational template. It's trivial to create an
 equivalent, fixed, parserfunction.

We do not want to create a new parser function for every
presentational template people come up with.



Re: [Wikitech-l] ordered lists starting at a certain number

2009-01-31 Thread Platonides
Aryeh Gregor wrote:
 On Sat, Jan 31, 2009 at 4:23 PM,  jida...@jidanni.org wrote:
 Gentlemen, In wikitext I want to do
 <ol start=6101>
  <li>a
  <li>b
 </ol>
 but http://www.w3.org/TR/html401/struct/lists.html says that is
 deprecated.
 
 It's been un-deprecated in HTML5, for what that's worth.  I don't know
 whether XHTML2 has done so as well.

IMHO it should be allowed to do
<ol start=6101>
# a
# b
</ol>

instead of having to revert to HTML to do this (otherwise not too
uncommon) action.




Re: [Wikitech-l] war on Cite/{{cite}}

2009-01-31 Thread Platonides
Aryeh Gregor wrote:
 On Sat, Jan 31, 2009 at 8:03 AM, Domas Mituzas midom.li...@gmail.com wrote:
 [  ] - Implement proper scripting engine like Lua for metatemplates 
 (http://pecl.php.net/package/lua
  - another can of worms, though yet again, can be managed via trusted
 set of people, on top20 wikis or so).
 
 This seems like it's the only solution from your list that would be
 generally applicable to similar future scenarios.  I don't think the
 users would have to be particularly trusted -- just make sure that the
 runtime of the programs is limited, and that it's properly sandboxed
 (is the Lua PECL extension sandboxed?).

That would be like adding a dependency on the Lua extension for reusers, as
the core templates will be implemented in Lua.
And I don't think it's worth reimplementing a Lua interpreter in PHP...




[Wikitech-l] Fixing text encoding corruption

2009-01-31 Thread Asheesh Laroia
Just as an FYI, wiki.freeculture.org has mis-encoded UTF-8 for the better 
part of the past four years. This is because we used the old Latin 1 
schemas.

Now we don't have these problems anymore. I wrote up my notes at 
http://wiki.freeculture.org/Fixing_text_encoding_corruption , but here 
they are for y'all's convenience:

1. Freeze writes to the main wiki
2. Dump freecult_wikidb to dump.sql
3. Create a fresh MW install (just for the table schemas) in
freecult_wikidb2
4. Create a temporary empty DB, and import dump.sql to it
5. In the temporary DB, ALTER TABLE on the text table so it has the
same columns as freecult_wikidb2's text table
6. Dump wikidb3 and have certainty that the column names will line up
(but don't copy the sucky old schema)
   * mysqldump --no-create-info --add-locks --complete-insert freecult_wikidb3 > sql
7. Import that into freecult_wikidb2, skipping the tables that are
missing
   * mysql -f freecult_wikidb2 < sql
   * WATCH for errors other than skipping missing table
8. php maintenance/rebuildall.php
   * If this fails with key errors, just drop the recentchanges
table and recreate it with the wikidb2 schema
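
For readers who have not met this class of bug, the snippet below shows the general mechanism, assuming the corruption was the usual one where UTF-8 bytes get interpreted as Latin-1 somewhere along the way. It is a Python illustration of the symptom only; it is not part of the database procedure above and the sample word is arbitrary.

```python
original = "caf\u00e9"  # "café", written with an escape to keep the source ASCII

# UTF-8 bytes misread as Latin-1: each non-ASCII character turns into two.
corrupted = original.encode("utf-8").decode("latin-1")
print(ascii(corrupted))  # 'caf\xc3\xa9'

# Undoing the misinterpretation recovers the original text.
repaired = corrupted.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```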

I've poured enough of my life into this issue, but if someone else wants 
to take this up and document it better, by all means go ahead!

-- Asheesh.

-- 
It is often the case that the man who can't tell a lie thinks he is the best
judge of one.
-- Mark Twain, Pudd'nhead Wilson's Calendar



Re: [Wikitech-l] – Fixing {val}

2009-01-31 Thread Robert Rohde
This discussion is getting side tracked.

The real complaint here is that

{{#expr:(0.7 * 1000 * 1000) mod 1000}} is giving 69 when it should give 70.

This is NOT a formatting issue, but rather it is a bug in the #expr
parser function, presumably caused by some kind of round-off error.

-Robert Rohde


On Sat, Jan 31, 2009 at 2:27 PM, Greg L greg_l_at_wikipe...@comcast.net wrote:
 Yes, {val} is a tool for making attractive and convenient scientific
 notation. The look of {{tl|val}} was discussed at length on both
 WT:MOSNUM and WT:MOS and achieved broad support for how it works and
 renders numbers. It delimits numbers with narrow spaces that aren't
 really spaces; they use CSS span tags to move characters. Thus,
 the significands can be copied and pasted into Excel where they will
 be treated as real numbers without the need to first hand-delete spaces.

 The problem with {val} is outlined here:

 http://en.wikipedia.org/w/index.php?title=User_talk:Jimbo_Wales&oldid=260819871#Developer_support_for_parser_function

 In a nutshell, about 5 to 10% of the time, {val} gives rounding
 errors. For instance, the expression  {{val|0.55007|e=6}} will return
 a significand of 0.550069.

 This is the product of the buggy math-based parser functions it must
 use. To date, notwithstanding that Jimbo is solidly behind this, and
 that Erik supports the production of the required parser function, no
 volunteer developer has stepped up to the plate with a
 character-counting parser function.

 Greg


 On Jan 31, 2009, at 2:17 PM, Platonides wrote:

 Greg L wrote:
 All,

 Can anyone figure out how to fix {{tl|val}} so an expression like

 {{val|0.55007|e=6}}

 …works properly?

 Greg L


 You can't figure out what it should do just from the description.
 I imagine you mean http://en.wikipedia.org/wiki/Template:Val Set of
 templates that can be used to easily present values in scientific
 notation, including uncertainty
 Another ugly template...




Re: [Wikitech-l] -- Fixing {val}

2009-01-31 Thread Platonides
Aryeh Gregor wrote:
 On Sat, Jan 31, 2009 at 7:12 PM, Platonides wrote:
 {{val}} is just a presentational template. It's trivial to create an
 equivalent, fixed, parserfunction.
 
 We do not want to create a new parser function for every
 presentational template people come up with.

I know, that's the problem with such an approach.
Still, it could be worth parserifying a set of stable core templates.
Not only would they be faster, they would be more readable.




Re: [Wikitech-l] – Fixing {val}

2009-01-31 Thread Aryeh Gregor
On Sat, Jan 31, 2009 at 8:33 PM, Robert Rohde raro...@gmail.com wrote:
 This discussion is getting sidetracked.

 The real complaint here is that

 {{#expr:(0.7 * 1000 * 1000) mod 1000}} is giving 69 when it should give
 70.

 This is NOT a formatting issue, but rather it is a bug in the #expr
 parser function, presumably caused by some kind of round-off error.

$ php -r 'echo (0.7 * 1000 * 1000) % 1000 . "\n";'
69
$ php -r 'echo (int)(0.7 * 1000) . "\n";'
699

The issue is bog-standard floating-point error.  If PHP has a decent
library for exact-precision arithmetic, we could probably use that.
Otherwise, template programmers will have to learn how floating-point
numbers work just like all other programmers in the universe.
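
The truncation effect described above can be reproduced in any language that uses IEEE 754 doubles. What follows is a minimal sketch in Python rather than PHP. The sample value (0.1 + 0.7) * 10 is the one used in the PHP manual's floating-point warning, not a value from this thread, and `decimal.Decimal` merely stands in for whatever exact-precision library PHP might offer.

```python
from decimal import Decimal

# Binary doubles cannot represent 0.1 or 0.7 exactly, so the product lands
# just below 8 and truncating to an integer drops it to 7.
float_product = (0.1 + 0.7) * 10
truncated = int(float_product)

# Decimal arithmetic on string inputs keeps the values exact.
exact_product = (Decimal("0.1") + Decimal("0.7")) * 10
exact_truncated = int(exact_product)

print(float_product < 8, truncated)         # True 7
print(exact_product == 8, exact_truncated)  # True 8
```

Rounding to the nearest integer before converting (round(float_product) gives 8) is the usual lightweight workaround when exact arithmetic is not available.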



Re: [Wikitech-l] -- Fixing {val}

2009-01-31 Thread greg_l_at_wikipedia
Aryeh, this reaction of “We do not want to create a new parser  
function for every presentational template people come up with” is  
understandable. However, I understand that a character-counting parser  
function in another form has been in the works for a long time but  
hasn’t proven to be reliable enough to be released into the wild.

If someone could finally develop a bullet-proof character-counting  
parser function, I’m quite certain that a number of valuable uses  
could be found for it. That is why I encourage the writing of a parser  
function over the effort of writing a developer’s version of a  
template that doesn’t work very well. The only reason {val} doesn’t
work well is that it must rely upon math-based parser functions
that produce rounding errors. Having said that…

The MOS and MOSNUM community has waited seven months for a version of  
{val} that works well for all numbers—even ones that are really big.  
Any developer who is willing to tackle this issue, regardless of  
whether it is a parser function or a revised version of {val}, would  
be most welcome. However, both Jimbo Wales (in particular) and
Erik seemed to think the best way to leverage developer effort would
be to produce the character-counting parser function as this would  
enable the production of template tools we haven’t even conceived of  
yet.

On Jan 31, 2009, at 5:30 PM, Platonides wrote:

Aryeh Gregor wrote:
 On Sat, Jan 31, 2009 at 7:12 PM, Platonides wrote:
 {{val}} is just a presentational template. It's trivial to create an
 equivalent, fixed, parserfunction.

 We do not want to create a new parser function for every
 presentational template people come up with.

I know, that's the problem with such an approach.
Although it could be worthwhile to parserify a set of stable core templates.
Not only would they be faster, they would be more readable.




Re: [Wikitech-l] -- Fixing {val}

2009-01-31 Thread Aryeh Gregor
On Sat, Jan 31, 2009 at 8:53 PM, greg_l_at_wikipedia
greg_l_at_wikipe...@comcast.net wrote:
 Aryeh, this reaction of "We do not want to create a new parser
 function for every presentational template people come up with" is
 understandable. However, I understand that a character-counting parser
 function in another form has been in the works for a long time but
 hasn't proven to be reliable enough to be released into the wild.

It would be trivial to write up such a function, and in fact plenty of
people have.  I could add it right now in five minutes.  The question
is whether it's desirable to make templates into more of a
full-fledged programming language than they already are.  There's been
reluctance on many people's part to do that.  Personally, I think
they're close enough anyway so that you may as well give them some
basic string functions like {{#len:}}, if the Lua proposal isn't
accepted.

 The only reason {val} doesn't
 work well is because it must rely upon math-based parser functions
 that produce rounding errors.

As I said in my other response, the exact same errors occur in PHP,
and the same type of error occurs in all programming languages.  If
you aren't familiar with floating-point calculations, see:

http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems

In a real programming language, of course, there would be workarounds
like defining new data types, whereas in template programming that
would be tricky.



Re: [Wikitech-l] war on Cite/{{cite}}

2009-01-31 Thread Daniel Friesen
^_^ Wikipedia is already a horrible place to copy templates from. Unlike
Wikipedia, most other MW installations don't bother turning on Tidy, and
Wikipedia abuses that /feature/ way too much.

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://nadir-seen-fire.com]
-Nadir-Point (http://nadir-point.com)
-Wiki-Tools (http://wiki-tools.com)
-MonkeyScript (http://monkeyscript.nadir-point.com)
-Animepedia (http://anime.wikia.com)
-Narutopedia (http://naruto.wikia.com)
-Soul Eater Wiki (http://souleater.wikia.com)



Aryeh Gregor wrote:
 On Sat, Jan 31, 2009 at 8:19 PM, Platonides platoni...@gmail.com wrote:
   
 That would be like adding a dependency on the Lua extension for reusers, as
 the core templates will be implemented in Lua.
 

 Yes, that would be the major disadvantage I can see.  In practice,
 nobody can reuse large chunks of Wikipedia content on shared hosting
 anyway, since it's way too big, but it would be a serious obstacle for
 people who want to reuse only parts of Wikipedia.
   




Re: [Wikitech-l] ordered lists starting at a certain number

2009-01-31 Thread Aryeh Gregor
On Sat, Jan 31, 2009 at 10:12 PM, Daniel Friesen dan_the_...@telus.net wrote:
 Someone needs to read a good WP article before they start mentioning
 (X)HTML version numbers:
 http://en.wikipedia.org/wiki/XHTML

Both HTML5 and XHTML2 are successors to HTML4.  That's all that's
really relevant here.  HTML5 has un-deprecated the start attribute
of ol, so nobody should be worrying about HTML4's deprecation of it.
 (XHTML2 does appear to have removed the attribute, so I guess you
could worry about it if you plan to move to XHTML2 in the future.  But
probably nobody is going to use XHTML2, and MediaWiki almost certainly
isn't.)



Re: [Wikitech-l] – Fixing {val}

2009-01-31 Thread Robert Rohde
On Sat, Jan 31, 2009 at 5:43 PM, Aryeh Gregor
simetrical+wikil...@gmail.com wrote:
 On Sat, Jan 31, 2009 at 8:33 PM, Robert Rohde raro...@gmail.com wrote:
 This discussion is getting sidetracked.

 The real complaint here is that

 {{#expr:(0.7 * 1000 * 1000) mod 1000}} is giving 69 when it should give
 70.

 This is NOT a formatting issue, but rather it is a bug in the #expr
 parser function, presumably caused by some kind of round-off error.

 $ php -r 'echo (0.7 * 1000 * 1000) % 1000 . "\n";'
 69
 $ php -r 'echo (int)(0.7 * 1000) . "\n";'
 699

 The issue is bog-standard floating-point error.  If PHP has a decent
 library for exact-precision arithmetic, we could probably use that.
 Otherwise, template programmers will have to learn how floating-point
 numbers work just like all other programmers in the universe.

In r46671 I have added an explicit test for floating point numbers
that are within 1 part in 10^10 of integers before performing
round-off sensitive conversions and comparisons.

This should eliminate these errors in many cases.

-Robert Rohde
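
The kind of check Robert describes can be sketched as follows. This is a Python illustration only, not the code from r46671: the function names are hypothetical, and the relative tolerance of 1e-10 is taken from the "1 part in 10^10" wording above.

```python
import math

def snap_near_integer(value, rel_tol=1e-10):
    """Return the nearest integer (as a float) if value is within rel_tol of it."""
    if not math.isfinite(value):
        return value
    nearest = round(value)
    # Only snap when the gap is tiny relative to the integer itself.
    if nearest != 0 and abs(value - nearest) <= rel_tol * abs(nearest):
        return float(nearest)
    return value

def to_int(value):
    """Truncate toward zero, but only after snapping near-integers."""
    return int(snap_near_integer(value))

print(to_int((0.1 + 0.7) * 10))  # 8 rather than 7
print(to_int(7.5))               # 7, a genuine fraction is left alone
```

A check like this hides round-off that sits very close to an integer, but values that have drifted further than the tolerance still truncate the "wrong" way, which matches the "in many cases" caveat.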



[Wikitech-l] Character-counting parser function

2009-01-31 Thread greg_l_at_wikipedia
As I understand it, there is rightfully little interest in the
developer community in writing a new parser function for every single
template need that comes along.

Therefore, when it comes to a template like {{val}}, which now
generates rounding errors about 5–10% of the time because of the
math-based parser functions it must use, it would be nice if the
template-authoring community could have a character-counting parser
function that is not only suitable for {{val}}, but which could be a
general-purpose parser function that could be used for a great variety
of purposes.

What {{val}} tries to do at its fundamental level is described here:

http://en.wikipedia.org/wiki/Wikipedia_talk:Manual_of_Style_(dates_and_numbers)/Archive_94#Grouping_of_digits_after_the_decimal_point_.28next_attempt.29

Is there a developer whom I can have the author of {{val}} e-mail to  
see if you two can arrive at a relatively easy-to-make parser function  
that A) meets the basic needs of {{val}}, and B) has sufficient  
utility to be useful for other character-counting needs?
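
Treating the number as a string of characters, which is what a character-counting function would allow, sidesteps floating-point arithmetic entirely. A minimal Python sketch, assuming a plain groups-of-three rule and an ordinary space as the separator; the function name and the rule are illustrative only, and the grouping {{val}} actually applies is the one described at the link above.

```python
def group_fraction_digits(number_text, group_size=3, separator=" "):
    """Group the digits after the decimal point without any arithmetic."""
    integer_part, dot, fraction = number_text.partition(".")
    if not dot:
        # No decimal point, so there is nothing to group.
        return number_text
    groups = [fraction[i:i + group_size]
              for i in range(0, len(fraction), group_size)]
    return integer_part + "." + separator.join(groups)

print(len("0.55007"))                    # 7 characters
print(group_fraction_digits("0.55007"))  # 0.550 07
```

Because the digits are only sliced and never converted to a number, the output cannot pick up rounding errors, however many digits the input has.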



Re: [Wikitech-l] war on Cite/{{cite}}

2009-01-31 Thread Gerard Meijssen
Hoi,
Let us please appreciate what is being said here: Wikipedia is a horrible
place to copy templates from. We pride ourselves on being open source and
the current templates make us as bad as the worst proprietary vendor. We
have what is effectively an API and it is not documented at all.
Thanks,
  GerardM

2009/2/1 Daniel Friesen dan_the_...@telus.net

 ^_^ Wikipedia is already a horrible place to copy templates from. Unlike
 Wikipedia, most other MW installations don't bother turning on Tidy, and
 Wikipedia abuses that /feature/ way too much.

 ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://nadir-seen-fire.com]
 -Nadir-Point (http://nadir-point.com)
 -Wiki-Tools (http://wiki-tools.com)
 -MonkeyScript (http://monkeyscript.nadir-point.com)
 -Animepedia (http://anime.wikia.com)
 -Narutopedia (http://naruto.wikia.com)
 -Soul Eater Wiki (http://souleater.wikia.com)



 Aryeh Gregor wrote:
  On Sat, Jan 31, 2009 at 8:19 PM, Platonides platoni...@gmail.com
 wrote:
 
  That would be like adding a dependency on the Lua extension for reusers, as
  the core templates will be implemented in Lua.
 
 
  Yes, that would be the major disadvantage I can see.  In practice,
  nobody can reuse large chunks of Wikipedia content on shared hosting
  anyway, since it's way too big, but it would be a serious obstacle for
  people who want to reuse only parts of Wikipedia.
 




Re: [Wikitech-l] war on Cite/{{cite}}

2009-01-31 Thread Chad
How is the API not documented? Between the docs on
MediaWiki.org and the fact that every parameter is
documented (with examples), I'd say it's highly
documented.

-Chad

On Feb 1, 2009 12:18 AM, Gerard Meijssen gerard.meijs...@gmail.com
wrote:

Hoi,
Let us please appreciate what is being said here: Wikipedia is a horrible
place to copy templates from. We pride ourselves on being open source and
the current templates make us as bad as the worst proprietary vendor. We
have what is effectively an API and it is not documented at all.
Thanks,
 GerardM

2009/2/1 Daniel Friesen dan_the_...@telus.net

 ^_^ Wikipedia is already a horrible place to copy templates from. Unlike 
Wikipedia most other M...


Re: [Wikitech-l] war on Cite/{{cite}}

2009-01-31 Thread Robert Rohde
On Sat, Jan 31, 2009 at 9:16 PM, Gerard Meijssen
gerard.meijs...@gmail.com wrote:
 Hoi,
 Let us please appreciate what is being said here: Wikipedia is a horrible
 place to copy templates from. We pride ourselves on being open source and
 the current templates make us as bad as the worst proprietary vendor. We
 have what is effectively an API and it is not documented at all.
 Thanks,
  GerardM

Actually, I think Daniel had a somewhat different point.

Wikimedia uses Tidy which does a good job at closing dangling format
tags.  A very substantial fraction of our templates actually have
dangling divs, and tables, and other bad syntax that Tidy is covering
up for us.  Anyone who has ever tried to copy Wikimedia templates into
a wiki with Tidy turned off (the default setting) knows that many of
our templates will actually return a lot of junk.

Strictly speaking it should be the editors' job to properly close
tables and divs, etc., but because Tidy is so good at it they don't
have to, which makes our wikicode less portable.

-Robert Rohde
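
To spot such templates before copying them to a wiki without Tidy, the expanded HTML can be scanned for tags that are opened but never closed. A rough sketch using Python's standard html.parser module; the class name, the helper function, and the set of tracked tags are assumptions made for illustration, and this is not how Tidy itself works.

```python
from html.parser import HTMLParser

class DanglingTagFinder(HTMLParser):
    """Record tracked tags that are opened but never closed."""
    TRACKED = {"div", "table", "span"}

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recent matching open tag, if there is one.
        for index in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[index] == tag:
                del self.open_tags[index]
                break

def find_dangling_tags(html):
    finder = DanglingTagFinder()
    finder.feed(html)
    finder.close()
    return finder.open_tags

print(find_dangling_tags('<div class="box"><table><tr><td>x</td></tr>'))
# ['div', 'table']
```

An empty list means every tracked tag was closed; anything else names the tags that Tidy would otherwise be closing silently.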



Re: [Wikitech-l] war on Cite/{{cite}}

2009-01-31 Thread K. Peachey
 How is the API not documented? Between the docs on
 MediaWiki.org and the fact that every parameter is
 documented (with examples), I'd say it's highly
 documented.
I think he means on-wiki: most people probably won't know to look for
information on how to use it at the main/official MediaWiki wiki, and
just go by the scraps they can find on whatever local wiki they are on
(in this case en.wiki).



Re: [Wikitech-l] war on Cite/{{cite}}

2009-01-31 Thread Chad
Then that's solely enwiki's fault for having poor docs.
If developers have documented where expected (in
code and on mw.org) then they've done their part.

-Chad

On Feb 1, 2009 12:33 AM, K. Peachey p858sn...@yahoo.com.au wrote:

 How is the api not documented? Between the docs on  Mediawiki.org and the
fact that every paramet...
I think he means on-wiki: most people probably won't know to look for
information on how to use it at the main/official MediaWiki wiki, and
just go by the scraps they can find on whatever local wiki they are on
(in this case en.wiki).



Re: [Wikitech-l] Character-counting parser function

2009-01-31 Thread Daniel Friesen
Output a big red error when given numbers that will hit a
floating-point error?

Perhaps also provide an #expr equivalent, limited in its number of uses,
that relies on a bignum library rather than normal numbers, for the cases
where that big red error shows up.

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://nadir-seen-fire.com]
-Nadir-Point (http://nadir-point.com)
-Wiki-Tools (http://wiki-tools.com)
-MonkeyScript (http://monkeyscript.nadir-point.com)
-Animepedia (http://anime.wikia.com)
-Narutopedia (http://naruto.wikia.com)
-Soul Eater Wiki (http://souleater.wikia.com)



Robert Rohde wrote:
 On Sat, Jan 31, 2009 at 9:39 PM, Tim Starling tstarl...@wikimedia.org wrote:
   
 greg_l_at_wikipedia wrote:
 
 As I understand it, there is rightfully little interest in the
 developer community in writing a new parser function for every single
 template need that comes along.

 Therefore, when it comes to a template like {{val}}, which now
 generates rounding errors about 5–10% of the time because of the
 math-based parser functions it must use, it would be nice if the
 template-authoring community could have a character-counting parser
 function that is not only suitable for {{val}}, but which could be a
 general-purpose parser function that could be used for a great variety
 of purposes.
   
 I would rather have an application-specific number formatting function,
 rather than a character-counting function. It could be similar to PHP's
 number_format(). Wikitext is a terrible programming language, slow to
 execute and hard to understand. It's much better to write in PHP.
 

 We already have {{formatnum:}} with a very limited functionality that
 presumably could be extended.

 Though I would like to re-emphasize that Greg's complaint principally
 arises because of floating point round-off errors in #expr that are
 difficult for normal editors to predict or plan for, and that should
 be addressed irrespective of other work to improve number formatting.

 -Robert Rohde
   

