Re: [Wikitech-l] MATH markup question

2011-01-24 Thread Aryeh Gregor
On Sun, Jan 23, 2011 at 4:24 PM, Maury Markowitz
maury.markow...@gmail.com wrote:
 I used to think that too. Then I looked at the examples on the wiki
 page on the issue. Although I find TeX rather opaque, a much worse
 issue is obscurity through verbosity, which not only makes the formula
 difficult to understand, but also the entire source of the article.
 That's why I don't use CITE either.

We'd be talking about translating LaTeX input to MathML output
automatically here -- no MathML input in the wikitext.  It's certainly
true that MathML is barely human-readable, if at all, but it's more
human-readable than PNG, which is what we output now.  ;)
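
To make the idea concrete, here's a toy sketch in PHP -- purely illustrative,
not texvc or any real converter -- that turns a lone \frac{a}{b} into MathML,
just to show that the LaTeX stays in the wikitext and only the output changes:

    // Handles only the single trivial case \frac{a}{b}; a real converter
    // would parse full LaTeX and emit proper MathML token elements.
    function toyFracToMathML( $tex ) {
        if ( preg_match( '/^\\\\frac\{([^{}]+)\}\{([^{}]+)\}$/', trim( $tex ), $m ) ) {
            return '<math xmlns="http://www.w3.org/1998/Math/MathML">'
                . '<mfrac><mi>' . $m[1] . '</mi><mi>' . $m[2] . '</mi></mfrac></math>';
        }
        return false; // fall back to the existing PNG rendering
    }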



[Wikitech-l] HTML math rendering… easy upgrade?

2011-01-24 Thread Maury Markowitz
Take a look at:

http://en.wikipedia.org/wiki/Headway

Note that when the HTML renderer has to make a fraction, it leaves way
too much whitespace between the numerator and denominator. I realize
why this is happening, but can't this be adjusted with CSS?

Maury



Re: [Wikitech-l] MATH markup question

2011-01-24 Thread Maury Markowitz
On Mon, Jan 24, 2011 at 8:09 AM, Aryeh Gregor
simetrical+wikil...@gmail.com wrote:
 We'd be talking about translating LaTeX input to MathML output
 automatically here -- no MathML input in the wikitext.

Ahhh, I get it. And yes, that does make sense to me.



[Wikitech-l] Google Summer of Code 2011

2011-01-24 Thread Jeroen De Dauw
Hey all,

The Google Summer of Code 2011 program has been announced [0]. I'm assuming
the WMF will be participating as in previous years; can someone confirm this
so the GSoC 2011 page [1] can be updated?

Fun fact: since that page has existed since last year's GSoC, it's now one of
the top results for a Google search for "summer of code 2011" :)

[0] http://socghop.appspot.com/
[1]
https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Summer_of_Code_2011

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--


Re: [Wikitech-l] File licensing information support

2011-01-24 Thread Platonides
Happy-melon wrote:
 Eeeww
 
 What's any different between this and a {{#author: }} parser function, apart
 from the inability to access it from the wikitext?  As noted, it's perfectly
 possible for the data to be in a separate field on the upload form, either
 by default or by per-wiki hackery.  This is likely to result in as many "why
 can't I edit the bits of wikitext which diff, history, transclusion (let's
 not forget the enormous can of worms mucking around with the wikitext will
 open up there), etc. assure me is there??" questions as it solves "what does
 this brace structure do?" ones.
 
 --HM

Good point about transclusion.
That question wouldn't be asked since they would be editable above, just
in a different input box than the main content.




Re: [Wikitech-l] File licensing information support

2011-01-24 Thread Michael Dale
On 01/22/2011 01:15 PM, Bryan Tong Minh wrote:
 Handling metadata separately from wikitext provides two main
 advantages: it is much more user friendly, and it allows us to
 properly validate and parse data.

This assumes wikitext is simply a formatting language; really it's a data
storage, structure and presentation language. You can already see this in
the evolution of templates as both data and presentation containers. It
seems like a bad idea to move away from leveraging flexible data properties
used in presentation.

On Commons we have Template:Information, which links out to numerous data
triples for asset presentation (i.e. Template:Artwork, Template:Creator,
Template:Book, with sub-data relationships like Artwork.Location referencing
the Institution template). If tied to an SMW backend you could say "give me
artwork in room Pavillion de Beauvais at the Louvre that is missing a
created-on date".

We should focus on APIs for template editing; Extension:Page_Object_Model
seemed like a step in the right direction. Something that lets you edit
structured data across nested template objects, and that we could stack
validation on top of, would let us leverage everything that has been done
and keep things wide open for what's done in the future.

Most importantly, we need clean, high-level APIs that we can build GUIs on,
so that the flexibility of the system does not hurt usability and
functionality.

 Having a clear separate input text field "Author:" is much more
 user friendly than {{#fileauthor:}}, which is, so to say, a type of obscure
 MediaWiki jargon. I know that we could probably hide it behind a
 template, but that is still not as friendly as a separate field. I
 keep on hearing that especially for newbies, a big blob of wikitext is
 plain scary. We regulars may be able to quickly parse the structure in
  {{Information}}, but for newbies this is certainly not so clear.
 We actually see that from the community there is a demand for
 separating the meta data from the wikitext -- this is after all why
 they implemented the uselang= hacked upload form with a separate text
 box for every meta field.

I don't know... see all the templates mentioned above... To be sure, I
think we need better interfaces for interacting with templates.

 Also, a separate field allows MediaWiki to understand what a certain
 input really means. {{#fileauthor:[[User:Bryan]]}} means nothing to
 MediaWiki or re-users, but "Author: Bryan___ [checkbox] This is a
 Commons username" can be parsed by MediaWiki to mean something. It
 also allows us to mass-change, for example, the author. If I want to
 change my attribution from "Bryan" to "Bryan Tong Minh", I would need
 to edit the wikitext of every single upload, whereas in the new system
 I go to Special:AuthorManager and change the attribution.

A Semantic MediaWiki-like system retains this meaning for MediaWiki to
interact with at any stage of data [re]presentation, and of course
supports flexible meaning types.

 Similar to categories, and all other user-edited metadata.
 Categories is a good example of why metadata does not belong in the
 wikitext. If you have ever tried renaming a category... you need to
 edit every page in the category and rename it in the wikitext. Commons
 is running multiple bots to handle category rename requests.

 All these advantage outweigh the pain of migration (which could
 presumably be handled by bots) in my opinion.

Unless your category was template driven, in which case you just update
the template ;) If your category was instead magically associated with
the page outside of template-built wiki page text, how do you procedurally
build data associations?


--michael



[Wikitech-l] user email validation ready

2011-01-24 Thread Ashar Voultoiz
Hi,

We got the email validation stuff sorted out properly tonight. We even 
have javascript tests (thanks Krinkle)!

Revisions got reviewed by Brion, and bugs 959 & 22449 are now fixed.

I opened bug https://bugzilla.wikimedia.org/26910 as a merge request for 
Roan.

Thanks everyone!

-- 
Ashar Voultoiz




Re: [Wikitech-l] user email validation ready

2011-01-24 Thread Conrad Irwin
Out of interest, do you know what percentage of emails in the database
don't validate under the new scheme?

Conrad

On 24 January 2011 13:55, Ashar Voultoiz hashar+...@free.fr wrote:
 Hi,

 We got the email validation stuff sorted out properly tonight. We even
 have javascript tests (thanks Krinkle)!

 Revisions got reviewed by Brion, and bugs 959 & 22449 are now fixed.

 I opened bug https://bugzilla.wikimedia.org/26910 as a merge request for
 Roan.

 Thanks everyone!

 --
 Ashar Voultoiz






Re: [Wikitech-l] user email validation ready

2011-01-24 Thread Brion Vibber
On Mon, Jan 24, 2011 at 2:08 PM, Conrad Irwin conrad.ir...@gmail.com wrote:

 Out of interest, do you know what percentage of emails in the database
 don't validate under the new scheme?


That's actually a wise thing to check -- most fails will probably be
legitimately bogus entries, but if we can find any that don't validate but
*do* work (eg they've been confirmed as functional) that's info we need to
report upstream as well -- the new code is using the spec for HTML5's
client-side form validation, which is starting to go into the latest
generation of browsers.

In theory the validation rules should be pretty liberal, and you should need
to do something very esoteric to not pass. (The old validation regexes from
~2004-2005 got kicked out for failing to deal with things like '+' which
turned out to be more common than we thought.)
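
For reference, the shape of that check is roughly the following -- an
illustrative sketch of the HTML5-style pattern, not the actual
User::isValidEmailAddr code:

    function looksLikeHtml5Email( $addr ) {
        // Liberal local part (so things like '+' pass), then one or more
        // dot-separated alphanumeric/hyphen labels; a single label such as
        // 'localhost' also passes, per the spec tweak mentioned below.
        $local = '[a-zA-Z0-9.!#$%&\'*+\/=?^_`{|}~-]+';
        $label = '[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?';
        return (bool)preg_match( '/^' . $local . '@' . $label . '(\.' . $label . ')*$/', $addr );
    }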

Folks actually already pushed a fix upstream to the whatwg spec page to
allow single-part domains like 'localhost', needed for local-network testing
and perhaps some weird intranet setups.

-- brion


Re: [Wikitech-l] File licensing information support

2011-01-24 Thread Krinkle
Before I respond to the recent new ideas, concepts and suggestions, I'd like
to explain a few things about the backend (at least the way it's currently
planned to be).

The mw_authors table contains unique authors, identified by either a name or
a user id. Optionally a custom attribution can be given (falling back to the
author name, the user's real_name or user_name), and optionally a URL can be
given (falling back to nothing or the user page).

The mw_license table contains the different licenses a wiki allows to be
used: their canonical name (e.g. "GFDL", "CC-BY-SA-3.0", etc.), a URL to the
legal code, and a usage count[1].

mw_file_props is a table that keeps previous versions of file_props as well,
and is linked to mw_revision by fp_id in rev_fileprops_id (like mw_text is
linked in rev_text_id).

Both authors and licenses are uniquely identified by their id. This makes it
easy to change stuff later on in an AuthorManager (e.g. different URL,
username change, etc.). The texts and complete titles of the licenses are
stored in interface messages (for internationalization).
MediaWiki:License-uniq-text could for example contain
{{Cc-by-sa-3.0|attribution=$2}} on Wikimedia Commons.
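
As a rough illustration of the attribution fallback described above -- the
helper and the au_* column names here are hypothetical, not the actual
schema:

    // Sketch only: resolve the attribution string for an author record,
    // falling back from custom attribution to author name to the (MediaWiki
    // User object's) real name and finally the username.
    function resolveAttribution( $authorRow, $user = null ) {
        if ( strlen( $authorRow->au_attribution ) ) {
            return $authorRow->au_attribution;   // explicit custom attribution
        }
        if ( strlen( $authorRow->au_name ) ) {
            return $authorRow->au_name;          // plain author name
        }
        if ( $user ) {
            $real = $user->getRealName();
            return strlen( $real ) ? $real : $user->getName();
        }
        return '';
    }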

-

If we store the links in the wikitext (like {{#fileauthor:}} and
{{#filelicense:}}), the advantages are basically two things:
1) It has all the features of editing and revisioning (better history, edit
conflicts, diff view, etc.)
2) No need for a revisioned mw_file_props; we can store the current values
in mw_page_props

A possible downside is that a diff like
- {{#fileauthor:2}} {{filelicense:12}}
+ {{#fileauthor:10}} {{#fileauthor:12}} {{#filelicense:
doesn't mean very much. IMHO the solution is not to store the actual names
in the wikitext so that the diffs are better, but to either not store it in
the wikitext at all, or customize the behaviour everywhere:
* edit form: extract the parser function calls from the wikitext before
anything else, and put them in separate form elements
* diff view: get the names of those authors and licenses and somehow include
them in the diff view. This could be done a bit like AbuseFilter's diff
between filter versions (i.e. before "Line 1" there would be "Author" and
"License")
* saving the form: convert back to {{#parserfunction:}} calls and prepend
them to the wikitext
* action=raw: ?
* action=render: ?
* api-parse: ?
Right now I think storing it in wikitext and customizing it everywhere like
shown above is not worth the trouble, and would likely bring its own
troubles. Keeping it separate from the wikitext is more work once, but I
think it pays off. But again, nothing is final yet. Everything is possible.

--
Krinkle


[1]: The usage count (mw_license.lic_count) is a bit like the edit count
(increased/decreased when saving files).



Re: [Wikitech-l] user email validation ready

2011-01-24 Thread Billinghurst
It would seem that the bugzilla
   https://bugzilla.wikimedia.org/show_bug.cgi?id=23710
would fall under that category; note that it is still marked as "new".
Can it be tied to this process?

Regards, Andrew


Quoting Brion Vibber br...@pobox.com:

 On Mon, Jan 24, 2011 at 2:08 PM, Conrad Irwin conrad.ir...@gmail.com wrote:

 Out of interest, do you know what percentage of emails in the database
 don't validate under the new scheme?


 That's actually a wise thing to check -- most fails will probably be
 legitimately bogus entries, but if we can find any that don't validate but
 *do* work (eg they've been confirmed as functional) that's info we need to
 report upstream as well -- the new code is using the spec for HTML5's
 client-side form validation, which is starting to go into the latest
 generation of browsers.

 In theory the validation rules should be pretty liberal, and you should need
 to do something very esoteric to not pass. (The old validation regexes from
 ~2004-2005 got kicked out for failing to deal with things like '+' which
 turned out to be more common than we thought.)

 Folks actually already pushed a fix upstream to the whatwg spec page to
 allow single-part domains like 'localhost', needed for local-network testing
 and perhaps some weird intranet setups.

 -- brion











Re: [Wikitech-l] user email validation ready

2011-01-24 Thread Platonides
Brion Vibber wrote:
 On Mon, Jan 24, 2011 at 2:08 PM, Conrad Irwin conrad.ir...@gmail.com wrote:
 
 Out of interest, do you know what percentage of emails in the database
 don't validate under the new scheme?

 
 That's actually a wise thing to check -- most fails will probably be
 legitimately bogus entries, but if we can find any that don't validate but
 *do* work (eg they've been confirmed as functional) that's info we need to
 report upstream as well -- the new code is using the spec for HTML5's
 client-side form validation, which is starting to go into the latest
 generation of browsers.
 
 In theory the validation rules should be pretty liberal, and you should need
 to do something very esoteric to not pass. (The old validation regexes from
 ~2004-2005 got kicked out for failing to deal with things like '+' which
 turned out to be more common than we thought.)
 
 Folks actually already pushed a fix upstream to the whatwg spec page to
 allow single-part domains like 'localhost', needed for local-network testing
 and perhaps some weird intranet setups.
 
 -- brion

The original spec had feedback based precisely on enwiki numbers.
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/00.html

So about 100? Note that there are invalid addresses marked as confirmed
on Wikipedia.




Re: [Wikitech-l] File licensing information support

2011-01-24 Thread Platonides
Krinkle wrote:
 Before I respond to the recent new ideas, concepts and suggestions, I'd like
 to explain a few things about the backend (at least the way it's currently
 planned to be).

 The mw_authors table contains unique authors, identified by either a name or
 a user id. Optionally a custom attribution can be given (falling back to the
 author name, the user's real_name or user_name), and optionally a URL can be
 given (falling back to nothing or the user page).

 The mw_license table contains the different licenses a wiki allows to be
 used: their canonical name (e.g. "GFDL", "CC-BY-SA-3.0", etc.), a URL to the
 legal code, and a usage count[1].

 mw_file_props is a table that keeps previous versions of file_props as well,
 and is linked to mw_revision by fp_id in rev_fileprops_id (like mw_text is
 linked in rev_text_id).

 Both authors and licenses are uniquely identified by their id. This makes it
 easy to change stuff later on in an AuthorManager (e.g. different URL,
 username change, etc.). The texts and complete titles of the licenses are
 stored in interface messages (for internationalization).
 MediaWiki:License-uniq-text could for example contain
 {{Cc-by-sa-3.0|attribution=$2}} on Wikimedia Commons.

 -

 If we store the links in the wikitext (like {{#fileauthor:}} and
 {{#filelicense:}}), the advantages are basically two things:
 1) It has all the features of editing and revisioning (better history, edit
 conflicts, diff view, etc.)
 2) No need for a revisioned mw_file_props; we can store the current values
 in mw_page_props

 A possible downside is that a diff like
 - {{#fileauthor:2}} {{filelicense:12}}
 + {{#fileauthor:10}} {{#fileauthor:12}} {{#filelicense:
 doesn't mean very much. IMHO the solution is not to store the actual names
 in the wikitext so that the diffs are better, but to either not store it in
 the wikitext at all, or customize the behaviour everywhere:

Why? Storing the property "filelicense: GPL" directly in the wikitext is not
bad. It's also a relief when we want to delete licenses later.
Same with Author: take that as a key into an NS_AUTHOR namespace.
Going to Special:LicenseManager/5 in order to change GPL license data is
just added complexity over using the short name "GPL".





Re: [Wikitech-l] user email validation ready

2011-01-24 Thread Brion Vibber
On Mon, Jan 24, 2011 at 3:50 PM, Billinghurst billinghu...@gmail.com wrote:

 It would seem that the bugzilla
   https://bugzilla.wikimedia.org/show_bug.cgi?id=23710
 would fall under that category; note that it is still marked as "new".
 Can it be tied to this process?


That's an issue about clickable links in the body of outgoing mails
generated by the system, and is not related to the format or validation of
email addresses.

It should be addressed (either by ensuring that links inserted into email
are escaped clearly, or that they're arranged nicely in brackets that email
clients commonly understand as delimiters, or by supplementing the plaintext
emails with HTML emails that can mark their links explicitly) but is an
entirely separate issue.
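
For what it's worth, the second of those options could be as small as
something like this (a sketch, not existing MediaWiki code):

    // Wrap bare URLs in <...> so plaintext mail clients treat the angle
    // brackets as delimiters and don't swallow trailing punctuation.
    function bracketUrlsForPlaintext( $body ) {
        return preg_replace( '!(https?://[^\s<>]+)!', '<$1>', $body );
    }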

-- brion


Re: [Wikitech-l] user email validation ready

2011-01-24 Thread Brion Vibber
On Mon, Jan 24, 2011 at 4:02 PM, Platonides platoni...@gmail.com wrote:

 The original spec had feedback based precisely on enwiki numbers.
 http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/00.html

 So about 100? Note that there are invalid addresses marked as confirmed
 in wikipedia.


Ok so from the breakdown at
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/022237.html
with 202 email address records that were marked as confirmed but failed the
proposed validation check at the time and couldn't be corrected by stripping
whitespace:


 The breakdown of the 202 is as follows.

Reordered into:

Now allowed by the current revision of the HTML5 spec as implemented in
User::isValidEmailAddr:
 * Single trailing dot in local part: 40 (prohibited by RFC but plausibly
   deliverable)
 * Multiple consecutive dots: 20 (prohibited by RFC but plausibly
   deliverable)

Easily correctable by the user removing the extra bits upon being prompted,
as doing so would not change the actual delivery:
 * Single trailing dot in domain part: 100 (prohibited by RFC but plausibly
   deliverable)
 * Valid address in angle brackets (with other junk around it): 21
   (permitted by RFC, kind of, and plausibly deliverable)
 * Comment: 3 (permitted by RFC and plausibly deliverable)

v--- LINE OF DOOM ---v

Clearly wrong in typical context, should indeed be rejected (or changed to
@localhost for legit cases):
 * No @: 9 (unlikely to be deliverable)

Not quite sure what's going on, but most look like stray chars that would be
ignored, or else invalid and possibly bogusly marked as confirmed:
 * Miscellaneous: 9 (one containing [NO]@[SPAM], two with trailing ',', one
   in quotes, one with single leading dot in local part, two with single
   leading comma in local part, one with leading ':', one with leading '\')


So from the August 2009 survey on English Wikipedia, that leaves 18 email
addresses out of over 3 million listed as confirmed, of which a few *might*
be deliverable addresses that could not be fixed by the user tweaking them
during input (ie, they actually rely on those extra chars being there in
order to be delivered to the right person).

To me it sounds like we're pretty good with this; it wouldn't hurt to make
sure that existing addresses that are stored funny (eg with extra whitespace
or trailing dots on the domain name) continue to work as they have
previously.
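
If we do re-check them, the cleanup pass could be something as simple as this
(a sketch only, not existing code):

    // Normalize a stored address before re-running the new validator:
    // trim surrounding whitespace and drop trailing dots on the domain.
    function normalizeStoredEmail( $addr ) {
        $addr = trim( $addr );
        $at = strrpos( $addr, '@' );
        if ( $at !== false ) {
            $local  = substr( $addr, 0, $at );
            $domain = rtrim( substr( $addr, $at + 1 ), '.' );
            $addr = $local . '@' . $domain;
        }
        return $addr;
    }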

Also wouldn't hurt to do a current survey, and to include some other
language sites.


Of interest -- gmail's validation rules were also posted in that thread:
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/022268.html

-- brion