Re: [Wikitech-l] MATH markup question
On Sun, Jan 23, 2011 at 4:24 PM, Maury Markowitz maury.markow...@gmail.com wrote: I used to think that too. Then I looked at the examples on the wiki page on the issue. Although I find TeX rather opaque, a much worse issue is obscurity through verbosity, which makes not only the formula difficult to understand, but the entire source of the article too. That's why I don't use CITE either.
We'd be talking about translating LaTeX input to MathML output automatically here -- no MathML input in the wikitext. It's certainly true that MathML is barely human-readable, if at all, but it's more human-readable than PNG, which is what we output now. ;)
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] HTML math rendering… easy upgrade?
Take a look at: http://en.wikipedia.org/wiki/Headway Note that when the HTML renderer has to make a fraction, it leaves way too much whitespace between the numerator and denominator. I realize why this is happening, but can't this be adjusted with CSS? Maury
Re: [Wikitech-l] MATH markup question
On Mon, Jan 24, 2011 at 8:09 AM, Aryeh Gregor simetrical+wikil...@gmail.com wrote: We'd be talking about translating LaTeX input to MathML output automatically here -- no MathML input in the wikitext.
Ahhh, I get it. And yes, that does make sense to me.
[Wikitech-l] Google Summer of Code 2011
Hey all, The Google Summer of Code 2011 program has been announced [0]. I'm assuming the WMF will be participating as in previous years; can someone confirm this so the GSoC 2011 page [1] can be updated? Fun fact: since that page has existed since the last GSoC, it's now one of the top results when doing a Google search for summer of code 2011 :)
[0] http://socghop.appspot.com/
[1] https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Summer_of_Code_2011
Cheers -- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. --
Re: [Wikitech-l] File licensing information support
Happy-melon wrote: Eeeww What's different between this and a {{#author: }} parser function, apart from the inability to access it from the wikitext? As noted, it's perfectly possible for the data to be in a separate field on the upload form, either by default or by per-wiki hackery. This is likely to result in as many "why can't I edit the bits of wikitext which diff, history, transclusion (let's not forget the enormous can of worms mucking around with the wikitext will open up there), etc assure me is there??" questions as it solves "what does this brace structure do?" ones. --HM
Good point about transclusion. That question wouldn't be asked since they would be editable above, just in a different input box than the main content.
Re: [Wikitech-l] File licensing information support
On 01/22/2011 01:15 PM, Bryan Tong Minh wrote: Handling metadata separately from wikitext provides two main advantages: it is much more user friendly, and it allows us to properly validate and parse data.
This assumes wikitext is simply a formatting language; really it's a data storage, structure and presentation language. You can already see this in place by the evolution of templates as both data and presentation containers. It seems like a bad idea to move away from leveraging flexible data properties used in presentation. On Commons we have Template:Information, which links out into numerous data triples for asset presentation (ie Template:Artwork, Template:Creator, Template:Book, with sub data relationships like Artwork.Location referencing the Institution template). If tied to an SMW backend, you could say: give me artwork in room Pavillon de Beauvais at the Louvre that is missing a created-on date. We should focus on apis for template editing; Extension:Page_Object_Model seemed like a step in the right direction but not Something that let you edit structured data across nested template objects, and that we could stack validation on top of, would let us leverage everything that has been done and keep things wide open for what's done in the future. Most importantly we need clean high level apis that we can build GUIs on, so that the flexibility of the system does not hurt usability and functionality.
Having a clear separate input text field Author: is much more user friendly than {{#fileauthor:}}, which is, so to say, a type of obscure MediaWiki jargon. I know that we could probably hide it behind a template, but that is still not as friendly as a separate field. I keep on hearing that especially for newbies, a big blob of wikitext is plain scary. We regulars may be able to quickly parse the structure in {{Information}}, but for newbies this is certainly not so clear.
We actually see that from the community there is a demand for separating the meta data from the wikitext -- this is after all why they implemented the uselang= hacked upload form with a separate text box for every meta field.
I don't know... see all the templates mentioned above... To be sure, I think we need better interfaces for interacting with templates.
Also, a separate field allows MediaWiki to understand what a certain input really means. {{#fileauthor:[[User:Bryan]]}} means nothing to MediaWiki or re-users, but Author: Bryan___ [checkbox] This is a Commons username can be parsed by MediaWiki to mean something. It also allows us to mass change, for example, the author. If I want to change my attribution from Bryan to Bryan Tong Minh, I would need to edit the wikitext of every single upload, whereas in the new system I go to Special:AuthorManager and change the attribution.
A Semantic MediaWiki-like system retains this meaning for MediaWiki to interact with at any stage of data [re]presentation, and of course supports flexible meaning types. Similar to categories, and all other user-edited metadata.
Categories are a good example of why metadata does not belong in the wikitext. If you have ever tried renaming a category... you need to edit every page in the category and rename it in the wikitext. Commons is running multiple bots to handle category rename requests. All these advantages outweigh the pain of migration (which could presumably be handled by bots) in my opinion.
Unless your category was template driven, in which case you just update the template ;) If your category was instead magically associated with the page outside of template-built wiki page text, how do you procedurally build data associations? --michael
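
The rename argument in this exchange can be made concrete with a small sketch. This is only an illustration in Python, not MediaWiki code: the `pages`, `authors` and `file_author` dictionaries and both function names are hypothetical stand-ins for page wikitext and for a separate author table keyed by id.

```python
import re

# Hypothetical in-memory wiki: page title -> wikitext.
pages = {
    "File:A.jpg": "{{Information|author=Bryan}}\n[[Category:Old name]]",
    "File:B.jpg": "{{Information|author=Bryan}}\n[[Category:Old name]]",
    "File:C.jpg": "{{Information|author=Someone else}}\n[[Category:Other]]",
}


def rename_category_in_wikitext(pages, old, new):
    """Rename a category stored in wikitext; returns how many pages were edited."""
    pattern = re.compile(r"\[\[Category:" + re.escape(old) + r"\]\]")
    edited = 0
    for title, text in pages.items():
        # A function replacement avoids backslash processing in the new name.
        new_text, count = pattern.subn(lambda m: "[[Category:" + new + "]]", text)
        if count:
            pages[title] = new_text
            edited += 1  # one page edit (one new revision) per affected page
    return edited


# Metadata kept outside wikitext: files reference an author id (hypothetical).
authors = {1: "Bryan", 2: "Someone else"}
file_author = {"File:A.jpg": 1, "File:B.jpg": 1, "File:C.jpg": 2}


def rename_author(authors, author_id, new_attribution):
    """Change an attribution in one place; returns the number of writes."""
    authors[author_id] = new_attribution
    return 1


def attribution_for(title):
    """Look up the attribution for a file through its author id."""
    return authors[file_author[title]]
```

With wikitext storage the number of edits grows with the number of pages in the category, while the id-keyed table needs a single write however many files reference the author. A template-driven category, as suggested in the reply, gets a similar single-edit property by moving the indirection into the template.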
[Wikitech-l] user email validation ready
Hi, We got the email validation stuff sorted out properly tonight. We even have javascript tests (thanks Krinkle)! Revisions got reviewed by Brion and bugs 959 and 22449 are now fixed. I opened bug https://bugzilla.wikimedia.org/26910 as a merge request for Roan. Thanks everyone! -- Ashar Voultoiz
Re: [Wikitech-l] user email validation ready
Out of interest, do you know what percentage of emails in the database don't validate under the new scheme? Conrad
Re: [Wikitech-l] user email validation ready
On Mon, Jan 24, 2011 at 2:08 PM, Conrad Irwin conrad.ir...@gmail.com wrote: Out of interest, do you know what percentage of emails in the database don't validate under the new scheme?
That's actually a wise thing to check -- most fails will probably be legitimately bogus entries, but if we can find any that don't validate but *do* work (eg they've been confirmed as functional) that's info we need to report upstream as well -- the new code is using the specs for HTML 5's client-side form validation, which is starting to go into the latest generation of browsers. In theory the validation rules should be pretty liberal, and you should need to do something very esoteric to not pass. (The old validation regexes from ~2004-2005 got kicked out for failing to deal with things like '+' which turned out to be more common than we thought.) Folks actually already pushed a fix upstream to the whatwg spec page to allow single-part domains like 'localhost', needed for local-network testing and perhaps some weird intranet setups. -- brion
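
As a rough illustration of how liberal these rules are, here is a minimal Python sketch. It assumes the regular expression published in the current WHATWG HTML specification for `type=email` inputs; the exact rule MediaWiki implemented at the time may differ, and the function name `is_valid_email` is hypothetical.

```python
import re

# Pattern from the WHATWG HTML spec's definition of a valid e-mail address.
# Assumption: this is the present-day spec text, not MediaWiki's own code.
HTML5_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"                  # local part
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"    # first domain label
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"  # further labels
)


def is_valid_email(address: str) -> bool:
    """Return True if the whole string matches the HTML5-style pattern."""
    return HTML5_EMAIL.fullmatch(address) is not None


if __name__ == "__main__":
    for candidate in [
        "user+tag@example.org",   # '+' in the local part
        "root@localhost",         # single-part domain
        "john.@example.org",      # trailing dot in the local part
        "user@example.org.",      # trailing dot in the domain part
        "no-at-sign",             # no @ at all
    ]:
        print(candidate, is_valid_email(candidate))
```

Under this pattern the '+' case and the single-part 'localhost' domain both pass, which matches the two points raised in the message; a trailing dot after the domain and an address without '@' do not.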
Re: [Wikitech-l] File licensing information support
Before I respond to the recent new ideas, concepts and suggestions, I'd like to explain a few things about the backend (at least the way it's currently planned to be).
The mw_authors table contains unique authors by either a name or a userid. Optionally a custom attribution can be given (fallback to authorname, user real_name or user_name). Also optionally a url can be given (fallback to nothing or userpage).
The mw_license table contains the different licenses a wiki allows to be used: their canonical name (eg. GFDL, CC-BY-SA-3.0 etc.), url to legal code and usage count[1].
mw_file_props is a table that keeps previous versions of file_props as well, and is linked to mw_revision by fp_id in rev_fileprops_id (like mw_text is linked in rev_text_id).
Both authors and licenses are uniquely identified by their id. This makes it easy to change stuff later on in an AuthorManager (eg. different url, username change etc.). The texts and complete titles of the licenses are stored in interface messages (for internationalization). MediaWiki:License-uniq-text could for example contain {{Cc-by-sa-3.0|attribution=$2}} on Wikimedia Commons.
-
If we store the links in the wikitext (like {{#fileauthor:}} and {{#filelicense:}}), the advantages are basically two things:
1) It has all features of editing and revisioning (better history, edit conflict, diff view, etc.)
2) No need for a revisionized mw_file_props, we can store the current values in mw_page_props
A possible downside is that a diff like
- {{#fileauthor:2}} {{filelicense:12}}
+ {{#fileauthor:10}} {{#fileauthor:12}} {{#filelicense:
doesn't mean very much.
I.m.h.o. the solution is not to store the actual names in wikitext so that the diffs are better, but to either not store it in wikitext at all, or customize the behaviour everywhere:
* edit form: extract parserfunction calls from wikitext before anything else, and put them in separate form elements
* diff view: get the names of those authors and licenses and somehow include them in the diff view. This could be done a bit like AbuseFilter's diff between filter versions (ie. before Line 1, would be Author and License)
* saving form: convert back to {{#parserfunction:}} calls and prepend them to the wikitext
* action=raw: ?
* action=render: ?
* api-parse: ?
Right now I think storing it in wikitext and customizing it everywhere like shown above is not worth the trouble and would likely bring its own troubles. Keeping it separate from wikitext is more work once but I think it pays off. But again, nothing is final yet. Everything is possible. -- Krinkle
[1]: The usage count (mw_license.lic_count) is a bit like edit count (increased/decreased when saving files)
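
The "edit form" and "saving form" steps in the list above amount to a round trip between wikitext and separate fields. The following Python sketch shows one way that could look; it is an assumption-laden illustration, not the planned implementation. The regular expression, the numeric-id syntax, and the names `extract_props` and `merge_props` are all hypothetical.

```python
import re

# Assumed syntax: {{#fileauthor:ID}} or {{#filelicense:ID}} with a numeric id.
PROP_CALL = re.compile(r"\{\{#(fileauthor|filelicense):\s*(\d+)\s*\}\}\s*")


def extract_props(wikitext):
    """Edit form step: pull the calls out of the wikitext into separate fields."""
    props = {"fileauthor": [], "filelicense": []}
    for name, value in PROP_CALL.findall(wikitext):
        props[name].append(int(value))
    body = PROP_CALL.sub("", wikitext)  # what the main edit box would show
    return props, body


def merge_props(props, body):
    """Saving step: convert the fields back to calls and prepend them."""
    calls = [
        "{{#" + name + ":" + str(value) + "}}"
        for name in ("fileauthor", "filelicense")
        for value in props[name]
    ]
    return "\n".join(calls + [body]) if calls else body


if __name__ == "__main__":
    text = "{{#fileauthor:10}} {{#filelicense:12}}\n== Description ==\nA photo."
    props, body = extract_props(text)
    print(props)
    print(merge_props(props, body))
```

Even this toy version has to decide where the calls go on save and what happens to whitespace around them, which hints at the per-interface customization (diff view, action=raw, action=render, API) the message argues is not worth the trouble.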
Re: [Wikitech-l] user email validation ready
It would seem that the bug https://bugzilla.wikimedia.org/show_bug.cgi?id=23710 would fall under that category; note that it is still marked as new. Can it be tied to this process? Regards, Andrew
Re: [Wikitech-l] user email validation ready
Brion Vibber wrote: [...] if we can find any that don't validate but *do* work (eg they've been confirmed as functional) that's info we need to report upstream as well [...]
The original spec had feedback based precisely on enwiki numbers. http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/00.html So about 100? Note that there are invalid addresses marked as confirmed in Wikipedia.
Re: [Wikitech-l] File licensing information support
Krinkle wrote: [...] I.m.h.o The solution is not to store the actual names in wikitext so that the diffs are better, but to either not store it in wikitext at all, or customize the behaviour everywhere:
Why? Storing the property filelicense: GPL directly in wikitext is not bad. It's also a relief when we want to delete licenses later. Same with Author. 
Take that as a key into an NS_AUTHOR namespace. Going to Special:LicenseManager/5 in order to change GPL license data is just added complexity over using the short name GPL.
Re: [Wikitech-l] user email validation ready
On Mon, Jan 24, 2011 at 3:50 PM, Billinghurst billinghu...@gmail.com wrote: It would seem that the bugzilla https://bugzilla.wikimedia.org/show_bug.cgi?id=23710 would fall under that category, and to note that it is still marked as new. Can it be tied to this process?
That's an issue about clickable links in the body of outgoing mails generated by the system, and is not related to the format or validation of email addresses. It should be addressed (either by ensuring that links inserted into email are escaped clearly, or that they're arranged nicely in brackets that email clients commonly understand as delimiters, or by supplementing the plaintext emails with HTML emails that can mark their links explicitly) but is an entirely separate issue. -- brion
Re: [Wikitech-l] user email validation ready
On Mon, Jan 24, 2011 at 4:02 PM, Platonides platoni...@gmail.com wrote: The original spec had feedback based precisely on enwiki numbers. http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/00.html So about 100? Note that there are invalid addresses marked as confirmed in wikipedia.
Ok so from the breakdown at http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/022237.html with 202 email address records that were marked as confirmed, but failed the proposed validation check at the time and couldn't be corrected by stripping whitespace: The breakdown of the 202 is as follows. Reordered into:
Now allowed by the current revision of the HTML 5 spec as implemented in User::isValidEmailAddr:
* Single trailing dot in local part: 40 (prohibited by RFC but plausibly deliverable)
* Multiple consecutive dots: 20 (prohibited by RFC but plausibly deliverable)
Easily correctable by the user removing the extra bits upon being prompted, as doing so would not change the actual delivery:
* Single trailing dot in domain part: 100 (prohibited by RFC but plausibly deliverable)
* Valid address in angle brackets (with other junk around it): 21 (permitted by RFC, kind of, and plausibly deliverable)
* Comment: 3 (permitted by RFC and plausibly deliverable)
v LINE OF DOOM ---v
Clearly wrong in typical context, should indeed be rejected (or changed to @localhost for legit cases):
* No @: 9 (unlikely to be deliverable)
Not quite sure what's going on but most look like stray chars that would be ignored or else invalid and possibly bogusly marked as confirmed:
* Miscellaneous: 9 (one containing [NO]@[SPAM], two with trailing , one in quotes, one with single leading dot in local part, two with single leading comma in local part, one with leading : , one with leading \)
So from the August 2009 survey on English Wikipedia, that leaves 18 email addresses out of over 3 million listed as confirmed, of which a few *might* be deliverable addresses that could not be fixed by the user tweaking them during input (ie, they actually rely on those extra chars being there in order to be delivered to the right person).
To me it sounds like we're pretty good with this; it wouldn't hurt to make sure that existing addresses that are stored funny (eg with extra whitespace or trailing dots on the domain name) continue to work as they have previously. Also wouldn't hurt to do a current survey, and to include some other language sites.
Of interest -- gmail's validation rules were also posted in that thread: http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/022268.html -- brion
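
The arithmetic behind "that leaves 18" can be checked with a short tally. The counts below are copied from the breakdown in this message; the group labels are shorthand, and `CONFIRMED_ADDRESSES` uses 3,000,000 as an assumed lower bound for "over 3 million", so the resulting share is an upper bound.

```python
from collections import defaultdict

# (group, category, count) as listed in the message above.
SURVEY = [
    ("allowed by current spec", "Single trailing dot in local part", 40),
    ("allowed by current spec", "Multiple consecutive dots", 20),
    ("correctable by user", "Single trailing dot in domain part", 100),
    ("correctable by user", "Valid address in angle brackets", 21),
    ("correctable by user", "Comment", 3),
    ("rejected", "No @", 9),
    ("rejected", "Miscellaneous", 9),
]

# Assumption: "over 3 million" taken as exactly 3,000,000 (a lower bound).
CONFIRMED_ADDRESSES = 3_000_000

group_totals = defaultdict(int)
for group, _category, count in SURVEY:
    group_totals[group] += count

total = sum(group_totals.values())
rejected = group_totals["rejected"]
rejected_share_percent = 100 * rejected / CONFIRMED_ADDRESSES

print(dict(group_totals))
print(total, rejected, f"{rejected_share_percent:.4f}%")
```

The seven categories sum to the 202 records in the survey, and the two groups below the "line of doom" account for the 18 addresses, at most roughly 0.0006% of confirmed addresses.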