Re: [Wikitech-l] Changes to the new installer

2010-07-21 Thread Jeroen De Dauw
Hey,

Unless the installer needs to be ready within a week for 1.17 I don't see
any issues.

I want to make structural changes, not add new features. The sooner these
changes are made, the less overall work my GSoC project will require.

As I'll be doing all the work on these changes, and am not skipping any
other work on the new installer to do so, the progress on the new installer
should not be impacted negatively.

Cheers

--
Jeroen De Dauw
* http://blog.bn2vs.com
* http://wiki.bn2vs.com
Don't panic. Don't be evil. 50 72 6F 67 72 61 6D 6D 69 6E 67 20 34 20 6C 69
66 65!
--


On 21 July 2010 04:40, Tim Starling tstarl...@wikimedia.org wrote:

 On 20/07/10 19:28, Jeroen De Dauw wrote:
  Hey,
 
  Basically splitting core-specific stuff from general installer functionality
  (so the general stuff can also be used for extensions), and taking initial
  steps towards making filesystem upgrades possible.

  The point of this mail is not discussing what I want to do though, but
  rather avoiding commit conflicts, as I don't know which people are working
  on the code right now, and who has uncommitted changes.

 There's still quite a lot of work to do to get the new installer ready
 for 1.17. I think we should focus on that, and avoid expanding the
 scope of the project until we've reached that milestone.

 There are the issues discussed here:

 http://www.mediawiki.org/wiki/New-installer_issues

 and more will become apparent as more testing is done.

 If the new installer is not ready to replace the old installer when it
 comes time to branch 1.17, I will move it out of trunk, back to a
 development branch. Hopefully that won't be necessary.

 -- Tim Starling


 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l



Re: [Wikitech-l] CodeReview auto-deferral regexes

2010-07-21 Thread Jeroen De Dauw
Hey,

Is someone planning on doing this? If not, who can do it? The sooner it's
there, the better.

Cheers

--
Jeroen De Dauw
* http://blog.bn2vs.com
* http://wiki.bn2vs.com
Don't panic. Don't be evil. 50 72 6F 67 72 61 6D 6D 69 6E 67 20 34 20 6C 69
66 65!
--


On 20 July 2010 17:20, Max Semenik maxsem.w...@gmail.com wrote:

 On 20.07.2010, 17:20 Chad wrote:

  On Tue, Jul 20, 2010 at 9:16 AM, Jeroen De Dauw jeroended...@gmail.com
 wrote:
  Hey,
 
  About the semantic extensions:
 
  It would actually be nice if they did not get marked as deferred at all,
  and were instead reviewed by people who are familiar with them to some
  extent. I'm willing to do that for all commits not made by myself. Assuming
  this would not interfere too much with WMF code review, of course :)
 
  Cheers
 

  If someone's going to start doing code review, that's fine. They've
  just all been getting deferred because nobody's been reviewing
  them so far.

 We could create a separate review queue for it.


 --
 Best regards,
  Max Semenik ([[User:MaxSem]])



Re: [Wikitech-l] CodeReview auto-deferral regexes

2010-07-21 Thread Jeroen De Dauw
Hey,

I'm also fine either way. So if no separate queue is set up, I'd appreciate
it if the semantic* commits were not marked as deferred from now on.

Cheers

--
Jeroen De Dauw
* http://blog.bn2vs.com
* http://wiki.bn2vs.com
Don't panic. Don't be evil. 50 72 6F 67 72 61 6D 6D 69 6E 67 20 34 20 6C 69
66 65!
--


On 21 July 2010 14:47, Chad innocentkil...@gmail.com wrote:

 On Wed, Jul 21, 2010 at 8:38 AM, Roan Kattouw roan.katt...@gmail.com
 wrote:
  2010/7/21 Chad innocentkil...@gmail.com:
  I'm also not sure how Code Review will handle a repository covering a
  subset of another repository. I'm pretty sure things will be OK; I imagine
  it would just duplicate data (revs for SMW stuff would be imported for
  both repos). Still, it should be tested first.
  Then we would need someone with repoadmin rights to set this up; I believe
  Brion or Tim can.
  Then we would need someone with repoadmin rights to set this up, I
  believe Brion or
  Tim can.
 
  Why would you want to do this? With the path search feature, it's
  extremely easy to pull up a list of revs touching a certain extension.
  I really don't see why the SMW review queue has to be separate from
  the main MW review queue on a technical level; of course it would be
  on a personal level, in that different people review different things,
  but we have that already for e.g. UsabilityInitiative. In practical
  terms, people who are familiar with the SMW codebase would start
  reviewing SMW revisions through our existing CodeReview setup, and the
  only thing we would have to do on a technical level is make sure those
  paths don't get auto-deferred.
 
  Roan Kattouw (Catrope)
 
 

 I agree with you here. They were just suggesting another route.
 Honestly, I don't really care either way :) The fix in r69675 is
 generally useful though, if repositories were segmented in that
 manner.

 -Chad




Re: [Wikitech-l] CodeReview auto-deferral regexes

2010-07-21 Thread Jeroen De Dauw
Hey,

The 'semantic extensions' include Validator and Maps, as they are the base
for Semantic Maps, so these should also not get deferred.

Cheers

--
Jeroen De Dauw
* http://blog.bn2vs.com
* http://wiki.bn2vs.com
Don't panic. Don't be evil. 50 72 6F 67 72 61 6D 6D 69 6E 67 20 34 20 6C 69
66 65!
--


On 21 July 2010 14:54, Jeroen De Dauw jeroended...@gmail.com wrote:

 Hey,

 I'm also fine either way. So if no separate queue is set up, I'd
 appropriate it if the semantic* commits where not marked as deferred from
 now on.

 Cheers

 --
 Jeroen De Dauw
 * http://blog.bn2vs.com
 * http://wiki.bn2vs.com
 Don't panic. Don't be evil. 50 72 6F 67 72 61 6D 6D 69 6E 67 20 34 20 6C 69
 66 65!
 --



Re: [Wikitech-l] Upload file size limit

2010-07-21 Thread Platonides
Tim Starling wrote:
 The problem is just that increasing the limits in our main Squid and
 Apache pool would create DoS vulnerabilities, including the prospect
 of accidental DoS. We could offer this service via another domain
 name, with a specially-configured webserver, and a higher level of
 access control compared to ordinary upload to avoid DoS, but there is
 no support for that in MediaWiki.
 
 We could theoretically allow uploads of several gigabytes this way,
 which is about as large as we want files to be anyway. People with
 flaky internet connections would hit the problem of the lack of
 resuming, but it would work for some.
 
 -- Tim Starling

I don't think it would be a problem for MediaWiki if we wanted to go
this route. There could be e.g. http://upload.en.wikipedia.org/ which
redirected all wiki pages except Special:Upload to http://en.wikipedia.org/

The normal Special:Upload would need a redirect there, for accesses
not going via $wgUploadNavigationUrl, but that's a couple of lines.

Having the normal Apaches handle uploads instead of a dedicated pool has
some issues, including the DoS you mention, full /tmp partitions, needing
write access to storage via NFS...




Re: [Wikitech-l] Changes to the new installer

2010-07-21 Thread Platonides
Tim Starling wrote:
 There's still quite a lot of work to do to get the new installer ready
 for 1.17. I think we should focus on that, and avoid expanding the
 scope of the project until we've reached that milestone.
 
 There are the issues discussed here:
 
 http://www.mediawiki.org/wiki/New-installer_issues
 
 and more will become apparent as more testing is done.
 
 If the new installer is not ready to replace the old installer when it
 comes time to branch 1.17, I will move it out of trunk, back to a
 development branch. Hopefully that won't be necessary.
 
 -- Tim Starling

We should probably ship both installers in 1.17. I wouldn't be surprised
if some odd configurations in the wild made the new one not work.




Re: [Wikitech-l] Take me back too hip

2010-07-21 Thread Aryeh Gregor
On Tue, Jul 20, 2010 at 11:03 PM, James Salsman jsals...@gmail.com wrote:
 May I suggest "Use legacy interface" or "Abandon new interface"?

Or just get rid of it entirely.  At this point, it's been the default
skin for some time, and almost anyone who wants to switch back will
have done so.



Re: [Wikitech-l] CodeReview auto-deferral regexes

2010-07-21 Thread Aryeh Gregor
It strikes me that a better solution is to fix whatever tools we're
using to determine what still needs to be reviewed.  If someone is
checking all revisions marked as "new" and needs to mark things they
won't review as "deferred" to get them off the list, maybe they should
instead be checking all revisions marked as "new" from particular
paths.  Then explicit deferral will not be necessary, and projects
like SMW can go ahead and use Code Review at their own pace without
annoying anyone else.



Re: [Wikitech-l] Upload file size limit

2010-07-21 Thread Aryeh Gregor
On Wed, Jul 21, 2010 at 12:31 AM, Neil Kandalgaonkar
ne...@wikimedia.org wrote:
 Here's a demo which implements an EXIF reader for JPEGs in Javascript,
 which reads the file as a stream of bytes.

   http://demos.hacks.mozilla.org/openweb/FileAPI/

 So, as you can see, we do have a form of BLOB access.

But only by reading the whole file into memory, right?  That doesn't
adequately address the use-case we're discussing in this thread
(uploading files > 100 MB in chunks).


Re: [Wikitech-l] Upload file size limit

2010-07-21 Thread Mark A. Hershberger
Michael Dale md...@wikimedia.org writes:

 * Modern HTML5 browsers are starting to be able to natively split files
 up into chunks and do separate 1 MB XHR posts. The Firefogg extension does
 something similar with extension JavaScript.

Could you point me to the specs that the html5 browsers are using?
Would it be possible to just make Firefogg mimic this same protocol for
pre-html5 Firefox?

 * We should really get chunk uploading reviewed and deployed. Tim
 expressed some concerns with the chunk-uploading protocol, which we
 addressed client side, but I don't think he had time to follow up on the
 proposed changes that we made for the server API.

If you can point me to Tim's proposed server-side changes, I'll have a
look.

Mark.

-- 
http://hexmode.com/

Embrace Ignorance.  Just don't get too attached.



Re: [Wikitech-l] Upload file size limit

2010-07-21 Thread Aryeh Gregor
On Wed, Jul 21, 2010 at 11:19 AM, Mark A. Hershberger m...@everybody.org 
wrote:
 Could you point me to the specs that the html5 browsers are using?
 Would it be possible to just make Firefogg mimic this same protocol for
 pre-html5 Firefox?

The relevant spec is here:

http://www.w3.org/TR/FileAPI/

Firefox 3.6 doesn't implement it exactly, since it was changed after
Firefox's implementation, but the changes should mostly be compatible
(as I understand it).  But it's not good enough for large files, since
it has to read them into memory.

But anyway, what's the point in telling people to install an extension
if we can just tell them to upgrade Firefox?  Something like
two-thirds of our Firefox users are already on 3.6:

http://stats.wikimedia.org/wikimedia/squids/SquidReportClients.htm



Re: [Wikitech-l] CodeReview auto-deferral regexes

2010-07-21 Thread Roan Kattouw
2010/7/21 Aryeh Gregor simetrical+wikil...@gmail.com:
 It strikes me that a better solution is to fix whatever tools we're
 using to determine what still needs to be reviewed.  If someone is
 checking all revisions marked as new and needs to mark things they
 won't review as deferred to get them off the list, maybe they should
 instead be checking all revisions marked as new from particular
 paths.  Then explicit deferral will not be necessary, and projects
 like SMW can go ahead and use Code Review at their own pace without
 annoying anyone else.

As far as I know, this is exactly what happens in reality.

As I discussed with a few others at Wikimania, it'd be nice to take
this one step further and allow multiple people to sign off on a
revision, possibly with various types of sign-off, like:
* I read the diff and it looks good
* I tested this and it seems to work
* I reviewed the niche part of this rev that I'm an expert on
* I am Tim Starling and I approve this message^Hrevision
* ...
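A sketch of how such multiple sign-offs might be modeled; the kind names
and the whole data model here are hypothetical, just to make the idea
concrete (this is not CodeReview's actual schema):

```python
# Hypothetical model: a revision accumulates sign-offs of different kinds
# from different reviewers, instead of carrying a single status flag.
from collections import defaultdict

SIGNOFF_KINDS = {"inspected", "tested", "partial", "approved"}

class RevisionSignoffs:
    def __init__(self):
        self.by_rev = defaultdict(list)  # rev id -> [(user, kind), ...]

    def sign_off(self, rev, user, kind):
        if kind not in SIGNOFF_KINDS:
            raise ValueError("unknown sign-off kind: %s" % kind)
        self.by_rev[rev].append((user, kind))

    def status(self, rev):
        """Collapse the sign-offs into a single coarse status."""
        kinds = {k for _, k in self.by_rev[rev]}
        if "approved" in kinds:
            return "ok"
        return "reviewed" if kinds else "new"
```

The point of the collapse in status() is that existing "new"/"ok" queries
could keep working while the richer per-reviewer detail is stored alongside.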

Roan Kattouw (Catrope)



Re: [Wikitech-l] Upload file size limit

2010-07-21 Thread Michael Dale
On 07/20/2010 10:24 PM, Tim Starling wrote:
 The problem is just that increasing the limits in our main Squid and
 Apache pool would create DoS vulnerabilities, including the prospect
 of accidental DoS. We could offer this service via another domain
 name, with a specially-configured webserver, and a higher level of
 access control compared to ordinary upload to avoid DoS, but there is
 no support for that in MediaWiki.

 We could theoretically allow uploads of several gigabytes this way,
 which is about as large as we want files to be anyway. People with
 flaky internet connections would hit the problem of the lack of
 resuming, but it would work for some.

Yes, in theory we could do that ... or we could support some simple chunked
uploading protocol for which there is *already* basic support written, and
which will be supported in native JS over time.

The firefogg protocol is almost identical to the plupload protocol. The
main difference is that firefogg requests a unique upload parameter / URL
back from the server, so that uploads of identically named files would not
mangle the chunking. From a quick look at plupload's upload.php, it appears
plupload relies on the filename plus an extra chunk request parameter
(chunk != 0). The other difference is that firefogg sends an explicit
done=1 request parameter to signify the end of the chunks.

We requested feedback on adding a chunk id to the firefogg chunk protocol
with each posted chunk, to guard against cases where the outer caches
report an error but the backend got the file anyway. This way the backend
can check the chunk index and not append the same chunk twice, even if
there are errors at other levels of the server response that cause the
client to resend the same chunk.

Either way, if Tim says that the plupload chunk protocol is superior, then
why discuss it? We can easily shift the chunks API to that and *move
forward* with supporting larger file uploads. Is that at all agreeable?
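The chunk-index safeguard described above can be sketched as follows; all
names here are illustrative, not the actual firefogg or plupload API:

```python
# Sketch: the server tracks the next expected chunk index and acknowledges
# (but does not re-append) a resent chunk, so a cache-level error that makes
# the client retry cannot write the same bytes twice.

class ChunkedUploadSession:
    def __init__(self):
        self.parts = []          # received chunk payloads, in order
        self.expected_index = 0  # next chunk index we will accept

    def receive_chunk(self, index, data, done=False):
        """Append a chunk; silently acknowledge duplicates of an old index."""
        if index < self.expected_index:
            return {"ok": True, "duplicate": True}   # already have it
        if index > self.expected_index:
            return {"ok": False, "error": "gap"}     # client must resend
        self.parts.append(data)
        self.expected_index += 1
        return {"ok": True, "done": done}

    def assembled(self):
        return b"".join(self.parts)
```

A retried chunk 0 after a spurious cache error is simply acknowledged, and
the assembled file still contains each chunk exactly once.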

peace,
--michael



Re: [Wikitech-l] CodeReview auto-deferral regexes

2010-07-21 Thread Aryeh Gregor
On Wed, Jul 21, 2010 at 12:05 PM, Roan Kattouw roan.katt...@gmail.com wrote:
 As far as I know, this is exactly what happens in reality.

Then why do we need auto-deferral?  Just let the things we don't care
about stay "new" forever.

 As I discussed with a few others at Wikimania, it'd be nice to take
 this one step further and allow multiple people to sign off on a
 revision, possibly with various types of sign-off, like:
 * I read the diff and it looks good
 * I tested this and seems to work
 * I reviewed the niche part of this rev that I'm an expert on
 * I am Tim Starling and I approve this message^Hrevision
 * ...

I think this is a good idea.  For simplicity, I'd keep it to one
level, at least at first.  The understanding should be that you
mark a revision reviewed if you're confident it's correct; if obvious
errors crop up later, people will informally give less weight to your
reviews.  Whether you tested it or just reviewed the diff should be up
to you -- whatever you think it needs.



Re: [Wikitech-l] Take me back too hip

2010-07-21 Thread Andre Engels
On Wed, Jul 21, 2010 at 5:30 PM, Mike.lifeguard
mike.lifegu...@gmail.com wrote:

 Aryeh Gregor wrote:
 Or just get rid of it entirely.  At this point, it's been the default
 skin for some time, and almost anyone who wants to switch back will
 have done so.

 Are all wikis migrated? Maybe it has been the default for enwiki for a
 while, but I'm not sure about most of our other wikis.

The bigger wikis switched some time ago, although later than en:, but
the smaller ones still have MonoBook as the default.

-- 
André Engels, andreeng...@gmail.com


[Wikitech-l] Salutations From a Computer Science Student

2010-07-21 Thread David Breneisen
Ahoy there,

My name is David Breneisen.  I was referred here by James Alexander.  I'm a
Comp. Sci. student at George Washington University and have had an interest
in open education web development for the last few years.  I thought that I
might be able to offer technical services for Wikiversity
development/maintenance while getting some experience working on larger,
real projects.

I also hope to see if it is possible to do a more formal summer internship
with Wikimedia after this school year, and thought it would be nice to get
used to the overall manner in which Wikimedia design/development goes.

Regards,
David Breneisen


Re: [Wikitech-l] Salutations From a Computer Science Student

2010-07-21 Thread Aryeh Gregor
On Wed, Jul 21, 2010 at 1:32 PM, David Breneisen d...@gwmail.gwu.edu wrote:
 My name is David Breneisen.  I was referred here by James Alexander.  I'm a
 Comp. Sci. student at George Washington University and have had an interest
 in open education web development for the last few years.  I thought that I
 might be able to offer technical services for the Wikiversity
 development/maintenance while getting some experience working on larger,
 real, projects.

 I also hope to see if it is possible to do a more formal summer internship
 after this school year with Wikimedia, and thought it would be nice to get
 used to the overall manner in which Wikimedia design/development goes.

This list, and irc://irc.freenode.net/mediawiki, are good places to
lurk and get to know people.  The source code for MediaWiki proper can
be obtained with:

svn co http://svn.wikimedia.org/mediawiki/trunk/phase3

You can look for bugs to fix, and submit patches, at
https://bugzilla.wikimedia.org/.  More info is at
http://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker.  If
you have any questions, here or IRC is the best place to ask.


Re: [Wikitech-l] Upload file size limit

2010-07-21 Thread Aryeh Gregor
On Wed, Jul 21, 2010 at 2:05 PM, Aryeh Gregor
simetrical+wikil...@gmail.com wrote:
 This is the right place to bring it up:

 http://lists.w3.org/Archives/Public/public-webapps/

 I think the right API change would be to just allow slicing a Blob up
 into other Blobs by byte range.  It should be simple to both spec and
 implement.  But it might have been discussed before, so best to look
 in the archives first.

Aha, I finally found it.  It's in the spec already:

http://dev.w3.org/2006/webapi/FileAPI/#dfn-slice

So once you have a File object, you should be able to call
file.slice(pos, 1024*1024) to get a Blob object that's 1024*1024 bytes
long starting at pos.  Of course, this surely won't be reliably
available in all browsers for several years yet, so best not to pin
our hopes on it.  Chrome apparently implements some or all of the File
API in version 6, but I can't figure out if it includes this part.
Firefox doesn't yet according to MDC.


[Wikitech-l] Architectural revisions to improve category sorting

2010-07-21 Thread Aryeh Gregor
I'm going to begin working on the following bugs:

* Support collation by a certain locale (sorting order of
characters), https://bugzilla.wikimedia.org/show_bug.cgi?id=164 (only
parts related to category sorting)
* Subcategory paging is not separate from article or image paging,
https://bugzilla.wikimedia.org/show_bug.cgi?id=1211
* CategoryTree is inefficient,
https://bugzilla.wikimedia.org/show_bug.cgi?id=23682

As well as possibly:

* Categories need to be structured by namespace,
https://bugzilla.wikimedia.org/show_bug.cgi?id=450
* Natural number sorting in category listings,
https://bugzilla.wikimedia.org/show_bug.cgi?id=6948

There are essentially two problems here:

1) We currently sort articles on category pages by the Unicode code
point of their sort key.  This is terrible for anything other than
English, and dodgy sometimes even for English.  (This is bugs 164 and
6948.)

2) We have no way to efficiently get all items that are in a category
and also in a particular namespace.  Particularly, we can't retrieve
all subcategories without scanning all items in the category, which is
inefficient when we have a few (or no) subcategories and tons of
items.  (This is bugs 1211, 23682, and 450.)

One part of (2) needs to be clarified.  The primary use-case is
obviously that we want to be able to count subcategories efficiently,
or display all of them when we only display some of the items in the
category: this is bugs 1211 and 23682.  Secondarily, we have a request
at bug 450 to organize category pages by namespace, so main, Talk:,
User:, etc. are all paginated separately.

I think the goal for (2) should be to allow efficient separate
retrieval of subcategories, files, and other pages, but not to
distinguish between namespaces otherwise.  The major motivation is
that to do this efficiently, we'll need to add namespace info to the
categorylinks table, and we want this to stay consistent with the info
in the page table.  Categories, files, and other types of pages cannot
be moved to one another, as far as I know (it would hardly make
sense), so it automatically stays consistent this way.  This is a big
plus, because there are inevitably bugs that cause denormalized data
to fall out of sync (look at cat_pages).

Furthermore, I don't think it's obvious that we want separate
namespaces to display separately at all on category pages.  What's a
case where that would be desired?  It would break up the display a
lot, with a bunch of separate headers for different namespaces, when
each namespace might only have a few items.  Most categories whose
sort appearance you'd care about (i.e., excepting maintenance
categories) will have nearly everything in one namespace anyway.  You
could always split the category into separate ones per namespace if
you want them separate.

So I propose that we keep the current category/normal page/file split,
and paginate those three parts of the page separately.  So you'd have
up to 200 subcategories, then below that up to 200 normal pages, then
below that up to 200 files.  (The numbers could be adjusted.
Currently they're hardcoded, which is stupid.)  Paginating
subcategories separately is obviously needed.  Paginating files
separately is not really needed, but it would be much more consistent.
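The three independently paginated sections could then be served by three
cheap range queries over a type column; a sketch under an illustrative
schema (not the real MediaWiki categorylinks layout):

```python
# Sketch: with a type column in categorylinks, subcategories, pages and
# files page independently via simple indexed range scans.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE categorylinks (
    cl_from INTEGER, cl_to TEXT, cl_type TEXT, cl_sortkey TEXT)""")
db.executemany("INSERT INTO categorylinks VALUES (?,?,?,?)", [
    (1, "Birds", "subcat", "Cowls"),
    (2, "Birds", "page",   "Psparrow"),
    (3, "Birds", "page",   "Pswift"),
    (4, "Birds", "file",   "Fowl.jpg"),
])

def section(cl_type, after="", limit=200):
    """One independently paginated section of a category page."""
    return db.execute(
        "SELECT cl_sortkey FROM categorylinks "
        "WHERE cl_to=? AND cl_type=? AND cl_sortkey>? "
        "ORDER BY cl_sortkey LIMIT ?",
        ("Birds", cl_type, after, limit)).fetchall()
```

Each section keeps its own "after" offset, so paging through hundreds of
files never forces a scan past thousands of regular pages.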

The overall solution, then, would be:

1) Change the way category sortkeys are generated.  Start them with a
letter depending on namespace, like "C" for category, "P" for regular
page, "F" for file.  After that first letter, append a sortkey
generated by ICU or whatever.  I think Tim has opinions on what would
be a good choice for converting the article title into a sort key -- if
not, I'll have to research it and hopefully not come up with a
completely incorrect answer.

2) On category pages, maintain three offsets and do three queries (or
maybe UNION them together, doesn't matter), one for each of
categories/regular pages/files.  Because of (1), this will be
efficient and will also sort less unreasonably for non-English
languages.
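Step (1) can be sketched as follows; the accent folding here is only a
crude stand-in for a real ICU collation key, and the prefix letters match
the proposal above:

```python
# Sketch: namespace prefix letter + collated title. The unicodedata-based
# folding is NOT real collation, just a stand-in to show the shape of the key.
import unicodedata

PREFIX = {"category": "C", "file": "F", "page": "P"}

def make_sortkey(kind, title):
    # NFKD-decompose, drop combining marks, casefold: a crude ICU substitute.
    folded = "".join(c for c in unicodedata.normalize("NFKD", title)
                     if not unicodedata.combining(c)).casefold()
    return PREFIX[kind] + folded

entries = [("page", "Álgebra"), ("category", "Zoo"), ("page", "Apple")]
ordered = sorted(entries, key=lambda e: make_sortkey(*e))
```

With this key shape, all categories sort before all pages regardless of
title ("C" < "P"), and within a section "Álgebra" lands next to "Apple"
instead of after "Zoo".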

One problem that was pointed out somewhere in the massive useless
discussion on bug 164 is that we'd have to do something to display the
first letter for each section.  Currently it's just the first letter
of the sortkey, but if that's some binary string, that becomes a
problem.  I'm not seeing an obvious solution, since the
sortkey-generation algorithm will be opaque to us.  If it sorts Á the
same as A, then how do we figure out that the canonical first letter
for the section should be A and not Á?  How do we even figure out
where the sections begin or end?  Would that even make sense in all
cases?  At a first pass, I'd say we should just skip the first letter
and display all the items straight from beginning to end without
section divisions.  I don't think that's a big problem.

This is just my initial thoughts.  Feedback appreciated.  If people
agree with the general approach, I can start coding this up tomorrow.


Re: [Wikitech-l] CodeReview auto-deferral regexes

2010-07-21 Thread Platonides
Aryeh Gregor wrote:
 As I discussed with a few others at Wikimania, it'd be nice to take
 this one step further and allow multiple people to sign off on a
 revision, possibly with various types of sign-off, like:
 * I read the diff and it looks good
 * I tested this and seems to work
 * I reviewed the niche part of this rev that I'm an expert on
 * I am Tim Starling and I approve this message^Hrevision
 * ...
 
 I think this is a good idea.  For simplicity, I'd keep it to one
 level, at least at first.  The understanding should be that you should
 mark it reviewed if you're confident it's correct, and if obvious
 errors crop up later, it means people will informally give less weight
 to your review.  Whether you tested it or just reviewed the diff
 should be up to you -- whatever you think it needs.

It's not the same; you can have different standards. I see revisions
that are apparently good, but marking one as ok means "this revision is
right in my book", which in many cases would require actually testing it,
checking the spec, and so on, which I (lazily) don't do. So it stays
marked as new instead of as lightly reviewed.




Re: [Wikitech-l] CodeReview auto-deferral regexes

2010-07-21 Thread Aryeh Gregor
On Wed, Jul 21, 2010 at 5:12 PM, Platonides platoni...@gmail.com wrote:
 It's not the same; you can have different standards. I see revisions
 that are apparently good, but marking one as ok means "this revision is
 right in my book", which in many cases would require actually testing it,
 checking the spec, and so on, which I (lazily) don't do. So it stays
 marked as new instead of as lightly reviewed.

Is it useful to know that something is lightly reviewed?



Re: [Wikitech-l] Architectural revisions to improve category sorting

2010-07-21 Thread Conrad Irwin
On 21 July 2010 14:49, Roan Kattouw roan.katt...@gmail.com wrote:
 2010/7/21 Aryeh Gregor simetrical+wikil...@gmail.com:

  Note that different languages will want different orders. For
  instance, German generally sorts ä as ae, ö as oe and ü as ue, whereas
  Swedish sorts å, ä and ö at the end of the alphabet (so they
  actually say A, B, C, ... Z, Å, Ä, Ö and use the phrase "from A to
  Ö"). These collation schemes obviously conflict in their handling of ä
  and ö, and I'm sure there's crazier stuff out there.

 This could be solved by having a different collation scheme for each
 content language (these have to be standardized *somewhere*, right?)
 and using {{DEFAULTSORT:}} for those rare cases where you have an
 article about a German person on a non-German wiki and want it to sort
 the German way.

For Wiktionary, every language is included in one wiki (and even on
one page) -- it would be phenomenal to be able to select the collation
per category, as per-page or per-wiki settings will not help very much
at all.


 2) On category pages, maintain three offsets and do three queries (or
 maybe UNION them together, doesn't matter),
 In my personal opinion, UNION makes zero sense because you'd have to
 pull the data apart again after querying it, as you're displaying it
 separately as well. Separate queries are much cleaner in this case.

 One problem that was pointed out somewhere in the massive useless
 discussion on bug 164 is that we'd have to do something to display the
 first letter for each section.  Currently it's just the first letter
 of the sortkey, but if that's some binary string, that becomes a
 problem.  I'm not seeing an obvious solution, since the
 sortkey-generation algorithm will be opaque to us.  If it sorts Á the
 same as A, then how do we figure out that the canonical first letter
 for the section should be A and not Á?  How do we even figure out
 where the sections begin or end?  Would that even make sense in all
 cases?  At a first pass, I'd say we should just skip the first letter
 and display all the items straight from beginning to end without
 section divisions.  I don't think that's a big problem.

 I agree that the first-letter thing is a nice-to-have, but I'm more
 worried about the general problem that sortkeys won't be
 human-readable strings anymore (the API currently displays them and,
 obviously, uses them for paging) nor possible to decode into
 human-readable strings (because the encoding essentially loses
 information when e.g. a and á are folded). It would be nice if we
 could store the original, unmunged sortkey in the categorylinks table,
 although I realize that would eat space for display and debugging
 purposes only.

There is no way to go from the sort key to the first letter, and
indeed, you can't even put the first letter at the start of the sort
key, as you need to sort the sections differently per language. The
solution I use for generating the indices on Wiktionary is to store
the first letter explicitly (either of the page title or of the
user-provided sort key, before they are fed into ICU). This would (in
the future) allow topical categories, but that's just a distraction for
now.
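Conrad's store-the-first-letter-explicitly approach can be sketched like
this; the folding function is only a stand-in for ICU, and all names are
illustrative:

```python
# Sketch: keep the first letter of the raw title (or user sortkey) *before*
# it goes through the collator, alongside the opaque collation key, so
# section headings can be rebuilt later even though the key is unreadable.
import unicodedata

def fold(text):
    # Stand-in for an ICU collation key.
    return "".join(c for c in unicodedata.normalize("NFKD", text)
                   if not unicodedata.combining(c)).casefold()

def index_entry(title, user_sortkey=None):
    raw = user_sortkey or title
    return {"first": raw[0].upper(), "key": fold(raw), "title": title}

entries = sorted((index_entry(t) for t in ["Álgebra", "apple", "Zebra"]),
                 key=lambda e: e["key"])
headings = [e["first"] for e in entries]
```

Note how "Álgebra" still sorts among the A's via the folded key, yet its
stored heading letter remains "Á", which the collation key alone could
never recover.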

Conrad



Re: [Wikitech-l] Architectural revisions to improve category sorting

2010-07-21 Thread Aryeh Gregor
On Wed, Jul 21, 2010 at 5:49 PM, Roan Kattouw roan.katt...@gmail.com wrote:
 This is true for categories but not for files:
 http://www.mediawiki.org/w/index.php?title=Special:Log&dir=prev&offset=20091202100459&limit=2&type=move&user=Catrope

Blech.  Does this make any sense?  Can we change it?  It would simplify
things considerably.

 Note that different languages will want different orders. For
 instance, German generally sorts ä as ae, ö as oe and ü as ue, whereas
 Swedish sorts å, ä and ö at the end of the alphabet (so they
 actually say A, B, C, ... Z, Å, Ä, Ö and use the phrase from A to
 Ö). These collation schemes obviously conflict in their handling of ä
 and ö, and I'm sure there's crazier stuff out there.

 This could be solved by having a different collation scheme for each
 content language (these have to be standardized *somewhere*, right?)
 and using {{DEFAULTSORT:}} for those rare cases where you have an
 article about a German person on a non-German wiki and want it to sort
 the German way.

Yes, of course.  I'm assuming that the magical sortkey-generator I'm
plugging into here is locale-specific.
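The conflicting German and Swedish tailorings quoted above can be shown with a toy sketch (the mapping tables below are invented stand-ins for illustration; real per-language collations would come from ICU/CLDR, not hand-written rules):

```python
# Hypothetical per-language tailorings, purely illustrative.
TAILORINGS = {
    # German "phonebook" style: umlauts expand to vowel + e.
    "de": lambda s: (s.lower().replace("ä", "ae")
                              .replace("ö", "oe")
                              .replace("ü", "ue")),
    # Swedish: å, ä, ö sort after z, so map them to code points past "z".
    "sv": lambda s: s.lower().translate(
        str.maketrans({"å": "z{", "ä": "z|", "ö": "z}"})),
}

def sortkey(title, lang):
    return TAILORINGS.get(lang, str.lower)(title)

words = ["Öl", "Ast", "Zebra"]
print(sorted(words, key=lambda w: sortkey(w, "de")))
# ['Ast', 'Öl', 'Zebra']   (Öl sorts as "oel", between A and Z)
print(sorted(words, key=lambda w: sortkey(w, "sv")))
# ['Ast', 'Zebra', 'Öl']   (Ö sorts after Z)
```

The same title gets a different binary key per language, which is why the collation has to be chosen per wiki (or per page/category, as discussed below in the thread).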

 In my personal opinion, UNION makes zero sense because you'd have to
 pull the data apart again after querying it, as you're displaying it
 separately as well. Separate queries are much cleaner in this case.

It's pretty simple to do either way.  Makes no big difference.

 I agree that the first-letter thing is a nice-to-have, but I'm more
 worried about the general problem that sortkeys won't be
 human-readable strings anymore (the API currently displays them and,
 obviously, uses them for paging) nor possible to decode into
 human-readable strings (because the encoding essentially loses
 information when e.g. a and á are folded). It would be nice if we
 could store the original, unmunged sortkey in the categorylinks table,
 although I realize that would eat space for display and debugging
 purposes only.

This would also require altering the table.  Why is it necessary?  For
paging, we can just use cl_from to stick in the URL, and retrieve
cl_sortkey based on that and cl_to.  That will keep the URL short and
not horribly ugly.  When do we ever need a human-readable form of
the sortkey, as opposed to a human-readable form of the title?  API
users should keep working when this happens with no special code
changes on server or client, just they'll have horribly long and ugly
URLs with encoded binary.  Sortkeys are often weird and not suitable
for display to humans anyway, like when * is used.

I'm not seeing this as worth adding a fourth field to categorylinks,
which is a huge table already.
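The cl_from paging idea might look roughly like this (a simplified sketch over an in-memory table, not the real categorylinks schema or API code; the continue parameter carries only cl_from, and the server re-reads that row's binary sortkey to resume):

```python
# (cl_to, cl_sortkey [binary, not human-readable], cl_from), pre-sorted
# the way the category index would return them: by (sortkey, from).
rows = sorted([
    ("People", b"\x01\x20", 10),
    ("People", b"\x01\x21", 42),
    ("People", b"\x02\x05", 7),
    ("People", b"\x02\x05", 99),   # sortkey ties broken by cl_from
], key=lambda r: (r[1], r[2]))

def page(cl_to, limit, continue_from=None):
    cat = [r for r in rows if r[0] == cl_to]
    start = 0
    if continue_from is not None:
        # Look up the binary sortkey for the cl_from given in the URL,
        # then resume from that (sortkey, from) pair.
        key = next((r[1], r[2]) for r in cat if r[2] == continue_from)
        start = next(i for i, r in enumerate(cat) if (r[1], r[2]) >= key)
    batch = cat[start:start + limit]
    next_from = cat[start + limit][2] if start + limit < len(cat) else None
    return [r[2] for r in batch], next_from

print(page("People", 2))      # ([10, 42], 7)
print(page("People", 2, 7))   # ([7, 99], None)
```

The URL never has to contain the raw binary key, only a page id, which keeps continue parameters short and printable.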

On Wed, Jul 21, 2010 at 6:04 PM, Conrad Irwin conrad.ir...@gmail.com wrote:
 For Wiktionary, every language is included in one wiki (and even on
 one page) - it would be phenomenal to be able to select the collation
 per category. As per-page or per-wiki will not help very much at all.

Why won't per-page help?  I'm not understanding clearly here.  I don't
think it would be too much trouble to add per-page and per-category
parser functions to set the language used for sort keys, though.

 There is no way to go from the sort-key to the first letter; and
 indeed, you can't even put the first letter at the start of the sort
 key, as you need to sort the sections differently per language. The
 solution I use for generating the indices on Wiktionary is to store
 the first letter explicitly (either of the page or the user-provided
 sort key before they are fed into ICU). This would (in the future)
 allow topical categories, but that's just a distraction for now.

But different articles that are sorted as though they started with the
same letter might not actually start with the same letter, so how do
we figure out which first letter is the correct one?  This is a
problem even if you're just dealing with accented letters -- I have no
idea how this stuff works (or doesn't work) for CJK or whatnot.
(Judging by these:

http://ja.wikipedia.org/wiki/Category:%E5%AD%98%E5%91%BD%E4%BA%BA%E7%89%A9
http://zh.wikipedia.org/wiki/Category:%E5%9C%A8%E4%B8%96%E4%BA%BA%E7%89%A9
http://zh-yue.wikipedia.org/wiki/Category:%E5%9C%A8%E4%B8%96%E4%BA%BA%E7%89%A9

the strategy is just to manually force sortkeys to begin with
something like A or あ.  Cantonese doesn't do this, and it ends up
with one article per letter in many cases.)


Re: [Wikitech-l] Architectural revisions to improve category sorting

2010-07-21 Thread Daniel Kinzler
Aryeh Gregor schrieb:
 * Categories need to be structured by namespace,
 https://bugzilla.wikimedia.org/show_bug.cgi?id=450
 * Natural number sorting in category listings,
 https://bugzilla.wikimedia.org/show_bug.cgi?id=6948

While we definitely need efficient retrieval by namespace, the default sort key
should *not* include the namespace prefix. It's very annoying that all files get
sorted under F currently, or that pages from the Wikipedia namespace all end
up under W.

-- daniel



Re: [Wikitech-l] Architectural revisions to improve category sorting

2010-07-21 Thread Roan Kattouw
2010/7/22 Aryeh Gregor simetrical+wikil...@gmail.com:
 On Wed, Jul 21, 2010 at 6:18 PM, Daniel Kinzler dan...@brightbyte.de wrote:
 While we definitely need efficient retrieval by namespace, the default sort key
 should *not* include the namespace prefix. It's very annoying that all files get
 sorted under F currently, or that pages from the Wikipedia namespace all end
 up under W.

 That's totally orthogonal and is like a one-line change.  Probably you
 just have to change getPrefixedDBkey() to getDBkey() somewhere.

$wgCategoryPrefixedDefaultSortkey currently defaults to true, we could
make that default to false instead.
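The effect of flipping that default can be shown with a toy illustration (the namespace-to-prefix table here is invented for the example; only the $wgCategoryPrefixedDefaultSortkey setting itself is from the thread):

```python
def default_sortkey(ns, dbkey, prefixed):
    # Simplified stand-in namespace table, not MediaWiki's real one.
    prefixes = {0: "", 4: "Wikipedia:", 6: "File:"}
    return (prefixes[ns] + dbkey) if prefixed else dbkey

titles = [(6, "Example.jpg"), (4, "Sandbox"), (0, "Aardvark")]

print(sorted(default_sortkey(ns, t, True) for ns, t in titles))
# ['Aardvark', 'File:Example.jpg', 'Wikipedia:Sandbox']
#   -> every file piles up under "F", every project page under "W"
print(sorted(default_sortkey(ns, t, False) for ns, t in titles))
# ['Aardvark', 'Example.jpg', 'Sandbox']
```

With the unprefixed default, titles interleave by their own names instead of clumping by namespace prefix.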

Roan Kattouw (Catrope)
