Re: [Wikitech-l] Changes to the new installer
Hey,

Unless the installer needs to be ready within a week for 1.17, I don't see any issues. I want to make structural changes, not add new features. The sooner these are made, the less overall work my GSoC project will be. As I'll be doing all the work on these changes, and am not skipping any other work on the new installer to do so, the progress on the new installer should not be impacted negatively.

Cheers
--
Jeroen De Dauw * http://blog.bn2vs.com * http://wiki.bn2vs.com
Don't panic. Don't be evil. 50 72 6F 67 72 61 6D 6D 69 6E 67 20 34 20 6C 69 66 65!
--

On 21 July 2010 04:40, Tim Starling tstarl...@wikimedia.org wrote:

On 20/07/10 19:28, Jeroen De Dauw wrote: Hey, Basically splitting core-specific stuff from general installer functionality (so the general stuff can also be used for extensions). Also making initial steps towards making filesystem upgrades possible. The point of this mail is not discussing what I want to do, though, but rather avoiding commit conflicts, as I don't know which people are working on the code right now and who has uncommitted changes.

There's still quite a lot of work to do to get the new installer ready for 1.17. I think we should focus on that, and avoid expanding the scope of the project until we've reached that milestone. There are the issues discussed here: http://www.mediawiki.org/wiki/New-installer_issues and more will become apparent as more testing is done. If the new installer is not ready to replace the old installer when it comes time to branch 1.17, I will move it out of trunk, back to a development branch. Hopefully that won't be necessary.

-- Tim Starling

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] CodeReview auto-deferral regexes
Hey,

Is someone planning on doing this? If not, who can do it? The sooner it's there, the better.

Cheers
--
Jeroen De Dauw * http://blog.bn2vs.com * http://wiki.bn2vs.com
Don't panic. Don't be evil. 50 72 6F 67 72 61 6D 6D 69 6E 67 20 34 20 6C 69 66 65!
--

On 20 July 2010 17:20, Max Semenik maxsem.w...@gmail.com wrote:

On 20.07.2010, 17:20 Chad wrote: On Tue, Jul 20, 2010 at 9:16 AM, Jeroen De Dauw jeroended...@gmail.com wrote: Hey, About the semantic extensions: it would actually be nice if they did not get marked deferred at all, and were reviewed by people who are familiar with them to some extent. I'm willing to do that for all commits not made by myself. Assuming this would not interfere too much with WMF code review, of course :) Cheers

If someone's going to start doing code review, that's fine. They've just all been getting deferred because nobody's been reviewing them so far.

We could create a separate review queue for it.

-- Best regards, Max Semenik ([[User:MaxSem]])
Re: [Wikitech-l] CodeReview auto-deferral regexes
Hey,

I'm also fine either way. So if no separate queue is set up, I'd appreciate it if the semantic* commits were not marked as deferred from now on.

Cheers
--
Jeroen De Dauw * http://blog.bn2vs.com * http://wiki.bn2vs.com
Don't panic. Don't be evil. 50 72 6F 67 72 61 6D 6D 69 6E 67 20 34 20 6C 69 66 65!
--

On 21 July 2010 14:47, Chad innocentkil...@gmail.com wrote:

On Wed, Jul 21, 2010 at 8:38 AM, Roan Kattouw roan.katt...@gmail.com wrote: 2010/7/21 Chad innocentkil...@gmail.com: I'm also not sure how Code Review will handle a repository containing a subset of another repository. I'm pretty sure things will be OK; I only imagine it would just duplicate data (revs for SMW stuff would be imported for both repos). It should still be tested first, though. Then we would need someone with repo-admin rights to set this up; I believe Brion or Tim can.

Why would you want to do this? With the path search feature, it's extremely easy to pull up a list of revs touching a certain extension. I really don't see why the SMW review queue has to be separate from the main MW review queue on a technical level; of course it would be on a personal level, in that different people review different things, but we have that already for e.g. UsabilityInitiative. In practical terms, people who are familiar with the SMW codebase would start reviewing SMW revisions through our existing CodeReview setup, and the only thing we would have to do on a technical level is make sure those paths don't get auto-deferred. Roan Kattouw (Catrope)

I agree with you here. They were just suggesting another route. Honestly, I don't really care either way :) The fix in r69675 would be generally useful, though, if repositories were segmented in that manner.

-Chad
Re: [Wikitech-l] CodeReview auto-deferral regexes
Hey,

The 'semantic extensions' include Validator and Maps, as they are the base for Semantic Maps, so these should also not get deferred.

Cheers
--
Jeroen De Dauw * http://blog.bn2vs.com * http://wiki.bn2vs.com
Don't panic. Don't be evil. 50 72 6F 67 72 61 6D 6D 69 6E 67 20 34 20 6C 69 66 65!
--

On 21 July 2010 14:54, Jeroen De Dauw jeroended...@gmail.com wrote: Hey, I'm also fine either way. So if no separate queue is set up, I'd appreciate it if the semantic* commits were not marked as deferred from now on. Cheers
Re: [Wikitech-l] Upload file size limit
Tim Starling wrote:

The problem is just that increasing the limits in our main Squid and Apache pool would create DoS vulnerabilities, including the prospect of accidental DoS. We could offer this service via another domain name, with a specially-configured webserver, and a higher level of access control compared to ordinary upload to avoid DoS, but there is no support for that in MediaWiki. We could theoretically allow uploads of several gigabytes this way, which is about as large as we want files to be anyway. People with flaky internet connections would hit the problem of the lack of resuming, but it would work for some. -- Tim Starling

I don't think it would be a problem for MediaWiki if we wanted to go this route. There could be e.g. http://upload.en.wikipedia.org/ which redirected all wiki pages but Special:Upload to http://en.wikipedia.org/. The normal Special:Upload would need a redirect there, for accesses not going via $wgUploadNavigationUrl, but that's a couple of lines. Having the normal Apaches handle uploads instead of a dedicated pool has some issues, including the DoS you mention, filled /tmp partitions, needing write access to storage via NFS...
Re: [Wikitech-l] Changes to the new installer
Tim Starling wrote:

There's still quite a lot of work to do to get the new installer ready for 1.17. I think we should focus on that, and avoid expanding the scope of the project until we've reached that milestone. There are the issues discussed here: http://www.mediawiki.org/wiki/New-installer_issues and more will become apparent as more testing is done. If the new installer is not ready to replace the old installer when it comes time to branch 1.17, I will move it out of trunk, back to a development branch. Hopefully that won't be necessary. -- Tim Starling

We should probably ship both installers in 1.17. I wouldn't be surprised if some odd configurations in the wild made it not work.
Re: [Wikitech-l] Take me back too hip
On Tue, Jul 20, 2010 at 11:03 PM, James Salsman jsals...@gmail.com wrote: May I suggest "Use legacy interface" or "Abandon new interface"?

Or just get rid of it entirely. At this point, it's been the default skin for some time, and almost anyone who wants to switch back will have done so.
Re: [Wikitech-l] CodeReview auto-deferral regexes
It strikes me that a better solution is to fix whatever tools we're using to determine what still needs to be reviewed. If someone is checking all revisions marked as new and needs to mark things they won't review as deferred to get them off the list, maybe they should instead be checking all revisions marked as new from particular paths. Then explicit deferral will not be necessary, and projects like SMW can go ahead and use Code Review at their own pace without annoying anyone else.
Re: [Wikitech-l] Upload file size limit
On Wed, Jul 21, 2010 at 12:31 AM, Neil Kandalgaonkar ne...@wikimedia.org wrote: Here's a demo which implements an EXIF reader for JPEGs in JavaScript, which reads the file as a stream of bytes. http://demos.hacks.mozilla.org/openweb/FileAPI/ So, as you can see, we do have a form of BLOB access.

But only by reading the whole file into memory, right? That doesn't adequately address the use case we're discussing in this thread (uploading files over 100 MB in chunks).
Re: [Wikitech-l] Upload file size limit
Michael Dale md...@wikimedia.org writes:

* Modern HTML5 browsers are starting to be able to natively split files up into chunks and do separate 1 MB XHR posts. The Firefogg extension does something similar with extension JavaScript.

Could you point me to the specs that the HTML5 browsers are using? Would it be possible to just make Firefogg mimic this same protocol for pre-HTML5 Firefox?

* We should really get the chunk uploading reviewed and deployed. Tim expressed some concerns with the chunk uploading protocol, which we addressed client-side, but I don't think he had time to follow up on the changes we proposed for the server API.

If you can point me to Tim's proposed server-side changes, I'll have a look.

Mark.
--
http://hexmode.com/
Embrace Ignorance. Just don't get too attached.
Re: [Wikitech-l] Upload file size limit
On Wed, Jul 21, 2010 at 11:19 AM, Mark A. Hershberger m...@everybody.org wrote: Could you point me to the specs that the HTML5 browsers are using? Would it be possible to just make Firefogg mimic this same protocol for pre-HTML5 Firefox?

The relevant spec is here: http://www.w3.org/TR/FileAPI/

Firefox 3.6 doesn't implement it exactly, since it was changed after Firefox's implementation, but the changes should mostly be compatible (as I understand it). But it's not good enough for large files, since it has to read them into memory. But anyway, what's the point in telling people to install an extension if we can just tell them to upgrade Firefox? Something like two-thirds of our Firefox users are already on 3.6: http://stats.wikimedia.org/wikimedia/squids/SquidReportClients.htm
Re: [Wikitech-l] CodeReview auto-deferral regexes
2010/7/21 Aryeh Gregor simetrical+wikil...@gmail.com: It strikes me that a better solution is to fix whatever tools we're using to determine what still needs to be reviewed. If someone is checking all revisions marked as new and needs to mark things they won't review as deferred to get them off the list, maybe they should instead be checking all revisions marked as new from particular paths. Then explicit deferral will not be necessary, and projects like SMW can go ahead and use Code Review at their own pace without annoying anyone else.

As far as I know, this is exactly what happens in reality.

As I discussed with a few others at Wikimania, it'd be nice to take this one step further and allow multiple people to sign off on a revision, possibly with various types of sign-off, like:
* I read the diff and it looks good
* I tested this and it seems to work
* I reviewed the niche part of this rev that I'm an expert on
* I am Tim Starling and I approve this message^Hrevision
* ...

Roan Kattouw (Catrope)
Re: [Wikitech-l] Upload file size limit
On 07/20/2010 10:24 PM, Tim Starling wrote: The problem is just that increasing the limits in our main Squid and Apache pool would create DoS vulnerabilities, including the prospect of accidental DoS. We could offer this service via another domain name, with a specially-configured webserver, and a higher level of access control compared to ordinary upload to avoid DoS, but there is no support for that in MediaWiki. We could theoretically allow uploads of several gigabytes this way, which is about as large as we want files to be anyway. People with flaky internet connections would hit the problem of the lack of resuming, but it would work for some.

Yes, in theory we could do that ... or we could support a simple chunk-uploading protocol for which there is *already* basic support written, and which will be supported in native JS over time.

The Firefogg protocol is almost identical to the plupload protocol. The main difference is that Firefogg requests a unique upload parameter / URL back from the server, so that if you uploaded identically named files they would not mangle the chunking. From a quick look at plupload's upload.php, it appears plupload relies on the filename plus an extra chunk request parameter != 0. The other difference is that Firefogg has an explicit done=1 request parameter to signify the end of the chunks.

We requested feedback on adding a chunk index to the Firefogg chunk protocol with each posted chunk, to guard against cases where the outer caches report an error but the backend got the file anyway. This way the backend can check the chunk index and not append the same chunk twice, even if errors at other levels of the server response cause the client to resend the same chunk.

Either way, if Tim says the plupload chunk protocol is superior, then why discuss it? We can easily shift the chunks API to that and *move forward* with supporting larger file uploads. Is that at all agreeable?

peace,
--michael
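The chunk-index guard described above can be sketched as follows (a minimal sketch with hypothetical names; neither the Firefogg nor the plupload wire format is reproduced exactly). The server appends a chunk only when its index is the next one expected, so a chunk resent after a cache-level error is acknowledged again but never appended twice:

```python
class ChunkedUpload:
    """Server-side accumulator for one chunked upload (illustrative only)."""

    def __init__(self):
        self.parts = []       # received chunk payloads, in order
        self.next_index = 0   # chunk index we expect next
        self.done = False

    def receive(self, index, data, done=False):
        if index < self.next_index:
            # The client resent a chunk we already have, e.g. because an
            # outer cache reported an error even though the backend got it.
            # Acknowledge again, but do not append twice.
            return "duplicate"
        if index > self.next_index:
            return "out-of-order"  # a chunk went missing; client must retry
        self.parts.append(data)
        self.next_index += 1
        if done:                   # mirrors Firefogg's explicit done=1 flag
            self.done = True
        return "ok"

    def assembled(self):
        """Full file contents once the final chunk has arrived, else None."""
        return b"".join(self.parts) if self.done else None
```

A client would slice the file into roughly 1 MB pieces and POST each one with its index, setting done=1 on the last chunk.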
Re: [Wikitech-l] CodeReview auto-deferral regexes
On Wed, Jul 21, 2010 at 12:05 PM, Roan Kattouw roan.katt...@gmail.com wrote: As far as I know, this is exactly what happens in reality.

Then why do we need auto-deferral? Just let the things we don't care about stay new forever.

As I discussed with a few others at Wikimania, it'd be nice to take this one step further and allow multiple people to sign off on a revision, possibly with various types of sign-off, like:
* I read the diff and it looks good
* I tested this and it seems to work
* I reviewed the niche part of this rev that I'm an expert on
* I am Tim Starling and I approve this message^Hrevision
* ...

I think this is a good idea. For simplicity, I'd keep it to one level, at least at first. The understanding should be that you should mark it reviewed if you're confident it's correct, and if obvious errors crop up later, it means people will informally give less weight to your reviews. Whether you tested it or just reviewed the diff should be up to you -- whatever you think it needs.
Re: [Wikitech-l] Take me back too hip
On Wed, Jul 21, 2010 at 5:30 PM, Mike.lifeguard mike.lifegu...@gmail.com wrote:

Aryeh Gregor wrote: Or just get rid of it entirely. At this point, it's been the default skin for some time, and almost anyone who wants to switch back will have done so.

Are all wikis migrated? Maybe it has been the default for enwiki for a while, but I'm not sure about most of our other wikis.

The bigger wikis switched some time ago, although later than en:, but the smaller ones still have MonoBook as the standard.

--
André Engels, andreeng...@gmail.com
[Wikitech-l] Salutations From a Computer Science Student
Ahoy there,

My name is David Breneisen. I was referred here by James Alexander. I'm a Comp. Sci. student at George Washington University and have had an interest in open-education web development for the last few years. I thought that I might be able to offer technical services for Wikiversity development/maintenance while getting some experience working on larger, real projects. I also hope to see if it is possible to do a more formal summer internship with Wikimedia after this school year, and thought it would be nice to get used to the overall manner in which Wikimedia design/development goes.

Regards,
David Breneisen
Re: [Wikitech-l] Salutations From a Computer Science Student
On Wed, Jul 21, 2010 at 1:32 PM, David Breneisen d...@gwmail.gwu.edu wrote: My name is David Breneisen. I was referred here by James Alexander. I'm a Comp. Sci. student at George Washington University and have had an interest in open education web development for the last few years. I thought that I might be able to offer technical services for the Wikiversity development/maintenance while getting some experience working on larger, real, projects. I also hope to see if it is possible to do a more formal summer internship after this school year with Wikimedia, and thought it would be nice to get used to the overall manner in which Wikimedia design/development goes.

This list, and irc://irc.freenode.net/mediawiki, are good places to lurk and get to know people. The source code for MediaWiki proper can be obtained with:

  svn co http://svn.wikimedia.org/mediawiki/trunk/phase3

You can look for bugs to fix, and submit patches, at https://bugzilla.wikimedia.org/. More info is at http://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker. If you have any questions, here or IRC is the best place to ask.
Re: [Wikitech-l] Upload file size limit
On Wed, Jul 21, 2010 at 2:05 PM, Aryeh Gregor simetrical+wikil...@gmail.com wrote: This is the right place to bring it up: http://lists.w3.org/Archives/Public/public-webapps/ I think the right API change would be to just allow slicing a Blob up into other Blobs by byte range. It should be simple to both spec and implement. But it might have been discussed before, so best to look in the archives first.

Aha, I finally found it. It's in the spec already: http://dev.w3.org/2006/webapi/FileAPI/#dfn-slice

So once you have a File object, you should be able to call file.slice(pos, 1024*1024) to get a Blob object that's 1024*1024 bytes long, starting at pos. Of course, this surely won't be reliably available in all browsers for several years yet, so best not to pin our hopes on it. Chrome apparently implements some or all of the File API in version 6, but I can't figure out if it includes this part. Firefox doesn't yet, according to MDC.
[Wikitech-l] Architectural revisions to improve category sorting
I'm going to begin working on the following bugs:

* Support collation by a certain locale (sorting order of characters), https://bugzilla.wikimedia.org/show_bug.cgi?id=164 (only parts related to category sorting)
* Subcategory paging is not separate from article or image paging, https://bugzilla.wikimedia.org/show_bug.cgi?id=1211
* CategoryTree is inefficient, https://bugzilla.wikimedia.org/show_bug.cgi?id=23682

As well as possibly:

* Categories need to be structured by namespace, https://bugzilla.wikimedia.org/show_bug.cgi?id=450
* Natural number sorting in category listings, https://bugzilla.wikimedia.org/show_bug.cgi?id=6948

There are essentially two problems here:

1) We currently sort articles on category pages by the Unicode code points of their sortkeys. This is terrible for anything other than English, and dodgy sometimes even for English. (This is bugs 164 and 6948.)

2) We have no way to efficiently get all items that are in a category and also in a particular namespace. In particular, we can't retrieve all subcategories without scanning all items in the category, which is inefficient when we have a few (or no) subcategories and tons of items. (This is bugs 1211, 23682, and 450.)

One part of (2) needs to be clarified. The primary use case is obviously that we want to be able to count subcategories efficiently, or display all of them when we only display some of the items in the category: this is bugs 1211 and 23682. Secondarily, we have a request at bug 450 to organize category pages by namespace, so main, Talk:, User:, etc. are all paginated separately.

I think the goal for (2) should be to allow efficient separate retrieval of subcategories, files, and other pages, but not to distinguish between namespaces otherwise. The major motivation is that to do this efficiently, we'll need to add namespace info to the categorylinks table, and we want this to stay consistent with the info in the page table. Categories, files, and other types of pages cannot be moved to one another, as far as I know (it would hardly make sense), so it automatically stays consistent this way. This is a big plus, because there are inevitably bugs that cause denormalized data to fall out of sync (look at cat_pages).

Furthermore, I don't think it's obvious that we want separate namespaces to display separately at all on category pages. What's a case where that would be desired? It would break up the display a lot, with a bunch of separate headers for different namespaces, when each namespace might only have a few items. Most categories whose sort appearance you'd care about (i.e., excepting maintenance categories) will have nearly everything in one namespace anyway. You could always split the category into separate ones per namespace if you want them separate.

So I propose that we keep the current category/normal page/file split, and paginate those three parts of the page separately. So you'd have up to 200 subcategories, then below that up to 200 normal pages, then below that up to 200 files. (The numbers could be adjusted. Currently they're hardcoded, which is stupid.) Paginating subcategories separately is obviously needed. Paginating files separately is not really needed, but it would be much more consistent.

The overall solution, then, would be:

1) Change the way category sortkeys are generated. Start them with a letter depending on namespace, like 'C' for category, 'P' for regular page, 'F' for file. After that first letter, append a sortkey generated by ICU or whatever. I think Tim has opinions on what would be a good choice for converting the article title into a sortkey -- if not, I'll have to research it and hopefully not come up with a completely incorrect answer.

2) On category pages, maintain three offsets and do three queries (or maybe UNION them together, doesn't matter), one for each of categories/regular pages/files. Because of (1), this will be efficient and will also sort less unreasonably for non-English languages.

One problem that was pointed out somewhere in the massive useless discussion on bug 164 is that we'd have to do something to display the first letter for each section. Currently it's just the first letter of the sortkey, but if that's some binary string, that becomes a problem. I'm not seeing an obvious solution, since the sortkey-generation algorithm will be opaque to us. If it sorts Á the same as A, then how do we figure out that the canonical first letter for the section should be A and not Á? How do we even figure out where the sections begin or end? Would that even make sense in all cases? At a first pass, I'd say we should just skip the first letter and display all the items straight from beginning to end without section divisions. I don't think that's a big problem.

These are just my initial thoughts. Feedback appreciated. If people agree with the general approach, I can start coding this up tomorrow.
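Step (1) above can be sketched as follows (a rough illustration with assumed names; the fold() here is a crude stand-in for a real ICU collation key, which would be an opaque binary string):

```python
import unicodedata

# Proposed leading letter per namespace group. Each group is queried
# separately, so only the order *within* a group actually matters.
NS_PREFIX = {"category": "C", "page": "P", "file": "F"}

def fold(title):
    """Crude stand-in for an ICU collation key: accent- and case-insensitive.
    A real implementation would call into ICU for the content language."""
    decomposed = unicodedata.normalize("NFD", title)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

def sortkey(kind, title):
    # Namespace-group byte first, then the collated title.
    return NS_PREFIX[kind] + fold(title)

titles = ["Órbita", "apple", "Ötzi", "Zebra"]
print(sorted(titles, key=lambda t: sortkey("page", t)))
```

With this naive folding, Á sorts with A and case is ignored; a real ICU collation would additionally get locale-specific rules (e.g. Swedish Ö after Z) right.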
Re: [Wikitech-l] CodeReview auto-deferral regexes
Aryeh Gregor wrote:

As I discussed with a few others at Wikimania, it'd be nice to take this one step further and allow multiple people to sign off on a revision, possibly with various types of sign-off, like:
* I read the diff and it looks good
* I tested this and it seems to work
* I reviewed the niche part of this rev that I'm an expert on
* I am Tim Starling and I approve this message^Hrevision
* ...

I think this is a good idea. For simplicity, I'd keep it to one level, at least at first. The understanding should be that you should mark it reviewed if you're confident it's correct, and if obvious errors crop up later, it means people will informally give less weight to your reviews. Whether you tested it or just reviewed the diff should be up to you -- whatever you think it needs.

It's not the same; you can have different standards. I see revisions that are apparently good, but marking one as ok says "this revision is right in my book", which in many cases would require actually testing it, checking the spec, and so on, which I (lazily) don't do. So it stays as new instead of as lightly reviewed.
Re: [Wikitech-l] CodeReview auto-deferral regexes
On Wed, Jul 21, 2010 at 5:12 PM, Platonides platoni...@gmail.com wrote: It's not the same. You can have different standards. I see revisions that are apparently good, but marking as ok is "This revision is right in my book", which in many cases would need actually testing it, checking spec, and so that I (lazily) don't do. So it keeps as new instead of as lightly reviewed.

Is it useful to know that something is lightly reviewed?
Re: [Wikitech-l] Architectural revisions to improve category sorting
On 21 July 2010 14:49, Roan Kattouw roan.katt...@gmail.com wrote:

2010/7/21 Aryeh Gregor simetrical+wikil...@gmail.com:

Note that different languages will want different orders. For instance, German generally sorts ä as ae, ö as oe and ü as ue, whereas Swedish sorts å, ä and ö at the end of the alphabet (so they actually say A, B, C, ... Z, Å, Ä, Ö and use the phrase from A to Ö). These collation schemes obviously conflict in their handling of ä and ö, and I'm sure there's crazier stuff out there. This could be solved by having a different collation scheme for each content language (these have to be standardized *somewhere*, right?) and using {{DEFAULTSORT:}} for those rare cases where you have an article about a German person on a non-German wiki and want it to sort the German way.

For Wiktionary, every language is included in one wiki (and even on one page) -- it would be phenomenal to be able to select the collation per category, since per-page or per-wiki selection will not help very much at all.

2) On category pages, maintain three offsets and do three queries (or maybe UNION them together, doesn't matter).

In my personal opinion, UNION makes zero sense because you'd have to pull the data apart again after querying it, as you're displaying it separately as well. Separate queries are much cleaner in this case.

One problem that was pointed out somewhere in the massive useless discussion on bug 164 is that we'd have to do something to display the first letter for each section. Currently it's just the first letter of the sortkey, but if that's some binary string, that becomes a problem. I'm not seeing an obvious solution, since the sortkey-generation algorithm will be opaque to us. If it sorts Á the same as A, then how do we figure out that the canonical first letter for the section should be A and not Á? How do we even figure out where the sections begin or end? Would that even make sense in all cases? At a first pass, I'd say we should just skip the first letter and display all the items straight from beginning to end without section divisions. I don't think that's a big problem.

I agree that the first-letter thing is a nice-to-have, but I'm more worried about the general problem that sortkeys won't be human-readable strings anymore (the API currently displays them and, obviously, uses them for paging) nor possible to decode into human-readable strings (because the encoding essentially loses information when e.g. a and á are folded). It would be nice if we could store the original, unmunged sortkey in the categorylinks table, although I realize that would eat space for display and debugging purposes only.

There is no way to go from the sortkey to the first letter, and indeed you can't even put the first letter at the start of the sortkey, as you need to sort the sections differently per language. The solution I use for generating the indices on Wiktionary is to store the first letter explicitly (of either the page title or the user-provided sortkey, before it is fed into ICU). This would (in the future) allow topical categories, but that's just a distraction for now.

Conrad
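The store-the-first-letter-explicitly scheme described above might look something like this (a sketch only; the column names are hypothetical, and fold() is a naive stand-in for ICU). The human-readable letter is captured from the unmunged sortkey before folding, so section headings never have to be recovered from the opaque collation key:

```python
import unicodedata

def fold(text):
    """Naive stand-in for an ICU collation key (accent- and case-insensitive)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

def category_row(title, user_sortkey=None):
    # A user-provided {{DEFAULTSORT:}}-style key overrides the page title.
    raw = user_sortkey if user_sortkey is not None else title
    return {
        "cl_sortkey": fold(raw),            # opaque; used only for ORDER BY
        "cl_first_letter": raw[0].upper(),  # stored explicitly, for headings
    }

row = category_row("Ásgeir")
print(row)
```

Whether the heading shows Á or A then becomes a display decision made from the stored letter, independent of how the collation folds the two together.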
Re: [Wikitech-l] Architectural revisions to improve category sorting
On Wed, Jul 21, 2010 at 5:49 PM, Roan Kattouw roan.katt...@gmail.com wrote: This is true for categories but not for files: http://www.mediawiki.org/w/index.php?title=Special:Logdir=prevoffset=20091202100459limit=2type=moveuser=Catrope Blech. Does this make any sense? Can we change it? It would simply this considerably. Note that different languages will want different orders. For instance, German generally sorts ä as ae, ö as oe and ü as ue, whereas the Swedish sort å, ä and ö at the end of the alphabet (so they actually say A, B, C, ... Z, Å, Ä, Ö and use the phrase from A to Ö). These collation schemes obviously conflict in their handling of ä and ö, and I'm sure there's crazier stuff out there. This could be solved by having a different collation scheme for each content language (these have to be standardized *somewhere*, right?) and using {{DEFAULTSORT:}} for those rare cases where you have an article about a German person on a non-German wiki and want it to sort the German way. Yes, of course. I'm assuming that the magical sortkey-generator I'm plugging into here is locale-specific. In my personal opinion, UNION makes zero sense because you'd have to pull the data apart again after querying it, as you're displaying it separately as well. Separate queries are much cleaner in this case. It's pretty simple to do either way. Makes no big difference. I agree that the first-letter thing is a nice-to-have, but I'm more worried about the general problem that sortkeys won't be human-readable strings anymore (the API currently displays them and, obviously, uses them for paging) nor possible to decode into human-readable strings (because the encoding essentially loses information when e.g. a and á are folded). It would be nice if we could store the original, unmunged sortkey in the categorylinks table, although I realize that would eat space for display and debugging purposes only. This would also require altering the table. Why is it necessary? 
For paging, we can just use cl_from to stick in the URL, and retrieve cl_sortkey based on that and cl_to. That will keep it short and not horribly ugly. When do we ever need a human-readable form of the sortkey, as opposed to a human-readable form of the title? API users should keep working when this happens with no special code changes on server or client; they'll just have horribly long and ugly URLs with encoded binary. Sortkeys are often weird and not suitable for display to humans anyway, like when * is used. I'm not seeing this as worth adding a fourth field to categorylinks, which is a huge table already. On Wed, Jul 21, 2010 at 6:04 PM, Conrad Irwin conrad.ir...@gmail.com wrote: For Wiktionary, every language is included in one wiki (and even on one page) - it would be phenomenal to be able to select the collation per category, as per-page or per-wiki will not help very much at all. Why won't per-page help? I'm not understanding clearly here. I don't think it would be too much trouble to add per-page and per-category parser functions to set the language used for sort keys, though. There is no way to go from the sort key to the first letter, and indeed you can't even put the first letter at the start of the sort key, as you need to sort the sections differently per language. The solution I use for generating the indices on Wiktionary is to store the first letter explicitly (either of the page or the user-provided sort key before they are fed into ICU). This would (in the future) allow topical categories, but that's just a distraction for now. But different articles that are sorted as though they started with the same letter might not actually start with the same letter, so how do we figure out which first letter is the correct one? This is a problem even if you're just dealing with accented letters - I have no idea how this stuff works (or doesn't work) for CJK or whatnot. 
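A sketch of that paging scheme, using SQLite and invented column values (the real categorylinks schema and its binary sortkeys differ; this only shows the lookup-then-page idea):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categorylinks (cl_from INT, cl_to TEXT, cl_sortkey BLOB)")
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?, ?)",
    [(10, "People", b"\x01B"), (20, "People", b"\x01C"), (30, "People", b"\x01D")],
)

# The continue-URL only needs to carry cl_from=10 plus the category name.
# The server re-fetches the binary sortkey for that row, then pages on
# (cl_sortkey, cl_from) so ties on the sortkey are broken deterministically.
(offset_key,) = conn.execute(
    "SELECT cl_sortkey FROM categorylinks WHERE cl_to = ? AND cl_from = ?",
    ("People", 10),
).fetchone()
page = conn.execute(
    "SELECT cl_from FROM categorylinks WHERE cl_to = ? "
    "AND (cl_sortkey > ? OR (cl_sortkey = ? AND cl_from > ?)) "
    "ORDER BY cl_sortkey, cl_from LIMIT 2",
    ("People", offset_key, offset_key, 10),
).fetchall()
print(page)  # [(20,), (30,)]
```

The binary sortkey never has to appear in the URL at all here, which sidesteps the "ugly encoded binary" problem for this particular use.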
(Judging by these: http://ja.wikipedia.org/wiki/Category:%E5%AD%98%E5%91%BD%E4%BA%BA%E7%89%A9 http://zh.wikipedia.org/wiki/Category:%E5%9C%A8%E4%B8%96%E4%BA%BA%E7%89%A9 http://zh-yue.wikipedia.org/wiki/Category:%E5%9C%A8%E4%B8%96%E4%BA%BA%E7%89%A9 the strategy is just to manually force sortkeys to begin with something like A or あ. Cantonese doesn't do this, and it ends up with one article per letter in many cases.) ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Architectural revisions to improve category sorting
Aryeh Gregor wrote: * Categories need to be structured by namespace, https://bugzilla.wikimedia.org/show_bug.cgi?id=450 * Natural number sorting in category listings, https://bugzilla.wikimedia.org/show_bug.cgi?id=6948 While we definitely need efficient retrieval by namespace, the default sort key should *not* include the namespace prefix. It's very annoying that all files get sorted under F currently, or that pages from the Wikipedia namespace all end up under W. -- daniel
Re: [Wikitech-l] Architectural revisions to improve category sorting
2010/7/22 Aryeh Gregor simetrical+wikil...@gmail.com: On Wed, Jul 21, 2010 at 6:18 PM, Daniel Kinzler dan...@brightbyte.de wrote: While we definitely need efficient retrieval by namespace, the default sort key should *not* include the namespace prefix. It's very annoying that all files get sorted under F currently, or that pages from the Wikipedia namespace all end up under W. That's totally orthogonal and is like a one-line change. Probably you just have to change getPrefixedDBkey() to getDBkey() somewhere. $wgCategoryPrefixedDefaultSortkey currently defaults to true; we could make that default to false instead. Roan Kattouw (Catrope)
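A toy sketch of what flipping that default changes. The setting name mirrors MediaWiki's $wgCategoryPrefixedDefaultSortkey, but the function itself is invented here purely for illustration:

```python
# Hypothetical illustration: how the default sortkey differs depending on
# whether it is derived from the prefixed or the unprefixed title.
def default_sortkey(namespace: str, title: str, prefixed: bool) -> str:
    return f"{namespace}:{title}" if prefixed and namespace else title

# With the prefixed default (true), all files sort under "F":
print(default_sortkey("File", "Example.jpg", True))   # File:Example.jpg
# With it set to false, each file sorts under its own name:
print(default_sortkey("File", "Example.jpg", False))  # Example.jpg
```

This is why the fix really is orthogonal to the collation work: it only changes which string is handed to the sortkey generator, not how that string is collated.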