Huh???
www.englishfreeroam.co.cc On 17 Jan 2011, at 17:41, wikitech-l-requ...@lists.wikimedia.org wrote: > Send Wikitech-l mailing list submissions to > wikitech-l@lists.wikimedia.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://lists.wikimedia.org/mailman/listinfo/wikitech-l > or, via email, send a message with subject or body 'help' to > wikitech-l-requ...@lists.wikimedia.org > > You can reach the person managing the list at > wikitech-l-ow...@lists.wikimedia.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Wikitech-l digest..." > > > Today's Topics: > > 1. Category sorting and first letters (Tim Starling) > 2. Re: From page history to sentence history (Bryan Tong Minh) > 3. Re: From page history to sentence history (Alex Brollo) > 4. WMDE Developer Meetup moved to May (Daniel Kinzler) > 5. Re: WYSIFTW status (Aryeh Gregor) > 6. Re: [Toolserver-l] WMDE Developer Meetup moved to May > (Daniel Kinzler) > 7. Re: June 8th 2011, World IPv6 Day (Aryeh Gregor) > 8. Re: WMDE Developer Meetup moved to May (Chad) > 9. Re: From page history to sentence history (Aryeh Gregor) > 10. Re: From page history to sentence history (Anthony) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Tue, 18 Jan 2011 02:00:09 +1100 > From: Tim Starling <tstarl...@wikimedia.org> > Subject: [Wikitech-l] Category sorting and first letters > To: wikitech-l@lists.wikimedia.org > Message-ID: <ih1lhs$pmn$1...@dough.gmane.org> > Content-Type: text/plain; charset=UTF-8 > > In r80443 I added a feature allowing categories to be sorted using the > Unicode Collation Algorithm (UCA). I wanted to briefly talk about the > potential user impact, the design choices and the caveats. > > Sorting was the easy part. The hard part was providing a "first > letter" concept which would be reasonably sane. 
The idea I came up > with was to compile a list of first letters, themselves sorted using > the UCA. Then the "first letter" of a given string is the nearest > letter in the list which sorts above the string. > > For instance if you have letters A, B, C, and a string Aardvark, if > you sort them you get: > > A > Aardvark > B > C > > So we know that A is the first letter of Aardvark because Aardvark > sorts immediately below A. This algorithm gives us a number of nice > properties: > > * It automatically drops accents, since accented letters sort the same > as unaccented letters (at the primary level). Same with case > differences, hiragana/katakana, etc. > > * You can work out the initial Jamo of a Hangul syllable character by > just omitting the composed syllables from the "first letter" list. > Previously this was done with a special-case hack in > Language::firstChar(). > > * Vowel reordering in Thai and Lao is automatically supported. > So "??" sorts under heading "?" and "??" sorts under heading "?". > > * The collation can be expanded to support all sorts of other crazy > features, and the first letter feature will keep working in a sane > way. For instance, you could have an English collation which removed > "the" from the start of a title. > > I compiled a list of 14,742 suitable header characters, identified by > processing various Unicode data files. That list probably still needs > lots of tweaks. > > There is a down side to this scheme. The default UCA table gives all > characters with a similar logical function to the digits 0-9 the same > primary sort order as the corresponding ASCII digits. So a page like > [[????]] on the Bihari Wikipedia will sort under a heading of "1" > instead of "?". There may be other instances of accidental cultural > imperialism. However, this can be fixed by compiling > language-dependent lists of header characters. > > The UCA default table is not meant to sort any language correctly, > it's just a compromise collation. 
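A rough sketch of the "first letter" lookup described above, in Python. The key function is only a toy stand-in for a UCA primary-level sort key (it strips accents and folds case), and the names `primary_key` and `first_letter` are made up for illustration; none of this is the code from r80443.

```python
import bisect
import unicodedata


def primary_key(s):
    # Toy approximation of a primary-level collation key (an assumption,
    # not the real UCA): decompose, drop combining marks, fold case.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def first_letter(title, headers):
    # Sort the candidate header characters with the same key used for titles.
    ordered = sorted(headers, key=primary_key)
    keys = [primary_key(h) for h in ordered]
    # The heading is the last header whose key sorts at or before the title.
    i = bisect.bisect_right(keys, primary_key(title))
    return ordered[i - 1] if i else None


print(first_letter("Aardvark", ["A", "B", "C"]))   # A
print(first_letter("Écluse", list("ABCDEF")))      # E
```

With a real collation key in place of `primary_key`, the same bisect lookup should carry over unchanged, which is presumably why the scheme keeps working when the collation gains new features.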
Support for language-specific > collations can easily be added. Whether we get language-specific > collations or not, I'd like to think about enabling this feature on > Wikimedia. > > The most glaring omission from the UCA default tables is sensible > sorting of the unified Han. > > In a Chinese context, there's an obvious way to sort characters, and > that's by their order in the KangXi dictionary. The Unihan database > gives such an ordering, and it's used within code blocks. But it's not > used between code blocks. So if you sort by code point, all the Han > characters that aren't in the U+4E00 to U+9FFF block will sort > incorrectly. That's what the default UCA does, with a few minor > exceptions. > > In a Japanese context, the way to sort ideographic characters is to > convert them to phonetic hiragana and then to sort the resulting > string. I don't know if there is any free software for doing this. On > the Japanese Wikipedia, they achieve the same result by manually > setting the sort key of every page to be the hiragana version of the > title. > > There's lots of room here for other people to get involved, especially > if you know a language other than English. > > -- Tim Starling > > > > > ------------------------------ > > Message: 2 > Date: Mon, 17 Jan 2011 16:29:58 +0100 > From: Bryan Tong Minh <bryan.tongm...@gmail.com> > Subject: Re: [Wikitech-l] From page history to sentence history > To: Wikimedia developers <wikitech-l@lists.wikimedia.org> > Message-ID: > <AANLkTi=w=6we2xngmmnikuffmth8krtivzxrsibju...@mail.gmail.com> > Content-Type: text/plain; charset=ISO-8859-1 > > On Mon, Jan 17, 2011 at 3:49 PM, Anthony <wikim...@inbox.org> wrote: >> How would you define a particular sentence, paragraph or section of an >> article? The difficulty of the solution lies in answering that >> question. >> > > Difficult, but doable. Jan-Paul's sentence-level editing tool is able > to make the distinction.
It would perhaps be possible to use that as a > framework for sentence-level diffs. > > > Bryan > > > > ------------------------------ > > Message: 3 > Date: Mon, 17 Jan 2011 16:40:28 +0100 > From: Alex Brollo <alex.bro...@gmail.com> > Subject: Re: [Wikitech-l] From page history to sentence history > To: Wikimedia developers <wikitech-l@lists.wikimedia.org> > Message-ID: > <AANLkTi=whaz1d5ty9hbkdd-7lkfsd_fy0vtevjxad...@mail.gmail.com> > Content-Type: text/plain; charset=ISO-8859-1 > > 2011/1/17 Bryan Tong Minh <bryan.tongm...@gmail.com> > >> >> Difficult, but doable. Jan-Paul's sentence-level editing tool is able >> to make the distinction. It would perhaps be possible to use that as a >> framework for sentence-level diffs. >> > > Difficult, but diff between versions of a page does it. Looking at diff > between pages, I simply thought firmly that only diff paragraphs were > stored, so that the page was built as updated diff segments. I had no idea > how this could be done, but all was "magic"! > > Alex > > > ------------------------------ > > Message: 4 > Date: Mon, 17 Jan 2011 17:11:12 +0100 > From: Daniel Kinzler <dan...@brightbyte.de> > Subject: [Wikitech-l] WMDE Developer Meetup moved to May > To: wikitech-l@lists.wikimedia.org, toolserve...@lists.wikimedia.org, > MediaWiki announcements and site admin list > <mediawik...@lists.wikimedia.org> > Cc: Nicole Ebber <nicole.eb...@wikimedia.de>, Pavel Richter > <pavel.rich...@wikimedia.de> > Message-ID: <4d346a20....@brightbyte.de> > Content-Type: text/plain; charset=UTF-8 > > Hi all > > after some discussion, Wikimedia Germany decided not to hold a developer's > meet-up around the Chapter's conference in March. We just couldn't fit this in > nicely with the venue and the overall organization. Don't despair though: > > This is what we will do instead: > > * There will be a hackathon hosted by Wikimedia Germany in (late) May, > probably > in Berlin, but that's not decided yet. 
This will mostly be about hacking, with a > strong focus on GLAM related stuff. There will be little in terms of > presentations. > > * There will be the hacking days attached to Wikimania in Haifa, August 3./4. > I'm in charge of setting up the program for that, and I'll try to make it a > nice > mix of discussing technology and actually hacking. I would also like to have a > get-together with techies and chapter folks at some point during Wikimania. > > I hope that this way, we can give the hacking events the attention they > deserve. > Let me know what you think. > > -- daniel > > > > ------------------------------ > > Message: 5 > Date: Mon, 17 Jan 2011 11:31:27 -0500 > From: Aryeh Gregor <simetrical+wikil...@gmail.com> > Subject: Re: [Wikitech-l] WYSIFTW status > To: Wikimedia developers <wikitech-l@lists.wikimedia.org> > Message-ID: > <aanlktikudzhxbhndkehewsuqhvcqbz2vestkm7xoz...@mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On Sun, Jan 16, 2011 at 7:16 PM, Magnus Manske > <magnusman...@googlemail.com> wrote: >> There is the question of what browsers/versions to test for. Should I >> invest large amounts of time optimising performance in Firefox 3, when >> FF4 will probably be released before WYSIFTW, and everyone and their >> cousin upgrades? > > Design for only the fastest browsers. Other browsers could always > just be dropped back to the old-fashioned editor.
> > > > ------------------------------ > > Message: 6 > Date: Mon, 17 Jan 2011 17:39:31 +0100 > From: Daniel Kinzler <dan...@brightbyte.de> > Subject: Re: [Wikitech-l] [Toolserver-l] WMDE Developer Meetup moved > to May > To: toolserve...@lists.wikimedia.org > Cc: MediaWiki announcements and site admin list > <mediawik...@lists.wikimedia.org>, wikitech-l@lists.wikimedia.org, > Asaf Bartov <asaf.bar...@gmail.com>, Pavel Richter > <pavel.rich...@wikimedia.de>, Nicole Ebber <nicole.eb...@wikimedia.de> > Message-ID: <4d3470c3.4040...@brightbyte.de> > Content-Type: text/plain; charset=ISO-8859-1 > > On 17.01.2011 17:14, Asaf Bartov wrote: >> Correction: Haifa Hacking Days are to be held August 2nd-3rd. >> Wikimania itself will be Aug 4th-6th. > > Gah! Thanks Asaf. > > There I went and looked it up, and then wrote the wrong thing into the email. > Curses. > > -- daniel > > > > ------------------------------ > > Message: 7 > Date: Mon, 17 Jan 2011 11:44:28 -0500 > From: Aryeh Gregor <simetrical+wikil...@gmail.com> > Subject: Re: [Wikitech-l] June 8th 2011, World IPv6 Day > To: Happy-melon <happy-me...@live.com>, Wikimedia developers > <wikitech-l@lists.wikimedia.org> > Message-ID: > <AANLkTikk20OAKv-vreinxD-oBmfnzLbo97=xroqeb...@mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On Sun, Jan 16, 2011 at 7:12 PM, Happy-melon <happy-me...@live.com> wrote: >> I don't entirely understand the point of this. The plan seems to be """get >> a large enough fraction of 'the internet' to make a change which breaks for >> some people all at the same time, so that those people get angry with the >> ISPs that haven't got off their arses to fix said breakage, rather than >> angry with the broken sites""", which is fair enough. > > No, the point is to test what happens if IPv6 is supported on a large > scale.
It's known from small-scale testing that this will break > things for some small percentage of users, but no one's sure what the > consequences are of switching this on fully for everyone. > >> But AFAICT, the >> breakage won't occur if your connection can't 'do' IPv6, but only if your >> connection can't 'do' both IPv4 *and* IPv6 on the same site at the same >> time. Surely that's not actually the problem that we need to solve if we're >> to be able to migrate smoothly onto IPv6? When the IPv4 addresses run out, >> we need to be able to start setting up websites which are *only* v6, surely? > > There are many more clients in the world than servers, and servers > have always been able to get dedicated IPv4 addresses much more easily > than clients. A server Internet connection in America will typically > come with as many IPv4 addresses as you need, while you usually can't > get a dedicated residential IP address unless you pay extra. (And > America has more IP addresses allocated per capita than anywhere else > in the world, since it originally developed the Internet.) > > So as IPv4 addresses become scarcer, the pressure to use IPv6 only > will fall mostly on residential users. Clients with only an IPv6 > address will only be able to get direct connections to IPv6-enabled > servers. The way servers are supposed to do this is serve both A and > AAAA records for the same domain, so IPv4 clients use the A record and > IPv6 clients use the AAAA record. > > Unfortunately, someone at some point decided that if the client > supports both IPv4 and IPv6, and the server publishes both A and AAAA > records, the client should connect via IPv6. In practice, almost no > sites use IPv6, so the infrastructure is much less well-tested. > Clients that think they have IPv6 connections might actually have the > connection eaten by a middlebox, or just be slower or less reliable.
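To make the selection rule in the quoted text concrete, here is a deliberately simplified model in Python. It is not how any real resolver or operating system is implemented (real stacks follow more detailed address selection rules and may fall back when a connection fails); the function name `pick_address` and the addresses, which come from the documentation-only ranges, are assumptions for illustration.

```python
import ipaddress


def pick_address(records, client_has_v4, client_has_v6):
    # records: address strings published for one hostname (A and AAAA mixed).
    parsed = [ipaddress.ip_address(r) for r in records]
    v6 = [a for a in parsed if a.version == 6]
    v4 = [a for a in parsed if a.version == 4]
    # A client that believes it has IPv6 tries the AAAA record first.
    if client_has_v6 and v6:
        return v6[0]
    if client_has_v4 and v4:
        return v4[0]
    # No usable route, e.g. an IPv6-only client and a site with only A records.
    return None


dual_site = ["192.0.2.10", "2001:db8::10"]
print(pick_address(dual_site, client_has_v4=True, client_has_v6=True))
print(pick_address(["192.0.2.10"], client_has_v4=False, client_has_v6=True))
```

In this model the dual-stack client always lands on the AAAA record, so a broken IPv6 path hurts it even though a working IPv4 path exists, and the IPv6-only client gets nothing from a site that publishes only an A record.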
> So sites don't turn on the AAAA records in practice because it > degrades service for clients with IPv6 connections, which means the > servers aren't accessible to IPv6-only clients without workarounds. > > IPv6 day is an attempt to see what happens if major sites publish AAAA > records for a while. Stuff will break, but hopefully not too > horribly, and it will give both site operators and ISPs the chance to > analyze what's wrong with their IPv6 support and what they can do to > fix it. This is a step toward major sites publishing AAAA records all > the time, which is necessary to support IPv6-only clients. > > Something like that, anyway. I'm hardly an expert on these things. > > > > ------------------------------ > > Message: 8 > Date: Mon, 17 Jan 2011 11:45:33 -0500 > From: Chad <innocentkil...@gmail.com> > Subject: Re: [Wikitech-l] WMDE Developer Meetup moved to May > To: Wikimedia developers <wikitech-l@lists.wikimedia.org> > Cc: toolserve...@lists.wikimedia.org, MediaWiki announcements and site > admin list <mediawik...@lists.wikimedia.org> > Message-ID: > <AANLkTim3Q5CS20O=crvo0a2z7nnbqftrhauffgvbq...@mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On Mon, Jan 17, 2011 at 11:11 AM, Daniel Kinzler <dan...@brightbyte.de> wrote: >> * There will be a hackathon hosted by Wikimedia Germany in (late) May, >> probably >> in Berlin, but that's not decided yet. This will mostly be about hacking, with a >> strong focus on GLAM related stuff. There will be little in terms of >> presentations. >> > > Late May? That's actually *really* awesome.
Now I don't have > to miss school to come :D > > -Chad > > > > ------------------------------ > > Message: 9 > Date: Mon, 17 Jan 2011 11:47:35 -0500 > From: Aryeh Gregor <simetrical+wikil...@gmail.com> > Subject: Re: [Wikitech-l] From page history to sentence history > To: Wikimedia developers <wikitech-l@lists.wikimedia.org> > Message-ID: > <AANLkTinBdUX_v4d0gvxzm=bf_le+1aqrmmjhk8xsv...@mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On Mon, Jan 17, 2011 at 5:55 AM, Alex Brollo <alex.bro...@gmail.com> wrote: >> Before I dig a little more into wiki mysteries, I was absolutely sure that >> wiki articles were stored into small pieces (paragraphs?) so that a small >> edit into a long long page would take exactly the same disk space than a >> small edit into a short page. But I discovered soon, that things are >> different. :-) > > Wikimedia stores diffs using delta compression, so actually this is > basically what happens. The size of the edit is what determines the > size of the stored diff, not the size of the page. (I don't know how > this works in detail, though.) IIRC, default MediaWiki doesn't work > this way. > > > > ------------------------------ > > Message: 10 > Date: Mon, 17 Jan 2011 12:41:22 -0500 > From: Anthony <wikim...@inbox.org> > Subject: Re: [Wikitech-l] From page history to sentence history > To: Wikimedia developers <wikitech-l@lists.wikimedia.org> > Message-ID: > <aanlktinfd+peoawn1t4xyzaecwpo1_nexm0eodglj...@mail.gmail.com> > Content-Type: text/plain; charset=ISO-8859-1 > > On Mon, Jan 17, 2011 at 10:40 AM, Alex Brollo <alex.bro...@gmail.com> wrote: >> 2011/1/17 Bryan Tong Minh <bryan.tongm...@gmail.com> >> >>> >>> Difficult, but doable. Jan-Paul's sentence-level editing tool is able >>> to make the distinction. It would perhaps be possible to use that as a >>> framework for sentence-level diffs. >>> >> >> Difficult, but diff between versions of a page does it. 
Looking at diff >> between pages, I simply thought firmly that only diff paragraphs were >> stored, so that the page was built as updated diff segments. I had no idea >> how this could be done, but all was "magic"! > > Paragraphs are much easier to recognize than sentences, as wikitext > has a paragraph delimiter - a blank line. To truly recognize > sentences, you basically have to engage in natural language > processing, though you can probably get it right 90% of the time > without too much effort. > > And to recognize what's going on when a sentence changes *and* is > moved from one paragraph to another, requires an even greater level of > natural language understanding. Again though, you can probably get it > right most of the time without too much effort. > > Wikitext actually makes it easier for the most part, as you can use > tricks such as the fact that the periods in [[I.M. Someone]] don't > represent sentence delimiters, since they are contained in square > brackets. But not all periods which occur in the middle of a sentence > are contained in square brackets, and not all sentences end with a > period. > > I'd say "difficult but doable" is quite accurate, although with the > caveat that even the state of the art tools available today are > probably going to make mistakes that would be obvious to a human. I'm > sure there are tools for this, and there are probably some decent ones > that are open source. But it's not as simple as just adding an index. > > > > ------------------------------ > > _______________________________________________ > Wikitech-l mailing list > Wikitech-l@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikitech-l > > > End of Wikitech-l Digest, Vol 90, Issue 33 > ******************************************
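The paragraph and sentence heuristics from Message 10 can be sketched in a few lines of Python. This is only the naive "right most of the time" approach mentioned there, not Jan-Paul's tool or anything in MediaWiki; `split_paragraphs` and `split_sentences` are hypothetical names, and the end-of-sentence rule (punctuation followed by whitespace) is an assumption.

```python
import re

# Matches a wikitext link such as [[I.M. Someone]].
LINK = re.compile(r"\[\[.*?\]\]")


def split_paragraphs(text):
    # Wikitext paragraphs are delimited by a blank line.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    # Mask link contents so periods inside [[...]] are not seen as sentence ends.
    masked = LINK.sub(lambda m: "x" * len(m.group()), paragraph)
    sentences, start = [], 0
    # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
    for m in re.finditer(r"[.!?]\s+", masked):
        sentences.append(paragraph[start:m.end()].strip())
        start = m.end()
    tail = paragraph[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("This was written by [[I.M. Someone]]. It is short."))
```

The masking keeps the link intact, but an abbreviation outside brackets (for example "Dr. Smith left.") is still split in the wrong place, which is the kind of mistake the message warns about.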