Re: [Wikitech-l] Google Summer of Code 2009
Yeah!

I was part of the "mixed luck" from last year, and honestly, I get warm feelings when reading the friendly http://www.mediawiki.org/wiki/Summer_of_Code_2009/Application_template ;)

I don't know if anyone here is willing to be a mentor this year, but please go ahead and try to help a student. The experience is unique for students, as it really motivates them to get involved in an OSS project, if not in MediaWiki.

I don't know how to put it "nicely", but the key for GSoC to succeed, on the mentor/senior devs side, is just to be *very* available. Easy to say, I know, but it would be nice to keep this in mind if we plan to host students this year. It's not about having "xx minutes available a day for my student"; it's more about being able to set up regular IRC meetings in advance so that his (her?) questions can be answered in real time. Being stuck on your code when it seems like you won't get your questions answered for a long time particularly sucks, especially when it seems to you that the answers are really simple.

And it's not only about mentors, but also about having some "awareness" from devs that students are going to hang around on IRC, asking for directions, and also sometimes asking (very) naive questions: let's try not to bite them! =)

So yes, let's move, let's get involved in GSoC again! This is really a great project, and I'm really looking forward to seeing new faces around, bringing in new ideas, as naive as they may sound =)

2009/3/11 Brion Vibber :
> I’ve just put in Wikimedia’s org application for Google Summer of Code
> 2009… Hopefully we’ll get in. :)
>
> http://www.mediawiki.org/wiki/Summer_of_Code_2009
> ^ Add and update cool project ideas as a starting point for student
> applicants!
>
> We’ve had mixed luck in previous years with GSoC, but I think we’ve got
> enough internal bandwidth this year that we can make sure there’s enough
> effort put into interacting with the student candidates ahead of time to
> pick the coolest and most go-get-em self-starter awesome projects and
> then support them through the project term.
>
> I’ve also tossed up a student application template if you want to get
> started early. :)
>
> http://www.mediawiki.org/wiki/Summer_of_Code_2009/Application_template
>
> -- brion

--
Nicolas Dumazet — NicDumZ [ nɪk.d̪ymz ]
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] how to convert the latin1 SQL dump back into UTF-8?
OK, I found that if I use "mysqldump --default-character-set=latin1" I can read everything in the dump that can be read. The only difference from plain mysqldump is

-/*!40101 SET NAMES utf8 */;
+/*!40101 SET NAMES latin1 */;

But that doesn't seem to affect restores from the SQL file. I'm sold.
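For the question in the subject line, here is a minimal sketch, assuming the common cause: UTF-8 bytes were stored in latin1 columns, so text read back as latin1 shows each multi-byte character as two or more garbage characters. Under that assumption the text can be repaired by encoding it as latin1 again and decoding the resulting bytes as UTF-8. The function name `fix_double_encoded` is made up for illustration, and since MySQL's "latin1" behaves more like cp1252, real data may need that codec instead.

```python
def fix_double_encoded(text: str) -> str:
    """Undo UTF-8 bytes that were mistakenly decoded as latin1.

    Hypothetical helper for illustration only. Returns the input
    unchanged when it does not look like mis-decoded UTF-8.
    """
    try:
        # Recover the raw bytes, then interpret them as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside latin1, or the bytes
        # are not valid UTF-8: assume it was already correct.
        return text


if __name__ == "__main__":
    # "\u00c3\u00a9" is what "\u00e9" (e with acute accent) looks like
    # after its UTF-8 bytes have been read as latin1.
    print(fix_double_encoded("\u00c3\u00a9"))
```

Plain ASCII passes through untouched, because ASCII bytes mean the same thing in both encodings. The fallback is a guess, so a dump that mixes correct and double-encoded rows should be checked by eye before trusting the result.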
Re: [Wikitech-l] Wikipedia is full
Yes, but whilst StatusBot and the proposed bot would have comparable edit statistics, the latter would have more of a reason for running than 'to update people's statuses'. It's not just about actions, it's about the justification for those actions. - Chris On Wed, Mar 11, 2009 at 2:11 AM, Soxred93 wrote: > In case you didn't see the whole StatusBot fiasco on enwiki, I used > to run a bot as a replacement to a replacement of [[User:StatusBot]]. > The bot made 50k edis in a few months, and was soon shut down by > Brion. A bot the edits the sandbox every few minutes would no way be > approved. > > On Mar 10, 2009, at 7:54 PM [Mar 10, 2009 ], Thomas Dalton wrote: > > > 2009/3/10 K. Peachey : > >> On Wed, Mar 11, 2009 at 9:21 AM, Robert Rohde > >> wrote: > >>> Out of curiousity, when a technical problem shuts down all > >>> editing on > >>> a major wiki (as this did) are there any automated alerts? Is it > >>> likely to be noticed and addressed even if no one rushes to IRC? > >>> > >>> I guess I am curious what is the normal delay between problem onset > >>> and problem recognition? > >>> > >>> -Robert Rohde > >> I believe with this issue (Full MySQL table) that there is no easy > >> way > >> to automate the test. > >> maybe you could automatically query it every so often but even then > >> that might not return reliable results. > > > > A bot that edits the sandbox every few minutes would work, would it? > > > > ___ > > Wikitech-l mailing list > > Wikitech-l@lists.wikimedia.org > > https://lists.wikimedia.org/mailman/listinfo/wikitech-l > > > ___ > Wikitech-l mailing list > Wikitech-l@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikitech-l > ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Wikipedia is full
What about replag? The bot would puke every time that replication stops. On Mar 10, 2009, at 8:02 PM [Mar 10, 2009 ], Robert Rohde wrote: > On Tue, Mar 10, 2009 at 4:43 PM, K. Peachey > wrote: >> On Wed, Mar 11, 2009 at 9:21 AM, Robert Rohde >> wrote: >>> Out of curiousity, when a technical problem shuts down all >>> editing on >>> a major wiki (as this did) are there any automated alerts? Is it >>> likely to be noticed and addressed even if no one rushes to IRC? >>> >>> I guess I am curious what is the normal delay between problem onset >>> and problem recognition? >>> >>> -Robert Rohde >> I believe with this issue (Full MySQL table) that there is no easy >> way >> to automate the test. >> maybe you could automatically query it every so often but even then >> that might not return reliable results. > > One could query count(*) from revisions (or some similar artifice, > such as looking at the recent changes feed) and trigger an alert if it > stops increasing. > > Such things are probably totally unnecessary on enwiki, because there > is no shortage of people to complain, but I could image it might be > useful to have such an alert for smaller, non-English speaking wikis. > > -Robert Rohde > > ___ > Wikitech-l mailing list > Wikitech-l@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Wikipedia is full
In case you didn't see the whole StatusBot fiasco on enwiki, I used to run a bot as a replacement for a replacement of [[User:StatusBot]]. The bot made 50k edits in a few months, and was soon shut down by Brion. There is no way a bot that edits the sandbox every few minutes would be approved.

On Mar 10, 2009, at 7:54 PM, Thomas Dalton wrote:
> 2009/3/10 K. Peachey :
>> On Wed, Mar 11, 2009 at 9:21 AM, Robert Rohde wrote:
>>> Out of curiosity, when a technical problem shuts down all editing on
>>> a major wiki (as this did) are there any automated alerts? Is it
>>> likely to be noticed and addressed even if no one rushes to IRC?
>>>
>>> I guess I am curious what is the normal delay between problem onset
>>> and problem recognition?
>>>
>>> -Robert Rohde
>> I believe with this issue (Full MySQL table) that there is no easy way
>> to automate the test.
>> maybe you could automatically query it every so often but even then
>> that might not return reliable results.
>
> A bot that edits the sandbox every few minutes would work, wouldn't it?
Re: [Wikitech-l] MediaWiki developer meeting is drawing close
On 3/6/09 5:11 AM, Daniel Kinzler wrote: > The meet-up[1] is drawing close now: between April 3. and 5. we meet at the > c-base[2] in Berlin to discuss MediaWiki development, extensions, toolserver > projects, wiki research, etc. Registration[3] is open until March 20 (required > even if you already pre-registered). I've put in a quick reg mail for Wikimedia's staff contingent. :) We'll have some of the usual suspects -- me, Tim Starling, Mark Bergsma, and Michael Dale (of Metavid/Kaltura Ogg Theora video work fame) -- as well as some of our newer folks: Tomasz Finc who's been doing a lot of behind-the-scenes work with the fundraiser and notice systems and is now working on patching up the data dumps and job queue -- and Arash Boostani and Trevor Parscal who are dev'ing up the Wikipedia Usability Initiative. -- brion > > The schedule[4] is slowly becomming clear now: On Friday, we'll start at noon > with a who-is-who-and-does-what session and in the evening there will be an > opportunity to get to know Berlin a bit. On Saturday we have all day for > presentations and discussions, and in the evening we will have a party > together > with all the folks from the chapter and board meetings. On Sunday there will > be > a wrap-up session and a big lunch for everyone. > > We have also organized affordable accommodation: we have reserved rooms in the > Apartmenthaus am Potsdamer Platz[5]. Staying there is a recommended way of > getting to know your fellow Wikimedians! > > I'm happy that so many of you have shown interest, and I'm sure we'll have a > great time in Berlin! 
> > Regards, > Daniel > > [1] http://www.mediawiki.org/wiki/Project:Developer_meet-up_2009 > [2] http://en.wikipedia.org/wiki/C-base > [3] http://www.mediawiki.org/wiki/Project:Developer_meet-up_2009/Registration > [4] http://www.mediawiki.org/wiki/Project:Developer_meet-up_2009#Outline > [5] > http://www.mediawiki.org/wiki/Project:Developer_meet-up_2009#Apartmenthaus_am_Potsdamer_Platz > > ___ > Wikitech-l mailing list > Wikitech-l@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] research-oriented toolserver?
On 3/10/09 5:29 PM, Aryeh Gregor wrote: > On Tue, Mar 10, 2009 at 7:54 PM, Platonides wrote: >> Is mediawiki table structure going to change? > > Yes, it changes on a regular basis. > >> Moreover, any more private method for sharing the tables (eg. a trigger >> deleting the row when rev_deleted is set) would precisely lose the >> backup ability the toolserver is performing. > > I don't think the toolserver is used for backups. At least I hope > it's not, given its reliability (which is quite good, but "quite good" > is scary for backups). The existence of the replicas on toolserver is one of our backups. Obviously we want to improve our offsite backups to include complete offline snapshots as well. It's in progress. :) -- brion ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Wikipedia is full
On 3/10/09 5:02 PM, Robert Rohde wrote: > Such things are probably totally unnecessary on enwiki, because there > is no shortage of people to complain, but I could image it might be > useful to have such an alert for smaller, non-English speaking wikis. *nod* -- brion ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Wikipedia is full
On 3/10/09 4:39 PM, Platonides wrote: > The procedure is "a lot of people enter #wikimedia-tech complaining > about it". > There are automated alerts about servers going down or not having enough > free space on disk, but not for 'saving an edit failed'. That would be > tricky to do. Wouldn't be that tricky, but it's not our highest priority as the human alert system does an excellent job of this already. ;) -- brion ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Wikipedia is full
On Tue, Mar 10, 2009 at 7:15 PM, K. Peachey wrote: > Not everyone knows how to use IRC, i would recommend that you > recommend people report to bugzilla with urgent tags instead. Sysadmins do not check Bugzilla constantly. They do check IRC constantly if you say their names in the right channel. Therefore, IRC is the preferred method of reporting urgent problems, and will remain so for the foreseeable future. If some people don't know how to use it, they can either find someone who does; learn quickly; or be content with their alert taking a while longer to reach the right people. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] research-oriented toolserver?
On Tue, Mar 10, 2009 at 7:54 PM, Platonides wrote: > Is mediawiki table structure going to change? Yes, it changes on a regular basis. > Moreover, any more private method for sharing the tables (eg. a trigger > deleting the row when rev_deleted is set) would precisely lose the > backup ability the toolserver is performing. I don't think the toolserver is used for backups. At least I hope it's not, given its reliability (which is quite good, but "quite good" is scary for backups). ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Google Summer of Code 2009
I’ve just put in Wikimedia’s org application for Google Summer of Code 2009… Hopefully we’ll get in. :) http://www.mediawiki.org/wiki/Summer_of_Code_2009 ^ Add and update cool project ideas as a starting point for student applicants! We’ve had mixed luck in previous years with GSoC, but I think we’ve got enough internal bandwidth this year that we can make sure there’s enough effort put into interacting with the student candidates ahead of time to pick the coolest and most go-get-em self-starter awesome projects and then support them through the project term. I’ve also tossed up a student application template if you want to get started early. :) http://www.mediawiki.org/wiki/Summer_of_Code_2009/Application_template -- brion ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Wikipedia is full
On Tue, Mar 10, 2009 at 4:43 PM, K. Peachey wrote:
> On Wed, Mar 11, 2009 at 9:21 AM, Robert Rohde wrote:
>> Out of curiosity, when a technical problem shuts down all editing on
>> a major wiki (as this did) are there any automated alerts? Is it
>> likely to be noticed and addressed even if no one rushes to IRC?
>>
>> I guess I am curious what is the normal delay between problem onset
>> and problem recognition?
>>
>> -Robert Rohde
> I believe with this issue (Full MySQL table) that there is no easy way
> to automate the test.
> maybe you could automatically query it every so often but even then
> that might not return reliable results.

One could query count(*) from revisions (or some similar artifice, such as looking at the recent changes feed) and trigger an alert if it stops increasing.

Such things are probably totally unnecessary on enwiki, because there is no shortage of people to complain, but I could imagine it might be useful to have such an alert for smaller, non-English-speaking wikis.

-Robert Rohde
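The "alert if it stops increasing" idea can be sketched roughly as follows. This is only an illustration, not anything Wikimedia runs: the class name `StallDetector`, its fields and the threshold are all assumptions, and the counts are passed in by the caller, so whatever query or feed supplies them is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StallDetector:
    """Alert when a normally growing counter stops increasing.

    Hypothetical sketch: the caller samples the counter (for example
    a revision count) and passes it in together with a timestamp.
    """
    max_quiet_seconds: float
    last_count: Optional[int] = None
    last_change: Optional[float] = None

    def observe(self, count: int, now: float) -> bool:
        """Record one sample; return True if an alert should fire."""
        if self.last_count is None or count > self.last_count:
            # First sample, or the counter grew: remember when.
            self.last_count = count
            self.last_change = now
            return False
        # No growth: alert once the quiet period is long enough.
        return (now - self.last_change) >= self.max_quiet_seconds


if __name__ == "__main__":
    detector = StallDetector(max_quiet_seconds=600)
    for count, now in [(100, 0), (101, 300), (101, 600), (101, 900)]:
        print(now, detector.observe(count, now))
```

The quiet period would have to be tuned per wiki: a small wiki can legitimately go a long time without an edit, which is exactly where a short threshold would produce false alarms. A sample that is lower than the previous one is simply treated as "no growth" here.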
Re: [Wikitech-l] Wikipedia is full
2009/3/10 K. Peachey :
>> A bot that edits the sandbox every few minutes would work, wouldn't it?
> Possibly, but I would bump it up to something like every two hours. Plus,
> since MySQL is spread across multiple systems, you would have to make
> sure it checks the same one every time.

If you're going to wait 2 hours, you might as well just wait for people to start complaining; that will be far quicker.
Re: [Wikitech-l] Wikipedia is full
> A bot that edits the sandbox every few minutes would work, wouldn't it?

Possibly, but I would bump it up to something like every two hours. Plus, since MySQL is spread across multiple systems, you would have to make sure it checks the same one every time.
Re: [Wikitech-l] research-oriented toolserver?
River Tarnell wrote:
> i think the only non-money issue is that the
> Wikimedia Foundation won't allow us to add any more admins until they do some
> internal reorganisation of their databases, which we've been waiting for for
> several months now.

Is the MediaWiki table structure going to change? The RevisionDelete system is not friendly to partial replication, but doing things that way is precisely what allows (or will allow) avoiding the row-copying from revision to archive done by the 'old' deletion system.

Moreover, any more private method of sharing the tables (e.g. a trigger deleting the row when rev_deleted is set) would lose precisely the backup function the toolserver is performing.
Re: [Wikitech-l] Wikipedia is full
2009/3/10 K. Peachey :
> On Wed, Mar 11, 2009 at 9:21 AM, Robert Rohde wrote:
>> Out of curiosity, when a technical problem shuts down all editing on
>> a major wiki (as this did) are there any automated alerts? Is it
>> likely to be noticed and addressed even if no one rushes to IRC?
>>
>> I guess I am curious what is the normal delay between problem onset
>> and problem recognition?
>>
>> -Robert Rohde
> I believe with this issue (Full MySQL table) that there is no easy way
> to automate the test.
> maybe you could automatically query it every so often but even then
> that might not return reliable results.

A bot that edits the sandbox every few minutes would work, wouldn't it?
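A rough sketch of such a probe, under stated assumptions: `try_edit` is a hypothetical callable supplied by the caller that attempts one test edit and returns True on success, and `EditHeartbeat` and `failures_before_alert` are illustrative names. No real MediaWiki API call is shown. Requiring several consecutive failures is a design choice added here so that one transient error does not raise an alarm.

```python
from typing import Callable


class EditHeartbeat:
    """Track test-edit results and decide when to raise an alert."""

    def __init__(self, try_edit: Callable[[], bool],
                 failures_before_alert: int = 2) -> None:
        self.try_edit = try_edit  # hypothetical: performs one test edit
        self.failures_before_alert = failures_before_alert
        self.consecutive_failures = 0

    def tick(self) -> bool:
        """Run one probe; return True if an alert should fire."""
        try:
            ok = self.try_edit()
        except Exception:
            # Treat any error raised by the probe as a failed edit.
            ok = False
        if ok:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.failures_before_alert


if __name__ == "__main__":
    results = iter([True, False, False, True])
    heartbeat = EditHeartbeat(lambda: next(results))
    print([heartbeat.tick() for _ in range(4)])
```

The worst-case detection delay is roughly the probe interval multiplied by `failures_before_alert`, so the interval matters more than anything else in this sketch: probing every few minutes gives a delay of minutes, while probing every two hours gives a delay of hours.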
Re: [Wikitech-l] Wikipedia is full
On Wed, Mar 11, 2009 at 9:21 AM, Robert Rohde wrote:
> Out of curiosity, when a technical problem shuts down all editing on
> a major wiki (as this did) are there any automated alerts? Is it
> likely to be noticed and addressed even if no one rushes to IRC?
>
> I guess I am curious what is the normal delay between problem onset
> and problem recognition?
>
> -Robert Rohde

I believe that with this issue (a full MySQL table) there is no easy way to automate the test. Maybe you could automatically query it every so often, but even then that might not return reliable results.
Re: [Wikitech-l] Wikipedia is full
Robert Rohde wrote: > Out of curiousity, when a technical problem shuts down all editing on > a major wiki (as this did) are there any automated alerts? Is it > likely to be noticed and addressed even if no one rushes to IRC? > > I guess I am curious what is the normal delay between problem onset > and problem recognition? > > -Robert Rohde The procedure is "a lot of people enter #wikimedia-tech complaining about it". There are automated alerts about servers going down or not having enough free space on disk, but not for 'saving an edit failed'. That would be tricky to do. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Wikipedia is full
Out of curiosity, when a technical problem shuts down all editing on a major wiki (as this did), are there any automated alerts? Is it likely to be noticed and addressed even if no one rushes to IRC?

I guess I am curious what the normal delay is between problem onset and problem recognition.

-Robert Rohde
Re: [Wikitech-l] Wikipedia is full
> Please report urgent system administration issues to IRC, specifically
> #wikimedia-tech on irc.freenode.net.
>
> -- Tim Starling

Not everyone knows how to use IRC; I would suggest that you recommend people report to Bugzilla with urgent tags instead.
Re: [Wikitech-l] MediaWiki developer meeting is drawing close
2009/3/10 Daniel Kinzler : > Roan Kattouw schrieb: >> Daniel Kinzler schreef: >>> The schedule[4] is slowly becomming clear now: On Friday, we'll start at >>> noon >>> with a who-is-who-and-does-what session >> The schedule you're linking to says it starts at 3 PM. Which time is the >> right one? > > Bah, naming times of day in english is awkward :) what do you call 1pm, > "afternnon"? Anyway... 1pm is afternoon, but "at noon" is not "at 3PM"! > So... doors open at noon, schedule starts at 3pm. Satisfied? That makes sense! ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] MediaWiki developer meeting is drawing close
Daniel Kinzler wrote: > So, come early, hang out a bit, the c-base is a cosy place :) > So... doors open at noon, schedule starts at 3pm. Satisfied? Or if you arrive in Berlin already on Thursday April 2, join me for lunch at the Museum of Technology, and we can spend that afternoon admiring the replicas of Konrad Zuse's early computers. I added this as an informal item to the program at http://www.mediawiki.org/wiki/Project:Developer_meet-up_2009 where you can also find links for more reading and maps. -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se Wikimedia Sverige - stöd fri kunskap - http://wikimedia.se/ ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] research-oriented toolserver?
On Tue, Mar 10, 2009 at 2:18 PM, Daniel Kinzler wrote: > Robert Rohde schrieb: >> The converse of this is that some recognized experts would probably >> prefer to administer their own server/cluster rather than relying on >> some random guy with Wikimedia DE (or wherever) to get things done. > > An academic institution may also get a serious research grant for this - that > would be more complicated if the money would be handeled via the german > chapter. > Though it's something we are, of course, also interested in. > > Basically, if we could all work on making the toolserver THE ONE PLACE for > working with wikipedia's data, that would be perfect. If, for some reason, it > makes sense to build a separate cluster, I propose to give it a distict > purpose > and profile: let it provide facilities for fulltext research, with low > priority > for the update latency, and high priority of having fulltext in various forms, > with search indexes, word lists, and all the fun. Personally I would favor a physically distinct cluster (regardless of who administers it) more or less with the focus you describe. In particular, I think it is useful to separate "tools" from "analysis". A "tool" aims to provide useful information in near realtime based on specific and focused parameters. By contrast, "analysis" often involves running some process systematically through a very large portion of the data with the expectation that it will take a while (for example, I've used dumps to perform large statistical analyses where the processing code might take 24 hours when run against the full edit history of a large wiki.) "Tools" need high availability and low lag relative to the live site, but "analysis" doesn't care if it gets out of date and should use scheduling etc. to balance large loads. -Robert Rohde ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] research-oriented toolserver?
Robert Rohde wrote:
> On Tue, Mar 10, 2009 at 1:27 PM, River Tarnell wrote:
>> phoebe ayers:
>>> River: Well, you say that part of the issue with the toolserver is money and
>>> time... and this person that I've been talking to is offering to throw money
>>> and time at the problem. So, what can they constructively do?
>> i think this is being discussed privately now...
>
> If other research groups are interested in contributing to this, who
> should they be talking to?

Wikimedia Germany. That is, I guess, me. Send mail to daniel dot kinzler at wikimedia dot de. I'll forward it as appropriate.

>> i don't see why access to the toolserver would be restricted to Wikipedia
>> editors. in fact, i'd be happier giving access to a recognised academic
>> expert than some random guy on Wikipedia.
>
> The converse of this is that some recognized experts would probably
> prefer to administer their own server/cluster rather than relying on
> some random guy with Wikimedia DE (or wherever) to get things done.

An academic institution may also get a serious research grant for this - that would be more complicated if the money were handled via the German chapter. Though it's something we are, of course, also interested in.

Basically, if we could all work on making the toolserver THE ONE PLACE for working with Wikipedia's data, that would be perfect. If, for some reason, it makes sense to build a separate cluster, I propose to give it a distinct purpose and profile: let it provide facilities for fulltext research, with low priority for update latency and high priority for having fulltext in various forms, with search indexes, word lists, and all the fun.

Regards,
Daniel
Re: [Wikitech-l] research-oriented toolserver?
Let me know if you have a grant proposal you'd like help with! Andrea On Tue, Mar 10, 2009 at 4:30 PM, River Tarnell wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Andrea Forte: >> To give you an idea, for me personally, the incentive for a new resource is a >> need for a server (perhaps a cluster) to support full-text queries at a >> reasonable speed. > > then why not help us do this on the existing toolserver, so everyone can have > access to it, instead of duplicating it yet again somewhere else? > > there are many toolserver users who would like direct access to text, and the > ability to search it. > > - river. > -BEGIN PGP SIGNATURE- > Version: GnuPG v1.4.9 (HP-UX) > > iEYEARECAAYFAkm2zgIACgkQIXd7fCuc5vLrvgCgkWY9BizcJCSunzrk+dPdrcJO > U4wAn0kIpQd7NYVBHfKNwR+dTM2rTon6 > =rSHL > -END PGP SIGNATURE- > > ___ > Wikitech-l mailing list > Wikitech-l@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikitech-l > ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] research-oriented toolserver?
On Tue, Mar 10, 2009 at 1:27 PM, River Tarnell wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > phoebe ayers: >> River: Well, you say that part of the issue with the toolserver is money and >> time... and this person that I've been talking to is offering to throw money >> and time at the problem. So, what can they constructively do? > > i think this is being discussed privately now... If other research groups are interested in contributing to this, who should they be talking to? > i don't see why access to the toolserver would be restricted to Wikipedia > editors. in fact, i'd be happier giving access to a recognised academic > expert > than some random guy on Wikipedia. The converse of this is that some recognized experts would probably prefer to administer their own server/cluster rather than relying on some random guy with Wikimedia DE (or wherever) to get things done. -Robert Rohde ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] MediaWiki developer meeting is drawing close
Roan Kattouw wrote:
> Daniel Kinzler wrote:
>> The schedule[4] is slowly becoming clear now: On Friday, we'll start at noon
>> with a who-is-who-and-does-what session
> The schedule you're linking to says it starts at 3 PM. Which time is the
> right one?

Bah, naming times of day in English is awkward :) What do you call 1pm, "afternoon"? Anyway...

The schedule on the wiki is the definitive one - or rather, as definite as it gets. It may still change, however. So, come early, hang out a bit, the c-base is a cosy place :)

So... doors open at noon, schedule starts at 3pm. Satisfied?

-- daniel
Re: [Wikitech-l] research-oriented toolserver?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Andrea Forte: > To give you an idea, for me personally, the incentive for a new resource is a > need for a server (perhaps a cluster) to support full-text queries at a > reasonable speed. then why not help us do this on the existing toolserver, so everyone can have access to it, instead of duplicating it yet again somewhere else? there are many toolserver users who would like direct access to text, and the ability to search it. - river. -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.9 (HP-UX) iEYEARECAAYFAkm2zgIACgkQIXd7fCuc5vLrvgCgkWY9BizcJCSunzrk+dPdrcJO U4wAn0kIpQd7NYVBHfKNwR+dTM2rTon6 =rSHL -END PGP SIGNATURE- ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] research-oriented toolserver?
I've been trying to do some work mining the full en dump with revision history and was involved in getting together the Syracuse grant proposal. To give you an idea, for me personally, the incentive for a new resource is a need for a server (perhaps a cluster) to support full-text queries at a reasonable speed. People at various research institutions duplicate this effort over and over. Andrea On Tue, Mar 10, 2009 at 2:26 PM, phoebe ayers wrote: > Thanks for the responses, all. > > Daniel and Bilal: the notes about the possible servers at Syracuse and > Concordia are very interesting; it sounds like the researchers > interested in such things should team up. > > Daniel: I am not sure what type of data is needed -- this is not my > project (I'm only the messenger!) but I'll pass along your message and > send you private details (and encourage the researcher to reply > himself). > > River: Well, you say that part of the issue with the toolserver is > money and time... and this person that I've been talking to is > offering to throw money and time at the problem. So, what can they > constructively do? > > All: Like I said, I am unclear on the technical issues involved, but > as for why a separate "research toolserver" might be useful... : > I see a difference in the type of information a researcher might want > to pull (public data, large sets of related page information, > full-text mining, ??) and the types of tools that the current > toolserver mainly supports (editcount tools, catscan, etc). I also see > a difference in how the two groups might be authenticated -- there's a > difference between being a trusted Wikipedian or trusted Wikimedia > developer and being a trusted technically-competent researcher (for > instance, I recognized the affiliation of the person who was trying to > apply, because I've read their research papers; but if you were going > on wikimedia status alone, they don't have any). 
> > -- Phoebe > > -- > * I use this address for lists; send personal messages to phoebe.ayers > gmail.com * > > ___ > Wikitech-l mailing list > Wikitech-l@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikitech-l > ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] research-oriented toolserver?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 phoebe ayers: > River: Well, you say that part of the issue with the toolserver is money and > time... and this person that I've been talking to is offering to throw money > and time at the problem. So, what can they constructively do? i think this is being discussed privately now... > I see a difference in the type of information a researcher might want to pull > (public data, large sets of related page information, full-text mining, ??) > and the types of tools that the current toolserver mainly supports (editcount > tools, catscan, etc). so, what is missing from the current toolserver that prevents researchers from working with large data sets? > I also see a difference in how the two groups might be authenticated -- > there's a difference between being a trusted Wikipedian or trusted Wikimedia > developer and being a trusted technically-competent researcher i don't see why access to the toolserver would be restricted to Wikipedia editors. in fact, i'd be happier giving access to a recognised academic expert than some random guy on Wikipedia. - river. -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.9 (HP-UX) iEYEARECAAYFAkm2zSQACgkQIXd7fCuc5vKYSACdF2IJwcfhWEarjgDC8FmMSls1 NN0An2jLSu3/mhLCEAsLuoZz0x3DE8mP =ZHMA -END PGP SIGNATURE- ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] research-oriented toolserver?
Thanks for the responses, all.

Daniel and Bilal: the notes about the possible servers at Syracuse and Concordia are very interesting; it sounds like the researchers interested in such things should team up.

Daniel: I am not sure what type of data is needed -- this is not my project (I'm only the messenger!) but I'll pass along your message and send you private details (and encourage the researcher to reply himself).

River: Well, you say that part of the issue with the toolserver is money and time... and this person that I've been talking to is offering to throw money and time at the problem. So, what can they constructively do?

All: Like I said, I am unclear on the technical issues involved, but as for why a separate "research toolserver" might be useful... :
I see a difference in the type of information a researcher might want to pull (public data, large sets of related page information, full-text mining, ??) and the types of tools that the current toolserver mainly supports (editcount tools, catscan, etc). I also see a difference in how the two groups might be authenticated -- there's a difference between being a trusted Wikipedian or trusted Wikimedia developer and being a trusted technically-competent researcher (for instance, I recognized the affiliation of the person who was trying to apply, because I've read their research papers; but if you were going on wikimedia status alone, they don't have any).

-- Phoebe

--
* I use this address for lists; send personal messages to phoebe.ayers gmail.com *
Re: [Wikitech-l] research-oriented toolserver?
Aryeh Gregor:
> Oh. Why does a single specific person have to handle the approval of all toolserver account requests, then?

because accounts have to be approved by WM-DE, and WM-DE has designated this person to approve accounts on their behalf.

- river.
Re: [Wikitech-l] research-oriented toolserver?
phoebe ayers:
> Personally, I think a dedicated toolserver is a great idea for the research community, but I know very little about the technical issues involved and/or whether this has been proposed before. Please comment, and I can pass on replies and put the researcher in touch with the tech team if it seems like a good idea.

i don't understand what "research-oriented" toolserver means. what will the research-toolserver provide that the current toolserver doesn't provide?

is the only issue the time it takes for accounts to be created? this is a WM-DE issue; the more people who complain to WM-DE about this, the more likely it is to be resolved. (so far, i've had zero communications from WM-DE about how the only people able to approve accounts are so busy with other things nowadays. on the other hand, i didn't ask them about it either; i suppose they don't bother monitoring the toolserver most of the time.)

we recently conducted a survey of toolserver users, and account approval (not creation) was generally felt to be quite slow. once i produce a report from the results of that survey, we might be able to get WM-DE to do something about it.

most of the issues with the current toolserver come down to money. we don't have enough money to afford redundant databases, so any failure is a major problem and creates inconvenience for users. we don't have enough money for a paid admin, so it often takes a long time for things to get done. we don't have enough money to upgrade hardware when we need it, so things are often slow until the money is available. i think the only non-money issue is that the Wikimedia Foundation won't allow us to add any more admins until they do some internal reorganisation of their databases, which we've been waiting for for several months now.

the more separate toolservers we have, the less efficiently the money is spent. sure, every chapter and university could have their own toolserver, but i don't see how that's a better situation than these people contributing to a single toolserver in order to fix the problems that prevent people from using it.

i've lost count of how often i've heard "the toolserver sucks; let's start our own". what i don't understand is why no one says "the toolserver sucks; how can we make it better?".

(there _has_ been some interest from other chapters recently about how to improve the toolserver; however, most chapters don't have a lot of money to spend. a single additional database server for the toolserver would cost at least EUR8'000.)

in the past, we had a lot of problems getting WM-DE to do anything for the toolserver (it seemed everyone there was busy with something else), but that's been better recently, so i think we're making some progress.

- river.
Re: [Wikitech-l] Wikipedia is full
Magnus Manske wrote:
> Database error
> From Wikipedia, the free encyclopedia
> Jump to: navigation, search
> A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was:
>
> (SQL query hidden)
>
> from within function "ExternalStoreDB::store". MySQL returned error "1114: The table 'blobs' is full (10.0.2.161)".

Please report urgent system administration issues to IRC, specifically #wikimedia-tech on irc.freenode.net.

-- Tim Starling
Re: [Wikitech-l] research-oriented toolserver?
Bilal Abdul Kader wrote:
> Greetings,
> We are setting up a research server at Concordia University (Canada) that is dedicated for Wikipedia. We would love to share the resources with anyone interested.
>
> In case anyone needs help setting it up, we would love to help as well.
>
> bilal

There's a project for a biggish research cluster for Wikipedia data awaiting funding at Syracuse University. I forwarded your mail to one of the people involved. Perhaps you can join forces.

> On Mon, Mar 9, 2009 at 8:07 PM, phoebe ayers wrote:
>
>> Hi all,
>> I'm not sure exactly where to raise this, so am asking here.
>>
>> A researcher I have been in touch with has proposed starting a 2nd, research-oriented Wikimedia toolserver. He thinks his lab can pay for the hardware and would be willing to maintain it, if they could get help setting it up. He got this idea after a member of his research group tried (unsuccessfully so far -- no response) to get an account on the current toolserver; their Wikipedia-related research has been put on hold for a few months because of the delay. (It seems like there is a big backlog of account requests right now and only one person working on them?) This research group has done some interesting Wikipedia research to date and I expect they could do more with access to the right data.

I apologize for the delay; perhaps you can send me some details in private, and I'll look at it. DaB doesn't have much time lately, and we had some major changes in infrastructure to take care of, which caused some delays.

>> Personally, I think a dedicated toolserver is a great idea for the research community, but I know very little about the technical issues involved and/or whether this has been proposed before. Please comment, and I can pass on replies and put the researcher in touch with the tech team if it seems like a good idea.

Whether it makes sense to run a separate cluster largely depends on what kind of data you need access to, and in what time frame. If you work mostly on secondary data like link tables, and you need the data in near-real time, use toolserver.org. That's what it's there for, and it's unlikely you can set up anything that could get the same data with low latency.

However, if you work mostly on full text, toolserver.org is not so useful - there's no direct access to full page text there, nor to search indexes. Having a dedicated cluster for research on textual content, perhaps providing content in various pre-processed forms, would be a very good idea. This is what the project I mentioned above aims at, and I'll be happy to support this effort officially, as Wikimedia Germany's tech guy.

-- daniel
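For readers wondering what "working on full text" from a dump involves in practice, here is a minimal Python sketch of streaming a full-history XML dump one revision at a time and running a naive substring search over it. It is only an illustration, not anything the toolserver or the proposed research cluster actually runs. The sample data, `iter_revisions` and `find_revisions` are made up for this example, and the element names (`page`, `title`, `revision`, `id`, `text`) are assumed to follow the usual MediaWiki export layout.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a full-history dump. Element names are assumed to follow
# the MediaWiki export layout; the XML namespace is omitted for brevity.
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Alpha</title>
    <revision><id>1</id><text>first draft about servers</text></revision>
    <revision><id>2</id><text>second draft about clusters</text></revision>
  </page>
  <page>
    <title>Beta</title>
    <revision><id>3</id><text>notes on full-text search clusters</text></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(stream):
    """Yield (title, revision_id, text) one revision at a time."""
    title = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            rev_id, text = None, ""
            # Only direct children are inspected, so a nested contributor id
            # would not be mistaken for the revision id.
            for child in elem:
                child_name = local_name(child.tag)
                if child_name == "id":
                    rev_id = int(child.text)
                elif child_name == "text":
                    text = child.text or ""
            yield title, rev_id, text
            elem.clear()  # drop the revision's children and text to keep memory small


def find_revisions(stream, needle):
    """Naive full-text scan: every (title, rev_id) whose text contains needle."""
    return [(t, r) for t, r, text in iter_revisions(stream) if needle in text]


matches = find_revisions(io.BytesIO(SAMPLE_DUMP.encode("utf-8")), "clusters")
print(matches)
```

A linear scan like this has to read every revision for every query, which is presumably why a dedicated server with pre-built search indexes keeps coming up in this thread.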
Re: [Wikitech-l] research-oriented toolserver?
Robert Rohde wrote:
> On Mon, Mar 9, 2009 at 9:29 PM, Andrew Garrett wrote:
>> On Tue, Mar 10, 2009 at 3:21 PM, K. Peachey wrote:
>>>> Currently all data, including private data, is replicated to the toolserver. We could not do this with a third-party server.
>>> My understanding is that the toolserver(/s) are owned by the german chapter and not by wikimedia directly so why is private data being replicated onto them?
>> Because it was chosen as the best technical solution. Is there a specific problem with private data being on the toolserver? If so, what?
>
> I'd say the added worries about security and access approval are a "problem" partially bundled up with that, even if they can be worked around.
>
> Logistically it would be nice to have a means of providing an exclusively public data replica for purposes such as research, though I can certainly see how that could get technically messy.

As far as I know, there is simply no efficient way to do this currently. MySQL's replication can be told to omit entire tables, but not individual columns or even rows. That would be required though. With the new revision-deletion feature, we have even more trouble.

So, toolserver roots need to be trusted and approved by the foundation. However, account *approval* doesn't require root access. It doesn't require any access, technically. Account *creation* of course does, but that's not much of a problem (except currently, because of infrastructure changes due to new servers, but that will be fixed soon).

To avoid confusion: *two* Daniels can do approval: DaB and me. Neither of us has much time currently - DaB does it every now and then, and I don't do it at all, admittedly - I'm caught up in organizing the dev meeting and hardware orders besides doing my regular development jobs.

I suppose we should streamline the process, yes. This would be a good topic for the developer meeting, maybe.

-- daniel
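To make the column-versus-row point concrete, here is a small Python sketch of the kind of filtering a public-only copy would need: hide one private column, and hide rows flagged as deleted. It uses an in-memory SQLite database purely as a stand-in. The table and column names are hypothetical and not the real MediaWiki schema, and it says nothing about how the toolserver is actually configured.

```python
import sqlite3

# In-memory SQLite stands in for the replicated database. Table and column
# names are hypothetical, chosen only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE edit (id INTEGER PRIMARY KEY, account_id INTEGER,
                       summary TEXT, deleted INTEGER);
    INSERT INTO account VALUES (1, 'Alice', 'alice@example.org'),
                               (2, 'Bob', 'bob@example.org');
    INSERT INTO edit VALUES (10, 1, 'fix typo', 0),
                            (11, 2, 'hidden by an admin', 1),
                            (12, 2, 'add reference', 0);

    -- Column-level filter: expose names but never the email column.
    CREATE VIEW public_account AS SELECT id, name FROM account;
    -- Row-level filter: expose only edits that are not flagged as deleted.
    CREATE VIEW public_edit AS
        SELECT id, account_id, summary FROM edit WHERE deleted = 0;
""")

cursor = conn.execute("SELECT * FROM public_account")
public_columns = [col[0] for col in cursor.description]
public_edits = conn.execute(
    "SELECT id, summary FROM public_edit ORDER BY id"
).fetchall()
print(public_columns)
print(public_edits)
```

Views like these only restrict what a query sees on a server that already holds the raw tables. They do not stop the private columns and deleted rows from being copied there in the first place, which is the limitation described in the message above.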