Re: [Wikitech-l] Google Summer of Code 2009

2009-03-10 Thread Nicolas Dumazet
Yeah!

I was part of the "mixed luck" from last year, and honestly, I get
warm feelings when reading the friendly
http://www.mediawiki.org/wiki/Summer_of_Code_2009/Application_template
;)


I don't know if anyone here is willing to be a mentor this year, but
please go ahead, and try to help a student. The experience is unique
for students, as it really motivates them to get involved in an OSS
project, if not in MediaWiki itself.

I don't know how to put it "nicely", but the key for GSoC to succeed,
on the mentor/senior devs side, is just to be *very* available. Easy
thing to say, I know, but it would be nice to keep this in mind if
we plan to host students this year.
It's not about having "xx minutes available a day for my student",
it's more about being able to set up regular IRC meetings in advance so
that his (her?) questions can be answered in real time: being stuck on
your code when it seems like you won't get your questions answered for
a long time particularly sucks, especially when it seems to you that
the answers are really simple.

And it's not only about mentors, but also about having some
"awareness" from devs that students are going to hang around on IRC,
asking for directions, and also sometimes asking (very) naive
questions: let's try not to bite them! =)


So yes, let's move, let's get involved in GSoC again! This is really
a great project, and I'm really looking forward to seeing new faces
around, bringing in new ideas, as naive as they may sound =)


2009/3/11 Brion Vibber:
> I’ve just put in Wikimedia’s org application for Google Summer of Code
> 2009… Hopefully we’ll get in. :)
>
> http://www.mediawiki.org/wiki/Summer_of_Code_2009
> ^ Add and update cool project ideas as a starting point for student
> applicants!
>
> We’ve had mixed luck in previous years with GSoC, but I think we’ve got
> enough internal bandwidth this year that we can make sure there’s enough
> effort put into interacting with the student candidates ahead of time to
> pick the coolest and most go-get-em self-starter awesome projects and
> then support them through the project term.
>
>
> I’ve also tossed up a student application template if you want to get
> started early. :)
>
> http://www.mediawiki.org/wiki/Summer_of_Code_2009/Application_template
>
> -- brion
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Nicolas Dumazet — NicDumZ [ nɪk.d̪ymz ]

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] how to convert the latin1 SQL dump back into UTF-8?

2009-03-10 Thread jidanni
OK, I found if I use "mysqldump --default-character-set=latin1"
I can read all that can be read in the dump.
The only difference from plain mysqldump is
-/*!40101 SET NAMES utf8 */;
+/*!40101 SET NAMES latin1 */;
But that doesn't seem to affect restores from the SQL file. I'm sold.
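
For the case the subject line asks about, here is a minimal Python sketch of repairing text that has already come out garbled. It assumes the common situation where UTF-8 bytes were read as Latin-1; the function name is made up for illustration, and since MySQL's "latin1" is closer to cp1252 than to strict ISO 8859-1, some characters may need that codec instead.

```python
def repair_double_encoded(text: str) -> str:
    """Undo one round of 'UTF-8 bytes misread as Latin-1' (hypothetical helper)."""
    try:
        # Recover the original bytes, then decode them as the UTF-8 they were.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not double-encoded (or not in the way assumed): leave it untouched.
        return text


# Build a garbled sample the same way the damage usually happens.
garbled = "stöd".encode("utf-8").decode("latin-1")
print(repair_double_encoded(garbled))
```

Text that was never garbled falls through the except branch unchanged, so the helper is safe to run over a mixed file line by line.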



Re: [Wikitech-l] Wikipedia is full

2009-03-10 Thread Chris Down
Yes, but whilst StatusBot and the proposed bot would have comparable edit
statistics, the latter would have more of a reason for running than 'to
update people's statuses'. It's not just about actions, it's about the
justification for those actions.

- Chris

On Wed, Mar 11, 2009 at 2:11 AM, Soxred93 wrote:

> In case you didn't see the whole StatusBot fiasco on enwiki, I used
> to run a bot as a replacement to a replacement of [[User:StatusBot]].
> The bot made 50k edits in a few months, and was soon shut down by
> Brion. A bot that edits the sandbox every few minutes would in no way
> be approved.
>
> On Mar 10, 2009, at 7:54 PM, Thomas Dalton wrote:
>
> > 2009/3/10 K. Peachey:
> >> On Wed, Mar 11, 2009 at 9:21 AM, Robert Rohde wrote:
> >>> Out of curiosity, when a technical problem shuts down all editing on
> >>> a major wiki (as this did) are there any automated alerts?  Is it
> >>> likely to be noticed and addressed even if no one rushes to IRC?
> >>>
> >>> I guess I am curious what is the normal delay between problem onset
> >>> and problem recognition?
> >>>
> >>> -Robert Rohde
> >> I believe with this issue (Full MySQL table) that there is no easy way
> >> to automate the test.
> >> maybe you could automatically query it every so often but even then
> >> that might not return reliable results.
> >
> > A bot that edits the sandbox every few minutes would work, would it?


Re: [Wikitech-l] Wikipedia is full

2009-03-10 Thread Soxred93
What about replag? The bot would puke every time that replication stops.

On Mar 10, 2009, at 8:02 PM, Robert Rohde wrote:

> On Tue, Mar 10, 2009 at 4:43 PM, K. Peachey wrote:
>> On Wed, Mar 11, 2009 at 9:21 AM, Robert Rohde wrote:
>>> Out of curiosity, when a technical problem shuts down all editing on
>>> a major wiki (as this did) are there any automated alerts?  Is it
>>> likely to be noticed and addressed even if no one rushes to IRC?
>>>
>>> I guess I am curious what is the normal delay between problem onset
>>> and problem recognition?
>>>
>>> -Robert Rohde
>> I believe with this issue (Full MySQL table) that there is no easy way
>> to automate the test.
>> maybe you could automatically query it every so often but even then
>> that might not return reliable results.
>
> One could query count(*) from revisions (or some similar artifice,
> such as looking at the recent changes feed) and trigger an alert if it
> stops increasing.
>
> Such things are probably totally unnecessary on enwiki, because there
> is no shortage of people to complain, but I could imagine it might be
> useful to have such an alert for smaller, non-English speaking wikis.
>
> -Robert Rohde


Re: [Wikitech-l] Wikipedia is full

2009-03-10 Thread Soxred93
In case you didn't see the whole StatusBot fiasco on enwiki, I used
to run a bot as a replacement to a replacement of [[User:StatusBot]].
The bot made 50k edits in a few months, and was soon shut down by
Brion. A bot that edits the sandbox every few minutes would in no way
be approved.

On Mar 10, 2009, at 7:54 PM, Thomas Dalton wrote:

> 2009/3/10 K. Peachey:
>> On Wed, Mar 11, 2009 at 9:21 AM, Robert Rohde wrote:
>>> Out of curiosity, when a technical problem shuts down all editing on
>>> a major wiki (as this did) are there any automated alerts?  Is it
>>> likely to be noticed and addressed even if no one rushes to IRC?
>>>
>>> I guess I am curious what is the normal delay between problem onset
>>> and problem recognition?
>>>
>>> -Robert Rohde
>> I believe with this issue (Full MySQL table) that there is no easy way
>> to automate the test.
>> maybe you could automatically query it every so often but even then
>> that might not return reliable results.
>
> A bot that edits the sandbox every few minutes would work, would it?


Re: [Wikitech-l] MediaWiki developer meeting is drawing close

2009-03-10 Thread Brion Vibber
On 3/6/09 5:11 AM, Daniel Kinzler wrote:
> The meet-up[1] is drawing close now: between April 3 and 5 we meet at the
> c-base[2] in Berlin to discuss MediaWiki development, extensions, toolserver
> projects, wiki research, etc. Registration[3] is open until March 20 (required
> even if you already pre-registered).

I've put in a quick reg mail for Wikimedia's staff contingent. :)

We'll have some of the usual suspects -- me, Tim Starling, Mark Bergsma, 
and Michael Dale (of Metavid/Kaltura Ogg Theora video work fame) -- as 
well as some of our newer folks: Tomasz Finc who's been doing a lot of 
behind-the-scenes work with the fundraiser and notice systems and is now 
working on patching up the data dumps and job queue -- and Arash 
Boostani and Trevor Parscal who are dev'ing up the Wikipedia Usability 
Initiative.

-- brion
>
> The schedule[4] is slowly becoming clear now: On Friday, we'll start at noon
> with a who-is-who-and-does-what session and in the evening there will be an
> opportunity to get to know Berlin a bit. On Saturday we have all day for
> presentations and discussions, and in the evening we will have a party together
> with all the folks from the chapter and board meetings. On Sunday there will be
> a wrap-up session and a big lunch for everyone.
>
> We have also organized affordable accommodation: we have reserved rooms in the
> Apartmenthaus am Potsdamer Platz[5]. Staying there is a recommended way of
> getting to know your fellow Wikimedians!
>
> I'm happy that so many of you have shown interest, and I'm sure we'll have a
> great time in Berlin!
>
> Regards,
> Daniel
>
> [1] http://www.mediawiki.org/wiki/Project:Developer_meet-up_2009
> [2] http://en.wikipedia.org/wiki/C-base
> [3] http://www.mediawiki.org/wiki/Project:Developer_meet-up_2009/Registration
> [4] http://www.mediawiki.org/wiki/Project:Developer_meet-up_2009#Outline
> [5]
> http://www.mediawiki.org/wiki/Project:Developer_meet-up_2009#Apartmenthaus_am_Potsdamer_Platz


Re: [Wikitech-l] research-oriented toolserver?

2009-03-10 Thread Brion Vibber
On 3/10/09 5:29 PM, Aryeh Gregor wrote:
> On Tue, Mar 10, 2009 at 7:54 PM, Platonides wrote:
>> Is mediawiki table structure going to change?
>
> Yes, it changes on a regular basis.
>
>> Moreover, any more private method for sharing the tables (eg. a trigger
>> deleting the row when rev_deleted is set) would precisely lose the
>> backup ability the toolserver is performing.
>
> I don't think the toolserver is used for backups.  At least I hope
> it's not, given its reliability (which is quite good, but "quite good"
> is scary for backups).

The existence of the replicas on toolserver is one of our backups. 
Obviously we want to improve our offsite backups to include complete 
offline snapshots as well. It's in progress. :)

-- brion



Re: [Wikitech-l] Wikipedia is full

2009-03-10 Thread Brion Vibber
On 3/10/09 5:02 PM, Robert Rohde wrote:
> Such things are probably totally unnecessary on enwiki, because there
> is no shortage of people to complain, but I could imagine it might be
> useful to have such an alert for smaller, non-English speaking wikis.

*nod*

-- brion



Re: [Wikitech-l] Wikipedia is full

2009-03-10 Thread Brion Vibber
On 3/10/09 4:39 PM, Platonides wrote:
> The procedure is "a lot of people enter #wikimedia-tech complaining
> about it".
> There are automated alerts about servers going down or not having enough
> free space on disk, but not for 'saving an edit failed'. That would be
> tricky to do.

Wouldn't be that tricky, but it's not our highest priority as the human 
alert system does an excellent job of this already. ;)

-- brion



Re: [Wikitech-l] Wikipedia is full

2009-03-10 Thread Aryeh Gregor
On Tue, Mar 10, 2009 at 7:15 PM, K. Peachey wrote:
> Not everyone knows how to use IRC; I would recommend that you
> recommend people report to Bugzilla with urgent tags instead.

Sysadmins do not check Bugzilla constantly.  They do check IRC
constantly if you say their names in the right channel.  Therefore,
IRC is the preferred method of reporting urgent problems, and will
remain so for the foreseeable future.  If some people don't know how
to use it, they can either find someone who does; learn quickly; or be
content with their alert taking a while longer to reach the right
people.



Re: [Wikitech-l] research-oriented toolserver?

2009-03-10 Thread Aryeh Gregor
On Tue, Mar 10, 2009 at 7:54 PM, Platonides wrote:
> Is mediawiki table structure going to change?

Yes, it changes on a regular basis.

> Moreover, any more private method for sharing the tables (eg. a trigger
> deleting the row when rev_deleted is set) would precisely lose the
> backup ability the toolserver is performing.

I don't think the toolserver is used for backups.  At least I hope
it's not, given its reliability (which is quite good, but "quite good"
is scary for backups).



[Wikitech-l] Google Summer of Code 2009

2009-03-10 Thread Brion Vibber
I’ve just put in Wikimedia’s org application for Google Summer of Code 
2009… Hopefully we’ll get in. :)

http://www.mediawiki.org/wiki/Summer_of_Code_2009
^ Add and update cool project ideas as a starting point for student 
applicants!

We’ve had mixed luck in previous years with GSoC, but I think we’ve got 
enough internal bandwidth this year that we can make sure there’s enough 
effort put into interacting with the student candidates ahead of time to 
pick the coolest and most go-get-em self-starter awesome projects and 
then support them through the project term.


I’ve also tossed up a student application template if you want to get 
started early. :)

http://www.mediawiki.org/wiki/Summer_of_Code_2009/Application_template

-- brion



Re: [Wikitech-l] Wikipedia is full

2009-03-10 Thread Robert Rohde
On Tue, Mar 10, 2009 at 4:43 PM, K. Peachey wrote:
> On Wed, Mar 11, 2009 at 9:21 AM, Robert Rohde wrote:
>> Out of curiosity, when a technical problem shuts down all editing on
>> a major wiki (as this did) are there any automated alerts?  Is it
>> likely to be noticed and addressed even if no one rushes to IRC?
>>
>> I guess I am curious what is the normal delay between problem onset
>> and problem recognition?
>>
>> -Robert Rohde
> I believe with this issue (Full MySQL table) that there is no easy way
> to automate the test.
> maybe you could automatically query it every so often but even then
> that might not return reliable results.

One could query count(*) from revisions (or some similar artifice,
such as looking at the recent changes feed) and trigger an alert if it
stops increasing.

Such things are probably totally unnecessary on enwiki, because there
is no shortage of people to complain, but I could imagine it might be
useful to have such an alert for smaller, non-English speaking wikis.
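
As a rough illustration of the "alert if it stops increasing" idea, here is a small Python sketch. It assumes something else collects the samples periodically (for instance the result of a COUNT(*) on the revision table, or the highest rev_id, which may be cheaper on a large table); the function name and the window size are invented for the example.

```python
from typing import Sequence


def edits_stalled(counts: Sequence[int], window: int = 3) -> bool:
    """Return True if the newest `window` samples show no growth at all.

    `counts` holds periodic samples of a revision counter, oldest first.
    """
    if window < 2 or len(counts) < window:
        return False  # not enough data to say anything yet
    recent = counts[-window:]
    # No increase between the oldest and newest sample in the window.
    return recent[-1] <= recent[0]


print(edits_stalled([100, 105, 105, 105]))  # stalled: last three samples are flat
print(edits_stalled([100, 101, 102, 103]))  # healthy: still growing
```

On a quiet wiki the window would need to be long enough that a normal lull between edits does not look like an outage.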

-Robert Rohde



Re: [Wikitech-l] Wikipedia is full

2009-03-10 Thread Thomas Dalton
2009/3/10 K. Peachey:
>> A bot that edits the sandbox every few minutes would work, would it?
> Possibly, but I would bump it up to something like every two hours. Plus,
> since MySQL is spread across multiple systems, you would have to make
> sure it checks the same one all the time.

If you're going to wait 2 hours, you might as well just wait for
people to start complaining, that will be far quicker.



Re: [Wikitech-l] Wikipedia is full

2009-03-10 Thread K. Peachey
> A bot that edits the sandbox every few minutes would work, would it?
Possibly, but I would bump it up to something like every two hours. Plus,
since MySQL is spread across multiple systems, you would have to make
sure it checks the same one all the time.



Re: [Wikitech-l] research-oriented toolserver?

2009-03-10 Thread Platonides
River Tarnell wrote:
> i think the only non-money issue is that the
> Wikimedia Foundation won't allow us to add any more admins until they do some
> internal reorganisation of their databases, which we've been waiting for for
> several months now.

Is the MediaWiki table structure going to change?
The RevisionDelete system is not friendly to partial replication, but
doing things that way is precisely what allows (or will allow) avoiding the
row-copying from revision to archive of the 'old' deletion system.

Moreover, any more private method for sharing the tables (e.g. a trigger
deleting the row when rev_deleted is set) would lose precisely the
backup ability the toolserver is providing.




Re: [Wikitech-l] Wikipedia is full

2009-03-10 Thread Thomas Dalton
2009/3/10 K. Peachey:
> On Wed, Mar 11, 2009 at 9:21 AM, Robert Rohde wrote:
>> Out of curiosity, when a technical problem shuts down all editing on
>> a major wiki (as this did) are there any automated alerts?  Is it
>> likely to be noticed and addressed even if no one rushes to IRC?
>>
>> I guess I am curious what is the normal delay between problem onset
>> and problem recognition?
>>
>> -Robert Rohde
> I believe with this issue (Full MySQL table) that there is no easy way
> to automate the test.
> maybe you could automatically query it every so often but even then
> that might not return reliable results.

A bot that edits the sandbox every few minutes would work, would it?
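
The sandbox-bot idea can be sketched as a canary check. Everything here is hypothetical: `try_edit` stands in for whatever code attempts a save and reports success, and no real MediaWiki API call is shown. Requiring several failures in a row before alerting is one way to avoid raising the alarm over a single hiccup.

```python
from typing import Callable


def canary_alert(try_edit: Callable[[], bool], attempts: int = 3) -> bool:
    """Return True (alert) only if every one of `attempts` test saves fails.

    `try_edit` is a placeholder callable: it should attempt one test edit
    and return True on success, False on failure.
    """
    for _ in range(attempts):
        if try_edit():
            return False  # one successful save is enough to stay quiet
    return True


# Simulated outcomes instead of real edits.
print(canary_alert(lambda: False))  # every save failed, so alert
print(canary_alert(lambda: True))   # saves work, so no alert
```

A real bot would also want a pause between attempts and a low enough edit rate to be acceptable to the wiki's bot policy.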


Re: [Wikitech-l] Wikipedia is full

2009-03-10 Thread K. Peachey
On Wed, Mar 11, 2009 at 9:21 AM, Robert Rohde wrote:
> Out of curiosity, when a technical problem shuts down all editing on
> a major wiki (as this did) are there any automated alerts?  Is it
> likely to be noticed and addressed even if no one rushes to IRC?
>
> I guess I am curious what is the normal delay between problem onset
> and problem recognition?
>
> -Robert Rohde
I believe with this issue (Full MySQL table) that there is no easy way
to automate the test.
maybe you could automatically query it every so often but even then
that might not return reliable results.


Re: [Wikitech-l] Wikipedia is full

2009-03-10 Thread Platonides
Robert Rohde wrote:
> Out of curiosity, when a technical problem shuts down all editing on
> a major wiki (as this did) are there any automated alerts?  Is it
> likely to be noticed and addressed even if no one rushes to IRC?
> 
> I guess I am curious what is the normal delay between problem onset
> and problem recognition?
> 
> -Robert Rohde

The procedure is "a lot of people enter #wikimedia-tech complaining
about it".
There are automated alerts about servers going down or not having enough
free space on disk, but not for 'saving an edit failed'. That would be
tricky to do.




Re: [Wikitech-l] Wikipedia is full

2009-03-10 Thread Robert Rohde
Out of curiosity, when a technical problem shuts down all editing on
a major wiki (as this did) are there any automated alerts?  Is it
likely to be noticed and addressed even if no one rushes to IRC?

I guess I am curious what is the normal delay between problem onset
and problem recognition?

-Robert Rohde



Re: [Wikitech-l] Wikipedia is full

2009-03-10 Thread K. Peachey
> Please report urgent system administration issues to IRC, specifically
> #wikimedia-tech on irc.freenode.net.
>
> -- Tim Starling
Not everyone knows how to use IRC; I would recommend that you
recommend people report to Bugzilla with urgent tags instead.



Re: [Wikitech-l] MediaWiki developer meeting is drawing close

2009-03-10 Thread Thomas Dalton
2009/3/10 Daniel Kinzler:
> Roan Kattouw wrote:
>> Daniel Kinzler wrote:
>>> The schedule[4] is slowly becoming clear now: On Friday, we'll start at noon
>>> with a who-is-who-and-does-what session
>> The schedule you're linking to says it starts at 3 PM. Which time is the
>> right one?
>
> Bah, naming times of day in English is awkward :) what do you call 1pm,
> "afternoon"? Anyway...

1pm is afternoon, but "at noon" is not "at 3PM"!

> So... doors open at noon, schedule starts at 3pm. Satisfied?

That makes sense!



Re: [Wikitech-l] MediaWiki developer meeting is drawing close

2009-03-10 Thread Lars Aronsson
Daniel Kinzler wrote:

> So, come early, hang out a bit, the c-base is a cosy place :)
> So... doors open at noon, schedule starts at 3pm. Satisfied?


Or if you arrive in Berlin already on Thursday April 2, join me 
for lunch at the Museum of Technology, and we can spend that 
afternoon admiring the replicas of Konrad Zuse's early computers.

I added this as an informal item to the program at
http://www.mediawiki.org/wiki/Project:Developer_meet-up_2009
where you can also find links for more reading and maps.


-- 
  Lars Aronsson (l...@aronsson.se)
  Aronsson Datateknik - http://aronsson.se

  Wikimedia Sverige - stöd fri kunskap - http://wikimedia.se/


Re: [Wikitech-l] research-oriented toolserver?

2009-03-10 Thread Robert Rohde
On Tue, Mar 10, 2009 at 2:18 PM, Daniel Kinzler wrote:
> Robert Rohde wrote:
>> The converse of this is that some recognized experts would probably
>> prefer to administer their own server/cluster rather than relying on
>> some random guy with Wikimedia DE (or wherever) to get things done.
>
> An academic institution may also get a serious research grant for this - that
> would be more complicated if the money were handled via the German chapter.
> Though it's something we are, of course, also interested in.
>
> Basically, if we could all work on making the toolserver THE ONE PLACE for
> working with wikipedia's data, that would be perfect. If, for some reason, it
> makes sense to build a separate cluster, I propose to give it a distinct purpose
> and profile: let it provide facilities for fulltext research, with low priority
> for the update latency, and high priority of having fulltext in various forms,
> with search indexes, word lists, and all the fun.

Personally I would favor a physically distinct cluster (regardless of
who administers it) more or less with the focus you describe.  In
particular, I think it is useful to separate "tools" from "analysis".
A "tool" aims to provide useful information in near realtime based on
specific and focused parameters.  By contrast, "analysis" often
involves running some process systematically through a very large
portion of the data with the expectation that it will take a while
(for example, I've used dumps to perform large statistical analyses
where the processing code might take 24 hours when run against the
full edit history of a large wiki.)  "Tools" need high availability
and low lag relative to the live site, but "analysis" doesn't care if
it gets out of date and should use scheduling etc. to balance large
loads.

-Robert Rohde



Re: [Wikitech-l] research-oriented toolserver?

2009-03-10 Thread Daniel Kinzler
Robert Rohde wrote:
> On Tue, Mar 10, 2009 at 1:27 PM, River Tarnell wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> phoebe ayers:
>>> River: Well, you say that part of the issue with the toolserver is money and
>>> time... and this person that I've been talking to is offering to throw money
>>> and time at the problem. So, what can they constructively do?
>> i think this is being discussed privately now...
> 
> If other research groups are interested in contributing to this, who
> should they be talking to?

Wikimedia Germany. That is, I guess, me. Send mail to daniel dot kinzler at
wikimedia dot de. I'll forward it as appropriate.

>> i don't see why access to the toolserver would be restricted to Wikipedia
>> editors.  in fact, i'd be happier giving access to a recognised academic expert
>> than some random guy on Wikipedia.
> 
> The converse of this is that some recognized experts would probably
> prefer to administer their own server/cluster rather than relying on
> some random guy with Wikimedia DE (or wherever) to get things done.

An academic institution may also get a serious research grant for this - that
would be more complicated if the money were handled via the German chapter.
Though it's something we are, of course, also interested in.

Basically, if we could all work on making the toolserver THE ONE PLACE for
working with wikipedia's data, that would be perfect. If, for some reason, it
makes sense to build a separate cluster, I propose to give it a distinct purpose
and profile: let it provide facilities for fulltext research, with low priority
for the update latency, and high priority of having fulltext in various forms,
with search indexes, word lists, and all the fun.

Regards,
Daniel




Re: [Wikitech-l] research-oriented toolserver?

2009-03-10 Thread Andrea Forte
Let me know if you have a grant proposal you'd like help with!

Andrea

On Tue, Mar 10, 2009 at 4:30 PM, River Tarnell wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Andrea Forte:
>> To give you an idea, for me personally, the incentive for a new resource is a
>> need for a server (perhaps a cluster) to support full-text queries at a
>> reasonable speed.
>
> then why not help us do this on the existing toolserver, so everyone can have
> access to it, instead of duplicating it yet again somewhere else?
>
> there are many toolserver users who would like direct access to text, and the
> ability to search it.
>
>        - river.
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (HP-UX)
>
> iEYEARECAAYFAkm2zgIACgkQIXd7fCuc5vLrvgCgkWY9BizcJCSunzrk+dPdrcJO
> U4wAn0kIpQd7NYVBHfKNwR+dTM2rTon6
> =rSHL
> -----END PGP SIGNATURE-----


Re: [Wikitech-l] research-oriented toolserver?

2009-03-10 Thread Robert Rohde
On Tue, Mar 10, 2009 at 1:27 PM, River Tarnell wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> phoebe ayers:
>> River: Well, you say that part of the issue with the toolserver is money and
>> time... and this person that I've been talking to is offering to throw money
>> and time at the problem. So, what can they constructively do?
>
> i think this is being discussed privately now...

If other research groups are interested in contributing to this, who
should they be talking to?



> i don't see why access to the toolserver would be restricted to Wikipedia
> editors.  in fact, i'd be happier giving access to a recognised academic expert
> than some random guy on Wikipedia.

The converse of this is that some recognized experts would probably
prefer to administer their own server/cluster rather than relying on
some random guy with Wikimedia DE (or wherever) to get things done.

-Robert Rohde



Re: [Wikitech-l] MediaWiki developer meeting is drawing close

2009-03-10 Thread Daniel Kinzler
Roan Kattouw wrote:
> Daniel Kinzler wrote:
>> The schedule[4] is slowly becoming clear now: On Friday, we'll start at noon
>> with a who-is-who-and-does-what session
> The schedule you're linking to says it starts at 3 PM. Which time is the 
> right one?

Bah, naming times of day in English is awkward :) what do you call 1pm,
"afternoon"? Anyway...

The schedule on the wiki is the definitive one - or rather, as definite as it
gets. It may however still change. So, come early, hang out a bit, the c-base is
a cosy place :)

So... doors open at noon, schedule starts at 3pm. Satisfied?

-- daniel




Re: [Wikitech-l] research-oriented toolserver?

2009-03-10 Thread River Tarnell
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrea Forte:
> To give you an idea, for me personally, the incentive for a new resource is a
> need for a server (perhaps a cluster) to support full-text queries at a
> reasonable speed. 

then why not help us do this on the existing toolserver, so everyone can have
access to it, instead of duplicating it yet again somewhere else?

there are many toolserver users who would like direct access to text, and the
ability to search it.

- river.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (HP-UX)

iEYEARECAAYFAkm2zgIACgkQIXd7fCuc5vLrvgCgkWY9BizcJCSunzrk+dPdrcJO
U4wAn0kIpQd7NYVBHfKNwR+dTM2rTon6
=rSHL
-----END PGP SIGNATURE-----



Re: [Wikitech-l] research-oriented toolserver?

2009-03-10 Thread Andrea Forte
I've been trying to do some work mining the full en dump with revision
history and was involved in getting together the Syracuse grant
proposal. To give you an idea, for me personally, the incentive for a
new resource is a need for a server (perhaps a cluster) to support
full-text queries at a reasonable speed. People at various research
institutions duplicate this effort over and over.

Andrea



On Tue, Mar 10, 2009 at 2:26 PM, phoebe ayers wrote:
> Thanks for the responses, all.
>
> Daniel and Bilal: the notes about the possible servers at Syracuse and
> Concordia are very interesting; it sounds like the researchers
> interested in such things should team up.
>
> Daniel: I am not sure what type of data is needed -- this is not my
> project (I'm only the messenger!) but I'll pass along your message and
> send you private details (and encourage the researcher to reply
> himself).
>
> River: Well, you say that part of the issue with the toolserver is
> money and time... and this person that I've been talking to is
> offering to throw money and time at the problem. So, what can they
> constructively do?
>
> All: Like I said, I am unclear on the technical issues involved, but
> as for why a separate "research toolserver" might be useful... :
> I see a difference in the type of information a researcher might want
> to pull (public data, large sets of related page information,
> full-text mining, ??) and the types of tools that the current
> toolserver mainly supports (editcount tools, catscan, etc). I also see
> a difference in how the two groups might be authenticated -- there's a
> difference between being a trusted Wikipedian or trusted Wikimedia
> developer and being a trusted technically-competent researcher (for
> instance, I recognized the affiliation of the person who was trying to
> apply, because I've read their research papers; but if you were going
> on wikimedia status alone, they don't have any).
>
> -- Phoebe
>
> --
> * I use this address for lists; send personal messages to phoebe.ayers
>  gmail.com *



Re: [Wikitech-l] research-oriented toolserver?

2009-03-10 Thread River Tarnell
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

phoebe ayers:
> River: Well, you say that part of the issue with the toolserver is money and
> time... and this person that I've been talking to is offering to throw money
> and time at the problem. So, what can they constructively do?
 
i think this is being discussed privately now...

> I see a difference in the type of information a researcher might want to pull
> (public data, large sets of related page information, full-text mining, ??)
> and the types of tools that the current toolserver mainly supports (editcount
> tools, catscan, etc).

so, what is missing from the current toolserver that prevents researchers from
working with large data sets?

> I also see a difference in how the two groups might be authenticated --
> there's a difference between being a trusted Wikipedian or trusted Wikimedia
> developer and being a trusted technically-competent researcher 

i don't see why access to the toolserver would be restricted to Wikipedia
editors.  in fact, i'd be happier giving access to a recognised academic expert
than some random guy on Wikipedia.

- river.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (HP-UX)

iEYEARECAAYFAkm2zSQACgkQIXd7fCuc5vKYSACdF2IJwcfhWEarjgDC8FmMSls1
NN0An2jLSu3/mhLCEAsLuoZz0x3DE8mP
=ZHMA
-END PGP SIGNATURE-



Re: [Wikitech-l] research-oriented toolserver?

2009-03-10 Thread phoebe ayers
Thanks for the responses, all.

Daniel and Bilal: the notes about the possible servers at Syracuse and
Concordia are very interesting; it sounds like the researchers
interested in such things should team up.

Daniel: I am not sure what type of data is needed -- this is not my
project (I'm only the messenger!) but I'll pass along your message and
send you private details (and encourage the researcher to reply
himself).

River: Well, you say that part of the issue with the toolserver is
money and time... and this person that I've been talking to is
offering to throw money and time at the problem. So, what can they
constructively do?

All: Like I said, I am unclear on the technical issues involved, but
as for why a separate "research toolserver" might be useful... :
I see a difference in the type of information a researcher might want
to pull (public data, large sets of related page information,
full-text mining, ??) and the types of tools that the current
toolserver mainly supports (editcount tools, catscan, etc). I also see
a difference in how the two groups might be authenticated -- there's a
difference between being a trusted Wikipedian or trusted Wikimedia
developer and being a trusted technically-competent researcher (for
instance, I recognized the affiliation of the person who was trying to
apply, because I've read their research papers; but if you were going
on wikimedia status alone, they don't have any).

-- Phoebe

-- 
* I use this address for lists; send personal messages to phoebe.ayers
 gmail.com *



Re: [Wikitech-l] research-oriented toolserver?

2009-03-10 Thread River Tarnell
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Aryeh Gregor:
> Oh.  Why does a single specific person have to handle the approval of
> all toolserver account requests, then?

because accounts have to be approved by WM-DE, and WM-DE has designated this
person to approve accounts on their behalf.

- river.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (HP-UX)

iEYEARECAAYFAkm2deAACgkQIXd7fCuc5vJBLQCeINPPjEA50FjFlphN70J9gnAx
7dkAoJ1WXk0hWFOLj1ZZNbwNG0fBDVok
=+dbS
-END PGP SIGNATURE-



Re: [Wikitech-l] research-oriented toolserver?

2009-03-10 Thread River Tarnell
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

phoebe ayers:
> Personally, I think a dedicated toolserver is a great idea for the research
> community, but I know very little about the technical issues involved and/or
> whether this has been proposed before. Please comment, and I can pass on
> replies and put the researcher in touch with the tech team if it seems like a
> good idea.

i don't understand what "research-oriented" toolserver means.  what will the
research-toolserver provide that the current toolserver doesn't provide?

is the only issue the time it takes for accounts to be created?  this is a
WM-DE issue; the more people who complain to WM-DE about this, the more likely
it is to be resolved.  (so far, i've had zero communications from WM-DE about
how the only people able to approve accounts are so busy with other things
nowadays.  on the other hand, i didn't ask them about it either; i suppose they
don't bother monitoring the toolserver most of the time.)

we recently conducted a survey of toolserver users, and account approval (not
creation) was generally felt to be quite slow.  once i produce a report from
the results of that survey, we might be able to get WM-DE to do something about
it.

most of the issues with the current toolserver come down to money.  we don't
have enough money to afford redundant databases, so any failure is a major
problem and creates inconvenience for users.  we don't have enough money for a
paid admin, so it often takes a long time for things to get done.  we don't
have enough money to upgrade hardware when we need it, so things are often slow
until the money is available.  i think the only non-money issue is that the
Wikimedia Foundation won't allow us to add any more admins until they do some
internal reorganisation of their databases, which we've been waiting for for
several months now.  

the more separate toolservers we have, the less efficiently the money is spent.
sure, every chapter and university could have their own toolserver, but i don't
see how that's a better situation than these people contributing to a single
toolserver in order to fix the problems that prevent people from using it.
i've lost count of how often i've heard "the toolserver sucks; let's start our
own".  what i don't understand is why no one says "the toolserver sucks; how
can we make it better?".  (there _has_ been some interest from other chapters
recently about how to improve the toolserver; however, most chapters don't have
a lot of money to spend.  a single additional database server for the
toolserver would cost at least EUR8'000.)

in the past, we had a lot of problems getting WM-DE to do anything for the
toolserver (it seemed everyone there was busy with something else), but that's
been better recently, so i think we're making some progress.

- river.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (HP-UX)

iEYEARECAAYFAkm2dV4ACgkQIXd7fCuc5vLkOwCgv9zShn4f8BVLHe5w8pYJuatU
z8gAoLQOtJjveh1pzd1kPDiz7RWTN1zL
=9qOq
-END PGP SIGNATURE-



Re: [Wikitech-l] Wikipedia is full

2009-03-10 Thread Tim Starling
Magnus Manske wrote:
> Database error
> From Wikipedia, the free encyclopedia
> Jump to: navigation, search
> A database query syntax error has occurred. This may indicate a bug in
> the software. The last attempted database query was:
> 
> (SQL query hidden)
> 
> from within function "ExternalStoreDB::store". MySQL returned error
> "1114: The table 'blobs' is full (10.0.2.161)".

Please report urgent system administration issues to IRC, specifically
#wikimedia-tech on irc.freenode.net.

-- Tim Starling




Re: [Wikitech-l] research-oriented toolserver?

2009-03-10 Thread Daniel Kinzler
Bilal Abdul Kader schrieb:
> Greetings,
> We are setting up a research server at Concordia University (Canada) that is
> dedicated for Wikipedia. We would love to share the resources with anyone
> interested.
> 
> In case anyone needs help setting it up, we would love to help as well.
> 
> bilal

There's a project for a biggish research cluster for Wikipedia data awaiting
funding at Syracuse University. I forwarded your mail to one of the people
involved. Perhaps you can join forces.

> 
> On Mon, Mar 9, 2009 at 8:07 PM, phoebe ayers  wrote:
> 
>> Hi all,
>> I'm not sure exactly where to raise this, so am asking here.
>>
>> A researcher I have been in touch with has proposed starting a 2nd,
>> research-oriented Wikimedia toolserver. He thinks his lab can pay for
>> the hardware and would be willing to maintain it, if they could get
>> help setting it up. He got this idea after a member of his research
>> group tried (unsuccessfully so far -- no response) to get an account
>> on the current toolserver; their Wikipedia-related research has been
>> put on hold for a few months because of the delay. (It seems like
>> there is a big backlog of account requests right now and only one
>> person working on them?)  This research group has done some
>> interesting Wikipedia research to date and I expect they could do more
>> with access to the right data.

I apologize for the delay. Perhaps you can send me some details in private, and
I'll look at it. DaB doesn't have much time lately, and we had some major
changes in infrastructure to take care of, which caused some delays.

>> Personally, I think a dedicated toolserver is a great idea for the
>> research community, but I know very little about the technical issues
>> involved and/or whether this has been proposed before. Please comment,
>> and I can pass on replies and put the researcher in touch with the
>> tech team if it seems like a good idea.

Whether it makes sense to run a separate cluster largely depends on what kind of data
you need access to, and in what time frame. If you work mostly on secondary
data like link tables, and you need the data in near-real time, use
toolserver.org. That's what it's there for, and it's unlikely you can set up
anything that could get the same data with low latency.

However, if you work mostly on full text, toolserver.org is not so useful:
there's no direct access to full page text there, nor to search indexes.
Having a dedicated cluster for research on textual content, perhaps
providing content in various pre-processed forms, would be a very good idea.
This is what the project I mentioned above aims at, and I'll be happy to support
this effort officially, as Wikimedia Germany's tech guy.
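
For a rough idea of what working on full text without a search index involves, here is a minimal Python sketch that streams an XML dump and reports pages whose text contains a given string. The sample document, the export namespace version, and the helper names (`local`, `pages_matching`) are assumptions made for illustration; this is not how toolserver.org or the proposed cluster works.

```python
import io
import xml.etree.ElementTree as ET

def local(tag):
    """Strip the XML namespace, e.g. '{ns}title' -> 'title'."""
    return tag.rsplit("}", 1)[-1]

def pages_matching(stream, needle):
    """Yield the page title for every revision whose text contains `needle`.

    The dump is read incrementally, so memory use stays small, but every
    query still has to scan the whole file from start to end.
    """
    title = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            if needle in (elem.text or ""):
                yield title
        elif name == "page":
            elem.clear()  # free the finished page before moving on

# Tiny stand-in for a real dump (assumed structure, two pages only).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page><title>Alpha</title><revision><text>full-text search</text></revision></page>
  <page><title>Beta</title><revision><text>link tables</text></revision></page>
</mediawiki>"""

matches = list(pages_matching(io.BytesIO(SAMPLE), "search"))
print(matches)
```

Each query costs a full pass over the dump, which is the part a shared, pre-indexed text cluster would avoid.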


-- daniel



Re: [Wikitech-l] research-oriented toolserver?

2009-03-10 Thread Daniel Kinzler
Robert Rohde schrieb:
> On Mon, Mar 9, 2009 at 9:29 PM, Andrew Garrett  wrote:
>> On Tue, Mar 10, 2009 at 3:21 PM, K. Peachey  wrote:
>>>> Currently all data, including private data, is replicated to the
>>>> toolserver. We could not do this with a third-party server.
>>> My understanding is that the the toolserver(/s) are owned by the
>>> german chapter and not by wikimedia directly so why is private data
>>> being replicated onto them?
>> Because it was chosen as the best technical solution. Is there a
>> specific problem with private data being on the toolserver? If so,
>> what?
> 
> I'd say the added worries about security and access approval are a
> "problem" partially bundled up with that, even if they can be worked
> around.
> 
> Logistically it would be nice to have a means of providing an
> exclusively public data replica for purposes such as research, though
> I can certainly see how that could get technically messy.

As far as I know, there is simply no efficient way to do this currently. MySQL's
replication can be told to omit entire tables, but not individual columns or
even rows. That would be required though. With the new revision-deletion
feature, we have even more trouble.
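
To illustrate the kind of filtering that table-level replication rules cannot express, here is a minimal Python sketch of a sanitizing pass that drops private columns and skips hidden rows. It uses an in-memory SQLite database purely as a stand-in; the table names (`wiki_user`, `wiki_revision`), the column names, and the `public_rows` helper are hypothetical and do not describe the actual toolserver setup.

```python
import sqlite3

# Hypothetical list of columns that must never reach a public replica.
PRIVATE_COLUMNS = {"wiki_user": {"user_email", "user_password"}}

def public_rows(conn, table, columns, hidden_flag=None):
    """Yield rows of `table` as dicts, leaving out private columns and,
    if `hidden_flag` is given, rows where that flag is non-zero."""
    keep = [c for c in columns if c not in PRIVATE_COLUMNS.get(table, set())]
    sql = "SELECT {} FROM {}".format(", ".join(keep), table)
    if hidden_flag:
        sql += " WHERE {} = 0".format(hidden_flag)
    for row in conn.execute(sql):
        yield dict(zip(keep, row))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wiki_user "
    "(user_id INTEGER, user_name TEXT, user_email TEXT, user_password TEXT)"
)
conn.execute(
    "CREATE TABLE wiki_revision "
    "(rev_id INTEGER, rev_comment TEXT, rev_deleted INTEGER)"
)
conn.execute("INSERT INTO wiki_user VALUES (1, 'Example', 'e@example.org', 'hash')")
conn.executemany(
    "INSERT INTO wiki_revision VALUES (?, ?, ?)",
    [(10, "ok", 0), (11, "suppressed", 1)],
)

# Column-level filtering: private fields are dropped from every row.
users = list(public_rows(
    conn, "wiki_user", ["user_id", "user_name", "user_email", "user_password"]
))
# Row-level filtering: revisions flagged as deleted are skipped.
revisions = list(public_rows(
    conn, "wiki_revision", ["rev_id", "rev_comment", "rev_deleted"],
    hidden_flag="rev_deleted",
))
print(users, revisions)
```

A pass like this would have to run on every replicated change, which is why it is not as simple as telling replication to skip a table.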

So, toolserver roots need to be trusted and approved by the foundation. However,
account *approval* doesn't require root access. It doesn't require any access,
technically. Account *creation* of course does, but that's not much of a
problem (except currently, because of infrastructure changes due to new servers,
but that will be fixed soon).

To avoid confusion: *two* Daniels can do approval: DaB and me. Neither of us
has much time currently - DaB does it every now and then, and I don't do it at
all, admittedly - I'm caught up in organizing the dev meeting and hardware
orders besides doing my regular development jobs. I suppose we should streamline
the process, yes. This would be a good topic for the developer meeting, maybe.


-- daniel
