On Tue, Mar 10, 2009 at 2:18 PM, Daniel Kinzler <dan...@brightbyte.de> wrote:
> Robert Rohde wrote:
>> The converse of this is that some recognized experts would probably
>> prefer to administer their own server/cluster rather than relying on
>> some random guy with Wikimedia DE (or wherever) to get things done.
>
> An academic institution may also get a serious research grant for this - that
> would be more complicated if the money were handled via the German chapter.
> Though it's something we are, of course, also interested in.
>
> Basically, if we could all work on making the toolserver THE ONE PLACE for
> working with Wikipedia's data, that would be perfect. If, for some reason, it
> makes sense to build a separate cluster, I propose to give it a distinct
> purpose and profile: let it provide facilities for fulltext research, with low
> priority for update latency and high priority for having fulltext in various
> forms, with search indexes, word lists, and all the fun.

Personally I would favor a physically distinct cluster (regardless of
who administers it) more or less with the focus you describe.  In
particular, I think it is useful to separate "tools" from "analysis".
A "tool" aims to provide useful information in near realtime based on
specific and focused parameters.  By contrast, "analysis" often
involves running some process systematically through a very large
portion of the data with the expectation that it will take a while
(for example, I've used dumps to perform large statistical analyses
where the processing code might take 24 hours when run against the
full edit history of a large wiki).  "Tools" need high availability
and low lag relative to the live site, but "analysis" doesn't care if
it gets out of date and should use scheduling etc. to balance large
loads.

-Robert Rohde

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
