Re: [Wikitech-l] research-oriented toolserver?
Morten Warncke-Wang wrote:

Hi all, Judging by the replies we think we've failed to communicate clearly some of the ideas we wanted to put forward, and we'd like to take the opportunity to try to clear that up. We did not want to narrow this down to be only about a third party toolserver. Before we initiated contact we noticed the need for adding more resources to the existing cluster. Therefore we also had in mind the idea of augmenting the toolserver, rather than attempt to create a competitor for it. For instance this could help allow the toolserver to also host applications requiring some amounts of text crunching, which is currently not feasible as far as we can tell.

That would be excellent.

Additionally we think there could perhaps be two paths to account creation, one for Wikipedians and one for researchers, with the research path laid out with clearer documentation on the requirements projects would need to fit the toolserver and what the application should contain, which combined with faster feedback would aid to make the process easier for the researchers.

I think this should be done for all accounts. Why only researchers?

We hope that this clears up some central points in our ideas surrounding a research oriented toolserver. Currently we are exploring several ideas and this particular one might not become more than a thought and a thread on a mailing list. Nonetheless perhaps there are thoughts here that can become more solid somewhere down the line.

In order to develop ideas, it would be useful to get some idea of what kind of resources you think you can contribute, and under what terms and in what timeframe. I know that talking money in public is usually a bad idea, especially if the money isn't really there yet. If you like, contact me in private, preferably under my office address, daniel.kinzler AT wikimedia.de. I'm responsible for toolserver operations, so I suppose it's my job to look into this.
-- daniel ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] research-oriented toolserver?
Brian wrote:

I think what the toolserver guys are saying is that they've got the data (e.g., a replica of the master database) and they are willing to expand operations to include larger-scale computations, and so yes they are willing to become more research oriented. They just need the extra hardware of course. I think it's difficult to estimate how much but here are some applications that I would like to make or see made sooner or later:

* WikiBlame - A Lucene index of the history of all projects that can instantly find the authors of a pasted snippet. I'm not clear on the memory requirements of hosting an app like this after the index is created, but the index will be terabyte-size at 35% of the text dump.

Note that WikiTrust can do this too, and will probably go into testing soon. For now, the database for WikiTrust will be off-site, but if it goes live on Wikipedia, the hardware would be run at the main WMF cluster, and not on the toolserver.

* WikiBlame for images - an image similarity algorithm over all images in all projects that can find all places a given image is being used. I believe there is a one-time major CPU cost when first analyzing the images and then a much lesser realtime comparison cost. Again, the memory requirements of hosting such an app are unclear.

That would be very nice to have...

* A vandalism classifier bot that uses the entire history of a wiki in order to predict whether the current edit is vandalism. Basically, a major extension of existing published work on automatically detecting vandalism, which only used several hundred edits. This would require major CPU resources for training but very little cost for real-time classification.

Pretty big for a toolserver project. But an excellent research topic!

* Dumps, including extended dump formats such as a natural language parse of the full text of the recent version of a wiki made readily available for researchers.
Finally, there are many worthwhile projects that have been presented at past Wikimanias or published in the literature that deserve to be kept up to date as the encyclopedia continues to grow. Permanent hosting for such projects would be a worthwhile goal, as would reaching out to these researchers. If the foundation can afford such an endeavor, the hardware cost is actually not that great. Perhaps datacenter fees are.

Please don't forget that the toolserver is NOT run by the Wikimedia Foundation. It's run by Wikimedia Germany, which has maybe a tenth of the foundation's budget. If the foundation is interested in supporting us further, that's great, we just need to keep responsibilities clear: is the foundation running a project, or is the foundation helping us (Wikimedia Germany) to run a project?...

-- daniel
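
Editor's note: the WikiBlame idea above (find who first wrote a pasted snippet) can be illustrated with a toy in-memory index. This is only a sketch under stated assumptions: the thread proposes a Lucene index over the full history, whereas the code below swaps in a plain Python dictionary of word n-grams, and the names `SnippetIndex`, `add_revision` and `blame` are hypothetical, not part of any existing tool.

```python
def shingles(text, n=5):
    """Yield every run of n consecutive words in text."""
    words = text.split()
    for i in range(len(words) - n + 1):
        yield tuple(words[i:i + n])


class SnippetIndex:
    """Toy stand-in for a full-history index (hypothetical, in-memory only)."""

    def __init__(self, n=5):
        self.n = n
        # Maps each word n-gram to the author of the earliest revision
        # in which that n-gram was seen.
        self.first_author = {}

    def add_revision(self, author, text):
        # Assumption: revisions are added oldest first, so setdefault
        # keeps the earliest author for each n-gram.
        for sh in shingles(text, self.n):
            self.first_author.setdefault(sh, author)

    def blame(self, snippet):
        """Return the authors who first introduced any n-gram of the snippet."""
        return {
            self.first_author[sh]
            for sh in shingles(snippet, self.n)
            if sh in self.first_author
        }
```

A snippet shorter than `n` words produces no n-grams and therefore no result; a real index would also need to handle punctuation, markup and the terabyte scale mentioned above.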
Re: [Wikitech-l] research-oriented toolserver?
How will WikiTrust accomplish the WikiBlame function? I think I know what WikiTrust is: http://trust.cse.ucsc.edu/ What gives it the function that you can enter a piece of wiki code from the history of any wiki - totally out of context - and it returns the authors?

On Sat, Mar 14, 2009 at 2:02 AM, Daniel Kinzler dan...@brightbyte.de wrote:

Brian wrote:

* WikiBlame - A Lucene index of the history of all projects that can instantly find the authors of a pasted snippet. I'm not clear on the memory requirements of hosting an app like this after the index is created, but the index will be terabyte-size at 35% of the text dump.

Note that WikiTrust can do this too, and will probably go into testing soon. For now, the database for WikiTrust will be off-site, but if it goes live on Wikipedia, the hardware would be run at the main WMF cluster, and not on the toolserver.
Re: [Wikitech-l] research-oriented toolserver?
Hi all, Judging by the replies we think we've failed to communicate clearly some of the ideas we wanted to put forward, and we'd like to take the opportunity to try to clear that up. We did not want to narrow this down to be only about a third party toolserver. Before we initiated contact we noticed the need for adding more resources to the existing cluster. Therefore we also had in mind the idea of augmenting the toolserver, rather than attempt to create a competitor for it. For instance this could help allow the toolserver to also host applications requiring some amounts of text crunching, which is currently not feasible as far as we can tell. Additionally we think there could perhaps be two paths to account creation, one for Wikipedians and one for researchers, with the research path laid out with clearer documentation on the requirements projects would need to fit the toolserver and what the application should contain, which combined with faster feedback would aid to make the process easier for the researchers. We hope that this clears up some central points in our ideas surrounding a research oriented toolserver. Currently we are exploring several ideas and this particular one might not become more than a thought and a thread on a mailing list. Nonetheless perhaps there are thoughts here that can become more solid somewhere down the line. Morten Warncke-Wang, Research Assistant John Riedl, Professor GroupLens Research www.grouplens.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] research-oriented toolserver?
Aryeh Gregor:

If I understand correctly, the only change being contemplated here is not replicating the databases that are entirely secret (databases of private wikis).

this is correct.

I might be misunderstanding, though. If only entire databases need to be hidden, why can't the toolserver just be set up not to replicate those, given that MySQL supports that?

because it would require a proxy server under WMF control that filtered out the evil tables and provided a clean replicated feed to the toolserver, which is a lot more effort (and more fragile) than just moving the bad data.

- river.
Re: [Wikitech-l] research-oriented toolserver?
Morten Warncke-Wang:

What do you think? Seem like a useful idea if we can find sufficient resources, and put together a management plan?

no, like Daniel said, this is a waste of time and effort. i originally assumed that a research toolserver would be different in some technical sense, which might make at least some sense (although i've argued against that elsewhere in this thread).

however, i completely fail to understand your reasoning here. is there some backstory i'm missing? did you apply for a Toolserver account and were rejected because you aren't a Wikipedia editor? does the WM-DE have a history of doing this? (i'm certainly not aware of it, if so...)

if you want to improve the account approval process at the Toolserver, doesn't it make more sense to do that, rather than creating a completely new project to fix one small issue?

- river.
Re: [Wikitech-l] research-oriented toolserver?
Brion Vibber:

Could be done. We're also fine with new toolserver roots as long as we approve em too for now.

it would have been nice if the Toolserver was aware of this ;-)

- river.
Re: [Wikitech-l] research-oriented toolserver?
Anthony wrote:

On Tue, Mar 10, 2009 at 12:29 AM, Andrew Garrett and...@werdn.us wrote:

On Tue, Mar 10, 2009 at 3:21 PM, K. Peachey p858sn...@yahoo.com.au wrote:

Currently all data, including private data, is replicated to the toolserver. We could not do this with a third-party server.

My understanding is that the toolserver(/s) are owned by the German chapter and not by Wikimedia directly so why is private data being replicated onto them?

Because it was chosen as the best technical solution. Is there a specific problem with private data being on the toolserver? If so, what? You should be aware that toolserver roots are approved by the foundation before becoming roots.

You answer the questions in your first paragraph with your sentence in the second. Think Cathedral vs. Bazaar.

On Tue, Mar 10, 2009 at 4:27 AM, Daniel Kinzler dan...@brightbyte.de wrote:

Robert Rohde wrote:

On Mon, Mar 9, 2009 at 9:29 PM, Andrew Garrett and...@werdn.us wrote:

Logistically it would be nice to have a means of providing an exclusively public data replica for purposes such as research, though I can certainly see how that could get technically messy.

As far as I know, there is simply no efficient way to do this currently.

How much information does the live feed provide? Every revision, or just a subset of revisions? How much would it cost the WMF to provide a single near-live stream of every revision?

A feed service for all revisions is available, see http://meta.wikimedia.org/wiki/Wikimedia_update_feed_service. Search engines like to use it (think: answers.com) and they are made to pay for it. Researchers should generally get it for free. Just ask brion. This doesn't provide notifications in the range of seconds (which might be needed for vandal-fighting tools), but should be quite sufficient to keep a text database up to date. For real-time notifications, the only decent method is the RC feed on IRC, but that's hard to parse and messages frequently get truncated.
Having better means for distributing notifications of changes is something I'm quite interested in. XMPP would be a very good choice, I think. I wrote about it a while ago here: http://brightbyte.de/page/RecentChanges_via_Jabber. I did not write about including full revision text or diffs in the notifications, but that's surely possible. It may be a bit too heavy for a general purpose feed, but it would be feasible when using PubSub, I think.

Anyway, getting this implemented would be nice. If anyone has time and/or money he could commit towards this, that would be excellent :)

-- daniel
Re: [Wikitech-l] research-oriented toolserver?
On 3/12/09 7:14 AM, River Tarnell wrote:

Brion Vibber:

Could be done. We're also fine with new toolserver roots as long as we approve em too for now.

it would have been nice if the Toolserver was aware of this ;-)

I was pretty sure this came up in an IRC chat a few months ago; my apologies if we didn't both realize it. :)

-- brion
Re: [Wikitech-l] research-oriented toolserver?
Aryeh Gregor:

I don't think the toolserver is used for backups.

it is, but only in the sense that it's our only off-site copy of the database. it was not created to act as a backup...

At least I hope it's not, given its reliability (which is quite good, but quite good is scary for backups).

... however, if we had enough money to support the toolserver properly, i think it would be perfectly reliable as a backup. that's something that might change this year.

- river.
Re: [Wikitech-l] research-oriented toolserver?
Robert Rohde:

In particular, I think it is useful to separate tools from analysis.

why?

Tools need high availability and low lag relative to the live site, but analysis doesn't care if it gets out of date and should use scheduling etc. to balance large loads.

what is preventing people from using the current toolserver for this analysis? what do we need to change about the platform that will enable people to run it on the current toolserver?

- river.
Re: [Wikitech-l] research-oriented toolserver?
Andrea Forte:

Let me know if you have a grant proposal you'd like help with!

well, i'm still not sure what exactly people need. perhaps the various academic people could produce a list of what they want to do on the toolserver and what's missing at the moment? (e.g. fast text access, search, ...) then we can look at the best way to provide this, including where the money should come from.

- river.
Re: [Wikitech-l] research-oriented toolserver?
I vote for making the toolserver the head-node to a much larger Beowulf cluster that has a well configured job scheduler. The data that needs to be crunched is already right there - it makes sense to put a research cluster there as well.

There will always be a limited supply of resources. Perhaps there should be a public approval system for the resources, where the community gets to pick which jobs should get added to the queue based on public analysis of the code and a description of the computation. There will be no shortage of participants ;)
Re: [Wikitech-l] research-oriented toolserver?
Brian:

I vote for making the toolserver the head-node to a much larger beowulf cluster that has a well configured job scheduler.

so the issue is that more CPU is needed to run the research jobs? how much more? do you have an example of a job and what it would require to run here?

- river.
Re: [Wikitech-l] research-oriented toolserver?
Sure - creating a Lucene index of the entire revision history of all Wikipedias for a WikiBlame extension. More realistically (although I would like to do the above) a natural language parse of the current revision of the English Wikipedia. Based on the supposed availability of this hardware, I'd say it could be done in less than a week. https://wiki.toolserver.org/view/Servers

I have to say the toolserver has grown a lot from that first donated server ^_^
Re: [Wikitech-l] research-oriented toolserver?
Robert Rohde:

The starting point is providing full-text history availability and once you have that there are a number of different projects (like wikiblame) which would desire to pull and process every revision in some way.

okay, so full text access has been a 'would be nice' thing for a while. i added an item to this year's shopping list for it. it seems more useful to provide the text in uncompressed form, instead of the MediaWiki internal form that's almost impossible to work with. does that seem reasonable?

Some of the code I've worked with would probably take weeks to run single-threaded against enwiki, but that can be made practical if one is willing to throw enough cores at the problem.

well, this probably isn't something we could afford ourselves, but if there's enough interest in a batch computing infrastructure, it's probably worth talking to external organisations about this.

From an exterior point of view it often seems like toolserver is significantly lagged or tools are going down, and from that I have generally assumed that it operates relatively close to capacity a lot of the time.

that is correct. the way it works is we run at or over capacity for a while, until we can afford new hardware, then things are fast for a while, until we reach capacity again. this repeats every year or so. (interestingly, this is exactly how Wikipedia worked in the first few years.)

- river.
Re: [Wikitech-l] research-oriented toolserver?
On Wed, Mar 11, 2009 at 11:20 AM, Brion Vibber br...@wikimedia.org wrote:

Quite so. :) Replication is fantastic against outright failure, but by itself doesn't help against data loss within the system, which gets replicated right along with it. We're working on ensuring we've got regular snapshots as well, though this isn't up yet. Regular snapshots plus the replication binlogs provide for point-in-time restoration.

Maybe you need to move the DB servers to ZFS on Solaris too. ;)
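
Editor's note: the point about snapshots plus binlogs giving point-in-time restoration can be made concrete with a small sketch. It only illustrates the selection logic (pick the newest snapshot at or before the target time, then replay the logged changes between that snapshot and the target); it does not talk to MySQL, and `restore_plan` and its arguments are hypothetical names chosen for this example.

```python
from bisect import bisect_right


def restore_plan(snapshot_times, binlog_events, target):
    """Pick the snapshot and the logged statements needed to rebuild state at `target`.

    snapshot_times: sorted list of timestamps at which snapshots were taken.
    binlog_events:  list of (timestamp, statement) pairs, sorted by timestamp.
    target:         the moment to restore to, e.g. just before a bad statement.
    """
    # Index of the first snapshot strictly after the target.
    i = bisect_right(snapshot_times, target)
    if i == 0:
        raise ValueError("no snapshot at or before the target time")
    base = snapshot_times[i - 1]
    # Assumption: events stamped exactly at the snapshot time are already
    # contained in the snapshot, so only later events are replayed.
    replay = [stmt for ts, stmt in binlog_events if base < ts <= target]
    return base, replay
```

For example, with snapshots at times 100, 200 and 300 and an accidental deletion logged at time 250, restoring to time 240 means loading the snapshot from 200 and replaying only the events logged after 200 and up to 240. Replication alone would have copied the deletion to the replica as well, which is the weakness described above.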
Re: [Wikitech-l] research-oriented toolserver?
On 3/11/09 9:43 AM, Aryeh Gregor wrote:

On Wed, Mar 11, 2009 at 12:35 PM, Platonides platoni...@gmail.com wrote:

I know. That's precisely what I'm addressing. From your email, WMF is reorganising their databases so the toolserver can get more admins (less private data is replicated/stored at ts). Any such change to the schema would be pretty big, IMHO (and yet incomplete).

If I understand correctly, the only change being contemplated here is not replicating the databases that are entirely secret (databases of private wikis). Toolserver roots would still have access to things like the recentchanges table and hidden revisions on public wikis, and would presumably still have to sign NDAs or act as Foundation agents or whatever to access those. I might be misunderstanding, though. If only entire databases need to be hidden, why can't the toolserver just be set up not to replicate those, given that MySQL supports that?

Could be done. We're also fine with new toolserver roots as long as we approve em too for now.

-- brion
Re: [Wikitech-l] research-oriented toolserver?
River Tarnell wrote:

it seems more useful to provide the text in uncompressed form, instead of the MediaWiki internal form that's almost impossible to work with. does that seem reasonable?

The tools should get the text in uncompressed form. The interface to do that is not so important. Given the amount of text, I don't think storing text with some kind of compression is something to discard right away.

A common data access interface would be interesting. Perhaps as a C library to link, include as PHP extension... Then implement it for different sources:
- Toolserver text replication
- WikiProxy
- MySQL MediaWiki database
- MediaWiki API
- XML dump

Then applications just need to be designed for the text interface, debugged with a local install, tested with a small dump, deployed on toolserver...
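
Editor's note: the common data access interface suggested above could look roughly like the following. The message proposes a C library or PHP extension; this sketch uses Python only to show the shape of the idea, and every name in it (`TextSource`, `DictSource`, `XmlDumpSource`, `word_count`) is hypothetical. The XML reader assumes a simplified, namespace-free dump layout for illustration, not the exact MediaWiki export schema.

```python
from abc import ABC, abstractmethod
import xml.etree.ElementTree as ET


class TextSource(ABC):
    """The single interface applications are written against."""

    @abstractmethod
    def get_text(self, rev_id: int) -> str:
        """Return the uncompressed wikitext of one revision."""


class DictSource(TextSource):
    """In-memory backend, handy for debugging with a local install."""

    def __init__(self, revisions):
        self._revisions = dict(revisions)

    def get_text(self, rev_id):
        return self._revisions[rev_id]


class XmlDumpSource(TextSource):
    """Backend that reads a small dump held in a string.

    Assumption: each <revision> element has direct <id> and <text> children
    and the document uses no XML namespace.
    """

    def __init__(self, xml_string):
        root = ET.fromstring(xml_string)
        self._revisions = {
            int(rev.findtext("id")): rev.findtext("text") or ""
            for rev in root.iter("revision")
        }

    def get_text(self, rev_id):
        return self._revisions[rev_id]


def word_count(source: TextSource, rev_id: int) -> int:
    """Example application code: it depends only on the interface."""
    return len(source.get_text(rev_id).split())
```

Because `word_count` only sees `TextSource`, the same function could be debugged against `DictSource`, tested against a small dump, and later pointed at a backend for the toolserver's text replication or the MediaWiki API without changes.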
Re: [Wikitech-l] research-oriented toolserver?
On Tuesday 10 March 2009 01:07:36, phoebe ayers wrote:

their Wikipedia-related research has been put on hold for a few months because of the delay. (It seems like there is a big backlog of account requests right now and only one person working on them?)

Hello, I'm DaB., and I'm the lazy guy who normally approves the accounts. I'm sorry that your request is taking so long; perhaps I can tell you why:

You requested your account at the end of last year. At that time our servers were quite loaded and we were waiting for additional ones, so I decided not to create new accounts for the time being. At the beginning of December we planned which new servers we would buy, and we hoped to buy them in December. For some reason that did not work out and we did not buy them before January. So I decided to create no new accounts before the delivery. But it took several weeks until the servers were delivered, one more week to set them up, and another week to check them. Now we have the resources to create new accounts, but then I got the flu (and still have it).

I hope that I can create new accounts soon. Daniel was kind enough to offer his help, so it should not take much longer.

And BTW: I saw all the emails, wiki-emails and wiki-messages you and some others sent; you were not ignored.

Sincerely, DaB.

-- wp-blog.de
Re: [Wikitech-l] research-oriented toolserver?
Hello everyone. We started the conversation with Phoebe about the possibility of a research-oriented toolserver that could be used by researchers who wish to explore novel gadgets or other tools for Wikipedia users. The toolserver could provide back-end support for these gadgets. By the phrase research-oriented toolserver we are looking for similar services to what is available in the existing toolserver cluster. From what we've heard of the research infrastructures being developed at Syracuse and Concordia, they will be valuable for researchers who are in need of full text data access on a large scale. The research toolserver, by contrast, would be for tools that need live access to Wikipedia databases, but that would only access the full text on a small scale through the Wikipedia API. The major difference from our perspective is how applications for new accounts would be handled. Our idea is to be able to hand out accounts based around the likelihood of effective research, rather than on visibility within Wikipedia, or on the usefulness of the resulting tool to the larger Wikipedia community. The latter two cases are already handled well by the existing toolserver and its application process. Accounts on the research toolserver would be approved based on the quality of the research ideas, and the ability of the proposing team to carry out the research. The research toolserver would need a more transparent decision-making process for approving accounts. The basis for decisions should be clear to applicants so they're able to write better applications, and denied applications should be returned with feedback about why the decision was made. What do you think? Seem like a useful idea if we can find sufficient resources, and put together a management plan? Morten Warncke-Wang, Research Assistant John Riedl, Professor GroupLens Research www.grouplens.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] research-oriented toolserver?
The current toolserver user base is always willing to help. I for one am willing to review and run queries on the database if requested. I also am a very active Python programmer and can use that to assist. If you have requests let me know. (Unless I'm losing it) there have been accounts that are only created to be used for database queries. But until then feel free to email or contact me. I think that improving and expanding the current TS is the best option, as further duplications will result in lower performance.

Betacommand
Re: [Wikitech-l] research-oriented toolserver?
On Wed, Mar 11, 2009 at 7:05 PM, Morten Warncke-Wang mor...@cs.umn.edu wrote: The major difference from our perspective is how applications for new accounts would be handled. Our idea is to be able to hand out accounts based around the likelihood of effective research, rather than on visibility within Wikipedia, or on the usefulness of the resulting tool to the larger Wikipedia community. The latter two cases are already handled well by the existing toolserver and its application process. Accounts on the research toolserver would be approved based on the quality of the research ideas, and the ability of the proposing team to carry out the research. As far as I know, the account approval process on the toolserver is fairly lax. As long as you have some credible Wikipedia-related reason to use the toolserver, whether tools or research, you should be able to get an account. Am I wrong? Have any researchers been rejected from the toolserver? ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] research-oriented toolserver?
The major difference from our perspective is how applications for new accounts would be handled. Our idea is to be able to hand out accounts based around the likelihood of effective research, rather than on visibility within Wikipedia, or on the usefulness of the resulting tool to the larger Wikipedia community. The latter two cases are already handled well by the existing toolserver and its application process. Accounts on the research toolserver would be approved based on the quality of the research ideas, and the ability of the proposing team to carry out the research. The research toolserver would need a more transparent decision-making process for approving accounts. The basis for decisions should be clear to applicants so they're able to write better applications, and denied applications should be returned with feedback about why the decision was made. What do you think? Seem like a useful idea if we can find sufficient resources, and put together a management plan?

If the only problem solved by setting up a dedicated research cluster is that of the account approval system, then by all means let's fix the system on the toolserver, and keep things together. Apart from the fact that full database replication to a third party system is very unlikely to happen for legal reasons, it would be a waste of hardware and effort.

For a system with a very much different focus, such as text crunching, a separate cluster seems worth considering, even though I'd of course prefer to have everything available to our users. But a second system with a spec very similar to ours (live replicated meta data) seems wasteful, even if replication was technically and legally feasible.

Let's try to fix the problems of the current toolserver, starting with the application process and continuing with a plan for how research projects could contribute to the hardware platform and infrastructure software.
As to the approval policy: research projects are usually approved, if their resource requirements are not too steep. Utility to the wikimedia user community is only one factor that is considered, it's not required for research projects. Making the process more transparent and giving feedback more swiftly is indeed something we should work on. In fact, I will try to set aside a fixed amount of working time for this. -- daniel ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] research-oriented toolserver?
Robert Rohde wrote:

On Mon, Mar 9, 2009 at 9:29 PM, Andrew Garrett and...@werdn.us wrote:

On Tue, Mar 10, 2009 at 3:21 PM, K. Peachey p858sn...@yahoo.com.au wrote:

Currently all data, including private data, is replicated to the toolserver. We could not do this with a third-party server.

My understanding is that the toolserver(/s) are owned by the German chapter and not by Wikimedia directly, so why is private data being replicated onto them?

Because it was chosen as the best technical solution. Is there a specific problem with private data being on the toolserver? If so, what?

I'd say the added worries about security and access approval are a problem partially bundled up with that, even if they can be worked around. Logistically it would be nice to have a means of providing an exclusively public data replica for purposes such as research, though I can certainly see how that could get technically messy.

As far as I know, there is simply no efficient way to do this currently. MySQL's replication can be told to omit entire tables, but not individual columns or even rows. That would be required, though. With the new revision-deletion feature, we have even more trouble.

So, toolserver roots need to be trusted and approved by the Foundation. However, account *approval* doesn't require root access. It doesn't require any access, technically. Account *creation* of course does, but that's not much of a problem (except currently, because of infrastructure changes due to new servers, but that will be fixed soon).

To avoid confusion: *two* Daniels can do approval: DaB and me. Neither of us has much time currently - DaB does it every now and then, and I don't do it at all, admittedly - I'm caught up in organizing the dev meeting and hardware orders besides doing my regular development jobs. I suppose we should streamline the process, yes. This would be a good topic for the developer meeting, maybe.

-- daniel
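As a rough illustration of the column- and row-level filtering that table-level replication filters cannot express, here is a minimal Python sketch of a post-processing step for a public-only copy of revision rows. It is only a sketch under stated assumptions: the flag values and column names are modelled on MediaWiki's revision table and are not taken from this thread, and `sanitize_revision` and `sanitize_table` are hypothetical helpers, not part of any existing toolserver software.

```python
# Bit flags modelled on MediaWiki's rev_deleted field.
# ASSUMPTION: these values and the column names below are illustrative.
DELETED_TEXT = 1
DELETED_COMMENT = 2
DELETED_USER = 4


def sanitize_revision(row):
    """Return a copy of a revision row with suppressed fields blanked out."""
    public = dict(row)  # copy, so the private source row is left untouched
    flags = public.get("rev_deleted", 0)
    if flags & DELETED_TEXT:
        public["rev_text_id"] = None
    if flags & DELETED_COMMENT:
        public["rev_comment"] = None
    if flags & DELETED_USER:
        public["rev_user"] = None
        public["rev_user_text"] = None
    return public


def sanitize_table(rows):
    """Apply sanitize_revision to every row of an in-memory table."""
    return [sanitize_revision(row) for row in rows]
```

A filter like this would have to run on every replicated change, which is one way to see why the approach gets "technically messy" compared with simply skipping whole tables.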
Re: [Wikitech-l] research-oriented toolserver?
Bilal Abdul Kader wrote:

Greetings, We are setting up a research server at Concordia University (Canada) that is dedicated to Wikipedia. We would love to share the resources with anyone interested. In case anyone needs help setting it up, we would love to help as well. bilal

There's a project for a biggish research cluster for Wikipedia data awaiting funding at Syracuse University. I forwarded your mail to one of the people involved. Perhaps you can join forces.

On Mon, Mar 9, 2009 at 8:07 PM, phoebe ayers phoebe.w...@gmail.com wrote:

Hi all, I'm not sure exactly where to raise this, so am asking here. A researcher I have been in touch with has proposed starting a 2nd, research-oriented Wikimedia toolserver. He thinks his lab can pay for the hardware and would be willing to maintain it, if they could get help setting it up. He got this idea after a member of his research group tried (unsuccessfully so far -- no response) to get an account on the current toolserver; their Wikipedia-related research has been put on hold for a few months because of the delay. (It seems like there is a big backlog of account requests right now and only one person working on them?) This research group has done some interesting Wikipedia research to date and I expect they could do more with access to the right data.

I apologize for the delay; perhaps you can send me some details in private, and I'll look at it. DaB doesn't have much time lately, and we had some major changes in infrastructure to take care of that caused some delays.

Personally, I think a dedicated toolserver is a great idea for the research community, but I know very little about the technical issues involved and/or whether this has been proposed before. Please comment, and I can pass on replies and put the researcher in touch with the tech team if it seems like a good idea.

Whether it makes sense to run a separate cluster largely depends on what kind of data you need access to, and in what time frame.

If you work mostly on secondary data like link tables, and you need the data in near-real time, use toolserver.org. That's what it's there for, and it's unlikely you can set up anything that could get the same data with low latency. However, if you work mostly on full text, toolserver.org is not so useful anyway - there's no direct access to full page text there, nor to search indexes. Having a dedicated cluster for research on textual content, perhaps providing content in various pre-processed forms, would be a very good idea. This is what the project I mentioned above aims at, and I'll be happy to support this effort officially, as Wikimedia Germany's tech guy.

-- daniel
Re: [Wikitech-l] research-oriented toolserver?
phoebe ayers:

Personally, I think a dedicated toolserver is a great idea for the research community, but I know very little about the technical issues involved and/or whether this has been proposed before. Please comment, and I can pass on replies and put the researcher in touch with the tech team if it seems like a good idea.

i don't understand what "research-oriented toolserver" means. what will the research toolserver provide that the current toolserver doesn't provide? is the only issue the time it takes for accounts to be created? this is a WM-DE issue; the more people who complain to WM-DE about this, the more likely it is to be resolved.

(so far, i've had zero communications from WM-DE about how the only people able to approve accounts are so busy with other things nowadays. on the other hand, i didn't ask them about it either; i suppose they don't bother monitoring the toolserver most of the time.)

we recently conducted a survey of toolserver users, and account approval (not creation) was generally felt to be quite slow. once i produce a report from the results of that survey, we might be able to get WM-DE to do something about it.

most of the issues with the current toolserver come down to money. we don't have enough money to afford redundant databases, so any failure is a major problem and creates inconvenience for users. we don't have enough money for a paid admin, so it often takes a long time for things to get done. we don't have enough money to upgrade hardware when we need it, so things are often slow until the money is available.

i think the only non-money issue is that the Wikimedia Foundation won't allow us to add any more admins until they do some internal reorganisation of their databases, which we've been waiting on for several months now.

the more separate toolservers we have, the less efficiently the money is spent. sure, every chapter and university could have their own toolserver, but i don't see how that's a better situation than these people contributing to a single toolserver in order to fix the problems that prevent people from using it. i've lost count of how often i've heard "the toolserver sucks; let's start our own". what i don't understand is why no one says "the toolserver sucks; how can we make it better?".

(there _has_ been some interest from other chapters recently about how to improve the toolserver; however, most chapters don't have a lot of money to spend. a single additional database server for the toolserver would cost at least EUR8'000.)

in the past, we had a lot of problems getting WM-DE to do anything for the toolserver (it seemed everyone there was busy with something else), but that's been better recently, so i think we're making some progress.

- river.
Re: [Wikitech-l] research-oriented toolserver?
Aryeh Gregor:

Oh. Why does a single specific person have to handle the approval of all toolserver account requests, then?

because accounts have to be approved by WM-DE, and WM-DE has designated this person to approve accounts on their behalf.

- river.
Re: [Wikitech-l] research-oriented toolserver?
Thanks for the responses, all. Daniel and Bilal: the notes about the possible servers at Syracuse and Concordia are very interesting; it sounds like the researchers interested in such things should team up. Daniel: I am not sure what type of data is needed -- this is not my project (I'm only the messenger!) but I'll pass along your message and send you private details (and encourage the researcher to reply himself). River: Well, you say that part of the issue with the toolserver is money and time... and this person that I've been talking to is offering to throw money and time at the problem. So, what can they constructively do? All: Like I said, I am unclear on the technical issues involved, but as for why a separate research toolserver might be useful... : I see a difference in the type of information a researcher might want to pull (public data, large sets of related page information, full-text mining, ??) and the types of tools that the current toolserver mainly supports (editcount tools, catscan, etc). I also see a difference in how the two groups might be authenticated -- there's a difference between being a trusted Wikipedian or trusted Wikimedia developer and being a trusted technically-competent researcher (for instance, I recognized the affiliation of the person who was trying to apply, because I've read their research papers; but if you were going on wikimedia status alone, they don't have any). -- Phoebe -- * I use this address for lists; send personal messages to phoebe.ayers at gmail.com * ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] research-oriented toolserver?
phoebe ayers:

River: Well, you say that part of the issue with the toolserver is money and time... and this person that I've been talking to is offering to throw money and time at the problem. So, what can they constructively do?

i think this is being discussed privately now...

I see a difference in the type of information a researcher might want to pull (public data, large sets of related page information, full-text mining, ??) and the types of tools that the current toolserver mainly supports (editcount tools, catscan, etc).

so, what is missing from the current toolserver that prevents researchers from working with large data sets?

I also see a difference in how the two groups might be authenticated -- there's a difference between being a trusted Wikipedian or trusted Wikimedia developer and being a trusted technically-competent researcher

i don't see why access to the toolserver would be restricted to Wikipedia editors. in fact, i'd be happier giving access to a recognised academic expert than some random guy on Wikipedia.

- river.
Re: [Wikitech-l] research-oriented toolserver?
I've been trying to do some work mining the full en dump with revision history and was involved in getting together the Syracuse grant proposal. To give you an idea, for me personally, the incentive for a new resource is a need for a server (perhaps a cluster) to support full-text queries at a reasonable speed. People at various research institutions duplicate this effort over and over. Andrea On Tue, Mar 10, 2009 at 2:26 PM, phoebe ayers phoebe.w...@gmail.com wrote: Thanks for the responses, all. Daniel and Bilal: the notes about the possible servers at Syracuse and Concordia are very interesting; it sounds like the researchers interested in such things should team up. Daniel: I am not sure what type of data is needed -- this is not my project (I'm only the messenger!) but I'll pass along your message and send you private details (and encourage the researcher to reply himself). River: Well, you say that part of the issue with the toolserver is money and time... and this person that I've been talking to is offering to throw money and time at the problem. So, what can they constructively do? All: Like I said, I am unclear on the technical issues involved, but as for why a separate research toolserver might be useful... : I see a difference in the type of information a researcher might want to pull (public data, large sets of related page information, full-text mining, ??) and the types of tools that the current toolserver mainly supports (editcount tools, catscan, etc). I also see a difference in how the two groups might be authenticated -- there's a difference between being a trusted Wikipedian or trusted Wikimedia developer and being a trusted technically-competent researcher (for instance, I recognized the affiliation of the person who was trying to apply, because I've read their research papers; but if you were going on wikimedia status alone, they don't have any). 
Re: [Wikitech-l] research-oriented toolserver?
Andrea Forte:

To give you an idea, for me personally, the incentive for a new resource is a need for a server (perhaps a cluster) to support full-text queries at a reasonable speed.

then why not help us do this on the existing toolserver, so everyone can have access to it, instead of duplicating it yet again somewhere else? there are many toolserver users who would like direct access to text, and the ability to search it.

- river.
Re: [Wikitech-l] research-oriented toolserver?
On Tue, Mar 10, 2009 at 1:27 PM, River Tarnell ri...@loreley.flyingparchment.org.uk wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 phoebe ayers: River: Well, you say that part of the issue with the toolserver is money and time... and this person that I've been talking to is offering to throw money and time at the problem. So, what can they constructively do? i think this is being discussed privately now... If other research groups are interested in contributing to this, who should they be talking to? snip i don't see why access to the toolserver would be restricted to Wikipedia editors. in fact, i'd be happier giving access to a recognised academic expert than some random guy on Wikipedia. The converse of this is that some recognized experts would probably prefer to administer their own server/cluster rather than relying on some random guy with Wikimedia DE (or wherever) to get things done. -Robert Rohde ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] research-oriented toolserver?
Let me know if you have a grant proposal you'd like help with! Andrea On Tue, Mar 10, 2009 at 4:30 PM, River Tarnell ri...@loreley.flyingparchment.org.uk wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Andrea Forte: To give you an idea, for me personally, the incentive for a new resource is a need for a server (perhaps a cluster) to support full-text queries at a reasonable speed. then why not help us do this on the existing toolserver, so everyone can have access to it, instead of duplicating it yet again somewhere else? there are many toolserver users who would like direct access to text, and the ability to search it. - river. -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.9 (HP-UX) iEYEARECAAYFAkm2zgIACgkQIXd7fCuc5vLrvgCgkWY9BizcJCSunzrk+dPdrcJO U4wAn0kIpQd7NYVBHfKNwR+dTM2rTon6 =rSHL -END PGP SIGNATURE- ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] research-oriented toolserver?
Robert Rohde wrote:

On Tue, Mar 10, 2009 at 1:27 PM, River Tarnell ri...@loreley.flyingparchment.org.uk wrote:

phoebe ayers:

River: Well, you say that part of the issue with the toolserver is money and time... and this person that I've been talking to is offering to throw money and time at the problem. So, what can they constructively do?

i think this is being discussed privately now...

If other research groups are interested in contributing to this, who should they be talking to?

Wikimedia Germany. That is, I guess, me. Send mail to daniel dot kinzler at wikimedia dot de. I'll forward it as appropriate.

i don't see why access to the toolserver would be restricted to Wikipedia editors. in fact, i'd be happier giving access to a recognised academic expert than some random guy on Wikipedia.

The converse of this is that some recognized experts would probably prefer to administer their own server/cluster rather than relying on some random guy with Wikimedia DE (or wherever) to get things done.

An academic institution may also get a serious research grant for this - that would be more complicated if the money were handled via the German chapter. Though it's something we are, of course, also interested in.

Basically, if we could all work on making the toolserver THE ONE PLACE for working with Wikipedia's data, that would be perfect. If, for some reason, it makes sense to build a separate cluster, I propose to give it a distinct purpose and profile: let it provide facilities for fulltext research, with low priority for update latency and high priority for having fulltext in various forms, with search indexes, word lists, and all the fun.

Regards, Daniel
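To make the "search indexes, word lists" idea a little more concrete, here is a minimal Python sketch of the kind of pre-processed form such a fulltext cluster might offer: an inverted index plus word counts built from page text. Everything in it is an assumption for illustration; `tokenize`, `build_index` and `search` are hypothetical helpers, the tokenizer is deliberately naive, and a real system would work from dumps rather than an in-memory dict.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words (naive, ASCII-only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index and a word list from {title: text}."""
    index = defaultdict(set)  # word -> set of page titles containing it
    counts = Counter()        # word -> total occurrences across all pages
    for title, text in pages.items():
        for word in tokenize(text):
            index[word].add(title)
            counts[word] += 1
    return index, counts


def search(index, query):
    """Return titles of pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Because the index is rebuilt from a snapshot rather than updated live, this style of processing fits the "low priority for update latency" profile proposed above.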
Re: [Wikitech-l] research-oriented toolserver?
On 3/10/09 5:29 PM, Aryeh Gregor wrote:

On Tue, Mar 10, 2009 at 7:54 PM, Platonides platoni...@gmail.com wrote:

Is mediawiki table structure going to change?

Yes, it changes on a regular basis.

Moreover, any more private method for sharing the tables (eg. a trigger deleting the row when rev_deleted is set) would precisely lose the backup ability the toolserver is performing.

I don't think the toolserver is used for backups. At least I hope it's not, given its reliability (which is quite good, but quite good is scary for backups).

The existence of the replicas on toolserver is one of our backups. Obviously we want to improve our offsite backups to include complete offline snapshots as well. It's in progress. :)

-- brion
[Wikitech-l] research-oriented toolserver?
Hi all, I'm not sure exactly where to raise this, so am asking here. A researcher I have been in touch with has proposed starting a 2nd, research-oriented Wikimedia toolserver. He thinks his lab can pay for the hardware and would be willing to maintain it, if they could get help setting it up. He got this idea after a member of his research group tried (unsuccessfully so far -- no response) to get an account on the current toolserver; their Wikipedia-related research has been put on hold for a few months because of the delay. (It seems like there is a big backlog of account requests right now and only one person working on them?) This research group has done some interesting Wikipedia research to date and I expect they could do more with access to the right data. Personally, I think a dedicated toolserver is a great idea for the research community, but I know very little about the technical issues involved and/or whether this has been proposed before. Please comment, and I can pass on replies and put the researcher in touch with the tech team if it seems like a good idea. -- user: first post on wikitech phoebe -- * I use this address for lists; send personal messages to phoebe.ayers at gmail.com * ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] research-oriented toolserver?
On Tue, Mar 10, 2009 at 11:07 AM, phoebe ayers phoebe.w...@gmail.com wrote:

Personally, I think a dedicated toolserver is a great idea for the research community, but I know very little about the technical issues involved and/or whether this has been proposed before. Please comment, and I can pass on replies and put the researcher in touch with the tech team if it seems like a good idea.

Currently all data, including private data, is replicated to the toolserver. We could not do this with a third-party server.

-- Andrew Garrett
Re: [Wikitech-l] research-oriented toolserver?
On Mon, Mar 9, 2009 at 9:33 PM, Aryeh Gregor simetrical+wikil...@gmail.com wrote: . . . and this fact is also apparently a major reason for the slowness of new user review. New roots can't be added to the toolserver until the private data is moved off, so there are too few roots to add new users. Really? We just got a new root (Werdna) and normally regular roots do not handle new accounts anyway -- that job rests with the WMDE contact, currently DaB, doesn't it? -- Casey Brown Cbrown1023 --- Note: This e-mail address is used for mailing lists. Personal emails sent to this address will probably get lost. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] research-oriented toolserver?
On Tue, Mar 10, 2009 at 12:33 PM, Aryeh Gregor simetrical+wikil...@gmail.com wrote: . . . and this fact is also apparently a major reason for the slowness of new user review. New roots can't be added to the toolserver until the private data is moved off, so there are too few roots to add new users. The bottleneck is in approval (by Wikimedia DE's representative Daniel), not in creating their accounts. -- Andrew Garrett Sent from: Sydney Nsw Australia. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] research-oriented toolserver?
On Mon, Mar 9, 2009 at 9:54 PM, Andrew Garrett and...@werdn.us wrote:

The bottleneck is in approval (by Wikimedia DE's representative Daniel), not in creating their accounts.

Oh. Why does a single specific person have to handle the approval of all toolserver account requests, then?
Re: [Wikitech-l] research-oriented toolserver?
Currently all data, including private data, is replicated to the toolserver. We could not do this with a third-party server.

My understanding is that the toolserver(/s) are owned by the German chapter and not by Wikimedia directly, so why is private data being replicated onto them?
Re: [Wikitech-l] research-oriented toolserver?
On Tue, Mar 10, 2009 at 3:21 PM, K. Peachey p858sn...@yahoo.com.au wrote:

Currently all data, including private data, is replicated to the toolserver. We could not do this with a third-party server.

My understanding is that the toolserver(/s) are owned by the German chapter and not by Wikimedia directly, so why is private data being replicated onto them?

Because it was chosen as the best technical solution. Is there a specific problem with private data being on the toolserver? If so, what? You should be aware that toolserver roots are approved by the Foundation before becoming roots.

-- Andrew Garrett
Re: [Wikitech-l] research-oriented toolserver?
On Mon, Mar 9, 2009 at 9:29 PM, Andrew Garrett and...@werdn.us wrote:

On Tue, Mar 10, 2009 at 3:21 PM, K. Peachey p858sn...@yahoo.com.au wrote:

Currently all data, including private data, is replicated to the toolserver. We could not do this with a third-party server.

My understanding is that the toolserver(/s) are owned by the German chapter and not by Wikimedia directly, so why is private data being replicated onto them?

Because it was chosen as the best technical solution. Is there a specific problem with private data being on the toolserver? If so, what?

I'd say the added worries about security and access approval are a problem partially bundled up with that, even if they can be worked around. Logistically it would be nice to have a means of providing an exclusively public data replica for purposes such as research, though I can certainly see how that could get technically messy.

-Robert Rohde