Re: [Wikitech-l] Heads-up: WMF engineering process improvement meetings
On Thu, Jul 7, 2011 at 12:40 PM, MZMcBride wrote:
> I might say that one more point to focus on specifically is to how to
> leverage volunteer development (this is hinted at in some of your five
> points). There are _a lot_ of people who are capable of coding in PHP and
> who are willing to donate their time and talents, but Wikimedia/MediaWiki
> code development has chased them off, generally through neglect (patches
> sitting, review sitting, etc.). If there are ways to specifically look at
> that, it would be an enormous benefit to Wikimedia/MediaWiki, I think.

+1! There's an enormous pool of volunteer developers out there who would gladly work for us, non-stop, if we can find a way to let them.

For many things, our templating language can be a lot harder to work with than PHP-- but despite its difficulty, look at how many useful advanced templates have been developed without us even having to ask for them. Anyone who can make advanced templates can almost certainly handle PHP.

The reason templates flourish while development flounders is "openness"-- templating is essentially an open platform; WMF development most certainly is not.

Volunteer developers will do ridiculous amounts of work for us, innovating in ways we can't even imagine. Google's most popular program is its "20% time", which allows employees to spend one day a week working on whatever they want. People want to innovate, just like people want to improve our projects' content. They will work for free-- but they have to know they'll be able to actually use their innovation themselves, and most have to know they can share it with others if it's popular. Most developers won't work for free only to have a third party decide whether their work is sufficiently meritorious to be allowed.

Right now, there's no system in place to allow me to initiate, develop, implement, and share a feature without having to deal with a lot of red tape and permission-getting.
If I want a Wikipedia that's a little different in some way, I have to implement it on the client side or I literally have to make my own fork of Wikipedia, which involves buying a domain name, setting up a host, raising money for it / paying for it, etc. A huge nightmare full of work that developers don't enjoy.

"Be Bold" hasn't been applied to development or new projects yet. Right now, "Be Bold" is for edits, not innovation. Right now, "Be Bold" is for new articles, not new projects.

We need to figure out how to allow developer innovations instantly, automatically, in real time. But we also have to make sure those innovations don't affect the user experience for third parties. Once we get such a platform, development can take off. Until then, development will mostly be driven by third-party MediaWiki projects and paid staff-- both good to have, but orders of magnitude smaller than the volunteer developer population that is going untapped.

Alec
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] How can I get data to map our linguistic interconnectedness?
On Wed, Jun 15, 2011 at 8:08 AM, Niklas Laxström wrote:
> On 15 June 2011 17:34, Alec Conroy wrote:
>> The important point of doing this would be:
>> 1) to identify those users with unique language skills and recruit them
> Recruit them to do what?

Recruit them to help the global community communicate with itself. There are currently unidentified individuals with a special gift that will enable them to unite the global community in a way that monolingual members cannot. Most recently, we needed a translator army to help us run the elections, but the need for translators isn't going away. Every language we have needs a clear and direct translation path so it can participate in the movement.

>> 2) to identify projects and languages that are 'most disconnected'
>> from the English hub, so we can make them less disconnected.
> Can we make them less disconnected? How?

First and foremost, by pointing out to us that a certain community is isolated. This will hopefully cause members of the global community to reach out to the isolated community. At the same time, it will hopefully inspire members of the isolated community to reach out to the global community. In extreme cases, it's not inconceivable that the foundation has a direct role to play in helping underrepresented projects communicate with the rest of us.

Alec
Re: [Wikitech-l] How can I get data to map our linguistic interconnectedness?
On Wed, Jun 15, 2011 at 7:42 AM, Platonides wrote:
> Alec Conroy wrote:
> > We could directly ask them to tell us, but upon reflection, the
> > information is already hidden in our database. A multilingual user is
> > one that actively edits two projects of different languages.
>
> Many users already told us, by using babel templates. That also explains
> how much confidence do they have in those languages (native level, basic
> skills...).

Babel templates are great-- if every user had them, we'd be good. Unfortunately, if you know enough to use a babel template, you are probably already 'tied in' to the global community and thus not in need of outreach. (This assumption may be false.)

> There's also the motivation factor.

That's saying a mouthful. Just knowing people can translate is not at all the same as being able to expect they'll actually do it. We just found that out, and that's why we need to start building a translator network now, rather than wait till next year.

> First point: define being active. That should be something like 'more
> than X non-minor edits in the last Y weeks.'

I'm flexible. The point of the activity threshold is just to weed the data down to a manageable size. If we want to call anyone active at this stage, that'd work. I suggest lasttouched within the last 30 days, but that's totally arbitrary.

> I see a problem in that you are exposing it as a symmetric relationship,
> while I don't think it should be.

Again, another brilliant caveat. I should say that my initial attempt at getting these kinds of estimates was to look at worldwide language-overlap statistics and just assume that Wikimedians are "average humans", which they clearly aren't. That would get us a very, very rough picture. Analysis of actual edit patterns will get us a better view, but it'll still be less precise than babel boxes or actual self-identification as a translator. Perhaps at some point we can explicitly ask users to tell us their language skills directly.

Alecmconroy
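
One possible way to read the asymmetry Platonides raises, sketched in Python: measure overlap as a share of the *source* project's active users, so that A-to-B and B-to-A can differ. This is only an illustrative interpretation; the function name, the data layout, and the sample project names are assumptions, not anything taken from the database.

```python
def directed_overlap(active_users, source, target):
    """Share of `source`'s active users who are also active on `target`.

    active_users: dict mapping a project name to the set of its active
    user names (hypothetical layout, for illustration only).
    """
    source_users = active_users[source]
    if not source_users:
        return 0.0
    shared = source_users & active_users[target]
    return len(shared) / len(source_users)


# Made-up sample data: every fiwiki editor here also edits enwiki,
# but only half of the enwiki editors also edit fiwiki.
sample = {
    "enwiki": {"A", "B", "C", "D"},
    "fiwiki": {"C", "D"},
}
fi_to_en = directed_overlap(sample, "fiwiki", "enwiki")
en_to_fi = directed_overlap(sample, "enwiki", "fiwiki")
```

Under this reading, a small project can be strongly connected to the English hub while the hub is only weakly connected back to it, which a single symmetric count would hide.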
Re: [Wikitech-l] How can I get data to map our linguistic interconnectedness?
> I think I can build you something if you give me appropiate values for
> the above definition.
>
> Cheers

Excellent-- so striking while the iron is hot-- I see that [[Special:Statistics]] defines active as "edited within the last 30 days". I'm open to however many users we can realistically get info on-- the more the merrier, at least until I run out of RAM. :)

My initial query might go something like "Select users where lasttouched was within the last month and total edit counts are greater than 500". Then adding the requirement of a second project will narrow that pool, and adding the constraint of a second project in a second language will narrow it even more.

We're looking for the orphan communities that have a lot of editors but little connection to English and Meta.
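
The three narrowing steps described above could be sketched roughly as follows in Python. This is a minimal sketch, not a real database query: the row layout `(user, project, language, edit_count, last_touched)` and the function name are hypothetical, and "second project" here means any edits on a second project rather than a separate activity threshold per project.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def narrow_pools(rows, now, max_idle_days=30, min_edits=500):
    """Return three successively narrower sets of user names.

    rows: iterable of (user, project, language, edit_count, last_touched),
    one row per user per project. This layout is an assumption.
    """
    cutoff = now - timedelta(days=max_idle_days)
    totals = defaultdict(int)      # user -> total edits across projects
    latest = {}                    # user -> most recent last_touched
    projects = defaultdict(set)    # user -> projects edited
    languages = defaultdict(set)   # user -> languages of those projects

    for user, project, language, edits, touched in rows:
        totals[user] += edits
        if user not in latest or touched > latest[user]:
            latest[user] = touched
        projects[user].add(project)
        languages[user].add(language)

    # Step 1: touched within the last month and more than 500 edits in total.
    active = {u for u in totals
              if latest[u] >= cutoff and totals[u] > min_edits}
    # Step 2: also present on a second project.
    multi_project = {u for u in active if len(projects[u]) >= 2}
    # Step 3: the second project is in a second language.
    multi_language = {u for u in multi_project if len(languages[u]) >= 2}
    return active, multi_project, multi_language


# Made-up sample rows for illustration.
now = datetime(2011, 6, 15)
rows = [
    ("A", "enwiki", "en", 600, datetime(2011, 6, 10)),
    ("B", "enwiki", "en", 400, datetime(2011, 6, 1)),
    ("B", "enwikibooks", "en", 200, datetime(2011, 5, 1)),
    ("C", "fiwiki", "fi", 900, datetime(2011, 6, 14)),
    ("C", "enwiki", "en", 50, datetime(2011, 3, 1)),
    ("D", "dewiki", "de", 5000, datetime(2011, 1, 1)),
]
active, multi_project, multi_language = narrow_pools(rows, now)
```

Each pool is a subset of the previous one, so the thresholds (30 days, 500 edits) can be loosened or tightened without changing the later steps.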
Re: [Wikitech-l] How can I get data to map our linguistic interconnectedness?
Hi Aryeh, thanks for the fast reply.

Yes, this will definitely underestimate the linguistic capabilities of some users and overestimate those of others-- it's a rough measure at best. But is there another way to estimate how "easily" two languages should be able to communicate with each other? The best way I can think of is looking for editing patterns that suggest multilingual skills. Even if this isn't a direct measure of language, it's at least a measure of "inter-wiki interaction", which is a good measure to have.

The important point of doing this would be:
1) to identify those users with unique language skills and recruit them
2) to identify projects and languages that are 'most disconnected' from the English hub, so we can make them less disconnected.

Is there an easy way to run this:

For each of the 86,000 'active users':
    Store a list of their edit counts on each project they've edited

That's actually a fairly small dataset, and it would get us all the data we want. I've been a developer before, but never here. Any idea how I go about getting that info? (Global accounts only is fine; usernames are not needed at this point if we have privacy concerns.)

Alec

On Wed, Jun 15, 2011 at 7:24 AM, Aryeh Gregor wrote:
> On Wed, Jun 15, 2011 at 8:46 AM, Alec Conroy wrote:
>> We could directly ask them to tell us, but upon reflection, the
>> information is already hidden in our database. A multilingual user is
>> one that actively edits two projects of different languages.
>
> That doesn't follow. Perhaps someone speaks a language, but doesn't
> edit the corresponding wiki. For instance, I know a decent amount of
> Hebrew, although I wouldn't call myself fluent in Modern Hebrew. But
> I'm a native English speaker, and English Wikipedia articles are
> almost always better than the corresponding Hebrew ones (often even on
> Judaism-related topics). So I have no reason to read the Hebrew
> Wikipedia, when it takes more effort for me and the content isn't
> usually as good. Likewise, some people edit exclusively or almost
> exclusively on multilingual projects like Commons.
>
> On the other hand, people might edit on projects in languages they
> don't understand. For instance, they might be running scripts that
> automatically fix interwikis or such. This is less likely, though,
> once you exclude bot accounts.
>
> If you want this info, toolserver queries are the right way to do it.
> It should be pretty easy to pull this kind of info out of the revision
> or recentchanges tables, although it would require reading a lot of
> data. The simplest way would be to get a list of usernames for each
> wiki that have edited in the last X days, then use a script to reverse
> the lists so that you get a list of languages for each user. You'd
> probably want to only include unified accounts here. (How many
> accounts still aren't unified?)
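
The "reverse the lists" step Aryeh describes might look something like the Python sketch below. It assumes the per-wiki username lists have already been pulled from the toolserver; the function name, the `language_of` mapping, and the sample wiki names are hypothetical. Multilingual projects such as Commons are mapped to `None` and skipped, since editing there says nothing about a specific language.

```python
from collections import defaultdict


def languages_by_user(editors_by_wiki, language_of):
    """Invert per-wiki editor lists into a per-user set of languages.

    editors_by_wiki: dict wiki -> iterable of user names that edited
                     that wiki in the last X days.
    language_of:     dict wiki -> language code, or None for
                     multilingual wikis (assumed mapping).
    """
    result = defaultdict(set)
    for wiki, users in editors_by_wiki.items():
        language = language_of[wiki]
        for user in users:
            # Touch the entry so Commons-only users still appear,
            # with an empty language set.
            langs = result[user]
            if language is not None:
                langs.add(language)
    return dict(result)


# Made-up sample input for illustration.
editors_by_wiki = {
    "enwiki": ["A", "B"],
    "hewiki": ["B"],
    "commonswiki": ["A", "C"],
}
language_of = {"enwiki": "en", "hewiki": "he", "commonswiki": None}
per_user = languages_by_user(editors_by_wiki, language_of)
```

A user whose set has two or more entries is a candidate multilingual editor, subject to the caveats above about bots and about people who speak a language without editing its wiki.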
[Wikitech-l] How can I get data to map our linguistic interconnectedness?
The recent elections showed us that language issues and translation are something we have to take very seriously from now on.

As a first step towards improving communication, it seems like we should get an idea of which users speak which languages. We could directly ask them to tell us, but upon reflection, the information is already hidden in our database. A multilingual user is one that actively edits two projects of different languages.

In devising a comprehensive translation strategy, we need to know how interconnected any two given projects are. We also need to know how connected any given project is to English, since it's our working language. We need to pay special attention to languages that are very 'distant' from English-- distant in the sense of having few members who are fluent in both English and the language in question.

Could someone aid me in getting this data, or explain why I don't need it or why we already have it, etc.?

Specifically, I'm looking for:
# For each non-English-language project, how many of its active users are ALSO active on an English-language project? (the answer should be a single whole number for each project)
# For any two projects, how many users are there who are active on both? (answer is a square matrix, roughly 750x750)
# For any two languages, how many users appear to speak both languages? (answer is a square matrix, roughly 750x750)

Does anyone know how to pull this out of the database? It's an important question for us to recruit translators and really just assess "where we are" in terms of inter-project language capabilities.

Alec
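
Assuming a per-user set of active projects is already available, the three answers requested above could be computed along these lines. This is a rough Python sketch: the function names, the `language_of` mapping, and the sample data are made up for illustration, and each square matrix is stored as counts per unordered pair (one triangle) because the raw counts are symmetric.

```python
from collections import Counter
from itertools import combinations


def also_on_english(projects_by_user, language_of):
    """Question 1: per non-English project, users also active on an
    English-language project."""
    counts = Counter()
    for projects in projects_by_user.values():
        if any(language_of[p] == "en" for p in projects):
            for p in projects:
                if language_of[p] != "en":
                    counts[p] += 1
    return counts


def shared_users(items_by_user):
    """Questions 2 and 3: users per unordered pair of projects
    (or languages). Keys are sorted tuples such as ("enwiki", "fiwiki")."""
    counts = Counter()
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return counts


# Made-up sample data for illustration.
projects_by_user = {
    "A": {"enwiki", "fiwiki"},
    "B": {"fiwiki", "fiwikt"},
    "C": {"enwiki", "fiwiki", "dewiki"},
}
language_of = {"enwiki": "en", "fiwiki": "fi", "fiwikt": "fi", "dewiki": "de"}

english_links = also_on_english(projects_by_user, language_of)
project_pairs = shared_users(projects_by_user)
language_pairs = shared_users(
    {user: {language_of[p] for p in projects}
     for user, projects in projects_by_user.items()}
)
```

With roughly 750 projects the full pair table has at most about 280,000 entries, so it fits comfortably in memory; pairs that never co-occur simply do not appear in the counter.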