Re: [Wikitech-l] Heads-up: WMF engineering process improvement meetings

2011-07-07 Thread Alec Conroy
On Thu, Jul 7, 2011 at 12:40 PM, MZMcBride  wrote:

> I might say that one more point to focus on specifically is to how to
> leverage volunteer development (this is hinted at in some of your five
> points). There are _a lot_ of people who are capable of coding in PHP and
> who are willing to donate their time and talents, but Wikimedia/MediaWiki
> code development has chased them off, generally through neglect (patches
> sitting, review sitting, etc.). If there are ways to specifically look at
> that, it would be an enormous benefit to Wikimedia/MediaWiki, I think.

+1!

There's an enormous pool of volunteer developers out there who would
gladly work for us, non-stop, if we can find a way to let them.  For
many things, our templating language can be a lot harder to work with
than PHP-- but despite its difficulty, look at how many useful
advanced templates have been developed without us even having to ask
for them.

Anyone who can make advanced templates can almost certainly handle
PHP.  The reason templates flourish while development flounders is
"Openness"--- templating is essentially an open platform, WMF
development is most certainly not an open platform.

Volunteer developers will do ridiculous amounts of work for us,
innovating in ways we can't even imagine.  Google's most popular
program is its "20% time", which lets engineers spend one day a week
working on whatever they want.

People want to innovate, just like people want to improve our
projects' content.  They will work for free-- but they have to know
they'll  be able to actually use their innovation themselves, and most
have to know they can share it with others if it's popular.  Most
developers won't work for free only to have a third party decide
whether it's sufficiently meritorious for its use to be allowed or
not.

Right now, there's no system in place to allow me to initiate, develop,
implement, and share a feature without having to deal with a lot of
red tape and permission-getting.  If I want a Wikipedia that's a
little different in some way, I have to implement it on the client side
or I literally have to make my own fork of Wikipedia, which involves
buying a domain name, setting up a host, raising money for it / paying
for it, etc.  A huge nightmare full of work that developers
don't enjoy.

"Be Bold" hasn't been applied to development or new projects yet.
Right now, "Be Bold" is for edits, not innovation.
Right now, "Be Bold" is for new articles, not new projects.

We need to figure out how to allow developer innovations instantly,
automatically,  in real time.  But we also have to make sure those
innovations don't affect the user experience for third-parties.

Once we get such a platform, development can take off.  Until then,
development will mostly be driven by third-party MediaWiki projects and
paid staff-- both good to have, but orders of magnitude smaller than
the volunteer developer population that is going untapped.

Alec

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] How can I get data to map our linguistic interconnectedness?

2011-06-15 Thread Alec Conroy
On Wed, Jun 15, 2011 at 8:08 AM, Niklas Laxström
 wrote:
> On 15 June 2011 17:34, Alec Conroy  wrote:
>> The important point of doing this would be:
>> 1) to identify those users with unique language skills and recruit them
> Recruit them to do what?

Recruit them to help the global community with itself.   There are
currently-unidentified individuals with a special gift that will
enable them to unite the global community in a way beyond that of
monolingual members.  Most recently, we needed a translator army to
help us run the elections, but the need for translators isn't going
away.  Every language we have needs to have a clear and direct
translation path so it can participate in the movement.

>> 2) to identify projects and languages that are 'most disconnected'
>> from the English hub, so we can make them less disconnected.
> Can we make them less disconnected? How?

First and foremost by pointing out to us that a certain community is
isolated.   This will hopefully  cause members of the global community
to reach out to the isolated community.  At the same time, it will
hopefully inspire members of the isolated community to reach out to
the global community.

In extreme cases, it's not inconceivable that the foundation has a
direct role to play in helping underrepresented projects communicate
with the rest of us.

Alec



Re: [Wikitech-l] How can I get data to map our linguistic interconnectedness?

2011-06-15 Thread Alec Conroy
On Wed, Jun 15, 2011 at 7:42 AM, Platonides  wrote:
> Alec Conroy wrote:
>  > We could directly ask them to tell us, but upon reflection, the
>  > information is already hidden in our database.  A multilingual user is
>  > one that actively edits two projects of different languages.
>
> Many users already told us, by using babel templates. That also explains
> how much confidence do they have in those languages (native level, basic
> skills...).

Babel templates are great-- if every user had them, we'd be good.
Unfortunately, if you know enough to use a babel template, you
probably are already 'tied in' to the global community and thus not in
need of outreach.   (this assumption may be false).

> There's also the motivation factor.
That's saying a mouthful.  Just knowing people can translate is not at
all the same as being able to expect they'll actually do it.  We just
found that out, and that's why we need to start building a translator
network now, rather than wait till next year.


> First point: define being active. That should be something like 'more
> than X non-minor edits in the last Y weeks.'

I'm flexible.   The point of activity is just to weed the data down to
a manageable size.  If we want to call anyone active at this stage,
that'd work. I suggest lasttouched in 30 days, but that's totally
arbitrary.
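
A minimal sketch of the "more than X non-minor edits in the last Y weeks" definition quoted above. The function name, the tuple layout, and the sample values are assumptions for illustration; they do not come from MediaWiki or from this thread.

```python
from datetime import datetime, timedelta

def is_active(edits, now, min_edits, weeks):
    """Return True if the user has more than `min_edits` non-minor edits
    within the last `weeks` weeks.

    `edits` is an iterable of (timestamp, is_minor) tuples for one user.
    This layout is hypothetical, chosen only for the example.
    """
    cutoff = now - timedelta(weeks=weeks)
    recent_major = sum(1 for ts, minor in edits if ts >= cutoff and not minor)
    return recent_major > min_edits

if __name__ == "__main__":
    now = datetime(2011, 6, 15)
    sample = [
        (now - timedelta(days=1), False),    # recent, non-minor: counted
        (now - timedelta(days=3), True),     # recent but minor: ignored
        (now - timedelta(days=100), False),  # too old: ignored
    ]
    print(is_active(sample, now, min_edits=0, weeks=4))
```

Setting `min_edits=0` and `weeks=4` approximates the loose "anyone touched in the last 30 days" reading; raising either value shrinks the data set.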


> I see a problem in that you are exposing it as a symmetric relationship,
> while I don't think it should be.

Again, another very brilliant caveat.
I should say that my initial attempt at getting these kinds of
estimates was to look at worldwide language-overlap statistics and just
assume that wikimedians are "average humans", which they clearly
aren't.  This would get us a very very rough picture.

Analysis of actual edit patterns will get us a better view, but it'll
still be less precise than babel boxes or actual self-identification
as a translator.   Perhaps at some point we can explicitly ask users
to tell us directly their language skills.

Alecmconroy



Re: [Wikitech-l] How can I get data to map our linguistic interconnectedness?

2011-06-15 Thread Alec Conroy
> I think I can build you something if you give me appropriate values for
> the above definition.
>
> Cheers

Excellent-- so striking while the iron is hot-- I see that
[[Special:Statistics]] defines active as "edited within the last 30
days".  I'm open to however many users we can realistically get info
on-- the more the merrier, at least until I run out of RAM. :)

My initial query might go something like
"Select users where lasttouched was within the last month and total
edit counts are greater than 500".

And then, adding in the requirement of a second project will narrow that pool.
And then adding the constraint of a second project with a second
language will narrow the pool even more.
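
A rough sketch of the first filter described above, run against an in-memory SQLite table. The table name `users` and the columns `last_touched` and `edit_count` are hypothetical stand-ins, not the real MediaWiki schema, and timestamps are assumed to be Unix seconds.

```python
import sqlite3

SECONDS_PER_DAY = 86400

def active_heavy_editors(conn, now, window_days=30, min_edits=500):
    """Return ids of users touched within the window whose total edit
    count is greater than `min_edits` (hypothetical schema)."""
    cutoff = now - window_days * SECONDS_PER_DAY
    rows = conn.execute(
        "SELECT user_id FROM users "
        "WHERE last_touched >= ? AND edit_count > ? "
        "ORDER BY user_id",
        (cutoff, min_edits),
    )
    return [row[0] for row in rows]

def demo():
    """Build a three-row toy table and run the filter on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "user_id INTEGER, last_touched INTEGER, edit_count INTEGER)"
    )
    now = 1_308_096_000  # arbitrary fixed "now" for the example
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [
            (1, now - 5 * SECONDS_PER_DAY, 1200),   # recent, many edits
            (2, now - 90 * SECONDS_PER_DAY, 5000),  # stale account
            (3, now - 2 * SECONDS_PER_DAY, 40),     # recent, few edits
        ],
    )
    return active_heavy_editors(conn, now)
```

The second-project and second-language constraints would then be applied to this smaller pool rather than to the full user table.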

We're looking for the orphan communities that have a lot of editors but
little connection to English and Meta.



Re: [Wikitech-l] How can I get data to map our linguistic interconnectedness?

2011-06-15 Thread Alec Conroy
Hi Aryeh, thanks for the fast reply.
Yes, this will definitely underestimate linguistic capabilities of
some users, and overestimate the linguistic capabilities of others---
it's a rough measure at best.

But is there another way to estimate how "easily" two languages
should be able to communicate with each other?
The best way I can think of is looking for editing patterns that
suggest multilingual skills.  Even if this isn't a direct measure of
language, it's at least a measure of "inter-wiki interaction", which
is a good measure to have.

The important point of doing this would be:
1) to identify those users with unique language skills and recruit them
2) to identify projects and languages that are 'most disconnected'
from the English hub, so we can make them less disconnected.

Is there an easy way to run this:

For each of the 86,000 'active users':
Store a list for their edit counts on each project they've edited

That's actually a fairly small dataset, and it would get us all the
data we want.   I've been a developer before, but never here.   Any
idea how I go about getting that info?

(global accounts only is fine, usernames not needed at this point if
we have privacy concerns)
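
The per-user list described above could be built along these lines. This is only a sketch: the input is assumed to be (user, project, edits) rows from some export, and the names are made up for the example.

```python
from collections import defaultdict

def edit_counts_by_user(rows):
    """Collapse (user, project, edits) rows into {user: {project: total}}.

    Users can be opaque ids rather than usernames, which fits the
    privacy note above.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for user, project, edits in rows:
        counts[user][project] += edits
    # Convert to plain dicts so the result is easy to print or serialize.
    return {user: dict(projects) for user, projects in counts.items()}

if __name__ == "__main__":
    sample = [
        ("u1", "enwiki", 10),
        ("u1", "dewiki", 3),
        ("u1", "enwiki", 5),
        ("u2", "frwiki", 7),
    ]
    print(edit_counts_by_user(sample))
```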

Alec



On Wed, Jun 15, 2011 at 7:24 AM, Aryeh Gregor
 wrote:
> On Wed, Jun 15, 2011 at 8:46 AM, Alec Conroy  wrote:
>> We could directly ask them to tell us, but upon reflection, the
>> information is already hidden in our database.  A multilingual user is
>> one that actively edits two projects of different languages.
>
> That doesn't follow.  Perhaps someone speaks a language, but doesn't
> edit the corresponding wiki.  For instance, I know a decent amount of
> Hebrew, although I wouldn't call myself fluent in Modern Hebrew.  But
> I'm a native English speaker, and English Wikipedia articles are
> almost always better than the corresponding Hebrew ones (often even on
> Judaism-related topics).  So I have no reason to read the Hebrew
> Wikipedia, when it takes more effort for me and the content isn't
> usually as good.  Likewise, some people edit exclusively or almost
> exclusively on multilingual projects like Commons.
>
> On the other hand, people might edit on projects in languages they
> don't understand.  For instance, they might be running scripts that
> automatically fix interwikis or such.  This is less likely, though,
> once you exclude bot accounts.
>
> If you want this info, toolserver queries are the right way to do it.
> It should be pretty easy to pull this kind of info out of the revision
> or recentchanges tables, although it would require reading a lot of
> data.  The simplest way would be to get a list of usernames for each
> wiki that have edited in the last X days, then use a script to reverse
> the lists so that you get a list of languages for each user.  You'd
> probably want to only include unified accounts here.  (How many
> accounts still aren't unified?)
>
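
The "reverse the lists" step suggested in the quoted reply might look like the sketch below. Both inputs are assumptions: `editors_by_wiki` stands for the per-wiki lists of recent editors, and `language_of` is a hypothetical wiki-to-language lookup.

```python
def languages_per_user(editors_by_wiki, language_of):
    """Invert {wiki: [users]} into {user: {languages}}.

    `language_of` maps each wiki name to a language code; multilingual
    projects would need their own convention, which is not handled here.
    """
    user_langs = {}
    for wiki, users in editors_by_wiki.items():
        for user in users:
            user_langs.setdefault(user, set()).add(language_of[wiki])
    return user_langs

if __name__ == "__main__":
    editors = {"enwiki": ["u1", "u2"], "hewiki": ["u2"], "enwiktionary": ["u1"]}
    langs = {"enwiki": "en", "hewiki": "he", "enwiktionary": "en"}
    print(languages_per_user(editors, langs))
```

Bot accounts would be dropped from the per-wiki lists before this step, as the reply notes.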


[Wikitech-l] How can I get data to map our linguistic interconnectedness?

2011-06-15 Thread Alec Conroy
The recent elections showed us that language issues and translation
are something we have to take very seriously from now on.  As a first
step towards improving communication, it seems like we should get an
idea of which users speak which languages?

We could directly ask them to tell us, but upon reflection, the
information is already hidden in our database.  A multilingual user is
one that actively edits two projects of different languages.

In devising a comprehensive translation strategy, we need to know how
interconnected any two given projects are.   We also need to know how
connected any given project is to English, since it's our working
language.

We need to pay special attention to languages that are very 'distant'
from English-- distant in the sense of having few members who are fluent
in both English and the language in question.

Could someone aid me in getting this data, or explaining why I don't
need it or why we already have it, etc?

Specifically, I'm looking for:
#   For each non-English-language project, how many of their active
users are ALSO active on an English-language project? (the answer
should be a single whole number for each project)
#   For any two projects, how many users are there who are active on
both? (answer is a square matrix, roughly 750x750 )
#   For any two languages, how many users appear to speak both
languages? (answer is a square matrix, roughly 750x750)
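
The three outputs listed above can be sketched as follows, assuming we already have a mapping from each active user to the set of projects they are active on. The function names, the `language_of` lookup, and the use of "en" as the English code are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

def overlap_with_english(user_projects, language_of):
    """Per non-English project: number of its active users who are also
    active on at least one English-language project."""
    counts = Counter()
    for projects in user_projects.values():
        if any(language_of[p] == "en" for p in projects):
            for p in projects:
                if language_of[p] != "en":
                    counts[p] += 1
    return counts

def pair_counts(user_items):
    """Number of users shared by each unordered pair of items.

    Works for projects or languages; keys are alphabetically sorted
    tuples, i.e. the upper triangle of the square matrix.
    """
    pairs = Counter()
    for items in user_items.values():
        for a, b in combinations(sorted(items), 2):
            pairs[(a, b)] += 1
    return pairs

def language_pair_counts(user_projects, language_of):
    """Same as pair_counts, after mapping each user's projects to languages."""
    user_langs = {
        user: {language_of[p] for p in projects}
        for user, projects in user_projects.items()
    }
    return pair_counts(user_langs)
```

Only pairs that actually share a user are stored, so the roughly 750x750 matrix stays sparse. Note that this counts unordered pairs, so it gives the symmetric view of the relationship.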

Does anyone know how to pull this out of the database?  It's an
important question for us to recruit translators and really just
assess "where we are" in terms of inter-project language capabilities.

Alec
