[WikiEN-l] Copyright impact on Wikipedia comprehensiveness

2013-11-29 Thread Gwern Branwen
"Does Copyright Affect Creative Reuse? Evidence from the Digitization
of Baseball Digest", Nagaraj
2013 (draft) http://web.mit.edu/nagaraj/files/copyright_nagaraj.pdf

> While copyright governs the distribution of creative content in industries 
> like publishing and computer software, its impact on creative reuse has 
> largely evaded empirical analysis. I use the digitization of both copyrighted 
> and non-copyrighted issues of one publication, Baseball Digest, to measure 
> the impact of copyright on a prominent venue for reuse: Wikipedia. While the 
> overall impact of digitization on reuse is positive, copyright hurts both the 
> extent of reuse and the level of internet traffic to affected Wikipedia 
> pages. The impact of copyright is more pronounced for images compared to text 
> and becomes economically significant only post-digitization.

http://abhishek.mit.edu/

You may remember discussion of the work way back in July 2012:
http://www.theatlantic.com/technology/archive/2012/07/mit-economist-heres-how-copyright-laws-impoverish-wikipedia/259970/
The actual paper was only posted recently.

-- 
gwern
http://www.gwern.net

___
WikiEN-l mailing list
WikiEN-l@lists.wikimedia.org
To unsubscribe from this mailing list, visit:
https://lists.wikimedia.org/mailman/listinfo/wikien-l


Re: [WikiEN-l] Psychological correlates of deletionism/inclusionism?

2013-04-13 Thread Gwern Branwen
On Sat, Apr 13, 2013 at 8:34 PM, David Gerard  wrote:
> You're assuming they could have, and that this would have been worth
> doing. I don't think there's any reasonable basis for such an
> assumption, as it carries the implicit assumption that we understood
> Wikipedia well enough to make that sort of intervention, and that's
> definitely false.

Of course they *could* have tried. What we'll never know is if they
would have succeeded, because they didn't try. Gardner and the
Foundation seemed to eventually realize the problem, but eh, barn
doors and horses.

On Sat, Apr 13, 2013 at 8:39 PM, Fred Bauder  wrote:
> Once the herd got going, no one had much effect.

Managing the herd is what leaders were for.

-- 
gwern
http://www.gwern.net


Re: [WikiEN-l] Psychological correlates of deletionism/inclusionism?

2013-04-13 Thread Gwern Branwen
On Sat, Apr 13, 2013 at 7:54 PM, Fred Bauder  wrote:
> Jimbo and Angela did not play a significant role in debates over
> inclusion and deletion

Indeed, that was my point. I don't think they did anything, or
intended anything of the kind, but they chose not to intervene back
when the gradual slide could have been stopped and so the ultimate
effect was much the same. (Amusingly eventually leading to a nasty
surprise for Jimbo with Mzoli's.)

-- 
gwern
http://www.gwern.net


Re: [WikiEN-l] Psychological correlates of deletionism/inclusionism?

2013-04-13 Thread Gwern Branwen
On Sat, Apr 13, 2013 at 6:42 PM, Carcharoth  wrote:
> And why would you think that
> inclusionism/deletionism debates are intractable? I thought the idea
> that such terms should be avoided (as they are divisive) was taking
> hold and gaining ground?

We're getting a bit far afield (I was just hoping for some citations
to academic research I could look up), but since you asked... My own
impression was that the debates were never resolved so much as the
inclusionists were driven out. Just look at the editor population numbers
from the last 7 years, since 2006, or look at the article growth
rates. Has the Foundation succeeded in keeping the editor population
from dropping (never mind growing, or growing as fast as the
Internet)? I've tracked some of the public goals and they've failed
entirely.

If you hear silence, it may be the silence of the content, happily
cooperating as they beaver away at their particular articles - or it
may be the silence of the grave.

Why do you never hear complaints from inclusionists about Star Wars
articles being deleted? Because so many were deleted that the involved
editors finally bit the bullet and escaped to Wikia, and the only ones
that are left are either ones onboard with rigid constrictive policies
or have seen their efforts fail and learned to comply with the current
regime. What happened with Star Wars could be said of many of the
Wikias. (One of the more amusing Wikipedia conspiracy theories I've
seen is that Wales & Angela deliberately encouraged or let En slide
towards deletionism because it provided a demand for their Wikia
startup. I doubt they intended any such thing, but the effect was the
same.) And after a while, people have enough run-ins with Wikipedians
or hear about such run-ins that they learn Wikipedia is no longer
friendly to a wide variety of topics and to not even try, so one then
cannot even point to content-generating communities migrating off
Wikipedia because the communities have learned to not use Wikipedia in
the first place but use Wikia or any of the many other options
available. Hence, an 'evaporative cooling' of participants
(http://lesswrong.com/lw/lr/evaporative_cooling_of_group_beliefs/) as
editors leave.

-- 
gwern
http://www.gwern.net


Re: [WikiEN-l] Psychological correlates of deletionism/inclusionism?

2013-04-13 Thread Gwern Branwen
On Sat, Apr 13, 2013 at 4:22 PM, David Carson  wrote:
> Obviously, either some sound peer-reviewed research displaying that
> "deletionists" suffer from deep-seated psychological problems that make
> them clinically unfit to work on a collaborative project; or some sound
> peer-reviewed research displaying that "inclusionists" suffer from some
> other, similarly severe, deep-seated psychological problems.

I'm not 'hoping' to see anything. The absence of any correlations
would be just as interesting because a lot of people seem to think the
opposite.

My basic observation here is that inclusionism/deletionism debates
seem intractable, like religion and politics, which have long been
correlated with a variety of mental and neurological observations; the
deep-seated roots of those beliefs seem to explain why politics
is so wasteful and damaging; hence the obvious question becomes, is
inclusionism/deletionism another such case?

But such findings would not tell us which side (or both) is the
intractable party. Merely from a correlation you can't infer which
side is right, since there's always two sides to a coin and you don't
know whose beliefs are correct. (Suppose a survey found Republicans
are more fearful of foreigners and foreign countries than Democrats;
well, this is interesting but what does it actually show? Where can we
get the ground truth on this question, what fact would we point to to
prove that Republicans are wrong to fear foreigners/foreign-countries
and allow us to draw a conclusion like 'Republican politics are driven
by excessive fear'? If they were actually right to fear foreigners,
then this finding would be better interpreted as 'Democrats are
pathologically optimistic / naive'; and of course, both sides could be
wrong about how dangerous foreigners are, in which case we might
conclude both that Republicans are driven by excessive fear and that
those suffering from mindless optimism and naivete align with the
Democrats. Just because two groups are arguing doesn't mean either one
is right.)

-- 
gwern
http://www.gwern.net


Re: [WikiEN-l] Psychological correlates of deletionism/inclusionism?

2013-04-13 Thread Gwern Branwen
On Sat, Apr 13, 2013 at 2:36 AM, Tom Morris  wrote:
> I'm waiting for extreme inclusionists or deletionists to produce some 
> high-quality, not-at-all bullshit research that shows that failure to adhere 
> to their preferred philosophy is something that shows a deep psychological 
> tendency to rape kittens.
>
> That'll elevate the debate, I'm sure.

On Sat, Apr 13, 2013 at 8:06 AM, Fred Bauder  wrote:
> Obviously toilet training is involved. That is the source of the anal
> personality. Need a study of toilet training of future editors...

Thanks for your contributions, guys, they were really helpful and not
at all completely useless and off-topic and exactly what I was hoping
not to see.

-- 
gwern
http://www.gwern.net


[WikiEN-l] Psychological correlates of deletionism/inclusionism?

2013-04-12 Thread Gwern Branwen
Some recent musings reminded me that I never did find a good answer
for an old question of mine: does anything predict whether an editor
will lean towards deletionism?

More specifically, it seems to me that attitudes towards articles take
on almost emotional or moral dimensions, perhaps related to various
psychological factors. Does anyone remember ever seeing any research
touching on this? For example, perhaps someone surveyed editors,
asking for self-identified preference and doing an inventory measuring
personality factors like the OCEAN/Big Five? Of course I checked
https://en.wikipedia.org/wiki/Deletionism_and_inclusionism_in_Wikipedia
and Google but nothing particularly germane appears to have popped up
besides random speculation and analogies to Adorno's famous
http://en.wikipedia.org/wiki/The_Authoritarian_Personality

-- 
gwern
http://www.gwern.net


Re: [WikiEN-l] BBC article on Roth novel and Wikipedia article

2012-09-08 Thread Gwern Branwen
I liked the promoted comment in the Ars Technica article:
http://arstechnica.com/business/2012/09/wikipedia-told-philip-roth-hes-not-credible-source-on-book-he-wrote/

(Found via the Reddit comments in
http://www.reddit.com/r/wikipedia/comments/zim4r/philip_roth_an_open_letter_to_wikipedia_about/
& 
http://www.reddit.com/r/TrueReddit/comments/zirub/philip_roth_author_of_the_human_stain_writes_an/
)

-- 
gwern
http://www.gwern.net


Re: [WikiEN-l] Editor retention

2012-09-04 Thread Gwern Branwen
On Tue, Sep 4, 2012 at 6:22 PM, Andrew Gray  wrote:
> I don't disagree with the overall results - editor numbers are still
> in decline - but I think it's worth including the caveat that the
> numbers reported on the wikistats site have recently been adjusted
> downwards by around 5% -
> http://blog.wikimedia.org/2012/08/31/improving-the-accuracy-of-the-active-editors-metric/
>
> The result is that Howie's quote above is doubly unlikely - it's based
> on an inflated estimate of how many editors we had then. Our figures
> for Aug 2011 are now 76,126 rather than the 81,450 quoted; adjusting
> his target accordingly, this would make it around 89,500. Still a long
> way to go, though, whichever you use!

Whups.

>  One last interesting point: the 2010 drop was mostly a non-en.wp event; the 
> drop on en.wp was proportionally much less. I have no idea as to the likely 
> cause of this.

Perhaps the damage has already been done on En? I would've suggested
that maybe the WMF retention initiatives might have not failed
entirely, except I don't remember any of them being finished in 2010,
much less being able to affect the overall wiki so much.

-- 
gwern
http://www.gwern.net


[WikiEN-l] Editor retention

2012-09-04 Thread Gwern Branwen
http://online.wsj.com/article/SB10001424053111904875404576532431335938862.html
September 2011

> Adding more editors “is one of our top priorities for the year,” says Howie 
> Fung…aims to increase the number of editors across all languages of Wikipedia 
> to 95,000 from 81,450 by June of next year.

From http://stats.wikimedia.org/EN/TablesWikipediaZZ.htm using the >5
edits a month metric used in WMF docs:

1. July 2012: 76,400
2. June 2012: 74,402
3. May 2012: 76,956
4. April 2012: 75,141
5. March 2012: 76,274

The high water mark, incidentally, seems to have been March 2007 with
90,618 editors >5 edits that month. So we have been shrinking ~2.8k
editors a year  ((91 - 77) / (2012 - 2007)). In retrospect, my 75%
prediction that this priority would not be achieved
(http://predictionbook.com/predictions/3241) was ludicrously
optimistic, given that the 95k editor mark has *never* been reached.
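
The back-of-the-envelope rate above can be redone with the unrounded figures. This is only a minimal sketch, assuming the March 2007 peak and the July 2012 count quoted in this message; the variable names are illustrative and not from any WMF tooling:

```python
# Active editors (>5 edits a month), as quoted above from stats.wikimedia.org.
peak_editors = 90_618      # March 2007 high-water mark
latest_editors = 76_400    # July 2012
wmf_target = 95_000        # target quoted in the WSJ article

# Rounded version used in the text: thousands of editors, whole years.
rounded_loss_per_year = (91 - 77) / (2012 - 2007)

# Same calculation with the unrounded counts and a month-level time span.
years_elapsed = (2012 + 7 / 12) - (2007 + 3 / 12)
loss_per_year = (peak_editors - latest_editors) / years_elapsed

print(f"rounded: ~{rounded_loss_per_year:.1f}k editors/year")
print(f"unrounded: ~{loss_per_year:,.0f} editors/year")
print(f"shortfall vs. target: {wmf_target - latest_editors:,}")
```

Either way the trend points away from the 95k target rather than towards it.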

-- 
gwern
http://www.gwern.net/In%20Defense%20Of%20Inclusionism


Re: [WikiEN-l] Link removal experiment; Re: "How the Professor Who Fooled Wikipedia Got Caught by Reddit", _The Atlantic_

2012-06-29 Thread Gwern Branwen
On Wed, May 30, 2012 at 2:33 PM, Gwern Branwen  wrote:
> My experiment has concluded and all the link removals reverted*. The
> full writeup is at
> http://www.gwern.net/In%20Defense%20Of%20Inclusionism#sins-of-omission-experiment-2
>
> Result: Of the 100 removals, just 3 were reverted.
>
> 3% is even lower than I expected, and very different from Horologium's
> estimate, incidentally.

Today I did a followup at the 1 month point, hand-checking the 100
links I restored to articles while cleaning up the experiment. Of the
100, 4 do not appear in the current version of the article.

(2 of the removals were in direct response to the restoration, while
the other 2 are either unexplained and part of a large edit with many
changes or got removed in a wholesale culling of the External links
section.)

Those who think that 3% was the correct reversion rate for the
removals are invited to explain how 4% could be the correct reversion
rate for the re-adding of the same links - if it was acceptable for
97% to be removed in the first place, how could it also be acceptable
for 96% to then be restored?

-- 
gwern
http://www.gwern.net


Re: [WikiEN-l] Link removal experiment; Re: "How the Professor Who Fooled Wikipedia Got Caught by Reddit", _The Atlantic_

2012-06-01 Thread Gwern Branwen
On Fri, Jun 1, 2012 at 6:19 AM, Carcharoth  wrote:
> This assumes that page views correspond to people reading the pages. I
> suspect that a lot of people viewing a page just scan briefly for what
> they are looking for (I typically use Ctl+F to find something if I am
> in a hurry), or realise they are in the wrong place and click away or
> click onwards through another link. There is no way of measuring the
> number of people that stop and carefully read a page as if they were
> sitting down to do some bedtime or leisure reading, as opposed to just
> looking up some factoid.

I'm sure the numbers are false, but numbers are always false. You make
points which are equally true of any article's statistics on
stats.grok.se (including the most popular ones), and this
overestimation is counterbalanced by the many forms of
*under*estimation going into the stats.grok.se numbers, like not
counting page views on any mirrors at all. Unless you have a reason to
think that the net error, inclusive of all these sources, leads to
overestimation, pointing out the possible error is a bit sophomoric.

-- 
gwern
http://www.gwern.net


Re: [WikiEN-l] Link removal experiment; Re: "How the Professor Who Fooled Wikipedia Got Caught by Reddit", _The Atlantic_

2012-05-31 Thread Gwern Branwen
On Thu, May 31, 2012 at 11:08 AM, Carl (CBM)  wrote:
> There is a redacted (no user info) table in the toolserver database
> that can be used to count the number of editors who watchlist a page.
> I fetched the counts for the 100 articles and found the median.

Ah. That's interesting to know and useful for context, thank you.

On Thu, May 31, 2012 at 11:59 AM, WereSpielChequers
 wrote:
> Firstly rather than measure vandalism it created vandalism, and vandalism
> that didn't look like typical vandalism. Aside from the ethical issue
> involved, this will have skewed the result.

As I've said multiple times, this was a designed feature, and not a
bug. The goal was not to measure the broadest possible kind of
vandalism's reversion rate, as that has been amply studied*, but a
specific kind. Complaining that this specific kind is 'skewed'
compared to 'all possible vandalism related to external links' is to
miss the point.

* Wikipedia generally does very well on *obvious* vandalism, and
especially since the introduction of anti-vandalism bots with machine
learning techniques. There's no need for anyone to spend time
measuring it except perhaps bot-writers to finetune their statistics.

> In particular the edit
> summaries were very atypical for vandalism, if I'd seen that edit summary
> on my watchlist I would probably have just sighed  and taken it as another
> example of deletionism in action.

I propose a version of http://en.wikipedia.org/wiki/Poe%27s_law - it
is impossible to create an example of deletionism mindless enough to
be detectable as such if it comes with jargon attached.

> Of the more than 13,000 pages on my
> watchlist I doubt there are 13 where I would look at such an edit, and
> that's if it was one of the changes on my watchlist that I was even aware
> of - it is far too big to fully check every day. Most IP vandals don't use
> jargon in edit summaries, and I know I'm not the only editor who is more
> suspicious of IP edits with blank edit summaries.
>
> You only ran the experiment for one month. I often revert older vandalism
> than that, I may be unusual there in that I've got some tools for finding
> vandalism that has got past the hugglers, but I'm not unusual in sometimes
> taking articles back to the "last clean version".

You are unusual. When I was spending time reading academic
publications on Wikipedia a few years ago, a number of them dealt with
quantifying vandalism and reversions; almost all vandalism was
reverted within days, and reversions which took longer than a month
were very rare (0-10%, IIRC, to be very generous). This was why I
chose to wait a month, because waiting longer added nothing. A week
would have been adequate.

There are a number of related papers, but for brevity's sake take
ftp://193.206.140.34/mirrors/epics-at-lnl/WikiDumps/localhost/group282-priedhorsky.pdf
which found an exponential distribution for ordinary vandalism:

> 42% of damage incidents are repaired essentially immediately (i.e., within 
> one estimated view). This result is roughly consistent with the work of 
> Viégas et al. [20], which showed that the median persistence of certain types 
> of damage was 2.8 minutes. However, 11% of incidents persist beyond 100 
> views, 0.75% – 15,756 incidents – beyond 1000 views, and 0.06% – 1,260 
> incidents – beyond 10,000 views.

On average, the articles concerned had fewer than 100 page views a day
going off stats.grok.se, so by just a few days, most of the edits
should have been reverted - if they were going to be, of course. This
sort of behavior is why you see such different averages and medians
when you go looking in papers; eg

- ["Measuring Wikipedia"](http://eprints.rclis.org/bitstream/10760/6207/1/MeasuringWikipedia2005.pdf), Voss 2005
- ["Studying Cooperation and Conflict between Authors with history flow Visualizations"](http://alumni.media.mit.edu/~fviegas/papers/history_flow.pdf), Viégas et al 2003
- ["Detecting Wikipedia vandalism via spatio-temporal analysis of revision metadata?"](http://repository.upenn.edu/cgi/viewcontent.cgi?article=1963&context=cis_reports), West 2010
- ["User Contribution and Trust in Wikipedia"](http://www.ics.uci.edu/~sjavanma/CollabCom), Javanmardi et al
- ["He says, she says: conflict and coordination in Wikipedia"](http://nguyendangbinh.org/Proceedings/CHI/2007/docs/p453.pdf), Kittur et al 2007
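
To connect the view-count thresholds in the Priedhorsky quote to calendar time, here is a rough sketch. It assumes constant traffic at the 100 views/day upper bound mentioned above, which is a simplification; the function name is made up for illustration:

```python
# Persistence thresholds (in page views) from the Priedhorsky et al. quote above.
view_thresholds = [100, 1_000, 10_000]

# Assumption: a flat 100 views/day, the upper bound mentioned for these articles.
views_per_day = 100


def days_to_reach(views: int, daily_views: float) -> float:
    """Days needed to accumulate `views` page views at a constant daily rate."""
    return views / daily_views


for threshold in view_thresholds:
    days = days_to_reach(threshold, views_per_day)
    print(f"{threshold:>6} views ~ {days:g} days at {views_per_day} views/day")
```

Articles with less traffic would take proportionally longer to hit each threshold, so the conversion is only indicative.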

On Thu, May 31, 2012 at 12:03 PM, Thomas Morton
 wrote:
> This, I think, is a major issue which make the results useless
>
> * The edit summary implies policy knowledge, I'd only check an edit like
> that on my watchlist on occasion.

And deletionists have no policy knowledge?

-- 
gwern
http://www.gwern.net


Re: [WikiEN-l] Link removal experiment; Re: "How the Professor Who Fooled Wikipedia Got Caught by Reddit", _The Atlantic_

2012-05-31 Thread Gwern Branwen
On Thu, May 31, 2012 at 8:31 AM, Carl (CBM)  wrote:
> Of course there are good external links, but they
> are a minority on the articles I follow. Examples include these
> removals:
>
> http://en.wikipedia.org/w/index.php?title=Scala_%28programming_language%29&diff=prev&oldid=489800521
> http://en.wikipedia.org/w/index.php?title=HUD_%28video_gaming%29&diff=prev&oldid=487559372

I actually find your examples amusing. The HUD link was one of the
ones I was seriously considering not restoring because it was a junk
link; while I was especially disappointed to see that the Scala
editors did not restore the link for what is not just their standard
IDE, but a major reason for use of their language, an exemplar of
their close alliance/fusion with Java, and a vital resource to link
especially given how impoverished the external links section was. (And
I've never written a line of Scala in my life!)

> Separately, the median number of watchlisters for the 100 pages you
> edited is 5.

Where is this figure coming from?

> And we have no way to get the names of the watchlisters
> to see whether they are active. So for many of the pages, it seems
> plausible nobody even noticed that the link was removed. That is a
> separate issue unrelated to links.

If the community "exists" but is inactive, that's as bad as it not
existing. Wikipedia is as Wikipedia does. Either way, the test is
revealing.

-- 
gwern
http://www.gwern.net


Re: [WikiEN-l] Link removal experiment; Re: "How the Professor Who Fooled Wikipedia Got Caught by Reddit", _The Atlantic_

2012-05-30 Thread Gwern Branwen
On Wed, May 30, 2012 at 2:39 PM, Carcharoth  wrote:
> You can put a date limiter on that URL so it won't become outdated. This one 
> should work indefinitely (unless some of the edits get deleted):
>
> http://en.wikipedia.org/w/index.php?title=Special:Contributions/Gwern&offset=201205301826&limit=100&target=Gwern

Neat. I didn't know we could do that.
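
For anyone wanting to pin a contributions listing the same way, the quoted URL can be rebuilt from its parts. A small sketch, assuming the `offset` value is simply a YYYYMMDDHHMM timestamp as in Carcharoth's link; `contributions_url` is a hypothetical helper, not part of MediaWiki:

```python
from datetime import datetime
from urllib.parse import urlencode


def contributions_url(user: str, before: datetime, limit: int = 100) -> str:
    """Build a Special:Contributions URL pinned at the `before` timestamp."""
    params = {
        "title": f"Special:Contributions/{user}",
        "offset": before.strftime("%Y%m%d%H%M"),
        "limit": limit,
        "target": user,
    }
    # Keep ':' and '/' unescaped so the output matches the quoted URL.
    return "http://en.wikipedia.org/w/index.php?" + urlencode(params, safe=":/")


url = contributions_url("Gwern", datetime(2012, 5, 30, 18, 26))
print(url)
```

With those arguments the helper reproduces the link quoted above character for character.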

On Wed, May 30, 2012 at 2:43 PM, Carcharoth  wrote:
> PS. You didn't have to spam links to your 'experiment' in the revert
> edit summaries, you know. Some good-faith editors may get upset by
> that.

I disagree. The edit summary box is far too short to include any real
explanation, so a link to the full explanation is best. The other
alternative is to include no explanation in any form, and I regard
that as unacceptable - people should know why an apparently
useless edit and revert were done.

> The edit summary was:
>
> "rv test of editors for this page; you failed. see
> http://www.gwern.net/In%20Defense%20Of%20Inclusionism#sins-of-omission-experiment-2"
>
> This is something else that could have benefitted from outside input.
> Some of the attitude you have towards all this rolls off the page,
> with phrases such as "perhaps editors collectively know that putting a
> link into a section named ‘External Links’ is painting a cross-hair on
> its forehead".

I should pretend I have no point of view and I am disinterested while
somehow not being uninterested? Academics may have to adopt such an
imposture, but I do not. As long as my 'snark' does not change the
results - as it does not - I do not care.

> My view is that if such experiments are to be carried
> out, it would be better if they were designed and conducted by those
> able to restrain themselves from such snark.

Better how?

-- 
gwern
http://www.gwern.net


[WikiEN-l] Link removal experiment; Re: "How the Professor Who Fooled Wikipedia Got Caught by Reddit", _The Atlantic_

2012-05-30 Thread Gwern Branwen
My experiment has concluded and all the link removals reverted*. The
full writeup is at
http://www.gwern.net/In%20Defense%20Of%20Inclusionism#sins-of-omission-experiment-2

Result: Of the 100 removals, just 3 were reverted.

3% is even lower than I expected, and very different from Horologium's
estimate, incidentally.

* for those who wish to check, feel free to cross-reference the list
of diffs http://www.gwern.net/In%20Defense%20Of%20Inclusionism#link-removals
against my recent edits:
http://en.wikipedia.org/w/index.php?title=Special:Contributions/Gwern&offset=&limit=100&target=Gwern

-- 
gwern
http://www.gwern.net


Re: [WikiEN-l] "How the Professor Who Fooled Wikipedia Got Caught by Reddit", _The Atlantic_

2012-05-22 Thread Gwern Branwen
On Mon, May 21, 2012 at 6:33 PM, Anthony  wrote:
> All of this is fine, by the way, depending on what your intention was
> to show.  If it was to show that a certain type of external link can
> be removed without likely being reverted, then your methodology is
> fine.  But then you shouldn't advertise your experiment as "the
> removal of 100 random external links", because that is not what you
> did.

OK, do you have a better summary in 7 words?

On Mon, May 21, 2012 at 8:02 PM, David Levy  wrote:
> And those mistakes could have been prevented via consultation with the
> Wikipedia editing community.

Anthony's complaint there is more about what he thinks is a
misleading summary.

I don't regard it as a mistake, and so no consultation would have been
useful: if I were to do it again, I would do it the same way - I don't
care about how well official links are defended, because they tend to
be the most useless external links around and also are the most
permitted by EL. Worrying about them is roughly akin to an
inclusionist worrying that [[George Washington]] or [[Julius Caesar]]
might not be as well-defended as possible. They are the entries that
will be the very last to go under any scenario of decline. The
endangered links are links to news articles, reviews, that sort of
thing, and my procedure examines them.

(No matter if those links were reverted at as much as 100%, since
fortunately they still only make up a fraction of external links, they
can under every scenario affect the final result only so much.)

As for the terminological dispute, if you take intent into account,
perhaps they are not vandalism; but the edits themselves in isolation
were designed to look like ordinary deletionist vandalism.

-- 
gwern
http://www.gwern.net


Re: [WikiEN-l] "How the Professor Who Fooled Wikipedia Got Caught by Reddit", _The Atlantic_

2012-05-21 Thread Gwern Branwen
On Mon, May 21, 2012 at 5:32 PM, Anthony  wrote:
> How could we do that?  You could have just cherrypicked the worst
> links that were last links which are not official or
> template-generated in External Link sections.  I'm not saying I think
> you did that.  But you certainly could have.

Cherrypicking even under this strategy would force me to do both >2x
as much work and engage in conscious deception. If I were consciously
trying to deceive, I would have adopted an entirely unverifiable
strategy like 'roll a die' or 'pick a random integer 0-length of
links' and then would have both cherry-picked without problem and with much
less overall effort (as I had to throw out something like a third to
half the pages with external links because they did not meet one of
the criteria).

> Anyway, the main thing I'd like to say about all of this is simply
> that your selection is not random.  Your sample is biased.  Biased in
> which direction, I don't know.  Biased intentionally, I doubt.  But
> your sample is biased.

Sheesh. Every sample is biased in many ways - but random samples are
biased in unpredictable ways, which is why randomizing was such a big
innovation when Fisher and his contemporaries introduced it. What's
next, PRNGs are unacceptable for any kind of study because you can
predict each output if you know the seed and run the PRNG
appropriately?
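
The PRNG point is easy to demonstrate. A minimal sketch using Python's standard `random` module; the seed, population size, and `sample_indices` helper are arbitrary choices for illustration and have nothing to do with how the links were actually chosen:

```python
import random


def sample_indices(seed: int, population: int, k: int) -> list[int]:
    """Draw k distinct indices from range(population) using a seeded PRNG."""
    rng = random.Random(seed)
    return rng.sample(range(population), k)


first = sample_indices(seed=2012, population=1000, k=5)
second = sample_indices(seed=2012, population=1000, k=5)
print(first == second)  # same seed, same draw
```

Being reproducible from the seed does not make the draw any less useful as a sample; it only makes it checkable.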

-- 
gwern


Re: [WikiEN-l] "How the Professor Who Fooled Wikipedia Got Caught by Reddit", _The Atlantic_

2012-05-21 Thread Gwern Branwen
On Mon, May 21, 2012 at 2:57 AM, David Gerard  wrote:
> What I'm feeling about this *feels* just like hindsight bias, but I
> vaguely recall saying something just like that.

It certainly sounds like it too. :) But if you ever refind where you
said that, you get some Gwern points.

On Mon, May 21, 2012 at 8:07 AM, Anthony  wrote:
> You haven't gone over your methodology.  I highly doubt you've
> selected the links randomly.  And you don't seem to have done any
> analysis of whether or not the links should be there or not.

On Mon, May 21, 2012 at 8:15 AM, Anthony  wrote:
> So, you are not removing random links at all.

>.< I should just link XKCD here, but I'll forbear. I am reminded of an
anecdote describing a court case involving the draft back in Vietnam, where
the plaintiff's lawyer argued that the little cage and balls method was not
random and was unfair because the balls on top were much more likely to be
selected. The judge asked, "Unfair to *whom*?" Indeed.

And I'd note that my methodology, while being quite as random as most
methods, carries the usual advantages of determinism: anyone will be
able to check whether I did in fact remove only last links which are
not official or template-generated in External Link sections, and that
I did not simply cherrypick the links that I thought were worst and so
least likely to be restored.
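
A rule like this is mechanical enough to write down as a function, which is what makes it checkable. A minimal sketch, assuming a page's External links section has already been parsed into a list; the `ExternalLink` type, its two flags, and `link_to_remove` are hypothetical names for illustration, and the two-link minimum is taken from the procedure writeup elsewhere in this thread:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExternalLink:
    url: str
    is_official: bool = False            # hypothetical flag: the subject's own site
    is_template_generated: bool = False  # hypothetical flag: emitted by a template


def link_to_remove(links: list[ExternalLink]) -> Optional[ExternalLink]:
    """Return the final external link if the page qualifies, else None."""
    if len(links) < 2:
        return None  # page needs at least 2 external links
    last = links[-1]
    if last.is_official or last.is_template_generated:
        return None  # page is skipped rather than picking another link
    return last


page = [
    ExternalLink("http://example.org/official", is_official=True),
    ExternalLink("http://example.org/review"),
]
chosen = link_to_remove(page)
print(chosen.url if chosen else "page skipped")
```

Anyone holding the list of diffs can rerun a check of this kind against the page histories, which is the advantage of determinism claimed above.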

-- 
gwern
http://www.gwern.net


Re: [WikiEN-l] "How the Professor Who Fooled Wikipedia Got Caught by Reddit", _The Atlantic_

2012-05-20 Thread Gwern Branwen
On Sun, May 20, 2012 at 7:47 PM, David Levy  wrote:
> There's no harm in discussing the methodology (but not the specific
> targets or IP addresses), thereby confirming its validity and ensuring
> that the effort isn't needlessly duplicated by multiple editors across
> countless articles.

Alright, fine, I will copy in my current writeup minus the list of
targets and the yet to be conducted analysis.

> Again, what if hundreds or thousands of users, whose methodologies are
> undiscussed and potentially flawed, were to take it upon themselves to
> conduct such "experiments" without consultation or approval?  That's
> the hypothetical scenario to which I referred.

It's unfortunate that I am such a prominent figure and powerful
thought-leader that hundreds and thousands of Wikipedians have even a
tiny chance of mimicking my actions; but that's a risk you just have
to take when you are as world-renowned as I am. I'm sure Kant would
understand.

---

...
The procedure: remove random links and record whether they are
restored to obtain a restoration rate.

- Editors might defer to other editors, so I will remove links as an
anonymous user from multiple proxies; the restoration rate will
naturally be an *under*estimate of what a registered editor would be
able to commit, much less a tendentious deletionist.
- To avoid issues with selecting links, I will remove only the final
external link on pages selected by
 which
have at least 2 external links in an 'External links' section, and
where the final external link is neither an 'official' link nor
template-generated. (This avoids issues where pages might have 5 or 10
'official' external links to various versions or localizations, all of
which an editor could confidently and blindly revert the removal of;
template-generated links also carry imprimaturs of authority.)
- The edit summary for each edit will be `remove external link per
[[WP:EL]]` - which has the nice property of being obviously
meaningless to anyone capable of critical thought (by definition a
link removal should be per one of WP:EL's criteria - but *which*
[criterion](!Wikipedia "Wikipedia:External links#Links normally to be
avoided")?) but also official-looking like many deletionist
edit-summaries.
- To avoid flooding issues and be less obvious, no more than 5 or 10
links a day will be removed with at least 1 minute between each edit.
- To avoid building up credibility, I will not make any real edits
with the anonymous IPs.

After the last of the 100 links has been removed, I will wait 1 month
(long enough for the edit to drop off all watchlists) and restore all
links. I predict [at least
half](http://predictionbook.com/predictions/6586) will not be restored
and certainly not [more than
90%](http://predictionbook.com/predictions/6585).
...
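
A minimal sketch of how the restoration rate could be tallied once the month is up, assuming a hypothetical log with one status word per removed link (`restored` if an editor re-added it, anything else otherwise). The function name and the sample data are illustrative only and are not results from the experiment:

```shell
# Restoration rate from a log with one status word per removed link:
# "restored" if an editor re-added the link, anything else otherwise.
restoration_rate() {
    awk '{ n++; if ($1 == "restored") r++ }
         END { if (n) printf "%d/%d restored (%.1f%%)\n", r, n, 100 * r / n }'
}

# Hypothetical five-link sample, not real data.
printf '%s\n' restored removed removed restored removed | restoration_rate
```

The complement of that rate is the failure-to-revert rate the predictions refer to.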

-- 
gwern



Re: [WikiEN-l] "How the Professor Who Fooled Wikipedia Got Caught by Reddit", _The Atlantic_

2012-05-20 Thread Gwern Branwen
On Sun, May 20, 2012 at 6:09 PM, David Levy  wrote:
> Yes, there is.  Your methodology has been challenged

I don't recall any challenges, just people expressing their contempt
for external links, which is not a methodological challenge.

Or did you mean the issue about editing logged in versus logged out as
an anon? Obviously I did all my editing as an anon: if even an
anonymous IP can get away with this kind of blatant vandalism just by
invoking the name WP:EL, then that's a lower bound on how much an
editor can get away with.

On Sun, May 20, 2012 at 6:22 PM, Anthony  wrote:
> Removing 100 random external links?  For a few weeks?  Then adding
> back the ones that deserve to be added back?

I think it's less questionable to just re-add all the links, no
questions asked about 'deserving'.

-- 
gwern
http://www.gwern.net



Re: [WikiEN-l] "How the Professor Who Fooled Wikipedia Got Caught by Reddit", _The Atlantic_

2012-05-20 Thread Gwern Branwen
On Sun, May 20, 2012 at 4:37 PM, David Levy  wrote:
> As Gwern (User:Gwern) continues to edit the English Wikipedia (today
> concluding a different "experiment") and appears to have stopped
> participating in this discussion (thereby ignoring questions about the
> acknowledged vandalism), I agree that the account and associated IP
> addresses should be blocked until such time as a promise to cease the
> disruption and evidence that the damage has been repaired are
> forthcoming.

There's nothing to answer; and I've been copying the most informative
or hilarious quotes for posterity, such as an active administrator in
good standing wondering if it might actually increase article quality
and not constitute vandalism at all!

The whole thing was worth it just for that quote; I could not have
made up a better example of the sickness.

As for today's experiment, I'm surprised anyone cares. After all, all
that was involved was one single link to a webpage written by a
non-expert. I should be getting a barnstar for removing it, judging by
everyone's reactions. (The result, incidentally, was that
click-through fell from 9 a day to 1 a day, which was 17% and not the
5% I had predicted.)

-- 
gwern
http://www.gwern.net



Re: [WikiEN-l] "How the Professor Who Fooled Wikipedia Got Caught by Reddit", _The Atlantic_

2012-05-16 Thread Gwern Branwen
On Wed, May 16, 2012 at 10:49 PM, Anthony  wrote:
> First shouldn't we guess as to what percentage of the links were
> actually good in the first place?

I must say, I didn't expect to see someone rationalizing the results
even *before* they happened.

But no, you don't need to guess: you edit Wikipedia, you already know
what external links usually look like, and how many are bad on
average. (From actually doing the deletions, my own appraisal is that
<10% were at all questionable, and I felt pretty bad deleting most of
them.)

If you don't, you can go click on Special:Random 10 times and ask
yourself, 'would I delete the last link in the External links
section?' If you think 2 links are rotten, then perhaps you should be
predicting that - since everything is well, and any result is
acceptable, and the status quo is perfect - only 80% of the edits will
be reverted.

I look forward to your percentages.

-- 
gwern



Re: [WikiEN-l] "How the Professor Who Fooled Wikipedia Got Caught by Reddit", _The Atlantic_

2012-05-16 Thread Gwern Branwen
On Wed, May 16, 2012 at 8:47 PM, Ian Woollard  wrote:
> The number of
> editors is fairly static, although there were about 25% more people
> volunteering in 2006 when there were lots of new things to write about.

Staticness is a serious problem: the world is not staying still. We
can't keep up with a growing world with an editor base that is static
in absolute terms. Productivity improvements like
anti-obvious-vandalism bots offer limited gains which can keep our
heads over the rising water, temporarily, but they don't change the
bigger picture.

As I demonstrated earlier with my external link experiment, editors
are not keeping up with even the clearest, best intentioned, highest
quality suggestions. How can you hope that this means that more
sophisticated and difficult tasks like anti-troll, vandalism, hoax,
etc. are still being performed to past standards?

Incidentally, I have been finishing an experiment involving the
removal of 100 random external links by an IP; I haven't analyzed it
yet, so I don't know the outcome, but this gives us an opportunity!

Would anyone in this thread (especially the ones convinced Wikipedia's
editing community is in fine shape) care to predict what percentage or
percentage range they expect will have been reverted?

Or what percentage/percentage range they would regard as an acceptable
failure-to-revert rate?

-- 
gwern



Re: [WikiEN-l] "How the Professor Who Fooled Wikipedia Got Caught by Reddit", _The Atlantic_

2012-05-16 Thread Gwern Branwen
On Wed, May 16, 2012 at 2:34 PM, Charles Matthews
 wrote:
> And why haven't they taken those who generalise broadly from a single
> example with them?

Are you denying the general decline in editors, even as Internet usage
continues to increase?

-- 
gwern



[WikiEN-l] "How the Professor Who Fooled Wikipedia Got Caught by Reddit", _The Atlantic_

2012-05-16 Thread Gwern Branwen
http://www.theatlantic.com/technology/archive/2012/05/how-the-professor-who-fooled-wikipedia-got-caught-by-reddit/257134/
Print: 
http://www.theatlantic.com/technology/print/2012/05/how-the-professor-who-fooled-wikipedia-got-caught-by-reddit/257134/

> A woman opens an old steamer trunk and discovers tantalizing clues that a 
> long-dead relative may actually have been a serial killer, stalking the 
> streets of New York in the closing years of the nineteenth century. A beer 
> enthusiast is presented by his neighbor with the original recipe for Brown's 
> Ale, salvaged decades before from the wreckage of the old brewery--the very 
> building where the Star-Spangled Banner was sewn in 1813. A student buys a 
> sandwich called the Last American Pirate and unearths the long-forgotten tale 
> of Edward Owens, who terrorized the Chesapeake Bay in the 1870s.
>
> These stories have two things in common. They are all tailor-made for viral 
> success on the internet. And they are all lies.
>
> Each tale was carefully fabricated by undergraduates at George Mason 
> University who were enrolled in T. Mills Kelly's course, Lying About the 
> Past. Their escapades not only went unpunished, they were actually encouraged 
> by their professor. Four years ago, students created a Wikipedia page 
> detailing the exploits of Edward Owens, successfully fooling Wikipedia's 
> community of editors. This year, though, one group of students made the 
> mistake of launching their hoax on Reddit. What they learned in the process 
> provides a valuable lesson for anyone who turns to the Internet for 
> information.
>
> The first time Kelly taught the course, in 2008, his students confected the 
> life of Edward Owens, mixing together actual lives and events with brazen 
> fabrications. They created YouTube videos, interviewed experts, scanned and 
> transcribed primary documents, and built a Wikipedia page to honor Owens' 
> memory. The romantic tale of a pirate plying his trade in the Chesapeake 
> struck a chord, and quickly landed on USA Today's pop culture blog. When 
> Kelly announced the hoax at the end of the semester, some were amused, 
> applauding his pedagogical innovations. Many others were livid.
>
> Critics decried the creation of a fake Wikipedia page as digital vandalism. 
> "Things like that really, really, really annoy me," fumed founder Jimmy 
> Wales, comparing it to dumping trash in the streets to test the willingness 
> of a community to keep it clean. But the indignation may, in part, have been 
> compounded by the weaknesses the project exposed. Wikipedia operates on a 
> presumption of good will. Determined contributors, from public relations 
> firms to activists to pranksters, often exploit that, inserting information 
> they would like displayed. The sprawling scale of Wikipedia, with nearly four 
> million English-language entries, ensures that even if overall quality 
> remains high, many such efforts will prove successful.

> One group took its inspiration from the fact that the original Star-Spangled 
> Banner had been sewn on the floor of Brown's Brewery in Baltimore. The group 
> decided that a story that good deserved a beer of its own. They crafted a 
> tale of discovering the old recipe used by Brown's to make its brews, 
> registered BeerOf1812.com, built a Wikipedia page for the brewery, and 
> tweeted out the tale on their Twitter feed. No one suspected a thing. In 
> fact, hardly anyone even noticed. They did manage to fool one well-meaning DJ 
> in Washington, DC, but the hoax was otherwise a dud.  The second group 
> settled on the story of serial killer Joe Scafe. Using newspaper databases, 
> they identified four actual women murdered in New York City from 1895 to 
> 1897, victims of broadly similar crimes. They created Wikipedia articles for 
> the victims, carefully following the rules of the site. They concocted an 
> elaborate story of discovery, and fabricated images of the trunk's contents.
>
> ...it took just twenty-six minutes for a redditor to call foul, noting the 
> Wikipedia entries' recent vintage. Others were quick to pile on, 
> deconstructing the entire tale. The faded newspaper pages looked artificially 
> aged. The Wikipedia articles had been posted and edited by a small group of 
> new users. Finding documents in an old steamer trunk sounded too convenient. 
> And why had Lisa been savvy enough to ask Reddit, but not enough to Google 
> the names and find the Wikipedia entries on her own? The hoax took months to 
> plan but just minutes to fail.
>
> Why...One answer lies in the structure of the Internet's various communities. 
> Wikipedia has a weak community, but centralizes the exchange of information. 
> It has a small number of extremely active editors, but participation is 
> declining, and most users feel little ownership of the content. And although 
> everyone views the same information, edits take place on a separate page, and 
> discussions of reliability on 

Re: [WikiEN-l] How our competitors are doing

2012-04-19 Thread Gwern Branwen
On Thu, Apr 19, 2012 at 6:21 PM, David Gerard  wrote:
> Mr Schlafly approves:
>
> http://www.conservapedia.com/index.php?title=User_talk:CPalmer&curid=72836&diff=976121&oldid=975547

Poe's law lives!

-- 
gwern



Re: [WikiEN-l] English Wikipedia blackout

2012-01-17 Thread Gwern Branwen
On Tue, Jan 17, 2012 at 12:09 PM, David Gerard  wrote:
> Citizendium will *clean up* tomorrow.

I rather doubt they will. Who has heard of them? And they'd better
hope they don't, because any tiny fraction of En's traffic will knock
them offline, further poisoning what little reputation they have left.

-- 
gwern
http://www.gwern.net



Re: [WikiEN-l] Talk pages Considered Harmful (for references)

2012-01-13 Thread Gwern Branwen
An update: I managed to fix the double-counting problem I mentioned
was skewing the numbers upwards, and fixed a few other issues. (In
retrospect, the solution was almost trivial: just discard any URL that
appears *twice* in the diff, since none of the edits would repeat an
added link.)
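
A minimal sketch of that filter, assuming the URLs have already been pulled out of a diff one per line. The function name `unique_urls` and the sample URLs are hypothetical; this is not the script behind the figures in this thread:

```shell
# Keep only URLs that occur exactly once in the input (one URL per line).
# `uniq -u` prints lines that are not repeated; it needs sorted input.
unique_urls() {
    sort | uniq -u
}

# Hypothetical sample: example.org/a shows up twice and is discarded.
printf '%s\n' \
    'http://example.org/a' \
    'http://example.org/b' \
    'http://example.org/a' \
    'http://example.org/c' | unique_urls
```

Sorting loses the original order of the links, which does not matter when the goal is only to count them.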

The updated numbers are:

- My anime references: <8%
- My non-anime references: <3%
- Krebmarkt's references: <4%
- Total references used: <4.15% of 1206

As one would expect from fixes removing false positives, all the new
figures are smaller. I invite people to go through and double-check -
everything you need is provided.

On Thu, Dec 22, 2011 at 1:10 PM, Ken Arromdee  wrote:
> The rest of that, about deletionism, may be at least as interesting.

Or it's a rant, depends on your own inclinations, I think. (I do well
on things like belief calibration and avoiding political bias on tests,
but who knows whether my beliefs on Wikipedia are correct.) Sue
Gardner liked it, at least.

> I wonder how the ban on canvassing is affecting deletion.  Our system is set
> up so that informing the very people who would be affected most by deleting
> an article is not permitted.  (And of course, we have WP:OWN, which prevents
> even *recognizing* that some people may have a particular interest in an
> article not being deleted.)

It helps deletion, unsurprisingly; see the study quoted & linked in
http://www.gwern.net/In%20Defense%20Of%20Inclusionism#fn22

On Thu, Dec 22, 2011 at 1:48 PM, Rob  wrote:
> This makes a lot of sense.  Many times I've removed these from the
> article for valid reasons - text/link dumps, mal- or unformed
> sections, etc. - and placed them on talk so editors could use them for
> future edits.

They don't use them, as I've shown.

On Thu, Dec 22, 2011 at 9:46 PM,   wrote:
> This rate, without additional context, is meaningless.  As Rob pointed
> out, there are many different reasons for moving
> references/links/citations from an article to a talk page, and unless you
> have more information about why people are moving these to talk pages, the
> rate at which they move back doesn't really mean anything. By labeling
> this rate a 'failure rate' you are strongly implying that success would be
> keeping the link in the article.  I don't believe this is right - I
> believe that 'success' is doing what's best for the article.
>
> Even if 99% of things that were moved to talk pages were not subsequently
> returned, I would not find this at all disturbing without evidence that a
> large portion of the removed things should not have been removed.
> Frankly, I would be surprised if 10% of things that I personally moved to
> talk pages were moved back in to the article space.

You and Rob have apparently completely missed the point of the
exercise, the reason why I invested so much manual effort into this.

I didn't look at a bunch of anonymous edits, precisely because I
*knew* someone would say 'oh they're from dirty anonymouses and so
they are probably crappy links - why be bothered by a 10% or a 1%
rate?' This is wrong, but it has a surface plausibility and there's no
point in compiling data that can be so glibly dismissed.

So I looked *only* at known good links, links I and Krebmarkt had
hand-selected as useful. Again, feel free to go through the links and
look at them! My first 2 anime links were RSs for a director's next
movie, and box office receipts; Krebmarkt's first 2 links were RS
critics' reviews for manga that both have (note the present tense) 0
reviews in their articles. And so on.

There is a known rate at which these links ought to be included. It's
>90%. (I am being charitable in not saying 99% or 100%.) The actual
inclusion rate is <10%. The difference should bother us.

-- 
gwern
http://www.gwern.net/In%20Defense%20Of%20Inclusionism#the-editing-community-is-dead-who-killed-it



Re: [WikiEN-l] Administrator power

2012-01-11 Thread Gwern Branwen
On Wed, Jan 11, 2012 at 9:13 AM, Nathan  wrote:
> Methodology and analysis leaves a lot to be desired and doesn't really
> support either their conclusion or your bolder restatement of it.

Something which could be said of any research paper ever.

-- 
gwern
http://www.gwern.net



[WikiEN-l] Administrator power

2012-01-10 Thread Gwern Branwen
https://www.technologyreview.com/blog/arxiv/27437/ discussing
http://arxiv.org/abs/1112.3670 "Echoes of power: Language effects and
power differences in social interaction", abstract:

> Understanding social interaction within groups is key to analyzing online 
> communities. Most current work focuses on structural properties: who talks to 
> whom, and how such interactions form larger network structures. The 
> interactions themselves, however, generally take place in the form of natural 
> language --- either spoken or written --- and one could reasonably suppose 
> that signals manifested in language might also provide information about 
> roles, status, and other aspects of the group's dynamics. To date, however, 
> finding such domain-independent language-based signals has been a challenge.
>
> Here, we show that in group discussions power differentials between
> participants are subtly revealed by how much one individual immediately 
> echoes the linguistic style of the person they are responding to. Starting 
> from this observation, we propose an analysis framework based on linguistic 
> coordination that can be used to shed light on power relationships and that 
> works consistently across multiple types of power --- including a more 
> "static" form of power based on status differences, and a more "situational" 
> form of power in which one individual experiences a type of dependence on 
> another. Using this framework, we study how conversational behavior can 
> reveal power relationships in two very different settings: discussions among 
> Wikipedians and arguments before the U.S. Supreme Court.

From the paper proper:

> Status change. Wikipedians can be promoted to administrator status through a 
> public election, and almost always after extensive prior involvement in the 
> community. Since we track the communications of editors over time, we can 
> examine how linguistic coordination behavior changes when a Wikipedian 
> becomes an “admin”. To our knowledge, our study is the first to analyze the 
> effects of status change on specific forms of language use.

> Users are promoted to admins through a transparent election process known as
> requests for adminship, or RfAs, where the community decides who will
> become admins. Since RfAs are well documented and timestamped, not only do we
> have the current status of editors, we can also extract the exact time when
> editors underwent role changes from non-admins to admins.
>
> Textual exchanges. Editors on Wikipedia interact on talk pages to discuss
> changes to article or project pages. We gathered 240,436 conversational
> exchanges carried out on the talk pages, where the participants of these
> (asynchronous) discussions were associated with rich status and social
> interaction information: status, timestamp of status change if there is one, 
> as well as activity level on talk pages, which can serve as a proxy of their 
> sociability, or how socially inclined they are. In addition, there is a 
> discussion phase during RfAs, where users “give their opinions, ask 
> questions, and make comments” over an open nomination. Candidates can reply 
> to existing posts during this time. We also extracted conversations that 
> occurred in RfA discussions, and obtained a total of 32,000 conversational 
> exchanges. Most of our experiments were carried out on the larger dataset 
> extracted from talk pages, unless otherwise noted. (The dataset will be 
> distributed publicly.)

> We measure the linguistic style of a person by their usage of function words 
> that have little lexical meaning, thereby marking style rather than content. 
> For consistency with prior work, we employed the nine LIWC-derived categories 
> [36] deemed to be processed by humans in a generally non-conscious fashion 
> [25]. The nine categories are: articles, auxiliary verbs, conjunctions, 
> high-frequency adverbs, impersonal pronouns, negations, personal pronouns, 
> prepositions, and quantifiers (451 lexemes total).

Results, starting page 5:

> ...communication behavior on Wikipedia provides evidence for hypothesis
> Ptarget: users coordinate more toward the (higher-powered) admins than
> toward the non-admins (Figure 1(a)).
> In the other direction, however, when comparing admins and non-admins as
> speakers, the data provides evidence that is initially at odds with Pspeaker:
> as illustrated in Figure 1(b), admins coordinate to other people more than
> non-admins do (while the hypothesis predicted that they would coordinate
> less). We now explore some of the subtleties underlying this result,
> showing how it arises as a superposition of two effects.

> One possible explanation for the inconsistency of our observations with
> Pspeaker is the effect of personal characteristics suggested in Hypothesis B 
> from Section 2. Specifically, admin status was not conferred arbitrarily on a 
> set of users; rather, admins are those people who sou

[WikiEN-l] Talk pages Considered Harmful (for references)

2011-12-21 Thread Gwern Branwen
I have just completed and written up a little research project of mine:
http://www.gwern.net/In%20Defense%20Of%20Inclusionism#the-editing-community-is-dead-who-killed-it

Summary:

1. Talk pages are where references/links/citations go to die; less
than 10% ever make it back
2. In just the sampled edits, millions of page-views are affected
3. Conclusion: putting references/links/citations in an Article's Talk
page is a bad idea (compared to External Links)

Numbers, source code, and lists of edits are provided in the link.

-- 
gwern
http://www.gwern.net



[WikiEN-l] Knol is dead (2007-2012); Re: 2 years & 9 months later, Re: 6 months later: Knol update

2011-11-22 Thread Gwern Branwen
On Thu, Oct 13, 2011 at 9:56 PM, Gwern Branwen  wrote:
> Yes, it was pretty abrupt. See Jason Scott on this issue and how it
> wasn't even announced but buried in some obscure Yahoo documentation
> entry.

Google to its credit didn't bury the death notice in help, but they
didn't exactly highlight it:
http://googleblog.blogspot.com/2011/11/more-spring-cleaning-out-of-season.html

> Knol—We launched Knol in 2007 to help improve web content by enabling experts 
> to collaborate on in-depth articles. In order to continue this work, we’ve 
> been working with Solvitor and Crowd Favorite to create Annotum, an 
> open-source scholarly authoring and publishing platform based on WordPress. 
> Knol will work as usual until April 30, 2012, and you can download your knols 
> to a file and/or migrate them to WordPress.com. From May 1 through October 1, 
> 2012, knols will no longer be viewable, but can be downloaded and exported. 
> After that time, Knol content will no longer be accessible.

That's surprisingly harsh - when I looked through past shut-downs (
http://www.gwern.net/Wikipedia%20and%20Knol#knol-death-watch ), Google
seemed to usually preserve *public* material as static files. But in
this case, they seem to be saying the Knol content will be completely
purged off their servers.

(Which is a good lesson that Jason Scott would also appreciate,
anyway, about trusting the cloud with your content. Not that trusting
your content to Wikipedia is much better, from the long-term point of
view.)

-- 
gwern
http://www.gwern.net



Re: [WikiEN-l] 2 years & 9 months later, Re: 6 months later: Knol update

2011-10-13 Thread Gwern Branwen
On Thu, Oct 13, 2011 at 9:34 PM, Carcharoth  wrote:
> And the mention of data preservation made me wonder as well. I would
> hate to see things like Google Groups eventually vanish or become less
> useful. You mentioned Flickr briefly as well, saying it is (still)
> neglected by Yahoo. And you said the Geocities shutdown was
> "shockingly abrupt".

Yes, it was pretty abrupt. See Jason Scott on this issue and how it
wasn't even announced but buried in some obscure Yahoo documentation
entry.

> One of the most depressing things about the internet is dead links
> (despite some archival services).
>
> And then I got to thinking how long www.gwern.net will survive for? :-)

Depends; I hope to maintain it indefinitely and it's pretty cheap. I
am interested in archiving and have my own strategies implemented (see
http://www.gwern.net/Archiving%20URLs ), since I gave up trying to
work with WMF & IA on archiving* and looked into what I could do as an
individual.

* I was very pleased to see that one of this year's Summer of Code
projects was an archive plugin, but I will believe it when I see it
running on the English Wikipedia.

They already work to some extent; for example, you can see most of
gwern.net is already in the Internet Archive (
http://wayback.archive.org/web/*/http://www.gwern.net/* ) and many
pages are in WebCite (eg. http://webcitation.org/627QAJ3KL ). I
provide the source repository, so if anyone ever cloned it, it is
available that way too. (HTML dumps of gwern.net periodically go into
my personal backup harddrive.)

> [Moving from the corporate level of the web to the personal level]
>
> Do you think the collections of blogs hosted by the WMF will survive
> longer than various other blogs and blogging sites?
>
> Or even the archives of the WMF mailing lists.

Yes. WMF is corporatizing, and that inherently means stuff will
survive longer than personal fly-by-night blogs. If nothing else,
those mailing lists and blogs are relatively small - Google Groups and
Usenet are *huge*. If the Archive Team had to rescue those, I doubt
they would be able to retrieve any but a small fraction even with a
long lead time.

-- 
gwern
http://www.gwern.net



[WikiEN-l] 2 years & 9 months later, Re: 6 months later: Knol update

2011-10-13 Thread Gwern Branwen
On Wed, Jan 21, 2009 at 10:51 AM, Gwern Branwen  wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA512
>
> On Tue, Jan 20, 2009 at 10:56 AM, Anthony  wrote:
>> On Tue, Jan 20, 2009 at 10:30 AM, Gwern Branwen  wrote:
>>>
>>> As will surprise none of the Knol nay-sayers here (in which number I
>>> believe I can count myself), Knol hasn't done too great.
>>
>> Compared to what?  I can't imagine Knol is much worse than Wikipedia when it
>> was 6 months old.  Knol just published its 100,000th article.  When
>> Wikipedia was 5 months old, it said on the main page "We've got over 6,000
>> pages already. We want to make over 100,000."  The Wayback machine then
>> skips ahead 5 more months, by which point Wikipedia brags "We started in
>> January 2001 and already have over 13,000 articles. We want to make over
>> 100,000, so let's get to work"
>>
>> To be sure, Knol has a lot of very serious problems with it.  But it's only
>> 6 months old.  The concept is far from finalized.  6 months into Jimmy
>> Wales' encyclopedia dream he was still working on Nupedia.

A thread on Haskell-cafe asked whether Knol was a good place for
documentation, which reminded me of Knol's continued existence. Like I
did previously, I went looking and the performance of Knol in the
years since has been quite bad:
http://www.gwern.net/Wikipedia%20and%20Knol#knol-did-fail

At this point, I'm comfortable asserting Knol is a failure, and have
moved on to trying to guess when it will die, exactly:
http://www.gwern.net/Wikipedia%20and%20Knol#knol-death-watch

-- 
gwern
http://www.gwern.net



[WikiEN-l] Readers clicking through to talk pages

2011-10-11 Thread Gwern Branwen
Pondering the utility of talk page edits recently, I've begun to
wonder: how many of our readers actually look at the talk page as
well? I know some writers writing articles on Wikipedia have mentioned
or rhapsodized at length on the interest of the talk pages for
articles, but they are rare birds and statistically irrelevant.

It might be enough simply to know how much traffic to talk pages there
is, period. I doubt editors make up much of Wikipedia's traffic, with
the shriveling of the editing population, which never kept pace with
the growth into a top 10/20 website, so that would give a good upper
bound.

It would seem to be very small: there's not a single Talk page in the
top 1000 on http://stats.grok.se/en/top and, comparing a few articles
with their talk pages, Talk:Anime has 273 hits over an entire month
(http://stats.grok.se/en/201109/Talk%3AAnime) while the article has
128,657 hits (a factor of 471); or Talk:Barack Obama with 1800 over
the month (http://stats.grok.se/en/201109/Talk%3ABarack_Obama)
compared to Barack Obama, 504,827 hits
(http://stats.grok.se/en/201109/Barack%20Obama) for a factor of 280.

The raw stats in http://dammit.lt/wikistats are currently unavailable;
I've bugged domas to get it back up but it's still been down for
hours, so I went to
http://dumps.wikimedia.org/other/pagecounts-raw/2011/2011-09/ instead
- each file seems to cover one hour of a day, so I downloaded one
day's worth and gunzipped them all, which is enough info to get a good
idea of the right ratio.

We do some quick shell scripting:

grep -e '^en Talk:' -e '^en talk:' pagecounts-* | cut -d ' ' -f 3 |
paste -sd +|bc
~>
582771

grep -e '^en ' pagecounts-* | grep -v -e '^en Talk:' -e '^en talk:' |
cut -d ' ' -f 3 | paste -sd + | bc
~>
202680742

Looks somewhat sane - 582,771 for all talk page hits versus
202,680,742 for all non-talk page hits. A factor of 347 is pretty much
around where I was expecting based on those 2 pages. And Domas says
the statistics exclude API hits but include logged-in editor hits, so
we can safely say that anonymous users made far *fewer* than 583k talk
page views that day and hence the true ratios are worse than
471/280/347.
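
As a cross-check, the two totals can also be computed in a single pass.
This is only a sketch: it assumes each line of the pagecounts files has
the form "project title count bytes" (which is what the `cut -f 3`
above relies on), and the function name `count_views` is made up for
illustration. Reading the files directly with awk also sidesteps a
subtlety of the second pipeline: when grep is given several files it
prefixes each output line with the filename, so a later `grep -v`
anchored at `^en Talk:` will not match unless `-h` is used.

```shell
# Sum English-Wikipedia hits for talk and non-talk pages in one pass.
# Input lines are assumed to look like: "en Talk:Anime 3 12345"
count_views() {
  LC_ALL=C awk '
    $1 == "en" {
      if ($2 ~ /^[Tt]alk:/) { talk += $3 } else { other += $3 }
    }
    END {
      # Guard against division by zero when no talk pages were seen.
      if (talk > 0)
        printf "talk=%d other=%d ratio=%.0f\n", talk, other, other / talk
    }
  ' "$@"
}

# Usage (reads the named files, or stdin if none are given):
#   count_views pagecounts-*
```

Here the non-talk total excludes talk pages by construction, so it can
come out slightly lower than a total that still includes them.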

- If we take the absolutely most favorable ratio, Obama's at 280, and
then further assume it was looked at by 0 logged-in users (yeah
right), then that implies something posted on its talk page will be
seen by <0.36% of interested readers ((1800/504827*1.0)*100).
- If we use the aggregate statistic and say, generously, that
registered users make up only 90% of the talk page views, then
something on the talk page will be seen by <0.029% of interested
readers ((582771/202680742*0.1)*100).
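
The two percentages above are talk page hits divided by article hits,
scaled by the assumed anonymous share of talk page views. A minimal
sketch of that arithmetic, assuming awk is available; the function
name `talk_share` is made up for illustration:

```shell
# Share of readers who also reach the talk page, as a percentage.
# $1 = talk page hits, $2 = article hits,
# $3 = assumed fraction of talk hits made by anonymous readers
talk_share() {
  LC_ALL=C awk -v talk="$1" -v article="$2" -v anon="$3" \
    'BEGIN { printf "%.3f\n", talk / article * anon * 100 }'
}

talk_share 1800 504827 1.0        # Barack Obama, no logged-in views: 0.357
talk_share 582771 202680742 0.1   # aggregate, 10% anonymous views: 0.029
```

Any other guess at the anonymous share can be plugged in as the third
argument; the result scales linearly with it.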

I suggest that the common practice of 'moving reference/link to the
Talk page' be named what it really is: a subtle form of deletion.

It would be a service to our readers to end this practice entirely: if
a link is good enough to be hidden on a talk page (supposedly in the
interests of incorporating it in the future*), then it is good enough
to put at the end of External Links or a Further Reading section, and
our countless thousands of readers will not be deprived of the chance
to make use of it.

* one of my little projects is compiling edits where I or another have
added a valuable source to an article Talk page, complete with the
most relevant excerpts from that source, and seeing whether anyone
bothered making any use of that source/link in any fashion. I have not
finished, but to summarize what I have seen so far: that justification
for deletion is a dirty lie. Hardly any sources are ever restored.

-- 
gwern
http://www.gwern.net/In%20Defense%20Of%20Inclusionism



Re: [WikiEN-l] Fwd: [Wikitech-l] Fwd: Autoconfirmed article creation trial

2011-09-13 Thread Gwern Branwen
On Tue, Sep 13, 2011 at 12:24 PM, David Gerard wrote:
> It may seem a big goal, but perhaps en:wp can emulate the success of
> en:wn. Will we achieve the best-practice level of seven layers of
> review? We can but hope.

And in turn, I look forward to the study of the effects of this
change, which will never happen despite all the earlier promises.

-- 
gwern
http://www.gwern.net



[WikiEN-l] Spoiler wars revisited

2011-08-12 Thread Gwern Branwen
http://www.wired.com/wiredscience/2011/08/spoilers-dont-spoil-anything/

> "I’ve always assumed that this reading style is a perverse personal habit, a 
> symptom of a flawed literary intelligence. It turns out, though, that I was 
> just ahead of the curve, because spoilers don’t spoil anything. In fact, a 
> new study [upcoming in _Psychological Science_] suggests that spoilers can 
> actually *increase* our enjoyment of literature. Although we’ve long assumed 
> that the suspense makes the story — we keep on reading because we don’t know 
> what happens next — this new research suggests that the tension actually 
> detracts from our enjoyment.
>
> The experiment itself was simple: Nicholas Christenfeld and Jonathan Leavitt 
> of UC San Diego gave several dozen undergraduates 12 different short stories. 
> The stories came in three different flavors: ironic twist stories (such as 
> Chekhov’s “The Bet”), straight up mysteries (“A Chess Problem” by Agatha 
> Christie) and so-called “literary stories” by writers like Updike and Carver. 
> Some subjects read the story as is, without a spoiler. Some read the story 
> with a spoiler carefully embedded in the actual text, as if Chekhov himself 
> had given away the end. And some read the story with a spoiler disclaimer in 
> the preface.
>
> ...The first thing you probably noticed is that people don’t like literary 
> stories. (And that’s a shame, because Updike’s “Plumbing” is a masterpiece of 
> prose: “All around us, we are outlasted….”) But you might also have noticed 
> that *almost every single story*, regardless of genre, was more pleasurable 
> when prefaced with a spoiler. This suggests that I read fiction the right 
> way, beginning with the end and working backwards. I like the story more 
> because the suspense is contained."

-- 
gwern
http://www.gwern.net



Re: [WikiEN-l] WP:RSs

2011-08-11 Thread Gwern Branwen
On Thu, Aug 11, 2011 at 6:03 PM, Andreas Kolbe wrote:
> There was an article in the New York Times a few days ago, on a related theme:
>
> http://www.nytimes.com/2011/08/08/business/media/a-push-to-redefine-knowledge-at-wikipedia.html?_r=2
>
> One of its arguments was that there are whole cultures that lack published 
> "reliable sources".

I found that article very funny, personally. So apparently it's noble
and worthwhile for the Foundation to go out into South Africa or India
and spend the donations listening to people talk about random things
like how to make a drink (not to produce articles, even, but just a
documentary).

But things the white nerds who wrote Wikipedia care about, like comic
books or MUDs or text games or anime which are underserved by RSs?
Well, if they don't have RSs, they can go screw themselves. (If you
care so much about fancruft, go work on a Wikia! We're busy trying to
figure out how to deal with editor retention.)

-- 
gwern
http://www.gwern.net



[WikiEN-l] WP:RSs

2011-08-09 Thread Gwern Branwen
"Brain Diving: The Ghost with the Most" by Brian Ruh, _ANN_
http://www.animenewsnetwork.com/brain-diving/2011-08-09

> "This time, though, instead of a fictional book about the supernatural I'm 
> going to be examining a nonfiction book about Japanese ghosts – Patrick 
> Drazen's  A Gathering of Spirits: Japan's Ghost Story Tradition: From 
> Folklore and Kabuki to Anime and Manga, which was recently self-published 
> through the iUniverse service. This is Drazen's second book; the first one, 
> Anime Explosion! The What? Why? & Wow! of Japanese Animation, came out in 
> 2002 from Stone Bridge Press and was an introduction to many of the genres 
> and themes that can be found in anime.
> I think the switch from a commercial press to self-publication may indicate 
> the direction English-language anime and manga scholarship may be heading in. 
> A few years ago, when Japanese popular culture seemed like the Next Big 
> Thing, there were more publishers that seemed like they were willing to take 
> a chance on books about anime and manga.
>
> Unfortunately, as I know firsthand (and as I've heard from other authors, 
> confirming that it's not just me) these books didn't sell nearly as well as 
> anyone was hoping, which in turn meant that these publishers didn't want to 
> take risks with additional books along these lines. After all, all publishers 
> need to make money in one way or another to stay afloat. In the last few 
> years, the majority of books on anime and manga have been published by 
> university presses, perhaps most notably the University of Minnesota Press.
>
> ...However, this puts books like Drazen's in an odd predicament. It's not 
> really an academic book, since it lacks the references and theories something 
> like that would entail, which means it's not a good candidate for a 
> university press. However, since few popular presses have seen their books on 
> anime and manga reflect positively on their bottom lines, there aren't many 
> other options these days other than self-publishing. Of course, these days 
> publishing a book on your own doesn't have nearly the same connotations it 
> did decades ago, when vanity presses were the domain of those with more money 
> (and ego) than sense. These days you can self-publish a quality product, get 
> it up on Amazon for all to see, and (if you're savvy about these things) 
> perhaps even make a tidy profit."

-- 
gwern
http://www.gwern.net



Re: [WikiEN-l] Millions for salaries, not one cent for defense

2011-07-21 Thread Gwern Branwen
On Thu, Jul 21, 2011 at 8:41 PM, Mark wrote:
> But I haven't seen any evidence of their leadership
> having that kind of vision.

Fun facts: according to their 2009 IRS filing*, their income was $53
million. $23 million went to JSTOR employee salaries/compensation. The
president makes >$500,000 and the executive vice president >$320,000;
I'd list the various other managers making >$200k, but there's like 10
of them.

The expenses section is quite fascinating.

- IT in general costs them no more than $4.3m
- They spent $1m on 'travel' and another $312k on 'Conferences,
conventions, and meetings'
- "journal acquisition & scan" costs $4.8m
- "NITLE TRANSFER"** costs $4.4m
- "fees & publisher payments"*** cost $8.3m

* 
http://www.guidestar.org/FinDocuments/2009/133/857/2009-133857105-06a32823-9.pdf
linked from 
http://www2.guidestar.org/organizations/13-3857105/ithaka-harbors.aspx#
** 'title transfer'? Have no idea what this is. Hopefully such a
colossal sum is buying something worth buying, like copyright to
entire journals and it's just a misleading label.
*** Am I reading this Form 990 right? Are they *really* spending
roughly 3 times more on their employees than goes to the publishers,
or than they spend on *all* their technical initiatives, scanning and
servers and all? I am reminded of the WMF budget.
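
A quick check of those ratios, using only the figures listed above (in
millions of dollars). Treating "IT in general" plus "journal
acquisition & scan" as the technical spending is an assumption about
how to group the line items, and the helper name `ratio` is made up
for illustration:

```shell
# Ratio of two dollar amounts (in millions), to one decimal place.
ratio() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

ratio 23 8.3   # salaries vs. fees & publisher payments: 2.8
ratio 23 9.1   # salaries vs. IT (4.3) + acquisition & scan (4.8): 2.5
```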

-- 
gwern
http://www.gwern.net



[WikiEN-l] Millions for salaries, not one cent for defense

2011-07-21 Thread Gwern Branwen
> 'This archive contains 18,592 scientific publications totaling 33GiB, all 
> from Philosophical Transactions of the Royal Society and which should be  
> available to everyone at no cost, but most have previously only been made 
> available at high prices through paywall gatekeepers like JSTOR. Limited 
> access to the  documents here is typically sold for $19 USD per article, 
> though some of the older ones are available as cheaply as $8. Purchasing 
> access to this collection one article at a time would cost hundreds of 
> thousands of dollars.
>
> ...When I received these documents I had grand plans of uploading them to 
> Wikipedia's sister site for reference works, Wikisource - where they could be 
> tightly interlinked with Wikipedia, providing interesting historical context 
> to the encyclopedia articles. For example, Uranus was discovered in 1781 by 
> William Herschel; why not take a look at the paper where he originally 
> disclosed his discovery? (Or one of the several follow on publications about 
> its satellites, or the dozens of other papers he authored?)
>
> But I soon found the reality of the situation to be less than appealing: 
> publishing the documents freely was likely to bring frivolous litigation from 
> the publishers. As in many other cases, I could expect them to claim that 
> their slavish reproduction - scanning the documents - created a new copyright 
> interest. Or that distributing the documents complete with the trivial 
> watermarks they added constituted unlawful copying of that mark. They might 
> even pursue strawman criminal charges claiming that whoever obtained the 
> files must have violated some kind of anti-hacking laws.
>
> In my discreet inquiry, I was unable to find anyone willing to cover the 
> potentially unbounded legal costs I risked, even though the only unlawful 
> action here is the fraudulent misuse of copyright by JSTOR and the Royal 
> Society to withhold access from the public to that which is legally and 
> morally everyone's property.'

--User:Gmaxwell,
http://thepiratebay.org/torrent/6554331/Papers_from_Philosophical_Transactions_of_the_Royal_Society__fro

> 'We're projecting today that 2010-11 revenue will have increased 49% from 
> 2009-10 actuals, to $23.8 million. Spending is projected to have increased 
> 103% from 2009-10 actuals, to $18.5 million. This means we added $5.3 million 
> to the reserve, for a projected end-of-year total of $19.5 million which 
> represents 8.3 months of reserves at the 2011-12 spending level.
>
> ...We started the year with an ambitious plan to grow the Wikimedia 
> Foundation staff 82% from 50 to 91 and a decision to, if necessary, sacrifice 
> speed for quality (“hiring well rather than hiring quickly”). We expect to 
> end the year with staff of 78, representing an increase over 2009-10 of 56%.'

http://upload.wikimedia.org/wikipedia/foundation/3/37/2011-12_Wikimedia_Foundation_Plan_FINAL_FOR_WEBSITE_.pdf

-- 
gwern
http://www.gwern.net



Re: [WikiEN-l] Article Feedback - Ramp up to 10% of Articles

2011-07-14 Thread Gwern Branwen
On Thu, Jul 14, 2011 at 2:31 PM, David Gerard wrote:
> Making all the rating data publicly
> available for analysis (with no usernames or IPs attached, of course)
> is a first step. Before proposing solutions to problems in the data,
> look at the data ;-)

A sound recommendation from the psychology literature on problem
solving. To quote Eliezer Yudkowsky (
http://lesswrong.com/lw/ka/hold_off_on_proposing_solutions/ ) quoting
Robyn Dawes (_Rational Choice in an Uncertain World_) expanding Norman
R. F. Maier:

> "...when a group faces a problem, the natural tendency of its members is to 
> propose possible solutions as they begin to discuss the problem.  
> Consequently, the group interaction focuses on the merits and problems of the 
> proposed solutions, people become emotionally attached to the ones they have 
> suggested, and superior solutions are not suggested.  Maier enacted an edict 
> to enhance group problem solving: "Do not propose solutions until the problem 
> has been discussed as thoroughly as possible without suggesting any."  It is 
> easy to show that this edict works in contexts where there are objectively 
> defined good solutions to problems."

-- 
gwern
http://www.gwern.net



[WikiEN-l] Community network effects

2011-07-06 Thread Gwern Branwen
"Emergence of good conduct, scaling and Zipf laws in human behavioral
sequences in an online world" http://arxiv.org/abs/1107.0392

> "...In their virtual life players use eight basic actions which allow them to 
> interact with each other. These actions are communication, trade, 
> establishing or breaking friendships and enmities, attack, and punishment. We 
> measure the probabilities for these actions conditional on previous taken and 
> received actions and find a dramatic increase of negative behavior 
> immediately after receiving negative actions. Similarly, positive behavior is 
> intensified by receiving positive actions. We observe a tendency towards 
> anti-persistence in communication sequences. Classifying actions as positive 
> (good) and negative (bad) allows us to define binary 'world lines' of lives 
> of individuals. Positive and negative actions are persistent and occur in 
> clusters, indicated by large scaling exponents alpha~0.87 of the mean square 
> displacement of the world lines. For all eight action types we find strong 
> signs for high levels of repetitiveness, *especially for negative 
> actions*..." [emphasis added]

popularization: "Virtual World Study Reveals the Origin of Good and
Bad Behavior Patterns"
http://www.technologyreview.com/blog/arxiv/26967/

> "...Thurner and co found that positive behaviour intensifies after an 
> individual receives a positive action.
>
> However, they also found a far more dramatic increase in negative behaviour 
> immediately after an individual receives a negative action. "The probability 
> of acting out negative actions is about 10 times higher if a person received 
> a negative action at the previous timestep than if she received a positive 
> action," they say.
>
> Negative action is also more likely to be repeated than merely reciprocated, 
> which is why it spreads more effectively.
>
> So negative actions seem to be more infectious than positive ones.
>
> However, players with a high fraction of negative actions tend to have 
> shorter lives. Thurner and co speculate that there may be two reasons for 
> this: "First because they are hunted down by others and give up playing, 
> second because they are unable to maintain a social life and quit the game 
> because of loneliness or frustration."
>
> So the bottom line is that the society tends towards positive behaviour."

Well, maybe in the game they studied, _Pardus_. I couldn't say about
Wikipedia...

-- 
gwern
http://www.gwern.net



[WikiEN-l] Waiting for CV

2011-06-20 Thread Gwern Branwen
A recent _Wired_ article put me in mind of Wikipedia's copyright
paranoia: http://www.wired.com/threatlevel/2011/06/fair-use-defense/

> The lawsuit decided Monday targeted Wayne Hoehn, a Vietnam veteran who posted 
> all 19 paragraphs of November editorial from the Las Vegas Review-Journal, 
> which is owned by Stephens Media. Hoehn posted the article, and its headline, 
> “Public Employee Pensions: We Can’t Afford Them” on medjacksports.com to 
> prompt discussion about the financial affairs of the nation’s states. Hoehn 
> was a user of the site, not an employee.
>
> Righthaven sought up to $150,000, the maximum in damages allowed under the 
> Copyright Act. Righthaven argued that the November posting reduced the number 
> of eyeballs that would have visited the Review-Journal site to read the 
> editorial.
>
> “Righthaven did not present any evidence that the market for the work was 
> harmed by Hoehn’s noncommercial use for the 40 days it appeared on the 
> website. Accordingly, there is no genuine issue of material fact that Hoehn’s 
> use of the work was fair and summary judgment is appropriate,” Judge Pro 
> ruled.
>
> ...Judge Pro, in his fair-use analysis, also found that the posting was for 
> noncommercial purposes, and was part of an “online discussion.”

-- 
gwern
http://www.gwern.net



Re: [WikiEN-l] The expert problem, dissolved

2011-06-06 Thread Gwern Branwen
On Fri, Jun 3, 2011 at 8:50 AM, Charles Matthews wrote:
> The argument that class
> assignments will prove the soft underbelly of academia depends on some
> things we can know about (assessment methods, for example - pretty much
> ruling it out here in the UK), and some we don't (whether more intimate
> contact with WP mechanisms will enthuse academic experts or put them off).
>
> Obviously catering for evaluation makes sense. But I suspect the key
> issue is going to turn out to be this: do 20 hours working on a WP
> assignment teach a student more than 20 hours working on something more
> conventional? If WP work turns out to be educational, then academics
> ought to support it. I think there are reasons to be positive about this
> point.

And as usual, El Reg's coverage manages to be negative anyway. They
are truly a lesson to all of us. From
http://www.theregister.co.uk/2011/06/01/wikipedia_makes_students_do_better_work/
:

> This is achieved, however, not by the kids finding stuff out on the 
> notoriously unreliable site, but rather by getting them to write material for 
> it. Fear of criticism by the obsessive Wiki-fiddler community apparently 
> motivates youngsters far more than the worry that their academic supervisors 
> might catch them out in an error

> Normally in cases of Wikipedia's effects on academia - or indeed other fields 
> of endeavour such as journalism, politics etc - the story is one of lazy 
> students, hacks, speechwriters etc clipping stuff from the site without 
> checking it or even disguising it before claiming it as their own work. Today 
> we hear of a new way to exploit the unpaid Wikipedian: lazy college 
> professors can use the crowdsourced encyclo-custodians to mark their 
> students' work, again without any guarantee that they will do so properly or 
> accurately.

-- 
gwern
http://www.gwern.net



Re: [WikiEN-l] The general population & AfD

2011-06-01 Thread Gwern Branwen
On Fri, May 27, 2011 at 9:19 PM, Samuel Klein wrote:
> This is a nicely competent paper.  Thanks for the heads up!  SJ

Re-reading, I'm not sure I understand what the results mean. To
continue the above quote:

> Participants recruited by keep !voters were about four times less likely to 
> support deletion as those recruited by delete !voters. The participants that 
> bots recruited also appear unlikely to support deletion, which reflects the 
> policy bias we observed earlier.
>
> To see what effect participant recruitment has on decision quality, we 
> introduce four binary variables: BotRecruit, NomRecruit, DeleteRecruit, and 
> KeepRecruit. These variables indicate whether a bot, the AfD nominator, a 
> delete !voter, or a keep !voter successfully recruited somebody to the group, 
> respectively.
>
> Looking back to table 1, we find that regardless of the decision, none of the 
> first three variables has a statistically significant effect. On the other 
> hand, when a keep !voter recruited someone to the discussion, we see a 
> significant effect: delete decisions are more likely to be reversed.

So:

1. people recruited by a keep !voter (KeepRecruit) also tend to !vote keep
2. people recruited by a delete !voter (DeleteRecruit) tend neither
way, !voting both delete and keep
2.5. likewise for people recruited (NomRecruit) by the nominator
(almost always a delete !voter, obviously)
3. people recruited by a bot (BotRecruit), like 2 & 2.5, have no
'statistically significant effect'

This is a little troubling for anyone who wants to argue that deleting
fewer articles is the will of the people - the BotRecruits should then
have been more likely to be !keepers.

> We offer two possible explanations: the first is that recruitment by keep 
> !voters, biased as it may appear, is a sign of positive community interest, 
> and suggests that the article should be kept. If the community decides 
> otherwise and deletes the article, then decision quality suffers. An 
> alternative explanation is that keep !voter recruitment is a sign of activism 
> among those who prefer to keep the article. These proponents may be 
> especially persistent in maintaining the article’s existence in Wikipedia, 
> even if it requires working to reverse a delete decision."

Obviously I prefer the first interpretation. With that one, the story
becomes: an article in an obscure niche is put up for deletion by a
boorish deletionist; in come the specialists, who are not ignorant of
the topic and literature, and save it. If I saw an anime article that
should not be deleted up for deletion, I wouldn't ask random
Wikipedians to help, I'd go to what pass for anime experts on
Wikipedia like Timothy Perper, who can look through the academic
literature and have better access to media both English and Japanese.
Looks like bias, smells like homophily, but really just the system
working.

-- 
gwern
http://www.gwern.net



[WikiEN-l] The general population & AfD

2011-05-20 Thread Gwern Branwen
http://www-users.cs.umn.edu/~lam/papers/lam_group2010_wikipedia-group-decisions.pdf
:

> "We also found that there have been two bots (computer programs that edit 
> Wikipedia)—BJBot and Jayden54Bot—that automatically notified 
> article editors about AfD discussions and recruited them to participate per 
> the established policy. These bots performed AfD notifications for several 
> months, and offer us an opportunity to study the effect of recruitment that 
> is purely policy driven. We use a process like one described above to detect 
> successful instances of bot-initiated recruitment: if a recruitment bot 
> edited a user’s talk page, and that user !voted in an AfD within two days, 
> then we consider that user to have been recruited by the bot.
> Using the above processes, we identified 8,464 instances of successful 
> recruiting. Table 2 shows a summary of who did the recruiting, and how their 
> recruits !voted. We see large differences in !voting behavior, which suggests 
> that there is bias in who people choose to recruit. (From these data we 
> cannot tell whether the bias is an intentional effort to influence consensus, 
> or the result of social network homophily [14].) Participants recruited by 
> keep !voters were about four times less likely to support deletion as those 
> recruited by delete !voters. The participants that bots recruited also appear 
> unlikely to support deletion, which reflects the policy bias we observed 
> earlier."

-- 
gwern
http://www.gwern.net



Re: [WikiEN-l] Nationality on the lead of articles

2011-03-31 Thread Gwern Branwen
On Thu, Mar 31, 2011 at 3:44 PM, Fred Bauder wrote:
> "WikiProject Rational Skepticism High-importance)" Really?

Astrology is one of the oldest and, amazingly enough, still most
popular foes of skepticism. If they don't consider it
'High-importance' then what *is*?

-- 
gwern
http://www.gwern.net



Re: [WikiEN-l] Rating the English wikipedia

2011-02-14 Thread Gwern Branwen
On Mon, Feb 14, 2011 at 3:17 PM, David Gerard wrote:
> On 14 February 2011 20:04, Fences&Windows wrote:
>> From: Ian Woollard 
>
>>>2. Coverage and accuracy: criterion not met (currently 3.5 million
>>>of an estimated 4.4 million articles)
>
>> You think there are only 4.4 million possible topics? Based on what criteria?
>
>
> I recall someone (Ray Saintonge?) working out there'd be at least 20
> million, just going on placenames and politicians that are currently
> in all the large WPs. Anyone got a link on hand to that?

Perhaps 
http://en.wikipedia.org/wiki/User:Piotrus/Wikipedia_interwiki_and_specialized_knowledge_test

-- 
gwern
http://www.gwern.net



Re: [WikiEN-l] Wikipedia and libraries

2011-02-09 Thread Gwern Branwen
On Tue, Feb 8, 2011 at 9:35 AM, Ed Summers wrote:
> Linkypedia kind of turns Wikipedia inside out, and lets content
> publishers see what articles reference their content. So for example
> the British Museum can see what Wikipedia articles reference their
> site [3]. And folks who are interested in keeping current with how
> Wikipedia uses their content can subscribe to a feed that lists them
> as they are added [4].
>
> I'd like to scale this project significantly by allowing any domain to
> be looked at, and include links from all language wikipedias [5]. But
> this will require a small (but not insignificant) investment in a
> server with a couple gigabytes of RAM. I was thinking of contacting
> the toolserver people to see if I could potentially work in that
> environment.

Perhaps I am missing something, but aren't there existing SEO tools
for seeing 'where are my domains being linked from'?

I occasionally go into Google's Webmaster Tools
(https://www.google.com/webmasters/tools/home?hl=en) and see where
gwern.net pages are being linked from. (I was surprised to learn that
my [[dual n-back]] FAQ (http://www.gwern.net/N-back%20FAQ.html) had
been linked on the German Wikipedia.) And surely Google is not the
only purveyor of such tools.

I also wonder how much such a server would cost, even if you *had* to
roll your own service. It sounds like it'd be trivial to provide a
browsable web front-end, so I assume the gigabytes you speak of are
needed for analyzing the database dump. But dumps occur so rarely you
don't need a 24/7 server crunching the numbers. For a server with 7.5
GB of RAM, Amazon charges only $0.34/hr
(http://aws.amazon.com/ec2/pricing/), so even a very long
number-crunching session would only cost a few dollars.
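
To put rough numbers on that: at the quoted $0.34/hr, cost scales
linearly with hours. The session lengths below are made-up examples,
not measurements of how long processing a dump actually takes, and
`ec2_cost` is a hypothetical helper name:

```shell
# Cost in dollars of renting one instance for a given number of hours.
# $1 = hours, $2 = hourly rate in dollars (defaults to the quoted 0.34)
ec2_cost() {
  LC_ALL=C awk -v hours="$1" -v rate="${2:-0.34}" \
    'BEGIN { printf "%.2f\n", hours * rate }'
}

ec2_cost 6     # a quarter of a day: 2.04
ec2_cost 24    # a full day: 8.16
```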

-- 
gwern
http://www.gwern.net



Re: [WikiEN-l] Wired on the Spanish mutiny

2011-01-21 Thread Gwern Branwen
On Fri, Jan 21, 2011 at 2:21 PM, Nathan wrote:
> http://www.wired.co.uk/news/archive/2011-01/20/wikipedia-spanish-fork?page=1
>
> Interesting. This article could be titled "Spanish Fork: In which
> Edgar Enyedy made Wikipedia what it is today." Who knew his unilateral
> decision to take things out of context and proportion was the crucial
> determining factor in the future of the English Wikipedia and the
> Wikimedia Foundation?

So I guess climbing the Reichstag *does* work sometimes!

-- 
gwern
http://www.gwern.net



Re: [WikiEN-l] Drake Bennett: Wikipedia ten years on

2011-01-10 Thread Gwern Branwen
On Mon, Jan 10, 2011 at 9:46 AM, Tony Sidaway wrote:
> There are some obvious errors which will be obvious to us all and may
> mislead readers less familiar with the inner politics. For instance
> the response to the Siegenthaler affair includes Jimmy Wales' decision
> to stop anonymous creation of new articles (correct but out of date as
> it was a temporary measure) but doesn't seem to cover the much more
> important policy shift.

What was temporary about it? I haven't recently tried to create an
article as an anonymous user, but judging from the continued existence of
WP:AFC, Wales's decision still stands (despite broken faith as to a
study of the effects).

And policy? The writer focused on the right thing - code is law.

-- 
gwern
http://www.gwern.net



Re: [WikiEN-l] Wikipedia Comes of Age

2011-01-08 Thread Gwern Branwen
On Sat, Jan 8, 2011 at 3:56 PM, Fred Bauder  wrote:
> http://chronicle.com/article/article-content/125899/
>
> January 7, 2011
> Wikipedia Comes of Age
> By Casper Grathwohl
>
> Casper Grathwohl is vice president and publisher of digital and reference
> content for Oxford University Press.

I was particularly struck by

> How is that happening? Take the case of a project undertaken by the academic 
> music community. In 2006 a large group of musicologists began discussing, on 
> an academic listserv, their students' use of Wikipedia. One scholar issued a 
> challenge: Wikipedia is where students are starting research, whether we like 
> it or not, so we need to improve its music entries. That call to arms 
> resonated, and music scholars worked hard to improve the quality of Wikipedia 
> entries and make sure that bibliographies and citations pointed to the most 
> reliable resources. As a result, Oxford University Press experienced a 
> tenfold increase in Wikipedia-referred traffic on its music-research site 
> Grove Music Online. Research that began on Wikipedia led to (the more 
> advanced and peer-validated) Grove Music, for researchers who were going on 
> to do in-depth scholarly work. The rise in Grove traffic alerted me to the 
> music Wikipedia project, but I assume that other such projects that have 
> passed me by yielded similar positive results.

They are far from the first group to notice such traffic advantages.
It's kind of sad that group after group keeps rediscovering this - you
would think that the SEOs wouldn't be the only ones to appreciate the
value of Wikipedia linking them.

-- 
gwern
http://www.gwern.net



Re: [WikiEN-l] Good Faith Collaboration: The Culture of Wikipedia by Reagle (MIT, 2010)

2010-12-21 Thread Gwern Branwen
On Mon, Dec 20, 2010 at 11:19 PM, Tony Sidaway  wrote:
> Joseph Reagle's book on Wikipedia culture reviewed by Cory Doctorow
>
> http://www.boingboing.net/2010/12/20/good-faith-collabora.html
>
> Could be useful if you still haven't worked out what to get the
> internet nerd in your life for Christmas.

I note with interest this conclusion:

> "Ultimately, Reagle offers a compelling case that Wikipedia's most 
> fascinating and unprecedented aspect isn't the encyclopedia itself -- rather, 
> it's the collaborative culture that underpins it: brawling, self-reflexive, 
> funny, serious, and full-tilt committed to the project, even if it means 
> setting aside personal differences. Reagle's position as a scholar and a 
> member of the community makes him uniquely situated to describe this culture."

The implicit commentary on changes in en over the last few years is
too obvious to spell out.

-- 
gwern
http://www.gwern.net



Re: [WikiEN-l] Evaporative cooling in online communities

2010-10-11 Thread Gwern Branwen
On Mon, Oct 11, 2010 at 2:27 PM, Ryan Delaney  wrote:
> Now here's the interesting point:
>
> "High value participants are treated as special because they have
> recognition & reputation from the community. But, as the community
> scales, these social mechanisms break down and often, if nothing is
> done to replace them, high value members get especially miffed at the
> loss of special recognition and this accelerates the Evaporative
> Cooling."
>
> We have the reverse problem on Wikipedia, where visibility and
> reputation allows some editors to get away with behavior that we
> otherwise wouldn't tolerate. John Locke called this kind of reputation
> 'prerogative' -- it's now become a technical term in political
> science, but it basically means that when we notice someone making
> decisions that everyone else goes along with, we start to 'go with the
> flow' and accept that person's authority in future cases as well. It's
> a kind of momentum building of social power, and since it's the only
> real power anyone has on Wikipedia, it is very significant - and
> vulnerable to abuse. Where a contributor known to make lots of
> valuable contributions in other areas suddenly demonstrates insanity
> on a specific topic, people will tend to give way where they wouldn't
> if it were coming from someone they didn't know or view as a 'valued
> contributor'. The result is the 'evaporative cooling' of those who
> don't have that social power on Wikipedia, or less of it, but whose
> edits are no less valuable - if only less voluminous.

Arguably we have the reverse of your reverse problem.

What is the ultimate status-lowering action which one can do to an
editor, short of actually banning or blocking them? Deleting their
articles.

In a particular subject area, who is most likely to work on obscurer
articles? The experts and high-value editors - they have the
resources, they have the interest, they have the competency. Anyone
who grew up in America post-1980 can work on [[Darth Vader]]; many
fewer can work on [[Grand Admiral Thrawn]]. Anyone can work on
[[Basho]]; few can work on [[Fujiwara no Teika]].

What has Wikipedia been most likely to delete in its deletionist shift
over the years? Those obscurer articles.

The proof is in the pudding: all the high-value/status Star Wars
editors have decamped for somewhere they are valued; all the
high-value/status Star Trek editors, the Lost editors... the list goes
on. They left for a community that respected them and their work more;
these specific examples are striking because the editors had to *make*
a community, but one should not suppose such departures are limited to
fiction-related articles.

-- 
gwern
http://www.gwern.net/



[WikiEN-l] Liberating classical music, or the street performer protocol

2010-09-12 Thread Gwern Branwen
http://entertainment.slashdot.org/story/10/09/12/1350202/Orchestra-To-Turn-Copyright-Free-Classical-Scores-Into-Copyright-Free-Music
http://www.kickstarter.com/projects/Musopen/record-and-release-free-music-without-copyrights

http://en.wikipedia.org/wiki/Threshold_pledge_system

Commons would be a natural home for the recordings, as they mention.

-- 
gwern



Re: [WikiEN-l] OED goes print-only

2010-08-30 Thread Gwern Branwen
On Mon, Aug 30, 2010 at 3:01 PM, Fred Bauder  wrote:
> The problem remains that an individual subscription of $295 a year
> stinks, to say nothing of $995.00 for a printed copy. Basically, only
> institutions or major publishers would find a subscription worthwhile and
> those are higher yet.

Or $400 for the photoreduced version
(http://www.amazon.com/Dictionary-Complete-Reproduced-Micrographically-slipcase/dp/0198612583/),
although I seem to recall that when I bought mine it was more like
$200.

-- 
gwern



Re: [WikiEN-l] Webypedia - another doomed alternative to Wikipedia

2010-08-28 Thread Gwern Branwen
On Sat, Aug 28, 2010 at 12:07 PM, Ian Woollard  wrote:
> On 27/08/2010, David Gerard  wrote:
>> Wikipedia needs competitors.
>
> Realistically, the space that Wikipedia occupies seems to be a more or
> less a natural monopoly.
>
> And Wikipedia doesn't even make money per se, so why would anyone even
> want to be a competitor to it? There's no market. A market is where
> people pay for stuff.

Wikipedia doesn't make money by choice. But remember there are many
ways we *could* make money. For example, if we had switched to a
CC-NC license, there are all the licensing fees we could have charged. (And
the Foundation makes money even with CC-SA - although I don't remember
how much it charges Ask.com and others for the live feed of
revisions.) And the most obvious way to monetize Wikipedia is
advertising, and that has been estimated at millions a month (a quick
Google turns up http://www.watchmojo.com/web/blog/?p=626 estimating
~50 million USD a month - in 2006).

> It's not like Wikipedia is abusing its monopoly power. Is it?

Depends on how you interpret the existence of Wikias like Memory Alpha
or Wookieepedia. There is a case to be made that they exist only
because we have abused our powers to excise their content from
Wikipedia, forcing them to resort to their own sites (a very
suboptimal situation).

-- 
gwern



[WikiEN-l] 'Wikipedia editing courses launched by Zionist groups'

2010-08-20 Thread Gwern Branwen
http://www.guardian.co.uk/world/2010/aug/18/wikipedia-editing-zionist-groups

> Yesha Council, representing the Jewish settler movement, and the rightwing 
> Israel Sheli (My Israel) movement, ran their first workshop this week in 
> Jerusalem, teaching participants how to rewrite and revise some of the most 
> hotly disputed pages of the online reference site.
> "We don't want to change Wikipedia or turn it into a propaganda arm," says 
> Naftali Bennett, director of the Yesha Council. "We just want to show the 
> other side. People think that Israelis are mean, evil people who only want to 
> hurt Arabs all day."

> And on Wikipedia, they believe that there is much work to do.
> Take the page on Israel, for a start: "The map of Israel is portrayed without 
> the Golan heights or Judea and Samaria," said Bennett, referring to the 
> annexed Syrian territory and the West Bank area occupied by Israel in 1967.
> Another point of contention is the reference to Jerusalem as the capital of 
> Israel – a status that is constantly altered on Wikipedia.

> In 2008, members of the hawkish pro-Israel watchdog Camera who secretly 
> planned to edit Wikipedia were banned from the site by administrators.
> Meanwhile, Yesha is building an information taskforce to engage with new 
> media, by posting to sites such as Facebook and YouTube, and claims to have 
> 12,000 active members, with up to 100 more signing up each month. "It turns 
> out there is quite a thirst for this activity," says Bennett. "The Israeli 
> public is frustrated with the way it is portrayed abroad."
> The organisers of the Wikipedia courses are already planning a competition 
> to find the "Best Zionist editor", with a prize of a hot-air balloon trip 
> over Israel.

-- 
gwern



[WikiEN-l] Prizes and the British Museum and Wikipedia

2010-07-26 Thread Gwern Branwen
 "Thing is, it's fairly difficult to bring an article to
"featured" status on the English Wikipedia.  It takes a lot of time
and review to push the article candidate through the larger and
frequently melodramatic English review community.  However, on some
other less-traveled Wikipedia language versions, getting content to
the featured level is relatively easy.  So, while the museum probably
expected that its five $140 prizes would be going to articles written
in English, two of the actual winning articles were authored in
Catalan, another in Spanish, another in Latin, and only one in
English.  To give you an idea of comparative traffic statistics, the
English Wikipedia garners over 7 million page views per hour (or,
almost every person over the age of 5 in metropolitan Chicago could
each view one page).  The Latin Wikipedia captures the attention of
fewer than two thousand page views hourly (or, the population of the
town of Helper, Utah).  The winning Latin featured article about the
British Museum's Rosetta Stone artifact received only about 14 page
views a day over the past ten days.  (Compare the traffic on the
English Wikipedia's article about the Rosetta Stone: 24,300 page views
per day.)  One of the winning articles in the Catalan language gets
only 12 page views daily.

 Imagine paying $140 to a copywriter for content that will get 12
or 14 page views per day.  It may be the British Museum's worst
pay-per-impression deal on the Internet ever."

'British Museum pays for Wikipedia page views'
http://www.examiner.com/x-58002-Wiki-Edits-Examiner~y2010m7d26-British-Museum-pays-for-Wikipedia-page-views
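
To put the pay-per-impression framing in numbers, here is a minimal
sketch. The $140 prize and the daily page-view figures come from the
quoted article; the one-year horizon and the assumption of constant
traffic are assumptions made purely for illustration.

```python
# Prize per winning article, as reported in the quoted article.
PRIZE_USD = 140


def cost_per_view(prize_usd, views_per_day, days=365):
    """Return the prize cost per page view over `days` days.

    Assumes traffic stays constant at `views_per_day` (an illustrative
    simplification, not something the article claims).
    """
    return prize_usd / (views_per_day * days)


# Daily page views quoted in the article. The English figure is the
# Rosetta Stone article, included only as a point of comparison.
daily_views = {
    "Latin winner": 14,
    "Catalan winner": 12,
    "English Rosetta Stone": 24_300,
}

for name, views in daily_views.items():
    print(f"{name}: ${cost_per_view(PRIZE_USD, views):.5f} per view")
```

Under those assumptions the low-traffic winners cost a few cents per
view, roughly three orders of magnitude more than the same prize would
cost per view on the English Rosetta Stone article.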

-- 
gwern



Re: [WikiEN-l] "Wikipedia’s Foundation Plans Expansion"

2010-07-14 Thread Gwern Branwen
On Wed, Jul 14, 2010 at 4:55 AM, Liam Wyatt  wrote:
> But I also think that we all agree
> that there's definitely a long way to go before en-wp could be considered
> "full". IMO we're only just scratching the surface of what we can eventually
> achieve :-)
>
> -Liam

My usual link on this topic is:
http://en.wikipedia.org/wiki/User:Piotrus/Wikipedia_interwiki_and_specialized_knowledge_test

It's worth noting that the "we've got a long way to go" result still
holds even in mid-2009; given the general impression that WP isn't
growing as fast, this seems like a result that will hold for a very
long time indeed.

-- 
gwern



[WikiEN-l] "Wikipedia’s Foundation Plans Expansion"

2010-07-12 Thread Gwern Branwen
http://bits.blogs.nytimes.com/2010/07/09/wikipedia-foundation-plans-expansion/
> The foundation that runs the Wikipedia Web site plans to add 44 employees in 
> the next year — roughly doubling the size of its current professional staff — 
> and to raise $20 million to support a much-enhanced vision for the 
> volunteer-created encyclopedia that nearly anyone can edit.

> The announcement of the expansion was made by the Wikimedia Foundation’s 
> executive director, Sue Gardner, at the start of the sixth annual Wikimania 
> conference, held this year in Poland. The conference brings together editors, 
> administrators and the professional staff to discuss trends and ideas for 
> Wikipedia and other collaborative Internet projects.

> By hiring more employees and raising more money, the foundation hopes to 
> nearly double the number of unique visitors to the site by 2015, to 680 
> million a month, Ms. Gardner told an audience of a few hundred who had 
> assembled in the Polish Baltic Philharmonic hall, on an island across from 
> Gdansk’s historic old city. The foundation plans to focus on expanding 
> generally in Africa, Central and Latin America, and Asia, and specifically 
> setting up offices in Brazil and India, she said.

Well, I suppose as long as the technical side doesn't suffer
starvation and they actually can raise that much...

-- 
gwern



Re: [WikiEN-l] Parallel Articles on topics

2010-06-27 Thread Gwern Branwen
On Sun, Jun 27, 2010 at 7:59 AM, Fred Bauder  wrote:
> Yes, articles from diverse points of view would be good.
>
> Fred Bauder

An open question, I think; the failure of your own Wikinfo* would seem
to suggest it's not particularly valuable.

* http://en.wikipedia.org/wiki/Wikipedia%3AWikinfo

--
gwern



Re: [WikiEN-l] declining numbers of active EN wiki admins

2010-05-28 Thread Gwern Branwen
On Fri, May 28, 2010 at 11:48 AM, Alan Liefting  wrote:
> Tightening up on new page creation would free up a lot of time for
> admins as well as other editors.  A lot of rubbish articles get created
> that need to be speedied.
>
>
> Alan Liefting

{{fact}}.

Jimbo himself admits that banning all anons from page creation didn't
do much of anything to help.

-- 
gwern



Re: [WikiEN-l] declining numbers of EN wiki admins

2010-05-27 Thread Gwern Branwen
On Wed, May 26, 2010 at 7:34 PM, David Goodman  wrote:
> Are you saying that a _declining_ number of administrators means a
> _growth_ in bureaucracy?  It would normally mean the opposite, either
> a loss of control, or that the ordinary members were taking the
> function upon themselves.  What I see is a greater degree of control
> and uniformity, not driven by those in formal positions of authority.

If you assume that administrators are identical to the bureaucracy or
some non-shrinking proportion thereof, then that does look like a
falsehood.

If you assume that administrators reflect rather the number of
committed long-term contributors, and their numbers wax and wane
pretty independently of the need for administrators, then that makes
sense. Little kills enthusiasm and participation as surely as
bureaucracy. Why are so few even trying for adminship?

(I remember being VP of a taekwondo club in college; I decided to get
us a locker for our gear, which we had club funds for. The paperwork
and circumlocutions nearly destroyed my merely college-student
enthusiasm, and made me seriously consider purchasing the damn locker
myself. This would've been possible because in meatspace, there are no
bots, scripts or policy wonks who would've noticed the sudden
appearance of a locker and objected.)

Indeed, aside from cutting off the branch we're sitting on,
bureaucracy diminishes the need for admins. Admins, at their best,
embody the old benevolent dictator or {{sofixit}} or IAR spirit - not
mechanically applying guidelines and deleting or not deleting, but
judging based on all factors. Bureaucracies, on the other hand, seek
ever more automation and de-humanizing of the process. Consider
WP:PROD. I used to clear out PRODs myself, and I know that some admins
who did similar work took the PROD process as a reason not to think -
if the PROD has been unchallenged for several days, then it must be
deleted. There were good reasons to not be mechanical; some articles
were vandalized and then prodded, or deliberately edited down, or were
reasonable articles. But there you have it anyway.

You only need 1 admin to delete a few dozen or hundred PRODs; even
fewer, if the occasional suggestions for admin bots go through. You
need many more admins to read through a few hundred AfDs and ponder
the right decision.

If the increasing bureaucracy idea is right, we should expect our
contributor base to shrink and especially to see fewer edits by new
users survive.

This is the case.

New articles are down significantly:
http://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia#Wikipedia_growth

Edits and new users are down, and reverts are up:
http://en.wikipedia.org/wiki/User:Dragons_flight/Log_analysis

Felipe Ortega's thesis mentions:

 “In the first place, we note the remarkable difference between
the English and the German language versions. The first one presents
one of the worst survival curves in this series, along with the
Portuguese Wikipedia, whereas the German version shows the best
results until approximately 800 days. From that point on, the Japanese
language version is the best one. In fact, the German, French,
Japanese and Polish Wikipedias exhibits some of the best survival
curves in the set, and only the English version clearly deviates from
this general trend. The most probable explanation for this difference,
taking into account that we are considering only logged authors in
this analysis, is that the English Wikipedia receives too
contributions from too many casual users, who never come back again
after performing just a few revisions.”

(The last sentence could as well be summarized: people are trying en,
and not coming back.)

And it's not like there isn't a lot to write about. (See eg.
http://en.wikipedia.org/wiki/User:Piotrus/Wikipedia_interwiki_and_specialized_knowledge_test#Updates
)

Some of these statistics are old. But I don't know of any newer more
optimistic data.

-- 
gwern


[WikiEN-l] Jimbo on Commons

2010-05-08 Thread Gwern Branwen
Since I haven't seen it mentioned here, thought I would. Jimbo appears
to have started another one of his trademark ill-thought-out
unilateral decisions & purges on Commons, by deleting scads of images
that offend him.

Relevant links:

- http://commons.wikimedia.org/wiki/Commons_talk:Sexual_content
- http://commons.wikimedia.org/wiki/Commons_talk:Sexual_content/Village_pump/2010-5-6
- http://commons.wikimedia.org/wiki/Commons_talk:Sexual_content/Village_pump/2010-5-7

Over on Foundation-l there is an active discussion:
http://lists.wikimedia.org/pipermail/foundation-l/2010-May/thread.html#57789

Jimbo's explanation? Bad press:

- http://lists.wikimedia.org/pipermail/foundation-l/2010-May/057896.html

In related news, there is a proposal to remove the software privileges
that let Jimbo do that sort of thing:

- http://meta.wikimedia.org/wiki/Requests_for_comment/Remove_Founder_flag

-- 
gwern



Re: [WikiEN-l] A war on external links? Was: Inside Higher Ed: Does Wikipedia Suck?

2010-03-28 Thread Gwern Branwen
On Sun, Mar 28, 2010 at 3:51 PM, Gregory Maxwell  wrote:
> On Sun, Mar 28, 2010 at 3:24 PM, Fred Bauder  wrote:
>>> And
>>> further reading sections can point the way for future expansions of
>>> the article, or for the reader to go and find out more about the
>>> topic.
>>>
>>> Carcharoth
>>
>> That is why I despise the war on external links and further reading some
>> editors seem to think is appropriate.
>
> I don't think I've seen much evidence of a "war on external links"
> ... what there is, however, is pressure against an unfiltered flood
> of external links.

Some editors, though, do have a thing against external links. An
example from my recent experience: edit-warring with an editor about
linking <5 reviews and official sites on _[[Royal Space Force: The
Wings of Honnêamise]]_. They apparently interpreted WP:EL as meaning
that *if* a link could be used elsewhere in the article (such as a
reception section), it *must* be so used or be removed.

-- 
gwern



[WikiEN-l] At least it's not *just* YouTube...

2010-03-15 Thread Gwern Branwen
http://www.nytimes.com/2010/03/15/technology/15fedflix.html
"Duplicating Federal Videos for an Online Archive"

"Ms. Pruszko is a volunteer for the International Amateur Scanning
League, an invention of the longtime public information advocate Carl
Malamud. The league plans to upload the archives’ collection of 3,000
DVDs in what Mr. Malamud calls an “experiment in crowd-sourced
digitization.”
Armed with nothing but a DVD duplicator and a YouTube account, the
volunteers have copied and uploaded, among other video clips, an
address by John F. Kennedy; a silent film about the Communist “red
scare”; a training video on farming; and a Disney film for World War
II soldiers about how to avoid malaria, in Spanish. So far, nothing
elusive has emerged — but the project is in its infancy.
...
In red envelopes labeled “FedFlix,” his DVD-by-mail variation on
Netflix, the volunteers mail the DVD copies to Mr. Malamud’s home in
Northern California, where he uploads them to YouTube, the Internet
Archive Web site and an independent server. Mr. Malamud said that the
volunteer work hardly reduces the need for the government to increase
its own digitization efforts."

-- 
gwern



Re: [WikiEN-l] Another notability casualty

2010-03-06 Thread Gwern Branwen
On Sat, Mar 6, 2010 at 7:31 PM, David Gerard  wrote:
> On 7 March 2010 00:00, Abd ul-Rahman Lomax  wrote:
>> Onus? No, I'm seeing masses of highly experienced editors leaving the
>> project, with those replacing them being relatively clueless, as to
>> the original vision, which was itself brilliant but incomplete.
>
> You aren't allowing for the typical length of intense participation in
> *any* online environment typically being 18-24 months (MMORPGs, etc),
> and that the stated reason may not be the reason.

This, incidentally, allows for a third option to Abd's dilemma: an
editor can just be patient.

Here's a personal example, lightly fictionalized (because I know that
if I specify the page and edits, *someone* will take it upon
themselves to undo them just to make a point).

3 or 4 years ago, there was a certain controversy, which got written
up into an article. The article included a chronology with
referencing/links to the site which first noticed the discrepancy
which started the whole shebang.

This site was considerably disapproved of, and the Powers That Be
decreed that links to it were banned, and of course, without the
referencing links, the initial entries in the chronology were now
unreferenced & OR* & to be removed. Bans were spoken of.

I gave up on the subject, and instead added a reminder to revisit it
in a few years' time - roughly 2 of the canonical cycles.

That timer fired a few months ago. I took the material as it was in
the last removal, and added it back in.

Not one person has commented about or opposed the addition.

* I'll note in passing that OR has policy-creeped considerably since
the early days; certainly the people who originally were invoking OR
against Time Cube or Archimedes Plutonium or the electric universe
would be a little surprised at current usage.

-- 
gwern



Re: [WikiEN-l] Chile earthquake article vs Haiti earthquake article

2010-03-05 Thread Gwern Branwen
On Fri, Mar 5, 2010 at 8:42 AM, Carcharoth  wrote:
> Has anyone been following the way editing has developed on the
> en-Wikipedia articles on the Haiti and Chile earthquakes? It looks
> quite different to me. For some reason, the editing has tailed off a
> lot on the Chile earthquake article (could the fact that the article
> was semi-protected for the past 5 days have anything to do with
> that?), but the editing on the Haiti earthquake article kept on going.
> Of course, the Haiti earthquake (rightly) got more press coverage, but
> our article on the Chile earthquake is not in a good state.
>
> Compare the en-wiki article with the (es) Spanish Wikipedia one:
>
> http://en.wikipedia.org/wiki/2010_Chile_earthquake
> http://es.wikipedia.org/wiki/Terremoto_de_Chile_de_2010
>
> The Spanish Wikipedia one is a lot better organised and better
> focused. The en-Wikipedia one is more rambling and fails to focus on
> Chile and says a lot more about the tsunami warnings around the
> Pacific (which is old news now).
>
> There are suggestions on the talk page to try and fix this:
>
> http://en.wikipedia.org/w/index.php?title=Talk:2010_Chile_earthquake&diff=347843229&oldid=347824475
>
> But I still find it surprising. It is not as if there is a lack of
> sources in English (though there are more in Spanish), some of which I
> put on the talk page which got zero response.
>
> Compare with the Haiti earthquake article (which also had large blocks
> of semi-protection, so that can't explain it):
>
> http://en.wikipedia.org/wiki/2010_Haiti_earthquake
>
> Anyone have any idea why the two articles developed (and stalled) in
> such different ways, and had a very different pattern of editing
> volume and frequency? Is it purely down to the Chile earthquake
> getting less news coverage?
>
> Carcharoth

Perhaps it is a combination of coverage (I see a lot less English
coverage of Chile than of Haiti) due to the much smaller death toll
and people using up their 'humanitarian' or 'interest' budget for the
month on Haiti, and the relative lack of connection of Chile to US
editors - there are a lot of Haitians or relations thereof in the US
and apparently not so many Chileans.

-- 
gwern



Re: [WikiEN-l] Notability for FLOSS - the public's reaction

2010-03-05 Thread Gwern Branwen
On Fri, Mar 5, 2010 at 5:58 AM, Charles Matthews
 wrote:
> Gwern Branwen wrote:
>> The [[dwm]] deletion discussion has caught the interest of some of the
>> more nerdy online communities:
>>
>> - http://www.reddit.com/r/programming/comments/b8s29/the_wikipedia_deletionists_are_at_it_again_this/
>> - http://news.ycombinator.com/item?id=1163884
>>
>> It's interesting to see the general levels of disgust and how few
>> current editors there are in comparison to former, and read the
>> dislike of WP:N.
>>
> As usual, one has to sift the arguments. Why aren't blogs included under
> RS? That would be because they are generally unreliable? Why does a
> snowboarding slalom event not have its own article? That would be
> because no one has started one, I guess. Why does someone who left in
> 2006 still bring it up? Elephant's memory for grudges, I suppose.

4 years is hardly extraordinary. What events would someone who left
because of something in 2006 cite other than it? 'Oh, I left in 2006
and haven't contributed since, but an excellent example of what I mean
was the deletion discussion for [[foo]] in 2008; of course, I don't
know anything about it since I wasn't contributing as I said, but you
see what I mean.'

> Oh yes, and what Carcharoth said about FLOSS history needing the
> secondary sources: if "they" don't write the history, it isn't just WP
> coverage that suffers, but the whole documentation, especially if the
> primary sources are emails, perishable web pages, and suchlike.
>
> Charles

So basically, 'if you guys choose to use modern media like wikis and
blogs, and not dead tree formats, then don't cry about your articles
being deleted - it's all *your* fault! Cut your hair, you damn
hippies!'

-- 
gwern



[WikiEN-l] Notability for FLOSS - the public's reaction

2010-03-04 Thread Gwern Branwen
The [[dwm]] deletion discussion has caught the interest of some of the
more nerdy online communities:

- http://www.reddit.com/r/programming/comments/b8s29/the_wikipedia_deletionists_are_at_it_again_this/
- http://news.ycombinator.com/item?id=1163884

It's interesting to see the general levels of disgust and how few
current editors there are in comparison to former, and read the
dislike of WP:N.

I certainly hope the usability initiatives bear fruit and entice
regular people into becoming editors, because we're burning our
bridges among our original techy contributor base.

([[Wikipedia:Articles for deletion/Dwm (2nd nomination)]] is trending
keep, but many FLOSS articles have been deleted lately, and many will
yet feel the axe.)

-- 
gwern



Re: [WikiEN-l] Another notability casualty

2010-02-21 Thread Gwern Branwen
On Sat, Feb 20, 2010 at 3:59 PM, Carcharoth  wrote:
> On Sat, Feb 20, 2010 at 4:05 PM, Ken Arromdee  wrote:
>> I stumbled into this:
>>
>> http://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Kinuyo_Yamashita
>>
>> My personal summary: Notability requirements shown to be utterly broken for
>> popular culture topics.
>
> Yeah. It's difficult. The discussion looks like a 'no consensus', but
> throw in the socking accusations and the BLP background, and you can
> understand the result, even if you disagree with it. I would look up
> some sources, but I really hate those "pseudonym in another language
> in an obscure and emerging genre (video music)" cases. You really
> can't make much progress with those unless someone actually goes and
> writes a book about it, or you know the other language (and I know no
> Japanese at all).
>
> Carcharoth

And it doesn't help that composers lend themselves to being indexed in
databases and general name-checking without substantive content.

For example, look at the hits for Kinuyo in my CSE:
http://www.google.com/cse?cx=009114923999563836576%3A1eorkzz2gp4&q=%22Kinuyo+Yamashita%22

Leaving aside the issue that I have no idea whether to whitelist
originalsoundversion.com as a RS or blacklist it as a database/blog
filling up my results, note that there are tons of references &
mentions, but few substantive discussions. (Ironically, one of the
more prominent hits is an osv.com post criticizing the deletion:
http://www.originalsoundversion.com/?p=7667 )

-- 
gwern



Re: [WikiEN-l] Unreverted vandalism

2010-02-11 Thread Gwern Branwen
On Thu, Feb 11, 2010 at 12:21 PM, Thomas Dalton wrote:
> On 11 February 2010 17:17, Carcharoth wrote:
>> On Thu, Feb 11, 2010 at 5:05 PM, Andrew Gray wrote:
>>
>>> b) Use reversions. Sample a thousand uses of rollback from the recent
>>> changes list, find time between that edit and the one it was
>>> reverting.
>>
>> That one sounds easier. If only people wouldn't use rollback 
>> inappropriately...
>
> Looking for rollback edits is a good way to find vandalism that was
> reverted quickly, but as Andrew says it won't find old vandalism on
> articles with subsequent edits, which is essential if the intention is
> to find out how much vandalism takes a long time to be reverted.

And such are very common. In high-vandalism pages, it's easy for
entire sections to just drop out in the back and forth. Bot edits
badly exacerbate the issue because they edit whenever the heck they
feel like it, and increase the noise in diffs.

An example: while looking at a reversion of a few anon edits on
[[Legalism (Chinese philosophy)]], I grew suspicious of the ordering
of sections - it seemed a little off, a little too choppy. I looked at
consolidated diffs back to January, finding nothing in particular, but
it was only when I gave it a last try all the way back to December,
that I figured it out: 2 entire substantial sections had gotten
deleted.

I had to manually copy them back in because of all the bot activity in
the interim: 
https://secure.wikimedia.org/wikipedia/en/w/index.php?title=Legalism_%28Chinese_philosophy%29&action=historysubmit&diff=342300729&oldid=341823153
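
The rollback-sampling idea quoted above (take a sample of rollbacks, then
measure the time between each reverted edit and its revert) could be
sketched roughly as follows. This is only an illustration under stated
assumptions: the timestamp pairs are made-up sample data, the function
name is hypothetical, and nothing here queries Wikipedia itself. As noted
above, it also only measures vandalism that was actually rolled back.

```python
from datetime import datetime
from statistics import median

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed timestamp layout


def revert_delays(pairs):
    """Return seconds elapsed between each edit and the edit reverting it."""
    delays = []
    for edit_time, revert_time in pairs:
        edited = datetime.strptime(edit_time, TIMESTAMP_FORMAT)
        reverted = datetime.strptime(revert_time, TIMESTAMP_FORMAT)
        delays.append((reverted - edited).total_seconds())
    return delays


# Hypothetical sample: (time of bad edit, time of the rollback that undid it).
sample = [
    ("2010-02-11T12:00:00Z", "2010-02-11T12:01:00Z"),  # caught in a minute
    ("2010-02-11T12:00:00Z", "2010-02-11T13:00:00Z"),  # caught in an hour
    ("2009-12-01T00:00:00Z", "2010-02-06T00:00:00Z"),  # sat for 67 days
]

delays = revert_delays(sample)
median_delay = median(delays)
```

The median is used rather than the mean because a handful of long-lived
cases like the third one would otherwise dominate the figure.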

-- 
gwern



Re: [WikiEN-l] Custom Google search engines for finding RSs for subject areas

2010-02-09 Thread Gwern Branwen
On Mon, Feb 8, 2010 at 12:21 PM, Rajat Mukherjee  wrote:
> Gwern
> This is not true - we support a lot more than 20 patterns - so I will follow
> up to have this addressed  in the forum
> if you can provide a specific example where you believe patterns are not
> being used, we can look to see if there's any issue at our end
> thanks
> rajat

I've replied on the forum; with any luck, this will turn out to simply
be a (disconcerting, gut-wrenching) UI issue.

-- 
gwern



Re: [WikiEN-l] Custom Google search engines for finding RSs for subject areas

2010-02-07 Thread Gwern Branwen
On Fri, Jan 22, 2010 at 12:00 PM, Gwern Branwen  wrote:

> Further examples can be multiplied, but I hope this shows that CSEs
> can be very useful for finding online sources; I'm sure it would work
> as well for other subject-areas!
>
> (And since I can't let recent events go, I'll mar my little essay with
> a final remark: *this* is the sort of thing that will lessen issues
> like BLPs - not fanaticism like "Caedite eos! Novit enim Dominus qui
> sunt eius".)

I withdraw my enthusiastic support of CSEs. Apparently Google will
without warning or notice arbitrarily delete all but 20 URL filters:

http://www.google.com/support/forum/p/customsearch/thread?tid=29757bc2983d538d&hl=en

This makes CSE utterly useless for Wikipedia. The obvious workaround,
keeping a list of URLs on a subpage and having a CSE load that, will
run afoul of the Wikipedia blacklist filter.

Even if I found a workaround, I am sufficiently angry that Google
would unilaterally destroy approximately 10-20 hours of my work that I
do not think I would use CSE anyway.

The idea is still good, however. The restricted searches proved their
utility to me. But I currently don't know of any alternatives.

-- 
gwern



Re: [WikiEN-l] Administrator coup / mass deletions

2010-01-26 Thread Gwern Branwen
On Tue, Jan 26, 2010 at 4:11 PM, Ken Arromdee  wrote:
> On Thu, 21 Jan 2010, Adam Koenigsberg wrote:
>> I oppose this mass deletion but support the theory behind it, that is to
>> say, I would support this deletion criteria but believe this to be out of
>> process. Being Bold doesn't extend to administrator tools, IMHO. This
>> reminds me of the Userbox mass deletion fiasco of January 2006, see
>> RFC/Kelly Martin
>
> It reminds me of spoiler warnings.  It's amazing just how much spoiler
> warnings turned out to be a template for all sorts of...  suboptimal...
> activities.  Once you delete tens of thousands of things, you've won,
> regardless of whether you've followed the rules or not.

It is easier to attack than defend. If you want to justify high
standards and removal, there are easy arguments: 'what if this could
be another Seigenthaler?' 'what if this is fancruft Wikipedia will be
criticized for including?'

If you want to defend, you have... what? Even the mockery of _The New
Yorker_ didn't convince several editors that [[Neil Gaiman]] should
cover Scientology. There is no beacon example of deletionism's
grievous errors.

-- 
gwern



Re: [WikiEN-l] Removing unsourced information

2010-01-26 Thread Gwern Branwen
On Tue, Jan 26, 2010 at 5:47 PM, Charles Matthews wrote:
> Gwern Branwen wrote:
>> On Tue, Jan 26, 2010 at 3:42 PM, Charles Matthews wrote:
>>
>>> quiddity wrote:
>>>
>>>> What to do about someone who has "lost the plot"?
>>>> For example, this editor seems to be going from article to article,
>>>> deleting every prose paragraph that doesn't have a ref tag (usually
>>>> everything except the intro sentence).
>>>> http://en.wikipedia.org/w/index.php?title=Special:Contributions&offset=20100125214401&target=JBsupreme
>>>> Some of the content being removed is obviously not good (selfpromoting
>>>> peacockery etc), but much is perfectly fine, and this seems to be one
>>>> of the worst (most indiscriminate) ways to handle the hypothetical
>>>> problem.
>>>>
>>>>
>>> Suggest that one can drive-by even faster in adding {{fact}}? I think
>>> this is the first step, the suggestion that identifying unsourced facts
>>> is a way of achieving a similar end, and that we can all applaud it when
>>> properly done.
>>>
>>> Charles
>>>
>>
>> And where does the {{fact}}-bombing end?
>>
>> [[Medici bank]] is as finely referenced an article as I have ever (or
>> likely will ever) written with 96 footnotes, multiple books & papers
>> consulted, and extensive quoting - yet the overwhelming majority of
>> sentences lack <ref> tags and are presumably candidates for bombing.
>>
>>
> Well, I think that in a well-written, well-sourced article people should
> be still allowed to ask for further references. I foolishly copied the
> basics of [[List of dissenting academies]] out of a book, thinking it
> was a cheap article; and so far have added about 120 footnotes and
> created around 50 articles at Wikisource to support it. Just shows where
> these things can lead.
>
> I actually had big problems with inline referencing style when it was a
> hot potato, and I did start putting articles together sentence by
> sentence. There were reassurances that it was not going to lead to
> "lame" writing, and I think those were overdone (more precisely, in an
> area where there is plenty of academic research at book length, you will
> probably be OK, but that's quite a limitation). OTOH inline referenced
> writing is now the house style, and actually there are worse things:
> concision is good, and fact-checked encyclopedia articles are good, and
> the fact that articles are never finished is a given.
>
> Charles

The problem is not that the article is not finished, the problem is
our guidelines allow wikilawyers to demand that the map be the
territory.

[[WP:ZEN]]:

 'Once, a novice was meditating over a guideline, when Gwern came
by. The novice was tossed an unreferenced line from a plot summary.
Gwern said, "If you do not reference this, it is unsourced and must be
removed. But if you do reference it with a quote from the story, it is
a copyvio and so must be removed. Now quickly! What do you do?"'

Our guidelines make a weak nod toward 'hey guys don't be a WP:DICK
mmkay?' but do not ever countermand the strong injunctions towards
sourcing. This is not helped by the extremist statements by people
like Jimbo that material without a <ref> is to be considered guilty
until proven innocent. An editor can go to an article and challenge
unremarked material of ancient provenance sentence by sentence*, and
there is no point at which a good editor can say to her, 'You have sat
too long for any good you have been doing lately... Depart, I say; and
let us have done with you. In the name of God, go!'

Our guidelines assume a binary - something is referenced or
unreferenced. An article is referenced or unreferenced. Magical
thinking ('if we just delete all unreferenced BLPs, we will have
improved article quality *obviously*!').

There is, of course, nothing that is completely 'referenced' - if I
state the Medici bank used bills of draft payable in florins on the
Bruges branch, and I cite de Roover 1987, it can be objected that I
haven't really referenced it; if I provide a page number, they can fall
back on 'does the ref say Bruges? florins? Medici bank? bills of
draft?'; if I provide quotes, they can employ copyright paranoia; and
so on.

Nor is there anything completely unreferenced; if I assert Star Wars
canon has multiple levels, the references - though inaccessible to me
now and never to be included in the article - are the many books and
articles I've read about Star Wars. Referencing is a long continuum
with nothing at either end.

* It is worth noting that the administrator Lars, involved in deleting
BLPs, has claimed that WP:SILENCE has its exact opposite meaning in
BLP articles - material that has gone unremarked & unchallenged for
years is actually highly controversial, and not anodyne & acceptable.

-- 
gwern



Re: [WikiEN-l] Removing unsourced information

2010-01-26 Thread Gwern Branwen
On Tue, Jan 26, 2010 at 3:42 PM, Charles Matthews wrote:
> quiddity wrote:
>> What to do about someone who has "lost the plot"?
>> For example, this editor seems to be going from article to article,
>> deleting every prose paragraph that doesn't have a ref tag (usually
>> everything except the intro sentence).
>> http://en.wikipedia.org/w/index.php?title=Special:Contributions&offset=20100125214401&target=JBsupreme
>> Some of the content being removed is obviously not good (selfpromoting
>> peacockery etc), but much is perfectly fine, and this seems to be one
>> of the worst (most indiscriminate) ways to handle the hypothetical
>> problem.
>>
> Suggest that one can drive-by even faster in adding {{fact}}? I think
> this is the first step, the suggestion that identifying unsourced facts
> is a way of achieving a similar end, and that we can all applaud it when
> properly done.
>
> Charles

And where does the {{fact}}-bombing end?

[[Medici bank]] is as finely referenced an article as I have ever (or
likely will ever) written with 96 footnotes, multiple books & papers
consulted, and extensive quoting - yet the overwhelming majority of
sentences lack <ref> tags and are presumably candidates for bombing.

-- 
gwern



[WikiEN-l] Time heals all wounds

2010-01-25 Thread Gwern Branwen
One from the archives:

https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia:Requests_for_adminship/Everyking_6
https://secure.wikimedia.org/wikipedia/en/wiki/User_talk:Everyking#Congratulations
https://secure.wikimedia.org/wikipedia/en/w/index.php?title=Wikipedia:Former_administrators&curid=21026701&diff=340034766&oldid=339578680

-- 
gwern



[WikiEN-l] "Hungry for New Content, Google Tries to Grow Its Own in Africa"

2010-01-25 Thread Gwern Branwen
http://www.nytimes.com/2010/01/25/technology/25link.html

"But Google can do something that cowboys can’t: create more real
estate. The company is sponsoring a contest to encourage students in
Tanzania and Kenya to create articles for the Swahili version of
Wikipedia, mainly by translating them from the English Wikipedia. The
winners are to be announced Friday, with prizes including a laptop, a
wireless modem, cellphones and Google gear.

So far the contest, Google says, has added more than 900 articles from
more than 800 contributors.

“Our algorithms are primed and ready to give you the answer you are
looking for, but the pipeline of information just isn’t there,” said
Gabriel Stricker, Google’s spokesman on search issues. “The challenge
for searches in many languages for us no longer is search quality. Our
ability to get the right answer is hindered by the lack of quality and
lack of quantity of material on the Internet.”

Sitting in a Google cafeteria, Mr. Stricker outlined all the ways
information eludes the search engine — wrong language, not digitized,
too recent, doesn’t exist but should. Feeding the maw is clearly an
obsession of Google’s. After all, the search engine’s
comprehensiveness is an edge against a new, well-financed competitor,
Bing from Microsoft.

In e-mail interviews, two of the finalists in the Swahili contest said
the arrival of Google on their campuses changed them from passive
users of Wikipedia to active contributors. Still, they expressed mixed
feelings about receiving material rewards for sharing knowledge.

One of the finalists, Jacob Kipkoech, a 21-year-old from the Rift
Valley of Kenya who is studying software engineering at Kenyatta
University in Nairobi, has created 17 articles so far that were given
points. Among the topics were water conservation, Al Qaeda and
afforestation, the process of creating forests.

“Wikipedia has been a good online research base for me,” he wrote,
“and this was a way I could make it possible for people who can’t use
English to benefit from it as well.”

Another finalist, Daniel Kimani, also 21, is studying for a degree in
business information technology at Strathmore University in Kenya. He
said that contests were an effective way to attract contributors but
that “bribing,” or paying per article, “is not good at all because it
will be very unfair to pay some people and others are not paid.”

“I believe in Wikipedia,” he said, “since it is the only free source
of information in this world.”

Swahili, because it is a second language for as many as 100 million
people in East Africa, is thought to be one of the only ways to reach
a mass audience of readers and contributors in the region. The Swahili
Wikipedia still has a long way to go, however, with only 16,000
articles and nearly 5,000 users. (Even a relatively obscure language
like Albanian has 25,000 articles and more than 17,000 contributors.)

Mr. Kimani and Mr. Kipkoech represent one of the challenges for
creating material in African languages. The people best equipped to
write in Swahili, or Kiswahili as it is sometimes known, are
multilingual university students. And yet Mr. Kimani wrote that he
used “the English version more than Kiswahili since most of my school
work is in English.”

Translation could be the key to bringing more material to non-English
speakers. It is the local knowledge that is vital from these Kenyan
contributors, the thinking goes, assuming that Swahili-English
translation tools improve.

Mr. Kimani wrote one entry in English and Swahili about drug use in
Mombasa, the second-largest city in Kenya. It says that the “youth in
this area strongly believe that use of bhang or any other narcotic
drug could prevent one from suffering from ghosts attacks.”

Now the article lives in English and Swahili, although the English
Wikipedia editors have asked for citations and threatened to remove
it."

-- 
gwern



Re: [WikiEN-l] Custom Google search engines for finding RSs for subject areas

2010-01-23 Thread Gwern Branwen
On Sat, Jan 23, 2010 at 4:31 AM, Carcharoth  wrote:
> On Sat, Jan 23, 2010 at 3:21 AM, Gwern Branwen  wrote:
>> On Fri, Jan 22, 2010 at 8:45 PM, K. Peachey  wrote:
>>> On Sat, Jan 23, 2010 at 3:00 AM, Gwern Branwen  wrote:
>>>> ...snip...
>>>> I started with all the links listed in
>>>> https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia:WikiProject_Anime_and_manga/Online_reliable_sources
>>>> and then began running searches on random topics and pruning based on
>>>> that - chucking sites into the blacklist sinbin, or finding good sites
>>>> omitted from the list and adding them to the whitelist. At last count,
>>>> I had 200 sites on the nice list and 311 on the naughty list (but this
>>>> counts things like the Mirrors page as a single link, though they ban
>>>> dozens or hundreds of sites).
>>>> ...snip...
>>> Perhaps we should encourage more WikiProjects to create lists like the
>>> one displayed then add them into a category and someone could work on
>>> a custom search that is suitable to use across the project that is
>>> continuously updated with more allow/black lists.
>>>
>>> -Peachey
>>
>> That would be an excellent idea, especially if they could then all be
>> {{subst}}ed into a single page - just as I can ban every site listed
>> in the consolidated WP:MIRROR page, so too I can *include* every site
>> listed on a page. It would probably be superior to the current AfD
>> template with just some normal Google/Books/News searches.
>
> Does your custom search aggregate books, news, and scholar searches,
> as well as ordinary web searches?

I put in the Books/News/Scholar URLs, but I'm unsure it did anything.
For example, AFAIK, a site search of Google books will only turn up
the homepage for a book - the metadata, reviews, etc; the actual OCR
page contents are part of the 'deep web' you can get at only through
the actual Google search box. One might think that Google's custom
search might recognize the Google service URLs and run the deep web
queries and not just query the surface details - but that seems to be
too much to expect. (So I am perhaps a little hasty in suggesting a
universal CSE would replace the AfD searches.)

> Those are the four Google searches I
> use most often, and it is interesting to see how some subjects get
> more coverage in one area of the information metasphere than other
> areas. It is all quite logical when you think about when the topic
> received most coverage. The one thing I still find that is lacking a
> lot is Google News - a lot of old newspapers still seem to need to be
> searched on separate databases. What is the best database out there
> for searching in old newspapers?
>
> Carcharoth

I don't know of any good non-proprietary old newspaper database, personally.

-- 
gwern



Re: [WikiEN-l] Custom Google search engines for finding RSs for subject areas

2010-01-22 Thread Gwern Branwen
On Fri, Jan 22, 2010 at 8:45 PM, K. Peachey  wrote:
> On Sat, Jan 23, 2010 at 3:00 AM, Gwern Branwen  wrote:
>> ...snip...
>> I started with all the links listed in
>> https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia:WikiProject_Anime_and_manga/Online_reliable_sources
>> and then began running searches on random topics and pruning based on
>> that - chucking sites into the blacklist sinbin, or finding good sites
>> omitted from the list and adding them to the whitelist. At last count,
>> I had 200 sites on the nice list and 311 on the naughty list (but this
>> counts things like the Mirrors page as a single link, though they ban
>> dozens or hundreds of sites).
>> ...snip...
> Perhaps we should encourage more WikiProjects to create lists like the
> one displayed then add them into a category and someone could work on
> a custom search that is suitable to use across the project that is
> continuously updated with more allow/black lists.
>
> -Peachey

That would be an excellent idea, especially if they could then all be
{{subst}}ed into a single page - just as I can ban every site listed
in the consolidated WP:MIRROR page, so too I can *include* every site
listed on a page. It would probably be superior to the current AfD
template with just some normal Google/Books/News searches.

-- 
gwern



Re: [WikiEN-l] Administrator coup / mass deletions

2010-01-22 Thread Gwern Branwen
On Fri, Jan 22, 2010 at 12:50 PM, Cool Hand Luke wrote:
> On Fri, Jan 22, 2010 at 11:20 AM, The Cunctator  wrote:
>
>> At the same time,
>>
>> *Always leave something undone.
>> **Give the author a chance.*
>> *Build the web.*
>> *Do not disrupt Wikipedia to illustrate a point.*
>>
>> and
>>
>> *If the page can be improved, this should be solved through regular
>> editing,
>> rather than deletion.*
>>
>>
>
> These maxims were very good in the formative stages of our project.  You and
> other early editors were right (maybe even prophetic) to adopt them.  The
> fledgling project needed hands, eyeballs, and content.  By zealously keeping
> and expanding content--even shoddy content--we grew dramatically.
>
> But this debate has come to a boil because we've been too slow in realizing
> that the balance must change because conditions have changed.  We are no
> longer a small project, but one that places in the top three google search
> results for almost any topic in our encyclopedia.  We have succeeded because
> of our formative policies, and with our success comes responsibility.
>
> In an era when any living subject can have their life harmed by a poorly
> vetted biography, we should strike a new balance.  We should not bite off
> more than we can chew.  In this area, we ought to weed out BLPs that we can
> no longer maintain at appropriately high standards.  As a happy consequence
> of this process, many notable biographies will be improved.  I hope that
> this improvement and re-examination process is continual.
>
> In this way, we will effectively shoulder the responsibility we have for
> maintaining one of the top ten sites on the internet.
>
> Cool Hand Luke

"When I was a child, I talked like a child, I thought like a child, I
reasoned like a child. When I became a man, I put childish ways behind
me." eh?

You older Wikipedians run along now; you've had your day. The adults
are talking now - I are serious editors, this are serious website.

Funny how BLPs have been the most serious threat facing the project,
so serious that mass mutiny is justified and the jettisoning of our
old ways and practices - and have been since at least 2006. I guess
when I look cynically upon the Chicken Little BLP warriors, it just
reflects my own ignorance of how Wikipedia teeters on the brink every
day, how countless suicides and ruined lives have been averted by
their heroic daily efforts.

-- 
gwern



[WikiEN-l] Custom Google search engines for finding RSs for subject areas

2010-01-22 Thread Gwern Branwen
So, on a lighter note, I recently got sick & tired of running site:
search after site: -wiki search in Google, and began looking for some
way to automate it.

I discovered that one can make a 'custom' Google search:
https://secure.wikimedia.org/wikipedia/en/wiki/Google_Co-op

It allows one essentially to tell Google to increase the score of any
hits in certain domains, and blacklist other domains. It has a number
of neat features - for example, I can tell it to blacklist any domain
on 
https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia:Mirrors_and_forks/All
. You might think that a parameter like '-wiki' or '-wikipedia' would
do the same thing, but alas!

In particular, I've created a CSE focused on anime & manga topics:
http://www.google.com/cse/home?cx=009114923999563836576:1eorkzz2gp4

I started with all the links listed in
https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia:WikiProject_Anime_and_manga/Online_reliable_sources
and then began running searches on random topics and pruning based on
that - chucking sites into the blacklist sinbin, or finding good sites
omitted from the list and adding them to the whitelist. At last count,
I had 200 sites on the nice list and 311 on the naughty list (but this
counts things like the Mirrors page as a single link, though they ban
dozens or hundreds of sites).
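
For anyone who wants to see the whitelist/blacklist idea in miniature,
here is a rough sketch of the same filtering applied to a list of result
URLs. It is not how Google's CSE works internally (I don't know that); it
only mimics the visible effect of dropping blacklisted domains and moving
whitelisted ones to the top. The domain lists and function names are
hypothetical placeholders, not my actual lists.

```python
from urllib.parse import urlparse

# Hypothetical example lists; the real ones hold hundreds of entries.
WHITELIST = {"animenewsnetwork.com", "nytimes.com"}
BLACKLIST = {"example-mirror.org"}


def domain_of(url):
    """Extract the host name from a URL, dropping a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def matches(host, domains):
    """True if host is one of the domains or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def rerank(urls):
    """Drop blacklisted hits, then list whitelisted hits first."""
    kept = [u for u in urls if not matches(domain_of(u), BLACKLIST)]
    # sorted() is stable, so the original order is kept within each group.
    return sorted(kept, key=lambda u: not matches(domain_of(u), WHITELIST))


results = rerank([
    "http://blog.example.com/a",
    "http://www.example-mirror.org/wiki/X",
    "http://www.animenewsnetwork.com/news/1",
    "http://www.nytimes.com/r",
])
```

Unlisted sites are kept but ranked below the whitelisted ones, which
matches the "increase the score" behaviour described above rather than a
strict whitelist-only search.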

The results are *much* better. To take my most recent use, finding
material on [[Amanchu!]] for its AFD
(https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia:Articles_for_deletion/Amanchu!),
compare the regular Google search:
http://www.google.com/search?q=amanchu

with the CSE search:
http://www.google.com/cse?cx=009114923999563836576%3A1eorkzz2gp4&q=amanchu

All the blogs & scanlations & forums in the former are great for
someone who just wants to read _Amanchu!_, but for a Wikipedian? It's
terrible. Notice that the ANN launch article, which is apparently the
most substantive English coverage in a RS*, is the first hit in the
CSE but the fifth in the regular Google search, and you can keep
scrolling down and find mostly chaff. And the weekly sales ranking
that puts _Amanchu!_ at #8 nationally, that shows up in the first page
in the CSE? I've no idea where it is in the regular Google hits.

Or take a critical classic: _The Wings of Honneamise_
(https://secure.wikimedia.org/wikipedia/en/wiki/Royal_Space_Force:_The_Wings_of_Honn%C3%AAamise).

Google:
http://www.google.com/search?q=wings%20of%20honneamise
CSE:
http://www.google.com/cse?cx=009114923999563836576%3A1eorkzz2gp4&q=wings+of+honneamise

Google has on its first page WP, IMDb, Amazon, video links, Tucows
(!), ads, and just 2 reviews a Wikipedian might find useful.

CSE has 9 or 10 good review sources from respectable publications like
Ex.org or the New York Times, and even the questionable hits like
RottenTomatoes have their good points - RT would lead one to the
famous critic Roger Ebert's *very* flattering review of _Wings of
Honneamise_. And it'll take you straight to Ebert's review on page 2,
whereas in regular Google search, you have to go to page 7 or 8.

Further examples can be multiplied, but I hope this shows that CSEs
can be very useful for finding online sources; I'm sure it would work
as well for other subject-areas!

(And since I can't let recent events go, I'll mar my little essay with
a final remark: *this* is the sort of thing that will lessen issues
like BLPs - not fanaticism like "Caedite eos! Novit enim Dominus qui
sunt eius".)

* Unsurprising, really. _Amanchu!_ is Japanese only and likely will
stay that way for years; even the anime media can be very
language-parochial.

-- 
gwern



Re: [WikiEN-l] Administrator coup / mass deletions

2010-01-21 Thread Gwern Branwen
On Thu, Jan 21, 2010 at 12:43 PM, Cary Bass  wrote:
> The Cunctator wrote:
>> Just restored a former prime minister.
>>
>
> Hi!
>
> I just want to ask a question about this, and since I don't know the
> article of which you speak, I can't judge its specific merits.  This is
> my personal opinion, and does not reflect that of any organization of
> which I may be employed.
>
> Judging by your contributions, you've been restoring articles and
> providing sources.  Reading your email, I think, "The result of deleting
> this biography was that it got restored and given sources, that's a
> good thing, right?  The quality of the project goes up one more notch."
>    I don't have an issue with the article of a former prime minister
> disappearing for a few hours.
>
> I want to get a full perspective, however.  If you see fault with my
> interpretation, please help me understand.
>
> Cary

That argument sounds like a broken window fallacy. Cunctator has been
irked and annoyed, and driven that much closer to leaving the project
forever. And he can only experience that joy because he's an admin.

A regular contributor will have different reactions. When he hasn't
been driven away already.

And what benefit was there *really*? I see a lot of mindless fetishism
of sourcing here, but suppose Cunctator resurrected an article and
stuck in a random newspaper article for the claim 'Foo was married in
1967.' Nobody disputed that before; nobody disputed that after; no new
information was added. How *exactly* is the article better? Is it
better because some hypothetical viewer might one day go, hm, I wonder
if he really was married in 1967, and will look at the cite and be
relieved?

Speaking from personal experience on the _Evangelion_ articles: I have
on multiple occasions spent hours or weeks tracking down some fact
widely accepted amongst Eva fans & academic commentators to its
original source and found it.  And then felt a sick hollow feeling as
I realized that all I had done was waste my life satisfying RS
standards, when the fans and professors knew it all along because they
trust each other and their forebears and can see for themselves the
consilience of all those commonly accepted facts.

Sourcing is orthogonal to quality. I would trade a thousand useless
citations for a single good administrator, or heck, even editor.

-- 
gwern


Re: [WikiEN-l] Administrator coup / mass deletions

2010-01-21 Thread Gwern Branwen
On Wed, Jan 20, 2010 at 8:25 PM, Apoc 2400  wrote:
> Apparently there is some kind of coup on English Wikipedia where a large
> group of administrators have decided that since the community disagrees with
> them, they will use their admin powers to override consensus and policy. At
> least that is what they seem to claim it is.
>
> "The community is incapable of such a conversation and decision."
> --MZMcBride
> "Hence my actions." Kevin
> http://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard/Incidents#User:Rdm2376_starting_mass_deletions
>
> Specifically it is about mass-deleting articles about living people for the
> sole reason of lacking sources.
>
> Is there anyone here who can do something about this before it becomes an
> even bigger wheel-war?

Yes, the Arbcom has done something about it. Specifically, it has
patted them on the head and said, 'good job, guys! Just be quieter in
the future'.

https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia:Arbitration/Requests/Case#Motions

"* That unsourced biographies of living people may contain
seemingly innocuous statements which are actually damaging, but there
is no way to determine whether they do without providing sources;
* That Wikipedia, through the founding principle of "Ignore All
Rules", has traditionally given administrators wide discretion to
enforce policies and principles using their own best judgment; and
* That administrators have been instructed to aggressively enforce
the policy on biographies of living people.

 The Committee has determined that the deletions carried out by
Rdm2376, Scott MacDonald, and various other administrators are a
reasonable exercise of administrative discretion to enforce the policy
on biographies of living people.

 The administrators who carried out these actions are commended
for their efforts to enforce policy and uphold the quality of the
encyclopedia, but are urged to conduct future activities in a less
chaotic manner.

 The administrators who interfered with these actions are reminded
that the enforcement of the policy on biographies of living people
takes precedence over mere procedural concerns.

 The Committee hereby proclaims an amnesty for all editors who may
have overstepped the bounds of policy in this matter. Everyone is
asked to continue working together to improve and uphold the goals of
our project. The Committee recommends, in particular, that a request
for comments be opened to centralize discussion on the most efficient
way to proceed with the effective enforcement of the policy on
biographies of living people."

Translation: BLP now means anything whatsoever unsourced is evil & to
be burned with fire; anything is justified in pursuit of previous; IAR
now means flagrant admin abuses are justified if you can cite
imaginary bits of a policy, and other admins have to sit there and
take it; silent mass deletions are now an acceptable admin tactic.

I particularly enjoy the 'innocuous statements' point. It's
reminiscent of the best Cold War paranoia: your friend, your
co-worker, or even your dog could secretly be a Commie agent! No one
is safe! Not even *you*. I have a list of 55 unsourced
innocuous-seeming statements in the [[State Department]]...

-- 
gwern


Re: [WikiEN-l] The Curious Incident of the Fans in the Night

2010-01-19 Thread Gwern Branwen
On Tue, Jan 19, 2010 at 9:07 PM, Jussi-Ville Heiskanen wrote:
> Ken Arromdee wrote:
>> On Mon, 18 Jan 2010, David Gerard wrote:
>>
>>> If they want to filibuster the reliability of this source, it speaks
>>> of some child being Robert Heinlein's great-grandson ... Heinlein
>>> didn't have any children. I wonder where they got that from.
>>>
>>
>> Wikipedia's article on Heinlein nowhere says he didn't have any children.
>> It's generally accepted that he and Virginia didn't have any children, but
>> Virginia was his third wife, and he was married to his second for 15 years.
>>
>>
>
> And serious historians still don't know even the name of his
> first wife, much less their connubial history.
>
>
> Yours,
>
> Jussi-Ville Heiskanen

Our article seems to know her name - sourced to the LA Times even.

-- 
gwern


Re: [WikiEN-l] The Curious Incident of the Fans in the Night

2010-01-19 Thread Gwern Branwen
On Tue, Jan 19, 2010 at 2:58 AM, Ken Arromdee  wrote:
> On Mon, 18 Jan 2010, David Gerard wrote:
>> If they want to filibuster the reliability of this source, it speaks
>> of some child being Robert Heinlein's great-grandson ... Heinlein
>> didn't have any children. I wonder where they got that from.
>
> Wikipedia's article on Heinlein nowhere says he didn't have any children.
> It's generally accepted that he and Virginia didn't have any children, but
> Virginia was his third wife, and he was married to his second for 15 years.

True, but the New York Times obituary says he was survived only by his
third wife. If he had children by either 1 or 2, wouldn't they have
mentioned it? And try googling around a bit; you'll find nothing, and
even the occasional hit specifically claiming there were no children
(http://www.nitrosyncretic.com/rah/rahfaq.html#0106)

-- 
gwern


Re: [WikiEN-l] The Curious Incident of the Fans in the Night

2010-01-18 Thread Gwern Branwen
On Mon, Jan 18, 2010 at 6:21 PM, David Gerard  wrote:
> 2010/1/18 Ken Arromdee :
>
>> The problem is that Wikipedia policies pretty much encourage editors to
>> filibuster changes they don't like by demanding sources and questioning the
>> sources.  This is useful when there's a serious question about whether the
>> information is accurate, but it's also abused when there's no serious
>> question about the information's accuracy and the request for sources is
>> used to block something they want to exclude for other reasons.  If someone
>> then provides a valid source anyway, the source just gets repeatedly
>> questioned regardless of whether it follows Wikipedia's sourcing rules.
>
>
> If they want to filibuster the reliability of this source, it speaks
> of some child being Robert Heinlein's great-grandson ... Heinlein
> didn't have any children. I wonder where they got that from.
>
>
> - d.

After some checking, it seems he really didn't have any offspring. But
he had quite a few siblings, so I am going to tentatively assume that
http://www.cheryl-morgan.com/?p=7536 is right and what was meant was
great-grandnephew.

They might've simply asked the kid and gotten that response; I
remember when I was that age & younger I was none too clear on the
whole genealogical tree and who was nephew to whom. But hopefully
someone will contact the article writer and get it straightened out.

-- 
gwern


Re: [WikiEN-l] The Curious Incident of the Fans in the Night

2010-01-18 Thread Gwern Branwen
On Mon, Jan 18, 2010 at 5:29 PM, quiddity  wrote:
> On Mon, Jan 18, 2010 at 12:50 PM, Ken Arromdee  wrote:
> ...
>> elsewhere.  Our rules generally don't say we can't use information unless
>> it has *two* sources; and in this case it's obvious that the reason the
>> information is hard to find is that Neil Gaiman is trying to keep it quiet,
>> not that it isn't true.
>>
>
> Unless there's a [[Template:Notable_Wikipedian]] tag missing from the
> article's talkpage,
> I suspect you probably mean "Neil Gaiman's /fans are/ trying to keep
> it quiet". Not neilhimself...!
>
> quiddity.

No, I think he meant Neil.

Did you know there's not one single use of the term 'Scientology' on
neilgaiman.com or any subdomains? Given his family is Scientologist,
he was raised a Scientologist in a major bastion of Scientology,
married a Scientologist, and so on, and given that people have been
interested in all the foregoing for a long time, and also given that
he used to have comments on the website, and *also* given that he is
historically very responsive to random questions about just about
anything* - the utter silence must be deliberate. Which is as one
would expect.

* I know this from personal experience:
http://journal.neilgaiman.com/2004/09/from-mailbag-er-i-won-huck.asp

-- 
gwern


[WikiEN-l] The Curious Incident of the Fans in the Night

2010-01-18 Thread Gwern Branwen
http://www.newyorker.com/reporting/2010/01/25/100125fa_fact_goodyear?currentPage=all

 "The pivotal fact of Gaiman’s childhood is one that appears
nowhere in his fiction and is periodically removed from his Wikipedia
page by the site’s editors. When he was five, his family moved to East
Grinstead, the center of English Scientology, where his parents began
taking Dianetics classes. His father, a real-estate developer, and his
mother, a pharmacist, founded a vitamin shop, G & G Foods, which is
still operational. (According to its Web site, it supplies the Human
Detoxification Programme, a course of vitamins, supplements, and other
alleged purification techniques, which Scientology offers at disaster
sites like Chernobyl and Ground Zero.) In the seventies, his father,
who died last year, began working in Scientology’s public-relations
wing and over time rose high in the organization. Gaiman has two
younger sisters, both still active in Scientology; one of them works
for the church in Los Angeles, and the other helps run the family
businesses.

 At times, Scientology proved awkward for the Gaiman children.
According to Lizzy Calcioli, the sister who stayed in England, “Most
of our social activities were involved with Scientology or our Jewish
family. It would get very confusing when people would ask my religion
as a kid. I’d say, ‘I’m a Jewish Scientologist.’ ” Gaiman says that he
was blocked from entering a boys’ school because of his father’s
position and had to remain at the school he’d been attending, the only
boy left in a classroom full of girls. These days, Gaiman tends to
avoid questions about his faith, but says he is not a Scientologist.
Like Judaism, Scientology is the religion of his family, and he feels
some solidarity with them. “I will stand with groups when I feel like
they’re being properly persecuted,” he told me."

It is entertaining to read the relevant talk page sections:

* https://secure.wikimedia.org/wikipedia/en/wiki/Talk:Neil_Gaiman#Neil_Gaiman_is_not_a_Scientologist
* https://secure.wikimedia.org/wikipedia/en/wiki/Talk:Neil_Gaiman/Archive1#Scientology
* https://secure.wikimedia.org/wikipedia/en/wiki/Talk:Neil_Gaiman/Archive1#Possible_reference_found

-- 
gwern


Re: [WikiEN-l] Google bows to censorship

2010-01-17 Thread Gwern Branwen
On Sun, Jan 17, 2010 at 12:13 PM, Anthony  wrote:
> On Sun, Jan 17, 2010 at 11:58 AM, Anthony  wrote:
>> If censoring some things (like "the most offensive sorts of racial
>> vilification you could possibly find"), and refusing to censor other things
>> (like an historical account of a pro-democracy demonstration), is hypocrisy,
>> then let me be the first to say that I'm in favor of hypocrisy.

Silly Anthony. Don't you know that China was simply asking Google to
comply with local laws against morals-destroying smut, the propaganda
of life-destroying evil cults, and the subversion of mass-murdering
terrorists?

What's some peculiar racist humor compared with *that*? Strange moral
standards you have there.

> But then, treating one country differently from another country is not
> hypocrisy.  Treating one situation differently from another situation is not
> hypocrisy.  Looking at the relevant part of the Google statement, it was
> this: "We have decided we are no longer willing to continue censoring our
> results on Google.cn, and so over the next few weeks we will be discussing
> with the Chinese government the basis on which we could operate an
> unfiltered search engine within the law, if at all."
> http://googleblog.blogspot.com/2010/01/new-approach-to-china.html
>
> It was a statement specifically about the Chinese government, and about
> results on google.cn.  Google did not claim or even imply that it was
> stopping all censorship altogether.  So I don't see the hypocrisy.

It is, at the very least, inconsistent. One set of rules for the
Chinese (and the world), and another set for the Australians. What
difference is there between the 2 situations that justifies this? If
there is no difference, then it's a plain contradiction. (Oh, you
happen to agree with one and not the other? I see...)

-- 
gwern


Re: [WikiEN-l] Google Books settlement reached

2010-01-17 Thread Gwern Branwen
On Sun, Jan 17, 2010 at 9:59 AM, Carcharoth  wrote:
> On Sun, Jan 17, 2010 at 2:17 PM, Gwern Branwen  wrote:
>> On Sun, Jan 17, 2010 at 3:34 AM, Carcharoth wrote:
>>> It seems a settlement has been reached between Google Books and those
>>> taking action against it. Anyone here know what this means in terms of
>>> what we do and how we use Google Books?
>>>
>>> http://books.google.com/googlebooks/agreement/
>>>
>>> Carcharoth
>>
>> I can't speak to the issues of exclusivity everyone was worrying
>> about, but it seems to be a win for Google Books in respect to orphan
>> works:
>>
>> http://books.google.com/googlebooks/agreement/#3
>>
>>     "# In-copyright but out-of-print books
>>
>>     Out-of-print books aren’t actively being published or sold, so
>> the only way to procure one is to track it down in a library or used
>> bookstore. When this agreement is approved, every out-of-print book
>> that we digitize will become available online for preview and
>> purchase, unless its author or publisher chooses to "turn off" that
>> title. We believe it will be a tremendous boon to the publishing
>> industry to enable authors and publishers to earn money from volumes
>> they might have thought were gone forever from the marketplace."
>>
>> Notice that it's opt-out, which for a real orphan work means no one
>> will opt-out. Preview is better than nothing, for us. (It was
>> unrealistic to expect Google to be able to offer full downloads.)
>
> My question would be: if Google can charge for full downloads of their
> scans of orphan (out-of-print) works still in copyright, can others do
> the same? If I had a copy of a work still in copyright but
> out-of-print, and scanned it, and sold the scans, and then stopped if
> the publisher contacted me (i.e. asked for it to be "switched off") what
> am I doing differently to what Google are doing?
>
> For me, the exciting thing is being able to get full downloads, even
> if at a price, rather than ordering through a rare or secondhand book
> website. Dunno what that Google price will be though...
>
> Carcharoth

I don't think they're offering downloads, per se.

"When this agreement is approved, every out-of-print book that we
digitize will become available online for preview and purchase, unless
its author or publisher chooses to "turn off" that title."

I don't remember ever seeing any downloads offered for any works -
just the dead-tree versions. Which implies that the orphan works will
get the same treatment.

(Still, if they're showing the entire orphan work, that means one can
download it with some scripting.)

-- 
gwern


[WikiEN-l] Google bows to censorship

2010-01-17 Thread Gwern Branwen
> Google has agreed to take down links to a website that promotes racist views 
> of indigenous Australians.
> Aboriginal man Steve Hodder-Watt recently discovered the US-based site by 
> searching "Aboriginal and Encyclopedia" in the search engine.
> He tried to modify the entry on Encyclopedia Dramatica, a satirical and 
> extremely racist version of Wikipedia, but was blocked from doing so.
...
> Mr Newhouse said Google agreed to take the link down after he filed an 
> official complaint to the Australian Human Rights Commission.
> "Lo and behold they agreed last night to take down the sites."

http://www.smh.com.au/technology/technology-news/google-agrees-to-take-down-racist-site-20100115-maxd.html

I'm so torn. On the one hand, the hypocrisy is blinding - filtering
its search results is exactly what Google was doing in China. On the
other hand, it's Encyclopedia Dramatica...

-- 
gwern


Re: [WikiEN-l] Google Books settlement reached

2010-01-17 Thread Gwern Branwen
On Sun, Jan 17, 2010 at 3:34 AM, Carcharoth  wrote:
> It seems a settlement has been reached between Google Books and those
> taking action against it. Anyone here know what this means in terms of
> what we do and how we use Google Books?
>
> http://books.google.com/googlebooks/agreement/
>
> Carcharoth

I can't speak to the issues of exclusivity everyone was worrying
about, but it seems to be a win for Google Books in respect to orphan
works:

http://books.google.com/googlebooks/agreement/#3

 "# In-copyright but out-of-print books

 Out-of-print books aren’t actively being published or sold, so
the only way to procure one is to track it down in a library or used
bookstore. When this agreement is approved, every out-of-print book
that we digitize will become available online for preview and
purchase, unless its author or publisher chooses to "turn off" that
title. We believe it will be a tremendous boon to the publishing
industry to enable authors and publishers to earn money from volumes
they might have thought were gone forever from the marketplace."

Notice that it's opt-out, which for a real orphan work means no one
will opt-out. Preview is better than nothing, for us. (It was
unrealistic to expect Google to be able to offer full downloads.)

-- 
gwern


Re: [WikiEN-l] The story of an article

2010-01-03 Thread Gwern Branwen
On Sun, Jan 3, 2010 at 9:51 AM, altally  wrote:
> On Sun, Jan 3, 2010 at 12:23 AM, Apoc 2400  wrote:
>
>> >
>> > Yes, it's not that difficult to create an account and wait a few days is
>> > it?
>> >
>> You are making the mistake of assuming newcomers think like "addicted"
>> Wikipedians or persistent troublemakers. This user only ever made about 25
>> edits and stayed just over a month, unless he or she got an account or
>> switched IP.
>>
>> This goes in the same category as:
>> "Anyone with good intentions can get through the 6757836 step New Article
>> Wizard, so it's not too complicated"
>> "It doesn't matter if someone gets wrongly blocked, because they can just
>> request to be unblocked."
>>
>
> When I started, I created an account from the beginning. Why? Because it
> wasn't hard to notice the big "Sign in/create account" link in the corner.
> Newbies aren't all clueless idiots. You are making the mistake of assuming
> newcomers all have no idea what they are doing.
>
> --Majorly

Is it enough that *some* users are savvy enough to make an account
right from the start? No one is saying that such barriers would stop
*all* new editors. Even small percentages matter over the long run.

I suspect many old editors are more like me, who made a few anonymous
edits and only eventually committed to an account, than like you.

-- 
gwern


[WikiEN-l] Help Wanted

2009-12-27 Thread Gwern Branwen
Thought this was kind of interesting:

http://www.aaronsw.com/weblog/researcherjob

"I’m looking for a researcher to work with on a couple projects. The
research will mostly be into questions of United States government
policy and the relevant factual basis. For example, you might be asked
to look up things about cap-and-trade legislation and the evidence for
anthropogenic global warming. You can do it part-time. You can work
from anywhere. I think the work will be interesting and I’ll be doing
it too. I think the work will be important, which is why I’m doing it.
The requirements are:

The ideal person is probably someone who’s contributed to Wikipedia."

-- 
gwern


Re: [WikiEN-l] oh dear

2009-12-11 Thread Gwern Branwen
On Fri, Dec 11, 2009 at 9:35 AM, David Gerard  wrote:
> Wikipe-tan on b3ta.
>
> http://www.b3ta.com/board/9830507
>
>
> - d.

Looks like the urine pic is a 'shopped version of one of these:
http://danbooru.donmai.us/post/index?tags=wikipe-tan

-- 
gwern

