Re: [Wiki-research-l] Wikimedia Research Newsletter launched

2011-09-03 Thread Reid Priedhorsky
On 09/02/11 17:21, Asaf Bartov wrote:
> Hi.
>
> I think it desirable to post this on Internal-L as well, at least for
> the first couple of issues, to get this concise information to many of
> our local Wikimedian communities.  I expect there are quite a few people
> who could be interested in this, but currently aren't _aware_ of there
> being anything to be interested in... :)
>
> (Maybe also add an explicit invitation to join the Research community
> and to look at relevant Meta pages.)

Is there an RSS feed for these newsletters? I'd love to keep up to date 
on them, and that's my canonical way of following "stuff to read".

I didn't see one during a brief look at the newsletter web pages.

Reid

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Workshop call for participation: WikiLit: Collecting the Wiki and Wikipedia Literature at WikiSym 2011

2011-09-02 Thread Reid Priedhorsky
On 8/31/11 8:53 PM, Daniel Mietchen wrote:
> Dear Reid and Phoebe,
> 
> I would love to participate, but can't make it to WikiSym. Do you see
> a way to participate online?

Hi Daniel,

Glad to hear of your enthusiasm, and sorry to hear you won't be able to
attend. In terms of remote participation, I have a couple of suggestions.

1. Before the workshop, we'd love to hear any thoughts you might have.
Do you have time to briefly write up problems, solutions, observations,
etc. that you see in this space? If so, you could e-mail those to Phoebe
and myself; I'm sure they would be helpful in guiding the discussion.

2. One of the products of the workshop will be proposals for what to do
moving forward, for the community to consider, develop further, and
perhaps implement. We will publish and announce here. These will
necessarily include a strong, if not exclusive, online component. I
don't know what this will look like, but I'm sure there will be a great
need for participation by folks like yourself.

I think we do not have the infrastructure to offer meaningful remote
live participation during the actual workshop, sadly. We might be able
to do stuff like liveblogging or tweeting. I'll talk with Phoebe.

HTH,

Reid



[Wiki-research-l] Workshop call for participation: WikiLit: Collecting the Wiki and Wikipedia Literature at WikiSym 2011

2011-08-31 Thread Reid Priedhorsky
Hi all,

Phoebe Ayers and I are leading a workshop at WikiSym this year,
"WikiLit: Collecting the Wiki and Wikipedia Literature". We would love
to have your participation!

This workshop has three key goals. First, we will examine existing and
proposed systems for collecting and analyzing the research literature
about wikis. Second, we will discuss the challenges in building such a
system and will engage participants to design a sustainable
collaborative system to achieve this goal. Finally, we will provide a
forum to build upon ongoing wiki community discussions about problems
and opportunities in finding and sharing the wiki research literature.

For more details, please see:

  http://www.wikisym.org/ws2011/workshop:wikilit

Please do not hesitate to ask questions, either by replying here on the
list or by contacting me or Phoebe (psay...@ucdavis.edu) directly.

Looking forward to seeing you at WikiSym!

Reid





Re: [Wiki-research-l] Action plan for AcaWiki hosting change

2011-04-29 Thread Reid Priedhorsky
On 4/26/11 4:45 PM, Jon Phillips wrote:
>
>> First, the current AcaWiki skin is not so good. For the reasons I've
>> mentioned earlier, I believe this negatively impacts our ability to grow the
>> community.
>
> Great, Fabricatorz is making a new theme. We will try to be as
> transparent as possible about this...ideas and thoughts on this super
> valuable!

My take, based on prior discussion, is that the consensus is that we 
just want to use the Vector theme and don't want to mess with designing 
a custom theme and keeping it up to date.

>> The reason I feel urgency is this: I'm itching to start work on the
>> annotated bibliography of wiki research we've been discussing on
>> wiki-research-l (which would quadruple the number of summaries in AcaWiki),
>> and I'm uncomfortable doing so with these two problems unsolved (or, at
>> least, without a concrete plan for solving them and a credible, short
>> timeline).
>
> I understand. I'm on this. Don't you worry :)
>
> Best thing next imo is to file a bug about this and spec out your plan
> and what it will take. Sounds like some development needed. Ideal if
> you can do everything with the current setup and not need ssh access.

See prior e-mails for my proposal. I'm happy to do the needed dev work, 
and no it wouldn't need ssh access.

Reid

-- 
I work for IBM, and sending this e-mail might be part of my job.
However, I speak for myself only, not the company.



Re: [Wiki-research-l] Action plan for AcaWiki hosting change

2011-04-29 Thread Reid Priedhorsky
On 4/28/11 7:33 PM, Erik Moeller wrote:
> Hi Jon,
>
> we're not ready to do this right now -- I suggest either staying with
> CC hosting or transitioning to Referata until we are (if you still
> want to move at that point). I'm happy to restart these conversations
> mid-August as we begin to get more serious about our labs effort. I
> targeted end of 2011 for a migration for a reason -- I do think
> Acawiki needs stable/performant hosting, and it needs to develop a
> real community, before we can move it over, and we need to have our
> ducks in a row to ensure a smooth transition.

Given that, I reinstate my plan to begin the Referata process.

As an example of my worries with the current hosting situation, here are 
a couple of new observations from the past few days:

1. The clock on AcaWiki is wrong by around 5 minutes.

2. There is a Bugzilla bug saying that AcaWiki is running a version of 
SMW with known security vulnerabilities 
(https://bugzilla.wikimedia.org/show_bug.cgi?id=28661). It is a week old 
and has generated no interest from people who can fix it (or even anyone 
in the AW community, for that matter).
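
One way to put a number on the clock observation is to compare the Date header a server returns with a trusted local clock. Below is a minimal sketch in Python, assuming the header has already been fetched. The header value and the local time are made up for illustration and are not measurements of AcaWiki.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(date_header: str, local_now: datetime) -> float:
    """Return server time minus local time, in seconds.

    date_header is the value of an HTTP "Date" response header.
    local_now must be a timezone-aware datetime from a trusted clock.
    """
    server_time = parsedate_to_datetime(date_header)
    return (server_time - local_now).total_seconds()


# Hypothetical values: a server whose clock runs five minutes fast.
example_header = "Fri, 29 Apr 2011 12:05:00 GMT"
example_now = datetime(2011, 4, 29, 12, 0, 0, tzinfo=timezone.utc)
print(clock_skew_seconds(example_header, example_now))
```

A positive result means the server clock is ahead. Network latency adds a little noise, so this is only useful for spotting drift of minutes, not milliseconds.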

These are basic issues, and rather than chasing basic issues like these 
at CC, I would prefer to move somewhere else forthwith and worry about 
higher-level things.

I did receive a private e-mail from Angela Beesley at Wikia encouraging 
us to host there (which I replied to). I still feel this is unworkable 
due to Wikia's poor track record of letting wikis go elsewhere (we need 
to be free to change arbitrarily, since the WMF plan may not work out 
for whatever reason and circumstances may change arbitrarily). Also, I 
feel we have sufficient donation power to avoid ads on the pages. 
However, I'm happy to have this conversation.

Reid

p.s. I see the CC list is growing and growing - I have no idea whether 
any of these people are subscribed to the lists, and so I don't feel 
that I can trim it. I do apologize for duplicate and/or unwanted e-mail, 
and I'll respect any requests to be removed from the CC list.

-- 
I work for IBM, and sending this e-mail might be part of my job.
However, I speak for myself only, not the company.



Re: [Wiki-research-l] Action plan for AcaWiki hosting change

2011-04-26 Thread Reid Priedhorsky
On 4/26/11 4:06 PM, Jon Phillips wrote:
 >
> Hi Reid, thanks for the detailed discussion. Our best course of action
> is to transition to the virtualized setup at wikimedia foundation to
> minimize steps.
>
> There is no reason to transition twice if we can move to WMF
> infrastructure soon. There is no burning fire on CC hosting.

Hi Jon,

Thanks for the reply! An implicit purpose for my proposal was to light a 
fire under people, and that seems to have been successful.

I see two problems with the current setup which feel urgent to me.

First, the current AcaWiki skin is not so good. For the reasons I've 
mentioned earlier, I believe this negatively impacts our ability to grow 
the community.

Second, I am concerned about the reliability of the current hosting. For 
one, people have had difficulty getting problems resolved recently. This 
concern is shared with others - I know at least one person who is making 
regular dumps of AcaWiki because he does not have confidence in the 
backup plan (or, there isn't one).

Moving from CC to an intermediate provider would solve both these 
problems. However, I agree that one move is better than two - I'm happy 
to discuss other solutions as well.

The reason I feel urgency is this: I'm itching to start work on the 
annotated bibliography of wiki research we've been discussing on 
wiki-research-l (which would quadruple the number of summaries in 
AcaWiki), and I'm uncomfortable doing so with these two problems 
unsolved (or, at least, without a concrete plan for solving them and a 
credible, short timeline).

One scenario I would like to avoid is the move to WMF being "real soon 
now" for a long time, as that's a common issue with such moves. But, we 
could be on Referata within a few weeks easily.

Reid

-- 
I work for IBM, and sending this e-mail might be part of my job.
However, I speak for myself only, not the company.



[Wiki-research-l] Action plan for AcaWiki hosting change (was: Proposal: new hosting for AcaWiki)

2011-04-26 Thread Reid Priedhorsky
Hi folks,

I've been on vacation for the past couple of weeks, so here is a 
catch-all response to the apparent list activity while I've been gone. I 
have *not* edited the CC list - sorry for any duplicates.

* I agree that a medium- to long-term goal of being folded into WMF is 
an excellent one, and that the short-term strategy of transitioning from 
Creative Commons hosting to a dedicated MediaWiki provider is the best 
path to that goal.

* I do think that Semantic MediaWiki rather than plain MediaWiki is very 
important because of the type of use researchers are interested in; 
e.g., we need to be able to do searches like "author = X" and get 
citations where the author is X rather than all citations containing X 
anywhere in the summary. My understanding is that SMW can do this but MW 
cannot. Thus, IMO, waiting longer for WMF migration in order to continue 
to use SMW is the better choice.

* I am not very interested in being an early-stage tester for an immature 
hosting structure - having reliable hosting is much more important to me 
(and, I claim, most researchers) than speeding development of WMF labs 
infrastructure, particularly if WMF already has a development plan that 
they're happy with. However, if WMF really needs the testing, I am happy 
to have that discussion.

* Regarding Referata vs. Fabricatorz, my take is that Referata would be 
preferable because they are a dedicated SMW host while Fabricatorz 
appears not to be. Referata would be shared hosting, but IMO that's fine.
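
To make the "author = X" point above concrete, here is a small in-memory sketch in Python of the difference between a search on a structured field and a search over all text. It is not Semantic MediaWiki's query syntax or implementation; the Citation class, its field names, and the sample records are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Citation:
    title: str
    authors: List[str]
    summary: str


def full_text_search(citations: List[Citation], term: str) -> List[Citation]:
    """Match the term anywhere in the record, as a plain text search would."""
    term = term.lower()
    return [
        c for c in citations
        if term in c.title.lower()
        or term in c.summary.lower()
        or any(term in a.lower() for a in c.authors)
    ]


def author_search(citations: List[Citation], name: str) -> List[Citation]:
    """Match only the structured author field, i.e. "author = X"."""
    name = name.lower()
    return [c for c in citations if any(name == a.lower() for a in c.authors)]


# Hypothetical records: only the first one is actually written by Smith.
records = [
    Citation("Paper A", ["Smith"], "A study of wiki editing."),
    Citation("Paper B", ["Jones"], "Builds on earlier work by Smith."),
]
print([c.title for c in full_text_search(records, "Smith")])
print([c.title for c in author_search(records, "Smith")])
```

The full-text search returns both papers because Smith is mentioned in the second summary, while the field search returns only the paper Smith wrote.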

Given all that, I propose to do the following. And by propose, I mean "I 
will start doing this unless people complain". :)

1. Open a free account at Referata, which gets us acawiki.referata.com.
2. Work with AcaWiki people to get an importable dump, containing 
existing AcaWiki accounts, taking appropriate steps to ensure privacy of 
account holders.
3. Import that into acawiki.referata.com
4. Have the community prod/test the new site; iterate until we have 
consensus that moving to Referata is right and the migration went OK.
5. Work with AcaWiki people to redirect the acawiki.org domain name, 
figure out the financials, and make the final cut-over.

No data would be lost in the transition - people will still be able to 
log in using existing accounts, and all history will be preserved. The 
only change people not paying attention to this discussion would notice 
is some period of read-only and a change in skin.
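
The claim that all history is preserved can be spot-checked by comparing a dump taken before the move with one taken after the import. Below is a minimal sketch in Python, assuming both dumps are MediaWiki XML exports with page elements that each contain a title and one revision element per edit. The function names and the tiny sample dumps are hypothetical, and the check covers page history only, not user accounts.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def revision_counts(dump_xml: str) -> dict:
    """Map each page title in the dump to its number of revisions."""
    counts = {}
    root = ET.fromstring(dump_xml)
    for page in root:
        if local_name(page.tag) != "page":
            continue
        title = None
        revisions = 0
        for child in page:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "revision":
                revisions += 1
        counts[title] = revisions
    return counts


def missing_history(before: dict, after: dict) -> list:
    """List pages that vanished or lost revisions between the two dumps."""
    return sorted(t for t, n in before.items() if after.get(t, 0) < n)


# Hypothetical dumps: page B did not survive the import.
BEFORE = """<mediawiki>
  <page><title>A</title><revision/><revision/></page>
  <page><title>B</title><revision/></page>
</mediawiki>"""
AFTER = """<mediawiki>
  <page><title>A</title><revision/><revision/></page>
</mediawiki>"""
print(missing_history(revision_counts(BEFORE), revision_counts(AFTER)))
```

An empty list would mean every page in the old dump is present in the new one with at least as many revisions.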

So the basic steps are: (a) change hosts, (b) become even more awesome, 
and (c) get "bought out" by WMF.

Please reply if anything above seems like a bad idea, if I've missed 
anything, or if you have any other thoughts. I'll start doing stuff 
within a few days unless there are objections or people want to discuss 
further.

Reid

-- 
I work for IBM, and sending this e-mail might be part of my job.
However, I speak for myself only, not the company.



Re: [Wiki-research-l] [acawiki-general] Proposal: new hosting for AcaWiki

2011-04-02 Thread Reid Priedhorsky
On 4/2/11 5:59 AM, Jodi Schneider wrote:
>
> Yes--keeping the domain name is important. Otherwise, we break all
> links, and alienate existing users -- many of whom do not read these
> lists, and who may check the site infrequently. Since we're a nonprofit,
> we should ask about a discounted price.

OK, that makes sense. I am happy to make the inquiry later in the month, 
or someone else could do it sooner.

> Further, we might want to change hosting again sometime in the future
> (for instance if Referata went away or significantly changed).
>
> I see that "Referata offers hosting of semantic wikis" but I hadn't
> heard of it before, though WikiWorks is well-known. What's your
> connection with Referata, and how stable are they? It appears that
> hosting is funded by the fees, with the free hosting just coming along
> for the ride...

No connection. I suggest it for three reasons:

1. It's a concrete proposal for folks to respond to.

2. Yaron is apparently one of the principals of Semantic MediaWiki and 
thus invested in the MW/SMW community.

3. The only alternative I found in some brief searching was Wikia, which 
seemed less desirable because leaving Wikia seems to be hard (people 
report that they leave your wiki up even if you move somewhere else and 
ask for it to be turned off) and because they would put ads on it.

I'm happy to chip in on funding if needed.

> No--the existing skin needs improvement. Is there info about the default
> Referata skin?

I did not check. They do seem to have the Vector skin available (that's 
the new one that Wikipedia uses, right?).

> It's great to have your offer of help for the transition. But one
> challenge is ongoing technical leadership. I'd like some clarification
> from Referata about what is included in hosting. Any volunteers for
> technical administration would be welcome, too!

Does this answer your questions?

http://referata.com/wiki/Referata:Features

I've proposed the "Ad-supported" ($50/mo list) and "Enterprise" ($80/mo) 
service levels. My take on ad-supported means that we could put our own 
ads on it if we want, but they don't put any of theirs, but that's 
something to clarify certainly.

I'm happy to be on top of technical issues and liaise with the vendor on 
getting stuff fixed, on a best effort basis (i.e., no guarantees of 
response time, no action if I'm on vacation, etc.). I have no interest 
in hacking LocalSettings.php, etc., but it looks like the vendor would 
take care of that.

I'm also happy to do the import/export tasks I proposed earlier.

Reid



Re: [Wiki-research-l] Proposal: new hosting for AcaWiki

2011-04-01 Thread Reid Priedhorsky
[I am putting the most interesting stuff at the beginning, but there are 
some responses in-line below too.]

I spoke with Mako Hill, one of the principals at AcaWiki, last week and 
am now quite enthusiastic about the system. I believe the flaws can be 
fixed and we should take advantage of the small but present community 
which is hungry for new members.

The key problem is hosting, and Mako shares this concern. It turns out 
that the current hosts (Creative Commons) also would like to transfer 
hosting somewhere else, as running a MediaWiki is not part of their core 
mission. So, our interests align.

Thus, I propose that we move AcaWiki hosting to Referata, retaining all 
existing history and user accounts. If acawiki.referata.com is OK, then 
it can be done for free; if keeping the acawiki.org domain is important 
(in this case, the move would be completely transparent to users), then 
we'll need to find $50/mo; if keeping the existing skin is important 
(which I think it is not - see below), then $80/mo.

AcaWiki folks, what do you think about this? Have I mischaracterized 
anything about you above?

I would be happy to lead this process (Mako gave me some technical 
people at CC to talk to) and could start on this in late April. (I have 
some unavoidable responsibilities in the next few weeks and won't have 
time until then.)

Once this is done, I could then facilitate the other technical tasks 
which I offered to do (uploading existing stuff into AcaWiki).

>> I don't necessarily believe that we need it to be the standard MW look
>> in all respects (though I personally like the consistency), but the wiki
>> controls need to be consistent with other MW installs (most importantly,
>> Wikipedia) so people can see easily that it's a wiki and in particular
>> one they've used before.
 >
> Actually, the controls seem to me to be quite similar to the standard
> Wikipedia layout. For example, look at
> http://acawiki.org/Measuring_user_influence_in_Twitter:_The_million_follower_fallacy.
> The page edit controls are on the top of the article, and the navigation
> bar is on the left, all very similar to Wikipedia. Since these key
> functional elements are very similar to the default, I assumed that your
> comments had more to do with the aesthetic elements. Could you perhaps
> point out some specific differences in the core MediaWiki functionality
> elements that you think might confuse new users who are familiar with
> editing Wikipedia?

Hmm, looking again you are right. I'm not sure exactly what happened; 
perhaps I was confusing AcaWiki with something else.

Anyway, I still don't like the AcaWiki default skin. I could provide a 
specific critique of the problems I see, but it might be better to 
simply offer a better one for comparison, which I am happy to do. At a 
high level, it's a little sloppy, it wastes important vertical space, 
and standard elements (e.g., search, login) are in nonstandard 
locations. On the other hand, the default MW skin is very professional 
looking and gets these things right. It's another aspect of separation 
of responsibilities - let people who are good at web design design the 
pages.

> Actually, another reason for my comments is that I would assume that the
> core audience of contributors (academic researchers who are willing to
> share their research summaries online) would not have trouble trying to
> learn how to edit, even if AcaWiki used something other than MediaWiki.

That is true; however, many won't. Barriers to entry matter a lot more 
than one might think, even small ones. The basic theory is, folks who 
are new to a system don't care much about it and are easy to drive away 
by making small mistakes. On the other hand, if their initial experience 
is smooth and pleasant, and enables microcontributions right away, that 
builds emotional investment in the community and those people are more 
likely to come back and help build the community and the resource. 
Researchers in particular are very busy and (I claim) will have less 
patience than average to hassle with bad systems.

Reid

-- 
I work for IBM, and sending this e-mail might be part of my job.
However, I speak for myself only, not the company.



Re: [Wiki-research-l] Proposal: build a wiki literature review wiki-style

2011-03-24 Thread Reid Priedhorsky
On 3/23/11 2:56 PM, Chitu Okoli wrote:
>
> Actually, I feel grad student summaries are an excellent contribution.
> Although they might not be perfect, grad student seminar assignments
> would probably be the largest single source of stub articles. Multiply
> that by every academic field that requires grad students to summarize
> articles, and I think promoting the wiki as an outlet for grad student
> work would be the single most effective strategy to make it huge in just
> one or two years. I, for one, very much see grad students as a major
> contributing community.

Oh, I definitely agree that grad student contributions are tremendously 
valuable! (especially having been one until very recently)

My point was this: that writing for a lay audience and writing for 
fellow researchers (grad students included) are different tasks, and 
mixing them leads to reduced value for each audience.

I am fine with each paper having a "for laypeople" and "for researchers" 
section to the summary.

>> Right; what I meant was that while AW does use MW it doesn't *look like*
>> it does, and that's a barrier to entry, which matters. The default skin
>> needs to look more like default MediaWiki.
 >
> Actually, I don't agree with Reid on this point. Appearance is very much
> a subjective issue. Here's my purely subjective opinion:
> * I find it irritating that hundreds or thousands of MediaWiki instances
> all look like Wikipedia, as if MediaWiki didn't have any skinning
> flexibility. (I'm assuming that when Reid says "look like the default
> MediaWiki", what he effectively means is "look like Wikipedia"; Reid,
> please correct me if I'm misunderstanding you.)
> * I like the AcaWiki interface; I wouldn't want to change it to look
> like Wikipedia.
>
> Less subjectively, I don't think that the appearance is a significant
> barrier to entry. Saying, "It works just like Wikipedia" should do the
> job fine to communicate the familiarity of the wiki language.

My concern is less with aesthetics than what the interface looks like it 
does (the "apparent affordances" to use some jargon). As an analogy, I'm 
sure many of you have encountered Java and Flash applications which have 
all the same GUI widgets (buttons, scroll bars, etc.) as native OS apps, 
but they look slightly different. Obviously one can overcome the 
differences, but unfamiliarity makes the apps harder to use and turns 
off newbies (or even experienced people who are sick of the 
"specialness"). (Kai's Power Tools is a classic offender in this regard 
- where are the controls in this screen shot and how do you use them? 
http://en.wikipedia.org/wiki/File:Kai%27s_Power_Tools.jpg).

I could certainly be wrong, but this is professional rather than 
personal opinion, as someone with an HCI education. Sorry for the lack 
of citations. I do agree that aesthetics is to some degree subjective.

I don't necessarily believe that we need it to be the standard MW look 
in all respects (though I personally like the consistency), but the wiki 
controls need to be consistent with other MW installs (most importantly, 
Wikipedia) so people can see easily that it's a wiki and in particular 
one they've used before.

Reid

-- 
I work for IBM, and sending this e-mail might be part of my job.
However, I speak for myself only, not the company.



Re: [Wiki-research-l] Proposal: build a wiki literature review wiki-style

2011-03-23 Thread Reid Priedhorsky
On 3/23/11 1:16 PM, Samuel Klein wrote:
>
> You could allow each biblio page to decide who its audience is.  If
> there is ever a conflict between a lay and a specialist audience, you
> can have two sets of annotations.  I'd like to see this happen in
> practice before optimizing against it.

I think that is workable if the two sides don't step on each other's 
toes too much. I am also coming around to the view that we should just 
try it and see what happens.

>> * It doesn't look like a MediaWiki. Since the MW software is so
>
> This is easy to fix -- people who like the current acawiki look can
> use their own skin.

Well, my concern is for newcomers who by definition don't have a skin 
configured. What I want is this reaction:

http://acawiki.org/whatever
"Hey! This is MediaWiki! I know how to use this!"


But now I think the following reaction is more likely:

http://acawiki.org/whatever
"Hmmm, what's this?"



These small barriers to entry matter. My basic argument is that 
leveraging familiarity by making it look like something people have seen 
before is more important than branding.

Reid

-- 
I work for IBM, and sending this e-mail might be part of my job.
However, I speak for myself only, not the company.



Re: [Wiki-research-l] Proposal: build a wiki literature review wiki-style

2011-03-23 Thread Reid Priedhorsky
Jodi,

Great to have you chip in so quickly.

I am replying only on wiki-research-l to keep the thread from splitting 
too much.

>> I'm not ready to write off AcaWiki, but I have a number of significant
>> concerns. Some of these I've mentioned before. I'd really like someone
>> from that project to comment on these.
>>
>> * Is the project dead? The mailing list is pretty much empty and the
>> amount of real editing activity in the past 30 days is pretty low.
>
> Definitely not dead!

OK, your quick response is a great start.

However, do you have thoughts on the lack of mailing list activity and 
very low level of edits? Is there a real community around AcaWiki or 
just the desire for one?

>> * It appears that the project self-hosts - this means that the project
>> has to do its own sysadmin work,
>
> Neeru & Mike, can you comment on who's doing sysadmin work now?

My point here is: I would like to depend on pros for the sysadmin work, 
rather than volunteers, because there's no need for us to be sysadmins. 
Let the experts be expert on what they're expert in and all that.

Bottom line: right now I'm not persuaded that the AcaWiki hosting 
situation is stable. The key example is letting the domain expire and 
the apparent lack of access to someone who can fix it (see 
http://lists.ibiblio.org/pipermail/acawiki-general/2011-March/21.html and 
http://code.creativecommons.org/issues/msg2778).

> The main interest, from my perspective (others may be able to add their
> own), is in making research more accessible. Several AcaWiki users are
> grad students who are writing summaries in order to consolidate their
> own knowledge or prepare for qualifier exams.

OK, that's somewhat different than the goals being proposed in this thread.

I think that's a problem, but perhaps a surmountable one if different 
communities can have different standards for their papers. We (or I) 
need to be able to focus on writing "summaries" aimed at other 
researchers; if someone else wants to come along and add additional 
summaries for laypeople, that's fine. But (for example) if other people 
start rewriting our lit review text because it's too technical, I don't 
think it will work out.

>> * I don't think the focus on "summaries" is right. I think we need a
>> structured infobox plus semi-structured text (e.g. sections for
>> contributions, evidence, weaknesses, questions).
>
> I agree! Right now there's some structured information, but that could
> be readily changed. I'm definitely open to restructuring AcaWiki, so do
> propose this on the mailing list (acawiki-gene...@lists.ibiblio.org),
> and we can discuss further.

Great!

Is there a sandbox where I can experiment (e.g., as in Wikipedia user 
subpages)?

I don't want a lengthy discussion on a mailing list about what the 
content of the infobox should be, nor agree across the entire set of 
disciplines - I'd like to just build one and then iterate with my own 
community until we agree it's good enough to start (in this case, the 
people who want to build a wiki research lit review).

> One ongoing issue is the best way to handle bibliographic
> information--which has subtle complexities which we're only partly
> handling now.

I'd be curious to learn more, though I'll defer that discussion. At a 
high level, a key concern I have is the perfect becoming the enemy of 
the good. (For example: dealing with two authors both named John Smith.) 
I do agree that a big flaw of using (S)MW for this project is the lack 
of any way to build a structured data model, unless I'm missing big parts 
of SMW. (RDF triples aren't enough.)

To be clear, what I'm interested in (for now) is not solving these 
problems but accepting a reasonably good but imperfect platform, which 
SMW is, and moving forward with the wiki research survey work.

I do have interest in building a better platform, but in the future.

>> * It doesn't look like a MediaWiki. Since the MW software is so
>> dominant, that means pretty much everyone who knows about editing wikis
>> knows how to use MW - and not looking like MW means there's no immediate
>> "aha! I can edit this". There's a lot of value in familiarity.
>
> Actually, AcaWiki uses MediaWiki -- specifically Semantic Media Wiki.

Right; what I meant was that while AW does use MW it doesn't *look like* 
it does, and that's a barrier to entry, which matters. The default skin 
needs to look more like default MediaWiki.

Reid



Re: [Wiki-research-l] Proposal: build a wiki literature review wiki-style

2011-03-23 Thread Reid Priedhorsky
On 3/22/11 4:28 PM, Chitu Okoli wrote:
 >
 > Reid wrote:
>>
>> There also appear to be various options for Semantic MediaWiki hosting:
>> Wikia, Referata, etc. It would be nice to not have to deal with the
>> sysadmin aspects of the project.
 >
> I agree that going with a reliable host would be the way to go. I think
> that for the nature of our project, choosing a paid Referata plan would
> probably be better than going for Wikia. I for one could probably easily
> find grant funding to keep it going.

Sure. If nothing else I'd be happy to chip in personally. I could also 
ask around for funding here at IBM, but I'm quite pessimistic on that.

Paid plans run from $240 to $960/year, and we could certainly get 
started for free (http://www.referata.com/wiki/Referata:Features).

I'm not ready to write off AcaWiki, but I have a number of significant 
concerns. Some of these I've mentioned before. I'd really like someone 
from that project to comment on these.

* Is the project dead? The mailing list is pretty much empty and the 
amount of real editing activity in the past 30 days is pretty low.

* It appears that the project self-hosts - this means that the project 
has to do its own sysadmin work, which appears to have been a problem 
(e.g., the domain expired earlier this month and no one noticed until 
the site went down!).

* Is the target audience correct? I think we want to specifically target 
our annotated bibliography to researchers, but AcaWiki appears to be 
targeting laypeople as well as researchers (and IMO it would be very 
tricky to do both well).

* I don't think the focus on "summaries" is right. I think we need a 
structured infobox plus semi-structured text (e.g. sections for 
contributions, evidence, weaknesses, questions).

* It doesn't look like a MediaWiki. Since the MW software is so 
dominant, that means pretty much everyone who knows about editing wikis 
knows how to use MW - and not looking like MW means there's no immediate 
"aha! I can edit this". There's a lot of value in familiarity.

I will post an invitation on the AcaWiki mailing list to come here and 
participate.

>> One final note on bibliographic software: many of these claim to do
>> automatic import of a reference simply by pointing the software at the
>> publisher's web page for the references. But I have never seen this work
>> correctly; always, the imported data needs significant cleanup, enough
>> that personally I'd rather type it in manually anyway. For example,
>> titles of ACM papers aren't even correctly cased on the official ACM
>> pages (e.g.,http://dx.doi.org/10.1145/1753326.1753615)!
 >
> My only experience with "scraping" pages is with Zotero, and it does it
> beautifully. I assume (but don't know) that the current generation of
> other bibliography software would also do a good job. Anyway, Zotero has
> a huge support community, and scrapers for major sources (including
> Google Scholar for articles and Amazon for books) are kept very well up
> to date for the most part.

Perhaps I'm just unlucky, then - I've only ever tried it on ACM papers 
(which it failed to do well, so I stopped).

>> Bi-directional synchronization is hard to get right, particularly when
>> the two sides have different data models. I think we are much
>> better off declaring one or the other to be the master and the rest
>> should remain read-only (i.e. export rather than synchronization).
 >
> I like this idea; with SMW as the primary, editable source, a read-only
> Zotero library imported from the SMW would work well. The problem,
> though, is that duplicate detection would need to prevent imports from
> adding existing articles. A complete overwrite would not work, since
> this would break article IDs for word processor integration. Zotero has
> been slow on implementing duplicate detection, but they finally have a
> very impressive solution in alpha
> (http://www.zotero.org/blog/new-release-multilingual-zotero-with-duplicates-detection/).

I don't know anything about how article IDs work in Zotero, but how to 
build a unique ID for each is an interesting, subtle, and important 
problem. Others have suggested using opaque IDs such as DOI. I think 
this is a mistake, because it means that they are utterly meaningless to 
people when creating citations. For example, consider the following two 
citations that I might put in my LaTeX code.

\cite{10.1145/1753326.1753615}
\cite{Panciera2010Lurking}

The first means nothing to me, but the second is a useful reminder as to 
the paper I'm citing. That's what CiteULike does, and it's built from 
first author, year, first meaningful word of title. In the tiny 
percentage of cases where this is not unique, a disambiguation digit 
could be added.
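
The key scheme described above can be sketched roughly as follows. This is 
only an illustration, not CiteULike's actual algorithm: the stop-word list, 
the function name make_key, and the sample titles are all made up for the 
example.

```python
import re

# Words skipped when picking the "first meaningful word" of a title.
# This stop list is an assumption for illustration only.
STOP_WORDS = {"a", "an", "the", "on", "of", "in", "for", "and", "to", "is"}

def make_key(first_author_surname, year, title, existing):
    """Build AuthorYearWord; append a digit if the key is already taken."""
    words = re.findall(r"[A-Za-z]+", title)
    meaningful = next((w for w in words if w.lower() not in STOP_WORDS), "")
    base = f"{first_author_surname}{year}{meaningful.capitalize()}"
    key, n = base, 1
    while key in existing:
        n += 1
        key = f"{base}{n}"  # disambiguation digit for the rare collision
    existing.add(key)
    return key

keys = set()
# Hypothetical titles, chosen only to exercise the collision path.
first = make_key("Panciera", 2010, "Lurking in a hypothetical wiki", keys)
second = make_key("Panciera", 2010, "Lurking revisited", keys)
```

Here the first key comes out in the Panciera2010Lurking style and the 
second gets a trailing digit, so the readable part survives a collision.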

I don't know how citation works in Word et al., but I would hope you're 
not stuck with opaque numeric IDs and/or that Zotero doesn't force you 
to use integers or something like that.

Reid

___
Wiki-research

Re: [Wiki-research-l] Proposal: build a wiki literature review wiki-style

2011-03-22 Thread Reid Priedhorsky
Hi Chitu,

Some reactions inline.

> One of my most important functionalities would be automatic citations
> into papers that I'm working with. I haven't used a wide variety of
> citation managers, but the functionality in EndNote and Zotero is
> what I'm talking about; I just don't see how a MediaWiki instance
> could do that, unless some standardized bibliographic information be
> embedded into each article page to begin with.

Agreed. What I envision is that we write a script which would export the
bibliographic data into whatever formats people prefer: BibTeX (my own
preferred format), Zotero, EndNote, whatever. I'm happy to write this
script for relatively sane formats. This would then let people create
citations in the way they usually do.
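
To give a feel for how small such an export script could be, here is a 
minimal sketch of the BibTeX side. It assumes each paper is already 
available as a plain dictionary of fields; how those fields get pulled out 
of the wiki is not shown, and the function name and sample record are 
illustrative only.

```python
def to_bibtex(key, entry_type, fields):
    """Render one record as a BibTeX entry; fields is a dict of strings."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)

# Sample record; the key and field layout are assumptions for illustration.
entry = to_bibtex(
    "Priedhorsky2007Creating",
    "inproceedings",
    {
        "author": "Reid Priedhorsky and others",
        "title": "Creating, Destroying, and Restoring Value in Wikipedia",
        "booktitle": "Proc. GROUP 2007",
        "year": "2007",
    },
)
```

Other text-based formats would get their own small rendering function 
over the same dictionary.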

> Moreover, as Dario and Felipe explained, while far from perfect, the
> search capabilities of dedicated bibliography managers is far
> superior to what I presently see in MediaWiki.

(I don't know the details of MediaWiki search well, so some of the
following may not be quite right.) What MediaWiki would give us is
fulltext search. So it would be easy to search for "John Smith", and
that query would find papers authored by John Smith plus perhaps other
stuff; however, one cannot search for "author = John Smith" and get only
results where the author field matches John Smith and no others.

However, it does seem like Semantic MediaWiki has this type of search
and otherwise behaves much like plain MediaWiki.
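
The difference between the two kinds of search can be shown with a toy
example. This does not use MediaWiki or Semantic MediaWiki at all; the
records and function names are invented purely to illustrate fulltext
versus field-restricted matching.

```python
# Two made-up records: one written by John Smith, one merely mentioning him.
records = [
    {"title": "Wiki editing patterns", "author": "John Smith",
     "abstract": "A study of edits."},
    {"title": "A reply to John Smith", "author": "Jane Doe",
     "abstract": "We revisit earlier claims."},
]

def fulltext_search(records, query):
    """Match the query anywhere in any field of a record."""
    q = query.lower()
    return [r for r in records if any(q in v.lower() for v in r.values())]

def field_search(records, field, value):
    """Match only records whose named field equals the value."""
    return [r for r in records if r.get(field, "").lower() == value.lower()]

fulltext_hits = fulltext_search(records, "John Smith")        # both records
author_hits = field_search(records, "author", "John Smith")   # first record only
```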

Maybe you or others could say more about what types of search are
important to you?

AcaWiki uses Semantic MediaWiki, I believe. However, I have some
reservations about it:
* Is it sufficiently stable? (e.g., the FAQ says they "just launched"
but the page has not been edited in two years.)
* The focus on "summaries" worries me, and the target audience is
laypeople? We're talking about an annotated bibliography targeted at
researchers, which is a different audience.
* I don't care for the user interface (this is a mix of personal opinion
and professional opinion as an HCI researcher).

As-is, I'm not very interested in AcaWiki. But, if there is an
opportunity to make significant changes, then it seems plausible. I would
want to know about hosting, backups, etc. to make sure that it is a
reliable platform.

There also appear to be various options for Semantic MediaWiki hosting:
Wikia, Referata, etc. It would be nice to not have to deal with the
sysadmin aspects of the project.

One final note on bibliographic software: many of these claim to do
automatic import of a reference simply by pointing the software at the
publisher's web page for the references. But I have never seen this work
correctly; always, the imported data needs significant cleanup, enough
that personally I'd rather type it in manually anyway. For example,
titles of ACM papers aren't even correctly cased on the official ACM
pages (e.g., http://dx.doi.org/10.1145/1753326.1753615)!

Bibliographic software then also typically does not include the proper 
metadata for automatically lower-casing titles in citations. For 
example, the title "Path Selection: Novel Interaction Technique for 
Wikipedia" should be lower-cased as "Path selection: Novel interaction 
technique for Wikipedia". But so often I see papers with "Path 
selection: novel interaction technique for wikipedia". It's embarrassing.

But, if we were writing our own (e.g.) MediaWiki -> BibTeX export
script, we could automatically note that "Novel" should be capitalized
(because it begins the subtitle) as well as provide for people to
indicate explicitly title words that should remain capitalized. (In this
instance, the proper BibTeX export syntax would be "Path Selection:
{Novel} Interaction Technique for {Wikipedia}".)
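
A minimal sketch of that capitalization rule, assuming the exporter is
handed the title plus an explicit list of words to keep capitalized (the
function name and the keep parameter are hypothetical):

```python
def protect_title(title, keep=()):
    """Brace words whose capitalization BibTeX styles must not change."""
    keep = set(keep)
    out = []
    after_colon = False
    for word in title.split():
        bare = word.strip(".,;:!?")
        # Protect the first word of a subtitle and any explicitly listed word.
        if bare and (after_colon or bare in keep):
            word = word.replace(bare, "{" + bare + "}", 1)
        out.append(word)
        after_colon = word.endswith(":")
    return " ".join(out)

protected = protect_title(
    "Path Selection: Novel Interaction Technique for Wikipedia",
    keep=["Wikipedia"],
)
```

For this title the result is the same string given above as the proper
export syntax. The first word of the title is left unbraced because
BibTeX styles keep its initial capital anyway.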

> Would it be feasible to have both, and use them concurrently so that
>  researchers could use one or the other, or both, as they prefer? I'm
>  thinking of something like this (for purpose of illustration, let's
> call the chosen MediaWiki instance MW and the chosen dedicated online
> shared bibliographic tool BT):

Bi-directional synchronization is hard to get right, particularly when 
the two sides have different data models. I think we are much
better off declaring one or the other to be the master and the rest
should remain read-only (i.e. export rather than synchronization).

To be clear, I'm offering to do the following things:

1. Help define a reasonable starting summary template for papers.
2. Build the proper MediaWiki infoboxes and whatnot to realize what we
decide from #1 (perhaps concurrently to that discussion, to facilitate it).
3. Write a script to import some sane text-based format to this
MediaWiki instance. (I assume Zotero can export such a format.) This
would be run once or a few times for initial import, not regularly for
synchronization.
4. Write a script to export the MediaWiki data to BibTeX and one or two
other sane text-based formats, and arrange for it to be run frequently
so people's paper ci

Re: [Wiki-research-l] Fwd: Proposal: build a wiki literature review wiki-style

2011-03-21 Thread Reid Priedhorsky
On 3/18/11 12:30 PM, Dario Taraborelli wrote:
>
> There are excellent free and standards-based services out there
> designed precisely to allow groups of researchers to collaboratively
> import, maintain and annotate scholarly references. Zotero is one of
> them, others are: CiteULike, Bibsonomy, Mendeley, Connotea. My
> feeling is that the majority of people on this list are already using
> one of these services to maintain their individual reference
> library.

My take on this software is different: all of the ones I've tried are 
really rather bad.

* Zotero - software to install, clunky UI.
* CiteULike - clunky, sharing is hard, weird duplication of publications.
* Bibdex - seems to be run by a private company which is one guy, no 
blog activity since April 2010, login required.
* Mendeley - non-free software to install, clunky?
* Bibsonomy - couldn't figure out how to use it, lots of bibliographic 
database noise in the interface that gets in the way
* Connotea - run by a private company, login required (I didn't create a 
login so I don't know if the UI is any good), API seems limited???

I for one do not use any of these. It's either a cobbled-together BibTeX 
document or my own Yabman software, which has a lot of flaws but is at 
least fast for putting together a paper's ref list and getting a 
decently formatted BibTeX file.

The main benefit of doing it with Mediawiki is that it has a nice clean 
interface and it's super easy to get started - just go to the website 
and edit. No login required, nothing to install, no software to learn 
(other than a very basic knowledge of wiki markup). We know this is a 
big reason Wikipedia is successful, and that barriers to entry, even if 
small, really discourage people from getting started, and if they don't 
get started they don't develop into core contributors.

There is also a rich ecosystem of support software (e.g. It's All Text 
extension and Emacs wikipedia-mode). Bottom line, we're asking people to 
commit to spending whole days of work in the system. Would I do that in 
MediaWiki? Yes, definitely. Any of the other bibliographic software 
mentioned above? No.

I would be more than happy to use something other than Mediawiki, but 
thus far nothing that seems acceptable to me has been suggested.

Others in this thread have mentioned projects similar to what I suggest:

* AcaWiki - This is similar to what I suggest, though the template used 
for papers needs work IMO. Could be a plausible starting point. The fact 
that it doesn't look like regular Mediawiki is a drawback.

* BredeWiki - Very much along the lines of what I suggest.

I think a key goal here is to not let the perfect become the enemy of 
the good. We can start a Mediawiki-based bibliography *now* and easily 
mold it into something which meets our needs quite well. If we want to 
add on fancy connectors later, that's fine; but IMO simple exporters 
would be plenty for most uses.

Reid



[Wiki-research-l] Proposal: build a wiki literature review wiki-style (was: Re: Wikipedia literature review - include or exclude conference articles)

2011-03-16 Thread Reid Priedhorsky
Chitu and others,

I too see great need for a comprehensive survey paper in this field. My 
own personal interest is in one that covers wiki research in general, 
not just research of Wikipedia; this of course makes the intractable 
number of papers even more intractable.

In fact, I am involved with a team of researchers with the same goal as 
you, though we are just getting started.

It seems to me that you are in a very difficult position. As others have 
noted, the scoping filter you propose is not a good one, but the number 
of papers is simply intractable without a very aggressive filter that 
excludes 2/3 or more of the known papers. (To further complicate the 
issue, I am skeptical of machine filtering period, fearing that any 
useful filter would necessarily be complex and difficult to justify in a 
writeup.)

However, I believe that there is a solution, and that is to dramatically 
increase the team size by doing the analysis wiki style. Rather than a 
small team creating the review, do it in public with an open set of 
contributors. Specifically, I propose:

1. Create a public Mediawiki instance.
2. Decide on a relatively standardized format of reviewing each paper 
(metadata formats, an infobox, how to write reviews of each, etc.)
3. Upload your existing Zotero database into this new wiki (I would be 
happy to write a script to do this).
4. Proceed with paper readings, with the goal that every single paper is 
looked at by human eyes.
5. Use this content to produce one or more review articles.

The goals of the effort would be threefold.

* Create an annotated bibliography of wiki research that is easy to keep 
up to date.
* Identify the N most important papers for more focused study and 
synthesis (perhaps leading towards more than one survey article).
* Provide metadata on the complete set of papers so that it can be 
described statistically.

Simply put, I believe that we as modern researchers need to be able to 
build survey articles which analyze 2,000-5,000 or more papers, and 
maybe this is a way to do that.

I and the other members of my team have already planned significant time 
towards this effort and would be very excited to join forces to lead 
such a mass collaboration.

Why use Mediawiki rather than Zotero or some other bibliography manager? 
First, it would be easy for anyone to participate because there is no 
software to install, no database to import, etc. Second, I personally 
have found Zotero, CiteULike, and every other bibliography manager I've 
tried to be clunky and tedious to use and not flexible enough for my 
needs (for example, three-state tags that let us say a paper has, does 
not have, or we do not know if it has, a certain property could be 
useful). We can always export the data into whatever bibliography 
software is preferred by particular authors.
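
As an illustration of the three-state tag idea, here is one possible 
representation. It is a sketch only; the tag names and the choice of an 
enum are assumptions, not a settled design.

```python
from enum import Enum

class Tri(Enum):
    """Three-state tag value: has, does not have, or not yet known."""
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"

def tag_state(paper_tags, tag):
    """Return the recorded state of a tag, defaulting to UNKNOWN."""
    return paper_tags.get(tag, Tri.UNKNOWN)

# Hypothetical tags for one paper.
paper_tags = {"has-user-study": Tri.YES, "uses-dump-data": Tri.NO}
```

The point of the third state is that a tag nobody has checked yet is not 
silently treated as "no".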

Authorship is of course an issue, and one that should be worked out 
before people start contributing IMO, but not an intractable one, and 
there is precedent for scientific papers to have hundreds of authors 
(and it would certainly be in the wiki spirit). I myself would love to 
have a prominent place in the author list, but having the survey article 
written at all is a much higher priority.

Finally, one of my dreams has been to create a more or less complete 
database of *all* scientific publications, with reviews, a citation 
graph, private notes, and a robust data model (e.g., one that can tell 
two John Smiths apart and know when J. Smith is the same as John Smith). 
Maybe this is the first step along that path. (I did work a bit on data 
models for citation databases about five years ago and still use 
the software I created - Yabman, http://yabman.sf.net/.)

Thoughts?

Reid

p.s. Chitu, do you subscribe to this list? If so, we'll stop CC'ing you; 
if not, I encourage you to do so - it's pretty low traffic and certainly 
relevant to your work.



Re: [Wiki-research-l] There is no silver identifier

2010-07-21 Thread Reid Priedhorsky
On 07/21/2010 03:36 PM, Jakob wrote:
> Hi,
> 
> Talking about identifiers for bibliographic records I just want to  
> stress one crucial point:
> 
>> This gives us the following key, guaranteed to be unique:
>> KangHsuKrajbich20091011b
> 
> There is absolutely no such thing as a "guaranteed unique identifier"  
> that can be derived from existing metadata. You will *always* have  
> false positives (different publications get the same identifier [1])  
> and false negatives (same publication has different identifiers [2]).  
> Fuzzy identifiers even occur if they are created by the publisher or  
> author himself (for instance duplicate ISBNs for definitely different  
> editions or even totally different books). If you argue about  
> identifiers please keep in mind that you *always* talk about  
> heuristics but not about something "unique per se". Existing  
> identifiers only differ in the ratio of false positives and false  
> negatives.
> 
> The only way you may get unique identifiers is to assign your own  
> identifiers that are *not* derived from the content - such as  
> auto-incremented record ids in a database. Even then they are not  
> unique if you change the content because the identity of the object  
> may change.

I haven't been following this thread, but the way I addressed this in my 
own bibliography manager (http://yabman.sourceforge.net/) is: the BibTeX 
key is the first author's name (lowercased) plus an auto-incremented ID. 
So for example, one of my papers is "priedhorsky229". 229 is arbitrary, 
but there's only a few 3-digit numbers per author, so I don't get confused.

Now in a large system, that would obviously break down into the long, 
incomprehensible CiteULike-type IDs.

A compromise could be that the ID is the first author's name plus an 
auto-incremented ID per author. So for example, the first paper of 
mine the system learns is priedhorsky1, the second priedhorsky2, etc. So 
you get a system-generated ID for uniqueness but also something 
comprehensible for people.
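
A minimal sketch of that compromise, assuming an in-memory counter per 
author (a real system would keep the counters in its database; the class 
and method names are made up for illustration):

```python
from collections import defaultdict

class KeyAllocator:
    """Hand out keys like 'priedhorsky1', 'priedhorsky2': one counter per author."""

    def __init__(self):
        self._next = defaultdict(int)

    def allocate(self, first_author_surname):
        name = first_author_surname.lower()
        self._next[name] += 1
        return f"{name}{self._next[name]}"

alloc = KeyAllocator()
k1 = alloc.allocate("Priedhorsky")
k2 = alloc.allocate("Priedhorsky")
k3 = alloc.allocate("Smith")
```

Because the number is assigned by the system rather than derived from the 
record's metadata, uniqueness does not depend on any heuristic.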

HTH,

Reid



Re: [Wiki-research-l] article length and rating

2010-07-10 Thread Reid Priedhorsky
On 7/10/10 8:39 AM, Gorbatai, Andreea wrote:
> Hi -
>
> I am interested in looking at how evolution of articles in the
> English Wikipedia in terms of length and article rating (within
> WikiProjects) is affected by participation from anonymous and
> registered contributors. Does anyone have an earlier dataset - the
> article length and rating I have is from May 2010. Data from late
> 2008-early 2009, one or two datapoints/ timeslices would be greatly
> appreciated.

Andreea,

Couldn't this be easily computed from the full dump with history? You'd 
have to parse it, though, which is obviously rather time-consuming.
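
As a rough sketch of what that parsing involves, the following streams 
through a dump and reports the text length of every revision. The tiny 
sample document is made up, length is counted in characters (an 
assumption), and a real full-history dump would be read from a 
decompressed file rather than from memory.

```python
import io
import xml.etree.ElementTree as ET

def local(tag):
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]

def revision_lengths(stream):
    """Yield (title, timestamp, text_length) for each revision in a dump."""
    title = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            timestamp, text = None, ""
            for child in elem:
                if local(child.tag) == "timestamp":
                    timestamp = child.text
                elif local(child.tag) == "text":
                    text = child.text or ""
            yield title, timestamp, len(text)
            elem.clear()  # free memory; full-history dumps are very large
        elif name == "page":
            elem.clear()

# Made-up miniature dump, for illustration only.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>Example</title>
    <revision>
      <timestamp>2009-01-01T00:00:00Z</timestamp>
      <text>short</text>
    </revision>
    <revision>
      <timestamp>2010-05-01T00:00:00Z</timestamp>
      <text>a longer text</text>
    </revision>
  </page>
</mediawiki>"""

rows = list(revision_lengths(io.BytesIO(SAMPLE)))
```

Picking the last revision before a given date for each page would then 
give the article length at that time slice.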

I know our team has lots of tools for parsing dumps and making them 
easier to work with. Send me a direct e-mail if you want to be put in touch.

Reid



Re: [Wiki-research-l] Fwd: Quality and pageviews

2010-06-03 Thread Reid Priedhorsky
Brian J Mingus wrote:
> -- Forwarded message --
> From: Brian 
> Date: Wed, Jun 2, 2010 at 10:46 PM
> Subject: Re: [Wiki-research-l] Quality and pageviews
> To: Liam Wyatt 
> 
> 
> Interestingly, the result is negative. The correlation coefficient between
> 2500 featured articles and 2500 random articles is .18 which is very low. I
> also trained a linear classifier to predict the quality of an article based
> on the number of page views and it was no better than chance.

That reminds me of an incidental finding from our 2007 work: we wanted 
to use article edit rate to predict view rate, but there was no 
correlation between the two.

Reid




Re: [Wiki-research-l] Access to HTTP access logs for Wikipedia articles?

2010-04-23 Thread Reid Priedhorsky
Anthony wrote:
>
>>> Access to this information poses
>>> no risk to users' privacy since no user information is made available
>>> - sessions' id, hour/minute timestamp data and IPs could be easily
>>> discarded.
>>
>> What if your referer was your facebook personal page leaking your full
>> real name?
> 
> And what if you're in the sample?  I find it quite inappropriate that even
> sampled data like this is being released.

It's not. The sample data we get is sequence number, timestamp, URL 
requested. That's it.
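
For a sense of what can be done with just those three fields, here is a 
minimal sketch that counts sampled requests per URL and scales them up. 
The whitespace-separated line layout, the sample lines, and the 
sample_every parameter are assumptions for illustration, not the actual 
log format.

```python
from collections import Counter

def estimate_requests(lines, sample_every):
    """Count sampled requests per URL and scale to an estimated total."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that do not have exactly three fields
        _seq, _timestamp, url = parts
        counts[url] += 1
    return {url: n * sample_every for url, n in counts.items()}

# Made-up lines in the assumed "sequence timestamp URL" layout.
lines = [
    "101 2010-04-23T10:00:00 http://en.wikipedia.org/wiki/A",
    "102 2010-04-23T10:00:01 http://en.wikipedia.org/wiki/B",
    "103 2010-04-23T10:00:02 http://en.wikipedia.org/wiki/A",
    "garbled line",
]
estimates = estimate_requests(lines, sample_every=10)
```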

Reid



Re: [Wiki-research-l] Access to HTTP access logs for Wikipedia articles?

2010-04-13 Thread Reid Priedhorsky
On 04/13/10 10:09, Felipe Ortega wrote:
> Hi Sérgio,
> 
> Some universities (like ours) receive a 1/100 sample of the whole set
> of petitions processed by Wikimedia Squid servers.
> 
> It is provided on direct request, however. As far as I know the data
> is not consistently archived in a public repository anywhere (but I
> maybe unaware of some system storing that info).

At UMN, we have a 1/10 sample going back to spring 2007. We're happy to 
share, but it's pretty unwieldy and other resources are often sufficient.

We don't have the referrer data, though. IIRC it's just timestamp, 
requested URL.

Reid



Re: [Wiki-research-l] [WikiEN-l] strategy QOTW

2009-12-05 Thread Reid Priedhorsky
Piotr Konieczny wrote:
>
> Good points. But as (few threads earlier) be lack any dedicated 
> publication to Wikiepdia (and as far as I know Wikimania doesn't issue 
> peer review compilations, and is not seen as a "real" academic 
> conference), works on Wikipedia are submitted to various traditional 
> outlets, and reviewed by traditional experts - which may be really good 
> in their branch of academia, but probably 1) are not Wikipedians and 2) 
> have not read much research on Wikipedia.
> 
> I also wonder how many research pieces on Wikipedians are written by 
> scholars who 1) are not Wikipedians 2) do not realize that there is an 
> already large body of literature on the subject...

I'm not sure to what degree this is of concern. Personally, I have on 
the order of 500 Wikipedia revisions and have a vague notion of how WP 
policy works - certainly not expert status, but I'd guess on the order 
of 95th percentile of WP users who have edited at least once. Thinking 
of my colleagues who also write and review Wikipedia work, this is not 
atypical. Certainly my GroupLens colleagues understand policy far better 
than I.

I regularly review for CHI, CSCW, and other high-reputation 
"traditional" venues, and I also regularly reject work which is lacking 
in understanding of how Wikipedia works and the existing literature. I 
see other reviewers doing this as well, and the associate chairs (who 
manage each paper's review) are generally well-versed in Wikipedia and 
have similar high standards.

In general, understanding the user community in question is a must for 
any credible research, so if you can provide evidence that the flaws you 
identify are leading to systematic problems in the work that is 
happening, I strongly encourage you to write it up and submit it to the 
traditional venues. This kind of process critique would be very well 
received.

HTH,

Reid



Re: [Wiki-research-l] Quantitative evaluation of wikis

2009-12-05 Thread Reid Priedhorsky
On 12/05/09 08:13, Nicholas Moreau wrote:
 >
> Featured status on wikis might not be a reliable indicator, even on
> those wikis that have the process.
> 
> Case in point Muppet Wiki: they started a FA process, and received a
> burst of nominations and approvals for a few months... after a while
> though, it became a "meh"... there was no real reason to take time
> acknowledging successes, it took away from working on the project
> itself. Many of the highest quality articles go unacknowledged,
> because no one wants to bother nominating.

I remember vaguely someone studying this question in Wikipedia (Ed 
Chi?). I believe the conclusion was that FA status had high precision 
but poor recall: i.e., FA articles were reliably high-quality but there 
were many high-quality articles that were not FA.

I could very well be wrong or misremembering, so take with a grain of salt.
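
To make the precision/recall distinction concrete, here is a small worked 
example. The counts are entirely hypothetical and are not taken from any 
study; they only show how a label can be reliable yet miss most of what 
it is meant to find.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision and recall of 'is FA' as a detector of high quality."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

# Hypothetical counts: 95 of 100 FA articles are high quality,
# but 905 other high-quality articles are not FA.
p, r = precision_recall(true_pos=95, false_pos=5, false_neg=905)
```

With these made-up numbers, precision is 0.95 while recall is below 0.1.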

Reid




Re: [Wiki-research-l] Wikipedia Journal?

2009-09-25 Thread Reid Priedhorsky
On 09/25/09 06:59, Liam Wyatt wrote:
> 
> But I think that this issue (that of "but would academics * actually*
> write for this Journal?") is the one piece of the proposal that is 
> the genuine and acceptable risk. [...] The risk of the Journal
> failing because of a lack of interest from academics is indeed a
> possibility. But, I think that is the thing that needs to be tested.
> Academics have never yet been given academically legitimate reasons
> to participate and I would like to give them the option. If the
> Journal were to fail for lack of interest from Academics, then that
> is a very important lesson and worth the effort of learning it.

Sorry to be a party pooper. But, I think that lack of interest from 
academics is not a risk, it's a near-certainty. There are already plenty 
of journals and conferences out there, and I can tell you now that we 
would not be submitting anything.

Now, if the goal is to bring the whole of Wikipedia-related research 
into one place -- which is a good one, though I would extend it to all 
wiki research since Wikipedia is just one example and (IMO) over-studied 
to the exclusion of other systems -- then a (preferably online) 
publication which put out summaries/reviews of wiki research wherever 
it's published (think the page on Wikipedia, but better) would be highly 
desirable. Math does this sort of thing to great success, I think.

Reid



Re: [Wiki-research-l] [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Reid Priedhorsky
On 08/20/2009 11:34 AM, Gregory Maxwell wrote:
> On Thu, Aug 20, 2009 at 6:06 AM, Robert Rohde wrote:
> [snip]
>> When one downloads a dump file, what percentage of the pages are
>> actually in a vandalized state?
> 
> Although you don't actually answer that question, you answer a
> different question:
> 
> [snip]
>> approximations:  I considered that "vandalism" is that thing which
>> gets reverted, and that "reverts" are those edits tagged with "revert,
>> rv, undo, undid, etc." in the edit summary line.  Obviously, not all
>> vandalism is cleanly reverted, and not all reverts are cleanly tagged. 
> 
> Which is interesting too, but part of the problem with calling this a
> measure of vandalism is that it isn't really, and we don't really have
> a good handle on how solid an approximation it is beyond gut feelings
> and arm-waving.

We looked into this a couple of years ago and came up with a similar 
number (though I won't quote it because I don't quite remember what it 
was), though we estimated the probability that a viewer would encounter 
a damaged article rather than how many articles were currently damaged.
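
One simple way to express a per-view probability, as opposed to a 
per-article fraction, is to weight by views. The sketch below is not the 
method from our paper; the data and field names are invented to show why 
the two measures can differ.

```python
# Made-up data: total views and views that landed on a damaged version.
articles = [
    {"title": "A", "views": 9000, "damaged_views": 9},
    {"title": "B", "views": 100, "damaged_views": 50},
    {"title": "C", "views": 900, "damaged_views": 0},
]

def view_damage_probability(articles):
    """Fraction of all views that encountered a damaged version."""
    total = sum(a["views"] for a in articles)
    damaged = sum(a["damaged_views"] for a in articles)
    return damaged / total

prob = view_damage_probability(articles)
```

In this toy data two of three articles were damaged at some point, yet 
fewer than 1% of views saw damage, because the heavily viewed article was 
almost always intact.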

We used the term "damaged" instead of "vandalized" for essentially the 
reasons you mention (though I confess I didn't fully read your whole 
letter).

Priedhorsky et al., GROUP 2007.

Reid



Re: [Wiki-research-l] WIkipedia proposal: internal IRB for research

2009-06-03 Thread Reid Priedhorsky
On 06/03/2009 05:29 PM, Alexander Foley wrote:
 >
> I think a better course of action might be to establish a Wikipedia  
> Experiments subgroup of users who opt-in to participate in  
> experiments, much like what Google does with its experimental  
> features.  You're limiting the sample quite a bit, and quite possibly  
> only getting involved or heavily involved Wikipedia users, but if your  
> core survey group is editors it would likely be ideal.

I agree. I'd extend this notion a bit: my impression is that most people 
are more than happy to be solicited for studies (provided it doesn't 
happen too often). So I'd suggest two components:

1. The opt-in defines specifically how frequently one can be solicited 
(e.g. N times per year).

2. The opt-in is widely pushed: highly visible on account creation, and 
all existing users get one (1) invitation to opt in.

I think #2 is important because a de facto policy that only heavily 
involved Wikipedians participate in research would be severely limiting 
to the work we do.

Reid



Re: [Wiki-research-l] soliciting participants for Wikipedia studies

2008-11-19 Thread Reid Priedhorsky
Piotr Konieczny wrote:
> 
> Is asking for a survey spamming? That's a good question. If we could 
> raise it on a community page and get a consensus that it is not, than we 
> could potentially create a bot that could be fed a survey and would 
> deliver it to x random users via the above page.

One of my colleagues was metaphorically eaten alive for posting requests 
for research participation on user talk pages (a few dozen IIRC). In her 
case the users in question were chosen fairly specifically (perhaps by 
the kinds of pages they edited?), and I think she had 100-200 candidate 
participants.

I've bcc'ed her in case she wants to chime in.

Let me voice my strong support for Phoebe's effort. I'm not sure what I 
can do personally (though please enlighten me if you know), but some 
process with community buy-in for soliciting random Wikipedia users for 
research purposes would be extremely useful to the research community.

Reid





Re: [Wiki-research-l] "Regular contributor"

2008-11-16 Thread Reid Priedhorsky
Platonides wrote:
 >
> Desilets, Alain wrote:
 >>
>> Regarding this, I have had heard different stories about
>> contributors.
>> 
>> I seem to recall one study that concluded that, while 85% of the
>> **edits** are done by a small core of contributors, if you take a
>> random page and select a sentence from it, this sentence is more
>> likely to be the result of edits by contributors from the "long
>> tail" than core contributors. I forget the reference for that study
>> though.
>> 
>> Does someone on this list have solid information about this? I
>> think it's a fairly crucial piece of information that we should
>> have a clear handle on as a research community.
>> 
>> Alain
> 
> It was a research by Aaron Swartz 
> http://www.aaronsw.com/weblog/whowriteswikipedia

I led a study last year that found that the long tail was even longer 
than it usually is (i.e., the "elite" contributors contribute even more 
than they would be expected to).

Specifically, the 0.1% of editors who edited the most times contributed 
about half the "value" of Wikipedia, when value is measured by words 
times views.
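
As a toy illustration of that measure (not the analysis code from the 
paper), the sketch below scores each editor by words times views and 
computes the share held by the most frequent editors. All numbers and 
names are made up.

```python
def editor_value(contributions):
    """Value of an editor: sum of words * views over their contributions."""
    return sum(words * views for words, views in contributions)

def top_editor_share(editors, top_fraction):
    """Share of total value held by the top fraction of editors by edit count."""
    ranked = sorted(editors, key=lambda e: e["edits"], reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    values = [editor_value(e["contributions"]) for e in ranked]
    return sum(values[:k]) / sum(values)

# Hypothetical editors: (words, views) pairs per contribution.
editors = [
    {"name": "elite", "edits": 50, "contributions": [(100, 30)]},
    {"name": "tail1", "edits": 2, "contributions": [(10, 100)]},
    {"name": "tail2", "edits": 1, "contributions": [(10, 100)]},
    {"name": "tail3", "edits": 1, "contributions": [(10, 100)]},
]
share = top_editor_share(editors, top_fraction=0.25)
```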

http://www-users.cs.umn.edu/~reid/papers/group282-priedhorsky.pdf

End of shameless plug. ;)

Reid



Re: [Wiki-research-l] Wikimedia access data

2008-09-02 Thread Reid Priedhorsky
Phanikumar Bhamidipati wrote:
> Hi All,
> 
> We are two research students looking for Wikipedia access data. We tried to
> use the statistics available at http://dammit.lt/wikistats/. But, we would
> like to know these data in detail: drilled down to per page access, i.e.,
> triplets of the form .
> 
> Could you please let us know if we can get such information? The IP details
> can be anonymous, if required. We are only looking for a detailed Wikipedia
> page access log information.

Wikimedia was kind enough to share a 1/10 streaming sample of their 
access logs with us and several other researchers. I do not know if they 
still do this. It's a LOT of data: several gigs per day even after 
compression. They consider IP addresses to be private data and share 
only .

Our contact for this is Tim Starling, [EMAIL PROTECTED] I think.

Reid



Re: [Wiki-research-l] enwiki database dumps

2008-01-03 Thread Reid Priedhorsky
Truran, Mark wrote:
> 
> Does anyone know what is going on here?

Basically, dumps on the large wikis have been broken for ages and WMF 
has been having considerable trouble getting them rolling again.

I have a November 2006 dump that I'm happy to share.

Reid



[Wiki-research-l] Wiki research papers from my group

2007-10-04 Thread Reid Priedhorsky
Hi folks,

I am the lead author on two wiki research papers my group at the 
University of Minnesota is publishing this fall.

The first focuses on Wikipedia:

* Reid Priedhorsky, Jilin Chen, Shyong (Tony) K. Lam, Katherine 
Panciera, Loren Terveen, John Riedl. "Creating, Destroying, and 
Restoring Value in Wikipedia." To appear in Proc. GROUP 2007. 10 pages.

* Link: http://www.cs.umn.edu/~reid/papers/group282-priedhorsky.pdf

* Abstract: Wikipedia’s brilliance and curse is that any user can edit 
any of the encyclopedia entries. We introduce the notion of the impact 
of an edit, measured by the number of times the edited version is 
viewed. Using several datasets, including recent logs of all article 
views, we show that an overwhelming majority of the viewed words were 
written by frequent editors and that this majority is increasing. 
Similarly, using the same impact measure, we show that the probability 
of a typical article view being damaged is small but increasing, and we 
present empirically grounded classes of damage. Finally, we make policy 
recommendations for Wikipedia and other wikis in light of these findings.

The second is not Wikipedia-focused, and as such not really on topic for 
this list, but as I'm already sending a mail, I thought I'd include it. 
If you are interested only in research directly related to Wikipedia, 
you can stop reading now.

* Reid Priedhorsky, Benjamin Jordan, Loren Terveen. "How a Personalized 
Geowiki Can Help Bicyclists Share Information More Effectively." Short 
paper. To appear in Proc. WikiSym 2007. 6 pages.

* Link: http://www.cs.umn.edu/~reid/papers/wiki09s-priedhorsky.pdf

* Abstract: The bicycling community is focused around a real-world 
activity - navigating a bicycle - which requires planning within a 
complex and ever-changing space. While all the knowledge needed to find 
good routes exists, it is highly distributed. We show, using the results 
of surveys and interviews, that cyclists need a comprehensive, 
up-to-date, and personalized information resource. We introduce the 
personalized geowiki, a new type of wiki which meets these requirements,
and we formalize the notion of geowiki. Finally, we state some general 
prerequisites for wiki contribution and show that they are met by cyclists.

Questions and comments welcome.

Take care,

Reid Priedhorsky
Graduate Research Assistant
GroupLens Research, http://www.grouplens.org




Re: [Wiki-research-l] Wikis as a tool for fostering emergence of communities

2007-08-29 Thread Reid Priedhorsky
Desilets, Alain wrote:
> 
> I need a good solid reference to substantiate the following claim:
> 
> "Besides leading to high quality content, wikis have been shown to be
> good tools for fostering the emergence of active communities"
> 
> Does anyone know of a good research paper that looks specifically at
> this kind of impact of wikis?

Hi Alain,

The following work by Dan Cosley et al. argues that wikis and 
traditional review-before-publication result in the same quality, but 
that the wiki model gets there faster:

Cosley, D., Frankowski, D., Terveen, L., & Riedl, J. (2006). Using 
Intelligent Task Routing and Contribution Review to Help Communities 
Build Artifacts of Lasting Value. Proc. CHI 2006. 
http://grouplens.org/papers/pdf/itr-chi2006.pdf

(Full disclosure: this work is a product of my own research group.)

HTH,

Reid



[Wiki-research-l] Historical articles served rate?

2007-05-01 Thread Reid Priedhorsky
Dear wiki-researchers,

I am looking for the historical articles-served rate for Wikipedia. I know 
that historical logs don't exist, but I'm looking for ballpark figures 
on articles served per day (or at whatever intervals are available).

Any suggestions on where to look would be most welcome.

Many thanks,

Reid
