Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-27 Thread Finn Aarup Nielsen



On Tue, 27 Jul 2010, John Vandenberg wrote:

> On Tue, Jul 27, 2010 at 12:06 AM, Jodi Schneider wrote:
>> ...
>> [3] Other side-effects might be helping to identify what's highly cited in
>> Wikipedia (which would be interesting -- and might help prioritize
>> Wikisource additions), automatically adding quotes to Wikiquote, ...
>
> I don't think this has been raised on this list.
>
> The academic journals project hosts "Journals cited by Wikipedia"
> using the {{cite}} data.  It is broken down by usage count.
>
> http://en.wikipedia.org/wiki/WP:JCW


I also have statistics of that sort. The page corresponding to your "Top
journals" list,


http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Popular1

is this one:

http://neuro.imm.dtu.dk/services/wikipedia/enwiki-20080312-ref-articlejournal_highlycited.html

It is computed from the 2008 dump and based on the 'cite journal' template.
For some of the statistics I skipped the citations added automatically by
the "Protein Box Bot". I have built a small file that aggregates the
different names of popular journals. It is available here:


http://neuro.imm.dtu.dk/services/brededatabase/wojous.xml

and may be useful for WP:JCW.
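
A rough illustration of the kind of counting described above. This is not the script behind the pages linked here: the regular expressions, the `JOURNAL_ALIASES` table and the function names are made up for the example. The alias table merely stands in for a name-aggregation file such as wojous.xml, whose actual format is not shown in this thread. Skipping bot-added citations is left out because it needs revision metadata, and nested templates inside a 'cite journal' call are not handled.

```python
import re
from collections import Counter

# Hypothetical alias table (illustrative entries only): maps a cleaned,
# lower-cased name variant to one display name per journal.
JOURNAL_ALIASES = {
    "pnas": "Proceedings of the National Academy of Sciences",
    "proc natl acad sci usa": "Proceedings of the National Academy of Sciences",
    "j biol chem": "Journal of Biological Chemistry",
}

# Matches {{cite journal ...}} up to the first closing braces (no nesting).
CITE_JOURNAL = re.compile(r"\{\{\s*cite journal\b(.*?)\}\}",
                          re.IGNORECASE | re.DOTALL)
# Captures the value of the |journal= parameter inside one template call.
JOURNAL_FIELD = re.compile(r"\|\s*journal\s*=\s*([^|}]*)", re.IGNORECASE)


def normalize_journal(name):
    """Strip wiki link/italic markup and map known variants to one name."""
    cleaned = re.sub(r"[\[\]']", "", name).strip()
    key = re.sub(r"\s+", " ", cleaned.replace(".", "")).lower()
    return JOURNAL_ALIASES.get(key, cleaned)


def count_journals(wikitext):
    """Count 'cite journal' templates per (normalized) journal name."""
    counts = Counter()
    for template in CITE_JOURNAL.finditer(wikitext):
        field = JOURNAL_FIELD.search(template.group(1))
        if field and field.group(1).strip():
            counts[normalize_journal(field.group(1))] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "{{cite journal |author=A |journal=PNAS |year=2001}} "
        "{{Cite journal | journal = Proc. Natl. Acad. Sci. USA | title=X}} "
        "{{cite journal|journal=Nature}}"
    )
    for journal, n in count_journals(sample).most_common():
        print(n, journal)
```

Running the same function over every article in a dump and summing the counters would give a per-journal ranking of the sort shown on the pages above.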

On the same site are results from different clusterings of the Wikipedia
citations, for example:


http://neuro.imm.dtu.dk/services/wikipedia/enwiki-20080312-ref-articlejournal_clustering_10.html


The main page is 
http://neuro.imm.dtu.dk/services/wikipedia/citejournalminer.html

/Finn

___

 Finn Aarup Nielsen, DTU Informatics, Denmark
 Lundbeck Foundation Center for Integrated Molecular Brain Imaging
   http://www.imm.dtu.dk/~fn/  http://nru.dk/staff/fnielsen/
___
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-26 Thread John Vandenberg
On Tue, Jul 27, 2010 at 12:06 AM, Jodi Schneider
 wrote:
>...
> [3] Other side-effects might be helping to identify what's highly cited in
> Wikipedia (which would be interesting -- and might help prioritize
> Wikisource additions), automatically adding quotes to Wikiquote, ...

I don't think this has been raised on this list.

The academic journals project hosts "Journals cited by Wikipedia"
using the {{cite}} data.  It is broken down by usage count.

http://en.wikipedia.org/wiki/WP:JCW

--
John Vandenberg



Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-26 Thread Samuel Klein
Jakob writes:
> there already *are* communities that collect and share bibliographic data

I would be happy if anyone does what I was describing; no point in
reinventing what already exists.  But I have not found it:

I mean a public collection of citations, with reader-editable
commentary and categorization, for published works.  Something that
Open Library could link to from each of its books, that arXiv.org and
PLoS could link to from each of their articles.  Something that, for
better or worse, Wikipedia articles could link to also, when they are
cited as sources.


Jodi Schneider  wrote:
>
> I think focusing on Wikimedia's citation needs is the most promising,
> especially if this is intended to be a WMF project.

Agreed.  That is clearly the place to start, as it was with Commons.

And, as with Commons, the project should be free to develop its own
scope, and be more than a servant project to the others.  That scope
may be grand (a collection of all educational freely licensed media; a
general collection of citations), but shouldn't keep us from getting
started now.

> As for mission -- yes -- let's talk about what problem we're trying to
> solve. Two central ones come to mind:
> 1. Improve verifiability by making it possible to start with a source and
> verify all claims made by referencing that source [1]
> 2. Make it easier for editors to give references, and readers to use them [2]
> others?  [3]

3. Enable commenting on sources, to discuss their reliability and
notability, in a shared place.  (Note the value of having a
multilingual discussion here: currently notions of notability and
reliability can change a great deal across language barriers.)

4. Enable discussing splitting or merging sources, or providing
disambiguations when different people are confusingly using a single
citation to refer to more than one source.

> To figure out what the right problems are, I think it would help to look at
> the pain points -- and their solutions -- the hacks and proposals related to
> citations. Hacks include plugins and templates people have made to make
> MediaWiki more citation-friendly. Proposals include the ones on strategy wiki.
>
> Some of the hacks and proposals are listed here:
> http://strategy.wikimedia.org/wiki/Category:Proposals_related_to_citations
> Could you add other hacks, proposals, and conversations...?

Thanks for that link.

Sam.


> [1] This can be done using backlinks:
>  http://en.wikipedia.org/wiki/Special:WhatLinksHere/Template:Greenwood%26Earnshaw
> [2] I think of this as "actionable references" -- we'd have to explain
> exactly what the desirable qualities are. Adding to bibliographic managers
> in one click is one of mine. :)
> [3] Other side-effects might be helping to identify what's highly cited in
> Wikipedia (which would be interesting -- and might help prioritize
> Wikisource additions), automatically adding quotes to Wikiquote, ...


-- 
Samuel Klein          identi.ca:sj           w:user:sj



Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-26 Thread Jodi Schneider

On 24 Jul 2010, at 23:01, Jakob wrote:
> ...to attract more than a small fraction of the declining
> number of Wikipedia authors we need a clear mission and usable
> software for this task - I see neither the one nor the other.

I think focusing on Wikimedia's citation needs is the most promising, 
especially if this is intended to be a WMF project.

As for mission -- yes -- let's talk about what problem we're trying to solve. 
Two central ones come to mind:
1. Improve verifiability by making it possible to start with a source and 
verify all claims made by referencing that source [1]
2. Make it easier for editors to give references, and readers to use them [2] 
Are those the right problems? Are there others? [3]

To figure out what the right problems are, I think it would help to look at the 
pain points -- and their solutions -- the hacks and proposals related to 
citations. Hacks include plugins and templates people have made to make 
MediaWiki more citation-friendly. Proposals include the ones on strategy wiki.

Anybody want to take a look through?

Some of the hacks and proposals are listed here:
http://strategy.wikimedia.org/wiki/Category:Proposals_related_to_citations
Could you add other hacks, proposals, and conversations related to citations, 
if you know of them? 

-Jodi




[1] This can be done using backlinks:
http://en.wikipedia.org/wiki/Special:WhatLinksHere/Template:Greenwood%26Earnshaw

[2] I think of this as "actionable references" -- we'd have to explain exactly
what the desirable qualities are. Adding to bibliographic managers in one click
is one of mine. :)

[3] Other side-effects might be helping to identify what's highly cited in 
Wikipedia (which would be interesting -- and might help prioritize Wikisource 
additions), automatically adding quotes to Wikiquote, ...
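
The backlink idea in [1] can also be queried by program. The sketch below only builds a request URL for the MediaWiki API's `list=embeddedin` query, which lists pages that transclude a given template; it makes no network call. The helper name and its defaults are illustrative assumptions, and the template title is taken from the link in [1].

```python
from urllib.parse import urlencode

# Standard API endpoint of the English Wikipedia.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def embeddedin_query_url(template_title, limit=50, namespace=0):
    """Build a URL asking which pages transclude `template_title`.

    namespace=0 restricts the result to articles; limit caps the number
    of results returned in one response.
    """
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": template_title,
        "einamespace": namespace,
        "eilimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(embeddedin_query_url("Template:Greenwood&Earnshaw"))
```

Fetching that URL would return the articles citing the source through its template, which is the "start with a source and find every claim that references it" direction of problem 1.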



Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-24 Thread Jakob
David wrote:

> There's a difference between a project to centralize the various
> references in Wikipedia, and an attempt to build a universal
> bibliographic database. The first is a reasonable project, though I
> think everyone involved has underestimated the extent to which
> normalization and manual aggregation will be needed.

Well said. It reminds me of Erik Möller's Wikimania talk about Free
Knowledge projects beyond the Encyclopedia: you need a clearly
articulated mission. There already are many projects to create a
universal bibliographic database (Worldcat, The Open Library,
LibraryThing etc.) and all either failed or have a specific scope. A
wiki-based bibliographic database for sources in Wikimedia projects
("citations version of Commons") is a reasonable scope, I think. "Let's
just collect all bibliographic data we can get onto a gigantic pile of
data" is not. Let's rather focus on real use cases, such as citations
in Wikimedia projects.

SJ wrote:

> I like the French model of using "Article name (Authors)" as a key.
> Perhaps with "Article name (Authors, Year)" if needed to disambiguate.
> This shares a design principle with the move away from CamelCase to
> freeform article titles: one should be able to insert an article name
> into a natural sentence, and link the appropriate section of the
> sentence, and have it take you to the appropriate article.

With free-form titles there will be no schema that works in 100% of
cases (there are always exceptions), but a general rule to start with
is needed. There are at least 32 ways to combine only title and
authors: which to put first, which character to use to separate author
names, the order of names and name parts, ways to abbreviate, etc. -
and these are only the possibilities if it's a simple English title
with English author names!

If you are looking for a method to define one schema, please have a
look at the Citation Style Language and use or define a citation style
in CSL so users of Zotero, Mendeley and other bibliographic software
can automatically create a key from given bibliographic data.
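
To make the point concrete, here is a minimal sketch of one possible rule for the "Article name (Authors, Year)" key quoted above. It is plain Python, not CSL, and every choice in it (surname only, "&" as separator, "et al." after two authors, the function names) is an arbitrary assumption. Each of those choices is exactly the kind of decision a shared schema would have to fix, and the naive surname handling already breaks on names such as "van der Waals".

```python
def surname(author):
    """Return the family name from 'Last, First' or 'First Last' (naive)."""
    author = author.strip()
    if "," in author:
        return author.split(",", 1)[0].strip()
    return author.split()[-1]


def citation_key(title, authors, year=None, max_authors=2):
    """Build a key of the form 'Title (Authors)' or 'Title (Authors, Year)'."""
    names = [surname(a) for a in authors if a.strip()]
    if not names:
        raise ValueError("at least one author is required")
    if len(names) > max_authors:
        author_part = names[0] + " et al."
    else:
        author_part = " & ".join(names)
    if year is not None:
        author_part = f"{author_part}, {year}"
    return f"{title.strip()} ({author_part})"


if __name__ == "__main__":
    print(citation_key("Chemistry of the Elements",
                       ["N. N. Greenwood", "A. Earnshaw"]))
    print(citation_key("Chemistry of the Elements",
                       ["Greenwood, N. N.", "Earnshaw, A."], year=1997))
```

The year is only appended when supplied, which matches the suggestion to add it when needed to disambiguate.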

> To DGG's question: in the long run, the scope of "all cited works" can
> be captured in such a project, at least for the works cited on a wiki
> Project -- anyone making a new citation would either find it already
> in the project or would add it.  Whether this covers all works cited
> by active academics or scholars depends on how effectively we draw
> them into our community and help them see where an extra minute of
> work on their part will help thousands of their readers, reviewers,
> and reusers.

Again: there already *are* communities that collect and share
bibliographic data - why should they move to a new project with an
unclear mission and unusable software (we need much more than Liquid
Threads) that was never created for this task? Everyone talking about
a Wikimedia project with bibliographic data should *at least* have a
look at Zotero, CSL, The Open Library, and LibraryThing first and then
make clear what a new project should copy from these existing projects
and what should be done differently. Please do not reinvent a wheel
that nobody besides some Wikimediacs wants.

Don't get me wrong: I also want such a free bibliographic wiki
database. But to attract more than a small fraction of the declining
number of Wikipedia authors we need a clear mission and usable
software for this task - I see neither the one nor the other.

Jakob




Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-24 Thread Brian J Mingus
David,

In the m:WikiBibliography draft proposal I have briefly tried to explain the
difference you allude to. Wikipedia is a project dedicated to synthesizing
every notable topic into an encyclopedia. Since Wikipedia doesn't contain
original research, eventually every statement there should be able to be
traced to its source. The opposite also holds true - eventually every
notable topic will be able to be traced back to Wikipedia. We don't
necessarily have to cite all sources that a topic is mentioned in within a
given article, but it is desirable to document the relationships between
these sources so that we understand the true context. These really are two
sides of the same problem, and the project proposal aims to cover both
sides.

Brian

ps: Once people top-post it makes it challenging to bottom post without
breaking thread continuity. Since I always top-post at work I don't mind
doing it, but I just wanted to note that I know it might irk some people:)



On Sat, Jul 24, 2010 at 12:30 PM, David Goodman wrote:

> There's a difference between a project to centralize the various
> references in Wikipedia, and an attempt to build a universal
> bibliographic database. The first is a reasonable project, though I
> think everyone involved has underestimated the extent to which
> normalization and manual aggregation will be needed.
>
> On Thu, Jul 22, 2010 at 7:25 PM, Samuel Klein  wrote:
> > Thanks for those links, John.
> >
> > I agree that a separate project is needed to have a central source
> > that all language versions of all projects can reference.  The
> > citations version of Commons.
> >
> > I like the French model of using "Article name (Authors)" as a key.
> > Perhaps with "Article name (Authors, Year)" if needed to disambiguate.
> >  This shares a design principle with the move away from CamelCase to
> > freeform article titles: one should be able to insert an article name
> > into a natural sentence, and link the appropriate section of the
> > sentence, and have it take you to the appropriate article.
> >
> > To DGG's question: in the long run, the scope of "all cited works" can
> > be captured in such a project, at least for the works cited on a wiki
> > Project -- anyone making a new citation would either find it already
> > in the project or would add it.  Whether this covers all works cited
> > by active academics or scholars depends on how effectively we draw
> > them into our community and help them see where an extra minute of
> > work on their part will help thousands of their readers, reviewers,
> > and reusers.
> >
> > SJ.
> >
> >
> > On Thu, Jul 22, 2010 at 12:13 AM, John Vandenberg wrote:
> >> On Wed, Jul 21, 2010 at 9:49 PM, Finn Aarup Nielsen wrote:
> >>>..
> >>> Does anyone know anything about the French discussions on the
> >>> introduction of the 'Reference' namespace? Should we just implement
> >>> the French system on the English Wikipedia and we are there?
> >>
> >> This was discussed on en.wp in late 2007...
> >>
> >> http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29/Archive_14#Is_there_a_centralized_bibliographic_database_for_wikipedia.3F_Is_there_a_way_to_make_citations_just_by_giving_an_universal_ID_instead_of_copying_a_full_citation_template.3F
> >>
> >> The proposal on fr.wp in early 2006:
> >>
> >> http://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Prise_de_d%C3%A9cision/Espace_r%C3%A9f%C3%A9rence
> >>
> >> --
> >> John Vandenberg
> >>
> >
> >
> >
> > --
> > Samuel Klein  identi.ca:sj   w:user:sj
> >
>
>
>
> --
> David Goodman, Ph.D, M.L.S.
> http://en.wikipedia.org/wiki/User_talk:DGG
>


Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-24 Thread David Goodman
There's a difference between a project to centralize the various
references in Wikipedia, and an attempt to build a universal
bibliographic database. The first is a reasonable project, though I
think everyone involved has underestimated the extent to which
normalization and manual aggregation will be needed.

On Thu, Jul 22, 2010 at 7:25 PM, Samuel Klein  wrote:
> Thanks for those links, John.
>
> I agree that a separate project is needed to have a central source
> that all language versions of all projects can reference.  The
> citations version of Commons.
>
> I like the French model of using "Article name (Authors)" as a key.
> Perhaps with "Article name (Authors, Year)" if needed to disambiguate.
>  This shares a design principle with the move away from CamelCase to
> freeform article titles: one should be able to insert an article name
> into a natural sentence, and link the appropriate section of the
> sentence, and have it take you to the appropriate article.
>
> To DGG's question: in the long run, the scope of "all cited works" can
> be captured in such a project, at least for the works cited on a wiki
> Project -- anyone making a new citation would either find it already
> in the project or would add it.  Whether this covers all works cited
> by active academics or scholars depends on how effectively we draw
> them into our community and help them see where an extra minute of
> work on their part will help thousands of their readers, reviewers,
> and reusers.
>
> SJ.
>
>
> On Thu, Jul 22, 2010 at 12:13 AM, John Vandenberg  wrote:
>> On Wed, Jul 21, 2010 at 9:49 PM, Finn Aarup Nielsen  wrote:
>>>..
>>> Does anyone know anything about the French discussions on the introduction of
>>> the 'Reference' namespace? Should we just implement the French system on the
>>> English Wikipedia and we are there?
>>
>> This was discussed on en.wp in late 2007...
>>
>> http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29/Archive_14#Is_there_a_centralized_bibliographic_database_for_wikipedia.3F_Is_there_a_way_to_make_citations_just_by_giving_an_universal_ID_instead_of_copying_a_full_citation_template.3F
>>
>> The proposal on fr.wp in early 2006:
>>
>> http://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Prise_de_d%C3%A9cision/Espace_r%C3%A9f%C3%A9rence
>>
>> --
>> John Vandenberg
>>
>
>
>
> --
> Samuel Klein          identi.ca:sj           w:user:sj
>



-- 
David Goodman, Ph.D, M.L.S.
http://en.wikipedia.org/wiki/User_talk:DGG



Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-22 Thread Samuel Klein
Thanks for those links, John.

I agree that a separate project is needed to have a central source
that all language versions of all projects can reference.  The
citations version of Commons.

I like the French model of using "Article name (Authors)" as a key.
Perhaps with "Article name (Authors, Year)" if needed to disambiguate.
 This shares a design principle with the move away from CamelCase to
freeform article titles: one should be able to insert an article name
into a natural sentence, and link the appropriate section of the
sentence, and have it take you to the appropriate article.

To DGG's question: in the long run, the scope of "all cited works" can
be captured in such a project, at least for the works cited on a wiki
Project -- anyone making a new citation would either find it already
in the project or would add it.  Whether this covers all works cited
by active academics or scholars depends on how effectively we draw
them into our community and help them see where an extra minute of
work on their part will help thousands of their readers, reviewers,
and reusers.

SJ.


On Thu, Jul 22, 2010 at 12:13 AM, John Vandenberg  wrote:
> On Wed, Jul 21, 2010 at 9:49 PM, Finn Aarup Nielsen  wrote:
>>..
>> Does anyone know anything about the French discussions on the introduction of
>> the 'Reference' namespace? Should we just implement the French system on the
>> English Wikipedia and we are there?
>
> This was discussed on en.wp in late 2007...
>
> http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29/Archive_14#Is_there_a_centralized_bibliographic_database_for_wikipedia.3F_Is_there_a_way_to_make_citations_just_by_giving_an_universal_ID_instead_of_copying_a_full_citation_template.3F
>
> The proposal on fr.wp in early 2006:
>
> http://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Prise_de_d%C3%A9cision/Espace_r%C3%A9f%C3%A9rence
>
> --
> John Vandenberg
>



-- 
Samuel Klein          identi.ca:sj           w:user:sj



Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread John Vandenberg
On Wed, Jul 21, 2010 at 9:49 PM, Finn Aarup Nielsen  wrote:
>..
> Does anyone know anything about the French discussions on the introduction of
> the 'Reference' namespace? Should we just implement the French system on the
> English Wikipedia and we are there?

This was discussed on en.wp in late 2007...

http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29/Archive_14#Is_there_a_centralized_bibliographic_database_for_wikipedia.3F_Is_there_a_way_to_make_citations_just_by_giving_an_universal_ID_instead_of_copying_a_full_citation_template.3F

The proposal on fr.wp in early 2006:

http://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Prise_de_d%C3%A9cision/Espace_r%C3%A9f%C3%A9rence

--
John Vandenberg



Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Brian J Mingus
On Wed, Jul 21, 2010 at 5:47 PM, David Goodman wrote:

> Sure, but first, is this capable of being done at all?  I have never
> seen a method of bibliographic control that can cope with the complete
> range of publications, even just print publications. Perhaps we need
> to proceed  within narrow domains.
>

I assume that by range you mean the number of publications in a domain, and
that by domain you mean the type of publication, be it a book, webpage or
map.

The generic nature of a markup such as wiki template syntax allows us to
easily adapt the same application to new domains. The challenge of the range
within a domain is largely one of resolving ambiguities, which can be
settled with policies that carefully adjudicate troublesome cases.


> Second, is this capable of being done by crowd-sourcing, or does it
> require enforceable standards? The work of Open Library is not a
> promising model, being an uncontrolled mix, done to many different
> standards.  Actually, within the domain of scientific journal articles
> from the last 10 years in Western languages, the best current method
> seems to be a mechanical algorithm, the one used by Google Scholar.
> True,  it does not aggregate perfectly--but it does aggregate better
> than any other existing database. And it does not get them all--nor
> could it no matter how much improved, for many of the versions that
> are actually available are off limits to its crawlers.


In my conception the enforceable standards are to emerge in the meta pages
of this project based on the actual issues that the community encounters.

Googlebot has many deep web accounts to journals online. When you search
Google Scholar the relevance algorithm is actually comparing your query to
the content of pdf pages which you do not have permission to access. Of
course, Google can't access them all, but many publishers have found it in
their interest to give them a complimentary account since it drives
subscription rates.

We can rely on individuals, particularly academics, who have access to the
deep web to help us curate the bibliography. And we can rely on the massive
number of personal bibliographies already out there to help us get good
coverage.

Cleaning up the mass of bibliographic content that I anticipate would be
uploaded by users would require the writing of bots in coordination with the
creation of policy pages.

Getting rid of copyrighted material would be handled in the same manner, I
presume. After major content publishers see what we are doing, I am sure
they will let us know their opinion about what we can and cannot do. It
seems likely that they will overreach their bounds, and as I have seen on
Wikipedia, the community members will happily ignore them. Or, if they think
the requests are actually in compliance with the law, they will comply.

Brian


Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread David Goodman
Sure, but first, is this capable of being done at all?  I have never
seen a method of bibliographic control that can cope with the complete
range of publications, even just print publications. Perhaps we need
to proceed  within narrow domains.

Second, is this capable of being done by crowd-sourcing, or does it
require enforceable standards? The work of Open Library is not a
promising model, being an uncontrolled mix, done to many different
standards.  Actually, within the domain of scientific journal articles
from the last 10 years in Western languages, the best current method
seems to be a mechanical algorithm, the one used by Google Scholar.
True,  it does not aggregate perfectly--but it does aggregate better
than any other existing database. And it does not get them all--nor
could it no matter how much improved, for many of the versions that
are actually available are off limits to its crawlers.

On Wed, Jul 21, 2010 at 7:02 PM, Brian J Mingus wrote:
>
>
> On Wed, Jul 21, 2010 at 4:33 PM, Jodi Schneider wrote:
>>
>> On 21 Jul 2010, at 19:47, Brian J Mingus wrote:
>>
>>  Finn,
>> I'm not a fan of including a portion of the title for a couple of
>> reasons. First, it's not required to make the key unique. Second, it makes
>> the key longer than necessary. Third, the first word or words from a title
>> are not guaranteed to convey any meaning.
>> Regarding a Reference: namespace, I can see how this has some utility and
>> why projects have moved to it. However, I consider it a stopgap solution
>> that projects have implemented when what they really want is a proper wiki
>> for citations. Here are a few quick things that you can't do (or would have
>> to go out of your way to do) with just a Reference namespace that you can do
>> with a wiki dedicated to all the world's citations:
>> - Custom reports that are boolean combinations of citation fields, à la
>> SMW. This requires substantive new technology as SMW doesn't scale.
>> - User bibliographies which are a logical subset of all literature ever
>> published.
>>
>> Not sure why a Reference namespace couldn't do this.
>>
>> - Conduct a search of the literature.
>>
>> Or this  (you can search just one namespace)
>>
>> - A new set of policies that are not necessarily NPOV, regarding the
>> creation of articles that discuss collections of literature (lit review-like
>> concept). The content of these policies will emerge over years with the help
>> of a community. These articles could, for instance, help people who are
>> navigating a new area of a literature avoid getting stuck in local minima.
>> It could point out the true global context to them. It could point out
>> experimenter biases in the literature; for example, a recent article was
>> published where it was found that citation networks in academic literature
>> can have a tendency to form based on the assumption of authority, when in
>> fact that authority is false, bringing a whole thread of publications into
>> doubt.
>>
>> I'm not sure that literature reviews belong in the same wiki as citations.
>> That's definitely a different namespace. :)
>>
>> - Create wiki articles about individual sources.
>>
>> This might or might not be the same wiki -- but that could be interesting.
>> I could imagine a page for a journal being pulled in from several sources:
>> the collection of citations in the wiki for that journal, RSS from the
>> current contents (license permitting), a Wikipedia page about the journal
>> (if it exists), a link to author guidelines/submission info, open access
>> info from SHERPA/ROMEO, ... In this vision, very little of the content
>> "lives" in this wiki itself. Rather, it's templated from numerous other
>> places. Perhaps in the way "buy this book" links are handled in
>> librarything -- there are numerous external links which can be activated
>> with a checkbox, and some external content that is pulled in based on
>> copyright review.
>>
>> While I am not dedicated to any of these things happening, I also do not
>> wish to rule them out. The hope is that a new community will emerge around
>> the project and guide it in the direction that is most useful. My hope in
>> this thread is that we can identify some of the most likely cases and
>> imagine what it will be like, so that we can convey this vision to the
>> Foundation and they can get a sense of the potential importance of the
>> project.
>>
>> Scoping is a big problem, I think -- because it would help to have a
>> vision of which of several related tasks/endpoints is primary.
>> I think an investigation of what fr.wikipedia is doing would be really
>> useful -- does anybody edit there, or have an interest in digging into that?
>> Questions might include: What is the reference namespace doing? What isn't
>> it doing, that they wish it would? Did they consider alternatives to a
>> namespace? How is maintenance going? Do they see the reference namespace as
>> longstanding into the future, or as a stopgap?
>> -Jodi

Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Brian J Mingus
On Wed, Jul 21, 2010 at 4:33 PM, Jodi Schneider wrote:

>
> On 21 Jul 2010, at 19:47, Brian J Mingus wrote:
>
>  Finn,
>
> I'm not a fan of including a portion of the title for a couple of
> reasons. First, it's not required to make the key unique. Second, it makes
> the key longer than necessary. Third, the first word or words from a title
> are not guaranteed to convey any meaning.
>
> Regarding a Reference: namespace, I can see how this has some utility and
> why projects have moved to it. However, I consider it a stopgap solution
> that projects have implemented when what they really want is a proper wiki
> for citations. Here are a few quick things that you can't do (or would have
> to go out of your way to do) with just a Reference namespace that you can do
> with a wiki dedicated to all the world's citations:
>
> - Custom reports that are boolean combinations of citation fields, à la SMW.
> This requires substantive new technology as SMW doesn't scale.
> - User bibliographies which are a logical subset of all literature ever
> published.
>
>
> Not sure why a Reference namespace couldn't do this.
>
> - Conduct a search of the literature.
>
>
> Or this  (you can search just one namespace)
>
> - A new set of policies that are not necessarily NPOV, regarding the
> creation of articles that discuss collections of literature (lit review-like
> concept). The content of these policies will emerge over years with the help
> of a community. These articles could, for instance, help people who are
> navigating a new area of a literature avoid getting stuck in local minima.
> It could point out the true global context to them. It could point out
> experimenter biases in the literature; for example, a recent article was
> published where it was found that citation networks in academic literature
> can have a tendency to form based on the assumption of authority, when in
> fact that authority is false, bringing a whole thread of publications into
> doubt.
>
>
> I'm not sure that literature reviews belong in the same wiki as citations.
> That's definitely a different namespace. :)
>
>  - Create wiki articles about individual sources.
>
>
> This might or might not be the same wiki -- but that could be interesting.
>
> I could imagine a page for a journal being pulled in from several sources:
> the collection of citations in the wiki for that journal, RSS from the
> current contents (license permitting), a Wikipedia page about the journal
> (if it exists), a link to author guidelines/submission info, open access
> info from SHERPA/ROMEO,  In this vision, very little of the content
> "lives" in this wiki itself. Rather, it's templated from numerous other
> places Perhaps in the way "buy this book" links are handled in
> librarything -- there are numerous external links which can be activated
> with a checkbox, and some external content that is pulled in based on
> copyright review.
>
>
> While I am not dedicated to any of these things happening, I also do not
> wish to rule them out. The hope is that a new community will emerge around
> the project and guide it in the direction that is most useful. My hope in
> this thread is that we can identify some of the most likely cases and
> imagine what it will be like, so that we can convey this vision to the
> Foundation and they can get a sense of the potential importance of the
> project.
>
>
> Scoping is a big problem, I think -- because it would help to have a vision
> of which of several related tasks/endpoints is primary.
>
> I think an investigation of what fr.wikipedia is doing would be really
> useful -- does anybody edit there, or have an interest in digging into that?
> Questions might include: What is the reference namespace doing? What isn't
> it doing, that they wish it would? Did they consider alternatives to a
> namespace? How is maintenance going? Do they see the reference namespace as
> longstanding into the future, or as a stopgap?
>
> -Jodi
>

More broadly speaking, a reference namespace does not accomplish the goal of
having a free repository of all citations, complete with collections of
citations curated by the community, and documentation of those citations by
the community, in various forms to be determined by the community. While it
is possible to create specialized cases that suit the narrow needs of
individual projects, I and many of the people I have spoken to see a
justification for a broader vision. This broader vision is directly in line
with the WMF mission of giving free access to the world's knowledge. One of
the first steps must be making the Wikipedias aware of that knowledge, and
enabling them to build linked networks of information around it.

Brian


Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread David Goodman
The model for this is WP:Book sources, though this relies upon the
user selecting the appropriate places to look, rather than guiding
him.

On Wed, Jul 21, 2010 at 6:33 PM, Jodi Schneider  wrote:
>
> On 21 Jul 2010, at 19:47, Brian J Mingus wrote:
>
>  Finn,
> I'm not a fan of including a portion of the title for a couple of
> reasons. First, it's not required to make the key unique. Second, it makes
> the key longer than necessary. Third, the first word or words from a title
> are not guaranteed to convey any meaning.
> Regarding a Reference: namespace, I can see how this has some utility and
> why projects have moved to it. However, I consider it a stopgap solution
> that projects have implemented when what they really want is a proper wiki
> for citations. Here are a few quick things that you can't do (or would have
> to go out of your way to do) with just a Reference namespace that you can do
> with a wiki dedicated to all the world's citations:
> - Custom reports that are boolean combinations of citation fields, ala SMW.
> This requires substantive new technology as SMW doesn't scale.
> - User bibliographies which are a logical subset of all literature ever
> published.
>
> Not sure why a Reference namespace couldn't do this.
>
> - Conduct a search of the literature.
>
> Or this  (you can search just one namespace)
>
> - A new set of policies that are not necessarily NPOV, regarding the
> creation of articles that discuss collections of literature (lit review-like
> concept). The content of these policies will emerge over years with the help
> of a community. These articles could, for instance, help people who are
> navigating a new area of a literature avoid getting stuck in local minima.
> It could point out the true global context to them. It could point out
> experimenter biases in the literature; for example, a recent article was
> published where it was found that citation networks in academic literature
> can have a tendency to form based on the assumption of authority, when in
> fact that authority is false, bringing a whole thread of publications into
> doubt.
>
> I'm not sure that literature reviews belong in the same wiki as citations.
> That's definitely a different namespace. :)
>
> - Create wiki articles about individual sources.
>
> This might or might not be the same wiki -- but that could be interesting.
> I could imagine a page for a journal being pulled in from several sources:
> the collection of citations in the wiki for that journal, RSS from the
> current contents (license permitting), a Wikipedia page about the journal
> (if it exists), a link to author guidelines/submission info, open access
> info from SHERPA/ROMEO,  In this vision, very little of the content
> "lives" in this wiki itself. Rather, it's templated from numerous other
> places Perhaps in the way "buy this book" links are handled in
> librarything -- there are numerous external links which can be activated
> with a checkbox, and some external content that is pulled in based on
> copyright review.
>
> While I am not dedicated to any of these things happening, I also do not
> wish to rule them out. The hope is that a new community will emerge around
> the project and guide it in the direction that is most useful. My hope in
> this thread is that we can identify some of the most likely cases and
> imagine what it will be like, so that we can convey this vision to the
> Foundation and they can get a sense of the potential importance of the
> project.
>
> Scoping is a big problem, I think -- because it would help to have a vision
> of which of several related tasks/endpoints is primary.
> I think an investigation of what fr.wikipedia is doing would be really
> useful -- does anybody edit there, or have an interest in digging into that?
> Questions might include: What is the reference namespace doing? What isn't
> it doing, that they wish it would? Did they consider alternatives to a
> namespace? How is maintenance going? Do they see the reference namespace as
> longstanding into the future, or as a stopgap?
> -Jodi
>
>



-- 
David Goodman, Ph.D, M.L.S.
http://en.wikipedia.org/wiki/User_talk:DGG



Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Jodi Schneider

On 21 Jul 2010, at 19:47, Brian J Mingus wrote:
>  Finn,
> 
> I'm not a fan of including a portion of the title for a couple of
> reasons. First, it's not required to make the key unique. Second, it makes 
> the key longer than necessary. Third, the first word or words from a title 
> are not guaranteed to convey any meaning.
> 
> Regarding a Reference: namespace, I can see how this has some utility and why 
> projects have moved to it. However, I consider it a stopgap solution that 
> projects have implemented when what they really want is a proper wiki for 
> citations. Here are a few quick things that you can't do (or would have to go 
> out of your way to do) with just a Reference namespace that you can do with a 
> wiki dedicated to all the world's citations:
> 
> - Custom reports that are boolean combinations of citation fields, ala SMW. 
> This requires substantive new technology as SMW doesn't scale.
> - User bibliographies which are a logical subset of all literature ever 
> published.

Not sure why a Reference namespace couldn't do this.

> - Conduct a search of the literature.

Or this  (you can search just one namespace)

> - A new set of policies that are not necessarily NPOV, regarding the creation 
> of articles that discuss collections of literature (lit review-like concept). 
> The content of these policies will emerge over years with the help of a 
> community. These articles could, for instance, help people who are navigating 
> a new area of a literature avoid getting stuck in local minima. It could 
> point out the true global context to them. It could point out experimenter 
> biases in the literature; for example, a recent article was published where 
> it was found that citation networks in academic literature can have a 
> tendency to form based on the assumption of authority, when in fact that 
> authority is false, bringing a whole thread of publications into doubt.

I'm not sure that literature reviews belong in the same wiki as citations. 
That's definitely a different namespace. :)

> - Create wiki articles about individual sources.

This might or might not be the same wiki -- but that could be interesting.

I could imagine a page for a journal being pulled in from several sources: the 
collection of citations in the wiki for that journal, RSS from the current 
contents (license permitting), a Wikipedia page about the journal (if it 
exists), a link to author guidelines/submission info, open access info from 
SHERPA/ROMEO,  In this vision, very little of the content "lives" in this 
wiki itself. Rather, it's templated from numerous other places Perhaps in 
the way "buy this book" links are handled in librarything -- there are numerous 
external links which can be activated with a checkbox, and some external 
content that is pulled in based on copyright review.

> 
> While I am not dedicated to any of these things happening, I also do not wish 
> to rule them out. The hope is that a new community will emerge around the 
> project and guide it in the direction that is most useful. My hope in this 
> thread is that we can identify some of the most likely cases and imagine what 
> it will be like, so that we can convey this vision to the Foundation and they 
> can get a sense of the potential importance of the project.

Scoping is a big problem, I think -- because it would help to have a vision of 
which of several related tasks/endpoints is primary.

I think an investigation of what fr.wikipedia is doing would be really useful 
-- does anybody edit there, or have an interest in digging into that? Questions 
might include: What is the reference namespace doing? What isn't it doing, that 
they wish it would? Did they consider alternatives to a namespace? How is 
maintenance going? Do they see the reference namespace as longstanding into the 
future, or as a stopgap?

-Jodi


Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Daniel Kinzler
> Hey Daniel,
> 
> Bibsonomy seems to suffer from the same problem as CiteULike - urls
> which convey no meaning. An example url id from CiteULike is 2434335,
> and one from Bibsonomy is 29be860f0bdea4a29fba38ef9e6dd6a09. I hope to
> continue to steer the conversation away from that direction. These IDs
> guarantee uniqueness, but I believe that we can create keys that both
> guarantee uniqueness and convey some meaning to humans. Consider that
> this key will be embedded in wiki articles any time a source is cited.
> It's important that it make some sense.

Oh, I didn't mean we should use hashes or IDs as keys or identifiers in the URL.
I meant we can employ the hashing technique to detect dupes, because you will
inadvertently get information about the same thing under two different keys
due to issues with transliteration, etc.
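
A minimal sketch of the dupe-detection idea, assuming each record is reduced
to surnames, year and title before hashing. The helper names normalize and
dedupe_hash are made up for illustration, and SHA-1 is an arbitrary choice;
this is not how Bibsonomy does it. Note that it unifies "Möller" with "Moller"
but not with "Moeller".

```python
import hashlib
import unicodedata
from typing import Sequence


def normalize(text: str) -> str:
    """Casefold, strip diacritics, and keep only letters and digits."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if ch.isalnum())


def dedupe_hash(surnames: Sequence[str], year: int, title: str) -> str:
    """Hash a normalized (authors, year, title) record (hypothetical helper)."""
    parts = [normalize(s) for s in surnames] + [str(year), normalize(title)]
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()
```

Two records with the same hash are only candidates for merging; the
human-readable key stays whatever the community chooses.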

-- daniel



Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Brian J Mingus
On Wed, Jul 21, 2010 at 2:42 AM, Daniel Kinzler wrote:

> >> 1) The first three author names separated by slashes
> > why not separate by pluses? they don't form part of names either, and
> > don't cause problems with wiki page titles.
>
> I like this... however, how would you represent this in a URL? Also note that
> using plusses in page names doesn't work with all server configurations, since
> plus has a special meaning in URLs.
>
> >> 3) Some or all of the date. For instance, if there is only one source by
> >> this set of authors that year, we can just use YYYY. However, once another
> >> source by those set of authors is added, the key should change to YYYYMMDD
> >> or similar.
> > I don't think it is a good idea to change one key as a function of
> > updates on another, except for a generic disambiguation tag.
>
> I agree. And if you *have* to use the full date, use YYYYMMDD, not the other
> way around, please.
>
> >> Since the slashes are somewhat cumbersome, perhaps we can not make them
> >> mandatory, but similarly use them only when they are necessary in order to
> >> "escape" a name. In the case that one of the authors does not have a slash
> >> in their name - the dominant case - we can stick to the easily legible and
> >> nicely compact CamelCase format.
> >>
> >> Example keys generated by this algorithm:
> >>
> >> KangHsuKrajbichEtAl2009
> > Kang+Hsu+Krajbich+2009+the+wick+in
> > or
> > Kang+Hsu+Krajbich+2009+twi
>
> Both seem good, though i would suggest to form a convention to ignore any
> leading "the" and "a", to a more distinctive 3 word suffix.
>
> > Of course, it does not have to be _exactly_ three authors, nor three
> > words from the title, and it does not solve the John Smith (or Zheng
> > Wang) problem.
>
> It also doesn't solve issues with transliteration: Merik Möller may become
> "Moeller" or "Moller", Jakob Voß may become "Voss" or "Vosz"  or even "VoB",
> etc. In case of chinese names, it's often not easy to decide which part is the
> last name.
>
> To avoid this kind of ambiguity, i suggest to automatically apply some type of
> normalization and/or hashing. There is quite a bit of research about this kind
> of normalisation out there, generally with the aim of detecting duplicates.
> Perhaps we can learn from bibsonomy.org, have a look how they do it:
> .
>
> Gotta love open source university research projects :)
>
> -- daniel


Hey Daniel,

Bibsonomy seems to suffer from the same problem as CiteULike - urls which
convey no meaning. An example url id from CiteULike is 2434335, and one from
Bibsonomy is 29be860f0bdea4a29fba38ef9e6dd6a09. I hope to continue to steer
the conversation away from that direction. These IDs guarantee uniqueness,
but I believe that we can create keys that both guarantee uniqueness and
convey some meaning to humans. Consider that this key will be embedded in
wiki articles any time a source is cited. It's important that it make some
sense.

Plus signs and slashes in the key appear to be cumbersome. Perhaps we can
avoid this by truncating last names that involve a slash to either the
portion before or after the slash.

Changing the key seems to be a bad idea, so we want a key system that is
unique from the start. That means we should use the full date, YYYYMMDD, as
suggested by Daniel.

In the event that multiple sources are published by the same set of authors
on the same day, we can use a, b, c disambiguation.

This gives us the following key, guaranteed to be unique:
KangHsuKrajbich20091011b
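
To make the scheme concrete, here is a rough sketch. It assumes the first
source keeps the bare key and later same-day sources from the same authors get
a, b, c appended; that detail is my assumption, as are the function names
camel and make_key and the use of a plain set for the existing keys.

```python
import re
from datetime import date
from string import ascii_lowercase


def camel(surname: str) -> str:
    """Drop non-letters (hyphens, slashes, spaces) and capitalize each part."""
    parts = re.split(r"[\W\d_]+", surname)
    return "".join(p[:1].upper() + p[1:] for p in parts if p)


def make_key(surnames, published: date, existing: set) -> str:
    """Return SurnamesYYYYMMDD, adding a, b, c... only if the key is taken."""
    base = "".join(camel(s) for s in surnames[:3]) + published.strftime("%Y%m%d")
    if base not in existing:
        return base
    for letter in ascii_lowercase:
        candidate = base + letter
        if candidate not in existing:
            return candidate
    raise ValueError("more than 26 collisions for " + base)
```

Because a key is only ever extended at creation time, a key that is already
embedded in articles never has to change.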

Brian


Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Brian J Mingus
On Wed, Jul 21, 2010 at 5:49 AM, Finn Aarup Nielsen  wrote:

>
>
> On Wed, 21 Jul 2010, Jodi Schneider wrote:
>
>  On 21 Jul 2010, at 09:42, Daniel Kinzler wrote:
>>
>>> Kang+Hsu+Krajbich+2009+the+wick+in

>>>
>> This seems best to me of what's proposed so far.
>>
>>> Both seem good, though i would suggest to form a convention to ignore any
>>> leading "the" and "a", to a more distinctive 3 word suffix.
>>>
>>
>> While that's a good idea, then we'd have to know all "indistinctive" words
>> in all languages. (Die, Der, La, L', ...)
>>
>> There are still going to be duplicates, alas...
>>
>>
>>> Of course, it does not have to be _exactly_ three authors, nor three
>>> words from the title, and it does not solve the John Smith (or Zheng
>>> Wang) problem.

>>>
>>> It also doesn't solve issues with transliteration: Merik Möller may
>>> become
>>> "Moeller" or "Moller", Jakob Voß may become "Voss" or "Vosz"  or even
>>> "VoB",
>>> etc. In case of chinese names, it's often not easy to decide which part
>>> is the
>>> last name.
>>>
>>
> I have a large bibtex file where I (mostly) use Surname + one initial +
> year + first important word (
> http://neuro.imm.dtu.dk/software/lyngby/doc/lyngby.bib)
>
> So for example: AaltoS2002Neuroanatomical
>
> There are lots of special cases
>
> "M. C. B. {\AA}berg" becomes AbergM2006Multivariate (transliterate Å)
>
> "Anissa Abi-Dargham" AbiDarghamA2000Measurement (discard dash).
>
> ACM computer classification system "ACM1998Computing" (an organization as
> an author: do you use 'association' or 'ACM'?)
>
> "A Content-Driven Reputation System for the {Wikipedia}" ->
> AdlerB2007ContentDriven (discarding the hyphen in the title and camelcasing)
>
> "$[^{15}$O$]$water {PET}: More ``Noise'' than Signal?" ->
> StrotherS1996Owater (here we have square brackets that will be a problem
> in wiki text. I suppose that in chemistry it becomes even worse)
>
> "On the Distribution of the Quotient of two chance variables" becomes
> CurtissJ1941On (as 'On' here is not regarded as a stopword).
>
> Modelling the fMRI response using smooth FIR filters ->
> NielsenF2001ModelingfMRI (extra word because of collision with "Modeling of
> locations in the {BrainMap} database: Detection of outliers").
>
> With 3 author + year + title you sometimes run into collisions:
>
>  author =   {J. M. Ollinger and Gordon L. Shulman and M. Corbetta},
>  title ={Separating Processes within a Trial in Event-Related
>  Functional {MRI}. {II}. Analysis},
>
>  author =   {J. M. Ollinger and Gordon L. Shulman and M. Corbetta},
>  title ={Separating Processes within a Trial in Event-Related
>  Functional {MRI}. {I}. The Method},
>
>
> When dealing with scientific articles it is not always possible to use the
> full given name, since sometimes you just know the initial.
>
> I know one called Vibe Frøkjær. Presumably because she is afraid that PubMed
> and others will not be able to handle the Nordic letters, she writes her name
> as Vibe G. Frokjaer in science contexts. Other authors may write her as Vibe
> G. Frøkjær.
>
>
> Articles usually have only one edition. Sometimes you find reprinted
> versions here and there. For books there might be different versions and you
> need to find out whether you want to have the key to the 'Work',
> 'Expression', 'Manifestation' or 'Item' to use the wording from
>
>
> http://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records
>
> The French Wikipedia has a page for each book title ('work' regardless of
> language and editions). Editions are listed with multiple infoboxes on the
> page. In this way there is not a one-to-one correspondence between wiki page
> and, say, ISBN. It seems the best to me to have one page for a 'work' where
> you collect comments. However, in citations with page numbers you need the
> 'expression' because of page break differences between versions.
>
> I like the French way, except that each book has two pages: One under the
> 'Reference' namespace and another under the 'Template' namespace.
>
> The French tend to use "Title (authors)" as key in the Reference namespace.
> Mostly fullname:
>
> http://fr.wikipedia.org/wiki/Référence:Weaving_the_Web_(Tim_Berners-Lee)
>
> But sometimes diverge a bit:
>
> http://fr.wikipedia.org/wiki/Référence:Theory_of_numbers_(HardyWright)
>
> The associated template has somewhat unpredictable name, e.g.,
>
> http://fr.wikipedia.org/wiki/Modèle:HardyWright
>
> They link in the template instantiations, e.g., "auteurs=[[Tim
> Berners-Lee]], Mark Fischetti" which I still don't like and would instead
> suggest:
>
> author1=Tim Berners-Lee | author2=Mark Fischetti and templates
> [[{{{author1}}}]], [[{{{author2}}}]] or perhaps better for disamb

Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Finn Aarup Nielsen



On Wed, 21 Jul 2010, Jodi Schneider wrote:


On 21 Jul 2010, at 09:42, Daniel Kinzler wrote:

Kang+Hsu+Krajbich+2009+the+wick+in


This seems best to me of what's proposed so far.

Both seem good, though i would suggest to form a convention to ignore any
leading "the" and "a", to a more distinctive 3 word suffix.


While that's a good idea, then we'd have to know all "indistinctive" words in 
all languages. (Die, Der, La, L', ...)

There are still going to be duplicates, alas...




Of course, it does not have to be _exactly_ three authors, nor three
words from the title, and it does not solve the John Smith (or Zheng
Wang) problem.


It also doesn't solve issues with transliteration: Merik Möller may become
"Moeller" or "Moller", Jakob Voß may become "Voss" or "Vosz"  or even "VoB",
etc. In case of chinese names, it's often not easy to decide which part is the
last name.


I have a large bibtex file where I (mostly) use Surname + one initial + 
year + first important word 
(http://neuro.imm.dtu.dk/software/lyngby/doc/lyngby.bib)


So for example: AaltoS2002Neuroanatomical

There are lots of special cases

"M. C. B. {\AA}berg" becomes AbergM2006Multivariate (transliterate Å)

"Anissa Abi-Dargham" AbiDarghamA2000Measurement (discard dash).

ACM computer classification system "ACM1998Computing" (an organization as 
an author: do you use 'association' or 'ACM'?)


"A Content-Driven Reputation System for the {Wikipedia}" ->
AdlerB2007ContentDriven (discarding the hyphen in the title and camelcasing)

"$[^{15}$O$]$water {PET}: More ``Noise'' than Signal?" -> 
StrotherS1996Owater (here we have square brackets that will be a problem 
in wiki text. I suppose that in chemistry it becomes even worse)


"On the Distribution of the Quotient of two chance variables" becomes 
CurtissJ1941On (as 'On' here is not regarded as a stopword).


Modelling the fMRI response using smooth FIR filters -> 
NielsenF2001ModelingfMRI (extra word because of collision with "Modeling 
of locations in the {BrainMap} database: Detection of outliers").
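
The basic rule (surname + one initial + year + first important word) can be
sketched roughly as below. This is only an approximation of what I do by hand:
the stopword list is a tiny illustrative one, the function names are made up,
and collisions such as the NielsenF2001 case still need manual handling.

```python
import re
import unicodedata

# Tiny illustrative stopword list; "on" is deliberately absent, matching
# the CurtissJ1941On example.
STOPWORDS = {"a", "an", "the"}


def ascii_letters(text: str) -> str:
    """Strip diacritics and drop every character that is not an ASCII letter."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if ch.isascii() and ch.isalpha())


def bibtex_key(surname: str, given: str, year: int, title: str) -> str:
    """Build Surname + initial + year + first non-stopword of the title."""
    words = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", title)
    first = next((w for w in words if w.lower() not in STOPWORDS), "")
    # Hyphenated words are camel-cased: Content-Driven -> ContentDriven.
    word = "".join(part[:1].upper() + part[1:] for part in first.split("-"))
    return f"{ascii_letters(surname)}{ascii_letters(given)[:1]}{year}{word}"
```

Titles with markup, like the [15O]water one, would need extra cleaning before
this works.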


With 3 author + year + title you sometimes run into collisions:

  author =   {J. M. Ollinger and Gordon L. Shulman and M. Corbetta},
  title ={Separating Processes within a Trial in Event-Related
  Functional {MRI}. {II}. Analysis},

  author =   {J. M. Ollinger and Gordon L. Shulman and M. Corbetta},
  title ={Separating Processes within a Trial in Event-Related
  Functional {MRI}. {I}. The Method},


When dealing with scientific articles it is not always possible to use the 
full given name, since sometimes you just know the initial.


I know one called Vibe Frøkjær. Presumably because she is afraid that 
PubMed and others will not be able to handle the Nordic letters, she writes 
her name as Vibe G. Frokjaer in science contexts. Other authors may write 
her as Vibe G. Frøkjær.



Articles usually have only one edition. Sometimes you find reprinted 
versions here and there. For books there might be different versions and 
you need to find out whether you want to have the key to the 'Work', 
'Expression', 'Manifestation' or 'Item' to use the wording from


http://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records

The French Wikipedia has a page for each book title ('work' regardless of 
language and editions). Editions are listed with multiple infoboxes on the 
page. In this way there is not a one-to-one correspondence between wiki 
page and, say, ISBN. It seems the best to me to have one page for a 'work' 
where you collect comments. However, in citations with page numbers you 
need the 'expression' because of page break differences between versions.


I like the French way, except that each book has two pages: One under the 
'Reference' namespace and another under the 'Template' namespace.


The French tend to use "Title (authors)" as key in the Reference 
namespace. Mostly fullname:


http://fr.wikipedia.org/wiki/Référence:Weaving_the_Web_(Tim_Berners-Lee)

But sometimes diverge a bit:

http://fr.wikipedia.org/wiki/Référence:Theory_of_numbers_(HardyWright)

The associated template has somewhat unpredictable name, e.g.,

http://fr.wikipedia.org/wiki/Modèle:HardyWright

They link in the template instantiations, e.g., "auteurs=[[Tim 
Berners-Lee]], Mark Fischetti" which I still don't like and would instead 
suggest:


author1=Tim Berners-Lee | author2=Mark Fischetti and templates 
[[{{{author1}}}]], [[{{{author2}}}]] or perhaps better for disambig 
[[{{{authorlink1}}}|{{{author1}}}]], [[{{{authorlink2}}}|{{{author2}}}]]. This 
way you allow for easier extraction and you do not need SMW array 
processing to distinguish the names.
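
As a rough illustration of why numbered parameters are easier to extract, the
sketch below pulls the authors out of such a parameter string. The function
name template_authors is made up, and it assumes the values contain no "|"
themselves (so no piped wikilinks inside the parameters).

```python
import re


def template_authors(params: str) -> list:
    """Collect author1, author2, ... values from 'name=value | name=value'."""
    found = {}
    for field in params.split("|"):
        name, sep, value = field.partition("=")
        match = re.fullmatch(r"author(\d+)", name.strip())
        if sep and match:
            found[int(match.group(1))] = value.strip()
    # Return the authors in their numbered order.
    return [found[i] for i in sorted(found)]
```

With the single "auteurs=" field you would first have to strip the link
brackets and then guess where one name ends and the next begins.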


It seems to me that the French have come a long way. I am surprised that 
only John Vandenberg has pointed to the French efforts. I was not aware of 
it before.


Does anyone know anything about the French discussions on the introduction 
of the 'Reference' namespace? Should we just implement the French

Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Daniel Mietchen
On Wed, Jul 21, 2010 at 10:42 AM, Daniel Kinzler  wrote:
>>> 1) The first three author names separated by slashes
>> why not separate by pluses? they don't form part of names either, and
>> don't cause problems with wiki page titles.
>
> I like this... however, how would you represent this in a URL?
%2B would seem to be the obvious choice to me.

> Also note that
> using plusses in page names doesn't work with all server configurations, since
> plus has a special meaning in URLs.

Don't know too much about the double escaping business to comment on that, but
if pluses are not acceptable, we still have equal signs (possibly with similar
problems, but still useful for direct web search) and underscores (which would
turn the whole key into one string for search engines).
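
For what it's worth, the %2B escaping and the "special meaning" of plus can be
seen with Python's standard urllib.parse. This only shows generic URL
encoding; how a given MediaWiki server setup treats such titles is a separate
question that I am not addressing here.

```python
from urllib.parse import quote, unquote, unquote_plus

key = "Kang+Hsu+Krajbich+2009"

# Percent-encode everything outside the unreserved set, so "+" becomes %2B.
encoded = quote(key, safe="")

# Decoding the percent-encoded form gives the key back unchanged.
round_trip = unquote(encoded)

# Form-style decoding treats a raw "+" as a space, which is the ambiguity
# Daniel Kinzler warned about.
as_form_data = unquote_plus(key)

print(encoded)
print(round_trip)
print(as_form_data)
```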

Daniel



Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Daniel Kinzler
Jodi Schneider schrieb:
> On 21 Jul 2010, at 09:42, Daniel Kinzler wrote:
>>> Kang+Hsu+Krajbich+2009+the+wick+in
> 
> This seems best to me of what's proposed so far. 
>> Both seem good, though i would suggest to form a convention to ignore any
>> leading "the" and "a", to a more distinctive 3 word suffix.
> 
> While that's a good idea, then we'd have to know all "indistinctive" words in 
> all languages. (Die, Der, La, L', ...)

Stopword lists for major languages exist, and where they don't, they are easily
created, even automatically. Word frequency analysis on a few megabytes of text
is cheap these days :)
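
A minimal sketch of what I mean by frequency analysis, assuming the most
frequent words of a corpus are good stopword candidates. The function name
frequent_words and the cut-off top_n are illustrative, and a real list would
still want a quick manual review.

```python
import re
from collections import Counter


def frequent_words(text: str, top_n: int) -> list:
    """Return the top_n most frequent lowercase words in text."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [word for word, _ in Counter(words).most_common(top_n)]
```

Run per language over a few megabytes of article text, the head of that list
is mostly articles and prepositions such as "the", "die", "der", "la".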

-- daniel




Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Jodi Schneider

On 21 Jul 2010, at 09:42, Daniel Kinzler wrote:
>> Kang+Hsu+Krajbich+2009+the+wick+in

This seems best to me of what's proposed so far. 
> Both seem good, though i would suggest to form a convention to ignore any
> leading "the" and "a", to a more distinctive 3 word suffix.

While that's a good idea, then we'd have to know all "indistinctive" words in 
all languages. (Die, Der, La, L', ...)

There are still going to be duplicates, alas...

> 
>> Of course, it does not have to be _exactly_ three authors, nor three
>> words from the title, and it does not solve the John Smith (or Zheng
>> Wang) problem.
> 
> It also doesn't solve issues with transliteration: Merik Möller may become
> "Moeller" or "Moller", Jakob Voß may become "Voss" or "Vosz"  or even "VoB",
> etc. In case of chinese names, it's often not easy to decide which part is the
> last name.
> 
> To avoid this kind of ambiguity, i suggest to automatically apply some type of
> normalization and/or hashing. There is quite a bit of research about this kind
> of normalisation out there, generally with the aim of detecting duplicates.
> Perhaps we can learn from bibsonomy.org, have a look how they do it:
> .

Good idea!

-Jodi


Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Daniel Kinzler
>> 1) The first three author names separated by slashes
> why not separate by pluses? they don't form part of names either, and
> don't cause problems with wiki page titles.

I like this... however, how would you represent this in a URL? Also note that
using plusses in page names doesn't work with all server configurations, since
plus has a special meaning in URLs.

>> 3) Some or all of the date. For instance, if there is only one source by
>> this set of authors that year, we can just use YYYY. However, once another
>> source by those set of authors is added, the key should change to YYYYMMDD
>> or similar.
> I don't think it is a good idea to change one key as a function of
> updates on another, except for a generic disambiguation tag.

I agree. And if you *have* to use the full date, use YYYYMMDD, not the other way
around, please.

>> Since the slashes are somewhat cumbersome, perhaps we can not make them
>> mandatory, but similarly use them only when they are necessary in order to
>> "escape" a name. In the case that one of the authors does not have a slash
>> in their name - the dominant case - we can stick to the easily legible and
>> nicely compact CamelCase format.
>>
>> Example keys generated by this algorithm:
>>
>> KangHsuKrajbichEtAl2009
> Kang+Hsu+Krajbich+2009+the+wick+in
> or
> Kang+Hsu+Krajbich+2009+twi

Both seem good, though I would suggest adopting a convention of ignoring any
leading "the" and "a", to get a more distinctive three-word suffix.
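
A rough sketch of such a convention, in Python. The article list and the
function name title_suffix are hypothetical and deliberately incomplete; a real
list would need entries for every language, and the title is assumed from the
example key.

```python
# Hypothetical, incomplete list of leading articles to skip.
LEADING_ARTICLES = {"the", "a", "an", "der", "die", "das", "la", "le"}

def title_suffix(title, n_words=3):
    """Return the first n_words of the title, skipping leading articles."""
    words = title.lower().split()
    # Only articles at the very start are dropped; later ones are kept.
    while words and words[0] in LEADING_ARTICLES:
        words = words[1:]
    return "+".join(words[:n_words])

suffix = title_suffix("The wick in the candle of learning")
print(suffix)  # wick+in+the
```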

> Of course, it does not have to be _exactly_ three authors, nor three
> words from the title, and it does not solve the John Smith (or Zheng
> Wang) problem.

It also doesn't solve issues with transliteration: Merik Möller may become
"Moeller" or "Moller", Jakob Voß may become "Voss" or "Vosz"  or even "VoB",
etc. In the case of Chinese names, it's often not easy to decide which part is the
last name.

To avoid this kind of ambiguity, I suggest automatically applying some type of
normalization and/or hashing. There is quite a bit of research about this kind
of normalization out there, generally with the aim of detecting duplicates.
Perhaps we can learn from bibsonomy.org, have a look how they do it:
.

Gotta love open source university research projects :)
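
As a minimal sketch of the kind of normalization meant here (this is not how
bibsonomy does it, just an illustration with Python's standard unicodedata
module; the function name normalize_name is made up):

```python
import unicodedata

def normalize_name(name):
    """Fold case and strip diacritics so spelling variants compare equal."""
    # casefold() maps "ß" to "ss"; NFKD splits "ö" into "o" plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    # Drop the combining marks, keeping only the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(normalize_name("Möller"))   # moller
print(normalize_name("Voß"))      # voss
print(normalize_name("Moeller"))  # moeller
```

Note that this maps "Möller" and "Moller" to the same string, and "Voß" and
"Voss" likewise, but "Moeller" still comes out differently, so language-specific
transliteration rules would still be needed on top.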

-- daniel





Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Daniel Mietchen
On Tue, Jul 20, 2010 at 9:26 PM, Brian J Mingus
 wrote:
> I like your suggestion that the abc disambiguator be chosen based on the
> first date of publication, and I also like the prospect of using slashes
> since they can't be contained in names. Using the full year is a good idea
> too. We can combine these to come up with a key that, in principle, is
> guaranteed to be unique. This key would contain:
>
> 1) The first three author names separated by slashes
why not separate by pluses? they don't form part of names either, and
don't cause problems with wiki page titles.

> 2) If there are more than three authors, an EtAl
don't think that's necessary if we get the abc part right.

> 3) Some or all of the date. For instance, if there is only one source by
> this set of authors that year, we can just use . However, once another
> source by those set of authors is added, the key should change to MMDD
> or similar.
I don't think it is a good idea to change one key as a function of
updates on another, except for a generic disambiguation tag.
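
A small sketch of what a generic disambiguation tag could look like, in Python.
The function name assign_key and the in-memory set of existing keys are
assumptions for illustration only. Keys that already exist are never renamed;
only the newcomer gets a tag.

```python
import string

def assign_key(base_key, existing_keys):
    """Return base_key, or base_key plus the first free letter tag."""
    if base_key not in existing_keys:
        return base_key
    for tag in string.ascii_lowercase:
        candidate = base_key + tag
        if candidate not in existing_keys:
            return candidate
    raise ValueError("more than 26 collisions for " + base_key)

existing = {"KangHsuKrajbichEtAl2009"}
first = assign_key("KangHsuKrajbichEtAl2009", existing)
existing.add(first)
second = assign_key("KangHsuKrajbichEtAl2009", existing)
print(first, second)  # KangHsuKrajbichEtAl2009a KangHsuKrajbichEtAl2009b
```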

> If there are multiple publications on the same day, we can
> resort to abc. Redirects and disambiguation pages can be set up when a key
> changes.
As Jodi pointed out already, the exact date is often not clearly
identifiable, so I would go simply for the year.
Instead of an alphabetic abc, one could use some function of the
article title (e.g. the first three words thereof, or the initials of
the first three words), always in lower case.

An even less ambiguous abc would be starting page (for printed stuff)
or article number (for online only) but this brings us back to the
7523225 problem you mentioned above.

> Since the slashes are somewhat cumbersome, perhaps we can not make them
> mandatory, but similarly use them only when they are necessary in order to
> "escape" a name. In the case that one of the authors does not have a slash
> in their name - the dominant case - we can stick to the easily legible and
> nicely compact CamelCase format.
>
> Example keys generated by this algorithm:
>
> KangHsuKrajbichEtAl2009
Kang+Hsu+Krajbich+2009+the+wick+in
or
Kang+Hsu+Krajbich+2009+twi

Also note that the CamelCase key does not yield results in a Google
search, whereas the first plussed variant brings up the right work,
while the plussed one with the title reduced to initials tends to bring up at
least something written by or cited from these authors.

> Author1Author2/Author-Three/2009
Author1+Author2+Author-Three+2009+just+another+article
or
Author1+Author2+Author-Three+2009+jat

Of course, it does not have to be _exactly_ three authors, nor three
words from the title, and it does not solve the John Smith (or Zheng
Wang) problem.

Daniel

-- 
http://www.google.com/profiles/daniel.mietchen



Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-20 Thread Brian J Mingus
On Mon, Jul 19, 2010 at 9:37 PM, Samuel Klein  wrote:

> Brian,
>
> The meta process for new project proposals is still the cleanest one
> for suggesting a specific Project and presenting it alongside similar
> projects.
>
> It would be helpful if you could update a related project proposal on
> meta -- say, [[m:WikiBibliography]], if that seems relevant.  (I just
> cleaned that page up and merged in an older proposal that had been
> obfuscated.)
>
>
Thanks for your work on this - definitely in the right direction! I will
consider whether I feel it's the right way for me to get started. One point
is that I am pointing more in the direction of a long-form proposal, and I
have more experience writing white-paper proposals for academia. I certainly
want it to end up on wiki, but when TPTB finally read the proposal perhaps
they will find it more persuasive if it is a professional looking document
that lands in their inbox.


> Or you can create a new project proposal...  WikiCite as a name can be
> confusing, since it has been used to refer to this bibliographic idea,
> but also to refer to the idea of citations for every statement or fact
> - something closer to a blame or trust solution that includes
> citations in its transactions.
>
>
Another name that I have come up with is OpenScholar. I still rather like
it, but suspect it has too much of a scientific ring to it? Names are
certainly very important so we should do more work on this avenue. Including
a list of names in the proposal would be a good idea, and perhaps the final
name will be a combination of existing name proposals.


> We should figure out how this project would work with acawiki, and
> possibly bibdex.  Bibdex doesn't aim to   And it would be helpful to
> have a publicly-viewable demo to play with -- could you clone your
> current wiki and populate the result with dummy data?
>

The problem with WikiPapers is that it has too many features! A feature-thin
version would be ideal for the proposal though, so I will plan to have some
kind of a demo site available.


> I love the idea of having a global place to discuss citations -- ALL
> citations -- something that OpenLibrary, the arXiv, and anyone else
> hosting cited documents could point to for every one of its works.
>

Exactly :)

Brian


> Sam.
>
>
> On Mon, Jul 19, 2010 at 6:03 PM, Federico Leva (Nemo)
>  wrote:
> > Brian J Mingus, 19/07/2010 22:20:
> >> The basic idea is a centralized wiki that contains citation information that
> >> other MediaWikis and WMF projects can then reference using something like a
> >> {{cite}} template or a simple link. The community can document the citation,
> >> the author, the book etc.. and, in one idealization, all citations across
> >> all wikis would point to the same article on WikiCite. Users can use this
> >> wiki as their personal bibliography as well, as collections of citations can
> >> be exported in arbitrary citation formats.
> >
> > I have already mentioned it before, but this description looks quite
> > similar to http://bibdex.org/ . Maybe we should join forces (i.e., send
> > your proposal also to Sunir Shah).
> >
> > Nemo
> >
>
>
>
> --
> Samuel Klein  identi.ca:sj   w:user:sj
>


Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-20 Thread Brian J Mingus
On Mon, Jul 19, 2010 at 8:08 PM, Rob Lanphier  wrote:

> On Mon, Jul 19, 2010 at 1:20 PM, Brian J Mingus
>  wrote:
> > I have been working with Sam and others for some time now on brainstorming a
> > proposal for the Foundation to create a centralized wiki of citations, a
> > WikiCite so to speak, if that is not the eventual name. My plan is to
> > continue to discuss with folks who are knowledgeable and interested in such
> > a project and to have the feedback I receive go into the proposal which I
> > hope to write this summer.
>
> This sounds great.  Just speaking as a community member, I've been
> thinking about this topic a long time myself, and have plenty to add
> to the conversation.
>
> > The proposal white paper will then be sent around
> > to interested parties for corrections and feedback, including on-wiki and
> > mailing lists, before eventually landing at the Foundation officially. As we
> > know WMF has not started a new project in some years, so there is no
> > official process. Thus I find it important to get it right.
>
> I'd suggest finding an on-wiki spot to discuss this work.  Here's one
> place this has been discussed in the past that may be a good place to
> revive the conversation:
>
> http://strategy.wikimedia.org/wiki/Proposal:Building_a_database_of_all_books_ever_published
>
> Rather than commenting on list about the subject itself, I've
> commented on the discussion page there:
>
> http://strategy.wikimedia.org/wiki/Proposal_talk:Building_a_database_of_all_books_ever_published#Fact_database_6531
>
> Rob
>

Rob,

Thanks for bringing my attention to this proposal. It certainly has some of
the same ring as this project, with of course some important differences.
Commonalities between the projects are that they are multilingual and
require a powerful search engine. Differences are that this project is for
all literary sources and that I believe it is best suited to the WMF. The
widespread use of citations across the Wikipedias will drive user
contributions towards adding richer metadata to those citations. And having
a source of citations available will increase the quality of the Wikipedias
as it becomes easier and easier to cite sources.

Brian


Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-19 Thread Samuel Klein
Brian,

The meta process for new project proposals is still the cleanest one
for suggesting a specific Project and presenting it alongside similar
projects.

It would be helpful if you could update a related project proposal on
meta -- say, [[m:WikiBibliography]], if that seems relevant.  (I just
cleaned that page up and merged in an older proposal that had been
obfuscated.)

Or you can create a new project proposal...  WikiCite as a name can be
confusing, since it has been used to refer to this bibliographic idea,
but also to refer to the idea of citations for every statement or fact
- something closer to a blame or trust solution that includes
citations in its transactions.

We should figure out how this project would work with acawiki, and
possibly bibdex.  Bibdex doesn't aim to   And it would be helpful to
have a publicly-viewable demo to play with -- could you clone your
current wiki and populate the result with dummy data?

I love the idea of having a global place to discuss citations -- ALL
citations -- something that OpenLibrary, the arXiv, and anyone else
hosting cited documents could point to for every one of its works.

Sam.


On Mon, Jul 19, 2010 at 6:03 PM, Federico Leva (Nemo)
 wrote:
> Brian J Mingus, 19/07/2010 22:20:
>> The basic idea is a centralized wiki that contains citation information that
>> other MediaWikis and WMF projects can then reference using something like a
>> {{cite}} template or a simple link. The community can document the citation,
>> the author, the book etc.. and, in one idealization, all citations across
>> all wikis would point to the same article on WikiCite. Users can use this
>> wiki as their personal bibliography as well, as collections of citations can
>> be exported in arbitrary citation formats.
>
> I have already mentioned it before, but this description looks quite
> similar to http://bibdex.org/ . Maybe we should join forces (i.e., send
> your proposal also to Sunir Shah).
>
> Nemo
>



-- 
Samuel Klein          identi.ca:sj           w:user:sj



Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-19 Thread Sunir Shah
Hey folks,

I've been lurking on this list since the beginning of time and saw
this fly by. Thanks Nemo for the shout out. That is pretty much what
Bibdex is about. My inspiration was a Big Hairy Goal to provide a
central place where the body of academic knowledge can be curated by
the public in a wiki style. It's different than Wikipedia because
there is no NPOV and often research needs to be secret.

I originally tried this with both MeatballWiki and a similar service
called BibWiki. Bibdex is my latest adaptation based on what I learnt.
The current iteration embraces the fact that academia is built on
controversy. Different groups need to have space to express different
opinions apart from others. So, I rebuilt the software so that
research groups can create their own public annotated bibliographies
and control who has access to write to those bibliographies, much like
Google Groups has different levels of public and private access
control.

My understanding is that WikiCite is focused specifically on the needs
of the WMF projects. That has its own set of interesting use cases.

By the way, the http://www.openlibrary.org project is very inspiring
and in a similar vein, albeit restricted to books.

Cheers,
Sunir, Bibdex

On Mon, Jul 19, 2010 at 6:03 PM, Federico Leva (Nemo)
 wrote:
> Brian J Mingus, 19/07/2010 22:20:
>> The basic idea is a centralized wiki that contains citation information that
>> other MediaWikis and WMF projects can then reference using something like a
>> {{cite}} template or a simple link. The community can document the citation,
>> the author, the book etc.. and, in one idealization, all citations across
>> all wikis would point to the same article on WikiCite. Users can use this
>> wiki as their personal bibliography as well, as collections of citations can
>> be exported in arbitrary citation formats.
>
> I have already mentioned it before, but this description looks quite
> similar to http://bibdex.org/ . Maybe we should join forces (i.e., send
> your proposal also to Sunir Shah).
>
> Nemo
>



Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-19 Thread Federico Leva (Nemo)
Brian J Mingus, 19/07/2010 22:20:
> The basic idea is a centralized wiki that contains citation information that
> other MediaWikis and WMF projects can then reference using something like a
> {{cite}} template or a simple link. The community can document the citation,
> the author, the book etc.. and, in one idealization, all citations across
> all wikis would point to the same article on WikiCite. Users can use this
> wiki as their personal bibliography as well, as collections of citations can
> be exported in arbitrary citation formats.

I have already mentioned it before, but this description looks quite 
similar to http://bibdex.org/ . Maybe we should join forces (i.e., send 
your proposal also to Sunir Shah).

Nemo
