Re: [Wiki-research-l] share research without paywalls or requests for personal information?

2017-08-24 Thread Jeremy Baron
Hi,

On Thu, Aug 24, 2017 at 2:06 PM, James Salsman  wrote:
> However, I am not comfortable seeing research papers being shared on
> this list in ways that ask readers to disclose their personal
> information:
>
> http://imgur.com/a/qtzRS
>
> Can we please have some baseline standards for sharing research
> without any paywalls or requests for personal information?

I agree in theory, but your screenshot is for the abstract only.

There is an option to create an account instead of using the Google (or
Facebook) login, but even once you've made the account and downloaded
the PDF, it's still just the abstract, less than one full page.

The full text is behind a different paywall and seems to require
actual payment, not just signing up for a free account:
http://journals.sagepub.com/doi/abs/10.1177/0268580917722906

-Jeremy


Re: [Wiki-research-l] Working with edit history dump

2015-02-24 Thread Jeremy Baron
On Feb 24, 2015 1:44 PM, Behzad Tabibian btabib...@gmail.com wrote:
> I am new to working with Wikipedia dumps. I am trying to obtain the full
> revision history of all the articles on Wikipedia. I downloaded
> enwiki-20140707-pages-meta-history1.xml-*.7z from
> https://dumps.wikimedia.org/enwiki/20140707/. However, looking at the XML
> files, the revision history of individual articles does not match the
> revision history one sees on the history page on the Wikipedia website. It
> seems the dump contains a significantly smaller number of revisions than
> what can be found on Wikipedia.

This may be a decent place to ask (though I don't read this list much, so
I'm just guessing), but it's probably more relevant on
xmldatadump...@lists.wikimedia.org. FYI
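
For a quick sanity check, here is a minimal sketch in Python (assuming the
.7z files have already been decompressed to plain XML; comparing the results
against the on-wiki history pages is left as a manual step) that streams a
dump file and counts <revision> elements per <page>:

import sys
import xml.etree.ElementTree as ET

def count_revisions(xml_path):
    """Count <revision> elements per <page> in a pages-meta-history dump."""
    counts = {}
    title = None
    revisions = 0
    # iterparse streams the file, so the whole dump never has to fit in RAM
    for _, elem in ET.iterparse(xml_path, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the export-format namespace
        if tag == "title":
            title = elem.text
        elif tag == "revision":
            revisions += 1
            elem.clear()                   # free revision text as we go
        elif tag == "page":
            counts[title] = revisions
            revisions = 0
            elem.clear()
    return counts

if __name__ == "__main__":
    for page, n in sorted(count_revisions(sys.argv[1]).items()):
        print(f"{n}\t{page}")

If the counts from the dump really are far below what the on-wiki history
shows, that's worth reporting on the dumps list.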

-Jeremy


Re: [Wiki-research-l] Looking for reader's click log data for Wikipedia

2014-12-28 Thread Jeremy Baron
On Dec 28, 2014 11:35 PM, Oliver Keyes oke...@wikimedia.org wrote:
> More importantly, the HTTPS protocol involves either sanitising or
> completely stripping referers, rendering those chains impossible to
> reconstruct.

Could you elaborate? (we're talking about hops from one page to another
within the same domain name?)

More generally: what is the status of Hadoop? Could we potentially give
third-party users access, even if they can't sign an NDA, by letting them
write their own MapReduce jobs to support their research? Depending on the
job, maybe the results would need legal (LCA) review before release, or
maybe some could be reviewed by others approved by LCA.

We could give researchers (all Labs users?) access to a truly sanitized
dataset in the right format for use when designing jobs. Or maybe not
sanitized, but filtered to requests from just a few users who volunteered to
release their data for X days.
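
For illustration, here is a minimal sketch of the kind of job a researcher
might write, in the style of a Hadoop Streaming mapper and reducer. The
tab-separated log format and column positions below are assumptions, not the
real request-log schema; the job counts page-to-page hops where the referer
is on the same domain.

# mapper.py
import sys
from urllib.parse import urlparse

URL_FIELD, REFERER_FIELD = 8, 11   # hypothetical column positions

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) <= max(URL_FIELD, REFERER_FIELD):
        continue
    url = urlparse(fields[URL_FIELD])
    referer = urlparse(fields[REFERER_FIELD])
    # only count hops from one page to another within the same domain
    if url.netloc and url.netloc == referer.netloc:
        print(f"{referer.path}\t{url.path}\t1")

# reducer.py
import sys
from collections import Counter

counts = Counter()
for line in sys.stdin:
    referer_path, page_path, n = line.rstrip("\n").split("\t")
    counts[(referer_path, page_path)] += int(n)
for (referer_path, page_path), n in counts.items():
    print(f"{referer_path}\t{page_path}\t{n}")

Even something this small would presumably still need the review step
described above before any results left the cluster.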

-Jeremy


Re: [Wiki-research-l] How to track all the diffs in real time?

2014-12-13 Thread Jeremy Baron
On Dec 13, 2014 12:33 PM, Aaron Halfaker ahalfa...@wikimedia.org wrote:
> 1. It turns out that generating diffs is computationally complex, so
> generating them in real time is slow and lame. I'm working to generate all
> diffs historically using Hadoop and then have a live system listening to
> recent changes to keep the data up-to-date[2].

IIRC Mako does that in ~4 hours (maybe outdated; it could take longer now)
for all enwiki diffs for all time (I don't remember whether it's
namespace-limited), but also using an extraordinary amount of RAM, i.e.
hundreds of GB.

AIUI there's no dynamic memory allocation: revisions are loaded into
fixed-size buffers larger than the largest revision.

https://github.com/makoshark/wikiq
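
To make the cost concrete, here is a toy sketch using Python's difflib (not
the algorithm wikiq actually uses): each revision is diffed against the one
before it, so a page with thousands of revisions means thousands of pairwise
diffs over potentially large texts.

import difflib

def revision_diffs(revisions):
    """Yield a unified diff between each revision text and the previous one."""
    previous = ""
    for i, text in enumerate(revisions):
        diff = difflib.unified_diff(
            previous.splitlines(), text.splitlines(),
            fromfile=f"rev{i - 1}", tofile=f"rev{i}", lineterm="")
        yield "\n".join(diff)
        previous = text

if __name__ == "__main__":
    sample = [
        "Hello world.",
        "Hello world.\nA second sentence.",
        "Hello there.\nA second sentence.",
    ]
    for d in revision_diffs(sample):
        print(d, end="\n\n")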

-Jeremy


Re: [Wiki-research-l] [OpenAccess] Extracting PMIDs

2014-10-20 Thread Jeremy Baron
On Tue, Oct 21, 2014 at 3:57 AM, Jake Orlowitz jorlow...@gmail.com wrote:
> Do you know if it is possible to extract PubMed IDs (PMIDs) or PMCIDs from
> Wiki references? Furthermore, could you dump those IDs out into a list for
> analysis?

I think so.

Can you tell us more about what they want?

Using [[wikipedia:ebola virus disease]] as an example:
<ref name="Gatherer 2014">{{cite journal | author = Gatherer D | title = The
2014 Ebola virus disease outbreak in West Africa | journal = J. Gen. Virol. |
volume = 95 | issue = Pt 8 | pages = 1619–1624 | year = 2014 | pmid =
24795448 | doi = 10.1099/vir.0.067199-0 }}</ref>

One of the params is pmid.
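
As an illustration, here is a minimal sketch that pulls pmid/pmc parameters
out of raw wikitext with a regex (a real template parser, e.g.
mwparserfromhell, would be more robust):

import re

# matches "| pmid = 24795448" or "| pmc = <digits>" inside a citation template
ID_RE = re.compile(r"\|\s*(pmid|pmc)\s*=\s*(\d+)", re.IGNORECASE)

def extract_ids(wikitext):
    """Return (parameter, id) pairs found anywhere in the wikitext."""
    return ID_RE.findall(wikitext)

example = ('<ref name="Gatherer 2014">{{cite journal | author = Gatherer D '
           '| title = The 2014 Ebola virus disease outbreak in West Africa '
           '| pmid = 24795448 | doi = 10.1099/vir.0.067199-0 }}</ref>')
print(extract_ids(example))   # -> [('pmid', '24795448')]

Run over a dump (or over article wikitext from the API), that would give the
kind of list Jake is asking about.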

-Jeremy
