Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread Gabor Grothendieck
If a package is not broadly using a dependency it can be listed under Suggests: instead. On Mon, Nov 23, 2009 at 7:09 PM, spencerg wrote: > Beyond what Gabor said, I might download a package that uses "zoo", then use > "zoo" directly in other contexts without ever downloading it directly. >  Tota

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread spencerg
Beyond what Gabor said, I might download a package that uses "zoo", then use "zoo" directly in other contexts without ever downloading it directly. Total downloads would capture that; top level downloads would not. The flip side is that a package that requires "zoo" may only use it for featu

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread Fellows, Ian
f popularity. Ian From: Gabor Grothendieck [ggrothendi...@gmail.com] Sent: Monday, November 23, 2009 3:15 PM To: Fellows, Ian Cc: hadley wickham; Stefan Theussl; R-devel Subject: Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics) On Mon, Nov 23, 2009 at 3:51 PM, Fellows, Ian w

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread Gabor Grothendieck
On Mon, Nov 23, 2009 at 3:51 PM, Fellows, Ian wrote: > 6. Regarding package dependencies, I was thinking about also counting the > number of top level downloads, as approximated > by the number of downloads where a reverse dependency was not downloaded in > the next 5 min by the same IP. Top le
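
The approximation quoted above (a download counts as "top level" when no reverse dependency is fetched by the same IP within the next 5 minutes) could be sketched roughly as follows. This is only an illustration, not the script discussed in the thread: the function name `count_top_level`, the `(ip, package, timestamp)` record layout, and the small reverse-dependency map are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical reverse-dependency map: package -> packages that depend on it.
REVERSE_DEPS = {"zoo": {"xts", "lmtest"}}


def count_top_level(downloads, reverse_deps, window=timedelta(minutes=5)):
    """Count downloads that look like direct ("top level") requests.

    downloads: iterable of (ip, package, timestamp) tuples (assumed layout).
    A download is treated as top level unless the same IP also fetched one
    of the package's reverse dependencies within `window` after it.
    """
    by_ip = defaultdict(list)
    for ip, pkg, ts in downloads:
        by_ip[ip].append((ts, pkg))

    counts = defaultdict(int)
    for events in by_ip.values():
        for ts, pkg in events:
            rdeps = reverse_deps.get(pkg, set())
            # Was a reverse dependency downloaded in [ts, ts + window]?
            pulled_in = any(
                other in rdeps and timedelta(0) <= other_ts - ts <= window
                for other_ts, other in events
            )
            if not pulled_in:
                counts[pkg] += 1
    return dict(counts)


t0 = datetime(2009, 11, 23, 12, 0, 0)
sample = [
    ("10.0.0.1", "zoo", t0),                          # pulled in by xts
    ("10.0.0.1", "xts", t0 + timedelta(seconds=30)),  # top level
    ("10.0.0.2", "zoo", t0),                          # top level
    ("10.0.0.3", "zoo", t0),                          # top level (xts too late)
    ("10.0.0.3", "xts", t0 + timedelta(minutes=10)),  # top level
]
result = count_top_level(sample, REVERSE_DEPS)
```

As the replies in this thread point out, such a count would miss users who first obtain a package as a dependency and later use it directly, so it complements total downloads rather than replacing them.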

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread Fellows, Ian
] CRAN Server download statistics (Was: R Usage Statistics) Hi Ian, I've spoken with Stefan Theussl (CRAN maintainer) about this, and he's concerned about the privacy implications of making the Apache access logs public. A compromise that he mentioned was having a script run on the cra

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread Gabor Grothendieck
On Mon, Nov 23, 2009 at 12:37 PM, Friedrich Leisch wrote: > >  > On Mon, Nov 23, 2009 at 12:15 PM, Friedrich Leisch >  > wrote: >  >> IP address plus time will always allow sysadmins to recover >  >> identities. For static addresses or in combination with mail headers >  >> etc it is also not exac

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread Gabor Grothendieck
On Mon, Nov 23, 2009 at 12:15 PM, Friedrich Leisch wrote: > IP address plus time will always allow sysadmins to recover > identities. For static addresses or in combination with mail headers > etc it is also not exactly rocket science for others. I had not suggested that identifying information be

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread hadley wickham
>> As Hadley already pointed out we cannot make CRAN logs publicly >> available for privacy reasons. That would be a violation of national >> laws. > > I think that's unlikely.  There is no info given out identifying > users.  There are lots of web stats on the net. Fritz and Stefan are concerned

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread Gabor Grothendieck
On Mon, Nov 23, 2009 at 12:09 PM, hadley wickham wrote: >>> As Hadley already pointed out we cannot make CRAN logs publicly >>> available for privacy reasons. That would be a violation of national >>> laws. >> >> I think that's unlikely.  There is no info given out identifying >> users.  There are

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread Gabor Grothendieck
On Mon, Nov 23, 2009 at 11:11 AM, Friedrich Leisch wrote: >> On , >> Anonymous () wrote: >  > Knowing what percentage of different OSes are being used is of >  > interest to package developers and would be obscured by the proposal >  > to massage the data.  I prefer to see the raw figure a

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread Jeff Ryan
While I think download statistics are potentially interesting for developers, done incorrectly it can very likely damage the community. A basic data reporting problem, with all of the caveats attached. This information has also been readily available from the main CRAN mirror for years: http://ww

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread hadley wickham
> A few comments on your current site: > >  * Are you just including packages downloaded interactively from within R? > >  * I don't think the continent from which the package was downloaded is > of much interest.  There's definitely no need to include it on the > main page. > >  * I'd be far more in

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread hadley wickham
> Knowing what percentage of different OSes are being used is of > interest to package developers and would be obscured by the proposal > to massage the data.  I prefer to see the raw figure as is. I agree. I was arguing that sorting by that value wasn't very useful. > Also the number of IPs are

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread Gabor Grothendieck
On Mon, Nov 23, 2009 at 9:48 AM, hadley wickham wrote: >> Knowing what percentage of different OSes are being used is of >> interest to package developers and would be obscured by the proposal >> to massage the data.  I prefer to see the raw figure as is. > > I agree.  I was arguing that sorting b

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread Gabor Grothendieck
Knowing what percentage of different OSes are being used is of interest to package developers and would be obscured by the proposal to massage the data. I prefer to see the raw figure as is. Also the number of IPs are important and should not be removed in my opinion since (1) it is a measure of

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread hadley wickham
Hi Ian, I've spoken with Stefan Theussl (CRAN maintainer) about this, and he's concerned about the privacy implications of making the Apache access logs public. A compromise that he mentioned was having a script run on the CRAN mirror that processed the log files and output summary statistics. T
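
A script of the kind described above, one that reads the access log on the mirror and publishes only aggregates, might look roughly like the sketch below. It is not the script Stefan had in mind: the log layout (Apache common/combined format), the package file naming (`<name>_<version>.tar.gz`, `.zip` or `.tgz`), and the function name `summarize` are assumptions for illustration. Only per-package download counts and the number of distinct IPs are returned, never the addresses themselves.

```python
import re
from collections import Counter, defaultdict

# Assumed Apache common/combined log layout:
# ip ident user [time] "GET path protocol" status size ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "GET (?P<path>\S+) [^"]*" (?P<status>\d{3})\b'
)

# Assumed package file naming: <name>_<version>.tar.gz / .zip / .tgz
PKG_FILE = re.compile(
    r'/(?P<pkg>[A-Za-z][A-Za-z0-9.]*)_[^/_]+\.(?:tar\.gz|zip|tgz)$'
)


def summarize(lines):
    """Return {package: {"downloads": n, "distinct_ips": k}} for successful GETs."""
    downloads = Counter()
    ips = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None or m.group("status") != "200":
            continue  # skip malformed lines and failed requests
        f = PKG_FILE.search(m.group("path"))
        if f is None:
            continue  # not a package file (e.g. an index page)
        pkg = f.group("pkg")
        downloads[pkg] += 1
        ips[pkg].add(m.group("ip"))
    # Publish counts only; the raw IP addresses never leave this function.
    return {
        pkg: {"downloads": n, "distinct_ips": len(ips[pkg])}
        for pkg, n in downloads.items()
    }


sample_log = [
    '10.0.0.1 - - [23/Nov/2009:12:00:00 +0100] "GET /src/contrib/zoo_1.5-8.tar.gz HTTP/1.1" 200 12345',
    '10.0.0.1 - - [23/Nov/2009:12:05:00 +0100] "GET /bin/windows/contrib/2.10/zoo_1.5-8.zip HTTP/1.1" 200 23456',
    '10.0.0.2 - - [23/Nov/2009:12:06:00 +0100] "GET /src/contrib/zoo_1.5-8.tar.gz HTTP/1.1" 200 12345',
    '10.0.0.3 - - [23/Nov/2009:12:07:00 +0100] "GET /src/contrib/zoo_1.5-8.tar.gz HTTP/1.1" 404 0',
    '10.0.0.3 - - [23/Nov/2009:12:08:00 +0100] "GET /src/contrib/PACKAGES.gz HTTP/1.1" 200 999',
]
summary = summarize(sample_log)
```

Keeping a distinct-IP count alongside the raw download count preserves the measure Gabor argues for elsewhere in the thread, while still avoiding publication of the logs themselves.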

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-22 Thread Detlef Steuer
Hi! Nice work! But keep in mind that, for example, the openSUSE packages are no longer kept up to date on CRAN, but in openSUSE's Build Service. So the stats are biased towards Windows and Mac. It seems you only count binary downloads of contributed packages? Introduces some nice bias, too. Never

[Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-22 Thread Fellows, Ian
Hi All, It seems that the question of how many people use (or download) R and its packages is one that comes up on a fairly regular basis in a variety of forums (there was also a recent thread on the subject on Stack Overflow). A couple of students at UCLA (including myself) wanted to address t