Re: Example Datasets

2009-10-08 Thread Ted Dunning
For redistributable data, we should definitely lock down a version in our distribution or an associated one. This is true if only to make sure that we don't get surprised by somebody rearranging their web site. For non-redistributable but available data, I think having a download procedure that s

Re: Example Datasets

2009-10-08 Thread Sean Owen
Several data sets I use have distribution clauses that forbid or complicate redistribution, so I'm not sure I can do that. Of course, we should check that for any other data set as well.
On Thu, Oct 8, 2009 at 1:09 PM, Robin Anil wrote:
> We need a central place for all sample datasets used for examples and un

Re: Example Datasets

2009-10-08 Thread Isabel Drost
On Thu, 8 Oct 2009 17:39:47 +0530 Robin Anil wrote:
> We need a central place for all sample datasets used for examples and
> unit tests? I am against putting it in the repo
> Any suggestions?
The data in question is the following: http://fimi.cs.helsinki.fi/data/ (retail, accidents, webdocs).

Re: Example Datasets

2009-10-08 Thread Robin Anil
Take a look at this repo: http://fimi.cs.helsinki.fi/data/ I am specifically talking about the retail and accidents datasets. I am using a modified (comma-separated) version of them for FPGrowth testing. The webdocs dataset looks good enough to use for parallel FPGrowth testing. Questio
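For anyone following along, here is a minimal sketch of the kind of comma-separated conversion mentioned above. It assumes the source files hold one transaction per line with whitespace-separated item IDs; that layout is an assumption, and this is not the script actually used for the FPGrowth tests. The function name `to_comma_separated` and the sample rows are hypothetical.

```python
def to_comma_separated(lines):
    """Yield each whitespace-separated transaction as a comma-separated line.

    Assumes one transaction per input line; blank lines are skipped.
    """
    for line in lines:
        items = line.split()  # splits on any run of whitespace
        if items:
            yield ",".join(items)


# Hypothetical sample rows, not taken from the real datasets.
sample = ["39 73 74 75", "", "1 2  3"]
converted = list(to_comma_separated(sample))
```

Because the function takes any iterable of strings, it can also be fed an open file handle, so a large file would be processed one line at a time instead of being loaded whole.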

Example Datasets

2009-10-08 Thread Robin Anil
We need a central place for all sample datasets used for examples and unit tests. I am against putting them in the repo. Any suggestions?
Robin