For redistributable data, we should definitely lock down a version in our
distribution or an associated one. This is worthwhile if only to make sure
that we are not surprised by somebody rearranging their web site.
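
One possible way to check that a local copy is still the locked-down
version is to compare its checksum against a recorded digest. This is only
a sketch: the function names and the idea of storing a SHA-256 digest
alongside the data are assumptions for illustration, not something the
project has agreed on.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned(path: Path, pinned_digest: str) -> bool:
    """True if the file's digest equals the digest recorded for the pinned version."""
    return sha256_of(path) == pinned_digest.strip().lower()
```

A test or example could call matches_pinned() before using a data set and
fail early if the upstream file has changed.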
For non-redistributable but available data, I think having a download
procedure that s
Several data sets I use have distribution clauses that forbid or
complicate redistribution, so I am not sure I can do that. Of course, we
should check the same for any other data set.
On Thu, Oct 8, 2009 at 1:09 PM, Robin Anil wrote:
> We need a central place for all sample datasets used for examples and un
On Thu, 8 Oct 2009 17:39:47 +0530
Robin Anil wrote:
> We need a central place for all sample datasets used for examples and
> unit tests. I am against putting them in the repo.
> Any suggestions?
The data in question is the following:
http://fimi.cs.helsinki.fi/data/ (retail, accidents, webdocs).
Take a look at this repository: http://fimi.cs.helsinki.fi/data/
I am specifically talking about the retail and accidents datasets. I am
using a modified (comma-separated) version of them for FPGrowth testing.
The webdocs dataset looks good enough to be used for parallel FPGrowth
testing.
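
For reference, the comma-separated modification could be produced with
something like the sketch below. It assumes the original files hold one
transaction per line with item IDs separated by whitespace; the function
name is made up for illustration and this is not necessarily the exact
conversion used for the FPGrowth tests.

```python
from typing import Iterable, List


def fimi_lines_to_csv(lines: Iterable[str]) -> List[str]:
    """Convert whitespace-separated transactions to comma-separated ones.

    Assumption: each input line is one transaction whose items are
    separated by one or more spaces. Blank lines are skipped.
    """
    converted = []
    for line in lines:
        items = line.split()  # splits on any run of whitespace
        if items:
            converted.append(",".join(items))
    return converted
```

Applied to a file, this would be called on the open file handle and the
returned lines written out with a newline after each one.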
Questio
We need a central place for all sample datasets used for examples and unit
tests. I am against putting them in the repo.
Any suggestions?
Robin