Take a look at Kaggle competition datasets - https://www.kaggle.com/competitions




For SVM, there are a couple of fairly large ad click prediction datasets.




For graph work, SNAP has large network datasets: https://snap.stanford.edu/data/
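
If it is useful, here is a minimal Python sketch for sanity-checking a triangle count on a small sample of a SNAP-style edge list before scaling up on the cluster. It assumes the common SNAP text layout (lines starting with # are comments, each remaining line holds two whitespace-separated integer node IDs) and treats the graph as undirected. The names parse_edge_list and count_triangles are my own and are not part of Spark or SNAP; this is a plain in-memory reference, not a GraphX replacement.

```python
from collections import defaultdict


def parse_edge_list(lines):
    """Yield (src, dst) integer pairs, skipping blank lines and '#' comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        yield int(src), int(dst)


def count_triangles(edges):
    """Count distinct triangles, treating every edge as undirected."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)

    total = 0
    for u, nbrs in neighbors.items():
        for v in nbrs:
            if u < v:  # visit each undirected edge once
                # every common neighbor closes a triangle over edge (u, v)
                total += len(nbrs & neighbors[v])
    # each triangle was found once per edge, i.e. three times
    return total // 3


# Hypothetical sample in the assumed SNAP layout.
sample = [
    "# FromNodeId\tToNodeId",
    "1\t2",
    "2\t3",
    "3\t1",
    "3\t4",
]

print(count_triangles(parse_edge_list(sample)))
```

Running a small sample like this next to the cluster job gives a quick correctness check before timing the full dataset.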




On Thu, Jul 3, 2014 at 3:25 PM, AlexanderRiggers
<alexander.rigg...@gmail.com> wrote:

> Hello!
> I want to play around with several different cluster settings and measure
> performance for MLlib and GraphX, and was wondering if anybody here could
> point me to datasets for these applications of 5 GB or more?
> I am mostly interested in SVM and Triangle Count, but would be glad for any
> help.
> Best regards,
> Alex
