When I went through different repositories for spam data, I only found files of a few MB. To test on Hadoop we need a large file, right? I need to test my Hadoop SVM implementation. I looked at http://archive.ics.uci.edu/ml/machine-learning-databases/spambase/ but that dataset is only about 700 KB. I need a similar dataset that is much larger.

On Sat, Nov 23, 2013 at 8:35 AM, unmesha sreeveni <unmeshab...@gmail.com> wrote:

> Thanks Devin :) That was a nice explanation.
>
> On Fri, Nov 22, 2013 at 6:20 PM, Devin Suiter RDX <dsui...@rdx.com> wrote:
>
>> They are both for machine learning. Classification is known as
>> "supervised learning" where you feed the engine data of known patterns and
>> instruct it what are the key nodes. Clustering is "unsupervised learning"
>> where you allow the algorithm to "guess" at what is significant in the
>> correlations picked up by the algorithm. Spam filtering is a popular
>> example of classification, and image indexing is a popular example of
>> clustering. It is mainly used on Hadoop because when it comes to machine
>> learning, the more data that passes through the algorithm the more accurate
>> it should be, and Hadoop can handle large data better than anything else
>> around at the moment.
>>
>> *Devin Suiter*
>> Jr. Data Solutions Software Engineer
>> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
>> Google Voice: 412-256-8556 | www.rdx.com
>>
>> On Fri, Nov 22, 2013 at 2:54 AM, unmesha sreeveni
>> <unmeshab...@gmail.com> wrote:
>>
>>> what is the differences b/w classification algorithms and clustering
>>> algorithms in hadoop?
>>>
>>> --
>>> *Thanks & Regards*
>>>
>>> Unmesha Sreeveni U.B
>>>
>>> *Junior Developer*
>
> --
> *Thanks & Regards*
>
> Unmesha Sreeveni U.B
>
> *Junior Developer*

--
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*
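
For reference, the supervised/unsupervised distinction described in the quoted thread can be sketched in a few lines of plain Python. This is only a toy illustration, not Hadoop or SVM code: the data values, the "ham"/"spam" labels, and the function names (`train_centroids`, `classify`, `two_means`) are all made up for the example, and a nearest-centroid rule stands in for a real classifier.

```python
# Toy 1-D data; every value and label here is hypothetical.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["ham", "ham", "ham", "spam", "spam", "spam"]


def train_centroids(points, labels):
    """Supervised: labels are given, so compute one mean per known label."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def classify(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))


def two_means(points, iterations=10):
    """Unsupervised: no labels; split the points into two groups (2-means)."""
    lo, hi = min(points), max(points)  # initial guesses for the two centers
    for _ in range(iterations):
        near_lo = [x for x in points if abs(x - lo) <= abs(x - hi)]
        near_hi = [x for x in points if abs(x - lo) > abs(x - hi)]
        lo = sum(near_lo) / len(near_lo)
        if near_hi:  # stays empty only when all points are identical
            hi = sum(near_hi) / len(near_hi)
    return lo, hi


centroids = train_centroids(points, labels)  # uses the labels
prediction = classify(centroids, 9.0)        # label for an unseen point
centers = two_means(points)                  # never looks at the labels
```

The classifier needs the labels up front and returns one of them for a new point; the clustering function only returns group centers, and it is left to the user to decide what those groups mean.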