What are the best settings for my clustering task

2013-09-30 Thread mercutio7979
context: http://lucene.472066.n3.nabble.com/What-are-the-best-settings-for-my-clustering-task-tp4092807.html Sent from the Mahout User List mailing list archive at Nabble.com.

Re: What are the best settings for my clustering task

2013-10-01 Thread Ted Dunning

Re: What are the best settings for my clustering task

2013-10-02 Thread Jens Bonerz
> > the cluster count.
> >
> > The question is: what do I need to tweak with regard to the available Mahout
> > settings, so the clusters are created as precisely as possible?
> >
> > Many regards!
> > Jens

Re: What are the best settings for my clustering task

2013-10-02 Thread Ted Dunning
> > > I adapted the reuters cluster script to read in my data and managed to
> > > create a first set of clusters. However, I have not managed to maximise the
> > > cluster count.
> > >
> > > The question is: what do I need to tweak with regard to t

Re: What are the best settings for my clustering task

2013-10-02 Thread Jens Bonerz
> > > > mound of found clusters (the
> > > > best possible value would be 5,000 clusters with 10 products each)
> > > >
> > > > I adapted the reuters cluster script to read in my data and managed to
> > > > create a first set of cluste

Re: What are the best settings for my clustering task

2013-10-02 Thread Ted Dunning
> > > > > What would be a good approach to maximise the amount of found clusters (the
> > > > > best possible value would be 5,000 clusters with 10 products each)
> > > > >
> > > > > I adapted the reuters cluste

Re: What are the best settings for my clustering task

2013-10-02 Thread Jens Bonerz
> > > > > > ten
> > > > > > varying product descriptions per product. The product descriptions are
> > > > > > already prepared for clustering and contain a normalized brand name,
> > > > > > product

Re: What are the best settings for my clustering task

2013-10-03 Thread Jens Bonerz
> > > > > > and ten
> > > > > > varying product descriptions per product. The product descriptions are
> > > > > > already prepared for clustering and contain a normalized brand name,
> > > > > > pr

Re: What are the best settings for my clustering task

2013-10-04 Thread Ted Dunning
>>>>>> 30, 2013 at 2:14 PM, mercutio7979 <jbon...@googlemail.com> wrote:
>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> I am currently trying to create clusters fr

Re: What are the best settings for my clustering task

2013-10-06 Thread Jens Bonerz
> >>>>> ? What else could I use?
> >>>>>
> >>>>> 2013/10/1 Ted Dunning
> >>>>>
> >>>>>> At such small sizes, I would guess that the sequential version of the
> >>>>>> streaming k-means or ball k-means would be better options.
> >>>>>>
> >>>>>> On Mon, Sep 30, 2013 at 2:14 PM, mercutio7979 <jbon...@googlemail.com> wrote:
> >>>>>>
> >>>>>>> Hello all,
> >>>>>>>
> >>>>>>> I am currently trying to create clusters from a group of 50,000 strings that
> >>>>>>> contain product descriptions (around 70-100 characters in length each).
> >>>>>>>
> >>>>>>> That group of 50,000 consists of roughly 5,000 individual products and ten
> >>>>>>> varying product descriptions per product. The product descriptions are
> >>>>>>> already prepared for clustering and contain a normalized brand name,
> >>>>>>> product model number, etc.
> >>>>>>>
> >>>>>>> What would be a good approach to maximise the amount of found clusters (the
> >>>>>>> best possible value would be 5,000 clusters with 10 products each)?
> >>>>>>>
> >>>>>>> I adapted the reuters cluster script to read in my data and managed to
> >>>>>>> create a first set of clusters. However, I have not managed to maximise the
> >>>>>>> cluster count.
> >>>>>>>
> >>>>>>> The question is: what do I need to tweak with regard to the available Mahout
> >>>>>>> settings, so the clusters are created as precisely as possible?
> >>>>>>>
> >>>>>>> Many regards!
> >>>>>>> Jens
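To make the task in the quoted question concrete, here is a minimal sketch of one way short product descriptions could be compared before clustering. It is not taken from the thread and does not use Mahout: the character-trigram representation, the cosine measure, the function names `char_ngrams` and `cosine`, and the three sample descriptions are all assumptions chosen for illustration, written in Python only for brevity.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lower-cased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical data: two variants of one product and one unrelated product.
d1 = "acme x100 cordless drill 18v with case"
d2 = "acme x100 drill cordless 18v incl. case"
d3 = "globex k7 espresso machine stainless steel"

same = cosine(char_ngrams(d1), char_ngrams(d2))
different = cosine(char_ngrams(d1), char_ngrams(d3))
print(f"same product: {same:.2f}, different product: {different:.2f}")
```

If variants of one product score clearly higher than unrelated pairs, as they do for these made-up strings, a clustering run with roughly 5,000 clusters has a chance of grouping the ten descriptions per product together. Whether that holds for the real data would have to be checked.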

Re: What are the best settings for my clustering task

2013-10-06 Thread Ted Dunning
>>>>>>> rithm.
>>>>>>>
>>>>>>> According to that paper, k-means relies on a preknown/predefined amount of
>>>>>>> clusters as an input parameter.
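The quoted point, that k-means takes the number of clusters as an input, can be seen in a plain Lloyd's-style k-means. The sketch below is an illustration in Python, not Mahout code and not the streaming or ball k-means variants mentioned in the thread. The function name `kmeans`, the fixed iteration count, the seeded random initialisation, and the toy points are assumptions.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; the caller must choose k up front."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct input points as starting centroids
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignment = kmeans(points, k=2)
print(assignment)
```

For the task in this thread, k would be set to about 5,000, the expected number of distinct products. The algorithm will not discover that number by itself.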

Re: What are the best settings for my clustering task

2013-10-06 Thread Jens Bonerz
> >>>>>>> Isn't the streaming k-means just a different approach to crunch through the
> >>>>>>> data? In other words, the result of streaming k-means should be comparable
> >>>>>>> to using k-means in multiple chained map reduce cycl