Hello all,

I am currently trying to create clusters from a group of 50,000 strings
that contain product descriptions (around 70-100 characters each).

That group of 50,000 consists of roughly 5,000 individual products with ten
varying product descriptions per product. The product descriptions are
already prepared for clustering and contain a normalized brand name,
product model number, etc.

What would be a good approach to maximise the number of clusters found (the
best possible result would be 5,000 clusters with 10 descriptions each)?
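
To make that target concrete, here is a minimal sketch of how a clustering
result could be scored against it. It assumes a labelled test set in which the
true product of each description is known; the function name, the
(cluster_id, product_id) pair format and the toy data are hypothetical and not
part of Mahout.

```python
from collections import Counter, defaultdict


def count_perfect_clusters(assignments, expected_size=10):
    """Count clusters that hold exactly one product and expected_size descriptions.

    assignments: iterable of (cluster_id, product_id) pairs, one pair per
    description (hypothetical format, assumed for illustration only).
    """
    products_per_cluster = defaultdict(Counter)
    for cluster_id, product_id in assignments:
        products_per_cluster[cluster_id][product_id] += 1

    perfect = 0
    for counts in products_per_cluster.values():
        # A cluster is perfect if it contains a single product and
        # the expected number of descriptions for that product.
        if len(counts) == 1 and sum(counts.values()) == expected_size:
            perfect += 1
    return perfect


# Toy data: 3 products with 2 descriptions each; only cluster 0 is perfect.
toy = [(0, "A"), (0, "A"), (1, "B"), (1, "C"), (2, "C"), (2, "B")]
perfect_in_toy = count_perfect_clusters(toy, expected_size=2)
```

On the full data set, the ideal outcome described above would correspond to a
count of 5,000 with expected_size=10.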

I adapted the Reuters cluster script to read in my data and managed to
create a first set of clusters. However, I have not managed to maximise the
cluster count.

The question is: what do I need to tweak in the available Mahout settings
so that the clusters are created as precisely as possible?

Many regards!
Jens
