This does not help me to understand what you mean by "pattern" or how you 
expect to identify them with clustering. About the only tool in Mahout with 
which you might have any success is Dirichlet Process clustering, since you can 
define your own models which can detect the "patterns" you seek and the 
algorithm will attempt to match them to your data.

Alternatively, if this is time series data you might want to look at the 
Synthetic Control examples.

Finally, with only 50 datapoints you should really not be using Mahout at all. 
With so few datapoints any algorithm or implementation will have difficulty 
doing what you want, and Mahout is geared toward datasets of 50 million 
datapoints or more.

Try Weka.
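
With so few points you can also experiment without any library at all. Below 
is a minimal sketch in plain Java (this is not Mahout or Weka API), assuming 
your second case is a flat series whose level shifts and that splitting the 
y-values into two groups is what you are after. The class name and the sample 
values are made up for illustration.

```java
import java.util.Arrays;

// Plain-Java sketch (not Mahout or Weka API): two-means clustering of 1-D values.
class TwoMeans1D {

    /** Returns a label (0 or 1) per input value; label 0 is the lower-valued group. */
    static int[] cluster(double[] y) {
        if (y.length == 0) {
            throw new IllegalArgumentException("need at least one value");
        }
        // Start the two centroids at the extremes of the data.
        double lo = Arrays.stream(y).min().getAsDouble();
        double hi = Arrays.stream(y).max().getAsDouble();
        int[] labels = new int[y.length];
        for (int iter = 0; iter < 100; iter++) {
            // Assignment step: each value joins the nearer centroid.
            boolean changed = false;
            for (int i = 0; i < y.length; i++) {
                int label = Math.abs(y[i] - lo) <= Math.abs(y[i] - hi) ? 0 : 1;
                if (label != labels[i]) {
                    labels[i] = label;
                    changed = true;
                }
            }
            // Update step: move each centroid to the mean of its members.
            double sumLo = 0, sumHi = 0;
            int nLo = 0, nHi = 0;
            for (int i = 0; i < y.length; i++) {
                if (labels[i] == 0) { sumLo += y[i]; nLo++; }
                else                { sumHi += y[i]; nHi++; }
            }
            if (nLo > 0) lo = sumLo / nLo;
            if (nHi > 0) hi = sumHi / nHi;
            if (!changed) break; // assignments are stable, so we are done
        }
        return labels;
    }

    public static void main(String[] args) {
        // Hypothetical y-values for the "second case": a flat series that steps down.
        double[] y = {2.0, 2.1, 1.9, 1.0, 1.1, 0.9};
        System.out.println(Arrays.toString(cluster(y)));
    }
}
```

This only looks at the y-values, so it can separate a level shift but it knows 
nothing about the arch shape in your first case; that one would need a model 
of the expected curve.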

-----Original Message-----
From: Alexander Kerner [mailto:[email protected]] 
Sent: Tuesday, August 16, 2011 1:47 AM
To: [email protected]
Subject: Re: Clustering Data

Hello Ted,

thanks for your help!

To give you more details:
Clustering in this case is something like pattern recognition:

for the first graph, I am looking for the following pattern:

      *   *
    *          *
*                   *

for the second graph, I basically want the following "pattern":

*  *  *  *  *  *

What I want to detect is "overlaying" data or, at a very basic level, 
just deviations from the expected pattern:

e.g:

first case:


     *  *       *
   *       *  *   *
*

should be clustered into two groups

second case:

*  *  *
            *  *  *

should also be clustered into two groups.

In general, I am working with very few datapoints (5 - 50).

I hope this makes it a bit more clear.

Many thanks,
Alex

On 08/15/2011 05:08 PM, Ted Dunning wrote:
> Well, Weka still stands as an option.  And frankly, you can call R from Java
> pretty easily.
>
> But more importantly, *experimenting* with these alternatives doesn't need
> to be in Java.  You can noodle around with all the clustering algorithms in
> the world, select one and port it into Java or find an implementation.
>
> And if you don't describe your problem in a bit more detail, we can't help
> you.  Clustering specifically and machine learning in general is domain
> dependent.
>
> Your graphs don't explain what your data is, what you are trying to do,
> what results you expect to get nor why you don't like the results you are
> getting.  It is not obvious.
>
> I note that I asked essentially these same questions 10 days ago.
>
> On Sun, Aug 14, 2011 at 11:35 PM, Alexander Kerner
> <[email protected]> wrote:
>
>> Matlab or R is not an option, since I need to integrate this clustering
>> into an existing Java program.
>>
>> On 08/05/2011 06:02 PM, Jeff Eastman wrote:
>>
>>> You may be better off experimenting with Weka (or MatLab or R) to try out
>>> various clustering algorithms on your data. Unless you have billions of
>>> points this sort of low-dimension clustering can all be done in memory and
>>> you don't need Mahout.
>>>
>>>
>>> -----Original Message-----
>>> From: Alexander Kerner [mailto:[email protected]]
>>> Sent: Friday, August 05, 2011 7:28 AM
>>> To: [email protected]
>>> Subject: Re: Clustering Data
>>>
>>> Here is a link:
>>>
>>> Clustering data: http://kerner.cc/box.tightening.challenges.png
>>>
>>> On 08/05/2011 02:31 PM, Sean Owen wrote:
>>>
>>>> (Attachments don't come through on apache.org mailing lists. Can you
>>>> post it elsewhere, or describe it?)
>>>>
>>>> On Fri, Aug 5, 2011 at 1:30 PM, Alexander Kerner
>>>> <[email protected]> wrote:
>>>>
>>>>      Hi all,
>>>>
>>>>      I would like to cluster the following data (see attached picture)
>>>>      into three groups (light blue, dark blue, black).
>>>>      Can I use Apache Mahout for this? I want to integrate clustering
>>>>      within my existing Java application.
>>>>      Which algorithm would I need to use, and how do I set this up
>>>>      programmatically?
>>>>
>>>>      Many thanks,
>>>>      Alex
>>>>
>>>>
>>>>
>>>>
>> --
>> Alexander Kerner
>> PhD Student
>>
>> Division of Stem Cells and Cancer A010
>> German Cancer Research Center, DKFZ
>> and
>> Heidelberg Institute for Stem Cell Technology
>> and Experimental Medicine
>> HI-STEM GmbH
>>
>> Neuenheimer Feld 280
>> 69120 Heidelberg
>>
>> Tel.: +49(0)6221/42-3922
>> Fax: +49(0)6221/42-3902
>>
>> Email: [email protected]
>>
>>

-- 
Alexander Kerner
PhD Student

Division of Stem Cells and Cancer A010
German Cancer Research Center, DKFZ
and
Heidelberg Institute for Stem Cell Technology
and Experimental Medicine
HI-STEM GmbH

Neuenheimer Feld 280
69120 Heidelberg

Tel.: +49(0)6221/42-3922
Fax: +49(0)6221/42-3902

Email: [email protected]
