Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-09-27 Thread abanero
Hi Ulrich, I'm studying the principles of Affinity Propagation and I'm really glad to use your package (apcluster) in order to cluster my data. I have just one issue to solve. If I apply the function: apcluster(sim) where sim is the matrix of dissimilarities, sometimes I encounter the warning

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread Ulrich Bodenhofer
abanero wrote: Do you know something like “knn1” that works with categorical variables too? Do you have any suggestion? There are surely plenty of clustering algorithms around that do not require a vector space structure on the inputs (like KNN does). I think agglomerative clustering would

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread Christian Hennig
Dear abanero, In principle, k nearest neighbours classification can be computed on any dissimilarity matrix. Unfortunately, knn and knn1 seem to assume Euclidean vectors as input, which restricts their use. I'd probably compute an appropriate dissimilarity between points (have a look at
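
The idea in this message, nearest-neighbour classification computed from a dissimilarity rather than from Euclidean vectors, could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses daisy() from the cluster package with the Gower metric as the dissimilarity, and the helper name nn1_gower and the toy data are made up for this example; they are not from the thread.

```r
library(cluster)  # provides daisy()

# Hypothetical helper (not part of any package): 1-nearest-neighbour
# classification based on a Gower dissimilarity.
nn1_gower <- function(train, test, cl) {
  n_train <- nrow(train)
  # Compute dissimilarities on the combined data so that numeric ranges
  # and factor levels are handled consistently for both sets.
  d <- as.matrix(daisy(rbind(train, test), metric = "gower"))
  # Keep the block with rows = test observations, columns = training observations.
  d_test_train <- d[-seq_len(n_train), seq_len(n_train), drop = FALSE]
  # For each test row, return the class of the closest training observation.
  cl[apply(d_test_train, 1, which.min)]
}

# Toy data mixing numeric and categorical attributes (made up for illustration).
train <- data.frame(
  size   = c(1.0, 1.2, 5.0, 5.5),
  colour = factor(c("red", "red", "blue", "blue"), levels = c("red", "blue")),
  urban  = factor(c("yes", "yes", "no", "no"), levels = c("no", "yes"))
)
cl <- factor(c("A", "A", "B", "B"))
test <- data.frame(
  size   = c(1.1, 5.2),
  colour = factor(c("red", "blue"), levels = c("red", "blue")),
  urban  = factor(c("yes", "no"), levels = c("no", "yes"))
)

pred <- nn1_gower(train, test, cl)
print(pred)
```

Computing the dissimilarity on the combined rows is a convenience; for a large training set one would probably want to avoid recomputing the full matrix for every new observation.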

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread abanero
Hi, thank you Joris and Ulrich for your answers. Joris Meys wrote: see the library randomForest for example I'm trying to find an example in randomForest with categorical variables but I haven't found anything. Do you know any example with both categorical and numerical variables? Anyway I

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread Joris Meys
Hi Abanero, first, I have to correct myself. Knn1 is a supervised learning algorithm, so my comment wasn't completely correct. In any case, if you want to do a clustering prior to a supervised classification, the function daisy() can handle any kind of variable. The resulting distance matrix can

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread Joris Meys
I'm confusing myself :-) randomForest cannot handle character vectors as predictors. (Which is why I, to my surprise, found out that a categorical variable could not be used in the function). It can handle categorical variables as predictors IF they are put in as a factor. Obviously they handle
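
A minimal sketch of the point made here, that categorical predictors should be passed to randomForest() as factors, might look like this. The data are simulated and the column names (x_num, x_cat, y) are invented for illustration only.

```r
library(randomForest)

set.seed(1)
n <- 200

# Simulated data: one numeric and one categorical predictor (made up).
dat <- data.frame(
  x_num = rnorm(n),
  x_cat = sample(c("a", "b", "c"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
dat$y <- factor(ifelse(dat$x_cat == "a" | dat$x_num > 1, "yes", "no"))

# Convert the character column to a factor before fitting.
dat$x_cat <- factor(dat$x_cat)

fit <- randomForest(y ~ x_num + x_cat, data = dat, ntree = 100)

# A new observation must use the same factor levels as the training data.
new_obs <- data.frame(
  x_num = 0.3,
  x_cat = factor("b", levels = levels(dat$x_cat))
)
pred <- predict(fit, new_obs)
print(pred)
```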

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread Ulrich Bodenhofer
I had a look at the documentation of the package apcluster. That's interesting but do you have any example using it with both categorical and numerical variables? I'd like to test it with a large dataset. Your posting has opened my eyes: problems where both numerical and categorical

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread Ulrich Bodenhofer
Sorry, Joris, I overlooked that you already mentioned daisy() in your posting. I should have credited your recommendation in my previous message. Cheers, Ulrich

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread abanero
Ulrich wrote: Affinity propagation produces quite a number of clusters. I tried with q=0 and it produces 17 clusters. Anyway that's a good idea, thanks. I'm looking to test it with my dataset. So I'll probably use daisy() to compute an appropriate dissimilarity then apcluster() or another

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread Christian Hennig
Christian wrote: and then implement nearest neighbours classification myself if I needed it. It should be pretty straightforward to implement. Do you intend to modify the code of the knn1() function by yourself? No; if you understand what the nearest neighbours method does, it's not very

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread Ulrich Bodenhofer
What do you suggest in order to assign a new observation to a determined cluster? As I mentioned already, I would simply assign the new observation to the cluster to whose exemplar the new observation is most similar (in a knn1-like fashion). To compute these similarities, you can use the
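
The assignment rule described in this message could be sketched as below. The sketch rests on several assumptions that are not in the thread: daisy() with the Gower metric supplies the dissimilarity, the negated dissimilarity is used as the similarity for apcluster(), and the built-in iris data (four numeric columns plus one factor) stands in for a mixed-type dataset.

```r
library(cluster)    # daisy()
library(apcluster)  # apcluster()

# Hold out three rows of iris to play the role of new observations.
held_out <- c(1, 75, 150)
train    <- iris[-held_out, ]
new_obs  <- iris[held_out, ]
n_train  <- nrow(train)

# Cluster the training data; similarities are negated Gower dissimilarities.
d_train <- as.matrix(daisy(train, metric = "gower"))
apres   <- apcluster(-d_train, q = 0)
ex      <- apres@exemplars  # row indices of the exemplars within train

# Dissimilarities between the new observations and the exemplars.
# Recomputing on the combined rows lets the new observations slightly affect
# the Gower ranges, which is accepted here as an approximation.
d_all    <- as.matrix(daisy(rbind(train, new_obs), metric = "gower"))
d_new_ex <- d_all[-seq_len(n_train), ex, drop = FALSE]

# Each new observation goes to the cluster whose exemplar is closest;
# the result indexes into ex (and into apres@clusters).
assigned <- apply(d_new_ex, 1, which.min)
print(assigned)
```

The number of clusters depends on the input preference (here set through q = 0), so the cluster indices printed above are only meaningful relative to this particular run.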

[R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-26 Thread abanero
Hi, I have 1,000 observations with 10 attributes (of different types: numeric, dichotomous, categorical, etc.) and a measure M. I need to cluster these observations in order to assign a new observation (with the same 10 attributes but not the measure) to a cluster. I want to calculate for

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-26 Thread Joris Meys
Not a direct answer, but from your description it looks like you are better off with supervised classification algorithms instead of unsupervised clustering. See the library randomForest for example. Alternatively, you can try a logistic regression or a multinomial regression approach, but these