Re: [R] text vector clustering

2009-01-26 Thread San Miguel Martín, Eduardo
Dear Srinivas, You can try using trigrams, a special case of n-grams, often used in natural language processing. > I am interested in grouping/clustering these names as those which are > similar letter to letter. Is there any text clustering algorithm in R > which can group names of similar ty
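
To illustrate the trigram suggestion, here is a minimal sketch, not code from the thread. It splits each name into overlapping three-character pieces and scores two names by the Jaccard overlap of their trigram sets. The helper names `trigrams` and `trigram_sim`, the space padding, and the sample names are all assumptions made for this example.

```r
# Split a string into its set of overlapping character trigrams.
# Padding with spaces is an assumption; it lets very short names
# still produce at least one trigram.
trigrams <- function(s) {
  s <- paste0("  ", tolower(s), " ")
  n <- nchar(s)
  unique(substring(s, 1:(n - 2), 3:n))
}

# Jaccard similarity of two trigram sets: 1 = identical, 0 = nothing shared.
trigram_sim <- function(a, b) {
  ta <- trigrams(a)
  tb <- trigrams(b)
  length(intersect(ta, tb)) / length(union(ta, tb))
}

# Made-up sample names, for illustration only.
student_names <- c("Ramesh Kumar", "Ramesh Kumaar", "Anita Desai")

sim_typo  <- trigram_sim(student_names[1], student_names[2])  # likely typo pair
sim_other <- trigram_sim(student_names[1], student_names[3])  # unrelated pair
```

A name and its misspelling share most of their trigrams, so `sim_typo` should come out well above `sim_other`; the cutoff for treating two names as the same would have to be tuned on the real data.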

Re: [R] text vector clustering

2009-01-23 Thread Ed Merkle
again. Ed -- Ed Merkle, PhD, Assistant Professor, Dept. of Psychology, Wichita State University, Wichita, KS 67260. Date: Thu, 22 Jan 2009 16:33:03 +0530 From: srinivasa raghavan Subject: [R] text vector clustering To: r-help@r-project.org Hi, I am

Re: [R] text vector clustering

2009-01-23 Thread Stefan Th. Gries
On Fri, Jan 23, 2009 at 08:28, Stefan Th. Gries wrote: > Hans-Joerg Bibiko's function Levenshtein would help; cf. below for an > example (very clumsy with two loops, but you can tweak that with apply > stuff). Like this maybe (sorry, should've thought about that earlier): [...] x<-rep(all.names,

Re: [R] text vector clustering

2009-01-23 Thread Stefan Th. Gries
Hans-Joerg Bibiko's function Levenshtein would help; cf. below for an example (very clumsy with two loops, but you can tweak that with apply stuff). HTH, STG levenshtein <- function(string1, string2, case=TRUE, map=NULL) { # levenshtein algorithm in R # #
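
As a loop-free alternative, the following sketch swaps in base R's `adist()` (from the `utils` package in current R versions; it was not available in R 2.8.1) in place of the hand-written `levenshtein` function quoted above. It builds the full pairwise edit-distance matrix in one call and then groups the names with hierarchical clustering. The sample names, the average-linkage choice, and the cut height of 3 edits are assumptions for illustration, not values from the thread.

```r
# Made-up sample names, for illustration only.
student_names <- c("Ramesh Kumar", "Ramesh Kumaar", "Rameshh Kumar",
                   "Anita Desai", "Anita Desay")

# Pairwise (generalized) Levenshtein distances between all names.
d <- adist(student_names)
rownames(d) <- colnames(d) <- student_names

# Hierarchical clustering on the edit distances.
hc <- hclust(as.dist(d), method = "average")

# Cut the tree: names merged below 3 edits end up in the same group.
# The threshold is arbitrary and would need tuning on real data.
groups <- cutree(hc, h = 3)

# List the names in each group.
clusters <- split(student_names, groups)
```

Note that a full distance matrix for 30,000 names holds about 450 million pairs, so in practice it may be necessary to deduplicate the names first or compare only within blocks (for example, names sharing a first letter).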

Re: [R] text vector clustering

2009-01-22 Thread David Winsemius
Simply doing a tabulation and isolating the cases with only one entry might have been a possibility if the count discrepancy weren't so high. It appears you have a greater degree of corruption than would be expected just from "typos". Have you looked at the packages referenced at: http:
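
The tabulation idea can be sketched as follows, using base R's `table()`. The sample vector is made up, and treating every name that occurs exactly once as a typo candidate is an assumption that, as the message notes, only works when corruption is limited.

```r
# Made-up sample names, for illustration only.
student_names <- c("Ramesh Kumar", "Ramesh Kumar", "Ramesh Kumar",
                   "Ramesh Kumaar",
                   "Anita Desai", "Anita Desai", "Anita Desay")

# Count how often each distinct spelling occurs.
counts <- table(student_names)

# Spellings seen only once are candidates for typos ...
singletons <- names(counts)[counts == 1]

# ... while repeated spellings are more likely to be genuine names.
frequent <- names(counts)[counts > 1]
```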

[R] text vector clustering

2009-01-22 Thread srinivasa raghavan
Hi, I am a new user of R, using R 2.8.1 on Windows 2003. I have a CSV file with a single column which contains 30,000 student names. There were typos made while entering these student names. The actual list of names is < 1000. However, we don't have that list for keyword search. I am interested