Dear srinivas,
You can try using trigrams, a special case of N-grams, often used in Natural
Language Processing.
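
A minimal sketch of the trigram idea, in case it helps. The helper names
trigrams() and trigram_sim() and the sample names are made up for
illustration; they are not from any package. Each name is split into
overlapping three-letter pieces, and two names are scored by how many
pieces they share (Jaccard similarity).

```r
# Split a name into overlapping character trigrams. The name is padded
# with spaces so that the first and last letters also form trigrams.
trigrams <- function(s) {
  s <- paste0("  ", tolower(s), " ")
  n <- nchar(s)
  unique(substring(s, 1:(n - 2), 3:n))
}

# Jaccard similarity between the trigram sets of two names:
# shared trigrams divided by all distinct trigrams.
trigram_sim <- function(a, b) {
  ta <- trigrams(a)
  tb <- trigrams(b)
  length(intersect(ta, tb)) / length(union(ta, tb))
}

# Hypothetical sample; replace with the column read from the csv file.
student_names <- c("Srinivas", "Srinivaas", "Raghavan", "Ragavan")

# Pairwise similarity matrix (1 = identical trigram sets, 0 = none shared).
sim <- outer(student_names, student_names, Vectorize(trigram_sim))
dimnames(sim) <- list(student_names, student_names)
round(sim, 2)
```

Something like 1 - sim could then be passed to as.dist() and a clustering
function, though the cut-off would need tuning on the real data.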
> I am interested in grouping/clustering these names by how similar they
> are letter to letter. Is there any text clustering algorithm in R
> which can group names of similar ty
again.
Ed
--
Ed Merkle, PhD
Assistant Professor
Dept. of Psychology
Wichita State University
Wichita, KS 67260
Date: Thu, 22 Jan 2009 16:33:03 +0530
From: srinivasa raghavan
Subject: [R] text vector clustering
To: r-help@r-project.org
Hi,
I am
On Fri, Jan 23, 2009 at 08:28, Stefan Th. Gries wrote:
> Hans-Joerg Bibiko's function Levenshtein would help; cf. below for an
> example (very clumsy with two loops, but you can tweak that with apply
> stuff).
Like this maybe (sorry, should've thought about that earlier):
[...]
x<-rep(all.names,
Hans-Joerg Bibiko's function Levenshtein would help; cf. below for an
example (very clumsy with two loops, but you can tweak that with apply
stuff).
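
As a rough sketch of a loop-free variant: this assumes a more recent R in
which adist() from the utils package is available, and uses it in place of
the levenshtein() function quoted in this thread. The sample names and the
cut height of 2 edits are arbitrary choices for illustration.

```r
# Hypothetical sample; replace with the real vector of names.
student_names <- c("Srinivas", "Srinivaas", "Srinvas",
                   "Raghavan", "Ragavan", "Merkle")

# Pairwise Levenshtein (edit) distances, no explicit loops.
d <- adist(student_names, ignore.case = TRUE)
dimnames(d) <- list(student_names, student_names)

# Hierarchical clustering on the distance matrix.
hc <- hclust(as.dist(d), method = "average")

# Cut the tree at height 2 (an assumed threshold, to be tuned).
groups <- cutree(hc, h = 2)

# List the names falling into each group.
split(student_names, groups)
```

With 30,000 names the full distance matrix gets large, so it may be worth
running this on unique(student_names) first.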
HTH,
STG
levenshtein <- function(string1, string2, case=TRUE, map=NULL) {
# levenshtein algorithm in R
#
#
Simply doing a tabulation and isolating the cases with only one entry
might have been a possibility if the count discrepancy weren't so
high. It appears you have a greater degree of corruption than would be
expected just from "typos".
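
For reference, the tabulation I had in mind would look roughly like this
(the sample vector is made up; the real one would come from the csv file):

```r
# Hypothetical sample with a few misspellings mixed in.
student_names <- c("Srinivas", "Srinivas", "Srinivas", "Srinivaas",
                   "Raghavan", "Raghavan", "Ragavan")

# Count how often each distinct spelling occurs.
counts <- table(student_names)

# Spellings that occur exactly once are candidate typos.
singletons <- names(counts)[counts == 1]
singletons

# Most frequent spellings first, as candidate "true" names.
sort(counts, decreasing = TRUE)
```
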
Have you looked at the packages referenced at:
http:
Hi,
I am a new user of R, using R 2.8.1 on Windows 2003. I have a csv file with
a single column which contains 30,000 student names. There were typos
made while entering these student names. The actual list has fewer than
1000 names. However, we don't have that list for a keyword search.
I am interested