Re: "Elegant tools deserve elegant solutions." -- L. E. Gant

2011-08-11 Thread Tuba Lambanog
Hi, Petr, Thank you for the pointer to the site. Indeed a treasure trove of ideas on stemmer algorithms. Tuba

Re: "Elegant tools deserve elegant solutions." -- L. E. Gant

2011-08-11 Thread Petr Gladkikh
On Mon, Aug 8, 2011 at 1:46 PM, Tuba Lambanog wrote: > Hello, > > I’m doing a word stemmer for a non-English language. A stemmer parses > a word into its word parts: prefixes, roots, suffixes. The input word > is at least a root word (English example would be ‘cloud’), but can be > any combination

Re: "Elegant tools deserve elegant solutions." -- L. E. Gant

2011-08-08 Thread Resty Cena
Hi, Ken, Thanks for the suggestion. As I was looking at a suffix tree, it suddenly struck me that the following strategy may do just as well: 1. Use rest and next to generate the tentative suffix sets, thus for "directional", it will give the set of #{irectional rectional ectional ctional tiona
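
A minimal sketch of step 1 as described above, assuming the intent is to collect every non-empty proper suffix of the word by repeatedly applying next. The function name proper-suffixes is hypothetical and not from the thread, and the remaining steps of the strategy are cut off in this snippet, so they are not reconstructed here.

```clojure
;; Hypothetical helper (name not from the thread): collect all non-empty
;; proper suffixes of a word by walking its character sequence with `next`.
(defn proper-suffixes
  "Returns the set of non-empty proper suffixes of word, as strings."
  [word]
  (->> (iterate next (next word)) ; `next` yields nil once nothing is left
       (take-while identity)      ; stop at the first nil
       (map #(apply str %))       ; turn each char seq back into a string
       set))

(proper-suffixes "directional")
;; a 10-element set, from "irectional" down to "l"
```

next is used rather than rest because next returns nil on exhaustion, which gives take-while a clean stopping point; rest would return an empty sequence, which is truthy, so the iteration would never terminate.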

Re: "Elegant tools deserve elegant solutions." -- L. E. Gant

2011-08-08 Thread Ken Wesson
On Mon, Aug 8, 2011 at 11:41 AM, Tuba Lambanog wrote: > Hi, > Thank you for the tip. It does look like the Patricia tree -- or suffix tree > -- is made-to-order for this kind of task. I'm reading up on it. You're welcome. > Would there be a Clojure implementation of this technology, I wonder. E

Re: "Elegant tools deserve elegant solutions." -- L. E. Gant

2011-08-08 Thread Tuba Lambanog
Hi, Thank you for the tip. It does look like the Patricia tree -- or suffix tree -- is made-to-order for this kind of task. I'm reading up on it. Would there be a Clojure implementation of this technology, I wonder. Tuba

Re: "Elegant tools deserve elegant solutions." -- L. E. Gant

2011-08-08 Thread Resty Cena
Hi, Andreas, << I don't quite understand what you mean by "I’m having a hard time thinking through the process of generating the candidate suffix set using set forms" >> It is my usual roundabout way of saying "I don't know how to do this." ;) I'm looking at your code as we speak. Thanks, Tuba

Re: "Elegant tools deserve elegant solutions." -- L. E. Gant

2011-08-08 Thread Ken Wesson
On Mon, Aug 8, 2011 at 2:46 AM, Tuba Lambanog wrote: > I’m having a hard time thinking through the process of generating the > candidate suffix set using set forms, and I’m beginning to think I > have selected an arduous path (for me). > > Thoughts? Store the prefixes in a patricia tree, and the

Re: "Elegant tools deserve elegant solutions." -- L. E. Gant

2011-08-08 Thread Andreas Kostler
Hi Tuba, I don't quite understand what you mean by "I’m having a hard time thinking through the process of generating the candidate suffix set using set forms" but I have created a Porter stemmer for English in the past. I understand that's not what you're looking for but it is more so a framework fo

"Elegant tools deserve elegant solutions." -- L. E. Gant

2011-08-07 Thread Tuba Lambanog
Hello, I’m doing a word stemmer for a non-English language. A stemmer parses a word into its word parts: prefixes, roots, suffixes. The input word is at least a root word (English example would be ‘cloud’), but can be any combination of prefix(es) and a root (e.g., 'pre-nuptial'), or a root and s