Hi, Petr,
Thank you for the pointer to the site. Indeed a treasure trove of ideas on
stemmer algorithms.
Tuba
On Thu, Aug 11, 2011 at 8:45 AM, Petr Gladkikh wrote:
> On Mon, Aug 8, 2011 at 1:46 PM, Tuba Lambanog
> wrote:
> > Hello,
> >
> > I’m doing a word stemmer for a non-English language. A
On Mon, Aug 8, 2011 at 1:46 PM, Tuba Lambanog wrote:
> Hello,
>
> I’m doing a word stemmer for a non-English language. A stemmer parses
> a word into its word parts: prefixes, roots, suffixes. The input word
> is at least a root word (English example would be ‘cloud’), but can be
> any combination
Hi, Ken,
Thanks for the suggestion.
As I was looking at a suffix tree, it suddenly struck me that the following
strategy may do just as well:
1. Use rest and next to generate the tentative suffix sets, thus for
"directional", it will give the set of #{irectional rectional ectional
ctional tiona
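A minimal Python sketch of step 1 (Python is used here only for illustration; the thread itself is about Clojure). It builds the set of proper suffixes by dropping one, two, three... leading characters, matching the "directional" example. The message is cut off, so the intersection with a known-suffix list is only an assumed next step, and `KNOWN_SUFFIXES` is a hypothetical inventory.

```python
# Hypothetical suffix inventory, for illustration only.
KNOWN_SUFFIXES = {"al", "ional", "ion", "s"}


def tentative_suffixes(word):
    """Every proper suffix of `word`: drop 1, 2, ... leading characters."""
    return {word[i:] for i in range(1, len(word))}


def matching_suffixes(word, known):
    """Keep only tentative suffixes found in the known list (assumed next step)."""
    return tentative_suffixes(word) & known


print(sorted(tentative_suffixes("directional"), key=len, reverse=True))
# ['irectional', 'rectional', 'ectional', 'ctional', 'tional', 'ional', 'onal', 'nal', 'al', 'l']
print(sorted(matching_suffixes("directional", KNOWN_SUFFIXES)))
# ['al', 'ional']
```

Whether the one-letter suffix "l" should count as a candidate is also an assumption; the range bound is the only thing to change if it should not.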
On Mon, Aug 8, 2011 at 11:41 AM, Tuba Lambanog wrote:
> Hi,
> Thank you for the tip. It does look like the Patricia tree -- or suffix tree
> -- is made-to-order for this kind of task. I'm reading up on it.
You're welcome.
> Would there be a Clojure implementation of this technology, I wonder.
E
Hi,
Thank you for the tip. It does look like the Patricia tree -- or suffix tree
-- is made-to-order for this kind of task. I'm reading up on it. Would there
be a Clojure implementation of this technology, I wonder.
Tuba
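For reference, a Patricia tree (radix tree, or compressed trie) merges chains of single-child nodes into one edge labeled with a whole string. The sketch below is plain Python, not tied to any existing library, and the class and method names are illustrative. It stores a hypothetical prefix list and returns every stored prefix that begins a given word, which is the lookup a stemmer needs when peeling off prefixes.

```python
def _common_prefix_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


class RadixNode:
    def __init__(self):
        self.children = {}     # edge label (non-empty str) -> RadixNode
        self.terminal = False  # True if a stored key ends at this node


class RadixTree:
    def __init__(self):
        self.root = RadixNode()

    def insert(self, key):
        node = self.root
        while key:
            # Sibling edges never share a first character, so at most one fits.
            for label, child in node.children.items():
                if label[0] == key[0]:
                    break
            else:
                leaf = RadixNode()
                leaf.terminal = True
                node.children[key] = leaf
                return
            n = _common_prefix_len(label, key)
            if n < len(label):
                # Split the edge: label[:n] -> new middle node -> label[n:].
                mid = RadixNode()
                del node.children[label]
                node.children[label[:n]] = mid
                mid.children[label[n:]] = child
                child = mid
            node, key = child, key[n:]
        node.terminal = True

    def prefixes_of(self, word):
        """Every stored key that is a prefix of `word`, shortest first."""
        found, node, used = [], self.root, 0
        while True:
            if node.terminal:
                found.append(word[:used])
            rest = word[used:]
            for label, child in node.children.items():
                if rest.startswith(label):
                    node, used = child, used + len(label)
                    break
            else:
                return found


tree = RadixTree()
for p in ["un", "under", "pre", "pro"]:  # hypothetical prefixes
    tree.insert(p)
print(tree.prefixes_of("understand"))  # ['un', 'under']
print(tree.prefixes_of("prenuptial"))  # ['pre']
```

Returning all matching prefixes, not just the longest, leaves the choice among competing analyses to the later stages of the stemmer.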
On Mon, Aug 8, 2011 at 1:40 AM, Ken Wesson wrote:
> On Mon, Aug 8, 2011 at
Hi, Andreas,
<< I don't quite understand what you mean by "I’m having a hard time
thinking through the process of generating the
candidate suffix set using set forms" >>
It is my usual roundabout way of saying "I don't know how to do this." ;)
I'm looking at your code as we speak.
Thanks,
Tuba
On Mon, Aug 8, 2011 at 2:46 AM, Tuba Lambanog wrote:
> I’m having a hard time thinking through the process of generating the
> candidate suffix set using set forms, and I’m beginning to think I
> have selected an arduous path (for me).
>
> Thoughts?
Store the prefixes in a patricia tree, and the
Hi Tuba,
I don't quite understand what you mean by "I’m having a hard time
thinking through the process of generating the
candidate suffix set using set forms" but I have created a Porter
stemmer for English in the past.
I understand that's not what you're looking for, but it is more of a
framework fo
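Andreas's code is not shown in this snippet, so what follows is not his framework. It is a minimal Python sketch of the data-driven shape a Porter-style step can take: a table of (suffix, replacement) rules tried longest first, applying the first one that matches. The four rules are step 1a of Porter's published algorithm; adapting the idea to another language would presumably mean swapping in a different table (an assumption).

```python
# Step 1a of Porter's algorithm, as (suffix, replacement) pairs, longest first.
STEP_1A = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]


def apply_rules(word, rules):
    """Apply the first rule whose suffix ends `word`; otherwise return `word`."""
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", apply_rules(w, STEP_1A))
# caresses -> caress
# ponies -> poni
# caress -> caress
# cats -> cat
```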
Hello,
I’m doing a word stemmer for a non-English language. A stemmer parses
a word into its word parts: prefixes, roots, suffixes. The input word
is at least a root word (English example would be ‘cloud’), but can be
any combination of prefix(es) and a root (e.g., 'pre-nuptial'), or a
root and s
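One way to frame the parse described above, as a minimal Python sketch: enumerate every split of the word into zero or more known prefixes, one known root, and zero or more known suffixes. The affix and root sets are hypothetical English placeholders (the actual target is a non-English language), and the hyphen in 'pre-nuptial' is assumed to be removed before parsing. The search is brute force over root positions, which stays cheap for word-length input.

```python
# Hypothetical lexicons, for illustration only.
PREFIXES = {"pre", "un", "re"}
SUFFIXES = {"al", "ion", "s", "y"}
ROOTS = {"cloud", "nuptial", "direct"}


def segmentations(s, pieces):
    """All ways to write s as a concatenation of strings drawn from pieces."""
    if not s:
        return [[]]
    out = []
    for k in range(1, len(s) + 1):
        head = s[:k]
        if head in pieces:
            for tail in segmentations(s[k:], pieces):
                out.append([head] + tail)
    return out


def parse_word(word, prefixes, roots, suffixes):
    """Every (prefix_list, root, suffix_list) analysis of `word`."""
    results = []
    for i in range(len(word) + 1):
        for j in range(i + 1, len(word) + 1):
            root = word[i:j]
            if root not in roots:
                continue
            for pre in segmentations(word[:i], prefixes):
                for suf in segmentations(word[j:], suffixes):
                    results.append((pre, root, suf))
    return results


print(parse_word("cloud", PREFIXES, ROOTS, SUFFIXES))        # [([], 'cloud', [])]
print(parse_word("prenuptial", PREFIXES, ROOTS, SUFFIXES))   # [(['pre'], 'nuptial', [])]
print(parse_word("directional", PREFIXES, ROOTS, SUFFIXES))  # [([], 'direct', ['ion', 'al'])]
```

Returning every analysis rather than the first one found leaves room for ambiguous words, where more than one split is licensed by the lexicons.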