I haven't heard of any machine-learning-based stemmers. I'm not really
sure what algorithm you would use for stemming. What you'd be looking
for is something that infers: "running" stems to "run" and "walking"
stems to "walk", therefore "hopping" should stem to "hop". That would be
quite an algorithm to develop, I'd say.

There are a few ways you could handle this:

1) locate a Bengali linguist who can help you define an algorithm
2) manually stem a large number of documents and use that as a basis
   for stemming

If you had a stemmed corpus, you could simply use synonyms to do it. In
English, for example, you could map:

run,running,runs,ran,runner=>run
walk,walked,walking,walker=>walk

Then all you need to do is generate a synonym file and use the
SynonymFilterFactory with it, in place of a stemmer.
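
As a rough sketch of that last step, assuming your manually stemmed data
is available as (word, stem) pairs, something like the Python below
could generate the file. The function names and the synonyms_bn.txt
filename are made up for illustration; only the "a,b,c=>stem" line
format comes from the example above.

```python
from collections import defaultdict


def build_synonym_lines(stemmed_pairs):
    """Group (word, stem) pairs into 'variant,variant=>stem' lines.

    stemmed_pairs is assumed to come from a manually stemmed corpus;
    that input format is an assumption, not something Solr defines.
    """
    variants_by_stem = defaultdict(set)
    for word, stem in stemmed_pairs:
        variants_by_stem[stem].add(word)
        # Include the stem itself so it also maps to itself.
        variants_by_stem[stem].add(stem)

    lines = []
    for stem in sorted(variants_by_stem):
        variants = sorted(variants_by_stem[stem])
        lines.append(",".join(variants) + "=>" + stem)
    return lines


def write_synonym_file(path, lines):
    """Write the mapping lines as UTF-8, one mapping per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")


# Example (hypothetical data and filename):
#   pairs = [("running", "run"), ("runs", "run"), ("walked", "walk")]
#   write_synonym_file("synonyms_bn.txt", build_synonym_lines(pairs))
```

You would then point the synonyms attribute of the SynonymFilterFactory
filter in your field type at the generated file.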

Would that work?

Upayavira

On Thu, Sep 10, 2015, at 09:59 AM, Imtiaz Shakil Siddique wrote:
> Thanks for the reply.
> 
> Currently I have 20GB of Bengali newspaper data (for corpus building).
> I don't have a manually stemmed corpus, but if needed I will build one.
> 
> Basically I need guidance regarding how to do this.
> If there are some standard approaches to building a stemmer and a
> stopword list for use with Solr, then please share them.
> 
> Thank you Upayavira for your kind help.
> 
> Imtiaz Shakil Siddique
> 
> 
> On 10 September 2015 at 13:23, Upayavira <u...@odoko.co.uk> wrote:
> 
> >
> >
> > On Thu, Sep 10, 2015, at 04:45 AM, Imtiaz Shakil Siddique wrote:
> > > Hi,
> > >
> > > I am trying to develop a stemmer and a stopword list for the Bengali
> > > language, which is not shipped with Solr.
> > >
> > > I am trying to build this with a machine learning approach, but I
> > > couldn't find any good documents to study. It would be very helpful
> > > if you could shed some light on this matter.
> >
> > How are you going to do this with machine learning? What corpus are you
> > going to use to learn from? Do you have some documents that have been
> > manually stemmed for which you also have the originals?
> >
> > Upayavira
> >
