Re: Stemming in openNLP

Ling Fri, 07 Jul 2017 09:31:16 -0700

This is the function for getLemma() from the "org.cleartk.token.type" package:


 //*--------------*
  //* Feature: stem

  /** getter for stem - gets
   * @generated
   * @return value of the feature
   */
  public String getStem() {
    if (Token_Type.featOkTst && ((Token_Type)jcasType).casFeat_stem == null)
      jcasType.jcas.throwFeatMissing("stem", "org.cleartk.token.type.Token");
    return jcasType.ll_cas.ll_getStringValue(addr, ((Token_Type)jcasType).casFeatCode_stem);
  }

Anyway, I will use openNLP's new release directly, without involving other
libraries. For openNLP, a lemmatization or stemming algorithm similar to
WordNet's seems to work better than the Porter stemmer.
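
Until then, the two symptoms in this thread (a lemma that comes back as
null, and a stem like "iphon") can be worked around on the caller's side.
Below is a minimal sketch, not OpenNLP or ClearTK code: SafeNormalizer and
its parameters are hypothetical names, and the "O" check rests on my
assumption that a dictionary-based lemmatizer may return "O" for words it
does not know. It wraps any stemmer or lemmatizer, leaves protected words
untouched, and falls back to the original token when no usable result
comes back.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Hypothetical wrapper around any stemmer or lemmatizer (not a library class). */
final class SafeNormalizer {
    private final UnaryOperator<String> normalizer; // the wrapped stemmer or lemmatizer
    private final Set<String> protectedWords;       // lower-cased words that must not change

    SafeNormalizer(UnaryOperator<String> normalizer, Set<String> protectedWords) {
        this.normalizer = normalizer;
        this.protectedWords = protectedWords;
    }

    String normalize(String token) {
        // Protected words (brand names etc.) bypass the normalizer entirely.
        if (protectedWords.contains(token.toLowerCase(Locale.ROOT))) {
            return token;
        }
        String result = normalizer.apply(token);
        // Assumption: null, empty, or "O" means "no lemma found", so keep the token.
        // Caveat: a token whose real lemma is literally "O" would also be kept as-is.
        if (result == null || result.isEmpty() || "O".equals(result)) {
            return token;
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy dictionary lemmatizer: returns null for unknown words.
        Map<String, String> dict = Map.of("phones", "phone", "was", "be");
        SafeNormalizer lemmas = new SafeNormalizer(dict::get, Set.of());
        System.out.println(lemmas.normalize("phones")); // known word
        System.out.println(lemmas.normalize("tmoble")); // unknown word, token kept

        // Toy rule-based stemmer: strips a trailing 'e' (illustration only).
        UnaryOperator<String> toyStemmer =
            w -> w.endsWith("e") ? w.substring(0, w.length() - 1) : w;
        SafeNormalizer stems = new SafeNormalizer(toyStemmer, Set.of("iphone", "tmobile"));
        System.out.println(stems.normalize("iphone"));  // protected, unchanged
        System.out.println(stems.normalize("mobile"));  // not protected, stemmed
    }
}
```

In real use the lambda would be replaced by a call into the actual stemmer
or lemmatizer, and the protected set would be loaded from a word list.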

On Fri, Jul 7, 2017 at 9:24 AM, John Stewart <[email protected]> wrote:

> Which library?  They may be providing a trained model or a dictionary,
> distinct from the data files released by the OpenNLP project.
>
> jds
>
> On Thu, Jul 6, 2017 at 11:47 PM, Ling <[email protected]> wrote:
>
> > I use it indirectly through another library, there is a function
> > token.getLemma().
> >
> > On Jul 6, 2017 7:24 PM, "John Stewart" <[email protected]> wrote:
> >
> > > > I'm asking because I thought there are no pre-trained models for the
> > > > lemmatizer. How are you using it exactly?  There's also an option to
> > > > use a dictionary, e.g.
> > > > https://stackoverflow.com/questions/38982423/opennlp-lemmatization-example
> > >
> > > AFAIK the models in 1.8.1 are the same as 1.5.3
> > >
> > > jds
> > >
> > > On Thu, Jul 6, 2017 at 6:26 PM, Ling <[email protected]> wrote:
> > >
> > > > > openNLP 1.5.3. I will update to version 1.8.1 after this week, if
> > > > > the issue is due to old models.
> > > >
> > > > Thanks.
> > > >
> > > > > On Thu, Jul 6, 2017 at 3:19 PM, John Stewart <[email protected]> wrote:
> > > >
> > > > > What model or dictionary are you using with the lemmatizer?
> > > > >
> > > > > jds
> > > > >
> > > > > On Thu, Jul 6, 2017 at 6:05 PM, Ling <[email protected]> wrote:
> > > > >
> > > > > > > Hi, the problem with the lemmatizer is that, for "tmoble", the
> > > > > > > lemma returned by openNLP is "null", not "tmoble".
> > > > > > >
> > > > > > > Why is that?
> > > > > >
> > > > > > > On Mon, Jul 3, 2017 at 6:54 PM, Rakesh P <[email protected]> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > > > A stemmer works from predefined rules. An example of a rule
> > > > > > > > condition is "word that ends with 'e'". So, if you want to get a
> > > > > > > > meaningful word after preprocessing, it is better to use
> > > > > > > > lemmatization.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Rakesh P
> > > > > > >
> > > > > > > > > On 03-Jul-2017, at 10:24 PM, Ling <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Hi, I noticed that some words are stemmed like the following:
> > > > > > > >
> > > > > > > > iphone ->  iphon
> > > > > > > > tmobile -> T-mobil
> > > > > > > >
> > > > > > > > > Is there some parameter to control this behavior? In such
> > > > > > > > > cases, those stems are actually harmful, because they turn
> > > > > > > > > into unknown words in the text. Since these are quite common,
> > > > > > > > > I am just curious whether there is a way to change the
> > > > > > > > > default behavior.
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > > Ling
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
