Wikipedia says: "Languages like Japanese and Chinese have unambiguous
sentence-ending markers."
In that case, could we write a rule-based sentence detector for these
languages?
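
A minimal sketch of such a detector in Java. It assumes the markers are the
ideographic full stop 。 and the full-width ！ and ？. That set is my
assumption and comes from neither Wikipedia nor OpenNLP. The class and method
names are hypothetical, not OpenNLP API:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical rule-based splitter for Chinese/Japanese; not part of OpenNLP. */
public class CjkSentenceSplitter {

    // Assumed sentence-ending markers: ideographic full stop and
    // full-width exclamation and question marks.
    private static final String EOS_MARKERS = "。！？";

    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (EOS_MARKERS.indexOf(text.charAt(i)) >= 0) {
                // Keep the marker as part of the sentence it ends.
                String s = text.substring(start, i + 1).trim();
                if (!s.isEmpty()) {
                    sentences.add(s);
                }
                start = i + 1;
            }
        }
        // Trailing text without a final marker is kept as a last sentence.
        String rest = text.substring(start).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }

    public static void main(String[] args) {
        for (String s : split("今日は晴れです。明日は雨ですか？")) {
            System.out.println(s);
        }
    }
}
```

Every marker is treated as a boundary, so a closing quote or bracket that
follows a marker (e.g. 」) ends up at the start of the next sentence. A real
detector would need a rule for that.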

Jörn

On Wed, Mar 21, 2012 at 3:18 PM, [email protected] <
[email protected]> wrote:

> Hi
>
> There is a Thai model for the sentence detector. I don't know who created
> it, but someone on the list knows and can point you to an article about it.
> What I can say is that OpenNLP had to be customized to work with Thai,
> including its EOS characters, which are ' ' and '\n'.
>
>
> http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/sentdetect/lang/th/SentenceContextGenerator.java?view=markup
>
>
> William
>
>
> On Wed, Mar 21, 2012 at 8:05 AM, Jim - FooBar(); <[email protected]> wrote:
>
> > Basically, you need to know the punctuation marks that indicate the end of
> > a sentence, or find someone who does, and then use a regex to split the
> > sentences at those marks. It's not going to be perfect: you may have to go
> > over it once or twice with your own eyes to make sure everything is OK
> > before training. Everything depends on the language and how ambiguous its
> > punctuation is.
> >
> >
> > Jim
> >
> > On 20/03/12 18:38, Jairo Sarabia wrote:
> >
> >> Hi all,
> >>
> >> I see there aren't any Sentence Detect models for Asian languages in the
> >> OpenNLP repository, and I need them.
> >> I have to train Sentence Detect models for Chinese, Japanese and Korean,
> >> but I don't know these languages.
> >> How could I get training data files for these languages?
> >>
> >> Thanks in advance!
> >>
> >> Jairo Sarabia
> >>
> >>
> >
>
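
A minimal Java sketch of Jim's regex approach. It takes a set of
end-of-sentence characters as a parameter (William's Thai set would be " \n")
and writes one sentence per line, which is, as far as I know, the format the
OpenNLP sentence detector trainer expects. The class and method names are
hypothetical, not OpenNLP API. Every EOS character is treated as a boundary,
so the output is only a draft to proofread by hand before training:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not an OpenNLP class: drafts sentence-detector training data. */
public class DraftTrainingData {

    public static String toOneSentencePerLine(String text, String eosChars) {
        if (eosChars.isEmpty()) {
            throw new IllegalArgumentException("need at least one EOS character");
        }
        // Build a regex character class from the EOS characters, escaping
        // non-alphanumerics so '.', '?', ' ', '\n' etc. are matched literally.
        StringBuilder cls = new StringBuilder("[");
        for (char c : eosChars.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                cls.append('\\');
            }
            cls.append(c);
        }
        cls.append(']');
        // Zero-width split right after each EOS character, so the character
        // stays attached to the sentence it ends.
        String[] pieces = text.split("(?<=" + cls + ")");
        List<String> sentences = new ArrayList<>();
        for (String p : pieces) {
            String s = p.trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return String.join("\n", sentences);
    }

    public static void main(String[] args) {
        // ASCII punctuation as an example; pass " \n" for the Thai EOS set.
        System.out.println(toOneSentencePerLine("Hello there. How are you? Fine!", ".?!"));
    }
}
```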
