[ http://issues.apache.org/jira/browse/LUCENE-503?page=comments#action_12413756 ]
Samphan Raruenrom commented on LUCENE-503:
--
All the code has been tested with Lucene 2.0.0.
Thanks, Art, for the info/URL. I hadn't known about Pichai's w
[ http://issues.apache.org/jira/browse/LUCENE-503?page=all ]
Samphan Raruenrom updated LUCENE-503:
-
Attachment: TestThaiAnalyzer.java
Add TestThaiAnalyzer JUnit test, adapted from TestFrenchAnalyzer. The Thai
words are chosen so that changing the
[ http://issues.apache.org/jira/browse/LUCENE-503?page=comments#action_12377206 ]
Samphan Raruenrom commented on LUCENE-503:
--
> - It uses the English stop words; does that make sense?
Yes. Thai text usually mixes in English words here and th
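To illustrate why English stop words still make sense for mixed Thai/English text, here is a minimal plain-Java sketch (not Lucene's StopFilter). The class name and the tiny stop set are hypothetical: English stop words are dropped, while Thai tokens never match the set and pass through unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class MixedTextStopFilter {
    // Hypothetical, deliberately tiny English stop set for illustration;
    // Lucene ships its own, longer list.
    static final Set<String> ENGLISH_STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    // Drops English stop words (case-insensitively); Thai tokens never match,
    // so they are kept as-is.
    static List<String> removeStopWords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!ENGLISH_STOP_WORDS.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "\u0e20\u0e32\u0e29\u0e32" is the Thai word for "language".
        List<String> tokens = Arrays.asList("The", "\u0e20\u0e32\u0e29\u0e32", "of", "Lucene");
        System.out.println(removeStopWords(tokens));
    }
}
```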
I've submitted a ThaiAnalyzer for Lucene here
http://issues.apache.org/jira/browse/LUCENE-503?page=all
I want to contribute the code and have signed the CLA.
Can anyone review the code?
--
_/|\_ Samphan Raruenrom. Open Source Development Co., Ltd.
Tel: +66 38 311816, Fax: +66 38 773128,
- Original Message
From: Samphan Raruenrom <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
[ http://issues.apache.org/jira/browse/LUCENE-503?page=comments#action_12374136 ]
Samphan Raruenrom commented on LUCENE-503:
--
I've changed the code to use java.text.BreakIterator instead of ICU4J, removing
the dependency on ICU4J. The ThaiAn
[ http://issues.apache.org/jira/browse/LUCENE-503?page=all ]
Samphan Raruenrom updated LUCENE-503:
-
Attachment: ThaiWordFilter.java
ThaiWordFilter, which uses java.text.BreakIterator to break Thai text into word tokens.
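The core of the java.text.BreakIterator approach can be sketched with the JDK alone. This is a minimal illustration, not the attached ThaiWordFilter: the class and method names are hypothetical, and whether Thai is actually split into dictionary words depends on the Thai locale data shipped with the JDK in use (without it, an unspaced Thai run may come back as a single token).

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class BreakIteratorWords {
    // Splits text into word tokens with the JDK's locale-sensitive word
    // BreakIterator, keeping only segments that contain a letter or digit
    // (this drops spaces and punctuation).
    static List<String> breakWords(String text, Locale locale) {
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(segment);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "\u0e20\u0e32\u0e29\u0e32\u0e44\u0e17\u0e22" is the Thai phrase for "Thai language".
        String thai = "\u0e20\u0e32\u0e29\u0e32\u0e44\u0e17\u0e22";
        System.out.println(breakWords(thai, Locale.forLanguageTag("th")).size() + " Thai token(s)");
        System.out.println(breakWords("Hello, world", Locale.ENGLISH)); // [Hello, world]
    }
}
```

Because only java.text is used, there is no compile-time dependency on ICU4J; the trade-off is that segmentation quality is whatever the JDK's Thai rules provide.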
> Contrib: ThaiAnalyzer to ena
[ http://issues.apache.org/jira/browse/LUCENE-503?page=all ]
Samphan Raruenrom updated LUCENE-503:
-
Attachment: ThaiAnalyzer.java
ThaiAnalyzer, which simply returns a TokenFilter chain with ThaiWordFilter in the
middle.
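The "chain with the word filter in the middle" design can be sketched in plain Java, with each stage as a List-to-List function. This is a hypothetical model of the idea, not Lucene's Analyzer/TokenFilter API and not the attached ThaiAnalyzer: the stage names, the exact stage order, and the tiny stop set are assumptions for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Plain-Java sketch of an analyzer-style chain; not Lucene's API.
final class AnalyzerChainSketch {
    // Hypothetical, tiny English stop set for illustration only.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "an", "and", "of", "the"));

    // First stage (tokenizer stand-in): split on whitespace.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Lowercasing stage; only affects cased scripts such as Latin.
    static final UnaryOperator<List<String>> LOWER_CASE = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    };

    // Middle stage: re-split each token with a Thai-locale word BreakIterator,
    // dropping segments that contain no letters or digits.
    static final UnaryOperator<List<String>> WORD_BREAK = tokens -> {
        List<String> out = new ArrayList<>();
        BreakIterator words = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        for (String t : tokens) {
            words.setText(t);
            int start = words.first();
            for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
                String segment = t.substring(start, end);
                if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                    out.add(segment);
                }
            }
        }
        return out;
    };

    // Last stage: drop English stop words; Thai tokens pass through.
    static final UnaryOperator<List<String>> STOP = tokens -> {
        List<String> out = new ArrayList<>(tokens);
        out.removeIf(STOP_WORDS::contains);
        return out;
    };

    static List<String> analyze(String text) {
        return STOP.apply(WORD_BREAK.apply(LOWER_CASE.apply(whitespaceTokens(text))));
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Lucene index")); // [lucene, index]
    }
}
```

Placing word breaking after a coarse whitespace split and before stop-word removal means the stop filter sees individual words, including English words embedded in Thai text.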
> Contrib: ThaiAnalyzer
Versions: 1.4
Reporter: Samphan Raruenrom
Thai text doesn't have spaces between words. Usually, a dictionary-based algorithm
is used to break a string into words. For Lucene to be usable for Thai, an
Analyzer that knows how to break Thai text into words is needed.
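As an illustration of what "dictionary-based" can mean, here is a minimal sketch of greedy longest matching, one common dictionary-based technique, swapped in here for explanation. It is not necessarily what java.text.BreakIterator or ICU4J do internally. The dictionary, the `maxWordLength` parameter, and the class name are hypothetical, and Latin letters stand in for unspaced Thai in the demo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LongestMatchSegmenter {
    // Greedy longest matching: at each position, take the longest dictionary
    // word starting there; if none matches, emit one character and move on.
    static List<String> segment(String text, Set<String> dictionary, int maxWordLength) {
        List<String> words = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            String match = null;
            for (int end = Math.min(text.length(), pos + maxWordLength); end > pos; end--) {
                String candidate = text.substring(pos, end);
                if (dictionary.contains(candidate)) {
                    match = candidate;
                    break;
                }
            }
            if (match == null) {
                match = text.substring(pos, pos + 1); // unknown character
            }
            words.add(match);
            pos += match.length();
        }
        return words;
    }

    public static void main(String[] args) {
        // Hypothetical mini-dictionary; Latin text stands in for unspaced Thai.
        Set<String> dict = new HashSet<>(Arrays.asList("thai", "word", "words", "break"));
        System.out.println(segment("thaiwordsbreak", dict, 10)); // [thai, words, break]
    }
}
```

Greedy matching is simple but can commit to a long word that forces a bad split later; alternatives such as choosing the segmentation with the fewest words (maximal matching via dynamic programming) trade speed for better handling of such ambiguities.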
I've implemented suc
"issue" in JIRA, and attach your code to it. We can put the
analyzers in the contrib section of Lucene.
I hope DictionaryBasedBreakIterator is not a compile-time dependency, because
we probably can't distribute ICU4J due to the license.
Otis
- Original Message ----
From:
code under the Apache License,
so it'll be useful to other people.
How can I do this?
I see analyzers for various languages in the Sandbox.
How can I put the code there?
Thanks.