Hello, Wakayama-san,

I read Oktavia's website. It looks great, and it may be exactly what I wanted.

Have you already tried it with Sphinx? If so, what is your impression of it?

Does the Sphinx community already know about Oktavia?

Anyway, I'll try the following tutorial later.

Install Oktavia to Sphinx:
http://oktavia.info/pages/doc/install_to_sphinx.html

On Tuesday, January 7, 2014 at 10:14:30 UTC+9, shirou wrote:
>
> Hi Watanabe-san,
>
> In the current implementation, search runs in the browser, not on the
> server, so the browser needs to download the dictionary.
> However, an n-gram dictionary may become fairly big and hard to download.
>
> How about using Oktavia (http://oktavia.info/)?
> It can search any language very fast with a small dictionary by using an
> FM-index, and it can be used from Sphinx.
>
> WAKAYAMA Shirou 
>
>
> 2014/1/7 Kevin Horn <[email protected]>:
> > This isn't necessarily about Asian languages, but if you're interested in
> > FTS for Sphinx, you may want to take a look at the whoosh builder
> > extension, in the sphinx-contrib repo:
> > https://bitbucket.org/birkenfeld/sphinx-contrib
> > 
> > 
> > 
> > On Mon, Jan 6, 2014 at 4:41 AM, Hiroki Watanabe <[email protected]>
> > wrote:
> >> 
> >> Hello, 
> >> 
> >> > Takayuki SHIMIZUKAWA wrote: 
> >> > FYI, the Sphinx built-in search feature provides two language modes:
> >> > 'en' and 'ja'.
> >>
> >> Does Sphinx have a plan to introduce a language-independent tokenizer
> >> to support not only Japanese but also Chinese, Korean, and Thai?
> >> Like Japanese, these Asian languages are not separated by whitespace.
> >> 
> >> TinySegmenter, which is Sphinx's tokenizer for Japanese, does not work 
> >> well for Chinese/Korean/Thai. 
> >> 
> >> I tested TinySegmenter on Chinese and Korean using the TinySegmenter
> >> Online Demo.
> >>
> >> TinySegmenter Online Demo:
> >> http://chasen.org/~taku/software/TinySegmenter/
> >>
> >> The results are as follows:
> >> 
> >> 北京首都国际机场 (Beijing Capital International Airport) 
> >> TinySegmenter: 北京首 | 都国 | 际机 | 场 
> >> Expected: 北京 | 首都 | 国际 | 机场 
> >> 
> >> 인천국제공항 (Incheon International Airport) 
> >> TinySegmenter: 인 | 천 | 국제 | 공 | 항 
> >> Expected: 인천 | 국제 | 공항 
> >> 
> >> As you can see, TinySegmenter does not work well for these languages.
> >>
> >> I think the Mozilla Thunderbird team's approach could be adapted to
> >> Sphinx as well. The following post describes the problem they had, namely
> >> that their full-text search did not work for CJK, and how they solved it.
> >> 
> >> Thunderbird 3.0 global / full-text search support for CJK languages 
> >> landed, 
> >> will show up in nightlies tomorrow, requires a new database. 
> >> 
> >> https://groups.google.com/forum/#!topic/mozilla.dev.apps.thunderbird/v0_gbw4LIKo
> >> 
> >> They solved it by enhancing SQLite's Porter tokenizer with a bi-gram
> >> algorithm.
> >>
> >> SQLite's fts3_porter.c, as enhanced with the bi-gram algorithm by the
> >> Mozilla Thunderbird team:
> >>
> >> http://hg.mozilla.org/comm-central/file/tip/mailnews/extensions/fts3/src/fts3_porter.c
> >> 
> >> I think introducing SQLite FTS into Sphinx may be difficult and not
> >> appropriate, but their approach itself is worth considering as a way to
> >> support multi-language search.
> >>
> >> Best regards,
> >> 
> >> 
> >> -- 
> >> You received this message because you are subscribed to the Google Groups
> >> "sphinx-users" group.
> >> To unsubscribe from this group and stop receiving emails from it, send an
> >> email to [email protected].
> >> To post to this group, send email to [email protected].
> >> Visit this group at http://groups.google.com/group/sphinx-users. 
> >> For more options, visit https://groups.google.com/groups/opt_out. 
> > 
> > 
> > 
> > 
> > -- 
> > -- 
> > Kevin Horn 
>
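
As a rough illustration of the bi-gram idea discussed in the quoted thread, here is a minimal Python sketch. It is not the Thunderbird fts3_porter.c code and not anything Sphinx ships. The function names (is_cjk, char_class, bigram_tokenize) and the chosen Unicode ranges are assumptions made for illustration only, and Thai is not covered. Runs of CJK characters are split into overlapping two-character tokens, while other alphanumeric runs are kept as whole lowercase words.

```python
from itertools import groupby


def is_cjk(ch):
    """Return True for a few common CJK ranges (an illustrative, incomplete set)."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7A3   # Hangul syllables
    )


def char_class(ch):
    """Classify a character as 'cjk', 'word' (other alphanumerics), or 'sep'."""
    if is_cjk(ch):
        return "cjk"
    if ch.isalnum():
        return "word"
    return "sep"


def bigram_tokenize(text):
    """Split text into lowercase words and overlapping CJK bi-grams."""
    tokens = []
    for kind, group in groupby(text, key=char_class):
        run = "".join(group)
        if kind == "word":
            tokens.append(run.lower())
        elif kind == "cjk":
            if len(run) == 1:
                # A lone CJK character is kept as its own token.
                tokens.append(run)
            else:
                # Overlapping windows of two characters.
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        # Separator runs (whitespace, punctuation) are dropped.
    return tokens


if __name__ == "__main__":
    print(" | ".join(bigram_tokenize("北京首都国际机场")))
    print(" | ".join(bigram_tokenize("인천국제공항")))
```

For 北京首都国际机场 this yields 北京, 京首, 首都, 都国, 国际, 际机, 机场, so a query such as 首都 or 机场 can match without a language-specific dictionary. The price is an index that contains every adjacent character pair, which is the dictionary-size concern raised earlier in the thread.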

