Hi Byron,

> For example, I have a section of text is as follows:
>
> "紐約洋基今天在新球場鏖戰14局,靠著卡布瑞拉的再見全壘打,9比7轟走奧克蘭運動家,拿下主場首次延長賽勝利。"
>
> If I want to search a string as "卡布瑞拉"
>
> When Xapian is disabled, I can get a right search result.

Yes, the slow search mostly does a substring search.

> But, when I enable Xapian, MoinMoin could not find the string.
>
> If I enter a string as "靠著卡布瑞拉的再見全壘打", MoinMoin shows the
> result.

Xapian search only finds what was put into the index. During indexing, the text is run through a tokenizer, and the tokens it yields are what go into the index.

For example, for a string like "foo plus bar is FooBar.", the tokenizer should yield something like:

foo, plus, bar, is, FooBar, Foo, Bar

You see, the CamelCase word FooBar is split and also yielded as its components, but the normal splitting just happens at whitespace or punctuation.

The builtin tokenizer works quite OK for English and other alphabetic languages, but I guess it is not appropriate for Chinese or other symbolic(?) languages.

So what is needed for Chinese is a Chinese tokenizer. Most current moin developers have no clue about Chinese, but well-commented patches are welcome. Whoever works on that, please communicate with us often.

Please note that this problem is quite hard to solve in a generic way, especially if the kind/language of the text is unknown or mixed.

> So, when I enable the Xapian, I could not find the word in a sentence not
> only wiki page content but also attachment content
>
> If I wanna enable the Xapian function, how can I resolve this problem?

If you have 100% Chinese stuff, you need a different tokenizer. If you have mixed stuff, you need multiple tokenizers and some intelligent means to switch them depending on the content.

Cheers,

Thomas
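
To make the tokenizer behaviour described above a bit more concrete, here is a rough Python sketch. It is not MoinMoin's actual tokenizer; the names simple_tokenize and cjk_bigrams are made up for illustration, and the assumption is only that words are split at whitespace or punctuation and that CamelCase words are additionally yielded as their parts. Under that assumption, a run of Chinese characters between two punctuation marks ends up as one long token, which would explain why the whole clause is found but "卡布瑞拉" alone is not. The bigram function shows one common way a CJK-aware tokenizer could index such text instead; it is only one possible approach.

```python
import re

# Sample text from the original question (commas as they appear there).
SENTENCE = "紐約洋基今天在新球場鏖戰14局,靠著卡布瑞拉的再見全壘打,9比7轟走奧克蘭運動家,拿下主場首次延長賽勝利。"


def simple_tokenize(text):
    """Hypothetical whitespace/punctuation tokenizer with CamelCase splitting.

    Not MoinMoin's real code: just a sketch of the behaviour described above.
    """
    tokens = []
    # \w+ matches runs of letters, digits and underscore; in Python 3 this
    # includes CJK characters, so a Chinese clause becomes a single token.
    for word in re.findall(r"\w+", text):
        tokens.append(word)
        # Also yield the components of a CamelCase word such as FooBar.
        parts = re.findall(r"[A-Z][a-z0-9]+", word)
        if len(parts) > 1:
            tokens.extend(parts)
    return tokens


def cjk_bigrams(run):
    """Hypothetical alternative: overlapping 2-character tokens for a CJK run."""
    if len(run) < 2:
        return [run]
    return [run[i:i + 2] for i in range(len(run) - 1)]


if __name__ == "__main__":
    print(simple_tokenize("foo plus bar is FooBar."))
    indexed = simple_tokenize(SENTENCE)
    print(indexed)
    # Index lookups match whole tokens, not substrings of tokens.
    print("卡布瑞拉" in indexed)
    print("靠著卡布瑞拉的再見全壘打" in indexed)
    print(cjk_bigrams("卡布瑞拉"))
```

With bigram tokens in the index, a query would have to be tokenized the same way and matched as a phrase of consecutive bigrams. Mixed-language pages would still need some logic to decide which tokenizer applies to which run of text.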
_______________________________________________
Moin-user mailing list
Moin-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/moin-user