Hi all, I am using solr-langid (Solr 3.5.0) for language detection, and I would like it to detect multiple languages within a single text.
The example text is:

咖哩起源於印度。印度民間傳說咖哩是佛祖釋迦牟尼所創,由於咖哩的辛辣與香味可以幫助遮掩羊肉的腥騷,此舉即為用以幫助不吃豬肉與牛肉的印度人。在泰米爾語中,「kari」是「醬」的意思。在馬來西亞,kari也稱dal(當在mamak檔)。早期印度被蒙古人所建立的莫臥兒帝國(Mughal Empire)所統治過,其間從波斯(現今的伊朗)帶來的飲食習慣,從而影響印度人的烹調風格直到現今。 Curry (plural, Curries) is a generic term primarily employed in Western culture to denote a wide variety of dishes originating in Indian, Pakistani, Bangladeshi, Sri Lankan, Thai or other Southeast Asian cuisines. Their common feature is the incorporation of more or less complex combinations of spices and herbs, usually (but not invariably) including fresh or dried hot capsicum peppers, commonly called "chili" or "cayenne" peppers.

I would like the text to be split into two parts, with the Chinese part going to the "text_zh-tw" field and the English part going to the "text_en" field. Is something like that possible?

Thank you.

Best Regards,
Bing

--
View this message in context: http://lucene.472066.n3.nabble.com/Can-solr-langid-Solr3-5-0-detect-multiple-languages-in-one-text-tp3821210p3821210.html
Sent from the Solr - User mailing list archive at Nabble.com.
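One workaround to consider, assuming the mixed text can be split on the client before it is sent to Solr, is to segment it by Unicode script rather than by detected language. The sketch below is plain Java 7+ (it relies on `Character.UnicodeScript`). It is not a solr-langid or SolrJ API. The class name `ScriptSplitter` is hypothetical, and the field names `text_zh-tw` and `text_en` are taken from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ScriptSplitter {

    // Hypothetical helper (not part of Solr): routes each code point of the input
    // into a Chinese or an English bucket according to its Unicode script.
    // Han characters go to the Chinese bucket and Latin letters to the English one.
    // COMMON/INHERITED code points (spaces, digits, punctuation, including
    // full-width punctuation such as "。") stay with the most recent bucket.
    public static Map<String, String> split(String text) {
        StringBuilder zh = new StringBuilder();
        StringBuilder en = new StringBuilder();
        StringBuilder current = en; // default bucket before the first letter is seen
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            StringBuilder target = current;
            if (script == Character.UnicodeScript.HAN) {
                target = zh;
            } else if (script == Character.UnicodeScript.LATIN) {
                target = en;
            }
            // When switching buckets, separate the new run from the previous run
            // in that bucket so words such as "kari" and "dal" do not glue together.
            if (target != current && target.length() > 0) {
                target.append(' ');
            }
            current = target;
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        // Collapse whitespace runs and trim each bucket before returning it.
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("text_zh-tw", zh.toString().replaceAll("\\s+", " ").trim());
        fields.put("text_en", en.toString().replaceAll("\\s+", " ").trim());
        return fields;
    }

    public static void main(String[] args) {
        String sample = "咖哩起源於印度。 Curry is a generic term.";
        for (Map.Entry<String, String> e : split(sample).entrySet()) {
            System.out.println(e.getKey() + " => " + e.getValue());
        }
    }
}
```

Running `main` prints `text_zh-tw => 咖哩起源於印度。` followed by `text_en => Curry is a generic term.`. The two strings could then be added to the document as separate fields. Script-based splitting is only an approximation of language. Latin words embedded in the Chinese passage (for example "kari" or "Mughal Empire") end up in `text_en`. Other languages that share the Han or Latin script would also need extra handling.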