Dear All,

Some notes about the significance of short vowels for Arabic search.
1. When short vowels are not present, the meaning of a word can be ambiguous. The reader disambiguates by context, logic, or previous knowledge of the verse. The difference between the verbs "to kill" and "to be killed" lies in the short vowels. Thus, we can never overestimate their importance.

2. Un-vowelized text is highly valuable for search because it makes searching much easier and more useful. No Arabic searcher wants to type short vowels; it's tedious and you may get them wrong. Not only that, most queries into the text are meant to be ambiguous. The fact that "to kill" and "to be killed" are packed into one written word makes a vowel-free search equivalent to the kind of search that occurs in English. In addition, words that differ only in their vowelization are often related in meaning.

3. Arabic has a root/pattern morphology that makes many search options possible. One can search for words with a similar root or with a similar pattern. There is even a hybrid approach, which I explored in my master's thesis, that converts verbal nouns to their related verbs. This kind of matching comes almost for free in English: "reader", "reading", "read", and "readable" can all show up in one search because the "er", "ing", and "able" are attached to the end of the word rather than mixed inside it.

Anyway, I bring all this up to say that it would be valuable to have non-vowelized search of vowelized text and to have varying modes of search. I did some work with Lucene in Java, and I'm aware that it is possible to implement different kinds of filters and to keep track of the location of each token in the original document.

The time I can spend on this is limited, almost none. However, if someone would like to take these insights and use them, it would be beneficial and interesting at the same time. If someone is interested, I can also provide my M.S. thesis, which was about a configurable stemming engine. The implementation was evaluated within information retrieval (IR).
However, the methods used may be of more value in Bible search.

God bless,
Kamal Abou Mikhael

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
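
A minimal sketch of the "non-vowelized search of vowelized text" idea from the message above, in plain Python rather than Lucene. It strips the short-vowel marks from both the query and the text, searches the stripped forms, and keeps an offset table so that every hit can be reported as a position in the original vowelized text. The names HARAKAT, strip_harakat, and find_unvowelized are hypothetical and chosen for illustration, and the set of marks treated as short vowels (U+064B through U+0652, plus the superscript alef U+0670) is an assumption that a real module may need to widen or narrow. It does simple substring matching, not tokenization or root/pattern matching.

```python
# Marks treated as "short vowels" here: fathatan..sukun (U+064B-U+0652)
# plus superscript alef (U+0670). This set is an assumption for the sketch.
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def strip_harakat(text):
    """Return (bare, offsets): the text without short-vowel marks, and for
    each character of `bare` its index in the original `text`."""
    bare, offsets = [], []
    for i, ch in enumerate(text):
        if ch not in HARAKAT:
            bare.append(ch)
            offsets.append(i)
    return "".join(bare), offsets


def find_unvowelized(query, text):
    """Find every occurrence of `query` in `text`, ignoring short vowels on
    both sides. Returns (start, end) spans indexing the ORIGINAL text."""
    bare_query, _ = strip_harakat(query)
    bare_text, offsets = strip_harakat(text)
    spans = []
    if not bare_query:
        return spans
    pos = bare_text.find(bare_query)
    while pos != -1:
        start = offsets[pos]
        end = offsets[pos + len(bare_query) - 1] + 1
        # Include the marks that sit on the last matched letter.
        while end < len(text) and text[end] in HARAKAT:
            end += 1
        spans.append((start, end))
        pos = bare_text.find(bare_query, pos + 1)
    return spans


# "qatala" (he killed) and "qutila" (he was killed) share the letters q-t-l.
QATALA = "\u0642\u064E\u062A\u064E\u0644\u064E"
QUTILA = "\u0642\u064F\u062A\u0650\u0644\u064E"
SAMPLE = QATALA + " " + QUTILA

if __name__ == "__main__":
    # The bare query q-t-l matches both vowelized words.
    print(find_unvowelized("\u0642\u062A\u0644", SAMPLE))  # [(0, 6), (7, 13)]
```

The offset table plays the role that token position tracking would play in an indexing library: the search runs on the stripped form, while highlighting and verse lookup still use positions in the vowelized source.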