The search returned 12460 results, whereas only 14212 verses in the module contain any text. A search that finds the 10-letter search key in 87.67% of the non-empty verses is clearly a serious matter, one so egregious that it almost defies rational explanation.
Here's a possible clue. Taking the /unique/ letters from the example search word, and inserting a space between each, we get this:

ϩ ⲉ ⲏ ⲙ ⲛ ⲟ ⲡ ⲩ

Using this as the search key, and selecting the *multi-word* search type in the Xiphos Advanced Search dialog, I got 9049 results. That's only 72.6% of the original number of results, but it is still 63.67% of the non-empty verses.

One further observation is that the results verse list starts in almost the same way as before:

Genesis 3:10,11,14,15,16,19,20,21,...

However, with such high proportions of the non-empty verse count, this is not so surprising.

This comparison suggests a plausible explanation for the weird result with Lucene. Is the software used by the Lucene search treating each Coptic letter as a word? That is, just as it should if each Unicode character were an Egyptian hieroglyph or a Han/Hangul ideograph.

Maybe this conjecture needs teasing out in further detail, in case only some of the Coptic letters are misclassified. After all, the Coptic letters in the module are from two separate Unicode blocks.

But if this is really the root cause, then it's clearly a critical bug in the Lucene software.

Can anyone think of a better explanation?

Best regards,
David
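P.S. For anyone who wants to check the "two separate Unicode blocks" point, here is a minimal Python sketch. It rebuilds the space-separated key from a word and reports which block each of the eight letters belongs to. The names `block_of` and `multiword_key` are my own illustrative helpers, and the block ranges are written out by hand because Python's `unicodedata` module does not expose block names. It says nothing about what Lucene's tokenizer actually does with these characters; it only shows the raw Unicode facts.

```python
import unicodedata

# The eight unique letters quoted in the post.
LETTERS = "ϩⲉⲏⲙⲛⲟⲡⲩ"

# Code point ranges of the two Unicode blocks that hold Coptic letters.
BLOCKS = {
    "Greek and Coptic": range(0x0370, 0x0400),
    "Coptic": range(0x2C80, 0x2D00),
}


def block_of(ch):
    """Return the name of the block containing ch, or 'other'."""
    for name, code_points in BLOCKS.items():
        if ord(ch) in code_points:
            return name
    return "other"


def multiword_key(word):
    """Unique letters of word, sorted by code point, separated by spaces."""
    return " ".join(sorted(set(word)))


if __name__ == "__main__":
    print(multiword_key(LETTERS))
    for ch in LETTERS:
        # Code point, general category, block, and official character name.
        print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  "
              f"{block_of(ch):<16}  {unicodedata.name(ch)}")
```

All eight characters have general category `Ll` (lowercase letter), but the first one sits in the Greek and Coptic block while the other seven sit in the separate Coptic block, so any software that classifies characters by block or by script table could plausibly handle the two groups differently.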