Correct. JSword uses Lucene's filter for the language, which does more
normalization than the StandardAnalyzer, which SWORD uses exclusively. The
StandardAnalyzer should only be used for "unaccented" Latin-script text. The
same goes for the SimpleAnalyzer. (In Lucene, an analyzer is a filter chain
that normalizes text. Rule of thumb: the same analyzer should be used for both
index construction and searching.)
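
To make the rule of thumb concrete, here is a minimal sketch in plain Java. It
does not use Lucene at all: the two "analyzers" and the toy inverted index are
stand-ins I made up for illustration (plainAnalyzer only lowercases,
foldingAnalyzer also strips accents). It only shows what happens when the
index is built with one filter chain and searched with another.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

public class AnalyzerMismatchSketch {

    // Hypothetical "plain" chain: split on whitespace, then lowercase.
    public static List<String> plainAnalyzer(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Hypothetical "folding" chain: decompose, drop combining marks,
    // then apply the plain chain.
    public static List<String> foldingAnalyzer(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        return plainAnalyzer(stripped);
    }

    // Toy inverted index: token -> ids of the documents containing it.
    public static Map<String, Set<Integer>> buildIndex(
            List<String> docs, Function<String, List<String>> analyzer) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            for (String token : analyzer.apply(docs.get(i))) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(i);
            }
        }
        return index;
    }

    // Analyze the query, then collect every document that has any query token.
    public static Set<Integer> search(
            Map<String, Set<Integer>> index, String query,
            Function<String, List<String>> analyzer) {
        Set<Integer> hits = new TreeSet<>();
        for (String token : analyzer.apply(query)) {
            hits.addAll(index.getOrDefault(token, Set.of()));
        }
        return hits;
    }

    public static void main(String[] args) {
        // \u00e9 is "e" with an acute accent.
        List<String> docs = List.of("J\u00e9sus pleura", "Au commencement \u00e9tait la Parole");
        Map<String, Set<Integer>> index =
                buildIndex(docs, AnalyzerMismatchSketch::foldingAnalyzer);

        // Same chain at index and search time: the query matches document 0.
        System.out.println(search(index, "J\u00e9sus", AnalyzerMismatchSketch::foldingAnalyzer));
        // Different chain at search time: the accented token is not in the index.
        System.out.println(search(index, "J\u00e9sus", AnalyzerMismatchSketch::plainAnalyzer));
    }
}
```

The index only ever contains the folded token, so a search that does not fold
the query the same way finds nothing, even though the text is there.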

Each release of Lucene adds and/or improves the filters for non-Latin text.

The biggest problem with using a new version of Lucene is that it invalidates,
without notice, prior indexes. An analyzer may change from release to release.
This has happened with the StandardAnalyzer. The impact is that the number of
search hits may be reduced, perhaps to 0.

Both SWORD and JSword need a mechanism to record the version of Lucene that is
used in constructing an index and to refuse to search an index unless the
versions of Lucene used for searching and for indexing match.
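
As a rough sketch of what such a mechanism could look like (this is not
existing SWORD or JSword code), the indexer could write a small properties
file next to the index and the searcher could check it before opening the
index. The file name, the property key and the version strings below are all
made up for illustration. A real implementation would take the version from
the Lucene library it is linked against.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class IndexVersionStamp {

    // Hypothetical sidecar file and key; neither is a SWORD or JSword convention.
    static final String STAMP_FILE = "index-version.properties";
    static final String KEY = "lucene.version";

    // Called once, right after the index has been built.
    public static void writeStamp(Path indexDir, String luceneVersion) {
        Properties props = new Properties();
        props.setProperty(KEY, luceneVersion);
        try {
            Files.createDirectories(indexDir);
            try (OutputStream out = Files.newOutputStream(indexDir.resolve(STAMP_FILE))) {
                props.store(out, "Written at index construction time");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called before every search. A missing stamp or a different version
    // means the index should be rebuilt rather than searched.
    public static boolean isSearchable(Path indexDir, String runtimeVersion) {
        Path stamp = indexDir.resolve(STAMP_FILE);
        if (!Files.exists(stamp)) {
            return false;
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(stamp)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return runtimeVersion.equals(props.getProperty(KEY));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index-sketch");
        writeStamp(dir, "3.0.3");                        // placeholder version string
        System.out.println(isSearchable(dir, "3.0.3")); // versions match
        System.out.println(isSearchable(dir, "3.6.1")); // versions differ
    }
}
```

An exact string match is the strictest possible policy. Whether some version
differences could safely be tolerated is a separate question that this sketch
does not try to answer.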

Also of note, there have been some substantial changes to Unicode from release
to release. So, if the version of Unicode used by the OS, Java, ICU, ... changes,
the index may no longer be valid. From what I can tell, this will mostly affect
minority languages.

In Him,
        DM Smith


On Nov 26, 2012, at 7:22 AM, Peter von Kaehne <ref...@gmx.net> wrote:

> 
>> From: David Haslam <dfh...@googlemail.com>
> 
>> So a similar patch would be necessary in principle to JSword ???
> 
> No. If And Bible does not have a problem, then JSword does its job correctly.
> 
> Peter
> 
> _______________________________________________
> sword-devel mailing list: sword-devel@crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page

