Re: [sword-devel] Does the CLucene indexing work for non-English texts?

2018-11-02 Thread Tom Sullivan
In the meantime, I suppose that users ought to be instructed to search using the final sigma for words that end in sigma, and normal sigma otherwise, and to use lower case only. Or one might just not insert final sigmas, but that might broaden search results. Tom Sullivan i...@beforgiven.info
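The workaround suggested above could also be applied automatically to a query before it reaches the search engine. What follows is only a minimal sketch of that idea in Python, not anything taken from SWORD or CLucene: the function names `normalize_greek_query` and `fold_sigma` are hypothetical. The first lower-cases the query and writes a final sigma only at the end of a word; the second covers the alternative of never using final sigma at all, which would only help if the index had been built with the same folding.

```python
import re

FINAL_SIGMA = "\u03c2"   # ς, used only at the end of a word
MEDIAL_SIGMA = "\u03c3"  # σ, used everywhere else


def fold_sigma(text: str) -> str:
    """Lower-case the text and write every sigma as the medial form."""
    return text.lower().replace(FINAL_SIGMA, MEDIAL_SIGMA)


def normalize_greek_query(query: str) -> str:
    """Lower-case the query and use final sigma only at word ends."""
    unified = fold_sigma(query)
    # A sigma that is not followed by another word character ends a word.
    return re.sub(MEDIAL_SIGMA + r"(?!\w)", FINAL_SIGMA, unified)


if __name__ == "__main__":
    print(normalize_greek_query("ΣΟΦΙΑΣ ΣΩΣΑΙ"))
    print(fold_sigma("ΣΟΦΙΑΣ ΣΩΣΑΙ"))
```

Neither function touches accents or breathing marks, so a query still has to match the indexed text in that respect.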

Re: [sword-devel] Does the CLucene indexing work for non-English texts?

2018-11-01 Thread Nic Carter
PocketSword uses the standard SWORD library search implementation, using CLucene. Last I looked, the C version is a _long_ way behind the Java version (Lucene). The C version seemed to stop being developed after it worked well enough for English text and didn’t seem to get any love for other lan

Re: [sword-devel] Does the CLucene indexing work for non-English texts?

2018-11-01 Thread DM Smith
From memory, SWORD uses SimpleAnalyzer. This analyzer works well for Western European languages. It won’t work well for non-latinate texts, though it may work in part. The basic rule of thumb is that the index has to be created with a given analyzer and the search request has to be analyzed with that same analyzer. PocketSwor
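The rule of thumb about using the same analyzer on both sides can be illustrated with a toy inverted index. This is a sketch in plain Python and does not use the CLucene or SWORD APIs: `analyze`, `build_index`, and `search` are hypothetical names, and `analyze` is only loosely modelled on a lower-casing, split-at-non-letters analyzer.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lower-case, then keep runs of letters as tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def build_index(docs, analyzer):
    """Map each token produced by the analyzer to the ids of its documents."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyzer(text):
            index[token].add(doc_id)
    return index


def search(index, query, analyzer):
    """Return ids of documents containing every token of the analyzed query."""
    tokens = analyzer(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


if __name__ == "__main__":
    docs = {"a": "In the Beginning", "b": "the Word was God"}
    index = build_index(docs, analyze)
    # Same analyzer at query time: the upper-case query is found.
    print(search(index, "WORD", analyze))
    # Different analysis at query time (plain whitespace split): no match.
    print(search(index, "WORD", str.split))
```

The second search finds nothing because the index holds only lower-cased tokens while the query token is left in upper case, which is the kind of mismatch the rule of thumb is meant to avoid.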

[sword-devel] Does the CLucene indexing work for non-English texts?

2018-11-01 Thread TS
Does the CLucene indexing work for non-English texts? David's recent question about languages without spaces made me a bit curious about this matter. Briefly looking at the current Apache Lucene code, there appears to be extra code for non-English text. However, this is in comparison to