Thanks, Greg.

I just came across this

https://lucene.apache.org/core/3_2_0/api/contrib-analyzers/org/apache/lucene/analysis/th/ThaiWordFilter.html

Is that the kind of thing you were thinking of?
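
To make the idea concrete, here is a rough Python sketch of dictionary-based word segmentation feeding a word index. It is not Lucene's or SWORD's actual code. The tiny lexicon, the function names, and the greedy longest-match strategy are all assumptions for illustration; a real segmenter (KuCut, DeepCut, AttaCut, or a library filter) would replace `segment`. I am also assuming the invisible separator is U+200B ZERO WIDTH SPACE.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE, assumed here as the invisible word separator

# Hypothetical toy lexicon; a real one would hold many thousands of entries.
LEXICON = {"พระเจ้า", "ทรง", "รัก", "โลก"}


def segment(text, lexicon=LEXICON):
    """Greedy longest-match segmentation over code points.

    Characters that start no lexicon entry become one-character tokens.
    """
    max_len = max(map(len, lexicon))
    words = []
    i = 0
    while i < len(text):
        limit = min(max_len, len(text) - i)
        # Try the longest candidate first and fall back to a single character.
        size = next(
            (n for n in range(limit, 0, -1) if text[i:i + n] in lexicon),
            1,
        )
        words.append(text[i:i + size])
        i += size
    return words


def mark_boundaries(text):
    """Return the text with an invisible separator between segmented words."""
    return ZWSP.join(segment(text))


def index_terms(marked_text):
    """Split pre-segmented text into the terms a word index would store."""
    return [term for term in marked_text.split(ZWSP) if term]


def phrase_query(query):
    """Segment a user's unspaced query with the same segmenter as the text."""
    return segment(query)


verse = "พระเจ้าทรงรักโลก"
marked = mark_boundaries(verse)
terms = index_terms(marked)
```

Running the query through the same segmenter as the module text (`phrase_query` above) is one possible way to let a user type an exact phrase without entering any separators.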

David

Sent from Proton Mail for iOS

On Mon, Apr 17, 2023 at 17:51, Greg Hellings <greg.helli...@gmail.com> wrote:

> I don't believe you're going to get that sort of feature directly in the 
> engine's simple search.
>
> However, if you're using a build of the library that utilizes CLucene or 
> Xapian, then that should be the function of those libraries. They are 
> supposed to be able to handle all of that type of functionality if the 
> language has a corresponding contribution to that library. It might be better 
> to check in with them.
>
> --Greg
>
> On Mon, Apr 17, 2023 at 11:46 AM David Haslam <dfh...@protonmail.com> wrote:
>
>> Unlike Hebrew, Arabic, etc., none of the names of the Thai Unicode 
>> characters contain the word FINAL. The same is true of the Myanmar letters.
>>
>> A possible way forward might be to run one of the several Word Segmentation 
>> programs on the text of the ThaiKJV.
>>
>> Examples: KuCut, DeepCut, AttaCut
>>
>> This should insert a Unicode zero width space (ZWSP, U+200B) as a word 
>> separator.
>>
>> NB. The module would have to be updated using the segmented source text.
>>
>> Visually, the resulting text would display the same as the original, but the 
>> module would be amenable to indexing for word searches.
>>
>> A difficulty that might then arise is how the front-end user might enter the 
>> search query for an exact phrase search type (containing more than one 
>> word). Other search types (all words, any word) might be OK as is.
>>
>> Aside: The KuCut method, developed in 2004, was originally trained using the 
>> text of the ThaiKJV.
>>
>> Regards,
>>
>> David
>>
>> Sent from Proton Mail for iOS
>>
>> On Mon, Apr 17, 2023 at 17:16, Peter Von Kaehne <ref...@gmx.net> wrote:
>>
>>> Do Thai, Burmese, etc. use end forms for letters? If so, are these 
>>> encoded as such?
>>>
>>> Peter
>>> Sent: Monday, 17 April 2023 at 16:47
>>> From: "David Haslam" <dfh...@protonmail.com>
>>> To: sword-devel@crosswire.org
>>> Subject: [sword-devel] Languages without a space between words
>>>
>>> How (if at all) does the SWORD API generate a search index for a module 
>>> that is for a language without a space between words?
>>>
>>> Please consider how best to generate a useful search index for modules that 
>>> are for Bible translations in languages that have no spaces between words.
>>>
>>> Example: CrossWire module ThaiKJV
>>>
>>> See
>>> https://en.wikipedia.org/wiki/Category:Writing_systems_without_word_boundaries
>>> Has this ever been considered before?
>>>
>>> Best regards,
>>>
>>> David
>>>
>>> Sent from Proton Mail for iOS
>>> _______________________________________________ sword-devel mailing list: 
>>> sword-devel@crosswire.org http://crosswire.org/mailman/listinfo/sword-devel 
>>> Instructions to unsubscribe/change your settings at above page
