Great suggestions all. One thing to interject: SWORD raw search simply looks for needles in a haystack; it doesn't break the haystack into words at all. The multi-word search type breaks the search term into needles at each space, e.g., if you search for "God love world" and specify multi-word, you effectively get a search for 3 needles. The "phrase" search type takes the whole search term as one needle. Whether that would be more or less useful here, I'll let the language-informed determine.
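
To make the needle/haystack distinction concrete, here is a minimal Python sketch. It is not SWORD code: the function names are hypothetical, and it assumes that a multi-word search requires every needle to be present and that matching is plain, case-sensitive substring matching.

```python
def phrase_search(haystack: str, term: str) -> bool:
    """Treat the whole search term as a single needle."""
    return term in haystack


def multi_word_search(haystack: str, term: str) -> bool:
    """Split the term on whitespace; assume every needle must occur somewhere."""
    needles = term.split()
    return all(needle in haystack for needle in needles)


verse = "For God so loved the world, that he gave his only begotten Son"

# One needle: the three words never occur contiguously in the verse.
print(phrase_search(verse, "God love world"))      # False
# Three needles: "love" matches inside "loved" because the haystack
# is never broken into words.
print(multi_word_search(verse, "God love world"))  # True
```

Note that neither function needs word boundaries in the haystack, which is why the absence of spaces in the text matters less for these search types than for an indexed word search.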

On 4/17/23 11:24, Greg Hellings wrote:
Yes, that looks like the type of thing, although that is for Lucene (Java). I don't know the status of CLucene's implementation of that, nor of Xapian's. But that would be the proper place for such processing to occur. If those libraries do not have one, interested parties could submit one. They could probably develop it inside the SWORD library first, to be sure it does what they want (I believe those filters are designed to be pluggable by the calling application), before submitting it to those projects for inclusion.
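
For readers unfamiliar with what a word-segmenting filter does, here is a toy Python sketch of greedy longest-match segmentation against a word list. It is only an illustration of the idea: it is not how Lucene, CLucene, or Xapian implement it (Lucene's ThaiWordFilter relies on Java's BreakIterator), and the function name and four-word lexicon are made up for the example.

```python
from typing import List, Set


def longest_match_segment(text: str, lexicon: Set[str]) -> List[str]:
    """Greedy left-to-right segmentation: take the longest lexicon entry at each position."""
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character
        # when no lexicon entry starts at this position.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Illustrative word list only; a real lexicon would be far larger.
words = ["พระเจ้า", "ทรง", "รัก", "โลก"]
unsegmented = "".join(words)
print(longest_match_segment(unsegmented, set(words)) == words)  # True
```

Greedy matching is the simplest possible strategy and mis-segments ambiguous text, which is one reason to prefer a maintained filter in the indexing library over a home-grown one.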

--Greg

On Mon, Apr 17, 2023 at 1:12 PM David Haslam <dfh...@protonmail.com> wrote:

    Thanks, Greg.

    I just came across this:

    https://lucene.apache.org/core/3_2_0/api/contrib-analyzers/org/apache/lucene/analysis/th/ThaiWordFilter.html

    Is that the kind of thing you were thinking of?

    David

    Sent from Proton Mail for iOS


    On Mon, Apr 17, 2023 at 17:51, Greg Hellings
    <greg.helli...@gmail.com> wrote:
    I don't believe you're going to get that sort of feature directly
    in the engine's simple search.

    However, if you're using a build of the library that utilizes
    CLucene or Xapian, then that should be the function of those
    libraries. They are supposed to be able to handle all of that
    type of functionality if the language has a corresponding
    contribution to that library. It might be better to check in with
    them.

    --Greg

    On Mon, Apr 17, 2023 at 11:46 AM David Haslam
    <dfh...@protonmail.com> wrote:

        Unlike Hebrew, Arabic, etc., none of the names of the Thai
        Unicode characters contains the word FINAL. The same holds
        for Myanmar letters.

        A possible way forward might be to run one of the several
        Word Segmentation programs on the text of the ThaiKJV.

        Examples: KuCut, DeepCut, AttaCut

        This should insert a Unicode zero width non-joiner (ZWNJ) as
        a word separator.

        NB. The module would have to be updated using the segmented
        source text.

        Visually, the resulting text would display the same as the
        original, but the module would be amenable to indexing for
        word searches.
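
A minimal Python sketch of that idea, under stated assumptions: the word list stands in for the output of a segmenter (no real segmenter is called here), and the helper names are hypothetical. The separator is a parameter; it defaults to the ZWNJ (U+200C) named above, though U+200B ZERO WIDTH SPACE is the character Unicode describes for invisible word separation and could be passed instead.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER, the separator named in the message
ZWSP = "\u200b"  # ZERO WIDTH SPACE, an alternative invisible separator


def mark_boundaries(words, separator=ZWNJ):
    """Join already-segmented words with an invisible separator."""
    return separator.join(words)


def index_terms(text, separator=ZWNJ):
    """Recover the individual words an indexer could store as terms."""
    return [term for term in text.split(separator) if term]


# Stand-in for segmenter output; illustrative only.
words = ["พระเจ้า", "ทรง", "รัก", "โลก"]
marked = mark_boundaries(words)

print(marked)                        # renders like the unsegmented text
print(index_terms(marked) == words)  # True
```

The stored text grows by one code point per word boundary, so the module source would indeed need to be regenerated from the segmented text.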

        A difficulty that might then arise is how the front-end user
        might enter the search query for an exact phrase search type
        (containing more than one word). Other search types (all
        words, any word) might be OK as is.
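
The phrase-search difficulty can be shown in a few lines of Python. This is a sketch of one possible workaround (stripping the invisible separator from both sides before comparing), not a description of anything SWORD or a front-end does today; the function names are hypothetical.

```python
ZWNJ = "\u200c"  # separator assumed to have been inserted into the module text


def strip_separators(text, separator=ZWNJ):
    """Remove the invisible word separator from a string."""
    return text.replace(separator, "")


def phrase_match(haystack, query, separator=ZWNJ):
    """Compare phrase and text with separators removed from both."""
    return strip_separators(query, separator) in strip_separators(haystack, separator)


segmented_verse = ZWNJ.join(["พระเจ้า", "ทรง", "รัก", "โลก"])
typed_query = "ทรง" + "รัก"  # user types two words with no separator

print(typed_query in segmented_verse)              # False: the separator sits between the words
print(phrase_match(segmented_verse, typed_query))  # True
```

The alternative would be to segment the query with the same tool used on the module text, which keeps the index exact but ties the front-end to that segmenter.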

        Aside: The KuCut method, developed in 2004, was originally
        trained using the text of the ThaiKJV.

        Regards,

        David

        Sent from Proton Mail for iOS


        On Mon, Apr 17, 2023 at 17:16, Peter Von Kaehne
        <ref...@gmx.net> wrote:
        Do Thai, Burmese, etc. use end forms for letters? If so,
        are these encoded as such?
        Peter
        *Sent:* Monday, 17 April 2023 at 16:47
        *From:* "David Haslam" <dfh...@protonmail.com>
        *To:* sword-devel@crosswire.org
        *Subject:* [sword-devel] Languages without a space between words
        How (if at all) does the SWORD API generate a search index
        for a module that is for a language without a space between
        words?
        Please consider how best to generate a useful search index
        for modules that are for Bible translations in languages
        that have no spaces between words. Example: CrossWire module
        ThaiKJV. See
        https://en.wikipedia.org/wiki/Category:Writing_systems_without_word_boundaries
        Has this ever been considered before?
        Best regards,
        David
        Sent from Proton Mail for iOS
        _______________________________________________ sword-devel
        mailing list: sword-devel@crosswire.org
        http://crosswire.org/mailman/listinfo/sword-devel
        Instructions to unsubscribe/change your settings at above page