Hypothetical: What about mixed-language texts, such as a Greek/French lexicon?

DM

> On Feb 21, 2017, at 4:56 PM, Troy A. Griffitts <scr...@crosswire.org> wrote:
> 
> 
> Simply don't use the UTF-8 Greek Accents filter on non-Greek texts. As you
> have discovered, some accents used in Greek are also used in other
> languages, and the filter will have adverse effects on text in those
> languages. The bottom line is simple: only use the UTF-8 Greek Accents
> filter on UTF-8 Greek texts.
> 
> Hope this helps.
> 
> On February 21, 2017 2:45:24 PM MST, David Haslam <dfh...@googlemail.com> wrote:
> These are the principal diacritics found in Biblical Greek that have to be
> removed with a UTF8GreekAccents filter.
> 
> The first five are general accents, not particular to Greek.
> It's on account of these that the filter should not be applied to non-Greek
> text.
> 
> U+0300 ̀ COMBINING GRAVE ACCENT
> U+0301 ́ COMBINING ACUTE ACCENT
> U+0308 ̈ COMBINING DIAERESIS
> U+0313 ̓ COMBINING COMMA ABOVE
> U+0314 ̔ COMBINING REVERSED COMMA ABOVE
> U+0342 ͂ COMBINING GREEK PERISPOMENI
> U+0343 ̓ COMBINING GREEK KORONIS
> U+0344 ̈́ COMBINING GREEK DIALYTIKA TONOS
> U+0345 ͅ COMBINING GREEK YPOGEGRAMMENI
> 
> No other diacritics or characters should be removed. 
> Though there are a few more combining accents in the Combining Diacritical
> Marks block (U+0300 to U+036F), they aren't really used in Biblical Greek.
> I am open to correction on this point.
> 
> For example, the right single quotation mark (U+2019) is NOT a diacritic,
> so it should not be removed.
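
As a rough illustration of the list above, the nine marks can be held in a fixed set and checked against the Unicode character database. This is only a sketch: the names GREEK_ACCENT_MARKS and is_listed_mark are hypothetical and are not part of SWORD.

```python
import unicodedata

# The nine combining marks from the list above (code points as listed).
GREEK_ACCENT_MARKS = frozenset(
    "\u0300\u0301\u0308\u0313\u0314\u0342\u0343\u0344\u0345"
)


def is_listed_mark(ch: str) -> bool:
    """Return True only for the nine combining marks the filter may remove."""
    return ch in GREEK_ACCENT_MARKS


# Every listed mark is a nonspacing mark (general category Mn).
assert all(unicodedata.category(m) == "Mn" for m in GREEK_ACCENT_MARKS)

# U+2019 is closing-quote punctuation (category Pf), not a diacritic.
assert unicodedata.category("\u2019") == "Pf"
assert not is_listed_mark("\u2019")
```

Using an explicit set, rather than "anything in category Mn", keeps the filter from touching combining marks outside the list.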
> 
> Before any of these accents can be removed, they must first be separated
> from the Greek letters they are combined with. 
> 
> Although normalization to the decomposed form (NFD) can produce this
> effect, as we have seen already, this can have undesirable side effects on
> any non-Greek text in the module that may happen to include precomposed or
> unusual characters.
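
The side effect described above can be reproduced with a naive filter that decomposes everything first. This is a sketch for illustration only; strip_via_nfd is a hypothetical name and does not claim to be how the existing SWORD filter is written.

```python
import unicodedata

# The nine marks from the list earlier in this message.
MARKS = frozenset("\u0300\u0301\u0308\u0313\u0314\u0342\u0343\u0344\u0345")


def strip_via_nfd(text: str) -> str:
    """Naive approach: decompose the whole text, then drop the listed marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if ch not in MARKS)


# Greek behaves as intended: omicron with tonos loses its accent.
greek = "\u03bb\u03cc\u03b3\u03bf\u03c2"          # logos, with tonos
assert strip_via_nfd(greek) == "\u03bb\u03bf\u03b3\u03bf\u03c2"

# French in the same module is damaged: e-acute becomes plain e.
assert strip_via_nfd("donn\u00e9e") == "donnee"

# Marks outside the list survive, but the text is left decomposed:
# c-cedilla comes back as "c" followed by U+0327.
assert strip_via_nfd("fran\u00e7ais") == "franc\u0327ais"
```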
> 
> It would therefore be more sensible to simply use a comprehensive mapping
> table that replaces each possible accented character by the corresponding
> letter in the Greek alphabet. In this way the filter can completely avoid
> the need to apply any Unicode normalization. 
> 
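> The complete mapping table would have at least 130 rows. It will need to
> take into account that there are at least 75 possible combinations of a
> letter with two accents. If the iota subscript is counted as a mark, some
> precomposed letters carry three, e.g. U+1F85 (alpha with dasia, oxia and
> ypogegrammeni).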
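
One possible way to draft such a table, and to check the row counts, is to generate it once from the Unicode character database. The sketch below assumes the two Greek blocks are the only source of accented letters and uses decomposition only to build the table, not inside the filter. The names build_accent_table and mark_counts are hypothetical.

```python
import unicodedata
from collections import Counter

# The nine marks from the list earlier in this message.
MARKS = frozenset("\u0300\u0301\u0308\u0313\u0314\u0342\u0343\u0344\u0345")

# Greek and Coptic, and Greek Extended (assumed to cover all candidates).
GREEK_RANGES = ((0x0370, 0x03FF), (0x1F00, 0x1FFF))


def build_accent_table() -> dict:
    """Map each precomposed Greek letter to its bare base letter."""
    table = {}
    for first, last in GREEK_RANGES:
        for cp in range(first, last + 1):
            ch = chr(cp)
            decomposed = unicodedata.normalize("NFD", ch)
            base, marks = decomposed[0], decomposed[1:]
            # Keep only real letters whose marks are all in the list; this
            # skips spacing accents (e.g. U+1FEE) and letters with macron
            # or vrachy (e.g. U+1FB0).
            if (marks
                    and unicodedata.category(base).startswith("L")
                    and all(m in MARKS for m in marks)):
                table[ch] = base
    return table


def mark_counts(table: dict) -> Counter:
    """Count table rows by how many combining marks each character carries."""
    return Counter(len(unicodedata.normalize("NFD", ch)) - 1 for ch in table)


if __name__ == "__main__":
    table = build_accent_table()
    print(len(table), dict(mark_counts(table)))
```

Running it prints the total number of rows and how many characters carry one, two or three marks, which can be compared with the figures above.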
> 
> Any residual combining characters from the list above should also be
> removed, to cover the possibility that a module may have been intentionally
> made without normalizing the Greek source text to NFC.
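
Put together, the proposal might look roughly like the sketch below: a lookup table for precomposed letters plus deletion of any leftover listed marks, with no normalization step. TABLE_EXCERPT is a hand-picked excerpt for illustration, not the full table, and strip_greek_accents is a hypothetical name.

```python
# The nine marks from the list earlier in this message.
MARKS = frozenset("\u0300\u0301\u0308\u0313\u0314\u0342\u0343\u0344\u0345")

# Illustrative excerpt only; the real table would have 130+ rows.
TABLE_EXCERPT = {
    "\u03ac": "\u03b1",  # alpha with tonos
    "\u03cc": "\u03bf",  # omicron with tonos
    "\u1f10": "\u03b5",  # epsilon with psili
    "\u1f85": "\u03b1",  # alpha with dasia, oxia and ypogegrammeni
    "\u1ff7": "\u03c9",  # omega with perispomeni and ypogegrammeni
}

# One translation map: precomposed letters go to their base letter,
# residual combining marks are deleted (mapped to None).
_TRANSLATION = str.maketrans({**TABLE_EXCERPT, **{m: None for m in MARKS}})


def strip_greek_accents(text: str) -> str:
    """Remove Greek accents by table lookup, without Unicode normalization."""
    return text.translate(_TRANSLATION)


# Precomposed Greek is mapped; precomposed French is left untouched.
mixed = "\u03bb\u03cc\u03b3\u03bf\u03c2 donn\u00e9e"
assert strip_greek_accents(mixed) == "\u03bb\u03bf\u03b3\u03bf\u03c2 donn\u00e9e"

# Decomposed Greek (base letter plus marks) is handled by the residual step.
assert strip_greek_accents("\u03b1\u0314\u0301\u0345") == "\u03b1"
```

Precomposed accented letters in other languages pass through unchanged, but non-Greek text that is already stored in decomposed form would still lose any of the listed marks, so the advice to apply the filter only to Greek texts still holds.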
> 
> That's my proposal. I can easily create such a mapping table that
> programmers can use.
> I can also readily test it with a bespoke TextPipe filter.
> 
> 
> Best regards,
> 
> David
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://sword-dev.350566.n4.nabble.com/GlobalOptionFilter-UTF8GreekAccents-and-non-Greek-modules-tp4656719p4656765.html
> Sent from the SWORD Dev mailing list archive at Nabble.com.
> 
> 
> sword-devel mailing list: sword-devel@crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
> 
> -- 
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
