Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-03-01 Thread David Haslam
"Eventually" has happened. Enjoy!

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-03-01 Thread David Haslam
For use with TextPipe, it's preferred to use the PCRE expression in the Find column. That's always my own method. Byte codes are harder to look up and recognise by human eyes. If I could, I'd use PCRE expressions in the Replace column, but TextPipe doesn't facilitate that. It's easy enough for

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-03-01 Thread Peter Von Kaehne
rd-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek > modules > > And now for the world at large. > > https://github.com/DavidHaslam/UTF8-Greek-Accents > > Enjoy. > > btw. This isn't my first GitHub repo. Feel free to explore the others. > > David > >

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-03-01 Thread David Haslam
And now for the world at large. https://github.com/DavidHaslam/UTF8-Greek-Accents Enjoy. btw. This isn't my first GitHub repo. Feel free to explore the others. David

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-03-01 Thread David Haslam
Done! Including again for DM.

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-03-01 Thread DM Smith
Would you mind sending this to me too? Thx, DM > On Mar 1, 2017, at 7:44 AM, David Haslam wrote: > > Hi Peter, > > I can readily email you my Excel workbook which contains my analysis of > Greek accents in Unicode, complete with character names and U+ numbers. > >

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-03-01 Thread David Haslam
Hi Peter, I can readily email you my Excel workbook which contains my analysis of Greek accents in Unicode, complete with character names and U+ numbers. btw. My replacement tab file has PCRE patterns in column 1 and UTF-8 byte codes in column 2. I will include this derived text file in the

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-03-01 Thread Peter Von Kaehne
want to make a module for your own purposes that is fine, but a simple string will do fine. Peter > Sent: Wednesday, 01 March 2017 at 11:08 > From: "David Haslam" <dfh...@googlemail.com> > To: sword-devel@crosswire.org > Subject: Re: [sword-devel] GlobalO

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-03-01 Thread David Haslam
Added yesterday's comment to the issue in our tracker. http://tracker.crosswire.org/browse/API-198 David

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-28 Thread David Haslam
This is a hunch, but I'm thinking that it's very likely that three more Greek combining characters near to that may also fail to be removed by the UTF8GreekAccents filter. The overlooked set would then be: U+0342 ͂ COMBINING GREEK PERISPOMENI U+0343 ̓ COMBINING GREEK KORONIS U+0344

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-28 Thread David Haslam
The Tyndale House edition of the SBL GNT (module name SBLG-THE renamed to SBLG_THE) was normalized to NFC during build, no doubt because the TH programmers would not be fully aware of the -N option in osis2mod. This makes a difference, in that the GreekAccents filter doesn't do anything that I

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-28 Thread David Haslam
Could take a while to do a full analysis on all the Greek modules with accents. The situation is further complicated by the fact that at least one Greek NT module was made without automatically normalising the UTF-8 text to NFC, so there is still a smattering of these separate diacritics next to

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-28 Thread Peter Von Kaehne
> Sent: Tuesday, 28 February 2017 at 07:48 > From: "David Haslam" > Who knows? Maybe there are some badly constructed modules with *Lang=grc* > that actually used the wrong codepoint? Even if this were the case, it must > surely be a mistake to filter these out.

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-27 Thread David Haslam
As yet, nobody has given me any explanation of why the *right single quotation mark* U+2019 is treated as a Greek Accent that gets removed when this filter is used. A quotation mark is not a Greek diacritic. See https://en.wikipedia.org/wiki/Greek_diacritics My hunch is that because in some

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-22 Thread David Haslam
A similar test on my Panjabi module (still WIP) but using GlobalOptionFilter=UTF8HebrewPoints demonstrated that there was no difference made to the output. The fact that vulgar fraction ¾ was left unaltered is proof that this filter does not use Unicode Normalisation to decompose combined

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-22 Thread David Haslam
Greg, COMBINING GREEK YPOGEGRAMMENI is certainly a diacritic in terms of Unicode classification! Full name = U+0345 COMBINING GREEK YPOGEGRAMMENI : iota subscript It's in the block called "Combining Diacritical Marks". Full details: (courtesy of BabelPad or BabelMap) Character Properties for

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-22 Thread Greg Hellings
On Wed, Feb 22, 2017 at 1:58 AM, David Haslam wrote: > The UTF8GreekAccents filter also fails to remove one particular Greek > diacritic. > > At least, that's the case for how it works within diatheke 4.7 distributed > with Xiphos 4.0.4 > > After the normalization step, it

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-22 Thread David Haslam
I have created an issue in the CrossWire tracker. http://tracker.crosswire.org/browse/API-198 Note the temporary URL for the tracker. This link may not be permanent. Best regards, David

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-22 Thread David Haslam
But not if the logical syllogism is flawed! It really was a case of affirming the consequent. Might be elegant, but elegance can sometimes lead us astray. Several major security vulnerabilities in high value services are due to a similar cause. Blessings! David

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-22 Thread Peter Von Kaehne
> From: "David Haslam" > A certain elegance, eh? Absolutely. Yes, it is an afterthought for a feature which had a specific purpose, but it is in its own right a reasonable enough purpose. Specifically, using the engine to determine module features means that 1) None

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-22 Thread David Haslam
A certain elegance, eh? https://en.wikipedia.org/wiki/Affirming_the_consequent :) David Prov 31:30 Favour is deceitful, and beauty is vain: but a woman that feareth the Lord, she shall be praised.

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-22 Thread David Haslam
It's not just how the filter was compiled into diatheke. The same bug also occurs in Xiphos itself. e.g. In the word *τῳ* in this verse (2TGreek): Mt 1:18 Του δε Ιησου Χριστου η γενεσις ουτως ην. μνηστευθεισης της μητρος αυτου Μαριας τῳ Ιωσηφ, πριν η συνελθειν αυτους ευρεθη εν γαστρι εχουσα

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-22 Thread Peter von Kaehne
Thanks Troy,  On Tue, 2017-02-21 at 16:36 -0700, Troy A. Griffitts wrote: > The filter is meant to be used on > UTF-8 Greek text.  I certainly receive the suggestion it might be > used in another case, as you suggest, and would consider that an > improvement, if we ever have a solid use case.

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-22 Thread David Haslam
The UTF8GreekAccents filter also fails to remove one particular Greek diacritic. At least, that's the case for how it works within diatheke 4.7 distributed with Xiphos 4.0.4 After the normalization step, it actually leaves in the following: U+0345 ͅ COMBINING GREEK YPOGEGRAMMENI This a
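To make the U+0345 observation concrete, here is a minimal Python sketch using only the standard `unicodedata` module. It does not touch SWORD or diatheke; it only shows how Unicode itself classifies the character and how the precomposed ῳ (as in the word τῳ) decomposes under NFD. The helper name `describe` is made up for illustration.

```python
import unicodedata

YPOGEGRAMMENI = "\u0345"      # COMBINING GREEK YPOGEGRAMMENI (iota subscript)
word = "\u03c4\u1ff3"         # the word "τῳ" with a precomposed omega + iota subscript


def describe(ch):
    """Return (code point, Unicode name, general category, combining class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )


# Canonical decomposition splits the precomposed letter into base + combining mark.
decomposed = unicodedata.normalize("NFD", word)

if __name__ == "__main__":
    print(describe(YPOGEGRAMMENI))
    print([f"U+{ord(c):04X}" for c in decomposed])
```

Since U+0345 has general category Mn and a non-zero combining class, a filter that removes marks after canonical decomposition would normally be expected to catch it as well.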

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-21 Thread David Haslam
The one thing that the Module Team did that should not be done was to use the filter UTF8GreekAccents to detect change and mistakenly conclude that a module being prepared for release contained Greek. This is a simple thing to stop doing, now that we understand in part why the logic is flawed.

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-21 Thread Matěj Cepl
On 2017-02-21, 01:47 GMT, Troy A. Griffitts wrote: > I would be concerned first that the module was properly encoded UTF-8. It is properly encoded UTF-8. Sources are available at https://gitlab.com/bible_sword/CzeKMS Matěj -- https://matej.ceplovi.cz/blog/, Jabber: mc...@ceplovi.cz GPG

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-21 Thread Troy A. Griffitts
Well, hypothetically, we might be able to make a reasonable attempt to teach the filter when to strip by determining which adjacent character an accent might be modifying and conditionally strip or not strip, but pragmatically, this filter is used to remove Greek accents while searching Greek
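The hypothetical "look at the adjacent character" approach could be sketched roughly as follows in Python. This is not how the SWORD filter is written; the function names are invented for illustration, and the test for "Greek" (the word GREEK in the Unicode character name) is a simplifying assumption.

```python
import unicodedata


def is_greek(ch):
    """Assumption: treat a base character as Greek if its Unicode name says so."""
    return "GREEK" in unicodedata.name(ch, "")


def strip_greek_accents(text):
    """Drop combining marks only when the base character they follow is Greek."""
    out = []
    base_is_greek = False
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            if base_is_greek:
                continue          # accent sits on a Greek letter: remove it
            out.append(ch)        # accent sits on a non-Greek letter: keep it
        else:
            base_is_greek = is_greek(ch)
            out.append(ch)
    # Recompose so that untouched non-Greek text comes back in NFC.
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    print(strip_greek_accents("\u03bb\u03cc\u03b3\u03bf\u03c2, p\u00e8re"))
```

In a mixed Greek/French line this would leave the French accents alone. Note that the sketch removes every combining mark on a Greek base, iota subscript included, which may or may not be what a search filter wants.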

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-21 Thread DM Smith
Hypothetical: What about mixed language texts such as a Greek/French lexicon? DM > On Feb 21, 2017, at 4:56 PM, Troy A. Griffitts wrote: > > > Simply don't use the UTF-8 Greek Accent filter on non-Greek texts. As you > have discovered there are accents used in Greek

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-21 Thread Troy A. Griffitts
Simply don't use the UTF-8 Greek Accent filter on non-Greek texts. As you have discovered there are accents used in Greek which are also used in other languages and adverse effects will be seen for these languages. The bottom line is simple. Only use the UTF-8 Greek Accents filter on UTF-8

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-21 Thread David Haslam
These are the principal diacritics found in Biblical Greek that have to be removed with a UTF8GreekAccents filter. The first five are general accents, not particular to Greek. It's on account of these that the filter should not be applied to non-Greek text. U+0300 ̀ COMBINING GRAVE ACCENT
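Why shared accents make the filter unsafe for non-Greek text can be illustrated with a deliberately naive Python sketch: decompose, then drop every nonspacing mark. This is an assumption about the general technique, not a copy of the SWORD filter's logic, and `naive_strip` is a made-up name.

```python
import unicodedata


def naive_strip(text):
    """Decompose to NFD, then drop every nonspacing mark (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    print(naive_strip("\u1f00\u03b3\u03ac\u03c0\u03b7"))  # Greek word with breathing and accent
    print(naive_strip("f\u00fcr"))                        # German word with umlaut
    print(naive_strip("B\u016fh"))                        # Czech word with ring above
```

The Greek word loses its accents as intended, but the German and Czech words lose their diacritics by exactly the same mechanism.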

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-21 Thread David Haslam
A further idiosyncrasy of the UTF8GreekAccents filter that proves to be an interesting clue: It changes U+00BE VULGAR FRACTION THREE QUARTERS ¾ to ordinary 3/4. Vulgar fractions are about as far as you can get from Koine Greek, nicht wahr? This is what I think this proves: It must first
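The fraction clue can be checked independently of SWORD with Python's standard `unicodedata` module. The sketch below only shows what the Unicode normalisation forms do to U+00BE; which form the filter actually applies is not stated here and remains an assumption.

```python
import unicodedata

three_quarters = "\u00be"  # VULGAR FRACTION THREE QUARTERS

# Canonical decomposition leaves the character alone ...
nfd = unicodedata.normalize("NFD", three_quarters)
# ... whereas compatibility decomposition expands it to digit, slash, digit.
nfkd = unicodedata.normalize("NFKD", three_quarters)

if __name__ == "__main__":
    print([f"U+{ord(c):04X}" for c in nfd])
    print([f"U+{ord(c):04X}" for c in nfkd])
```

In Python the slash produced by NFKD is U+2044 FRACTION SLASH rather than the ASCII solidus, so output that looks like 3/4 would be consistent with a compatibility decomposition step, though it does not prove one on its own.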

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-21 Thread David Haslam
Further proof (specially for Peter) As far as I know, Luther wasn't Greek. A similar experiment with module GerLut1545 showed that all the umlauts are removed by the UTF8GreekAccents filter. diff B S:/Export/GerLut1545/2014-01-17/GerLut1545.diatheke.character.frequency.txt

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-21 Thread David Haslam
Let's hear it from some of the coders, please. I'd be way out of my depth if I dived into C++ at my age. My IT skills are better devoted to testing and reporting issues. Best regards, David

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-21 Thread Peter Von Kaehne
aslam" <dfh...@googlemail.com> > To: sword-devel@crosswire.org > Subject: Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek > modules > > Hi Peter, > > I'm sure that your method should work correctly for the UTF-8 Arabic and > Hebrew filters, since

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-21 Thread David Haslam
Hi Peter, I'm sure that your method should work correctly for the UTF-8 Arabic and Hebrew filters, since AFAIK, and as per my limited amount of testing yesterday, those filters are well behaved and have restricted scope. If there'd been nothing wrong with UTF8GreekAccents as a filter, then your

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-21 Thread Peter Von Kaehne
aslam" <dfh...@googlemail.com> > To: sword-devel@crosswire.org > Subject: Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek > modules > > Hi Troy, > > Surely there's no doubt the module source text was correctly encoded as > UTF-8 and normalised to NFC? >

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-21 Thread David Haslam
Further proof if this were even needed: I temporarily added the Greek Accents filter to the conf file for the French Bible module FreBBB. Then I ran diatheke on the module. (i.e. default, without any option "-oa" that would include Greek Accents). It removed all Latin character diacritics:

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-21 Thread David Haslam
Hi Troy, Surely there's no doubt the module source text was correctly encoded as UTF-8 and normalised to NFC? We can examine the output of mod2imp and see that it is. Or am I missing something? mod2imp doesn't change the normalisation form, and I assume it doesn't change the encoding either.

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-20 Thread Troy A. Griffitts
I would be concerned first that the module was properly encoded UTF-8. On February 20, 2017 9:23:24 AM MST, David Haslam wrote: >Although it wasn't appropriate to include the line in the configuration >file, >I observed that when the module option Greek Accents is unticked

Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-20 Thread David Haslam
OK - to save others doing the testing, these three filters are OK. GlobalOptionFilter=UTF8ArabicPoints GlobalOptionFilter=UTF8Cantillation GlobalOptionFilter=UTF8HebrewPoints i.e. they don't change the default diatheke output of the CzeCEP module if they are added to the conf file. It's only the

[sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

2017-02-20 Thread David Haslam
Although it wasn't appropriate to include the line in the configuration file, I observed that when the module option Greek Accents is unticked in Xiphos for the module CzeCEP, it played havoc with the displayed Czech text. This would seem to suggest that the option filter UTF8GreekAccents is too