"Eventually" has happened. Enjoy!
--
View this message in context:
http://sword-dev.350566.n4.nabble.com/GlobalOptionFilter-UTF8GreekAccents-and-non-Greek-modules-tp4656719p4656861.html
Sent from the SWORD Dev mailing list archive at Nabble.com.
___
For use with TextPipe, it's preferred to use the PCRE expression in the Find
column.
That's always my own method. Byte codes are harder to look up and recognise
by human eyes.
If I could, I'd use PCRE expressions in the Replace column, but TextPipe
doesn't facilitate that.
It's easy enough for
rd-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek
> modules
>
> And now for the world at large.
>
> https://github.com/DavidHaslam/UTF8-Greek-Accents
>
> Enjoy.
>
> btw. This isn't my first GitHub repo. Feel free to explore the others.
>
> David
>
>
And now for the world at large.
https://github.com/DavidHaslam/UTF8-Greek-Accents
Enjoy.
btw. This isn't my first GitHub repo. Feel free to explore the others.
David
Done! Including again for DM.
Would you mind sending this to me too?
Thx,
DM
> On Mar 1, 2017, at 7:44 AM, David Haslam wrote:
>
> Hi Peter,
>
> I can readily email you my Excel workbook which contains my analysis of
> Greek accents in Unicode, complete with character names and U+ numbers.
>
>
Hi Peter,
I can readily email you my Excel workbook which contains my analysis of
Greek accents in Unicode, complete with character names and U+ numbers.
btw. My replacement tab file has PCRE patterns in column 1 and UTF-8 byte
codes in column 2.
I will include this derived text file in the
want to
make a module for your own purposes that is fine, but a simple string will do
fine.
Peter
> Sent: Wednesday, 01 March 2017 at 11:08
> From: "David Haslam" <dfh...@googlemail.com>
> To: sword-devel@crosswire.org
> Subject: Re: [sword-devel] GlobalO
Added yesterday's comment to the issue in our tracker.
http://tracker.crosswire.org/browse/API-198
David
This is a hunch, but I'm thinking that it's very likely that three more Greek
combining characters near to that may also fail to be removed by the
UTF8GreekAccents filter.
The overlooked set would then be:
U+0342 ͂ COMBINING GREEK PERISPOMENI
U+0343 ̓ COMBINING GREEK KORONIS
U+0344
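A minimal sketch for anyone wanting to check the Unicode names of the combining marks in this range. It uses only Python's standard `unicodedata` module. The helper name `greek_combining_marks` is made up for illustration, and the sketch says nothing about what the SWORD filter itself removes.

```python
import unicodedata

def greek_combining_marks():
    # Code points U+0342..U+0345 inclusive, looked up in the Unicode database.
    return [(f"U+{cp:04X}", unicodedata.name(chr(cp)))
            for cp in range(0x0342, 0x0346)]

for code, name in greek_combining_marks():
    print(code, name)
```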
The Tyndale House edition of the SBL GNT (module name SBLG-THE renamed to
SBLG_THE)
was normalized to NFC during build, no doubt because the TH programmers
would not be fully aware of the -N option in osis2mod.
This makes a difference, in that the GreekAccents filter doesn't do anything
that I
Could take a while to do a full analysis on all the Greek modules with
accents.
The situation is further complicated by the fact that at least one Greek NT
module was made without automatically normalising the UTF-8 text to NFC, so
there is still a smattering of these separate diacritics next to
> Sent: Tuesday, 28 February 2017 at 07:48
> From: "David Haslam"
> Who knows? Maybe there are some badly constructed modules with *Lang=grc*
> that actually used the wrong codepoint? Even if this were the case, it must
> surely be a mistake to filter these out.
As yet, nobody has given me any explanation of why the *right single
quotation mark* U+2019 is treated as a Greek Accent that gets removed when
this filter is used.
A quotation mark is not a Greek diacritic. See
https://en.wikipedia.org/wiki/Greek_diacritics
My hunch is that because in some
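The point that U+2019 is not a diacritic can be checked against the Unicode character database. This is only a sketch using Python's standard `unicodedata` module. The helper `is_combining_mark` is a hypothetical name, not part of SWORD.

```python
import unicodedata

def is_combining_mark(ch):
    # General categories Mn, Mc and Me are the Unicode combining marks.
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")

quote = "\u2019"   # RIGHT SINGLE QUOTATION MARK
acute = "\u0301"   # COMBINING ACUTE ACCENT

print(unicodedata.name(quote), unicodedata.category(quote))
print(is_combining_mark(quote))   # the quotation mark is punctuation
print(is_combining_mark(acute))   # the acute accent is a combining mark
```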
A similar test on my Panjabi module (still WIP) but using
GlobalOptionFilter=UTF8HebrewPoints
demonstrated that there was no difference made to the output.
The fact that vulgar fraction ¾ was left unaltered is proof that this filter
does not use Unicode Normalisation to decompose combined
Greg,
COMBINING GREEK YPOGEGRAMMENI is certainly a diacritic in terms of Unicode
classification!
Full name = U+0345 COMBINING GREEK YPOGEGRAMMENI : iota subscript
It's in the block called "Combining Diacritical Marks".
Full details: (courtesy of BabelPad or BabelMap)
Character Properties for
On Wed, Feb 22, 2017 at 1:58 AM, David Haslam wrote:
> The UTF8GreekAccents filter also fails to remove one particular Greek
> diacritic.
>
> At least, that's the case for how it works within diatheke 4.7 distributed
> with Xiphos 4.0.4
>
> After the normalization step, it
I have created an issue in the CrossWire tracker.
http://tracker.crosswire.org/browse/API-198
Note the temporary URL for the tracker. This link may not be permanent.
Best regards,
David
But not if the logical syllogism is flawed!
It really was a case of affirming the consequent.
Might be elegant, but elegance can sometimes lead us astray.
Several major security vulnerabilities in high value services are due to a
similar cause.
Blessings!
David
> From: "David Haslam"
> A certain elegance, eh?
Absolutely. Yes, it is an afterthought for a feature which had a specific
purpose, but it is in its own right a reasonable enough purpose.
Specifically, using the engine to determine module features means that
1) None
A certain elegance, eh?
https://en.wikipedia.org/wiki/Affirming_the_consequent
:)
David
Prov 31:30 Favour is deceitful, and beauty is vain: but a woman that feareth
the Lord, she shall be praised.
It's not just how the filter was compiled into diatheke.
The same bug also occurs in Xiphos itself.
e.g. In the word *τῳ* in this verse (2TGreek):
Mt 1:18 Του δε Ιησου Χριστου η γενεσις ουτως ην. μνηστευθεισης της μητρος
αυτου Μαριας τῳ Ιωσηφ, πριν η συνελθειν αυτους ευρεθη εν γαστρι εχουσα
Thanks Troy,
On Tue, 2017-02-21 at 16:36 -0700, Troy A. Griffitts wrote:
> The filter is meant to be used on
> UTF-8 Greek text. I certainly receive the suggestion it might be
> used in another case, as you suggest, and would consider that an
> improvement, if we ever have a solid use case.
The UTF8GreekAccents filter also fails to remove one particular Greek
diacritic.
At least, that's the case for how it works within diatheke 4.7 distributed
with Xiphos 4.0.4
After the normalization step, it actually leaves in the following:
U+0345 ͅ COMBINING GREEK YPOGEGRAMMENI
This a
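To see where U+0345 comes from after a normalization step, here is a small sketch with Python's standard `unicodedata` module. It assumes the word is stored with the precomposed letter U+1FF3, and it only shows canonical decomposition, not the filter's own code.

```python
import unicodedata

# tau followed by the precomposed omega with ypogegrammeni (U+1FF3)
word = "\u03c4\u1ff3"

# Canonical decomposition splits U+1FF3 into omega plus the combining mark.
decomposed = unicodedata.normalize("NFD", word)

print([f"U+{ord(c):04X}" for c in decomposed])
```

Any stripping step that runs on the decomposed text therefore has to list U+0345 explicitly if the iota subscript is meant to go.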
The one thing that the Module Team did that should not be done was to use the
filter UTF8GreekAccents to detect change and mistakenly conclude that a
module being prepared for release contained Greek.
This is a simple thing to stop doing, now that we understand in part why the
logic is flawed.
On 2017-02-21, 01:47 GMT, Troy A. Griffitts wrote:
> I would be concerned first that the module was properly encoded UTF-8.
It is properly encoded UTF-8. Sources are available at
https://gitlab.com/bible_sword/CzeKMS
Matěj
--
https://matej.ceplovi.cz/blog/, Jabber: mc...@ceplovi.cz
GPG
Well, hypothetically, we might be able to make a reasonable attempt to
teach the filter when to strip by determining which adjacent character an
accent might be modifying and conditionally strip or not strip, but
pragmatically, this filter is used to remove Greek accents while
searching Greek
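The hypothetical "look at the adjacent character" approach could be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the SWORD C++ filter: the names `is_greek` and `strip_greek_accents` are invented here, "Greek" is taken to mean the Greek and Coptic block plus Greek Extended, and the result is re-normalized to NFC.

```python
import unicodedata

def is_greek(ch):
    # Assumption: Greek and Coptic (U+0370..U+03FF) plus Greek Extended
    # (U+1F00..U+1FFF) count as Greek base letters.
    return "\u0370" <= ch <= "\u03ff" or "\u1f00" <= ch <= "\u1fff"

def strip_greek_accents(text):
    out = []
    base_is_greek = False
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            if base_is_greek:
                continue  # drop a mark that follows a Greek base letter
        else:
            base_is_greek = is_greek(ch)
        out.append(ch)
    # Recompose so untouched Latin letters keep their precomposed form.
    return unicodedata.normalize("NFC", "".join(out))

print(strip_greek_accents("\u03c4\u1ff7 M\u00fcller"))
```

With this rule the Greek word loses its perispomeni and iota subscript, while the umlaut in the Latin word is kept, which is the behaviour a mixed Greek/French lexicon would need.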
Hypothetical: What about mixed language texts such as a Greek/French lexicon?
DM
> On Feb 21, 2017, at 4:56 PM, Troy A. Griffitts wrote:
>
>
> Simply don't use the UTF-8 Greek Accent filter on non-Greek texts. As you
> have discovered there are accents used in Greek
Simply don't use the UTF-8 Greek Accent filter on non-Greek texts. As you have
discovered there are accents used in Greek which are also used in other
languages and adverse effects will be seen for these languages. The bottom line
is simple. Only use the UTF-8 Greek Accents filter on UTF-8
These are the principal diacritics found in Biblical Greek that have to be
removed with a UTF8GreekAccents filter.
The first five are general accents, not particular to Greek.
It's on account of these that the filter should not be applied to non-Greek
text.
U+0300 ̀ COMBINING GRAVE ACCENT
A further idiosyncrasy of the UTF8GreekAccents filter that proves to be an
interesting clue:
It changes U+00BE VULGAR FRACTION THREE QUARTERS ¾ to ordinary 3/4.
Vulgar fractions are about as far as you can get from Koine Greek, isn't
that so?
This is what I think this proves:
It must first
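For comparison, this is how the two Unicode decomposition forms treat U+00BE, using Python's standard `unicodedata` module. It is a sketch only. Whether the filter uses a compatibility decomposition internally is an assumption that this snippet cannot confirm.

```python
import unicodedata

fraction = "\u00be"  # VULGAR FRACTION THREE QUARTERS

nfd = unicodedata.normalize("NFD", fraction)    # canonical decomposition
nfkd = unicodedata.normalize("NFKD", fraction)  # compatibility decomposition

print([f"U+{ord(c):04X}" for c in nfd])
print([f"U+{ord(c):04X}" for c in nfkd])
```

Canonical decomposition leaves the fraction alone, while compatibility decomposition yields the digit 3, U+2044 FRACTION SLASH and the digit 4. Note that U+2044 is not the ordinary solidus U+002F, so it is worth checking which slash actually appears in the filtered output.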
Further proof (specially for Peter)
As far as I know, Luther wasn't Greek.
A similar experiment with module GerLut1545 showed that all the umlauts are
removed by the UTF8GreekAccents filter.
diff B
S:/Export/GerLut1545/2014-01-17/GerLut1545.diatheke.character.frequency.txt
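The reported effect on German text is what one would expect from any strip that decomposes first and then drops every combining mark regardless of script. The sketch below shows that behaviour in Python. It is an assumption about the general mechanism, not a copy of the UTF8GreekAccents code, and `strip_all_combining` is a made-up name.

```python
import unicodedata

def strip_all_combining(text):
    # Decompose, then drop every character with a non-zero combining class.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_all_combining("\u00dcber alle V\u00f6lker"))
```

The diaeresis that forms a German umlaut is U+0308, the same combining character that Greek uses for dialytika, so a script-blind strip cannot tell the two apart.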
Let's hear it from some of the coders, please.
I'd be way out of my depth if I dived into C++ at my age.
My IT skills are better devoted to testing and reporting issues.
Best regards,
David
aslam" <dfh...@googlemail.com>
> To: sword-devel@crosswire.org
> Subject: Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek
> modules
>
> Hi Peter,
>
> I'm sure that your method should work correctly for the UTF-8 Arabic and
> Hebrew filters, since
Hi Peter,
I'm sure that your method should work correctly for the UTF-8 Arabic and
Hebrew filters, since AFAIK, and as per my limited amount of testing
yesterday, those filters are well behaved and have restricted scope.
If there'd been nothing wrong with UTF8GreekAccents as a filter, then your
aslam" <dfh...@googlemail.com>
> To: sword-devel@crosswire.org
> Subject: Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek
> modules
>
> Hi Troy,
>
> Surely there's no doubt the module source text was correctly encoded as
> UTF-8 and normalised to NFC?
>
Further proof if this were even needed:
I temporarily added the Greek Accents filter to the conf file for the French
Bible module FreBBB.
Then I ran diatheke on the module.
(i.e. default, without any option "-oa" that would include Greek Accents).
It removed all Latin character diacritics:
Hi Troy,
Surely there's no doubt the module source text was correctly encoded as
UTF-8 and normalised to NFC?
We can examine the output of mod2imp and see that it is. Or am I missing
something?
mod2imp doesn't change the normalisation form, and I assume it doesn't
change the encoding either.
I would be concerned first that the module was properly encoded UTF-8.
On February 20, 2017 9:23:24 AM MST, David Haslam wrote:
>Although it wasn't appropriate to include the line in the configuration
>file,
>I observed that when the module option Greek Accents is unticked
OK - to save others doing the testing, these three filters are OK.
GlobalOptionFilter=UTF8ArabicPoints
GlobalOptionFilter=UTF8Cantillation
GlobalOptionFilter=UTF8HebrewPoints
i.e. They don't change the default diatheke output of the CzeCEP module if
they are added to the conf file.
It's only the
Although it wasn't appropriate to include the line in the configuration file,
I observed that when the module option Greek Accents is unticked in Xiphos
for the module CzeCEP, it played havoc with the displayed Czech text.
This would seem to suggest that the option filter UTF8GreekAccents is too