Thanks David and Troy,

What is happening is this: my script tests for the presence of Greek accents by doing a before-and-after comparison using the Greek accent strip filter. The same approach works beautifully for the Hebrew stuff (vowels and breathing marks). It should work for the Greek accent filter too. It does not.

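For anyone following along, the before-and-after test can be sketched roughly as below. This is Python rather than the Perl of confmaker.pl, and `strip_greek_accents` is a hypothetical stand-in written purely for illustration; it is not the SWORD UTF8GreekAccents filter and says nothing about how that filter is implemented. It assumes a well-behaved filter only removes combining marks that sit on Greek base letters, so non-Greek text should come back unchanged.

```python
import unicodedata


def is_greek(ch: str) -> bool:
    """True for code points in the Greek and Greek Extended blocks."""
    cp = ord(ch)
    return 0x0370 <= cp <= 0x03FF or 0x1F00 <= cp <= 0x1FFF


def strip_greek_accents(text: str) -> str:
    """Hypothetical stand-in filter (not SWORD's code).

    Decomposes the text, drops combining marks that follow a Greek
    base letter, and recomposes to NFC.
    """
    out = []
    base_is_greek = False
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            if base_is_greek:
                continue  # mark on a Greek letter: strip it
        else:
            base_is_greek = is_greek(ch)
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


def needs_greek_accent_option(text: str) -> bool:
    """Before-and-after comparison: True only if the filter changes the text."""
    before = unicodedata.normalize("NFC", text)  # compare like with like
    return strip_greek_accents(before) != before
```

With a filter that behaves this way, `needs_greek_accent_option("λόγος")` is true, while a Czech sentence such as `"Příliš žluťoučký kůň"` comes back unchanged and gives false. A module like CzeCEP being flagged therefore suggests the real filter alters text it should leave alone.
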
The script is under sword-tools/modules/conf/confmaker.pl. Right now the Greek accents option is commented out, so please have a look at the version svn-head-1. I do not think I am using the filter wrongly in my script, though of course I am keen to hear about any mistakes in my use of it. I noticed this a year or two ago and mentioned it on the mailing list. I simply left my script as it was, since it seemed correct and, to the best of my understanding, the problem was in the library.

Peter

> Sent: Tuesday, 21 February 2017 at 09:04
> From: "David Haslam" <dfh...@googlemail.com>
> To: sword-devel@crosswire.org
> Subject: Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules
>
> Hi Troy,
>
> Surely there's no doubt the module source text was correctly encoded as
> UTF-8 and normalised to NFC?
>
> We can examine the output of mod2imp and see that it is. Or am I missing
> something?
>
> mod2imp doesn't change the normalisation form, and I assume it doesn't
> change the encoding either.
>
> CzeCEP is not the only recent module to which the script has added
> GlobalOptionFilter=UTF8GreekAccents.
> FinRK was released yesterday and suffers from the same issue.
>
> What I think has happened is this:
>
> The Greek Accents filter was probably never adequately beta tested.
>
> It was accepted after only being alpha tested, to see that it does remove
> Greek accents from Greek text that has some.
>
> Nobody thought to check whether it did anything untoward to UTF-8
> encoded text in a variety of non-Greek scripts. The bug went undetected
> until yesterday. It's either a very old bug, or a library has changed
> without anyone noticing.
>
> I understand that the Module Team's script does the following as part of the
> automation to build the module conf file:
>
> It applies this filter, checks for change, then adds the filter line to the
> conf file if a change was detected.
>
> Knowing this, it's not hard to see how we have ended up with a spurious
> Greek Accents filter in some recently released modules, is it?
>
> The mopping-up containment action is to determine how many modules have been
> released with the spurious filter in the configuration file. Each of these
> must be corrected by removing the line, updating the version and date, and
> releasing the update.
>
> The permanent solution should be to find out exactly how this filter works
> in detail, and rewrite it if necessary. That would require an update to
> SWORD as a significant bug fix.
>
> The most recent mention of this filter in SWORD releases was under 1.5.10
> dated 20-Nov-2006, in which you added a further Greek accent. In fact, that's
> the only explicit mention. The string "utf8" appears earlier a few times,
> but in a more general sense.
>
> NB. Using diatheke version 4.7, I have thoroughly tested CzeCEP for the
> four other UTF8 filters. Only GreekAccents is delinquent.
>
> Best regards,
>
> David
>
> PS. If only CrossWire had a "bug bounty" scheme.... Ah, but we're a
> "non-income" organization.
> Looking only to the heavenly reward, and the fruit of the Gospel here on
> earth. :)
>
> --
> View this message in context:
> http://sword-dev.350566.n4.nabble.com/GlobalOptionFilter-UTF8GreekAccents-and-non-Greek-modules-tp4656719p4656729.html
> Sent from the SWORD Dev mailing list archive at Nabble.com.
>
> _______________________________________________
> sword-devel mailing list: sword-devel@crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
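
The containment step David describes, finding modules that were released with a spurious filter line, could be sketched as follows. This is an illustrative Python sketch, not an existing CrossWire tool: the function names are hypothetical, and it assumes each module's .conf file carries a `Lang=` line and that the language codes `el` and `grc` are the ones to treat as Greek.

```python
from pathlib import Path

FILTER_LINE = "GlobalOptionFilter=UTF8GreekAccents"
GREEK_LANGS = {"el", "grc"}  # assumption: codes treated as Greek


def has_spurious_greek_filter(conf_text: str) -> bool:
    """True if the conf enables the Greek accents filter for a non-Greek Lang.

    Returns False when no Lang= line is present, since the sketch
    cannot decide in that case.
    """
    lang = None
    has_filter = False
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line == FILTER_LINE:
            has_filter = True
        elif line.startswith("Lang="):
            lang = line[len("Lang="):].strip()
    if not has_filter or lang is None:
        return False
    # Compare only the primary subtag, e.g. "grc" from "grc-x-foo".
    return lang.split("-")[0].lower() not in GREEK_LANGS


def find_suspect_confs(conf_dir: Path) -> list:
    """List .conf files under conf_dir that look affected."""
    suspects = []
    for conf in sorted(conf_dir.glob("*.conf")):
        text = conf.read_text(encoding="utf-8", errors="replace")
        if has_spurious_greek_filter(text):
            suspects.append(conf)
    return suspects
```

A non-Greek module can still legitimately contain Greek text (a commentary quoting the New Testament, say), so anything this flags is only a candidate for manual review, not proof that the line is wrong.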