Hello,

On 2024-11-12 4:59, António MARTINS-Tuválkin via Unicode wrote:
> I have been saying in the past couple decades that problems will vanish if all files include only “ASCII characters”, by means of NCR escape sequences, but some of the aforementioned individual editors seem unable to ensure it, so a wholesale “conversion” is the intermediate step that needs to be added to the workflow, before uploading.
I'm not sure NCRs were ever the best way to go (even decades ago): they are just numeric representations, not semantic ones like named HTML entities, so finding problems can become difficult.
In my opinion, you should remove all NCRs; otherwise it becomes a nightmare to check for wrong encodings (maybe some NCRs were written against Latin-1 and some against Unicode, and the frequent problem is double encoding). NCRs also make spell-checking difficult. Plain text is simpler to handle for people with every level of experience, and nowadays UTF-8 can be used with all tools.
So I would transform the text into UTF-8 without NCRs (the web now defaults to UTF-8).
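For example, a minimal Python sketch of that conversion step (the file names are hypothetical, and it only touches numeric references, so markup entities like &amp; and &lt; stay escaped):

import re
from pathlib import Path

# Replace only numeric character references (&#233; or &#xE9;) with the
# characters they encode; named entities such as &amp; and &lt; are left
# alone so the HTML markup itself is not broken.
NCR = re.compile(r"&#([xX][0-9a-fA-F]+|[0-9]+);")

def decode_ncr(match):
    ref = match.group(1)
    code = int(ref[1:], 16) if ref[0] in "xX" else int(ref)
    return chr(code)

src = Path("page.html")              # hypothetical input file
dst = Path("page.utf8.html")         # hypothetical output file
text = src.read_text(encoding="utf-8")
dst.write_text(NCR.sub(decode_ncr, text), encoding="utf-8")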
Then you can check which files have bad encoding, and where: a trial conversion to another UTF encoding should emit warnings on the invalid input, so just discard the output and look at the warnings.
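A small sketch of such a check in Python, assuming the pages live under a directory called "site" and end in .html (both are placeholders):

from pathlib import Path

# Try to decode every file strictly as UTF-8 and report the files (and the
# byte offsets) where decoding fails, which usually points at Latin-1 or
# double-encoded leftovers.
for path in Path("site").rglob("*.html"):
    data = path.read_bytes()
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as err:
        print(f"{path}: invalid UTF-8 at byte {err.start}: "
              f"{data[err.start:err.start + 4]!r}")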
In my experience, a given site has common patterns, so the NCRs and "bad characters" come in a limited number of types, and you can fix them with text substitution (sed on Linux and macOS, and I think various console tools on Windows support it too), or with other, more user-friendly tools (see below). I find it easy and quick. It is not a general solution, but as I said, a site often has common patterns, not many languages, and so on, so I usually go quick and dirty, which in this case is better than a perfect solution that can handle all characters.
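The substitution pass could also be done in Python instead of sed; in this sketch the replacement table only holds a few typical examples and would have to be filled with the patterns actually found on the site (it also assumes the files already decode as UTF-8, see the check above):

from pathlib import Path

# Small, site-specific table of "bad pattern" -> "correct text".
# The entries below are only illustrations of common cases.
FIXES = {
    "&#8217;": "’",   # an NCR the site happens to use a lot
    "Ã©": "é",        # é double-encoded and read back as Latin-1
    "â€™": "’",       # ’ double-encoded and read back as Windows-1252
}

for path in Path("site").rglob("*.html"):    # hypothetical tree
    text = path.read_text(encoding="utf-8")
    for bad, good in FIXES.items():
        text = text.replace(bad, good)
    path.write_text(text, encoding="utf-8")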
You may also want to consider programmers' or developers' tools: search and replace can usually be done across a whole tree of directories, with visual confirmation (e.g. jumping into the right file). Many of them also offer batch encoding conversion, so they are possibly the best tools for such a task, even if you do not program.
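If a script is preferred over an editor, a batch conversion over a whole tree could look roughly like this sketch, which assumes the remaining non-UTF-8 files are Latin-1 (adjust to the real source encoding of the site):

from pathlib import Path

for path in Path("site").rglob("*.html"):    # hypothetical tree
    data = path.read_bytes()
    try:
        data.decode("utf-8")                 # already valid UTF-8: leave it alone
    except UnicodeDecodeError:
        text = data.decode("latin-1")        # re-interpret and rewrite as UTF-8
        path.write_text(text, encoding="utf-8")
        print(f"converted {path}")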
giacomo
