RE: Removing accents and diacritics from a word

Sławomir Osipiuk via Unicode Wed, 17 Jul 2019 11:27:49 -0700

“Transliteration”?

Maybe more generic than what you’re looking for. It is used for the process of 
producing the “machine readable zone” on passports:


https://www.icao.int/publications/Documents/9303_p3_cons_en.pdf (see section 6, 
page 30)

 

“Accent folding” or “diacritic folding” is used in some places. String folding 
is “A string transform F, with the property that repeated applications of the 
same function F produce the same output: F(F(S)) = F(S) for all input strings 
S”. Accent folding is a special case of that.

https://unicode.org/reports/tr23/#StringFunctionClassificationDefinitions

https://alistapart.com/article/accent-folding-for-auto-complete/
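
As a rough illustration of accent folding as an idempotent string transform, here is a minimal Python sketch. It assumes one common approach (canonical decomposition, then dropping nonspacing marks); this is not the method specified by ICAO 9303 or by TR23, and the function name fold_accents is made up for the example. Letters with no canonical decomposition, such as “ł”, pass through unchanged, so a real application would need an extra mapping table for those.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Hypothetical helper: strip combining accents via NFD decomposition."""
    # Split precomposed letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing marks (general category Mn), keep everything else.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose whatever is left so the result is in NFC.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    once = fold_accents("São Tomé")
    twice = fold_accents(once)
    print(once)           # Sao Tome
    print(once == twice)  # True, i.e. F(F(S)) = F(S) for this input
```

The second print checks the folding property quoted above for one input: applying the function again changes nothing.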

 

 

From: Unicode [mailto:[email protected]] On Behalf Of Asmus Freytag 
via Unicode
Sent: Wednesday, July 17, 2019 13:38
To: Unicode Mailing List
Subject: Removing accents and diacritics from a word

 

A question has come up in another context:

Is there any linguistic term for describing the process of removing accents and 
diacritics from a word to create its “base form”, e.g. São Tomé to Sao Tome?

The linguistic term "string normalization" does not seem well suited to a 
computing context.

Any ideas?

A./
