Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-11 Thread Tomáš Bořil
I do not blame anybody and I have huge respect for all authors of R. Actually, I like R very much and I would like to thank everyone who contributes to it. I use R regularly in my work (moved from Java, C# and Matlab), I have created a package rPraat for phonetic analyses and I think R is a

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-11 Thread Tomas Kalibera
On 4/11/19 9:10 AM, Tomáš Bořil wrote:
Or, if this cannot be done easily, please, disable the "utf-8" value in source(..., ) function on Windows R.
source(..., encoding = "utf-8")
-> error: "utf-8" does not work right on Windows.
-> (or, at least) warning: "utf-8" is handled by "best fit" on

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-11 Thread Tomáš Bořil
Or, if this cannot be done easily, please, disable the "utf-8" value in source(..., ) function on Windows R.
source(..., encoding = "utf-8")
-> error: "utf-8" does not work right on Windows.
-> (or, at least) warning: "utf-8" is handled by "best fit" on Windows and some characters in string

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-11 Thread Tomáš Bořil
For me, this would be a perfect solution. I.e., do not use the “best” fit and leave it to the user’s competence:
a) in some functions, utf-8 works
b) in others -> an error is thrown (e.g., incomplete string, NA, etc.)
=> user has to change the code with his/her intentional “best fit string literal

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-11 Thread Tomas Kalibera
On 4/10/19 6:32 PM, Jeroen Ooms wrote:
On Wed, Apr 10, 2019 at 5:45 PM Duncan Murdoch wrote:
On 10/04/2019 10:29 a.m., Yihui Xie wrote:
Since it is "technically easy" to disable the best fit conversion and the best fit is rarely good, how about providing an option for code/package authors to

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-11 Thread Tomas Kalibera
On 4/10/19 6:13 PM, Tomáš Bořil wrote:
An optional parameter to source() function which would translate all UTF-8 characters in string literals to their "\U" codes sounds like a great idea (and I hope it would fix 99.9% of problems I have - because that is the way I overcome these problems
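
The idea of translating UTF-8 characters in string literals to their "\U" codes can be illustrated with a minimal sketch. The helper name escape_non_ascii is hypothetical and is not part of R or of any proposal in this thread; it only shows the manual workaround mentioned above, using the base functions enc2utf8(), utf8ToInt(), intToUtf8() and sprintf(). It assumes it is given the content of a single string literal, not a whole script, since escapes are not valid in identifiers.

```r
# Hypothetical helper (not part of R): rewrite every non-ASCII character
# of a single string as a \uXXXX or \UXXXXXXXX escape sequence.
escape_non_ascii <- function(x) {
  # Code points of the string, after making sure it is encoded as UTF-8.
  cps <- utf8ToInt(enc2utf8(x))
  pieces <- vapply(cps, function(cp) {
    if (cp < 128L) {
      intToUtf8(cp)               # ASCII characters are kept as they are
    } else if (cp <= 0xFFFF) {
      sprintf("\\u%04x", cp)      # 4-digit escape for the Basic Multilingual Plane
    } else {
      sprintf("\\U%08x", cp)      # 8-digit escape for higher code points
    }
  }, character(1))
  paste(pieces, collapse = "")
}

# The input is itself written with an escape, so this file stays pure ASCII.
cat(escape_non_ascii("Bo\u0159il"), "\n")  # prints Bo\u0159il
```

The escaped text can then be pasted between quotes in a script, so that the source file no longer contains the non-ASCII character itself.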

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Duncan Murdoch
On 10/04/2019 12:32 p.m., Jeroen Ooms wrote:
On Wed, Apr 10, 2019 at 5:45 PM Duncan Murdoch wrote:
On 10/04/2019 10:29 a.m., Yihui Xie wrote:
Since it is "technically easy" to disable the best fit conversion and the best fit is rarely good, how about providing an option for code/package

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Jeroen Ooms
On Wed, Apr 10, 2019 at 5:45 PM Duncan Murdoch wrote:
>
> On 10/04/2019 10:29 a.m., Yihui Xie wrote:
> > Since it is "technically easy" to disable the best fit conversion and
> > the best fit is rarely good, how about providing an option for
> > code/package authors to disable it? I'm asking

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Tomáš Bořil
Yes, again in a script sourced by source(encoding = ...). But also by typing it directly in the R console. Most of the time, I use RStudio as a front-end. For this experiment, I also verified it in Rgui. In both front-ends, it behaves in exactly the same way. An optional parameter to source()

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Duncan Murdoch
On 10/04/2019 10:29 a.m., Yihui Xie wrote:
Since it is "technically easy" to disable the best fit conversion and the best fit is rarely good, how about providing an option for code/package authors to disable it? I'm asking because this is one of the most painful issues in packages that may need

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Yihui Xie
Since it is "technically easy" to disable the best fit conversion and the best fit is rarely good, how about providing an option for code/package authors to disable it? I'm asking because this is one of the most painful issues in packages that may need to source() code containing UTF-8 characters

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Tomas Kalibera
On 4/10/19 1:14 PM, Jeroen Ooms wrote:
On Wed, Apr 10, 2019 at 12:19 PM Tomáš Bořil wrote:
Minimalistic example:
Let's type "ř" (LATIN SMALL LETTER R WITH CARON) in RGui console:
> "ř"
[1] "r"
Although the script is in UTF-8, the characters are replaced by "simplified" substitutes

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Jeroen Ooms
On Wed, Apr 10, 2019 at 12:19 PM Tomáš Bořil wrote:
>
> Minimalistic example:
> Let's type "ř" (LATIN SMALL LETTER R WITH CARON) in RGui console:
>
> > "ř"
> [1] "r"
>
> Although the script is in UTF-8, the characters are replaced by
> "simplified" substitutes uncontrollably (depending on OS

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Tomas Kalibera
On 4/10/19 10:22 AM, Tomáš Bořil wrote:
> Hello,
>
> There is a long-lasting problem with processing UTF-8 source code in R
> on Windows OS. As Windows does not have a "UTF-8" locale and R passes
> source code through the OS before executing it, some characters are
> "simplified" by the OS before

[Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Tomáš Bořil
Hello, There is a long-lasting problem with processing UTF-8 source code in R on Windows OS. As Windows does not have a "UTF-8" locale and R passes source code through the OS before executing it, some characters are "simplified" by the OS before processing, leading to undesirable changes. Minimalistic
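
Before sourcing a UTF-8 script on Windows, it can help to find the lines that might be touched by the problem described above. The following is only a rough sketch under a deliberately broad assumption: every non-ASCII character is treated as a candidate, although only characters that the native encoding cannot represent are actually at risk. The function name non_ascii_lines is hypothetical; the sketch uses only the base functions readLines(), enc2utf8() and utf8ToInt().

```r
# Hypothetical helper: return the numbers of the lines in a UTF-8 encoded
# file that contain at least one non-ASCII character.
non_ascii_lines <- function(path) {
  # Read the file and mark its strings as UTF-8, whatever the native locale is.
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  has_non_ascii <- vapply(
    lines,
    # A line is flagged when any of its code points is above the ASCII range.
    function(l) any(utf8ToInt(enc2utf8(l)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
  which(has_non_ascii)
}
```

Calling non_ascii_lines() on a script path gives the line numbers to review, for example to decide where a string literal should be rewritten with an escape sequence.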