Re: possible problem with JNI GetStringUTFChars

2019-01-30 Thread Peter Levart
Hi Alan, On 1/26/19 8:38 PM, Alan Snyder wrote: My usage of GetStringUTFChars was to pass a String as a parameter to a system call that takes a NUL-terminated UTF-8 string (a file path). Obviously, the system call does not accept strings containing NUL. I suspect this is a common case. There

Re: possible problem with JNI GetStringUTFChars

2019-01-30 Thread Martin Buchholz
Recommendations:
- open source the moribund JNI spec, perhaps as part of OpenJDK, so that people can improve it
- Add Scare doc to anything that deals with Modified UTF-8
- Add a Charset so that Java code can explicitly encode to Modified UTF-8, despite being a bug magnet.
- AFAIK the "jnu" utility
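The third recommendation, a way for Java code to produce Modified UTF-8 explicitly, might look roughly like the encoder below. This is a sketch only, not a real `Charset`, and the class name `ModifiedUtf8` is hypothetical. It assumes the rules given in the JNI specification and in the `java.io.DataInput` documentation: U+0001 to U+007F take one byte, U+0000 and U+0080 to U+07FF take two, everything else takes three, and a supplementary character is encoded one surrogate at a time.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper: encodes a String as Modified UTF-8 (no length prefix).
public final class ModifiedUtf8 {
    private ModifiedUtf8() {}

    public static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(s.length() * 3);
        // Iterate over UTF-16 code units, not code points: each surrogate of a
        // supplementary character is encoded separately as 3 bytes.
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                out.write(c); // ASCII except NUL: one byte
            } else if (c <= 0x07FF) {
                // U+0000 also lands here and becomes 0xC0 0x80,
                // so the output never contains a zero byte.
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encode("\0").length);           // prints 2 (0xC0 0x80)
        System.out.println(encode("\uD83D\uDE00").length); // prints 6 (two 3-byte surrogates)
    }
}
```

The absence of zero bytes is what lets the VM hand out NUL-terminated Modified UTF-8, and the per-surrogate encoding is what makes it differ from standard UTF-8 for characters outside the BMP.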

Re: possible problem with JNI GetStringUTFChars

2019-01-29 Thread Alan Snyder
I don’t think that is correct. There are issues with whether and how file names are normalized when stored in a directory, and the answers are file system dependent. File system lookups are described as normalization-insensitive. I’m not an expert, though, and it is hard to find decent up-to-da
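To make the normalization point concrete: the same visible file name can have two different standard UTF-8 byte sequences depending on whether it is in NFC or NFD form, so an encoding that is "correct UTF-8" may still not match what the file system stored. The sketch below uses `java.text.Normalizer`; the class and method names are hypothetical, and whether a given file system treats the two forms as the same name is, as noted above, file-system dependent.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public final class NormalizationDemo {
    // Normalizes s to the given form, then encodes it as standard UTF-8.
    static byte[] utf8Bytes(String s, Normalizer.Form form) {
        return Normalizer.normalize(s, form).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "caf\u00e9"; // "café" with a precomposed é (U+00E9)
        byte[] nfc = utf8Bytes(name, Normalizer.Form.NFC); // ends with C3 A9
        byte[] nfd = utf8Bytes(name, Normalizer.Form.NFD); // ends with 65 CC 81 (e + U+0301)
        System.out.println(nfc.length + " vs " + nfd.length); // prints "5 vs 6"
    }
}
```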

Re: possible problem with JNI GetStringUTFChars

2019-01-29 Thread Stuart Marks
> In case you missed my previous message, there is a use case for file paths using macOS APIs. Hm, Martin had mentioned that macOS uses something more restrictive than UTF-8. It seems to me that a filesystem-specific encoding is called for here. > If you search the JDK repo for GetStringUTFChars

Re: possible problem with JNI GetStringUTFChars

2019-01-28 Thread Alan Snyder
If you search the JDK repo for GetStringUTFChars, you will find several uses that do not appear to involve serialization or data input/output. It is not obvious whether these uses are correct or not. Consider test/jdk/java/nio/channels/FileChannel/directio/libDirectIO.c where GetStringUTFChars
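One way to reason about whether a given call site like this is correct: Modified UTF-8 and standard UTF-8 differ only for U+0000 and for surrogate code units (supplementary characters, or unpaired surrogates that standard UTF-8 cannot represent at all). The hypothetical Java-side check below returns `true` exactly when the two encodings of a string would be byte-identical, assuming the encoding rules in the JNI specification.

```java
// Hypothetical helper for auditing uses of GetStringUTFChars on paths.
public final class Utf8Compat {
    private Utf8Compat() {}

    // True if the Modified UTF-8 and standard UTF-8 encodings of s are byte-identical.
    public static boolean sameAsStandardUtf8(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // U+0000 becomes 0xC0 0x80 instead of 0x00; any surrogate is encoded
            // as 3 bytes on its own instead of being combined into a 4-byte sequence.
            if (c == '\0' || Character.isSurrogate(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameAsStandardUtf8("/tmp/caf\u00e9"));     // prints true
        System.out.println(sameAsStandardUtf8("/tmp/\uD83D\uDE00")); // prints false
    }
}
```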

Re: possible problem with JNI GetStringUTFChars

2019-01-28 Thread Alan Snyder
In case you missed my previous message, there is a use case for file paths using macOS APIs. Alan > On Jan 28, 2019, at 2:10 PM, Stuart Marks wrote: > > I think it would be far too troublesome to try to migrate the JNI methods to > process real UTF-8 instead of modified UTF-8. That raises t

Re: possible problem with JNI GetStringUTFChars

2019-01-28 Thread Stuart Marks
On 1/26/19 3:19 PM, David Holmes wrote: On 27/01/2019 3:08 am, Martin Buchholz wrote: It's a pet peeve that the name GetStringUTFChars is deeply misleading - there are many "UTF"s, and this encoding is meant for use with the JVM only.  The documentation should make it clearer that this is NOT

Re: possible problem with JNI GetStringUTFChars

2019-01-26 Thread David Holmes
On 27/01/2019 3:08 am, Martin Buchholz wrote: It's a pet peeve that the name GetStringUTFChars is deeply misleading - there are many "UTF"s, and this encoding is meant for use with the JVM only. The documentation should make it clearer that this is NOT the UTF-8 you might expect. It does! Get

Re: possible problem with JNI GetStringUTFChars

2019-01-26 Thread Martin Buchholz
On Sat, Jan 26, 2019 at 11:39 AM Alan Snyder wrote: > > Therefore, my needs would be met by a (new) primitive that returns UTF-8 > and fails if the String contains NUL. > Maybe your programs are always running in a UTF-8 locale and maybe most computing environments have migrated to UTF-8 (good!)
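If the locale is not UTF-8, even true UTF-8 is the wrong thing to hand to a system call. A Java-side sketch of picking the native file-name encoding is below. It assumes OpenJDK's `sun.jnu.encoding` system property, which is an implementation detail rather than a specified API, and falls back to `Charset.defaultCharset()` when the property is absent or unusable; on JDK 18 and later the default charset is UTF-8 regardless of locale (JEP 400), so the fallback is only an approximation. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

public final class NativePathEncoding {
    private NativePathEncoding() {}

    // Best guess at the charset the platform uses for file names.
    static Charset nativeCharset() {
        // "sun.jnu.encoding" is an OpenJDK implementation property, not a standard one.
        String name = System.getProperty("sun.jnu.encoding");
        if (name != null) {
            try {
                return Charset.forName(name);
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: fall through to the default.
            }
        }
        return Charset.defaultCharset();
    }

    // True if every character of path can be represented in the native charset.
    static boolean encodable(String path) {
        return nativeCharset().newEncoder().canEncode(path);
    }

    public static void main(String[] args) {
        System.out.println(nativeCharset());
        System.out.println(encodable("/tmp/file"));
    }
}
```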

Re: possible problem with JNI GetStringUTFChars

2019-01-26 Thread Alan Snyder
My usage of GetStringUTFChars was to pass a String as a parameter to a system call that takes a NUL-terminated UTF-8 string (a file path). Obviously, the system call does not accept strings containing NUL. I suspect this is a common case. Therefore, my needs would be met by a (new) primitive th
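JNI has no such primitive, but a similar effect can be approximated on the Java side: encode the string as standard UTF-8, reject U+0000 (and unpaired surrogates, which UTF-8 cannot represent), append a terminating zero byte, and pass the resulting `byte[]` to native code, which could read it with `GetByteArrayElements`. This is a sketch under those assumptions; the class name `NulTerminatedUtf8` is hypothetical, and it does not address the locale or normalization concerns raised elsewhere in the thread.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public final class NulTerminatedUtf8 {
    private NulTerminatedUtf8() {}

    // Standard UTF-8 bytes of s followed by one 0 byte; fails instead of silently replacing.
    public static byte[] encode(String s) {
        if (s.indexOf('\0') >= 0) {
            throw new IllegalArgumentException("string contains U+0000");
        }
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer bb;
        try {
            bb = enc.encode(CharBuffer.wrap(s)); // reports unpaired surrogates
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("string is not valid UTF-16", e);
        }
        byte[] out = new byte[bb.remaining() + 1]; // extra slot stays 0: the terminator
        bb.get(out, 0, out.length - 1);
        return out;
    }
}
```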

Re: possible problem with JNI GetStringUTFChars

2019-01-26 Thread Martin Buchholz
It's a pet peeve that the name GetStringUTFChars is deeply misleading - there are many "UTF"s, and this encoding is meant for use with the JVM only. The documentation should make it clearer that this is NOT the UTF-8 you might expect.

Re: possible problem with JNI GetStringUTFChars

2019-01-26 Thread Claes Redestad
Modified UTF-8 goes way back in terms of internal use in Java and its JVMs. It's the format used to store strings in class-files, and used as an internal representation in the HotSpot VM: various internal string tables, constant pools etc. So any Java code that interacts with the VM needs to know

Re: possible problem with JNI GetStringUTFChars

2019-01-25 Thread Alan Snyder
The reason to change is that returning UTF-8 is useful and returning “modified UTF-8” is apparently not (as no one has explained why it is useful). Why not deprecate it? It would be nice to get a warning. Alan > On Jan 25, 2019, at 6:40 PM, David Holmes wrote: > > On 26/01/2019 3:29 am, A

Re: possible problem with JNI GetStringUTFChars

2019-01-25 Thread David Holmes
On 26/01/2019 3:29 am, Alan Snyder wrote: My question was not about why it does what it does, but why it still does that. Is there a valid use of this primitive that depends upon it returning something other than true UTF-8? It still does what it does because that was how it was specified 20+

Re: possible problem with JNI GetStringUTFChars

2019-01-25 Thread Alan Snyder
My question was not about why it does what it does, but why it still does that. Is there a valid use of this primitive that depends upon it returning something other than true UTF-8? It may not have been an issue to you, but it was to me when I discovered my program could not handle certain fil

Re: possible problem with JNI GetStringUTFChars

2019-01-24 Thread David Holmes
On 25/01/2019 4:39 am, Alan Snyder wrote: Thank you. That post does explain what is happening, but leaves open the question of whether GetStringUTFChars should be changed. What is the value of the current implementation of GetStringUTFChars versus one that returns true UTF-8? Well that's rea

Re: possible problem with JNI GetStringUTFChars

2019-01-24 Thread Alan Snyder
Thank you. That post does explain what is happening, but leaves open the question of whether GetStringUTFChars should be changed. What is the value of the current implementation of GetStringUTFChars versus one that returns true UTF-8? Alan > On Jan 24, 2019, at 10:32 AM, Claes Redestad

Re: possible problem with JNI GetStringUTFChars

2019-01-24 Thread Claes Redestad
Hi Alan, GetStringUTFChars unfortunately doesn't give you true UTF-8, but a modified UTF-8 sequence as used by the VM internally for historical reasons. See answers to this related question on SO (which contains links to official docs): https://stackoverflow.com/questions/32205446/getting-tr
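The difference is easy to see from Java, assuming (as the JNI specification describes) that the "modified UTF-8" returned by GetStringUTFChars is the same encoding that `java.io.DataOutput.writeUTF` documents. The sketch below compares `String.getBytes(StandardCharsets.UTF_8)` with `writeUTF` output, stripping the 2-byte length prefix that `writeUTF` adds; the class and helper names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public final class Utf8VsModified {
    // Modified UTF-8 via DataOutputStream.writeUTF, without its 2-byte length prefix.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = bos.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "a\0\uD83D\uDE00"; // 'a', NUL, U+1F600
        // prints "61 00 F0 9F 98 80"
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));
        // prints "61 C0 80 ED A0 BD ED B8 80"
        System.out.println(hex(modifiedUtf8(s)));
    }
}
```

The two outputs agree on ordinary characters but diverge on NUL (two bytes instead of a zero byte) and on the supplementary character (six bytes instead of four), which is why file names containing such characters do not survive a round trip through a UTF-8 system call.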