Hi Alan,
On 1/26/19 8:38 PM, Alan Snyder wrote:
My usage of GetStringUTFChars was to pass a String as a parameter to a system
call that takes a NUL-terminated UTF-8 string (a file path). Obviously, the
system call does not accept strings containing NUL. I suspect this is a common
case.
There
Recommendations:
- Open-source the moribund JNI spec, perhaps as part of OpenJDK, so that
people can improve it
- Add scare docs to anything that deals with Modified UTF-8
- Add a Charset so that Java code can explicitly encode to Modified UTF-8,
despite it being a bug magnet.
- AFAIK the "jnu" utility
I don’t think that is correct.
There are issues with whether and how file names are normalized when stored in
a directory, and the answers are file system dependent.
File system lookups are described as normalization-insensitive.
I’m not an expert, though, and it is hard to find decent up-to-da
In case you missed my previous message, there is a use case for file paths
using macOS APIs.
Hm, Martin had mentioned that macOS uses something more restrictive than UTF-8.
It seems to me that a filesystem-specific encoding is called for here.
If you search the JDK repo for GetStringUTFChars, you will find several uses
that do not appear to involve serialization or data input/output.
It is not obvious whether these uses are correct or not.
Consider test/jdk/java/nio/channels/FileChannel/directio/libDirectIO.c
where GetStringUTFChars
In case you missed my previous message, there is a use case for file paths
using macOS APIs.
Alan
> On Jan 28, 2019, at 2:10 PM, Stuart Marks wrote:
>
> I think it would be far too troublesome to try to migrate the JNI methods to
> process real UTF-8 instead of modified UTF-8. That raises t
On 1/26/19 3:19 PM, David Holmes wrote:
On 27/01/2019 3:08 am, Martin Buchholz wrote:
It's a pet peeve that the name GetStringUTFChars is deeply misleading -
there are many "UTF"s, and this encoding is meant for use with the JVM
only. The documentation should make it clearer that this is NOT
On 27/01/2019 3:08 am, Martin Buchholz wrote:
It's a pet peeve that the name GetStringUTFChars is deeply misleading -
there are many "UTF"s, and this encoding is meant for use with the JVM
only. The documentation should make it clearer that this is NOT the UTF-8
you might expect.
It does!
Get
On Sat, Jan 26, 2019 at 11:39 AM Alan Snyder wrote:
>
> Therefore, my needs would be met by a (new) primitive that returns UTF-8
> and fails if the String contains NUL.
>
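The behavior requested above (true UTF-8, fail on NUL) can be approximated today by encoding on the Java side and handing the native code a byte array. The following is a minimal sketch, not an existing JDK API: the class name NativePaths and the method toNulTerminatedUtf8 are hypothetical. It assumes the native side reads the array with the standard JNI byte-array functions (for example GetByteArrayRegion) instead of calling GetStringUTFChars.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the JDK.
final class NativePaths {
    private NativePaths() {}

    /**
     * Encodes a path as standard UTF-8 followed by one terminating NUL byte.
     * Rejects strings containing U+0000, which a C string cannot represent.
     */
    static byte[] toNulTerminatedUtf8(String path) {
        if (path.indexOf('\0') >= 0) {
            throw new IllegalArgumentException("path contains NUL");
        }
        byte[] utf8 = path.getBytes(StandardCharsets.UTF_8);
        // copyOf pads with zero bytes, so the extra slot is the terminator.
        return Arrays.copyOf(utf8, utf8.length + 1);
    }

    public static void main(String[] args) {
        byte[] cString = toNulTerminatedUtf8("caf\u00e9");
        // 3 ASCII bytes + 2 bytes for U+00E9 + 1 terminator
        System.out.println(cString.length); // prints 6
    }
}
```

One caveat with this sketch: String.getBytes replaces unpaired surrogates with '?' rather than failing, so a stricter variant would use a CharsetEncoder configured to report errors.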
Maybe your programs are always running in a UTF-8 locale and maybe most
computing environments have migrated to UTF-8 (good!)
My usage of GetStringUTFChars was to pass a String as a parameter to a system
call that takes a NUL-terminated UTF-8 string (a file path). Obviously, the
system call does not accept strings containing NUL. I suspect this is a common
case.
Therefore, my needs would be met by a (new) primitive th
It's a pet peeve that the name GetStringUTFChars is deeply misleading -
there are many "UTF"s, and this encoding is meant for use with the JVM
only. The documentation should make it clearer that this is NOT the UTF-8
you might expect.
>
>
Modified UTF-8 goes way back in terms of internal use in Java and its
JVMs. It's the format used to store strings in class files, and used as
an internal representation in the HotSpot VM: various internal string
tables, constant pools etc.
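To make the difference concrete, here is a small sketch that compares the two encodings from plain Java. It relies on DataOutputStream.writeUTF, which is documented to write the modified UTF-8 format (preceded by a two-byte length); the assumption is that this matches what the JNI string functions produce, as the JNI specification describes the same encoding. The class name ModifiedUtf8Demo and its helpers are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the JDK.
final class ModifiedUtf8Demo {
    private ModifiedUtf8Demo() {}

    /** Returns the modified UTF-8 bytes of s, without writeUTF's length prefix. */
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s); // limited to 65535 encoded bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] framed = buf.toByteArray();
        return Arrays.copyOfRange(framed, 2, framed.length); // drop 2-byte length
    }

    /** Formats bytes as space-separated upper-case hex. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String nul = "\0";
        String emoji = "\uD83D\uDE00"; // U+1F600, outside the BMP
        System.out.println(hex(modifiedUtf8(nul)));                          // C0 80
        System.out.println(hex(nul.getBytes(StandardCharsets.UTF_8)));       // 00
        System.out.println(hex(modifiedUtf8(emoji)));                        // ED A0 BD ED B8 80
        System.out.println(hex(emoji.getBytes(StandardCharsets.UTF_8)));     // F0 9F 98 80
    }
}
```

The two encodings agree for every other BMP character; they differ only for U+0000 and for supplementary characters, which modified UTF-8 writes as two separately encoded surrogates.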
So any Java code that interacts with the VM needs to know
The reason to change is that returning UTF-8 is useful and returning “modified
UTF-8” is apparently not (as no one has explained why it is useful).
Why not deprecate it?
It would be nice to get a warning.
Alan
> On Jan 25, 2019, at 6:40 PM, David Holmes wrote:
>
> On 26/01/2019 3:29 am, A
On 26/01/2019 3:29 am, Alan Snyder wrote:
My question was not about why it does what it does, but why it still does that.
Is there a valid use of this primitive that depends upon it returning something
other than true UTF-8?
It still does what it does because that was how it was specified 20+
My question was not about why it does what it does, but why it still does that.
Is there a valid use of this primitive that depends upon it returning something
other than true UTF-8?
It may not have been an issue to you, but it was to me when I discovered my
program could not handle certain fil
On 25/01/2019 4:39 am, Alan Snyder wrote:
Thank you. That post does explain what is happening, but leaves open the
question of whether GetStringUTFChars should be changed.
What is the value of the current implementation of GetStringUTFChars versus one
that returns true UTF-8?
Well that's rea
Thank you. That post does explain what is happening, but leaves open the
question of whether GetStringUTFChars should be changed.
What is the value of the current implementation of GetStringUTFChars versus one
that returns true UTF-8?
Alan
> On Jan 24, 2019, at 10:32 AM, Claes Redestad
Hi Alan,
GetStringUTFChars unfortunately doesn't give you true UTF-8, but a
modified UTF-8 sequence as used by the VM internally for historical
reasons.
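A quick way to see why this matters to a consumer expecting real UTF-8: a strict decoder rejects the byte sequences that modified UTF-8 uses for NUL and for supplementary characters. The sketch below uses Java's own UTF-8 CharsetDecoder, whose default error action is to report malformed input; the class name StrictUtf8Check is hypothetical, and other UTF-8 consumers (such as a file system API) may be more or less strict than this.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK.
final class StrictUtf8Check {
    private StrictUtf8Check() {}

    /** Returns true if bytes decode as well-formed standard UTF-8. */
    static boolean isStrictUtf8(byte[] bytes) {
        try {
            // A fresh decoder reports malformed input instead of replacing it.
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] modifiedNul = {(byte) 0xC0, (byte) 0x80};
        byte[] modifiedEmoji = {(byte) 0xED, (byte) 0xA0, (byte) 0xBD,
                                (byte) 0xED, (byte) 0xB8, (byte) 0x80};
        byte[] standardEmoji = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x80};
        System.out.println(isStrictUtf8(modifiedNul));   // false
        System.out.println(isStrictUtf8(modifiedEmoji)); // false
        System.out.println(isStrictUtf8(standardEmoji)); // true
    }
}
```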
See answers to this related question on SO (which contains links to
official docs):
https://stackoverflow.com/questions/32205446/getting-tr