I created PR 37 <https://github.com/apache/incubator-groovy/pull/37> to
correct the JavaDoc I mentioned (as well as to document the existing
behavior for the non-NIO methods).

Java doesn't eat the BOM, but this is a problem Java folks are used to
dealing with, which is why things like Apache Commons IO's BOMInputStream
<https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html>
exist.
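
To make that concrete, here is a minimal sketch of what a plain Java reader
does with a UTF-16LE BOM. It is only an illustration, not code from the PR or
from Groovy: the class name BomDemo and both helper methods are hypothetical.
It assumes Java 8 or later (for UncheckedIOException).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; nothing here is part of Groovy or Commons IO.
final class BomDemo {
    static final char BOM = '\uFEFF';

    // Decodes the bytes through a standard InputStreamReader using UTF-16LE.
    static String decodeUtf16Le(byte[] bytes) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_16LE)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Drops a single leading U+FEFF, if present; the caller has to do this,
    // because the UTF-16LE decoder passes the BOM through as a character.
    static String stripLeadingBom(String s) {
        return (!s.isEmpty() && s.charAt(0) == BOM) ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        // Same bytes that withPrintWriter produced in the test: ff fe 20 00
        byte[] withBom = {(byte) 0xFF, (byte) 0xFE, 0x20, 0x00};
        String raw = decodeUtf16Le(withBom);
        System.out.println(raw.length());                  // 2: U+FEFF plus the space
        System.out.println(stripLeadingBom(raw).length()); // 1: just the space
    }
}
```

The manual strip is the kind of step that BOMInputStream is meant to take off
your hands.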

-Keegan

On Tue, Jun 9, 2015 at 11:33 AM, Guillaume Laforge <glafo...@gmail.com>
wrote:

> So now, how to decide what's best? :-)
>
> Is a Java reader happy with the BOM, and does it eat it transparently? (I
> think in the past that wasn't the case, but I may be wrong.)
>
> 2015-06-09 17:21 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>
>> That's an excellent point, Paolo.  NioGroovyMethods.newWriter claims (in
>> the JavaDoc) that it will write the BOM if needed, but it doesn't, because
>> it uses Java's implementation rather than Groovy's
>> writeUTF16BomIfRequired.  None of the methods in NioGroovyMethods use
>> writeUTF16BomIfRequired.
>>
>> Whichever we decide, we should be consistent.
>>
>> -Keegan
>>
>> On Tue, Jun 9, 2015 at 11:08 AM, Paolo Di Tommaso <
>> paolo.ditomm...@gmail.com> wrote:
>>
>>> I'm wondering if NioGroovyMethods that implement the write methods for
>>> Path should do the same.
>>>
>>>
>>> Cheers,
>>> Paolo
>>>
>>>
>>> On Tue, Jun 9, 2015 at 4:02 PM, Keegan Witt <keeganw...@gmail.com>
>>> wrote:
>>>
>>>> Cool.  I'll wait for PR 36 to be merged first, because I was also
>>>> thinking the Javadoc should be changed from
>>>>     is "UTF-16BE" or "UTF-16LE"
>>>> to
>>>>     is "UTF-16BE" or "UTF-16LE" (or an equivalent alias)
>>>>
>>>> -Keegan
>>>>
>>>>
>>>> On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge <glafo...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> 2015-06-09 15:04 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>
>>>>>> Created GROOVY-7461
>>>>>> <https://issues.apache.org/jira/browse/GROOVY-7461> and PR 36
>>>>>> <https://github.com/apache/incubator-groovy/pull/36>.
>>>>>>
>>>>>
>>>>> Cool!
>>>>>
>>>>>
>>>>>> How would you feel about a PR to copy the Javadoc comment mentioning
>>>>>> the UTF-16 BOM on File.newWriter to all the other methods that use
>>>>>> writeUTF16BomIfRequired (at least until we decide we're going to
>>>>>> change the current behavior)?
>>>>>>
>>>>>
>>>>> Right, worth it!
>>>>>
>>>>>
>>>>>>
>>>>>> -Keegan
>>>>>>
>>>>>> On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge
>>>>>> <glafo...@gmail.com> wrote:
>>>>>>
>>>>>>> Good point!
>>>>>>>
>>>>>>> 2015-06-09 14:11 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>>
>>>>>>>> That's only available in Java 7.  Isn't Groovy still targeting Java 1.6
>>>>>>>> for the non-indy version?
>>>>>>>>
>>>>>>>> -Keegan
>>>>>>>> On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <glafo...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Well spotted!
>>>>>>>>>
>>>>>>>>> You could also compare with the StandardCharsets constants, instead of
>>>>>>>>> going through the name comparison:
>>>>>>>>>
>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>>>>>>>>>
>>>>>>>>> 2015-06-09 13:49 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> No, it's a Groovy bug.
>>>>>>>>>>
>>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>>>>     if ("UTF-16BE".equals(charset)) {
>>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>>     } else if ("UTF-16LE".equals(charset)) {
>>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> should be
>>>>>>>>>>
>>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>>>>     if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>>     } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods.  We'll
>>>>>>>>>> probably want to fix that regardless of what we decide on the
>>>>>>>>>> *withPrintWriter* question.  I'll open a Jira and a PR.
>>>>>>>>>>
>>>>>>>>>> -Keegan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <
>>>>>>>>>> glafo...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> From Groovy's point of view (i.e. when you're coding in Groovy),
>>>>>>>>>>> the BOM is automatically discarded when you use one of our reader
>>>>>>>>>>> methods (withReader, etc.), so it's transparent whether the BOM is
>>>>>>>>>>> present or not.
>>>>>>>>>>>
>>>>>>>>>>> I tend to think that always having the BOM is a good thing (I
>>>>>>>>>>> even thought it was mandatory), but Groovy should guess the
>>>>>>>>>>> endianness regardless.
>>>>>>>>>>>
>>>>>>>>>>> Happy to hear what others think too about all this though.
>>>>>>>>>>>
>>>>>>>>>>> Guillaume
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2015-06-08 23:20 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>>> The code as-is today writes the BOM regardless of platform.  I
>>>>>>>>>>>> just tested on Linux with the same results.  I think there are 2
>>>>>>>>>>>> parts to the question of "what's the correct behavior?"
>>>>>>>>>>>>
>>>>>>>>>>>> 1.  Should the BOM be written at all, particularly when the
>>>>>>>>>>>> platform is Windows?
>>>>>>>>>>>> 2.  Should the behavior of *withPrintWriter* differ (even if
>>>>>>>>>>>> the difference is to be smarter) from the behavior of *new
>>>>>>>>>>>> PrintWriter*?
>>>>>>>>>>>>
>>>>>>>>>>>> *Discussion*
>>>>>>>>>>>> 1.  Strictly speaking, yes, because RFC 2781
>>>>>>>>>>>> <http://tools.ietf.org/html/rfc2781> states in section 4.3 to
>>>>>>>>>>>> assume big endian if there is no BOM.  However, in practice, many
>>>>>>>>>>>> applications disregard the RFC and assume little-endian because
>>>>>>>>>>>> that's what Windows does
>>>>>>>>>>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>>>>>>>>>>> Because of this, the behavior could be changed so that when writing
>>>>>>>>>>>> UTF-16LE on Windows, it doesn't write the BOM.  But in my opinion,
>>>>>>>>>>>> it's best practice to always write a BOM when working with UTF-16,
>>>>>>>>>>>> and Java should have done this in its PrintWriter implementation.
>>>>>>>>>>>>
>>>>>>>>>>>> 2.  This is a tough one.  Arguably, *withPrintWriter* is doing
>>>>>>>>>>>> the smarter, more correct thing, but the typical user would
>>>>>>>>>>>> assume this is just a shorthand convenience for newing up a
>>>>>>>>>>>> PrintWriter (I certainly did).  So the question is, is it better
>>>>>>>>>>>> to just document this difference in the GroovyDoc, or to change
>>>>>>>>>>>> the behavior to be closer to Java?  And if the latter, what
>>>>>>>>>>>> breakages would that cause within Groovy itself?  Making that
>>>>>>>>>>>> change could break folks in production, because they could rely
>>>>>>>>>>>> on that BOM being there, for example in cases where the file is
>>>>>>>>>>>> created on Windows but then processed on Linux, or when working
>>>>>>>>>>>> with a third-party library that is more picky about the presence
>>>>>>>>>>>> of a BOM.
>>>>>>>>>>>>
>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <
>>>>>>>>>>>> glafo...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Now... whether that is what should be done is the good question
>>>>>>>>>>>>> to ask :-)
>>>>>>>>>>>>> Does Windows manage to open UTF-16 files without BOMs?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I forgot to mention that.  Yes, I ran the test mentioned on
>>>>>>>>>>>>>> Windows.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <
>>>>>>>>>>>>>> glafo...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That's a good question.
>>>>>>>>>>>>>>> I guess this is happening on Windows? (I haven't tried here,
>>>>>>>>>>>>>>> since I'm on OS X)
>>>>>>>>>>>>>>> I think BOMs were mandatory in text files on Windows.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt
>>>>>>>>>>>>>>> <keeganw...@gmail.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've always taken a perverse pleasure in character encoding
>>>>>>>>>>>>>>>> problems.  I was intrigued by this SO question
>>>>>>>>>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char>
>>>>>>>>>>>>>>>> on UTF-16 BOMs in Java vs Groovy.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It appears using withPrintWriter(charset) produces a BOM
>>>>>>>>>>>>>>>> whereas new PrintWriter(file, charset) does not, as
>>>>>>>>>>>>>>>> demonstrated here:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> File file = new File("tmp.txt")
>>>>>>>>>>>>>>>> try {
>>>>>>>>>>>>>>>>     String text = " "
>>>>>>>>>>>>>>>>     String charset = "UTF-16LE"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     file.withPrintWriter(charset) { it << text }
>>>>>>>>>>>>>>>>     println "withPrintWriter"
>>>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     PrintWriter w = new PrintWriter(file, charset)
>>>>>>>>>>>>>>>>     w.print(text)
>>>>>>>>>>>>>>>>     w.close()
>>>>>>>>>>>>>>>>     println "\n\nnew PrintWriter"
>>>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>>>> } finally {
>>>>>>>>>>>>>>>>     file.delete()
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Outputs
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> withPrintWriter
>>>>>>>>>>>>>>>> ff fe 20 00
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> new PrintWriter
>>>>>>>>>>>>>>>> 20 00
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is this difference in behavior intentional?  It seems kinda
>>>>>>>>>>>>>>>> odd to me.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Guillaume Laforge
>>>>>>>>>>>>>>> Groovy Project Manager
>>>>>>>>>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>>>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
>
>
