I created PR 37 <https://github.com/apache/incubator-groovy/pull/37> to correct the JavaDoc I mentioned (as well as to document the existing behavior for the non-NIO methods).
Java doesn't eat the BOM, but this is a problem Java folks are used to dealing with, and it's why things like Apache Commons IO's BOMInputStream <https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html> exist.

-Keegan

On Tue, Jun 9, 2015 at 11:33 AM, Guillaume Laforge <glafo...@gmail.com> wrote:

> So now, how to decide what's best? :-)
>
> Is a Java reader happy with the BOM? Does it eat it transparently? (I think
> in the past that wasn't the case but I may be wrong)
>
> 2015-06-09 17:21 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>
>> That's an excellent point, Paolo. NioGroovyMethods.newWriter claims (in
>> the JavaDoc) it will write the BOM if needed, but it doesn't because it
>> uses Java's implementation rather than Groovy's
>> writeUTF16BomIfRequired. None of the methods in NioGroovyMethods use
>> writeUTF16BomIfRequired.
>>
>> Whichever we decide, we should be consistent.
>>
>> -Keegan
>>
>> On Tue, Jun 9, 2015 at 11:08 AM, Paolo Di Tommaso <
>> paolo.ditomm...@gmail.com> wrote:
>>
>>> I'm wondering if NioGroovyMethods that implement the write methods for
>>> Path should do the same.
>>>
>>> Cheers,
>>> Paolo
>>>
>>> On Tue, Jun 9, 2015 at 4:02 PM, Keegan Witt <keeganw...@gmail.com>
>>> wrote:
>>>
>>>> Cool. I'll wait for PR 36 to be merged first, because I also was
>>>> thinking the Javadoc would be changed from
>>>> is "UTF-16BE" or "UTF-16LE"
>>>> to
>>>> is "UTF-16BE" or "UTF-16LE" (or an equivalent alias)
>>>>
>>>> -Keegan
>>>>
>>>> On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge <glafo...@gmail.com>
>>>> wrote:
>>>>
>>>>> 2015-06-09 15:04 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>
>>>>>> Created GROOVY-7461
>>>>>> <https://issues.apache.org/jira/browse/GROOVY-7461> and PR 36
>>>>>> <https://github.com/apache/incubator-groovy/pull/36>.
>>>>>>
>>>>>
>>>>> Cool!
>>>>>
>>>>>> How would you feel about a PR to copy the Javadoc comment mentioning
>>>>>> the UTF-16 BOM on File.newWriter to all the other methods that use
>>>>>> writeUTF16BomIfRequired (at least until we decide we're going to
>>>>>> change the current behavior)?
>>>>>>
>>>>>
>>>>> Right, worth it!
>>>>>
>>>>>> -Keegan
>>>>>>
>>>>>> On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge <glafo...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Good point!
>>>>>>>
>>>>>>> 2015-06-09 14:11 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>>
>>>>>>>> That's only available in Java 7. Isn't Groovy still targeting 1.6
>>>>>>>> for the non-indy version?
>>>>>>>>
>>>>>>>> -Keegan
>>>>>>>>
>>>>>>>> On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <glafo...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Well spotted!
>>>>>>>>>
>>>>>>>>> You could also compare with the StandardCharsets constants, instead
>>>>>>>>> of going through the name comparison:
>>>>>>>>>
>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>>>>>>>>>
>>>>>>>>> 2015-06-09 13:49 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> No, it's a Groovy bug.
>>>>>>>>>>
>>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset,
>>>>>>>>>>         final OutputStream stream) throws IOException {
>>>>>>>>>>     if ("UTF-16BE".equals(charset)) {
>>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>>     } else if ("UTF-16LE".equals(charset)) {
>>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> should be
>>>>>>>>>>
>>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset,
>>>>>>>>>>         final OutputStream stream) throws IOException {
>>>>>>>>>>     if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>>     } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods. We'll
>>>>>>>>>> probably want to fix that regardless of what we decide on the
>>>>>>>>>> *withPrintWriter* question. I'll open a Jira and a PR.
>>>>>>>>>>
>>>>>>>>>> -Keegan
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <
>>>>>>>>>> glafo...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> From Groovy's point of view (ie. when you're coding in Groovy),
>>>>>>>>>>> the BOM is automatically discarded when you use one of our reader
>>>>>>>>>>> methods (withReader, etc), so it's transparent whether the BOM is
>>>>>>>>>>> here or not.
>>>>>>>>>>>
>>>>>>>>>>> I tend to think that having the BOM always is a good thing (I
>>>>>>>>>>> even thought that was mandatory), but Groovy should guess the
>>>>>>>>>>> endianness regardless anyway.
>>>>>>>>>>>
>>>>>>>>>>> Happy to hear what others think too about all this though.
>>>>>>>>>>>
>>>>>>>>>>> Guillaume
>>>>>>>>>>>
>>>>>>>>>>> 2015-06-08 23:20 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>>> The code as-is today writes the BOM regardless of platform.
>>>>>>>>>>>> I just tested in Linux with the same results. I think there are 2
>>>>>>>>>>>> parts to the question of "what's the correct behavior?"
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Should the BOM be written at all, particularly when the
>>>>>>>>>>>> platform is Windows?
>>>>>>>>>>>> 2. Should the behavior of *withPrintWriter* differ (even if
>>>>>>>>>>>> the difference is to be smarter) from the behavior of *new
>>>>>>>>>>>> PrintWriter*?
>>>>>>>>>>>>
>>>>>>>>>>>> *Discussion*
>>>>>>>>>>>> 1. Strictly speaking, yes, because RFC 2781
>>>>>>>>>>>> <http://tools.ietf.org/html/rfc2781> states in section 4.3 to
>>>>>>>>>>>> assume big endian if there is no BOM. However, in practice, many
>>>>>>>>>>>> applications disregard the RFC and assume little-endian because
>>>>>>>>>>>> that's what Windows does
>>>>>>>>>>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>>>>>>>>>>> Because of this, the behavior could be changed so that when writing
>>>>>>>>>>>> UTF-16LE on Windows, it doesn't write the BOM. But in my opinion,
>>>>>>>>>>>> it's best practice to always write a BOM when working with UTF-16,
>>>>>>>>>>>> and Java should have done this in their implementation of their
>>>>>>>>>>>> PrintWriter.
>>>>>>>>>>>>
>>>>>>>>>>>> 2. This is a tough one. Arguably, *withPrintWriter* is doing
>>>>>>>>>>>> the smarter, more correct behavior, but the typical user would
>>>>>>>>>>>> assume this is just a shorthand convenience for newing up a
>>>>>>>>>>>> PrintWriter (I certainly did). So the question is, is it better to
>>>>>>>>>>>> just document this difference in the GroovyDoc? Or to change the
>>>>>>>>>>>> behavior to be closer to Java? And if the latter, what breakages
>>>>>>>>>>>> would that cause within Groovy itself?
>>>>>>>>>>>> Making that change could break folks in production, because they
>>>>>>>>>>>> could rely on that BOM being there, for example in cases where the
>>>>>>>>>>>> file is created on Windows but then processed on Linux, or when
>>>>>>>>>>>> working with a third party library that is more picky about the
>>>>>>>>>>>> presence of a BOM.
>>>>>>>>>>>>
>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <
>>>>>>>>>>>> glafo...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Now... whether that is what should be done or not is the good
>>>>>>>>>>>>> question to ask :-)
>>>>>>>>>>>>> Does Windows manage to open UTF-16 files without BOMs?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I forgot to mention that. Yes, I ran the test mentioned in
>>>>>>>>>>>>>> Windows.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <
>>>>>>>>>>>>>> glafo...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That's a good question.
>>>>>>>>>>>>>>> I guess this is happening on Windows? (I haven't tried here,
>>>>>>>>>>>>>>> since I'm on OS X)
>>>>>>>>>>>>>>> I think BOMs were mandatory in text files on Windows.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've always taken a perverse pleasure in character encoding
>>>>>>>>>>>>>>>> problems. I was intrigued by this SO question
>>>>>>>>>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char>
>>>>>>>>>>>>>>>> on UTF-16 BOMs in Java vs Groovy.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It appears using withPrintWriter(charset) produces a BOM
>>>>>>>>>>>>>>>> whereas new PrintWriter(file, charset) does not.
>>>>>>>>>>>>>>>> As demonstrated here:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> File file = new File("tmp.txt")
>>>>>>>>>>>>>>>> try {
>>>>>>>>>>>>>>>>     String text = " "
>>>>>>>>>>>>>>>>     String charset = "UTF-16LE"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     file.withPrintWriter(charset) { it << text }
>>>>>>>>>>>>>>>>     println "withPrintWriter"
>>>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     PrintWriter w = new PrintWriter(file, charset)
>>>>>>>>>>>>>>>>     w.print(text)
>>>>>>>>>>>>>>>>     w.close()
>>>>>>>>>>>>>>>>     println "\n\nnew PrintWriter"
>>>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>>>> } finally {
>>>>>>>>>>>>>>>>     file.delete()
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Outputs
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> withPrintWriter
>>>>>>>>>>>>>>>> ff fe 20 00
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> new PrintWriter
>>>>>>>>>>>>>>>> 20 00
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is this difference in behavior intentional? It seems kinda
>>>>>>>>>>>>>>>> odd to me.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Guillaume Laforge
>>>>>>>>>>>>>>> Groovy Project Manager
>>>>>>>>>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>>>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
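
For reference, the alias handling discussed in this thread can be sketched in plain Java. This is only an illustration under stated assumptions, not the code in ResourceGroovyMethods: the class Utf16BomSketch and the helpers bomFor, encode and hex are hypothetical names made up for the example. It relies on Charset.forName resolving aliases and ignoring case, so that name() returns the canonical "UTF-16LE" or "UTF-16BE" before the comparison is made.

```java
import java.nio.charset.Charset;

// Hypothetical sketch; none of these names exist in Groovy or the JDK.
final class Utf16BomSketch {

    /** Returns the BOM for an explicit-endian UTF-16 charset, or an empty array otherwise. */
    static byte[] bomFor(String charsetName) {
        // Resolve aliases and case differences to the canonical charset name first.
        String canonical = Charset.forName(charsetName).name();
        if ("UTF-16BE".equals(canonical)) {
            return new byte[] {(byte) 0xFE, (byte) 0xFF};
        }
        if ("UTF-16LE".equals(canonical)) {
            return new byte[] {(byte) 0xFF, (byte) 0xFE};
        }
        return new byte[0];
    }

    /** Encodes text with the named charset, optionally prefixing the BOM from bomFor. */
    static byte[] encode(String text, String charsetName, boolean withBom) {
        // Java's own UTF-16LE/UTF-16BE encoders do not emit a BOM.
        byte[] body = text.getBytes(Charset.forName(charsetName));
        byte[] bom = withBom ? bomFor(charsetName) : new byte[0];
        byte[] out = new byte[bom.length + body.length];
        System.arraycopy(bom, 0, out, 0, bom.length);
        System.arraycopy(body, 0, out, bom.length, body.length);
        return out;
    }

    /** Formats bytes as space-separated lowercase hex, like the Groovy test in this thread. */
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode(" ", "UTF-16LE", true)));   // ff fe 20 00
        System.out.println(hex(encode(" ", "utf-16le", true)));   // ff fe 20 00
        System.out.println(hex(encode(" ", "UTF-16LE", false)));  // 20 00
    }
}
```

A plain "UTF-16LE".equals(charset) check would miss the lower-case spelling in the second call, which is the kind of mismatch the Charset.forName(charset).name() change is meant to address.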