Good point!

2015-06-09 14:11 GMT+02:00 Keegan Witt <[email protected]>:
> That's only available in Java 7. Isn't Groovy still targeting 1.6 for the
> non-indy version?
>
> -Keegan
>
> On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <[email protected]> wrote:
>
>> Well spotted!
>>
>> You could also compare against the StandardCharsets constants instead of
>> going through the name comparison:
>>
>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>>
>> 2015-06-09 13:49 GMT+02:00 Keegan Witt <[email protected]>:
>>
>>> No, it's a Groovy bug.
>>>
>>>     private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>         if ("UTF-16BE".equals(charset)) {
>>>             writeUtf16Bom(stream, true);
>>>         } else if ("UTF-16LE".equals(charset)) {
>>>             writeUtf16Bom(stream, false);
>>>         }
>>>     }
>>>
>>> should be
>>>
>>>     private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>         if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>>>             writeUtf16Bom(stream, true);
>>>         } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>>>             writeUtf16Bom(stream, false);
>>>         }
>>>     }
>>>
>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods. We'll probably
>>> want to fix that regardless of what we decide on the *withPrintWriter*
>>> question. I'll open a Jira and a PR.
>>>
>>> -Keegan
>>>
>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <[email protected]>
>>> wrote:
>>>
>>>> From Groovy's point of view (i.e. when you're coding in Groovy), the BOM
>>>> is automatically discarded when you use one of our reader methods
>>>> (withReader, etc.), so it's transparent whether the BOM is there or not.
>>>>
>>>> I tend to think that always having the BOM is a good thing (I even
>>>> thought it was mandatory), but Groovy should guess the endianness
>>>> regardless anyway.
>>>>
>>>> Happy to hear what others think about all this, though.
>>>>
>>>> Guillaume
>>>>
>>>> 2015-06-08 23:20 GMT+02:00 Keegan Witt <[email protected]>:
>>>>
>>>>> The code as-is today writes the BOM regardless of platform. I just
>>>>> tested on Linux with the same results. I think there are two parts to
>>>>> the question of "what's the correct behavior?"
>>>>>
>>>>> 1. Should the BOM be written at all, particularly when the platform
>>>>> is Windows?
>>>>> 2. Should the behavior of *withPrintWriter* differ (even if the
>>>>> difference is to be smarter) from the behavior of *new PrintWriter*?
>>>>>
>>>>> *Discussion*
>>>>> 1. Strictly speaking, yes, because RFC 2781
>>>>> <http://tools.ietf.org/html/rfc2781> states in section 4.3 that
>>>>> big-endian should be assumed if there is no BOM. However, in practice,
>>>>> many applications disregard the RFC and assume little-endian because
>>>>> that's what Windows does
>>>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>>>> Because of this, the behavior could be changed so that when writing
>>>>> UTF-16LE on Windows, it doesn't write the BOM. But in my opinion, it's
>>>>> best practice to always write a BOM when working with UTF-16, and Java
>>>>> should have done this in its PrintWriter implementation.
>>>>>
>>>>> 2. This is a tough one. Arguably, *withPrintWriter* is doing the
>>>>> smarter, more correct thing, but the typical user would assume it is
>>>>> just a shorthand convenience for newing up a PrintWriter (I certainly
>>>>> did). So the question is: is it better to just document this difference
>>>>> in the GroovyDoc, or to change the behavior to be closer to Java? And
>>>>> if the latter, what breakages would that cause within Groovy itself?
>>>>> Making that change could break folks in production, because they may
>>>>> rely on that BOM being there, for example when the file is created on
>>>>> Windows but then processed on Linux, or when working with a third-party
>>>>> library that is pickier about the presence of a BOM.
>>>>>
>>>>> -Keegan
>>>>>
>>>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Now... whether this is what should be done is the real question to
>>>>>> ask :-)
>>>>>> Does Windows manage to open UTF-16 files without BOMs?
>>>>>>
>>>>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <[email protected]>:
>>>>>>
>>>>>>> I forgot to mention that. Yes, I ran the test mentioned on Windows.
>>>>>>>
>>>>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> That's a good question.
>>>>>>>> I guess this is happening on Windows? (I haven't tried here, since
>>>>>>>> I'm on OS X)
>>>>>>>> I think BOMs were mandatory in text files on Windows.
>>>>>>>>
>>>>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <[email protected]>:
>>>>>>>>
>>>>>>>>> I've always taken a perverse pleasure in character encoding
>>>>>>>>> problems. I was intrigued by this SO question
>>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char>
>>>>>>>>> on UTF-16 BOMs in Java vs Groovy.
>>>>>>>>>
>>>>>>>>> It appears using withPrintWriter(charset) produces a BOM whereas new
>>>>>>>>> PrintWriter(file, charset) does not.
>>>>>>>>> As demonstrated here:
>>>>>>>>>
>>>>>>>>>     File file = new File("tmp.txt")
>>>>>>>>>     try {
>>>>>>>>>         String text = " "
>>>>>>>>>         String charset = "UTF-16LE"
>>>>>>>>>
>>>>>>>>>         file.withPrintWriter(charset) { it << text }
>>>>>>>>>         println "withPrintWriter"
>>>>>>>>>         file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>
>>>>>>>>>         PrintWriter w = new PrintWriter(file, charset)
>>>>>>>>>         w.print(text)
>>>>>>>>>         w.close()
>>>>>>>>>         println "\n\nnew PrintWriter"
>>>>>>>>>         file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>     } finally {
>>>>>>>>>         file.delete()
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> Outputs
>>>>>>>>>
>>>>>>>>>     withPrintWriter
>>>>>>>>>     ff fe 20 00
>>>>>>>>>
>>>>>>>>>     new PrintWriter
>>>>>>>>>     20 00
>>>>>>>>>
>>>>>>>>> Is this difference in behavior intentional? It seems kinda odd to
>>>>>>>>> me.
>>>>>>>>>
>>>>>>>>> -Keegan
>>>>>>>>
>>>>>>>> --
>>>>>>>> Guillaume Laforge
>>>>>>>> Groovy Project Manager
>>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>>>>
>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
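For reference, the `new PrintWriter` bytes above match what the JDK's own encoders do. Per the `java.nio.charset.Charset` Javadoc, the UTF-16BE and UTF-16LE charsets do not write a byte-order mark when encoding. The plain UTF-16 charset, by contrast, encodes big-endian and writes a big-endian BOM. The following is a minimal Java sketch that prints the bytes for a single space in each charset. Two things are assumed. First, the class and method names are hypothetical and chosen for illustration. Second, it calls `String.getBytes(Charset)` instead of writing `tmp.txt`, on the assumption that this goes through the same charset encoder a `PrintWriter` would use.

```java
import java.nio.charset.Charset;

public class Utf16BomDemo {

    // Hypothetical helper: hex-encodes the bytes the JDK encoder produces for
    // `text` in `charsetName`, using the same "%02x " format as the Groovy test.
    static String hexBytes(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // %x on a Byte prints it as unsigned, e.g. (byte) 0xFE -> "fe"
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // UTF-16LE / UTF-16BE: no BOM is written when encoding
        System.out.println("UTF-16LE: " + hexBytes(" ", "UTF-16LE")); // 20 00
        System.out.println("UTF-16BE: " + hexBytes(" ", "UTF-16BE")); // 00 20
        // Plain UTF-16: big-endian byte order, preceded by a big-endian BOM
        System.out.println("UTF-16:   " + hexBytes(" ", "UTF-16"));   // fe ff 00 20
    }
}
```

This also matters when reading files back. According to the same Javadoc, Java's UTF-16LE decoder treats a leading FF FE as a ZERO WIDTH NO-BREAK SPACE (U+FEFF) rather than stripping it. Groovy's reader methods, as noted earlier in the thread, discard the BOM.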

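The `ResourceGroovyMethods` fix Keegan proposes compares the result of `Charset.forName(charset).name()`, which is the charset's canonical name. `Charset.forName` is case-insensitive and accepts registered aliases, so a spelling such as "utf-16le" resolves to "UTF-16LE" before the comparison. A plain `"UTF-16LE".equals(charset)` check would miss that spelling. Below is a minimal sketch of that check in isolation, written to stay Java 6 compatible, with two caveats. First, `Utf16BomCheck` and `bomFor` are hypothetical names and do not reproduce the actual Groovy source. Second, the BOM byte values are the standard UTF-16 ones (FE FF big-endian, FF FE little-endian). Guillaume's Java 7+ `StandardCharsets` variant appears only as a comment.

```java
import java.nio.charset.Charset;

public class Utf16BomCheck {

    // Hypothetical helper (not Groovy's API): returns the BOM a writer would
    // prepend for the given charset name, or an empty array if none.
    static byte[] bomFor(String charsetName) {
        // forName is case-insensitive and resolves aliases; name() gives the
        // canonical name, so "utf-16le" and "UTF-16LE" are treated alike.
        String canonical = Charset.forName(charsetName).name();
        // On Java 7+, the Charset objects could be compared directly instead:
        // StandardCharsets.UTF_16BE.equals(Charset.forName(charsetName))
        if ("UTF-16BE".equals(canonical)) {
            return new byte[] {(byte) 0xFE, (byte) 0xFF};
        } else if ("UTF-16LE".equals(canonical)) {
            return new byte[] {(byte) 0xFF, (byte) 0xFE};
        }
        return new byte[0];
    }

    public static void main(String[] args) {
        String[] names = {"UTF-16LE", "utf-16le", "UTF-16BE", "UTF-8"};
        for (String name : names) {
            System.out.println(name + " -> " + bomFor(name).length + " BOM byte(s)");
        }
    }
}
```

Comparing canonical names keeps the check usable on Java 6, which is the concern raised for the non-indy build. The `StandardCharsets` comparison avoids string matching entirely but requires Java 7.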