So now, how do we decide what's best? :-) Is a Java reader happy with the BOM, and does it consume it transparently? (I think that wasn't the case in the past, but I may be wrong.)
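To make that question concrete, here is a minimal sketch in plain Java (the class name BomReadDemo and the helper decode are made up for illustration, not anything in Groovy or the JDK). It feeds the same four bytes that withPrintWriter produces in the test quoted below to an InputStreamReader twice. As far as I understand the java.nio.charset.Charset documentation, the generic "UTF-16" decoder consumes the BOM and uses it to pick the byte order, while the explicit "UTF-16LE" decoder leaves it in the text as U+FEFF. So the answer seems to depend on which charset name the reader is opened with.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

public class BomReadDemo {
    // Hypothetical helper: decode the bytes through a plain Java Reader
    // opened with the given charset name and return the decoded text.
    static String decode(byte[] bytes, String charsetName) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), charsetName)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream and a valid charset name.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // BOM (ff fe) followed by a space (20 00), as in the withPrintWriter output.
        byte[] withBom = {(byte) 0xFF, (byte) 0xFE, 0x20, 0x00};

        String viaUtf16 = decode(withBom, "UTF-16");
        String viaUtf16Le = decode(withBom, "UTF-16LE");

        // "UTF-16": the BOM is consumed, only the space remains.
        System.out.println("UTF-16   -> length " + viaUtf16.length());
        // "UTF-16LE": the BOM stays in the text as a leading U+FEFF.
        System.out.println("UTF-16LE -> length " + viaUtf16Le.length()
                + ", first char U+" + String.format("%04X", (int) viaUtf16Le.charAt(0)));
    }
}
```

If that holds, a Java consumer that opens the file as "UTF-16LE" sees an extra leading character unless it strips it itself, which is exactly what the Groovy reader methods do for us.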
2015-06-09 17:21 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:

> That's an excellent point, Paolo. NioGroovyMethods.newWriter claims (in
> the JavaDoc) it will write the BOM if needed, but it doesn't, because it
> uses Java's implementation rather than Groovy's
> writeUTF16BomIfRequired. None of the methods in NioGroovyMethods use
> writeUTF16BomIfRequired.
>
> Whichever we decide, we should be consistent.
>
> -Keegan
>
> On Tue, Jun 9, 2015 at 11:08 AM, Paolo Di Tommaso
> <paolo.ditomm...@gmail.com> wrote:
>
>> I'm wondering if the NioGroovyMethods that implement the write methods
>> for Path should do the same.
>>
>> Cheers,
>> Paolo
>>
>> On Tue, Jun 9, 2015 at 4:02 PM, Keegan Witt <keeganw...@gmail.com> wrote:
>>
>>> Cool. I'll wait for PR 36 to be merged first, because I also was
>>> thinking the Javadoc would be changed from
>>>     is "UTF-16BE" or "UTF-16LE"
>>> to
>>>     is "UTF-16BE" or "UTF-16LE" (or an equivalent alias)
>>>
>>> -Keegan
>>>
>>> On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge <glafo...@gmail.com>
>>> wrote:
>>>
>>>> 2015-06-09 15:04 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>
>>>>> Created GROOVY-7461
>>>>> <https://issues.apache.org/jira/browse/GROOVY-7461> and PR 36
>>>>> <https://github.com/apache/incubator-groovy/pull/36>.
>>>>
>>>> Cool!
>>>>
>>>>> How would you feel about a PR to copy the Javadoc comment mentioning
>>>>> the UTF-16 BOM on File.newWriter to all the other methods that use
>>>>> writeUTF16BomIfRequired (at least until we decide we're going to
>>>>> change the current behavior)?
>>>>
>>>> Right, worth it!
>>>>
>>>>> -Keegan
>>>>>
>>>>> On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge <glafo...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Good point!
>>>>>>
>>>>>> 2015-06-09 14:11 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>
>>>>>>> That's only available in Java 7. Isn't Groovy still targeting 1.6
>>>>>>> for the non-indy version?
>>>>>>>
>>>>>>> -Keegan
>>>>>>>
>>>>>>> On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <glafo...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Well spotted!
>>>>>>>>
>>>>>>>> You could also compare with StandardCharsets, instead of going
>>>>>>>> through the name comparison:
>>>>>>>>
>>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>>>>>>>>
>>>>>>>> 2015-06-09 13:49 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>>>
>>>>>>>>> No, it's a Groovy bug.
>>>>>>>>>
>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset,
>>>>>>>>>         final OutputStream stream) throws IOException {
>>>>>>>>>     if ("UTF-16BE".equals(charset)) {
>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>     } else if ("UTF-16LE".equals(charset)) {
>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>     }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> should be
>>>>>>>>>
>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset,
>>>>>>>>>         final OutputStream stream) throws IOException {
>>>>>>>>>     if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>     } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>     }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods. We'll
>>>>>>>>> probably want to fix that regardless of what we decide on the
>>>>>>>>> *withPrintWriter* question. I'll open a Jira and a PR.
>>>>>>>>>
>>>>>>>>> -Keegan
>>>>>>>>>
>>>>>>>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge
>>>>>>>>> <glafo...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> From Groovy's point of view (i.e. when you're coding in Groovy),
>>>>>>>>>> the BOM is automatically discarded when you use one of our reader
>>>>>>>>>> methods (withReader, etc.), so it's transparent whether the BOM
>>>>>>>>>> is here or not.
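
Inline note on the alias fix quoted above: a small sketch of why the plain string comparison misses some inputs, and what the two proposed checks look like side by side. The class and method names (Utf16BomCheck, needsBomByName, needsBomByConstant) are hypothetical, picked only for this illustration; they are not Groovy's actual methods. The only JDK behavior assumed is that Charset.forName accepts names case-insensitively and that name() returns the canonical name. The StandardCharsets variant needs Java 7, as Keegan points out.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16BomCheck {
    // Java 6 compatible: resolve the name (or alias) first, then compare canonical names.
    static boolean needsBomByName(String charsetName) {
        String canonical = Charset.forName(charsetName).name();
        return "UTF-16BE".equals(canonical) || "UTF-16LE".equals(canonical);
    }

    // Java 7+: compare against the StandardCharsets constants instead of strings.
    static boolean needsBomByConstant(String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return StandardCharsets.UTF_16BE.equals(cs) || StandardCharsets.UTF_16LE.equals(cs);
    }

    public static void main(String[] args) {
        String requested = "utf-16le"; // same charset, different spelling

        // The check as originally written: a literal string comparison.
        System.out.println("literal equals:     " + "UTF-16LE".equals(requested));
        // The two normalized checks.
        System.out.println("by canonical name:  " + needsBomByName(requested));
        System.out.println("by StandardCharsets: " + needsBomByConstant(requested));
    }
}
```

Both normalized variants should agree for any name the JVM can resolve; the difference is only whether Java 6 still has to be supported.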
>>>>>>>>>>
>>>>>>>>>> I tend to think that always having the BOM is a good thing (I
>>>>>>>>>> even thought that was mandatory), but Groovy should guess the
>>>>>>>>>> endianness regardless anyway.
>>>>>>>>>>
>>>>>>>>>> Happy to hear what others think too about all this though.
>>>>>>>>>>
>>>>>>>>>> Guillaume
>>>>>>>>>>
>>>>>>>>>> 2015-06-08 23:20 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> The code as-is today writes the BOM regardless of platform. I
>>>>>>>>>>> just tested in Linux with the same results. I think there are 2
>>>>>>>>>>> parts to the question of "what's the correct behavior?"
>>>>>>>>>>>
>>>>>>>>>>> 1. Should the BOM be written at all, particularly when the
>>>>>>>>>>>    platform is Windows?
>>>>>>>>>>> 2. Should the behavior of *withPrintWriter* differ (even if
>>>>>>>>>>>    the difference is to be smarter) from the behavior of *new
>>>>>>>>>>>    PrintWriter*?
>>>>>>>>>>>
>>>>>>>>>>> *Discussion*
>>>>>>>>>>>
>>>>>>>>>>> 1. Strictly speaking, yes, because RFC 2781
>>>>>>>>>>>    <http://tools.ietf.org/html/rfc2781> says in section 4.3 to
>>>>>>>>>>>    assume big-endian if there is no BOM. However, in practice,
>>>>>>>>>>>    many applications disregard the RFC and assume little-endian
>>>>>>>>>>>    because that's what Windows does
>>>>>>>>>>>    <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>>>>>>>>>>    Because of this, the behavior could be changed so that when
>>>>>>>>>>>    writing UTF-16LE on Windows, it doesn't write the BOM. But in
>>>>>>>>>>>    my opinion, it's best practice to always write a BOM when
>>>>>>>>>>>    working with UTF-16, and Java should have done this in its
>>>>>>>>>>>    implementation of PrintWriter.
>>>>>>>>>>>
>>>>>>>>>>> 2. This is a tough one. Arguably, *withPrintWriter* is doing
>>>>>>>>>>>    the smarter, more correct thing, but the typical user would
>>>>>>>>>>>    assume this is just a shorthand convenience for newing up a
>>>>>>>>>>>    PrintWriter (I certainly did). So the question is, is it
>>>>>>>>>>>    better to just document this difference in the GroovyDoc? Or
>>>>>>>>>>>    to change the behavior to be closer to Java? And if the
>>>>>>>>>>>    latter, what breakages would that cause within Groovy itself?
>>>>>>>>>>>    Making that change could break folks in production, because
>>>>>>>>>>>    they could rely on that BOM being there, for example where
>>>>>>>>>>>    the file is created on Windows but then processed on Linux,
>>>>>>>>>>>    or when working with a third-party library that is more
>>>>>>>>>>>    picky about the presence of a BOM.
>>>>>>>>>>>
>>>>>>>>>>> -Keegan
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge
>>>>>>>>>>> <glafo...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Now... whether that is what should be done or not is the good
>>>>>>>>>>>> question to ask :-)
>>>>>>>>>>>> Does Windows manage to open UTF-16 files without BOMs?
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>> I forgot to mention that. Yes, I ran the test mentioned in
>>>>>>>>>>>>> Windows.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge
>>>>>>>>>>>>> <glafo...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's a good question.
>>>>>>>>>>>>>> I guess this is happening on Windows? (I haven't tried here,
>>>>>>>>>>>>>> since I'm on OS X)
>>>>>>>>>>>>>> I think BOMs were mandatory in text files on Windows.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I've always taken a perverse pleasure in character encoding
>>>>>>>>>>>>>>> problems. I was intrigued by this SO question
>>>>>>>>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char>
>>>>>>>>>>>>>>> on UTF-16 BOMs in Java vs Groovy.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It appears using withPrintWriter(charset) produces a BOM
>>>>>>>>>>>>>>> whereas new PrintWriter(file, charset) does not, as
>>>>>>>>>>>>>>> demonstrated here:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> File file = new File("tmp.txt")
>>>>>>>>>>>>>>> try {
>>>>>>>>>>>>>>>     String text = " "
>>>>>>>>>>>>>>>     String charset = "UTF-16LE"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     file.withPrintWriter(charset) { it << text }
>>>>>>>>>>>>>>>     println "withPrintWriter"
>>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     PrintWriter w = new PrintWriter(file, charset)
>>>>>>>>>>>>>>>     w.print(text)
>>>>>>>>>>>>>>>     w.close()
>>>>>>>>>>>>>>>     println "\n\nnew PrintWriter"
>>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>>> } finally {
>>>>>>>>>>>>>>>     file.delete()
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Outputs
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> withPrintWriter
>>>>>>>>>>>>>>> ff fe 20 00
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> new PrintWriter
>>>>>>>>>>>>>>> 20 00
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is this difference in behavior intentional? It seems kinda
>>>>>>>>>>>>>>> odd to me.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Keegan

--
Guillaume Laforge
Groovy Project Manager
Product Ninja & Advocate at Restlet <http://restlet.com>

Blog: http://glaforge.appspot.com/
Social: @glaforge <http://twitter.com/glaforge> / Google+
<https://plus.google.com/u/0/114130972232398734985/posts>