I'm writing out a PCollection<List<String>>. My goal is to write out each element in the list as a new line.
The StringUtf8Coder also writes out a VarInt for the size of the bytes. The StringDelegateCoder with the ListCoder doesn't actually write out text.

I think List<String> support should be added to TextIO.Write. Or maybe a new coder needs to be added that outputs text, with support for Lists, KVs, Sets, etc.

On Thu, May 19, 2016 at 9:23 PM Kenneth Knowles <[email protected]> wrote:

> Hi Jesse,
>
> StringDelegateCoder does just what you have said: it encodes using
> #toString() and decodes assuming a single-arg constructor.
>
> But by analogy with what you have written, and if I understand your goals
> correctly, what you want here is
> TextIO.Write.withCoder(StringDelegateCoder.of(List.class))
> since you want to base it on List#toString() not String#toString().
>
> That said, probably the best way to write a reliable and/or readable
> format with TextIO.Write is to intentionally produce just the string you
> want for your output format - including escaping newlines, etc - and then
> use StringUtf8Coder.
>
> Kenn
>
> On Thu, May 19, 2016 at 9:00 PM, Jesse Anderson <[email protected]>
> wrote:
>
>> I'm trying to write out a List<String> with TextIO.Write. The only
>> supported type is String. I ended up writing an anonymous coder.
>>
>> I want to check if there is a coder that I couldn't find that would
>> just take an object and write out the .toString() of it.
>>
>> I tried this:
>>
>> orderedList.apply(TextIO.Write.withCoder(ListCoder.of(StringDelegateCoder.of(String.class))).to("output/result"));
>>
>> But a VarInt is encoded along with everything. I'm looking for a coder
>> that only writes out the UTF8.
>>
>> This functionality would be similar to Hadoop TextOutputFormat. It just
>> runs a .toString before writing it out.
>>
>> In the anonymous coder I wrote, I hit a weird issue. This code just
>> writes out a bunch of "\n". Yes, value is populated with data.
>>
>> dataOutputStream.writeUTF(value);
>> dataOutputStream.writeUTF("\n");
>>
>> This code works:
>>
>> byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
>> dataOutputStream.write(bytes);
>> dataOutputStream.writeUTF("\n");
>>
>> I took this from the string coder. What's odd is that DOS' writeUTF
>> should work too. Is there a reason why?
>>
>> Thanks,
>>
>> jesse
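
On the writeUTF question in the quoted message, one thing that may be part of the explanation: DataOutputStream.writeUTF does not write bare text. It writes a two-byte length followed by the string in modified UTF-8, so every call adds two extra bytes to the output. The sketch below is plain Java with no Beam dependency; the class and method names (WriteUtfDemo, withWriteUtf, withRawBytes) are made up for illustration and are not taken from the anonymous coder in the thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: compares the bytes produced by the two
// approaches from the thread when writing one line of text.
final class WriteUtfDemo {

    // Mirrors the first snippet: both the value and the newline go through
    // writeUTF, so each one is preceded by a two-byte length.
    static byte[] withWriteUtf(String value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(value);
            out.writeUTF("\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Writes only the UTF-8 bytes of the value and a single newline byte,
    // with no length prefix anywhere.
    static byte[] withRawBytes(String value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.write(value.getBytes(StandardCharsets.UTF_8));
            out.write('\n');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // prints [0, 2, 97, 98, 0, 1, 10]
        System.out.println(Arrays.toString(withWriteUtf("ab")));
        // prints [97, 98, 10]
        System.out.println(Arrays.toString(withRawBytes("ab")));
    }
}
```

Note that the "working" snippet in the thread still calls writeUTF("\n"), so each line ending there would carry the two length bytes 0x00 0x01 before the newline. Writing the newline with write('\n'), as in withRawBytes above, avoids that.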
