I'm writing out a PCollection<List<String>>. My goal is to write out each
element in the list as a new line.

The StringUtf8Coder also writes out a VarInt for the size of the bytes,
and StringDelegateCoder combined with ListCoder doesn't actually write
out text.
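This is a JDK-only sketch of what that size prefix looks like in the output. It assumes the prefix is an unsigned VarInt (7 payload bits per byte, with the high bit set on every byte except the last) followed by the UTF-8 bytes, as described above. The class and method names are hypothetical, and the code illustrates the byte layout only; it is not Beam's StringUtf8Coder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration, not Beam code: a string written as
// [VarInt byte length][UTF-8 bytes], the layout described in the thread.
final class LengthPrefixedString {

    // Unsigned VarInt: low 7 bits per byte, high bit means "more bytes follow".
    static void writeVarInt(int value, ByteArrayOutputStream out) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Length-prefixed form: the byte count comes before the text.
    static byte[] encodeWithLength(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarInt(utf8.length, out);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // Plain-text form: only the UTF-8 bytes, which is what a text file needs.
    static byte[] encodePlain(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodeWithLength("hello"))); // [5, 104, 101, 108, 108, 111]
        System.out.println(Arrays.toString(encodePlain("hello")));      // [104, 101, 108, 108, 111]
    }
}
```

The leading `5` in the first line is the byte length. That is the non-text byte that ends up in the file; strings of 128 bytes or more get a prefix of two or more bytes.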

I think List<String> support should be added to TextIO.Write. Or maybe a
new coder needs to be added that outputs text, with support for Lists, KVs,
Sets, etc.

On Thu, May 19, 2016 at 9:23 PM Kenneth Knowles wrote:

> Hi Jesse,
>
> StringDelegateCoder does just what you have said: it encodes using
> #toString() and decodes assuming a single String-argument constructor.
>
> But by analogy with what you have written, and if I understand your goals
> correctly, what you want here is
> TextIO.Write.withCoder(StringDelegateCoder.of(List.class)),
> since you want to base it on List#toString(), not String#toString().
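This toy version reproduces the strategy described above with plain reflection: encode with toString(), and decode by passing the String to a one-argument constructor. `ToyStringDelegate` is a hypothetical name. It only mimics the idea and is not Beam's StringDelegateCoder.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy, not Beam's StringDelegateCoder: encode with toString(),
// decode with a public constructor taking a single String.
final class ToyStringDelegate {

    static String encode(Object value) {
        return value.toString();
    }

    static <T> T decode(Class<T> type, String text) {
        try {
            return type.getConstructor(String.class).newInstance(text);
        } catch (ReflectiveOperationException e) {
            // Covers a missing (String) constructor and a constructor that throws.
            throw new IllegalArgumentException(
                    type.getName() + " has no usable (String) constructor", e);
        }
    }

    public static void main(String[] args) {
        // Round-trips for a type that has a String constructor.
        BigInteger n = decode(BigInteger.class, encode(BigInteger.valueOf(42)));
        System.out.println(n); // 42

        // A List's toString() is one bracketed line, not one line per element.
        List<String> words = new ArrayList<>(Arrays.asList("a", "b", "c"));
        System.out.println(encode(words)); // [a, b, c]
    }
}
```

The sketch shows two consequences. List#toString() puts the whole list on one bracketed line rather than one line per element. Decoding into `List.class` also has nothing to call, because an interface has no constructors: `decode(List.class, ...)` throws IllegalArgumentException here.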
>
> That said, probably the best way to write a reliable and/or readable
> format with TextIO.Write is to intentionally produce just the string you
> want for your output format (including escaping newlines, etc.) and then
> use StringUtf8Coder.
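This JDK-only sketch applies that advice to the List<String> case: turn each list entry into exactly one output line, escaping characters that would otherwise split it. `ListToLines`, `toLines`, and the backslash escaping scheme are hypothetical choices for illustration, not something the thread specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns one List<String> element into the exact
// lines that should appear in the output file, one line per list entry.
final class ListToLines {

    // Assumed escaping scheme: backslash first, then CR and LF,
    // so an entry can never span more than one output line.
    static String escapeLine(String s) {
        return s.replace("\\", "\\\\")
                .replace("\r", "\\r")
                .replace("\n", "\\n");
    }

    static List<String> toLines(List<String> element) {
        List<String> lines = new ArrayList<>(element.size());
        for (String entry : element) {
            lines.add(escapeLine(entry));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> element = Arrays.asList("first", "two\nlines", "back\\slash");
        for (String line : toLines(element)) {
            System.out.println(line);
        }
        // prints:
        // first
        // two\nlines
        // back\\slash
    }
}
```

In a pipeline, this logic would sit in a transform before TextIO.Write, for example a DoFn that outputs each returned line as its own String. The write then needs only its default StringUtf8Coder. The exact DoFn/ParDo syntax depends on the SDK version, so it is left out here.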
>
> Kenn
>
> On Thu, May 19, 2016 at 9:00 PM, Jesse Anderson wrote:
>
>> I'm trying to write out a List<String> with TextIO.Write. The only
>> supported type is String. I ended up writing an anonymous coder.
>>
>> I want to check whether there is a coder I couldn't find that would
>> just take an object and write out the .toString() of it.
>>
>> I tried this:
>>
>> orderedList.apply(TextIO.Write.withCoder(ListCoder.of(StringDelegateCoder.of(String.class))).to("output/result"));
>>
>> But a VarInt is encoded along with everything. I'm looking for a coder
>> that only writes out the UTF-8 bytes.
>>
>> This functionality would be similar to Hadoop's TextOutputFormat, which
>> just calls .toString() on each value before writing it out.
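This is a JDK-only sketch of the "call toString() and write it as a line" behavior described above. It writes no length prefix and no element count, only UTF-8 text and a newline per value. `ToStringLineWriter` is a hypothetical name. The sketch does not reproduce Hadoop's TextOutputFormat in detail; for example, its key/value separator handling is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: each value becomes its toString() as UTF-8, then '\n'.
final class ToStringLineWriter {

    static void writeLine(Object value, OutputStream out) throws IOException {
        out.write(String.valueOf(value).getBytes(StandardCharsets.UTF_8));
        out.write('\n'); // a single raw 0x0A byte, no length header
    }

    static String render(Iterable<?> values) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            for (Object value : values) {
                writeLine(value, buffer);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory buffer
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(render(Arrays.asList("alpha", 42, true)));
        // alpha
        // 42
        // true
    }
}
```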
>>
>> In the anonymous coder I wrote, I hit a weird issue. This code just
>> writes out a bunch of "\n". Yes, value is populated with data.
>>           dataOutputStream.writeUTF(value);
>>           dataOutputStream.writeUTF("\n");
>>
>> This code works:
>>           byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
>>           dataOutputStream.write(bytes);
>>           dataOutputStream.writeUTF("\n");
>>
>> I took this from the string coder. What's odd is that DataOutputStream's
>> writeUTF should work too. Is there a reason why it doesn't?
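According to the java.io.DataOutputStream documentation, writeUTF is not a plain-text write. It emits a two-byte big-endian length followed by the string in modified UTF-8, a format meant to be read back with DataInputStream.readUTF. The JDK-only sketch below captures the bytes from both approaches so the difference is visible. `WriteUtfBytes` and its helpers are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Compares the bytes produced by DataOutputStream.writeUTF with a raw
// UTF-8 write, to expose the 2-byte length header writeUTF adds.
final class WriteUtfBytes {

    interface DataWrite {
        void apply(DataOutputStream out) throws IOException;
    }

    static byte[] capture(DataWrite action) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            action.apply(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static byte[] viaWriteUtf(String s) {
        return capture(out -> out.writeUTF(s));
    }

    static byte[] viaRawUtf8(String s) {
        return capture(out -> out.write(s.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(viaWriteUtf("\n"))); // [0, 1, 10]
        System.out.println(Arrays.toString(viaRawUtf8("\n")));  // [10]
        System.out.println(Arrays.toString(viaWriteUtf("hi"))); // [0, 2, 104, 105]
    }
}
```

It follows that the "working" version above still writes the bytes 0x00 0x01 in front of every newline, because it also uses writeUTF("\n"). Writing the newline as a raw byte (for example `dataOutputStream.write('\n')`) avoids that.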
>>
>> Thanks,
>>
>> jesse
>>
>
>
