Re: Keeping around temp datasets

2015-01-21 Thread Stephan Ewen
I agree, the type serializer IO formats should be the best match. They should also be rather efficient. On Tue, Jan 20, 2015 at 2:18 PM, Robert Metzger wrote: > I would not recommend using the AvroInput/Output format because it's meant > to be used with Avro types (usually POJOs generated from a

Re: Keeping around temp datasets

2015-01-20 Thread Robert Metzger
I would not recommend using the AvroInput/Output format because it's meant to be used with Avro types (usually POJOs generated from an Avro schema). I would use the TypeSerializerInputFormat / OutputFormat. Then you can be sure that it can read and write all types supported by our system. On Tue,

Keeping around temp datasets

2015-01-20 Thread Alexander Alexandrov
Hi there, I have to implement some generic fallback strategy on top of a more abstract DSL in order to keep datasets in a temp space (e.g. Tachyon). My implementation is based on the 0.8 release. At the moment I am undecided between three options: - BinaryInputFormat / BinaryOutputFormat -