Re: Using Hadoop Input/Output formats

Fabian Hueske Tue, 24 Nov 2015 11:27:27 -0800

Hi Nick,

you can also use Flink's HadoopInputFormat wrappers with the DataStream API.
However, DataStream does not offer as much "sugar" as DataSet because
StreamExecutionEnvironment does not offer dedicated createHadoopInput or
readHadoopFile methods.


In DataStream Scala you can read from a Hadoop InputFormat (TextInputFormat
in this case) as follows:

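import org.apache.flink.api.scala.hadoop.mapred.HadoopInputFormat
import org.apache.flink.streaming.api.scala._
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{JobConf, TextInputFormat}

// env is a StreamExecutionEnvironment; the JobConf still needs an input path
val textData: DataStream[(LongWritable, Text)] = env.createInput(
  new HadoopInputFormat[LongWritable, Text](
    new TextInputFormat,
    classOf[LongWritable],
    classOf[Text],
    new JobConf()
))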

The Java version is very similar.

Note: Flink has wrappers for both MR APIs: mapred and mapreduce.
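
For the mapreduce API, the read looks roughly like this. This is only a minimal sketch, assuming Hadoop 2.x and the Scala wrapper in org.apache.flink.api.scala.hadoop.mapreduce; the object name, job name and input path are placeholders and not something from this thread:

```scala
import org.apache.flink.api.scala.hadoop.mapreduce.HadoopInputFormat
import org.apache.flink.streaming.api.scala._
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, TextInputFormat}

// Hypothetical object name, for illustration only
object HadoopStreamRead {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // The mapreduce API is configured through a Job instead of a JobConf
    val job = Job.getInstance()
    // Placeholder path: replace with a real input location
    FileInputFormat.addInputPath(job, new Path("hdfs:///path/to/input"))

    // Wrap the Hadoop InputFormat and hand it to createInput
    val textData: DataStream[(LongWritable, Text)] = env.createInput(
      new HadoopInputFormat[LongWritable, Text](
        new TextInputFormat, classOf[LongWritable], classOf[Text], job))

    // Keep only the line contents (the Text value) and print them
    textData.map(_._2.toString).print()
    env.execute("Hadoop InputFormat on DataStream")
  }
}
```

With the mapred wrapper, the equivalent step is to set the input path on the JobConf before passing it to HadoopInputFormat.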

Cheers,
Fabian

2015-11-24 19:36 GMT+01:00 Chiwan Park <chiwanp...@apache.org>:

> I’m not a streaming expert. AFAIK, the compatibility layer can be used only
> with DataSet. Flink has some streaming-specific features, such as
> distributed snapshots, that need support from sources and sinks. So you
> would have to implement the I/O yourself.
>
> > On Nov 25, 2015, at 3:22 AM, Nick Dimiduk <ndimi...@gmail.com> wrote:
> >
> > I completely missed this, thanks Chiwan. Can these be used with
> DataStreams as well as DataSets?
> >
> > On Tue, Nov 24, 2015 at 10:06 AM, Chiwan Park <chiwanp...@apache.org>
> wrote:
> > Hi Nick,
> >
> > You can use Hadoop Input/Output Formats without modification! Please
> check the documentation[1] on the Flink homepage.
> >
> > [1]
> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/hadoop_compatibility.html
> >
> > > On Nov 25, 2015, at 3:04 AM, Nick Dimiduk <ndimi...@apache.org> wrote:
> > >
> > > Hello,
> > >
> > > Is it possible to use existing Hadoop Input and OutputFormats with
> Flink? There's a lot of existing code that conforms to these interfaces,
> seems a shame to have to re-implement it all. Perhaps some adapter shim..?
> > >
> > > Thanks,
> > > Nick
> >
> > Regards,
> > Chiwan Park
> >
> >
>
> Regards,
> Chiwan Park
>
>
>
>
