Do you want to read the data once or monitor a directory and process new
files as they appear?

Reading from S3 with Flink's current MonitoringFileSource implementation does
not work reliably due to S3's eventually consistent list operation (see
FLINK-9940 [1]).
Reading a directory this way also has issues because it does not work with
checkpointing enabled.
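
For reference, the monitoring source in question is the one you get via
readFile() with PROCESS_CONTINUOUSLY; a minimal sketch (bucket and path are
placeholders) looks roughly like this:

import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();

// Scan the directory every 60 seconds and process files as they appear.
// On S3 this is subject to the eventually consistent listing mentioned above.
TextInputFormat format = new TextInputFormat(new Path("s3://my-bucket/input"));
DataStream<String> lines = env.readFile(
        format,
        "s3://my-bucket/input",
        FileProcessingMode.PROCESS_CONTINUOUSLY,
        60_000L);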

These limitations could be worked around with custom source implementations.
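
A very rough sketch of such a custom source is below (the path handling is
only illustrative, and for a robust version the set of already-seen files
would have to be stored in checkpointed state, which is omitted here):

import java.util.HashSet;
import java.util.Set;

import org.apache.flink.core.fs.FileStatus;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

// Periodically lists a directory and emits the paths of files it has not seen
// before; downstream operators would then read and parse the actual files.
public class DirectoryMonitorSource extends RichSourceFunction<String> {

    private final String directory;
    private final long scanIntervalMs;
    private final Set<String> seen = new HashSet<>();
    private volatile boolean running = true;

    public DirectoryMonitorSource(String directory, long scanIntervalMs) {
        this.directory = directory;
        this.scanIntervalMs = scanIntervalMs;
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        Path dir = new Path(directory);
        FileSystem fs = dir.getFileSystem();
        while (running) {
            for (FileStatus status : fs.listStatus(dir)) {
                String path = status.getPath().toString();
                if (seen.add(path)) {
                    // Emit only paths we have not seen before.
                    synchronized (ctx.getCheckpointLock()) {
                        ctx.collect(path);
                    }
                }
            }
            Thread.sleep(scanIntervalMs);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}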

Best, Fabian

[1] https://issues.apache.org/jira/browse/FLINK-9940

2018-08-07 19:45 GMT+02:00 srimugunthan dhandapani <
srimugunthan.dhandap...@gmail.com>:

> Thanks for the reply. I was mainly thinking of the use case of a streaming
> job.
> In the approach of porting to Flink's SQL API, is it possible to read
> Parquet data from S3 and register a table in Flink?
>
>
> On Tue, Aug 7, 2018 at 1:05 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Hi Mugunthan,
>>
>> this depends on the type of your job. Is it a batch or a streaming job?
>> Some queries could be ported to Flink's SQL API as suggested by the link
>> that Hequn posted. In that case, the query would be executed in Flink.
>>
>> Other options are to use a JDBC InputFormat, or to persist the result to
>> files and ingest them with a monitoring file source.
>> These options would mean running the query in Hive/Presto and just
>> ingesting the result (via JDBC or files).
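>>
>> For the JDBC route, a rough sketch with the batch API could look like the
>> following (driver class, URL, query, and field types are placeholders):
>>
>> import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
>> import org.apache.flink.api.java.DataSet;
>> import org.apache.flink.api.java.ExecutionEnvironment;
>> import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
>> import org.apache.flink.api.java.typeutils.RowTypeInfo;
>> import org.apache.flink.types.Row;
>>
>> // Run the query in Hive (or Presto) and ingest the result rows into Flink.
>> JDBCInputFormat jdbcInput = JDBCInputFormat.buildJDBCInputFormat()
>>     .setDrivername("org.apache.hive.jdbc.HiveDriver") // or a Presto driver
>>     .setDBUrl("jdbc:hive2://hive-server:10000/default")
>>     .setQuery("SELECT id, name FROM some_table")
>>     .setRowTypeInfo(new RowTypeInfo(
>>         BasicTypeInfo.LONG_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO))
>>     .finish();
>>
>> ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>> DataSet<Row> rows = env.createInput(jdbcInput);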
>>
>> Which solution works best for you depends on the details.
>>
>> Best, Fabian
>>
>> 2018-08-07 3:28 GMT+02:00 Hequn Cheng <chenghe...@gmail.com>:
>>
>>> Hi srimugunthan,
>>>
>>> I found a related link[1]. Hope it helps.
>>>
>>> [1] https://stackoverflow.com/questions/41683108/flink-1-1-3-interact-with-hive-2-1-0
>>>
>>> On Tue, Aug 7, 2018 at 2:35 AM, srimugunthan dhandapani <
>>> srimugunthan.dhandap...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>> I read the Flink documentation and came across the supported connectors:
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/index.html#bundled-connectors
>>>>
>>>> We have some data that resides in Hive/Presto and needs to be made
>>>> available to the Flink job. The data in Hive or Presto is updated about
>>>> once a day or less often.
>>>>
>>>> Ideally we would connect to Hive or Presto, run the query, get back the
>>>> results, and use them in a Flink job.
>>>> What are the options to achieve something like that?
>>>>
>>>> Thanks,
>>>> mugunthan
>>>>
>>>
>>>
>>
>
