Hi Seth,
Thank you for confirming that the issue comes from the transition in 1.14.
For now, given my constraints, I will use a simple workaround and download
the whole dataset with the Java AWS library.

For future reference, though, I would still like to solve this.
I am still on 1.12 at the moment and already had issues with simply using
flink-parquet.
I think I would have the same issues with 1.14. The root issue really
revolves around the Hadoop library.

If I simply add the `flink-parquet` library as specified in the doc, my
application does not compile because the class
`org.apache.hadoop.conf.Configuration` cannot be found.
If I add `hadoop-common` and mark it as provided, it fails at runtime with a
class-not-found error.
If I bundle Hadoop with my application jar, then it crashes with a
filesystem-not-found error for `s3`.
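
To tell these three failures apart, it may help to check at runtime which
Hadoop classes the job can actually see. The following is only a minimal
sketch: `ClasspathProbe` and its methods are hypothetical helper names, not
part of Flink or Hadoop, and I am assuming that
`org.apache.hadoop.fs.s3a.S3AFileSystem` (shipped in `hadoop-aws`) is the
relevant class for the S3 case.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper; not part of Flink or Hadoop.
final class ClasspathProbe {

    /** Returns true if the class can be located through the given loader. */
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            // initialize=false: locate the class without running static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError from a missing transitive dependency.
            return false;
        }
    }

    /** Probes each name with the context class loader, preserving input order. */
    static Map<String, Boolean> probe(List<String> classNames) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathProbe.class.getClassLoader();
        }
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : classNames) {
            result.put(name, isVisible(name, loader));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "org.apache.hadoop.conf.Configuration",     // from hadoop-common
                "org.apache.hadoop.fs.s3a.S3AFileSystem");  // from hadoop-aws (assumed relevant), serves s3a://
        probe(names).forEach((name, visible) ->
                System.out.println((visible ? "found    " : "MISSING  ") + name));
    }
}
```

Calling `ClasspathProbe.probe(...)` from inside the submitted job, rather than
from the IDE, would show what the cluster classpath really provides, which is
where the `provided` scope makes a difference.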

Did I miss anything in the doc?

Alex

On Tue, Dec 21, 2021 at 10:29 PM Seth Wiesman <sjwies...@gmail.com> wrote:

> Hi Alexandre,
>
> You are correct, BatchTableEnvironment does not exist in 1.14 anymore. In
> 1.15 we will have the state processor API ported to DataStream for exactly
> this reason; it is the last piece needed to begin officially marking DataSet
> as deprecated. As you can understand, this has been a multi-year process and
> there have been some rough edges as components are migrated.
>
> The easiest solution is for you to use 1.12 DataSet <-> Table interop. Any
> savepoint you create using Flink 1.12 you should be able to restore on a
> 1.14 DataStream application.
>
> I am unsure of the issue with the Hadoop plugin, but if using 1.14 is a
> hard requirement, rewriting your input data into another format could also
> be a viable stop-gap solution.
>
> Seth
>
> On Mon, Dec 20, 2021 at 8:57 PM Alexandre Montecucco <
> alexandre.montecu...@grabtaxi.com> wrote:
>
>> Hello,
>>
>> I am also facing the same issue as documented in a previous mail from the
>> mailing list [1].
>> Basically, when using flink-parquet, I get:
>>
>>>  java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
>>
>> I have no idea what I need to do to fix this and could not find anything
>> from the doc. I tried importing various hadoop libraries, but it always
>> causes yet another issue.
>>
>> I think this might be the root cause of my problem.
>>
>> Best,
>> Alex
>>
>> [1] https://lists.apache.org/thread/796m8tww4gqykqm1szb3y5m7t6scgho2
>>
>> On Mon, Dec 20, 2021 at 4:23 PM Alexandre Montecucco <
>> alexandre.montecu...@grabtaxi.com> wrote:
>>
>>> Hello Piotrek,
>>> Thank you for the help.
>>> Regarding the S3 issue I have followed the documentation for the
>>> plugins. Many of our other apps are using S3 through the Hadoop Fs Flink
>>> plugin.
>>> Also, in this case, reading a regular plain-text file works; I only
>>> have an issue when using Parquet.
>>>
>>> I tried switching to Flink 1.14, however I am stumbling upon other
>>> blockers.
>>> To give more context, I am trying to build a Flink savepoint for cold
>>> start data. So I am using the Flink State Processor API. But:
>>>  -  Flink State Processor API is using the DataSet api which is now
>>> marked as deprecated (Legacy)
>>>  - the doc you shared regarding reading from Parquet uses the DataStream
>>> API
>>>  - the Flink State Processor API doc [1] states there is interoperability
>>> of DataSet and Table API
>>> <https://nightlies.apache.org/flink/flink-docs-master/dev/table/common.html#integration-with-datastream-and-dataset-api>
>>>  (but the link is now erroneous), it was last correct in Flink 1.12 [2]
>>>
>>> Given that we can convert from the DataStream to the Table API, I was
>>> thinking I could then convert from the Table to the DataSet API (though
>>> this is very cumbersome and I am unsure about the performance / memory
>>> impact).
>>> But for the Table to DataSet conversion, the doc uses a
>>> BatchTableEnvironment class, which does not seem to exist in Flink 1.14
>>> anymore.
>>>
>>> Any recommendations or anything I might have missed?
>>>
>>> Thank you.
>>>
>>> Best,
>>> Alex
>>>
>>>
>>> [1] 
>>> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/libs/state_processor_api/#state-processor-api
>>>
>>> [2]
>>> https://nightlies.apache.org/flink/flink-docs-release-1.12/dev/table/common.html#integration-with-datastream-and-dataset-api
>>> [3]
>>> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/formats/parquet/
>>>
>>>
>>> On Fri, Dec 17, 2021 at 8:53 PM Piotr Nowojski <pnowoj...@apache.org>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Reading from Parquet files in the DataStream API (that's what I'm
>>>> assuming you are doing) is officially supported and documented only since
>>>> 1.14 [1]. Before that it was only supported for the Table API. As far as I
>>>> can tell, the basic classes (`FileSource` and
>>>> `ParquetColumnarRowInputFormat`) have already been in the code base since
>>>> 1.12.x. I don't know how stable it was and how well it was working. I would
>>>> suggest upgrading to Flink 1.14.1. As a last resort, you can try using at
>>>> the very least the latest version of the 1.12.x branch, following the 1.14
>>>> documentation, but I cannot guarantee that it will work.
>>>>
>>>> Regarding the S3 issue, have you followed the documentation? [2][3]
>>>>
>>>> Best,
>>>> Piotrek
>>>>
>>>> [1]
>>>> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/formats/parquet/
>>>> [2]
>>>> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/filesystems/s3/#hadooppresto-s3-file-systems-plugins
>>>> [3]
>>>> https://nightlies.apache.org/flink/flink-docs-release-1.12/deployment/filesystems/s3.html
>>>>
>>>>
>>>> On Fri, Dec 17, 2021 at 10:10 AM Alexandre Montecucco <
>>>> alexandre.montecu...@grabtaxi.com> wrote:
>>>>
>>>>> Hello everyone,
>>>>> I am struggling to read Parquet files from S3 with Flink Streaming
>>>>> 1.12.2.
>>>>> I had some difficulty simply reading from local Parquet files. I
>>>>> finally managed that part, though the solution feels dirty:
>>>>> - I use the readFile function + the ParquetInputFormat abstract class
>>>>> (that is protected) (as I could not find a way to use the public
>>>>> ParquetRowInputFormat).
>>>>> - The open function in ParquetInputFormat uses
>>>>> org.apache.hadoop.conf.Configuration. I am not sure which import to
>>>>> add. It seems the flink-parquet library imports the dependency from
>>>>> hadoop-common, but the dep is marked as provided. The doc only shows usage
>>>>> of flink-parquet from Flink SQL, so I am under the impression that this
>>>>> might not work in the streaming case without extra code. I 'solved' this
>>>>> by adding a dependency on hadoop-common. We did something similar to write
>>>>> Parquet data to S3.
>>>>>
>>>>> Now, when trying to run the application to read from S3, I get an
>>>>> exception with root cause:
>>>>> ```
>>>>> Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No
>>>>> FileSystem for scheme "s3"
>>>>> ```
>>>>> I guess there are some issues with hadoop-common not knowing about the
>>>>> flink-s3-hadoop plugin setup. But I ran out of ideas on how to solve this.
>>>>>
>>>>>
>>>>> I also noticed there were some changes to flink-parquet in Flink
>>>>> 1.14, but I had some issues simply reading data there as well (I did not
>>>>> investigate that version as deeply).
>>>>>
>>>>> Many thanks for any help.
>>>>> --
>>>>>
>>>>> Alexandre Montecucco / Grab, Software Developer
>>>>> alexandre.montecu...@grab.com <claire...@grab.com> / 8782 0937
>>>>>
>>>>> Grab
>>>>> 138 Cecil Street, Cecil Court #01-01 Singapore 069538
>>>>> https://www.grab.com/ <https://www.grab.com/sg/hitch>
>>>>>
>>>>>
>>>>> By communicating with Grab Inc and/or its subsidiaries, associate
>>>>> companies and jointly controlled entities (“Grab Group”), you are deemed 
>>>>> to
>>>>> have consented to the processing of your personal data as set out in the
>>>>> Privacy Notice which can be viewed at https://grab.com/privacy/
>>>>>
>>>>> This email contains confidential information and is only for the
>>>>> intended recipient(s). If you are not the intended recipient(s), please do
>>>>> not disseminate, distribute or copy this email Please notify Grab Group
>>>>> immediately if you have received this by mistake and delete this email 
>>>>> from
>>>>> your system. Email transmission cannot be guaranteed to be secure or
>>>>> error-free as any information therein could be intercepted, corrupted,
>>>>> lost, destroyed, delayed or incomplete, or contain viruses. Grab Group do
>>>>> not accept liability for any errors or omissions in the contents of this
>>>>> email arises as a result of email transmission. All intellectual property
>>>>> rights in this email and attachments therein shall remain vested in Grab
>>>>> Group, unless otherwise provided by law.
>>>>>
>>>>
>>>
>>
>>
>

