Hey Ingo,

Thanks for the suggestion. It's definitely an issue with the Parquet
connector; when we try the CSV or Blackhole connector, everything works fine.

I will try this approach and report back.

Thanks,
Natu

On Wed, Dec 8, 2021 at 7:02 PM Ingo Bürk <i...@ververica.com> wrote:

> Hi Natu,
>
> Something you could try is removing the packaged parquet format and
> defining a custom format[1]. For this custom format you can then fix the
> dependencies by packaging all of the following into the format:
>
> * flink-sql-parquet
> * flink-shaded-hadoop-2-uber
> * hadoop-aws
> * aws-java-sdk-bundle
> * guava
>
> This isn't entirely straightforward, unfortunately, and I haven't
> verified it. However, with Ververica Platform 2.6, to be released shortly
> after Flink 1.15, it should also work again.
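>
> For reference, a minimal sketch of what the dependencies section of such
> a format pom.xml could look like (the exact artifact IDs and versions are
> my assumptions; adjust them to your Flink/Hadoop setup):
>
> <!-- Hypothetical pom.xml excerpt; bundle these via the maven-shade-plugin
>      into one fat JAR that you register as the custom format. -->
> <dependencies>
>   <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-sql-parquet_2.12</artifactId>
>     <version>1.13.1</version>
>   </dependency>
>   <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-shaded-hadoop-2-uber</artifactId>
>     <version>2.8.3-10.0</version>
>   </dependency>
>   <dependency>
>     <groupId>org.apache.hadoop</groupId>
>     <artifactId>hadoop-aws</artifactId>
>     <version>2.8.3</version>
>   </dependency>
>   <dependency>
>     <groupId>com.amazonaws</groupId>
>     <artifactId>aws-java-sdk-bundle</artifactId>
>     <version>1.11.271</version>
>   </dependency>
>   <dependency>
>     <groupId>com.google.guava</groupId>
>     <artifactId>guava</artifactId>
>     <version>30.1.1-jre</version>
>   </dependency>
> </dependencies>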
>
> [1]
> https://docs.ververica.com/user_guide/sql_development/connectors.html#custom-connectors-and-formats
>
>
> Best
> Ingo
>
> On Tue, Dec 7, 2021 at 6:23 AM Natu Lauchande <nlaucha...@gmail.com>
> wrote:
>
>> Hey Timo and Flink community,
>>
>> I wonder if there is a fix for this issue. Last time I rolled back to
>> Flink 1.12 and downgraded Ververica.
>>
>> I am really keen to leverage the new features in the latest versions of
>> Ververica (2.5+). I have tried a myriad of suggested tricks (for
>> example, building the image with the hadoop-client libraries), but I
>> still get:
>>
>> java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
>>     at java.lang.Class.getDeclaredConstructors0(Native Method)
>>     at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
>>     at java.lang.Class.getDeclaredConstructors(Class.java:2020)
>>     at java.io.ObjectStreamClass.computeDefaultSUID(ObjectStreamClass.java:1961)
>>     at java.io.ObjectStreamClass.access$100(ObjectStreamClass.java:79)
>>     at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:275)
>>     at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:273)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.io.ObjectStreamClass.getSerialVersionUID(ObjectStreamClass.java:272)
>>     at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:694)
>>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2003)
>>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1850)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2160)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
>>     at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:615)
>>     at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:600)
>>     at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:587)
>>     at org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:541)
>>     at org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperatorFactory(StreamConfig.java:322)
>>     at org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:159)
>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:551)
>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650)
>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:540)
>>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:759)
>>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
>>     at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>>     at org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:64)
>>     at org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:65)
>>     at org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:48)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>>     ... 48 more
>>
>> This error occurs when writing files to S3 in Parquet format via the
>> streaming file sink.
>>
>>
>> Thanks,
>> Natu
>>
>> On Thu, Jul 22, 2021 at 3:53 PM Timo Walther <twal...@apache.org> wrote:
>>
>>> Thanks, this should definitely work with the pre-packaged connectors of
>>> Ververica Platform.
>>>
>>> I guess we have to investigate what is going on. Until then, a
>>> workaround could be to add Hadoop manually and set the HADOOP_CLASSPATH
>>> environment variable, since the root cause seems to be that Hadoop
>>> cannot be found.
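>>>
>>> For example, a minimal sketch (the Hadoop location is an assumption;
>>> adjust it to where Hadoop lives in your image):
>>>
>>>   # Make the Hadoop classes visible to Flink; assumes Hadoop is
>>>   # installed under /opt/hadoop.
>>>   export HADOOP_HOME=/opt/hadoop
>>>   export HADOOP_CLASSPATH=$(${HADOOP_HOME}/bin/hadoop classpath)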
>>>
>>> Alternatively, you could also build a custom image and include Hadoop in
>>> the lib folder of Flink:
>>>
>>> https://docs.ververica.com/v1.3/platform/installation/custom_images.html
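>>>
>>> A minimal sketch of such a Dockerfile (the base image tag, jar version,
>>> and lib path are assumptions; see the linked docs for the exact values
>>> for your platform version):
>>>
>>>   FROM registry.ververica.com/v2.5/flink:1.13.1-stream1-scala_2.12
>>>   # Put a shaded Hadoop uber jar on Flink's classpath.
>>>   COPY flink-shaded-hadoop-2-uber-2.8.3-10.0.jar /opt/flink/lib/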
>>>
>>> I hope this helps. I will get back to you if we have a fix ready.
>>>
>>> Regards,
>>> Timo
>>>
>>>
>>>
>>> On 22.07.21 14:30, Natu Lauchande wrote:
>>> > Sure.
>>> >
>>> > This is what the table DDL looks like:
>>> >
>>> > CREATE TABLE tablea (
>>> >   `a` BIGINT,
>>> >   `b` BIGINT,
>>> >   `c` BIGINT
>>> > )
>>> > COMMENT ''
>>> > WITH (
>>> >   'auto-compaction' = 'false',
>>> >   'connector' = 'filesystem',
>>> >   'format' = 'parquet',
>>> >   'parquet.block.size' = '134217728',
>>> >   'parquet.compression' = 'SNAPPY',
>>> >   'parquet.dictionary.page.size' = '1048576',
>>> >   'parquet.enable.dictionary' = 'true',
>>> >   'parquet.page.size' = '1048576',
>>> >   'parquet.writer.max-padding' = '2097152',
>>> >   'path' = 's3a://test/test',
>>> >   'sink.partition-commit.delay' = '1 h',
>>> >   'sink.partition-commit.policy.kind' = 'success-file',
>>> >   'sink.partition-commit.success-file.name' = '_SUCCESS',
>>> >   'sink.partition-commit.trigger' = 'process-time',
>>> >   'sink.rolling-policy.check-interval' = '20 min',
>>> >   'sink.rolling-policy.file-size' = '128MB',
>>> >   'sink.rolling-policy.rollover-interval' = '2 h'
>>> > );
>>> >
>>> >
>>> >
>>> > When I change the connector to blackhole, it immediately works
>>> > without errors. I have redacted the names and paths.
>>> >
>>> >
>>> >
>>> > Thanks,
>>> > Natu
>>> >
>>> >
>>> > On Thu, Jul 22, 2021 at 2:24 PM Timo Walther <twal...@apache.org> wrote:
>>> >
>>> >     Maybe you can also share which connector/format you are using?
>>> >     What is the DDL?
>>> >
>>> >     Regards,
>>> >     Timo
>>> >
>>> >
>>> >     On 22.07.21 14:11, Natu Lauchande wrote:
>>> >      > Hey Timo,
>>> >      >
>>> >      > Thanks for the reply.
>>> >      >
>>> >      > No custom file; we are using Flink SQL and submitting the job
>>> >      > directly through the SQL Editor UI, with Flink 1.13.1 as the
>>> >      > supported Flink version. There is no custom code and no JARs;
>>> >      > everything goes through Flink SQL in the UI.
>>> >      >
>>> >      > Thanks,
>>> >      > Natu
>>> >      >
>>> >      > On Thu, Jul 22, 2021 at 2:08 PM Timo Walther <twal...@apache.org> wrote:
>>> >      >
>>> >      >     Hi Natu,
>>> >      >
>>> >      >     Ververica Platform 2.5 has updated the bundled Hadoop
>>> >      >     version, but this should not result in a NoClassDefFoundError
>>> >      >     exception. How are you submitting your SQL jobs? You don't
>>> >      >     use Ververica's SQL service but have built a regular JAR
>>> >      >     file, right? If this is the case, can you share your pom.xml
>>> >      >     file with us? Does the Flink version stay constant at 1.12?
>>> >      >
>>> >      >     Regards,
>>> >      >     Timo
>>> >      >
>>> >      >     On 22.07.21 12:22, Natu Lauchande wrote:
>>> >      >      > Good day Flink community,
>>> >      >      >
>>> >      >      > Apache Flink/Ververica Community Edition - Question
>>> >      >      >
>>> >      >      > I am having an issue with my Flink SQL jobs since updating
>>> >      >      > from Flink 1.12/Ververica 2.4 to Ververica 2.5. For all the
>>> >      >      > jobs running on Parquet and S3 I am getting the following
>>> >      >      > error continuously:
>>> >      >      >
>>> >      >      > INITIALIZING to FAILED on 10.243.3.0:42337-2a3224 @
>>> >      >      > 10-243-3-0.flink-metrics.vvp-jobs.svc.cluster.local (dataPort=39309).
>>> >      >      >
>>> >      >      > java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
>>> >      >      >     at java.lang.Class.getDeclaredConstructors0(Native Method) ~[?:1.8.0_292]
>>> >      >      >     at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671) ~[?:1.8.0_292]
>>> >      >      >     at java.lang.Class.getDeclaredConstructors(Class.java:2020) ~[?:1.8.0_292]
>>> >      >      >     ....
>>> >      >      >     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) ~[?:1.8.0_292]
>>> >      >      >     at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:615) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:600) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:587) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:541) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperatorFactory(StreamConfig.java:322) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperator(OperatorChain.java:653) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperatorChain(OperatorChain.java:626) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:566) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperatorChain(OperatorChain.java:616) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:566) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperatorChain(OperatorChain.java:616) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:566) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperatorChain(OperatorChain.java:616) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:566) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:181) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:548) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:537) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:759) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
>>> >      >      > Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
>>> >      >      >     at java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_292]
>>> >      >      >     at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_292]
>>> >      >      >     at org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:64) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:65) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:48) ~[flink-dist_2.12-1.13.1-stream1.jar:1.13.1-stream1]
>>> >      >      >     at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_292]
>>> >      >      >     ... 57 more
>>> >      >      >
>>> >      >      > 2021-07-22 09:38:43,095 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Remove logical slot (SlotRequestId{4297879e795d0516e36a7c26ccc795b2}) for execution vertex (id cbc357ccb763df2852fee8c4fc7d55f2_0) from the physical slot (SlotRequestId{df7c49a6610b56f26aea214c05bcd9ed})
>>> >      >      >
>>> >      >      > 2021-07-22 09:38:43,096 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Release shared slot externally (SlotRequestId{df7c49a6610b56f26aea214c05bcd9ed})
>>> >      >      >
>>> >      >      > Everything works well when I roll back to Ververica v2.4. Has
>>> >      >      > anyone experienced this error before?
>>> >      >      >
>>> >      >      > Thanks,
>>> >      >      >
>>> >      >      > Natu
>>> >      >
>>> >
>>>
>>>
