Hi Aljoscha,

We need flink-shaded-hadoop2-uber.jar because there is no Hadoop distribution on the instance from which the Flink session and jobs are managed, and the process that launches Flink is not a Java process; it execs a process that calls the flink script.

-Cliff

On Tue, Aug 21, 2018 at 5:11 AM Aljoscha Krettek <aljos...@apache.org> wrote:
> Hi Cliff,
>
> Do you actually need the flink-shaded-hadoop2-uber.jar in lib? If you're
> running on YARN, you should be able to just remove it, because with YARN
> you will have Hadoop on the classpath anyway.
>
> Aljoscha
>
> On 21. Aug 2018, at 03:45, vino yang <yanghua1...@gmail.com> wrote:
>
> Hi Cliff,
>
> If so, you can explicitly exclude the Avro dependency from the related
> dependencies (using <exclude>) and then directly declare a dependency on
> the Avro version you need.
>
> Thanks, vino.
>
> Cliff Resnick <cre...@gmail.com> wrote on Tue, Aug 21, 2018 at 5:13 AM:
>
>> Hi Vino,
>>
>> Unfortunately, I'm still stuck here. By moving the Avro dependency chain
>> to lib (and removing it from the user jar), my OCFs decode, but I get the error
>> described here:
>>
>> https://github.com/confluentinc/schema-registry/pull/509
>>
>> However, the Flink fix described in the PR above was to move the Avro
>> dependency to the user jar. But since I'm using YARN, I'm required to
>> have flink-shaded-hadoop2-uber.jar loaded from lib -- and that has
>> Avro bundled un-shaded. So I'm back to the original problem...
>>
>> Any advice is welcome!
>>
>> -Cliff
>>
>>
>> On Mon, Aug 20, 2018 at 1:42 PM Cliff Resnick <cre...@gmail.com> wrote:
>>
>>> Hi Vino,
>>>
>>> You were right in your assumption -- unshaded Avro was being added to
>>> our application jar via a third-party dependency. Excluding it in packaging
>>> fixed the issue. For the record, it looks like flink-avro must be loaded from
>>> lib or there will be errors in checkpoint restores.
>>>
>>> On Mon, Aug 20, 2018 at 8:43 AM Cliff Resnick <cre...@gmail.com> wrote:
>>>
>>>> Hi Vino,
>>>>
>>>> Thanks for the explanation, but the job only ever uses the Avro (1.8.2)
>>>> pulled in by flink-formats/avro, so it's not a class version conflict
>>>> there.
>>>>
>>>> I'm using default child-first loading.
>>>> It might be a further transitive
>>>> dependency, though it's not clear from the stack trace or from stepping
>>>> through the process. When I get a chance I'll look further into it, but in
>>>> case anyone is experiencing similar problems, what is clear is that
>>>> classloader order does matter with Avro.
>>>>
>>>> On Sun, Aug 19, 2018, 11:36 PM vino yang <yanghua1...@gmail.com> wrote:
>>>>
>>>>> Hi Cliff,
>>>>>
>>>>> My personal guess is that this may be caused by a conflict between the
>>>>> job's Avro and the Avro that the Flink framework itself relies on.
>>>>> Flink provides some configuration parameters that allow you to
>>>>> determine the order of the classloaders yourself. [1]
>>>>> Alternatively, you can debug classloading as described in the
>>>>> documentation. [2]
>>>>>
>>>>> [1]:
>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/config.html
>>>>> [2]:
>>>>> https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/debugging_classloading.html
>>>>>
>>>>> Thanks, vino.
>>>>>
>>>>> Cliff Resnick <cre...@gmail.com> wrote on Mon, Aug 20, 2018 at 10:40 AM:
>>>>>
>>>>>> Our Flink/YARN pipeline has been reading Avro from Kafka for a while
>>>>>> now. We just introduced a source of Avro OCF (Object Container Files)
>>>>>> read from S3. The Kafka Avro continued to decode without incident, but the
>>>>>> OCF files failed 100% of the time with anomalous parse errors in the
>>>>>> decoding phase, after the schema and codec were successfully read from
>>>>>> them. The pipeline would work on my laptop, and when I submitted a test
>>>>>> Main program to the Flink Session in YARN, that would also successfully
>>>>>> decode. Only the actual pipeline run from the TaskManager failed. At one
>>>>>> point I even remote debugged the TaskManager process and stepped through
>>>>>> what looked like a normal Avro decode (if you can describe Avro code as
>>>>>> normal!) -- until it abruptly failed with an int decode or what-have-you.
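When a decode fails with no classloading error, one way to narrow it down (in the spirit of the classloading-debugging page linked in [2]) is to ask the JVM which jar, and which classloader, actually supplied the Avro classes on the failing path. The sketch below is a hypothetical helper: `WhereLoaded` and `describe` are illustrative names, not Flink or Avro API, and it uses only JDK reflection. It assumes you run it with the classpath you want to inspect, or call `describe(...)` from code that uses the same classloader as the failing operator.

```java
import java.security.CodeSource;

/**
 * Hypothetical diagnostic helper (not part of Flink or Avro): reports which
 * classloader defined a class and which jar or directory it was loaded from.
 */
public class WhereLoaded {

    /** Describes where the named class comes from, as seen by the given loader. */
    static String describe(String className, ClassLoader loader) {
        try {
            // initialize = false: resolve the class without running its static initializers
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes defined by the bootstrap loader (the JDK itself) have no code source
            String location = (src == null || src.getLocation() == null)
                    ? "<bootstrap/JDK>"
                    : src.getLocation().toString();
            return className + " <- " + location + " (defined by " + cls.getClassLoader() + ")";
        } catch (ClassNotFoundException e) {
            return className + " <- not visible to " + loader;
        } catch (LinkageError e) {
            // e.g. NoClassDefFoundError when a class's own dependencies are missing
            return className + " <- found but failed to link: " + e;
        }
    }

    public static void main(String[] args) {
        // Assumption: by default, probe two Avro classes; pass other class names as arguments
        ClassLoader loader = WhereLoaded.class.getClassLoader();
        String[] names = args.length > 0
                ? args
                : new String[] {"org.apache.avro.Schema", "org.apache.avro.io.BinaryDecoder"};
        for (String name : names) {
            System.out.println(describe(name, loader));
        }
    }
}
```

Inside a running job, the more telling variant is probably to call `describe(...)` from within the operator that fails, passing that function's own `getClass().getClassLoader()`, because the standalone `main` only sees whatever classpath it was started with.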
>>>>>>
>>>>>> This stumped me for a while, but I finally tried moving
>>>>>> flink-avro.jar from lib to the application jar, and that fixed it. I'm
>>>>>> not sure why this is, especially since there were no typical
>>>>>> classloader-type errors. This issue was observed on both Flink 1.5 and
>>>>>> 1.6 in FLIP-6 mode.
>>>>>>
>>>>>> -Cliff
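The problem in this thread comes down to more than one copy of Avro being reachable: un-shaded inside flink-shaded-hadoop2-uber.jar, or pulled into the application jar by a third-party dependency. A small check that lists every classpath location providing a given class file can make such duplicates visible. This is a sketch, not a Flink tool: `FindDuplicates`, `copiesOf`, and `resourceName` are hypothetical names, and it assumes you run it with a classpath resembling the TaskManager's (the jars in lib plus the job jar), since the result depends entirely on what the chosen classloader can see.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper (not part of Flink): lists every classpath location that
 * provides a given class file, to spot duplicate (e.g. un-shaded) copies of a library.
 */
public class FindDuplicates {

    /** Converts a binary class name to its resource path, e.g. a.b.C -> a/b/C.class. */
    static String resourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns every URL at which the loader (and, via delegation, its parents) finds the class file. */
    static List<URL> copiesOf(String className, ClassLoader loader) throws IOException {
        return Collections.list(loader.getResources(resourceName(className)));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: probe Avro's Schema class unless another class name is given
        String className = args.length > 0 ? args[0] : "org.apache.avro.Schema";
        List<URL> copies = copiesOf(className, FindDuplicates.class.getClassLoader());
        System.out.println(copies.size() + " copies of " + className);
        for (URL url : copies) {
            // For a jar entry this looks like jar:file:/path/to/some.jar!/org/apache/avro/Schema.class
            System.out.println("  " + url);
        }
        if (copies.size() > 1) {
            System.out.println("More than one copy: which one wins depends on classloader order");
        }
    }
}
```

For example, `java -cp "lib/*:my-job.jar:." FindDuplicates org.apache.avro.Schema`, where the paths and `my-job.jar` are placeholders. With the standard `ClassLoader.getResources` delegation, this reports every copy visible anywhere in the loader hierarchy, not the one a child-first or parent-first resolution order would actually pick; it only tells you whether a conflict is possible.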