As a quick update, the pattern for excluding those files in sbt-assembly is the following:
assemblyMergeStrategy in assembly := {
  case PathList(ps @ _*) if ps.last.endsWith(".DSA") || ps.last.endsWith(".SF") || ps.last.endsWith(".RSA") =>
    MergeStrategy.discard
  // Other MergeStrategies
}

2017-09-25 11:48 GMT+02:00 Federico D'Ambrosio <federico.dambro...@smartlab.ws>:

> Hi Urs,
>
> Thank you very much for your advice, I will look into excluding those
> files directly during the assembly.
>
> 2017-09-25 10:58 GMT+02:00 Urs Schoenenberger <urs.schoenenberger@tngtech.com>:
>
>> Hi Federico,
>>
>> Oh, I remember running into this problem some time ago. If I recall
>> correctly, this is not a Flink issue, but an issue with technically
>> incorrect jars from dependencies, which prevents the verification of the
>> manifest. I was using the maven-shade plugin back then and configured an
>> exclusion for these file types. I assume that sbt/sbt-assembly has a
>> similar option; this should be more stable than manually stripping the
>> jar.
>> Alternatively, you could try to find out which dependency puts the
>> .SF/etc. files there and exclude that dependency altogether; it might be
>> a transitive lib dependency that comes with Hadoop anyway, or simply
>> one that you don't need.
>>
>> Best,
>> Urs
>>
>> On 25.09.2017 10:09, Federico D'Ambrosio wrote:
>> > Hi Urs,
>> >
>> > Yes, the main class is set, just like you said.
>> >
>> > Still, I might have managed to get it working: during the assembly, some
>> > .SF, .DSA and .RSA files are put inside the META-INF folder of the jar,
>> > possibly coming from some of the new dependencies in the deps tree.
>> > Apparently, this caused this weird issue. Using an appropriate pattern
>> > for discarding the files during the assembly, or removing them via
>> > zip -d, should be enough (I sure hope so, since this is one of the worst
>> > issues I've come across).
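[Editor's note: Urs's alternative suggestion, excluding the offending dependency altogether, might look roughly like this in build.sbt. The excluded coordinates below are purely illustrative placeholders, not the confirmed culprit; the dependency that actually ships the signed jar would first have to be identified in the dependency tree, e.g. with the sbt-dependency-graph plugin.]

```scala
// Hypothetical sketch: "com.example" % "signed-lib" stands in for whichever
// transitive dependency actually ships the .SF/.DSA/.RSA signature files.
libraryDependencies += "org.apache.hive.hcatalog" % "hive-hcatalog-streaming" %
  "1.2.1000.2.6.1.0-129" exclude("com.example", "signed-lib")
```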
>> >
>> > Federico D'Ambrosio
>> >
>> > On 25 Sep 2017, 9:51 AM, "Urs Schoenenberger"
>> > <urs.schoenenber...@tngtech.com> wrote:
>> >
>> >> Hi Federico,
>> >>
>> >> Just guessing, but are you explicitly setting the Main-Class manifest
>> >> attribute for the jar that you are building?
>> >>
>> >> It should be something like:
>> >>
>> >> mainClass in (Compile, packageBin) :=
>> >>   Some("org.yourorg.YourFlinkJobMainClass")
>> >>
>> >> Best,
>> >> Urs
>> >>
>> >> On 23.09.2017 17:53, Federico D'Ambrosio wrote:
>> >>> Hello everyone,
>> >>>
>> >>> I'd like to submit to you this weird issue I'm having, hoping you
>> >>> could help me.
>> >>> Premise: I'm using sbt 0.13.6 for building, Scala 2.11.8 and Flink
>> >>> 1.3.2 compiled from sources against Hadoop 2.7.3.2.6.1.0-129 (HDP 2.6).
>> >>> I'm trying to implement a sink for Hive, so I added the following
>> >>> dependency in my build.sbt in order to use the Hive streaming
>> >>> capabilities:
>> >>>
>> >>> "org.apache.hive.hcatalog" % "hive-hcatalog-streaming" %
>> >>>   "1.2.1000.2.6.1.0-129"
>> >>>
>> >>> After importing this dependency, without even using it, if I try to
>> >>> flink run the job I get:
>> >>>
>> >>> org.apache.flink.client.program.ProgramInvocationException: The
>> >>> program's entry point class 'package.MainObj' was not found in the
>> >>> jar file.
>> >>>
>> >>> If I remove the dependency, everything goes back to normal.
>> >>> What is weird is that if I try to use sbt run in order to run the
>> >>> job, *it does find the Main class* and obviously crashes because of
>> >>> the missing Flink core dependencies (AbstractStateBackend missing and
>> >>> whatnot).
>> >>>
>> >>> Here are the complete dependencies of the project:
>> >>>
>> >>> "org.apache.flink" %% "flink-scala" % flinkVersion % "provided",
>> >>> "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
>> >>> "org.apache.flink" %% "flink-connector-kafka-0.10" % flinkVersion,
>> >>> "org.apache.flink" %% "flink-cep-scala" % flinkVersion,
>> >>> "org.apache.hive.hcatalog" % "hive-hcatalog-streaming" % "1.2.1000.2.6.1.0-129",
>> >>> "org.joda" % "joda-convert" % "1.8.3",
>> >>> "com.typesafe.play" %% "play-json" % "2.6.2",
>> >>> "org.mongodb.mongo-hadoop" % "mongo-hadoop-core" % "2.0.2",
>> >>> "org.scalactic" %% "scalactic" % "3.0.1",
>> >>> "org.scalatest" %% "scalatest" % "3.0.1" % "test",
>> >>> "de.javakaffee" % "kryo-serializers" % "0.42"
>> >>>
>> >>> Could it be an issue of dependency conflicts between the mongo-hadoop
>> >>> and Hive Hadoop versions (2.7.1 and 2.7.3.2.6.1.0-129 respectively,
>> >>> even though there is no issue between mongo-hadoop and Flink)? I'm
>> >>> even starting to think that Flink cannot handle big jars that well
>> >>> when it comes to classpath loading (before the new dependency the jar
>> >>> was 44M, afterwards it became 115M).
>> >>>
>> >>> Any help would be really appreciated.
>> >>> Kind regards,
>> >>> Federico
>> >>
>> >> --
>> >> Urs Schönenberger - urs.schoenenber...@tngtech.com
>> >>
>> >> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>> >> Geschäftsführer: Henrik Klagges, Dr. Robert Dahlke, Gerhard Müller
>> >> Sitz: Unterföhring * Amtsgericht München * HRB 135082
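[Editor's note: for reference, the `zip -d` route mentioned in the thread, i.e. stripping the signature files out of an already-built fat jar, could look like this; the jar name is illustrative.]

```shell
# Remove signature files from META-INF so jar verification no longer fails;
# quote the globs so the shell doesn't expand them before zip sees them.
zip -d myjob-assembly.jar 'META-INF/*.SF' 'META-INF/*.DSA' 'META-INF/*.RSA'
```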