Maybe drop the exclusion for the parquet-provided profile?

Cheers
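For reference, a sketch of what the spark/pom.xml dependency might look like with the exclusion dropped, so that clearspring's transitive fastutil dependency is kept (untested; coordinates and version taken from the snippet quoted below):

```xml
<!-- Hypothetical: the clearspring dependency with the
     it.unimi.dsi:fastutil exclusion removed, so the full
     fastutil jar rides along into the Spark assembly. -->
<dependency>
  <groupId>com.clearspring.analytics</groupId>
  <artifactId>stream</artifactId>
  <version>2.7.0</version>
  <!-- previously: <exclusions> blocking it.unimi.dsi:fastutil -->
</dependency>
```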
On Wed, Feb 25, 2015 at 8:42 PM, Jim Kleckner <j...@cloudphysics.com> wrote:
> Inline
>
> On Wed, Feb 25, 2015 at 1:53 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Interesting. Looking at SparkConf.scala :
>>
>>     val configs = Seq(
>>       DeprecatedConfig("spark.files.userClassPathFirst",
>>         "spark.executor.userClassPathFirst", "1.3"),
>>       DeprecatedConfig("spark.yarn.user.classpath.first", null, "1.3",
>>         "Use spark.{driver,executor}.userClassPathFirst instead."))
>>
>> It seems spark.files.userClassPathFirst and spark.yarn.user.classpath.first
>> are deprecated.
>>
>
> Note that I did use the non-deprecated version,
> spark.executor.userClassPathFirst=true.
>
>>
>> On Wed, Feb 25, 2015 at 12:39 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> No, we should not add fastutil back. It's up to the app to bring
>>> dependencies it needs, and that's how I understand this issue. The
>>> question is really, how to get the classloader visibility right. It
>>> depends on where you need these classes. Have you looked into
>>> spark.files.userClassPathFirst and spark.yarn.user.classpath.first ?
>>>
>
> I noted that I tried this in my original email.
>
> The issue appears related to the fact that parquet is also creating a
> shaded jar, and that one leaves out the Long2LongOpenHashMap class.
>
> FYI, I have subsequently tried removing the exclusion from the Spark
> build, and that does cause the fastutil classes to be included and the
> example works...
>
> So, should the userClassPathFirst flag work, and is there a bug?
>
> Or is it reasonable to put in a pull request for the elimination of the
> exclusion?
>
>>
>>> On Wed, Feb 25, 2015 at 5:34 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>> > bq. depend on missing fastutil classes like Long2LongOpenHashMap
>>> >
>>> > Looks like Long2LongOpenHashMap should be added to the shaded jar.
>>> >
>>> > Cheers
>>> >
>>> > On Tue, Feb 24, 2015 at 7:36 PM, Jim Kleckner <j...@cloudphysics.com>
>>> > wrote:
>>> >>
>>> >> Spark includes the clearspring analytics package but intentionally
>>> >> excludes the dependencies of the fastutil package (see below).
>>> >>
>>> >> Spark includes parquet-column, which includes fastutil and relocates
>>> >> it under parquet/, but creates a shaded jar file which is incomplete
>>> >> because it shades out some of the fastutil classes, notably
>>> >> Long2LongOpenHashMap, which is present in the fastutil jar file that
>>> >> parquet-column is referencing.
>>> >>
>>> >> We are using more of the clearspring classes (e.g. QDigest), and
>>> >> those do depend on missing fastutil classes like Long2LongOpenHashMap.
>>> >>
>>> >> Even though I add them to our assembly jar file, the class loader
>>> >> finds the Spark assembly first, and we get runtime class loader
>>> >> errors when we try to use it.
>>> >>
>>> >> It is possible to put our jar file first, as described here:
>>> >> https://issues.apache.org/jira/browse/SPARK-939
>>> >> http://spark.apache.org/docs/1.2.0/configuration.html#runtime-environment
>>> >>
>>> >> which I tried with args to spark-submit:
>>> >>   --conf spark.driver.userClassPathFirst=true
>>> >>   --conf spark.executor.userClassPathFirst=true
>>> >> but we still get the class not found error.
>>> >>
>>> >> We have tried copying the source code for clearspring into our
>>> >> package and renaming the package, and that makes it appear to
>>> >> work... Is this risky? It certainly is ugly.
>>> >>
>>> >> Can anyone recommend a way to deal with this "dependency **ll"?
>>> >>
>>> >> === The spark/pom.xml file contains the following lines:
>>> >>
>>> >>     <dependency>
>>> >>       <groupId>com.clearspring.analytics</groupId>
>>> >>       <artifactId>stream</artifactId>
>>> >>       <version>2.7.0</version>
>>> >>       <exclusions>
>>> >>         <exclusion>
>>> >>           <groupId>it.unimi.dsi</groupId>
>>> >>           <artifactId>fastutil</artifactId>
>>> >>         </exclusion>
>>> >>       </exclusions>
>>> >>     </dependency>
>>> >>
>>> >> === The parquet-column/pom.xml file contains:
>>> >>
>>> >>     <artifactId>maven-shade-plugin</artifactId>
>>> >>     <executions>
>>> >>       <execution>
>>> >>         <phase>package</phase>
>>> >>         <goals>
>>> >>           <goal>shade</goal>
>>> >>         </goals>
>>> >>         <configuration>
>>> >>           <minimizeJar>true</minimizeJar>
>>> >>           <artifactSet>
>>> >>             <includes>
>>> >>               <include>it.unimi.dsi:fastutil</include>
>>> >>             </includes>
>>> >>           </artifactSet>
>>> >>           <relocations>
>>> >>             <relocation>
>>> >>               <pattern>it.unimi.dsi</pattern>
>>> >>               <shadedPattern>parquet.it.unimi.dsi</shadedPattern>
>>> >>             </relocation>
>>> >>           </relocations>
>>> >>         </configuration>
>>> >>       </execution>
>>> >>     </executions>
>>> >>
>>> >> --
>>> >> View this message in context:
>>> >> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-excludes-fastutil-dependencies-we-need-tp21794.html
>>> >> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> >> For additional commands, e-mail: user-h...@spark.apache.org
>>> >>
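As an alternative to copying and renaming the clearspring sources by hand, a maven-shade relocation in the application's own build might achieve the same effect mechanically. This is only a sketch, untested against this project; the `com.myorg.shaded` prefix is a placeholder to be replaced with your own package:

```xml
<!-- Sketch: relocate clearspring and fastutil under a private prefix
     in the application assembly, so the classes loaded at runtime
     cannot collide with (or be shadowed by) the Spark assembly. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>com.clearspring.analytics</pattern>
        <shadedPattern>com.myorg.shaded.com.clearspring.analytics</shadedPattern>
      </relocation>
      <relocation>
        <pattern>it.unimi.dsi</pattern>
        <shadedPattern>com.myorg.shaded.it.unimi.dsi</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

The shade plugin rewrites both the copied classes and all bytecode references to them, which is essentially what the manual package rename does, but repeatably at package time.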