Meanwhile, I have submitted a pull request (https://github.com/awslabs/emr-bootstrap-actions/pull/37) that lets users place their jars ahead of all other jars in the Spark classpath. This should serve as a temporary workaround for such class conflicts.
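Until that PR lands, a similar effect can be had at submit time with Spark's extraClassPath settings, which are prepended to the driver and executor classpaths. A rough sketch only (the jar path, version, and application names below are made up; point them at a parquet-hadoop jar matching what your Spark build expects):

    # Prepend a known-good parquet-hadoop jar so it is found before the EMR-provided one
    spark-submit \
      --driver-class-path /home/hadoop/extra-jars/parquet-hadoop-1.6.0rc3.jar \
      --conf spark.executor.extraClassPath=/home/hadoop/extra-jars/parquet-hadoop-1.6.0rc3.jar \
      --class com.example.MyJob my-job.jar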
Thanks,
Aniket

On Mon Jan 05 2015 at 22:13:47 Kelly, Jonathan <jonat...@amazon.com> wrote:

> I've noticed the same thing recently and will contact the appropriate
> owner soon. (I work for Amazon, so I'll go through internal channels and
> report back to this list.)
>
> In the meantime, I've found that editing spark-env.sh and putting the
> Spark assembly first in the classpath fixes the issue. I expect that the
> version of Parquet that's being included in the EMR libs just needs to be
> upgraded.
>
> ~ Jonathan Kelly
>
> From: Aniket Bhatnagar <aniket.bhatna...@gmail.com>
> Date: Sunday, January 4, 2015 at 10:51 PM
> To: Adam Gilmore <dragoncu...@gmail.com>, "user@spark.apache.org" <user@spark.apache.org>
> Subject: Re: Issue with Parquet on Spark 1.2 and Amazon EMR
>
> Can you confirm your EMR version? Could it be because of the classpath
> entries for EMRFS? You might face issues with using S3 without them.
>
> Thanks,
> Aniket
>
> On Mon, Jan 5, 2015, 11:16 AM Adam Gilmore <dragoncu...@gmail.com> wrote:
>
>> Just an update on this - I found that the script by Amazon was the
>> culprit - not exactly sure why. When I installed Spark manually onto the
>> EMR cluster (and did the manual configuration of all the EMR stuff), it
>> worked fine.
>>
>> On Mon, Dec 22, 2014 at 11:37 AM, Adam Gilmore <dragoncu...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I've just launched a new Amazon EMR cluster and used the script at:
>>>
>>> s3://support.elasticmapreduce/spark/install-spark
>>>
>>> to install Spark (this script was upgraded to support 1.2).
>>>
>>> I know there are tools to launch a Spark cluster in EC2, but I want to
>>> use EMR.
>>>
>>> Everything installs fine; however, when I go to read from a Parquet
>>> file, I end up with (the main exception):
>>>
>>> Caused by: java.lang.NoSuchMethodError:
>>> parquet.hadoop.ParquetInputSplit.<init>(Lorg/apache/hadoop/fs/Path;JJJ[Ljava/lang/String;[JLjava/lang/String;Ljava/util/Map;)V
>>> at
>>> parquet.hadoop.TaskSideMetadataSplitStrategy.generateTaskSideMDSplits(ParquetInputFormat.java:578)
>>> ... 55 more
>>>
>>> It seems to me like a version mismatch somewhere. Where is the
>>> parquet-hadoop jar coming from? Is it built into a fat jar for Spark?
>>>
>>> Any help would be appreciated. Note that 1.1.1 worked fine with
>>> Parquet files.
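P.S. For anyone wanting to try Jonathan's spark-env.sh workaround above: the idea is to make the Spark assembly (which bundles its own Parquet classes) appear before the EMR libs in SPARK_CLASSPATH. A minimal sketch, assuming the default install-spark layout (the exact paths and assembly jar name may differ on your cluster):

    # in /home/hadoop/spark/conf/spark-env.sh
    # put the Spark assembly ahead of the EMR-provided jars
    export SPARK_CLASSPATH="/home/hadoop/spark/lib/spark-assembly-1.2.0-hadoop2.4.0.jar:$SPARK_CLASSPATH"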