Meanwhile, I have submitted a pull request (https://github.com/awslabs/emr-bootstrap-actions/pull/37) that lets users place their jars ahead of all other jars in the Spark classpath. This should serve as a temporary workaround for such class conflicts.
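Until that PR lands, a similar effect can be had at submit time with Spark's extraClassPath settings, which are prepended to the driver and executor classpaths. A rough sketch only (the jar path, version, and application names below are made up; point them at a parquet-hadoop jar matching what your Spark build expects):

    # Prepend a known-good parquet-hadoop jar so it is found before the EMR-provided one
    spark-submit \
      --driver-class-path /home/hadoop/extra-jars/parquet-hadoop-1.6.0rc3.jar \
      --conf spark.executor.extraClassPath=/home/hadoop/extra-jars/parquet-hadoop-1.6.0rc3.jar \
      --class com.example.MyJob my-job.jar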
Thanks,
Aniket

On Mon Jan 05 2015 at 22:13:47 Kelly, Jonathan <jonat...@amazon.com> wrote:

> I've noticed the same thing recently and will contact the appropriate
> owner soon. (I work for Amazon, so I'll go through internal channels and
> report back to this list.)
>
> In the meantime, I've found that editing spark-env.sh and putting the
> Spark assembly first in the classpath fixes the issue. I expect that the
> version of Parquet that's being included in the EMR libs just needs to be
> upgraded.
>
> ~ Jonathan Kelly
>
> From: Aniket Bhatnagar <aniket.bhatna...@gmail.com>
> Date: Sunday, January 4, 2015 at 10:51 PM
> To: Adam Gilmore <dragoncu...@gmail.com>, "user@spark.apache.org" <user@spark.apache.org>
> Subject: Re: Issue with Parquet on Spark 1.2 and Amazon EMR
>
> Can you confirm your EMR version? Could it be because of the classpath
> entries for EMRFS? You might face issues with using S3 without them.
>
> Thanks,
> Aniket
>
> On Mon, Jan 5, 2015, 11:16 AM Adam Gilmore <dragoncu...@gmail.com> wrote:
>
>> Just an update on this - I found that the script by Amazon was the
>> culprit - not exactly sure why. When I installed Spark manually onto the
>> EMR cluster (and did the manual configuration of all the EMR stuff), it
>> worked fine.
>>
>> On Mon, Dec 22, 2014 at 11:37 AM, Adam Gilmore <dragoncu...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I've just launched a new Amazon EMR cluster and used the script at:
>>>
>>> s3://support.elasticmapreduce/spark/install-spark
>>>
>>> to install Spark (this script was upgraded to support 1.2).
>>>
>>> I know there are tools to launch a Spark cluster in EC2, but I want to
>>> use EMR.
>>>
>>> Everything installs fine; however, when I go to read from a Parquet
>>> file, I end up with (the main exception):
>>>
>>> Caused by: java.lang.NoSuchMethodError:
>>> parquet.hadoop.ParquetInputSplit.<init>(Lorg/apache/hadoop/fs/Path;JJJ[Ljava/lang/String;[JLjava/lang/String;Ljava/util/Map;)V
>>> at
>>> parquet.hadoop.TaskSideMetadataSplitStrategy.generateTaskSideMDSplits(ParquetInputFormat.java:578)
>>> ... 55 more
>>>
>>> It seems to me like a version mismatch somewhere. Where is the
>>> parquet-hadoop jar coming from? Is it built into a fat jar for Spark?
>>>
>>> Any help would be appreciated. Note that 1.1.1 worked fine with
>>> Parquet files.
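P.S. For anyone wanting to try Jonathan's spark-env.sh workaround above: the idea is to make the Spark assembly (which bundles its own Parquet classes) appear before the EMR libs in SPARK_CLASSPATH. A minimal sketch, assuming the default install-spark layout (the exact paths and assembly jar name may differ on your cluster):

    # in /home/hadoop/spark/conf/spark-env.sh
    # put the Spark assembly ahead of the EMR-provided jars
    export SPARK_CLASSPATH="/home/hadoop/spark/lib/spark-assembly-1.2.0-hadoop2.4.0.jar:$SPARK_CLASSPATH"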