Thanks, JB

On "thoughts":

- Cloudera 5.13 still defaults to Spark 1.6 even though a 2.2 parcel is
available (HWX ships both)
- Cloudera's support for Spark 2 comes with a list of known issues (
https://www.cloudera.com/documentation/spark2/latest/topics/spark2_known_issues.html
)
  - I am not sure whether HBaseIO would be affected
  - I am not sure whether Structured Streaming has implications here
  - these exceptions might stop customers from running Spark 2 at all due
to support agreements
- Spark 2.3 (due around end of year) will drop Scala 2.10 support
- IBM's now-defunct distro only ships Spark 1.6
- Oozie doesn't have a Spark 2 action (you need to use a shell action
instead)
- There are still a lot of folks with code running on 1.3, 1.4, and 1.5
- Spark 2.2+ requires Java 8 as well, while pre-2.2 ran on Java 7 like Beam
(not sure whether this has other implications for the cross-deployment
nature of Beam)

My first impression of Beam was really favourable, as it all just worked
first time on a CDH Spark 1.6 cluster.  For us it is a lack of resources to
refactor legacy code that delays the 2.2 push.
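
For context, the kind of pipeline that "just worked" was nothing exotic. A
minimal sketch, assuming the Beam 2.x Java SDK with the Spark runner
artifact on the classpath (the class name and sample elements are
illustrative, not from our actual code):

import org.apache.beam.runners.spark.SparkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;

public class MinimalSparkPipeline {
  public static void main(String[] args) {
    // The runner is just a pipeline option; this code is identical whether
    // the cluster runs Spark 1.6 or 2.x.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    options.setRunner(SparkRunner.class);

    Pipeline p = Pipeline.create(options);

    // Trivial pipeline: create a few elements and count occurrences.
    p.apply(Create.of("alpha", "beta", "gamma", "alpha"))
     .apply(Count.<String>perElement());

    p.run().waitUntilFinish();
  }
}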

With that said, I think it is very reasonable to have a clear cut-off in
Beam, especially if dual support limits progress or causes headaches in
packaging, robustness, etc.  I'd recommend a six-month timeframe, which
might align with Spark 2.3?
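
For reference, my reading of option 1 (JB's mail quoted below) is that users
would pick the runner artifact matching their cluster's Spark line. A rough
sketch of what that might look like in a POM (the artifact ids here are my
guess from the PR thread, not final names):

<dependencies>
  <!-- pick the artifact matching your cluster; ids are illustrative -->
  <dependency>
    <groupId>org.apache.beam</groupId>
    <!-- or beam-runners-spark2 for a Spark 2.x cluster -->
    <artifactId>beam-runners-spark1</artifactId>
    <version>${beam.version}</version>
  </dependency>
</dependencies>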

Hope this helps,
Tim

On Mon, Nov 13, 2017 at 10:07 AM, Neville Dipale <nevilled...@gmail.com>
wrote:

> Hi JB,
>
>
>   [X] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
>   [ ] 0 (I don't care ;))
>   [ ] -1, I would like to still support Spark 1.x, and so keep support
> of both Spark 1.x and 2.x (please provide a specific comment)
>
> On 13 November 2017 at 10:32, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> Hi Beamers,
>>
>> I'm forwarding this discussion & vote from the dev mailing list to the
>> user mailing list.
>> The goal is to get your feedback as users.
>>
>> Basically, we have two options:
>> 1. Right now, in the PR, we support both Spark 1.x and 2.x using three
>> artifacts (common, spark1, spark2). You, as users, pick spark1 or spark2
>> in your dependency set depending on the target Spark version you want.
>> 2. The other option is to upgrade and focus on Spark 2.x in Beam 2.3.0.
>> If you still want to use Spark 1.x, then you will be stuck at Beam
>> 2.2.0.
>>
>> Thoughts?
>>
>> Thanks!
>> Regards
>> JB
>>
>>
>> -------- Forwarded Message --------
>> Subject: [VOTE] Drop Spark 1.x support to focus on Spark 2.x
>> Date: Wed, 8 Nov 2017 08:27:58 +0100
>> From: Jean-Baptiste Onofré <j...@nanthrax.net>
>> Reply-To: d...@beam.apache.org
>> To: d...@beam.apache.org
>>
>> Hi all,
>>
>> as you might know, we are working on Spark 2.x support in the Spark
>> runner.
>>
>> I'm working on a PR about that:
>>
>> https://github.com/apache/beam/pull/3808
>>
>> Today, we have something working with both Spark 1.x and 2.x from a code
>> standpoint, but I still have to deal with dependencies. This is the first
>> step of the update, as I'm still using RDDs; the second step would be to
>> support DataFrames (but for that, I would need PCollection elements with
>> schemas, which is another topic that Eugene, Reuven, and I are discussing).
>>
>> However, as all major distributions now ship Spark 2.x, I don't think
>> it's necessary to keep supporting Spark 1.x.
>>
>> If we agree, I will update and clean up the PR to support and focus only
>> on Spark 2.x.
>>
>> So, that's why I'm calling for a vote:
>>
>>   [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
>>   [ ] 0 (I don't care ;))
>>   [ ] -1, I would like to still support Spark 1.x, and so keep support
>> of both Spark 1.x and 2.x (please provide a specific comment)
>>
>> This vote is open for 48 hours (I have the commits ready, just waiting
>> for the end of the vote to push to the PR).
>>
>> Thanks!
>> Regards
>> JB
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>
>
