I don't think that we're planning to drop Java 7 support for Spark 2.0.

Personally, I would recommend using Java 8 if you're running Spark 1.5.0+
and are using SQL/DataFrames, so that you can benefit from improvements to
code cache flushing in the Java 8 JVMs. Spark SQL's generated classes can
fill up the JVM's code cache, which causes the JIT compiler to stop
compiling new bytecode. Empirically, the Java 8 JVMs seem to be better at
flushing this code cache, thereby avoiding the problem.
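
If you're stuck on Java 7 in the meantime, you can at least enlarge the code
cache and make sure code cache flushing is enabled. A rough, untested sketch
(these are standard HotSpot flags, but the 512m value is only illustrative and
your_app.py is just a placeholder):

    spark-submit \
      --driver-java-options "-XX:ReservedCodeCacheSize=512m -XX:+UseCodeCacheFlushing" \
      --conf "spark.executor.extraJavaOptions=-XX:ReservedCodeCacheSize=512m -XX:+UseCodeCacheFlushing" \
      your_app.py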

TL;DR: I'd prefer to run Java 8 with Spark if given the choice.

On Tue, Jan 5, 2016 at 4:07 PM, Koert Kuipers <ko...@tresata.com> wrote:

> hey evil admin:)
> i think the bit about java was from me?
> if so, i meant to indicate that the reality for us is that java is 1.7 on
> most (all?) clusters. i do not believe spark prefers java 1.8. my point was
> that even though java 1.7 is getting old as well, it would be a major issue
> for me if spark dropped java 1.7 support.
>
> On Tue, Jan 5, 2016 at 6:53 PM, Carlile, Ken <carli...@janelia.hhmi.org>
> wrote:
>
>> As one of the evil administrators that runs a RHEL 6 cluster, we already
>> provide quite a few different versions of python on our cluster pretty darn
>> easily. All you need is a separate install directory and to set the
>> PYTHON_HOME environment variable to point to the correct python, then have
>> the users make sure the correct python is in their PATH. I understand that
>> other administrators may not be so compliant.
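>>
>> For instance, roughly (the paths and the PYTHON_HOME convention are just how
>> we happen to do things locally, so treat this as illustrative):
>>
>>     # each python version gets its own install prefix; users opt in by
>>     # pointing PYTHON_HOME at the one they want and putting it on their PATH
>>     export PYTHON_HOME=/usr/local/python/2.7.11
>>     export PATH="$PYTHON_HOME/bin:$PATH"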
>>
>> Saw a small bit about the java version in there; does Spark currently
>> prefer Java 1.8.x?
>>
>> —Ken
>>
>> On Jan 5, 2016, at 6:08 PM, Josh Rosen <joshro...@databricks.com> wrote:
>>
>> Note that you _can_ use a Python 2.7 `ipython` executable on the driver
>>> while continuing to use a vanilla `python` executable on the executors
>>
>>
>> Whoops, just to be clear, this should actually read "while continuing to
>> use a vanilla `python` 2.7 executable".
>>
>> On Tue, Jan 5, 2016 at 3:07 PM, Josh Rosen <joshro...@databricks.com>
>> wrote:
>>
>>> Yep, the driver and executors need to have compatible Python versions. I
>>> think that there are some bytecode-level incompatibilities between 2.6 and
>>> 2.7 which would impact the deserialization of Python closures, so I think
>>> you need to be running the same 2.x version for all communicating Spark
>>> processes. Note that you _can_ use a Python 2.7 `ipython` executable on the
>>> driver while continuing to use a vanilla `python` executable on the
>>> executors (we have environment variables which allow you to control these
>>> separately).
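>>>
>>> For example, something roughly like this at launch time (the variable names
>>> are from memory, so double-check them against the docs for your Spark
>>> version):
>>>
>>>     # driver runs IPython (on Python 2.7) while the executors keep using
>>>     # the plain `python` executable already installed on the workers
>>>     export PYSPARK_DRIVER_PYTHON=ipython
>>>     export PYSPARK_PYTHON=python
>>>     bin/pyspark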
>>>
>>> On Tue, Jan 5, 2016 at 3:05 PM, Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>>
>>>> I think all the slaves need the same (or a compatible) version of
>>>> Python installed since they run Python code in PySpark jobs natively.
>>>>
>>>> On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers <ko...@tresata.com> wrote:
>>>>
>>>>> interesting, i didn't know that!
>>>>>
>>>>> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>
>>>>>> even if python 2.7 was needed only on this one machine that launches
>>>>>> the app, we cannot ship it with our software because it's gpl licensed
>>>>>>
>>>>>> Not to nitpick, but maybe this is important. The Python license is
>>>>>> GPL-compatible but not GPL <https://docs.python.org/3/license.html>:
>>>>>>
>>>>>> Note GPL-compatible doesn’t mean that we’re distributing Python under
>>>>>> the GPL. All Python licenses, unlike the GPL, let you distribute a
>>>>>> modified version without making your changes open source. The
>>>>>> GPL-compatible licenses make it possible to combine Python with other
>>>>>> software that is released under the GPL; the others don’t.
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers <ko...@tresata.com>
>>>>>> wrote:
>>>>>>
>>>>>>> i do not think so.
>>>>>>>
>>>>>>> does the python 2.7 need to be installed on all slaves? if so, we do
>>>>>>> not have direct access to those.
>>>>>>>
>>>>>>> also, spark is easy for us to ship with our software since it's
>>>>>>> apache 2 licensed, and it only needs to be present on the machine that
>>>>>>> launches the app (thanks to yarn).
>>>>>>> even if python 2.7 was needed only on this one machine that launches
>>>>>>> the app, we cannot ship it with our software because it's gpl licensed,
>>>>>>> so the client would have to download and install it themselves. this
>>>>>>> would mean it's an independent install which has to be audited and
>>>>>>> approved, and now you are in for a lot of fun. basically it will never
>>>>>>> happen.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <joshro...@databricks.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> If users are able to install Spark 2.0 on their RHEL clusters, then
>>>>>>>> I imagine that they're also capable of installing a standalone Python
>>>>>>>> alongside that Spark version (without changing Python systemwide). For
>>>>>>>> instance, Anaconda/Miniconda make it really easy to install Python
>>>>>>>> 2.7.x/3.x without impacting / changing the system Python and don't
>>>>>>>> require any special permissions to install (you don't need root / sudo
>>>>>>>> access). Does this address the Python versioning concerns for RHEL 
>>>>>>>> users?
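>>>>>>>>
>>>>>>>> Roughly, that install looks like this (the installer URL/filename may
>>>>>>>> differ for your platform, and the prefix is just an example):
>>>>>>>>
>>>>>>>>     # install Miniconda into the user's home directory; no root/sudo needed
>>>>>>>>     wget https://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh
>>>>>>>>     bash Miniconda-latest-Linux-x86_64.sh -b -p "$HOME/miniconda"
>>>>>>>>     # point PySpark at it without touching the system python
>>>>>>>>     export PYSPARK_PYTHON="$HOME/miniconda/bin/python"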
>>>>>>>>
>>>>>>>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> yeah, the practical concern is that we have no control over the java
>>>>>>>>> or python versions on large company clusters. our current reality for
>>>>>>>>> the vast majority of them is java 7 and python 2.6, no matter how
>>>>>>>>> outdated that is.
>>>>>>>>>
>>>>>>>>> i don't like it either, but i cannot change it.
>>>>>>>>>
>>>>>>>>> we currently don't use pyspark so i have no stake in this, but if we
>>>>>>>>> did i can assure you we would not upgrade to spark 2.x if python 2.6
>>>>>>>>> was dropped. no point in developing something that doesn't run for the
>>>>>>>>> majority of customers.
>>>>>>>>>
>>>>>>>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>>>>>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> As I pointed out in my earlier email, RHEL will support Python 2.6
>>>>>>>>>> until 2020. So I'm assuming these large companies will have the
>>>>>>>>>> option of riding out Python 2.6 until then.
>>>>>>>>>>
>>>>>>>>>> Are we seriously saying that Spark should likewise support Python
>>>>>>>>>> 2.6 for the next several years? Even though the core Python devs
>>>>>>>>>> stopped supporting it in 2013?
>>>>>>>>>>
>>>>>>>>>> If that's not what we're suggesting, then when, roughly, can we
>>>>>>>>>> drop support? What are the criteria?
>>>>>>>>>>
>>>>>>>>>> I understand the practical concern here. If companies are stuck
>>>>>>>>>> using 2.6, it doesn't matter to them that it is deprecated. But
>>>>>>>>>> balancing that concern against the maintenance burden on this
>>>>>>>>>> project, I would say that "upgrade to Python 2.7 or stay on Spark
>>>>>>>>>> 1.6.x" is a reasonable position to take. There are many tiny
>>>>>>>>>> annoyances one has to put up with to support 2.6.
>>>>>>>>>>
>>>>>>>>>> I suppose if our main PySpark contributors are fine putting up
>>>>>>>>>> with those annoyances, then maybe we don't need to drop support just 
>>>>>>>>>> yet...
>>>>>>>>>>
>>>>>>>>>> Nick
>>>>>>>>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <
>>>>>>>>>> ju...@esbet.es> wrote:
>>>>>>>>>>
>>>>>>>>>>> Unfortunately, Koert is right.
>>>>>>>>>>>
>>>>>>>>>>> I've been in a couple of projects using Spark (banking industry)
>>>>>>>>>>> where CentOS + Python 2.6 is the toolbox available.
>>>>>>>>>>>
>>>>>>>>>>> That said, I believe it should not be a concern for Spark.
>>>>>>>>>>> Python 2.6 is old and busted, which is totally opposite to the Spark
>>>>>>>>>>> philosophy IMO.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Jan 5, 2016, at 8:07 PM, Koert Kuipers <ko...@tresata.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> rhel/centos 6 ships with python 2.6, doesn't it?
>>>>>>>>>>>
>>>>>>>>>>> if so, i still know plenty of large companies where python 2.6
>>>>>>>>>>> is the only option. asking them for python 2.7 is not going to work
>>>>>>>>>>>
>>>>>>>>>>> so i think it's a bad idea
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>>>>>>>>> juliet.hougl...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I don't see a reason Spark 2.0 would need to support Python
>>>>>>>>>>>> 2.6. At this point, Python 3 should be the default that is 
>>>>>>>>>>>> encouraged.
>>>>>>>>>>>> Most organizations acknowledge that 2.7 is common, but lagging
>>>>>>>>>>>> behind the version they should theoretically use. Dropping python
>>>>>>>>>>>> 2.6 support sounds very reasonable to me.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>>>>>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> +1
>>>>>>>>>>>>>
>>>>>>>>>>>>> Red Hat supports Python 2.6 on RHEL 5 until 2020
>>>>>>>>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>>>>>>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>>>>>>>>>>> developers stopped supporting it in 2013. RHEL 5 is not a good enough
>>>>>>>>>>>>> reason to continue support for Python 2.6 IMO.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I
>>>>>>>>>>>>> believe we currently do).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Nick
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <
>>>>>>>>>>>>> allenzhang...@126.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> plus 1,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> we are currently using python 2.7.2 in our production environment.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <
>>>>>>>>>>>>>> meethu.mat...@flytxt.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>> We use Python 2.7
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Meethu Mathew
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <
>>>>>>>>>>>>>> r...@databricks.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does anybody here care about us dropping support for Python
>>>>>>>>>>>>>>> 2.6 in Spark 2.0?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects
>>>>>>>>>>>>>>> (e.g. json parsing) when compared with Python 2.7. Some libraries
>>>>>>>>>>>>>>> that Spark depends on have stopped supporting 2.6. We can still
>>>>>>>>>>>>>>> convince the library maintainers to support 2.6, but it will be
>>>>>>>>>>>>>>> extra work. I'm curious if anybody still uses Python 2.6 to run
>>>>>>>>>>>>>>> Spark.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>
>>
>>
>
