Hi all,

On dependencies, we've bumped library versions when we think it's safe and
the APIs in the new version are compatible, or when the library isn't
leaked to the app classpath (e.g. the JUnit version bump). I think the JIRAs
Arun mentioned fall into one of those categories. Steve can do a better job
of explaining this than me, but we haven't bumped things like Jetty or Guava
because they are on the classpath and their newer versions are not
compatible. There is this line in the compat guidelines:

   - Existing MapReduce, YARN & HDFS applications and frameworks should
   work unmodified within a major release i.e. Apache Hadoop ABI is supported.

Since Hadoop apps can and do depend on the Hadoop classpath, the classpath
is effectively part of our API. I'm sure there are user apps out there that
will break if we make incompatible changes to the classpath. I haven't read
up on the MR JIRA Arun mentioned, but MR isn't the only YARN app out
there.
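
To make the "classpath is part of our API" point concrete, here is a rough
sketch of how an app can check which jar a class actually came from. The
class name WhichJar and the "(bootstrap or unknown)" marker are made up for
illustration; the only real API used is Class#getProtectionDomain and
CodeSource#getLocation from the JDK.

```java
import java.security.CodeSource;

public class WhichJar {

  /** Returns where a class was loaded from, or a marker if that is unknown. */
  public static String locationOf(Class<?> clazz) {
    CodeSource source = clazz.getProtectionDomain().getCodeSource();
    // Classes loaded by the bootstrap loader (the JDK itself) have no CodeSource.
    if (source == null || source.getLocation() == null) {
      return "(bootstrap or unknown)";
    }
    return source.getLocation().toString();
  }

  public static void main(String[] args) throws ClassNotFoundException {
    // Default to this class if no class name is passed on the command line.
    String name = args.length > 0 ? args[0] : WhichJar.class.getName();
    System.out.println(name + " <- " + locationOf(Class.forName(name)));
  }
}
```

Run from inside an app with the name of a third-party class it uses (a Guava
class, say), this should show whether that class comes from the app's own
jar or from a jar that Hadoop put on the classpath.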

Sticking to the theme of "work unmodified", let's think about the user
effort required to upgrade their JDK. This can be a very expensive task. It
might need approval up and down the org, meaning lots of certification,
testing, and signoff. Considering the amount of user effort involved here,
it really seems like dropping a JDK is something that should only happen in
a major release. Otherwise, there's the potential for nasty surprises in a
supposedly "minor" release.

That said, we are in an unhappy place right now regarding JDK6, and it's
true that almost everyone's moved off of JDK6 at this point. So, I'd be
okay with an intermediate 2.x release that drops JDK6 support (but no
incompatible changes to the classpath like Guava). This is basically free,
and we could start using JDK7 idioms like multi-catch and new NIO stuff in
Hadoop code (a minor draw I guess).
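
For anyone who hasn't used them yet, this is roughly what those JDK7 idioms
look like. It's only a sketch: the class and method names (Jdk7Idioms,
firstLine, parsePort) are invented for illustration and are not taken from
Hadoop code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Jdk7Idioms {

  /** Reads the first line of a file with NIO.2 and try-with-resources. */
  public static String firstLine(Path file) throws IOException {
    // Files.newBufferedReader is NIO.2; the reader is closed automatically.
    try (BufferedReader reader =
             Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
      return reader.readLine();
    }
  }

  /** Parses a port number, returning the fallback on null or bad input. */
  public static int parsePort(String value, int fallback) {
    try {
      return Integer.parseInt(value.trim());
    } catch (NumberFormatException | NullPointerException e) {
      // Multi-catch: one handler covers both failure modes.
      return fallback;
    }
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("jdk7-idioms", ".txt");
    try {
      Files.write(tmp, "hello\nworld\n".getBytes(StandardCharsets.UTF_8));
      System.out.println(firstLine(tmp));
      System.out.println(parsePort("8020", -1));
      System.out.println(parsePort(null, -1));
    } finally {
      Files.deleteIfExists(tmp);
    }
  }
}
```

None of this compiles with a JDK6 source level, which is why we can't use it
until JDK6 support is dropped.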

My higher-level goal though is to avoid going through this same pain again
when JDK7 goes EOL. I'd like to do a JDK8-based release before then for
this reason. This is why I suggested skipping an intermediate 2.x+JDK7
release and leapfrogging to 3.0+JDK8. 10 months is really not that far in
the future, and it seems like a better place to focus our efforts. I was
also hoping it'd be realistic to fix our classpath leakage by then, since
then we'd have a nice, tight, future-proofed new major release.

Thanks,
Andrew




On Tue, Jun 24, 2014 at 11:43 AM, Arun C Murthy <a...@hortonworks.com> wrote:

> Andrew,
>
>  Thanks for starting this thread. I'll edit the wiki to provide more
> context around rolling-upgrades etc. which, as I pointed out in the
> original thread, are key IMHO.
>
> On Jun 24, 2014, at 11:17 AM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
> > https://wiki.apache.org/hadoop/MovingToJdk7and8
> >
> > I think based on our current compatibility guidelines, Proposal A is the
> > most attractive. We're pretty hamstrung by the requirement to keep the
> > classpath the same, which would be solved by either OSGI or shading our
> > deps (but that's a different discussion).
>
> I don't see that anywhere in our current compatibility guidelines.
>
> As you can see from
> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html
> we do not have such a policy (pasted here for convenience):
>
> Java Classpath
>
> User applications built against Hadoop might add all Hadoop jars
> (including Hadoop's library dependencies) to the application's classpath.
> Adding new dependencies or updating the version of existing dependencies
> may interfere with those in applications' classpaths.
>
> Policy
>
> Currently, there is NO policy on when Hadoop's dependencies can change.
>
> Furthermore, we have *already* changed our classpath in hadoop-2.x. Again,
> as I pointed out in the previous thread, here is the precedent:
>
> On Jun 21, 2014, at 5:59 PM, Arun C Murthy <a...@hortonworks.com> wrote:
>
> > Also, this is something we already have done i.e. we updated some of our
> > software deps in hadoop-2.4 v/s hadoop-2.2 - clearly not something as
> > dramatic as JDK. Here are some examples:
> > https://issues.apache.org/jira/browse/HADOOP-9991
> > https://issues.apache.org/jira/browse/HADOOP-10102
> > https://issues.apache.org/jira/browse/HADOOP-10103
> > https://issues.apache.org/jira/browse/HADOOP-10104
> > https://issues.apache.org/jira/browse/HADOOP-10503
>
> thanks,
> Arun
