"Why not just use the hadoop classpath generated by running `hadoop
classpath`?"

I like it!

+1



On Tue, Jul 9, 2013 at 11:33 PM, Jonathan Hsieh <j...@cloudera.com> wrote:

> tl;dr
> Ideally the generation of hadoop+accumulo's classpath should only be done
> in one place.  At least for all versions of hadoop I've seen in the last 5
> years, there is one place to get hadoop's classpath (the `hadoop classpath`
> command).  Why not use it?
>
> ----
>
> On the hadoop side, I've found that the hadoop2 packages from bigtop deploy
> to different locations than the stock hadoop2 tarballs, requiring yet
> another round of hadoop path wrangling between hadoop1, hadoop2, and the
> different deployment mechanisms.  It's kind of a mess.
>
> hadoop1:
> $HADOOP_HOME/*.jar
> $HADOOP_HOME/lib/*.jar
>
> hadoop2 tarball
> $HADOOP_HOME/share/hadoop/common/*.jar
> $HADOOP_HOME/share/hadoop/common/lib/*.jar
> $HADOOP_HOME/share/hadoop/hdfs/*.jar
> $HADOOP_HOME/share/hadoop/hdfs/lib/*.jar
> $HADOOP_HOME/share/hadoop/mapreduce/*.jar
> $HADOOP_HOME/share/hadoop/mapreduce/lib/*.jar
>
> hadoop2 bigtop rpm
> /usr/lib/hadoop-mapreduce ...
> /usr/lib/hadoop-hdfs ...
> /usr/lib/hadoop-yarn ...
>
> There is a script in place that already generates hadoop classpaths
> consistently across multiple versions and deployments -- by using the
> `hadoop classpath` command. Why not just use the hadoop classpath
> generated by running `hadoop classpath` instead of trying to create the
> classpath in java/python/xml code and having to modify it for each kind of
> hadoop?
>
> Does this seem reasonable?  Where would there be trouble spots?
>
> Semi-related: I've found that if HADOOP_HOME/lib doesn't contain certain
> jars that accumulo depends on, the Platform or Main wrapper programs can
> fail.
>
> A follow-on would be to consolidate the 3 accumulo+hadoop classpath
> generation locations into one, or at least to factor out the common
> hadoop+accumulo parts.
>
> The accumulo+hadoop classpath generation currently happens in 3 places:
> 1) hard-coded with hadoop1 locations in Java in the AccumuloClassLoader,
> 2) as comments in the accumulo-site.xml.example file,
> 3) in the soon-to-be-deprecated TestUtils.py (in the system/auto test
> suite).
>
> I don't quite understand all of the classloader magic yet, but I'd wager
> that it is used for extensions like iterators.  Could we just have one
> initial point of accumulo+hadoop classpath generation, and then use the xml
> config to add more jars, leaving the nested classloaders to handle the
> magic for dfs jar loading and for iterator extensions?
>
> Thanks,
> Jon.
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // j...@cloudera.com
>
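
The approach Jon describes -- delegating hadoop's part of the classpath to
the `hadoop classpath` command instead of hard-coding per-layout jar globs --
could be sketched roughly as follows. This is an illustrative sketch, not
Accumulo's actual launcher code: the variable names and the `/opt/accumulo`
default are hypothetical, and it assumes `hadoop` is on the PATH (with a
fallback when it isn't).

```shell
# Hypothetical sketch: build a combined accumulo+hadoop classpath by
# asking hadoop for its own classpath, so the same logic works for
# hadoop1, hadoop2 tarballs, and bigtop packages alike.
ACCUMULO_HOME="${ACCUMULO_HOME:-/opt/accumulo}"   # illustrative default
ACCUMULO_CP="$ACCUMULO_HOME/lib/*"

# `hadoop classpath` prints the deployment's full classpath; if hadoop
# is not installed, suppress the error and fall back to accumulo only.
HADOOP_CP="$(hadoop classpath 2>/dev/null || true)"

if [ -n "$HADOOP_CP" ]; then
  CLASSPATH="$ACCUMULO_CP:$HADOOP_CP"
else
  CLASSPATH="$ACCUMULO_CP"
fi
echo "$CLASSPATH"
```

The point of the sketch is that only hadoop itself knows where its jars
live, so the launcher never needs per-version path lists.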
