Blueprint changed by James Page:

Whiteboard changed:
- I have to wonder if the demand for Hadoop really is large enough to
- justify the effort we'd be putting into providing it? Are we really at a
- point already where having terabytes of data you need to analyse is a
- common use case? - Soren
+ Validation of Assumptions from spec:
+
+ Ubuntu will package Apache Hadoop (rather than one of the various variants).
+   Cloudera - CDH
+   Hortonworks - Apache Hadoop - Employ 80% of upstream committers
+ OpenJDK Support
+   Ubuntu should help drive support for OpenJDK from upstreams
+ Packaging will align to Apache Bigtop (based on the most popular upstream packaging) - YES
+ Packaging will focus on the most recent stable release of Hadoop - 0.203.0 - YES
+ Configuration methods should take into account integration with configuration management tools such as Puppet and Chef - YES
+ The majority of Java dependencies can be fulfilled through what is already in the archive (see hadoop-dependency-report.tar.gz)
+   kfs - this can be excluded to disable this feature but does not look like that much work to package.
+   Apache ftpserver would be required to enable smoke testing - again looks OK to package.
+ Focus will be on a solid Hadoop core with contrib packages if time permits.
+ Most dependencies are already in the archive apart from thrift (probably not an issue).
+ Native integrations must be part of the packaging.
+ Packages will target universe for this release.
+
+ We need to ensure upstream co-operation from Hortonworks/Cloudera to ensure ongoing collaboration going forward.
+
+ Good support for Hadoop on ARM should be an objective of this work.
+
+ Comments from blueprint:
+
+ I have to wonder if the demand for Hadoop really is large enough to justify the effort we'd be putting into providing it? Are we really at a point already where having terabytes of data you need to analyse is a common use case? - Soren
+ Sounds like there is demand in the distribution.
+ - important for Ubuntu Server, to maintain its position as 'best OS for the Cloud'
+ - the number of users needing to process TBs of data is just increasing; over the life of 12.04 LTS, more and more users will have a need for a map-reduce cluster application; having it in the distro will ensure they pick Ubuntu for that application
+
+ Work Items:
+ [m_3] hadoop community input (what about no thrift?): TODO
+ [jamespage] Active backport packaging process post 12.04 release: TODO
+ [m_3] Attend HadoopWorld. :): TODO
+ [james-page] Check on release schedule for Apache Hadoop between now and Feature Freeze: TODO
+ [kirkland] Investigate upstream co-operation from Hortonworks/Cloudera to ensure ongoing collaboration going forward: TODO
+ [negronjl] adjust hadoop charms to have a configurable backend hadoop, get one into the charm repository: TODO
+ [james-page] Package KFS for Ubuntu: TODO
+ [james-page] Package Apache ftp-server for Ubuntu: TODO
+ [james-page] Package Hadoop for Ubuntu: TODO

--
Ubuntu Server + Hadoop and Bigdata
https://blueprints.launchpad.net/ubuntu/+spec/servercloud-p-hadoop