[Blueprint servercloud-p-hadoop] Ubuntu Server + Hadoop and Bigdata

James Page Thu, 03 Nov 2011 08:23:18 -0700

Blueprint changed by James Page:

Whiteboard changed:
- I have to wonder if the demand for Hadoop really is large enough to
- justify the effort we'd be putting into providing it? Are we really at a
- point already where having terabytes of data you need to analyse is a
- common use case? - Soren
+ Validation of Assumptions from spec:
+ 
+ Ubuntu will package Apache Hadoop (rather than one of the various variants).
+     Cloudera - CDH
+     Hortonworks - Apache Hadoop - Employ 80% of upstream committers
+ OpenJDK Support
+     Ubuntu should help drive support for OpenJDK from upstreams
+ Packaging will align to Apache Bigtop (based on the most popular upstream 
packaging) - YES
+ Packaging will focus on the most recent stable release of Hadoop - 0.20.203.0 - 
YES
+ Configuration methods should take into account integration with configuration 
management tools such as Puppet and Chef - YES
+ The majority of Java dependencies can be fulfilled through what is already in 
the archive (see hadoop-dependency-report.tar.gz)
+     kfs - this can be excluded to disable this feature but does not look like 
that much work to package.
+     Apache ftpserver would be required to enable smoke testing - again looks 
OK to package.
+ Focus will be on a solid Hadoop core with contrib packages if time permits.
+ Most dependencies are already in the archive apart from thrift (probably not 
an issue).
+ Native integrations must be part of the packaging.
+ Packages will target universe for this release.
+  
+ We need upstream co-operation from Hortonworks/Cloudera to ensure ongoing 
collaboration.
+  
+ Good support for Hadoop on ARM should be an objective of this work.
+  
+ Comments from blueprint:
+  
+ I have to wonder if the demand for Hadoop really is large enough to justify 
the effort we'd be putting into providing it? Are we really at a point already 
where having terabytes of data you need to analyse is a common use case? - Soren
+     Sounds like there is demand in the distribution.
+     - important for Ubuntu Server, to maintain its position as 'best OS for 
the Cloud'
+     - the number of users needing to process TBs of data keeps increasing; 
over the life of 12.04 LTS, more and more users will need a map-reduce 
cluster application; having it in the distro will ensure they pick Ubuntu 
for that application
+  
+ Work Items:
+ [m_3] hadoop community input (what about no thrift?): TODO
+ [james-page] Active backport packaging process post 12.04 release: TODO
+ [m_3] Attend HadoopWorld. :): TODO
+ [james-page] Check on release schedule for Apache Hadoop between now and 
Feature Freeze: TODO
+ [kirkland] Investigate upstream co-operation from Hortonworks/Cloudera to 
ensure ongoing collaboration going forward: TODO
+ [negronjl] adjust hadoop charms to have a configurable backend hadoop, get 
one into the charm repository: TODO
+ [james-page] Package KFS for Ubuntu: TODO
+ [james-page] Package Apache ftp-server for Ubuntu: TODO
+ [james-page] Package Hadoop for Ubuntu: TODO


-- 
Ubuntu Server + Hadoop and Bigdata
https://blueprints.launchpad.net/ubuntu/+spec/servercloud-p-hadoop

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs
