I have implemented Google's framework for building decision trees (known as PLANET) in Hadoop. It is supposed to scale well to very large datasets, but in practice it has many problems. It only scales well when the dataset has few attributes. A dataset with many attributes requires many MapReduce jobs, and every one of those jobs carries a large start-up cost. Google, however, runs it on its own Hadoop-like platform, and it is that platform that has been heavily modified, not the algorithm itself. PLANET starts with a single vertex, and MapReduce jobs add more and more vertices until the tree is fully built.
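To make the iterative structure concrete, here is a minimal single-machine sketch of that level-by-level growth in plain Java, with no Hadoop dependency. It is not PLANET itself: the class name `LevelWiseTreeSketch`, the 0/1 features and labels, and the misclassification-count split rule are all simplifications assumed for illustration. Each pass of the `while` loop stands in for one MapReduce job.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: grows a tree from a single root vertex,
// one "pass" per tree level. A pass stands in for one MapReduce job.
final class LevelWiseTreeSketch {

    /** A tree vertex holding the indices of the training rows that reach it. */
    static final class Node {
        final List<Integer> rows;
        int splitFeature = -1;   // -1 means the vertex is a leaf
        Node left;               // rows where the split feature is 0
        Node right;              // rows where the split feature is 1
        Node(List<Integer> rows) { this.rows = rows; }
    }

    final Node root;
    int passes = 0;              // number of simulated jobs

    /** x holds 0/1 feature values per row, y holds 0/1 labels. */
    LevelWiseTreeSketch(int[][] x, int[] y) {
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < y.length; i++) all.add(i);
        root = new Node(all);

        List<Node> frontier = new ArrayList<>();
        frontier.add(root);
        while (!frontier.isEmpty()) {
            passes++;            // one simulated job for the whole frontier
            List<Node> next = new ArrayList<>();
            for (Node n : frontier) {
                int f = bestSplit(n, x, y);
                if (f < 0) continue;             // no useful split: stays a leaf
                n.splitFeature = f;
                n.left = new Node(new ArrayList<>());
                n.right = new Node(new ArrayList<>());
                for (int r : n.rows) {
                    (x[r][f] == 0 ? n.left : n.right).rows.add(r);
                }
                next.add(n.left);
                next.add(n.right);
            }
            frontier = next;
        }
    }

    /** Returns the feature that most reduces misclassified rows, or -1 if none does. */
    static int bestSplit(Node n, int[][] x, int[] y) {
        int pos = 0;
        for (int r : n.rows) pos += y[r];
        int bestErr = Math.min(pos, n.rows.size() - pos); // errors if kept as a leaf
        int best = -1;
        int features = x.length == 0 ? 0 : x[0].length;
        for (int f = 0; f < features; f++) {
            int[][] c = new int[2][2];           // c[featureValue][label]
            for (int r : n.rows) c[x[r][f]][y[r]]++;
            int err = Math.min(c[0][0], c[0][1]) + Math.min(c[1][0], c[1][1]);
            if (err < bestErr) { bestErr = err; best = f; }
        }
        return best;
    }
}
```

Calling `new LevelWiseTreeSketch(x, y)` and reading `passes` afterwards shows how many jobs a tree of that depth would need under these assumptions. The point of the sketch is that the graph is created during the computation instead of being given as input, which is the part I am unsure Hama supports.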
I have often read that Apache Hama is well suited to iterative algorithms such as graph computations. Can Hama build a new graph from scratch, or can it only take an existing graph as input and run computations on it? Would it be easy to port my project to Hama? Thanks.
