Hi all, I am trying to find out which frameworks, software, or products best support data warehousing and data mining.
We receive upwards of 1.5 TB of log data every month. We want to build reporting on top of it first and move on to data mining later. I am a total newbie in this world, coming from an RDBMS background, and would like your opinion on the best approach to take.

I have looked into Hadoop and its subprojects, and Hive appears to be a framework that can support and scale to this volume of data, so the first phase (reporting) can be done with Hive. But can I reuse the same data for data mining through the Mahout project? Can somebody please guide me on this?

Thanks for your help.
HDev
