I understand that writing MapReduce (MR) jobs in a programming language is sometimes required, but if we are just processing large amounts of data (for machine learning, for example), we could use Pig. I recently processed 0.25 TB on AWS clusters in a reasonably short time. In that case the development effort was very small.

Thanks,
Mohan

On Thu, Sep 4, 2014 at 6:41 PM, Alec Ten Harmsel <a...@alectenharmsel.com> wrote:

> I would recommend using Hadoop only if you are ingesting a lot of data
> and you need reasonable performance at scale. I would recommend starting
> with using <insert language/tool of choice> to ingest and transform data
> until that process starts taking too long.
>
> For example, one of our researchers at the University of Michigan had to
> process ~150GB of data. Using Python, processing that data took about 45
> minutes - it was not worth it to spend extra development time to run it
> on Hadoop. This time will change depending on what you need to do and
> the hardware available, naturally.
>
> So until you need to frequently process large amounts of data, I'd stick
> with something you're already familiar with.
>
> Alec Ten Harmsel
>
> On 09/04/2014 03:30 AM, Henrik Aagaard Jørgensen wrote:
> > Dear all,
> >
> > I’m very new to Hadoop as I’m still trying to grasp its value and
> > purpose. I do hope my question on this mailing list is OK.
> >
> > I manage our open data platform at our municipality, using CKAN.org.
> > It works very well for its purpose of showing data and adding APIs
> > to data.
> >
> > However, I’m very interested in knowing more about Hadoop and if it
> > would fit into an (open) data platform, as we are getting more and
> > more data to show and to work with internally at our municipality.
> >
> > However, I cannot figure out if it’s the right purpose to use Hadoop
> > for, if it is “overkill” or…
> >
> > Could someone elaborate on this topic?
> >
> > I’ve Googled around a lot and looked at various videos online and
> > Hadoop seems to have its place, also in an open data platform
> > environment.
> >
> > Best regards,
> >
> > Henrik