Dear all,
I'm very new to Hadoop as I'm still trying to grasp its value and purpose. I
do hope my question on this mailing list is OK.
I manage our open data platform at our municipality, using CKAN.org. It works
very well for its purpose of publishing data and adding APIs on top of it.
However,
I would recommend using Hadoop only if you are ingesting a lot of data
and need reasonable performance at scale. I would recommend starting
with [insert language/tool of choice] to ingest and transform data
until that process starts taking too long.
I understand that coding MR jobs in a programming language is required, but if
we are just processing large amounts of data (machine learning, for example),
we could use Pig. I recently processed 0.25 TB on AWS clusters in a reasonably
short time, and in that case the development effort was much lower.
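For intuition, the MapReduce model being discussed can be sketched in a few lines of plain Python. This is a local illustration only, not Hadoop code, and the function names are made up; it just shows the map / shuffle / reduce phases that a Pig script compiles down to:

```python
# Minimal local sketch of the MapReduce model (word count).
# Mappers emit (key, value) pairs, the shuffle groups them by key,
# and reducers fold each key's values into a result.
from collections import defaultdict

def mapper(line):
    # Emit (word, 1) for every word in an input line.
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    # Sum the 1s emitted for one word.
    return word, sum(counts)

def run_job(lines):
    # "Shuffle" phase: group mapper output by key.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # Reduce phase: one reducer call per key.
    return dict(reducer(k, v) for k, v in groups.items())

result = run_job(["Pig runs on Hadoop", "Hadoop runs MapReduce"])
```

On a real cluster the shuffle happens across machines, which is where Hadoop earns its keep; the Pig equivalent of the above is a handful of GROUP/FOREACH statements with no explicit mapper or reducer code.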
Thanks,