What we do in production is a combination of two approaches:
1) Delimited text files (TSV-style, but with \u0001 as the delimiter, to
avoid comma and tab escaping complexities). Some, but not all, of these get
bulk-loaded into our reporting database in a post-processing step. We don't
insert directly into the db from Pig.
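
For anyone wanting to picture that post-processing step, here is a rough sketch rather than our actual loader. It assumes the delimiter is the Ctrl-A control character (U+0001), and it uses an in-memory SQLite database as a stand-in for the reporting database. The table name `page_hits`, its columns, and the sample rows are all made up for illustration.

```python
import io
import sqlite3

DELIM = "\x01"  # Ctrl-A (U+0001); assumed to be the field delimiter


def parse_rows(stream, delimiter=DELIM):
    """Yield one list of fields per non-empty line of delimited text."""
    for line in stream:
        line = line.rstrip("\n")
        if line:
            # No quoting or escaping needed: commas and tabs are plain data.
            yield line.split(delimiter)


def bulk_load(conn, rows):
    """Insert all rows in one transaction and return the table's row count."""
    # Hypothetical schema, standing in for a real reporting table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_hits (url TEXT, note TEXT, hits INTEGER)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO page_hits VALUES (?, ?, ?)",
            ((url, note, int(hits)) for url, note, hits in rows),
        )
    return conn.execute("SELECT COUNT(*) FROM page_hits").fetchone()[0]


# Made-up sample output; note the comma and the tab inside field values.
sample = "/index.html\x01has, comma\x0142\n/about\x01has\ttab\x017\n"

conn = sqlite3.connect(":memory:")
loaded = bulk_load(conn, parse_rows(io.StringIO(sample)))
print(loaded, "rows loaded")
```

The point of the control-character delimiter shows up in `parse_rows`: a plain `split` is enough, because the delimiter is very unlikely to occur inside real field values.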
Hey Jon,
I think a common approach is to use Pig (and MR/Hadoop in general) as purely
the heavy lifter, doing all the merge-downs, aggregations and such of the
data. At Nokia we tend to output a lot of data from Pig/MR as TSV or CSV
(using PigStorage) and then use Sqoop to push that into a MySQL D
On 23 March 2011 18:12, Jonathan Holloway wrote:
> I've got a general question surrounding the output of various Pig scripts
> and generally where people are storing that data and in what kind of
> format?
> ...
> At present the results from my Pig scripts end up in HDFS in Pig bag/tuple
> format
I've got a general question surrounding the output of various Pig scripts
and generally where people are storing that data and in what kind of
format?
I read Dmitriy's article on Apache log processing and noticed that the
output of the scripts was in a format more suitable for reporting and
graphing