Storing and reporting off Pig data

Jonathan Holloway Wed, 23 Mar 2011 11:13:01 -0700

I have a general question about the output of Pig scripts: where are
people storing that data, and in what format?


I read Dmitriy's article on Apache log processing and noticed that the
scripts write their output as TSV files, a format better suited to
reporting and graphing.

At present the results from my Pig scripts end up in HDFS in Pig bag/tuple
format, and I wondered whether that is best practice for organising large
amounts of data.  Is anybody using Hive to store the intermediate Pig data
and reporting off that instead?  Or are people generating graphs and
analyses directly from the raw Pig output in HDFS?

Many thanks,
Jon.
