Re: Storing and reporting off Pig data

2011-03-23 Thread Dmitriy Ryaboy
What we do in production is a combination of two approaches: 1) Delimited text files (like TSV, but \u0001-delimited actually, to avoid comma and tab escaping complexities). Some but not all of these get bulk-loaded into our reporting database in a post-processing step. We don't insert directly into the db from Pig
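
For illustration only, here is a minimal sketch of how a post-processing step might read such output before a bulk load. It assumes Ctrl-A (\u0001) is the field separator and that records are newline-terminated; the function name parse_ctrl_a and the sample record are hypothetical and not taken from the thread.

```python
# Hypothetical helper: split Ctrl-A-delimited records into fields.
# Assumes one record per line and \u0001 between fields.
CTRL_A = "\u0001"


def parse_ctrl_a(lines):
    """Return a list of field lists, skipping empty lines."""
    rows = []
    for line in lines:
        record = line.rstrip("\n")
        if not record:
            continue
        rows.append(record.split(CTRL_A))
    return rows


# Made-up sample record: the middle field contains a comma and a tab,
# which would need escaping in CSV or TSV but not here.
sample = ["2011-03-23\u0001GET /a,b\tc\u000142\n", "\n"]
rows = parse_ctrl_a(sample)
```

Because the separator is a control character that rarely appears in log data, commas and tabs inside a field pass through untouched, which is the escaping problem the message mentions.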

Re: Storing and reporting off Pig data

2011-03-23 Thread Josh Devins
Hey Jon, I think a common approach is to use Pig (and MR/Hadoop in general) as purely the heavy lifter, doing all the merge-downs, aggregations and such of the data. At Nokia we tend to output a lot of data from Pig/MR as TSV or CSV (using PigStorage) and then use Sqoop to push that into a MySQL D

Re: Storing and reporting off Pig data

2011-03-23 Thread Alex McLintock
On 23 March 2011 18:12, Jonathan Holloway wrote:
> I've got a general question surrounding the output of various Pig scripts
> and generally where people are storing that data and in what kind of format?
> ...
> At present the results from my Pig scripts end up in HDFS in Pig bag/tuple
> format

Storing and reporting off Pig data

2011-03-23 Thread Jonathan Holloway
I've got a general question surrounding the output of various Pig scripts and generally where people are storing that data and in what kind of format? I read Dmitriy's article on Apache log processing and noticed that the output of the scripts was a format more suitable for reporting and graphing