I've got a general question about the output of Pig scripts: where are people storing that data, and in what format?
I read Dmitriy's article on Apache log processing and noticed that his scripts wrote their output as TSV files, a format better suited to reporting and graphing. At present the results of my Pig scripts end up in HDFS in Pig's bag/tuple format, and I wondered whether that is best practice for organising large amounts of data.

Is anybody using Hive to store the intermediate Pig data and reporting from that instead? Or are people generating graphs and analyses directly from the raw Pig output in HDFS?

Many thanks,
Jon.