Re: multiple file storage with pig

2013-07-30 Thread Jacob Perkins
Pablo,

For your first question, what you want to do is called a projection of your "grouped" relation. Something like this should work:

grouped = foreach (group cleaned by (timestamp, sensor_name, sig_generator, sig_id)) generate flatten(group) as (timestamp, sensor_name, sig_generator, sig_id);
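The following rough Python analogue shows what that statement does to the data. It is not Pig itself, and the sample rows are invented. GROUP collects one bag per distinct key combination. FLATTEN(group) then turns each key tuple back into separate top-level fields and drops the bag, so the result has one row per distinct (timestamp, sensor_name, sig_generator, sig_id) combination:

```python
from collections import defaultdict

# Field names taken from the GROUP key in the reply above.
KEYS = ("timestamp", "sensor_name", "sig_generator", "sig_id")

def group_by(records, keys):
    """Rough analogue of GROUP ... BY (k1, k2, ...):
    one (group, bag) pair per distinct key tuple, in first-seen order."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec)
    return list(groups.items())

def flatten_group(grouped, keys):
    """Rough analogue of FOREACH ... GENERATE FLATTEN(group) AS (k1, ...):
    each key tuple becomes named top-level fields; the bag is discarded."""
    return [dict(zip(keys, group)) for group, _bag in grouped]

# Invented sample rows standing in for the 'cleaned' relation.
cleaned = [
    {"timestamp": 1, "sensor_name": "s1", "sig_generator": 1, "sig_id": 10, "src": "a"},
    {"timestamp": 1, "sensor_name": "s1", "sig_generator": 1, "sig_id": 10, "src": "b"},
    {"timestamp": 2, "sensor_name": "s2", "sig_generator": 1, "sig_id": 11, "src": "c"},
]

grouped = flatten_group(group_by(cleaned, KEYS), KEYS)
for row in grouped:
    print(row)
```

Because the bags are thrown away, a DISTINCT over a projection of the same four fields would yield the same set of rows.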

Re: multiple file storage with pig

2013-07-30 Thread Miguel Angel Martin junquera
Hi,

If you do not find a UDF in Piggybank or in other resources that fits your requirements, you can write your own UDF to filter, evaluate, store, etc., or extend an existing one. For example, to store into multiple files you can use http://pig.apache.org/docs/r0.11.1/api/org/apache/p
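Storing into multiple files means splitting one relation into one output per value of a chosen field. Below is a minimal Python sketch of that partitioning. It assumes tab-separated output and one file per distinct value of the split field. `store_by_field` and the `<value>.tsv` naming are hypothetical and are not the API of any Pig storage class:

```python
import csv
import os
import tempfile
from collections import defaultdict

def store_by_field(records, split_index, out_dir, delimiter="\t"):
    """Hypothetical helper: write each record into out_dir/<value>.tsv, where
    <value> is the record's field at split_index. Values are used verbatim as
    file names, so this sketch assumes they are filesystem-safe."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[str(rec[split_index])].append(rec)
    paths = []
    for value, rows in buckets.items():
        path = os.path.join(out_dir, f"{value}.tsv")
        with open(path, "w", newline="") as f:
            csv.writer(f, delimiter=delimiter).writerows(rows)
        paths.append(path)
    return sorted(paths)

# Invented sample records: (timestamp, sensor_name, sig_id); split on sensor_name.
rows = [(1, "s1", 10), (2, "s2", 11), (3, "s1", 12)]

with tempfile.TemporaryDirectory() as out_dir:
    written = store_by_field(rows, 1, out_dir)
    contents = {}
    for path in written:
        with open(path, newline="") as f:
            contents[os.path.basename(path)] = list(csv.reader(f, delimiter="\t"))

print(contents)
```

A real Pig storage function would do the same bucketing inside each task and write through Hadoop output streams rather than local files.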

multiple file storage with pig

2013-07-30 Thread Pablo Nebrera
Hello,

I have this Pig script:

register '/path_to_jars/elephant-bird-pig-3.0.7.jar';
register '/path_to_jars/json-simple-1.1.1.jar';
register '/path_to_jars/redBorder-pig.jar';
data = load '/data/events/2013/07/29/16h03/part-1.gz' using com.twitter.elephantbird.pig.load.JsonLoader() as (json:
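Before debugging the Pig side, it can help to confirm what each input line actually contains. The quick local check below is written in Python. It assumes part-1.gz holds one JSON object per line, which is an assumption because the message is cut off before the schema. The sample events and their field names are invented:

```python
import gzip
import json
import os
import tempfile

def read_json_lines(path):
    """Yield one dict per non-empty line of a gzip-compressed file,
    assuming each line holds a single JSON object."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Invented sample events standing in for part-1.gz.
sample = [
    {"timestamp": 1375113780, "sensor_name": "s1", "sig_id": 10},
    {"timestamp": 1375113781, "sensor_name": "s2", "sig_id": 11},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "part-1.gz")
    # Write the sample as gzip-compressed, newline-delimited JSON.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for event in sample:
            f.write(json.dumps(event) + "\n")
    events = list(read_json_lines(path))

print(events)
```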