save several 64MB files in Pig Latin

2013-06-07 Thread Pedro Sá da Costa
I am using the statement store A into 'result-australia-0' using PigStorage('\t'); to store the data in HDFS. But the problem is that this creates one 500MB file. Instead, I want to save several 64MB files. How do I do this? -- Best regards,
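A minimal sketch of one common workaround, not taken from the thread: STORE writes one part file per reducer, so forcing a reduce phase with an explicit PARALLEL clause splits the output. The ORDER operator and the parallelism of 8 below are assumptions, chosen so that ~500MB of output lands in roughly 64MB part files.

    -- Any reduce-side operator (ORDER, GROUP, DISTINCT, ...) accepts PARALLEL.
    -- 8 reducers => 8 part files of roughly 500MB / 8 ~= 64MB each.
    B = ORDER A BY $0 PARALLEL 8;  -- $0 stands in for a real sort key
    STORE B INTO 'result-australia-0' USING PigStorage('\t');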

Re: save several 64MB files in Pig Latin

2013-06-07 Thread Johnny Zhang
Pedro, you can try Piggybank MultiStorage, which splits results into different dirs/files by a specific index attribute. But I'm not sure how it can ensure the file size is 64MB. Why 64MB specifically? What's the connection between your data and 64MB? Johnny On Fri, Jun 7, 2013 at 12:56 AM, Pedro Sá
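For reference, a sketch of the MultiStorage usage Johnny mentions; the jar path and the choice of field 0 as the split key are assumptions, not from the thread:

    -- piggybank.jar location varies by install; adjust the REGISTER path.
    REGISTER /usr/lib/pig/piggybank.jar;
    -- Writes one subdirectory of 'result-australia-0' per distinct value of
    -- field 0; note it splits by attribute value, not by a target file size.
    STORE A INTO 'result-australia-0'
        USING org.apache.pig.piggybank.storage.MultiStorage('result-australia-0', '0');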

Re: Processing TSV files using Pig

2013-06-07 Thread Jerry Lam
Hi Johnny, Thanks for the pointer. I will try it. Regards, Jerry On Thu, Jun 6, 2013 at 5:58 PM, Johnny Zhang xiao...@cloudera.com wrote: Hi, Jerry: This seems to be what you need.

problems with .gz

2013-06-07 Thread William Oberman
I'm using Pig 0.11.2. I had been processing ASCII files of JSON with schema: (key:chararray, columns:bag {column:tuple (timeUUID:chararray, value:chararray, timestamp:long)}) For what it's worth, this is Cassandra data, at a fairly low level. But this was getting big, so I compressed it all
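For context, his schema restated as a LOAD statement; the PigStorage loader and the input path are assumptions, since the message only gives the schema:

    -- A minimal sketch: the schema is copied verbatim from the message,
    -- the loader and path are hypothetical.
    A = LOAD 'cassandra_dump/part-*' USING PigStorage()
        AS (key:chararray,
            columns:bag {column:tuple (timeUUID:chararray, value:chararray, timestamp:long)});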

Re: problems with .gz

2013-06-07 Thread Niels Basjes
What are the exact filenames you used? The decompression of input files is based on the filename extension. Niels On Jun 7, 2013 11:11 PM, William Oberman ober...@civicscience.com wrote: I'm using Pig 0.11.2. I had been processing ASCII files of JSON with schema: (key:chararray, columns:bag
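To illustrate Niels's point (the file names below are assumptions): Pig relies on Hadoop's input handling, which picks a decompression codec from the file extension, so gzipped input is read transparently only when the name ends in .gz.

    -- Read transparently: the gzip codec is chosen from the .gz extension.
    A = LOAD 'data/events.json.gz' USING PigStorage();
    -- A gzipped file renamed without the extension (e.g. 'data/events.json')
    -- would be read as raw compressed bytes and produce garbage records.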