I am using the instruction:
store A into 'result-australia-0' using PigStorage('\t');
to store the data in HDFS. But the problem is that this creates one file
of 500MB. Instead, I want to save several 64MB files. How do I do this?
--
Best regards,
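[Not stated in the thread, but for context: Pig writes one part file per reducer, so one common workaround is to force a reduce phase with an explicit PARALLEL clause before the STORE. A minimal sketch, assuming an extra ORDER pass is acceptable; the parallelism of 8 is only illustrative (roughly 500MB / 8 ≈ 64MB per file):]

```pig
-- Force a reduce-side operator so the output is written by N reducers,
-- producing N part files. The sort key $0 and PARALLEL 8 are illustrative.
B = ORDER A BY $0 PARALLEL 8;
STORE B INTO 'result-australia-0' USING PigStorage('\t');
```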
Pedro, you can try Piggybank MultiStorage, which splits results into
different dirs/files by a specific index attribute. But I'm not sure how it
can make sure the file size is 64MB. Why 64MB specifically? What's the
connection between your data and 64MB?
Johnny
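[A minimal sketch of the MultiStorage suggestion above. The jar path is illustrative, and '0' names the field index used to split the output into subdirectories; note this splits by attribute value, not by file size:]

```pig
-- Register piggybank so MultiStorage is available (path is illustrative).
REGISTER /path/to/piggybank.jar;

-- Write records into subdirectories of 'result-australia-0',
-- partitioned by the value of field 0.
STORE A INTO 'result-australia-0'
    USING org.apache.pig.piggybank.storage.MultiStorage('result-australia-0', '0');
```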
On Fri, Jun 7, 2013 at 12:56 AM, Pedro Sá
Hi Johnny,
Thanks for the pointer. I will try it.
Regards,
Jerry
On Thu, Jun 6, 2013 at 5:58 PM, Johnny Zhang xiao...@cloudera.com wrote:
Hi, Jerry:
This seems to be what you need.
I'm using pig 0.11.2.
I had been processing ASCII files of json with schema: (key:chararray,
columns:bag {column:tuple (timeUUID:chararray, value:chararray,
timestamp:long)})
For what it's worth, this is cassandra data, at a fairly low level.
But this was getting big, so I compressed it all.
What are the exact filenames you used?
The decompression of input files is based on the filename extension.
Niels
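[To illustrate Niels's point about extensions, a minimal sketch; the paths are illustrative. Pig's loaders decide whether to decompress based on the file suffix, so compressed inputs must keep their .gz/.bz2 names:]

```pig
-- The .gz suffix tells the loader to run the gzip codec on the input.
raw = LOAD 'data/part-00000.gz' USING PigStorage('\t');

-- A file whose name lacks a recognized suffix is read as raw bytes,
-- even if its contents are actually gzip-compressed.
```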
On Jun 7, 2013 11:11 PM, William Oberman ober...@civicscience.com wrote: