Re: gz containing null chars?

2013-06-10 Thread Niels Basjes
My best guess is that, at a low level, a string is often terminated by a null byte at the end. Perhaps that's where the difference lies: the gz decompressor may simply stop at the null byte, whereas the basic record reader used for the uncompressed file simply continues past it. In this situation your input file contai

gz containing null chars?

2013-06-10 Thread William Oberman
I posted this to the Pig mailing list, but it might be more related to Hadoop itself; I'm not sure. Quick recap: I had a file of "\n"-separated lines of JSON. I decided to compress it to save on storage costs. After compression I got a different answer for a Pig query that basically == "count li
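
One way to narrow this down outside of Hadoop and Pig is to check whether a plain gzip round trip preserves null bytes and the newline count. The sketch below is only a local sanity check using Python's standard gzip module. It says nothing about how Hadoop's codec or Pig's loader behave, and the sample data and helper names (count_lines, gzip_round_trip) are made up for illustration.

```python
import gzip


def count_lines(data: bytes) -> int:
    """Count newline-terminated records in a byte string."""
    return data.count(b"\n")


def gzip_round_trip(data: bytes) -> bytes:
    """Compress and then decompress the data in memory."""
    return gzip.decompress(gzip.compress(data))


# Hypothetical sample: three "\n"-separated records, one with an embedded null byte.
sample = b'{"id": 1}\n{"id": 2, "note": "a\x00b"}\n{"id": 3}\n'

restored = gzip_round_trip(sample)

# If gzip itself were dropping data at the null byte, these would differ.
print(count_lines(sample), count_lines(restored), restored == sample)
```

If the counts match here but differ in the Pig job, the gzip format itself is probably not the culprit, and the reader sitting on top of the decompressed stream would be the next thing to look at.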