Hello everybody! I'm getting started with pig and I'm trying to understand how to configure io.compression.codecs for local mode. I've got something working but I'm not entirely clear on why it works or if there is a better way to do this.
For some background, I'm trying to read data into pig from an lzo-compressed file that contains protobuf entries (using elephant-bird[1] and hadoop-lzo[2]). The instructions for setting up hadoop-lzo say that you need to add an entry to core-site.xml to specify the lzo compression codec. Since I'm running in local mode, I don't have a core-site.xml, and I couldn't figure out whether there was another place to configure io.compression.codecs. Unsurprisingly, when I run a script that tries to load an lzo file, I get a 'No codec for file' error.

I discovered that if I create a core-site.xml and put it on my PIG_CLASSPATH, I can start pig without the '-x local' flag. When I run pig like this, it is able to load the lzo file and everything works correctly. Since I don't have hadoop installed, I assume that pig is still running in some sort of local mode.

Can someone explain the difference between these two setups (running with '-x local' and no core-site.xml, versus running without the flag with core-site.xml on PIG_CLASSPATH)? Is there a different (better?) way to configure io.compression.codecs when running in local mode?

Thanks for the help,
-Peter

[1]: https://github.com/kevinweil/elephant-bird
[2]: https://github.com/twitter/hadoop-lzo
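For reference, here is a minimal sketch of the kind of core-site.xml entry the hadoop-lzo setup calls for. It is written as a small Python script (standard library only) so the file can be regenerated into a directory that goes on PIG_CLASSPATH. The codec class list is an assumption based on common hadoop-lzo setups, not something taken from the post, so check the class names against the hadoop-lzo jar you actually use. The helper names build_core_site and write_core_site are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed codec list: the first three are Hadoop's built-in codecs; the two
# com.hadoop.* classes are expected to come from the hadoop-lzo jar.
CODECS = [
    "org.apache.hadoop.io.compress.GzipCodec",
    "org.apache.hadoop.io.compress.DefaultCodec",
    "org.apache.hadoop.io.compress.BZip2Codec",
    "com.hadoop.compression.lzo.LzoCodec",
    "com.hadoop.compression.lzo.LzopCodec",
]


def build_core_site(codecs):
    """Return a Hadoop-style <configuration> element with io.compression.codecs set."""
    conf = ET.Element("configuration")
    prop = ET.SubElement(conf, "property")
    ET.SubElement(prop, "name").text = "io.compression.codecs"
    # Hadoop reads this property as a comma-separated list of class names.
    ET.SubElement(prop, "value").text = ",".join(codecs)
    return conf


def write_core_site(conf_dir, codecs=CODECS):
    """Write core-site.xml into conf_dir (hypothetical dir meant for PIG_CLASSPATH)."""
    path = Path(conf_dir) / "core-site.xml"
    path.parent.mkdir(parents=True, exist_ok=True)
    ET.ElementTree(build_core_site(codecs)).write(
        path, encoding="utf-8", xml_declaration=True
    )
    return path
```

Calling `write_core_site("pig-conf")` would create `pig-conf/core-site.xml` containing a single io.compression.codecs property. Adding that directory to PIG_CLASSPATH mirrors the setup described above.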