Hello everybody!

I'm getting started with Pig and I'm trying to understand how to
configure io.compression.codecs for local mode. I've got something working,
but I'm not entirely clear on why it works or whether there is a better way to
do this.

For some background, I'm trying to read data into Pig from an LZO-compressed
file that contains protobuf entries (using elephant-bird[1] and
hadoop-lzo[2]). The instructions for setting up hadoop-lzo say that you
need to add an entry to core-site.xml to specify the LZO compression codec.
Since I'm running in local mode, I don't have a core-site.xml, and I couldn't
figure out whether there was another place to configure io.compression.codecs.
Unsurprisingly, when I run a script that tries to load an LZO file, I get a
'No codec for file' error.
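To make that symptom concrete: my understanding is that a codec is chosen for an input file by matching the file's extension against the codec classes listed in io.compression.codecs, so if no LZO codec is listed, a .lzo file has no match at all. Below is a simplified, hypothetical model of that lookup in plain Java. It is not Hadoop's actual CompressionCodecFactory; the extension table is an assumption mirroring each codec's default extension, and the class name and file name are made up for illustration.

```java
import java.util.Map;

/**
 * Hypothetical, simplified model of extension-based codec lookup.
 * NOT Hadoop's CompressionCodecFactory; for illustration only.
 */
public class CodecLookupSketch {

    // Assumed default file extension for each codec class name.
    static final Map<String, String> DEFAULT_EXTENSIONS = Map.of(
        "org.apache.hadoop.io.compress.DefaultCodec", ".deflate",
        "org.apache.hadoop.io.compress.GzipCodec", ".gz",
        "org.apache.hadoop.io.compress.BZip2Codec", ".bz2",
        "com.hadoop.compression.lzo.LzoCodec", ".lzo_deflate",
        "com.hadoop.compression.lzo.LzopCodec", ".lzo");

    /**
     * Returns the codec class (from the comma-separated io.compression.codecs
     * value) whose extension ends the file name, or null if none matches.
     */
    static String codecFor(String fileName, String ioCompressionCodecs) {
        String best = null;
        int bestLen = -1;
        for (String raw : ioCompressionCodecs.split(",")) {
            String codec = raw.trim();
            String ext = DEFAULT_EXTENSIONS.get(codec);
            // Prefer the longest matching extension, so "x.lzo_deflate"
            // maps to LzoCodec rather than DefaultCodec (".deflate").
            if (ext != null && fileName.endsWith(ext) && ext.length() > bestLen) {
                best = codec;
                bestLen = ext.length();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String withoutLzo = "org.apache.hadoop.io.compress.DefaultCodec,"
            + "org.apache.hadoop.io.compress.GzipCodec";
        String withLzo = withoutLzo
            + ",com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec";
        String file = "data/entries.lzo"; // hypothetical input file

        System.out.println(codecFor(file, withoutLzo)); // prints "null": the "No codec" case
        System.out.println(codecFor(file, withLzo));    // prints "com.hadoop.compression.lzo.LzopCodec"
    }
}
```

In this model the only thing that changes between the failing and working runs is the io.compression.codecs string, which matches what I'm seeing: adding the LZO codecs to the configuration is what makes the .lzo file resolvable.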

I discovered that if I create a core-site.xml and put it in my
PIG_CLASSPATH, I can start Pig without the '-x local' flag. When I run Pig
like this, it is able to load the LZO file and everything works correctly.
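Hadoop's Configuration class loads core-site.xml as a resource from the classpath, which I assume is why putting it on PIG_CLASSPATH has an effect. To check which file is actually visible, here is a small plain-Java diagnostic sketch (class and method names are my own and hypothetical). It looks up core-site.xml through the context class loader and prints the io.compression.codecs value it finds; it does not replicate how Pig or Hadoop merge their configuration resources.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical diagnostic: is core-site.xml on the classpath, and what codecs does it list? */
public class CoreSiteCheck {

    /** Returns the value of the named <property> in a Hadoop-style config file, or null. */
    static String propertyValue(InputStream in, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                NodeList names = p.getElementsByTagName("name");
                NodeList values = p.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0
                        && name.equals(names.item(0).getTextContent().trim())) {
                    return values.item(0).getTextContent().trim();
                }
            }
            return null; // property not present in this file
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse configuration file", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Look the file up as a classpath resource, the same general way Hadoop does.
        URL url = Thread.currentThread().getContextClassLoader().getResource("core-site.xml");
        if (url == null) {
            System.out.println("core-site.xml is not on the classpath");
            return;
        }
        System.out.println("Found " + url);
        try (InputStream in = url.openStream()) {
            System.out.println("io.compression.codecs = "
                + propertyValue(in, "io.compression.codecs"));
        }
    }
}
```

Running it with the same directory that is on PIG_CLASSPATH added to the Java classpath should show whether that core-site.xml is the one being found.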

Since I don't have Hadoop installed, I assume that Pig is still running in
some sort of local mode. Can someone explain the difference between
my two test cases above?

Is there a different (better?) way to configure io.compression.codecs when
running in local mode?

Thanks for the help,
-Peter

[1]: https://github.com/kevinweil/elephant-bird
[2]: https://github.com/twitter/hadoop-lzo
