Spark 1.0: Reading JSON LZH Compressed File

Uddin, Nasir M. Mon, 30 Jun 2014 13:10:37 -0700

Hi,

Spark 1.0 has been installed in standalone mode, but it can't read any compressed
(CMX/Snappy) files or SequenceFiles residing on HDFS. The key message is:
"Unable to load native-hadoop library.....". Other related messages are:

Caused by: java.lang.IllegalStateException: Cannot load com.ibm.biginsights.compress.CmxDecompressor without native library!
    at com.ibm.biginsights.compress.CmxDecompressor.<clinit>(CmxDecompressor.java:65)

Here is the key part of core-site.xml:
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec,com.ibm.biginsights.compress.CmxCodec</value>
  </property>
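One way to confirm which codec classes Hadoop will see is to read them back out of the file. The function below is a minimal sketch, assuming the layout of the fragment above (<name> and <value> on separate lines); list_codecs is a hypothetical helper name, not part of Hadoop or Spark:

```shell
#!/bin/sh
# list_codecs: print each class listed in io.compression.codecs, one per line.
# Hypothetical helper. Assumes <name> and <value> sit on separate lines,
# as in the core-site.xml fragment above; it is not a real XML parser.
list_codecs() {
  awk '
    /<name>io\.compression\.codecs<\/name>/ { found = 1; next }
    found && /<value>/ {
      # Strip the surrounding <value> tags, then split on commas.
      sub(/.*<value>/, ""); sub(/<\/value>.*/, "")
      n = split($0, parts, ",")
      for (i = 1; i <= n; i++) {
        gsub(/^[ \t]+|[ \t]+$/, "", parts[i])  # trim whitespace
        if (parts[i] != "") print parts[i]
      }
      exit
    }
  ' "$1"
}
```

For example, list_codecs "$HADOOP_CONF_DIR/core-site.xml" should print the five codec classes, ending with com.ibm.biginsights.compress.CmxCodec, if the file matches the fragment above.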

Here is spark-env.sh:
export SPARK_WORKER_CORES=4
export SPARK_WORKER_MEMORY=10g
export SCALA_HOME=/opt/spark/scala-2.11.1
export JAVA_HOME=/opt/spark/jdk1.7.0_55
export SPARK_HOME=/opt/spark/spark-0.9.1-bin-hadoop2
export ADD_JARS=/opt/IHC/lib/compression.jar
export SPARK_CLASSPATH=/opt/IHC/lib/compression.jar
export SPARK_LIBRARY_PATH=/opt/IHC/lib/native/Linux-amd64-64/
export SPARK_MASTER_WEBUI_PORT=1080
export HADOOP_CONF_DIR=/opt/IHC/hadoop-conf
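Since the error says the CMX decompressor could not find its native library, a quick sanity check is to confirm, on each worker, that the paths exported in spark-env.sh exist and that the native directory actually contains shared objects. This is only a sketch: check_spark_paths is a hypothetical helper, and it checks that files are present, not that the JVM can load them:

```shell
#!/bin/sh
# check_spark_paths: sanity-check the jar and native-library paths from spark-env.sh.
# Hypothetical helper. Run it on every worker node: each executor JVM loads
# native libraries from its own node's local filesystem.
# Returns 0 if both paths look usable, 1 otherwise (with a message per problem).
check_spark_paths() {
  jar=$1
  native_dir=$2
  status=0
  if [ ! -f "$jar" ]; then
    echo "missing jar: $jar"
    status=1
  fi
  if [ ! -d "$native_dir" ]; then
    echo "missing native dir: $native_dir"
    status=1
  elif ! ls "$native_dir" | grep -q '\.so'; then
    # No file name containing ".so" (e.g. libfoo.so, libfoo.so.1).
    echo "no shared libraries (*.so) in: $native_dir"
    status=1
  fi
  return $status
}
```

For example, check_spark_paths /opt/IHC/lib/compression.jar /opt/IHC/lib/native/Linux-amd64-64/ prints nothing and returns 0 when both paths look sane on that node.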

Note: CMX is an IBM-branded, splittable, LZO-based compression codec.

Any help is appreciated.

Thanks,
Nasir
