Hello,
I have been trying to play with the Google ngram dataset provided by
Amazon in the form of LZO-compressed files.
I am having trouble understanding what is going on ;). I have added the
LZO compression jar and native library to the underlying Hadoop/HDFS
installation and restarted the namenode and the datanodes. Spark can
obviously see the file, but I get gibberish on a read. Any ideas?
See output below:
14/07/13 14:39:19 INFO SparkContext: Added JAR
file:/home/ec2-user/hadoop/lib/hadoop-gpl-compression-0.1.0.jar at
http://10.10.0.100:40100/jars/hadoop-gpl-compression-0.1.0.jar with
timestamp 1405262359777
14/07/13 14:39:20 INFO SparkILoop: Created spark context..
Spark context available as sc.
scala> val f = sc.textFile("hdfs://10.10.0.98:54310/data/1gram.lzo")
14/07/13 14:39:34 INFO MemoryStore: ensureFreeSpace(163793) called with
curMem=0, maxMem=311387750
14/07/13 14:39:34 INFO MemoryStore: Block broadcast_0 stored as values
to memory (estimated size 160.0 KB, free 296.8 MB)
f: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at
<console>:12
scala> f.take(10)
14/07/13 14:39:43 INFO SparkContext: Job finished: take at <console>:15,
took 0.419708348 s
res0: Array[String] =
Array(SEQ?!org.apache.hadoop.io.LongWritable?org.apache.hadoop.io.Text??#com.hadoop.compression.lzo.LzoCodec????���\<N�#^�??d^�k�������\<N�#^�??d^�k��3��??�3???�??????
?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????�?????�?�m??��??hx??????????�??�???�??�??�??�??�??�?
�?, �? �? �?, �??�??�??�??�??�??�??�??�??�??�??�??�??�??�? �? �? �? �?
�?!�?"�?#�?$�?%�?&�?'�?(�?)�?*�?+�?,�?-�?.�?/�?0�?1�?2�?3�?4�?5�?6�?7�?8�?9�?:�?;�?<�?=�?>�??�?@�?A�?B�?C�?D�?E�?F�?G�?H�?I�?J�?K�?L�?M�?N�?O�?P�?Q�?R�?S�?T�?U�?V�?W�?X�?Y�?Z�?[�?\�?]�?^�?_�?`�?a�?b�?c�?d�?e�?f�?g�?h�?i�?j�?k�?l�?m�?n�?o�?p�?q�?r�?s�?t�?u�?v�?w�?x�?y�?z�?{�?|�?}�?~�?�?��?��?��?��?��?��?��?��?��?��?��?��?��?��?��?��?��?...
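One thing I notice: the output starts with a SEQ header naming
org.apache.hadoop.io.LongWritable, org.apache.hadoop.io.Text and
com.hadoop.compression.lzo.LzoCodec, which suggests the file is actually
a Hadoop SequenceFile (LZO-compressed) rather than a plain LZO text
file, so textFile would just be dumping the raw container bytes. If
that's right, I'm guessing it would need to be read as a SequenceFile
instead, something like this (untested sketch, same path as above):

scala> import org.apache.hadoop.io.{LongWritable, Text}
scala> val seq = sc.sequenceFile("hdfs://10.10.0.98:54310/data/1gram.lzo",
     |   classOf[LongWritable], classOf[Text])
scala> seq.map(_._2.toString).take(10)

Does that sound plausible, or is there more to configuring the codec for
SequenceFiles?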
Thanks!
Ognen