https://docs.databricks.com/spark/latest/data-sources/read-lzo.html

On Wed, Sep 27, 2017 at 6:36 AM 孫澤恩 <gn00710...@gmail.com> wrote:
> Hi All,
>
> Currently, I follow this blog
> http://blog.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
> so that I can use `hdfs dfs -text` to read the LZO file.
> But I want to know how to use Spark to read an LZO file.
> I put hadoop-lzo.jar into spark/jars and followed
> https://github.com/awslabs/emr-bootstrap-actions/blob/master/spark/examples/reading-lzo-files.md
>
> Here is my script:
>
> sc.newAPIHadoopFile("hdfs://<my_path_to_file>",
>   classOf[com.hadoop.mapreduce.LzoTextInputFormat],
>   classOf[org.apache.hadoop.io.LongWritable],
>   classOf[org.apache.hadoop.io.Text])
> val lzoRDD = files.map(_._2.toString)
>
> The result of it is null.
>
> Does anyone have experience with this?
>
> Sean Sun
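For reference, one likely problem in the quoted script is that the result of sc.newAPIHadoopFile is never assigned, yet the next line maps over an undefined `files`. A minimal sketch of the corrected spark-shell session, assuming hadoop-lzo is on the driver/executor classpath and the HDFS path placeholder is filled in:

```scala
// Sketch only: <my_path_to_file> is a placeholder from the original mail,
// and hadoop-lzo.jar must already be on the Spark classpath.
val files = sc.newAPIHadoopFile(
  "hdfs://<my_path_to_file>",                        // path to the .lzo file
  classOf[com.hadoop.mapreduce.LzoTextInputFormat],  // splittable LZO input format
  classOf[org.apache.hadoop.io.LongWritable],        // key: byte offset
  classOf[org.apache.hadoop.io.Text])                // value: line of text

val lzoRDD = files.map(_._2.toString)  // keep only the text values
lzoRDD.take(5).foreach(println)        // sanity-check a few lines
```

Binding the returned RDD to `files` before mapping over it is the key fix; without it, the `files.map(...)` line refers to nothing the shell knows about.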