Re: reading compress lzo files

Gurvinder Singh Sat, 05 Jul 2014 23:23:19 -0700

On 07/06/2014 05:19 AM, Nicholas Chammas wrote:
> On Fri, Jul 4, 2014 at 3:33 PM, Gurvinder Singh
> <gurvinder.si...@uninett.no <mailto:gurvinder.si...@uninett.no>> wrote:
> 
>     csv =
>     
> sc.newAPIHadoopFile(opts.input,"com.hadoop.mapreduce.LzoTextInputFormat","org.apache.hadoop.io.LongWritable","org.apache.hadoop.io.Text").count()
> 
> Does anyone know what the rough equivalent of this would be in the Scala
> API?
> 
I am not sure, I haven't tested it using scala.
com.hadoop.mapreduce.LzoTextInputFormat class is from this package
https://github.com/twitter/hadoop-lzo


I have installed it from clourdera "hadoop-lzo" package with liblzo2-2
debian package on all of my workers. Make sure you have hadoop-lzo.jar
in your class path for spark.

- Gurvinder

> I am trying the following, but the first import yields an error on my
> |spark-ec2| cluster:
> 
> |import com.hadoop.mapreduce.LzoTextInputFormat
> import org.apache.hadoop.io.LongWritable
> import org.apache.hadoop.io.Text
> 
> sc.newAPIHadoopFile("s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram/data",
>  LzoTextInputFormat, LongWritable, Text)
> |
> 
> |scala> import com.hadoop.mapreduce.LzoTextInputFormat
> <console>:12: error: object hadoop is not a member of package com
>        import com.hadoop.mapreduce.LzoTextInputFormat
> |
> 
> Nick
> 
>

Re: reading compress lzo files

Reply via email to