I found it quite painful to figure out all the steps required, and have
filed SPARK-2394 (https://issues.apache.org/jira/browse/SPARK-2394) to
track improving this. Perhaps I have been going about it the wrong way, but
it seems way more painful than it should be to set up a Spark cluster built
using
On 07/06/2014 05:19 AM, Nicholas Chammas wrote:
On Fri, Jul 4, 2014 at 3:33 PM, Gurvinder Singh
gurvinder.si...@uninett.no wrote:
csv =
Ah, indeed it looks like I need to install this separately
https://code.google.com/a/apache-extras.org/p/hadoop-gpl-compression/wiki/FAQ?redir=1
as it is not part of the core.
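For anyone following along, here is a rough sketch of how the separately installed hadoop-lzo jar and its native libraries might be wired into Spark via spark-defaults.conf. The paths are placeholders, not something from this thread, so adjust them to wherever the package actually landed on your cluster:

```
spark.driver.extraClassPath      /path/to/hadoop-lzo.jar
spark.executor.extraClassPath    /path/to/hadoop-lzo.jar
spark.driver.extraLibraryPath    /path/to/native/lzo/libs
spark.executor.extraLibraryPath  /path/to/native/lzo/libs
```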
Nick
On Sun, Jul 6, 2014 at 2:22 AM, Gurvinder Singh gurvinder.si...@uninett.no
wrote:
Pardon, I was wrong about this. There is actually code distributed
under com.hadoop, and that's where this class is. Oops.
https://code.google.com/a/apache-extras.org/p/hadoop-gpl-compression/source/browse/trunk/src/java/com/hadoop/mapreduce/LzoTextInputFormat.java
On Sun, Jul 6, 2014 at 6:37
Hi Nick,
The cluster I was working on in those linked messages was a private data
center cluster, not on EC2. I'd imagine that the setup would be pretty
similar, but I'm not familiar with the EC2 init scripts that Spark uses.
Also, I upgraded that cluster to 1.0 recently and am continuing to use
On Fri, Jul 4, 2014 at 3:33 PM, Gurvinder Singh gurvinder.si...@uninett.no
wrote:
csv = sc.newAPIHadoopFile(opts.input,
    "com.hadoop.mapreduce.LzoTextInputFormat",
    "org.apache.hadoop.io.LongWritable",
    "org.apache.hadoop.io.Text").count()
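To make the call concrete: in PySpark the Hadoop classes are passed as fully qualified class names in strings. A minimal sketch (the constant and function names are mine, and LzoTextInputFormat assumes the hadoop-lzo jar is actually on the classpath):

```python
# Fully qualified class names for PySpark's newAPIHadoopFile.
# Note com.hadoop.* comes from the separately installed hadoop-lzo
# package, not from Hadoop core.
LZO_INPUT_FORMAT = "com.hadoop.mapreduce.LzoTextInputFormat"
KEY_CLASS = "org.apache.hadoop.io.LongWritable"    # byte offset of each line
VALUE_CLASS = "org.apache.hadoop.io.Text"          # the line itself

def count_lzo_lines(sc, path):
    """Count lines in an .lzo file, splitting on the .index file if present."""
    rdd = sc.newAPIHadoopFile(path, LZO_INPUT_FORMAT, KEY_CLASS, VALUE_CLASS)
    return rdd.count()
```

Called as count_lzo_lines(sc, opts.input), this should be equivalent to the snippet quoted above.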
Does anyone know what the rough equivalent of this would be in
The package com.hadoop.mapreduce certainly looks wrong. If it is a Hadoop
class, it starts with org.apache.hadoop.
On Jul 6, 2014 4:20 AM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
On Fri, Jul 4, 2014 at 3:33 PM, Gurvinder Singh
gurvinder.si...@uninett.no wrote:
csv =
An update on this issue: Spark is now able to read the LZO file, recognizes
that it has an index, and starts multiple map tasks. You need to use the
following function instead of textFile:
csv =
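For context on the index mentioned above: hadoop-lzo ships an indexer that writes a .index file alongside each .lzo file, and that index is what lets the input format split the file across multiple map tasks. A hedged sketch of the invocation, with the jar path and data path as placeholders:

```
# Build an index so LzoTextInputFormat can split the file.
# com.hadoop.compression.lzo.LzoIndexer is part of hadoop-lzo.
hadoop jar /path/to/hadoop-lzo.jar \
    com.hadoop.compression.lzo.LzoIndexer \
    /data/input.lzo
```

Afterwards a /data/input.lzo.index file should sit next to the original; without it, the whole .lzo file is read by a single task.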