Re: sequenceFile and groupByKey

Shixiong Zhu Sun, 09 Mar 2014 00:22:23 -0800

Hi Kane,

The keys and values in your sequence file are stored as
org.apache.hadoop.io.Text, not String, and Text is not serializable. You
need to convert Text to String. There are two approaches:


1. Give the key and value types as type parameters and let Spark's implicit
conversions turn Text into String automatically. I recommend this one. E.g.,

val t2 = sc.sequenceFile[String, String]("/user/hdfs/e1Mseq")
t2.groupByKey().take(5)

2. Use "classOf[Text]" so that the declared classes match what is actually
stored in the sequence file, then convert Text to String yourself. E.g.,

import org.apache.hadoop.io.Text
val t2 = sc.sequenceFile("/user/hdfs/e1Mseq", classOf[Text], classOf[Text])
t2.map { case (k, v) => (k.toString, v.toString) }.groupByKey().take(5)
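
On your other question (take(5) returning 5 identical items): this most
likely happens because Hadoop's record reader reuses the same Writable
object for every record, so keeping the raw Text references leaves you with
many pointers to one object. Calling toString inside map, as above, copies
each record into an immutable String and serves as the "clone". Below is a
minimal sketch of the effect in plain Scala, with no Spark or Hadoop
involved. MutableText and ReuseDemo are hypothetical names made up for this
illustration; MutableText only imitates the mutable behaviour of Text.

```scala
// Hypothetical stand-in for org.apache.hadoop.io.Text: a mutable holder
// whose contents are overwritten by set(). Not a real Hadoop class.
final class MutableText(private var value: String = "") {
  def set(s: String): Unit = { value = s }
  override def toString: String = value
}

object ReuseDemo {
  // Simulates a reader that recycles one MutableText for every record.
  def records(data: Seq[String]): Iterator[MutableText] = {
    val shared = new MutableText()
    data.iterator.map { s => shared.set(s); shared }
  }

  val data = Seq("a", "b", "c")

  // Keeping the references: every element is the same object, so all of
  // them show whatever was written last.
  val kept: List[String] = records(data).toList.map(_.toString)

  // Copying to an immutable String while each record is still current.
  val copied: List[String] = records(data).map(_.toString).toList

  def main(args: Array[String]): Unit = {
    println(kept)   // List(c, c, c)
    println(copied) // List(a, b, c)
  }
}
```

The same reasoning applies before cache() or collect() on an RDD of
Writables: copy first, then keep.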


Best Regards,
Shixiong Zhu


2014-03-09 13:30 GMT+08:00 Kane <kane.ist...@gmail.com>:

> When I try to open a sequence file:
> val t2 = sc.sequenceFile("/user/hdfs/e1Mseq", classOf[String],
> classOf[String])
> t2.groupByKey().take(5)
>
> I get:
> org.apache.spark.SparkException: Job aborted: Task 25.0:0 had a not
> serializable result: java.io.NotSerializableException:
> org.apache.hadoop.io.Text
>
> Another thing:
> t2.take(5) returns 5 identical items. I guess I have to map/clone the items,
> but I get something like "org.apache.hadoop.io.Text cannot be cast to
> java.lang.String". How do I clone it?
>
> Thanks.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/sequenceFile-and-groupByKey-tp2428.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
