This is hard to do in general, but you can get what you are asking for by
putting the following implicit class in scope.

implicit class BetterRDD[A: scala.reflect.ClassTag](rdd: org.apache.spark.rdd.RDD[A]) {
  def dropOne: org.apache.spark.rdd.RDD[A] =
    rdd.mapPartitionsWithIndex { (i, iter) =>
      // Consume the first element of partition 0 (the header line);
      // pass every other partition through unchanged.
      if (i == 0 && iter.hasNext) { iter.next(); iter } else iter
    }
}
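To see why this works without spinning up a SparkContext, here is a minimal sketch that simulates an RDD's partitions with plain Scala collections; the object name and sample data are made up for illustration, and only the per-partition drop logic mirrors the class above.

```scala
object DropHeaderDemo {
  // Each inner Seq stands in for one RDD partition. The header line,
  // if present, is the first element of partition 0.
  def dropOne(partitions: Seq[Seq[String]]): Seq[String] =
    partitions.zipWithIndex.flatMap { case (part, i) =>
      val iter = part.iterator
      // Same rule as mapPartitionsWithIndex above: skip the first
      // element of partition 0, keep all other partitions intact.
      if (i == 0 && iter.hasNext) { iter.next(); iter.toSeq } else part
    }

  def main(args: Array[String]): Unit = {
    val parts = Seq(
      Seq("name,age", "alice,30"), // partition 0: header comes first
      Seq("bob,25", "carol,41")    // partition 1: data only
    )
    println(dropOne(parts).mkString(" ")) // alice,30 bob,25 carol,41
  }
}
```

Note this only drops from partition 0, which is why it is safe for a header: sc.textFile always places the first line of the file in the first partition.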

On Thu, Oct 2, 2014 at 4:06 PM, Sunny Khatri <sunny.k...@gmail.com> wrote:

> You can do a filter with startsWith?
>
> On Thu, Oct 2, 2014 at 4:04 PM, SK <skrishna...@gmail.com> wrote:
>
>> Thanks for the help. Yes, I did not realize that the first header line
>> has a
>> different separator.
>>
>> By the way, is there a way to drop the first line that contains the
>> header?
>> Something along the following lines:
>>
>>       sc.textFile(inp_file)
>>           .drop(1)  // or tail() to drop the header line
>>           .map....  // rest of the processing
>>
>> I could not find a drop() function, or a way to take the bottom (n)
>> elements of an RDD.
>> Alternatively, a way to create the case class schema from the header line
>> of the file and use the rest for the data would be useful - just as a
>> suggestion. Currently I am just deleting this header line manually before
>> processing it in Spark.
>>
>>
>> thanks
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-ArrayIndexOutofBoundsException-tp15639p15642.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>
