It's likely something in the columnar compression. I've seen similar intermittent
issues when caching DataFrames. "sportingpulse.com" is a value in one of the
columns of the DataFrame.
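
If compression really is the trigger, one workaround to try (untested, and assuming
the failure only occurs in the compressed columnar path) is turning off in-memory
columnar compression before caching, via the standard
spark.sql.inMemoryColumnarStorage.compressed option. The app name and input path
below are placeholders:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("cache-without-compression"))
val sqlContext = new SQLContext(sc)

// Skip the compression step (DictionaryEncoding and friends) when building
// cached column buffers; this trades memory for avoiding the failing path.
sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "false")

val df = sqlContext.read.json("s3://bucket/input.json") // placeholder input path
df.cache()
df.count() // materializes the cache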
________________________________
From: Ted Yu <yuzhih...@gmail.com>
Sent: 10/30/2015 6:33 PM
To: Zhang, Jingyu <jingyu.zh...@news.com.au>
Cc: user <user@spark.apache.org>
Subject: Re: key not found: sportingpulse.com in Spark SQL 1.5.0

I searched for sportingpulse in *.scala and *.java files under the 1.5 branch.
There were no hits.

mvn dependency doesn't show sportingpulse either.

Is it possible this is specific to EMR?

Cheers

On Fri, Oct 30, 2015 at 2:57 PM, Zhang, Jingyu
<jingyu.zh...@news.com.au> wrote:

There is no problem in Spark SQL 1.5.1, but the error "key not found:
sportingpulse.com" shows up when I use 1.5.0.

I have to use 1.5.0 because that is the version AWS EMR supports. Can anyone
tell me why Spark uses "sportingpulse.com" and how to fix it?

Thanks.

Caused by: java.util.NoSuchElementException: key not found: sportingpulse.com
    at scala.collection.MapLike$class.default(MapLike.scala:228)
    at scala.collection.AbstractMap.default(Map.scala:58)
    at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
    at org.apache.spark.sql.columnar.compression.DictionaryEncoding$Encoder.compress(compressionSchemes.scala:258)
    at org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.build(CompressibleColumnBuilder.scala:110)
    at org.apache.spark.sql.columnar.NativeColumnBuilder.build(ColumnBuilder.scala:87)
    at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
    at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
    at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:152)
    at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:120)
    at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
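
From the trace, the exception is thrown while InMemoryRelation builds the
compressed column buffers for the cache, not in our query code. A minimal,
untested sketch of the kind of caching that reaches that path (the column name
and values below are simplified stand-ins for our real data):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("dictionary-encoding-sketch"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// "sportingpulse.com" is just data here; a repeated string value is what
// dictionary encoding compresses when the cached batches are built.
val df = sc.parallelize(Seq.fill(1000)("sportingpulse.com") :+ "example.org")
  .map(Tuple1(_))
  .toDF("domain")

df.cache()
df.count() // forces the compressed in-memory column buffers to be built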
