[Spark SQL]: DataFrame schema resulting in NullPointerException

2017-11-19 Thread Chitral Verma
Hey, I'm working on a use case that involves converting DStreams to DataFrames after some transformations. I've simplified my code into the following snippet to reproduce the error, and I've listed my environment settings below. *Environment:* Spark version: 2.2.0, Java: 1.8

Kryo not registered class

2017-11-19 Thread Angel Francisco Orta
Hello, I'm using Spark 2.1.0 with Scala and registering all classes with Kryo, and I have a problem registering this class: org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex$SerializableFileStatus$SerializableBlockLocation[] I can't register with

Re: Multiple transformations without recalculating or caching

2017-11-19 Thread Phillip Henry
A back-of-a-beermat calculation says if you have, say, 20 boxes, saving 1TB should take approximately 15 minutes (with a replication factor of 1 since you don't need it higher for ephemeral data that is relatively easy to generate). This isn't much if the whole job takes hours. You get the added
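
As a rough check of the beermat figures quoted in the preview above, the sketch below works backwards from 1 TB, 20 boxes, and 15 minutes to the sustained write rate each box would need. It assumes decimal units (1 TB = 10^12 bytes) and an even spread of data across boxes; neither assumption is stated in the message, and the variable names are illustrative only.

```python
# Back-of-envelope check: what per-box write rate do the quoted figures imply?
TOTAL_BYTES = 1e12   # 1 TB, assuming decimal units (not stated in the message)
BOXES = 20           # number of machines, from the message
MINUTES = 15         # approximate save time, from the message
REPLICATION = 1      # replication factor, from the message

# Bytes each box must write, assuming data is spread evenly across boxes.
per_box_bytes = TOTAL_BYTES * REPLICATION / BOXES

# Sustained throughput each box needs to finish in the stated time.
per_box_mb_per_s = per_box_bytes / (MINUTES * 60) / 1e6

print(f"{per_box_mb_per_s:.1f} MB/s per box")  # prints "55.6 MB/s per box"
```

Under these assumptions each box needs to sustain roughly 55 MB/s, and raising REPLICATION scales the bytes written proportionally.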