Cached RDD Block Size - Uneven Distribution

2014-08-03 Thread iramaraju
I am running Spark 1.0.0, Tachyon 0.5, and Hadoop 1.0.4. I am selecting a subset of a large dataset and trying to run queries on the cached schema RDD. Strangely, in the web UI, I see the following. 150 Partitions Block Name Storage Level Size in Memory ▴ Size on Disk Executors rdd_

Nested Case Classes (Found and Required Same)

2014-09-12 Thread iramaraju
I think this is a common issue, but I need help figuring out a way around it if it is still unresolved. I have a dataset that has more than 70 columns. To have all the columns fit into my RDD, I am experimenting with the following. (I intend to use the InputData to parse the file and have 3 or 4 columnsets to