Hi,

We are seeing performance degrade as we incrementally load data. Here is the config.
Spark standalone cluster:

  spark01 (Spark master, Shark, Hadoop namenode): 15 GB RAM, 4 vCPUs
  spark02 (Spark worker, Hadoop datanode): 15 GB RAM, 8 vCPUs
  spark03 (Spark worker): 15 GB RAM, 8 vCPUs
  spark04 (Spark worker): 15 GB RAM, 8 vCPUs

Spark worker configuration:

  spark.local.dir=/path/to/ssd/disk
  spark.default.parallelism=64
  spark.executor.memory=10g
  spark.serializer=org.apache.spark.serializer.KryoSerializer

Shark configuration:

  spark.kryoserializer.buffer.mb=64
  mapred.reduce.tasks=30
  spark.scheduler.mode=FAIR
  spark.serializer=org.apache.spark.serializer.KryoSerializer
  spark.default.parallelism=64

Performance decreases as more data is loaded into Spark. A simple query such as:

  select count(*) from customers_cached

took 0.5 seconds on 12th Nov and takes 4.24 seconds now.

We also see these warnings all over the log:

  2014-11-20 16:56:42,125 WARN parse.TypeCheckProcFactory (TypeCheckProcFactory.java:convert(180)) - Invalid type entry TOK_INT=null
  2014-11-20 16:56:51,988 WARN parse.TypeCheckProcFactory (TypeCheckProcFactory.java:convert(180)) - Invalid type entry TOK_TABLE_OR_COL=null

Does anyone have any ideas to help us resolve this? We can post anything else you need.
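For context on the settings above, here is a quick back-of-the-envelope sanity check (a rough sketch, assuming only the three worker nodes run tasks): with spark.default.parallelism=64 and three 8-vCPU workers, a full scan of the cached table runs just under three waves of tasks per core per stage.

    # Hypothetical sanity check on our settings, not part of the cluster itself.
    worker_cores = 3 * 8          # spark02-spark04, 8 vCPUs each
    parallelism = 64              # spark.default.parallelism
    waves = parallelism / worker_cores
    print(f"{worker_cores} cores, ~{waves:.2f} task waves per stage")
    # prints "24 cores, ~2.67 task waves per stage"

So the parallelism setting itself looks reasonable for the cluster size, which is part of why the slowdown is surprising to us.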