Hi,

We are seeing performance degrade as we incrementally load data. Here is the
config:

Spark standalone cluster

spark01 (spark master, shark, hadoop namenode): 15GB RAM, 4 vCPUs
spark02 (spark worker, hadoop datanode): 15GB RAM, 8 vCPUs
spark03 (spark worker): 15GB RAM, 8 vCPUs
spark04 (spark worker): 15GB RAM, 8 vCPUs

spark worker configuration:
spark.local.dir=/path/to/ssd/disk
spark.default.parallelism=64
spark.executor.memory=10g
spark.serializer=org.apache.spark.serializer.KryoSerializer

shark configuration:
spark.kryoserializer.buffer.mb=64
mapred.reduce.tasks=30
spark.scheduler.mode=FAIR
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.default.parallelism=64

The performance decreases as more data is loaded into Spark.

A simple query like this:
select count(*) from customers_cached
took 0.5 seconds on 12th Nov
and takes 4.24 seconds now
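
For context, the load follows the usual Shark convention of caching tables via the "_cached" name suffix. A rough sketch of what we do (table and column names here are illustrative, not our exact schema):

```sql
-- Shark keeps tables whose names end in "_cached" in cluster memory.
-- Initial load (names are examples only):
CREATE TABLE customers_cached AS
SELECT * FROM customers;

-- Each incremental batch is then appended to the cached table:
INSERT INTO TABLE customers_cached
SELECT * FROM customers_staging;
```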

We also see these warnings all over the log:

2014-11-20 16:56:42,125 WARN  parse.TypeCheckProcFactory
(TypeCheckProcFactory.java:convert(180)) - Invalid type entry TOK_INT=null
2014-11-20 16:56:51,988 WARN  parse.TypeCheckProcFactory
(TypeCheckProcFactory.java:convert(180)) - Invalid type entry
TOK_TABLE_OR_COL=null

Does anyone have any ideas to help us resolve this? We can post anything else you need.
