Is it possible to set the number of cores per executor on standalone cluster?

2015-07-17 Thread Zheng, Xudong
Is it possible to set the number of cores per executor on a standalone cluster? We find that core distribution across executors can become very skewed at times, which skews the workload and makes our job slow. Thanks! -- 郑旭东 Zheng, Xudong
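One way this is commonly addressed can be sketched with standalone-mode configuration (a sketch, not from the thread itself: it assumes Spark 1.4+, where `spark.executor.cores` is honored by the standalone master; in earlier versions the master grabs all available cores on each worker for an application, which can produce exactly the skew described above):

```properties
# spark-defaults.conf (or pass each as --conf on spark-submit)
spark.executor.cores   2      # cap cores per executor (standalone mode, Spark 1.4+)
spark.cores.max        16     # total cores the application may take across the cluster
spark.deploy.spreadOut true   # master-side setting: spread an app across workers
                              # instead of packing it onto as few workers as possible
```

With `spark.cores.max` divided by `spark.executor.cores`, the master launches multiple smaller executors rather than a few large ones, which tends to even out the per-executor load.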

Re: Parquet Hive table become very slow on 1.3?

2015-04-08 Thread Zheng, Xudong
ema, or the schema provided by user via data source DDL), we don't need to do schema merging on driver side, but defer it to executor side and each task only needs to reconcile those part-files it needs to touch. This is also what the Parquet developers did recently for

Re: Parquet Hive table become very slow on 1.3?

2015-03-31 Thread Zheng, Xudong
s a configuration to disable schema merging by default when doing Hive metastore Parquet table conversion. Another workaround is to fallback to the old Parquet code by setting spark.sql.parquet.useDataSourceApi to false. Cheng On 3/31/15 2:47 PM, Zh
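The fallback workaround quoted above can be sketched as a configuration fragment (the flag `spark.sql.parquet.useDataSourceApi` is named in the message itself; applying it via `spark-defaults.conf` or a SQL `SET` statement is an assumption about how one would wire it up in a Spark 1.3 deployment):

```properties
# spark-defaults.conf
# Fall back to the old (pre-data-source-API) Parquet read path,
# avoiding the slow schema-merging behavior discussed in this thread:
spark.sql.parquet.useDataSourceApi  false
```

The same flag could presumably also be toggled per session, e.g. `SET spark.sql.parquet.useDataSourceApi=false;` from spark-sql, at the cost of losing the newer data source API features for Parquet tables.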

Parquet Hive table become very slow on 1.3?

2015-03-30 Thread Zheng, Xudong
print the detailed LocatedBlocks info. Another finding is that if I read the Parquet file via Scala code from spark-shell as below, it looks fine; the computation returns the result as quickly as before. sqlContext.parquetFile("data/myparquettable") Any idea about it? Thank you! -- 郑旭东 Zheng, Xudong