Question about SparkSQL and Hive-on-Spark

2014-09-23 Thread Yi Tian
Hi all, I have some questions about SparkSQL and Hive-on-Spark. Will SparkSQL support all Hive features in the future, or will it just treat Hive as a data source for Spark? Since Spark 1.1.0, we have had thrift-server support for running HQL on Spark. Will this feature be replaced by Hive on Spark?

Re: Question about SparkSQL and Hive-on-Spark

2014-09-23 Thread Reynold Xin
On Tue, Sep 23, 2014 at 12:47 AM, Yi Tian tianyi.asiai...@gmail.com wrote: Hi all, I have some questions about SparkSQL and Hive-on-Spark. Will SparkSQL support all Hive features in the future, or will it just treat Hive as a data source for Spark? Most likely not *ALL* Hive features, but

RE: spark.local.dir and spark.worker.dir not used

2014-09-23 Thread Shao, Saisai
Hi, spark.local.dir is the directory used to write map output data and persisted RDD blocks. The file paths are hashed, so you cannot easily locate the persisted RDD block files directly, but they will definitely be in these folders on your worker node. Thanks, Jerry From: Priya Ch
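
Because the block files sit under hashed subdirectories, one way to locate them is to walk the whole spark.local.dir tree instead of guessing the path. The following is only a rough sketch, not part of Spark: the helper name find_rdd_block_files is hypothetical, and it assumes persisted RDD block files have names starting with "rdd_", which should be checked against your Spark version.

```python
import os


def find_rdd_block_files(local_dir):
    """Return sorted paths of files under local_dir whose names start with 'rdd_'.

    Assumption: persisted RDD block files are named 'rdd_<rddId>_<partition>'.
    The hashed subdirectory names are not interpreted; every subdirectory
    is simply searched.
    """
    matches = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            if name.startswith("rdd_"):
                matches.append(os.path.join(root, name))
    return sorted(matches)
```

Calling find_rdd_block_files on the directory configured as spark.local.dir on a worker node would list any such files, whichever hashed subfolder they landed in.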

Re: Question about SparkSQL and Hive-on-Spark

2014-09-23 Thread Will Benton
Hi Yi, I've had some interest in implementing windowing and rollup in particular for some of my applications but haven't had them on the front of my plate yet. If you need them as well, I'm happy to start taking a look this week. best, wb - Original Message - From: Yi Tian

Re: RFC: Deprecating YARN-alpha API's

2014-09-23 Thread Tom Graves
Any other comments or objections on this? Thanks, Tom On Tuesday, September 9, 2014 4:39 PM, Chester Chen ches...@alpinenow.com wrote: We were using it until recently; we are talking to our customers to see if we can get off it. Chester Alpine Data Labs On Tue, Sep 9, 2014 at

Re: OutOfMemoryError on parquet SnappyDecompressor

2014-09-23 Thread Michael Armbrust
I actually submitted a patch to do this yesterday: https://github.com/apache/spark/pull/2493 Can you tell us more about your configuration? In particular, how much memory and how many cores do the executors have, and what does the schema of your data look like? On Tue, Sep 23, 2014 at 7:39 AM, Cody Koeninger

Re: OutOfMemoryError on parquet SnappyDecompressor

2014-09-23 Thread Aaron Davidson
This may be related: https://github.com/Parquet/parquet-mr/issues/211 Perhaps if we change our configuration settings for Parquet it would get better, but the performance characteristics of Snappy are pretty bad here under some circumstances. On Tue, Sep 23, 2014 at 10:13 AM, Cody Koeninger

SPARK-3660 : Initial RDD for updateStateByKey transformation

2014-09-23 Thread Soumitra Kumar
Hello fellow developers, Thanks, TD, for the relevant pointers. I have created an issue: https://issues.apache.org/jira/browse/SPARK-3660 Copying the description from JIRA: How to initialize the state transformation updateStateByKey? I have word counts from previous spark-submit run, and want to load
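
To make the request concrete, here is a minimal pure-Python sketch of what seeding updateStateByKey with an initial state would mean. It does not use the Spark API: the function update_state_by_key, the word counts, and the batches are all hypothetical, and the update semantics (the update function runs for every key that has prior state or new values, and returning None drops the key) are modeled on how the streaming operation is usually described.

```python
def update_state_by_key(state, batch, update_func):
    """Return the new state after applying one batch of (key, value) pairs.

    update_func(new_values, old_state) is called for every key that appears
    in the previous state or in the batch; a None result removes the key.
    """
    grouped = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)
    new_state = {}
    for key in set(state) | set(grouped):
        result = update_func(grouped.get(key, []), state.get(key))
        if result is not None:
            new_state[key] = result
    return new_state


def add_counts(new_values, old_count):
    """Running word count: add the new occurrences to the previous total."""
    return sum(new_values) + (old_count or 0)


# Hypothetical word counts carried over from a previous run.
initial_state = {"spark": 10, "hive": 4}
# Two hypothetical micro-batches of (word, 1) pairs.
batches = [[("spark", 1), ("sql", 1)], [("hive", 1), ("spark", 1)]]

state = dict(initial_state)
for batch in batches:
    state = update_state_by_key(state, batch, add_counts)
print(sorted(state.items()))  # [('hive', 5), ('spark', 12), ('sql', 1)]
```

Starting from an empty dictionary instead of initial_state corresponds to the behavior without the requested feature, where counts from the earlier run are lost.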

Re: Question about SparkSQL and Hive-on-Spark

2014-09-23 Thread Yi Tian
Hi Will, we are planning to start implementing these functions. We hope to have a general design in the following week. Best Regards, Yi Tian tianyi.asiai...@gmail.com On Sep 23, 2014, at 23:39, Will Benton wi...@redhat.com wrote: Hi Yi, I've had some interest in implementing