Re: Spark Scheduler creating Straggler Node

2016-03-08 Thread Prabhu Joseph
I don't just want to replicate all cached blocks. I am trying to find a way to solve the issue I mentioned in the mail above. Having replicas for all cached blocks will add more cost to customers. On Wed, Mar 9, 2016 at 9:50 AM, Reynold Xin wrote: > You just want to be
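
For context, replicating every cached block is already possible with the replicated storage levels; a minimal sketch of what that looks like (the input path is made up), which is exactly the extra memory cost being objected to here:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    val sc = new SparkContext(new SparkConf().setAppName("replicated-cache-sketch"))

    // MEMORY_ONLY_2 keeps each cached partition on two executors, so a second
    // node can serve NODE_LOCAL tasks, at roughly double the cache footprint.
    val cached = sc.textFile("hdfs:///data/events").persist(StorageLevel.MEMORY_ONLY_2)
    cached.count()  // materialize the cache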

Re: Inconsistent file extensions and omitting file extensions written by CSV, TEXT and JSON data sources.

2016-03-08 Thread Reynold Xin
Isn't this just specified by the user? On Tue, Mar 8, 2016 at 9:49 PM, Hyukjin Kwon wrote: > Hi all, > > Currently, the output from CSV, TEXT and JSON data sources does not have > file extensions such as .csv, .txt and .json (except for compression > extensions such as

Inconsistent file extensions and omitting file extensions written by CSV, TEXT and JSON data sources.

2016-03-08 Thread Hyukjin Kwon
Hi all, Currently, the output from the CSV, TEXT and JSON data sources does not have file extensions such as .csv, .txt and .json (except for compression extensions such as .gz, .deflate and .bz2). In addition, it looks like Parquet has extensions such as .gz.parquet or .snappy.parquet according to
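
The behaviour described here can be reproduced with the built-in writers; a small sketch (output paths are made up, and sqlContext is assumed to be an existing SQLContext):

    // Output part files carry no .json / .txt suffix; only compression
    // suffixes such as .gz show up when a codec is configured.
    val df = sqlContext.range(10).toDF("id")
    df.write.json("/tmp/out-json")                                   // e.g. part-r-00000-<uuid>
    df.selectExpr("cast(id as string)").write.text("/tmp/out-text")  // likewise, no .txt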

Re: Spark Scheduler creating Straggler Node

2016-03-08 Thread Reynold Xin
You just want to be able to replicate hot cached blocks right? On Tuesday, March 8, 2016, Prabhu Joseph wrote: > Hi All, > > When a Spark Job is running, and one of the Spark Executor on Node A > has some partitions cached. Later for some other stage, Scheduler
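
If the straggling comes from the scheduler holding tasks back for NODE_LOCAL placement on the node with the cached partitions, the locality-wait settings are the usual knob to loosen that preference; a sketch (the value shown is illustrative, not a recommendation):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("locality-wait-sketch")
      // How long the scheduler waits for a data-local slot before falling back
      // to a less-local one; lowering it spreads work off the hot node sooner.
      .set("spark.locality.wait", "500ms")
    val sc = new SparkContext(conf)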

Re: [VOTE] Release Apache Spark 1.6.1 (RC1)

2016-03-08 Thread Burak Yavuz
+1 On Tue, Mar 8, 2016 at 10:59 AM, Andrew Or wrote: > +1 > > 2016-03-08 10:59 GMT-08:00 Yin Huai : > >> +1 >> >> On Mon, Mar 7, 2016 at 12:39 PM, Reynold Xin wrote: >> >>> +1 (binding) >>> >>> >>> On Sun, Mar 6, 2016 at 12:08

Re: [VOTE] Release Apache Spark 1.6.1 (RC1)

2016-03-08 Thread Andrew Or
+1 2016-03-08 10:59 GMT-08:00 Yin Huai : > +1 > > On Mon, Mar 7, 2016 at 12:39 PM, Reynold Xin wrote: > >> +1 (binding) >> >> >> On Sun, Mar 6, 2016 at 12:08 PM, Egor Pahomov >> wrote: >> >>> +1 >>> >>> Spark ODBC server is

Re: [VOTE] Release Apache Spark 1.6.1 (RC1)

2016-03-08 Thread Yin Huai
+1 On Mon, Mar 7, 2016 at 12:39 PM, Reynold Xin wrote: > +1 (binding) > > > On Sun, Mar 6, 2016 at 12:08 PM, Egor Pahomov > wrote: > >> +1 >> >> Spark ODBC server is fine, SQL is fine. >> >> 2016-03-03 12:09 GMT-08:00 Yin Yang :

Re: Spark structured streaming

2016-03-08 Thread Michael Armbrust
This is in active development, so there is not much that can be done from an end user perspective. In particular the only sink that is available in apache/master is a testing sink that just stores the data in memory. We are working on a parquet based file sink and will eventually support all the
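
For orientation, this is roughly the shape the API settled on once structured streaming shipped (a sketch against the later, stable Spark 2.x API; at the time of this thread the entry points were still read.stream()/write.stream() and only the in-memory testing sink existed):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("structured-streaming-sketch").getOrCreate()

    // A toy socket source, read as an unbounded DataFrame of lines.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // The in-memory testing sink mentioned above: results land in a queryable temp table.
    val query = lines.writeStream
      .format("memory")
      .queryName("lines_table")
      .start()

    spark.sql("select * from lines_table").show()
    query.stop()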

Re: Spark structured streaming

2016-03-08 Thread Jacek Laskowski
Hi Praveen, I don't really know. I think TD or Michael should know, as they are personally involved in the task (as far as I could figure out from the JIRA and the changes). Ping people on the JIRA so they notice your question(s). Pozdrawiam, Jacek Laskowski

Re: Use cases for kafka direct stream messageHandler

2016-03-08 Thread Cody Koeninger
No, it looks like you'd have to catch them in the serializer and have the serializer return an Option or something. The new consumer builds a buffer full of records, not one at a time. On Mar 8, 2016 4:43 AM, "Marius Soutier" wrote: > > > On 04.03.2016, at 22:39, Cody Koeninger
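
A minimal sketch of the "catch it in the deserializer" approach for the new consumer (class and names here are illustrative, not from the thread): a Deserializer that swallows malformed payloads and hands back None instead of throwing while the consumer fills its record buffer.

    import java.util
    import org.apache.kafka.common.serialization.{Deserializer, StringDeserializer}

    // Wraps the plain StringDeserializer; bad records become None instead of
    // an exception thrown inside the fetch.
    class SafeStringDeserializer extends Deserializer[Option[String]] {
      private val inner = new StringDeserializer

      override def configure(configs: util.Map[String, _], isKey: Boolean): Unit =
        inner.configure(configs, isKey)

      override def deserialize(topic: String, data: Array[Byte]): Option[String] =
        try Option(inner.deserialize(topic, data))
        catch { case _: Exception => None }

      override def close(): Unit = inner.close()
    }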

Re: Spark structured streaming

2016-03-08 Thread Praveen Devarao
Thanks Jacek for the pointer. Any idea which package can be used in .format()? The test cases seem to work using the DefaultSource class defined within DataFrameReaderWriterSuite [org.apache.spark.sql.streaming.test.DefaultSource]. Thanking You

Re: Spark structured streaming

2016-03-08 Thread Jacek Laskowski
Hi Praveen, I've spent a few hours on the changes related to streaming DataFrames (included in SPARK-8360) and concluded that it's currently only possible to read.stream(), but not write.stream(), since there are no streaming Sinks yet. Pozdrawiam, Jacek Laskowski

Spark structured streaming

2016-03-08 Thread Praveen Devarao
Hi, I would like to get my hands on the structured streaming feature coming out in Spark 2.0. I have tried looking around for code samples to get started but am not able to find any. The only things I could look into are the test cases that have been committed under the JIRA umbrella

Re: ML ALS API

2016-03-08 Thread Nick Pentreath
Hi Maciej Yes, that *train* method is intended to be public, but it is marked as *DeveloperApi*, which means that backward compatibility is not necessarily guaranteed, and that method may change. Having said that, even APIs marked as DeveloperApi do tend to be relatively stable. As the comment
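
For comparison, the supported, non-DeveloperApi entry point is the ALS Estimator; a sketch (ratingsDF is assumed to be a DataFrame with the columns named below):

    import org.apache.spark.ml.recommendation.ALS

    val als = new ALS()
      .setRank(10)
      .setMaxIter(10)
      .setRegParam(0.1)
      .setUserCol("userId")
      .setItemCol("movieId")
      .setRatingCol("rating")

    val model = als.fit(ratingsDF)            // trains an ALSModel
    val scored = model.transform(ratingsDF)   // adds a "prediction" column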

Re: BUILD FAILURE due to...Unable to find configuration file at location dev/scalastyle-config.xml

2016-03-08 Thread Dongjoon Hyun
Hi, I updated PR https://github.com/apache/spark/pull/11567. But, `lint-java` fails if that file is in the dev folder. (Jenkins fails, too.) So, inevitably, I changed pom.xml instead. Dongjoon. On Mon, Mar 7, 2016 at 11:40 PM, Jacek Laskowski wrote: > Hi, > > At first

Re: BUILD FAILURE due to...Unable to find configuration file at location dev/scalastyle-config.xml

2016-03-08 Thread Jacek Laskowski
Hi, At first glance it appears the commit *yesterday* (Warsaw time) broke the build :( https://github.com/apache/spark/commit/0eea12a3d956b54bbbd73d21b296868852a04494 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark