Re: Aggregating over sorted data

2016-12-19 Thread Koert Kuipers
Take a look at: https://issues.apache.org/jira/browse/SPARK-15798 On Dec 19, 2016 00:17, "Robin East" wrote: This is also a feature we need for our time-series processing > On 19 Dec 2016, at 04:07, Liang-Chi Hsieh wrote: > > > Hi, > > As far as I know,
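
A minimal sketch of one common stopgap while native support is missing, assuming a hypothetical (key, ts, value) time-series DataFrame: collect each key's values in timestamp order through a window, then aggregate the ordered list per key. This is not the design tracked in SPARK-15798, only an illustration of the workaround shape.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{collect_list, first}

    val spark = SparkSession.builder().appName("sorted-agg-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical time-series data: (key, ts, value)
    val df = Seq(("a", 2L, 12.0), ("a", 1L, 10.0), ("b", 1L, 5.0)).toDF("key", "ts", "value")

    // Full-partition frame so each row sees the key's complete, timestamp-ordered values.
    val w = Window.partitionBy("key").orderBy("ts").rowsBetween(Long.MinValue, Long.MaxValue)

    val orderedPerKey = df
      .withColumn("values_in_ts_order", collect_list("value").over(w))
      .groupBy("key")
      .agg(first("values_in_ts_order").as("values_in_ts_order"))

    orderedPerKey.show()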

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Mehdi Meziane
We would be interested in the results if you give dynamic allocation with Mesos a try! - Original Mail - From: "Michael Gummelt" To: "Sumit Chawla" Cc: u...@mesos.apache.org, d...@mesos.apache.org, "User" ,

Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-19 Thread Reynold Xin
The vote passed with the following +1 and -1: +1 Reynold Xin* Sean Owen* Dongjoon Hyun Xiao Li Herman van Hövell tot Westerflier Joseph Bradley* Liwei Lin Denny Lee Holden Karau Adam Roberts vaquar khan 0/+1 (not sure what this means but putting it here just in case) Felix Cheung -1 Franklyn

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Chawla,Sumit
Tim, We will try to run the application in coarse-grained mode and share the findings with you. Regards Sumit Chawla On Mon, Dec 19, 2016 at 3:11 PM, Timothy Chen wrote: > Dynamic allocation works with coarse-grained mode only; we weren't aware > of a need for fine-grained mode

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Timothy Chen
Dynamic allocation works with coarse-grained mode only; we weren't aware of a need for fine-grained mode after we enabled dynamic allocation support in coarse-grained mode. What's the reason you're running fine-grained mode instead of coarse-grained + dynamic allocation? Tim On Mon, Dec 19, 2016 at 2:45
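
For reference, a minimal SparkConf sketch of the switch Tim describes; the property names are the standard Spark-on-Mesos settings and the app name is a placeholder.

    import org.apache.spark.SparkConf

    // Coarse-grained mode (the Spark 2.x default) with dynamic allocation layered on top,
    // instead of the deprecated fine-grained scheduler.
    val conf = new SparkConf()
      .setAppName("coarse-grained-job")                // placeholder
      .set("spark.mesos.coarse", "true")               // coarse-grained mode
      .set("spark.dynamicAllocation.enabled", "true")  // grow/shrink executors with the workload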

Re: Kafka Spark structured streaming latency benchmark.

2016-12-19 Thread Shixiong(Ryan) Zhu
Hey Prashant. Thanks for your code. I did some investigation and it turned out that ContextCleaner is too slow and its "referenceQueue" keeps growing. My hunch is that cleaning broadcasts is very slow since it's a blocking call. On Mon, Dec 19, 2016 at 12:50 PM, Shixiong(Ryan) Zhu <
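
One knob that may be worth experimenting with here, assuming blocking broadcast cleanup really is the bottleneck, is the (mostly undocumented) ContextCleaner setting below. This is a diagnostic experiment, not a fix the thread itself proposes.

    import org.apache.spark.SparkConf

    // Let ContextCleaner issue cleanup calls asynchronously instead of blocking on each one.
    // This changes cleanup semantics, so treat it purely as an experiment.
    val conf = new SparkConf()
      .set("spark.cleaner.referenceTracking.blocking", "false")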

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Michael Gummelt
> Is this problem of idle executors sticking around solved in Dynamic Resource Allocation? Is there some timeout after which idle executors can just shut down and clean up their resources? Yes, that's exactly what dynamic allocation does. But again I have no idea what the state of dynamic
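
The timeout Michael refers to is configurable; a minimal sketch with placeholder values, assuming dynamic allocation itself is already enabled:

    import org.apache.spark.SparkConf

    // Executors idle longer than these timeouts are released by dynamic allocation.
    val conf = new SparkConf()
      .set("spark.dynamicAllocation.executorIdleTimeout", "60s")           // executors with no tasks
      .set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "600s")    // executors holding cached blocks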

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Chawla,Sumit
Great, that makes much better sense now. What would be the reason to set spark.mesos.mesosExecutor.cores to more than 1, since this number doesn't include the cores for tasks? So in my case it seems like 30 CPUs are allocated to executors, and there are 48 tasks, so 48 + 30 = 78 CPUs. And I am

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Joris Van Remoortere
That makes sense. From the documentation it looks like the executors are not supposed to terminate: http://spark.apache.org/docs/latest/running-on-mesos.html#fine-grained-deprecated > Note that while Spark tasks in fine-grained will relinquish cores as they > terminate, they will not relinquish

Re: Kafka Spark structured streaming latency benchmark.

2016-12-19 Thread Shixiong(Ryan) Zhu
Hey, Prashant. Could you track the GC root of byte arrays in the heap? On Sat, Dec 17, 2016 at 10:04 PM, Prashant Sharma wrote: > Furthermore, I ran the same thing with 26 GB as the memory, which would > mean 1.3GB per thread of memory. My jmap >

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Michael Gummelt
> I should presume that the number of executors should be less than the number of tasks. No. Each executor runs 0 or more tasks. Each executor consumes 1 CPU, and each task running on that executor consumes another CPU. You can customize this via spark.mesos.mesosExecutor.cores (
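
A quick sketch of that accounting using the numbers from this thread (30 executors, 48 concurrently running tasks, and the default of 1 CPU reserved per executor):

    // Fine-grained mode CPU accounting, per the explanation above.
    val mesosExecutorCores = 1   // spark.mesos.mesosExecutor.cores (default 1)
    val numExecutors       = 30  // executors that have been launched
    val runningTasks       = 48  // each task consumes 1 additional CPU by default

    val cpusForExecutors = numExecutors * mesosExecutorCores  // 30
    val cpusForTasks     = runningTasks                       // 48
    val totalCpus        = cpusForExecutors + cpusForTasks    // 78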

java.lang.AssertionError: assertion failed

2016-12-19 Thread samkum
I am using Apache Spark 2.0.2 and facing the following issue while using a cartesian product in the Spark Streaming module. I am using Snappy as the compression codec but face the same issue with the default one (LZ4); I am also using Kryo for serialization. I also see ample memory available in the executor
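
For context, a minimal sketch of the configuration described in the report; the property names are standard and the app name is a placeholder.

    import org.apache.spark.SparkConf

    // Kryo serialization with an explicit compression codec; the report says the
    // failure occurs with both "snappy" and the default "lz4".
    val conf = new SparkConf()
      .setAppName("cartesian-streaming-job")   // placeholder
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.io.compression.codec", "snappy")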

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Mehdi Meziane
I think that what you are looking for is Dynamic resource allocation: http://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation Spark provides a mechanism to dynamically adjust the resources your application occupies based on the workload. This means that your
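
A minimal sketch of enabling it, following the linked page; the executor bounds are placeholders, and the external shuffle service must be running on each node/agent.

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")      // required by dynamic allocation
      .set("spark.dynamicAllocation.minExecutors", "2")  // placeholder bounds
      .set("spark.dynamicAllocation.maxExecutors", "40")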

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Timothy Chen
Hi Chawla, One possible reason is that Mesos fine-grained mode also takes up cores to run the executor on each host, so if you have 20 agents running fine-grained executors they will take up 20 cores while they are still running. Tim On Fri, Dec 16, 2016 at 8:41 AM, Chawla,Sumit

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Michael Gummelt
Yea, the idea is to use dynamic allocation. I can't speak to how well it works with Mesos, though. On Mon, Dec 19, 2016 at 11:01 AM, Mehdi Meziane wrote: > I think that what you are looking for is Dynamic resource allocation: >

Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-19 Thread Nicholas Chammas
Since it’s not a regression from 2.0 (I believe the same issue affects both 2.0 and 2.1), it doesn’t merit a -1 vote according to the voting guidelines. Of course, it would be nice if we could fix the various optimizer issues that all seem to have a workaround that involves persist() (another one
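
For illustration only, the general shape of the persist()-based workaround mentioned above; the data and transformations are placeholders, not the actual reproduction from SPARK-18589.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    val spark = SparkSession.builder().appName("persist-workaround-sketch").getOrCreate()
    import spark.implicits._

    // Placeholder data standing in for whatever hits the optimizer issue.
    val df = Seq((1, "a"), (2, "b")).toDF("id", "col")

    // Materialize an intermediate result so downstream planning starts from the
    // cached data instead of re-optimizing across the problematic step.
    val intermediate = df.persist(StorageLevel.MEMORY_AND_DISK)
    intermediate.count()   // force evaluation

    val result = intermediate.groupBy($"id").count()
    result.show()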

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Chawla,Sumit
But coarse-grained mode does exactly what I am trying to avoid here: in exchange for lower startup overhead, it keeps the resources reserved for the entire duration of the job. Regards Sumit Chawla On Mon, Dec 19, 2016 at 10:06 AM, Michael Gummelt wrote: > Hi > > I

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Michael Gummelt
Hi, I don't have a lot of experience with the fine-grained scheduler. It's deprecated and fairly old now. CPUs should be relinquished as tasks complete, so I'm not sure why you're seeing what you're seeing. There have been a few discussions on the Spark list regarding deprecating the

Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-19 Thread Franklyn D'souza
-1 https://issues.apache.org/jira/browse/SPARK-18589 hasn't been resolved by this release and is a blocker in our adoption of Spark 2.0. I've updated the issue with some steps to reproduce the error. On Mon, Dec 19, 2016 at 4:37 AM, Sean Owen wrote: > PS, here are the open

stratified sampling scales poorly

2016-12-19 Thread Martin Le
Hi all, I perform sampling on a DStream by taking samples from RDDs in the DStream. I have used two sampling mechanisms: simple random sampling and stratified sampling. Simple random sampling: inputStream.transform(x => x.sample(false, fraction)). Stratified sampling: inputStream.transform(x =>
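
Since the message is cut off, here is a sketch of what the two strategies typically look like on a keyed DStream; the stream's key type and the per-key fractions are assumptions. The stratified variant uses sampleByKey; sampleByKeyExact, which guarantees exact per-stratum sizes, makes additional passes over the RDD and is considerably more expensive.

    import org.apache.spark.streaming.dstream.DStream

    // Sketch of both sampling strategies on a (String, Double) DStream.
    def sampleBoth(input: DStream[(String, Double)], fraction: Double): Unit = {
      // Simple random sampling: one pass, no per-key bookkeeping.
      val simple = input.transform(rdd => rdd.sample(withReplacement = false, fraction))

      // Stratified sampling: per-key fractions, approximate per-stratum sizes, single pass.
      val fractions = Map("a" -> fraction, "b" -> fraction)   // placeholder strata
      val stratified = input.transform(rdd => rdd.sampleByKey(withReplacement = false, fractions))

      simple.print()
      stratified.print()
    }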

Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-19 Thread Sean Owen
PS, here are the open issues for 2.1.0. Forgot this one. No Blockers, but one "Critical": SPARK-16845 "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB SPARK-18669 Update Apache docs regard watermarking in Structured Streaming SPARK-18894 Event time

Re: Aggregating over sorted data

2016-12-19 Thread Robin East
This is also a feature we need for our time-series processing > On 19 Dec 2016, at 04:07, Liang-Chi Hsieh wrote: > > > Hi, > > As far as I know, Spark SQL doesn't provide native support for this feature now. > After searching, I found only a few database systems support it, e.g.,