Look at
org.apache.spark.streaming.scheduler.JobGenerator:
it has a RecurringTimer (timer) that simply posts GenerateJobs events
to an EventLoop at each batchInterval.
The EventLoop's thread then picks up these events and uses
streamingContext.graph to generate a Job (InputDstream's
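That timer-to-event-loop wiring can be sketched without any Spark classes. A minimal stand-in (all names here are invented for illustration, not Spark's actual API): a timer posts one event per batch interval onto a queue, and a single consumer thread drains it and "generates" a job per event.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the RecurringTimer -> EventLoop pattern.
// These are NOT Spark's classes, just the shape of JobGenerator's wiring.
public class EventLoopSketch {
    // Events posted by the "timer", drained by the "event loop" thread.
    static final BlockingQueue<Long> events = new LinkedBlockingQueue<>();
    static int jobsGenerated = 0;

    // Stand-in for RecurringTimer: posts one event per batch-interval tick.
    static void postBatches(long firstBatchTime, long batchIntervalMs, int n) {
        for (int i = 0; i < n; i++) {
            events.offer(firstBatchTime + (long) i * batchIntervalMs);
        }
    }

    // Stand-in for the EventLoop thread: picks up each event and "generates" a job.
    static void drain() throws InterruptedException {
        while (!events.isEmpty()) {
            long batchTime = events.take();
            jobsGenerated++; // in Spark this is roughly where the graph generates jobs
        }
    }

    public static void main(String[] args) throws InterruptedException {
        postBatches(1000L, 500L, 3); // pretend batchInterval = 500 ms
        drain();
        System.out.println("jobs generated: " + jobsGenerated); // jobs generated: 3
    }
}
```

In the sketch the consumer runs on the caller's thread for determinism; in Spark the EventLoop owns its own thread.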
Hi cody,
let me try once again to explain with example.
In Spark's BatchInfo class, "scheduling delay" is defined as
*def schedulingDelay: Option[Long] = processingStartTime.map(_ - submissionTime)*
I am dumping the BatchInfo object in my LatencyListener, which extends
StreamingListener.
batchTime
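Plugging made-up timestamps into that definition makes it concrete. A small sketch mirroring the one-liner above (the timestamps are invented; only the formula comes from BatchInfo):

```java
import java.util.Optional;

public class SchedulingDelaySketch {
    // Mirrors BatchInfo.schedulingDelay: the delay only exists once
    // processing has actually started.
    static Optional<Long> schedulingDelay(Optional<Long> processingStartTime,
                                          long submissionTime) {
        return processingStartTime.map(start -> start - submissionTime);
    }

    public static void main(String[] args) {
        long submissionTime = 1_457_500_000_000L;                  // job set submitted
        Optional<Long> started = Optional.of(1_457_500_000_250L);  // processing began 250 ms later
        System.out.println(schedulingDelay(started, submissionTime));          // Optional[250]
        // Before processing starts there is no delay value yet:
        System.out.println(schedulingDelay(Optional.empty(), submissionTime)); // Optional.empty
    }
}
```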
This vote passes with nine +1s (five binding) and one binding +0! Thanks
to everyone who tested/voted. I'll start work on publishing the release
today.
+1:
Mark Hamstra*
Moshe Eshel
Egor Pahomov
Reynold Xin*
Yin Huai*
Andrew Or*
Burak Yavuz
Kousuke Saruta
Michael Armbrust*
0:
Sean Owen*
-1:
+1 - Ported all our internal jobs to run on 1.6.1 with no regressions.
On Wed, Mar 9, 2016 at 7:04 AM, Kousuke Saruta
wrote:
> +1 (non-binding)
>
>
> On 2016/03/09 4:28, Burak Yavuz wrote:
>
> +1
>
> On Tue, Mar 8, 2016 at 10:59 AM, Andrew Or
Oh yeah I already added it after your earlier message, have a look.
On Wed, Mar 9, 2016 at 7:45 PM, Mohammed Guller wrote:
> My book on Spark was recently published. I would like to request it to be
> added to the Books section on Spark's website.
>
>
>
> Here are the
My book on Spark was recently published. I would like to request it to be added
to the Books section on Spark's website.
Here are the details about the book.
Title: Big Data Analytics with Spark
Author: Mohammed Guller
Link:
Yeah, to be clear, I'm talking about having only one constructor for a
direct stream, that will give you a stream of ConsumerRecord.
Different needs for topic subscription, starting offsets, etc could be
handled by calling appropriate methods after construction but before
starting the stream.
I'd probably prefer to keep it the way it is, unless it's becoming more
like the function without the messageHandler argument.
Right now I have code like this, but I wish it were more similar looking:
if (parsed.partitions.isEmpty()) {
JavaPairInputDStream
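The single-constructor, configure-before-start idea could look roughly like this (every name below is invented for illustration; this is not an actual Spark or Kafka API, just the proposed shape):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Invented sketch of "one constructor, then configuration calls before start".
// None of these names exist in Spark; they only illustrate the proposal.
public class DirectStreamSketch {
    private final List<String> topics = new ArrayList<>();
    private long startingOffset = -1L; // -1 means "latest", by this sketch's convention
    private boolean started = false;

    public DirectStreamSketch subscribe(String... ts) {
        checkNotStarted();
        topics.addAll(Arrays.asList(ts));
        return this;
    }

    public DirectStreamSketch startFromOffset(long offset) {
        checkNotStarted();
        startingOffset = offset;
        return this;
    }

    public void start() { started = true; }

    // Configuration is only legal before the stream starts.
    private void checkNotStarted() {
        if (started) throw new IllegalStateException("configure before start()");
    }

    public List<String> topics() { return topics; }
    public long startingOffset() { return startingOffset; }

    public static void main(String[] args) {
        DirectStreamSketch s = new DirectStreamSketch();
        s.subscribe("topicA", "topicB").startFromOffset(0L);
        s.start();
        System.out.println(s.topics()); // [topicA, topicB]
    }
}
```

The point of the shape is that topic subscription and starting offsets vary per use case, while the constructor stays uniform.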
I'm really not sure what you're asking.
On Wed, Mar 9, 2016 at 12:43 PM, Sachin Aggarwal
wrote:
> where are we capturing this delay?
> I am aware of scheduling delay, which is defined as processing
> time - submission time, not the batch create time
>
> On Wed, Mar 9,
where are we capturing this delay?
I am aware of scheduling delay, which is defined as processing
time - submission time, not the batch create time
On Wed, Mar 9, 2016 at 10:46 PM, Cody Koeninger wrote:
> Spark streaming by default will not start processing a batch until the
>
Spark streaming by default will not start processing a batch until the
current batch is finished. So if your processing time is larger than
your batch time, delays will build up.
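The buildup is easy to see with invented numbers: each batch adds (processing time minus batch interval) of delay, so the n-th batch starts roughly n times that amount late. A sketch under that simplifying assumption (constant processing time, no concurrency):

```java
public class BacklogSketch {
    // Scheduling delay (ms) of the n-th batch, assuming every batch takes
    // processingMs to run and batches are generated every batchIntervalMs.
    static long delayOfBatch(int n, long batchIntervalMs, long processingMs) {
        long delayPerBatch = Math.max(0, processingMs - batchIntervalMs);
        return (long) n * delayPerBatch;
    }

    public static void main(String[] args) {
        // 2 s batch interval, 3 s processing: batch 10 starts 10 s late.
        System.out.println(delayOfBatch(10, 2000, 3000)); // 10000
        // Processing faster than the interval: no buildup.
        System.out.println(delayOfBatch(10, 2000, 1500)); // 0
    }
}
```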
On Wed, Mar 9, 2016 at 11:09 AM, Sachin Aggarwal
wrote:
> Hi All,
>
> we have batchTime
+1 (non-binding)
On 2016/03/09 4:28, Burak Yavuz wrote:
+1
On Tue, Mar 8, 2016 at 10:59 AM, Andrew Or > wrote:
+1
2016-03-08 10:59 GMT-08:00 Yin Huai >:
+1
It would be nice to have a code-configurable fall-back plan for such cases. Any
generalized solution can cause problems elsewhere.
Simply replicating hot cached blocks would be complicated to maintain and could
cause OOME. In the case I described on the JIRA, the hot partition will be
changing
This discussion is moving to the JIRA. Please refer to the JIRA if anyone is
interested in this.
On 9 Mar 2016 6:31 p.m., "Sean Owen" wrote:
> From your JIRA, it seems like you're referring to the "part-*" files.
> These files are effectively an internal representation, and I
From your JIRA, it seems like you're referring to the "part-*" files.
These files are effectively an internal representation, and I would
not expect them to have such an extension. For example, you're not
really guaranteed that the way the data breaks up leaves each file a
valid JSON doc.
On