Hi All,
I am completely new to Kafka. I was wondering if somebody could provide me some guidelines on how to develop real-time streaming applications using the Spark Streaming API with Kafka.
I am aware of the Spark Streaming and Kafka integration [1]. However, a real-life example would be a better place to start.
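For reference, a minimal sketch of the direct-stream integration from the Kafka guide; the broker address, topic name and the word-count logic are placeholders, not part of the original question:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaStreamingExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaStreamingExample")
        val ssc  = new StreamingContext(conf, Seconds(10))                  // 10-second batches

        val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")   // placeholder brokers
        val topics      = Set("events")                                     // placeholder topic

        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, topics)

        // Classic word count over the message values
        stream.map(_._2)
          .flatMap(_.split("\\s+"))
          .map((_, 1L))
          .reduceByKey(_ + _)
          .print()

        ssc.start()
        ssc.awaitTermination()
      }
    }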
sending again.
any help is appreciated !
thanks in advance.
On Thu, Nov 10, 2016 at 8:42 AM, Manish Malhotra <
manish.malhotra.w...@gmail.com> wrote:
> Hello Spark Devs/Users,
>
> Im trying to solve the use case with Spark Streaming 1.6.2 where for every
> batch ( say 2 mins
at org.apache.spark.streaming.scheduler.ReceivedBlockTracker.writeToLog(ReceivedBlockTracker.scala:232)
at org.apache.spark.streaming.scheduler.ReceivedBlockTracker.addBlock(ReceivedBlockTracker.scala:87)
at org.apache.spark.streaming.scheduler.ReceiverTracker.org$apache$spark$streaming$scheduler$ReceiverTracker$$addBlock
Hi,
The time interval can be controlled by `IdleTimeBetweenReadsInMillis` in KinesisClientLibConfiguration; however, it is not configurable in the current implementation.
The details can be found in:
https://github.com/apache/spark/blob/master/external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala#L152
Hi,
In receiver-based Spark Streaming - when the receiver gets data and stores it in blocks for workers to process, how many blocks does the receiver give to a worker?
Say I have a streaming app with a 30 sec batch interval; what will happen:
1. for the first batch (first 30 sec) there will not be any data
I realize that this is not the original question on this thread, and I don't mean to hijack it. I am also interested in the potential of Akka Stream sources for a Spark Streaming job directly, which could potentially be adapted for both Kafka and non-Kafka use cases, with the emphasis for me being on use cases which aren't necessarily Kafka specific. There are some portions which
I'd be interested in collaborating on the structured streaming version, for sure.
Are you certain you aren't getting any failed tasks or other errors?
Output actions like foreach aren't exactly once and will be retried on
failures.
On Nov 12, 2016 06:36, "dev loper" <spark...@gmail.com> wrote:
Hi,
Is spark.streaming.blockInterval for the Kinesis input stream hardcoded to 1 sec, or is it configurable? It is the time interval at which the receiver fetches data from Kinesis.
This means the stream batch interval cannot be less than spark.streaming.blockInterval, and this should be configurable. Also, is
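For what it's worth, spark.streaming.blockInterval itself is an ordinary Spark conf and can be set for any application; only the Kinesis fetch interval discussed in the reply above is fixed. A minimal sketch with placeholder values:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Milliseconds, StreamingContext}

    // blockInterval controls how a receiver chops buffered data into blocks (default 200ms);
    // it is independent of how often the Kinesis receiver polls the shards.
    val conf = new SparkConf()
      .setAppName("KinesisBlockIntervalExample")
      .set("spark.streaming.blockInterval", "500ms")

    val ssc = new StreamingContext(conf, Milliseconds(2000))   // 2-second batch interval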
Dear fellow Spark Users,
My Spark Streaming application (Spark 2.0, on an AWS EMR YARN cluster) listens to campaigns based on live stock feeds, and the batch duration is 5 seconds. The application uses Kafka DirectStream and, based on the feed source, there are three streams. As given in the code
I am using ProtoBuf for Kafka messages with Spark Streaming because ProtoBuf is already being used in the system.
Some sample code and reading material for using ProtoBuf for Kafka messages with Spark Streaming would be helpful.
Thanks
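A common pattern is to consume the raw bytes and decode them with the protobuf-generated class; a hedged sketch, where `Event` stands in for your own generated message class and the broker/topic values are placeholders:

    import kafka.serializer.DefaultDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("ProtoBufFromKafka")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Consume the message values as raw byte arrays
    val raw = KafkaUtils.createDirectStream[Array[Byte], Array[Byte], DefaultDecoder, DefaultDecoder](
      ssc, Map("metadata.broker.list" -> "localhost:9092"), Set("events"))

    // Event.parseFrom is the parse method protoc generates for the hypothetical Event message
    val events = raw.map { case (_, bytes) => Event.parseFrom(bytes) }
    events.print()

    ssc.start()
    ssc.awaitTermination()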
Hello Spark Devs/Users,
I'm trying to solve a use case with Spark Streaming 1.6.2 where, for every batch (say 2 mins), data needs to go to the same reducer node after grouping by key.
The underlying storage is Cassandra and not HDFS.
This is a map-reduce job, where I am also trying to use
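One way to approximate this, sketched under the assumption that `stream` is a DStream[(String, String)] keyed by the grouping key: reuse the same HashPartitioner (same partition count) in every batch, so a given key always hashes to the same partition. Spark does not guarantee a partition lands on the same physical node across batches; this only fixes the key-to-partition mapping.

    import org.apache.spark.HashPartitioner

    val numPartitions = 32                          // pick to match the cluster
    val partitioner   = new HashPartitioner(numPartitions)

    val grouped = stream.transform { rdd =>
      rdd.groupByKey(partitioner)                   // deterministic key -> partition mapping
    }

    grouped.foreachRDD { rdd =>
      rdd.foreachPartition { part =>
        // open one Cassandra session per partition and write the groups here
        part.foreach { case (key, values) => /* session.execute(...) */ }
      }
    }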
I have already integrated common actors. I am also interested, especially to see how we can achieve end-to-end back pressure.
2016-11-10 8:46 GMT+01:00 shyla deshpande <deshpandesh...@gmail.com>:
I am using Spark 2.0.1. I wanted to build a data pipeline using Kafka,
Spark Streaming and Cassandra using Structured Streaming. But the kafka
source support for Structured Streaming is not yet available. So now I am
trying to use Akka Stream as the source to Spark Streaming.
Want to make sure I
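On the Spark side, a custom receiver is one way to bridge an Akka-based source; a minimal, heavily hedged sketch (the actual Akka wiring is left out and only hinted at in comments):

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    class AkkaFedReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
      override def onStart(): Unit = {
        new Thread("akka-fed-receiver") {
          override def run(): Unit = {
            // Hypothetical: subscribe to your Akka Stream / actor here and call
            // store(msg) for every element it delivers.
          }
        }.start()
      }
      override def onStop(): Unit = {
        // tear down the Akka subscription
      }
    }

    // val lines = ssc.receiverStream(new AkkaFedReceiver)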
Hi,
We are using spark streaming version 1.6.2 and came across a weird behavior.
Our system pulls log events data from flume servers, enrich the events and
save them to ES.
We are using window interval of 15 seconds and the rate on peak hours is
around 70K events.
The average time to process
Subject: Re: Spark Streaming Data loss on failure to write BlockAdditionEvent failure to WAL
For WAL in Spark to work with HDFS, the HDFS version you are running must
support file appends. Contact your HDFS package/installation provider to figure
out whether this is supported by your
Hi, I am using Spark Streaming to process some events. It is deployed in standalone mode with 1 master and 3 workers. I have set the number of cores per executor to 4 and the total cores to 24, which means 6 executors will be spawned in total. I have set spread-out to true, so each worker machine
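For reference, a sketch of how those settings are usually expressed for a standalone cluster; the values are the ones described above and are otherwise placeholders:

    // Application-side settings
    val conf = new org.apache.spark.SparkConf()
      .setAppName("StreamingEvents")
      .set("spark.executor.cores", "4")   // cores per executor
      .set("spark.cores.max", "24")       // total cores for the app -> 24 / 4 = 6 executors

    // spark.deploy.spreadOut=true is a master-side setting (spark-defaults.conf / spark-env.sh
    // on the master), not an application conf; with spread-out enabled the 6 executors are
    // distributed across the workers instead of packed onto as few machines as possible.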
> java.lang.IllegalStateException: File exists and there is no append support!
>   at org.apache.spark.streaming.util.HdfsUtils$.getOutputStream(HdfsUtils.scala:35)
>   at org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.org$ap
lines.foreach { line =>
  if // it's a header
    parser = someParserBasedOn(line)
  else
    items += parser.parse(line)
}
items.iterator
}
On Mon, Nov 7, 2016 at 4:22 PM, coolgar <karllbunn...@gmail.com> wrote:
I'm using Apache Spark Streaming with the Kafka direct consumer. The data stream I'm receiving is log data that includes a header with each block of messages. Each DStream can therefore have many blocks of messages, each with its own header.
The header is used to know how to interpret
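A self-contained sketch of the per-partition "switch parser on header" idea from the pseudocode above; `stream` is assumed to be the direct Kafka DStream of (key, value) strings, and isHeader/parserFor/Item are hypothetical helpers. It only works if a header and the lines it governs arrive within the same record/partition:

    case class Item(fields: Map[String, String])
    trait LineParser { def parse(line: String): Item }

    def isHeader(line: String): Boolean = line.startsWith("#")        // assumption
    def parserFor(header: String): LineParser = new LineParser {      // assumption
      def parse(line: String) = Item(Map("raw" -> line))
    }

    val items = stream.map(_._2).mapPartitions { records =>
      var parser: LineParser = null
      val out = scala.collection.mutable.ArrayBuffer[Item]()
      records.flatMap(_.split("\n")).foreach { line =>
        if (isHeader(line))
          parser = parserFor(line)        // a header switches the active parser
        else if (parser != null)
          out += parser.parse(line)       // data lines use the current parser
      }
      out.iterator
    }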
at org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.org$apache$spark$streaming$util$FileBasedWriteAheadLogWriter$$stream$lzycompute(FileBasedWriteAheadLogWriter.scala:33)
at org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.org$apache$spark$streaming$util$FileBasedWriteAheadLogWriter$$stream(FileBasedWriteAhe
Spark inherits its security from the underlying mechanisms in either YARN or Mesos (whichever environment you are using to launch your cluster/jobs).
That said… there is limited support from Ranger. There are three parts to this…
1) Ranger being called when the job is launched…
2) Ranger being
Hi,
Does Ranger provide security for Spark? If yes, then in what capacity?
Thanks,
Mudit
Hi,
By receiver I meant the Spark Streaming receiver architecture - worker nodes are different from receiver nodes. Is there no direct consumer / low-level consumer, like Kafka's, in Kinesis Spark Streaming?
Is there any limitation on the checkpoint interval - a minimum of 1 second in Spark Streaming
Has anyone worked with AWS Kinesis and retrieved data from it using Spark
Streaming? I am having issues where it’s returning no data. I can connect to
the Kinesis stream and describe using Spark. Is there something I’m missing?
Are there specific IAM security settings needed? I just simply
Hi,
I would like to know if there is a code example for writing data to InfluxDB from Spark Streaming in Scala / Python.
Thanks in advance
Gioacchino
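In the absence of a dedicated connector, one hedged option is to post each batch to InfluxDB's HTTP write endpoint using the line protocol; the database name, measurement and the assumed shape of the stream (DStream[(String, Double)]) are all placeholders:

    import java.net.{HttpURLConnection, URL}

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val url  = new URL("http://localhost:8086/write?db=metrics")   // placeholder endpoint/db
        val conn = url.openConnection().asInstanceOf[HttpURLConnection]
        conn.setRequestMethod("POST")
        conn.setDoOutput(true)
        val body = records
          .map { case (tag, value) => s"prices,source=$tag value=$value" }  // line protocol
          .mkString("\n")
        val os = conn.getOutputStream
        os.write(body.getBytes("UTF-8"))
        os.close()
        conn.getResponseCode          // 204 means the points were accepted
        conn.disconnect()
      }
    }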
Hi,
I would like to know if the same spark streaming job can consume from kafka
0.8.1 and write the data to kafka 0.9. Just trying to replicate the kafka
server.
Yes, Kafka's MirrorMaker can be used to replicate, but was curious to know
if that can be achieved by spark streaming.
Please share
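It can be done in a single job, at least as a sketch: read from the 0.8.1 cluster with the direct stream and re-publish with the new producer client pointed at the 0.9 cluster. Broker addresses and topic names below are placeholders, and creating a producer per partition per batch is the simple (not the fastest) approach:

    import java.util.Properties
    import kafka.serializer.StringDecoder
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val source = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, Map("metadata.broker.list" -> "old-broker:9092"), Set("events"))

    source.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val props = new Properties()
        props.put("bootstrap.servers", "new-broker:9092")
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](props)
        records.foreach { case (k, v) => producer.send(new ProducerRecord("events", k, v)) }
        producer.close()
      }
    }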
Hi,
The only thing you can do for Kinesis checkpoints is tune their interval:
https://github.com/apache/spark/blob/master/external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala#L68
Whether the data loss occurs or not depends on the storage level you set: the received data is replicated across executors, but if all the executors that hold the replicated data crash, IIUC the data loss occurs.
On 22 Oct 2016, at 20:58, Nkechi Achara wrote:
> I do not use rename, and the files are written to, and then moved to a directory on HDFS in gz format.

In that case there's nothing obvious to me.
Try logging at trace/debug the class:
Does the Spark Streaming consumer for Kinesis use the Kinesis Client Library and mandate checkpointing the sequence numbers of shards in DynamoDB?
Will it lead to data loss if consumed data records are not yet processed and Kinesis checkpointed the consumed sequence numbers in DynamoDB, and Spark
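For context, the knobs the Kinesis receiver does expose are the application name (which becomes the DynamoDB table used for KCL checkpoints), the checkpoint interval and the storage level; a sketch with placeholder stream/endpoint/region values:

    import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.Seconds
    import org.apache.spark.streaming.kinesis.KinesisUtils

    val stream = KinesisUtils.createStream(
      ssc,
      "my-kinesis-app",                           // -> DynamoDB table for KCL checkpoints
      "my-stream",
      "https://kinesis.us-east-1.amazonaws.com",
      "us-east-1",
      InitialPositionInStream.LATEST,
      Seconds(60),                                // how often sequence numbers are checkpointed
      StorageLevel.MEMORY_AND_DISK_2)             // replication guards against single-executor loss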
Hi,
I am getting
*Remote RPC client disassociated. Likely due to containers exceeding
thresholds, or network issues. Check driver logs for WARN messages.*
error with spark streaming job. I am using spark 2.0.0. The job is simple
windowed aggregation and the stream is read from socket. Average
I do not use rename, and the files are written to, and then moved to a
directory on HDFS in gz format.
On 22 October 2016 at 15:14, Steve Loughran wrote:
Hi,
I am using Spark 1.5.0 to read gz files with textFileStream, but nothing is picked up when new files are dropped in the specified directory. I know this is only the case with gz files, as when I extract the file into the specified directory the files are read on the next window and processed.
My code is here:
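The original code did not survive in this excerpt; a minimal sketch of the directory-monitoring pattern being described, with a placeholder path, looks like this:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("GzDirectoryStream")
    val ssc  = new StreamingContext(conf, Seconds(30))

    // Only files newly moved into this directory after the job starts are picked up
    val lines = ssc.textFileStream("hdfs:///incoming/logs")
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()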
Hi Guys,
My Spark Streaming client program works fine as long as the receiver receives data, but say my receiver has no more data to receive for a few hours (4-5 hours) and then starts receiving data again; at that point the Spark client program doesn't seem to process any data
Hi,
I have a search application and want to monitor queries per second for it.
I have Kafka at my backend which acts like a bus for messages. Whenever a search request is done I publish the nano time of the current system. I want to use Spark Streaming to aggregate such data but I am so new to it.
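A hedged sketch of the aggregation itself: treat each Kafka message as one search event, count over a sliding window, and divide by the window length to get queries per second. `stream` is assumed to be the Kafka DStream created elsewhere, and the checkpoint path is a placeholder:

    import org.apache.spark.streaming.Seconds

    ssc.checkpoint("hdfs:///checkpoints/qps")     // required by the inverse-reduce window below

    stream
      .countByWindow(Seconds(60), Seconds(5))     // events in the last minute, updated every 5s
      .map(count => count.toDouble / 60)          // queries per second
      .print()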
On Wed, Oct 12, 2016 at 8:53 AM, Samy Dindane <s...@dindane.com> wrote:
> That's what I was looking for, thank you.
Cody Koeninger wrote:
http://spark.apache.org/docs/latest/configuration.html
"This rate is upper bounded by the values spark.streaming.receiver.maxRate and spark.streaming.kafka.maxRatePerPartition if they are set (see below)."
The only configuration that works is "spark.streaming.kafka.maxRatePerPartition". That's better than nothing, but it'd be useful to have backpressure enabled for automatic
All,
We have a use case with 2 Spark Streaming jobs in the same EMR cluster. I am thinking of allowing multiple streaming contexts and running them as 2 separate spark-submits with wait-for-app-completion set to false.
With this, the failure detection and monitoring seem obscure and don't seem
Hi,
Is it possible to limit the size of the batches returned by the Kafka consumer for Spark Streaming?
I am asking because the first batch I get has hundreds of millions of records and it takes ages to process and checkpoint them.
Thank you.
Samy
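The settings referenced in the replies above, as they would be applied; the numeric value is only illustrative. With the direct stream, every batch (including the first) is bounded by maxRatePerPartition * number of partitions * batch seconds, while backpressure only starts adapting the rate after the first batch:

    val conf = new org.apache.spark.SparkConf()
      .setAppName("BoundedKafkaBatches")
      .set("spark.streaming.kafka.maxRatePerPartition", "10000")  // records/sec/partition
      .set("spark.streaming.backpressure.enabled", "true")        // adaptive rate after the first batch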
Whilst working on this application, I found a setting that drastically
improved the performance of my particular Spark Streaming application. I'm
sharing the details in hopes that it may help somebody in a similar
situation.
As my program ingested information into HDFS (as parquet files), I
Hi,
Does Spark 2.0.0 support partitioned windows in streaming?
Cheers
Adrienne
Hi all,
I am using Spark Streaming for my use case.
I want to
- partition or group the stream by key
- window the tuples in partitions
and - find the max/min element in windows (in every partition)
My code is like:
val keyedStream = socketDataSource.map(s => (s.key, s.value))
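Continuing that sketch under the assumption that keyedStream is a DStream[(String, Double)], one way to get the max per key per window is reduceByKeyAndWindow (min is symmetric):

    import org.apache.spark.streaming.Seconds

    val maxPerKey = keyedStream.reduceByKeyAndWindow(
      (a: Double, b: Double) => math.max(a, b),   // combine values within the window
      Seconds(30),                                // window length
      Seconds(10))                                // slide interval

    maxPerKey.print()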
Hi,
I want to enquire whether Spark Streaming has a limitation of 500ms for the batch interval.
Is Storm better than Spark Streaming for real time (for a latency of just 50-100ms)? In Spark Streaming, can parallel batches be run? If yes, is it supported at production level?
Thanks
Either artifact should work with 0.10 brokers. The 0.10 integration has
more features but is still marked experimental.
On Sep 26, 2016 3:41 AM, "Haopu Wang" <hw...@qilinsoft.com> wrote:
Hi, in the official integration guide, it says "Spark Streaming 2.0.0 is
compatible with Kafka 0.8.2.1."
However, in maven repository, I can get
"spark-streaming-kafka-0-10_2.11" which depends on Kafka 0.10.0.0
Is this artifact stable enough? Thank you!
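For reference, the two artifacts as they would appear in build.sbt; the version shown matches Spark 2.0.0 and should track your Spark version:

    // 0.8.2.1 client; stable, and per the reply above also works against 0.10 brokers
    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.0"

    // 0.10.0.0 client; more features, still marked experimental
    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.0.0"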
Hi Guys,
I have a bunch of data coming in to my Spark Streaming cluster from a message queue (not Kafka). This message queue guarantees at-least-once delivery only, so there is a potential that some of the messages that come in to the Spark Streaming cluster are actually duplicates, and I am trying
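One hedged way to drop at-least-once duplicates, assuming every message carries a unique id and `stream` is a DStream[(String, String)] keyed by that id: keep a small per-key "already seen" state with mapWithState. State requires checkpointing and will grow unless it expires, hence the timeout:

    import org.apache.spark.streaming.{Minutes, State, StateSpec}

    ssc.checkpoint("hdfs:///checkpoints/dedup")      // placeholder path; required for state

    val dedupSpec = StateSpec.function(
      (id: String, payload: Option[String], seen: State[Boolean]) => {
        if (seen.exists) None                        // id already processed -> drop
        else { seen.update(true); payload }          // first occurrence -> keep
      }).timeout(Minutes(30))                        // forget ids after 30 minutes

    val deduped = stream.mapWithState(dedupSpec).flatMap(_.toList)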
Anybody?
Hello,
I wrote a spark streaming application in Java. It reads stock trades off of a
data feed receiver and converts them to Tick objects, and uses a microbatch
interval, window interval and sliding interval of 10 seconds. A
JavaPairDStream<String, Iterable> is created where t
How are you producing data? I just tested your code and I can receive the messages from Kafka.

Regards
Sumit Chawla

On Sun, Sep 18, 2016 at 7:56 PM, Sateesh Karuturi <sateesh.karutu...@gmail.com> wrote:
I am very new to Spark Streaming and I am implementing a small exercise: sending XML data from Kafka and receiving that streaming data through Spark Streaming. I tried all possible ways, but every time I get empty values.
There is no problem on the Kafka side; only
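For comparison, a hedged sketch of receiving XML strings from Kafka and pulling one field out of them; the broker, topic and <id> element are assumptions:

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val xmlStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, Map("metadata.broker.list" -> "localhost:9092"), Set("xml-topic"))

    val ids = xmlStream.map(_._2).map { payload =>
      val doc = scala.xml.XML.loadString(payload)   // throws on malformed XML
      (doc \ "id").text
    }
    ids.print()   // consistently empty output usually means nothing is being produced to the topic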
for time 1473491465000 ms (execution: 0.066 s) (EVENT 2nd-time processing cost 0.066 s)
and the 2nd-time processing of the event finished without really doing the work.
Help is hugely appreciated.
I recommend that the third-party application puts an empty file with the same filename as the original file, but with the extension ".uploaded". This is an indicator that the file has been fully (!) written to the fs. Otherwise you risk only reading parts of the file.
Then, you can have a file system listener for this .uploaded file.
>
> Spark streaming or Kafka are not needed/suitable, if the server is a file
> server. You can use oozie (maybe with a simple custom action) to poll for
> .uploaded files and transmit them.
>
> On 15 Sep 2016, at 19:00, Kappaganthu, Sivaram (E
Spark Streaming or Kafka are not needed/suitable if the server is a file server. You can use Oozie (maybe with a simple custom action) to poll for .uploaded files and transmit them.

On 15 Sep 2016, at 19:00, Kappaganthu, Sivaram (ES) wrote:
1) Is it possible for Spark Streaming to trigger a job after a file is placed, instead of triggering a job at a fixed batch interval?
2) If it is not possible with Spark Streaming, can we control this with Kafka/Flume?
Thanks,
Sivaram
methods in DirectKafkaInputDStream and KafkaRDD respectively, if you want to see an example.
Hi everyone,
I'm starting with Spark Streaming and would like to know some things about how data arrives.
I know that SS uses micro-batches and that they are received by workers and turned into an RDD. The master, at defined intervals, receives a pointer to the micro-batch RDD and can use it to process data using
Hi,
An alternative to Spark could be Flume to store data from Kafka to HDFS. It also provides some reliability mechanisms and has been explicitly designed for import/export, and it is tested. I am not sure I would go for Spark Streaming if the use case is only storing, but I do not have the full picture of your use case.
Anyway, what you could do is create a directory / hour
Hi,
I have a Spark streaming that reads messages/prices from Kafka and writes
it as text file to HDFS.
This is pretty efficient. Its only function is to persist the incoming
messages to HDFS.
This is what it does
dstream.foreachRDD { pricesRDD =>
val x= pricesRDD.co
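If the job really only persists the incoming messages, the built-in helper does the same thing with less code; a minimal sketch, where the output prefix is a placeholder and one directory is produced per batch interval:

    // dstream is the Kafka DStream from the snippet above
    dstream.saveAsTextFiles("hdfs:///prices/batch", "txt")   // -> hdfs:///prices/batch-<timestamp>.txt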
Ah, that makes it much clearer, thanks!
It also brings up an additional question: who/what decides on the partitioning? Does Spark Streaming decide to divide a micro batch/RDD into more than 1 partition based on size? Or is it something that the "source" (SocketStream, KafkaStream etc.)