Spark streaming and executor object reuse

2015-03-06 Thread Jean-Pascal Billaud
Hi, Reading through the Spark Streaming Programming Guide, I read in the "Design Patterns for using foreachRDD": "Finally, this can be further optimized by reusing connection objects across multiple RDDs/batches. One can maintain a static pool of connection objects that can be re
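
Sketched out, the guide's pattern looks like the following; `ConnectionPool` and `send()` are placeholders for whatever client library is in use:

```scala
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    // ConnectionPool is a hypothetical static, lazily initialized pool of connections
    val connection = ConnectionPool.getConnection()
    partitionOfRecords.foreach(record => connection.send(record))
    ConnectionPool.returnConnection(connection) // return to the pool for reuse across batches
  }
}
```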

Spark Streaming input data source list

2015-03-09 Thread Cui Lin
Dear all, Could you send me a list of the input data sources that Spark Streaming supports? My list is HDFS, Kafka, textfile?… I am wondering if Spark Streaming could directly read data from a certain port (e.g. 443) that my devices send to directly? Best regards, Cui Lin
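
For reference, a minimal sketch of the built-in sources in the Spark 1.x line; the endpoint is hypothetical:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(new SparkConf().setAppName("sources"), Seconds(10))

// Out of the box: file systems (textFileStream/fileStream over HDFS, S3, ...) and
// TCP sockets; Kafka, Flume, Kinesis, Twitter and MQTT ship as separate
// spark-streaming-* artifacts.
// Note socketTextStream connects *out* to host:port, so devices that push data
// usually need a broker (e.g. Kafka) or a custom Receiver in between.
val lines = ssc.socketTextStream("device-gateway.example.com", 9999) // hypothetical endpoint
```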

Pausing/throttling spark/spark-streaming application

2015-03-14 Thread tulinski
Hi, I created a question on StackOverflow: http://stackoverflow.com/questions/29051579/pausing-throttling-spark-spark-streaming-application I would appreciate your help. Best, Tomek -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Pausing-throttling-spark

Spark Streaming with compressed xml files

2015-03-15 Thread Vijay Innamuri
Hi All, Processing streaming JSON files with Spark features (Spark Streaming and Spark SQL) is very efficient and works like a charm. Below is the code snippet to process JSON files. windowDStream.foreachRDD(IncomingFiles => { val IncomingFilesTable = sqlContext.json

Question about Spark Streaming Receiver Failure

2015-03-16 Thread Jun Yang
Guys, We have a project which builds upon Spark Streaming. We use Kafka as the input stream and create 5 receivers. After this application had run for around 90 hours, all 5 receivers failed for unknown reasons. In my understanding, it is not guaranteed that a Spark Streaming receiver will

Re: Iterative Algorithms with Spark Streaming

2015-03-16 Thread Nick Pentreath
the Spark Streaming batch. On Mon, Mar 16, 2015 at 2:57 PM, Alex Minnaar wrote: > I wanted to ask a basic question about the types of algorithms that are > possible to apply to a DStream with Spark streaming. With Spark it is > possible to perform iterative computations on RDDs li

Re: Spark Streaming S3 Performance Implications

2015-03-21 Thread Chris Fregly
is a multiple of the batch interval). this goes for any spark streaming implementation - not just Kinesis. lemme know if that works for you. thanks! -Chris  _ From: Mike Trienis Sent: Wednesday, March 18, 2015 2:45 PM Subject: Spark Streaming S3 Performance Implicatio

Re: Spark Streaming S3 Performance Implications

2015-03-21 Thread Ted Yu
shards/receivers + 1. > > also, it looks like you're writing to S3 per RDD. you'll want to broaden > that out to write DStream batches - or expand even further and write > window batches (where the window interval is a multiple of the batch > interval). > > this goes

Re: Spark Streaming - Minimizing batch interval

2015-03-25 Thread Sean Owen
items per ms on average", which is different and entirely possible. On Wed, Mar 25, 2015 at 2:53 PM, RodrigoB wrote: > I've been given a feature requirement that means processing events on a > latency lower than 0.25ms. > > Meaning I would have to make sure that Spark streami

Untangling dependency issues in spark streaming

2015-03-29 Thread Neelesh
Hi, My streaming app uses org.apache.httpcomponent:httpclient:4.3.6, but Spark uses 4.2.6, and I believe that's what's causing the following error. I've tried setting spark.executor.userClassPathFirst & spark.driver.userClassPathFirst to true in the config, but that does not solve it either. Fina

Spark Streaming/Flume display all events

2015-03-30 Thread Chong Zhang
Hi, I am new to Spark/Streaming, and tried to run a modified FlumeEventCount.scala example to display all events by adding the call: stream.map(e => "Event:header:" + e.event.get(0).toString + "body: " + new String(e.event.getBody.array)).print() The spark-submit

Spark Streaming 1.3 & Kafka Direct Streams

2015-04-01 Thread Neelesh
With receivers, it was pretty obvious which code ran where - each receiver occupied a core and ran on the workers. However, with the new Kafka direct input streams, it's hard for me to understand where the code that's reading from the Kafka brokers runs. Does it run on the driver (I hope not), or does i

Re: Spark Streaming S3 Performance Implications

2015-04-01 Thread Mike Trienis
- > equalling at least the number of shards/receivers + 1. > > also, it looks like you're writing to S3 per RDD. you'll want to broaden > that out to write DStream batches - or expand even further and write > window batches (where the window interval is a multiple of the bat

Spark Streaming FileStream Nested File Support

2015-04-03 Thread adamgerst
this limitation? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-FileStream-Nested-File-Support-tp22370.html

Re: spark streaming printing no output

2015-04-14 Thread Shixiong Zhu
Could you see something like this in the console? --- Time: 142905487 ms --- Best Regards, Shixiong(Ryan) Zhu 2015-04-15 2:11 GMT+08:00 Shushant Arora : > Hi > > I am running a spark streaming applic

Re: spark streaming printing no output

2015-04-15 Thread Shushant Arora
ike this in the console? > > --- > Time: 142905487 ms > --- > > > Best Regards, > Shixiong(Ryan) Zhu > > 2015-04-15 2:11 GMT+08:00 Shushant Arora : > >> Hi >> >> I a

Re: spark streaming printing no output

2015-04-15 Thread Shixiong Zhu
ee something like this in the console? >> >> --- >> Time: 142905487 ms >> --- >> >> >> Best Regards, >> Shixiong(Ryan) Zhu >> >> 2015-04-15 2:11 GMT+08:00 Shushant Arora :

Re: spark streaming printing no output

2015-04-15 Thread Akhil Das
Just make sure you have at least 2 cores available for processing. You can try launching it in local[2] and make sure it's working fine. Thanks Best Regards On Tue, Apr 14, 2015 at 11:41 PM, Shushant Arora wrote: > Hi > > I am running a spark streaming application but on console n
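
A minimal sketch of what that looks like:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// One core goes to the receiver, so at least one more is needed to process batches;
// "local[1]" would receive data but never print anything.
val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))
```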

Re: spark streaming printing no output

2015-04-15 Thread Shushant Arora
make sure you have at least 2 cores available for processing. You can > try launching it in local[2] and make sure it's working fine. > > Thanks > Best Regards > > On Tue, Apr 14, 2015 at 11:41 PM, Shushant Arora < > shushantaror...@gmail.com> wrote: > >> Hi

Re: Re: spark streaming with kafka

2015-04-15 Thread Akhil Das
> *From:* Akhil Das > *Date:* 2015-04-15 19:12 > *To:* Shushant Arora > *CC:* user > *Subject:* Re: spark streaming with kafka > Once you start your streaming application to read from Kafka, it will > launch receivers on the executor nodes. And you can see them on the > st

Spark Streaming updateStateByKey throws OutOfMemory Error

2015-04-21 Thread Sourav Chandra
Hi, We are building a Spark Streaming application which reads from Kafka, does updateStateByKey based on the received message type, and finally stores into Redis. After running for a few seconds the executor process gets killed by throwing an OutOfMemory error. The code snippet is below
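
For context, a sketch of the shape of such a pipeline, with the message field hypothetical; note that updateStateByKey carries the state for every key across every batch, so state that is never dropped is a common OOM source:

```scala
// Checkpointing is mandatory for stateful transformations
ssc.checkpoint("hdfs:///checkpoints/counter-app") // hypothetical directory

// Returning None for a key would drop its state; if no key is ever dropped,
// the state map only grows over time
def updateCount(newValues: Seq[Int], state: Option[Int]): Option[Int] =
  Some(newValues.sum + state.getOrElse(0))

// msgType is a hypothetical field of the incoming messages
val counts = messages.map(m => (m.msgType, 1)).updateStateByKey(updateCount _)
```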

Re: Flume with Spark Streaming Sink

2016-03-20 Thread Ted Yu
$ jar tvf ./external/flume-sink/target/spark-streaming-flume-sink_2.10-1.6.1.jar | grep SparkFlumeProtocol 841 Thu Mar 03 11:09:36 PST 2016 org/apache/spark/streaming/flume/sink/SparkFlumeProtocol$Callback.class 2363 Thu Mar 03 11:09:36 PST 2016 org/apache/spark/streaming/flume/sink

Re: Flume with Spark Streaming Sink

2016-03-20 Thread Luciano Resende
to use the Spark Sink with Flume but it seems I'm missing some > of the dependencies. > I'm running the following code: > > ./bin/spark-shell --master yarn --jars > /home/impact/flumeStreaming/spark-streaming-flume_2.10-1.6.1.jar,/home/impact/flumeStreaming/flume-ng-core-1.6.

Re: Spark Streaming - NotSerializableException: Methods & Closures:

2016-04-04 Thread Ted Yu
bq. I'm on version 2.10 for spark The above is the Scala version. Can you give us the Spark version? Thanks On Mon, Apr 4, 2016 at 2:36 PM, mpawashe wrote: > Hi all, > > I am using Spark Streaming API (I'm on version 2.10 for spark and > streaming), and I am running into a

Re: Spark Streaming - NotSerializableException: Methods & Closures:

2016-04-05 Thread Mayur Pawashe
Hi. I am using 2.10.4 for Scala. 1.6.0 for Spark related dependencies. I am also using spark-streaming-kafka and including kafka (0.8.1.1) which apparently is needed for deserializers. > On Apr 4, 2016, at 6:18 PM, Ted Yu wrote: > > bq. I'm on version 2.10 for spark > >

Re: Spark Streaming - NotSerializableException: Methods & Closures:

2016-04-06 Thread jamborta
you can declare your class serializable, as spark would want to serialise the whole class. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-NotSerializableException-Methods-Closures-tp26672p26689.html

Re: Spark Streaming - NotSerializableException: Methods & Closures:

2016-04-08 Thread mpawashe
The class declaration is already marked Serializable ("with Serializable") -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-NotSerializableException-Methods-Closures-tp26672p26718.html

Monitoring S3 Bucket with Spark Streaming

2016-04-08 Thread Benjamin Kim
Has anyone monitored an S3 bucket or directory using Spark Streaming and pulled any new files to process? If so, can you provide basic Scala coding help on this? Thanks, Ben

Re: Spark Streaming - NotSerializableException: Methods & Closures:

2016-04-08 Thread jamborta
> If you reply to this email, your message will be added to the discussion > below: > > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-NotSerializableException-Methods-Closures-tp26672p26718.html > To unsubscribe from Spark Streaming - NotSerializabl

Re: Spark Streaming, Broadcast variables, java.lang.ClassCastException

2016-04-25 Thread mwol
I forgot the streamingContext.start() streamingContext.awaitTermination() in my example code, but the error stays the same... -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Broadcast-variables-java-lang-ClassCastException-tp26828p26829

Individual DStream Checkpointing in Spark Streaming

2016-05-05 Thread Akash Mishra
Hi *, I am a little confused about the checkpointing of the Spark Streaming context versus an individual DStream. E.g.: JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1)); jssc.checkpoint("hdfs://...") Will start checkpointing the DStream operation, con
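
A sketch of the two levels in Scala (the thread's Java snippet maps one-to-one); `stateDStream` stands for any stateful stream in the app:

```scala
// Context level: enables metadata checkpointing and gives stateful DStreams
// a directory for their data checkpoints
ssc.checkpoint("hdfs:///checkpoints/app")

// DStream level: overrides how often this particular stream's RDDs are
// checkpointed (the default is a multiple of the batch interval)
stateDStream.checkpoint(Seconds(30))
```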

Kafka 0.9 and spark-streaming-kafka_2.10

2016-05-09 Thread Michel Hubert
Hi, I'm thinking of upgrading our Kafka cluster to 0.9. Will this be a problem for the Spark Streaming + Kafka Direct Approach Integration using artifact spark-streaming-kafka_2.10 (1.6.1)? groupId = org.apache.spark artifactId = spark-streaming-kafka_2.10 version = 1.6.1 Becaus

Re: Issue with Spark Streaming UI

2016-05-15 Thread Mich Talebzadeh
http://talebzadehmich.wordpress.com On 14 May 2016 at 07:26, Sachin Janani wrote: > Hi, > I'm trying to run a simple spark streaming application with File Streaming > and it's working properly, but when I try to monitor the number of events in > the Streaming UI it shows that as 0. Is this a

Re: Logistic Regression in Spark Streaming

2016-05-27 Thread Alonso Isidoro Roman
I do not have any experience using LR in spark, but you can see that LR is already implemented in mllib. http://spark.apache.org/docs/latest/mllib-linear-methods.html Alonso Isidoro Roman about.me/alonso.isidoro.roman

Re: Logistic Regression in Spark Streaming

2016-05-27 Thread kundan kumar
Agree, we have a logistic regression example. I was looking for its counterpart to "StreamingLinearRegressionWithSGD". On Fri, May 27, 2016 at 1:16 PM, Alonso Isidoro Roman wrote: > I do not have any experience using LR in spark, but you can see that LR is > already implemented in mllib. > > http
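
MLlib does ship that counterpart, org.apache.spark.mllib.classification.StreamingLogisticRegressionWithSGD; a sketch, with the feature count and the input streams (DStream[LabeledPoint]) assumed:

```scala
import org.apache.spark.mllib.classification.StreamingLogisticRegressionWithSGD
import org.apache.spark.mllib.linalg.Vectors

val numFeatures = 10 // assumption
val model = new StreamingLogisticRegressionWithSGD()
  .setInitialWeights(Vectors.zeros(numFeatures))

model.trainOn(trainingStream)   // trainingStream: DStream[LabeledPoint], assumed to exist
model.predictOnValues(testStream.map(lp => (lp.label, lp.features))).print()
```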

Spark Streaming - Is window() caching DStreams?

2016-05-27 Thread Marco1982
Dear all, Can someone please explain to me how Spark Streaming executes the window() operation? From the Spark 1.6.1 documentation, it seems that windowed batches are automatically cached in memory, but looking at the web UI it seems that operations already executed in previous batches are executed

Spark Streaming - long garbage collection time

2016-06-03 Thread Marco1982
Hi all, I'm running a Spark Streaming application with 1-hour batches to join two data feeds and write the output to disk. The total size of one data feed is about 40 GB per hour (split in multiple files), while the size of the second data feed is about 600-800 MB per hour (also split in mul

Spark streaming micro batch failure handling

2016-06-07 Thread aviemzur
Hi, A question about Spark Streaming's handling of a failed micro batch. After a certain number of task failures there are no more retries, and the entire batch fails. What seems to happen next is that this batch is ignored and the next micro batch begins, which means not all the data has been

Spark Streaming stateful operation to HBase

2016-06-08 Thread soumick dasgupta
Hi, I am using mapWithState to keep the state and then output the result to HBase. The problem I am facing is that when there are no files arriving, the RDD is still emitting the previous state result due to the checkpoint. Is there a way I can restrict it from writing that result to HBase, i.e., when the

Re: Processing Time Spikes (Spark Streaming)

2016-06-09 Thread christian.dancu...@rbc.com
What version of Spark are you running? Do you see the heap space slowly increase over time? Have you set the ttl cleaner? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Processing-Time-Spikes-Spark-Streaming-tp22375p27130.html

Long Running Spark Streaming getting slower

2016-06-10 Thread john.simon
Hi all, I'm running Spark Streaming with Kafka Direct Stream, but after running for a couple of days, the batch processing time almost doubles. I didn't find any slowdown in the JVM GC logs, but I did find that the Spark broadcast variable reading time is increasing. Initially it takes less than 10ms,

Handle empty kafka in Spark Streaming

2016-06-15 Thread Yogesh Vyas
Hi, Does anyone know how to handle an empty Kafka topic while a Spark Streaming job is running? Regards, Yogesh

spark streaming application - deployment best practices

2016-06-15 Thread vimal dinakaran
don't have a hadoop or S3 environment. This mode of deployment is inconvenient. I could do spark submit from one node in client mode but it doesn't provide high availability. What is the best way to deploy spark streaming applications in production? Thanks Vimal

Spark Streaming 1.5.2+Kafka+Python (docs)

2015-12-23 Thread Vyacheslav Yanuk
Colleagues, the documentation for createDirectStream says that "This does not use Zookeeper to store offsets. The consumed offsets are tracked by the stream itself. For interoperability with Kafka monitoring tools that depend on Zookeeper, you have to update Kafka/Zookeeper yourself from the streamin

Spark Streaming: process only last events

2016-01-06 Thread Julien Naour
when one process finishes it would be ideal if Spark Streaming skipped all events that are not the current last ones (per key). I'm not sure this can be done using only the Spark Streaming API. As I understand Spark Streaming, DStream RDDs will accumulate and be processed one by one a

Re: [Spark 1.6] Spark Streaming - java.lang.AbstractMethodError

2016-01-07 Thread Dibyendu Bhattacharya
SPARK-10900 >>> >>> Pozdrawiam, >>> Jacek >>> >>> Jacek Laskowski | https://medium.com/@jaceklaskowski/ >>> Mastering Apache Spark >>> ==> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/ >>> Follow me at https://twit

Re: [Spark 1.6] Spark Streaming - java.lang.AbstractMethodError

2016-01-07 Thread Dibyendu Bhattacharya
Some discussion is there in https://github.com/dibbhatt/kafka-spark-consumer and some is mentioned in https://issues.apache.org/jira/browse/SPARK-11045. Let me know if those answer your question. In short, Direct Stream is a good choice if you need exactly-once semantics and message ordering, but ma

Spark Streaming: BatchDuration and Processing time

2016-01-17 Thread pyspark2555
is longer than 1 second? What happens in the next batch duration? Thanks. Amit -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-BatchDuration-and-Processing-time-tp25986.html

RE: visualize data from spark streaming

2016-01-20 Thread Darren Govoni
Gotta roll your own. Look at kafka and websockets for example. Sent from my Verizon Wireless 4G LTE smartphone Original message From: patcharee Date: 01/20/2016 2:54 PM (GMT-05:00) To: user@spark.apache.org Subject: visualize data from spark streaming Hi, How to

[Spark Streaming][Problem with DataFrame UDFs]

2016-01-20 Thread jpocalan
.1001560.n3.nabble.com/Spark-Streaming-Problem-with-DataFrame-UDFs-tp26024.html

Re: visualize data from spark streaming

2016-01-20 Thread Vinay Shukla
Or you can use Zeppelin notebook to visualize Spark Streaming. See https://www.zeppelinhub.com/viewer/notebooks/aHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2hvcnRvbndvcmtzLWdhbGxlcnkvemVwcGVsaW4tbm90ZWJvb2tzL21hc3Rlci8yQjUyMlYzWDgvbm90ZS5qc29u and other examples https://github.com/hortonworks

Re: visualize data from spark streaming

2016-01-20 Thread Silvio Fiorito
You’ve got a few options: * Use a notebook tool such as Zeppelin, Jupyter, or Spark Notebook to write up some visualizations which update in time with your streaming batches * Use Spark Streaming to push your batch results to another 3rd-party system with a BI tool that supports

Determine Topic MetaData Spark Streaming Job

2016-01-25 Thread Ashish Soni
Hi All, What is the best way to tell a Spark Streaming job the number of partitions for a given topic - should that be provided as a parameter or command line argument, or should we connect to Kafka in the driver program and query it? Map fromOffsets = new HashMap(); fromOffsets.put(new

Re: FAIR scheduler in Spark Streaming

2016-01-26 Thread Shixiong(Ryan) Zhu
r and sometimes will surprise you. On Tue, Jan 26, 2016 at 9:57 AM, Sebastian Piu wrote: > Hi, > > I'm trying to get *FAIR *scheduling to work in a spark streaming app > (1.6.0). > > I've found a previous mailing list where it is indicated to do: > > dstream.f

Re: FAIR scheduler in Spark Streaming

2016-01-26 Thread Sebastian Piu
eep in mind that setting it to a bigger number will allow jobs of several > batches running at the same time. It's hard to predict the behavior and > sometimes will surprise you. > > On Tue, Jan 26, 2016 at 9:57 AM, Sebastian Piu > wrote: > >> Hi, >> >>

Re: spark streaming input rate strange

2016-01-27 Thread Akhil Das
How are you verifying that data is being dropped? Can you send 10k, 20k events and write the same to an output location from spark streaming and verify it? If you find a data mismatch then it's a problem with your MulticastSocket implementation. Thanks Best Regards On Fri, Jan 22, 2016 at 5:44 PM

Re: Spark Streaming from existing RDD

2016-01-29 Thread Shixiong(Ryan) Zhu
Do you just want to write some unit tests? If so, you can use "queueStream" to create a DStream from a queue of RDDs. However, because it doesn't support metadata checkpointing, it's better to only use it in unit tests. On Fri, Jan 29, 2016 at 7:35 AM, Sateesh Karuturi < sateesh.karutu...@gmail.co
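
A minimal sketch of queueStream in a test, assuming an existing StreamingContext `ssc`:

```scala
import scala.collection.mutable
import org.apache.spark.rdd.RDD

val rddQueue = mutable.Queue[RDD[Int]]()
val inputStream = ssc.queueStream(rddQueue)
inputStream.map(_ * 2).print()

ssc.start()
// Feed test batches; each dequeued RDD becomes one micro-batch
rddQueue.synchronized { rddQueue += ssc.sparkContext.makeRDD(1 to 100, 2) }
```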

java.nio.channels.ClosedChannelException in Spark Streaming KafKa Direct

2016-02-01 Thread SRK
Hi, I see the following error in Spark Streaming with Kafka Direct. I think that this error is related to Kafka topic. Any suggestions on how to avoid this error would be of great help. java.nio.channels.ClosedChannelException at kafka.network.BlockingChannel.send(BlockingChannel.scala

Access batch statistics in Spark Streaming

2016-02-08 Thread Chen Song
Apologies in advance if someone has already asked and addressed this question. In Spark Streaming, how can I programmatically get the batch statistics like scheduling delay, total delay and processing time (they are shown in the job UI streaming tab)? I need such information to raise alerts in some
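
The StreamingListener API exposes exactly these numbers; a sketch, with the alerting hook left as an assumption:

```scala
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

ssc.addStreamingListener(new StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    val info = batch.batchInfo
    // All values are Option[Long] in milliseconds; wire these into your alerting
    println(s"scheduling=${info.schedulingDelay.getOrElse(-1L)}ms " +
      s"processing=${info.processingDelay.getOrElse(-1L)}ms " +
      s"total=${info.totalDelay.getOrElse(-1L)}ms")
  }
})
```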

Dynamically Change Log Level Spark Streaming

2016-02-08 Thread Ashish Soni
Hi All, How do I change the log level for a running Spark Streaming job? Any help will be appreciated. Thanks,
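
If the job runs on Spark 1.4 or later, one option is the runtime switch on SparkContext; note it only affects the process it runs in:

```scala
// Driver-side runtime switch; executors started with their own log4j.properties
// keep that configuration unless reconfigured separately
sc.setLogLevel("WARN")
```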

Re: Skip empty batches - spark streaming

2016-02-11 Thread Shixiong(Ryan) Zhu
Are you using a custom input dstream? If so, you can make the `compute` method return None to skip a batch. On Thu, Feb 11, 2016 at 1:03 PM, Sebastian Piu wrote: > I was wondering if there is there any way to skip batches with zero events > when streaming? > By skip I mean avoid the empty rdd fr

Re: Skip empty batches - spark streaming

2016-02-11 Thread Sebastian Piu
I'm using the Kafka direct stream api but I can have a look on extending it to have this behaviour Thanks! On 11 Feb 2016 9:07 p.m., "Shixiong(Ryan) Zhu" wrote: > Are you using a custom input dstream? If so, you can make the `compute` > method return None to skip a batch. > > On Thu, Feb 11, 201

Re: Skip empty batches - spark streaming

2016-02-11 Thread Shixiong(Ryan) Zhu
Yeah, DirectKafkaInputDStream always returns an RDD even if it's empty. Feel free to send a PR to improve it. On Thu, Feb 11, 2016 at 1:09 PM, Sebastian Piu wrote: > I'm using the Kafka direct stream api but I can have a look on extending > it to have this behaviour > > Thanks! > On 11 Feb 2016 9

Re: Skip empty batches - spark streaming

2016-02-11 Thread Sebastian Piu
Yes, and as far as I recall it also has partitions (empty) which screws up the isEmpty call if the rdd has been transformed down the line. I will have a look tomorrow at the office and see if I can collaborate On 11 Feb 2016 9:14 p.m., "Shixiong(Ryan) Zhu" wrote: > Yeah, DirectKafkaInputDStream a

Re: Skip empty batches - spark streaming

2016-02-11 Thread Andy Davidson
Date: Thursday, February 11, 2016 at 1:19 PM To: "Shixiong (Ryan) Zhu" Cc: Sebastian Piu , "user @spark" Subject: Re: Skip empty batches - spark streaming > > Yes, and as far as I recall it also has partitions (empty) which screws up the > isEmpty call if the rdd

Re: Skip empty batches - spark streaming

2016-02-11 Thread Cody Koeninger
Please don't change the behavior of DirectKafkaInputDStream. Returning an empty rdd is (imho) the semantically correct thing to do, and some existing jobs depend on that behavior. If it's really an issue for you, you can either override directkafkainputdstream, or just check isEmpty as the first t
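
A minimal sketch of the isEmpty guard Cody suggests:

```scala
stream.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {        // cheap guard: skips the work for batches with no messages
    rdd.foreachPartition { records =>
      // ... process/write records ...
    }
  }
}
```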

Re: Skip empty batches - spark streaming

2016-02-11 Thread Sebastian Piu
Thanks for clarifying Cody. I will extend the current behaviour for my use case. If there is anything worth sharing I'll run it through the list Cheers On 11 Feb 2016 9:47 p.m., "Cody Koeninger" wrote: > Please don't change the behavior of DirectKafkaInputDStream. > Returning an empty rdd is (im

Re: Spark Streaming with Kafka DirectStream

2016-02-16 Thread ayan guha
I have a slightly different understanding. Direct stream generates 1 RDD per batch, however, number of partitions in that RDD = number of partitions in kafka topic. On Wed, Feb 17, 2016 at 12:18 PM, Cyril Scetbon wrote: > Hi guys, > > I'm making some tests with Spark and Kafka using a Python sc

Re: Spark Streaming with Kafka DirectStream

2016-02-16 Thread Cyril Scetbon
Your understanding is the right one (having re-read the documentation). Still wondering how I can verify that 5 partitions have been created. My job is reading from a topic in Kafka that has 5 partitions and sends the data to E/S. I can see that when there is one task to read from Kafka there ar

Re: Spark Streaming with Kafka DirectStream

2016-02-16 Thread ayan guha
Hi You can always use RDD properties, which already has partition information. https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/performance_optimization/how_many_partitions_does_an_rdd_have.html On Wed, Feb 17, 2016 at 2:36 PM, Cyril Scetbon wrote: > Your understanding i
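
A quick way to verify, as a sketch (`directStream` being the stream returned by createDirectStream):

```scala
directStream.foreachRDD { rdd =>
  // With the direct approach each Kafka topic partition maps to one RDD partition,
  // so a 5-partition topic should print 5 here (on the driver)
  println(s"partitions in this batch: ${rdd.partitions.length}")
}
```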

Spark Streaming with Kafka Use Case

2016-02-17 Thread Abhishek Anand
I have a spark streaming application running in production. I am trying to find a solution for a particular use case when my application has a downtime of say 5 hours and is restarted. Now, when I start my streaming application after 5 hours there would be considerable amount of data then in the

Re: Spark Streaming with Kafka DirectStream

2016-02-17 Thread Cyril Scetbon
I don't think we can print an integer value in a spark streaming process, as opposed to a spark job. I think I can print the content of an rdd but not debug messages. Am I wrong? Cyril Scetbon > On Feb 17, 2016, at 12:51 AM, ayan guha wrote: > > Hi > > You can alway

Re: Spark Streaming with Kafka DirectStream

2016-02-17 Thread Cody Koeninger
You can print whatever you want wherever you want, it's just a question of whether it's going to show up on the driver or the various executors logs On Wed, Feb 17, 2016 at 5:50 AM, Cyril Scetbon wrote: > I don't think we can print an integer value in a spark streaming process

Communication between two spark streaming Job

2016-02-19 Thread Ashish Soni
Hi, Is there any way we can communicate across two different spark streaming jobs? Below is the scenario: we have two spark streaming jobs, one to process metadata and one to process actual data (this needs the metadata). So if someone did the metadata update we need to update the cache

Constantly increasing Spark streaming heap memory

2016-02-20 Thread Walid LEZZAR
Hi, I'm running a Spark Streaming job that pulls data from Kafka (using the direct approach method - without receiver) and pushes it into elasticsearch. The job is running fine but I was surprised once I opened jconsole to monitor it: I noticed that the heap memory is constantly increasing

Fwd: Evaluating spark streaming use case

2016-02-21 Thread Jatin Kumar
Hello Spark users, I have to aggregate messages from kafka and at some fixed interval (say every half hour) update a memory persisted RDD and run some computation. This computation uses last one day data. Steps are: - Read from realtime Kafka topic X in spark streaming batches of 5 seconds

Re: Evaluating spark streaming use case

2016-02-21 Thread Gerard Maas
computation uses last one day data. Steps are: > > - Read from realtime Kafka topic X in spark streaming batches of 5 seconds > - Filter the above DStream messages and keep some of them > - Create windows of 30 minutes on above DStream and aggregate by Key > - Merge this 30 minute RD

Re: Evaluating spark streaming use case

2016-02-21 Thread Ted Yu
> wrote: > >> Hello Spark users, >> >> I have to aggregate messages from kafka and at some fixed interval (say >> every half hour) update a memory persisted RDD and run some computation. >> This computation uses last one day data. Steps are: >> >> - R

Re: Evaluating spark streaming use case

2016-02-21 Thread Chris Fregly
n to be in. also, is this spark 1.6 with the new mapWithState() or the old updateStateByKey()? you definitely want the newer 1.6 mapWithState(). and is there any other way to store and aggregate this data outside of spark? I get a bit nervous when I see people treat spark/streaming like an in-memory

Re: Evaluating spark streaming use case

2016-02-21 Thread Ted Yu
> also, is this spark 1.6 with the new mapWithState() or the old > updateStateByKey()? you definitely want the newer 1.6 mapWithState(). > > and is there any other way to store and aggregate this data outside of > spark? I get a bit nervous when I see people treat spark/streaming like

Spark streaming not remembering previous state

2016-02-27 Thread Vinti Maheshwari
Hi All, I wrote a spark streaming program with a stateful transformation. It seems like my spark streaming application is doing computation correctly with checkpointing. But when I terminate my program and start it again, it's not reading the previous checkpoint data and is starting from the begi
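
The usual fix is to build the context through StreamingContext.getOrCreate so a restart recovers from the checkpoint instead of constructing a fresh graph; a sketch, with the checkpoint path hypothetical:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///checkpoints/app" // hypothetical

def createContext(): StreamingContext = {
  val ssc = new StreamingContext(new SparkConf().setAppName("stateful-app"), Seconds(10))
  ssc.checkpoint(checkpointDir)
  // build the entire DStream graph (including the stateful ops) inside this function
  ssc
}

// Restores the context and its state from the checkpoint if one exists;
// createContext only runs on a cold start
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```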

Spark streaming: StorageLevel.MEMORY_AND_DISK_SER setting for KafkaUtils.createDirectStream

2016-03-02 Thread Vinti Maheshwari
Hi All, I wanted to set StorageLevel.MEMORY_AND_DISK_SER in my spark-streaming program as currently I am getting a MetadataFetchFailedException. I am not sure where I should pass StorageLevel.MEMORY_AND_DISK, as it seems like createDirectStream doesn't allow passing that parameter.

Number Of Jobs In Spark Streaming

2016-03-04 Thread Sandip Mehta
Hi All, Is it fair to say that the number of jobs in a given spark streaming application is equal to the number of actions in the application? Regards Sandeep

Re: Spark Streaming fileStream vs textFileStream

2016-03-06 Thread Yuval.Itzchakov
HDFS file */ -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-fileStream-vs-textFileStream-tp26407p26410.html

Re: kill Spark Streaming job gracefully

2016-03-14 Thread Shams ul Haque
Anyone have any idea? Or should I raise a bug for that? Thanks, Shams On Fri, Mar 11, 2016 at 3:40 PM, Shams ul Haque wrote: > Hi, > > I want to kill a Spark Streaming job gracefully, so that whatever Spark > has picked from Kafka have processed. My Spark version is: 1.6.0 >

Spark Streaming Twiiter and Standalone cluster

2016-03-14 Thread palamaury
Hi, I have an issue using Spark Streaming with a Spark Standalone cluster: my job is submitted fine but the workers seem to be unreachable. To build the project I'm using sbt-assembly. My version of spark is 1.6.0. Here is my streaming conf: val sparkConf = new SparkConf() .setAp

Docker configuration for akka spark streaming

2016-03-14 Thread David Gomez Saavedra
hi everyone, I'm trying to set up spark streaming using akka with a similar example of the word count provided. When using spark master in local mode everything works but when I try to run it the driver and executors using docker I get the following exception 16/03/14 20:32:03

Re: Spark Streaming - Inserting into Tables

2015-07-14 Thread Tathagata Das
Why is .remember not ideal? On Sun, Jul 12, 2015 at 7:22 PM, Brandon White wrote: > Hi Yin, > > Yes there were no new rows. I fixed it by doing a .remember on the > context. Obviously, this is not ideal. > > On Sun, Jul 12, 2015 at 6:31 PM, Yin Huai wrote: > >> Hi Brandon, >> >> Can you explai

spark streaming job to hbase write

2015-07-15 Thread Shushant Arora
Hi, I have a requirement of writing to an HBase table from a Spark streaming app after some processing. Is the HBase put operation the only way of writing to HBase, or is there any specialised connector or RDD of spark for HBase writes? Should bulk load to HBase from a streaming app be avoided if the output of
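
Absent a dedicated connector, the common pattern is a plain HBase client Put per record inside foreachPartition; a sketch against the HBase 1.x client API, with the table/column names hypothetical and the stream assumed to carry (rowKey, value) string pairs:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

dstream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    // One connection per partition, never per record
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = connection.getTable(TableName.valueOf("events")) // hypothetical table
    records.foreach { case (rowKey, value) =>
      val put = new Put(Bytes.toBytes(rowKey))
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
      table.put(put)
    }
    table.close()
    connection.close()
  }
}
```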

Spark streaming Processing time keeps increasing

2015-07-15 Thread N B
Hello, We have a Spark streaming application and the problem that we are encountering is that the batch processing time keeps on increasing and eventually causes the application to start lagging. I am hoping that someone here can point me to any underlying cause of why this might happen. The

spark streaming 1.3 coalesce on kafkadirectstream

2015-07-20 Thread Shushant Arora
does spark streaming 1.3 launch a task for each partition offset range whether that is 0 or not? If yes, how can I enforce it not to launch tasks for empty rdds? Not able to use coalesce on directKafkaStream. Shall we always enforce repartitioning before processing the direct stream? use case

Json parsing library for Spark Streaming?

2015-07-27 Thread swetha
Hi, What is the proper Json parsing library to use in Spark Streaming? Currently I am trying to use Gson library in a Java class and calling the Java method from a Scala class as shown below: What are the advantages of using Json4S as against using Gson library in a Java class and calling it from
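
For comparison, a json4s sketch with a hypothetical Event schema, assuming `lines` is a DStream[String]:

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._

case class Event(id: String, value: Double) // hypothetical schema

val events = lines.map { line =>
  // Formats created inside the closure to sidestep serialization issues
  implicit val formats: Formats = DefaultFormats
  parse(line).extract[Event]
}
```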

Error in starting Spark Streaming Context

2015-07-29 Thread Sadaf
Hi, I am new to Spark Streaming and writing code for a Twitter connector. When I run this code more than once, it gives the following exception. I have to create a new hdfs directory for checkpointing each time to make it run successfully, and moreover it doesn't get stopped.

Re: Graceful shutdown for Spark Streaming

2015-07-29 Thread Tathagata Das
terminated. On Wed, Jul 29, 2015 at 6:43 AM, Michal Čizmazia wrote: > How to initiate graceful shutdown from outside of the Spark Streaming > driver process? Both for the local and cluster mode of Spark Standalone as > well as EMR. > > Does sbin/stop-all.sh stop the context gracefully?
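
For reference, the graceful variant of stop looks like this:

```scala
// Stops the receivers first, waits for queued batches to finish, then
// (optionally) tears down the underlying SparkContext
ssc.stop(stopSparkContext = true, stopGracefully = true)
```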

Re: Graceful shutdown for Spark Streaming

2015-07-29 Thread anshu shukla
e Spark cluster is the terminated. > > On Wed, Jul 29, 2015 at 6:43 AM, Michal Čizmazia > wrote: > >> How to initiate graceful shutdown from outside of the Spark Streaming >> driver process? Both for the local and cluster mode of Spark Standalone as >> well as EMR.

Re: Graceful shutdown for Spark Streaming

2015-07-30 Thread Tathagata Das
:43 AM, Michal Čizmazia >> wrote: >> >>> How to initiate graceful shutdown from outside of the Spark Streaming >>> driver process? Both for the local and cluster mode of Spark Standalone as >>> well as EMR. >>> >>> Does sbin/stop-all.sh sto

Re: Graceful shutdown for Spark Streaming

2015-07-30 Thread anshu shukla
n safely terminate the Spark cluster. They are two different >>> steps and needs to be done separately ensuring that the driver process has >>> been completely terminated before the Spark cluster is the terminated. >>> >>> On Wed, Jul 29, 2015 at 6:43 AM, Michal Čizma

Re: Upgrade of Spark-Streaming application

2015-07-30 Thread Cody Koeninger
ries - Store the offsets somewhere other than the checkpoint, and provide them on startup using the fromOffsets argument to createDirectStream On Thu, Jul 30, 2015 at 4:07 AM, Nicola Ferraro wrote: > Hi, > I've read about the recent updates about spark-streaming integration with
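
A sketch of that second strategy against the 1.x direct API, with the broker, topic name and offset values hypothetical:

```scala
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

val kafkaParams = Map("metadata.broker.list" -> "broker:9092") // hypothetical broker
// Offsets loaded on startup from wherever you persisted them (ZK, a DB, ...)
val fromOffsets = Map(TopicAndPartition("events", 0) -> 12345L)  // hypothetical values

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
  ssc, kafkaParams, fromOffsets,
  (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))
```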

spark streaming max receiver rate doubts

2015-08-03 Thread Shushant Arora
jobs for intervals 2-5 sec be queued and created afterwards, or should they not be created since all messages are already processed for those intervals also? 2. In spark streaming 1.2 (Receiver based) if I don't set spark.streaming.receiver.maxRate - will it consume all messages from the last offset or it

Re: Upgrade of Spark-Streaming application

2015-08-05 Thread Shushant Arora
ts argument to createDirectStream > > > > > > On Thu, Jul 30, 2015 at 4:07 AM, Nicola Ferraro > wrote: > >> Hi, >> I've read about the recent updates about spark-streaming integration with >> Kafka (I refer to the new approach without receivers). >>
