question on spark streaming based on event time

2017-01-28 Thread kant kodali
Hi All, I read through the documentation on Spark Streaming based on event time and how Spark handles lag w.r.t. processing time, and so on. But what if the lag between event time and processing time is too long? In other words, what should I do if I am receiving yesterday's data (the timestamp
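One mechanism the documentation describes for exactly this situation is an event-time watermark, which bounds how late data may arrive before it is dropped from streaming state. Below is a minimal sketch, assuming Structured Streaming (Spark 2.1+) and an illustrative Kafka source; the broker, topic, and column names are assumptions, not taken from the question.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder.appName("event-time-watermark").getOrCreate()
    import spark.implicits._

    // Illustrative source: records carrying an event-time column.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")   // assumed broker
      .option("subscribe", "events")                          // assumed topic
      .load()
      .selectExpr("CAST(value AS STRING) AS body", "timestamp AS eventTime")

    // Accept records up to 24 hours late. Anything older than the watermark
    // (e.g. yesterday's data arriving well after the fact) is dropped instead of
    // being held in aggregation state indefinitely.
    val hourlyCounts = events
      .withWatermark("eventTime", "24 hours")
      .groupBy(window($"eventTime", "1 hour"))
      .count()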

Re: DAG Visualization option is missing on Spark Web UI

2017-01-28 Thread Mark Hamstra
Try selecting a particular Job instead of looking at the summary page for all Jobs. On Sat, Jan 28, 2017 at 4:25 PM, Md. Rezaul Karim < rezaul.ka...@insight-centre.org> wrote: > Hi Jacek, > > I tried accessing Spark web UI on both Firefox and Google Chrome browsers > with ad blocker enabled. I

Re: DAG Visualization option is missing on Spark Web UI

2017-01-28 Thread Md. Rezaul Karim
Hi Jacek, I tried accessing Spark web UI on both Firefox and Google Chrome browsers with ad blocker enabled. I do see other options like User, Total Uptime, Scheduling Mode, Active Jobs, Completed Jobs and Event Timeline. However, I don't see an option for DAG visualization. Please note that

Re: Dynamic resource allocation to Spark on Mesos

2017-01-28 Thread Michael Gummelt
We've talked about that, but it hasn't become a priority because we haven't had a driving use case. If anyone has a good argument for "variable" resource allocation like this, please let me know. On Sat, Jan 28, 2017 at 9:17 AM, Shuai Lin wrote: > An alternative

Complex types handling with spark SQL and parquet

2017-01-28 Thread Antoine HOM
Hello everybody, I have been trying to use complex types (stored in Parquet) with Spark SQL and ran into an issue that I can't seem to solve cleanly. I was hoping, through this mail, to get some insights from the community; maybe I'm just missing something obvious in the way I'm
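Since the preview cuts off before the actual problem, here is only a small sketch of the kind of setup being described: a nested struct/array type written to and read back from Parquet through Spark SQL. The schema, names, and path below are illustrative assumptions.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("complex-types-parquet").getOrCreate()
    import spark.implicits._

    // Illustrative nested schema: an order containing an array of line-item structs.
    case class Item(sku: String, qty: Int)
    case class Order(id: Long, items: Seq[Item])

    val orders = Seq(Order(1L, Seq(Item("a", 2), Item("b", 1)))).toDF()
    orders.write.mode("overwrite").parquet("/tmp/orders")     // assumed path

    // Nested fields are addressed with dot syntax; arrays can be flattened with explode.
    spark.read.parquet("/tmp/orders")
      .selectExpr("id", "explode(items) AS item")
      .selectExpr("id", "item.sku", "item.qty")
      .show()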

Re: DAG Visualization option is missing on Spark Web UI

2017-01-28 Thread Jacek Laskowski
Hi, Wonder if you have any adblocker enabled in your browser? Is this the only version giving you this behavior? All Spark jobs have no visualization? Jacek On 28 Jan 2017 7:03 p.m., "Md. Rezaul Karim" < rezaul.ka...@insight-centre.org> wrote: Hi All, I am running a Spark job on my local

Re: kafka structured streaming source refuses to read

2017-01-28 Thread Koert Kuipers
There is also an existing Spark ticket for this: SPARK-18779. On Sat, Jan 28, 2017 at 1:13 PM, Koert Kuipers wrote: > it seems the bug is: > https://issues.apache.org/jira/browse/KAFKA-4547 > > i would advise

Re: kafka structured streaming source refuses to read

2017-01-28 Thread Koert Kuipers
It seems the bug is https://issues.apache.org/jira/browse/KAFKA-4547. I would advise everyone not to use kafka-clients 0.10.0.2, 0.10.1.0 or 0.10.1.1. On Fri, Jan 27, 2017 at 3:56 PM, Koert Kuipers wrote: > in case anyone else runs into this: > > the issue is that i was using
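If a build is pulling in one of those versions transitively, one way to act on this advice is to pin the dependency explicitly. A sketch, assuming an sbt build; the pinned version is an assumption, pick any release not on the list above.

    // build.sbt: force a kafka-clients version other than the affected releases,
    // regardless of what transitive dependencies request.
    dependencyOverrides += "org.apache.kafka" % "kafka-clients" % "0.10.0.1"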

DAG Visualization option is missing on Spark Web UI

2017-01-28 Thread Md. Rezaul Karim
Hi All, I am running a Spark job on my local machine, written in Scala with Spark 2.1.0. However, I am not seeing any option for "DAG Visualization" at http://localhost:4040/jobs/. Suggestions, please. Regards, Md. Rezaul Karim, BSc, MSc, PhD Researcher,

Re: Cached table details

2017-01-28 Thread Shuai Lin
+1 for Jacek's suggestion. FWIW, another possible (hacky) way is to write a package in the org.apache.spark.sql namespace so it can access sparkSession.sharedState.cacheManager, then use Scala reflection to read the cache manager's `cachedData` field, which can provide the list of cached
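A minimal sketch of that hacky approach, assuming Spark 2.1-era internals; these are private APIs and the type of the `cachedData` field has changed between releases, so treat every detail below as an assumption rather than a supported interface.

    package org.apache.spark.sql   // placed here so private[sql] members are visible

    import org.apache.spark.sql.execution.CachedData
    import scala.collection.JavaConverters._

    object CacheInspector {
      // Reads the private `cachedData` field of the session's CacheManager via reflection.
      def cachedEntries(spark: SparkSession): Seq[CachedData] = {
        val cacheManager = spark.sharedState.cacheManager
        val field = cacheManager.getClass.getDeclaredField("cachedData")
        field.setAccessible(true)
        field.get(cacheManager) match {
          case s: scala.collection.Seq[_] => s.collect { case cd: CachedData => cd }.toSeq
          case l: java.util.List[_]       => l.asScala.collect { case cd: CachedData => cd }.toSeq
          case _                          => Seq.empty
        }
      }
    }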

Re: Dynamic resource allocation to Spark on Mesos

2017-01-28 Thread Shuai Lin
> An alternative behavior is to launch the job with the best resource offer Mesos is able to give. Michael has just made an excellent explanation about dynamic allocation support in Mesos. But IIUC, what you want to achieve is something like (using RAM as an example): "Launch each executor
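For reference, what exists today is coarse-grained dynamic allocation driven by configuration rather than by the size of each Mesos offer. A minimal sketch with illustrative values; the external shuffle service must be running on each agent for this to work.

    import org.apache.spark.SparkConf

    // Scales the *number* of executors up and down; it does not size an executor
    // to whatever resource offer Mesos happens to make. Values are illustrative.
    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")           // external shuffle service required
      .set("spark.dynamicAllocation.minExecutors", "2")
      .set("spark.dynamicAllocation.maxExecutors", "20")
      .set("spark.executor.memory", "4g")                     // fixed per-executor RAM
      .set("spark.executor.cores", "2")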

Re: mapWithState question

2017-01-28 Thread shyla deshpande
That's a great idea. I will try that. Thanks. On Sat, Jan 28, 2017 at 2:35 AM, Tathagata Das wrote: > 1 state object for each user. > union both streams into a single DStream, and apply mapWithState on it to > update the user state. > > On Sat, Jan 28, 2017 at 12:30

Re: spark architecture question -- Please Read

2017-01-28 Thread Sachin Naik
I strongly agree with Jörn and Russell. There are different solutions for data movement depending upon your needs: frequency, bi-directional drivers, workflow, handling duplicate records. This space is known as "Change Data Capture", or CDC for short. If you need more information, I would be

Re: issue with running Spark streaming with spark-shell

2017-01-28 Thread Chetan Khatri
If you are using any other package, pass it as an argument with --packages. On Sat, Jan 28, 2017 at 8:14 PM, Jacek Laskowski wrote: > Hi, > > How did you start spark-shell? > > Jacek > > On 28 Jan 2017 11:20 a.m., "Mich Talebzadeh" > wrote: > >> >> Hi, >> >>

Re: issue with running Spark streaming with spark-shell

2017-01-28 Thread Jacek Laskowski
Hi, How did you start spark-shell? Jacek On 28 Jan 2017 11:20 a.m., "Mich Talebzadeh" wrote: > > Hi, > > My spark-streaming application works fine when compiled with Maven with > uber jar file. > > With spark-shell this program throws an error as follows: > > scala>

[ANNOUNCE] Apache Bahir 2.0.2

2017-01-28 Thread Christian Kadner
The Apache Bahir PMC approved the release of Apache Bahir 2.0.2, which provides the following extensions for Apache Spark 2.0.2: Akka Streaming, MQTT Streaming, MQTT Structured Streaming, Twitter Streaming and ZeroMQ Streaming. For more information about Apache Bahir and to
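As an illustration only (not part of the announcement), a sketch of how one of these connectors is typically used. The artifact coordinates, broker URL, and topic below are assumptions; the MQTTUtils API is the DStream-based MQTT receiver that Bahir carries over from the old spark-streaming-mqtt module.

    // Assumed coordinates, e.g.:
    //   spark-shell --packages org.apache.bahir:spark-streaming-mqtt_2.11:2.0.2
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.mqtt.MQTTUtils

    val conf = new SparkConf().setAppName("bahir-mqtt-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Placeholder broker URL and topic.
    val lines = MQTTUtils.createStream(ssc, "tcp://localhost:1883", "sensors")
    lines.print()

    ssc.start()
    ssc.awaitTermination()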

Re: spark 2.02 error when writing to s3

2017-01-28 Thread Steve Loughran
On 27 Jan 2017, at 23:17, VND Tremblay, Paul wrote: > Not sure what you mean by "a consistency layer on top." Any explanation would be greatly appreciated! Paul Netflix's s3mper: https://github.com/Netflix/s3mper EMR consistency:

Re: mapWithState question

2017-01-28 Thread Tathagata Das
One state object for each user: union both streams into a single DStream, and apply mapWithState on it to update the user state. On Sat, Jan 28, 2017 at 12:30 AM, shyla deshpande wrote: > Can multiple DStreams manipulate a state? I have a stream that gives me > total
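A minimal sketch of that suggestion, with hypothetical event and state types whose fields mirror the total_minutes / chapters_completed / lessons_completed tracked in the question.

    import org.apache.spark.streaming.{State, StateSpec}
    import org.apache.spark.streaming.dstream.DStream

    // Hypothetical event types arriving on the two streams, keyed by user id.
    sealed trait Update
    case class MinutesSpent(minutes: Long) extends Update
    case class Completed(chapters: Int, lessons: Int) extends Update

    case class UserState(totalMinutes: Long, chaptersCompleted: Int, lessonsCompleted: Int)

    def trackUsers(minutesStream: DStream[(String, Update)],
                   progressStream: DStream[(String, Update)]): DStream[(String, UserState)] = {
      // One DStream, one state object per user key.
      val combined = minutesStream.union(progressStream)

      val updateFn = (userId: String, update: Option[Update], state: State[UserState]) => {
        val current = state.getOption().getOrElse(UserState(0L, 0, 0))
        val next = update match {
          case Some(MinutesSpent(m)) => current.copy(totalMinutes = current.totalMinutes + m)
          case Some(Completed(c, l)) => current.copy(chaptersCompleted = current.chaptersCompleted + c,
                                                     lessonsCompleted = current.lessonsCompleted + l)
          case None                  => current   // no new event for this key in this batch
        }
        state.update(next)
        (userId, next)
      }

      combined.mapWithState(StateSpec.function(updateFn))
    }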

issue with running Spark streaming with spark-shell

2017-01-28 Thread Mich Talebzadeh
Hi, My Spark Streaming application works fine when compiled with Maven as an uber jar. With spark-shell, the same program throws an error as follows: scala> val dstream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](streamingContext, kafkaParams, topics)
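For reference, a sketch of what the shell session needs for the 0.8-style direct stream shown above; the package coordinates, broker, and topic are assumptions and have to match the actual Spark/Scala build.

    // Start the shell with the matching Kafka integration on the classpath, e.g.
    //   spark-shell --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.1.0
    // (assumed coordinates; adjust to your Spark and Scala versions)

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val streamingContext = new StreamingContext(sc, Seconds(10))
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")   // assumed broker
    val topics = Set("test")                                            // assumed topic

    val dstream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      streamingContext, kafkaParams, topics)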

mapWithState question

2017-01-28 Thread shyla deshpande
Can multiple DStreams manipulate a single state? I have a stream that gives me the total minutes a user spent on course material. I have another stream that gives me chapters completed and lessons completed by the user. I want to keep track, for each user, of total_minutes, chapters_completed and

Re: spark architecture question -- Please Read

2017-01-28 Thread Jörn Franke
Hard to tell. Can you give more insight into what you are trying to achieve and what the data is about? For example, depending on your use case, Sqoop can make sense or not. > On 28 Jan 2017, at 02:14, Sirisha Cheruvu wrote: > > Hi Team, > > Right now our existing flow is > >