Hey Kant,
That won't work either. Your second query may fail, and as long as your
first query is running, you will not know. Put this as the last line
instead:
spark.streams.awaitAnyTermination()
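For example (an untested sketch that reuses your outputDS1/outputDS2 datasets and your custom KafkaSink writer; the second topic name is only a guess):

// Start both queries first, then block until any of them terminates or fails,
// so an error in either query surfaces immediately.
val query1 = outputDS1.writeStream.foreach(new KafkaSink("hello1")).start()
val query2 = outputDS2.writeStream.foreach(new KafkaSink("hello2")).start()
spark.streams.awaitAnyTermination()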
On Tue, Sep 19, 2017 at 10:11 PM, kant kodali wrote:
> Looks like my problem was the order of awaitTermination() for some reason.
Hi,
Cody's right.
subscribe - Topic subscription strategy that accepts topic names as a
comma-separated string, e.g. topic1,topic2,topic3 [1]
subscribepattern - Topic subscription strategy that uses a Java
java.util.regex.Pattern to match the names of the topics to subscribe to
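As a quick illustration (topic names are made up), the two strategies map onto the Kafka source options like this:

// Subscribe to an explicit, comma-separated list of topic names.
val byNames = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "topic1,topic2,topic3")
  .load()

// Subscribe to every topic whose name matches a java.util.regex pattern.
val byPattern = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribePattern", "topic.*")
  .load()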
Looks like my problem was the order of awaitTermination() for some reason.
*Doesn't work*
outputDS1.writeStream().trigger(Trigger.processingTime(1000)).foreach(new KafkaSink("hello1")).start().awaitTermination()
outputDS2.writeStream().trigger(Trigger.processingTime(1000)).foreach(new KafkaSink("hello2")).start().awaitTermination()
Hi,
Ah, right! Start the queries and once they're running, awaitTermination
them.
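In other words, roughly (an untested sketch using the names from the earlier snippet):

// start() both queries before blocking on either of them; calling
// awaitTermination() inline blocks, so the second query never starts.
val q1 = outputDS1.writeStream.foreach(new KafkaSink("hello1")).start()
val q2 = outputDS2.writeStream.foreach(new KafkaSink("hello2")).start()
q1.awaitTermination()
q2.awaitTermination()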
Pozdrawiam,
Jacek Laskowski
https://about.me/JacekLaskowski
Spark Structured Streaming (Apache Spark 2.2+)
https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Hi,
What's the code in readFromKafka to read from hello2 and hello1?
Pozdrawiam,
Jacek Laskowski
https://about.me/JacekLaskowski
Spark Structured Streaming (Apache Spark 2.2+)
https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me a
Looks like my problem was the order of awaitTermination() for some reason.
Doesn't work
On Tue, Sep 19, 2017 at 1:54 PM, kant kodali wrote:
> Hi All,
>
> I have the following pseudo code (I could paste the real code, however it
> is pretty long and involves database calls inside the dataset.map
To set Spark 2 as the default, refer to
https://www.cloudera.com/documentation/spark2/latest/topics/spark2_admin.html#default_tools
-----Original Message-----
From: Gaurav1809 [mailto:gauravhpan...@gmail.com]
Sent: Wednesday, September 20, 2017 9:16 AM
To: user@spark.apache.org
Subject: Cloudera - How t
Hello all,
I downloaded CDH and it comes with Spark 1.6.
As per the step-by-step guide given, I added Spark 2 to the services list.
Now I can see both Spark 1.6 and Spark 2, and when I run spark-shell in a terminal
window, it starts with Spark 1.6 only. How do I switch to Spark 2? What all
_HOMEs or param
Hi All,
I have the following pseudo code (I could paste the real code, however it is
pretty long and involves database calls inside the dataset.map operation and so
on), so I am just trying to simplify my question. I would like to know if
there is something wrong with the following pseudo code?
DataSet i
You should be able to pass a comma-separated string of topics to
subscribe; subscribePattern isn't necessary.
On Tue, Sep 19, 2017 at 2:54 PM, kant kodali wrote:
> got it! Sorry.
>
> On Tue, Sep 19, 2017 at 12:52 PM, Jacek Laskowski wrote:
>>
>> Hi,
>>
>> Use subscribepattern
>>
>> You haven't
This may also be related to
https://issues.apache.org/jira/browse/SPARK-22033
On Tue, Sep 19, 2017 at 3:40 PM, Mark Bittmann wrote:
> I've run into this before. The EigenValueDecomposition creates a Java
> Array with 2*k*n elements. The Java Array is indexed with a native integer
> type, so 2*k*
got it! Sorry.
On Tue, Sep 19, 2017 at 12:52 PM, Jacek Laskowski wrote:
> Hi,
>
> Use subscribepattern
>
> You haven't googled well enough --> https://jaceklaskowski.gitbooks.io/spark-structured-streaming/spark-sql-streaming-KafkaSource.html :)
>
> Pozdrawiam,
> Jacek Laskowski
>
> ht
Hi,
Use subscribepattern
You haven't googled well enough -->
https://jaceklaskowski.gitbooks.io/spark-structured-streaming/spark-sql-streaming-KafkaSource.html
:)
Pozdrawiam,
Jacek Laskowski
https://about.me/JacekLaskowski
Spark Structured Streaming (Apache Spark 2.2+)
https://bit.ly/spark-
Hi All,
I am wondering how to read from multiple Kafka topics using structured
streaming (code below)? I googled prior to asking this question and I see
responses related to DStreams but not structured streams. Is it possible to
read multiple topics using the same Spark structured stream?
sparkSe
I've run into this before. The EigenValueDecomposition creates a Java Array
with 2*k*n elements. The Java Array is indexed with a native integer type,
so 2*k*n cannot exceed Integer.MAX_VALUE values.
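Plugging in the numbers from the original report (k = 49865, n = 191077) shows how far over the limit this is; a quick back-of-the-envelope check:

// Rough check of the 2*k*n array-size bound with the values from the thread.
val k = 49865L
val n = 191077L
val required = 2L * k * n            // 19,056,109,210 elements
val maxArray = Int.MaxValue.toLong   //  2,147,483,647
println(required > maxArray)         // true: roughly 9x over the limit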
The array is created here:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/a
I ended up doing this for the time being. It works, but I *think* that
timestamp seems like a rational partitionColumn and I'm wondering if
there's a more built-in way:
> df = spark.read.jdbc(
>     url=os.environ["JDBCURL"],
>     table="schema.table",
>     predicates=predicates
> )
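For comparison, the built-in route is the partitionColumn / lowerBound / upperBound / numPartitions options on the JDBC source; as far as I know they expect a numeric column, which may be why explicit predicates were needed for a timestamp. A rough sketch with made-up table and column names:

// Hypothetical sketch: Spark splits [lowerBound, upperBound] on a numeric
// column into numPartitions parallel reads. Names below are illustrative only.
val df = spark.read
  .format("jdbc")
  .option("url", sys.env("JDBCURL"))
  .option("dbtable", "schema.table")
  .option("partitionColumn", "id")
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .option("numPartitions", "8")
  .load()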
Have you tried to cache? Maybe after the collect() and before the map?
> On Sep 19, 2017, at 7:20 AM, Daniel O' Shaughnessy
> wrote:
>
> Thanks for your response Jean.
>
> I managed to figure this out in the end but it's an extremely slow solution
> and not tenable for my use-case:
>
>
Hello guys,
While trying to compute SVD using the computeSVD() function, I am getting the
following warning with the follow-up exception:
17/09/14 12:29:02 WARN RowMatrix: computing svd with k=49865 and n=191077,
please check necessity
IllegalArgumentException: u'requirement failed: k = 49865 and/or n
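Given the 2*k*n limit discussed elsewhere in this thread, one workaround is simply to request far fewer singular values; a rough sketch (matrix construction omitted, and the k shown is only illustrative):

import org.apache.spark.mllib.linalg.distributed.RowMatrix

// With n = 191077 columns, k must be small enough that 2 * k * n stays below
// Integer.MAX_VALUE; k = 100 needs only ~38 million array elements.
val mat: RowMatrix = ???   // the 49865 x 191077 matrix from the report
val svd = mat.computeSVD(100, computeU = true)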
How big is the list of fruits in your example? Can you broadcast it?
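If it is small, something along these lines might work (a hypothetical sketch; the list contents and column are made up):

// Ship the small lookup collection to the executors once as a broadcast
// variable instead of serializing it with every task closure.
val fruits = spark.sparkContext.broadcast(Set("apple", "banana", "orange"))
val matched = dfWithSchema.filter(row => fruits.value.contains(row.getString(0)))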
On Tue, 19 Sep 2017 at 9:21 pm, Daniel O' Shaughnessy <
danieljamesda...@gmail.com> wrote:
> Thanks for your response Jean.
>
> I managed to figure this out in the end but it's an extremely slow
> solution and not tenable for my
Hi,
I've just noticed the new "avg hash probe" metric for the HashAggregateExec
operator [1].
My understanding (after briefly going through the code in the change and
around) is as follows:
Average hash map probe per lookup (i.e. `numProbes` / `numKeyLookups`)
NOTE: `numProbes` and `numKeyLookups`
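If I read the code right, the reported number is just the ratio of those two SQL metrics; a tiny illustration with made-up values:

// Made-up numbers: 1500 probes over 1000 key lookups gives an average of 1.5
// probes per lookup; 1.0 would mean every key was found on the first probe.
val numProbes     = 1500L
val numKeyLookups = 1000L
val avgHashProbe  = numProbes.toDouble / numKeyLookups   // 1.5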
Thanks for your response Jean.
I managed to figure this out in the end but it's an extremely slow solution
and not tenable for my use-case:
val rddX = dfWithSchema.select("event_name").rdd
  .map(_.getString(0).split(",").map(_.trim.replaceAll("[\\[\\]\"]", "")).toList)
// val oneRow = Converted(ev
Thanks Cody,
It worked for me by keeping the number of executors, each with 1 core, equal to
the number of partitions of the Kafka topic.
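Roughly, that corresponds to settings like these (the executor count of 8 is only a placeholder for the actual number of Kafka partitions):

import org.apache.spark.sql.SparkSession

// One core per executor, and as many executors as there are Kafka partitions.
val spark = SparkSession.builder()
  .config("spark.executor.cores", "1")
  .config("spark.executor.instances", "8")
  .getOrCreate()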
On Mon, Sep 18, 2017 at 8:47 PM Cody Koeninger wrote:
> Have you searched in jira, e.g.
>
> https://issues.apache.org/jira/browse/SPARK-19185
>
> On Mon, Sep 18, 2017 at 1:56 AM, HARSH
Hi,
I have a CSV dataset which has a column with the details of store opening
and closing times by date, but the data is highly variable, as follows -
Mon-Fri 10am-9pm, Sat 10am-8pm, Sun 12pm-6pm
Mon-Sat 10am-8pm, Sun Closed
Mon-Sat 10am-8pm, Sun 10am-6pm
Mon-Friday 9-8 / Saturday 10-7 / Sund
Hi All,
Apologies for cross-posting; I posted this on the Kafka user group. Is there
any way that I can use the Kerberos ticket cache of the Spark executor for
reading from secured Kafka? I am not 100% sure that the executor would do a
kinit, but I presume so, to be able to run code / read HDFS etc. as the user