Re: Structured streaming coding question

2017-09-19 Thread Burak Yavuz
Hey Kant, That won't work either. Your second query may fail, and as long as your first query is running, you will not know. Put this as the last line instead: spark.streams.awaitAnyTermination() On Tue, Sep 19, 2017 at 10:11 PM, kant kodali wrote: > Looks like my problem was the order of awai
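A minimal Scala sketch of the pattern Burak describes, reusing the query and sink names from the thread (KafkaSink, outputDS1 and outputDS2 come from kant's code, so this only compiles in that context):

    import org.apache.spark.sql.streaming.Trigger

    // start both queries first; only then block, so a failure in either one surfaces
    val query1 = outputDS1.writeStream
      .trigger(Trigger.ProcessingTime(1000))
      .foreach(new KafkaSink("hello1"))
      .start()
    val query2 = outputDS2.writeStream
      .trigger(Trigger.ProcessingTime(1000))
      .foreach(new KafkaSink("hello2"))
      .start()

    // waits until any of the running queries terminates (or fails)
    spark.streams.awaitAnyTermination()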

Re: How to read from multiple kafka topics using structured streaming (spark 2.2.0)?

2017-09-19 Thread Jacek Laskowski
Hi, Cody's right. subscribe - Topic subscription strategy that accepts topic names as a comma-separated string, e.g. topic1,topic2,topic3 [1] subscribepattern - Topic subscription strategy that uses Java’s java.util.regex.Pattern for the topic subscription regex pattern of topics to subscribe to
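For reference, a short Scala sketch of both options against the Kafka source (broker address and topic names are placeholders, and an existing SparkSession named spark is assumed):

    // explicit comma-separated topic names
    val byNames = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "topic1,topic2,topic3")
      .load()

    // or a regex pattern matching the topics to subscribe to
    val byPattern = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribePattern", "topic.*")
      .load()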

Re: Structured streaming coding question

2017-09-19 Thread kant kodali
Looks like my problem was the order of awaitTermination() for some reason. *Doesn't work * outputDS1.writeStream().trigger(Trigger.processingTime(1000)).foreach(new KafkaSink("hello1")).start().awaitTermination() outputDS2.writeStream().trigger(Trigger.processingTime(1000)).foreach(new KafkaSink

Re: Structured streaming coding question

2017-09-19 Thread Jacek Laskowski
Hi, Ah, right! Start the queries and once they're running, awaitTermination them. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming (Apache Spark 2.2+) https://bit.ly/spark-structured-streaming Mastering Apache Spark 2 https://bit.ly/mastering-apache-spar

Re: Structured streaming coding question

2017-09-19 Thread Jacek Laskowski
Hi, What's the code in readFromKafka to read from hello2 and hello1? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming (Apache Spark 2.2+) https://bit.ly/spark-structured-streaming Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark Follow me a

Re: Structured streaming coding question

2017-09-19 Thread kant kodali
Looks like my problem was the order of awaitTermination() for some reason. Doesn't work On Tue, Sep 19, 2017 at 1:54 PM, kant kodali wrote: > Hi All, > > I have the following pseudo code (I could paste the real code however it > is pretty long and involves Database calls inside dataset.map

RE: Cloudera - How to switch to the newly added Spark service (Spark2) from Spark 1.6 in CDH 5.12

2017-09-19 Thread Sudha KS
To set Spark2 as default, refer https://www.cloudera.com/documentation/spark2/latest/topics/spark2_admin.html#default_tools -Original Message- From: Gaurav1809 [mailto:gauravhpan...@gmail.com] Sent: Wednesday, September 20, 2017 9:16 AM To: user@spark.apache.org Subject: Cloudera - How t
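If I recall correctly, the Spark 2 parcel also installs its own command-line tools alongside the Spark 1.6 ones, so Spark 2 can be used immediately without changing the defaults (the linked page covers making them the default):

    spark2-shell
    spark2-submit
    pyspark2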

Cloudera - How to switch to the newly added Spark service (Spark2) from Spark 1.6 in CDH 5.12

2017-09-19 Thread Gaurav1809
Hello all, I downloaded CDH and it comes with Spark 1.6. As per the step-by-step guide, I added Spark 2 to the services list. Now I can see both Spark 1.6 & Spark 2, but when I do spark-shell in a terminal window, it starts with Spark 1.6 only. How do I switch to Spark 2? What all _HOMEs or param

unsubscribe

2017-09-19 Thread shshann

unsubscribe

2017-09-19 Thread Brindha Sengottaiyan

Structured streaming coding question

2017-09-19 Thread kant kodali
Hi All, I have the following pseudo code (I could paste the real code, however it is pretty long and involves Database calls inside dataset.map operation and so on), so I am just trying to simplify my question. I would like to know if there is something wrong with the following pseudo code? DataSet i

Re: How to read from multiple kafka topics using structured streaming (spark 2.2.0)?

2017-09-19 Thread Cody Koeninger
You should be able to pass a comma-separated string of topics to subscribe. subscribePattern isn't necessary On Tue, Sep 19, 2017 at 2:54 PM, kant kodali wrote: > got it! Sorry. > > On Tue, Sep 19, 2017 at 12:52 PM, Jacek Laskowski wrote: >> >> Hi, >> >> Use subscribepattern >> >> You haven't

Re: SVD computation limit

2017-09-19 Thread Vadim Semenov
This may also be related to https://issues.apache.org/jira/browse/SPARK-22033 On Tue, Sep 19, 2017 at 3:40 PM, Mark Bittmann wrote: > I've run into this before. The EigenValueDecomposition creates a Java > Array with 2*k*n elements. The Java Array is indexed with a native integer > type, so 2*k*

Re: How to read from multiple kafka topics using structured streaming (spark 2.2.0)?

2017-09-19 Thread kant kodali
got it! Sorry. On Tue, Sep 19, 2017 at 12:52 PM, Jacek Laskowski wrote: > Hi, > > Use subscribepattern > > You haven't googled well enough --> https://jaceklaskowski.gitbooks.io/spark-structured-streaming/spark-sql-streaming-KafkaSource.html :) > > Pozdrawiam, > Jacek Laskowski > > ht

Re: How to read from multiple kafka topics using structured streaming (spark 2.2.0)?

2017-09-19 Thread Jacek Laskowski
Hi, Use subscribepattern You haven't googled well enough --> https://jaceklaskowski.gitbooks.io/spark-structured-streaming/spark-sql-streaming-KafkaSource.html :) Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming (Apache Spark 2.2+) https://bit.ly/spark-

How to read from multiple kafka topics using structured streaming (spark 2.2.0)?

2017-09-19 Thread kant kodali
Hi All, I am wondering how to read from multiple kafka topics using structured streaming (code below)? I googled prior to asking this question and I see responses related to DStreams but not structured streams. Is it possible to read multiple topics using the same spark structured stream? sparkSe

Re: SVD computation limit

2017-09-19 Thread Mark Bittmann
I've run into this before. The EigenValueDecomposition creates a Java Array with 2*k*n elements. The Java Array is indexed with a native integer type, so 2*k*n cannot exceed Integer.MAX_VALUE values. The array is created here: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/a
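A quick check of that limit with the values from the question (a sketch, not from the thread):

    // 2*k*n must fit in a Java int; Int.MaxValue is 2147483647
    val k = 49865L
    val n = 191077L
    val required = 2L * k * n            // 19,056,109,210 elements, roughly 9x over the limit
    val fits = required <= Int.MaxValue  // false
    val maxK = Int.MaxValue / (2L * n)   // ~5619, the largest k that fits for this n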

Re: Question on partitionColumn for a JDBC read using a timestamp from MySql

2017-09-19 Thread lucas.g...@gmail.com
I ended up doing this for the time being. It works, but I *think* a timestamp seems like a rational partitionColumn, and I'm wondering if there's a more built-in way: > df = spark.read.jdbc(url=os.environ["JDBCURL"], table="schema.table", predicates=predicates) >
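For completeness, a Scala sketch of the predicates approach (column name, date, and table are illustrative, not from the thread); as far as I know, partitionColumn in Spark 2.2 has to be numeric, which is why explicit predicates are used here to partition on a timestamp:

    import java.util.Properties
    import java.time.LocalDateTime
    import java.time.format.DateTimeFormatter

    // one predicate per hour; each predicate becomes one partition of the read
    val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
    val start = LocalDateTime.of(2017, 9, 19, 0, 0)
    val predicates = (0 until 24).map { h =>
      val lo = start.plusHours(h).format(fmt)
      val hi = start.plusHours(h + 1).format(fmt)
      s"created_at >= '$lo' AND created_at < '$hi'"
    }.toArray

    val df = spark.read.jdbc(sys.env("JDBCURL"), "schema.table", predicates, new Properties())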

Re: Nested RDD operation

2017-09-19 Thread Jean Georges Perrin
Have you tried to cache? Maybe after the collect() and before the map? > On Sep 19, 2017, at 7:20 AM, Daniel O' Shaughnessy > wrote: > > Thanks for your response Jean. > > I managed to figure this out in the end but it's an extremely slow solution > and not tenable for my use-case: > >

SVD computation limit

2017-09-19 Thread Alexander Ovcharenko
Hello guys, while trying to compute SVD using the computeSVD() function, I am getting the following warning, followed by the exception below: 17/09/14 12:29:02 WARN RowMatrix: computing svd with k=49865 and n=191077, please check necessity IllegalArgumentException: u'requirement failed: k = 49865 and/or n

Re: Nested RDD operation

2017-09-19 Thread ayan guha
How big is the list of fruits in your example? Can you broadcast it? On Tue, 19 Sep 2017 at 9:21 pm, Daniel O' Shaughnessy < danieljamesda...@gmail.com> wrote: > Thanks for your response Jean. > > I managed to figure this out in the end but it's an extremely slow > solution and not tenable for my
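A minimal sketch of the broadcast approach (names and sample data are illustrative, assuming the fruit list is small enough to ship to every executor):

    // broadcast the small lookup list once, then reference it inside map
    // instead of nesting RDD operations
    val fruits = Set("apple", "banana", "orange")
    val fruitsBc = spark.sparkContext.broadcast(fruits)
    val eventsRdd = spark.sparkContext.parallelize(Seq("apple, pear", "kiwi, banana"))
    val matched = eventsRdd.map { event =>
      event.split(",").map(_.trim).filter(fruitsBc.value.contains)
    }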

Uses of avg hash probe metric in HashAggregateExec?

2017-09-19 Thread Jacek Laskowski
Hi, I've just noticed the new "avg hash probe" metric for HashAggregateExec operator [1]. My understanding (after briefly going through the code in the change and around) is as follows: Average hash map probe per lookup (i.e. `numProbes` / `numKeyLookups`) NOTE: `numProbes` and `numKeyLookups`

Re: Nested RDD operation

2017-09-19 Thread Daniel O' Shaughnessy
Thanks for your response Jean. I managed to figure this out in the end but it's an extremely slow solution and not tenable for my use-case: val rddX = dfWithSchema.select("event_name").rdd.map(_.getString(0).split(",").map(_.trim.replaceAll("[\\[\\]\"]", "")).toList) //val oneRow = Converted(ev

Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-19 Thread HARSH TAKKAR
Thanks Cody, It worked for me by keeping the number of executors (each with 1 core) equal to the number of Kafka partitions. On Mon, Sep 18, 2017 at 8:47 PM Cody Koeninger wrote: > Have you searched in jira, e.g. > > https://issues.apache.org/jira/browse/SPARK-19185 > > On Mon, Sep 18, 2017 at 1:56 AM, HARSH
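A sketch of that sizing expressed as Spark conf (10 stands in for the topic's partition count; the same settings can equally be passed to spark-submit):

    import org.apache.spark.SparkConf

    // one core per executor so a cached Kafka consumer is never used by two
    // tasks concurrently; as many executors as the topic has partitions
    val conf = new SparkConf()
      .set("spark.executor.instances", "10")
      .set("spark.executor.cores", "1")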

Help needed in Dividing open close dates column into multiple columns in dataframe

2017-09-19 Thread Aakash Basu
Hi, I have a CSV dataset with a column that holds the store open and close timings per day, but the data is highly variant, as follows - Mon-Fri 10am-9pm, Sat 10am-8pm, Sun 12pm-6pm Mon-Sat 10am-8pm, Sun Closed Mon-Sat 10am-8pm, Sun 10am-6pm Mon-Friday 9-8 / Saturday 10-7 / Sund
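A rough Scala sketch of one way to start splitting such a column (not a full solution; it assumes segments are separated by "," or "/" and that each segment starts with a day or day-range token, which the real data may well violate):

    import org.apache.spark.sql.functions._
    import spark.implicits._

    val hoursDf = Seq(
      "Mon-Fri 10am-9pm, Sat 10am-8pm, Sun 12pm-6pm",
      "Mon-Sat 10am-8pm, Sun Closed"
    ).toDF("hours")

    // one row per day segment, then split each segment into a days part and a times part
    val exploded = hoursDf
      .withColumn("segment", explode(split(col("hours"), "[,/]")))
      .withColumn("segment", trim(col("segment")))
      .withColumn("days", regexp_extract(col("segment"), "^([A-Za-z-]+)", 1))
      .withColumn("times", trim(regexp_replace(col("segment"), "^([A-Za-z-]+)", "")))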

Spark Executor - jaas.conf with useTicketCache=true

2017-09-19 Thread Hugo Reinwald
Hi All, Apologies for cross-posting; I posted this on the Kafka user group as well. Is there any way that I can use the Kerberos ticket cache of the Spark executor for reading from secured Kafka? I am not 100% sure that the executor would do a kinit, but I presume it must, to be able to run code / read HDFS etc. as the user
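In case it helps, a sketch of the jaas.conf section I believe the question refers to (whether the executor actually has a usable ticket cache is exactly the open question here); such a file is typically shipped with --files and pointed to via -Djava.security.auth.login.config in spark.executor.extraJavaOptions:

    KafkaClient {
      com.sun.security.auth.module.Krb5LoginModule required
      useTicketCache=true;
    };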