Re: Kafka Spark Structured Streaming out-of-memory issue

2020-08-13 Thread Srinivas V
It depends on how much memory is available and how much data you are processing. Please provide the data size and cluster details to help. On Fri, Aug 14, 2020 at 12:54 AM km.santanu wrote: > Hi, I am using Kafka stateless Structured Streaming. I have enabled the watermark as 1 hour. After long runnin…
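For context, a minimal sketch of the watermarked aggregation being discussed, assuming an existing SparkSession `spark` and a streaming DataFrame `events` with hypothetical `eventTime` and `key` columns; the 1-hour watermark lets Spark evict aggregation state older than an hour, which bounds memory:

```scala
import org.apache.spark.sql.functions.window
import spark.implicits._   // assumes a SparkSession named `spark`

// State for windows older than the watermark can be dropped by Spark.
val counts = events
  .withWatermark("eventTime", "1 hour")
  .groupBy(window($"eventTime", "10 minutes"), $"key")
  .count()
```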

Re: Metrics Problem

2020-06-30 Thread Srinivas V
Then it should be a permission issue. What kind of cluster is it, and which user is running it? Does that user have HDFS permissions to access the folder where the jar file is? On Mon, Jun 29, 2020 at 1:17 AM Bryan Jeffrey wrote: > Srinivas, interestingly, I did have the metrics jar pa…

Re: Metrics Problem

2020-06-26 Thread Srinivas V
One option is to create your main jar with the metrics jar included, like a fat jar. On Sat, Jun 27, 2020 at 8:04 AM Bryan Jeffrey wrote: > Srinivas, thanks for the insight. I had not considered a dependency issue, as the metrics jar works well applied on the driver. Perhaps…
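A sketch of the fat-jar route using sbt-assembly; the metrics-sink coordinates and main class are hypothetical stand-ins:

```scala
// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt: bundle the custom metrics sink into the application jar so
// executors see it on their classpath without a separate --jars entry.
libraryDependencies += "com.example" %% "custom-metrics-sink" % "1.0.0" // hypothetical coordinates
mainClass in assembly := Some("com.example.StreamingApp")               // hypothetical main class
```

Running `sbt assembly` then produces a single jar to pass to spark-submit.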

Re: Metrics Problem

2020-06-26 Thread Srinivas V
It should work when you give an HDFS path, as long as your jar exists in that path. Your error is more of a security issue (Kerberos) or missing Hadoop dependencies, I think; it says: org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation… On Fri, Jun 26, 2020 at 8:44 PM Brya…

Re: Spark Structured Streaming: “earliest” as “startingOffsets” is not working

2020-06-26 Thread Srinivas V
Cool. Are you not using a watermark? Also, is it possible to start listening to offsets from a specific date-time? Regards, Srini On Sat, Jun 27, 2020 at 6:12 AM Eric Beabes wrote: > My apologies... After I set 'maxOffsetsPerTrigger' to a value such as '20' it started working. Hopefully…
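On the specific-date-time question: newer Spark releases (3.0+, as far as I know) accept a startingOffsetsByTimestamp option on the Kafka source. A hedged sketch, with broker and topic names made up:

```scala
// assumes a SparkSession named `spark`
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")  // hypothetical broker
  .option("subscribe", "events")                      // hypothetical topic
  // epoch millis per partition; Spark starts from the first offset at/after each timestamp
  .option("startingOffsetsByTimestamp",
    """{"events": {"0": 1593561600000, "1": 1593561600000}}""")
  .load()
```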

[spark-structured-streaming] [stateful]

2020-06-13 Thread Srinivas V
Does stateful Structured Streaming work on a standalone Spark cluster with a few nodes? Does it need HDFS? If not, how do I get it working without HDFS? Regards, Srini
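For reference, the relevant knob is the checkpoint location: stateful queries persist state under it, so on a standalone cluster without HDFS a shared, cluster-visible mount could stand in. A minimal sketch, assuming a streaming Dataset `counts` already defined; the path is hypothetical:

```scala
val query = counts.writeStream
  .outputMode("update")
  .format("console")
  // any fault-tolerant path visible to all nodes works; an NFS mount is shown
  .option("checkpointLocation", "file:///mnt/shared/checkpoints/my-query")
  .start()
```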

Re: [spark-structured-streaming] [kafka] consume topics from multiple Kafka clusters

2020-06-09 Thread Srinivas V
OK, thanks for confirming, I will do it this way. Regards, Srini On Tue, Jun 9, 2020 at 11:31 PM Gerard Maas wrote: > Hi Srinivas, reading from different brokers is possible, but you need to connect to each Kafka cluster separately. Trying to mix connections to two…
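A sketch of "doing it this way": one source per Kafka cluster, unioned into a single query. Broker addresses and topic names are made up; `spark` is an assumed SparkSession:

```scala
val fromClusterA = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "a-broker1:9092,a-broker2:9092")
  .option("subscribe", "topicA")
  .load()

val fromClusterB = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "b-broker1:9092")
  .option("subscribe", "topicB")
  .load()

// Both sources share the Kafka schema (key, value, topic, partition, offset, ...),
// so they can be unioned and processed by one query.
val unified = fromClusterA.union(fromClusterB)
```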

Re: [spark-structured-streaming] [kafka] consume topics from multiple Kafka clusters

2020-06-09 Thread Srinivas V
Thanks for the quick reply. This may work, but I have about 5 topics to listen to right now. I am trying to keep all topics in an array in a properties file and read them all at once. This way it is dynamic: you have one code block like the one below, and you may add or delete topics from the config f…
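A sketch of that dynamic-topic idea: the Kafka source's subscribe option takes a comma-separated list, so an array read from a properties file can be joined back with mkString. File and property names are hypothetical:

```scala
import java.io.FileInputStream
import java.util.Properties

val props = new Properties()
props.load(new FileInputStream("app.properties"))              // hypothetical file
val topics = props.getProperty("kafka.topics").split(",").map(_.trim)

val stream = spark.readStream.format("kafka")                  // `spark` assumed
  .option("kafka.bootstrap.servers", props.getProperty("kafka.brokers"))
  .option("subscribe", topics.mkString(","))                   // all topics in one source
  .load()
```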

[spark-structured-streaming] [kafka] consume topics from multiple Kafka clusters

2020-06-09 Thread Srinivas V
Hello, in Structured Streaming, is it possible to have one Spark application with one query consume topics from multiple Kafka clusters? I am trying to consume two topics, each from a different Kafka cluster, but it reports one of the topics as an unknown topic and the job keeps running without com…

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Srinivas V
…Something Something <mailinglist...@gmail.com> wrote: > Right, this is exactly how I have it right now. The problem is that in cluster mode 'myAcc' does NOT get distributed. Try it out in cluster mode and you will see what I mean. …

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Srinivas V
….mapGroupsWithState(timeoutConf = GroupStateTimeout.ProcessingTimeTimeout)(func = mappingFunc) val query = wordCounts.writeStream .outputMode(OutputMode.Update) …

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Srinivas V
…elUpdate call(String productId, Iterator eventsIterator, GroupState state) { } } On Fri, May 29, 2020 at 1:08 PM Srinivas V wrote: > Yes, accumulators are updated in the call method of StateUpdateTask, like when state times out or when the data…

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Srinivas V
…On Fri, May 29, 2020 at 6:51 AM Srinivas V wrote: > Yes, it is an application-specific class. This is how Java Spark Functions work. You can refer to this code in the documentation: https://github.com/apache/spark/blob/master/examples/src/main/java/o…

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Srinivas V
…application-specific class. Does it have an 'updateState' method or something? I googled but couldn't find any documentation about doing it this way. Can you please direct me to some documentation? Thanks. On Thu, May 28, 2020 at 4:43 AM Srinivas V wrote: …

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread Srinivas V
…ue) Thread.sleep(5 * 1000) } //query.awaitTermination() … And the updated accumulator value can be found in the driver stdout. Cheers, -z. On Thu, 28 May 2020 17:12:48 +0530, Srinivas V wrote: > yes, I…

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread Srinivas V
…'Stateful Structured Streaming'. In other streaming applications it works as expected. On Wed, May 27, 2020 at 9:01 AM Srinivas V wrote: > Yes, I am talking about application-specific accumulators. Actually, I am getting the values printed in…

Re: Using Spark Accumulators with Structured Streaming

2020-05-27 Thread Srinivas V
…in your code? I am talking about the application-specific accumulators. The other standard counters, such as 'event.progress.inputRowsPerSecond', are getting populated correctly! On Mon, May 25, 2020 at 8:39 PM Srinivas V wrote: > Hello, even for me…

Re: Using Spark Accumulators with Structured Streaming

2020-05-25 Thread Srinivas V
Hello, even for me it comes out as 0 when I print it in onQueryProgress. I use LongAccumulator as well. Yes, it prints on my local machine but not on the cluster. One consolation is that when I send metrics to Grafana, the values do come through. On Tue, May 26, 2020 at 3:10 AM Something Something <mailinglis…
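A minimal sketch of the pattern under discussion, assuming an existing SparkSession `spark` and a Dataset[Event] `events` (names are illustrative). Accumulator updates happen on executors and are merged back to the driver only as batches complete, which is consistent with values appearing in external metrics while printing as 0 elsewhere:

```scala
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

case class Event(id: String)

val eventCounter = spark.sparkContext.longAccumulator("eventCounter")

import spark.implicits._
val counts = events                       // events: Dataset[Event], assumed to exist
  .groupByKey(_.id)
  .mapGroupsWithState(GroupStateTimeout.NoTimeout) {
    (id: String, rows: Iterator[Event], state: GroupState[Long]) =>
      val batchSize = rows.size           // consume the iterator exactly once
      val total = state.getOption.getOrElse(0L) + batchSize
      eventCounter.add(batchSize)         // executor-side update, merged per batch
      state.update(total)
      (id, total)
  }
```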

[structured streaming] [stateful] Null value appeared in a non-nullable field

2020-05-23 Thread Srinivas V
Hello, I am listening to a Kafka topic through Spark Structured Streaming [2.4.5]. After processing messages for a few minutes, I get the NullPointerException below. I have three beans used here: 1. Event, 2. StateInfo, 3. SessionUpdateInfo. I suspect the problem is with StateInfo, when it is wri…

Re: GroupState limits

2020-05-12 Thread Srinivas V
If you are talking about the total number of objects the state can hold, that depends on the executor memory you have on your cluster, apart from the rest of the memory required for processing. The state is stored in HDFS and retrieved while processing the next events. If you maintain a million objects with e…

Re: Spark structured streaming - performance tuning

2020-05-08 Thread Srinivas V
Can anyone else answer the questions below on performance tuning for Structured Streaming? @Jacek? On Sun, May 3, 2020 at 12:07 AM Srinivas V wrote: > Hi Alex, I read the book; it is a good one, but I don't see the things I strongly want to understand. You are right on the partition and t…

Re: Spark structured streaming - performance tuning

2020-05-02 Thread Srinivas V
…With Kafka, every partition in Kafka is mapped to a Spark partition. And in Spark, every partition is mapped to a task. But you can use `coalesce` to decrease the number of Spark partitions, so you'll have fewer tasks… Srinivas V at "Sat, 18 Apr 2020 10:32:33 +0…
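A sketch of that coalesce point; broker, topic, and the partition count are arbitrary examples:

```scala
val kafkaDf = spark.readStream.format("kafka")        // `spark` assumed
  .option("kafka.bootstrap.servers", "broker:9092")   // hypothetical
  .option("subscribe", "events")                      // hypothetical topic
  .load()

// Each Kafka partition maps to one Spark partition (one task per stage);
// coalesce narrows that without a shuffle when partitions are small.
val fewerTasks = kafkaDf.coalesce(8)                  // 8 is an arbitrary example
```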

Re: Spark structured streaming - performance tuning

2020-04-17 Thread Srinivas V
…of each partition is a separate task that needs to be executed, so you need to plan the number of cores correspondingly. Srinivas V at "Thu, 16 Apr 2020 22:49:15 +0530" wrote: SV> Hello, SV> Can someone point me to a good video or document which talks about…

Spark structured streaming - performance tuning

2020-04-16 Thread Srinivas V
Hello, can someone point me to a good video or document that talks about performance tuning for a Structured Streaming app? I am looking especially at listening to Kafka topics, say 5 topics, each with 100 partitions. I am trying to figure out the best cluster size and the number of executors and cores required.

Re: Spark Streaming not working

2020-04-10 Thread Srinivas V
Check if your broker details are correct, and verify that you have network connectivity from your client box to the Kafka broker host. On Fri, Apr 10, 2020 at 11:04 PM Debabrata Ghosh wrote: > Hi, I have a Spark Streaming application where Kafka is producing records but unfortunately spa…

Re: spark structured streaming GroupState returns weird values from state

2020-03-31 Thread Srinivas V
…non-getter methods of the fields defined? Still, how is that causing the state object to get so corrupted? On Sat, Mar 28, 2020 at 7:46 PM Srinivas V wrote: > OK, I will try to create some simple code to reproduce it, if I can. The problem is that I am adding this code in an existing big projec…

Re: spark structured streaming GroupState returns weird values from state

2020-03-28 Thread Srinivas V
…of the business logic, which you may want to redact, and provide the full source code which reproduces the bug? On Sat, Mar 28, 2020 at 8:11 PM Srinivas V wrote: > Sorry for the typos; correcting them below. On Sat, Mar 28, 2020 at 4:39 PM Srinivas V wrot…

Re: spark structured streaming GroupState returns weird values from state

2020-03-28 Thread Srinivas V
Sorry for the typos; correcting them below. On Sat, Mar 28, 2020 at 4:39 PM Srinivas V wrote: > Sorry, I was just changing some names so as not to send the exact ones. Please ignore that. I have really been struggling with this for a couple of days. Can this happen due to: 1. some of the values b…

Re: spark structured streaming GroupState returns weird values from state

2020-03-28 Thread Srinivas V
…consistent across multiple JVM runs to make it work properly, but I suspect it doesn't retain the order. On Fri, Mar 27, 2020 at 10:28 PM Srinivas V wrote: > I am listening to a Kafka topic with a Structured Streaming application with…

spark structured streaming GroupState returns weird values from state

2020-03-27 Thread Srinivas V
I am listening to a Kafka topic with a Structured Streaming application in Java, testing it on my local Mac. When I retrieve the GroupState object back with state.get(), it gives some random values for the fields in the object: some are interchanged, some are defaults, and some are junk values. See…

Re: structured streaming Kafka consumer group.id override

2020-03-19 Thread Srinivas V
…from failure or killing, the group id changes, but the starting offset will be the last one you consumed last time. Srinivas V wrote on Thu, Mar 19, 2020 at 12:36 PM: > Hello, 1. My Kafka consumer name is randomly generated by Spark Structured Streaming. Can I ov…

structured streaming Kafka consumer group.id override

2020-03-18 Thread Srinivas V
Hello, 1. My Kafka consumer name is randomly generated by Spark Structured Streaming. Can I override this? 2. When testing in development, if I stop my Kafka consumer streaming job for a couple of days and try to start it back up, the job keeps failing for missing offsets, as the offse…
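For reference, a hedged sketch of the two knobs involved: on Spark 3.x the consumer group can be pinned with the kafka.group.id option, and failOnDataLoss=false lets a restarted job proceed past offsets deleted by retention while it was stopped. Broker, topic, and group names are made up:

```scala
val df = spark.readStream.format("kafka")            // `spark` assumed
  .option("kafka.bootstrap.servers", "broker:9092")  // hypothetical broker
  .option("subscribe", "events")                     // hypothetical topic
  .option("kafka.group.id", "my-app")                // supported from Spark 3.0, as far as I know
  .option("failOnDataLoss", "false")                 // skip offsets lost to retention
  .load()
```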

structured streaming with mapGroupsWithState

2020-03-11 Thread Srinivas V
Is anyone using this combination in prod? I am planning to use it for a use case with 15,000 events per second from a few Kafka topics. Though the events are big, I would just have to take the businessIds, frequency, and first and last event timestamps, and save these into mapGroupsWithState. I need to keep them for…
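A rough sketch of the compact state described above, with a processing-time timeout to cap retention. All names and types are illustrative (epoch millis stand in for the timestamps):

```scala
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

case class Evt(businessId: String, eventTime: Long)   // eventTime as epoch millis
case class BizState(businessId: String, frequency: Long, firstSeen: Long, lastSeen: Long)

def update(id: String, events: Iterator[Evt], state: GroupState[BizState]): BizState =
  if (state.hasTimedOut) {
    val last = state.get
    state.remove()                        // evict expired state
    last
  } else {
    val evs  = events.toSeq
    val prev = state.getOption
    val next = BizState(
      id,
      prev.map(_.frequency).getOrElse(0L) + evs.size,
      math.min(prev.map(_.firstSeen).getOrElse(Long.MaxValue), evs.map(_.eventTime).min),
      math.max(prev.map(_.lastSeen).getOrElse(Long.MinValue), evs.map(_.eventTime).max))
    state.update(next)
    state.setTimeoutDuration("24 hours")  // retention window is use-case specific
    next
  }

// Wiring, assuming a Dataset[Evt] named `events` and spark.implicits in scope:
// val perBiz = events.groupByKey(_.businessId)
//   .mapGroupsWithState(GroupStateTimeout.ProcessingTimeTimeout)(update _)
```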

[ spark-streaming ] - Data Locality issue

2020-02-04 Thread Karthik Srinivas
Hi, I am using Spark 2.3.2 and I am facing issues due to data locality: even after setting spark.locality.wait.rack=200, locality_level is always RACK_LOCAL. Can someone help me with this? Thank you.

Data locality

2020-02-04 Thread Karthik Srinivas
Hi all, I am using Spark 2.3.2 and I am facing issues due to data locality: even after setting spark.locality.wait.rack=200, locality_level is always RACK_LOCAL. Can someone help me with this? Thank you.

Please unsubscribe me

2016-12-05 Thread Srinivas Potluri

Re: retrieve cell value from a rowMatrix.

2016-01-21 Thread Srivathsan Srinivas
…//www.baidu.com/link?url=NXDpnPwRtM663-SOtf7UI7Vn88RsjDBih1D0weJSyZKL55PYEfeX6xc_dQUB3bpcAxBmEF9qhmXxZZFRGk0N43KqVWwLLAjgC6_W43ex9Rm> ? -- Original Message -- From: "Srivathsan Srinivas" Sent: Thursday, Jan 21, 2016, 9:04 AM To: "user…

retrieve cell value from a rowMatrix.

2016-01-20 Thread Srivathsan Srinivas
Hi, is there a way to retrieve a cell value of a RowMatrix, like m(i,j)? The docs say that the indices are Long. Maybe I am doing something wrong... but there doesn't seem to be any such direct method. Any suggestions? -- Thanks, Srini.
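A minimal sketch of one way to do this, assuming `m` is an existing RowMatrix: there is no direct m(i, j) accessor, so index the row RDD and pull out the single entry:

```scala
import org.apache.spark.mllib.linalg.distributed.RowMatrix

def cellValue(m: RowMatrix, i: Long, j: Int): Double =
  m.rows.zipWithIndex()                     // pair each row Vector with its row index
    .filter { case (_, idx) => idx == i }   // keep only row i
    .map { case (vec, _) => vec(j) }        // pull column j out of that row
    .first()
```

This runs a distributed filter per lookup, so it only makes sense for occasional access, not tight loops.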

Spray with Spark-sql build fails with Incompatible dependencies

2014-11-10 Thread Srinivas Chamarthi
…cross-version suffixes in {file:/… [error] org.scalamacros:quasiquotes _2.10, _2.10.3 [trace] Stack trace suppressed: run last *:update for the full output. [error] (*:update) Conflicting cross-version suffixes in: org.scalamacros:quasiquotes thx srinivas

Re: Spray client reports Exception: akka.actor.ActorSystem.dispatcher()Lscala/concurrent/ExecutionContext

2014-11-10 Thread Srinivas Chamarthi
I am trying to use Spark with Spray and I have a dependency problem with quasiquotes. The issue comes up only when I include the Spark dependencies, and I am not sure how this one can be excluded. Jianshi: can you let me know what versions of Spray + Akka + Spark you are using? [error] org.scalamac…

supported sql functions

2014-11-09 Thread Srinivas Chamarthi
Can anyone point me to documentation on the supported SQL functions? I am trying to do a contains operation on a SQL array type, but I don't know how to write the SQL. // like the Hive function array_contains: select * from business where array_contains(type, "insurance") Appreciate any help.
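For reference, a hedged sketch: array_contains works through a HiveContext, and natively in Spark SQL from around 1.5, if I recall correctly. Table and column names follow the question:

```scala
// assumes a HiveContext (or Spark 1.5+ SQLContext) named sqlContext with a
// registered table `business` whose `type` column is an array
val insured = sqlContext.sql(
  "SELECT * FROM business WHERE array_contains(type, 'insurance')")
```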

Re: Unresolved Attributes

2014-11-09 Thread Srinivas Chamarthi
…'yy' // doesn't work. Not sure if there's an issue already filed and fixed in master. I will raise an issue if someone else also confirms it. thx srinivas On Sat, Nov 8, 2014 at 3:26 PM, Srinivas Chamarthi < srinivas.chamar...@gmail.com> wrote: > I have an exception when I…

contains in array in Spark SQL

2014-11-08 Thread Srinivas Chamarthi
Hi, what would be the syntax to check for an attribute in an array data type in my where clause? select * from business where categories contains 'X' // something like this; is this the right syntax? attribute: categories, type: Array. thx srinivas

Unresolved Attributes

2014-11-08 Thread Srinivas Chamarthi
I get an exception when I try to run a simple where-clause query. I can see the name attribute is present in the schema, but somehow it still throws the exception. query = "select name from business where business_id=" + business_id What am I doing wrong? thx srinivas Ex…

RE: Data from Mysql using JdbcRDD

2014-08-01 Thread srinivas
Hi, thanks all! I have a few more questions on this. Suppose I don't want to pass a where clause in my SQL: is there a way I can do this? Right now I am trying to modify the JdbcRDD class by removing all the parameters for lower bound and upper bound, but I am getting runtime exceptions. Is the…
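A hedged sketch of a common workaround rather than modifying JdbcRDD itself: the class insists on a query with two '?' bind parameters for partitioning, so a dummy predicate (? = ?) with bounds 1..1 and a single partition effectively yields WHERE 1 = 1. The table name and column types are hypothetical:

```scala
import java.sql.{DriverManager, ResultSet}
import org.apache.spark.rdd.JdbcRDD

val url = "jdbc:mysql://localhost:3306/studentdata"
val username = "root"
val password = "root"

val rows = new JdbcRDD(
  sc,  // assumes an existing SparkContext
  () => {
    Class.forName("com.mysql.jdbc.Driver")   // load the JDBC driver
    DriverManager.getConnection(url, username, password)
  },
  "SELECT * FROM student WHERE ? = ?",       // dummy placeholders; table name hypothetical
  1, 1, 1,                                   // lowerBound, upperBound, numPartitions
  (rs: ResultSet) => (rs.getInt(1), rs.getString(2), rs.getString(3)))
```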

Data from Mysql using JdbcRDD

2014-07-30 Thread srinivas
Hi, I am trying to get data from MySQL using JdbcRDD with the code below. The table has three columns: val url = "jdbc:mysql://localhost:3306/studentdata" val username = "root" val password = "root" val mysqlrdd = new org.apache.spark.rdd.JdbcRDD(sc, () => { Class.forName("com.mysql.jd…

Does spark streaming fit to our application

2014-07-21 Thread srinivas
Hi, our application is required to do some aggregations on data that will be coming in as a stream for over two months. I would like to know if Spark Streaming is suitable for our requirement. After going through some documentation and videos, I think we can do aggregations on data based on wind…

Re: Spark Streaming Json file groupby function

2014-07-18 Thread srinivas
…println(sqlreport) //sqlreport.foreach(println) sqlreport.saveAsTextFile("/home/ubuntu/spark-1.0.0/external/jsonfile2/"+datenow) }) //results.print() ssc.start() ssc.awaitTermination() } Thanks, -Srinivas

Re: Spark Streaming Json file groupby function

2014-07-17 Thread srinivas
Hi TD, it worked! Thank you so much for all your help. Thanks, -Srinivas.

Re: Spark Streaming Json file groupby function

2014-07-17 Thread srinivas
…exception when I am running it: 14/07/17 17:11:30 ERROR Executor: Exception in task ID 6 java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106). Basically I am trying to do a max operation in my Spark SQL. Please…

Re: Spark Streaming Json file groupby function

2014-07-16 Thread srinivas
…concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) I am trying to send data to Kafka like {"type":"math","name":"srinivas","score":"10","school":"lfs"} I am thinking something is wrong with…

Re: Spark Streaming Json file groupby function

2014-07-15 Thread srinivas
…awaitTermination() } } but received the error: [error] /home/ubuntu/spark-1.0.0/external/jsonfile2/src/main/scala/jsonfile.scala:38: No TypeTag available for Record [error] recrdd.registerAsTable("table1") [error] ^ [error] one error found [error] (compile:compile) Compilation fai…

Re: Spark Streaming Json file groupby function

2014-07-15 Thread srinivas
…ubuntu/spark-1.0.0/external/jsonfile/src/main/scala/jsonfile.scala:36: value registerAsTable is not a member of org.apache.spark.rdd.RDD[Record] [error] recrdd.registerAsTable("table1") [error] ^ [error] one error found [error] (compile:compile) Compilation failed. Please look into this and let me k…

Re: Spark Streaming Json file groupby function

2014-07-14 Thread srinivas
Hi TD, thanks for your help... I am able to convert the map to records using a case class. I am left with doing some aggregations, and I am trying some SQL-type operations on my record set. My code looks like: case class Record(ID:Int,name:String,score:Int,school:String) //val records = jsonf.map(m =>…
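A Spark 1.0-era sketch of the aggregation step, pieced together from the snippets in this thread; variable and field names follow them, and the SQL is illustrative. Note the explicit .toInt: the incoming JSON carries score as a string ("10"), which otherwise triggers the ClassCastException seen elsewhere in the thread, and Record must live at the top level (not inside a method) or the compiler reports "No TypeTag available for Record":

```scala
case class Record(ID: Int, name: String, score: Int, school: String)

// jsonf: DStream of parsed JSON maps, and sqlContext: SQLContext, both assumed to exist
jsonf.foreachRDD { rdd =>
  val recrdd = rdd.map(m => Record(
    m("id").toString.toInt,
    m("name").toString,
    m("score").toString.toInt,    // JSON sends score as a string; convert explicitly
    m("school").toString))
  import sqlContext.createSchemaRDD   // brings the RDD -> SchemaRDD conversion into scope
  recrdd.registerAsTable("table1")
  val sqlreport = sqlContext.sql("SELECT school, MAX(score) FROM table1 GROUP BY school")
  sqlreport.collect().foreach(println)
}
```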

Re: Spark Streaming Json file groupby function

2014-07-14 Thread srinivas
Hi, thanks for your reply... I imported StreamingContext, and right now I am getting my DStream as something like map(id -> 123, name -> srini, mobile -> 12324214, score -> 123, test_type -> math) map(id -> 321, name -> vasu, mobile -> 73942090, score -> 324, test_type -> sci) map(id -> 432, name -…

Spark Streaming Json file groupby function

2014-07-14 Thread srinivas
Hi, I am new to Spark and Scala, and I am trying to do some aggregations on a JSON file stream using Spark Streaming. I am able to parse the JSON string, and it is converted to map(id -> 123, name -> srini, mobile -> 12324214, score -> 123, test_type -> math). Now I want to use a GROUP BY function on eac…