Re: spark-shell gets stuck in ACCEPTED state forever when run in YARN client mode.

2018-07-08 Thread kant kodali
these numbers are not part of resource manager logs? On Sun, Jul 8, 2018 at 8:09 AM, kant kodali wrote: > yarn.scheduler.capacity.maximum-am-resource-percent by default is set to > 0.1 and I tried changing it to 1.0 and still no luck. same problem > persists. The master here is yarn and I ju

Re: what is the right syntax for self joins in Spark 2.3.0?

2018-03-06 Thread kant kodali
Sorry I meant Spark 2.4 in my previous email On Tue, Mar 6, 2018 at 9:15 PM, kant kodali <kanth...@gmail.com> wrote: > Hi TD, > > I agree I think we are better off either with a full fix or no fix. I am > ok with the complete fix being available in master or some branch. I gu

Re: what is the right syntax for self joins in Spark 2.3.0?

2018-03-06 Thread kant kodali
trictly worse than no > fix. > > TD > > > > On Thu, Feb 22, 2018 at 2:32 PM, kant kodali <kanth...@gmail.com> wrote: > >> Hi TD, >> >> I pulled your commit that is listed on this ticket >> https://issues.apache.org/jira/browse/SPARK-234

Re: How to run spark shell using YARN

2018-03-12 Thread kant kodali
nning-on-yarn.html > > On Mon, Mar 12, 2018 at 4:42 PM, kant kodali <kanth...@gmail.com> wrote: > > Hi All, > > > > I am trying to use YARN for the very first time. I believe I configured > all > > the resource manager and name node fine. And then I run the below command

How to run spark shell using YARN

2018-03-12 Thread kant kodali
Hi All, I am trying to use YARN for the very first time. I believe I configured the resource manager and the name node fine. And then I run the below command ./spark-shell --master yarn --deploy-mode client I get the below output and it hangs there forever (I had been waiting over 10 minutes)

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
I set core-site.xml, hdfs-site.xml, yarn-site.xml as per this website and these are the only three files I changed. Do I need to set or change anything in mapred-site.xml (As of now I have not touched mapred-site.xml)? when I do yarn -node

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
any idea? On Wed, Mar 14, 2018 at 12:12 AM, kant kodali <kanth...@gmail.com> wrote: > I set core-site.xml, hdfs-site.xml, yarn-site.xml as per this website > <https://dwbi.org/etl/bigdata/183-setup-hadoop-cluster> and these are the > only three files I changed Do I

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
, Mar 14, 2018 at 12:12 AM, kant kodali <kanth...@gmail.com> wrote: > any idea? > > On Wed, Mar 14, 2018 at 12:12 AM, kant kodali <kanth...@gmail.com> wrote: > >> I set core-site.xml, hdfs-site.xml, yarn-site.xml as per this website >> <https://dwbi.o

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
> > On Wed, Mar 14, 2018 at 3:25 AM, kant kodali <kanth...@gmail.com> wrote: > >> I am using spark 2.3.0 and hadoop 2.7.3. >> >> Also I have done the following and restarted all. But I still >> see ACCEPTED: waiting for AM container to be allocated, launche

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
ed, Mar 14, 2018 at 5:32 AM, kant kodali <kanth...@gmail.com> wrote: > >> Tried this >> >> ./spark-shell --master yarn --deploy-mode client --executor-memory 4g >> >> >> Same issue. Keeps going forever.. >> >> >> 18/03/14 09:31:25 INFO

Re: what is the right syntax for self joins in Spark 2.3.0?

2018-03-10 Thread kant kodali
mode, which allow quite a large range of use cases, including > multiple cascading joins. > > TD > > > > On Thu, Mar 8, 2018 at 9:18 AM, Gourav Sengupta <gourav.sengu...@gmail.com > > wrote: > >> super interesting. >> >> On Wed, Mar 7

Re: what is the right syntax for self joins in Spark 2.3.0?

2018-03-07 Thread kant kodali
append mode? Anyways the moment it is in master I am ready to test so JIRA tickets on this would help to keep track. please let me know. Thanks! On Tue, Mar 6, 2018 at 9:16 PM, kant kodali <kanth...@gmail.com> wrote: > Sorry I meant Spark 2.4 in my previous email > > On Tue, Mar 6,
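
A minimal stream-stream self-join sketch along the lines discussed in this thread, assuming spark-shell (so spark is in scope) and placeholder broker/topic names:

    import org.apache.spark.sql.functions.expr

    // Read the same topic twice so the two sides are independent streaming DataFrames.
    def readEvents(alias: String) = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")   // placeholder
      .option("subscribe", "events")                          // placeholder
      .load()
      .selectExpr("CAST(value AS STRING) AS id", "timestamp")
      .withWatermark("timestamp", "10 minutes")
      .alias(alias)

    // Inner join on the key plus a time-range condition, which is what
    // Spark 2.3 supports for stream-stream joins in append mode.
    val joined = readEvents("l").join(
      readEvents("r"),
      expr("l.id = r.id AND r.timestamp BETWEEN l.timestamp AND l.timestamp + interval 5 minutes"))

    joined.writeStream.format("console").outputMode("append").start()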

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
mazon.com/premiumsupport/knowledge- > center/restart-service-emr/ > > > > Femi > > > > *From: *kant kodali <kanth...@gmail.com> > *Date: *Wednesday, March 14, 2018 at 6:16 AM > *To: *Femi Anthony <femib...@gmail.com> > *Cc: *vermanurag <anurag.ve...@fnma

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
Do I need to set SPARK_DIST_CLASSPATH or SPARK_CLASSPATH ? The latest version of spark (2.3) only has SPARK_CLASSPATH. On Wed, Mar 14, 2018 at 11:37 AM, kant kodali <kanth...@gmail.com> wrote: > Hi, > > I am not using emr. And yes I restarted several times. > > On Wed, Ma

Do partition by and order by work only in the stateful case?

2018-04-12 Thread kant kodali
Hi All, Do partition by and order by work only in the stateful case? For example: select row_number() over (partition by id order by timestamp) from table gives me *SEVERE: Exception occured while submitting the query: java.lang.RuntimeException: org.apache.spark.sql.AnalysisException:
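
For contrast, the same ranking query is fine on a batch DataFrame; a minimal spark-shell sketch with sample rows shaped like the data in this thread (the last timestamp is made up):

    import spark.implicits._

    Seq((1, 5,  "2018-04-01T01:00:00.000Z"),
        (1, 10, "2018-04-01T01:10:00.000Z"),
        (2, 20, "2018-04-01T01:20:00.000Z"))
      .toDF("id", "amount", "my_timestamp")
      .createOrReplaceTempView("my_table")

    // Batch works; as of 2.3 the streaming planner rejects non-time-window
    // window functions, which is typically what the AnalysisException points at.
    spark.sql("""
      SELECT id, amount, my_timestamp,
             row_number() OVER (PARTITION BY id ORDER BY my_timestamp DESC) AS rn
      FROM my_table
    """).where("rn = 1").show()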

Re: Do partition by and order by work only in the stateful case?

2018-04-14 Thread kant kodali
ed in streaming. > > On Thu, Apr 12, 2018 at 7:34 PM, kant kodali <kanth...@gmail.com> wrote: > >> Hi All, >> >> Does partition by and order by works only in stateful case? >> >> For example: >> >> select row_number() over (partition by id ord

when can we expect multiple aggregations to be supported in spark structured streaming?

2018-04-14 Thread kant kodali
Hi All, when can we expect multiple aggregations to be supported in spark structured streaming? For example,

id | amount | my_timestamp
--------------------------
1  | 5      | 2018-04-01T01:00:00.000Z
1  | 10     | 2018-04-01T01:10:00.000Z
2  | 20

How to select the max row for every group in spark structured streaming 2.3.0 without using order by or mapGroupsWithState?

2018-04-15 Thread kant kodali
How to select the max row for every group in spark structured streaming 2.3.0 without using order by or mapGroupsWithState?

*Input:*
id | amount | my_timestamp
--------------------------
1  | 5      | 2018-04-01T01:00:00.000Z
1  | 10     | 2018-04-01T01:10:00.000Z
2

Re: is it ok to make I/O calls in UDF? in other words is it a standard practice?

2018-04-23 Thread kant kodali
n be applied for large datasets but may be for small lookup files. Thanks > > On Mon, Apr 23, 2018 at 4:28 PM kant kodali <kanth...@gmail.com> wrote: > >> Hi All, >> >> Is it ok to make I/O calls in UDF? other words is it a standard practice? >> >> Thanks! >> >
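
A sketch of the small-lookup alternative the reply hints at: broadcast the lookup once on the driver so the UDF does an in-memory lookup instead of per-row I/O (data and names below are placeholders; spark-shell style):

    import spark.implicits._

    // Small lookup loaded once on the driver and broadcast to executors.
    val lookup  = Map("a" -> 1, "b" -> 2)               // stands in for a small lookup file
    val bLookup = spark.sparkContext.broadcast(lookup)

    // The UDF reads from the broadcast value, no I/O per row.
    spark.udf.register("lookup_code", (k: String) => bLookup.value.getOrElse(k, -1))

    Seq("a", "b", "c").toDF("key").createOrReplaceTempView("keys")
    spark.sql("SELECT key, lookup_code(key) AS code FROM keys").show()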

Re: can we use mapGroupsWithState in raw sql?

2018-04-17 Thread kant kodali
nction. That does not fit in with the SQL language structure. > > On Mon, Apr 16, 2018 at 7:34 PM, kant kodali <kanth...@gmail.com> wrote: > >> Hi All, >> >> can we use mapGroupsWithState in raw SQL? or is it in the roadmap? >> >> Thanks! >> >> >> >

can we use mapGroupsWithState in raw sql?

2018-04-16 Thread kant kodali
Hi All, can we use mapGroupsWithState in raw SQL? or is it in the roadmap? Thanks!
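
mapGroupsWithState is only exposed on the typed Dataset/Java API, not SQL; for comparison, a minimal typed sketch (spark-shell style, with the rate source standing in for a real stream and a hypothetical Event shape):

    import org.apache.spark.sql.streaming.GroupState
    import spark.implicits._

    case class Event(id: String, amount: Long)

    // Rate source stands in for the real input stream.
    val eventStream = spark.readStream.format("rate").option("rowsPerSecond", "5").load()
      .select(($"value" % 10).cast("string").as("id"), $"value".as("amount"))
      .as[Event]

    // Keep a running count per id in state across micro-batches.
    def countEvents(id: String, events: Iterator[Event], state: GroupState[Long]): (String, Long) = {
      val newCount = state.getOption.getOrElse(0L) + events.size
      state.update(newCount)
      (id, newCount)
    }

    val counts = eventStream.groupByKey(_.id).mapGroupsWithState(countEvents _)

    counts.writeStream.outputMode("update").format("console").start()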

Re: can we use mapGroupsWithState in raw sql?

2018-04-18 Thread kant kodali
ate operations is not > there yet. > > Thanks, > Arun > > From: kant kodali <kanth...@gmail.com> > Date: Tuesday, April 17, 2018 at 11:41 AM > To: Tathagata Das <tathagata.das1...@gmail.com> > Cc: "user @spark" <user@spark.apache.org> > Subject

Re: can we use mapGroupsWithState in raw sql?

2018-04-18 Thread kant kodali
> On Thu, Apr 19, 2018 at 9:43 AM, Arun Mahadevan <ar...@apache.org> wrote: > >> The below expr might work: >> >> df.groupBy($"id").agg(max(struct($"amount", >> $"my_timestamp")).as("data")).select($"id", $"data.*"
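
The expression above is cut off; a complete sketch of the same idea, assuming spark-shell with spark.implicits._ (the last sample timestamp is made up to complete the rows from the thread):

    import spark.implicits._
    import org.apache.spark.sql.functions.{max, struct}

    val df = Seq((1, 5L,  "2018-04-01T01:00:00.000Z"),
                 (1, 10L, "2018-04-01T01:10:00.000Z"),
                 (2, 20L, "2018-04-01T01:20:00.000Z"))
      .toDF("id", "amount", "my_timestamp")

    // Max row per id: max over a struct compares field by field (amount first,
    // my_timestamp as tie-breaker), then the struct is flattened back out.
    val latest = df
      .groupBy($"id")
      .agg(max(struct($"amount", $"my_timestamp")).as("data"))
      .select($"id", $"data.*")

    latest.show()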

Re: What do I need to set to see the number of records and processing time for each batch in SPARK UI?

2018-03-27 Thread kant kodali
For example in this blog <https://databricks.com/blog/2015/07/08/new-visualizations-for-understanding-apache-spark-streaming-applications.html> post. Looking at figure 1 and figure 2 I wonder what I need to do to see those graphs in spark 2.3.0? On Mon, Mar 26, 2018 at 7:10 AM, kant kodali

Does Spark run on Java 10?

2018-04-01 Thread kant kodali
Hi All, Has anybody got Spark running on Java 10? Thanks!

Re: Does Spark run on Java 10?

2018-04-01 Thread kant kodali
-14220 > > On a side note, if some coming version of Scala 2.11 becomes full Java > 9/10 compliant it could work. > > Hope, this helps. > > Thanks, > Muthu > > On Sun, Apr 1, 2018 at 6:57 AM, kant kodali <kanth...@gmail.com> wrote: > >> Hi All, >> >> Does anybody got Spark running on Java 10? >> >> Thanks! >> >> >> >

What do I need to set to see the number of records and processing time for each batch in SPARK UI?

2018-03-26 Thread kant kodali
Hi All, I am using spark 2.3.0 and I am wondering what I need to set to see the number of records and processing time for each batch in the SPARK UI? The default UI doesn't seem to show this. Thanks!

Re: is there a way to register a python UDF using the java API?

2018-04-02 Thread kant kodali
UserDefinedPythonFunction(java.lang.String name, org.apache.spark.api.python.PythonFunction func, org.apache.spark.sql.types.DataType dataType, int pythonEvalType, boolean udfDeterministic) { /* compiled code */ } On Sun, Apr 1, 2018 at 3:46 PM, kant kodali <kanth...@gmail.com> wrote: > Hi All, >

is there a way to register a python UDF using the java API?

2018-04-01 Thread kant kodali
Hi All, All of our spark code is in Java and I am wondering if there is a way to register python UDFs using the java API such that the registered UDFs can be used from raw spark SQL. If there is any other way to achieve this goal please suggest! Thanks

is it possible to use Spark 2.3.0 along with Kafka 0.9.0.1?

2018-03-16 Thread kant kodali
Hi All, is it possible to use Spark 2.3.0 along with Kafka 0.9.0.1? Thanks, kant

Re: select count * doesn't seem to respect update mode in Kafka Structured Streaming?

2018-03-20 Thread kant kodali
tml#spark-streaming> > settings > may be what you’re looking for: > >- spark.streaming.backpressure.enabled >- spark.streaming.backpressure.initialRate >- spark.streaming.receiver.maxRate >- spark.streaming.kafka.maxRatePerPartition > > ​ > > On Mon, Mar 19, 2018 at 5:27

Re: Is there a mutable dataframe in spark structured streaming 2.3.0?

2018-03-22 Thread kant kodali
e of days back, kindly go through the mail > chain with "*Multiple Kafka Spark Streaming Dataframe Join query*" as > subject, TD and Chris has cleared my doubts, it would help you too. > > Thanks, > Aakash. > > On Thu, Mar 22, 2018 at 7:50 AM, kant kodali <kant

select count * doesn't seem to respect update mode in Kafka Structured Streaming?

2018-03-19 Thread kant kodali
Hi All, I have 10 million records in my Kafka and I am just trying to spark.sql(select count(*) from kafka_view). I am reading from kafka and writing to kafka. My writeStream is set to "update" mode and trigger interval of one second ( Trigger.ProcessingTime(1000)). I expect the counts to be
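
A minimal sketch of the setup described above (spark-shell style, console sink instead of Kafka to keep it short; broker and topic names are placeholders):

    import org.apache.spark.sql.streaming.Trigger

    val kafkaDf = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")   // placeholder
      .option("subscribe", "input_topic")                     // placeholder
      .load()

    kafkaDf.createOrReplaceTempView("kafka_view")

    // Running count, emitted every second in update mode; writing the aggregate
    // back to Kafka would additionally need a value projection on the result.
    val query = spark.sql("SELECT count(*) AS cnt FROM kafka_view")
      .writeStream
      .outputMode("update")
      .trigger(Trigger.ProcessingTime(1000))
      .format("console")
      .start()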

Re: select count * doesn't seem to respect update mode in Kafka Structured Streaming?

2018-03-19 Thread kant kodali
messages exist). > > Hope that makes sense - > > > On Mon, Mar 19, 2018 at 13:36 kant kodali <kanth...@gmail.com> wrote: > >> Hi All, >> >> I have 10 million records in my Kafka and I am just trying to >> spark.sql(select count(*) from kafka_view). I am re

Re: select count * doesn't seem to respect update mode in Kafka Structured Streaming?

2018-03-20 Thread kant kodali
g on improving this by building a generic mechanism into the > Streaming DataSource V2 so that the engine can do admission control on the > amount of data returned in a source independent way. > > On Tue, Mar 20, 2018 at 2:58 PM, kant kodali <kanth...@gmail.com> wrote: > >&

Is there a mutable dataframe in spark structured streaming 2.3.0?

2018-03-21 Thread kant kodali
Hi All, Is there a mutable dataframe in spark structured streaming 2.3.0? I am currently reading from Kafka and if I cannot parse the messages that I get from Kafka I want to write them to say some "dead_queue" topic. I wonder what is the best way to do this? Thanks!
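
DataFrames are immutable, but one hedged sketch of the dead-letter pattern is to parse with from_json and route the rows that fail to parse to the "dead_queue" topic; broker, schema and paths below are placeholders:

    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types._

    val schema = new StructType().add("id", StringType).add("amount", LongType)  // assumed schema

    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "input_topic")
      .load()
      .selectExpr("CAST(value AS STRING) AS json")
      .withColumn("parsed", from_json(col("json"), schema))

    // Rows that fail to parse come back with a null struct; send them to the dead-letter topic.
    // The successfully parsed side would be written out separately in the same way.
    val dead = raw.filter(col("parsed").isNull).select(col("json").as("value"))

    dead.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "dead_queue")
      .option("checkpointLocation", "/tmp/dead_queue_chk")   // placeholder path
      .start()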

Re: Is there a mutable dataframe in spark structured streaming 2.3.0?

2018-03-23 Thread kant kodali
y if this is the case? Thanks! On Thu, Mar 22, 2018 at 7:48 AM, kant kodali <kanth...@gmail.com> wrote: > Thanks all! > > On Thu, Mar 22, 2018 at 2:08 AM, Jorge Machado <jom...@me.com> wrote: > >> DataFrames are not mutable. >> >> Jorge Machado >>

How does Spark Structured Streaming determine an event has arrived late?

2018-02-27 Thread kant kodali
I read through the spark structured streaming documentation and I wonder how spark structured streaming determines that an event has arrived late? Does it compare the event-time with the processing time? [image] Taking the above
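
In short, lateness is measured against the watermark (roughly, the maximum event-time seen so far minus a user-supplied delay), not against processing time. A minimal spark-shell sketch, with the rate source standing in for a real stream and its timestamp column playing the event-time:

    import org.apache.spark.sql.functions.{col, window}

    val events = spark.readStream.format("rate").option("rowsPerSecond", "5").load()

    // Rows whose timestamp falls behind (max event-time seen so far - 10 minutes)
    // are considered too late and dropped from the windowed aggregation.
    val counts = events
      .withWatermark("timestamp", "10 minutes")
      .groupBy(window(col("timestamp"), "5 minutes"), col("value") % 10)
      .count()

    counts.writeStream.outputMode("update").format("console").start()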

Re: How does Spark Structured Streaming determine an event has arrived late?

2018-02-27 Thread kant kodali
n if you had processed the stream in real-time a week ago. >> This is fundamentally necessary for achieving the deterministic processing >> that Structured Streaming guarantees. >> >> Regarding the picture, the "time" is actually "event-time". My apologies >>

What exactly is Function shipping?

2018-10-16 Thread kant kodali
Hi All, Everyone talks about how easy function shipping is in Scala. I immediately go "wait a minute": isn't it just object serialization and deserialization, which has existed in Java for a long time, or am I missing something profound here? Thanks!

Does Spark have a plan to move away from sun.misc.Unsafe?

2018-10-24 Thread kant kodali
Hi All, Does Spark have a plan to move away from sun.misc.Unsafe to VarHandles? I am trying to find a JIRA issue for this. Thanks!

Re: java vs scala for Apache Spark - is there a performance difference?

2018-10-29 Thread kant kodali
When most people compare two different programming languages, 99% of the time it all seems to boil down to syntax sugar. As for performance, I doubt Scala is ever faster than Java given that Scala uses the heap more than Java. I had also written some pointless micro-benchmarking code like (Random String

Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread kant kodali
Hi All, I am using Spark 2.3.1 and using YARN as a cluster manager. I currently have 1) 6 YARN containers (executors=6) with 4 executor cores for each container. 2) 6 Kafka partitions from one topic. 3) You can assume every other configuration is set to whatever the default values are. Spawned a

How to do a broadcast join using raw Spark SQL 2.3.1 or 2.3.2?

2018-10-03 Thread kant kodali
Hi All, How to do a broadcast join using raw Spark SQL 2.3.1 or 2.3.2? Thanks
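
A hedged sketch of the broadcast hint available in raw Spark SQL since 2.2 (spark-shell style; the views here are just toy data):

    val bigDf   = spark.range(0, 1000000).toDF("id")
    val smallDf = spark.range(0, 100).toDF("id")
    bigDf.createOrReplaceTempView("big")
    smallDf.createOrReplaceTempView("small")

    // Hint the small side; BROADCASTJOIN and MAPJOIN are accepted as synonyms.
    val joined = spark.sql("""
      SELECT /*+ BROADCAST(small) */ big.id
      FROM big JOIN small ON big.id = small.id
    """)
    joined.explain()   // expect BroadcastHashJoin when the hint is honored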

Does spark.streaming.concurrentJobs still exist?

2018-10-09 Thread kant kodali
Does spark.streaming.concurrentJobs still exist? spark.streaming.concurrentJobs (default: 1) is the number of concurrent jobs, i.e. threads in streaming-job-executor thread pool

can Spark 2.4 work on JDK 11?

2018-09-25 Thread kant kodali
Hi All, can Spark 2.4 work on JDK 11? I feel like there are a lot of features added in JDK 9, 10, and 11 that can make the deployment process a whole lot better and of course some more syntax sugar similar to Scala. Thanks!

can I model any arbitrary data structure as an RDD?

2018-09-25 Thread kant kodali
Hi All, I am wondering if I can model any arbitrary data structure as an RDD? For example, can I model Red-black trees, Suffix Trees, Radix Trees, Splay Trees, Fibonacci heaps, Tries, Linked Lists etc. as RDDs? If so, how? To implement a custom RDD I have to implement compute and getPartitions
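
A minimal custom-RDD sketch showing the two overrides the question mentions (compute and getPartitions). The backing structure here is just a range of integers, but any structure works as long as it can be sliced into partitions that each yield an iterator:

    import org.apache.spark.{Partition, SparkContext, TaskContext}
    import org.apache.spark.rdd.RDD

    // One partition covers one contiguous slice of the structure.
    class RangePartition(val index: Int, val start: Int, val end: Int) extends Partition

    class SimpleRangeRDD(sc: SparkContext, numPartitions: Int, perPartition: Int)
        extends RDD[Int](sc, Nil) {

      // Describe how the structure is cut into partitions.
      override protected def getPartitions: Array[Partition] =
        Array.tabulate[Partition](numPartitions) { i =>
          new RangePartition(i, i * perPartition, (i + 1) * perPartition)
        }

      // Produce the elements of one partition on an executor.
      override def compute(split: Partition, context: TaskContext): Iterator[Int] = {
        val p = split.asInstanceOf[RangePartition]
        (p.start until p.end).iterator
      }
    }

    // Usage (spark-shell): new SimpleRangeRDD(spark.sparkContext, 4, 10).collect()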

How to track batch jobs in spark?

2018-12-05 Thread kant kodali
Hi All, How to track batch jobs in spark? For example, is there some id or token I can get after I spawn a batch job and use it to track the progress or to kill the batch job itself? For Streaming, we have StreamingQuery.id() Thanks!

Re: How to track batch jobs in spark?

2018-12-06 Thread kant kodali
from output of list command >> yarn application --kill >> >> On Wed, Dec 5, 2018 at 1:42 PM kant kodali wrote: >> >>> Hi All, >>> >>> How to track batch jobs in spark? For example, is there some id or token >>> i can get after I s
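
Besides the yarn application --kill approach quoted above, a programmatic sketch (an alternative not mentioned in the thread) is to tag the work with a job group and cancel by that id; paths below are placeholders:

    // Any token you choose; jobs started by this thread are tagged with it
    // and show up under that group in the Spark UI.
    val jobGroupId = "nightly-batch-2018-12-05"
    spark.sparkContext.setJobGroup(jobGroupId, "nightly batch load", interruptOnCancel = true)

    val result = spark.read.parquet("/data/input").groupBy("id").count()   // placeholder job
    result.write.mode("overwrite").parquet("/data/output")

    // From another thread on the same SparkContext, everything in the group can be killed:
    // spark.sparkContext.cancelJobGroup(jobGroupId)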

Is there any open source framework that converts Cypher to SparkSQL?

2018-09-14 Thread kant kodali
Hi All, Is there any open source framework that converts Cypher to SparkSQL? Thanks!

How to control batch size while reading from hdfs files?

2019-03-22 Thread kant kodali
Hi All, What determines the batch size while reading from a file from HDFS? I am trying to read files from HDFS and ingest into Kafka using Spark Structured Streaming 2.3.1. I get an error saying the Kafka batch size is too big and that I need to increase max.request.size. Sure I can increase it but
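
Two knobs that are commonly relevant here, shown as a hedged sketch: maxFilesPerTrigger bounds how many HDFS files enter each micro-batch, and producer settings such as max.request.size can be passed to the Kafka sink with the kafka. prefix (paths, topic and sizes are placeholders):

    val files = spark.readStream
      .format("text")
      .option("maxFilesPerTrigger", "1")            // cap files per micro-batch
      .load("hdfs:///data/incoming")                 // placeholder path

    // The text source's "value" column feeds the Kafka sink directly.
    files.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "ingest_topic")
      .option("kafka.max.request.size", (4 * 1024 * 1024).toString)   // raise the producer limit if needed
      .option("checkpointLocation", "/tmp/ingest_chk")
      .start()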

Is there a way to validate the syntax of raw spark sql query?

2019-03-01 Thread kant kodali
Hi All, Is there a way to validate the syntax of a raw spark SQL query? for example, I would like to know if there is any isValid API call spark provides?

val query = "select * from table"
if (isValid(query)) {
  sparkSession.sql(query)
} else {
  log.error("Invalid Syntax")
}

I tried the

Re: Is there a way to validate the syntax of raw spark sql query?

2019-03-05 Thread kant kodali
used. > If there is an issue with your SQL syntax then the method throws below > exception that you can catch > > org.apache.spark.sql.catalyst.parser.ParseException > > > Hope this helps! > > > > Akshay Bhardwaj > +91-97111-33849 > > > On Fri, Mar 1, 2019 a
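
A minimal sketch of the approach described in the reply: parse without executing and catch the ParseException. The parser hangs off sessionState, which is an unstable/internal API, so treat this as a sketch (spark-shell style):

    import org.apache.spark.sql.catalyst.parser.ParseException

    def isValid(query: String): Boolean =
      try {
        spark.sessionState.sqlParser.parsePlan(query)   // parses only, does not run the query
        true
      } catch {
        case _: ParseException => false
      }

    val query = "select * from table"
    if (isValid(query)) spark.sql(query) else println(s"Invalid syntax: $query")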

what is the difference between udf execution and map(someLambda)?

2019-03-17 Thread kant kodali
Hi All, I am wondering what the difference is between UDF execution and map(someLambda)? You can assume someLambda ~= UDF. Any performance difference? Thanks!

Does Spark SQL have match_recognize?

2019-05-25 Thread kant kodali
Hi All, Does Spark SQL have match_recognize? I am not sure why CEP seems to be neglected. I believe it is one of the most useful concepts in Financial applications! Is there a plan to support it? Thanks!

Re: Does Spark SQL have match_recognize?

2019-05-29 Thread kant kodali
Nope, not at all. On Sun, May 26, 2019 at 8:15 AM yeikel valdes wrote: > Isn't match_recognize just a filter? > > df.filter(predicate)? > > > On Sat, 25 May 2019 12:55:47 -0700 * kanth...@gmail.com > * wrote > > Hi All, > > Does Spark SQL has match_recognize? I am not sure why CEP

java_method udf is not visible in the API documentation

2019-06-27 Thread kant kodali
Hi All, I see it here https://spark.apache.org/docs/2.3.1/api/sql/index.html#java_method But I don't see it here https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html
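
java_method (a.k.a. reflect) is a SQL-only function; it is not mirrored in org.apache.spark.sql.functions, which is why it only shows up in the SQL function docs. A quick spark-shell check:

    // Calls a static Java method via reflection and returns its string form.
    spark.sql("SELECT java_method('java.util.UUID', 'randomUUID') AS uuid").show(false)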

How to programmatically pause and resume Spark/Kafka structured streaming?

2019-08-05 Thread kant kodali
Hi All, I am trying to see if there is a way to pause a spark stream that process data from Kafka such that my application can take some actions while the stream is paused and resume when the application completes those actions. Thanks!

Re: How to programmatically pause and resume Spark/Kafka structured streaming?

2019-08-06 Thread kant kodali
Spark doesn't expose that. On Mon, Aug 5, 2019 at 10:54 PM Gourav Sengupta wrote: > Hi, > > exactly my question, I was also looking for ways to gracefully exit spark > structured streaming. > > > Regards, > Gourav > > On Tue, Aug 6, 2019 at 3:43 AM kant kodali wrote:
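
Since there is no built-in pause, a common workaround sketch is stop-and-restart from the same checkpoint; the streaming DataFrame and paths below are placeholders:

    val checkpoint = "/tmp/my_query_chk"              // placeholder paths
    val output     = "/tmp/my_query_out"

    // streamingDf is whatever streaming DataFrame the job normally writes out.
    def startQuery() = streamingDf.writeStream
      .format("parquet")
      .option("path", output)
      .option("checkpointLocation", checkpoint)
      .start()

    var query = startQuery()
    // "Pause": stop the query; processed offsets remain in the checkpoint.
    query.stop()
    // ... perform the other actions, then "resume" from where it left off.
    query = startQuery()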

Connected components using GraphFrames is significantly slower than GraphX?

2020-02-16 Thread kant kodali
Hi All, Trying to understand why the connected components algorithm runs much slower than the GraphX equivalent? The GraphX code creates 16 stages. GraphFrame graphFrame = GraphFrame.fromEdges(edges); Dataset connectedComponents = graphFrame.connectedComponents().setAlgorithm("graphx").run(); and the

Re: How does spark sql evaluate case statements?

2020-04-17 Thread kant kodali
Thanks! On Thu, Apr 16, 2020 at 9:57 PM ZHANG Wei wrote: > Are you looking for this: > https://spark.apache.org/docs/2.4.0/api/sql/#when ? > > The code generated will look like this in a `do { ... } while (false)` > loop: > > do { > ${cond.code} > if (!${cond.isNull} && ${cond.value})

How does spark sql evaluate case statements?

2020-04-06 Thread kant kodali
Hi All, I have the following query and I was wondering if spark sql evaluates the same condition twice in the case statement below? I did .explain(true) and all I get is a table scan so not sure if spark sql evaluates the same condition twice? if it does, is there a way to return multiple values
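
One hedged way to get multiple outputs from a single CASE evaluation is to return a struct and flatten it afterwards; the table and column names below are placeholders (spark-shell style):

    import spark.implicits._

    Seq((1, 50L), (2, 500L)).toDF("id", "amount").createOrReplaceTempView("my_table")

    // The condition is written once, and the struct carries several values out of it.
    spark.sql("""
      SELECT id,
             CASE WHEN amount > 100
                  THEN named_struct('bucket', 'large', 'flag', true)
                  ELSE named_struct('bucket', 'small', 'flag', false)
             END AS info
      FROM my_table
    """).selectExpr("id", "info.*").show()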

Re: is RocksDB backend available in 3.0 preview?

2020-04-22 Thread kant kodali
this project is > most widely known implementation for RocksDB backend state store - > https://github.com/chermenin/spark-states > > On Wed, Apr 22, 2020 at 11:32 AM kant kodali wrote: > >> Hi All, >> >> 1. is RosckDB backend available in 3.0 preview? >> 2. if
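
A sketch of wiring in a third-party provider via the standard config key spark.sql.streaming.stateStore.providerClass; the provider class name below is taken from the linked project and may differ by version, so verify it against that repository:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("rocksdb-state-sketch")
      // Class name is an assumption based on the chermenin/spark-states project above.
      .config("spark.sql.streaming.stateStore.providerClass",
              "ru.chermenin.spark.sql.execution.streaming.state.RocksDbStateStoreProvider")
      .getOrCreate()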

is RocksDB backend available in 3.0 preview?

2020-04-21 Thread kant kodali
Hi All, 1. is RocksDB backend available in 3.0 preview? 2. if RocksDB can store intermediate results of a stream-stream join, can I run streaming join queries forever? By forever I mean until I run out of disk. Or to put it another way, can I run the stream-stream join queries for years if necessary
