Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread kant kodali
Here is a UI of my thread dump. http://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTYvMTEvMS8tLWpzdGFja19kdW1wX3dpbmRvd19pbnRlcnZhbF8xbWluX2JhdGNoX2ludGVydmFsXzFzLnR4dC0tNi0xNy00Ng== On Mon, Oct 31, 2016 at 7:10 PM, kant kodali <kanth...@gmail.com> wrote: > Hi Ryan, > &

Re: java.lang.OutOfMemoryError: unable to create new native thread

2016-11-01 Thread kant kodali
Here is a UI of my thread dump. http://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTYvMTEvMS8tLWpzdGFja19kdW1wX3dpbmRvd19pbnRlcnZhbF8xbWluX2JhdGNoX2ludGVydmFsXzFzLnR4dC0tNi0xNy00Ng== On Mon, Oct 31, 2016 at 10:32 PM, kant kodali <kanth...@gmail.com> wrote: > Hi Vadim, > &g

Re: java.lang.OutOfMemoryError: unable to create new native thread

2016-10-31 Thread kant kodali
me...@datadoghq.com> wrote: > Have you tried to get number of threads in a running process using `cat > /proc//status` ? > > On Sun, Oct 30, 2016 at 11:04 PM, kant kodali <kanth...@gmail.com> wrote: > >> yes I did run ps -ef | grep "app_name" and it is root.

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
e `jstack` to find out > the name of leaking threads? > > On Mon, Oct 31, 2016 at 12:35 PM, kant kodali <kanth...@gmail.com> wrote: > >> Hi Ryan, >> >> It happens on the driver side and I am running on a client mode (not the >> cluster mode). >> >>

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
if the leak threads are in the driver side. > > Does it happen in the driver or executors? > > On Mon, Oct 31, 2016 at 12:20 PM, kant kodali <kanth...@gmail.com> wrote: > >> Hi Ryan, >> >> Ahh My Receiver.onStop method is currently empty. >> >> 1) I hav
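The empty onStop mentioned above is the usual culprit: if onStart spawns a thread each time the receiver is (re)started and onStop never stops it, threads accumulate. Below is a minimal Spark-free sketch of that lifecycle; the class and method names mirror Spark's Receiver API but are hypothetical stand-ins, not the real interface.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Receiver-style lifecycle sketch (no Spark dependency): onStart spawns
// the polling thread, and onStop MUST shut it down, or every restart of
// the receiver leaks another thread.
public class PollingSource {
    private volatile ExecutorService exec;

    public void onStart() {
        exec = Executors.newSingleThreadExecutor();
        exec.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(10); // stand-in for: poll the queue, hand records to store(...)
                } catch (InterruptedException e) {
                    return; // asked to stop
                }
            }
        });
    }

    public void onStop() {
        // An empty onStop is exactly how threads accumulate.
        exec.shutdownNow();
        try {
            exec.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException ignored) { }
    }

    public boolean isStopped() {
        return exec != null && exec.isTerminated();
    }

    public static boolean demo() {
        PollingSource r = new PollingSource();
        r.onStart();
        try { Thread.sleep(50); } catch (InterruptedException ignored) { }
        r.onStop();
        return r.isStopped();
    }

    public static void main(String[] args) {
        System.out.println("stopped cleanly: " + demo());
    }
}
```

If onStart is in fact called more than once per restart cycle without a matching onStop, each call leaks one more single-thread executor, which matches the steadily growing thread count discussed in the thread.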

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
hich types of threads are leaking? > > On Mon, Oct 31, 2016 at 11:50 AM, kant kodali <kanth...@gmail.com> wrote: > >> I am also under the assumption that the *onStart* function of the Receiver is >> called only once by Spark. Please correct me if I am wrong. >> &

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
I am also under the assumption that the *onStart* function of the Receiver is called only once by Spark. Please correct me if I am wrong. On Mon, Oct 31, 2016 at 11:35 AM, kant kodali <kanth...@gmail.com> wrote: > My driver program runs a spark streaming job. And it spawns a thread by

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
y? > > This may depend on your driver program. Do you spawn any threads in > it? Could you share some more information on the driver program, spark > version and your environment? It would greatly help others to help you > > On Mon, Oct 31, 2016 at 3:47 AM, kant kodali <kanth...@gma

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
so many? On Mon, Oct 31, 2016 at 3:25 AM, Sean Owen <so...@cloudera.com> wrote: > ps -L [pid] is what shows threads. I am not sure this is counting what you > think it does. My shell process has about a hundred threads, and I can't > imagine why one would have thousands unless your ap

why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
when I do ps -elfT | grep "spark-driver-program.jar" | wc -l the result is around 32K. Why does it create so many threads, and how can I limit this?
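Note that `ps -elfT` lists one line per thread of every matching process, plus the grep and header lines, so the figure above mixes several sources. From inside the driver you can ask the JVM directly how many threads it owns; a hedged sketch using only the standard ThreadMXBean, nothing Spark-specific:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCount {
    // Ask the JVM itself how many live threads it owns; unlike
    // `ps -elfT | grep ... | wc -l`, this excludes other processes and
    // the grep/header lines.
    public static int liveThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        return bean.getThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("live threads: " + liveThreads());
        // Dump thread names to spot the leaking pool, similar to jstack.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            System.out.println(t.getName());
        }
    }
}
```

Logging the thread names periodically (as with jstack, suggested later in the thread) is usually enough to identify which pool is growing.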

Re: java.lang.OutOfMemoryError: unable to create new native thread

2016-10-30 Thread kant kodali
i have no idea > > but i still suspecting the user, > as the user who run spark-submit is not necessary the pid for the JVM > process > > can u make sure when you "ps -ef | grep {your app id} " the PID is root? > On 10/31/16 11:21 AM, kant kodali wrote: > > The ja

Re: java.lang.OutOfMemoryError: unable to create new native thread

2016-10-30 Thread kant kodali
ur setting, spark job may execute by > other user. > > > On 10/31/16 10:38 AM, kant kodali wrote: > > when I did this > > cat /proc/sys/kernel/pid_max > > I got 32768 > > On Sun, Oct 30, 2016 at 6:36 PM, kant kodali <kanth...@gmail.com> wrote: > >> I believ

Re: java.lang.OutOfMemoryError: unable to create new native thread

2016-10-30 Thread kant kodali
when I did this cat /proc/sys/kernel/pid_max I got 32768 On Sun, Oct 30, 2016 at 6:36 PM, kant kodali <kanth...@gmail.com> wrote: > I believe for Ubuntu it is unlimited but I am not 100% sure (I just read > somewhere online). I ran ulimit -a and this is what I get > &

Re: java.lang.OutOfMemoryError: unable to create new native thread

2016-10-30 Thread kant kodali
sponding user is busy > in other way > the jvm process will still not be able to create a new thread. > > btw the default limit for centos is 1024 > > > On 10/31/16 9:51 AM, kant kodali wrote: > > > On Sun, Oct 30, 2016 at 5:22 PM, Chan Chor Pang <chin...@indetail.co.

Re: java.lang.OutOfMemoryError: unable to create new native thread

2016-10-30 Thread kant kodali
On Sun, Oct 30, 2016 at 5:22 PM, Chan Chor Pang wrote: > /etc/security/limits.d/90-nproc.conf > Hi, I am using Ubuntu 16.04 LTS. I have this directory /etc/security/limits.d/ but I don't have any files underneath it. This error happens after running for 4 to 5 hours. I

Re: java.lang.OutOfMemoryError: unable to create new native thread

2016-10-29 Thread kant kodali
Another thing I forgot to mention is that it happens after running for several hours, say 4 to 5 hours. I am not sure why it is creating so many threads; is there any way to control them? On Fri, Oct 28, 2016 at 12:47 PM, kant kodali <kanth...@gmail.com> wrote: > "dag-schedu

java.lang.OutOfMemoryError: unable to create new native thread

2016-10-28 Thread kant kodali
"dag-scheduler-event-loop" java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:714) at scala.concurrent.forkjoin.ForkJoinPool.tryAddWorker(ForkJoinPool.java:1672) at
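The `ForkJoinPool.tryAddWorker` frame in this trace is the JVM asking the OS for one more native thread and being refused, typically because a per-user `nproc` limit or `pid_max` has been hit. For threads your own code creates, a fixed-size pool is the usual guard: extra tasks queue instead of spawning new threads. A minimal sketch (the pool size of 4 and the task count are illustrative assumptions):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BoundedPool {
    // A fixed-size pool caps how many native threads this code can ever
    // ask the OS for; a logic bug queues tasks instead of exhausting the
    // limit that triggers "unable to create new native thread".
    public static boolean runTasks(int nTasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < nTasks; i++) {
            pool.submit(() -> { /* work for one task */ });
        }
        pool.shutdown();
        try {
            return pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("all tasks finished: " + runTasks(100));
    }
}
```

This does not fix a leak inside Spark itself, but it rules out (or confirms) your own code as the source of the thread growth.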

spark streaming client program needs to be restarted after few hours of idle time. how can I fix it?

2016-10-18 Thread kant kodali
Hi Guys, My Spark Streaming client program works fine as long as the receiver receives the data, but say my receiver has no more data to receive for a few hours (4-5 hours) and then it starts receiving data again; at that point the spark client program doesn't seem to process any data. It

Re: ClassCastException while running a simple wordCount

2016-10-11 Thread kant kodali
;> launcher, driver and workers could lead to the bug you're seeing. A common >>> reason for a mismatch is if the SPARK_HOME environment variable is set. >>> This will cause the spark-submit script to use the launcher determined by >>> that environment variable, regardless of the

Re: java.lang.NoClassDefFoundError: org/apache/spark/sql/Dataset

2016-10-07 Thread kant kodali
perfect! That fixes it all! On Fri, Oct 7, 2016 1:29 AM, Denis Bolshakov bolshakov.de...@gmail.com wrote: You need to have spark-sql, now you are missing it. On 7 Oct 2016 at 11:12, "kant kodali" <kanth...@gmail.com> wrote: Here are the jar files on my class

Re: java.lang.NoClassDefFoundError: org/apache/spark/sql/Dataset

2016-10-05 Thread kant kodali
I am running locally so they all are on one host On Wed, Oct 5, 2016 3:12 PM, Jakob Odersky ja...@odersky.com wrote: Are all spark and scala versions the same? By "all" I mean the master, worker and driver instances.

Re: What is the difference between mini-batch vs real time streaming in practice (not theory)?

2016-09-27 Thread kant kodali
On 27 September 2016 at 08:12, kant kodali <kanth...@gmail.com> wrote: What is the difference between mini-batch vs real time streaming in practice (not theory)? In theory, I understand mini batch is something that batches in the given time frame whereas real time streaming is more l

What is the difference between mini-batch vs real time streaming in practice (not theory)?

2016-09-27 Thread kant kodali
What is the difference between mini-batch vs real time streaming in practice (not theory)? In theory, I understand mini batch is something that batches in the given time frame whereas real time streaming is more like do something as the data arrives but my biggest question is why not have mini
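The difference can be shown without Spark at all: a micro-batch engine buckets whatever arrived during the batch interval and processes the bucket as one unit, while a record-at-a-time engine invokes your function per element. A toy sketch, assuming a hypothetical 1-second interval (like a 1s Spark batch interval) and per-bucket sums as the "processing":

```java
import java.util.TreeMap;

public class MicroBatch {
    // Micro-batching: bucket events by a 1000 ms batch interval and
    // aggregate each bucket at once. Per-record processing would instead
    // call a function once per (tsMillis[i], values[i]) pair as it arrives.
    public static TreeMap<Long, Integer> batchSums(long[] tsMillis, int[] values) {
        TreeMap<Long, Integer> sums = new TreeMap<>();
        for (int i = 0; i < tsMillis.length; i++) {
            long bucket = tsMillis[i] / 1000; // which 1-second batch this event falls in
            sums.merge(bucket, values[i], Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        long[] ts = {0, 300, 1100, 1900, 2500};
        int[] vals = {1, 2, 3, 4, 5};
        System.out.println(batchSums(ts, vals)); // {0=3, 1=7, 2=5}
    }
}
```

In practice the trade-off is latency (per-record reacts within milliseconds; micro-batch waits out the interval) against per-element overhead and simpler exactly-once bookkeeping for batches.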

ideas on de duplication for spark streaming?

2016-09-24 Thread kant kodali
Hi Guys, I have a bunch of data coming in to my spark streaming cluster from a message queue (not Kafka). This message queue guarantees at-least-once delivery only, so some of the messages that come in to the spark streaming cluster may actually be duplicates, and I am trying
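One common approach for at-least-once sources is to have the producer attach a message id and filter out ids already seen. In Spark Streaming that seen-set would live in mapWithState/updateStateByKey or an external store with a TTL; this plain-Java sketch (the Dedup class is a hypothetical stand-in) just shows the filtering idea:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Dedup {
    private final Set<String> seen = new HashSet<>();

    // Keep only messages whose id has not been seen in any earlier batch.
    // Set.add() returns false when the id was already present.
    public List<String> dedupe(List<String> ids) {
        List<String> fresh = new ArrayList<>();
        for (String id : ids) {
            if (seen.add(id)) {
                fresh.add(id);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        Dedup d = new Dedup();
        System.out.println(d.dedupe(List.of("a", "b", "a"))); // [a, b]
        System.out.println(d.dedupe(List.of("b", "c")));      // [c]
    }
}
```

An unbounded in-memory set grows forever; real deployments bound it with a time or count window, which works because at-least-once duplicates typically arrive close to the original.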

Not sure why Filter on DStream doesn't get invoked?

2016-09-10 Thread kant kodali
Hi All, I am trying to simplify how to frame my question, so below is my code. I see that BAR gets printed but not FOO and I am not sure why; my batch interval is 1 second (something I pass in when I create a spark context). Any idea? I have a bunch of events and I want to store the number of events

Re: seeing this message repeatedly.

2016-09-03 Thread kant kodali
How do I fix this, or what am I missing? Any help would be great. Thanks! On Sat, Sep 3, 2016 5:39 PM, kant kodali kanth...@gmail.com wrote: Hi Guys, I am running my driver program on my local machine and my spark cluster is on AWS. The big question is I don't know what are the right settings to

seeing this message repeatedly.

2016-09-03 Thread kant kodali
Hi Guys, I am running my driver program on my local machine and my spark cluster is on AWS. The big question is I don't know the right settings to get around this public and private ip thing on AWS. My spark-env.sh currently has the following lines export

Re: any idea what this error could be?

2016-09-03 Thread kant kodali
@Fridtjof you are right! Changing it to this fixed it! compile group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.0.0' compile group: 'org.apache.spark', name: 'spark-streaming_2.11', version: '2.0.0' On Sat, Sep 3, 2016 12:30 PM, kant kodali kanth...@gmail.com wrote: I increased

Re: any idea what this error could be?

2016-09-03 Thread kant kodali
2016, 11:42 kant kodali <kanth...@gmail.com> wrote: I am running this on aws. On Fri, Sep 2, 2016 11:49 PM, kant kodali kanth...@gmail.com wrote: I am running spark in stand alone mode. I guess this error when I run my driver program..I am using spark 2.0.0. any idea

Re: Scala Vs Python

2016-09-01 Thread kant kodali
c'mon man, this is a no-brainer. Dynamically typed languages for large code bases or large-scale distributed systems make absolutely no sense. I could write a 10-page essay on why that wouldn't work so great. You might be wondering why spark has it then? Well, probably because of its ease of use for

How to attach a ReceiverSupervisor for a Custom receiver in Spark Streaming?

2016-08-29 Thread kant kodali
How to attach a ReceiverSupervisor for a Custom receiver in Spark Streaming?

java.lang.RuntimeException: java.lang.AssertionError: assertion failed: A ReceiverSupervisor has not been attached to the receiver yet.

2016-08-29 Thread kant kodali
java.lang.RuntimeException: java.lang.AssertionError: assertion failed: A ReceiverSupervisor has not been attached to the receiver yet. Maybe you are starting some computation in the receiver before the Receiver.onStart() has been called.

can I use cassandra for checkpointing during a spark streaming job

2016-08-29 Thread kant kodali
I understand that I cannot use spark streaming window operations without checkpointing to HDFS, but without window operations I don't think we can do much with spark streaming. Since it is so essential, can I use Cassandra as a distributed storage? If so, can I see an example of how I can tell

Re: What do I lose if I run spark without using HDFS or Zookeeper?

2016-08-27 Thread kant kodali
an example on how I can tell spark cluster to use Cassandra for checkpointing and others if at all. On Fri, Aug 26, 2016 9:50 AM, Steve Loughran ste...@hortonworks.com wrote: On 26 Aug 2016, at 12:58, kant kodali < kanth...@gmail.com > wrote: @Steve your arguments make sense h

Re: is there a HTTP2 (v2) endpoint for Spark Streaming?

2016-08-26 Thread kant kodali
ays communicate using HTTP. HTTP2 for better performance. On Fri, Aug 26, 2016 2:47 PM, kant kodali kanth...@gmail.com wrote: HTTP2 for fully pipelined, out-of-order execution; in other words, I should be able to send multiple requests through the same TCP connection, and by out of order execution I m

Re: is there a HTTP2 (v2) endpoint for Spark Streaming?

2016-08-26 Thread kant kodali
p/2? #curious [1] http://bahir.apache.org/ Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Fri, Aug 26, 2016 at 9:42 PM, kant kodali < kanth...@gmail.com &

is there a HTTP2 (v2) endpoint for Spark Streaming?

2016-08-26 Thread kant kodali
is there a HTTP2 (v2) endpoint for Spark Streaming?

Re: unable to start slaves from master (SSH problem)

2016-08-26 Thread kant kodali
Fixed. I just had to log out and log back in to the master node for some reason. On Fri, Aug 26, 2016 5:32 AM, kant kodali kanth...@gmail.com wrote: Hi, I am unable to start spark slaves from my master node. When I run ./start-all.sh on my master node it brings up the master but fails for the slaves

Re: What do I lose if I run spark without using HDFS or Zookeeper?

2016-08-26 Thread kant kodali
arising from such loss, damage or destruction. On 26 August 2016 at 12:58, kant kodali < kanth...@gmail.com > wrote: @Steve your arguments make sense however there is a good majority of people who have extensive experience with zookeeper prefer to avoid zookeeper and given the ease of consul

Re: How to install spark with s3 on AWS?

2016-08-26 Thread kant kodali
s3.awsAccessKeyId",AccessKey) hadoopConf.set("fs.s3.awsSecretAccessKey",SecretKey) var jobInput = sc.textFile("s3://path to bucket") Thanks On Fri, Aug 26, 2016 at 5:16 PM, kant kodali < kanth...@gmail.com > wrote: Hi guys, Are there any instructions on how to setup spark with S3 on AWS? Thanks!

unable to start slaves from master (SSH problem)

2016-08-26 Thread kant kodali
Hi, I am unable to start spark slaves from my master node. When I run ./start-all.sh on my master node it brings up the master but fails for the slaves, saying "permission denied public key". But I did add the master's id_rsa.pub to my slaves' authorized_keys, and I checked manually from my

Re: What do I lose if I run spark without using HDFS or Zookeeper?

2016-08-26 Thread kant kodali
: On 25 Aug 2016, at 22:49, kant kodali < kanth...@gmail.com > wrote: yeah so its seems like its work in progress. At very least Mesos took the initiative to provide alternatives to ZK. I am just really looking forward for this. https://issues.apache.org/jira/browse/MESOS-3797 I worry abo

How to install spark with s3 on AWS?

2016-08-26 Thread kant kodali
Hi guys, Are there any instructions on how to setup spark with S3 on AWS? Thanks!

Re: What do I lose if I run spark without using HDFS or Zookeeper?

2016-08-25 Thread kant kodali
The ZFS linux port has become very stable these days, given that LLNL maintains it and also uses it as the filesystem for their supercomputer (one of the top in the nation, is what I heard). On Thu, Aug 25, 2016 4:58 PM, kant kodali kanth...@gmail.com wrote: How about

Re: What do I lose if I run spark without using HDFS or Zookeeper?

2016-08-25 Thread kant kodali
or NFS will not able to provide that. On 26 Aug 2016 07:49, "kant kodali" < kanth...@gmail.com > wrote: yeah so its seems like its work in progress. At very least Mesos took the initiative to provide alternatives to ZK. I am just really looking forward for this. https://issues.a

Re: What do I lose if I run spark without using HDFS or Zookeeper?

2016-08-25 Thread kant kodali
also uses ZK for leader election. There seems to be some effort in supporting etcd, but it's in progress: https://issues.apache.org/jira/browse/MESOS-1806 On Thu, Aug 25, 2016 at 1:55 PM, kant kodali < kanth...@gmail.com > wrote: @Ofir @Sean very good points. @Mike We dont use Kafka or Hive

Re: What do I lose if I run spark without using HDFS or Zookeeper?

2016-08-25 Thread kant kodali
exist that don't read/write data. The premise here is not just replication, but partitioning data across compute resources. With a distributed file system, your big input exists across a bunch of machines and you can send the work to the pieces of data. On Thu, Aug 25, 2016 at 7:57 PM, kant kodali <

Re: quick question

2016-08-25 Thread kant kodali
twork/tutorials/obe/java/HomeWebsocket/WebsocketHome.html#section7 ) Regards, Sivakumaran S On 25-Aug-2016, at 8:09 PM, kant kodali < kanth...@gmail.com > wrote: Your assumption is right (thats what I intend to do). My driver code will be in Java. The link sent by Kevin is a API reference to w

Re: quick question

2016-08-25 Thread kant kodali
format the data in the way your client (dashboard) requires it and write it to the websocket. Is your driver code in Python? The link Kevin has sent should start you off. Regards, Sivakumaran On 25-Aug-2016, at 11:53 AM, kant kodali < kanth...@gmail.com > wrote: yes for now it will be Spark Str

Re: What do I lose if I run spark without using HDFS or Zookeeper?

2016-08-25 Thread kant kodali
able for any monetary damages arising from such loss, damage or destruction. On 24 August 2016 at 21:54, kant kodali < kanth...@gmail.com > wrote: What do I lose if I run spark without using HDFS or Zookeeper? Which of them is almost a must in practice?

Re: quick question

2016-08-25 Thread kant kodali
it should be On 25-Aug-2016, at 7:08 AM, kant kodali < kanth...@gmail.com > wrote: @Sivakumaran when you say create a web socket object in your spark code I assume you meant a spark "task" opening websocket connection from one of the worker machines to some node.js server in that cas

Re: quick question

2016-08-25 Thread kant kodali
ket object in your dashboard code and receive the data in realtime and update the dashboard. You can use Node.js in your dashboard ( socket.io ). I am sure there are other ways too. Does that help? Sivakumaran S On 25-Aug-2016, at 6:30 AM, kant kodali < kanth...@gmail.com > wrote: so I would need to

How to compute a net (difference) given a bi-directional stream of numbers using spark streaming?

2016-08-24 Thread kant kodali
Hi Guys, I am new to spark but I am wondering how I would compute the difference given a bidirectional stream of numbers using spark streaming. To put it more concretely, say Bank A is sending money to Bank B and Bank B is sending money to Bank A throughout the day, such that at any given time we want
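One way to frame the question above: sign each transfer by direction and keep a running sum, so the total at any moment is the net position between the two banks. A Spark-free sketch (the NetFlow class and the A/B encoding are illustrative assumptions; in Spark Streaming this maps naturally onto updateStateByKey/mapWithState over keyed amounts):

```java
public class NetFlow {
    // Positive when Bank A sends to Bank B, negative the other way;
    // the running total at any point is the net owed.
    public static long net(String[][] transfers) {
        long total = 0;
        for (String[] t : transfers) { // t = {from, to, amount}
            long amt = Long.parseLong(t[2]);
            if (t[0].equals("A") && t[1].equals("B")) total += amt;
            else if (t[0].equals("B") && t[1].equals("A")) total -= amt;
        }
        return total;
    }

    public static void main(String[] args) {
        String[][] day = {{"A", "B", "100"}, {"B", "A", "30"}, {"A", "B", "20"}};
        System.out.println(net(day)); // 90: A sent 120, B sent 30
    }
}
```

Because the reduction is just addition, it is associative and commutative, which is exactly what a streaming stateful aggregation needs to apply per batch.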

What do I lose if I run spark without using HDFS or Zookeeper?

2016-08-24 Thread kant kodali
What do I lose if I run spark without using HDFS or Zookeeper? Which of them is almost a must in practice?

Re: Are RDD's ever persisted to disk?

2016-08-23 Thread kant kodali
...@collectivei.com wrote: Can you come up with your complete analysis? A snapshot of what you think the code is doing. May be that would help us understand what exactly you were trying to convey. On Aug 23, 2016, at 4:21 PM, kant kodali < kanth...@gmail.com > wrote: apache/spark spark - Mirror of Apache

Re: Are RDD's ever persisted to disk?

2016-08-23 Thread kant kodali
apache/spark spark - Mirror of Apache Spark github.com On Tue, Aug 23, 2016 4:17 PM, kant kodali kanth...@gmail.com wrote: @RK you may want to look more deeply if you are curious. the code starts from here apache/spark spark - Mirror of Apache Spark github.com and it goes here where

Re: Are RDD's ever persisted to disk?

2016-08-23 Thread kant kodali
, Aug 23, 2016 2:39 PM, RK Aduri rkad...@collectivei.com wrote: I just had a glance. AFAIK, that is nothing do with RDDs. It’s a pickler used to serialize and deserialize the python code. On Aug 23, 2016, at 2:23 PM, kant kodali < kanth...@gmail.com > wrote: @Sean well this makes sense but I

Re: Are RDD's ever persisted to disk?

2016-08-23 Thread kant kodali
erialized representation in memory because it may be more compact. This is not the same as saving/writing an RDD to persistent storage as text or JSON or whatever. On Tue, Aug 23, 2016 at 9:28 PM, kant kodali <kanth...@gmail.com> wrote: > @srkanth are you sure? the whole point of

Re: Are RDD's ever persisted to disk?

2016-08-23 Thread kant kodali
to reconstruct an RDD from its lineage in that case. so this sounds very contradictory to me after reading the spark paper. On Tue, Aug 23, 2016 1:28 PM, kant kodali kanth...@gmail.com wrote: @srkanth are you sure? the whole point of RDD's is to store transformations but not the data as the spark paper

Re: Are RDD's ever persisted to disk?

2016-08-23 Thread kant kodali
this data will be serialized before persisting to disk.. Thanks, Sreekanth Jella From: kant kodali Sent: Tuesday, August 23, 2016 3:59 PM To: Nirav Cc: RK Aduri ; srikanth.je...@gmail.com ; user@spark.apache.org Subject: Re: Are RDD's ever persisted to disk? Storing RDD to disk is nothing

Re: Are RDD's ever persisted to disk?

2016-08-23 Thread kant kodali
. There are different RDD save apis for that. Sent from my iPhone On Aug 23, 2016, at 12:26 PM, kant kodali < kanth...@gmail.com > wrote: ok now that I understand RDD can be stored to the disk. My last question on this topic would be this. Storing RDD to disk is nothing but storing JVM byte code to disk (i

Re: Are RDD's ever persisted to disk?

2016-08-23 Thread kant kodali
en to choose the persistency level. http://spark.apache.org/docs/latest/programming-guide.html#which-storage-level-to-choose Thanks, Sreekanth Jella From: kant kodali Sent: Tuesday, August 23, 2016 2:42 PM To: srikanth.je...@gmail.com Cc: user@spark.apache.org Subject: Re: Are RDD's ever persisted

Re: Are RDD's ever persisted to disk?

2016-08-23 Thread kant kodali
so when do we ever need to persist RDD on disk? given that we don't need to worry about RAM(memory) as virtual memory will just push pages to the disk when memory becomes scarce. On Tue, Aug 23, 2016 11:23 AM, srikanth.je...@gmail.com wrote: Hi Kant Kodali, Based on the input parameter

Are RDD's ever persisted to disk?

2016-08-23 Thread kant kodali
I am new to spark and I keep hearing that RDDs can be persisted to memory or disk after each checkpoint. I wonder why RDDs are persisted in memory. In case of node failure how would you access memory to reconstruct the RDD? Persisting to disk makes sense because it's like persisting to a Network
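The trade-off this thread circles around: an unpersisted RDD is recomputed from its lineage on every action, while a persisted one is computed once and reused; on node failure only the lost partitions are recomputed from lineage. A tiny Spark-free analogy using a hypothetical memo cache (the CacheDemo class and "expensive" function are stand-ins, not Spark internals):

```java
import java.util.HashMap;
import java.util.Map;

public class CacheDemo {
    static int computations = 0;
    static final Map<Integer, Integer> cache = new HashMap<>();

    // Stand-in for a costly transformation chain (the "lineage").
    static int expensive(int x) {
        computations++;
        return x * x;
    }

    // Stand-in for a persisted RDD: computed once, reused from memory.
    static int cached(int x) {
        return cache.computeIfAbsent(x, CacheDemo::expensive);
    }

    static int demo() {
        computations = 0;
        cache.clear();
        expensive(4); // unpersisted: computed
        expensive(4); // unpersisted: computed again on the second "action"
        cached(5);    // persisted: computed once
        cached(5);    // cache hit, no recomputation
        return computations;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 3
    }
}
```

Losing the cache (a failed node) is safe in this model precisely because the function can always be re-run, which is the lineage-based recovery argument from the spark paper quoted elsewhere in this thread.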
