Here is a UI of my thread dump.
http://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTYvMTEvMS8tLWpzdGFja19kdW1wX3dpbmRvd19pbnRlcnZhbF8xbWluX2JhdGNoX2ludGVydmFsXzFzLnR4dC0tNi0xNy00Ng==
On Mon, Oct 31, 2016 at 7:10 PM, kant kodali <kanth...@gmail.com> wrote:
> Hi Ryan,
>
Here is a UI of my thread dump.
http://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTYvMTEvMS8tLWpzdGFja19kdW1wX3dpbmRvd19pbnRlcnZhbF8xbWluX2JhdGNoX2ludGVydmFsXzFzLnR4dC0tNi0xNy00Ng==
On Mon, Oct 31, 2016 at 10:32 PM, kant kodali <kanth...@gmail.com> wrote:
> Hi Vadim,
>
me...@datadoghq.com>
wrote:
> Have you tried to get the number of threads in a running process using `cat
> /proc/<pid>/status`?
>
> On Sun, Oct 30, 2016 at 11:04 PM, kant kodali <kanth...@gmail.com> wrote:
>
>> Yes, I did run ps -ef | grep "app_name" and it is root.
use `jstack` to find out
> the name of leaking threads?
>
> On Mon, Oct 31, 2016 at 12:35 PM, kant kodali <kanth...@gmail.com> wrote:
>
>> Hi Ryan,
>>
>> It happens on the driver side and I am running in client mode (not
>> cluster mode).
>>
>>
if the leaked threads are on the driver side.
>
> Does it happen in the driver or executors?
>
> On Mon, Oct 31, 2016 at 12:20 PM, kant kodali <kanth...@gmail.com> wrote:
>
>> Hi Ryan,
>>
>> Ahh, my Receiver.onStop method is currently empty.
>>
>> 1) I hav
Which types of threads are leaking?
>
> On Mon, Oct 31, 2016 at 11:50 AM, kant kodali <kanth...@gmail.com> wrote:
>
>> I am also under the assumption that the *onStart* function of the Receiver is
>> only called once by Spark. Please correct me if I am wrong.
>>
&
I am also under the assumption that the *onStart* function of the Receiver is
only called once by Spark. Please correct me if I am wrong.
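A minimal sketch of the receiver shape under discussion, assuming the Spark 2.0 receiver API; the class name and "tick" payload are made up. The point is that whatever onStart() spawns, onStop() should shut down, otherwise every receiver restart can leak a thread:

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class MyReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
  @volatile private var worker: Thread = _

  override def onStart(): Unit = {
    worker = new Thread("my-receiver-loop") {
      override def run(): Unit = {
        while (!isStopped()) {
          store("tick")      // placeholder: read from the real source here
          Thread.sleep(1000) // pace the placeholder loop
        }
      }
    }
    worker.start() // onStart() must return quickly; the work runs on the thread
  }

  override def onStop(): Unit = {
    if (worker != null) worker.interrupt() // an empty onStop() leaks this thread
  }
}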
On Mon, Oct 31, 2016 at 11:35 AM, kant kodali <kanth...@gmail.com> wrote:
> My driver program runs a spark streaming job. And it spawns a thread by
y?
>
> This may depend on your driver program. Do you spawn any threads in
> it? Could you share some more information on the driver program, spark
> version and your environment? It would greatly help others to help you.
>
> On Mon, Oct 31, 2016 at 3:47 AM, kant kodali <kanth...@gma
so many?
On Mon, Oct 31, 2016 at 3:25 AM, Sean Owen <so...@cloudera.com> wrote:
> ps -L [pid] is what shows threads. I am not sure this is counting what you
> think it does. My shell process has about a hundred threads, and I can't
> imagine why one would have thousands unless your ap
when I do
ps -elfT | grep "spark-driver-program.jar" | wc -l
the result is around 32K. Why does it create so many threads, and how can I
limit this?
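As a cross-check on the ps numbers, a small sketch (not from the thread) that counts live threads inside the driver JVM and groups them by name pattern, so a leaking pool stands out; it uses only the standard ThreadMXBean:

import java.lang.management.ManagementFactory

object ThreadCensus {
  def main(args: Array[String]): Unit = {
    val bean = ManagementFactory.getThreadMXBean
    println(s"live JVM threads: ${bean.getThreadCount}")
    // group thread names, collapsing digits so "pool-1-thread-42" buckets together
    bean.getThreadInfo(bean.getAllThreadIds)
        .filter(_ != null)
        .groupBy(_.getThreadName.replaceAll("\\d+", "N"))
        .toSeq.sortBy(-_._2.length)
        .take(10)
        .foreach { case (name, group) => println(f"${group.length}%6d  $name") }
  }
}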
I have no idea,
>
> but I am still suspecting the user,
> as the user who runs spark-submit is not necessarily the owner of the JVM
> process.
>
> Can you make sure that when you run "ps -ef | grep {your app id}" the owner
> of the PID is root?
> On 10/31/16 11:21 AM, kant kodali wrote:
>
> The ja
ur setting, the spark job may execute as
> another user.
>
>
> On 10/31/16 10:38 AM, kant kodali wrote:
>
> when I did this
>
> cat /proc/sys/kernel/pid_max
>
> I got 32768
>
> On Sun, Oct 30, 2016 at 6:36 PM, kant kodali <kanth...@gmail.com> wrote:
>
>> I believ
when I did this
cat /proc/sys/kernel/pid_max
I got 32768
On Sun, Oct 30, 2016 at 6:36 PM, kant kodali <kanth...@gmail.com> wrote:
> I believe for Ubuntu it is unlimited, but I am not 100% sure (I just read it
> somewhere online). I ran ulimit -a and this is what I get
>
&
if the corresponding user is busy
> in some other way,
> the JVM process will still not be able to create a new thread.
>
> btw the default limit for CentOS is 1024
>
>
> On 10/31/16 9:51 AM, kant kodali wrote:
>
>
> On Sun, Oct 30, 2016 at 5:22 PM, Chan Chor Pang <chin...@indetail.co.
On Sun, Oct 30, 2016 at 5:22 PM, Chan Chor Pang wrote:
> /etc/security/limits.d/90-nproc.conf
>
Hi,
I am using Ubuntu 16.04 LTS. I have this directory /etc/security/limits.d/
but I don't have any files underneath it. This error happens after running
for 4 to 5 hours. I
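For reference, an entry in a limits.d file has the form <domain> <type> <item> <value>; a hypothetical 90-nproc.conf raising the per-user thread/process cap might look like the following (the user name and numbers are placeholders, not recommendations from the thread):

# /etc/security/limits.d/90-nproc.conf (hypothetical example)
sparkuser  soft  nproc  32768
sparkuser  hard  nproc  65536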
Another thing I forgot to mention is that it happens after running for
several hours (say 4 to 5 hours). I am not sure why it is creating so many
threads. Any way to control them?
On Fri, Oct 28, 2016 at 12:47 PM, kant kodali <kanth...@gmail.com> wrote:
> "dag-schedu
"dag-scheduler-event-loop" java.lang.OutOfMemoryError: unable to create
new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at scala.concurrent.forkjoin.ForkJoinPool.tryAddWorker(
ForkJoinPool.java:1672)
at
Hi Guys,
My Spark Streaming client program works fine as long as the receiver
receives data, but say my receiver has no more data to receive for a few
hours (4-5 hours) and then starts receiving data again; at that point the
spark client program doesn't seem to process any data. It
>>> launcher, driver and workers could lead to the bug you're seeing. A common
>>> reason for a mismatch is if the SPARK_HOME environment variable is set.
>>> This will cause the spark-submit script to use the launcher determined by
>>> that environment variable, regardless of the
perfect! That fixes it all!
On Fri, Oct 7, 2016 1:29 AM, Denis Bolshakov bolshakov.de...@gmail.com
wrote:
You need to have spark-sql, now you are missing it.
On 7 Oct 2016 at 11:12, "kant kodali" <kanth...@gmail.com> wrote:
Here are the jar files on my classpath
I am running locally so they all are on one host
On Wed, Oct 5, 2016 3:12 PM, Jakob Odersky ja...@odersky.com
wrote:
Are all spark and scala versions the same? By "all" I mean the master, worker
and driver instances.
On 27 September 2016 at 08:12, kant kodali <kanth...@gmail.com> wrote:
What is the difference between mini-batch vs real-time streaming in practice
(not theory)? In theory, I understand mini-batch is something that batches within
the given time frame, whereas real-time streaming is more like doing something as
the data arrives, but my biggest question is why not have mini
Hi Guys,
I have a bunch of data coming into my spark streaming cluster from a message
queue (not Kafka), and this message queue guarantees at-least-once delivery only,
so there is a potential that some of the messages coming into the spark
streaming cluster are actually duplicates, and I am trying
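One common shape for this (a sketch, not from the thread) is to key each message by a unique id and drop ids already seen, using mapWithState (Spark 1.6+); it assumes each record carries a unique id and that checkpointing is enabled, and in practice you would also put a timeout on the state:

import org.apache.spark.streaming.{State, StateSpec}
import org.apache.spark.streaming.dstream.DStream

case class Msg(id: String, body: String) // hypothetical record shape

def dedup(msgs: DStream[Msg]): DStream[Msg] = {
  val spec = StateSpec.function(
    (id: String, msg: Option[Msg], state: State[Boolean]) => {
      if (state.exists) None           // id seen before: duplicate, drop it
      else { state.update(true); msg } // first sighting: pass through
    })
  msgs.map(m => (m.id, m)).mapWithState(spec).flatMap(m => m)
}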
Hi All,
I am trying to simplify how to frame my question, so below is my code. I see
that BAR gets printed but not FOO and I am not sure why. My batch interval
is 1 second (something I pass in when I create the spark context). Any idea?
I have a bunch of events and I want to store the number of events
how to fix this,
or what am I missing?
Any help would be great. Thanks!
On Sat, Sep 3, 2016 5:39 PM, kant kodali kanth...@gmail.com
wrote:
Hi Guys,
I am running my driver program on my local machine and my spark cluster is on
AWS. The big question is I don't know what are the right settings to get around
this public and private IP thing on AWS? My spark-env.sh currently has the
following lines:
export
@Fridtjof you are right!
Changing it to this fixed it:
compile group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.0.0'
compile group: 'org.apache.spark', name: 'spark-streaming_2.11', version: '2.0.0'
On Sat, Sep 3, 2016 12:30 PM, kant kodali kanth...@gmail.com
wrote:
I increased
2016, 11:42 kant kodali <kanth...@gmail.com> wrote:
I am running this on AWS.
On Fri, Sep 2, 2016 11:49 PM, kant kodali kanth...@gmail.com
wrote:
I am running spark in standalone mode. I get this error when I run my driver
program... I am using spark 2.0.0. Any idea?
C'mon man, this is a no-brainer. Dynamically typed languages for large code bases
or large-scale distributed systems make absolutely no sense. I can write a 10-page
essay on why that wouldn't work so great. You might be wondering why spark would
have it then? Well, probably because of its ease of use for
How to attach a ReceiverSupervisor for a Custom receiver in Spark Streaming?
java.lang.RuntimeException: java.lang.AssertionError: assertion failed: A
ReceiverSupervisor has not been attached to the receiver yet. Maybe you are
starting some computation in the receiver before the Receiver.onStart() has been
called.
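That assertion usually means the receiver kicked off work before onStart() ran, e.g. in its constructor; per the message, the supervisor is attached between construction and onStart(). A minimal sketch of the safe shape (names are hypothetical):

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class SafeReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) {
  // constructor body: set fields only, start no computation here
  override def onStart(): Unit = {
    new Thread(new Runnable {
      override def run(): Unit = store("data") // safe: supervisor is attached now
    }).start()
  }
  override def onStop(): Unit = ()
}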
I understand that I cannot use the spark streaming window operation without
checkpointing to HDFS, but without a window operation I don't think we can do
much with spark streaming. So, since it is very essential, can I use Cassandra
as a distributed storage? If so, can I see an example of how I can tell the
spark cluster to use Cassandra for checkpointing and so on, if at all?
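For reference, wiring up checkpointing itself is one call against an HDFS-compatible URI; a minimal sketch with placeholder addresses (plain Cassandra does not expose a Hadoop-FileSystem-compatible API, so it cannot be dropped in directly as a checkpoint target):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("checkpoint-example").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("hdfs://namenode:8020/spark/checkpoints") // placeholder URI
    // ... define sources and windowed operations here, then ssc.start()
  }
}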
On Fri, Aug 26, 2016 9:50 AM, Steve Loughran ste...@hortonworks.com wrote:
On 26 Aug 2016, at 12:58, kant kodali < kanth...@gmail.com > wrote:
@Steve your arguments make sense h
ays communicate using HTTP. HTTP2 for
better performance.
On Fri, Aug 26, 2016 2:47 PM, kant kodali kanth...@gmail.com wrote:
HTTP2 for fully pipelined out-of-order execution. In other words, I should be able
to send multiple requests through the same TCP connection, and by out-of-order
execution I m
p/2? #curious
[1] http://bahir.apache.org/
Pozdrawiam,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
On Fri, Aug 26, 2016 at 9:42 PM, kant kodali < kanth...@gmail.com &
Is there an HTTP2 (v2) endpoint for Spark Streaming?
Fixed... I just had to log out and log back in on the master node, for some reason.
On Fri, Aug 26, 2016 5:32 AM, kant kodali kanth...@gmail.com wrote:
Hi,
I am unable to start spark slaves from my master node. When I run ./start-all.sh
on my master node, it brings up the master but fails for the slaves
On 26 August 2016 at 12:58, kant kodali < kanth...@gmail.com > wrote:
@Steve your arguments make sense; however, there is a good majority of people who
have extensive experience with zookeeper and prefer to avoid it, and given the
ease of consul
hadoopConf.set("fs.s3.awsAccessKeyId", AccessKey)
hadoopConf.set("fs.s3.awsSecretAccessKey", SecretKey)
var jobInput = sc.textFile("s3://path to bucket")
Thanks
On Fri, Aug 26, 2016 at 5:16 PM, kant kodali < kanth...@gmail.com > wrote:
Hi guys,
Are there any instructions on how to setup spark with S3 on AWS?
Thanks!
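A fuller sketch of the same idea with the s3a connector (assuming hadoop-aws and the AWS SDK are on the classpath; the keys, bucket and path are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

object S3Example {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("s3-example"))
    sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
    sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")
    println(sc.textFile("s3a://my-bucket/some/path/").count()) // sanity check
  }
}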
Hi,
I am unable to start spark slaves from my master node. When I run
./start-all.sh on my master node, it brings up the master but fails for the
slaves, saying "permission denied (public key)", but I did add the master's
id_rsa.pub to my slaves' authorized_keys and I checked manually from
my
On 25 Aug 2016, at 22:49, kant kodali < kanth...@gmail.com > wrote:
Yeah, so it seems like it's work in progress. At the very least Mesos took the
initiative to provide alternatives to ZK. I am just really looking forward to
this.
https://issues.apache.org/jira/browse/MESOS-3797
I worry abo
The ZFS linux port has gotten very stable these days, given that LLNL maintains
the linux port and they also use it as the filesystem for their supercomputer
(one of the top in the nation, is what I heard).
On Thu, Aug 25, 2016 4:58 PM, kant kodali kanth...@gmail.com wrote:
How about
or NFS will not be able to provide that.
On 26 Aug 2016 07:49, "kant kodali" < kanth...@gmail.com > wrote:
Yeah, so it seems like it's work in progress. At the very least Mesos took the
initiative to provide alternatives to ZK. I am just really looking forward to
this.
https://issues.a
also uses ZK for leader election. There seems to be some effort in
supporting etcd, but it's in progress:
https://issues.apache.org/jira/browse/MESOS-1806
On Thu, Aug 25, 2016 at 1:55 PM, kant kodali < kanth...@gmail.com > wrote:
@Ofir @Sean very good points.
@Mike We don't use Kafka or Hive
exist that don't read/write data.
The premise here is not just replication, but partitioning data across compute
resources. With a distributed file system, your big input exists across a bunch
of machines and you can send the work to the pieces of data.
On Thu, Aug 25, 2016 at 7:57 PM, kant kodali <
twork/tutorials/obe/java/HomeWebsocket/WebsocketHome.html#section7
)
Regards,
Sivakumaran S
On 25-Aug-2016, at 8:09 PM, kant kodali < kanth...@gmail.com > wrote:
Your assumption is right (that's what I intend to do). My driver code will be in
Java. The link sent by Kevin is an API reference to w
format the data in the way your client
(dashboard) requires it and write it to the websocket.
Is your driver code in Python? The link Kevin has sent should start you off.
Regards,
Sivakumaran
On 25-Aug-2016, at 11:53 AM, kant kodali < kanth...@gmail.com > wrote:
yes for now it will be Spark Str
On 24 August 2016 at 21:54, kant kodali < kanth...@gmail.com > wrote:
What do I lose if I run spark without using HDFS or Zookeeper? Which of them is
almost a must in practice?
it should be
On 25-Aug-2016, at 7:08 AM, kant kodali < kanth...@gmail.com > wrote:
@Sivakumaran when you say create a websocket object in your spark code, I assume
you mean a spark "task" opening a websocket connection from one of the worker
machines to some node.js server; in that cas
ket object in your dashboard code and receive the
data in real time and update the dashboard. You can use Node.js in your dashboard
(socket.io). I am sure there are other ways too.
Does that help?
Sivakumaran S
On 25-Aug-2016, at 6:30 AM, kant kodali < kanth...@gmail.com > wrote:
so I would need to
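A sketch of the pattern being described: bring each batch's results back to the driver and hand them to whatever pushes to the dashboard's socket server. pushToDashboard is a stand-in (it just prints) so the sketch runs; a real version would write to a websocket client instead:

import org.apache.spark.streaming.dstream.DStream

object DashboardWiring {
  def pushToDashboard(record: String): Unit =
    println(s"-> dashboard: $record") // placeholder for a websocket write

  def wireUp(results: DStream[String]): Unit =
    results.foreachRDD { rdd =>
      rdd.take(100).foreach(pushToDashboard) // sample each batch on the driver
    }
}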
Hi Guys,
I am new to spark, but I am wondering how I would compute the difference given a
bidirectional stream of numbers using spark streaming. To put it more concretely,
say Bank A is sending money to Bank B and Bank B is sending money to Bank A
throughout the day, such that at any given time we want
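One way to frame the bank example (a sketch; the field names are made up, and updateStateByKey needs checkpointing enabled): sign each transfer, + for A->B and - for B->A, and keep a running net total across batches:

import org.apache.spark.streaming.dstream.DStream

object NetFlow {
  def netDifference(transfers: DStream[(String, Double)]): DStream[Double] =
    transfers
      .map { case ("AtoB", amount) => amount
             case ("BtoA", amount) => -amount
             case _                => 0.0 } // ignore malformed records
      .map(("net", _)) // a single key carries the running total
      .updateStateByKey[Double] { (batch: Seq[Double], prev: Option[Double]) =>
        Some(prev.getOrElse(0.0) + batch.sum)
      }
      .map(_._2)
}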
What do I lose if I run spark without using HDFS or Zookeeper? Which of them is
almost a must in practice?
...@collectivei.com wrote:
Can you come up with your complete analysis? A snapshot of what you think the
code is doing. Maybe that would help us understand what exactly you were trying
to convey.
On Aug 23, 2016, at 4:21 PM, kant kodali < kanth...@gmail.com > wrote:
On Tue, Aug 23, 2016 4:17 PM, kant kodali kanth...@gmail.com wrote:
@RK you may want to look more deeply if you are curious. the code starts from
here
and it goes here where
, Aug 23, 2016 2:39 PM, RK Aduri rkad...@collectivei.com wrote:
I just had a glance. AFAIK, that has nothing to do with RDDs. It's a pickler used
to serialize and deserialize the python code.
On Aug 23, 2016, at 2:23 PM, kant kodali < kanth...@gmail.com > wrote:
@Sean
well this makes sense but I
serialized
representation in memory because it may be more compact.
This is not the same as saving/writing an RDD to persistent storage as
text or JSON or whatever.
On Tue, Aug 23, 2016 at 9:28 PM, kant kodali <kanth...@gmail.com> wrote:
> @srikanth are you sure? the whole point of
to reconstruct an RDD
from its lineage in that case, so this sounds very contradictory to me after
reading the spark paper.
On Tue, Aug 23, 2016 1:28 PM, kant kodali kanth...@gmail.com wrote:
@srikanth are you sure? The whole point of RDDs is to store transformations but
not the data, as the spark paper
this data will be serialized before persisting to disk.
Thanks,
Sreekanth Jella
From: kant kodali
Sent: Tuesday, August 23, 2016 3:59 PM
To: Nirav
Cc: RK Aduri ; srikanth.je...@gmail.com ; user@spark.apache.org
Subject: Re: Are RDD's ever persisted to disk?
Storing RDD to disk is nothing
. There are different RDD save APIs for that.
Sent from my iPhone
On Aug 23, 2016, at 12:26 PM, kant kodali < kanth...@gmail.com > wrote:
OK, now that I understand an RDD can be stored to disk, my last question on this
topic would be this:
Storing an RDD to disk is nothing but storing JVM byte code to disk (i
when to choose the persistency level:
http://spark.apache.org/docs/latest/programming-guide.html#which-storage-level-to-choose
Thanks,
Sreekanth Jella
From: kant kodali
Sent: Tuesday, August 23, 2016 2:42 PM
To: srikanth.je...@gmail.com
Cc: user@spark.apache.org
Subject: Re: Are RDD's ever persisted
So when do we ever need to persist an RDD on disk, given that we don't need to
worry about RAM (memory), as virtual memory will just push pages to disk when
memory becomes scarce?
On Tue, Aug 23, 2016 11:23 AM, srikanth.je...@gmail.com wrote:
Hi Kant Kodali,
Based on the input parameter
I am new to spark and I keep hearing that RDDs can be persisted to memory or
disk after each checkpoint. I wonder why RDDs are persisted in memory. In case
of node failure, how would you access memory to reconstruct the RDD? Persisting
to disk makes sense because it's like persisting to a Network
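A small sketch of the distinction this thread keeps circling (paths are placeholders): persist() caches materialized partitions for reuse within one application, and lineage rebuilds lost ones on failure, while the save APIs write real files out:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistVsSave {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("persist-vs-save"))
    val rdd = sc.textFile("input.txt").map(_.toUpperCase)
    rdd.persist(StorageLevel.MEMORY_AND_DISK) // cache; spills to disk if RAM is short
    println(rdd.count())                      // first action materializes the cache
    rdd.saveAsTextFile("output-dir")          // separate concern: a durable external copy
  }
}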