Fwd: Cassandra driver upgrade

2022-01-24 Thread Amit Sharma
I am upgrading my Cassandra Java driver to the latest version, 4.13. My cluster runs Cassandra 3.11.11. I am getting the runtime error below while connecting to Cassandra. Before 4.13 I was using driver version 3.9 and things were working fine. c.d.o.d.i.c.c.ControlConne

Re: What are your experiences using google cloud platform

2022-01-24 Thread Mich Talebzadeh
OK, What configuration do you have for Dataproc master and worker nodes, what machine types are they? What storage have you allocated for each? Have you specified the Cloud Storage staging bucket? Have you considered autoscaling? https://cloud.google.com/dataproc/docs/concepts/configuring-clust

Re: What are your experiences using google cloud platform

2022-01-24 Thread Andrew Davidson
I think my problem has to do with the mega-mem machines. It was hard to get quota for mega-mem machines. I wonder if they are unstable? Any suggestions for how I can look at the ‘hardware’? I ran the same job several times. The runs all failed in different ways. One looked like some sort of networking problem

Re: Spark execution on Hadoop cluster (many nodes)

2022-01-24 Thread Mich Talebzadeh
Hadoop core comprises HDFS (the storage), MapReduce (parallel execution algorithm) and YARN (the resource manager). Spark can use YARN in either cluster or client mode and can use HDFS for temporary or permanent storage. As HDFS is available and accessible in all nodes, Spark can take advantage

Re: Spark execution on Hadoop cluster (many nodes)

2022-01-24 Thread sam smith
Submitting a Spark application to Hadoop with spark-submit (cluster mode) is what I mean by executing on Hadoop. On Mon, 24 Jan 2022 at 18:00, Sean Owen wrote: > I am still not understanding what you mean by "executing on Hadoop". Spark > does not use Hadoop for execution. Probably can't answer until th

Re: Spark execution on Hadoop cluster (many nodes)

2022-01-24 Thread sam smith
I mean the DAG order is somehow altered when executing on Hadoop. On Mon, 24 Jan 2022 at 17:17, Sean Owen wrote: > Code is not executed by Hadoop, nor passed through Hadoop somehow. Do you > mean data? data is read as-is. There is typically no guarantee about > ordering of data in files but y

Re: Spark execution on Hadoop cluster (many nodes)

2022-01-24 Thread Sean Owen
Code is not executed by Hadoop, nor passed through Hadoop somehow. Do you mean data? data is read as-is. There is typically no guarantee about ordering of data in files but you can order data. Still not sure what specifically you are worried about here, but I don't think the kind of thing you're co

triggering spark python app using native REST api

2022-01-24 Thread Michael Williams (SSI)
Hello, for a couple of weeks I've been trying, without success, to replicate the spark-submit CLI execution of a Python app through the native Spark REST API (http://localhost:6066/v1/submissions/create). The environment is Docker, using the latest docker for spark 3.2 ima
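
For readers following this thread, here is a minimal sketch of what a request body for that endpoint can look like. Only the URL comes from the message. The field names follow the standalone master's CreateSubmissionRequest format; the script path, master URL, version string, and the idea of launching a Python script through `org.apache.spark.deploy.SparkSubmit` are assumptions for illustration, and the thread does not confirm that standalone cluster mode accepts a Python app this way.

```python
import json
import urllib.request

# URL taken from the message; every other value below is a hypothetical placeholder.
SUBMIT_URL = "http://localhost:6066/v1/submissions/create"


def build_submission(app_path, master_url, spark_version, app_args=()):
    """Build a CreateSubmissionRequest body for a hypothetical Python app."""
    return {
        "action": "CreateSubmissionRequest",
        "appResource": app_path,
        "clientSparkVersion": spark_version,
        # Assumption: run SparkSubmit as the main class and pass the script
        # as its first argument. Unverified for Python apps on standalone.
        "mainClass": "org.apache.spark.deploy.SparkSubmit",
        "appArgs": [app_path, *app_args],
        "environmentVariables": {"SPARK_ENV_LOADED": "1"},
        "sparkProperties": {
            "spark.app.name": "rest-submit-sketch",
            "spark.master": master_url,
            "spark.submit.deployMode": "cluster",
        },
    }


def submit(payload, url=SUBMIT_URL):
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json;charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Build (but do not send) a request so the body can be inspected first.
payload = build_submission("file:///opt/app/main.py", "spark://localhost:7077", "3.2.0")
body = json.dumps(payload, indent=2)
```

Calling `submit(payload)` sends the request. Printing `body` and comparing it with what spark-submit would send is a cheap first debugging step. In recent Spark releases the REST server is off unless `spark.master.rest.enabled` is set to true on the master.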

Re: Spark execution on Hadoop cluster (many nodes)

2022-01-24 Thread sam smith
I am aware of that, but whenever the chunks of code are returned to Spark from Hadoop (after processing), could they come back out of order? Could this ever happen? On Mon, 24 Jan 2022 at 16:14, Sean Owen wrote: > Hadoop does not run Spark programs, Spark does. How or why would > so

Re: Spark execution on Hadoop cluster (many nodes)

2022-01-24 Thread Sean Owen
Hadoop does not run Spark programs, Spark does. How or why would something, what, modify the byte code? No On Mon, Jan 24, 2022, 9:07 AM sam smith wrote: > My point is could Hadoop go wrong about one Spark execution ? meaning that > it gets confused (given the concurrent distributed tasks) and t

Re: Spark execution on Hadoop cluster (many nodes)

2022-01-24 Thread sam smith
My point is: could Hadoop go wrong during a Spark execution? Meaning that it gets confused (given the concurrent distributed tasks) and then adds a wrong instruction to the program, or maybe executes an instruction out of order (shuffling the order of execution by executing previous on

Re: Spark execution on Hadoop cluster (many nodes)

2022-01-24 Thread Sean Owen
Not clear what you mean here. A Spark program is a program, so what are the alternatives here? Program execution order is still program execution order. You are not guaranteed anything about the order of concurrent tasks. Failed tasks can be re-executed, so they should be idempotent. I think the answer is 'no
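
To make the idempotency point concrete, here is a small plain-Python sketch (no Spark involved) in which a retry is simulated by running each task body twice. The function names and the dict-based "store" are hypothetical stand-ins for whatever external system a task writes to.

```python
def append_result(store, value):
    """Not idempotent: every attempt adds another copy of the value."""
    store.setdefault("rows", []).append(value)


def put_result(store, task_id, value):
    """Idempotent: a retry overwrites the same key with the same value."""
    store[task_id] = value


non_idempotent = {}
idempotent = {}

# Simulate a task that fails after its side effect and is then re-executed.
for attempt in range(2):
    append_result(non_idempotent, 42)
    put_result(idempotent, "task-0", 42)
```

After the simulated retry, the appending version holds the value twice while the keyed version holds it once. That is why side effects inside tasks are usually keyed by something stable, such as a task or partition identifier.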

Spark execution on Hadoop cluster (many nodes)

2022-01-24 Thread sam smith
Hello guys, I hope my question does not sound weird, but could a Spark execution on a Hadoop cluster give a different output than the program itself would produce? By that I mean the execution order gets mixed up by Hadoop, or an instruction gets executed twice? Thanks for your enlightenment

Re: may I need a join here?

2022-01-24 Thread Gary Liu
You can use a left anti join instead. isin accepts a list, not a column. On Mon, Jan 24, 2022 at 01:38 Bitfox wrote: > >>> df.show(3) > > ++-+ > > |word|count| > > ++-+ > > | on|1| > > | dec|1| > > |2020|1| > > ++-+ > > only showing top 3 rows > > > >>
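
As a rough illustration of the suggestion, a left anti join keeps the rows of the left table whose key has no match in the right table. In PySpark that is typically written as `df.join(other, on="word", how="left_anti")`. The sketch below shows the same semantics in plain Python so it runs without a cluster. The second table (`stop_words`) is hypothetical, since the quoted message is cut off before it appears.

```python
def left_anti_join(left_rows, right_rows, key):
    """Return the left rows whose key value does not occur in any right row."""
    right_keys = {row[key] for row in right_rows}
    return [row for row in left_rows if row[key] not in right_keys]


# Rows mirroring the df.show(3) output quoted in the message.
words = [
    {"word": "on", "count": 1},
    {"word": "dec", "count": 1},
    {"word": "2020", "count": 1},
]
# Hypothetical second table; not taken from the thread.
stop_words = [{"word": "on"}, {"word": "the"}]

kept = left_anti_join(words, stop_words, "word")
```

Here `kept` contains the rows for "dec" and "2020". Unlike `isin`, the join takes its exclusion keys from another table rather than from a Python list, so the keys never have to be collected to the driver.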

Re: What are your experiences using google cloud platform

2022-01-24 Thread Mich Talebzadeh
Dataproc works fine. The current version is Spark 3.1.2. Look at your code, hardware and scaling. HTH

Re: What happens when a partition that holds data under a task fails

2022-01-24 Thread Mich Talebzadeh
Hm, I don't see what partition failure means here. You can have a node or executor failure etc. So let us look at a scenario here, irrespective of whether it is streaming or micro-batch. Spark replicates the partitions among multiple nodes. *If one executor fails*, it moves the processing over to the ot