Re: Parallel write to different partitions

2023-09-21 Thread Shrikant Prasad
Found this issue reported earlier but was bulk closed: https://issues.apache.org/jira/browse/SPARK-27030 Regards, Shrikant On Fri, 22 Sep 2023 at 12:03 AM, Shrikant Prasad wrote: > Hi all, > > We have multiple spark jobs running in parallel trying to write into same > hive table

Parallel write to different partitions

2023-09-21 Thread Shrikant Prasad
Hi all, We have multiple spark jobs running in parallel trying to write into the same Hive table, but with each job writing into a different partition. This was working fine with Spark 2.3 and Hadoop 2.7, but after upgrading to Spark 3.2 and Hadoop 3.2.2, these parallel jobs are failing with FileNotFound
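A common cause of this failure pattern is parallel jobs racing on shared staging/temporary directories under the table root. A minimal sketch of one mitigation, assuming each job targets exactly one partition (table names, the partition column `dt`, and the config choice are assumptions, not from the thread):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical job: each parallel writer owns exactly one partition.
// With dynamic partition overwrite, only the partitions a job actually
// writes are replaced, so concurrent jobs touch disjoint directories
// instead of all cleaning up the same table-level staging dir.
object PartitionWriter {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-writer")
      .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
      .getOrCreate()

    val dt = args(0) // e.g. "2023-09-21"; partition value per job
    val df = spark.table("source_db.events").where(s"dt = '$dt'")

    // Overwrite only this job's partition of the target table.
    df.write.mode("overwrite").insertInto("target_db.events_by_dt")

    spark.stop()
  }
}
```

Whether this avoids the FileNotFound race depends on the committer in use; it is a sketch of one direction to investigate, not a confirmed fix for SPARK-27030.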

Re: Spark migration from 2.3 to 3.0.1

2023-01-02 Thread Shrikant Prasad
inside main(), not a > member. > Or what other explanation do you have? I don't understand. > > On Mon, Jan 2, 2023 at 10:10 AM Shrikant Prasad > wrote: > >> If that was the case and deserialized session would not work, the >> application would not have worked. >> &

Re: Spark migration from 2.3 to 3.0.1

2023-01-02 Thread Shrikant Prasad
: > It silently allowed the object to serialize, though the > serialized/deserialized session would not work. Now it explicitly fails. > > On Mon, Jan 2, 2023 at 9:43 AM Shrikant Prasad > wrote: > >> That's right. But the serialization would be happening in Spark 2.3

Re: Spark migration from 2.3 to 3.0.1

2023-01-02 Thread Shrikant Prasad
You're trying to use TestMain methods > in your program. > This was never correct, but now it's an explicit error in Spark 3. The > session should not be a member variable. > > On Mon, Jan 2, 2023 at 9:24 AM Shrikant Prasad > wrote: > >> Please see these logs. The error is thrown

Re: Spark migration from 2.3 to 3.0.1

2023-01-02 Thread Shrikant Prasad
executor; that's not the issue. See your stack > trace, where it clearly happens in the driver. > > On Mon, Jan 2, 2023 at 8:58 AM Shrikant Prasad > wrote: > >> Even if I set the master as yarn, it will not have access to rest of the >> spark confs. It will need spark.yarn.app.id.

Re: Spark migration from 2.3 to 3.0.1

2023-01-02 Thread Shrikant Prasad
: > So call .setMaster("yarn"), per the error > > On Mon, Jan 2, 2023 at 8:20 AM Shrikant Prasad > wrote: > >> We are running it in cluster deploy mode with yarn. >> >> Regards, >> Shrikant >> >> On Mon, 2 Jan 2023 at 6:15 PM, Stelios Ph

Re: Spark migration from 2.3 to 3.0.1

2023-01-02 Thread Shrikant Prasad
according to where you want to run this > > On Mon, 2 Jan 2023 at 14:38, Shrikant Prasad > wrote: > >> Hi, >> >> I am trying to migrate one spark application from Spark 2.3 to 3.0.1. >> >> The issue can be reproduced using below sample code: >>

Spark migration from 2.3 to 3.0.1

2023-01-02 Thread Shrikant Prasad
at TestMain$.(TestMain.scala) From the exception it appears that it tries to create the spark session on the executor also in Spark 3, whereas it's not created again on the executor in Spark 2.3. Can anyone help in identifying why there is this change in behavior? Thanks and Regards, Shrikant -- Regards, Shrikant Prasad
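The fix discussed in this thread (session built inside `main()`, not as an object member) can be sketched as follows; the `TestMain$.<init>` frame in the stack trace is consistent with a member field being initialized during object deserialization on executors, where no session can be created. Names and the sample DataFrame are illustrative:

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the corrected shape: the SparkSession is a local value inside
// main(), so it is created only on the driver and never captured in the
// serialized object graph shipped to executors.
object TestMain {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TestMain")
      .getOrCreate() // master comes from spark-submit, not hard-coded

    import spark.implicits._
    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")
    println(df.count())
    spark.stop()
  }
}
```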

Re: sequence file write

2022-11-14 Thread Shrikant Prasad
I have tried with that also. It gives the same exception: ClassNotFoundException: sequencefile.DefaultSource Regards, Shrikant On Mon, 14 Nov 2022 at 6:35 PM, Jie Han wrote: > It seems that the name is “sequencefile”. > > > 2022年11月14日 20:59, Shrikant Prasad wrote: > >

sequence file write

2022-11-14 Thread Shrikant Prasad
error with Spark 3.2. Is there any change in sequence file support in 3.2, or is any code change required to make it work? Thanks and regards, Shrikant -- Regards, Shrikant Prasad
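The `ClassNotFoundException: sequencefile.DefaultSource` reported later in this thread suggests there is no built-in "sequencefile" DataFrame source being found. A minimal sketch of writing a sequence file through the RDD API instead (paths and data are placeholders):

```scala
import org.apache.spark.sql.SparkSession

// Sequence files can be written from a pair RDD of Writable-convertible
// types via saveAsSequenceFile, without any DataFrame data source.
object SeqFileWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("seqfile").getOrCreate()
    val sc = spark.sparkContext

    val pairs = sc.parallelize(Seq(("k1", "v1"), ("k2", "v2")))
    pairs.saveAsSequenceFile("/tmp/seqfile-out") // example output path

    spark.stop()
  }
}
```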

Re: Dynamic allocation on K8

2022-10-27 Thread Shrikant Prasad
that dynamic allocation is available, however I am not sure how > it works. Spark official docs > <https://spark.apache.org/docs/latest/running-on-kubernetes.html#future-work> > say that shuffle service is not yet available. > > Thanks > > Nikhil > -- Regards, Shrikant Prasad
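For the question quoted above: since Spark 3.0, dynamic allocation can run on Kubernetes without an external shuffle service by tracking shuffle files on executors. A configuration sketch (the app itself is a placeholder):

```scala
import org.apache.spark.sql.SparkSession

// Shuffle tracking keeps executors holding shuffle data alive instead of
// relying on an external shuffle service, which K8s does not provide.
object K8sDynamicAlloc {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("k8s-dynamic-alloc")
      .config("spark.dynamicAllocation.enabled", "true")
      .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
      .config("spark.dynamicAllocation.maxExecutors", "20") // example cap
      .getOrCreate()

    spark.range(1000000).selectExpr("sum(id)").show()
    spark.stop()
  }
}
```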

Spark on k8s issues with s3a committer dependencies or config?

2022-03-19 Thread Prasad Paravatha
reading from s3 works, I am getting error 403 access denied while writing to the KMS-enabled bucket. I am wondering if I am missing some dependency jars or client configuration properties. I would appreciate your help if someone can give me a few pointers on this. Regards, Prasad Paravatha
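One possibility worth checking for a 403 on writes to a KMS-encrypted bucket is that the S3A client is not requesting SSE-KMS (or the role lacks `kms:GenerateDataKey` on the key). A sketch of the relevant S3A settings; the key ARN and bucket are placeholders:

```scala
import org.apache.spark.sql.SparkSession

// Request SSE-KMS on S3A writes. Reads of encrypted objects only need
// kms:Decrypt, which is why reads can succeed while writes get 403.
object S3aKmsWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3a-kms")
      .config("spark.hadoop.fs.s3a.server-side-encryption-algorithm", "SSE-KMS")
      // Placeholder key ARN — substitute the bucket's actual CMK.
      .config("spark.hadoop.fs.s3a.server-side-encryption.key",
              "arn:aws:kms:us-east-1:111122223333:key/example")
      .getOrCreate()

    spark.range(10).write.mode("overwrite").parquet("s3a://example-bucket/out")
    spark.stop()
  }
}
```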

CPU usage from Event log

2022-03-09 Thread Prasad Bhalerao
are using very huge EMR clusters. I am trying to find out the CPU utilization and memory utilization of the nodes. This will help me find out if the clusters are underutilized and reduce the nodes. Is there a better way to get these stats without changing the code? Thanks, Prasad

Re: One click to run Spark on Kubernetes

2022-02-22 Thread Prasad Paravatha
Hi Bo Yang, Would it be something along the lines of Apache livy? Thanks, Prasad On Tue, Feb 22, 2022 at 10:22 PM bo yang wrote: > It is not a standalone spark cluster. In some details, it deploys a Spark > Operator (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) > and

Re: Profiling spark application

2022-01-19 Thread Prasad Bhalerao
Hi, It will require code changes, and I am looking at some third-party code. I am looking for something which I can just hook into the JVM and get the stats. Thanks, Prasad On Thu, Jan 20, 2022 at 11:00 AM Sonal Goyal wrote: > Hi Prasad, > > Have you checked the SparkListener

Profiling spark application

2022-01-19 Thread Prasad Bhalerao
Hello, Is there any way we can profile spark applications which will show the number of invocations of Spark APIs and their execution time, etc., just the way JProfiler shows all the details? Thanks, Prasad
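Along the lines of the `SparkListener` suggestion quoted in the reply above, a minimal sketch of stage-level timing collection; it can also be attached with `spark.extraListeners` so no application code changes are needed (the workload shown is a placeholder):

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}
import org.apache.spark.sql.SparkSession

// Logs per-stage executor run time and CPU time as stages complete.
class StageTimingListener extends SparkListener {
  override def onStageCompleted(event: SparkListenerStageCompleted): Unit = {
    val info = event.stageInfo
    val m = info.taskMetrics
    println(s"stage ${info.stageId} '${info.name}': " +
      s"runTime=${m.executorRunTime}ms " +
      s"cpuTime=${m.executorCpuTime / 1000000}ms") // cpuTime is in ns
  }
}

object ProfiledApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("profiled").getOrCreate()
    spark.sparkContext.addSparkListener(new StageTimingListener)
    spark.range(1000000).selectExpr("sum(id)").show()
    spark.stop()
  }
}
```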

Re: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Prasad Paravatha
https://www.apache.org/dyn/closer.lua/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.3.tgz FYI, unable to download from this location. Also, I don’t see the Hadoop 3.3 version in the dist > On Oct 19, 2021, at 9:39 AM, Bode, Meikel, NMA-CFD > wrote: > > Many thanks! > > From: Gengliang Wang

Re: reporting use case

2019-04-04 Thread Prasad Bhalerao
will do my research on this. But please let me know your opinion on this. Thanks, Prasad On Fri 5 Apr, 2019, 1:09 AM Teemu Heikkilä So basically you could have base dump/snapshot of the full database - or > all the required data stored into HDFS or similar system as partitioned > files (ie. orc/p

Re: reporting use case

2019-04-04 Thread Prasad Bhalerao
creating views will help in this scenario? Can you please tell me if I am thinking in the right direction? I have two challenges: 1) First, to load 2-4 TB of data into Spark very quickly. 2) And then keep this data updated in Spark whenever DB updates are done. Thanks, Prasad On Fri, Apr 5, 2019 at 12:35 AM
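For the first challenge above (bulk-loading terabytes from a relational database quickly), one standard approach is a partitioned JDBC read so many executors pull disjoint key ranges in parallel. A sketch; the URL, table, partition column, and bounds are all assumptions:

```scala
import org.apache.spark.sql.SparkSession

// Partitioned JDBC read: Spark issues numPartitions range queries on
// partitionColumn between lowerBound and upperBound, one per task.
object BulkJdbcLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("bulk-load").getOrCreate()

    val df = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//db-host:1521/SERVICE")
      .option("dbtable", "ASSETS")
      .option("partitionColumn", "ASSET_ID") // numeric, roughly uniform
      .option("lowerBound", "1")
      .option("upperBound", "100000000")
      .option("numPartitions", "200") // ~200 parallel range scans
      .load()

    // Persist a snapshot for repeated reporting queries.
    df.write.mode("overwrite").parquet("/data/assets_snapshot")
    spark.stop()
  }
}
```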

reporting use case

2019-04-04 Thread Prasad Bhalerao
it and write to a file? I am trying to use a kind of data locality in this case. Whenever data is updated in Oracle tables, can I refresh the data in Spark storage? I can get the update feed using messaging technology. Can someone from the community help me with this? Suggestions are welcome. Thanks, Prasad

Re: Warning from user@spark.apache.org

2018-04-16 Thread Prasad Velagaleti
Hello, I got a message saying that messages sent to me (my gmail id) from the mailing list got bounced? Wonder why? thanks, Prasad. On Mon, Apr 16, 2018 at 6:16 PM, <user-h...@spark.apache.org> wrote: > Hi! This is the ezmlm program. I'm managing the > user@spark.apache.org

Running Hive Beeline .hql file in Spark

2017-01-24 Thread Ravi Prasad
Note: We run the Hive queries in *sample.hql* and redirect the output to the output file output_partition.txt *Spark:* Can anyone tell us how to implement this in *Spark sql*, i.e., executing the hive.hql file and redirecting the output to one file. Regards Prasad

Running Hive Beeline .hql file in Spark

2017-01-20 Thread Ravi Prasad
Note: We run the Hive queries in *sample.hql* and redirect the output to the output file output_partition.txt *Spark:* Can anyone tell us how to implement this in *Spark sql*, i.e., executing the hive.hql file and redirecting the output to one file. -- ------ Regards, Prasad T

Re: Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread Ravi Prasad
Hi, I tried the below code, as result.write.csv(home/Prasad/) It is not working. It says Error: value csv is not a member of org.apache.spark.sql.DataFrameWriter. Regards Prasad On Thu, Jan 19, 2017 at 4:35 PM, smartzjp <zjp_j...@163.com> wrote: > Because the reduce number will b

Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread Ravi Prasad
print the output in the console. I need to redirect the output to a local file as well as an HDFS file, with the delimiter "|". We tried the below code: result.saveAsTextFile("home/Prasad/result.txt") It is not working as expected. -- ------ Prasad. T
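In the Spark 1.x line this thread is about, `DataFrameWriter` has no `csv` method (hence the "value csv is not a member" error in the reply). Two possible sketches, assuming a `result` DataFrame; input/output paths are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object PipeDelimitedOut {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pipe-out"))
    val sqlContext = new SQLContext(sc)
    val result = sqlContext.read.parquet("/data/input") // placeholder input

    // Option 1: plain text with "|" via the RDD API (works on Spark 1.x).
    result.rdd.map(_.mkString("|"))
      .saveAsTextFile("hdfs:///user/prasad/result")

    // Option 2: the external spark-csv data source (requires launching with
    // e.g. --packages com.databricks:spark-csv_2.10:1.5.0).
    result.write.format("com.databricks.spark.csv")
      .option("delimiter", "|")
      .save("hdfs:///user/prasad/result_csv")
  }
}
```

For writing to a truly local (non-HDFS) path from a cluster, collecting to the driver or using a `file://` URI on a single-node setup would be needed; that part is outside what the thread shows.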

Re: Cant join same dataframe twice ?

2016-04-26 Thread Prasad Ravilla
Also, check the column names of df1 ( after joining df2 and df3 ). Prasad. From: Ted Yu Date: Monday, April 25, 2016 at 8:35 PM To: Divya Gehlot Cc: "user @spark" Subject: Re: Cant join same dataframe twice ? Can you show us the structure of df2 and df3 ? Thanks On Mon, Apr 25, 20

Re: Negative Number of Active Tasks in Spark UI

2016-01-05 Thread Prasad Ravilla
I am using Spark 1.5.2. I am not using Dynamic allocation. Thanks, Prasad. On 1/5/16, 3:24 AM, "Ted Yu" <yuzhih...@gmail.com> wrote: >Which version of Spark do you use ? > >This might be related: >https://issues.

DataFrame withColumnRenamed throwing NullPointerException

2016-01-05 Thread Prasad Ravilla
( around 1TB) I am using Spark version 1.5.2. Thanks in advance for any insights. Regards, Prasad. Below is the code. val userAndFmSegment = userData.as("userdata").join(fmSegmentData.withColumnRenamed("USER_ID", "FM_USER_ID").as("fmsegmentdata

Joining DataFrames - Causing Cartesian Product

2015-12-18 Thread Prasad Ravilla
] InMemoryColumnarTableScan [List of columns ], true, 1, StorageLevel(true, true, false, true, 1), (Repartition 1, false), None) Project [ List of Columns ] Scan AvroRelation[Avro File][List of Columns] Code Generation: true Thanks, Prasad.

Re: Joining DataFrames - Causing Cartesian Product

2015-12-18 Thread Prasad Ravilla
R_ID") .withColumnRenamed("USER_CNTRY_ID","USER_DIM_COUNTRY_ID") .as("userdim") , userAndRetailDates("USER_ID") <=> $"userdim.USER_DIM_USER_ID" && userAndRetailDates("USER_CNTRY_ID") <=> $"us
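The join condition visible in this snippet uses the null-safe equality operator `<=>`, which older Spark versions could not plan as an equi-join, producing a cartesian product plus filter. A sketch of one workaround, assuming nullable numeric keys (column names, inputs, and the sentinel value are assumptions):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, lit}

// Coalesce nullable keys to a sentinel and join with plain equality,
// which the planner can execute as a hash join instead of a cartesian.
object NullSafeJoinWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("nullsafe-join").getOrCreate()
    import spark.implicits._

    val users   = spark.read.parquet("/data/users")    // placeholder inputs
    val userDim = spark.read.parquet("/data/user_dim")

    val joined = users
      .withColumn("k", coalesce($"USER_ID", lit(-1L)))
      .join(userDim.withColumn("k", coalesce($"USER_DIM_USER_ID", lit(-1L))), "k")

    joined.explain() // check the plan no longer contains a cartesian product
    spark.stop()
  }
}
```

Note the sentinel must not collide with a real key value; newer Spark versions plan `<=>` as an equi-join directly, making this unnecessary.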

Re: Large number of conf broadcasts

2015-12-17 Thread Prasad Ravilla
Hi Anders, I am running into the same issue as yours. I am trying to read about 120 thousand avro files into a single data frame. Is your patch part of a pull request from the master branch in github? Thanks, Prasad. From: Anders Arpteg Date: Thursday, October 22, 2015 at 10:37 AM To: Koert

Re: Large number of conf broadcasts

2015-12-17 Thread Prasad Ravilla
Thanks, Koert. Regards, Prasad. From: Koert Kuipers Date: Thursday, December 17, 2015 at 1:06 PM To: Prasad Ravilla Cc: Anders Arpteg, user Subject: Re: Large number of conf broadcasts https://github.com/databricks/spark-avro/pull/95

Re: spark.authenticate=true YARN mode doesn't work

2015-12-04 Thread Prasad Reddy
I did try that. Same problem. As you said earlier, spark.yarn.keytab and spark.yarn.principal are required. On Fri, Dec 4, 2015 at 7:25 PM, Ted Yu <yuzhih...@gmail.com> wrote: > Did you try setting "spark.authenticate.secret" ? > > Cheers > > On Fri, Dec 4, 2015 at
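Pulling the settings mentioned in this thread together, a configuration sketch for an authenticated job on a kerberized YARN cluster; the keytab path and principal are placeholders, and in practice they are usually passed to spark-submit (`--keytab` / `--principal`) rather than hard-coded:

```scala
import org.apache.spark.sql.SparkSession

// spark.authenticate enables SASL auth between Spark processes; on YARN
// the shared secret is distributed by YARN. The keytab/principal pair is
// what the thread found to be required alongside it.
object SecureYarnApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("secure-yarn")
      .config("spark.authenticate", "true")
      .config("spark.yarn.keytab", "/etc/security/keytabs/app.keytab")
      .config("spark.yarn.principal", "app@EXAMPLE.COM")
      .getOrCreate()

    spark.range(10).count()
    spark.stop()
  }
}
```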

Spark UI keeps redirecting to /null and returns 500

2015-09-10 Thread Rajeev Prasad
I am having a problem accessing the Spark UI while running in spark-client mode. It works fine in local mode. It keeps redirecting back to itself by adding /null at the end, and ultimately runs out of the URL size limit and returns 500. Look at the response below. I have a feeling that I might be missing

Spark UI keep redirecting to /null and returns 500

2015-09-09 Thread Rajeev Prasad
Hi All, I am having a problem accessing the Spark UI while running in spark-client mode. It works fine in local mode. It keeps redirecting back to itself by adding /null at the end, and ultimately runs out of the URL size limit and returns 500. Look at the following below. I have a feeling that I might

RE: Can't access remote Hive table from spark

2015-01-25 Thread Skanda Prasad
This happened to me as well, putting hive-site.xml inside conf doesn't seem to work. Instead I added /etc/hive/conf to SPARK_CLASSPATH and it worked. You can try this approach. -Skanda -Original Message- From: guxiaobo1982 guxiaobo1...@qq.com Sent: ‎25-‎01-‎2015 13:50 To:

Which version of Hive support Spark Shark

2014-07-03 Thread Ravi Prasad
Hi, Can anyone please help me to understand which version of Hive supports Spark and Shark -- -- Regards, RAVI PRASAD. T

Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-04-04 Thread Prasad
Hi Wisely, Could you please post your pom.xml here. Thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Error-reading-HDFS-file-using-spark-0-9-0-hadoop-2-2-0-incompatible-protobuf-2-5-and-2-4-1-tp2158p3770.html Sent from the Apache Spark User List

Re: Unable to read HDFS file -- SimpleApp.java

2014-03-19 Thread Prasad
Check this thread out, http://apache-spark-user-list.1001560.n3.nabble.com/Error-reading-HDFS-file-using-spark-0-9-0-hadoop-2-2-0-incompatible-protobuf-2-5-and-2-4-1-tp2158p2807.html -- you have to remove conflicting akka and protobuf versions. Thanks Prasad. -- View this message in context