Re: Spark 3.5.0 bug - Writing a small parquet dataframe to storage using Spark 3.5.0 taking too long

2024-08-06 Thread Bijoy Deb
hen write the df into a parquet file, then the ColumnarToRow gets called twice; the first takes 10 secs and the second one 3 mins. On Wed, 31 Jul, 2024, 10:14 PM Bijoy Deb, wrote: > Hi, > > We are using Spark on-premise to simply read a parquet file from > GCS (Google Cloud Storage) into the DataFrame
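A minimal Scala sketch of the read-then-write pattern described above; the GCS path and app name are placeholders, not taken from the thread. Persisting the DataFrame before the write is one way to keep the source scan (and the ColumnarToRow step) from being evaluated a second time by the write job:

  // Sketch only: bucket path and settings are illustrative.
  import org.apache.spark.sql.SparkSession
  import org.apache.spark.storage.StorageLevel

  val spark = SparkSession.builder().appName("parquet-copy").getOrCreate()

  // Read the small parquet file from GCS.
  val df = spark.read.parquet("gs://my-bucket/input/small.parquet")

  // Cache and materialise it so the write job reuses the cached rows
  // instead of re-reading (and re-decoding) the source.
  df.persist(StorageLevel.MEMORY_AND_DISK)
  df.count()

  df.write.mode("overwrite").parquet("gs://my-bucket/output/")
  df.unpersist()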

Spark on Mesos Issue - Do I need to install Spark on Mesos slaves

2014-10-10 Thread Bijoy Deb
Hi, I am trying to submit a Spark job on Mesos using spark-submit from my Mesos-Master machine. My SPARK_HOME = /vol1/spark/spark-1.0.2-bin-hadoop2. I have uploaded the spark-1.0.2-bin-hadoop2.tgz to HDFS so that the Mesos slaves can download it to invoke the Mesos Spark backend executor. But on
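A hedged Scala sketch of the Spark 1.0.x-on-Mesos setup being described: with spark.executor.uri pointing at the tarball on HDFS, the Mesos slaves fetch and unpack Spark themselves, so it does not have to be pre-installed on them. The master host and paths below are placeholders, not values from the original mail:

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("mesos-example")
    .setMaster("mesos://mesos-master-host:5050")
    // Slaves download this tarball to run the Spark executor backend.
    .set("spark.executor.uri", "hdfs:///spark/spark-1.0.2-bin-hadoop2.tgz")

  val sc = new SparkContext(conf)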

NotSerializableException: org.apache.spark.sql.hive.api.java.JavaHiveContext

2014-09-04 Thread Bijoy Deb
Hello All, I am trying to query a Hive table using Spark SQL from my Java code, but I am getting the following error: *Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: org.apache.spark.sql.hive.api.java.JavaHiveContext
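The usual cause of this class of error is that the Hive context gets captured by a task closure and shipped to executors, where it cannot be serialized. A short Scala sketch of the pitfall (the original post uses the Java API, and the table and column names here are made up):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.hive.HiveContext

  val sc = new SparkContext(new SparkConf().setAppName("hive-query"))
  val hiveContext = new HiveContext(sc)

  // Fine: the context is only used on the driver to build the query.
  val rows = hiveContext.sql("SELECT id, name FROM some_table")

  // Not fine: referencing hiveContext inside a transformation drags it into
  // the serialized task closure.
  // rows.map(r => hiveContext.sql("SELECT ..."))   // NotSerializableException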

Re: HDFS Server/Client IPC version mismatch while trying to access HDFS files using Spark-0.9.1

2014-06-12 Thread bijoy deb
ing that your client libraries are older than what > your server is using (2.0.0-mr1-cdh4.6.0 is IPC version 7). > > Try double-checking that your build is actually using that version > (e.g., by looking at the hadoop jar files in lib_managed/jars). > > On Wed, Jun 11, 2014

Re: HDFS Server/Client IPC version mismatch while trying to access HDFS files using Spark-0.9.1

2014-06-11 Thread bijoy deb
Any suggestions from anyone? Thanks Bijoy On Tue, Jun 10, 2014 at 11:46 PM, bijoy deb wrote: > Hi all, > > I have built Shark-0.9.1 using sbt with the below command: > > *SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.6.0 sbt/sbt assembly* > > My Hadoop cluster is also having version

HDFS Server/Client IPC version mismatch while trying to access HDFS files using Spark-0.9.1

2014-06-10 Thread bijoy deb
Hi all, I have built Shark-0.9.1 using sbt with the below command: *SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.6.0 sbt/sbt assembly* My Hadoop cluster is also on version 2.0.0-mr1-cdh4.6.0. But when I try to execute the below command from the Spark shell, which reads a file from HDFS, I get the "IPC version mismatch"
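The command itself is cut off in the archive; a hypothetical spark-shell snippet of the kind of HDFS read that surfaces this error when the client and server Hadoop/IPC versions differ (namenode address and path are placeholders):

  val lines = sc.textFile("hdfs://namenode:8020/some/path/data.txt")
  lines.count()   // fails with the IPC version mismatch if the built-in Hadoop client is older than the cluster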

Is Spark-1.0.0 not backward compatible with Shark-0.9.1?

2014-06-06 Thread bijoy deb
Hi, I am trying to build Shark-0.9.1 from source, with Spark-1.0.0 as its dependency, using the sbt package command. But I am getting the below error during the build, which makes me think that perhaps Spark-1.0.0 is not compatible with Shark-0.9.1: [info] Compilation completed in 9.046 s

Re: Integration issue between Apache Shark-0.9.1 (with in-house hive-0.11) and pre-existing CDH4.6 HIVE-0.10 server

2014-05-28 Thread bijoy deb
c-version-7-cannot-communicate-with-client-version > > Can you try finding matching jars for your Hadoop cluster? > > > On Wed, May 28, 2014 at 8:47 AM, bijoy deb wrote: > >> Hi all, >> >> I have installed Apache Shark 0.9.1 on my machine which comes bundled >&

Integration issue between Apache Shark-0.9.1 (with in-house hive-0.11) and pre-existing CDH4.6 HIVE-0.10 server

2014-05-28 Thread bijoy deb
Hi all, I have installed Apache Shark 0.9.1 on my machine, which comes bundled with the hive-0.11 version of the Hive jars. I am trying to integrate this with my pre-existing CDH-4.6 Hive server, which is of version 0.10. On pointing HIVE_HOME in spark-env.sh to the Cloudera version of the hive