3.3 with parquet 1.10? What are the dos/don'ts for it?
Regards
Pralabh Kumar
Cores and memory settings of the driver?
On Wed, 23 Nov 2022, 12:56 Pralabh Kumar, wrote:
> How many cores and how much memory are you running the driver with?
>
> On Tue, 22 Nov 2022, 21:00 Nikhil Goyal, wrote:
>
>> Hi folks,
>> We are running a job on our on-prem cluster on K8s but writing
How many cores and how much memory are you running the driver with?
On Tue, 22 Nov 2022, 21:00 Nikhil Goyal, wrote:
> Hi folks,
> We are running a job on our on-prem cluster on K8s but writing the output
> to S3. We noticed that all the executors finish in < 1h but the driver
> takes another 5h to finish. Logs:
>
>
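One place driver time commonly goes when writing to S3 is the job-commit phase, where the default file committer renames task output on the driver. A minimal sketch of switching to the S3A committers, assuming the spark-hadoop-cloud and hadoop-aws modules are on the classpath (keys per the Spark cloud-integration docs; values are illustrative only):

import org.apache.spark.sql.SparkSession

// Sketch only: commit task output directly on S3 via the S3A "magic" committer
// instead of driver-side renames.
val spark = SparkSession.builder()
  .appName("s3a-committer-sketch")
  .config("spark.hadoop.fs.s3a.committer.name", "magic")
  .config("spark.hadoop.fs.s3a.committer.magic.enabled", "true")
  .config("spark.sql.sources.commitProtocolClass",
    "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
  .config("spark.sql.parquet.output.committer.class",
    "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
  .getOrCreate()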
Further information: I have a kerberized cluster and am also doing the kinit.
The problem only occurs when the proxy user is being used.
On Fri, Apr 22, 2022 at 10:21 AM Pralabh Kumar
wrote:
> Hi
>
> Running Spark 3.2 on K8s with --proxy-user and getting below error and
> then t
Hi
Running Spark 3.2 on K8s with --proxy-user, I am getting the below error and then
the job fails. However, when running without a proxy user the job runs fine.
Can anyone please help me with this?
22/04/21 17:50:30 WARN Client: Exception encountered while connecting to
the server :
Hi spark community
I have a quick question. I am planning to migrate from Spark 3.0.1 to Spark
3.2.
Do I need to recompile my application with 3.2 dependencies, or will an
application compiled with 3.0.1 work fine on 3.2?
Regards
Pralabh kumar
at org.apache.spark.util.ThreadUtils$.shutdown(ThreadUtils.scala:348)
Please let me know if there is a solution for it.
Regards
Pralabh Kumar
I am successfully able to run some test cases, but some are failing. For
e.g. "Run SparkRemoteFileTest using a Remote data file" in KubernetesSuite
is failing.
Is there a way to skip running some of the test cases?
Please help me with this.
Regards
Pralabh Kumar
machine. Is there a way to do the same?
Regards
Pralabh Kumar
Does the property spark.kubernetes.executor.deleteOnTermination check
whether the executor being deleted has shuffle data or not?
On Tue, 18 Jan 2022, 11:20 Pralabh Kumar, wrote:
> Hi spark team
>
> Have cluster-wide property spark.kubernetes.executor.deleteOnTermination
Hi Spark team
We have the cluster-wide property spark.kubernetes.executor.deleteOnTermination set to
true.
During a long-running job, some of the executors that held shuffle data got
deleted. Because of this, in the subsequent stage we get a lot of
shuffle fetch failed exceptions.
Please let me
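A minimal sketch of the settings usually involved here, assuming dynamic allocation on K8s; this is illustrative only, not a confirmed fix:

import org.apache.spark.sql.SparkSession

// Sketch only: keep terminated executor pods for inspection, and let dynamic
// allocation track shuffle files so executors holding needed shuffle data are
// not released early.
val spark = SparkSession.builder()
  .appName("k8s-shuffle-sketch")
  .config("spark.kubernetes.executor.deleteOnTermination", "false")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .getOrCreate()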
to prefix with hdfs to create the db on
hdfs.
Why is there a difference in the behavior? Can you please point me to the
JIRA which causes this change.
Note: spark.sql.warehouse.dir and hive.metastore.warehouse.dir both have
default values (not explicitly set).
Regards
Pralabh Kumar
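A minimal sketch of pinning the warehouse location explicitly rather than relying on how spark.sql.warehouse.dir and hive.metastore.warehouse.dir default; the HDFS paths below are hypothetical:

import org.apache.spark.sql.SparkSession

// Sketch only: set the warehouse dir explicitly (hypothetical path).
val spark = SparkSession.builder()
  .appName("warehouse-dir-sketch")
  .config("spark.sql.warehouse.dir", "hdfs://namenode:8020/user/hive/warehouse")
  .enableHiveSupport()
  .getOrCreate()

// Or pin the location per database, independent of the warehouse default.
spark.sql("CREATE DATABASE IF NOT EXISTS mydb " +
  "LOCATION 'hdfs://namenode:8020/user/hive/warehouse/mydb.db'")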
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
Regards
Pralabh Kumar
Hi developers, users
Spark is built using log4j 1.2.17. Is there a plan to upgrade based on
the recent CVE detected?
Regards
Pralabh kumar
abase(Hive.java:1556)
at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1545)
at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$databaseExists$1(HiveClientImpl.scala:384)
My guess is that authorization through the proxy user is not working.
Please help
Regards
Pralabh Kumar
Please guide me on which option to go for. I am personally inclined
to go for option 2; it also allows the use of the latest Spark.
Please help me with this, as there are not many comparisons available
online keeping Spark 3.0 in perspective.
Regards
Pralabh Kumar
Hi Dev, User
I want to store Spark ML models in a database so that I can reuse them
later on. I am
unable to pickle them. However, while using Scala I am able to convert them
into a byte
array stream.
So for e.g. I am able to do something like the below in Scala but not in Python:
val modelToByteArray
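A minimal sketch of one way to get such a byte array in Scala, assuming the fitted model class is java.io.Serializable; model here is only a placeholder for the fitted Spark ML model, and the bytes could then go into a database BLOB column:

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Sketch only: serialize a fitted model (assumed Serializable) into bytes.
def modelToByteArray(model: AnyRef): Array[Byte] = {
  val bos = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(bos)
  try oos.writeObject(model) finally oos.close()
  bos.toByteArray
}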
your query is select sum(x), a from t group by a, then try select
> sum(partial), a from (select sum(x) as partial, a, b from t group by a, b)
> group by a.
>
> rb
>
>
> On Tue, May 1, 2018 at 4:21 AM, Pralabh Kumar <pralabhku...@gmail.com>
> wrote:
>
>> Hi
>>
Hi
I am getting the above error in Spark SQL. I have increased the number of
partitions (using 5000) but am still getting the same error.
My data is most probably skewed.
org.apache.spark.shuffle.FetchFailedException: Too large frame: 4247124829
at
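A minimal Spark SQL sketch of the two-step aggregation suggested in the quoted reply, assuming an existing SparkSession spark and a table t with measure x, grouping key a, and a higher-cardinality column b to split the skewed groups:

// Sketch only: pre-aggregate by (a, b) to shrink the shuffled frames, then
// finish the aggregation by a.
val result = spark.sql("""
  SELECT a, sum(partial) AS total
  FROM (SELECT a, b, sum(x) AS partial FROM t GROUP BY a, b)
  GROUP BY a
""")
result.show()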
Hi Spark group
What's the best way to migrate Hive to Spark?
1) Use HiveContext of Spark
2) Use Hive on Spark (
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
)
3) Migrate Hive to Calcite to Spark SQL
Regards
table
CREATE EXTERNAL TABLE $temp_output (
  data String
)
STORED BY 'ABCStorageHandler'
LOCATION '$table_location'
TBLPROPERTIES (
);
When I migrate to Spark, it says the STORED BY operation is not permitted.
Regards
Pralabh Kumar
On Thu, Feb 8, 2018
Hi
Spark 2.0 doesn't support STORED BY. Is there any alternative to achieve
the same?
I am using Spark 2.1.0.
On Fri, Feb 2, 2018 at 5:08 PM, Pralabh Kumar <pralabhku...@gmail.com>
wrote:
> Hi
>
> I am performing a broadcast join where my small table is 1 GB. I am
> getting the following error.
>
> I am using
>
>
> org.apache.spark.SparkException:
>
Hi
I am performing a broadcast join where my small table is 1 GB. I am getting
the following error.
I am using
org.apache.spark.SparkException:
. Available: 0, required: 28869232. To avoid this, increase
spark.kryoserializer.buffer.max value
I increased the value to
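A minimal sketch of where these settings usually go, assuming Kryo serialization is in use; values are illustrative only (spark.kryoserializer.buffer.max cannot exceed 2047m):

import org.apache.spark.sql.SparkSession

// Sketch only: raise the Kryo buffer ceiling; with a ~1 GB "small" table it
// may also be worth revisiting the broadcast threshold instead of forcing
// the broadcast.
val spark = SparkSession.builder()
  .appName("kryo-buffer-sketch")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.kryoserializer.buffer.max", "512m")
  .config("spark.sql.autoBroadcastJoinThreshold", (256L * 1024 * 1024).toString)
  .getOrCreate()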
Hi
Do Hive and Spark use the same SQL parser provided by ANTLR? Do they
generate the same logical plan?
Please help with this.
Regards
Pralabh Kumar
Hi
Is there a convenient way / open-source project to convert Pig scripts to
Spark?
Regards
Pralabh Kumar
Hi Arun
rdd1.groupBy(_.city).map(s => (s._1, s._2.toList.toString())).toDF("city", "data")
  .write.partitionBy("city").csv("/data")
should work for you.
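A self-contained sketch of the same idea, with a hypothetical record type and data only for illustration:

import org.apache.spark.sql.SparkSession

case class Rec(city: String, value: Int)   // hypothetical record type

val spark = SparkSession.builder().appName("partitionBy-sketch").getOrCreate()
import spark.implicits._

val rdd1 = spark.sparkContext.parallelize(Seq(Rec("NYC", 1), Rec("NYC", 2), Rec("SFO", 3)))
rdd1.groupBy(_.city).map(s => (s._1, s._2.toList.toString()))
  .toDF("city", "data")
  .write.partitionBy("city").csv("/data")   // one sub-directory per city, e.g. /data/city=NYC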
Regards
Pralabh
On Sat, Sep 2, 2017 at 7:58 AM, Ryan wrote:
> you may try foreachPartition
>
> On Fri, Sep 1, 2017 at
What is your executor memory? Please share the code also.
On Fri, Aug 18, 2017 at 10:06 AM, KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:
>
> HI,
>
> I am getting the below error when running Spark SQL jobs. This error is thrown
> after running 80% of the tasks. Any solution?
>
>
Run the Spark context in a multithreaded way.
Something like this:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("practice")
  .config("spark.scheduler.mode", "FAIR")
  .enableHiveSupport().getOrCreate()
val sc = spark.sparkContext
val hc = spark.sqlContext
val thread1 = new Thread {
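  // (snippet truncated above) -- a hypothetical completion: each thread runs an
  // independent action, so the FAIR scheduler can run the two jobs concurrently.
  override def run(): Unit = {
    spark.range(0L, 1000000L).count()                      // hypothetical job 1
  }
}
val thread2 = new Thread {
  override def run(): Unit = {
    spark.range(0L, 1000000L).filter("id % 2 = 0").count() // hypothetical job 2
  }
}
thread1.start(); thread2.start()
thread1.join(); thread2.join()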
= schema.substring(0, schema.length - 1)
val sqlSchema = StructType(schema.split(",").map(s => StructField(s, StringType, false)))
sqlContext.createDataFrame(newDataSet, sqlSchema).show()
Regards
Pralabh Kumar
On Mon, Jul 17, 2017 at 1:55 PM, nayan sharma <nayansharm...@gmail.com>
Put the default value inside lit:
df.withColumn("date", lit("constant value"))
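For a sysdate-style default specifically, a minimal sketch using the built-in current_date function (df is assumed to be an existing DataFrame):

import org.apache.spark.sql.functions.{current_date, lit}

// Sketch only: a constant default plus a sysdate-style default column.
val withDefaults = df
  .withColumn("source", lit("constant value"))
  .withColumn("date", current_date())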
On Fri, Jun 30, 2017 at 10:20 PM, sudhir k wrote:
> Can we add a column to a dataframe with a default value like sysdate? I am
> calling my udf but it is throwing an error: col expected.
>
> On spark
into
this, and if that's not the case, then could you please share your code
and training/testing data for better understanding.
Regards
Pralabh Kumar
On Wed, Jun 28, 2017 at 11:45 AM, neha nihal <nehaniha...@gmail.com> wrote:
>
> Hi,
>
> I am using Apache spark 2.0.2 randomfor
On Tue, Jun 27, 2017 at 9:17 AM, 萝卜丝炒饭 <1427357...@qq.com> wrote:
> My words caused a misunderstanding.
> Step 1: A is submitted to Spark.
> Step 2: B is submitted to Spark.
>
> Spark gets two independent jobs. FAIR is used to schedule A and B.
>
> Jeffrey's code did not ca
.
But in one thread, one submit will complete and then the other one will
start. If there are independent stages in one job, then those will run in
parallel.
I agree with Bryan Jeffrey.
Regards
Pralabh Kumar
On Tue, Jun 27, 2017 at 9:03 AM, 萝卜丝炒饭 <1427357...@qq.com> wrote:
> I think
replicas .
>
> 2017-06-22
> --
> lk_spark
> ------
>
> *From:* Pralabh Kumar <pralabhku...@gmail.com>
> *Sent:* 2017-06-22 17:23
> *Subject:* Re: spark2.1 kafka0.10
> *To:* "lk_spark"<lk_sp...@163.com>
> *Cc
smaller set of memory used on a given executor
for broadcast variables through the UI?
Regards
Pralabh Kumar
On Thu, Jun 22, 2017 at 4:39 AM, Bryan Jeffrey <bryan.jeff...@gmail.com>
wrote:
> Satish,
>
> I agree - that was my impression too. However I am seeing a smaller set of
> s
How many replicas do you have for this topic?
On Thu, Jun 22, 2017 at 9:19 AM, lk_spark wrote:
> java.lang.IllegalStateException: No current assignment for partition
> pages-2
> at org.apache.kafka.clients.consumer.internals.SubscriptionState.
>
Makes sense :)
On Sun, Jun 18, 2017 at 8:38 AM, 颜发才(Yan Facai) <facai@gmail.com> wrote:
> Yes, perhaps we could use SQLTransformer as well.
>
> http://spark.apache.org/docs/latest/ml-features.html#sqltransformer
>
> On Sun, Jun 18, 2017 at 10:47 AM, Pralabh Kumar &l
Hi Yan
Yes, SQL is a good option, but if we have to create an ML Pipeline, then having
transformers and setting them into pipeline stages would be a better option.
Regards
Pralabh Kumar
On Sun, Jun 18, 2017 at 4:23 AM, 颜发才(Yan Facai) <facai@gmail.com> wrote:
> To filter data, how about
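A minimal sketch of how a SQLTransformer can sit among pipeline stages, per the link quoted above; the SQL statement, column names, and data are placeholders:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.SQLTransformer

// Sketch only: SQL-based filtering/derivation as a pipeline stage.
val sqlStage = new SQLTransformer()
  .setStatement("SELECT *, length(col1) AS col1_len FROM __THIS__ WHERE col1 IS NOT NULL")

val pipeline = new Pipeline().setStages(Array(sqlStage /*, further stages */))
val model = pipeline.fit(data)   // data: a DataFrame with a col1 column (assumed)
model.transform(data).show()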
"abce","happy")).toDF("col1")
val trans = new CategoryTransformer("1")
data.show()
trans.transform(data).show()
This transformer will make sure you always have values in col1 as
provided by you.
Regards
Pralabh Kumar
On Fri, Jun 16, 2017 at 8:10 PM, S
Hi Saatvik
Can you please provide an example of what exactly you want?
On 16-Jun-2017 7:40 PM, "Saatvik Shah" wrote:
> Hi Yan,
>
> Basically the reason I was looking for the categorical datatype is as
> given here
import org.apache.spark.sql.functions.{lit, udf}

val getlength = udf((idx1: Int, idx2: Int, data: String) => data.substring(idx1, idx2))
data.select(getlength(lit(1), lit(2), data("col1"))).collect
On Fri, Jun 16, 2017 at 10:22 AM, Pralabh Kumar <pralabhku...@gmail.com>
wrote:
> Use lit; give me some time, I'll provide an exam
> and end index? I tried it with errors. Can the udf parameters only
> be of a column type?
>
> 2017-06-16
> --
> lk_spark
> ------
>
> *From:* Pralabh Kumar <pralabhku...@gmail.com>
> *Sent:* 2017-06-16 17:49
Sample UDF:
val getlength = udf((data: String) => data.length())
data.select(getlength(data("col1")))
On Fri, Jun 16, 2017 at 9:21 AM, lk_spark wrote:
> Hi all,
> I defined a udf with multiple parameters, but I don't know how to
> call it with a DataFrame
>
> UDF:
>
> def ssplit2
level.
Jira SPARK-20199 <https://issues.apache.org/jira/browse/SPARK-20199>
Please let me know if my understanding is correct.
Regards
Pralabh Kumar
On Fri, Jun 16, 2017 at 7:53 AM, Pralabh Kumar <pralabhku...@gmail.com>
wrote:
> Hi everyone
>
> Currently GBT doesn