Register an Aggregator as a UDAF for Spark SQL 3

2021-08-11 Thread AlstonWilliams
Hi all, I use Spark 3.0.2. I have written an Aggregator function and I want to register it in Spark SQL so I can call it through the Thrift Server. In Spark 2.4, I could extend `UserDefinedAggregateFunction` and use the following statement to register it in the Spark SQL shell: ``` CREATE
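In Spark 3.x the usual replacement for extending `UserDefinedAggregateFunction` (deprecated in 3.0) is to wrap an `Aggregator` with `functions.udaf` and register it by name. A minimal sketch of that route; the class, function, and app names are illustrative:

```
import org.apache.spark.sql.{Encoder, Encoders, SparkSession, functions}
import org.apache.spark.sql.expressions.Aggregator

// A simple sum Aggregator over Long input; names are illustrative.
object LongSum extends Aggregator[Long, Long, Long] {
  def zero: Long = 0L
  def reduce(buffer: Long, input: Long): Long = buffer + input
  def merge(b1: Long, b2: Long): Long = b1 + b2
  def finish(buffer: Long): Long = buffer
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

val spark = SparkSession.builder().appName("udaf-demo").getOrCreate()
// Spark 3.x: wrap the Aggregator with functions.udaf and register it under a SQL name.
spark.udf.register("long_sum", functions.udaf(LongSum))
spark.sql("SELECT long_sum(id) FROM range(10)").show()
```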

How can I configure hive.metastore.warehouse.dir

2021-08-11 Thread igyu
I need to write data to Hive with Spark: val proper = new Properties proper.setProperty("fs.defaultFS", "hdfs://nameservice1") proper.setProperty("dfs.nameservices", "nameservice1") proper.setProperty("dfs.ha.namenodes.nameservice1", "namenode337,namenode369")
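A minimal sketch of setting the warehouse location on the session builder (the warehouse path, metastore URI, and table names are assumptions): since Spark 2.0, `spark.sql.warehouse.dir` takes precedence over `hive.metastore.warehouse.dir`, and it must be set before the first SparkSession is created.

```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("write-to-hive")
  // Warehouse location; spark.sql.warehouse.dir overrides hive.metastore.warehouse.dir.
  .config("spark.sql.warehouse.dir", "hdfs://nameservice1/user/hive/warehouse")
  // Assumption: a remote Hive metastore is reachable at this URI.
  .config("hive.metastore.uris", "thrift://metastore-host:9083")
  .enableHiveSupport()
  .getOrCreate()

// Illustrative write; database and table names are placeholders.
val df = spark.range(10).toDF("id")
df.write.mode("append").saveAsTable("mydb.mytable")
```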

Spark Structured Streaming Dynamic Allocation

2021-08-11 Thread Zhenyu Hu
Hey folks, does Spark Structured Streaming have any plans for dynamic scaling? Currently, Spark only has a dynamic scaling mechanism for batch jobs
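For reference, a Structured Streaming job can still enable the core (batch-oriented) dynamic allocation, though scale-down is limited while micro-batches keep arriving. A minimal sketch with illustrative values:

```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("sss-dynamic-allocation")
  .config("spark.dynamicAllocation.enabled", "true")
  // Spark 3.0+: track shuffle files so no external shuffle service is required.
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "2")
  .config("spark.dynamicAllocation.maxExecutors", "20")
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .getOrCreate()
```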

Spark DStream Dynamic Allocation

2021-08-11 Thread Zhenyu Hu
1. First of all, I would like to ask whether dynamic scaling of Spark DStreams is available now? It is not mentioned in the Spark documentation. 2. Spark DStream dynamic scaling will randomly kill a non-receiver executor when the average processing delay divided by the batch processing interval
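The DStream-specific mechanism lives in streaming's ExecutorAllocationManager and is driven by the ratio the message describes. The keys below come from the Spark source rather than the documentation, and the values are illustrative assumptions:

```
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Streaming-specific dynamic allocation; the core mechanism must be off.
  .set("spark.dynamicAllocation.enabled", "false")
  .set("spark.streaming.dynamicAllocation.enabled", "true")
  .set("spark.streaming.dynamicAllocation.minExecutors", "2")
  .set("spark.streaming.dynamicAllocation.maxExecutors", "20")
  // Scale up when (avg processing time / batch interval) exceeds this ratio...
  .set("spark.streaming.dynamicAllocation.scalingUpRatio", "0.9")
  // ...and remove a non-receiver executor when it falls below this one.
  .set("spark.streaming.dynamicAllocation.scalingDownRatio", "0.3")
```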

A question about ShellBasedUnixGroupsMapping

2021-08-11 Thread igyu
When I read Hive I get a warning: 21/08/12 10:01:29 WARN ShellBasedUnixGroupsMapping: unable to return groups for user jztwk PartialGroupNameException Does not support partial group name resolution on Windows. GetLocalGroupsForUser error (1332): ? at

Dynamic Allocation & ExecutorMonitor Shuffle Timeout & CacheTimeout

2021-08-11 Thread Zhenyu Hu
In the private class Tracker of org.apache.spark.scheduler.dynalloc.ExecutorMonitor, the method `updateTimeout` will take the min of
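An illustrative simplification of the min-of-timeouts idea the message refers to; this is not the actual ExecutorMonitor code, and the settings named in the comments are the public dynamic-allocation configuration keys:

```
// Not the real ExecutorMonitor code: a sketch of how an idle executor's
// removal deadline can be the earlier of the applicable timeouts.
def removalDeadline(idleStartMs: Long,
                    hasShuffleData: Boolean,
                    hasCachedBlocks: Boolean,
                    shuffleTimeoutMs: Long, // spark.dynamicAllocation.shuffleTracking.timeout
                    cachedTimeoutMs: Long,  // spark.dynamicAllocation.cachedExecutorIdleTimeout
                    idleTimeoutMs: Long): Long = { // spark.dynamicAllocation.executorIdleTimeout
  val applicable = Seq(
    if (hasShuffleData) Some(shuffleTimeoutMs) else None,
    if (hasCachedBlocks) Some(cachedTimeoutMs) else None
  ).flatten
  val timeout = if (applicable.nonEmpty) applicable.min else idleTimeoutMs
  idleStartMs + timeout
}
```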

Re: Performance of PySpark jobs on the Kubernetes cluster

2021-08-11 Thread David Diebold
Hi Mich, I don't quite understand why the driver node is using so much CPU, but it may be unrelated to your executors being underused. As for the underused executors, I would check that your job generated enough tasks. Then I would check the spark.executor.cores and spark.task.cpus parameters
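A quick sketch of those checks from a Spark shell (the input path and repartition factor are assumptions, and `spark` is the active session): the number of task slots per executor is executor cores divided by CPUs per task, and a stage needs at least executors × slots partitions to keep every executor busy.

```
// Assumes an active SparkSession named `spark` (e.g. in spark-shell).
val executorCores = spark.conf.get("spark.executor.cores", "1").toInt
val cpusPerTask   = spark.conf.get("spark.task.cpus", "1").toInt
val slotsPerExecutor = executorCores / cpusPerTask

// Illustrative input path; check how many tasks the first stage will get.
val df = spark.read.parquet("hdfs://nameservice1/some/input")
println(s"slots per executor: $slotsPerExecutor, input partitions: ${df.rdd.getNumPartitions}")

// If the partition count is too small, repartition before the heavy stage
// (the multiplier here assumes roughly 20 executors).
val repartitioned = df.repartition(slotsPerExecutor * 20)
```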

Spark issues while upgrading from 1.6 to 2.4 in Parcels

2021-08-11 Thread Harsh Sharma
Hi Team, we are upgrading our Cloudera parcels from 5.x to 6.x, hence we have upgraded the Spark version from 1.6 to 2.4. While executing a Spark program we are getting the below error. Please help us resolve this in Cloudera parcels. There are suggestions to install Spark gateway roles