Update:
I got some questions/responses on the SPIP docs and GitHub PR,
looking forward to more feedback!
I have discussed offline with Kent Yao, and he will shepherd this SPIP.
Thanks,
Cheng Pan
> On Sep 4, 2025, at 14:16, Cheng Pan wrote:
>
> Hi all,
>
> I’d like to propos
feedback!
JIRA: SPARK-53484
SPIP docs:
https://docs.google.com/document/d/1Ahk4C16o1Jj1TbLg5ylzgHjvu2Ic2zTrcMuvLjqSoAQ
PoC PR: https://github.com/apache/spark/pull/52110
Thanks,
Cheng Pan
+1 (non-binding)
Env info: Hadoop 3.4.2, OpenJDK 17, Ubuntu focal arm64
I tested Spark on YARN mode with ESS enabled, and Spark Standalone mode,
ran some basic queries, and everything looks good.
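For context, "ESS enabled" here means the usual YARN external shuffle service setup, roughly the following (a sketch, not the exact cluster config; the yarn-site.xml part is shown as key=value shorthand and lives on each NodeManager, with the spark-<version>-yarn-shuffle.jar on the NodeManager classpath):
spark-defaults.conf:
spark.shuffle.service.enabled=true
yarn-site.xml (each NodeManager):
yarn.nodemanager.aux-services=mapreduce_shuffle,spark_shuffle
yarn.nodemanager.aux-services.spark_shuffle.class=org.apache.spark.network.yarn.YarnShuffleService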
Thanks,
Cheng Pan
> On Sep 2, 2025, at 13:47, dongj...@apache.org wrote:
>
> Pleas
+1, thank you for driving this.
Thanks,
Cheng Pan
> On Aug 26, 2025, at 00:31, Dongjoon Hyun wrote:
>
> Hi, All.
>
> Since the Apache Spark 4.0.0 tag was created in May, more than three months
> have passed.
>
> https://github.com/apache/spark/releases/tag/v4.
+1 (non-binding)
I verified:
1. LICENSE/NOTICE are present
2. Signatures are correct (see the sketch below)
3. Built the source code and ran UTs (I had to replace the sparksrc folder with
the content of spark-4.0.0.tgz to make the source build work)
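For reference, the signature/checksum part was done along these lines (a minimal sketch; file names are illustrative and KEYS is the committers' key file from the Spark dist area):
gpg --import KEYS
gpg --verify spark-4.0.0.tgz.asc spark-4.0.0.tgz
shasum -a 512 spark-4.0.0.tgz        # compare the output against the published .sha512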
Thanks,
Cheng Pan
> On Jun 10, 2025, at 00:59, Martin Grund wrote:
>
package MUST provide a LICENSE file and a NOTICE file ...
[1]
https://github.com/apache/spark-connect-go/blob/v0.1.0-rc1/spark/version.go#L19
[2] https://dist.apache.org/repos/dist/dev/spark
[3] https://www.apache.org/legal/release-policy.html#source-packages
[4] https://www.apache.org/legal/releas
more source packages, which MUST be
> sufficient for a user to build and test the release provided they have access
> to the appropriate platform and tools. A source release SHOULD not contain
> compiled code.
[1] https://www.apache.org/legal/release-policy.html#publication
Thanks,
Ch
SHS every two months to recover it; the issue was gone after upgrading to Spark
3.3 and switching to RocksDB.
Scale and performance: we keep ~800k applications' event logs in the event log
HDFS directory; re-parsing with multiple threads to rebuild listing.rdb takes
~15 minutes.
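For anyone with a similar setup, the History Server knobs involved are roughly these (a sketch; the path and thread count are illustrative rather than our exact values, and it assumes Spark 3.3+ where the ROCKSDB disk backend is available):
spark.history.store.path=/data/spark/shs-store
spark.history.store.hybridStore.diskBackend=ROCKSDB
spark.history.fs.numReplayThreads=16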
Thanks,
Cheng Pan
>
+1 (non-binding)
Thanks,
Cheng Pan
> On Jun 2, 2025, at 03:00, L. C. Hsieh wrote:
>
> Hi all,
>
> I would like to start a vote on the new real-time mode in Apache Spark
> Structured Streaming.
>
> Discussion thread:
> https://lists.apache.org/thread/ovmfbzfkc3t9
+1 (non-binding)
Thanks,
Cheng Pan
> On May 20, 2025, at 12:54, Yuming Wang wrote:
>
> +1
>
> On Tue, May 20, 2025 at 12:19 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>> +1 (non-binding)
>>
>> Checked signature, checksum, ran basic
+1 (non-binding)
Deployed on a YARN cluster, ran some TPC-H queries.
Passed Apache Kyuubi integration test.
Thanks,
Cheng Pan
> On May 14, 2025, at 06:28, Wenchen Fan wrote:
>
> Please vote on releasing the following candidate as Apache Spark version
> 4.0.0.
>
> The
+1 (non-binding)
Thanks,
Cheng Pan
> On May 13, 2025, at 21:07, Denny Lee wrote:
>
> +1 (non-binding)
>
> On Tue, May 13, 2025 at 03:38 Peter Toth <peter.t...@gmail.com> wrote:
>> +1
>>
>> On Tue, May 13, 2025 at 12:24 PM beliefer
Do the following options work for you?
./bin/spark-shell --conf spark.jars.ivy=${HOME}/.ivy2
./bin/spark-shell --conf spark.jars.ivy=/Users/yourname/.ivy2
I think the issue is that ~ is not interpreted by the shell and is just passed
through to the Ivy lib.
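A quick way to see what the shell actually hands over (assuming bash; the shell does not expand ~ inside an argument like this, so the --conf value goes through untouched):
printf '%s\n' spark.jars.ivy=~/.ivy2           # prints spark.jars.ivy=~/.ivy2 -- the literal ~ reaches Ivy
printf '%s\n' spark.jars.ivy="${HOME}"/.ivy2   # prints spark.jars.ivy=/Users/yourname/.ivy2 (HOME expanded)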
Thanks,
Cheng Pan
> On Apr 29, 2025,
The deadlock was introduced by PARQUET-2432 (1.14.0); if we decide to downgrade,
the latest workable version is Parquet 1.13.1.
Thanks,
Cheng Pan
> On Apr 21, 2025, at 16:53, Wenchen Fan wrote:
>
> +1 to downgrade to Parquet 1.15.0 for Spark 4.0. According to
> https://github.com/
+1 (non-binding)
Thanks,
Cheng Pan
> On Apr 9, 2025, at 22:22, Sandy Ryza wrote:
>
> We started to get some votes on the discussion thread, so I'd like to move to
> a formal vote on adding support for declarative pipelines.
>
> *Discussion thread: *
> https
+1 (non-binding)
Glad to see Spark SQL extended to streaming use cases.
Thanks,
Cheng Pan
> On Apr 9, 2025, at 14:43, Anton Okolnychyi wrote:
>
> +1
>
> On Tue, Apr 8, 2025 at 23:36, Jacky Lee <qcsd2...@gmail.com> wrote:
>> +1 I'm delighted tha
/hadoop/hive/ql/exec/FunctionRegistry.java#L208
Thanks,
Cheng Pan
> On Mar 7, 2025, at 13:15, Wenchen Fan wrote:
>
> RC2 fails and I'll cut RC3 next week. Thanks for the feedback!
>
> On Thu, Mar 6, 2025 at 6:44 AM Chris Nauroth <cnaur...@apache.org> wrot
+1 (non-binding)
Pass integration tests with Apache Kyuubi.
Thanks,
Cheng Pan
> On Feb 27, 2025, at 14:04, Chris Nauroth wrote:
>
> +1 (non-binding)
>
> * Verified all checksums.
> * Verified all signatures.
> * Built from source, with multiple profiles, to full succe
Found another issue: Spark 4.0.0 RC1 fails to submit to a Kerberized cluster.
I have filed SPARK-51311 and opened [1] to fix it.
[1] https://github.com/apache/spark/pull/50077
Thanks,
Cheng Pan
> On Feb 24, 2025, at 17:47, Cheng Pan wrote:
>
> I found an issue in [1], would be grea
I found an issue in [1]; it would be great if some committers could take a look.
[1] https://github.com/apache/spark/pull/45504#issuecomment-2665957194
Thanks,
Cheng Pan
> On Feb 24, 2025, at 12:13, Wenchen Fan wrote:
>
> This vote failed. I'll cut RC2 later this week. Thanks t
+1
Thanks,
Cheng Pan
> On Jan 9, 2025, at 12:28, Wenchen Fan wrote:
>
> Hi all,
>
> Following the discussion[1], I'd like to start the vote for 'Use plain text
> logs by default'.
>
> Note: This is not to overthrow the previous vote that adds t
+1
Thanks,
Cheng Pan
> On Dec 17, 2024, at 17:23, 杨杰 wrote:
>
> Please vote on releasing the following candidate as Apache Spark version
> 3.5.4.
>
> The vote is open until Dec 20, 10:00:00 UTC and passes if a majority +1 PMC
> votes are cast, with
> a minimum of
+1 (non-binding)
I tested it with Apache Iceberg and Apache Kyuubi.
Thanks,
Cheng Pan
> On Dec 16, 2024, at 16:33, 杨杰 wrote:
>
> Please vote on releasing the following candidate as Apache Spark version
> 3.5.4.
>
> The vote is open until Dec 19, 09:00:00 UTC and passe
Congrats!
Thanks,
Cheng Pan
> On Nov 17, 2024, at 06:50, Gengliang Wang wrote:
>
> Congrats!
>
>
> On Sat, Nov 16, 2024 at 4:12 AM xianjin <xian...@apache.org> wrote:
>> Congrats!
>>
>> Sent from my iPhone
>>
>>>
+1 (non-binding)
I checked
- Signatures and checksums are good.
- Built successfully from source code.
- Passed integration test with Apache Kyuubi [1]
[1] https://github.com/apache/kyuubi/pull/6699
Thanks,
Cheng Pan
> On Sep 16, 2024, at 15:24, Dongjoon Hyun wrote:
>
> Please vote on
+1 (non-binding)
Seems there were (or are) some issues [1] on lists.apache.org
[1] https://issues.apache.org/jira/browse/INFRA-26025
Thanks,
Cheng Pan
> On Aug 9, 2024, at 15:15, Chao Sun wrote:
>
> +1
>
> On Fri, Aug 9, 2024 at 12:10 AM XiDuo You wrote:
> +1
>
>
+1 (non-binding)
- All links are valid and look good
- Successfully built from source code on Ubuntu 22.04 x86 with Java 17
- Integrated and played with Zeppelin, Kyuubi, Iceberg and Hadoop; no
unexpected issues found.
Thanks,
Cheng Pan
> On Jul 26, 2024, at 21:32, Kent Yao wrote:
>
+1 (non-binding)
Thanks,
Cheng Pan
> On Jul 3, 2024, at 08:59, Hyukjin Kwon wrote:
>
> Hi all,
>
> I’d like to start a vote for moving Spark Connect server to builtin package
> (Client API layer stays external).
>
> Please also refer to:
>
>
-managers/kubernetes/{docker,integration-tests}`, `hadoop-cloud`. What
about moving the whole `connect` folder to the top level?
Thanks,
Cheng Pan
> On Jul 2, 2024, at 08:19, Hyukjin Kwon wrote:
>
> Hi all,
>
> I would like to discuss moving Spark Connect server to builtin
FYI, I have submitted SPARK-48651 (https://github.com/apache/spark/pull/47010)
to update the Spark on YARN docs for JDK configuration; looking forward to your
feedback.
Thanks,
Cheng Pan
> On Jun 18, 2024, at 02:00, George Magiros wrote:
>
> I successfully submitte
$SPARK_CONF_DIR/spark-defaults.conf
spark.yarn.appMasterEnv.JAVA_HOME=/opt/openjdk-17
spark.executorEnv.JAVA_HOME=/opt/openjdk-17
[1]
https://github.com/awesome-kyuubi/hadoop-testing/commit/9f7c0d7388dfc7fbe6e4658515a6c28d5ba93c8e
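The same thing as a one-off spark-submit, if that is easier to try (a sketch; the class/jar is just the bundled SparkPi example, and the jar name depends on the Spark/Scala build):
./bin/spark-submit \
  --master yarn \
  --conf spark.yarn.appMasterEnv.JAVA_HOME=/opt/openjdk-17 \
  --conf spark.executorEnv.JAVA_HOME=/opt/openjdk-17 \
  --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples_*.jar 100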
Thanks,
Cheng Pan
> On Jun 18, 2024, at 02:00, George Magiros wr
+1 (non-binding)
- All links are valid
- Ran some basic queries using YARN client mode with Apache Hadoop v3.3.6 and
HMS 2.3.9
- Pass integration tests with Apache Kyuubi v1.9.1 RC0
Thanks,
Cheng Pan
> On May 29, 2024, at 02:48, Wenchen Fan wrote:
>
> Please vote on releasing the
762) ~[?:?]
at org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:749) ~[spark-core_2.13-4.0.0-preview1.jar:4.0.0-preview1]
... 38 more
Thanks,
Cheng Pan
> On May 11, 2024, at 13:55, Wenchen Fan wrote:
>
> Please vote on releasing the following
+1 (non-binding)
Thanks,
Cheng Pan
On Sat, Apr 27, 2024 at 9:29 AM Holden Karau wrote:
>
> +1
>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/ho
Will we have a preview release for 4.0.0 like we did for 2.0.0 and 3.0.0?
Thanks,
Cheng Pan
> On Apr 15, 2024, at 09:58, Jungtaek Lim wrote:
>
> W.r.t. state data source - reader (SPARK-45511), there are several follow-up
> tickets, but we don't plan to address them
+1, non-binding
Thanks,
Cheng Pan
> On Apr 15, 2024, at 14:14, John Zhuge wrote:
>
> +1 (non-binding)
>
> On Sun, Apr 14, 2024 at 7:18 PM Jungtaek Lim
> wrote:
> +1 (non-binding), thanks Dongjoon.
>
> On Sun, Apr 14, 2024 at 7:22 AM Dongjoon Hyun wrote:
>
-samples/emr-remote-shuffle-service
[4] https://github.com/apache/celeborn/issues/2140
Thanks,
Cheng Pan
> On Apr 6, 2024, at 21:41, Mich Talebzadeh wrote:
>
> I have seen some older references for shuffle service for k8s,
> although it is not clear they are talking about a generic shuff
ible
product, in the same position as Amazon RDS for MySQL: neither an official support
declaration nor CI verification is required, but considering the adoption rate
of those products, reasonable patches should be considered too.
Thanks,
Cheng Pan
On 2024/03/25 06:47:10 Dongjoon Hyun wrote:
> H
-innovation-and-long-term-support-lts-versions/
[3] https://github.com/apache/spark/pull/45581
[4] https://aws.amazon.com/rds/mysql/
[5] https://learn.microsoft.com/en-us/azure/mysql/concepts-version-policy
Thanks,
Cheng Pan
+1 (non-binding)
- Build successfully from source code.
- Pass integration tests with Spark ClickHouse Connector[1]
[1] https://github.com/housepower/spark-clickhouse-connector/pull/299
Thanks,
Cheng Pan
> On Feb 20, 2024, at 10:56, Jungtaek Lim wrote:
>
> Thanks Sean, let'
+1 (non-binding)
Thanks,
Cheng Pan
> On Nov 15, 2023, at 01:41, L. C. Hsieh wrote:
>
> Hi all,
>
> I’d like to start a vote for SPIP: An Official Kubernetes Operator for
> Apache Spark.
>
> The proposal is to develop an official Java-based Kubernetes operator
> f
> Not really - this is not designed to be a replacement for the current
> approach.
That's what I assumed too. But my question is: as a user, how do I write a
spark-submit command to submit a Spark app that leverages this operator?
Thanks,
Cheng Pan
> On Nov 11, 2023, at 03:21, Zho
Thanks for this impressive proposal. I have a basic question: how does
spark-submit work with this operator? Or does it enforce that we must use `kubectl
apply -f spark-job.yaml` (or a K8s client programmatically) to submit a Spark app?
Thanks,
Cheng Pan
> On Nov 10, 2023, at 04:05, Zhou Ji
+1 (non-binding)
Passed integration test with Apache Kyuubi.
Thanks for driving this release.
Thanks,
Cheng Pan
> On Aug 11, 2023, at 06:36, L. C. Hsieh wrote:
>
> +1
>
> Thanks Yuming.
>
> On Thu, Aug 10, 2023 at 3:24 PM Dongjoon Hyun wrote:
>>
>> +1
-hive
Thanks,
Cheng Pan
> On Aug 8, 2023, at 10:09, Wenchen Fan wrote:
>
> I think the principle is we should remove things that block us from
> supporting new things like Java 21, or come with a significant maintenance
> cost. If there is no benefit to removing deprecated API
ink it's impossible.
[1]
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientVersions.scala
[2] https://issues.apache.org/jira/browse/SPARK-42539
[3] https://issues.apache.org/jira/browse/HIVE-27560
[4] https://github.com/apache/spark/pull/33989#issuecomment-926277286
Thank
Congratulations! Peter and Xiduo!
Thanks,
Cheng Pan
> On Aug 7, 2023, at 10:58, Gengliang Wang wrote:
>
> Congratulations! Peter and Xiduo!
API.
Thanks,
Cheng Pan
> On Jun 16, 2023, at 12:14, Allison Wang
> wrote:
>
> Hi everyone,
>
> I would like to start a discussion on “Python Data Source API”.
>
> This proposal aims to introduce a simple API in Python for Data Sources. The
> idea is to enable
/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala#L215
Thanks,
Cheng Pan
> On May 31, 2023, at 03:49, Bjørn Jørgensen wrote:
>
> @Dongjoon Hyun Thank you.
>
> I have two points to discuss.
> First, we are currently conducting tes
+CC dev@hbase
Thanks,
Cheng Pan
On Fri, May 19, 2023 at 4:08 AM Steve Loughran
wrote:
>
>
>
> On Thu, 18 May 2023 at 03:45, Cheng Pan wrote:
>>
>> Steve, thanks for the information, I think HADOOP-17046 should be fine for
>> the Spark case.
>>
>> Had
classes from
hadoop-client-runtime.
Thanks,
Cheng Pan
On May 17, 2023 at 04:10:43, Dongjoon Hyun wrote:
> Thank you for sharing, Steve.
>
> Dongjoon
>
> On Tue, May 16, 2023 at 11:44 AM Steve Loughran
> wrote:
>
>> I have some bad news here which is even though hadoop
by the
kinesis client, into the kinesis assembly jar
- Spark's own core/connect/protobuf modules use protobuf 3, and all protobuf 3
deps are also shaded and relocated.
Feel free to comment if you still have any concerns.
[1] https://github.com/apache/spark/pull/41153
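If anyone wants to double-check a build, here is a rough way to spot jars that still bundle unrelocated protobuf classes (a sketch; run from the distribution root, jar locations vary by version, and relocated classes live under an org/sparkproject/... prefix instead):
for j in jars/*.jar; do
  jar tf "$j" | grep -q 'com/google/protobuf/' && echo "unrelocated protobuf in $j"
done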
Thanks,
Cheng Pan
]
https://github.com/apache/kyuubi/tree/master/extensions/spark/kyuubi-spark-connector-hive
Thanks,
Cheng Pan
On Apr 18, 2023 at 00:38:23, Elliot West wrote:
> Hi Ankit,
>
> While not a part of Spark, there is a project called 'WaggleDance' that
> can federate multiple Hive m
> add the ASF License header to this file[1]?
> [1] ./common/src/main/java/org/apache/celeborn/common/network/util/LimitedInputStream.java
Willem Jiang
Thanks,
Cheng Pan
On Mar 1, 2023 at 15:04:52, Dongjoon Hyun wrote:
> Since both license headers are Apache License 2.0, we don't s
The key point here is: how do you jump to the log service from the Spark UI to
explore or download the logs of each Pod, as you can with Spark on YARN?
Thanks,
Cheng Pan
On Nov 2, 2022 at 18:32:26, Martin Andersson
wrote:
> Hello Cheng.
>
> I don't quite understand, why can't you configure
/pull/32456
Thanks,
Cheng Pan
+1 (non-binding)
- Passed Apache Kyuubi (Incubating) integration tests [1]
- Ran some jobs on our internal K8s cluster
[1] https://github.com/apache/incubator-kyuubi/pull/3507
Thanks,
Cheng Pan
On Wed, Oct 19, 2022 at 9:13 AM Yikun Jiang wrote:
>
> +1, also test passed with spark-
+1 (non-binding)
* Verified SPARK-39313 has been addressed [1]
* Passed integration test w/ Apache Kyuubi (Incubating) [2]
[1] https://github.com/housepower/spark-clickhouse-connector/pull/123
[2] https://github.com/apache/incubator-kyuubi/pull/2817
Thanks,
Cheng Pan
On Wed, Jun 8, 2022 at 7:04 AM
+1 (non-binding)
Integration test passed[1] with my project[2].
[1] https://github.com/housepower/spark-clickhouse-connector/runs/3834335017
[2] https://github.com/housepower/spark-clickhouse-connector
Thanks,
Cheng Pan
On Sat, Oct 9, 2021 at 2:01 PM Ye Zhou wrote:
> +1 (non-bind