Re: Welcome new Apache Spark committers

2024-08-13 Thread bo yang
Congratulations! On Tue, Aug 13, 2024 at 1:14 AM Ruifeng Zheng wrote: > Congratulations, everyone! > > On Tue, Aug 13, 2024 at 12:14 PM Gengliang Wang wrote: > >> Congratulations, everyone! >> >> On Mon, Aug 12, 2024 at 7:10 PM Denny Lee wrote: >> >>> Congrats Allison, Martin, and Haejoon! >>>

Re: [VOTE] Using Github Issues for Spark-Connect-Go _only_ issues.

2024-08-12 Thread bo yang
+1 On Mon, Aug 12, 2024 at 9:14 AM Matthew Powers wrote: > +1 (non-binding) > > On Mon, Aug 12, 2024 at 12:11 PM Denny Lee wrote: > >> +1 (non-binding) >> >> On Mon, Aug 12, 2024 at 16:43 Reynold Xin >> wrote: >> >>> +1 >>> >>> On Mon, Aug 12, 2024 at 10:28 AM Mich Talebzadeh < >>> mich.talebz

Re: [DISCUSS] Using Github Issues for Spark-Connect-Go _only_ issues.

2024-08-09 Thread bo yang
+1 to start small as an experiment to see how people use GitHub issue... On Thu, Aug 8, 2024 at 11:54 PM Kent Yao wrote: > +1 > > On 2024/08/08 23:24:32 Hyukjin Kwon wrote: > > SGTM > > > > On Thu, 8 Aug 2024 at 14:53, Martin Grund > > > wrote: > > > > > Hi folks, > > > > > > I wanted to start

Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-09 Thread bo yang
+1 On Tue, Jul 9, 2024 at 12:29 PM Mridul Muralidharan wrote: > > +1 > > Regards, > Mridul > > > On Tue, Jul 9, 2024 at 10:19 AM Xianjin YE wrote: > >> +1 >> >> > On Jul 9, 2024, at 22:41, L. C. Hsieh wrote: >> > >> > +1 >> > >> > On Tue, Jul 9, 2024 at 1:13 AM Wenchen Fan wrote: >> >> >> >>

Re: [DISCUSS] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-05 Thread bo yang
+1 This is a great suggestion, thanks Hyukjin! On Thu, Jul 4, 2024 at 4:11 AM Hyukjin Kwon wrote: > Alright! let me start the vote! > > On Thu, 4 Jul 2024 at 16:31, Mich Talebzadeh > wrote: > >> A good point agreed. >> >> Mich Talebzadeh, >> Technologist | Architect | Data Engineer | Generati

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-03 Thread bo yang
+1 (non-binding) On Tue, Jul 2, 2024 at 11:22 PM Cheng Pan wrote: > +1 (non-binding) > > Thanks, > Cheng Pan > > > On Jul 3, 2024, at 08:59, Hyukjin Kwon wrote: > > Hi all, > > I’d like to start a vote for moving Spark Connect server to builtin > package (Client API layer stays external). > > P

Re: [VOTE] SPIP: Stored Procedures API for Catalogs

2024-05-12 Thread bo yang
+1 On Sat, May 11, 2024 at 4:43 PM huaxin gao wrote: > +1 > > On Sat, May 11, 2024 at 4:35 PM L. C. Hsieh wrote: > >> +1 >> >> On Sat, May 11, 2024 at 3:11 PM Chao Sun wrote: >> > >> > +1 >> > >> > On Sat, May 11, 2024 at 2:10 PM L. C. Hsieh wrote: >> >> >> >> Hi all, >> >> >> >> I’d like to

Re: [VOTE] Release Spark 3.4.3 (RC2)

2024-04-16 Thread bo yang
+1 On Tue, Apr 16, 2024 at 1:38 PM Hyukjin Kwon wrote: > +1 > > On Wed, Apr 17, 2024 at 3:57 AM L. C. Hsieh wrote: > >> +1 >> >> On Tue, Apr 16, 2024 at 4:08 AM Wenchen Fan wrote: >> > >> > +1 >> > >> > On Mon, Apr 15, 2024 at 12:31 PM Dongjoon Hyun >> wrote: >> >> >> >> I'll start with my +1

Re: [VOTE] Add new `Versions` in Apache Spark JIRA for Versioning of Spark Operator

2024-04-12 Thread bo yang
+1 On Fri, Apr 12, 2024 at 12:34 PM huaxin gao wrote: > +1 > > On Fri, Apr 12, 2024 at 9:07 AM Dongjoon Hyun wrote: > >> +1 >> >> Thank you! >> >> I hope we can customize `dev/merge_spark_pr.py` script per repository >> after this PR. >> >> Dongjoon. >> >> On 2024/04/12 03:28:36 "L. C. Hsieh" w

Re: Versioning of Spark Operator

2024-04-10 Thread bo yang
Cool, looks like we have two options here. Option 1: Spark Operator and Connect Go Client versioning independent of Spark, e.g. starting with 0.1.0. Pros: they can evolve versions independently. Cons: people will need an extra step to decide the version when using Spark Operator and Connect Go Cli

Re: Versioning of Spark Operator

2024-04-09 Thread bo yang
Thanks Liang-Chi for the Spark Operator work, and also the discussion here! For Spark Operator and Connect Go Client, I am guessing they need to support multiple versions of Spark? e.g. the same Spark Operator may support running multiple versions of Spark, and Connect Go Client might support mult

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread bo yang
+1 (non-binding) On Mon, Apr 1, 2024 at 10:19 AM Felix Cheung wrote: > +1 > -- > *From:* Denny Lee > *Sent:* Monday, April 1, 2024 10:06:14 AM > *To:* Hussein Awala > *Cc:* Chao Sun ; Hyukjin Kwon ; > Mridul Muralidharan ; dev > *Subject:* Re: [VOTE] SPIP: Pure Pyt

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-13 Thread bo yang
+1 On Wed, Mar 13, 2024 at 7:19 AM Tom Graves wrote: > Similar as others, will be interested in working out api's and details > but overall in favor of it. > > +1 > > Tom Graves > On Monday, March 11, 2024 at 11:25:38 AM CDT, Mridul Muralidharan < > mri...@gmail.com> wrote: > > > > I am suppo

Re: [VOTE] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-14 Thread bo yang
+1 On Tue, Nov 14, 2023 at 7:18 PM huaxin gao wrote: > +1 > > On Tue, Nov 14, 2023 at 10:45 AM Holden Karau > wrote: > >> +1 >> >> On Tue, Nov 14, 2023 at 10:21 AM DB Tsai wrote: >> >>> +1 >>> >>> DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 >>> >>> On Nov 14, 2023, at 10:14 AM

Re: Write Spark Connection client application in Go

2023-09-14 Thread bo yang
at’s so cool! Great work y’all :) >> >> On Tue, Sep 12, 2023 at 8:14 PM bo yang wrote: >> >>> Hi Spark Friends, >>> >>> Anyone interested in using Golang to write Spark application? We created >>> a Spark Connect Go Client library >>>

Write Spark Connection client application in Go

2023-09-12 Thread bo yang
Hi Spark Friends, Anyone interested in using Golang to write Spark applications? We created a Spark Connect Go Client library. Would love to hear feedback/thoughts from the community. Please see the quick start guide

Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-07 Thread bo yang
Thanks Holden for bringing this up! Maybe another thing to think about is how to make dynamic allocation more friendly with Kubernetes and disaggregated shuffle storage? On Mon, Aug 7, 2023 at 1:27 PM Holden Karau wrote: > So I'm wondering if there is interest in revisiting some of how Spark

Re: [CONNECT] New Clients for Go and Rust

2023-06-01 Thread bo yang
Hi Martin, Thanks a lot for preparing the new repo and making it super easy for me to just copy my code to the new repo! I will create a new PR there. > I think the PR is fine from a code perspective as a starting point. I've prepared the go repository with all the things necessary so that it red

Re: [CONNECT] New Clients for Go and Rust

2023-05-31 Thread bo yang
Just saw the discussions here! Really appreciate Martin and other folks helping on my previous Golang Spark Connect PR (https://github.com/apache/spark/pull/41036)! Great to see we have a new repo for Spark Golang Connect client. Thanks Hyukjin! I am thinking of migrating my PR to this new repo. Wo

Re: How can I get the same spark context in two different python processes

2022-12-12 Thread bo yang
In theory, maybe a Jupyter notebook or something similar could achieve this? e.g. running some Jupyter kernel inside Spark driver, then another Python process could connect to that kernel. But in the end, this is like Spark Connect :) On Mon, Dec 12, 2022 at 2:55 PM Kevin Su wrote: > Also, is

Re: Reverse proxy for Spark UI on Kubernetes

2022-05-17 Thread bo yang
Yes, it should be possible, any interest to work on this together? Need more hands to add more features here :) On Tue, May 17, 2022 at 2:06 PM Holden Karau wrote: > Could we make it do the same sort of history server fallback approach? > > On Tue, May 17, 2022 at 10:41 PM bo ya

Re: Reverse proxy for Spark UI on Kubernetes

2022-05-17 Thread bo yang
to behave like that Web Application Proxy. It will simplify settings to access Spark UI on Kubernetes. On Mon, May 16, 2022 at 11:46 PM wilson wrote: > what's the advantage of using reverse proxy for spark UI? > > Thanks > > On Tue, May 17, 2022 at 1:47 PM bo yang wrote:

Re: Reverse proxy for Spark UI on Kubernetes

2022-05-17 Thread bo yang
Thanks Holden :) On Mon, May 16, 2022 at 11:12 PM Holden Karau wrote: > Oh that’s rad 😊 > > On Tue, May 17, 2022 at 7:47 AM bo yang wrote: > >> Hi Spark Folks, >> >> I built a web reverse proxy to access Spark UI on Kubernetes (working >>

Reverse proxy for Spark UI on Kubernetes

2022-05-16 Thread bo yang
Hi Spark Folks, I built a web reverse proxy to access Spark UI on Kubernetes (working together with https://github.com/GoogleCloudPlatform/spark-on-k8s-operator). Want to share here in case other people have similar need. The reverse proxy code is here: https://github.com/datapunchorg/spark-ui-re

Re: One click to run Spark on Kubernetes

2022-02-23 Thread bo yang
chart to > deploy Spark and some other stuff on K8S? > > On Wed, 23 Feb 2022 at 17:49, bo yang wrote: > >> Hi Sarath, let's follow up offline on this. >> >> On Wed, Feb 23, 2022 at 8:32 AM Sarath Annareddy < >> sarath.annare...@gmail.com> wrote: >

Re: One click to run Spark on Kubernetes

2022-02-23 Thread bo yang
Hi Sarath, let's follow up offline on this. On Wed, Feb 23, 2022 at 8:32 AM Sarath Annareddy wrote: > Hi bo > > How do we start? > > Is there a plan? Onboarding, Arch/design diagram, tasks lined up etc > > > Thanks > Sarath > > > Sent from my iPhone

Re: One click to run Spark on Kubernetes

2022-02-23 Thread bo yang
Guidance is appreciated. > > Sarath > > Sent from my iPhone > > On Feb 23, 2022, at 2:01 AM, bo yang wrote: > > > > Right, normally people start with simple script, then add more stuff, like > permission and more components. After some time, people want to run the >

Re: One click to run Spark on Kubernetes

2022-02-23 Thread bo yang
no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > > On Wed, 23 Feb 2022 at 04:06, bo yang wrote: > >> Hi Spark Community, >> >> We built an open source tool to deploy and run Spark on Kubernetes with a >>

Re: One click to run Spark on Kubernetes

2022-02-22 Thread bo yang
you share link to the source? > > On Wed, 23 Feb 2022 at 6:52, bo yang wrote: > >> We do not have SaaS yet. Now it is an open source project we build in our >> part time, and we welcome more people working together on that. >> >> You could specify cluster s

Re: One click to run Spark on Kubernetes

2022-02-22 Thread bo yang
r > about 1 hour. Do you have the SaaS solution for this? I can pay as I did. > > Thanks > > On Wed, Feb 23, 2022 at 12:21 PM bo yang wrote: > >> It is not a standalone spark cluster. In some details, it deploys a Spark >> Operator (https://github.com/GoogleCloudPlatfo

Re: One click to run Spark on Kubernetes

2022-02-22 Thread bo yang
ion of spark? or just the standalone node? > > Thanks > > On Wed, Feb 23, 2022 at 12:06 PM bo yang wrote: > >> Hi Spark Community, >> >> We built an open source tool to deploy and run Spark on Kubernetes with a >> one click command. For example, on AWS, it co

One click to run Spark on Kubernetes

2022-02-22 Thread bo yang
Hi Spark Community, We built an open source tool to deploy and run Spark on Kubernetes with a one click command. For example, on AWS, it could automatically create an EKS cluster, node group, NGINX ingress, and Spark Operator. Then you will be able to use curl or a CLI tool to submit Spark applica

Re: [VOTE][SPIP] Support Customized Kubernetes Schedulers Proposal

2022-01-05 Thread bo yang
+1 (non-binding) On Wed, Jan 5, 2022 at 11:01 PM Holden Karau wrote: > +1 (binding) > > On Wed, Jan 5, 2022 at 5:31 PM William Wang > wrote: > >> +1 (non-binding) >> >> On Thu, Jan 6, 2022 at 09:07, Yikun Jiang wrote: >> >>> Hi all, >>> >>> I’d like to start a vote for SPIP: "Support Customized Kubernetes >>

Re: [DISCUSSION] SPIP: Support Volcano/Alternative Schedulers Proposal

2022-01-05 Thread bo yang
Hi Mich, Curious what you mean by “The constraint seems to be that you can fit one Spark executor pod per Kubernetes node and from my tests you don't seem to be able to allocate more than 50% of RAM on the node to the container”. Would you help explain a bit? Asking this because there could be

Re: Apache Spark 3.2 Expectation

2021-02-28 Thread bo yang
+1 for better support for disaggregated shuffle (push-based shuffle is a great example, also there are Facebook shuffle service and Uber remote shuffle service). There w

Re: Enabling fully disaggregated shuffle on Spark

2019-12-04 Thread bo yang
Thanks guys for the discussion in the email and also this afternoon! From our experience, we do not need to change Spark DAG scheduler to implement a remote shuffle service. Current Spark shuffle manager interfaces are pretty good and easy to implement. But we do feel the need to modify MapStatus

Re: Enabling fully disaggregated shuffle on Spark

2019-11-20 Thread bo yang
19, 2019 at 4:05 PM Ryan Blue >> wrote: >> >>> I'm interested in remote shuffle services as well. I'd love to hear >>> about what you're using in production! >>> >>> rb >>> >>> On Tue, Nov 19, 2019 at 2:43 PM bo yang wr

Re: Enabling fully disaggregated shuffle on Spark

2019-11-19 Thread bo yang
Hi Ben, Thanks for the write-up! This is Bo from Uber. I am on Felix's team in Seattle, working on disaggregated shuffle (we called it remote shuffle service, RSS, internally). We have had RSS in production for a while, and learned a lot during the work (tried quite a few techniques to imp

Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-14 Thread bo yang
+1 This is great work, allowing plugin of different sort shuffle write/read implementation! Also great to see it retain the current Spark configuration (spark.shuffle.manager=org.apache.spark.shuffle.YourShuffleManagerImpl). On Thu, Jun 13, 2019 at 2:58 PM Matt Cheah wrote: > Hi everyone, > > >

Support structured plan logging

2018-10-11 Thread bo yang
Hi All, Are there any people interested in adding structured plan logging in Spark? Currently the logical/physical plan can be logged as plain text via the explain() method, which has some issues, for example, string truncation and being difficult for tools/programs to use. This PR