Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-02-29 Thread John Zhuge
eleases/spark-release-3-5-1.html >> >> We would like to acknowledge all community members for contributing to >> this >> release. This release would not have been possible without you. >> >> Jungtaek Lim >> >> ps. Yikun is helping us through releasing the official docker image for >> Spark 3.5.1 (Thanks Yikun!) It may take some time to be generally available. >> >> -- John Zhuge

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-13 Thread John Zhuge
tps://github.com/apache/arrow-datafusion-comet for more details if >> you are interested. We'd love to collaborate with people from the open >> source community who share similar goals. >> >> Thanks, >> Chao >> >> - >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >> >> -- John Zhuge

Re: Spark on Kubernetes scheduler variety

2021-06-24 Thread John Zhuge
Jun 2021 at 00:44, Holden Karau >>>> wrote: >>>> >>>>> Hi Folks, >>>>> >>>>> I'm continuing my adventures to make Spark on containers party and I >>>>> was wondering if folks have experience with the different batch >>>>> scheduler options that they prefer? I was thinking so that we can >>>>> better support dynamic allocation it might make sense for us to >>>>> support using different schedulers and I wanted to see if there are >>>>> any that the community is more interested in? >>>>> >>>>> I know that one of the Spark on Kube operators supports >>>>> volcano/kube-batch so I was thinking that might be a place I start >>>>> exploring but also want to be open to other schedulers that folks >>>>> might be interested in. >>>>> >>>>> Cheers, >>>>> >>>>> Holden :) >>>>> >>>>> -- >>>>> Twitter: https://twitter.com/holdenkarau >>>>> Books (Learning Spark, High Performance Spark, etc.): >>>>> https://amzn.to/2MaRAG9 >>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau >>>>> >>>>> - >>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >>>>> >>>>> -- >>> Twitter: https://twitter.com/holdenkarau >>> Books (Learning Spark, High Performance Spark, etc.): >>> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau >>> >> -- John Zhuge

Re: Timestamp Difference/operations

2018-10-12 Thread John Zhuge
Yeah, operator "-" does not seem to be supported; however, you can use the "datediff" function: In [9]: select datediff(CAST('2000-02-01 12:34:34' AS TIMESTAMP), CAST('2000-01-01 00:00:00' AS TIMESTAMP)) Out[9]: +-
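
For reference, the arithmetic behind that query can be sketched in plain Python. This is only an illustration using the standard library, not Spark code; it assumes `datediff` counts whole days between the date parts of its two arguments. The helper names `day_diff` and `second_diff` are hypothetical, and `second_diff` shows the full timestamp difference for comparison.

```python
from datetime import datetime

# The two timestamps from the query above.
end = datetime(2000, 2, 1, 12, 34, 34)
start = datetime(2000, 1, 1, 0, 0, 0)


def day_diff(end: datetime, start: datetime) -> int:
    """Whole days between the date parts (hypothetical helper;
    assumed to mirror what datediff computes)."""
    return (end.date() - start.date()).days


def second_diff(end: datetime, start: datetime) -> int:
    """Full timestamp difference in seconds (hypothetical helper)."""
    return int((end - start).total_seconds())


print(day_diff(end, start))
print(second_diff(end, start))
```

The day count drops the time-of-day component, so a difference in seconds has to be computed from the full timestamps instead.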

Re: Handle BlockMissingException in pyspark

2018-08-06 Thread John Zhuge
BlockMissingException typically indicates the HDFS file is corrupted. This might be an HDFS issue, so the Hadoop mailing list is a better bet: u...@hadoop.apache.org. Capture the full stack trace in the executor log. If the file still exists, run `hdfs fsck -blockId blk_1233169822_159765693` to determine wheth

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread John Zhuge
ed from). > > On Wed, Jan 3, 2018 at 6:46 PM, John Zhuge wrote: > > Thanks Jacek and Marcelo! > > > > Any reason it is not sourced? Any security consideration? > > > > > > On Wed, Jan 3, 2018 at 9:59 AM, Marcelo Vanzin > wrote: > >> > >&g

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread John Zhuge
Thanks Jacek and Marcelo! Any reason it is not sourced? Any security consideration? On Wed, Jan 3, 2018 at 9:59 AM, Marcelo Vanzin wrote: > On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge wrote: > > I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is > > spark

Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-02 Thread John Zhuge
ermode. See the YARN-related Spark Properties > <https://github.com/apache/spark/blob/master/docs/running-on-yarn.html#spark-properties> > for > more information. Does it mean spark-env.sh will not be sourced when starting AM in cluster mode? Does this paragraph apply to executor as well? Thanks, -- John Zhuge

streaming+sql with block has been removed error

2015-11-05 Thread ZhuGe
Hi all: I am trying to implement the "spark streaming +sql and dataframe" case described in this post https://databricks.com/blog/2015/07/30/diving-into-spark-streamings-execution-model.html I use rabbit mq as the datasource. My code sample is like this: countByValueAndWindow(Seconds(5), Second

master die and worker registration failed with duplicated worker id

2015-10-19 Thread ZhuGe
Hi all: We met a series of weird problems in our standalone cluster with 2 masters (zk election agent). Q1: Firstly, we find the active master would lose leadership at some point and shut itself down. [INFO 2015-10-17 13:00:15 (ClientCnxn.java:1083)] Client session timed out, have not heard from serv

spark with internal ip

2015-09-21 Thread ZhuGe
Hi there: We recently added one NIC to each node of the cluster (standalone) for larger bandwidth, and we modified the /etc/hosts file so the hostname points to the new NIC's IP address (internal). What we want to achieve is that communication between nodes goes through the new NIC. It seems th

master hung after killing the streaming sc

2015-09-17 Thread ZhuGe
Hi there: we recently deployed a streaming application in our standalone cluster, and we found an issue when trying to stop the streaming sc (which had been working for several days) with the kill command in the Spark UI. By kill command, I mean the 'kill' button in the "Submission ID" column of "Runni

hive.contrib.serde2.RegexSerDe not found

2015-07-27 Thread ZhuGe
Hi all: I am testing the performance of Hive on Spark SQL. The existing table is created with ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'input.regex' = '(.*?)\\|\\^\\|(.*?)\\|\\^\\|(.*?)\\|\\^\\|(.*?)\\|\\^\\|(.*?)\\|\\^\\|(.*?)\\|\\^\\|(.*?)\\|\

Would driver shutdown cause app dead?

2015-07-21 Thread ZhuGe
Hi all: I am a bit confused about the work of the driver. In our production environment, we have a Spark Streaming app running in standalone mode. Our concern is whether the app would keep running normally if the driver shut down accidentally (host shutdown or whatever). Any explanation would be appreciated!! C

workers no route to host

2015-03-31 Thread ZhuGe
Hi, I set up a standalone cluster of 5 machines (tmaster, tslave1,2,3,4) with spark-1.3.0-cdh5.4.0-snapshort. When I execute sbin/start-all.sh, the master is OK, but I can't see the web UI. Moreover, the worker logs look something like this: Spark assembly has been built with Hive, including Data

start-slave.sh failed with ssh port other than 22

2015-03-16 Thread ZhuGe
Hi all: I am new to Spark and I want to set up a cluster of 3 nodes (standalone mode). I can start the master and see the web UI. Because the ssh port of the 3 nodes is configured to 58518, when I use sbin/start-slave.sh, the log message shows ssh: connect to host node1 port 22: connection refuse