leveldbjni dependency

2020-09-09 Thread mundaym
Hi all, I am currently building Spark from source and also have to build leveldbjni from source because the binary release (which is platform dependent) in mvnrepository does not support my target platform (s390x). People have run into similar problems when building for other platforms too (notabl

Re: [VOTE] Release Spark 2.4.7 (RC3)

2020-09-09 Thread Prashant Sharma
Thanks again, looks like it works now. Please take a look. On Thu, Sep 10, 2020 at 11:42 AM Prashant Sharma wrote: > Hi Wenchen and Sean, > > Thanks for looking into this and all the details. > > I have now updated the key in those keyservers. Now, how do I refresh > nexus? > > Thanks, > > On Th

Re: [VOTE] Release Spark 2.4.7 (RC3)

2020-09-09 Thread Prashant Sharma
Hi Wenchen and Sean, Thanks for looking into this and all the details. I have now updated the key in those keyservers. Now, how do I refresh nexus? Thanks, On Thu, Sep 10, 2020 at 9:13 AM Sean Owen wrote: > Yes I can do that and I am sure it's fine, but why has it been visible in > the past a

Re: [VOTE] Release Spark 2.4.7 (RC3)

2020-09-09 Thread Sean Owen
Yes I can do that and I am sure it's fine, but why has it been visible in the past and not now? Minor thing to fix. On Wed, Sep 9, 2020, 9:09 PM Wenchen Fan wrote: > Sean, you need to login https://repository.apache.org/ and pick the > staging repo 1361, then check its status, you will see this

Re: [VOTE] Release Spark 2.4.7 (RC3)

2020-09-09 Thread Wenchen Fan
Sean, you need to log in to https://repository.apache.org/, pick the staging repo 1361, and check its status; you will see this [image: image.png] On Thu, Sep 10, 2020 at 9:26 AM Mridul Muralidharan wrote: > > I imported our KEYS file locally [1] to validate ... did not use external > keyserver.

Re: Re: For the same data source in two SQLs, how to read it once?

2020-09-09 Thread Gang Li
I will pay attention in the future, thank you very much for your suggestions. -- Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: For the same data source in two SQLs, how to read it once?

2020-09-09 Thread Liu Genie
I think Yang Sun means createTempView, which will cache data in memory instead of writing to disk. I also suggest that this post be discussed on the u...@spark.apache.org mailing list. From: Gang Li Sent: September 9, 2020 17:06 To: dev@spark.apache.org Subject: Re: For the same

Re: [VOTE] Release Spark 2.4.7 (RC3)

2020-09-09 Thread Mridul Muralidharan
I imported our KEYS file locally [1] to validate ... did not use external keyserver. Regards, Mridul [1] wget https://dist.apache.org/repos/dist/dev/spark/KEYS -O - | gpg --import On Wed, Sep 9, 2020 at 8:03 PM Wenchen Fan wrote: > I checked > https://repository.apache.org/content/repositories

Re: [VOTE] Release Spark 2.4.7 (RC3)

2020-09-09 Thread Sean Owen
I see that this link isn't exposed as noted before; what are you seeing regarding a signature? The .asc signatures appear valid if you import the KEYS file in the e-mail, and that is the source of truth for signing releases. On Wed, Sep 9, 2020 at 8:04 PM Wenchen Fan wrote: > > I checked > https

Re: [VOTE] Release Spark 2.4.7 (RC3)

2020-09-09 Thread Wenchen Fan
I checked https://repository.apache.org/content/repositories/orgapachespark-1361/ , it says the Signature Validation failed. Prashant, can you double-check your gpg key and make sure it's uploaded to public key servers like the following? http://pool.sks-keyservers.net:11371 http://keyserver.ubunt

Re: Question about differences between batch and streaming training of LogisticRegression Algorithm in Spark3.0

2020-09-09 Thread Sean Owen
I'm not sure that second count can be optimized away, as it's used a few times. Are you sure it takes that long? How are you measuring that, and is it not perhaps the effect of caching the data the first time? What is the nature of the data that it takes that long? On Wed, Sep 9, 2020 at 6:21 AM cf

Question about differences between batch and streaming training of LogisticRegression Algorithm in Spark3.0

2020-09-09 Thread cfang1109
Hi all, We want to use socket streaming data to train an LR model with StreamingLogisticRegressionWithSGD and now have some questions. 1. The trainOn method of StreamingLogisticRegressionWithSGD contains code like this: data.foreachRDD { (rdd, time) => if (!rdd.isEmpty) { ... } } A

Re: For the same data source in two SQLs, how to read it once?

2020-09-09 Thread Gang Li
If I use a temporary table, the execution process is as shown in the following figure. Is there any way to achieve the following figure? than

Re: For the same data source in two SQLs, how to read it once?

2020-09-09 Thread Gang Li
Writing to a temporary table does allow the data source to be read only once, but it incurs disk I/O and makes no effective use of Spark RDD's in-memory operations.

Re: For the same data source in two SQLs, how to read it once?

2020-09-09 Thread Yang shun
You can do this by creating a temporary table. 1. Ensure that all fields (age, sex, other...) are included and cached as a dataset when the data is first pulled. 2. When outputting to different tables, select different fields of the cached dataset.
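
The two steps in this reply can be illustrated without any Spark API. The following is a minimal plain-Python sketch of the "read once, project twice" idea, not Spark code: `read_source`, `project`, the sample rows, and the `sex` field pairing are all hypothetical names chosen for illustration. In Spark the same idea would presumably map to caching the full dataset (for example with `DataFrame.cache()`) before producing both outputs; that mapping is an assumption, not something stated in the thread.

```python
# Plain-Python sketch of "read once, project twice"; not Spark code.
# All names here (read_source, project, the sample rows) are hypothetical.

read_count = 0  # counts how many times the source is actually scanned


def read_source():
    """Stand-in for the expensive scan of the source table."""
    global read_count
    read_count += 1
    return [
        {"name": "a", "number": 1, "age": 30, "sex": "f"},
        {"name": "b", "number": 2, "age": 41, "sex": "m"},
    ]


def project(rows, fields):
    """Keep only the requested fields of each row."""
    return [{f: row[f] for f in fields} for row in rows]


# Step 1: pull all fields once and keep the result in memory.
cached_rows = read_source()

# Step 2: derive each output from the cached rows, not from the source.
output_one = project(cached_rows, ["name", "number", "age"])
output_two = project(cached_rows, ["name", "sex"])
```

Both outputs are built from `cached_rows`, so the source is scanned a single time regardless of how many different projections are written out.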

For the same data source in two SQLs, how to read it once?

2020-09-09 Thread Gang Li
INSERT OVERWRITE TABLE spark_input_test2 PARTITION(dt='20200909') SELECT name, number, age FROM spark_input_test WHERE dt='20200908'