Here are the latest DSv2 sync notes. Please reply with updates or
corrections.
*Attendees*:
- Ryan Blue
- Michael Armbrust
- Gengliang Wang
- Matt Cheah
- John Zhuge
*Topics*:
- Wenchen’s reorganization proposal
- Problems with TableProvider: the property map isn’t sufficient
- New PRs:
  - ReplaceTable: https…
Yeah, PyArrow is the only other PySpark dependency we check for a minimum
version. We updated that not too long ago to be 0.12.1, which I think we
are still good on for now.
On Fri, Jun 14, 2019 at 11:36 AM Felix Cheung wrote:
> How about pyArrow?
>
> --
> *From:* Hol…
+1 (binding)
I think this is a really important feature for Spark.
First, there is already a lot of interest in alternative shuffle storage in
the community, from dynamic allocation in Kubernetes to even just improving
stability…
How about pyArrow?
From: Holden Karau
Sent: Friday, June 14, 2019 11:06:15 AM
To: Felix Cheung
Cc: Bryan Cutler; Dongjoon Hyun; Hyukjin Kwon; dev; shane knapp
Subject: Re: [DISCUSS] Increasing minimum supported version of Pandas
Are there other Python dependencies…
Are there other Python dependencies we should consider upgrading at the
same time?
On Fri, Jun 14, 2019 at 7:45 PM Felix Cheung wrote:
> So to be clear, min version check is 0.23
> Jenkins test is 0.24
>
> I’m ok with this. I hope someone will test 0.23 on releases though before
> we sign off?
>
So to be clear, min version check is 0.23
Jenkins test is 0.24
I’m ok with this. I hope someone will test 0.23 on releases though before we
sign off?
From: shane knapp
Sent: Friday, June 14, 2019 10:23:56 AM
To: Bryan Cutler
Cc: Dongjoon Hyun; Holden Karau; Hyuk…
Now you can see the exposed component labels (ordered by the number of
PRs) here, and click a component to search:
https://github.com/apache/spark/labels?sort=count-desc
Dongjoon.
On Fri, Jun 14, 2019 at 1:15 AM Dongjoon Hyun wrote:
> Hi, All.
>
> The JIRA and PR are ready for reviews.
>
> h…
Just surfacing this change as it's probably pretty good to go, but (a)
I'm not a jQuery / JS expert and (b) we don't have comprehensive UI
tests.
https://github.com/apache/spark/pull/24843
I'd like to get us up to a modern jQuery for 3.0, to keep up with
security fixes (which was the minor motivation…
excellent. i shall not touch anything. :)
On Fri, Jun 14, 2019 at 10:22 AM Bryan Cutler wrote:
> Shane, I think 0.24.2 is probably more common right now, so if we were to
> pick one to test against, I still think it should be that one. Our Pandas
> usage in PySpark is pretty conservative, so i…
Shane, I think 0.24.2 is probably more common right now, so if we were to
pick one to test against, I still think it should be that one. Our Pandas
usage in PySpark is pretty conservative, so it's pretty unlikely that we
will add something that would break 0.23.X.
On Fri, Jun 14, 2019 at 10:10 AM …
+1 (non-binding). This API is versatile and flexible enough to handle
Bloomberg's internal use-cases. The ability for us to vary implementation
strategies is quite appealing. It is also worth noting the minimal changes
to Spark core needed to make it work. This is a very much needed addition
wit…
ah, ok... should we downgrade the testing env on jenkins then? any
specific version?
shane, who is loath (and i mean LOATH) to touch python envs ;)
On Fri, Jun 14, 2019 at 10:08 AM Bryan Cutler wrote:
> I should have stated this earlier, but when the user does something that
> requires Pandas…
I should have stated this earlier, but when the user does something that
requires Pandas, the minimum version is checked against what was imported
and will raise an exception if it is a lower version. So I'm concerned that
using 0.24.2 might be a little too new for users running older clusters. To…
+1 This is great work, allowing plugging in different sort shuffle
write/read implementations! Also great to see it retain the current Spark
configuration (spark.shuffle.manager=org.apache.spark.shuffle.YourShuffleManagerImpl).
On Thu, Jun 13, 2019 at 2:58 PM Matt Cheah wrote:
> Hi everyone,
>
> …
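The configuration retained above (spark.shuffle.manager set to a fully qualified class name) follows a common plugin pattern: the config value names a class, which is resolved by reflection and instantiated at startup. Below is a generic, hedged Python sketch of that pattern only; `collections.OrderedDict` merely stands in for a real shuffle-manager class, and none of this is Spark's actual (JVM) implementation.

```python
import importlib

# A config map whose value is a dotted class path, mirroring the
# spark.shuffle.manager key from the thread; the class path here is a
# stand-in chosen so the sketch is runnable.
conf = {"spark.shuffle.manager": "collections.OrderedDict"}

def load_plugin(conf, key):
    """Instantiate the class named by the dotted path stored under `key`."""
    module_name, _, class_name = conf[key].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls()

manager = load_plugin(conf, "spark.shuffle.manager")
print(type(manager).__name__)  # -> OrderedDict
```

Keeping the existing config key means alternative implementations can be swapped in without any new API surface: users change one string, and reflection does the rest.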
We opened a thread for voting yesterday, so please participate!
-Matt Cheah
From: Yue Li
Date: Thursday, June 13, 2019 at 7:22 PM
To: Saisai Shao, Imran Rashid
Cc: Matt Cheah, "Yifei Huang (PD)", Mridul Muralidharan, Bo Yang,
Ilan Filonenko, Imran Rashid, Justin Uang, Liang Tan …
just so everyone knows, our python 3.6 testing infra is currently on
0.24.2...
On Fri, Jun 14, 2019 at 9:16 AM Dongjoon Hyun wrote:
> +1
>
> Thank you for this effort, Bryan!
>
> Bests,
> Dongjoon.
>
> On Fri, Jun 14, 2019 at 4:24 AM Holden Karau wrote:
>
>> I’m +1 for upgrading, although since…
Thank you for the early notice, Shane! :)
Dongjoon
On Fri, Jun 14, 2019 at 9:13 AM shane knapp wrote:
> the campus colo will be performing some electrical maintenance, which
> means that they'll be powering off the entire building.
>
> since the jenkins cluster is located in that colo, we are m…
+1
Thank you for this effort, Bryan!
Bests,
Dongjoon.
On Fri, Jun 14, 2019 at 4:24 AM Holden Karau wrote:
> I’m +1 for upgrading, although since this is probably the last easy chance
> we’ll have to bump version numbers I’d suggest 0.24.2
>
>
> On Fri, Jun 14, 2019 at 4:38 AM Hyukjin Kw…
the campus colo will be performing some electrical maintenance, which means
that they'll be powering off the entire building.
since the jenkins cluster is located in that colo, we are most definitely
affected. :)
i'll be out of town that weekend, but will have one of my sysadmins bring
everything…
I’m +1 for upgrading, although since this is probably the last easy chance
we’ll have to bump version numbers I’d suggest 0.24.2
On Fri, Jun 14, 2019 at 4:38 AM Hyukjin Kwon wrote:
> I am +1 to go for 0.23.2 - it brings some overhead to test PyArrow and
> pandas combinations. Spark 3 sho…
Hi, All.
The JIRA and PR are ready for reviews.
https://issues.apache.org/jira/browse/SPARK-28051 (Exposing JIRA issue
component types at GitHub PRs)
https://github.com/apache/spark/pull/24871
Bests,
Dongjoon.
On Thu, Jun 13, 2019 at 10:48 AM Dongjoon Hyun wrote:
> Thank you for the feedback and …