Does the Spark DataFrame API write/create a single file instead of a directory as the result of a write operation?

2020-02-21 Thread Kshitij
Hi, there is no Spark DataFrame API that writes/creates a single file instead of a directory as the result of a write operation. Both options below create a directory containing a file with a random name: df.coalesce(1).write.csv() and df.write.csv(). Instead of creating a directory with standard files
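As the post says, Spark always writes a directory. With coalesce(1) that directory typically holds one part-* data file plus bookkeeping files such as _SUCCESS and, on a local filesystem, hidden .crc checksums. A common workaround is to move that single part file to the path you actually want once the write has finished. The sketch below assumes the output lives on a local filesystem; the helper name collapse_spark_output is hypothetical and not part of Spark. For HDFS or S3 you would need the Hadoop FileSystem API instead, which this sketch does not cover.

```python
import glob
import os
import shutil


def collapse_spark_output(output_dir: str, target_file: str, pattern: str = "part-*") -> str:
    """Hypothetical helper (not a Spark API): move the single part file that
    Spark wrote into output_dir to target_file, then delete output_dir.

    Assumes output_dir sits on a local filesystem and was produced by
    something like df.coalesce(1).write.csv(output_dir), so it contains
    exactly one data file whose name starts with "part-".
    """
    # Collect candidate data files; skip any checksum files just in case.
    parts = [
        p for p in glob.glob(os.path.join(output_dir, pattern))
        if not p.endswith(".crc")
    ]
    if len(parts) != 1:
        raise ValueError(
            f"expected exactly one part file in {output_dir}, found {len(parts)}"
        )
    # Move the data file out, then drop the directory with _SUCCESS/.crc leftovers.
    shutil.move(parts[0], target_file)
    shutil.rmtree(output_dir)
    return target_file


# Usage (requires a running SparkSession, so shown as comments):
# df.coalesce(1).write.mode("overwrite").option("header", True).csv("/tmp/out_dir")
# collapse_spark_output("/tmp/out_dir", "/tmp/out.csv")
```

Note that coalesce(1) funnels all rows through a single task, so this approach only suits outputs small enough for one executor to write.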

Re: [DISCUSSION] Avoiding duplicate work

2020-02-21 Thread Takeshi Yamamuro
Yeah, +1 to Sean's suggestion. When we see an "I'm working on this" comment on the JIRA, I think we should ask "Are you still working on this?" to avoid duplicate work there. On Sat, Feb 22, 2020 at 2:20 AM Nicholas Chammas wrote: > +1 to what Sean said. > > On Fri, Feb 21, 2020 at

Re: Breaking API changes in Spark 3.0

2020-02-21 Thread Holden Karau
So my view of how common & stable API removal should go (in general; to be clear, exceptions can and do make sense): 1) Deprecate API 2) Release replacement API 3) Provide migration guidance (ideally in the deprecated annotation, but possibly in release notes or elsewhere) 4) Remove old API I

Re: [DISCUSSION] Esoteric Spark function `TRIM/LTRIM/RTRIM`

2020-02-21 Thread Michael Armbrust
This plan for evolving the TRIM function to be more standards compliant sounds much better to me than the original change to just switch the order. It pushes users in the right direction and cleans up our tech debt without silently breaking existing workloads. It means that programs won't return

Re: [DISCUSSION] Avoiding duplicate work

2020-02-21 Thread Nicholas Chammas
+1 to what Sean said. On Fri, Feb 21, 2020 at 10:14 AM Sean Owen wrote: > We've avoided using Assignee because it implies that someone 'owns' > resolving the issue, when we want to keep it collaborative, and many > times in the past someone would ask to be assigned and then didn't > follow

Re: [DISCUSSION] Avoiding duplicate work

2020-02-21 Thread Sean Owen
We've avoided using Assignee because it implies that someone 'owns' resolving the issue, when we want to keep it collaborative, and many times in the past someone would ask to be assigned and then didn't follow through. You can comment on the JIRA to say "I'm working on this" but that has the

Re: [DISCUSSION] Avoiding duplicate work

2020-02-21 Thread younggyu Chun
What if both are looking at the code and neither has made a merge request yet? I guess we still can't see what's going on, because the JIRA ticket won't show a linked PR. On Fri, 21 Feb 2020 at 09:58, Wenchen Fan wrote: > The JIRA ticket will show the linked PR if there are any, which indicates > that

Re: [DISCUSSION] Avoiding duplicate work

2020-02-21 Thread Wenchen Fan
The JIRA ticket will show the linked PR if there are any, which indicates that someone is working on it if the PR is active. Maybe the bot should also leave a comment on the JIRA ticket to make it clearer? On Fri, Feb 21, 2020 at 10:54 PM younggyu Chun wrote: > Hi All, > > I would like to

[DISCUSSION] Avoiding duplicate work

2020-02-21 Thread younggyu Chun
Hi all, I would like to suggest using the "Assignee" functionality in JIRA when we are working on a project. When we pick a ticket to work on, we don't know who else is working on it right now. Recently I spent time solving an issue and made a merge request, but it turned out to be duplicate work.