Re: [Spark SQL, intermediate+] possible bug or weird behavior of insertInto

2021-03-04 Thread Oldrich Vlasic
…ching". [quoting Jeff Evans, Thursday, March 4, 2021 2:55 PM:] Why not perform a df.select(...) before the final write
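The suggested fix is to project the DataFrame's columns into the target table's declared order before writing, e.g. `df.select(*spark.table("tbl").columns).write.insertInto("tbl")`. Below is a minimal plain-Python stand-in for that projection (no Spark required; the function name and data are illustrative, not part of any Spark API), showing how reordering by the table's column names neutralizes position-based insertion:

```python
def select_in_table_order(df_cols, rows, target_cols):
    """Reorder each positional row so its values follow target_cols,
    mirroring what df.select(*target_cols) does to a Spark DataFrame."""
    idx = [df_cols.index(c) for c in target_cols]
    return [tuple(r[i] for i in idx) for r in rows]

df_cols = ["age", "name"]               # DataFrame's column order
rows = [(42, "alice"), (7, "bob")]
target_cols = ["name", "age"]           # the table's declared column order

print(select_in_table_order(df_cols, rows, target_cols))
# [('alice', 42), ('bob', 7)] -- values now line up with the table schema
```

After this projection, a purely positional write lands every value in the intended column, which is why a `select(...)` immediately before `insertInto` is a cheap safeguard.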

Re: [Spark SQL, intermediate+] possible bug or weird behavior of insertInto

2021-03-04 Thread Jeff Evans
…) from falling victim to this. [quoting Russell Spitzer, Wednesday, March 3, 2021 3:31 PM, Re: [Spark SQL, intermedi…

Re: [Spark SQL, intermediate+] possible bug or weird behavior of insertInto

2021-03-04 Thread Oldrich Vlasic
[quoting Russell Spitzer:] Yep, this is the behavior for insertInto; using the other write APIs does schema matching, I believe. [quoting Sean Owen, Mar 3, 2021, 8:29 AM:] I…

Re: [Spark SQL, intermediate+] possible bug or weird behavior of insertInto

2021-03-03 Thread Russell Spitzer
Yep, this is the behavior for insertInto; using the other write APIs does schema matching, I believe. > On Mar 3, 2021, at 8:29 AM, Sean Owen wrote: > > I don't have any good answer here, but I seem to recall that this is because > of SQL semantics, which follow column ordering, not naming
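The distinction Russell draws can be sketched without Spark. The two toy functions below (a hypothetical mini-model, not the Spark API) contrast insertInto-style positional resolution with the by-name resolution that writers such as `saveAsTable` perform when the table already exists:

```python
def insert_by_position(table_cols, df_cols, row):
    # insertInto-style: the DataFrame's column names are ignored entirely;
    # the i-th value goes into the table's i-th column.
    return dict(zip(table_cols, row))

def insert_by_name(table_cols, df_cols, row):
    # by-name style: each table column is resolved via the DataFrame's names.
    by_name = dict(zip(df_cols, row))
    return {c: by_name[c] for c in table_cols}

table_cols = ["name", "age"]
df_cols = ["age", "name"]       # same names, different order
row = (42, "alice")

print(insert_by_position(table_cols, df_cols, row))  # {'name': 42, 'age': 'alice'}
print(insert_by_name(table_cols, df_cols, row))      # {'name': 'alice', 'age': 42}
```

With positional resolution the age silently lands in the `name` column, which is exactly the "weird and potentially dangerous" behavior the thread is about.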

Re: [Spark SQL, intermediate+] possible bug or weird behavior of insertInto

2021-03-03 Thread Sean Owen
I don't have any good answer here, but I seem to recall that this is because of SQL semantics, which follow column ordering, not naming, when performing operations like this. It may well be as intended. On Tue, Mar 2, 2021 at 6:10 AM Oldrich Vlasic <oldrich.vla...@datasentics.com> wrote: > Hi,

[Spark SQL, intermediate+] possible bug or weird behavior of insertInto

2021-03-02 Thread Oldrich Vlasic
Hi, I have encountered a weird and potentially dangerous behaviour of Spark concerning partial overwrites of partitioned data. Not sure if this is a bug or just an abstraction leak. I have checked the Spark section of Stack Overflow and haven't found any relevant questions or answers. Full minimal