Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-18 Thread Mich Talebzadeh
These are good points. In traditional RDBMSs, SQL query results without an explicit *ORDER BY* clause may vary in order due to optimization, especially when no clustered index is defined. In contrast, systems like Hive and Spark SQL, which are based on distributed file storage, do not rely on

Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-18 Thread Mich Talebzadeh
Hi Nicholas, Regarding your point, "In SQL, the result order of any query is implementation-dependent without an explicit ORDER BY clause. Technically, you could run `SELECT * FROM table;` 10 times in a row and get 10 different orderings.": yes, I concur; my understanding is the same. In SQL, the result

Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-18 Thread Reynold Xin
It should be the same as SQL. Otherwise it takes away a lot of potential future optimization opportunities. On Mon, Sep 18 2023 at 8:47 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote: > I’ve always considered DataFrames to be logically equivalent to SQL tables or queries.

Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-18 Thread Sean Owen
I think it's the same, and always has been: yes, you don't have a guaranteed ordering unless an operation produces a specific ordering. That could be the result of an ORDER BY, yes; I believe you would be guaranteed that reading input files results in data in the order they appear in the file, etc. 1:1

Are DataFrame rows ordered without an explicit ordering clause?

2023-09-18 Thread Nicholas Chammas
I’ve always considered DataFrames to be logically equivalent to SQL tables or queries. In SQL, the result order of any query is implementation-dependent without an explicit ORDER BY clause. Technically, you could run `SELECT * FROM table;` 10 times in a row and get 10 different orderings. I
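
The point raised in this thread (only an explicit ORDER BY fixes the row order) can be illustrated with a minimal sketch. It uses Python's standard-library `sqlite3` as a stand-in for any SQL engine, not Spark itself; the `events` table, its columns, and the `fetch_rows` helper are hypothetical names chosen for illustration. The sketch asserts an exact order only for the ORDER BY query and compares the unordered result as a multiset.

```python
import sqlite3
from collections import Counter


def fetch_rows(conn, ordered):
    """Return all rows of the hypothetical `events` table.

    With ordered=True the query carries an explicit ORDER BY, so the
    row order is part of the result. With ordered=False the engine is
    free to return rows in any order.
    """
    sql = "SELECT id, name FROM events"
    if ordered:
        sql += " ORDER BY id"
    return conn.execute(sql).fetchall()


# In-memory database used purely as an illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(3, "c"), (1, "a"), (2, "b")],
)

unordered = fetch_rows(conn, ordered=False)
ordered = fetch_rows(conn, ordered=True)

# Safe: the order is guaranteed by ORDER BY.
assert ordered == [(1, "a"), (2, "b"), (3, "c")]

# Safe: compare the unordered result as a multiset, never by position.
assert Counter(unordered) == Counter(ordered)
```

The same habit carries over to DataFrame code under the view expressed in this thread: sort explicitly when order matters, and otherwise compare results without relying on row position.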

Re: [ANNOUNCE] Apache Spark 3.5.0 released

2023-09-18 Thread Ruifeng Zheng
Thanks, Yuanjian, for driving this release. Congratulations! On Mon, Sep 18, 2023 at 2:16 PM Maxim Gekk wrote: > Thank you for the work, Yuanjian! > > On Mon, Sep 18, 2023 at 6:28 AM beliefer wrote: >> Congratulations! Apache Spark. >> >> At 2023-09-16 01:01:40, "Yuanjian Li" wrote: >>

Re: [ANNOUNCE] Apache Spark 3.5.0 released

2023-09-18 Thread Maxim Gekk
Thank you for the work, Yuanjian! On Mon, Sep 18, 2023 at 6:28 AM beliefer wrote: > Congratulations! Apache Spark. > > > > At 2023-09-16 01:01:40, "Yuanjian Li" wrote: > > Hi All, > > We are happy to announce the availability of *Apache Spark 3.5.0*! > > Apache Spark 3.5.0 is the sixth release