Yes, you can expect each partition file to be sorted by "col1" and "col2".
However, "col1" values are assigned to partition files by hashing, so the
assignment looks arbitrary, but all rows with the same value for "col1"
will reside in the same partition file.
What kind of unexpected sort order do you observe?
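For what it's worth, a quick way to see the allocation described above is to tag each row with its partition id. This is only an illustrative sketch with made-up column values and a local session:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.spark_partition_id

val spark = SparkSession.builder()
  .appName("partition-allocation-check")
  .master("local[*]")            // local session just for illustration
  .getOrCreate()
import spark.implicits._

val df = Seq(("a", 3), ("a", 1), ("b", 2), ("c", 4)).toDF("col1", "col2")

// After repartition($"col1"), every row with the same col1 value gets the
// same partition id; which id that is depends on the hash, not on any
// user-visible order.
df.repartition($"col1")
  .withColumn("pid", spark_partition_id())
  .select("col1", "pid")
  .distinct()
  .orderBy("col1")
  .show()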
Hi Mayur,
In Java, you can call Future.get with a timeout and then cancel the future
in the catch block once the timeout has been reached. There should be
something similar in Scala as well.
Eg: https://stackoverflow.com/a/16231834
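As an illustration, a minimal Scala sketch of that pattern using java.util.concurrent directly (the timeout value and the sleeping task are placeholders for the real query call):

import java.util.concurrent.{Callable, Executors, TimeUnit, TimeoutException}

object TimeoutExample extends App {
  val executor = Executors.newSingleThreadExecutor()

  // Stand-in for the long-running query; replace with the real call.
  val task = new Callable[Unit] {
    override def call(): Unit = Thread.sleep(60000)
  }

  val future = executor.submit(task)
  try {
    future.get(5, TimeUnit.SECONDS)      // wait up to the chosen timeout
  } catch {
    case _: TimeoutException =>
      future.cancel(true)                // interrupt the task once the timeout is hit
      println("query timed out and was cancelled")
  } finally {
    executor.shutdown()
  }
}

Note that future.cancel(true) only interrupts the driver-side thread; a query already running on the cluster may also need to be cancelled on the Spark side (for example via SparkContext's job-group cancellation).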
Hi!
We expected the order of sorted partitions to be preserved after a
dataframe write. We use the following code to write out one file per
partition, with the rows sorted by a column.
df.repartition($"col1").sortWithinPartitions("col1", "col2")
  .write.partitionBy("col1")
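The terminal call is missing from the snippet as quoted; a complete version, assuming a Parquet sink and a made-up output path (and the df defined earlier in the job), would look like:

df.repartition($"col1")
  .sortWithinPartitions("col1", "col2")
  .write
  .partitionBy("col1")
  .parquet("/tmp/sorted_output")   // output format and path are assumptions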
Okay, so fitting the problem to the solution, that is powerful.
On Thu, 15 Sept 2022, 14:48 Mayur Benodekar, wrote:
> Hi Gourav,
>
> It’s the way the framework is
>
>
> Sent from my iPhone
>
> On Sep 15, 2022, at 02:02, Gourav Sengupta
> wrote:
>
>
> Hi,
>
> Why spark and why scala?
>
> Regards,
Hi,
Why spark and why scala?
Regards,
Gourav
On Wed, 7 Sept 2022, 21:42 Mayur Benodekar, wrote:
> I am new to Scala and Spark.
>
> I have code in Scala which executes queries in a while loop, one after
> the other.
>
> What we need to do is if a particular query takes more than a certain