We scanned three versions of Spark: 3.0.0, 3.1.3, and 3.2.1.
On Tue, 26 Apr 2022 at 18:46, Bjørn Jørgensen wrote:
> What version of spark is it that you have scanned?
>
>
>
> On Tue, 26 Apr 2022 at 12:48, HARSH TAKKAR wrote:
>
>> Hello,
>>
>> Please let me know if there is a fix available for the following
>
Your test result just gave the verdict, so #2 is the answer: Spark
ignores those non-numeric rows completely when aggregating the average.
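For what it's worth, here is a minimal spark-shell sketch of why that happens (the names are only illustrative, not taken from the test below): when avg() runs on a string column, Spark casts the values to double, a value like "xyz" casts to null, and avg() skips nulls entirely.

// illustrative data only; run in spark-shell (or import spark.implicits._ first)
import org.apache.spark.sql.functions._

val demo = Seq(("apple", "2"), ("plum", "xyz")).toDF("fruit", "number")

// "xyz" cannot be cast to double, so it becomes null
demo.select(col("number").cast("double")).show()

// avg() skips null inputs, so only the rows that cast cleanly are averaged
demo.agg(avg(col("number"))).show()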
On 5/1/22 8:20 PM, wilson wrote:
I did a small test as follows.
scala> df.printSchema()
root
|-- fruit: string (nullable = true)
|-- number: string (nullable = true)
Sorry, I have found the reason: a null cannot be compared directly. I
have written a note about this:
https://bigcount.xyz/how-spark-handles-null-and-abnormal-values.html
Thanks.
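In case it helps others following the thread, a small sketch of the null point, with the df and column name assumed: comparing a column against null with === never evaluates to true, so such rows silently drop out of a filter; isNull (or the null-safe <=>) is the way to test for it.

import org.apache.spark.sql.functions.col

// === against null evaluates to null, never true, so this returns zero rows
df.filter(col("number") === null).show()

// isNull is the correct way to match null values
df.filter(col("number").isNull).show()

// <=> is the null-safe equality operator, if a comparison form is needed
df.filter(col("number") <=> null).show()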
wilson wrote:
do you know why the select results below don't behave consistently?
-
I did a small test as follows.
scala> df.printSchema()
root
|-- fruit: string (nullable = true)
|-- number: string (nullable = true)
scala> df.show()
+------+------+
| fruit|number|
+------+------+
| apple|     2|
|orange|     5|
|cherry|     7|
|  plum|   xyz|
+------+------+
scala> df.agg
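(The quoted test is cut off at the df.agg line. Assuming it computed the average of the number column, the rest would look roughly like this; the exact call is a guess, not taken from the original message.)

import org.apache.spark.sql.functions.avg

// "xyz" casts to null and is skipped, so the average is taken over 2, 5 and 7
df.agg(avg("number")).show()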
Hello,
I think I noticed some Spark behavior that might have enormous potential
for performance improvement when reading files from a folder with (many
nested) Hive-style partitions while at the same time applying a filter to
the partition columns.
Situation
I have JSON files with log information
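(The message is truncated here. For anyone trying to reproduce the situation, a rough sketch of the kind of read being described; the paths and partition column names below are made up for illustration.)

// hypothetical layout: /data/logs/year=2022/month=04/day=26/part-*.json
val logs = spark.read.json("/data/logs")

// the filter touches only partition columns, so ideally Spark should prune
// the matching directories instead of listing and reading every partition
logs.filter($"year" === 2022 && $"month" === 4).count()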
Hello
My dataset has abnormal values in a column whose normal values are
numeric. I can select them as follows:
scala> df.select("up_votes").filter($"up_votes".rlike(regex)).show()
+--------+
|up_votes|
+--------+
|       <|
|       <|
|     fx-|
|
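(The output above is cut off, and the regex itself isn't shown in the message. One possible expression for that intent, selecting values that contain anything other than digits, purely as an assumption:)

// hypothetical regex: match any value containing a non-digit character
val regex = "[^0-9]"
df.select("up_votes").filter($"up_votes".rlike(regex)).show()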