We scanned three versions of Spark: 3.0.0, 3.1.3, and 3.2.1.
On Tue, 26 Apr 2022 at 18:46, Bjørn Jørgensen wrote:
> What version of spark is it that you have scanned?
>
>
>
> On Tue, 26 Apr 2022 at 12:48, HARSH TAKKAR wrote:
>
>> Hello,
>>
>> Please let me know if there is a fix available for the following
>
Your test result just gave the verdict, so #2 is the answer: Spark
ignores those non-numeric rows completely when aggregating the average.
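For what it's worth, here is a minimal spark-shell sketch of why that happens (the names are only illustrative, not taken from the test below): when avg() runs on a string column, Spark casts the values to double, a value like "xyz" casts to null, and avg() skips nulls entirely.

// illustrative data only; run in spark-shell (or import spark.implicits._ first)
import org.apache.spark.sql.functions._

val demo = Seq(("apple", "2"), ("plum", "xyz")).toDF("fruit", "number")

// "xyz" cannot be cast to double, so it becomes null
demo.select(col("number").cast("double")).show()

// avg() skips null inputs, so only the rows that cast cleanly are averaged
demo.agg(avg(col("number"))).show()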
On 5/1/22 8:20 PM, wilson wrote:
I did a small test as follows.
scala> df.printSchema()
root
|-- fruit: string (nullable = true)
|-- number: string (nullable = true)
Sorry, I have found the reason: a null cannot be compared directly. I
have written a note about this:
https://bigcount.xyz/how-spark-handles-null-and-abnormal-values.html
Thanks.
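In case it helps others following the thread, a small sketch of the null point, with the df and column name assumed: comparing a column against null with === never evaluates to true, so such rows silently drop out of a filter; isNull (or the null-safe <=>) is the way to test for it.

import org.apache.spark.sql.functions.col

// === against null evaluates to null, never true, so this returns zero rows
df.filter(col("number") === null).show()

// isNull is the correct way to match null values
df.filter(col("number").isNull).show()

// <=> is the null-safe equality operator, if a comparison form is needed
df.filter(col("number") <=> null).show()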
wilson wrote:
do you know why the select results below don't behave consistently?
-
I did a small test as follows.
scala> df.printSchema()
root
|-- fruit: string (nullable = true)
|-- number: string (nullable = true)
scala> df.show()
+------+------+
| fruit|number|
+------+------+
| apple|     2|
|orange|     5|
|cherry|     7|
|  plum|   xyz|
+------+------+
scala> df.agg
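(The quoted test is cut off at the df.agg line. Assuming it computed the average of the number column, the rest would look roughly like this; the exact call is a guess, not taken from the original message.)

import org.apache.spark.sql.functions.avg

// "xyz" casts to null and is skipped, so the average is taken over 2, 5 and 7
df.agg(avg("number")).show()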
Hello,
I think I noticed some Spark behavior that might have enormous potential
for performance improvement when reading files from a folder with (many
nested) Hive-style partitions while at the same time applying a filter to
the partition columns.
Situation
I have JSON files with log information
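(The message is truncated here. For anyone trying to reproduce the situation, a rough sketch of the kind of read being described; the paths and partition column names below are made up for illustration.)

// hypothetical layout: /data/logs/year=2022/month=04/day=26/part-*.json
val logs = spark.read.json("/data/logs")

// the filter touches only partition columns, so ideally Spark should prune
// the matching directories instead of listing and reading every partition
logs.filter($"year" === 2022 && $"month" === 4).count()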
Hello
My dataset has abnormal values in a column whose normal values are
numeric. I can select them as follows:
scala> df.select("up_votes").filter($"up_votes".rlike(regex)).show()
+--------+
|up_votes|
+--------+
|       <|
|       <|
|     fx-|
|
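(The output above is cut off, and the regex itself isn't shown in the message. One possible expression for that intent, selecting values that contain anything other than digits, purely as an assumption:)

// hypothetical regex: match any value containing a non-digit character
val regex = "[^0-9]"
df.select("up_votes").filter($"up_votes".rlike(regex)).show()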