Re: Profiling data quality with Spark

2022-12-28 Thread infa elance
You can also look at Informatica Data Quality, which runs on Spark. Of course it's not free, but you can sign up for a 30-day free trial. They have both profiling and prebuilt data quality rules and accelerators.

Re: EXT: Re: Check if shuffle is caused for repartitioned pyspark dataframes

2022-12-28 Thread Vibhor Gupta
Hi Shivam, I think what you are looking for is bucket optimization: the execution engine (Spark) knows how the data was shuffled before persisting it. Unfortunately this is not supported when you use vanilla parquet files. Try saving the dataframe as a bucketed table instead; see the sketch below.
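A minimal sketch of that approach (Scala; the PySpark DataFrameWriter exposes the same bucketBy/sortBy/saveAsTable methods; the bucket count, key column, and table name are assumptions):

    // Persist with bucketing metadata so a later join or aggregation on the
    // same key can reuse the existing layout instead of shuffling again.
    // Bucketing info only survives through the metastore, which is why this
    // goes through saveAsTable rather than a plain parquet path.
    df.write
      .bucketBy(16, "user_id")         // hypothetical key column and bucket count
      .sortBy("user_id")
      .format("parquet")
      .saveAsTable("events_bucketed")  // hypothetical table name

Reading the table back through the metastore then lets the optimizer see the bucketing and skip the shuffle on that key.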

Cannot build Apache Spark 3.3.1 with Apache Hive 3.1.2 and Apache Hadoop 3.1.1

2022-12-28 Thread שוהם יהודה
Hi team, I have a problem building Apache Spark so that it is compatible with Apache Hive 3.1.2. I believe Apache Spark supports Hive 3.1.2, as I saw it in the docs. https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html I also saw in the docs the following guide to build Spark:
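Note that Spark 3.3.1 ships with a built-in Hive 2.3.9 client, so rebuilding Spark may not be necessary: the page linked above also describes pointing a stock build at an external Hive 3.1.2 metastore purely through configuration. A minimal sketch of that route (Scala; the jars location is an assumption for your environment):

    import org.apache.spark.sql.SparkSession

    // Keep the stock Spark 3.3.1 build, but load Hive 3.1.2 client jars at
    // runtime so Spark can talk to a 3.1.2 metastore.
    val spark = SparkSession.builder()
      .appName("hive-3.1.2-metastore")
      .config("spark.sql.hive.metastore.version", "3.1.2")
      .config("spark.sql.hive.metastore.jars", "path")
      .config("spark.sql.hive.metastore.jars.path", "/opt/hive-3.1.2/lib/*") // assumed path
      .enableHiveSupport()
      .getOrCreate()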

Re: Profiling data quality with Spark

2022-12-28 Thread vaquar khan
@ Gourav Sengupta, why are you sending unnecessary emails? If you think Snowflake is good, please use it; the question here was different and you are talking about a totally different topic. Please respect the group guidelines. Regards, Vaquar khan

Re: Profiling data quality with Spark

2022-12-28 Thread vaquar khan
Here you can find all the details; you just need to pass a Spark dataframe, and Deequ also generates recommendations for rules; you can also write custom complex rules. https://aws.amazon.com/blogs/big-data/test-data-quality-at-scale-with-deequ/ Regards, Vaquar khan
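A minimal sketch of that recommendation flow (Scala, as in the AWS post above; the input table and its columns are assumptions):

    import com.amazon.deequ.suggestions.{ConstraintSuggestionRunner, Rules}

    val df = spark.table("orders")  // hypothetical input DataFrame

    // Profile the dataframe and let Deequ propose a constraint set per column.
    val suggestionResult = ConstraintSuggestionRunner()
      .onData(df)
      .addConstraintRules(Rules.DEFAULT)
      .run()

    // Each suggestion carries a readable description plus the exact code
    // to paste into a verification suite.
    suggestionResult.constraintSuggestions.foreach { case (column, suggestions) =>
      suggestions.foreach(s => println(s"$column: ${s.description} -> ${s.codeForConstraint}"))
    }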

Re: Profiling data quality with Spark

2022-12-28 Thread rajat kumar
Thanks for the input, folks. Hi Vaquar, I saw that we have various types of checks in GE and Deequ. Could you please suggest what types of checks you used for metric-based columns? Regards, Rajat
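For numeric, metric-style columns, Deequ checks along these lines are common; a hedged sketch (Scala; column names and thresholds are assumptions, not Vaquar's actual rules):

    import com.amazon.deequ.VerificationSuite
    import com.amazon.deequ.checks.{Check, CheckLevel, CheckStatus}

    val df = spark.table("orders")  // hypothetical input DataFrame

    val result = VerificationSuite()
      .onData(df)
      .addCheck(
        Check(CheckLevel.Error, "metric column checks")
          .isComplete("order_id")                // key column has no nulls
          .isUnique("order_id")                  // key column is unique
          .isNonNegative("amount")               // metric is never negative
          .hasMax("amount", _ <= 1000000.0)      // sanity ceiling on the metric
          .hasCompleteness("amount", _ >= 0.95)) // at most 5% missing values
      .run()

    if (result.status != CheckStatus.Success) println("Data quality checks failed")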

Re: [Spark Core] [Advanced] [How-to] How to map any external field to job ids spawned by Spark.

2022-12-28 Thread Gourav Sengupta
Hi Khalid, just out of curiosity, does the API help us set job IDs, or just job descriptions? Regards, Gourav Sengupta

Re: [Spark Core] [Advanced] [How-to] How to map any external field to job ids spawned by Spark.

2022-12-28 Thread Khalid Mammadov
There is a feature in SparkContext to set local properties (setLocalProperty) where you can set your request ID and then, using a SparkListener instance, read that ID together with the job ID via the onJobStart event. Hope this helps.
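A minimal sketch of that pattern (Scala; the property key and request id are assumptions; to Gourav's question above, this maps an external ID onto the Spark-assigned job IDs rather than setting the job ID itself):

    import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()

    // Local properties set on the driver thread are attached to every job that
    // thread submits and arrive in SparkListenerJobStart.properties.
    class RequestIdListener extends SparkListener {
      override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
        val requestId = Option(jobStart.properties.getProperty("external.request.id"))
        println(s"Spark job ${jobStart.jobId} -> request ${requestId.getOrElse("<none>")}")
      }
    }

    spark.sparkContext.addSparkListener(new RequestIdListener)

    // Tag all jobs submitted from this thread with the caller's request id.
    spark.sparkContext.setLocalProperty("external.request.id", "req-42") // hypothetical id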