You can also look at Informatica Data Quality, which runs on Spark. It isn't free, but you can sign up for a 30-day free trial. It has both profiling and prebuilt data quality rules and accelerators.
Sent from my iPhone
On Dec 28, 2022, at 10:02 PM, vaquar khan wrote:
Hi Shivam,
I think what you are looking for is bucket optimization: the execution engine
(Spark) knows how the data was shuffled before it was persisted.
Unfortunately, this is not supported with vanilla Parquet files.
Try saving the dataframe using the
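The suggestion above is cut off in the archive. Assuming it was going to point at bucketBy with saveAsTable — the usual way to persist bucketing metadata, since plain Parquet files cannot carry it — a minimal sketch might look like this (input path, bucket count, and column names are all made up for illustration):

```scala
import org.apache.spark.sql.SparkSession

object BucketedWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bucketed-write")
      .enableHiveSupport() // bucket metadata lives in the metastore, not in the files
      .getOrCreate()

    val df = spark.read.parquet("/data/events") // hypothetical input path

    // Bucket on the join key so later joins/aggregations on user_id can
    // reuse this layout instead of shuffling again.
    df.write
      .bucketBy(16, "user_id") // bucket count and column are illustrative
      .sortBy("user_id")
      .format("parquet")
      .saveAsTable("events_bucketed") // df.write.parquet(path) would drop the bucketing info
  }
}
```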
Hi Team
I have a problem building Apache Spark so that it is compatible with Apache
Hive 3.1.2.
I believe Apache Spark supports Hive 3.1.2, as I saw in the docs:
https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html
I saw in the docs the following guide to build spark:
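The quoted build guide does not survive in this archive. For reference, the command the Spark build docs give for enabling Hive integration is along these lines (the exact profiles can vary by Spark version):

```shell
# Build Spark with Hive and the JDBC/Thrift server support.
# Run from a Spark source checkout; -DskipTests speeds the build up considerably.
./build/mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package
```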
@ Gourav Sengupta, why are you sending unnecessary emails? If you think
Snowflake is good, please use it; the question here was different, and you
are talking about a totally different topic.
Please respect the group guidelines.
Regards,
Vaquar khan
On Wed, Dec 28, 2022, 10:29 AM vaquar khan wrote:
Here you can find all the details. You just need to pass a Spark dataframe;
Deequ also generates rule recommendations, and you can write custom
complex rules as well.
https://aws.amazon.com/blogs/big-data/test-data-quality-at-scale-with-deequ/
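As a sketch of what a Deequ check looks like in practice (the dataframe `df` and the column names are assumed; the API follows the AWS blog post linked above):

```scala
import com.amazon.deequ.VerificationSuite
import com.amazon.deequ.checks.{Check, CheckLevel, CheckStatus}

// `df` is the Spark DataFrame you want to validate (assumed to exist).
val result = VerificationSuite()
  .onData(df)
  .addCheck(
    Check(CheckLevel.Error, "basic data quality")
      .isComplete("id")        // no nulls in the id column
      .isUnique("id")          // id behaves like a primary key
      .isNonNegative("amount") // sanity check on a metric column
  )
  .run()

if (result.status != CheckStatus.Success) {
  println("Data quality checks failed")
}
```

The rule recommendations mentioned above come from Deequ's constraint suggestion runner (`com.amazon.deequ.suggestions`), which profiles the dataframe and proposes checks like the ones written by hand here.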
Regards,
Vaquar khan
On Wed, Dec 28, 2022, 9:40 AM
Thanks for the input, folks.
Hi Vaquar,
I saw that we have various types of checks in GE and Deequ. Could you
please suggest what types of checks you used for metric-based columns?
Regards
Rajat
On Wed, Dec 28, 2022 at 12:15 PM vaquar khan wrote:
> I would suggest Deequ, I have
Hi Khalid,
just out of curiosity, does the API help us in setting job IDs, or just job
descriptions?
Regards,
Gourav Sengupta
On Wed, Dec 28, 2022 at 10:58 AM Khalid Mammadov
wrote:
There is a feature in SparkContext to set local properties
(setLocalProperty) where you can set your Request ID, and then, using a
SparkListener instance, read that ID together with the Job ID via the
onJobStart event.
Hope this helps.
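A minimal sketch of that pattern (the property key `request.id` and the request ID value are made up; `sc` is an existing SparkContext):

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

// Register a listener that fires for every job submitted on this context.
sc.addSparkListener(new SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    // Local properties set on the submitting thread are carried into the event.
    val requestId = jobStart.properties.getProperty("request.id")
    println(s"Spark job ${jobStart.jobId} belongs to request $requestId")
  }
})

// On the thread that submits the work:
sc.setLocalProperty("request.id", "req-42")
sc.parallelize(1 to 10).count() // any action; triggers onJobStart
```

Note that this does not set the job ID itself — Spark assigns those internally; it only attaches your own ID so you can correlate it with the Spark-assigned job ID in the listener.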
On Tue, 27 Dec 2022, 13:04 Dhruv Toshniwal,
wrote:
> TL;DR -
>