Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-04 Thread Mich Talebzadeh
graph processing in Spark. I saw someone created some documents. HTH Mich Talebzadeh, *Disclaimer:* The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice, quote "one test result is worth one-tho

Re: [VOTE] Single-pass Analyzer for Catalyst

2024-10-03 Thread Mich Talebzadeh
+1, on the assumption that we phase this release in on an incremental basis. It will probably take us to the end of release 5. HTH Mich Talebzadeh, Architect | Data Engineer | Data Science | Financial Crime PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College London

Re: [VOTE] Single-pass Analyzer for Catalyst

2024-10-03 Thread Mich Talebzadeh
ffs of complexity, resource availability and long-term gains. HTH

Re: [VOTE] Officially Deprecate GraphX in Spark 4

2024-09-30 Thread Mich Talebzadeh
+1

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-09-30 Thread Mich Talebzadeh
should prioritize the health of the Spark ecosystem and ensure that we are investing resources into actively maintained components. HTH

Re: [VOTE] Document and Feature Preview via GitHub Pages

2024-09-11 Thread Mich Talebzadeh
+1

Re: Question about Releases and EOL

2024-08-29 Thread Mich Talebzadeh
ement declaring Spark 2.4.0 as the final minor release, the fact that 2.4.8 is still being maintained suggests it might be an LTS release. Is this likely due to its continued usage? HTH

Re: Please review (ValidateExternalType should return child in error)

2024-08-25 Thread Mich Talebzadeh
ards, > Mark Andreev > On Wed, 21 Aug 2024 at 23:08, Mich Talebzadeh wrote: >> Hi Mark, You have already done that and have made the request for review. +1 for me

Re: Please review (ValidateExternalType should return child in error)

2024-08-21 Thread Mich Talebzadeh
Hi Mark, You have already done that and have made the request for review. +1 for me

Re: Please review (ValidateExternalType should return child in error)

2024-08-20 Thread Mich Talebzadeh
ted}." By providing this additional context, developers can more efficiently pinpoint and resolve schema mismatches. HTH

Re: [VOTE] Archive Spark Documentations in Apache Archives

2024-08-12 Thread Mich Talebzadeh
k -f convert_sum.awk size.txt 11.88 GB
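
The awk script convert_sum.awk is not shown in the thread. As a rough illustration only, here is a Python sketch of that kind of summation. It assumes that size.txt holds du -sh style lines (a size with a K/M/G/T suffix followed by a path) and that "GB" means 1024**3 bytes. The function names and sample paths are hypothetical.

```python
# Hypothetical stand-in for convert_sum.awk: sums human-readable sizes
# (assumed du -sh style lines such as "1.5G\tdocs/3.5.0") into gigabytes.
UNITS = {"K": 1024**1, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(size: str) -> float:
    """Convert a size such as '512M' or '1.5G' to bytes (binary multiples)."""
    size = size.strip().upper()
    if size and size[-1] in UNITS:
        return float(size[:-1]) * UNITS[size[-1]]
    return float(size)  # no suffix: assume the value is already in bytes


def total_gb(lines) -> float:
    """Sum the first whitespace-separated field of each line, in GB (1024**3 bytes)."""
    total = 0.0
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            total += to_bytes(fields[0])
    return total / UNITS["G"]


if __name__ == "__main__":
    sample = ["512M\tsite/docs/1.6.0", "1.5G\tsite/docs/3.5.0", "4K\tREADME"]
    print(f"{total_gb(sample):.2f} GB")
```

To run it against a real file, pass the open file object instead: total_gb(open("size.txt")).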

Re: [VOTE] Archive Spark Documentations in Apache Archives

2024-08-12 Thread Mich Talebzadeh
Hi Kent, Can you, if possible, provide a heuristic estimate of the space reduction your proposal is going to achieve? Thanks

Re: [VOTE] Archive Spark Documentations in Apache Archives

2024-08-12 Thread Mich Talebzadeh
Hi Kent, Can you, if possible, please provide a heuristic estimate of the storage reduction that will be achieved through this approach? Thanks

Re: [VOTE] Archive Spark Documentations in Apache Archives

2024-08-12 Thread Mich Talebzadeh
achieved through this approach. Overall, the proposal offers a viable solution for managing Spark documentation while reducing storage concerns. However, addressing the potential complexity of managing older documentation versions is crucial. +1 for me

Re: [VOTE] Using Github Issues for Spark-Connect-Go _only_ issues.

2024-08-12 Thread Mich Talebzadeh
+1 for me

Re: [External Email] Re: [DISCUSS] Using Github Issues for Spark-Connect-Go _only_ issues.

2024-08-11 Thread Mich Talebzadeh
nt to your dedication.

Re: ASF board report for August 2024

2024-08-11 Thread Mich Talebzadeh
nd Haejoon Lee in July 2024. - Kent Yao joined the PMC on August 8th, 2024. I believe this will ensure consistency and provide accurate information about the recent changes to the project's governance structure. HTH

Re: [DISCUSS] Using Github Issues for Spark-Connect-Go _only_ issues.

2024-08-09 Thread Mich Talebzadeh
it. 4. Monitor and evaluate: Track key metrics and gather feedback throughout the PoC. HTH

Re: Spark website repo size hits the storage limit of GitHub-hosted runners

2024-08-08 Thread Mich Talebzadeh
Maybe you should look into deploying GitHub Large File Storage <https://docs.github.com/en/repositories/working-with-files/managing-large-files/configuring-git-large-file-storage> (LFS). If applicable, store large documentation files in LFS to reduce the repository size. HTH

Re: [DISCUSS] Using Github Issues for Spark-Connect-Go _only_ issues.

2024-08-08 Thread Mich Talebzadeh
watch: - Integration and Synchronization: - Maintenance and Management: - Need to clearly communicate the new process to the community to avoid confusion. Cheers

Re: [DISCUSS] Using Github Issues for Spark-Connect-Go _only_ issues.

2024-08-08 Thread Mich Talebzadeh
existing Jira issues to GitHub Issues? HTH

Re: [Issue] Spark SQL - broadcast failure

2024-07-16 Thread Mich Talebzadeh
It will help if you mention the Spark version and share the problematic piece of code. HTH

Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-14 Thread Mich Talebzadeh
I was looking at this email trail and the original one raised by Martin Grund. I agree too that mistakes can and do happen. For my part, kudos to Martin for raising the issue and to @Hyukjin Kwon for the quick action that helped avoid potential delays. Thanks both.

Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-04 Thread Mich Talebzadeh
+1 non-binding

Re: [DISCUSS] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-04 Thread Mich Talebzadeh
A good point, agreed.

Re: BUG :: UI Spark

2024-05-26 Thread Mich Talebzadeh
ts shuffle data when caching is involved.

Re: BUG :: UI Spark

2024-05-26 Thread Mich Talebzadeh
UI's display, not necessarily a bug in the Spark framework itself. HTH

Re: BUG :: UI Spark

2024-05-26 Thread Mich Talebzadeh
actual number of records processed. HTH

Re: [VOTE] SPIP: Stored Procedures API for Catalogs

2024-05-13 Thread Mich Talebzadeh
+0, for the reasons I outlined in the discussion thread https://lists.apache.org/thread/7r04pz544c9qs3gc8q2nyj3fpzfnv8oo

Re: [DISCUSS] SPIP: Stored Procedures API for Catalogs

2024-05-11 Thread Mich Talebzadeh
evident that this approach functions more like dynamic scripts than traditional compiled stored procedures. HTH

Re: [DISCUSS] SPIP: Stored Procedures API for Catalogs

2024-05-10 Thread Mich Talebzadeh
Hi, If I recall correctly, in RDBMSs like Oracle a stored procedure is invalidated when the underlying table changes (DDL), because it is a compiled object. How is this going to be handled? Does it follow the same mechanism? Thanks

Re: caching a dataframe in Spark takes lot of time

2024-05-08 Thread Mich Talebzadeh
Allocation: Spark allocates memory for the cached DataFrame. Depending on the cluster configuration and available memory, this allocation can take some time. HTH

Re: Why spark-submit works with package not with jar

2024-05-06 Thread Mich Talebzadeh
Thanks David. I wanted to explain the difference between --packages and --jars, drawing on comments from the community in previous discussions a few years ago. Cheers

Re: ASF board report draft for May

2024-05-06 Thread Mich Talebzadeh
ersions and are not expected to meet the same level of stability and completeness as release candidates or final releases. HTH

Re: ASF board report draft for May

2024-05-06 Thread Mich Talebzadeh
@Wenchen Fan Thanks for the update! To clarify, is the vote for approving a specific preview build, or is it for moving towards an RC stage? I gather there is a distinction between the two?

Re: ASF board report draft for May

2024-05-06 Thread Mich Talebzadeh
rsion available for evaluation as soon as it is feasible" HTH

Re: [SparkListener] Accessing classes loaded via the '--packages' option

2024-05-04 Thread Mich Talebzadeh
and its dependencies listed in Maven. HTH

Fwd: Why spark-submit works with package not with jar

2024-05-04 Thread Mich Talebzadeh

Spark Materialized Views: Improve Query Performance and Data Management

2024-05-03 Thread Mich Talebzadeh
at the ticket and add your comments. Thanks

Re: Issue with Materialized Views in Spark SQL

2024-05-03 Thread Mich Talebzadeh
t of was that using materialized views with Spark Structured Streaming and Change Data Capture (CDC) is a potential solution for efficiently streaming view data updates in this scenario.

Issue with Materialized Views in Spark SQL

2024-05-02 Thread Mich Talebzadeh
similar issue or if there are any insights into why this discrepancy exists between Spark SQL and Hive. Thanks

Re: [DISCUSS] Spark 4.0.0 release

2024-05-02 Thread Mich Talebzadeh

Re: Potential Impact of Hive Upgrades on Spark Tables

2024-05-01 Thread Mich Talebzadeh
o test the Spark applications thoroughly after a Hive upgrade, which will necessitate liaising with the Hive group, as you are relying on their metadata.

Potential Impact of Hive Upgrades on Spark Tables

2024-04-30 Thread Mich Talebzadeh
ample, depending on the severity of the changes, the Hive metastore schema might change, which could require Spark code to be updated to handle these changes in how table metadata is represented. Is this assertion correct? Thanks

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-29 Thread Mich Talebzadeh
so presented in a Hortonworks meet-up: Hive on Spark Engine Versus Spark Using Hive Metastore <https://www.linkedin.com/pulse/hive-spark-engine-versus-using-metastore-mich-talebzadeh-ph-d-/>. With regard to why I cast a +1 vote for one and a -1 for the other, I think it is my prerogative how I vo

Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-28 Thread Mich Talebzadeh
may require a number of changes to the old scripts. Hence my concern. As a matter of interest, has anyone liaised with the Hive team to ensure they have introduced the additional changes you outlined? HTH

Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Mich Talebzadeh
nt the importance of carefully evaluating the impact of changing the default behaviour.

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Mich Talebzadeh
s thorough consideration. HTH

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-25 Thread Mich Talebzadeh
OK, thanks, got it.

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-25 Thread Mich Talebzadeh
durability when choosing a catalog solution for production deployments. In many cases, a combination of in-memory and disk-based catalog solutions may offer the best balance of performance and resilience for demanding large-scale workloads. HTH

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-25 Thread Mich Talebzadeh
Well, I would be surprised, because the Derby database is single-threaded and won't be of much use here. Most Hive metastores in the commercial world use PostgreSQL or Oracle as the backing database, which are battle-proven, replicated and backed up.

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-25 Thread Mich Talebzadeh
nother thing is that if I understand correctly, and I might be totally wrong here, the internal spark catalog is a local installation of hive metastore anyway, so I'm not sure what the catalog has to do with anything". I don't understand this. Do you mean a Derby database? HTH

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-25 Thread Mich Talebzadeh
seamless interaction with Spark applications and libraries. 5) There seems to be some similarity between the Spark catalog and Databricks Unity Catalog, which may favour the choice. HTH

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-25 Thread Mich Talebzadeh
".. because we support better." Are you referring to the performance of the Spark catalog (I believe it is internal) or to its integration with Spark? HTH

Re: Which version of spark version supports parquet version 2 ?

2024-04-17 Thread Mich Talebzadeh
y (if possible) to see if it indirectly enables v2 writing with Spark 3.2.0. HTH

Re: Which version of spark version supports parquet version 2 ?

2024-04-16 Thread Mich Talebzadeh
Hi Prem, Regrettably, this is not my area of speciality. I trust another colleague will have a more informed idea. Alternatively, you may raise an SPIP for it: Spark Project Improvement Proposals (SPIP) | Apache Spark <https://spark.apache.org/improvement-proposals.html>. HTH

Re: Which version of spark version supports parquet version 2 ?

2024-04-16 Thread Mich Talebzadeh
er, taking into account the law of diminishing returns, I would not advise that either. You can of course use gzip compression, which may be more suitable for your needs. HTH

Re: Which version of spark version supports parquet version 2 ?

2024-04-15 Thread Mich Talebzadeh
Sorry, you have a point there. It was released in version 3.0.0. What version of Spark are you using?

Re: Which version of spark version supports parquet version 2 ?

2024-04-15 Thread Mich Talebzadeh

Re: Which version of spark version supports parquet version 2 ?

2024-04-15 Thread Mich Talebzadeh
the library itself. However, you can have a look at https://github.com/apache/parquet-mr/blob/master/CHANGES.md. HTH

Re: [VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-14 Thread Mich Talebzadeh
+1 for me. It makes Spark more compatible with other ANSI SQL-compliant products.

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-10 Thread Mich Talebzadeh
t require additional security considerations. - Integration and support in the cloud. HTH

Re: External Spark shuffle service for k8s

2024-04-08 Thread Mich Talebzadeh
anks

Fwd: Apache Spark 3.4.3 (?)

2024-04-07 Thread Mich Talebzadeh

Re: External Spark shuffle service for k8s

2024-04-07 Thread Mich Talebzadeh
Thanks, Cheng, for the heads-up. I will have a look. Cheers

Re: External Spark shuffle service for k8s

2024-04-07 Thread Mich Talebzadeh
a Kubernetes cluster. They can include these configurations in the Spark application code or pass them as command-line arguments or environment variables during application submission. HTH

Re: External Spark shuffle service for k8s

2024-04-06 Thread Mich Talebzadeh
better performance and scalability for handling larger datasets efficiently.

External Spark shuffle service for k8s

2024-04-06 Thread Mich Talebzadeh
file systems come into it. I would be interested in hearing more about any progress on this. Thanks.

Re: Scheduling jobs using FAIR pool

2024-04-01 Thread Mich Talebzadeh
Hi, Have you put this question to the Databricks forum, Data Engineering - Databricks <https://community.databricks.com/t5/data-engineering/bd-p/data-engineering>?

Re: Allowing Unicode Whitespace in Lexer

2024-03-27 Thread Mich Talebzadeh
looks fine, except that processing all Unicode whitespace characters might add overhead to the parsing process, potentially impacting performance, although I think this is a moot point. +1
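
As a rough illustration of what the lexer change allows, the Python sketch below tokenizes the same query in two ways. This is not Spark's actual ANTLR-based lexer. The first approach splits only on ASCII whitespace. The second splits on any character that Python's str.isspace() accepts, which includes Unicode space separators such as NO-BREAK SPACE (U+00A0) and EM SPACE (U+2003). Python's definition is close to, but not identical with, the Unicode White_Space property. The helper names are hypothetical.

```python
import re

# ASCII-only whitespace: roughly what a lexer without Unicode support would skip.
ASCII_WS = re.compile(r"[ \t\n\r\f\v]+")


def split_ascii(sql: str) -> list[str]:
    """Split on ASCII whitespace only; other space characters stay inside tokens."""
    return [tok for tok in ASCII_WS.split(sql) if tok]


def split_unicode(sql: str) -> list[str]:
    """str.split() with no argument splits on any character for which isspace() is true."""
    return sql.split()


query = "SELECT\u00a0*\u2003FROM t"  # NO-BREAK SPACE and EM SPACE used as separators
print(split_ascii(query))
print(split_unicode(query))
```

With the ASCII-only split, the no-break and em spaces stay glued inside the first token. The Unicode-aware split yields the four expected tokens. Checking against the wider character set is where any extra parsing cost would come from.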

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Mich Talebzadeh

Re: Improved Structured Streaming Documentation Proof-of-Concept

2024-03-25 Thread Mich Talebzadeh
issues brought up in the user group and otherwise). Perhaps using a section such as the proposed "Knowledge Sharing Hub" may become more relevant. Moreover, the examples have to reflect real-life scenarios; otherwise they will be of limited use. HTH

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-19 Thread Mich Talebzadeh
ality.

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-19 Thread Mich Talebzadeh
n entertain this idea. They seem to have a well-defined structure for hosting topics. Let me know your thoughts. Thanks <https://community.databricks.com/t5/knowledge-sharing-hub/bd-p/Knowledge-Sharing-Hub>

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
the information (topics) is provided on a best-efforts basis and cannot be guaranteed.

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
- Databricks <https://community.databricks.com/t5/knowledge-sharing-hub/bd-p/Knowledge-Sharing-Hub>

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
+1 for me

A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
uld not be that difficult. If anyone is supportive of this proposal, let the usual +1, 0, -1 decide. HTH

Re: Enhanced Console Sink for Structured Streaming

2024-03-12 Thread Mich Talebzadeh
addBatch" : 37, "commitOffsets" : 41, "getBatch" : 0, "latestOffset" : 0, "queryPlanning" : 5, "triggerExecution" : 187, "walCommit" : 104 }, "stateOperators" : [ ], "sources" : [ { "description" :
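
The fragment above is the durationMs block of a Structured Streaming progress report. In that block, triggerExecution is the wall-clock time for the whole micro-batch, and the other entries are its phases. In this sample the phases add up exactly: 37 + 41 + 0 + 0 + 5 + 104 = 187 ms. walCommit accounts for more than half of that total. The sketch below does the same arithmetic on the values copied from the excerpt. In PySpark 3.x, a dict of this shape would typically come from StreamingQuery.lastProgress. In general, some time may not be attributed to any phase, so the sum can fall short of triggerExecution. The helper name is hypothetical.

```python
import json

# durationMs values copied from the progress excerpt (milliseconds).
progress_json = """
{"durationMs": {"addBatch": 37, "commitOffsets": 41, "getBatch": 0,
                "latestOffset": 0, "queryPlanning": 5,
                "triggerExecution": 187, "walCommit": 104}}
"""


def phase_breakdown(progress: dict) -> tuple[int, int]:
    """Return (sum of the individual phases, triggerExecution) from durationMs."""
    durations = progress["durationMs"]
    total = durations["triggerExecution"]
    phases = sum(v for k, v in durations.items() if k != "triggerExecution")
    return phases, total


phases, total = phase_breakdown(json.loads(progress_json))
print(f"phases={phases} ms, triggerExecution={total} ms, untimed={total - phases} ms")
```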

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Mich Talebzadeh
+1

Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-09 Thread Mich Talebzadeh
Splendid. Thanks, Gengliang.

SPARK-44951, Improve Spark Dynamic Allocation

2024-03-08 Thread Mich Talebzadeh
Hi all, On this ticket, Improve Spark Dynamic Allocation <https://issues.apache.org/jira/browse/SPARK-44951>, I see no movement since it was opened back in August 2023. I may be wrong, of course.

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-05 Thread Mich Talebzadeh
ly working with the filtered dataset, representing the partitions that would have hypothetically succeeded. HTH

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-04 Thread Mich Talebzadeh
eboots for whatever reason. Look at the host logs or run /usr/bin/dmesg to see what happened. Good luck

Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-02 Thread Mich Talebzadeh
arer for everyone at first glance. Cheers Mich Talebzadeh

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-02 Thread Mich Talebzadeh
Data validation checks: Implement data validation checks after processing to identify potential duplicates or missing data. HTH Mich Talebzadeh

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-01 Thread Mich Talebzadeh
ds additional processing overhead but can ensure data integrity. HTH Mich Talebzadeh

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-01 Thread Mich Talebzadeh
ache.org/jira/browse/SPARK-24815> This will ensure everyone involved can benefit from your team's expertise and facilitate further collaboration. Thanks Mich Talebzadeh

Please unlock Jira ticket for SPARK-24815, Dynamic resource allocation for structured streaming

2024-02-26 Thread Mich Talebzadeh
tps://issues.apache.org/jira/browse/SPARK-24815> For now I have volunteered to mentor the team until a committer volunteers to take it over. This should not be that strenuous hopefully. Thanks Mich Talebzadeh

Proposal about moving on from the Shepherd terminology in SPIPs

2024-02-23 Thread Mich Talebzadeh
s or another alternative proposal). HTH Mich Talebzadeh

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-23 Thread Mich Talebzadeh
n involved lately and would be missing a lot of context." So we need to improvise and see how best we can drive this and similar ones. We should wait a short while for a response; otherwise, I am happy to give a hand if needed and work with you guys to drive this. It is something worthwhile. HTH T Mich

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-23 Thread Mich Talebzadeh
+1 for me Mich Talebzadeh

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-22 Thread Mich Talebzadeh
d give technical justifications. OK, a shepherd from the PMC members is required. Maybe Jungtaek Lee can kindly help with the process. Cheers Mich Talebzadeh

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-22 Thread Mich Talebzadeh
Hi Pavan, Do you have a list of votes for this feature by any chance? Does it pass the required condition as approved? HTH Mich Talebzadeh

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-22 Thread Mich Talebzadeh
I can see it was closed. Was it because of inactivity? Mich Talebzadeh

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-19 Thread Mich Talebzadeh
OK, thanks for your clarifications. Mich Talebzadeh

Re: ASF board report draft for February

2024-02-18 Thread Mich Talebzadeh
Np, thanks for addressing the point promptly. Mich Talebzadeh

Re: ASF board report draft for February

2024-02-18 Thread Mich Talebzadeh
n. Please stay tuned!" I would be inclined to leave that line out for now. The rest is fine. HTH Mich Talebzadeh

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-16 Thread Mich Talebzadeh
Hi Chao, As a cool feature: - Compared to standard Spark, what kind of performance gains can be expected with Comet? - Can one use Comet on k8s in conjunction with something like a Volcano addon? HTH Mich Talebzadeh

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-15 Thread Mich Talebzadeh
Hi, I gather from the replies that the plugin is not currently available in the form expected, although I am aware of the shell script. Also, have you got some benchmark results from your tests that you can possibly share? Thanks, Mich Talebzadeh
