Pros:
- No need for Scala skills; Java can be used.
- Other companies are already doing it.
- > Supports Yarn execution
  But not only Yarn…
- Complex import use cases can easily be done in Java (see https://spark-summit.org/eu-2017/events/extending-apache-sparks-ingestion-building-your-own-java-data-source/ - sorry, shameless self-promotion).
- Can be parallelized across all nodes of the cluster.
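To make the "complex import can easily be done in Java" point concrete, here is a minimal sketch of a plain-Java Spark ingestion job that reads CSV from HDFS with an explicit schema and writes to a Hive table. The file path, table name, and column names are hypothetical illustrations, not anything from this thread; it assumes a Spark build with Hive support and is submitted via spark-submit (e.g. with --master yarn).

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class CsvToHiveIngest {
    public static void main(String[] args) {
        // Hive support lets the job write managed tables; the master
        // (e.g. yarn) is supplied at submit time, not hard-coded here.
        SparkSession spark = SparkSession.builder()
                .appName("CsvToHiveIngest")
                .enableHiveSupport()
                .getOrCreate();

        // Explicit schema instead of inference (hypothetical columns).
        StructType schema = new StructType()
                .add("id", DataTypes.LongType)
                .add("amount", DataTypes.DoubleType)
                .add("booked_at", DataTypes.TimestampType);

        Dataset<Row> df = spark.read()
                .option("header", "true")
                .schema(schema)
                .csv("hdfs:///landing/transactions/*.csv"); // hypothetical path

        // Arbitrary Java transformation logic can go here before the write;
        // this is where "complex" import rules live instead of an ETL GUI.
        df.write().mode("append").saveAsTable("staging.transactions"); // hypothetical table

        spark.stop();
    }
}
```

Because the transformation step is ordinary Java, it can call any library (log4j for tracking, custom parsers, etc.), which is the flexibility argument above.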
Cons:
- > No ETL GUI.
  But Spark works very well with libraries like log4j, so the process can still be tracked.

jg

On Nov 1, 2017, at 22:58, van den Heever, Christian CC <christian.vandenhee...@standardbank.co.za> wrote:

Dear Spark users,

I have been asked to provide a presentation / business case for why to use Spark and Java as an ingestion tool for HDFS and Hive, and why to move away from an ETL tool. Could you be so kind as to provide some pros and cons for this?

I have the following:

Pros:
- In-house build: code can be changed on the fly to suit business needs.
- Software is free.
- Can run out of the box on all nodes.
- Will support all Apache-based software.
- Fast due to in-memory processing.
- Spark UI can visualise execution.
- Supports checkpointed data loads.
- Supports schema registry for custom schemas and inference.
- Supports Yarn execution.
- MLlib can be used if needed.
- Data lineage support due to Spark usage.

Cons:
- Skills needed to maintain and build.
- In-memory capability can become a bottleneck if not managed.
- No ETL GUI.

Maybe point me to an article if you have one. Thanks a mill.

Christian